Re: [PERFORM] Select performance vs. mssql

2005-07-08 Thread Jochem van Dieten

Enrico Weigelt wrote:

Bruno Wolff III wrote:


This gets brought up a lot. The problem is that the index doesn't include
information about whether the current transaction can see the referenced
row. Putting this information in the index will add significant overhead
to every update and the opinion of the developers is that this would be
a net loss overall.


wouldn't it work well to make this feature optionally for each 
index ? There could be some flag on the index (ie set at create 
time) which tells postgres whether to store mvcc information.


There is no reason to assume it can't work.

There is little reason to assume that it will be the best 
solution in many circumstances.


There is a big reason why people are sceptical: there is no patch.


The issue has been debated and beaten to death. People have 
formed their opinions and are unlikely to change their position. 
If you want to convince people, your best bet is to submit a 
patch and have OSDL measure the performance improvement.


Jochem


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Select performance vs. mssql

2005-07-08 Thread Enrico Weigelt
* Bruno Wolff III <[EMAIL PROTECTED]> wrote:



> This gets brought up a lot. The problem is that the index doesn't include
> information about whether the current transaction can see the referenced
> row. Putting this information in the index will add significant overhead
> to every update and the opinion of the developers is that this would be
> a net loss overall.

wouldn't it work well to make this feature optionally for each 
index ? There could be some flag on the index (ie set at create 
time) which tells postgres whether to store mvcc information.


cu
-- 
-
 Enrico Weigelt==   metux IT service
  phone: +49 36207 519931 www:   http://www.metux.de/
  fax:   +49 36207 519932 email: [EMAIL PROTECTED]
-
  Realtime Forex/Stock Exchange trading powered by postgresSQL :))
http://www.fxignal.net/
-

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Select performance vs. mssql

2005-05-29 Thread Jim C. Nasby
On Wed, May 25, 2005 at 09:29:36AM +0800, Christopher Kings-Lynne wrote:
> >--MSSQL's ability to hit the index only and not having
> >to go to the table itself results in a _big_
> >performance/efficiency gain. If someone who's in
> >development wants to pass this along, it would be a
> >nice addition to PostgreSQL sometime in the future.
> >I'd suspect that as well as making one query faster,
> >it would make everything else faster/more scalable as
> >the server load is so much less.
> 
> This is well-known and many databases do it.  However, due to MVCC 
> considerations in PostgreSQL, it's not feasible for us to implement it...

Wasn't there a plan to store some visibility info in indexes? IIRC the
idea was that a bit would be set in the index tuple indicating that all
transactions that wouldn't be able to see that index value were
complete, meaning that there was no reason to hit the heap for that
tuple.

I looked on the TODO but didn't see this, maybe it fell through the
cracks?
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Select performance vs. mssql

2005-05-24 Thread Christopher Kings-Lynne

--As Chris pointed out, how real-world is this test?
His point is valid. The database we're planning will
have a lot of rows and require a lot of summarization
(hence my attempt at a "test"), but we shouldn't be
pulling a million rows at a time.


If you want to do lots of aggregate analysis, I suggest you create a 
sepearate summary table, and create triggers on the main table to 
maintain your summaries in the other table...



--MSSQL's ability to hit the index only and not having
to go to the table itself results in a _big_
performance/efficiency gain. If someone who's in
development wants to pass this along, it would be a
nice addition to PostgreSQL sometime in the future.
I'd suspect that as well as making one query faster,
it would make everything else faster/more scalable as
the server load is so much less.


This is well-known and many databases do it.  However, due to MVCC 
considerations in PostgreSQL, it's not feasible for us to implement it...


Chris

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Select performance vs. mssql

2005-05-24 Thread John A Meinel

Michael Stone wrote:


On Tue, May 24, 2005 at 04:35:14PM -0700, Josh Berkus wrote:


Pretty much.  There has been discussion about allowing index-only
access to "frozen" tables, i.e. archive partitions.  But it all sort
of hinges on someone implementing it and testing 



Is there any way to expose the planner estimate? For some purposes it's
enough to just give a rough ballpark (e.g., a google-esque "results 1-10
of approximately 1000") so a user knows whether its worth even
starting to page through.

Mike Stone

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Well, you could always do:

EXPLAIN SELECT ...

And then parse out the rows= in the first line.

John
=:->



signature.asc
Description: OpenPGP digital signature


Re: [PERFORM] Select performance vs. mssql

2005-05-24 Thread Michael Stone

On Tue, May 24, 2005 at 04:35:14PM -0700, Josh Berkus wrote:
Pretty much.  There has been discussion about allowing index-only access to 
"frozen" tables, i.e. archive partitions.  But it all sort of hinges on 
someone implementing it and testing 


Is there any way to expose the planner estimate? For some purposes it's
enough to just give a rough ballpark (e.g., a google-esque "results 1-10
of approximately 1000") so a user knows whether its worth even
starting to page through.

Mike Stone

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Select performance vs. mssql

2005-05-24 Thread PFC


Pretty much.  There has been discussion about allowing index-only access  
to

"frozen" tables, i.e. archive partitions.  But it all sort of hinges on
someone implementing it and testing 


	Would be interesting as a parameter to set at index creation (ie. if you  
know this table will have a lot of reads and few writes)... like create an  
index on columns X,Y keeping data on columns X,Y and Z...

But in this case do you still need the table ?
	Or even create a table type where the table and the index are one, like  
an auto-clustered table...

I don't know if it would be used that often, though ;)


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Select performance vs. mssql

2005-05-24 Thread Josh Berkus
Folks,

> > This gets brought up a lot. The problem is that the
> > index doesn't include
> > information about whether the current transaction
> > can see the referenced
> > row. Putting this information in the index will add
> > significant overhead
> > to every update and the opinion of the developers is
> > that this would be
> > a net loss overall.

Pretty much.  There has been discussion about allowing index-only access to 
"frozen" tables, i.e. archive partitions.  But it all sort of hinges on 
someone implementing it and testing 

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Select performance vs. mssql

2005-05-24 Thread Alex Turner
Until you start worrying about MVC - we have had problems with the
MSSQL implementation of read consistency because of this 'feature'.

Alex Turner
NetEconomistOn 5/24/05, Bruno Wolff III <[EMAIL PROTECTED]> wrote:
On Tue, May 24, 2005 at 08:36:36 -0700,  mark durrant <[EMAIL PROTECTED]> wrote:>> --MSSQL's ability to hit the index only and not having> to go to the table itself results in a _big_
> performance/efficiency gain. If someone who's in> development wants to pass this along, it would be a> nice addition to PostgreSQL sometime in the future.> I'd suspect that as well as making one query faster,
> it would make everything else faster/more scalable as> the server load is so much less.This gets brought up a lot. The problem is that the index doesn't includeinformation about whether the current transaction can see the referenced
row. Putting this information in the index will add significant overheadto every update and the opinion of the developers is that this would bea net loss overall.---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Select performance vs. mssql

2005-05-24 Thread mark durrant
I'm far from an expert, so this may be off-base... but
perhaps a suggestion would be to allow a hint to be
sent to the optimizer if the user doesn't care that
the result is "approximate" maybe then this wouldn't
require adding more overhead to the indexes.

MSSQL has something like this with (nolock) 
i.e. select count(*) from customers (nolock) where
name like 'Mark%' 

Regardless, I'm very impressed with PostgreSQL and I
think we're moving ahead with it.

Mark

--- Bruno Wolff III <[EMAIL PROTECTED]> wrote:
> On Tue, May 24, 2005 at 08:36:36 -0700,
>   mark durrant <[EMAIL PROTECTED]> wrote:
> > 
> > --MSSQL's ability to hit the index only and not
> having
> > to go to the table itself results in a _big_
> > performance/efficiency gain. If someone who's in
> > development wants to pass this along, it would be
> a
> > nice addition to PostgreSQL sometime in the
> future.
> > I'd suspect that as well as making one query
> faster,
> > it would make everything else faster/more scalable
> as
> > the server load is so much less.
> 
> This gets brought up a lot. The problem is that the
> index doesn't include
> information about whether the current transaction
> can see the referenced
> row. Putting this information in the index will add
> significant overhead
> to every update and the opinion of the developers is
> that this would be
> a net loss overall.



__ 
Do you Yahoo!? 
Yahoo! Small Business - Try our new Resources site
http://smallbusiness.yahoo.com/resources/

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Select performance vs. mssql

2005-05-24 Thread Bruno Wolff III
On Tue, May 24, 2005 at 08:36:36 -0700,
  mark durrant <[EMAIL PROTECTED]> wrote:
> 
> --MSSQL's ability to hit the index only and not having
> to go to the table itself results in a _big_
> performance/efficiency gain. If someone who's in
> development wants to pass this along, it would be a
> nice addition to PostgreSQL sometime in the future.
> I'd suspect that as well as making one query faster,
> it would make everything else faster/more scalable as
> the server load is so much less.

This gets brought up a lot. The problem is that the index doesn't include
information about whether the current transaction can see the referenced
row. Putting this information in the index will add significant overhead
to every update and the opinion of the developers is that this would be
a net loss overall.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Select performance vs. mssql

2005-05-24 Thread mark durrant
First, thanks for all the helpful replies. I've
listened to the suggestions and done some more digging
and have results:

I did show_plan_all in MSSQL and found that it was
doing an Index Scan. I've read someplace that if the
data you need is all in the index, then MSSQL has a
feature/hack where it does not have to go to the
table, it can do my COUNT using the index alone. I
think this explains the 1 second query performance.

I changed the query to also include the other column
which is not indexed. The results were MSSQL now used
a TableScan and was MUCH slower than PostgreSQL. 

I clustered the index on MSSQL and PostgreSQL and
increased buffers to 15000 on PGSQL. I saw a
noticeable performance increase on both. On the more
complicated query, PostgreSQL is now 3.5 seconds.
MSSQL is faster again doing an index scan and is at 2
seconds. Remember the MSSQL machine has a slower CPU
as well.

My interpretations:

--Given having to do a table scan, PostgreSQL seems to
be faster. The hardware on my PostrgreSQL machine is
nicer than the MSSQL one, so perhaps they are just
about the same speed with speed determined by the
disk.

--Tuning helps. Clustered index cut my query time
down. More buffers helped. 

--As Chris pointed out, how real-world is this test?
His point is valid. The database we're planning will
have a lot of rows and require a lot of summarization
(hence my attempt at a "test"), but we shouldn't be
pulling a million rows at a time.

--MSSQL's ability to hit the index only and not having
to go to the table itself results in a _big_
performance/efficiency gain. If someone who's in
development wants to pass this along, it would be a
nice addition to PostgreSQL sometime in the future.
I'd suspect that as well as making one query faster,
it would make everything else faster/more scalable as
the server load is so much less.

Thanks again,

Mark



__ 
Do you Yahoo!? 
Yahoo! Small Business - Try our new Resources site
http://smallbusiness.yahoo.com/resources/

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Select performance vs. mssql

2005-05-24 Thread Harald Lau (Sector-X)
Mark,

> MSSQL Machine:
> That "Explain Analyze" command doesn't work for MSSQL,

try this:
set showplan_all on
go
select ...
go

Harald

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Select performance vs. mssql

2005-05-23 Thread Neil Conway

mark durrant wrote:

PostgreSQL Machine:
"Aggregate  (cost=140122.56..140122.56 rows=1 width=0)
(actual time=24516.000..24516.000 rows=1 loops=1)"
"  ->  Index Scan using "day" on mtable 
(cost=0.00..140035.06 rows=35000 width=0) (actual

time=47.000..21841.000 rows=1166025 loops=1)"
"Index Cond: ("day" = 'Mon'::bpchar)"
"Total runtime: 24516.000 ms"


Have you run ANALYZE?

Clustering the table on the "day" index (via the CLUSTER command) would 
be worth trying.


-Neil

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] Select performance vs. mssql

2005-05-23 Thread mark durrant
> Post the result of this for us:
> 
> explain analyze select count(*) from mtable where
> day='Mon';
> 
> On both machines.

Hi Chris --

PostgreSQL Machine:
"Aggregate  (cost=140122.56..140122.56 rows=1 width=0)
(actual time=24516.000..24516.000 rows=1 loops=1)"
"  ->  Index Scan using "day" on mtable 
(cost=0.00..140035.06 rows=35000 width=0) (actual
time=47.000..21841.000 rows=1166025 loops=1)"
"Index Cond: ("day" = 'Mon'::bpchar)"
"Total runtime: 24516.000 ms"
(Note this took 24 seconds after fresh reboot, next
execution was 11, and execution without explain
analyze was 6.7 seconds)

MSSQL Machine:
That "Explain Analyze" command doesn't work for MSSQL,
but I did view the Query plan. 97% of it was "Scanning
a particular range of rows from a nonclustered index"

Thanks for your help --Mark

__
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Select performance vs. mssql

2005-05-23 Thread Christopher Kings-Lynne



select count(*) from mtable where day='Mon'

Results:

1. P3 600 512MB RAM MSSQL. It takes about 4-5 secs to
run. If I run a few queries and everything is cached,
it is sometimes  just 1 second.

2. Athlon 1.3 Ghz 1GB RAM. PostgreSQL takes 7 seconds.
I have played with the buffers setting and currently
have it at 7500. At 2 it took over 20 seconds to
run.

5 seconds vs 7 isn't that big of a deal, but 1 second
vs 7 seconds is. Also, the slower performance is with
much lesser hardware.


Post the result of this for us:

explain analyze select count(*) from mtable where day='Mon';

On both machines.

Chris

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings