Re: [PERFORM] Performance pb vs SQLServer.

2005-08-18 Thread Qingqing Zhou

Alvaro Herrera [EMAIL PROTECTED] writes

 Interesting; do they use an overwriting storage manager like Oracle, or
 a non-overwriting one like Postgres?


They call this MVCC RLV(row level versioning). I think they use rollback
segment like Oracle (a.k.a version store or tempdb in SQL Server).  Some
details are explained in their white paper:Database concurrency and row
level versioning in SQL Server 2005.

Regards,
Qingqing



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Performance pb vs SQLServer.

2005-08-18 Thread John A Meinel
Qingqing Zhou wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes

Interesting; do they use an overwriting storage manager like Oracle, or
a non-overwriting one like Postgres?



 They call this MVCC RLV(row level versioning). I think they use rollback
 segment like Oracle (a.k.a version store or tempdb in SQL Server).  Some
 details are explained in their white paper:Database concurrency and row
 level versioning in SQL Server 2005.

 Regards,
 Qingqing


I found the paper here:
http://www.microsoft.com/technet/prodtechnol/sql/2005/cncrrncy.mspx

And it does sound like they are doing it the Oracle way:

When a record in a table or index is updated, the new record is stamped
with the transaction sequence_number of the transaction that is doing
the update. The previous version of the record is stored in the version
store, and the new record contains a pointer to the old record in the
version store. Old records in the version store may contain pointers to
even older versions. All the old versions of a particular record are
chained in a linked list, and SQL Server may need to follow several
pointers in a list to reach the right version. Version records need to
be kept in the version store only as long as there are there are
operations that might require them.

John
=:-


signature.asc
Description: OpenPGP digital signature


Re: [PERFORM] Performance pb vs SQLServer.

2005-08-18 Thread Stéphane COEZ
John Arbash Meinel wrote : 
 
 You might also try a different query, something like:
 
 SELECT DISTINCT cod FROM mytable ORDER BY cod GROUP BY cod; 
 (You may or may not want order by, or group by, try the different
 combinations.)
 It might be possible to have the planner realize that all you 
 want is unique rows, just doing a group by doesn't give you that.
 
 John
 =:-
 
Thanks John, but using SELECT DISTINCT with or without Order nor Group by is
worth...
30 sec (with index) - stopped at 200 sec without index...

So Hash Aggregate is much better than index scan ...


 
 Thanks for help.
  
 Stéphane COEZ
 
 
 
 
 ---(end of 
 broadcast)---
 TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
 
   
 
 
 
 




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Performance pb vs SQLServer.

2005-08-18 Thread Stéphane COEZ
 De : Magnus Hagander [mailto:[EMAIL PROTECTED] 
 Out of curiosity, what plan do you get from SQLServer? I bet 
 it's a clustered index scan...
 
 
 //Magnus
 

I have a Table scan and Hashaggregate...
Stephane
 




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Performance pb vs SQLServer.

2005-08-18 Thread Stéphane COEZ
 
 
 One little thing. Did you shutdown sql2000 while testing 
 postgresql? Remember that postgresql uses system cache. 
 Sql2000 uses a large part of memory as buffer and it will not 
 be available to operating system. I must say that, probably, 
 results will be the same, but it will be a better test.
 

Shutting done SQL2000 has no effect on PG performancies.

Stephane.




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Performance pb vs SQLServer.

2005-08-15 Thread Magnus Hagander
 [very, very offtopic]
 Ok. This comparition is just as useless as the other one, 
 because it's comparing oranges with apples (It's funny 
 anyway). I was just choosing an example in which you can see 
 the best of postgresql against 'not so nice' behavior of 
 mssql2000 (no service pack, it's my desktop system, I'll do 
 the same test later with SP4 and different isolation levels 
 and I'll check results).

There will be no difference in the service packs.
SQL 2005 has MVCC (they call it something different, of course, but
that's basicallyi what it is)

 Furthermore, MSSQL2000 is 5 years 
 old now. Does anybody has the same cellular phone, or 
 computer? (I don't want to know :-) ). The big question is

There is a big difference between your database and your cellphone.
There are a lot of systems out there running very solidly on older
products like MSSQL 7 (probably even some on 6.x), as well as Oracle 7,8
and 9...
I'd say there is generally a huge difference in reliabilty in your
cellphone hw/sw than there is in your db hw/sw. I have yet to see a
cellphone that can run for a year without a reboot (or with a lot of
brands, complete replacement).
 
//Magnus

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Performance pb vs SQLServer.

2005-08-15 Thread Alvaro Herrera
On Mon, Aug 15, 2005 at 10:25:47AM +0200, Magnus Hagander wrote:

 SQL 2005 has MVCC (they call it something different, of course, but
 that's basicallyi what it is)

Interesting; do they use an overwriting storage manager like Oracle, or
a non-overwriting one like Postgres?

-- 
Alvaro Herrera (alvherre[a]alvh.no-ip.org)
The Postgresql hackers have what I call a NASA space shot mentality.
 Quite refreshing in a world of weekend drag racer developers.
(Scott Marlowe)

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Performance pb vs SQLServer.

2005-08-14 Thread John Arbash Meinel
Stéphane COEZ wrote:

Hi,

I have a perfomance issue :

I run PG (8.0.3) and SQLServer2000 on a Windows2000 Server (P4 1,5Ghz 512Mo)
I have a table (320 rows) and I run this single query :

select cod from mytable group by cod
I have an index on cod (char(4) - 88 different values)

PG = ~ 20 sec.
SQLServer =  8 sec


the explain is :

HashAggregate  (cost=64410.09..64410.09 rows=55 width=8)
  -  Seq Scan on mytable  (cost=0.00..56325.27 rows=3233927 width=8)


if I switch to enable_hashagg = false (just for a try...)
the planner will choose my index :

Group  (cost=0.00..76514.01 rows=55 width=8)
  -  Index Scan using myindex on mytable  (cost=0.00..68429.20 rows=3233927
width=8)

but performance will be comparable to previous test.

So with or without using Index I have the same result.
  


My guess is that this is part of a larger query. There isn't really much
you can do. If you want all 3.2M rows, then you have to wait for them to
be pulled in.

What you generally can do for performance, is to restructure things, so
that you *don't* have to touch all 3.2M rows.
If you are just trying to determine what the unique entries are for cod,
you probably are better off doing some normalization, and keeping a
separate table of cod values.

I'm guessing the reason your query is faster with SQLServer is because
of how postgres handles MVCC. Basically, it still has to fetch the main
page to determine if a row exists. While SQL server doesn't do MVCC, so
it can just look things up in the index.

You might also try a different query, something like:

SELECT DISTINCT cod FROM mytable ORDER BY cod GROUP BY cod;
(You may or may not want order by, or group by, try the different
combinations.)
It might be possible to have the planner realize that all you want is
unique rows, just doing a group by doesn't give you that.

John
=:-


Thanks for help.
 
Stéphane COEZ




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

  





signature.asc
Description: OpenPGP digital signature


Re: [PERFORM] Performance pb vs SQLServer.

2005-08-14 Thread Steinar H. Gunderson
On Sun, Aug 14, 2005 at 07:27:38PM -0500, John Arbash Meinel wrote:
 My guess is that this is part of a larger query. There isn't really much
 you can do. If you want all 3.2M rows, then you have to wait for them to
 be pulled in.

To me, it looks like he'll get 88 rows, not 3.2M. Surely we must be able to
do something better than a full sequential scan in this case?

test=# create table foo ( bar char(4) );
CREATE TABLE
test=# insert into foo values ('');
INSERT 24773320 1
test=# insert into foo values ('');
INSERT 24773321 1
test=# insert into foo values ('');
INSERT 24773322 1
test=# select * from foo group by bar;
 bar  
--
 
 
(2 rows)

I considered doing some odd magic with generate_series() and subqueries with
LIMIT 1, but it was a bit too weird in the end :-)

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Performance pb vs SQLServer.

2005-08-14 Thread Steinar H. Gunderson
On Sun, Aug 14, 2005 at 07:27:38PM -0500, John Arbash Meinel wrote:
 If you are just trying to determine what the unique entries are for cod,
 you probably are better off doing some normalization, and keeping a
 separate table of cod values.

Pah, I missed this part of the e-mail -- you can ignore most of my (other)
reply, then :-)

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Performance pb vs SQLServer.

2005-08-14 Thread John Arbash Meinel
Steinar H. Gunderson wrote:

On Sun, Aug 14, 2005 at 07:27:38PM -0500, John Arbash Meinel wrote:
  

My guess is that this is part of a larger query. There isn't really much
you can do. If you want all 3.2M rows, then you have to wait for them to
be pulled in.



To me, it looks like he'll get 88 rows, not 3.2M. Surely we must be able to
do something better than a full sequential scan in this case?

test=# create table foo ( bar char(4) );
CREATE TABLE
test=# insert into foo values ('');
INSERT 24773320 1
test=# insert into foo values ('');
INSERT 24773321 1
test=# insert into foo values ('');
INSERT 24773322 1
test=# select * from foo group by bar;
 bar  
--
 
 
(2 rows)

I considered doing some odd magic with generate_series() and subqueries with
LIMIT 1, but it was a bit too weird in the end :-)

/* Steinar */
  

I think a plain GROUP BY is not smart enough to detect it doesn't need
all rows (since it is generally used because you want to get aggregate
values of other columns).
I think you would want something like SELECT DISTINCT, possibly with an
ORDER BY rather than a GROUP BY (which was my final suggestion).

John
=:-



signature.asc
Description: OpenPGP digital signature


Re: [PERFORM] Performance pb vs SQLServer.

2005-08-14 Thread Tom Lane
Steinar H. Gunderson [EMAIL PROTECTED] writes:
 To me, it looks like he'll get 88 rows, not 3.2M. Surely we must be able to
 do something better than a full sequential scan in this case?

Not really.  There's been some speculation about implementing index
skip search --- once you've verified there's at least one visible
row of a given index value, tell the index to skip to the next different
value instead of handing back any of the remaining entries of the
current value.  But it'd be a lot of work and AFAICS not useful for
very many kinds of queries besides this.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Performance pb vs SQLServer.

2005-08-14 Thread Steinar H. Gunderson
On Sun, Aug 14, 2005 at 09:18:45PM -0400, Tom Lane wrote:
 Not really.  There's been some speculation about implementing index
 skip search --- once you've verified there's at least one visible
 row of a given index value, tell the index to skip to the next different
 value instead of handing back any of the remaining entries of the
 current value.  But it'd be a lot of work and AFAICS not useful for
 very many kinds of queries besides this.

This is probably a completely wrong way of handling it all, but could it be
done in a PL/PgSQL query like this? (Pseudo-code, sort of; I'm not very well
versed in the actual syntax, but I'd guess you get the idea.)

x = ( SELECT foo FROM table ORDER BY foo LIMIT 1 );
WHILE x IS NOT NULL
  RETURN NEXT x;
  x = ( SELECT foo FROM table WHERE foo  x ORDER BY foo LIMIT 1 );
END;

(Replace with max() and min() for 8.1, of course.)

/* Steinar */
- fond of horrible hacks :-)
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Performance pb vs SQLServer.

2005-08-14 Thread dario_d_s
One little thing. Did you shutdown sql2000 while testing postgresql? Remember 
that postgresql uses system cache. Sql2000 uses a large part of memory as 
buffer and it will not be available to operating system. I must say that, 
probably, results will be the same, but it will be a better test.

 I'm guessing the reason your query is faster with SQLServer is because
 of how postgres handles MVCC. Basically, it still has to fetch the main
 page to determine if a row exists. While SQL server doesn't do MVCC, so
 it can just look things up in the index.

Another thing [almost offtopic]:
I would like to add something to understand what does MVCC means and what are 
the consecuences.
MVCC: multiversion concurrency control. (ehhh...)

Just do this.

Open two psql sessions. Do this:
Session 1:
   begin;
   update any_table set any_column = 'value_a' where other_column = 'value_b'
   -- do not commit
Session 2:
   select any_table where other_column = 'value_b'
   Watch the result.
Session 1:
   commit;
Session 2:
   select any_table where other_column = 'value_b'
   Watch the result.

Now open two session in query analyzer. Do the same thing:
Session 1:
   begin tran
   update any_table set any_column = 'value_a' where other_column = 'value_b'
   -- do not commit
Session 2:
   select any_table where other_column = 'value_b'
   Wait for result.
   Wait... wait... (Oh, a lock! Ok, when you get tired, go back to session 1.)
Session 1:
   commit
Session 2:
   Then watch the result. 

Which one was faster?

[very, very offtopic]
Ok. This comparition is just as useless as the other one, because it's 
comparing oranges with apples (It's funny anyway). I was just choosing an 
example in which you can see the best of postgresql against 'not so nice' 
behavior of mssql2000 (no service pack, it's my desktop system, I'll do the 
same test later with SP4 and different isolation levels and I'll check 
results). Furthermore, MSSQL2000 is 5 years old now. Does anybody has the same 
cellular phone, or computer? (I don't want to know :-) ). The big question is 
'What do you need?'. No system can give you all. That's marketing 'sarasa'.

Sorry for my english and the noise. [End of offtopic]

Long life, little spam and prosperity.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly