Re: [PERFORM] Insert only tables and vacuum performance

2004-04-30 Thread Joseph Shraibman
Tom Lane wrote:
Joseph Shraibman [EMAIL PROTECTED] writes:
I have a table that is never updated, only INSERTED into. Is there a way 
I can prevent vacuum wasting time on this table

What makes you think vacuum is wasting much time on this table?  AFAICS
it will only update any unfixed hint bits ...
			regards, tom lane
INFO:  elog: found 0 removable, 12869411 nonremovable row versions in 
196195 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 5 unused item pointers.
0 pages are entirely empty.
CPU 31.61s/4.53u sec elapsed 1096.83 sec.

It took 1096.83 seconds, and what did it accomplish?  And what are hint 
bits?

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] planner/optimizer question

2004-04-30 Thread Gary Doades
On 29 Apr 2004 at 19:17, Tom Lane wrote:

 Josh Berkus [EMAIL PROTECTED] writes:
  Certainly the fact that MSSQL is essentially a single-user database makes 
  things easier for them.
 
 Our recent testing (cf the Xeon thread) says that the interlocking we
 do to make the world safe for multiple backends has a fairly high cost
 (at least on some hardware) compared to the rest of the work in
 scenarios where you are doing zero-I/O scans of data already in memory.
 Especially so for index scans.  I'm not sure this completely explains
 the differential that Gary is complaining about, but it could be part of
 it.  Is it really true that MSSQL doesn't support concurrent operations?
 
   regards, tom lane

As far as I am aware SQLSever supports concurrent operations. It 
certainly creates more threads for each connection. None of my 
observations of the system under load (50 ish concurrent users, 150 ish 
connections) suggest that it is serializing queries.

These tests are currentl on single processor Athlon XP 2000+ systems.

Regards,
Gary.

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] planner/optimizer question

2004-04-30 Thread Gary Doades
On 30 Apr 2004 at 7:26, Dennis Bjorklund wrote:

 On Fri, 30 Apr 2004, Gary Doades wrote:

  I should have also pointed out that MSSQL reported that same index scan
  as taking 65% of the overall query time. It was just faster. The
  overall query took 103ms in MSSQL.

 Are your results based on a single client accessing the database and no
 concurrent updates?

 Would adding more clients, and maybe having some client that
 updates/inserts into the tables, still make mssql faster then pg? Maybe
 it's so simple as pg being optimized for more concurrent users then mssql?

 I'm just asking, I don't know much about the inner workings of
 mssql.

 --
 /Dennis Björklund


At the moment it is difficult to set up many clients for testing concurrent
stuff. In the past I have had several SQLServer clients under test,
mainly select queries. MSSQL can certainly execute queries while other
queries are still running in the background.

Our production app is fairly well biased towards selects. Currently it is
about 70% selects, 20% inserts, 6% deletes and 4% updates. Very few
updates are more than one row based on the primary key. Over 90% of
the time spend running SQL is in select queries.

My limited concurrent testing on Postgres gives very good performance
on updates, inserts, deletes, but it is suffering on the selects in certain
areas which why I have been concentrating my efforts on that area.

Having got similar (or the same) access plans in both Postgres and
MSSQL I was getting down to the next level of checking what was going
on when executing the already planned query.

I do have another database system I could try. Sybase SQLAnywhere.
This is not the original Sybase Entrerprise which has the same roots as
MSSQL. In the past my testing suggested that SQLAnywhere
performance was as godd or better than MSSQL. I mey try to set it up
with the same data in these tests for a more detailed comparison.

Regards,
Gary.


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] analyzer/planner and clustered rows

2004-04-30 Thread Manfred Koizar
On Thu, 29 Apr 2004 19:09:09 -0400, Joseph Shraibman
[EMAIL PROTECTED] wrote:
How does the analyzer/planner deal with rows clustered together?

There's a correlation value per column.  Just try

SELECT attname, correlation
  FROM pg_stats
 WHERE tablename = '...';

if you are interested.  It indicates how well the hypothetical order of
tuples if sorted by that column corresponds to the physical order.  +1.0
is perfect correlation, 0.0 is totally chaotic, -1.0 means reverse
order.  The optimizer is more willing to choose an index scan if
correlation for the first index column is near +/-1.

  What if the data in the table happens to be close 
together because it was inserted together originally?

Having equal values close to each other is not enough, the values should
be increasing, too.  Compare

5 5 5 4 4 4 7 7 7 2 2 2 6 6 6 3 3 3 8 8 8   low correlation
and
2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8   correlation = 1.0


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] planner/optimizer question

2004-04-30 Thread Jochem van Dieten
Manfred Koizar wrote:
On Wed, 28 Apr 2004 09:05:04 -0400, Tom Lane [EMAIL PROTECTED] wrote:
[ ... visibility information in index tuples ... ]
Storing that information would at least double the overhead space used
for each index tuple.  The resulting index bloat would significantly
slow index operations by requiring more I/O.  So it's far from clear
that this would be a win, even for those who care only about select
speed.
While the storage overhead could be reduced to 1 bit (not a joke)
You mean adding an isLossy bit and only where it is set the head 
tuple has to be checked for visibility, if it is not set the head 
tuple does not have to be checked?


we'd
still have the I/O overhead of locating and updating index tuples for
every heap tuple deleted/updated.
Would there be additional I/O for the additional bit in the index 
tuple (I am unable to find the layout of index tuple headers in 
the docs)?

Jochem
--
I don't get it
immigrants don't work
and steal our jobs
- Loesje

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [PERFORM] planner/optimizer question

2004-04-30 Thread Gary Doades
On 30 Apr 2004 at 8:32, Jeff wrote:
 
   A better comparision query may be a simple select a from mytable 
 where a between foo and bar  to get an index scan.  In that case its a 
 straight up, vanilla index scan.  Nothing else getting in the way.
 

Yes, you're right and I have done this just to prove to myself that it is the index 
scan that 
is the bottleneck. I have some complex SQL that executes very quickly with Postgres, 
similar to MSSQL, but the index scans in most of those only touch a few rows for a few 
loops. It seems to be a problem when the index scan is scanning very many rows and 
for each of these it has to go to the table just to find out if the index it just 
looked at is 
still valid.

Gary.


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] planner/optimizer question

2004-04-30 Thread Gary Doades
On 30 Apr 2004 at 9:37, Kevin Barnard wrote:

 
 I was always under the impression that MSSQL used leaf and row level locking and 
 therefore 
 was not a concurrent, in the same sense that postgres is, database. It would still 
 allow for 
 concurrent connections and such but updates will get blocked/ delayed. I might be 
 wrong.
 

Ultimately you may be right. I don't know enough about SQLServer 
internals to say either way. Anyway, most of our system is in selects for 
70% of the time. I could try and set up a test for this when I get a bit 
more time.

Unfortunately I suspect that this topic won't get taken much further. In 
order to test this it would mean modifying quite a bit of code. Whether 
putting additional info in the index header and not visiting the data row 
if all the required data is in the index would be beneficial would require 
quite a bit of work by someone who knows more than I do. I reckon that 
no-one has the time to do this at the moment.

Regards,
Gary.


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] planner/optimizer question

2004-04-30 Thread Kris Jurka


On Fri, 30 Apr 2004, Gary Doades wrote:

 Yes, you're right and I have done this just to prove to myself that it
 is the index scan that is the bottleneck. I have some complex SQL that
 executes very quickly with Postgres, similar to MSSQL, but the index
 scans in most of those only touch a few rows for a few loops. It seems
 to be a problem when the index scan is scanning very many rows and for
 each of these it has to go to the table just to find out if the index it
 just looked at is still valid.
 

Another way to speed this up is the TODO item: Use bitmaps to fetch 
heap pages in sequential order  For an indexscan that fetches a number 
of rows those rows may be anywhere in the base table so as each index 
entry is found it fetches the corresponding table row from the heap.  This 
is not ideal because you can be jumping around in the heap and end up 
fetching the same page multiple times because table rows are in the same 
page, but were found in different places in the index.

Kris Jurka

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] planner/optimizer question

2004-04-30 Thread Manfred Koizar
On Fri, 30 Apr 2004 19:46:24 +0200, Jochem van Dieten
[EMAIL PROTECTED] wrote:
 While the storage overhead could be reduced to 1 bit (not a joke)

You mean adding an isLossy bit and only where it is set the head 
tuple has to be checked for visibility, if it is not set the head 
tuple does not have to be checked?

Yes, something like this.  Actually I imagined it the other way round: a
visible-to-all flag similar to the existing dead-to-all flag (search for
LP_DELETE and ItemIdDeleted in nbtree.c).

 we'd
 still have the I/O overhead of locating and updating index tuples for
 every heap tuple deleted/updated.

Would there be additional I/O for the additional bit in the index 
tuple (I am unable to find the layout of index tuple headers in 
the docs)?

Yes, the visible-to-all flag would be set as a by-product of an index
scan, if the heap tuple is found to be visible to all active
transactions.  This update is non-critical and, I think, not very
expensive.

Deleting (and hence updating) a tuple is more critical, regarding both
consistency and performance.  We'd have to locate all index entries
pointing to the heap tuple and set their visible-to-all flags to false.


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-30 Thread Robert Creager
When grilled further on (Thu, 29 Apr 2004 11:21:51 -0700),
Josh Berkus [EMAIL PROTECTED] confessed:

 spins_per_delay was not beneficial.   Instead, try increasing them, one step 
 at a time:
 
 (take baseline measurement at 100)
 250
 500
 1000
 1500
 2000
 3000
 5000
 
 ... until you find an optimal level.   Then report the results to us!
 

Some results.  The patch mentioned is what Dave Cramer posted to the Performance
list on 4/21.

A Perl script monitored vmstat 1 for 120 seconds and generated max and average
values.  Unfortunately, I am not present on site, so I cannot physically change
the device under test to increase the db load to where it hit about 10 days ago.
 That will have to wait till the 'real' work week on Monday.

Context switches -  avgmax

Default 7.4.1 code :   10665  69470
Default patch - 10 :   17297  21929
patch at 100   :   26825  87073
patch at 1000  :   37580 110849

Now granted, the db isn't showing the CS swap problem in a bad way (at all), but
should the numbers be trending the way they are with the patched code?  Or will
these numbers potentially change dramatically when I can load up the db?

And, presuming I can re-produce what I was seeing previously (200K CS/s), you
folks want me to carry on with more testing of the patch and report the results?
 Or just go away and be quiet...

The information is provided from a HP Proliant DL380 G3 with 2x 2.4 GHZ Xenon's
(with HT enabled) 2 GB ram, running 2.4.22-26mdkenterprise kernel, RAID
controller w/128 Mb battery backed cache RAID 1 on 2x 15K RPM drives for WAL
drive, RAID 0+1 on 4x 10K RPM drives for data.  The only job this box has is
running this db.

Cheers,
Rob

-- 
 21:54:48 up 2 days,  4:39,  4 users,  load average: 2.00, 2.03, 2.00
Linux 2.6.5-01 #7 SMP Fri Apr 16 22:45:31 MDT 2004


pgp88T6PR5F9b.pgp
Description: PGP signature