Re: [PERFORM] I/O on select count(*)

2008-05-15 Thread Robert Lor

Tom Lane wrote:

It's certainly true that hint-bit updates cost something, but
quantifying how much isn't easy. 
Maybe we can instrument the code with DTrace probes to quantify the 
actual costs.  I'm not familiar with the code, but if I know where to 
place the probes, I can easily do a quick test and provide the data.

 The off-the-cuff answer is to do the
select count(*) twice and see how much cheaper the second one is. 
It doesn't seem the second run is cheaper, as shown in the results below.
The data came from the probes I've added recently.


*** Run #1 ***
SQL Statement  : select count(*) from accounts;
Execution time : 1086.58 (ms)

==== Buffer Read Counts ====
Tablespace   Database  Table  Count
 1663  16384   1247  1
 1663  16384   2600  1
 1663  16384   2703  1
 1663  16384   1255  2
 1663  16384   2650  2
 1663  16384   2690  3
 1663  16384   2691  3
 1663  16384  16397   8390

==== Dirty Buffer Write Counts ====
Tablespace   Database  Table  Count
 1663  16384  16397   2865

Total buffer cache hits  :  1932
Total buffer cache misses:  6471
Average read time from cache :  5638 (ns)
Average read time from disk  :143371 (ns)
Average write time to disk   : 20368 (ns)


*** Run #2 ***
SQL Statement  : select count(*) from accounts;
Execution time : 1115.94 (ms)

==== Buffer Read Counts ====
Tablespace   Database  Table  Count
 1663  16384  16397   8390

==== Dirty Buffer Write Counts ====
Tablespace   Database  Table  Count
 1663  16384  16397   2865

Total buffer cache hits  :  1931
Total buffer cache misses:  6459
Average read time from cache :  4357 (ns)
Average read time from disk  :154127 (ns)
Average write time to disk   : 20368 (ns)


-Robert

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] I/O on select count(*)

2008-05-15 Thread Robert Lor

Tom Lane wrote:

Hmm, the problem would be trying to figure out what percentage of writes
could be blamed solely on hint-bit updates and not any other change to
the page.  I don't think that the bufmgr currently keeps enough state to
know that, but you could probably modify it easily enough, since callers
distinguish MarkBufferDirty from SetBufferCommitInfoNeedsSave.  Define
another flag bit that's set only by the first, and test it during
write-out.
  
Ok, I made a few changes to bufmgr per my understanding of your 
description above and with my limited understanding of the code. Patch 
is attached.


Assuming the patch is correct, the effect of writes due to hint bits is 
quite significant. I collected the data below by running pgbench in one
terminal and psql on another to run the query.


Is the data plausible?

-Robert

--


Backend PID: 16189
SQL Statement  : select count(*) from accounts;
Execution time : 17.33 sec

==== Buffer Read Counts ====
Tablespace   Database  Table  Count
 1663  16384   2600  1
 1663  16384   2601  1
 1663  16384   2615  1
 1663  16384   1255  2
 1663  16384   2602  2
 1663  16384   2603  2
 1663  16384   2616  2
 1663  16384   2650  2
 1663  16384   2678  2
 1663  16384   1247  3
 1663  16384   1249  3
 1663  16384   2610  3
 1663  16384   2655  3
 1663  16384   2679  3
 1663  16384   2684  3
 1663  16384   2687  3
 1663  16384   2690  3
 1663  16384   2691  3
 1663  16384   2703  4
 1663  16384   1259  5
 1663  16384   2653  5
 1663  16384   2662  5
 1663  16384   2663  5
 1663  16384   2659  7
 1663  16384  16397   8390

==== Dirty Buffer Write Counts ====
Tablespace   Database  Table  Count
 1663  16384  16402  2
 1663  16384  16394 11
 1663  16384  16397   4771

== Hint Bits Write Counts ==
Tablespace   Database  Table  Count
 1663  16384  16397   4508

Total buffer cache hits  :   732
Total buffer cache misses:  7731
Average read time from cache :  9136 (ns)
Average read time from disk  :384201 (ns)
Average write time to disk   :210709 (ns)


Backend PID: 16189
SQL Statement  : select count(*) from accounts;
Execution time : 12.72 sec

==== Buffer Read Counts ====
Tablespace   Database  Table  Count
 1663  16384  16397   8392

==== Dirty Buffer Write Counts ====
Tablespace   Database  Table  Count
 1663  16384  16394  6
 1663  16384  16402  7
 1663  16384  16397   2870

== Hint Bits Write Counts ==
Tablespace   Database  Table  Count
 1663  16384  16402  2
 1663  16384  16397   2010

Total buffer cache hits  :   606
Total buffer cache misses:  7786
Average read time from cache :  6949 (ns)
Average read time from disk  :706288 (ns)
Average write time to disk   : 90426 (ns)

Index: bufmgr.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/bufmgr.c,v
retrieving revision 1.228
diff -u -3 -p -r1.228 bufmgr.c
--- bufmgr.c	1 Jan 2008 19:45:51 -	1.228
+++ bufmgr.c	15 May 2008 20:56:38 -
@@ -42,6 +42,7 @@
 #include "storage/smgr.h"
 #include "utils/resowner.h"
 #include "pgstat.h"
+#include "pg_trace.h"
 
 
 /* Note: these two macros only work on shared buffers, not local ones! */
@@ -171,6 +172,7 @@ ReadBuffer_common(Relation reln, BlockNu
 	if (isExtend)
 		blockNum = smgrnblocks(reln->rd_smgr);
 
+	TRACE_POSTGRESQL_BUFFER_READ_START(blockNum, reln->rd_node.spcNode, reln->rd_node.dbNode, reln->rd_node.relNode, isLocalBuf);
 	pgstat_count_buffer_read(reln);
 
 	if (isLocalBuf)
@@ -200,12 +202,16 @@ ReadBuffer_common(Relation reln, BlockNu
 	{
 		if (!isExtend)
 		{
+			TRACE_POSTGRESQL_BUFFER_HIT();
 			/* Just need to update stats before we exit */
 			pgstat_count_buffer_hit(reln);
 
 			if (VacuumCostActive)
 				VacuumCostBalance += VacuumCostPageHit;
 
+			TRACE_POSTGRESQL_BUFFER_READ_DONE(blockNum,
+				reln->rd_node.spcNode, reln->rd_node.dbNode,
+				reln->rd_node.relNode, isLocalBuf, found);
 			return BufferDescriptorGetBuffer(bufHdr);
 		}
 
@@ -257,6 +263,7 @@ ReadBuffer_common(Relation reln, BlockNu
 			} while (!StartBufferIO(bufHdr, true));
 		}
 	}
+	TRACE_POSTGRESQL_BUFFER_MISS();
 
 	/*
 	 * if we have gotten to this 

Re: [PERFORM] User concurrency thresholding: where do I look?

2007-07-26 Thread Robert Lor

Tom Lane wrote:


That path would be taking CLogControlLock ... so you're off by at least
one.  Compare the script to src/include/storage/lwlock.h.



Indeed, the indexing was off by one: BufMappingLock was removed from 
src/include/storage/lwlock.h between 8.1 and 8.2, and the DTrace 
script had not been updated to match.


Thanks,
Robert


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL

2006-07-23 Thread Robert Lor

Tom Lane wrote:


Yeah, those seem plausible, although the hold time for
CheckpointStartLock seems awfully high --- about 20 msec
per transaction.  Are you using a nonzero commit_delay?


 


I didn't change commit_delay which defaults to zero.


Regards,
-Robert


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL

2006-07-23 Thread Robert Lor

Tom Lane wrote:


Hmmm ... AFAICS this must mean that flushing the WAL data to disk
at transaction commit time takes (most of) 20 msec on your hardware.
Which still seems high --- on most modern disks that'd be at least two
disk revolutions, maybe more.  What's the disk hardware you're testing
on, particularly its RPM spec?
 

I actually ran the test on my laptop. It has an Ultra ATA/100 drive 
(5400 rpm). The test was just a quickie to show some data from the 
probes. I'll collect and share data from the T2000 server later.


Regards,
-Robert



Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL

2006-07-21 Thread Robert Lor

Tom Lane wrote:


Tatsuo Ishii [EMAIL PROTECTED] writes:
 


18% in s_lock is definitely bad :-(.  Were you able to determine which
LWLock(s) are accounting for the contention?
   



 


Sorry for the delay. Finally I got the oprofile data. It's
huge(34MB). If you are interested, I can put somewhere. Please let me
know.
   



I finally got a chance to look at this, and it seems clear that all the
traffic is on the BufMappingLock.  This is essentially the same problem
we were discussing with respect to Gavin Hamill's report of poor
performance on an 8-way IBM PPC64 box (see hackers archives around
2006-04-21).  If your database is fully cached in shared buffers, then
you can do a whole lot of buffer accesses per unit time, and even though
all the BufMappingLock acquisitions are in shared-LWLock mode, the
LWLock's spinlock ends up being heavily contended on an SMP box.

It's likely that CVS HEAD would show somewhat better performance because
of the btree change to cache local copies of index metapages (which
eliminates a fair fraction of buffer accesses, at least in Gavin's test
case).   Getting much further than that seems to require partitioning
the buffer mapping table.  The last discussion stalled on my concerns
about unpredictable shared memory usage, but I have some ideas on that
which I'll post separately.  In the meantime, thanks for sending along
the oprofile data!

regards, tom lane
 

I ran pgbench and fired up a DTrace script using the lwlock probes we've 
added, and it looks like BufMappingLock is the most contended lock, but 
the CheckpointStartLock is held for a much longer combined duration!


Lock IdMode   Count
ControlFileLock   Exclusive   1
SubtransControlLock   Exclusive   1
   BgWriterCommLock   Exclusive   6
  FreeSpaceLock   Exclusive   6
   FirstLockMgrLock   Exclusive  48
BufFreelistLock   Exclusive  74
 BufMappingLock   Exclusive  74
CLogControlLock   Exclusive 184
 XidGenLock   Exclusive 184
CheckpointStartLock  Shared 185
   WALWriteLock   Exclusive 185
  ProcArrayLock   Exclusive 368
CLogControlLock  Shared 552
SubtransControlLock  Shared1273
  WALInsertLock   Exclusive1476
 XidGenLock  Shared1842
  ProcArrayLock  Shared3160
 SInvalLock  Shared3684
 BufMappingLock  Shared   14578

Lock Id   Combined Time (ns)
ControlFileLock 7915
   BgWriterCommLock43438
  FreeSpaceLock   39
BufFreelistLock   448530
   FirstLockMgrLock  2879957
CLogControlLock  4237750
SubtransControlLock  6378042
 XidGenLock  9500422
  WALInsertLock 16372040
 SInvalLock 23284554
  ProcArrayLock 32188638
 BufMappingLock113128512
   WALWriteLock142391501
CheckpointStartLock   4171106665


Regards,
-Robert

 




Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL

2006-07-21 Thread Robert Lor

Tom Lane wrote:


Those numbers look a bit suspicious --- I'd expect to see some of the
LWLocks being taken in both shared and exclusive modes, but you don't
show any such cases. You sure your script is counting correctly?
 


I'll double check to make sure no stupid mistakes were made!


Also, it'd be interesting to count time spent holding shared lock
separately from time spent holding exclusive.
 


Will provide that data later today.

Regards,
-Robert


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL

2006-07-21 Thread Robert Lor

Tom Lane wrote:


Also, it'd be interesting to count time spent holding shared lock
separately from time spent holding exclusive.

 


Tom,

Here is the breakdown between exclusive & shared LWLocks. Do the 
numbers look reasonable to you?


Regards,
-Robert

bash-3.00# time ./Tom_lwlock_acquire.d `pgrep -n postgres`
** LWLock Count: Exclusive **
Lock IdMode   Count
ControlFileLock   Exclusive   1
  FreeSpaceLock   Exclusive   9
 XidGenLock   Exclusive 202
CLogControlLock   Exclusive 203
   WALWriteLock   Exclusive 203
   BgWriterCommLock   Exclusive 222
BufFreelistLock   Exclusive 305
 BufMappingLock   Exclusive 305
  ProcArrayLock   Exclusive 405
   FirstLockMgrLock   Exclusive 670
  WALInsertLock   Exclusive1616

** LWLock Count: Shared **
Lock IdMode   Count
CheckpointStartLock  Shared 202
CLogControlLock  Shared 450
SubtransControlLock  Shared 776
 XidGenLock  Shared2020
  ProcArrayLock  Shared3778
 SInvalLock  Shared4040
 BufMappingLock  Shared   40838

** LWLock Time: Exclusive **
Lock Id   Combined Time (ns)
ControlFileLock 8301
  FreeSpaceLock80590
CLogControlLock  1603557
   BgWriterCommLock  1607122
BufFreelistLock  1997406
 XidGenLock  2312442
 BufMappingLock  3161683
   FirstLockMgrLock  5392575
  ProcArrayLock  6034396
  WALInsertLock 12277693
   WALWriteLock324869744

** LWLock Time: Shared **
Lock Id   Combined Time (ns)
CLogControlLock  3183788
SubtransControlLock  6956229
 XidGenLock 12012576
 SInvalLock 35567976
  ProcArrayLock 45400779
 BufMappingLock300669441
CheckpointStartLock   4056134243


real0m24.718s
user0m0.382s
sys 0m0.181s



---(end of broadcast)---
TIP 6: explain analyze is your friend


[PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL community

2006-06-16 Thread Robert Lor


I am thrilled to inform you all that Sun has just donated a fully loaded 
T2000 system to the PostgreSQL community, and it's being set up by Corey 
Shields at OSL (osuosl.org) and should be online probably early next 
week. The system has


* 8 cores, 4 hw threads/core @ 1.2 GHz. Solaris sees the system as 
having 32 virtual CPUs, and each can be enabled or disabled individually

* 32 GB of DDR2 SDRAM memory
* 2 @ 73GB internal SAS drives (10K RPM)
* 4 Gigabit ethernet ports

For complete spec, visit 
http://www.sun.com/servers/coolthreads/t2000/specifications.jsp


I think this system is well suited for PG scalability testing, among 
other things. We did an informal test using an internal OLTP benchmark and 
noticed that PG can scale to around 8 CPUs. Would be really cool if all 
32 virtual CPUs could be utilized!!!


Anyways, if you need to access the system for testing purposes, please 
contact Josh Berkus.


Regards,

Robert Lor
Sun Microsystems, Inc.
01-510-574-7189






Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL community

2006-06-16 Thread Robert Lor

Arjen van der Meijden wrote:



I can already confirm very good scalability (with our workload) on 
postgresql on that machine. We've been testing a 32-thread/16G version 
and it shows near-linear scaling when enabling 1, 2, 4, 6 and 8 cores 
(with all four threads enabled).


The threads are a bit less scalable, but still pretty good. Compared to 
one thread per core, enabling 2 or 4 threads for each core yields 
respectively 60% and 130% extra performance.


Wow, what type of workload is it? And did you do much tuning to get 
near-linear scalability to 32 threads?


Regards,
-Robert



Re: [PERFORM] bad performance on Solaris 10

2006-04-13 Thread Robert Lor

Bruce Momjian wrote on 04/13/06 01:39 AM:
 
 Yes, if someone wants to give us a clear answer on which wal_sync_method
 is best on all versions of Solaris, we can easily make that change.
 

We're doing tests to see how various parameters in postgresql.conf
affect performance on Solaris and will share the results shortly.

Regards,
-Robert




Re: [PERFORM] bad performance on Solaris 10

2006-04-06 Thread Robert Lor

Chris Mair wrote:


Ok, so I did a few runs for each of the sync methods, keeping all the
rest constant and got this:

open_datasync  0.7
fdatasync  4.6
fsync  4.5
fsync_writethrough not supported
open_sync  0.6

in arbitrary units - higher is faster.

Quite impressive!


 


Chris,
Just to make sure the x4100 config is similar to your Linux system, can 
you verify the default setting for the disk write cache and make sure it 
matches (enabled on both or disabled on both). Here's how to check in Solaris.

As root, run format -e -> pick a disk -> cache -> write_cache -> display

Not sure how to do it on Linux though!

Regards,
-Robert

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Sun Fire T2000 and PostgreSQL 8.1.3

2006-04-05 Thread Robert Lor
Tom is right. Unless your workload can generate lots of simultaneous 
queries, you will not reap the full benefit of the Sun Fire T2000 
system. I have tested 8.1.3 with an OLTP workload on an 8-core system. 
With 1500-2000 client connections, the CPU was only about 30% utilized. 
The UltraSPARC T1 processor was designed for throughput, with many cores 
running at a lower frequency (1-1.2 GHz) to reduce power consumption. To 
speed up a single big query, you'd be better off with a parallelizing DB 
or an Opteron system with a higher clock speed, like this one: 
http://www.sun.com/servers/entry/x4200/


Regards,
-Robert

Tom Lane wrote:


Juan Casero \(FL FLC\) [EMAIL PROTECTED] writes:
 


... This box has a single Ultrasparc T1 cpu with six execution
pipelines that can each handle 4 threads.  With the Unix top utility
the postgresql server appears to bounce around between the available
threads on the system.
   



Try sending it more than one query at a time?  If you're testing with
just one client connection issuing queries, that's about what I'd expect.

regards, tom lane

 






[PERFORM] PostgreSQL Solaris packages now in beta

2006-01-25 Thread Robert Lor


With big thanks to Josh Berkus and Devrim Gunduz, I'm happy to announce 
that Sun has just released a Solaris distribution of PostgreSQL 8.1.2  
with ready-to-install packages for both Sparc and x86.  These packages 
are currently in Beta, and we expect to FCS in 2-3 weeks.  The 
packages, along with an install guide, are available for download at 
http://pgfoundry.org/projects/solarispackages/


We have tightly integrated PostgreSQL with Solaris in a manner similar 
to the Linux distributions available on postgresql.org. In fact, the 
directory structures are identical.  Starting with Solaris 10 Update 2, 
PostgreSQL will be distributed with every copy of Solaris, via download 
and physical media.


We welcome any and all feedback on this PostgreSQL Solaris 
distribution.   Please subscribe to the 
[EMAIL PROTECTED] mailing list to give us feedback: 
http://pgfoundry.org/mail/?group_id=163


BTW, I'm a senior engineer at Sun Microsystems, recently working with 
the PostgreSQL community (namely Josh Berkus, Devrim Gunduz, and Gavin 
Sherry) on the Solaris Packages Project at PgFoundry,  PostgreSQL 
performance optimization on Solaris, and leveraging Solaris 10 
capabilities (e.g. DTrace) specifically for PostgreSQL.  I'll be posting 
a Solaris performance tuning guide in a few weeks.


Regards,
Robert Lor



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly