Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Robert Klemme
On Wed, Jul 4, 2012 at 1:13 AM, Reza Taheri rtah...@vmware.com wrote:

 Checking online, the subject of clustered indexes for PostgreSQL comes up
 often. PGSQL does have a concept called “clustered table”, which means a
 table has been organized in the order of an index. This would help with
 sequential accesses to a table, but has nothing to do with this problem.
 PGSQL folks sometimes refer to what we want as “integrated index”.

I do understand this correctly that we are speaking about the concept
which is known under the term index organized table (IOT) in Oracle
land, correct?

http://docs.oracle.com/cd/E11882_01/server.112/e25789/indexiot.htm#CBBJEBIH

Kind regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Reza Taheri
Hi Daniel,
Yes, it sounds like GIT will take us half the way there by getting rid of much 
of the index I/O if we cluster the tables. We can set the fillfactor parameter 
to keep tables sorted after updates. I am not sure what impact inserts will 
have since the primary key keeps growing with new inserts, so perhaps the table 
will maintain the cluster order and the benefits of GIT for new rows, too. GIT 
won't save CPU cycles the way a clustered/integrated index would, and actually 
adds to the CPU cost since the data page has to be searched for the desired 
tuple.

Thanks,
Reza

 -Original Message-
 From: Daniel Farina [mailto:dan...@heroku.com]
 Sent: Wednesday, July 04, 2012 6:40 AM
 To: Craig Ringer
 Cc: Reza Taheri; pgsql-performance@postgresql.org; Robert Haas
 Subject: Re: [PERFORM] The need for clustered indexes to boost TPC-V
 performance
 
 On Tue, Jul 3, 2012 at 10:43 PM, Craig Ringer ring...@ringerc.id.au wrote:
  On 07/04/2012 07:13 AM, Reza Taheri wrote:
 
  Following the earlier email introducing the TPC-V benchmark, and that
  we are developing an industry standard benchmarking kit for TPC-V
  using PostgreSQL, here is a specific performance issue we have run into.
 
 
  Which version of PostgreSQL are you using?
 
  How has it been tuned beyond the defaults - autovacuum settings,
  shared_buffers, effective_cache_size, WAL settings, etc?
 
  How much RAM is on the blade? What OS and version are on the blade?
 
 
  Comparing the table sizes, we are close to 2X larger (more on this in
  a later note). But the index size is what stands out. Our overall
  index usage (again, after accounting for different numbers of rows) is
  4.8X times larger. 35% of our I/Os are to the index space. I am
  guessing that the 4.8X ballooning has something to do with this, and
  that in itself explains a lot about our high I/O rate, as well as
  higher CPU/tran cycles compared to MS SQL (we are  2.5-3 times slower).
 
  This is making me wonder about bloat issues and whether proper
  vacuuming is being done. If the visibility map and free space map
  aren't maintained by proper vaccum operation everything gets messy,
 fast.
 
  Well, MS SQL used a clustered index for CT, i.e., the data is held
  in the leaf pages of the index B-Tree. The data and index are in one
  data structure. Once you lookup the index, you also have the data at
  zero additional cost.
 
  [snip]
 
 
 
  Is the PGSQL community willing to invest in a feature that a) has been
  requested by many others already; and b) can make a huge difference in
  a benchmark that can lend substantial credibility to PGSQL performance?
 
 
  while PostgreSQL doesn't support covering indexes or clustered indexes
  at this point, 9.2 has added support for index-only scans, which are a
  half-way point of sorts. See:
 
http://rhaas.blogspot.com.au/2011/10/index-only-scans-weve-got-
 em.html
http://rhaas.blogspot.com.au/2010/11/index-only-scans.html
 
  http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a2822fb9
  337a21f98ac4ce850bb4145acf47ca27
 
  If at all possible please see how your test is affected by this
  PostgreSQL
  9.2 enhancement. It should make a big difference, and if it doesn't
  it's important to know why.
 
  (CC'd Robert Haas)
 
  I'm not sure what the best option for getting a 9.2 beta build for
  Windows is.
 
 
  As for the invest side - that's really a matter for EnterpriseDB,
  Command Prompt, Red Hat, and the other backers who're employing
 people
  to work on the DB. Consider asking on pgsql-hackers, too; if nothing
  else you'll get a good explanation of the current state and progress toward
 clustered indexes.
 
  Some links that may be useful to you are:
 
http://wiki.postgresql.org/wiki/Todo
Things that it'd be good to support/implement at some point.
  Surprisingly, covering/clustered indexes aren't on there or at least aren't
 easily found.
  It's certainly a much-desired feature despite its apparent absence
  from the TODO.
 
 I think there is, deservingly, a lot of hesitation to implement a strictly
 ordered table construct.  A similar feature that didn't quite get finished --
 but maybe can be beaten into shape -- is the grouped-index-tuple
 implementation:
 
 http://community.enterprisedb.com/git/
 
 It is mentioned on the TODO page.  It's under the category that is perhaps
 poorly syntactically overloaded in the world cluster.
 
 --
 fdr

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Reza Taheri
Hi Robert,
Yes, the same concept. Oracle's IOT feature is used often with TPC benchmarks.

Thanks,
Reza

 -Original Message-
 From: Robert Klemme [mailto:shortcut...@googlemail.com]
 Sent: Thursday, July 05, 2012 5:30 AM
 To: Reza Taheri
 Cc: pgsql-performance@postgresql.org; Andy Bond (ab...@redhat.com);
 Greg Kopczynski; Jignesh Shah
 Subject: Re: [PERFORM] The need for clustered indexes to boost TPC-V
 performance
 
 On Wed, Jul 4, 2012 at 1:13 AM, Reza Taheri rtah...@vmware.com wrote:
 
  Checking online, the subject of clustered indexes for PostgreSQL comes
  up often. PGSQL does have a concept called clustered table, which
  means a table has been organized in the order of an index. This would
  help with sequential accesses to a table, but has nothing to do with this
 problem.
  PGSQL folks sometimes refer to what we want as integrated index.
 
 I do understand this correctly that we are speaking about the concept which
 is known under the term index organized table (IOT) in Oracle land,
 correct?
 
 http://docs.oracle.com/cd/E11882_01/server.112/e25789/indexiot.htm#CBB
 JEBIH
 
 Kind regards
 
 robert
 
 --
 remember.guy do |as, often| as.you_can - without end
 http://blog.rubybestpractices.com/

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Samuel Gendler
On Thu, Jul 5, 2012 at 12:13 PM, Reza Taheri rtah...@vmware.com wrote:

 Hi Robert,
 Yes, the same concept. Oracle's IOT feature is used often with TPC
 benchmarks.


Reza, it would be very helpful if you were to provide the list with a lot
more information about your current software and hardware configuration
before coming to the conclusion that the only possible way forward is with
a significant architectural change to the db engine itself.  Not only is it
not at all clear that you are extracting maximum performance from your
current hardware and software, but I doubt anyone is particularly
interested in doing a bunch of development purely to game a benchmark.
 There has been significant discussion of the necessity and viability of
the feature you are requesting in the past, so you should probably start
where those discussions left off rather than starting the discussion all
over again from the beginning.  Of course, if vmware were to sponsor
development of the feature in question, it probably wouldn't require nearly
as much buy-in from the wider community.

Getting back to the current performance issues -  I have little doubt that
the MS SQL benchmark was set up and run by people who were intimately
familiar with MS SQL performance tuning.  You stated in your earlier email
that your team doesn't have significant postgresql-specific experience, so
it isn't necessarily surprising that your first attempt at tuning didn't
get the results that you are looking for. You stated that you have 14 SSDs
and 90 spinning drives, but you don't specify how they are combined and how
the database is laid out on top of them.  There is no mention of how much
memory is available to the system. We don't know how you've configured
postgresql's memory allocation or how your config weights the relative
costs of index lookups, sequential scans, etc.  The guidelines for this
mailing list include instructions for what information should be provided
when asking about performance improvements.
http://archives.postgresql.org/pgsql-performance/  Let's start by
ascertaining how your benchmark results can be improved without engaging in
a significant development effort on the db engine itself.


Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Reza Taheri
Hi Samuel,
The SSDs were used as a cache for the spinning drives. Here is a 30-second 
iostat sample representative of the whole run:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  24.870.00   12.54   62.390.000.20

Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
avgqu-sz   await  svctm  %util
sdd   0.00   137.37 3058.40  106.17 34691.60   974.1322.54
15.754.98   0.32 100.00
sde   0.00   136.07 3063.37  107.70 35267.07   975.0722.86
15.584.92   0.32 100.00
sdf   0.00   135.37 3064.23  109.53 35815.60   979.6023.19
15.824.99   0.32 100.00
sdg   0.00   136.97 3066.57  116.67 35196.53  1014.5322.75
15.874.99   0.31 100.00
sdi   0.00  2011.030.00   87.90 0.00  8395.73   191.03 
0.131.45   1.42  12.51
sdk   0.00   136.63 3066.83  107.53 35805.07   976.6723.17
16.015.04   0.32 100.00
sdm   0.00   138.50 3054.40  111.10 34674.27   998.4022.54
15.524.91   0.32 100.00
sdj   0.00   136.73 3058.70  118.20 35227.20  1019.7322.82
15.814.98   0.31 100.00
sdl   0.00   137.53 3044.97  109.33 34448.00   987.4722.47
15.785.00   0.32 100.00

The data and index tablespaces were striped across the 8 LUNs, and saw an 
average 5ms response. We can beef up the storage to handle more I/Os so that 
our utilization doesn't stay below 40%, but that misses the point: we have an 
I/O rate twice the commercial database because they used clustered indexes.

I provided more config details in an earlier email.

As for asking for development to game a benchmark, no one is asking for 
benchmark specials. The question of enhancements in response to benchmark needs 
is an age old question. We can get into that, but it's really a different 
discussion. Let me just expose the flip side of it: are we willing to watch 
people use other databases to run benchmarks but feel content that no features 
were developed specifically in response to benchmark results?

I am trying to engage with the community. We can drown the mailing list with 
details. So I decided to open the discussion with the high level points, and we 
will give you all the details that you want as we move forward.

Thanks,
Reza

From: Samuel Gendler [mailto:sgend...@ideasculptor.com]
Sent: Thursday, July 05, 2012 12:46 PM
To: Reza Taheri
Cc: Robert Klemme; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] The need for clustered indexes to boost TPC-V performance


On Thu, Jul 5, 2012 at 12:13 PM, Reza Taheri 
rtah...@vmware.commailto:rtah...@vmware.com wrote:
Hi Robert,
Yes, the same concept. Oracle's IOT feature is used often with TPC benchmarks.

Reza, it would be very helpful if you were to provide the list with a lot more 
information about your current software and hardware configuration before 
coming to the conclusion that the only possible way forward is with a 
significant architectural change to the db engine itself.  Not only is it not 
at all clear that you are extracting maximum performance from your current 
hardware and software, but I doubt anyone is particularly interested in doing a 
bunch of development purely to game a benchmark.  There has been significant 
discussion of the necessity and viability of the feature you are requesting in 
the past, so you should probably start where those discussions left off rather 
than starting the discussion all over again from the beginning.  Of course, if 
vmware were to sponsor development of the feature in question, it probably 
wouldn't require nearly as much buy-in from the wider community.

Getting back to the current performance issues -  I have little doubt that the 
MS SQL benchmark was set up and run by people who were intimately familiar with 
MS SQL performance tuning.  You stated in your earlier email that your team 
doesn't have significant postgresql-specific experience, so it isn't 
necessarily surprising that your first attempt at tuning didn't get the results 
that you are looking for. You stated that you have 14 SSDs and 90 spinning 
drives, but you don't specify how they are combined and how the database is 
laid out on top of them.  There is no mention of how much memory is available 
to the system. We don't know how you've configured postgresql's memory 
allocation or how your config weights the relative costs of index lookups, 
sequential scans, etc.  The guidelines for this mailing list include 
instructions for what information should be provided when asking about 
performance improvements.  http://archives.postgresql.org/pgsql-performance/  
Let's start by ascertaining how your benchmark results can be improved without 
engaging in a significant development effort on the db engine itself.





Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Samuel Gendler
On Thu, Jul 5, 2012 at 1:37 PM, Reza Taheri rtah...@vmware.com wrote:


 I provided more config details in an earlier email.

 ** **



I hate to disagree, but unless I didn't get a message sent to the list, you
haven't provided any details about your postgresql config or otherwise
adhered to the guidelines for starting a discussion of a performance
problem around here.  I just searched my mailbox and no email from you has
any such details.  Several people have asked for them, including myself.
 You say you will give any details we want, but this is at least the 3rd or
4th request for such details and they have not yet been forthcoming.


Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Craig Ringer
First: Please do try 9.2 beta if you're upgrading from 8.4. It'll be out 
as a final release soon enough, and index only scans may make a /big/ 
difference for the problem you're currently having.


Looking at your configuration I have a few comments, but it's worth 
noting that I don't work with hardware at that scale, and I'm more used 
to tuning I/O bottlenecked systems with onboard storage rather than 
CPU-bottlenecked ones on big SANs. Hopefully now that you've posted your 
configuration and setup there might be interest from others.


If you're able to post an EXPLAIN ANALYZE or two for a query you feel is 
slow that certainly won't hurt. Using http://explain.depesz.com/ saves 
you the hassle of dealing with word-wrapping when posting them, btw.


As for your config:

I notice that your autovacuum settings are at their defaults. With heavy 
UPDATE / DELETE load this'll tend to lead to table and index bloat, so 
the DB has to scan more useless data to get what it needs. It also means 
table stats won't be maintained as well, potentially leading to poor 
planner decisions. The following fairly scary query can help identify 
bloat, as the database server doesn't currently have anything much built 
in to help you spot such issues:


http://wiki.postgresql.org/wiki/Show_database_bloat

It might be helpful to set effective_cache_size and 
effective_io_concurrency so Pg has more idea of the scale of your 
hardware. The defaults are very conservative - it's supposed to be easy 
for people to use for simple things without melting their systems, and 
it's expected that anyone doing bigger work will tune the database.


http://www.postgresql.org/docs/9.1/static/runtime-config-resource.html

It looks like you've already tweaked many of the critical points for big 
installs - your checkpoint_segments, wal_buffers, shared_buffers, etc. I 
lack the big hardware experience to know if they're appropriate, but 
they're not the extremely conservative defaults, which is a start.


Your random_page_cost and seq_page_cost are probably dead wrong for a 
SAN with RAM and SSD cache in front of fast disks. Their defaults are 
for local uncached spinning HDD media where seeks are expensive. The 
typical advice on such hardware is to set them to something more like 
seq_page_cost = 0.1  random_page_cost  = 0.15 - ie cheaper relative to 
the cpu cost, and with random I/O only a little more expensive than 
sequential I/O. What's right for your situation varies a bit based on DB 
size vs hardware size, etc; Greg discusses this more in his book.


What isolation level do your transactions use? This is significant 
because of the move to true serializable isolation with predicate 
locking in 9.0; it made serializable transactions a bit slower in some 
circumstances in exchange for much stronger correctness guarantees. The 
READ COMMITTED default was unchanged.




It also looks like you might not have seen the second part of my earlier 
reply:



while PostgreSQL doesn't support covering indexes or clustered indexes 
at this point, 9.2 has added support for index-only scans, which are a 
half-way point of sorts. See:


http://rhaas.blogspot.com.au/2011/10/index-only-scans-weve-got-em.html
http://rhaas.blogspot.com.au/2010/11/index-only-scans.html
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a2822fb9337a21f98ac4ce850bb4145acf47ca27

If at all possible please see how your test is affected by this 
PostgreSQL 9.2 enhancement. It should make a big difference, and if it 
doesn't it's important to know why.


(CC'd Robert Haas)



As for the invest side - that's really a matter for EnterpriseDB, 
Command Prompt, Red Hat, and the other backers who're employing people 
to work on the DB. Consider asking on pgsql-hackers, too; if nothing 
else you'll get a good explanation of the current state and progress 
toward clustered indexes.


Some links that may be useful to you are:

http://wiki.postgresql.org/wiki/Todo
  Things that it'd be good to support/implement at some point. 
Surprisingly, covering/clustered indexes aren't on there or at least 
aren't easily found. It's certainly a much-desired feature despite its 
apparent absence from the TODO.


http://wiki.postgresql.org/wiki/PostgreSQL_9.2_Development_Plan
http://wiki.postgresql.org/wiki/PostgreSQL_9.2_Open_Items

--
Craig Ringer






[PERFORM] SSDs again, LSI Warpdrive 2 anyone?

2012-07-05 Thread Mark Kirkwood
A vendor has recommended the above drive to us - anyone have experience 
with it or its predecessor Warpdrive?


http://www.storagereview.com/lsi_warpdrive_2_lp_display_idf_2011
http://www.storagereview.com/lsi_warpdrive_slp300_review

The specs look quite good, and the cards have capacitors on them - 
however I can't see any *specific* mention about poweroff safety (am 
going to follow that up directly myself).


Cheers

Mark

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Craig Ringer

On 07/06/2012 04:52 AM, Samuel Gendler wrote:



On Thu, Jul 5, 2012 at 1:37 PM, Reza Taheri rtah...@vmware.com 
mailto:rtah...@vmware.com wrote:



I provided more config details in an earlier email.



I hate to disagree, but unless I didn't get a message sent to the list


It looks like that might be the case. I got a message with Message-ID 
66ce997fb523c04e9749452273184c6c137cb88...@exch-mbx-113.vmware.com sent 
at Thu, 5 Jul 2012 11:33:46 -0700 that contained the basic info, 
postgresql.conf, etc. Belated, but it was sent. I can't find this 
message in the archives and the copy I got came direct to me via cc, so 
I suspect our friendly mailing list system has silently held it for 
moderation due to size/attachment.



I'll reproduce the content below, followed by an inline copy of the 
postgresql.conf with only changed lines:


On 07/06/2012 02:33 AM, Reza Taheri wrote:


OK, some config details.

We are using:

·Two blades of an HP BladeSystem c-Class c7000 with 2-socket Intel 
E5520 (Nehalem-EP) processors and 48GB of memory per blade


o8 cores, 16 threads per blade

o48GB of RAM per blade

·Storage was an EMC VNX5700 with 14 SSDs fronting 32 15K RPM drives

·The Tier B database VM was alone on a blade with 16 vCPUs, 40GB of 
memory, 4 virtual drives with various RAID levels


·The driver and Tier A VMs were on the second blade

oSo we set PGHOST on the client system to point to the server

·RHEL 6.1

·PostgreSQL 8.4

·unixODBC 2.3.2

We stuck with PGSQL 8.4 since it is the stock version shipped with 
RHEL 6. I am building a new, larger testbed, and will switch to PGSQL 
9 with that.





postgresql.conf:

[craig@ayaki ~]$ egrep -v '(^\s*#)|(^\s*$)' /tmp/postgresql2.conf  | cut 
-d '#' -f 1

listen_addresses = '*'
max_connections = 320
shared_buffers = 28GB
temp_buffers = 200MB
work_mem = 10MB
maintenance_work_mem = 10MB
bgwriter_delay = 10ms
bgwriter_lru_maxpages = 20
wal_buffers = 16MB
checkpoint_segments = 128
checkpoint_timeout = 30min
checkpoint_completion_target = 0.9
default_statistics_target = 1
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%a.log'
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 0
datestyle = 'iso, mdy'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
default_text_search_config = 'pg_catalog.english'




Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Reza Taheri
Well, I keep failing to send an email with an attachment. Do I need a 
moderator's approval?

Yes, running on VMs and a lower bin processor. With the virtualization 
overhead, etc., I figure we would be running right around 2/3 of the Dell 
throughput if we were running the same DBMS.

I sent the following message twice today with attachments (postgresql.conf, 
etc.), and it hasn't been posted yet. Here it is without an attachment.



From: Reza Taheri 
Sent: Thursday, July 05, 2012 11:34 AM
To: 'Craig Ringer'
Cc: pgsql-performance@postgresql.org; Robert Haas
Subject: RE: [PERFORM] The need for clustered indexes to boost TPC-V performance

OK, some config details.
We are using:

*   Two blades of an HP BladeSystem c-Class c7000 with 2-socket Intel E5520 
(Nehalem-EP) processors and 48GB of memory per blade
o   8 cores, 16 threads per blade
o   48GB of RAM per blade
*   Storage was an EMC VNX5700 with 14 SSDs fronting 32 15K RPM drives
*   The Tier B database VM was alone on a blade with 16 vCPUs, 40GB of 
memory, 4 virtual drives with various RAID levels
*   The driver and Tier A VMs were on the second blade
o   So we set PGHOST on the client system to point to the server
*   RHEL 6.1
*   PostgreSQL 8.4
*   unixODBC 2.3.2

We stuck with PGSQL 8.4 since it is the stock version shipped with RHEL 6. I am 
building a new, larger testbed, and will switch to PGSQL 9 with that.

Postgres.conf is attached.

Thanks,
Reza

 -Original Message-
 From: Andy Colson [mailto:a...@squeakycode.net]
 Sent: Thursday, July 05, 2012 5:42 PM
 To: Samuel Gendler
 Cc: Reza Taheri; Robert Klemme; pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] The need for clustered indexes to boost TPC-V
 performance
 
 On 07/05/2012 03:52 PM, Samuel Gendler wrote:
 
 
  On Thu, Jul 5, 2012 at 1:37 PM, Reza Taheri rtah...@vmware.com
 mailto:rtah...@vmware.com wrote:
 
 
  I provided more config details in an earlier email.
 
  __ __
 
 
 
  I hate to disagree, but unless I didn't get a message sent to the list, you
 haven't provided any details about your postgresql config or otherwise
 adhered to the guidelines for starting a discussion of a performance
 problem around here.  I just searched my mailbox and no email from you
 has any such details.  Several people have asked for them, including myself.
 You say you will give any details we want, but this is at least the 3rd or 4th
 request for such details and they have not yet been forthcoming.
 
 
 Reza, I went back and looked myself.  I see no specs on OS, or hardware
 unless you mean this:
 
 
  http://bit.ly/QeWXhE. This was run on a similar server, and the database
 size is close to ours.
 
 
 You're running on windows then?  Server is 96Gig ram, 8 cores, (dell
 poweredge T610).
 with two powervault MD1120 NAS's?
 
 But then I assume you were not running on that, were you.  You were
 running vmware on it, probably?
 
 
 -Andy

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Craig Ringer

On 07/06/2012 08:41 AM, Andy Colson wrote:

You're running on windows then?  Server is 96Gig ram, 8 cores, (dell 
poweredge T610).

with two powervault MD1120 NAS's?


Thankfully they're running Pg on Linux (RHEL 6) . It seems that tests to 
date have been run against 8.4 which is pretty elderly, but hopefully 
it'll be brought up to 9.1 or 9.2beta soon.


While the original poster should've given a reasonable amount of 
information to start with when asking performance questions - as per the 
mailing list guidance and plain common sense - more info /was/ sent 
later on /but the lists.postgresql.org mailman ate it /- or held it for 
moderation, anyway. The OP can't be blamed when Pg's mailing list 
manager eats mesages with attachments! Also, remember that not everyone 
uses community mailing lists regularly; it takes a little learning to 
get used to keeping track of conversations, to inline reply style, etc.


--
Craig Ringer


Re: [PERFORM] Introducing the TPC-V benchmark, and its relationship to PostgreSQL

2012-07-05 Thread Greg Smith

On 07/03/2012 07:08 PM, Reza Taheri wrote:

TPC-V is a new benchmark under development for virtualized databases. A
TPC-V configuration has:

- multiple virtual machines running a mix of DSS, OLTP, and business
logic apps

- VMs running with throughputs ranging from 10% to 40% of the total system
..


I think this would be a lot more interesting to the traditional, 
dedicated hardware part of the PostgreSQL community if there was a clear 
way to run this with only a single active machine too.  If it's possible 
for us to use this to compare instances of PostgreSQL on dedicated 
hardware, too, that is enormously more valuable to people with larger 
installations.  It might be helpful to VMWare as well.  Being able to 
say this VM install gets X% of the performance of a bare-metal install 
answers a question I get asked all the time--when people want to decide 
between dedicated and virtual setups.


The PostgreSQL community could use a benchmark like this for its own 
performance regression testing too.  A lot of that work is going to 
happen on a dedicated machines.


 After waving our hands through a number

of small differences between the platforms, we have calculated a CPU
cost of around 3.2ms/transaction for the published MS SQL results,
versus a measurement of 8.6ms/transaction for PostgreSQL. (TPC
benchmarks are typically pushed to full CPU utilization. One removes all
bottlenecks in storage, networking, etc., to achieve the 100% CPU usage.
So CPU cost/tran is the final decider of performance.) So we need to cut
the CPU cost of transactions in half to make publications with
PostgreSQL comparable to commercial databases.


I appreciate that getting close to parity here is valuable.  This 
situation is so synthetic though--removing other bottlenecks and looking 
at CPU timing--that it's hard to get too excited about optimizing for 
it.  There's a lot of things in PostgreSQL that we know are slower than 
commercial databases because they're optimized for flexibility (the way 
operators are implemented is the best example) or for long-term code 
maintenance.  Microsoft doesn't care if they have terribly ugly code 
that runs faster, because no one sees that code.  PostgreSQL does care.


The measure that's more fair is a system cost based ones.  What I've 
found is that a fair number of people note PostgreSQL's low-level code 
isn't quite as fast as some of the less flexible alternatives--hard 
coding operators is surely cheaper than looking them up each time--but 
the license cost savings more than pays for bigger hardware to offset 
that.  I wish I had any customer whose database was CPU bound, that 
would be an awesome world to live in.


Anyway, guessing at causes here is premature speculation. When there's 
some code for the test kit published, at that point discussing the 
particulars of why it's not running well will get interesting.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Greg Smith

On 07/03/2012 07:13 PM, Reza Taheri wrote:

Is the PGSQL community willing to invest in a feature that a) has been
requested by many others already; and b) can make a huge difference in a
benchmark that can lend substantial credibility to PGSQL performance?


Larger PostgreSQL features usually get built because companies sponsor 
their development, they pass review as both useful  correct, and then 
get committed.  Asking the community to invest in a new feature isn't 
quite the right concept.  Yes, everyone would like one of the smaller 
index representations.  I'm sure we can find reviewers willing to look 
at such a feature and committers who would also be interested enough to 
commit it, on a volunteer basis.  But a feature this size isn't going to 
spring to life based just on volunteer work.  The most useful questions 
would be who would be capable of writing that feature? and how can we 
get them sponsored to focus on it?  I can tell from your comments yet 
what role(s) in that process VMWare wants to take on internally, and 
which it's looking for help with.  The job of convincing people it's a 
useful feature isn't necessary--we know that's true.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] MemSQL the world's fastest database?

2012-07-05 Thread Greg Smith

On 07/01/2012 01:00 AM, Scott Marlowe wrote:

Considering I can build a pgsql 8.4 machine with 256G RAM and 64
Opteron cores and a handful of SSDs or HW RAID that can do REAL 7k to
8k RW TPS right now for well under $10k, 20k TPS on an in memory
database isn't all that impressive.


Again, their TPS numbers are useless without a contest of how big each 
transaction is, and we don't know.  I can take MemSQL seriously when 
there's a press release describing how to replicate their benchmark 
independently.  Then it's useful to look at the absolute number.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] What would effect planning time?

2012-07-05 Thread David Kerr
I spent a good chunk of today trying to chase down why a query on
one box ran in 110ms and on another, smaller box it ran in 10ms.
There was no other activity on either box.

Both boxes are PG9.1.1 RHEL 6.2 x64. the faster box is a smallish 
VM. the other box is a big 40core/256GB box.

The plans between both boxes were exactly the same, so it didn't occur
to me to run an analyze on the tables.

I did a number of things including bouncing the DB and reindexing
some of the tables. that didn't help.

Eventually i separated the query out to a prepared statement and
found that it was spending 100ms in PREPARE on the slow box (I 
assume it was planning)

I left the problem for about 30 minutes and came back and the 
query started running at normal speed. 

I suspect an autovacuum kicked in, but would that sort of thing really impact
parse/plan time to that degree? any other thoughts as to what it could have 
been?
-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Craig Ringer

On 07/06/2012 09:00 AM, Reza Taheri wrote:

Well, I keep failing to send an email with an attachment. Do I need a 
moderator's approval?


Probably. If so, it's really annoying that mailman isn't telling you 
this via a held for moderation auto-reply. It should be.



We stuck with PGSQL 8.4 since it is the stock version shipped with RHEL 6. I am 
building a new, larger testbed, and will switch to PGSQL 9 with that.


Just so you know, as per PostgreSQL versioning policy major releases are 
x.y, eg 8.4, 9.0 and 9.1 are distinct major releases.


http://www.postgresql.org/support/versioning/

I've always found that pretty odd and wish major releases would just 
increment the first version part, but the policy states how it's being 
done. It's important to realize this when you're talking about Pg 
releases, because 8.4, 9.0, 9.1 and 9.2 are distinct releases with 
different feature sets, so postgresql 9 doesn't mean much.


--
Craig Ringer



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Introducing the TPC-V benchmark, and its relationship to PostgreSQL

2012-07-05 Thread Reza Taheri
Hi Greg,
Yes, a single-instance benchmark is a natural fall-out from the TPC-V kit. Our 
coding team (4 people working directly on the benchmark with another 3-4 folks 
helping in various consulting capacities) is tasked with creating a multi-VM 
benchmark. The benchmark is still missing the critical Market Exchange Emulator 
function. Once that's done, it would be natural for someone else in the TPC to 
take our working prototype and simplify it for a single-system, TPC-E (not V) 
reference kit. The conversion is not technically difficult, but releasing kits 
is a new path for the TPC, and will take some work.

Cheers,
Reza

 -Original Message-
 From: Greg Smith [mailto:g...@2ndquadrant.com]
 Sent: Thursday, July 05, 2012 6:25 PM
 To: Reza Taheri
 Cc: pgsql-performance@postgresql.org; Andy Bond (ab...@redhat.com);
 Greg Kopczynski; Jignesh Shah
 Subject: Re: [PERFORM] Introducing the TPC-V benchmark, and its
 relationship to PostgreSQL
 
 On 07/03/2012 07:08 PM, Reza Taheri wrote:
  TPC-V is a new benchmark under development for virtualized databases.
  A TPC-V configuration has:
 
  - multiple virtual machines running a mix of DSS, OLTP, and business
  logic apps
 
  - VMs running with throughputs ranging from 10% to 40% of the total
  system ..
 
 I think this would be a lot more interesting to the traditional, dedicated
 hardware part of the PostgreSQL community if there was a clear way to run
 this with only a single active machine too.  If it's possible for us to use 
 this to
 compare instances of PostgreSQL on dedicated hardware, too, that is
 enormously more valuable to people with larger installations.  It might be
 helpful to VMWare as well.  Being able to say this VM install gets X% of the
 performance of a bare-metal install
 answers a question I get asked all the time--when people want to decide
 between dedicated and virtual setups.
 
 The PostgreSQL community could use a benchmark like this for its own
 performance regression testing too.  A lot of that work is going to happen on
 a dedicated machines.
 
   After waving our hands through a number
  of small differences between the platforms, we have calculated a CPU
  cost of around 3.2ms/transaction for the published MS SQL results,
  versus a measurement of 8.6ms/transaction for PostgreSQL. (TPC
  benchmarks are typically pushed to full CPU utilization. One removes
  all bottlenecks in storage, networking, etc., to achieve the 100% CPU
 usage.
  So CPU cost/tran is the final decider of performance.) So we need to
  cut the CPU cost of transactions in half to make publications with
  PostgreSQL comparable to commercial databases.
 
 I appreciate that getting close to parity here is valuable.  This situation 
 is so
 synthetic though--removing other bottlenecks and looking at CPU timing--
 that it's hard to get too excited about optimizing for it.  There's a lot of
 things in PostgreSQL that we know are slower than commercial databases
 because they're optimized for flexibility (the way operators are
 implemented is the best example) or for long-term code maintenance.
 Microsoft doesn't care if they have terribly ugly code that runs faster,
 because no one sees that code.  PostgreSQL does care.
 
 The measure that's more fair is a system cost based ones.  What I've found is
 that a fair number of people note PostgreSQL's low-level code isn't quite as
 fast as some of the less flexible alternatives--hard coding operators is 
 surely
 cheaper than looking them up each time--but the license cost savings more
 than pays for bigger hardware to offset that.  I wish I had any customer
 whose database was CPU bound, that would be an awesome world to live
 in.
 
 Anyway, guessing at causes here is premature speculation. When there's
 some code for the test kit published, at that point discussing the particulars
 of why it's not running well will get interesting.
 
 --
 Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
 PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Reza Taheri
Just to be clear, we have a number of people from different companies working 
on the kit. This is not a VMware project, it is a TPC project. But I hear you 
regarding coming in from the cold and asking for a major db engine feature. I 
know that I have caused a lot of rolling eyes. Believe me, I have had the same 
(no, worse!) reaction from every one of the commercial database companies in 
response to similar requests over the past 25 years.

We have our skin in the game, and as long as the community values the benchmark 
and wants to support us, we will figure out the details as we go forward.

Thanks,
Reza

 -Original Message-
 From: Greg Smith [mailto:g...@2ndquadrant.com]
 Sent: Thursday, July 05, 2012 6:42 PM
 To: Reza Taheri
 Cc: pgsql-performance@postgresql.org; Andy Bond (ab...@redhat.com);
 Greg Kopczynski; Jignesh Shah
 Subject: Re: [PERFORM] The need for clustered indexes to boost TPC-V
 performance
 
 On 07/03/2012 07:13 PM, Reza Taheri wrote:
  Is the PGSQL community willing to invest in a feature that a) has been
  requested by many others already; and b) can make a huge difference in
  a benchmark that can lend substantial credibility to PGSQL performance?
 
 Larger PostgreSQL features usually get built because companies sponsor
 their development, they pass review as both useful  correct, and then get
 committed.  Asking the community to invest in a new feature isn't quite the
 right concept.  Yes, everyone would like one of the smaller index
 representations.  I'm sure we can find reviewers willing to look at such a
 feature and committers who would also be interested enough to commit it,
 on a volunteer basis.  But a feature this size isn't going to spring to life 
 based
 just on volunteer work.  The most useful questions would be who would
 be capable of writing that feature? and how can we get them sponsored
 to focus on it?  I can tell from your comments yet what role(s) in that
 process VMWare wants to take on internally, and which it's looking for help
 with.  The job of convincing people it's a useful feature isn't necessary--we
 know that's true.
 
 --
 Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
 PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Introducing the TPC-V benchmark, and its relationship to PostgreSQL

2012-07-05 Thread Craig Ringer

On 07/06/2012 11:22 AM, Reza Taheri wrote:

Hi Greg,
Yes, a single-instance benchmark is a natural fall-out from the TPC-V kit. Our 
coding team (4 people working directly on the benchmark with another 3-4 folks 
helping in various consulting capacities) is tasked with creating a multi-VM 
benchmark. The benchmark is still missing the critical Market Exchange Emulator 
function. Once that's done, it would be natural for someone else in the TPC to 
take our working prototype and simplify it for a single-system, TPC-E (not V) 
reference kit. The conversion is not technically difficult, but releasing kits 
is a new path for the TPC, and will take some work.
Please consider releasing sample versions early in the process, 
especially as such releases are new to the TPC. Giving others the 
opportunity to contribute different skill sets and experiences before 
everything is locked in to the final configuration is important, 
especially when trying new things.


--
Craig Ringer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] The overall experience of TPC-V benchmark team with PostgreSQL

2012-07-05 Thread Reza Taheri
Hello PGSQL fans,
Looking back at my posts the past couple of days and the replies that I've got, 
I realized that I have failed to make one point clear: we are very pleased with 
what we have seen from PostgreSQL so far. Let me explain. At this point of 
developing or porting a benchmark on a new DBMS, the team usually deals with 
stability, scalability, or fundamental performance issues. Our fear was that 
working with an open source DBMS, we'd experience more issues than usual. But 
we got the kit running transactions on PGSQL quickly, and after some early 
tests, I decided to try the kit on a larger testbed (two other folks are the 
developers of the benchmark code; I design, run, and analyze the experiments). 
I have the benchmark running on a 300,000-customer database on a 16-CPU system, 
unusual for this early in the prototyping phase. People who developed TPC-E 
(the father of our benchmark) did their prototyping on commercial databases 
with much smaller databases on smaller systems. On this large testbed, PGSQL 
has been working like a champ, and performance is what I would call decent. Put 
in other words, I have been pleasantly surprised by the throughput I am getting 
out of the system, saturating a 16-way with no visible signs of contention when 
we reduce the database size.

We are developing a reference kit. People are not obligated to use it to 
publish official results. They can use it to kick the tires, then go to one of 
the commercial DBMS vendors and ask for their kit for an official TPC-V 
publication. Even if that's all that people do with the reference kit, our team 
has achieved the goal that the TPC set for us. What I am trying to do is see if 
we can take this to the point that people use PGSQL to publish official results 
and use it in competitive situations. It looks possible, so I'd love to see it 
happen.

Again, overall, our experience with PGSQL has been positive, even in terms of 
performance.

Thanks,
Reza



Re: [PERFORM] Introducing the TPC-V benchmark, and its relationship to PostgreSQL

2012-07-05 Thread Reza Taheri
Yes, I hear you. TPC's usual mode of operation has been to release details 
after the benchmark is complete. But TPC does have a policy clause that allows 
publication of draft specifications to get public feedback before the benchmark 
is complete. Our 2012 TPC TC paper will have a lot of the high level details. 
We need to see if we can use the draft clause to also release beta versions of 
code.

Thanks,
Reza

 -Original Message-
 From: Craig Ringer [mailto:ring...@ringerc.id.au]
 Sent: Thursday, July 05, 2012 8:41 PM
 To: Reza Taheri
 Cc: Greg Smith; pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Introducing the TPC-V benchmark, and its
 relationship to PostgreSQL
 
 On 07/06/2012 11:22 AM, Reza Taheri wrote:
  Hi Greg,
  Yes, a single-instance benchmark is a natural fall-out from the TPC-V kit.
 Our coding team (4 people working directly on the benchmark with another
 3-4 folks helping in various consulting capacities) is tasked with creating a
 multi-VM benchmark. The benchmark is still missing the critical Market
 Exchange Emulator function. Once that's done, it would be natural for
 someone else in the TPC to take our working prototype and simplify it for a
 single-system, TPC-E (not V) reference kit. The conversion is not technically
 difficult, but releasing kits is a new path for the TPC, and will take some
 work.
 Please consider releasing sample versions early in the process, especially as
 such releases are new to the TPC. Giving others the opportunity to
 contribute different skill sets and experiences before everything is locked
 in to the final configuration is important, especially when trying new things.
 
 --
 Craig Ringer

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SSDs again, LSI Warpdrive 2 anyone?

2012-07-05 Thread Mark Kirkwood

On 06/07/12 12:51, Mark Kirkwood wrote:
A vendor has recommended the above drive to us - anyone have 
experience with it or its predecessor Warpdrive?


http://www.storagereview.com/lsi_warpdrive_2_lp_display_idf_2011
http://www.storagereview.com/lsi_warpdrive_slp300_review

The specs look quite good, and the cards have capacitors on them - 
however I can't see any *specific* mention about poweroff safety (am 
going to follow that up directly myself).


Seems like the Warp Drive 2 was a pre-release name, Nytro is the 
actual appellation.


http://www.lsi.com/channel/products/storagecomponents/Pages/SolidState.aspx

Cheers

Mark

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Reza Taheri
Hi Craig,
I used the tool at depesz.com extensively during our early prototyping. It 
helped uncover ~10 problems that we solved by fixing issues in the code, adding 
or changing indexes, etc. Right now, I believe all our query plans look like 
what I would expect.

Yes, you are right, I did miss the link to the index-only scans. From what I 
can tell, it will do exactly what we want, but only as long as the index has 
all the columns in the query. I don't know what percentage of our queries have 
this property. But it does help.

The two main kit developers are out this week. We'll put our heads together 
next week to see what version to use when I switch to a larger testbed I am 
preparing.

Thanks,
Reza

From: Craig Ringer [mailto:ring...@ringerc.id.au]
Sent: Thursday, July 05, 2012 5:46 PM
To: Reza Taheri
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

First: Please do try 9.2 beta if you're upgrading from 8.4. It'll be out as a 
final release soon enough, and index only scans may make a big difference for 
the problem you're currently having.

Looking at your configuration I have a few comments, but it's worth noting that 
I don't work with hardware at that scale, and I'm more used to tuning I/O 
bottlenecked systems with onboard storage rather than CPU-bottlenecked ones on 
big SANs. Hopefully now that you've posted your configuration and setup there 
might be interest from others.

If you're able to post an EXPLAIN ANALYZE or two for a query you feel is slow 
that certainly won't hurt. Using http://explain.depesz.com/ saves you the 
hassle of dealing with word-wrapping when posting them, btw.

As for your config:

I notice that your autovacuum settings are at their defaults. With heavy UPDATE 
/ DELETE load this'll tend to lead to table and index bloat, so the DB has to 
scan more useless data to get what it needs. It also means table stats won't be 
maintained as well, potentially leading to poor planner decisions. The 
following fairly scary query can help identify bloat, as the database server 
doesn't currently have anything much built in to help you spot such issues:

   http://wiki.postgresql.org/wiki/Show_database_bloat

It might be helpful to set effective_cache_size and effective_io_concurrency so 
Pg has more idea of the scale of your hardware. The defaults are very 
conservative - it's supposed to be easy for people to use for simple things 
without melting their systems, and it's expected that anyone doing bigger work 
will tune the database.

http://www.postgresql.org/docs/9.1/static/runtime-config-resource.html

It looks like you've already tweaked many of the critical points for big 
installs - your checkpoint_segments, wal_buffers, shared_buffers, etc. I lack 
the big hardware experience to know if they're appropriate, but they're not the 
extremely conservative defaults, which is a start.

Your random_page_cost and seq_page_cost are probably dead wrong for a SAN with 
RAM and SSD cache in front of fast disks. Their defaults are for local uncached 
spinning HDD media where seeks are expensive. The typical advice on such 
hardware is to set them to something more like seq_page_cost = 0.1  
random_page_cost  = 0.15 - ie cheaper relative to the cpu cost, and with random 
I/O only a little more expensive than sequential I/O. What's right for your 
situation varies a bit based on DB size vs hardware size, etc; Greg discusses 
this more in his book.

What isolation level do your transactions use? This is significant because of 
the move to true serializable isolation with predicate locking in 9.0; it made 
serializable transactions a bit slower in some circumstances in exchange for 
much stronger correctness guarantees. The READ COMMITTED default was unchanged.



It also looks like you might not have seen the second part of my earlier reply:

while PostgreSQL doesn't support covering indexes or clustered indexes at this 
point, 9.2 has added support for index-only scans, which are a half-way point 
of sorts. See:

  http://rhaas.blogspot.com.au/2011/10/index-only-scans-weve-got-em.html
  http://rhaas.blogspot.com.au/2010/11/index-only-scans.html
  
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a2822fb9337a21f98ac4ce850bb4145acf47ca27

If at all possible please see how your test is affected by this PostgreSQL 9.2 
enhancement. It should make a big difference, and if it doesn't it's important 
to know why.

(CC'd Robert Haas)



As for the invest side - that's really a matter for EnterpriseDB, Command 
Prompt, Red Hat, and the other backers who're employing people to work on the 
DB. Consider asking on pgsql-hackers, too; if nothing else you'll get a good 
explanation of the current state and progress toward clustered indexes.

Some links that may be useful to you are:

  http://wiki.postgresql.org/wiki/Todo
  Things that it'd be good to support/implement at some point. Surprisingly, 
covering/clustered