[PERFORM] Hardware advice for scalable warehouse db

2011-07-15 Thread chris
Hi list,

My employer is being donated a NetApp FAS 3040 SAN [1], and we want to run
our warehouse DB on it. The pg 9.0 DB currently comprises ~1.5TB of
tables, 200GB of indexes, and grows ~5%/month. The DB is not update
critical, but frequently undergoes larger read and insert operations.

My employer is a university with little funding, and we have to find a
cheap way to scale for the next 3 years, so the SAN seems a good chance
to us. We are now looking for the remaining server parts to maximize DB
performance at a cost of at most $4,000. I dug up the following
configuration with the discount we receive from Dell:

  1 x Intel Xeon X5670, 6C, 2.93GHz, 12M Cache
  16 GB (4x4GB) Low Volt DDR3 1066Mhz
  PERC H700 SAS RAID controller
  4 x 300 GB 10k SAS 6Gbps 2.5 in RAID 10

I was thinking to put the WAL and the indexes on the local disks, and
the rest on the SAN. If funds allow, we might downgrade the disks to
SATA and add a 50 GB SATA SSD for the WAL (SAS/SATA mixup not possible).

Any comments on the configuration? Any experiences with iSCSI vs. Fibre
Channel for SANs and PostgreSQL? If the SAN setup sucks, do you see a
cheap alternative for connecting as many as 16 x 2TB disks as DAS?

Thanks so much!

Best,
Chris

[1]: http://www.b2net.co.uk/netapp/fas3000.pdf


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hardware advice for scalable warehouse db

2011-07-15 Thread Greg Smith

chris wrote:

My employer is a university with little funds and we have to find a
cheap way to scale for the next 3 years, so the SAN seems a good chance
to us.


A SAN is rarely ever the cheapest way to scale anything; you're paying 
extra for reliability instead.




I was thinking to put the WAL and the indexes on the local disks, and
the rest on the SAN. If funds allow, we might downgrade the disks to
SATA and add a 50 GB SATA SSD for the WAL (SAS/SATA mixup not possible).
  


If you want to keep the bulk of the data on the SAN, this is a 
reasonable way to go, performance-wise.  But be aware that losing the 
WAL means your database is likely corrupted.  That means that much of 
the reliability benefit of the SAN is lost in this configuration.




Any experiences with iSCSI vs. Fibre
Channel for SANs and PostgreSQL? If the SAN setup sucks, do you see a
cheap alternative how to connect as many as 16 x 2TB disks as DAS?
  


I've never heard anyone recommend iSCSI if you care at all about 
performance, while FC works fine for this sort of job.  The physical 
dimensions of 3.5" drives make getting 16 of them into one reasonably 
sized enclosure normally just out of reach.  But a Dell PowerVault 
MD1000 will give you 15 x 2TB as inexpensively as possible in a single 
3U space (well, as cheaply as you want to go--you might build your own 
giant box cheaper, but I wouldn't recommend it).  I've tested MD1000, 
MD1200, and MD1220 arrays before, and always gotten seriously good 
performance relative to the dollars spent with that series.  The only one 
of these Dell storage arrays I've heard a couple of disappointing results 
about (but not tested directly yet) is the MD3220.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD





Re: [PERFORM] Inoptimal query plan for max() and multicolumn index

2011-07-15 Thread Gaetano Mendola

On 20/06/2011 07:35, Vladimir Kulev wrote:


But this does not work:
# explain analyze select max(timestamp) from sms where number in
('5502712','5802693','5801981');


Try to rewrite that query this way:

explain analyze select timestamp from sms where number in 
('5502712','5802693','5801981') order by timestamp desc limit 1;
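The effect of this rewrite can be sketched in a self-contained way; the
snippet below uses Python's built-in sqlite3 rather than PostgreSQL, and
the column name `ts` and the sample data are made up for illustration:

```python
import sqlite3

# Illustrative sketch, not the original schema: a small sms table with
# a multicolumn index on (number, ts), as in the thread.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sms (number TEXT, ts INTEGER)")
con.execute("CREATE INDEX sms_number_ts ON sms (number, ts)")
rows = [("5502712", 100), ("5802693", 250), ("5801981", 175), ("9999999", 999)]
con.executemany("INSERT INTO sms VALUES (?, ?)", rows)

nums = ("5502712", "5802693", "5801981")

# Original form: max() over an IN-list.
orig = con.execute(
    "SELECT max(ts) FROM sms WHERE number IN (?,?,?)", nums).fetchone()[0]

# Rewritten form: ORDER BY ... DESC LIMIT 1, which an index on
# (number, ts) can serve by probing the tail of each number's range.
rewritten = con.execute(
    "SELECT ts FROM sms WHERE number IN (?,?,?) "
    "ORDER BY ts DESC LIMIT 1", nums).fetchone()[0]

print(orig, rewritten)  # both 250
```

Both forms return the same answer; the point of the rewrite is only to
give the planner a shape it can satisfy from the index.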



Regards
Gaetano Mendola





Re: [PERFORM] Hardware advice for scalable warehouse db

2011-07-15 Thread jesper
   1 x Intel Xeon X5670, 6C, 2.93GHz, 12M Cache
   16 GB (4x4GB) Low Volt DDR3 1066Mhz
   PERC H700 SAS RAID controller
   4 x 300 GB 10k SAS 6Gbps 2.5 in RAID 10

Apart from Greg's excellent recommendations, I would strongly suggest
more memory. 16GB in 2011 is really on the low side.

PG uses memory (either shared_buffers or the OS cache) to keep
frequently accessed data in. Good recommendations are hard
without knowledge of the data and access patterns, but 64, 128 and 256GB
systems are quite common when you have data that can't all be
in memory at once.
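As a rough starting point for a 64GB warehouse box, a configuration
fragment might look like the sketch below; the values are illustrative
assumptions in the spirit of pg 9.0-era guidance, not figures from this
thread, and would need tuning against the real workload:

```
# postgresql.conf sketch for a 64GB data-warehouse server (illustrative)
shared_buffers = 8GB          # common guidance: ~25% of RAM, within limits
effective_cache_size = 48GB   # rough estimate of OS cache + shared_buffers
work_mem = 256MB              # large sorts/hashes; assumes few concurrent queries
maintenance_work_mem = 1GB    # index builds, vacuum
checkpoint_segments = 64      # bulk loads generate lots of WAL
```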

SANs are nice, but I think you could buy a good DAS unit each year
for just the support cost of a NetApp; then again, you might have gotten a
really good deal there too. You do get a huge number of
advanced configuration features and ways of sharing storage
(just see the specs).

If you need those, the SAN is a good way to go, but
they do come with a huge price tag.

Jesper




Re: [PERFORM] Hardware advice for scalable warehouse db

2011-07-15 Thread Robert Schnabel


On 7/15/2011 2:10 AM, Greg Smith wrote:

chris wrote:

My employer is a university with little funds and we have to find a
cheap way to scale for the next 3 years, so the SAN seems a good chance
to us.

A SAN is rarely ever the cheapest way to scale anything; you're paying
extra for reliability instead.



I was thinking to put the WAL and the indexes on the local disks, and
the rest on the SAN. If funds allow, we might downgrade the disks to
SATA and add a 50 GB SATA SSD for the WAL (SAS/SATA mixup not possible).


If you want to keep the bulk of the data on the SAN, this is a
reasonable way to go, performance-wise.  But be aware that losing the
WAL means your database is likely corrupted.  That means that much of
the reliability benefit of the SAN is lost in this configuration.



Any experiences with iSCSI vs. Fibre
Channel for SANs and PostgreSQL? If the SAN setup sucks, do you see a
cheap alternative how to connect as many as 16 x 2TB disks as DAS?


I've never heard anyone recommend iSCSI if you care at all about
performance, while FC works fine for this sort of job.  The physical
dimensions of 3.5" drives make getting 16 of them into one reasonably
sized enclosure normally just out of reach.  But a Dell PowerVault
MD1000 will give you 15 x 2TB as inexpensively as possible in a single
3U space (well, as cheaply as you want to go--you might build your own
giant box cheaper, but I wouldn't recommend it).


I'm curious what people think of these:
http://www.pc-pitstop.com/sas_cables_enclosures/scsase166g.asp

I currently have my database on two of these and for my purpose they 
seem to be fine and are quite a bit less expensive than the Dell 
MD1000.  I actually have three more of the 3G versions with expanders 
for mass storage arrays (RAID0) and haven't had any issues with them in 
the three years I've had them.


Bob






Re: [PERFORM] Unexpected seq scans when expected result is 1 row out of milions

2011-07-15 Thread Tom Lane
Svetlin Manavski svetlin.manav...@gmail.com writes:
 The question is: why do we get a seq scan on appqosdata.tcpsessiondata_1 and
 appqosdata.tcpsessiondata_2 even if the planner estimates correctly 1 row
 out of millions could potentially be selected? As you can see ~90% of the
 time is spent on those 2 partitions even if they are not apparently
 different from any of the others.

Well, there must be *something* different about them.  Are you sure
they've got the same indexes as the others?  It would be useful to see
psql's \d report for those partitions, as well as for one of the
partitions that's behaving as expected.  You might also compare the
EXPLAIN results for doing the query on just one child table between
the normal and misbehaving partitions.

regards, tom lane



Re: [PERFORM] Hardware advice for scalable warehouse db

2011-07-15 Thread Scott Marlowe
On Fri, Jul 15, 2011 at 12:34 AM, chris chri...@gmx.net wrote:
 I was thinking to put the WAL and the indexes on the local disks, and
 the rest on the SAN. If funds allow, we might downgrade the disks to
 SATA and add a 50 GB SATA SSD for the WAL (SAS/SATA mixup not possible).

Just to add to the conversation, there's no real advantage to putting
WAL on SSD.  Indexes can benefit from them, but WAL is mostly
sequential throughput, and for that a pair of 1TB SATA drives at
7200RPM works just fine for most folks.  For example, in one big server
we're running we have 24 drives in a RAID-10 for the /data/base dir
with 4 drives in a RAID-10 for pg_xlog, and those 4 drives tend to
have the same I/O util % under iostat as the 24 drives under normal
usage.  It takes a special kind of load (lots of inserts happening in
large transactions quickly) for the 4-drive RAID-10 to ever exceed
50% util.
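Putting pg_xlog on its own spindles like this is typically done with a
symlink. A minimal sketch follows, run against throwaway temp directories
since the real move must happen with the server stopped, and the paths
depend entirely on your layout:

```shell
# Sketch: relocate pg_xlog onto a dedicated array via a symlink.
# Temp dirs stand in for the real paths; on a live system, stop
# PostgreSQL first and substitute your data directory and WAL mount.
DATADIR="$(mktemp -d)/data"
WALMNT="$(mktemp -d)/wal_array"
mkdir -p "$DATADIR/pg_xlog" "$WALMNT"

mv "$DATADIR/pg_xlog" "$WALMNT/pg_xlog"     # move WAL to the dedicated disks
ln -s "$WALMNT/pg_xlog" "$DATADIR/pg_xlog"  # point the server back at it

ls -ld "$DATADIR/pg_xlog"                   # now a symlink
```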



Re: [PERFORM] Hardware advice for scalable warehouse db

2011-07-15 Thread Scott Marlowe
On Fri, Jul 15, 2011 at 10:39 AM, Robert Schnabel
schnab...@missouri.edu wrote:
 I'm curious what people think of these:
 http://www.pc-pitstop.com/sas_cables_enclosures/scsase166g.asp

 I currently have my database on two of these and for my purpose they seem to
 be fine and are quite a bit less expensive than the Dell MD1000.  I actually
 have three more of the 3G versions with expanders for mass storage arrays
 (RAID0) and haven't had any issues with them in the three years I've had
 them.

I have a co-worker who's familiar with them and they seem a lot like
the 16 drive units we use from Aberdeen, which fully outfitted with
15k SAS drives run $5k to $8k depending on the drives etc.



Re: [PERFORM] Hardware advice for scalable warehouse db

2011-07-15 Thread Josh Berkus

 Just to add to the conversation, there's no real advantage to putting
 WAL on SSD.  Indexes can benefit from them, but WAL is mostly
 sequential throughput, and for that a pair of 1TB SATA drives at
 7200RPM works just fine for most folks.

Actually, there's a strong disadvantage to putting WAL on SSD.  SSD is
very prone to fragmentation if you're doing a lot of deleting and
replacing files.  I've implemented data warehouses where the database
was on SSD but WAL was still on HDD.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [PERFORM] Hardware advice for scalable warehouse db

2011-07-15 Thread chris r.
Hi list,

Thanks a lot for your very helpful feedback!

 I've tested MD1000, MD1200, and MD1220 arrays before, and always gotten
 seriously good performance relative to the dollars spent
Great hint, but I'm afraid that's too expensive for us. It is a great
way to scale over the years, though, so I'll keep that in mind.

I had a look at other server vendors who offer 4U servers with slots for
16 disks for $4k in total (w/o disks); maybe that's an even
cheaper/better solution for us. If you had the choice between 16 x 2TB
SATA and a server with some SSDs for WAL/indexes plus a SAN (with SATA
disks) for data, what would you choose performance-wise?

Again, thanks so much for your help.

Best,
Chris



Re: [PERFORM] Hardware advice for scalable warehouse db

2011-07-15 Thread Rob Wultsch
On Fri, Jul 15, 2011 at 11:49 AM, chris r. chri...@gmx.net wrote:
 Hi list,

 Thanks a lot for your very helpful feedback!

 I've tested MD1000, MD1200, and MD1220 arrays before, and always gotten
 seriously good performance relative to the dollars spent
 Great hint, but I'm afraid that's too expensive for us. But it's a great
 way to scale over the years, I'll keep that in mind.

 I had a look at other server vendors who offer 4U servers with slots for
 16 disks for 4k in total (w/o disks), maybe that's an even
 cheaper/better solution for us. If you had the choice between 16 x 2TB
 SATA vs. a server with some SSDs for WAL/indexes and a SAN (with SATA
 disk) for data, what would you choose performance-wise?

 Again, thanks so much for your help.

 Best,
 Chris

SATA drives can easily flip bits, and Postgres does not checksum data,
so it will not automatically detect corruption for you. I would steer
well clear of SATA unless you are going to be using a filesystem like
ZFS, which checksums data. I would hope that a SAN would detect this
for you, but I have no idea.


-- 
Rob Wultsch
wult...@gmail.com



Re: [PERFORM] Hardware advice for scalable warehouse db

2011-07-15 Thread Josh Berkus
On 7/14/11 11:34 PM, chris wrote:
 Any comments on the configuration? Any experiences with iSCSI vs. Fibre
 Channel for SANs and PostgreSQL? If the SAN setup sucks, do you see a
 cheap alternative how to connect as many as 16 x 2TB disks as DAS?

Here's the problem with iSCSI: on gigabit ethernet, your maximum
possible throughput is about 100MB/s, which means that your likely maximum
database throughput (for a seq scan or vacuum, for example) is 30MB/s.
That's about a third of what you can get with good internal RAID.
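The arithmetic behind those figures, roughly; the 30% database-level
factor is an assumption chosen to match the estimate above:

```python
# Back-of-envelope: gigabit ethernet vs. what the database actually sees.
gige_bits_per_sec = 1_000_000_000
wire_max_mb_s = gige_bits_per_sec / 8 / 1_000_000   # 125 MB/s theoretical
practical_mb_s = 100          # realistic ceiling after TCP/iSCSI overhead
db_seqscan_mb_s = practical_mb_s * 30 / 100          # assumed DB-level factor

print(wire_max_mb_s, db_seqscan_mb_s)  # 125.0 30.0
```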

While multichannel iSCSI is possible, it's hard to configure, and
doesn't really allow you to spread a *single* request across multiple
channels.  So: go with fiber channel if you're using a SAN.

iSCSI also has horrible lag times, but you don't care about that so much
for DW.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [PERFORM] Statistics and Multi-Column indexes

2011-07-15 Thread lars

On 07/10/2011 02:31 PM, Samuel Gendler wrote:
What about partitioning tables by tenant id and then maintaining 
indexes on each partition independent of tenant id, since constraint 
exclusion should handle filtering by tenant id for you?  That seems 
like a potentially more tolerable variant of #5.  How many tenants are 
we talking about?  I gather partitioning starts to become problematic 
when the number of partitions gets large.



I thought I had replied... Apparently I didn't.

The database can grow in two dimensions: the number of tenants and the 
number of rows per tenant. We have many tenants with relatively little 
data and a few with a lot of data. So the number of tenants is known 
ahead of time and might be 1000's.

-- Lars




Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database

2011-07-15 Thread lars

On 07/14/2011 04:03 PM, Jeff Janes wrote:

On Wed, Jul 13, 2011 at 3:41 PM, lars lhofha...@yahoo.com  wrote:

On 07/13/2011 11:42 AM, Kevin Grittner wrote:

So transactions without an XID *are* sensitive to
synchronous_commit.  That's likely a useful clue.

How much did it help the run time of the SELECT which followed the
UPDATE?

It has surprisingly little impact on the SELECT side:

If your fsync is truly fsyncing, it seems like it should have
considerable effect.

Could you strace with both -ttt and -T, with and without synchronous commit?

Cheers,

Jeff

Ok, here we go:

Q\0\0\0_select count(*) from test w..., 8192, 0, NULL, NULL) = 96 
5.357152

1310774187.750791 gettimeofday({1310774187, 750809}, NULL) = 0 0.22
1310774187.751023 lseek(12, 0, SEEK_END) = 329908224 0.23
1310774187.751109 lseek(15, 0, SEEK_END) = 396607488 0.22
1310774187.751186 lseek(18, 0, SEEK_END) = 534175744 0.22
1310774187.751360 lseek(12, 0, SEEK_END) = 329908224 0.23
1310774187.753389 brk(0x248e000)= 0x248e000 0.26
1310774187.753953 brk(0x24ce000)= 0x24ce000 0.23
1310774187.755158 brk(0x254e000)= 0x254e000 0.24
1310774187.766605 brk(0x245)= 0x245 0.000170
1310774187.766852 lseek(23, 4513792, SEEK_SET) = 4513792 0.23
1310774187.766927 write(23, 
f\320\1\0\1\0\0\0\320\0\0\0\0\340D-\22\0\0\0\30@!0..., 32768) 
= 32768 0.75

1310774187.767071 fdatasync(23) = 0 0.002618
1310774187.769760 gettimeofday({1310774187, 769778}, NULL) = 0 0.22
1310774187.769848 sendto(5, 
\2\0\0\0\350\0\0\0\1@\0\0\2\0\0\0\1\0\0\0\0\0\0\0\r\201\0\0\0\2\0..., 
232, 0, NULL, 0) = 232 0.64
1310774187.769993 sendto(6, 
T\0\0\0\36\0\1count\0\0\0\0\0\0\0\0\0\0\24\0\10\377\377\377\377\0\0D..., 
66, 0, NULL, 0) = 66 0.000199


(23 is the WAL fd) vs.

Q\0\0\0_select count(*) from test w..., 8192, 0, NULL, NULL) = 96 
7.343720

1310774306.978767 gettimeofday({1310774306, 978785}, NULL) = 0 0.21
1310774306.978994 lseek(12, 0, SEEK_END) = 330883072 0.24
1310774306.979080 lseek(15, 0, SEEK_END) = 397131776 0.21
1310774306.979157 lseek(18, 0, SEEK_END) = 534732800 0.22
1310774306.979332 lseek(12, 0, SEEK_END) = 330883072 0.22
1310774306.983096 brk(0x248e000)= 0x248e000 0.26
1310774306.983653 brk(0x24ce000)= 0x24ce000 0.23
1310774306.984667 brk(0x254e000)= 0x254e000 0.23
1310774306.996040 brk(0x245)= 0x245 0.000168
1310774306.996298 gettimeofday({1310774306, 996317}, NULL) = 0 0.21
1310774306.996388 sendto(5, 
\2\0\0\0\350\0\0\0\1@\0\0\2\0\0\0\1\0\0\0\0\0\0\0\r\201\0\0\0\2\0..., 
232, 0, NULL, 0) = 232 0.78
1310774306.996550 sendto(6, 
T\0\0\0\36\0\1count\0\0\0\0\0\0\0\0\0\0\24\0\10\377\377\377\377\0\0D..., 
66, 0, NULL, 0) = 66 0.000202


So the difference is only 2ms; the size of the WAL written was only 32k.


Here's an example with more dirty rows (I basically let the updater run 
for a while dirtying very many rows).


Q\0\0\0_select count(*) from test w..., 8192, 0, NULL, NULL) = 96 
23.690018

1310775141.398780 gettimeofday({1310775141, 398801}, NULL) = 0 0.28
1310775141.399018 lseek(12, 0, SEEK_END) = 372514816 0.23
1310775141.399105 lseek(15, 0, SEEK_END) = 436232192 0.22
1310775141.399185 lseek(18, 0, SEEK_END) = 573620224 0.23
1310775141.399362 lseek(12, 0, SEEK_END) = 372514816 0.24
1310775141.414017 brk(0x249)= 0x249 0.28
1310775141.414575 brk(0x24d)= 0x24d 0.25
1310775141.415600 brk(0x255)= 0x255 0.24
1310775141.417757 semop(229383, {{0, -1, 0}}, 1) = 0 0.24
...
1310775141.448998 semop(229383, {{0, -1, 0}}, 1) = 0 0.25
1310775141.453134 brk(0x2452000)= 0x2452000 0.000167
1310775141.453377 fadvise64(22, 0, 0, POSIX_FADV_DONTNEED) = 0 0.25
1310775141.453451 close(22) = 0 0.32
1310775141.453537 open(pg_xlog/000100D100C2, O_RDWR) = 22 
0.59
1310775141.453696 write(22, 
f\320\3\0\1\0\0\0\321\0\0\0\0\0\0\3023\356\17N\23l\vN\0\0\0\1\0 
\0\0..., 5365760) = 5365760 0.005991
1310775141.459798 write(22, 
f\320\1\0\1\0\0\0\321\0\0\0\0\340Q\302`\5\0\915!000..., 
9019392) = 9019392 0.010062

1310775141.469965 fdatasync(22) = 0 0.231385
1310775141.701424 semop(229383, {{2, 1, 0}}, 1) = 0 0.31
1310775141.702657 gettimeofday({1310775141, 702682}, NULL) = 0 0.28
1310775141.702765 sendto(5, 
\2\0\0\0\350\0\0\0\1@\0\0\2\0\0\0\1\0\0\0\0\0\0\0\r\201\0\0\0\2\0..., 
232, 0, NULL, 0) = 232 0.71
1310775141.702942 sendto(6, 
T\0\0\0\36\0\1count\0\0\0\0\0\0\0\0\0\0\24\0\10\377\377\377\377\0\0D..., 
66, 0, NULL, 0) = 66 0.000220


vs

Q\0\0\0_select count(*) from test w..., 8192, 0, NULL, NULL) = 96 
55.595425

1310775406.842823 gettimeofday({1310775406, 842842}, NULL) = 0 0.26
1310775406.843092 lseek(12, 0, SEEK_END) = 382787584 0.23
1310775406.843179 lseek(15, 0, SEEK_END) = 457596928 0.42
1310775406.843280 lseek(18, 0, SEEK_END)