Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Graeme B. Bell

Hi Wes

1. The first interesting thing is that prior to my mentioning this problem to 
C_ a year or two back, the power loss protection was advertised everywhere 
as simply that, without qualifiers about 'not inflight data'. Check out the 
marketing of the M500 for the first year or so and try to find an example where 
they say 'but inflight data isn't protected!'. 

2. The second (and more important) interesting thing is that this is irrelevant!

Fsync'd data is BY DEFINITION not data in flight. 
Fsync means "This data is secure on the disk!" 
However, the drives corrupt it.

Postgres's sanity depends on a reliable fsync. That's why we see posts on the 
performance list saying 'fsync=off makes your postgres faster, but really, don't 
do it in production'. 
We are talking about internal DB corruption, not just a crash and a few lost 
transactions.
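
For reference, the settings being discussed can be checked from SQL (a minimal 
sketch; 'on' is the default and the safe value for both):

   SHOW fsync;               -- must stay 'on'; fsync=off risks internal corruption
   SHOW synchronous_commit;  -- 'off' only risks losing recent commits, not corruption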

These drives return from fsync while data is still in volatile cache.
That's breaking the spec, and it's why they are not OK for postgres by 
themselves. 

This is not about 'in-flight' data, it's about fsync'd wal log data. 

Graeme. 


On 07 Jul 2015, at 16:15, Wes Vaske (wvaske) wva...@micron.com wrote:

 The M500/M550/M600 are consumer class drives that don't have power protection 
 for all inflight data.* (like the Samsung 8x0 series and the Intel 3x0 & 5x0 
 series).
 
 The M500DC has full power protection for inflight data and is an 
 enterprise-class drive (like the Samsung 845DC or Intel S3500 & S3700 series).
 
 So any drive without the capacitors to protect inflight data will suffer from 
 data loss if you're using disk write cache and you pull the power.
 
 *Big addendum:
 There are two issues on powerloss that will mess with Postgres. Data Loss and 
 Data Corruption. The micron consumer drives will have power loss protection 
 against Data Corruption and the enterprise drive will have power loss 
 protection against BOTH.
 
 https://www.micron.com/~/media/documents/products/white-paper/wp_ssd_power_loss_protection.pdf
  
 
 The Data Corruption problem is only an issue in non-SLC NAND but it's 
 industry wide. And even though some drives will protect against that, the 
 protection of inflight data that's been fsync'd is more important and should 
 disqualify *any* consumer drives from *any* company from consideration for 
 use with Postgres.
 
 Wes Vaske | Senior Storage Solutions Engineer
 Micron Technology 
 
 -Original Message-
 From: Graeme B. Bell [mailto:graeme.b...@nibio.no] 
 Sent: Tuesday, July 07, 2015 8:26 AM
 To: Merlin Moncure
 Cc: Wes Vaske (wvaske); Craig James; pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] New server: SSD/RAID recommendations?
 
 
 As I have warned elsewhere,
 
 The M500/M550 from $SOME_COMPANY is NOT SUITABLE for postgres unless you have 
 a RAID controller with BBU to protect yourself.
 The M500/M550 are NOT plug-pull safe despite the 'power loss protection' 
 claimed on the packaging. Not all fsync'd data is preserved in the event of a 
 power loss, which completely undermines postgres's sanity. 
 
 I would be extremely skeptical about the M500DC given the name and 
 manufacturer. 
 
 I went to quite a lot of trouble to provide $SOME_COMPANYs engineers with the 
 full details of this fault after extensive testing (we have e.g. 20-25 of 
 these disks) on multiple machines and controllers, at their request. Result: 
 they stopped replying to me, and soon after I saw their PR reps talking about 
 how 'power loss protection isn't about protecting all data during a power 
 loss'. 
 
 The only safe way to use an M500/M550 with postgres is:
 
 a) disable the disk cache, which will cripple performance to about 3-5% of 
 normal.
 b) use a battery backed or cap-backed RAID controller, which will generally 
 hurt performance, by limiting you to the peak performance of the flash on the 
 raid controller. 
 
 If you are buying such a drive, I strongly recommend buying only one and 
 doing extensive plug-pull testing before committing to several. 
 For myself, my time is valuable enough that it will be cheaper to buy Intel 
 in future. 
 
 Graeme.
 
 On 07 Jul 2015, at 15:12, Merlin Moncure mmonc...@gmail.com wrote:
 
 On Thu, Jul 2, 2015 at 1:00 PM, Wes Vaske (wvaske) wva...@micron.com wrote:
 Storage Review has a pretty good process and reviewed the M500DC when it 
 released last year. 
 http://www.storagereview.com/micron_m500dc_enterprise_ssd_review
 
 
 
 The only database-specific info we have available are for Cassandra and 
 MSSQL:
 
 http://www.micron.com/~/media/documents/products/technical-marketing-brief/cassandra_and_m500dc_enterprise_ssd_tech_brief.pdf
 
 http://www.micron.com/~/media/documents/products/technical-marketing-brief/sql_server_2014_and_m500dc_raid_configuration_tech_brief.pdf
 
 
 
 (some of that info might be relevant)
 
 
 
 In terms of endurance, the M500DC is rated to 2 Drive Writes Per Day (DWPD) 
 for 5-years. For comparison:
 
 Micron M500DC (20nm) - 2 DWPD
 
 Intel S3500 (20nm) 

Re: [PERFORM] wildcard text filter switched to boolean column, performance is way worse

2015-07-07 Thread Tom Lane
Mike Broers mbro...@gmail.com writes:
 I had a query that was filtering with a wildcard search of a text field for
 %SUCCESS%. The query took about 5 seconds and was running often so I wanted
 to improve it.  I suggested that the engineers include a new boolean column
 for successful status.  They implemented the requested field, but the query
 that filters on that new column runs very long (i kill it after letting it
 run for about an hour).  Can someone help me understand why that is the
 case and how to resolve it?

It's hashing the subplan output in the first case and not the second:

 Seq Scan on lead  (cost=130951.81..158059.21 rows=139957 width=369) (actual
 time=4699.619..4699.869 rows=1 loops=1)
 Filter: ((NOT (hashed SubPlan 1)) AND ((ReferenceNumber)::text <>
 ''::text) AND ((NOT (hashed SubPlan 2)) OR (NOT (hashed SubPlan 3))))

vs

   Seq Scan on lead  (cost=85775.78..9005687281.12 rows=139957 width=369)
 Filter: ((NOT (hashed SubPlan 1)) AND ((ReferenceNumber)::text <>
 ''::text) AND ((NOT (hashed SubPlan 2)) OR (NOT (SubPlan 3))))
                                              ^^^^^^^^^^^

Presumably, the new more-accurate rows count causes the planner to realize
that the hash table will exceed work_mem so it doesn't choose to hash ...
but for your situation, you'd rather it did, because what you're getting
instead is a Materialize node that spills to disk (again, because the data
involved exceeds work_mem) and that's a killer for this query.  You should
be able to get back the old behaviour if you raise work_mem enough.

Another idea you might think about is changing the OR'd IN conditions
to a single IN over a UNION ALL of the subselects.  I'm not really sure if
that would produce a better plan, but it's worth trying if it wouldn't
require too much app-side contortion.
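
For this particular query that might look roughly like the following - an 
untested sketch only (and NOT IN assumes event.lead_id is never NULL):

   SELECT *
   FROM lead
   WHERE lead.id NOT IN (
           SELECT U1.lead_id FROM event U1
           WHERE U1.event_type = 'type_1'
           UNION ALL
           SELECT U1.lead_id FROM event U1
           WHERE U1.event_type = 'type_2'
             AND U1.lead_id IN (SELECT U2.lead_id FROM event U2 WHERE U2.successful)
         )
     AND NOT (lead.ReferenceNumber = '');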

regards, tom lane




Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Karl Denninger
After a plug-pull during the create, reboot and here is the verify:

root@Dbms2:/var/tmp # ./diskchecker.pl -s newfs verify /test/biteme
 verifying: 0.00%
 verifying: 3.81%
 verifying: 10.91%
 verifying: 18.71%
 verifying: 26.46%
 verifying: 33.95%
 verifying: 41.20%
 verifying: 49.48%
 verifying: 57.23%
 verifying: 64.89%
 verifying: 72.54%
 verifying: 80.04%
 verifying: 87.96%
 verifying: 95.15%
 verifying: 100.00%
Total errors: 0

da6 at mps0 bus 0 scbus0 target 17 lun 0
da6: ATA INTEL SSDSC2BP24 0420 Fixed Direct Access SPC-4 SCSI device
da6: Serial Number BTJR446401KW240AGN 
da6: 600.000MB/s transfers
da6: Command Queueing enabled
da6: 228936MB (468862128 512 byte sectors: 255H 63S/T 29185C)

# smartctl -a /dev/da6

=== START OF INFORMATION SECTION ===
Model Family: Intel 730 and DC S3500/S3700 Series SSDs
Device Model: INTEL SSDSC2BP240G4
Serial Number:BTJR446401KW240AGN
LU WWN Device Id: 5 5cd2e4 04b71afc7
Firmware Version: L2010420
User Capacity:240,057,409,536 bytes [240 GB]
Sector Size:  512 bytes logical/physical
Rotation Rate:Solid State Device
Form Factor:  2.5 inches
Device is:In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:Tue Jul  7 17:01:36 2015 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Note -- same firmware between all three series of Intel devices.. :-)

Yes, I like these SSDs -- they don't lie and they don't lose data on a
power-pull.


On 7/7/2015 08:08, Graeme B. Bell wrote:
 Thanks, this is very useful to know about the 730. When you say 'tested it 
 with plug-pulls', you were using diskchecker.pl, right?

 Graeme.

 On 07 Jul 2015, at 14:39, Karl Denninger k...@denninger.net wrote:

 Incidentally while there are people who have questioned the 730 series power 
 loss protection I've tested it with plug-pulls and in addition it watchdogs 
 its internal power loss capacitors -- from the smartctl -a display of one of 
 them on an in-service machine here:

 175 Power_Loss_Cap_Test 0x0033   100   100   010    Pre-fail  Always 
   -   643 (4 6868)



-- 
Karl Denninger
k...@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/




Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Graeme B. Bell

 Why would you think that you don't need RAID for ZFS?
 
 Reason I'm asking if because we are moving to ZFS on FreeBSD for our future 
 projects.


Because you have raidz. :-)

https://blogs.oracle.com/bonwick/entry/raid_z

General points:

1. It's my understanding that ZFS is designed to talk to the hardware directly, 
and so it would be bad to hide the physical layer from ZFS unless you had to.
After all, I don't think they implemented a raid-like system inside ZFS just 
for the fun of it. 

2. You have raidz built in and easy to manage within ZFS - and well tested 
compared to NewRaidController (TM) - so why add another layer of management to 
your disk storage?

3. You reintroduce the raid write hole.

4. There might be some argument for hardware raid (existing system) but with 
software raid (the point I was addressing) it makes little sense at all.

5. If you're on hardware raid and your controller dies, you're screwed in 
several ways. It's harder to get a new raid controller than a new PC, your 
chances of recovery are lower than with ZFS, and IMHO it's more scary to 
recover from a failed raid controller, too. 

6. Recovery is faster if the disks aren't full, because ZFS only rebuilds the 
data that is actually there. This might not seem a big deal, but chances are it 
would save you 50% of your downtime in a crisis. 

However, I think with Linux you might want to use RAID for the boot disk; I 
don't know if Linux can boot from ZFS yet. I would use (and am using) FreeBSD 
with ZFS.

Graeme.


On 07 Jul 2015, at 18:56, Wei Shan weishan@gmail.com wrote:

 Hi Graeme,
 
 Why would you think that you don't need RAID for ZFS?
 
 Reason I'm asking if because we are moving to ZFS on FreeBSD for our future 
 projects.
 
 Regards,
 Wei Shan
 
 On 8 July 2015 at 00:46, Graeme B. Bell graeme.b...@nibio.no wrote:
 
  RAID controllers are completely unnecessary for SSD as they currently
  exist.
 
 Agreed. The best solution is not to buy cheap disks and not to buy RAID 
 controllers now, imho.
 
 In my own situation, I had a tight budget, high performance demand and a 
 newish machine with RAID controller and HDDs in it as a starting point.
 So it was more a question of 'what can you do with a free raid controller and 
 not much money' back in 2013. And it has worked very well.
 Still, I had hoped for a bit more from the cheaper SSDs though, I'd hoped to 
 use fastpath on the controller and bypass the cache.
 
 The way NVMe prices are going though, I wouldn't do it again if I was doing 
 it this year. I'd just go direct to nvme and trash the raid controller. These 
 sammy and intel nvmes are basically enterprise hardware at consumer prices. 
 Heck, I'll probably put one in my next gaming PC.
 
 Re: software raid.
 
 I agree, but once you accept that software raid is now pretty much superior 
 to hardware raid, you start looking at ZFS and thinking 'why the heck am I 
 even using software raid?'
 
 G
 
 
 
 
 
 
 -- 
 Regards,
 Ang Wei Shan





[PERFORM] wildcard text filter switched to boolean column, performance is way worse

2015-07-07 Thread Mike Broers
I had a query that was filtering with a wildcard search of a text field for
%SUCCESS%. The query took about 5 seconds and was running often so I wanted
to improve it.  I suggested that the engineers include a new boolean column
for successful status.  They implemented the requested field, but the query
that filters on that new column runs very long (I kill it after letting it
run for about an hour).  Can someone help me understand why that is the
case and how to resolve it?

First query:
SELECT *
FROM lead
WHERE ( NOT ( ( lead.id IN
 ( SELECT U1.lead_id AS lead_id
  FROM event U1
  WHERE U1.event_type = 'type_1' )
   OR ( lead.id IN
 ( SELECT U1.lead_id AS lead_id
  FROM event U1
  WHERE U1.event_type = 'type_2' )
   AND lead.id IN
 ( SELECT U1.lead_id AS lead_id
  FROM event U1
  WHERE UPPER(U1.response::text) LIKE
UPPER('%success%') ) ) ) )
   AND NOT (lead.ReferenceNumber = '') ) ;

explain/analyze result:

Seq Scan on lead  (cost=130951.81..158059.21 rows=139957 width=369) (actual
time=4699.619..4699.869 rows=1 loops=1)
  Filter: ((NOT (hashed SubPlan 1)) AND ((ReferenceNumber)::text <> ''::text)
  AND ((NOT (hashed SubPlan 2)) OR (NOT (hashed SubPlan 3))))
  Rows Removed by Filter: 375369

  SubPlan 1
    ->  Seq Scan on event u1  (cost=0.00..42408.62 rows=7748 width=4)
        (actual time=0.005..171.350 rows=7414 loops=1)
          Filter: ((event_type)::text = 'type_1'::text)
          Rows Removed by Filter: 1099436

  SubPlan 2
    ->  Seq Scan on event u1_1  (cost=0.00..42408.62 rows=375665 width=4)
        (actual time=0.006..219.092 rows=373298 loops=1)
          Filter: ((event_type)::text = 'type_2'::text)
          Rows Removed by Filter: 733552

  SubPlan 3
    ->  Seq Scan on event u1_2  (cost=0.00..45175.75 rows=111 width=4)
        (actual time=0.040..3389.550 rows=712952 loops=1)
          Filter: (upper(response) ~~ '%SUCCESS%'::text)
          Rows Removed by Filter: 393898

The main thing that sticks out to me for this plan is the low estimate for
the rows it will return on the %SUCCESS% filter.
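
For what it's worth, the statistics the planner has for that column can be
inspected directly (a minimal sketch using standard catalog views; note that
plain column statistics don't cover the upper(response) expression itself):

   SELECT n_distinct, null_frac, most_common_vals
   FROM pg_stats
   WHERE tablename = 'event' AND attname = 'response';

   -- optionally gather more detail, then re-run the EXPLAIN:
   ALTER TABLE event ALTER COLUMN response SET STATISTICS 1000;
   ANALYZE event;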

Here is the second query with explain:

SELECT *
FROM lead
WHERE
(
 NOT
 (
  (lead.id IN
   (
SELECT U1.lead_id AS lead_id
FROM event U1
WHERE U1.event_type ='type_1'
   )
   OR
   (lead.id IN
(
 SELECT U1.lead_id AS lead_id
 FROM event U1
 WHERE U1.event_type = 'type_2'
 )
   AND lead.id IN
(
 SELECT U1.lead_id AS lead_id
 FROM event U1
 WHERE successful
)
   )
  )
 )
 AND NOT  (lead.ReferenceNumber = '')
) ;

explain result:
  Seq Scan on lead  (cost=85775.78..9005687281.12 rows=139957 width=369)
    Filter: ((NOT (hashed SubPlan 1)) AND ((ReferenceNumber)::text <> ''::text)
    AND ((NOT (hashed SubPlan 2)) OR (NOT (SubPlan 3))))

    SubPlan 1
      ->  Seq Scan on event u1  (cost=0.00..42408.62 rows=7748 width=4)
            Filter: ((event_type)::text = 'type_1'::text)

    SubPlan 2
      ->  Seq Scan on event u1_1  (cost=0.00..42408.62 rows=375665 width=4)
            Filter: ((event_type)::text = 'type_2'::text)

    SubPlan 3
      ->  Materialize  (cost=0.00..46154.43 rows=731185 width=4)
            ->  Seq Scan on event u1_2  (cost=0.00..39641.50 rows=731185 width=4)
                  Filter: successful

Here it does a materialize and estimates rows properly, but as stated this
query just hangs and pegs the load.  There are no locks and it's in an active
state the whole time.  I am running these queries in a test environment on
a recently exported full schema from production, with a reindex and a
vacuum/analyze.  This is postgres 9.3.6 on rhel6.

When I run just the different subquery element:
SELECT U1.lead_id AS lead_id
   FROM event U1

  WHERE successful;

it returns in about 250ms; with the text field %SUCCESS% it runs in about 4
seconds.  This seemed like a low-hanging-fruit query improvement, so I'm
surprised it's not working; it seems like we are just lucky that the planner
is estimating that filter incorrectly in the original form.

I'm sure the query just needs to be completely overhauled and am starting
to pull it apart and work with the engineers to get something more
efficient set up overall, but I am not sure how to answer the question as
to why this original attempt at improving the query is not successful.

Any guidance is greatly appreciated, thanks!


Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Merlin Moncure
 On 07 Jul 2015, at 16:59, Heikki Linnakangas hlinn...@iki.fi wrote:


 So it lies about fsync()... The next question is, does it nevertheless 
 enforce the correct ordering of persisting fsync'd data? If you write to 
 file A and fsync it, then write to another file B and fsync it too, is it 
 guaranteed that if B is persisted, A is as well? Because if it isn't, you 
 can end up with filesystem (or database) corruption anyway.

On Tue, Jul 7, 2015 at 10:58 AM, Graeme B. Bell graeme.b...@nibio.no wrote:

 Yikes. I would not be able to sleep tonight if it were not for the BBU cache 
 in front of these disks...

 diskchecker.pl consistently reported several examples of corruption 
 post-power-loss (usually 10 - 30 ) on unprotected M500s/M550s, so I think 
 it's pretty much open to debate what types of madness and corruption you'll 
 find if you look close enough.

100% agree with your sentiments.   I do believe that there are other
enterprise SSD vendors that offer reliable parts but not at the price
point intel does for the cheaper drives.  The consumer grade vendors
are simply not trustworthy unless proven otherwise (I had my own
unpleasant experience with OCZ for example).  Intel played the same
game with their early parts but have since become a model of how to
ship drives to the market.

RAID controllers are completely unnecessary for SSDs as they currently
exist.  Software raid is superior in every way; the hardware features
of raid controllers (BBU, write caching, and write consolidation) are
redundant to what the SSDs themselves do (they are basically RAID 0
internally).  A hypothetical SSD-optimized raid controller is possible;
it could do things like balance wear and optimize writes across
multiple physical drives.  This would require deep participation
between the drive and the controller, and FWICT no such thing exists,
excepting super-expensive SANs, which I don't recommend anyway.

merlin




Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Graeme B. Bell
 
 RAID controllers are completely unnecessary for SSD as they currently
 exist.

Agreed. The best solution is not to buy cheap disks and not to buy RAID 
controllers now, imho.

In my own situation, I had a tight budget, high performance demand and a newish 
machine with RAID controller and HDDs in it as a starting point. 
So it was more a question of 'what can you do with a free raid controller and 
not much money' back in 2013. And it has worked very well.
Still, I had hoped for a bit more from the cheaper SSDs though, I'd hoped to 
use fastpath on the controller and bypass the cache. 

The way NVMe prices are going though, I wouldn't do it again if I was doing it 
this year. I'd just go direct to nvme and trash the raid controller. These 
sammy and intel nvmes are basically enterprise hardware at consumer prices. 
Heck, I'll probably put one in my next gaming PC. 

Re: software raid. 

I agree, but once you accept that software raid is now pretty much superior to 
hardware raid, you start looking at ZFS and thinking 'why the heck am I even 
using software raid?'

G





Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Vitalii Tymchyshyn
Hi.

How would a BBU cache help you if the drive lies about fsync? I suppose any RAID
controller removes data from the BBU cache after it has been fsynced by the drive.
As far as I know, there is no other magic command for the drive to tell the
controller that the data is safe now and can be removed from the BBU cache.

On Tue, 7 Jul 2015 at 11:59, Graeme B. Bell graeme.b...@nibio.no wrote:


 Yikes. I would not be able to sleep tonight if it were not for the BBU
 cache in front of these disks...

 diskchecker.pl consistently reported several examples of corruption
 post-power-loss (usually 10 - 30 ) on unprotected M500s/M550s, so I think
 it's pretty much open to debate what types of madness and corruption you'll
 find if you look close enough.

 G


 On 07 Jul 2015, at 16:59, Heikki Linnakangas hlinn...@iki.fi wrote:

 
  So it lies about fsync()... The next question is, does it nevertheless
 enforce the correct ordering of persisting fsync'd data? If you write to
 file A and fsync it, then write to another file B and fsync it too, is it
 guaranteed that if B is persisted, A is as well? Because if it isn't, you
 can end up with filesystem (or database) corruption anyway.
 
  - Heikki






Re: [PERFORM] wildcard text filter switched to boolean column, performance is way worse

2015-07-07 Thread Mike Broers
After bumping up work_mem from 12MB to 25MB that last materialize is indeed
hashing and this cut the query time by about 60%.  Thanks, this was very
helpful and gives me something else to look for when troubleshooting
explains.
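
For anyone finding this thread later, the experiment itself is just a
session-level setting change (a minimal sketch; 25MB is simply the value that
worked here):

   SET work_mem = '25MB';          -- per-session, postgresql.conf untouched
   EXPLAIN (ANALYZE, BUFFERS)
   SELECT ... ;                    -- the boolean-column query from upthread
   RESET work_mem;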



On Tue, Jul 7, 2015 at 11:10 AM, Mike Broers mbro...@gmail.com wrote:

 Thanks, very informative! I'll experiment with work_mem settings and
 report back.

 On Tue, Jul 7, 2015 at 11:02 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Mike Broers mbro...@gmail.com writes:
  I had a query that was filtering with a wildcard search of a text field
 for
  %SUCCESS%. The query took about 5 seconds and was running often so I
 wanted
  to improve it.  I suggested that the engineers include a new boolean
 column
  for successful status.  They implemented the requested field, but the
 query
  that filters on that new column runs very long (i kill it after letting
 it
  run for about an hour).  Can someone help me understand why that is the
  case and how to resolve it?

 It's hashing the subplan output in the first case and not the second:

  Seq Scan on lead  (cost=130951.81..158059.21 rows=139957 width=369)
 (actual
  time=4699.619..4699.869 rows=1 loops=1)
  Filter: ((NOT (hashed SubPlan 1)) AND ((ReferenceNumber)::text <>
  ''::text) AND ((NOT (hashed SubPlan 2)) OR (NOT (hashed SubPlan 3))))

 vs

Seq Scan on lead  (cost=85775.78..9005687281.12 rows=139957
 width=369)
  Filter: ((NOT (hashed SubPlan 1)) AND ((ReferenceNumber)::text <>
  ''::text) AND ((NOT (hashed SubPlan 2)) OR (NOT (SubPlan 3))))
                                               ^^^^^^^^^^^

 Presumably, the new more-accurate rows count causes the planner to realize
 that the hash table will exceed work_mem so it doesn't choose to hash ...
 but for your situation, you'd rather it did, because what you're getting
 instead is a Materialize node that spills to disk (again, because the data
 involved exceeds work_mem) and that's a killer for this query.  You should
 be able to get back the old behavior if you raise work_mem enough.

 Another idea you might think about is changing the OR'd IN conditions
 to a single IN over a UNION ALL of the subselects.  I'm not really sure if
 that would produce a better plan, but it's worth trying if it wouldn't
 require too much app-side contortion.

 regards, tom lane





Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Graeme B. Bell

The comment on HDDs is true and gave me another thought. 

These new 'shingled' HDDs (the 8TB ones) rely on rewriting all the data on 
tracks that overlap your data, any time you change the data. Result: disks 
8-20x slower during writes, after they fill up. 

Do they have power loss protection for the data being rewritten during 
reshingling? You could have data committed at position X and you accidentally 
nuke data at position Y.

[I know that using a shingled disk sounds crazy (it sounds crazy to me) but you 
can bet there are people that just want to max out the disk bays in their 
server... ]

Graeme. 

On 07 Jul 2015, at 19:28, Michael Nolan htf...@gmail.com wrote:

 
 
 On Tue, Jul 7, 2015 at 10:59 AM, Heikki Linnakangas hlinn...@iki.fi wrote:
 On 07/07/2015 05:15 PM, Wes Vaske (wvaske) wrote:
 The M500/M550/M600 are consumer class drives that don't have power
 protection for all inflight data.* (like the Samsung 8x0 series and
 the Intel 3x0 & 5x0 series).
 
 The M500DC has full power protection for inflight data and is an
  enterprise-class drive (like the Samsung 845DC or Intel S3500 & S3700 
 series).
 
 So any drive without the capacitors to protect inflight data will
 suffer from data loss if you're using disk write cache and you pull
 the power.
 
 Wow, I would be pretty angry if I installed a SSD in my desktop, and it loses 
 a file that I saved just before pulling the power plug.
 
 That can (and does) happen with spinning disks, too.
  
 
 *Big addendum: There are two issues on powerloss that will mess with
 Postgres. Data Loss and Data Corruption. The micron consumer drives
 will have power loss protection against Data Corruption and the
 enterprise drive will have power loss protection against BOTH.
 
 https://www.micron.com/~/media/documents/products/white-paper/wp_ssd_power_loss_protection.pdf
 
  The Data Corruption problem is only an issue in non-SLC NAND but
 it's industry wide. And even though some drives will protect against
 that, the protection of inflight data that's been fsync'd is more
 important and should disqualify *any* consumer drives from *any*
 company from consideration for use with Postgres.
 
 So it lies about fsync()... The next question is, does it nevertheless 
 enforce the correct ordering of persisting fsync'd data? If you write to file 
 A and fsync it, then write to another file B and fsync it too, is it 
 guaranteed that if B is persisted, A is as well? Because if it isn't, you can 
 end up with filesystem (or database) corruption anyway.
 
 - Heikki
 
 
 The sad fact is that MANY drives (ssd as well as spinning) lie about their 
 fsync status.
 --
 Mike Nolan   
 





Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Graeme B. Bell

Yikes. I would not be able to sleep tonight if it were not for the BBU cache in 
front of these disks... 

diskchecker.pl consistently reported several examples of corruption 
post-power-loss (usually 10 - 30 ) on unprotected M500s/M550s, so I think it's 
pretty much open to debate what types of madness and corruption you'll find if 
you look close enough.

G


On 07 Jul 2015, at 16:59, Heikki Linnakangas hlinn...@iki.fi wrote:

 
 So it lies about fsync()... The next question is, does it nevertheless 
 enforce the correct ordering of persisting fsync'd data? If you write to file 
 A and fsync it, then write to another file B and fsync it too, is it 
 guaranteed that if B is persisted, A is as well? Because if it isn't, you can 
 end up with filesystem (or database) corruption anyway.
 
 - Heikki





Re: [PERFORM] Sudden connection and load average spikes with postgresql 9.3

2015-07-07 Thread Scott Marlowe
Note that if you still have the settings you showed in your original
post you're just moving the goal posts a few feet further back. Any
heavy load can still trigger this kind of behaviour.

On Tue, Jul 7, 2015 at 5:29 AM, eudald_v reaven.galaeind...@gmail.com wrote:
 Hello guys!

 I finally got rid of it.
 It looks like, in the end, it was all due to the transparent_hugepage setting.

 I disabled it and the CPU spikes disappeared. I am sorry, because it's something
 I usually disable on postgresql servers, but I forgot to do so on this one
 and never thought about it.

 Thanks a lot for all your helpful messages!

 Eudald








-- 
To understand recursion, one must first understand recursion.




Re: [PERFORM] wildcard text filter switched to boolean column, performance is way worse

2015-07-07 Thread Mike Broers
Thanks, very informative! I'll experiment with work_mem settings and report
back.

On Tue, Jul 7, 2015 at 11:02 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Mike Broers mbro...@gmail.com writes:
  I had a query that was filtering with a wildcard search of a text field
 for
  %SUCCESS%. The query took about 5 seconds and was running often so I
 wanted
  to improve it.  I suggested that the engineers include a new boolean
 column
  for successful status.  They implemented the requested field, but the
 query
  that filters on that new column runs very long (i kill it after letting
 it
  run for about an hour).  Can someone help me understand why that is the
  case and how to resolve it?

 It's hashing the subplan output in the first case and not the second:

  Seq Scan on lead  (cost=130951.81..158059.21 rows=139957 width=369)
 (actual
  time=4699.619..4699.869 rows=1 loops=1)
  Filter: ((NOT (hashed SubPlan 1)) AND ((ReferenceNumber)::text <>
  ''::text) AND ((NOT (hashed SubPlan 2)) OR (NOT (hashed SubPlan 3))))

 vs

Seq Scan on lead  (cost=85775.78..9005687281.12 rows=139957 width=369)
  Filter: ((NOT (hashed SubPlan 1)) AND ((ReferenceNumber)::text <>
  ''::text) AND ((NOT (hashed SubPlan 2)) OR (NOT (SubPlan 3))))
                                               ^^^^^^^^^^^

 Presumably, the new more-accurate rows count causes the planner to realize
 that the hash table will exceed work_mem so it doesn't choose to hash ...
 but for your situation, you'd rather it did, because what you're getting
 instead is a Materialize node that spills to disk (again, because the data
 involved exceeds work_mem) and that's a killer for this query.  You should
 be able to get back the old behavior if you raise work_mem enough.

 Another idea you might think about is changing the OR'd IN conditions
 to a single IN over a UNION ALL of the subselects.  I'm not really sure if
 that would produce a better plan, but it's worth trying if it wouldn't
 require too much app-side contortion.

 regards, tom lane



Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Wei Shan
Hi Graeme,

Why would you think that you don't need RAID for ZFS?

Reason I'm asking if because we are moving to ZFS on FreeBSD for our future
projects.

Regards,
Wei Shan

On 8 July 2015 at 00:46, Graeme B. Bell graeme.b...@nibio.no wrote:

 
  RAID controllers are completely unnecessary for SSD as they currently
  exist.

 Agreed. The best solution is not to buy cheap disks and not to buy RAID
 controllers now, imho.

 In my own situation, I had a tight budget, high performance demand and a
 newish machine with RAID controller and HDDs in it as a starting point.
 So it was more a question of 'what can you do with a free raid controller
 and not much money' back in 2013. And it has worked very well.
 Still, I had hoped for a bit more from the cheaper SSDs though, I'd hoped
 to use fastpath on the controller and bypass the cache.

 The way NVMe prices are going though, I wouldn't do it again if I was
 doing it this year. I'd just go direct to nvme and trash the raid
 controller. These sammy and intel nvmes are basically enterprise hardware
 at consumer prices. Heck, I'll probably put one in my next gaming PC.

 Re: software raid.

 I agree, but once you accept that software raid is now pretty much
 superior to hardware raid, you start looking at ZFS and thinking 'why the
 heck am I even using software raid?'

 G







-- 
Regards,
Ang Wei Shan


Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Scott Marlowe
On Tue, Jul 7, 2015 at 11:43 AM, Graeme B. Bell graeme.b...@nibio.no wrote:

 The comment on HDDs is true and gave me another thought.

 These new 'shingled' HDDs (the 8TB ones) rely on rewriting all the data on 
 tracks that overlap your data, any time you change the data. Result: disks 
 8-20x slower during writes, after they fill up.

 Do they have power loss protection for the data being rewritten during 
 reshingling? You could have data commited at position X and you accidentally 
 nuke data at position Y.

 [I know that using a shingled disk sounds crazy (it sounds crazy to me) but 
 you can bet there are people that just want to max out the disk bays in their 
 server... ]

Let's just say no online backup companies are using those disks. :)
Biggest current production spinners being used I know of are 4TB,
non-shingled.




Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Graeme B. Bell

On 07 Jul 2015, at 19:47, Scott Marlowe scott.marl...@gmail.com wrote:

 [I know that using a shingled disk sounds crazy (it sounds crazy to me) but 
 you can bet there are people that just want to max out the disk bays in 
 their server... ]
 
 Let's just say no online backup companies are using those disks. :)

I'm not so sure. Literally the most famous online backup company is (or was 
planning to):  
https://www.backblaze.com/blog/6-tb-hard-drive-face-off/
But I think that a massive read-only archive really is the only use for these 
things. I hope they go out of fashion, soon. 

But I was thinking more of the 'small company postgres server' or 'charitable 
organisation postgres server'.
Someone is going to make this mistake, you can bet. 
Probably not someone on THIS list, of course... 

 Biggest current production spinners being used I know of are 4TB,
 non-shingled.

I think we may have some 6TB WD reds around here. I'll need to look around.

G





Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Merlin Moncure
On Tue, Jul 7, 2015 at 11:46 AM, Graeme B. Bell graeme.b...@nibio.no wrote:

 RAID controllers are completely unnecessary for SSD as they currently
 exist.

 Agreed. The best solution is not to buy cheap disks and not to buy RAID 
 controllers now, imho.

 In my own situation, I had a tight budget, high performance demand and a 
 newish machine with RAID controller and HDDs in it as a starting point.
 So it was more a question of 'what can you do with a free raid controller and 
 not much money' back in 2013. And it has worked very well.
 Still, I had hoped for a bit more from the cheaper SSDs though, I'd hoped to 
 use fastpath on the controller and bypass the cache.

 The way NVMe prices are going though, I wouldn't do it again if I was doing 
 it this year. I'd just go direct to nvme and trash the raid controller. These 
 sammy and intel nvmes are basically enterprise hardware at consumer prices. 
 Heck, I'll probably put one in my next gaming PC.

 Re: software raid.

 I agree, but once you accept that software raid is now pretty much superior 
 to hardware raid, you start looking at ZFS and thinking 'why the heck am I 
 even using software raid?'

Good point. At least for me, I've yet to jump on the ZFS bandwagon and
so don't have an opinion on it.

merlin




Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Heikki Linnakangas

On 07/07/2015 10:59 PM, Graeme B. Bell wrote:

Cache flushing isn't an atomic operation though. Even if the ordering
is right, you are likely to have a partial fsync on the disk when the
lights go out - isn't your FS still corrupt?


If the filesystem is worth its salt, no. Journaling filesystems for 
example rely on the journal to work around that problem, and there are 
other mechanisms.


PostgreSQL has exactly the same problem and uses the WAL to solve it.

- Heikki





Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Graeme B. Bell

Cache flushing isn't an atomic operation though. Even if the ordering is right, 
you are likely to have a partial fsync on the disk when the lights go out - 
isn't your FS still corrupt?

On 07 Jul 2015, at 21:53, Heikki Linnakangas hlinn...@iki.fi wrote:

 On 07/07/2015 09:01 PM, Wes Vaske (wvaske) wrote:
 
 Right, to be precise, the problem isn't "the drive lies about fsync()". It lies 
 about FLUSH CACHE instead. Search & replace fsync() with FLUSH CACHE, and the 
 same question remains: When the drive breaks its promise wrt. FLUSH CACHE, 
 does it nevertheless guarantee that the order the data is eventually flushed 
 to disk is consistent with the order in which the data and FLUSH CACHE were 
 sent to the drive? That's an important distinction, because it makes the 
 difference between "the most recent data the application saved might be lost 
 even though the FLUSH CACHE command returned" and "your filesystem is 
 corrupt".
 





Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Wes Vaske (wvaske)
Regarding:
“lie about their fsync status.”

This is mostly semantics but it might help google searches on the issue.

A drive doesn’t support fsync(), that’s a filesystem/kernel process. A drive 
will do a FLUSH CACHE. Before kernels 2.6.<low numbers> the fsync() call 
wouldn’t send any ATA or SCSI command to flush the disk cache. 
Whereas—AFAICT—modern kernels and file system versions *will* do this. When 
‘sync’ is called the filesystem will issue the appropriate command to the disk 
to flush the write cache.

For ATA, this is “FLUSH CACHE” (E7h). To check support for the command use:
[root@postgres ~]# smartctl --identify /dev/sdu | grep "FLUSH CACHE"
  83 13  1   FLUSH CACHE EXT supported
  83 12  1   FLUSH CACHE supported
  86 13  1   FLUSH CACHE EXT supported
  86 12  1   FLUSH CACHE supported

The 1s in the 3rd column represent SUPPORTED for the feature listed in the last 
column.

Cheers,
Wes Vaske

From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Michael Nolan
Sent: Tuesday, July 07, 2015 12:28 PM
To: hlinn...@iki.fi
Cc: Wes Vaske (wvaske); Graeme B. Bell; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] New server: SSD/RAID recommendations?



On Tue, Jul 7, 2015 at 10:59 AM, Heikki Linnakangas 
hlinn...@iki.fi wrote:
On 07/07/2015 05:15 PM, Wes Vaske (wvaske) wrote:
The M500/M550/M600 are consumer class drives that don't have power
protection for all inflight data.* (like the Samsung 8x0 series and
the Intel 3x0 & 5x0 series).

The M500DC has full power protection for inflight data and is an
enterprise-class drive (like the Samsung 845DC or Intel S3500 & S3700
series).

So any drive without the capacitors to protect inflight data will
suffer from data loss if you're using disk write cache and you pull
the power.

Wow, I would be pretty angry if I installed a SSD in my desktop, and it loses a 
file that I saved just before pulling the power plug.

That can (and does) happen with spinning disks, too.


*Big addendum: There are two issues on powerloss that will mess with
Postgres. Data Loss and Data Corruption. The micron consumer drives
will have power loss protection against Data Corruption and the
enterprise drive will have power loss protection against BOTH.

https://www.micron.com/~/media/documents/products/white-paper/wp_ssd_power_loss_protection.pdf

 The Data Corruption problem is only an issue in non-SLC NAND but
it's industry wide. And even though some drives will protect against
that, the protection of inflight data that's been fsync'd is more
important and should disqualify *any* consumer drives from *any*
company from consideration for use with Postgres.

So it lies about fsync()... The next question is, does it nevertheless enforce 
the correct ordering of persisting fsync'd data? If you write to file A and 
fsync it, then write to another file B and fsync it too, is it guaranteed that 
if B is persisted, A is as well? Because if it isn't, you can end up with 
filesystem (or database) corruption anyway.

- Heikki


The sad fact is that MANY drives (ssd as well as spinning) lie about their 
fsync status.
--
Mike Nolan



Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Heikki Linnakangas

On 07/07/2015 09:01 PM, Wes Vaske (wvaske) wrote:

Regarding:
“lie about their fsync status.”

This is mostly semantics but it might help google searches on the issue.

A drive doesn’t support fsync(), that’s a filesystem/kernel process. A drive will do 
a FLUSH CACHE. Before kernels 2.6.<low numbers> the fsync() call wouldn’t send 
any ATA or SCSI command to flush the disk cache. Whereas—AFAICT—modern kernels and 
file system versions *will* do this. When ‘sync’ is called the filesystem will issue 
the appropriate command to the disk to flush the write cache.

For ATA, this is “FLUSH CACHE” (E7h). To check support for the command use:
[root@postgres ~]# smartctl --identify /dev/sdu | grep "FLUSH CACHE"
   83 13  1   FLUSH CACHE EXT supported
   83 12  1   FLUSH CACHE supported
   86 13  1   FLUSH CACHE EXT supported
   86 12  1   FLUSH CACHE supported

The 1s in the 3rd column represent SUPPORTED for the feature listed in the last 
column.


Right, to be precise, the problem isn't "the drive lies about fsync()". It 
lies about FLUSH CACHE instead. Search & replace fsync() with FLUSH 
CACHE, and the same question remains: When the drive breaks its promise 
wrt. FLUSH CACHE, does it nevertheless guarantee that the order the data 
is eventually flushed to disk is consistent with the order in which the 
data and FLUSH CACHE were sent to the drive? That's an important 
distinction, because it makes the difference between "the most recent 
data the application saved might be lost even though the FLUSH CACHE 
command returned" and "your filesystem is corrupt".


- Heikki





Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Graeme B. Bell

That is a very good question, which I have raised elsewhere on the postgresql 
lists previously.

In practice: I have *never* managed to make diskchecker fail with the BBU 
enabled in front of the drives, and I spent days trying with plug-pulls, until I 
reached the point where, as a statistical event, a failure just can't be that 
likely at all. That's not to say it can't ever happen, just that I've taken all 
reasonable measures that I can to find out on the time and money budget I had 
available. 

In theory: It may be that the BBU makes the drives run at about half speed, 
so that the capacitors go a good bit further towards emptying the cache; after 
all, without the BBU in the way, the drive manages to save everything but the 
last fragment of writes. But I also suspect that the controller itself may be 
replaying the last set of writes from around the time of power loss. 

Anyway I'm 50/50 on those two explanations. Any other thoughts welcome. 

This raises another interesting question. Does anyone here have a document 
explaining how their BBU cache works EXACTLY (at cache / SATA level) on their 
server? Because I haven't been able to find any for mine (Dell PERC 
H710/H710P). Can anyone tell me with godlike authority and precision, what 
exactly happens inside that BBU post-power-failure?

There is rather too much magic involved for me to be happy.

G

On 07 Jul 2015, at 18:27, Vitalii Tymchyshyn v...@tym.im wrote:

 Hi.
 
 How would BBU cache help you if it lies about fsync? I suppose any RAID 
 controller removes data from BBU cache after it was fsynced by the drive. As 
 I know, there is no other magic command for drive to tell controller that 
 the data is safe now and can be removed from BBU cache.
 
  On Tue, 7 Jul 2015 at 11:59, Graeme B. Bell graeme.b...@nibio.no wrote:
 
 Yikes. I would not be able to sleep tonight if it were not for the BBU cache 
 in front of these disks...
 
 diskchecker.pl consistently reported several examples of corruption 
 post-power-loss (usually 10 - 30 ) on unprotected M500s/M550s, so I think 
 it's pretty much open to debate what types of madness and corruption you'll 
 find if you look close enough.
 
 G
 
 
 On 07 Jul 2015, at 16:59, Heikki Linnakangas hlinn...@iki.fi wrote:
 
 
  So it lies about fsync()... The next question is, does it nevertheless 
  enforce the correct ordering of persisting fsync'd data? If you write to 
  file A and fsync it, then write to another file B and fsync it too, is it 
  guaranteed that if B is persisted, A is as well? Because if it isn't, you 
  can end up with filesystem (or database) corruption anyway.
 
  - Heikki
 
 
 




Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Graeme B. Bell
 
  This raises another interesting question. Does anyone here have a document 
 explaining how their BBU cache works EXACTLY (at cache / sata level) on their 
 server? Because I haven't been able to find any for mine (Dell PERC 
 H710/H710P). Can anyone tell me with godlike authority and precision, what 
 exactly happens inside that BBU post-power failure?


(And if you have that manual - how can you know it's accurate? That the 
implementation matches the manual and is free of bugs? Because my M500s didn't 
match the packaging, and neither did an H710 we bought - Dell had advertised 
features in some marketing material that were only present on the H710P.)

And I see UBER (unrecoverable bit error) rates for SSDs and HDDs, but has 
anyone ever seen them for the flash-based cache on their raid controller?

Sleep well, friends.

Graeme. 

On 07 Jul 2015, at 18:54, Graeme B. Bell graeme.b...@nibio.no wrote:

 
 That is a very good question, which I have raised elsewhere on the postgresql 
 lists previously.
 
 In practice: I have *never* managed to make diskchecker fail with the BBU 
 enabled in front of the drives and I spent days trying with plug pulls till I 
 reached the point where as a statistical event it just can't be that likely 
 at all. That's not to say it can't ever happen, just that I've taken all 
 reasonable measures that I can to find out on the time and money budget I had 
 available. 
 
 In theory: It may be the fact the BBU makes the drives run at about half 
 speed, so that the capacitors go a good bit further to empty the cache, after 
 all: without the BBU in the way, the drive manages to save everything but the 
 last fragment of writes. But I also suspect that the controller itself maybe 
 replaying the last set of writes from around the time of power loss. 
 
 Anyway I'm 50/50 on those two explanations. Any other thoughts welcome. 
 
  This raises another interesting question. Does anyone here have a document 
 explaining how their BBU cache works EXACTLY (at cache / sata level) on their 
 server? Because I haven't been able to find any for mine (Dell PERC 
 H710/H710P). Can anyone tell me with godlike authority and precision, what 
 exactly happens inside that BBU post-power failure?
 
 There is rather too much magic involved for me to be happy.
 
 G
 
 On 07 Jul 2015, at 18:27, Vitalii Tymchyshyn v...@tym.im wrote:
 
 Hi.
 
 How would BBU cache help you if it lies about fsync? I suppose any RAID 
 controller removes data from BBU cache after it was fsynced by the drive. As 
 I know, there is no other magic command for drive to tell controller that 
 the data is safe now and can be removed from BBU cache.
 
  On Tue, 7 Jul 2015 at 11:59, Graeme B. Bell graeme.b...@nibio.no wrote:
 
 Yikes. I would not be able to sleep tonight if it were not for the BBU cache 
 in front of these disks...
 
 diskchecker.pl consistently reported several examples of corruption 
 post-power-loss (usually 10 - 30 ) on unprotected M500s/M550s, so I think 
 it's pretty much open to debate what types of madness and corruption you'll 
 find if you look close enough.
 
 G
 
 
 On 07 Jul 2015, at 16:59, Heikki Linnakangas hlinn...@iki.fi wrote:
 
 
 So it lies about fsync()... The next question is, does it nevertheless 
 enforce the correct ordering of persisting fsync'd data? If you write to 
 file A and fsync it, then write to another file B and fsync it too, is it 
 guaranteed that if B is persisted, A is as well? Because if it isn't, you 
 can end up with filesystem (or database) corruption anyway.
 
 - Heikki
 
 
 
 




Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Michael Nolan
On Tue, Jul 7, 2015 at 10:59 AM, Heikki Linnakangas hlinn...@iki.fi wrote:

 On 07/07/2015 05:15 PM, Wes Vaske (wvaske) wrote:

 The M500/M550/M600 are consumer class drives that don't have power
 protection for all inflight data.* (like the Samsung 8x0 series and
  the Intel 3x0 & 5x0 series).

 The M500DC has full power protection for inflight data and is an
  enterprise-class drive (like the Samsung 845DC or Intel S3500 & S3700
 series).

 So any drive without the capacitors to protect inflight data will
 suffer from data loss if you're using disk write cache and you pull
 the power.


 Wow, I would be pretty angry if I installed a SSD in my desktop, and it
 loses a file that I saved just before pulling the power plug.


That can (and does) happen with spinning disks, too.



  *Big addendum: There are two issues on powerloss that will mess with
 Postgres. Data Loss and Data Corruption. The micron consumer drives
 will have power loss protection against Data Corruption and the
 enterprise drive will have power loss protection against BOTH.


 https://www.micron.com/~/media/documents/products/white-paper/wp_ssd_power_loss_protection.pdf

  The Data Corruption problem is only an issue in non-SLC NAND but
 it's industry wide. And even though some drives will protect against
 that, the protection of inflight data that's been fsync'd is more
 important and should disqualify *any* consumer drives from *any*
 company from consideration for use with Postgres.


 So it lies about fsync()... The next question is, does it nevertheless
 enforce the correct ordering of persisting fsync'd data? If you write to
 file A and fsync it, then write to another file B and fsync it too, is it
 guaranteed that if B is persisted, A is as well? Because if it isn't, you
 can end up with filesystem (or database) corruption anyway.

 - Heikki


The sad fact is that MANY drives (ssd as well as spinning) lie about their
fsync status.
--
Mike Nolan


Re: [PERFORM] Hmmm... why does pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-07 Thread Graeme B. Bell

Hi Merlin, 

Long story short - thanks for the reply, but you're not measuring anything 
about the parallelism of code running in a pl/pgsql environment here. You're 
just measuring whether postgres can parallelise entering that environment and 
get back out. Don't get me wrong - it's great that this scales well because it 
affects situations where you have lots of calls to trivial functions. 
However it's not the problem I'm talking about. I mean 'real' pl/pgsql 
functions, e.g. things that you might find in PostGIS or similar. 

If you re-read my previous email or look at par_psql  (http://parpsql.com) and 
look at the benchmarks there you'll maybe see more about what I'm talking about.

To clear up the issue I built a little test harness around your comment below, 
in case anyone was wondering whether it's par_psql itself that causes bad scaling 
in postgres.
The answer is clearly no. :-)

What I found this evening is that there are several problems here. I did some 
testing here using a machine with 16 physical cores and lots of memory/IO. 

- Using a table as a source of input rather than a fixed parameter e.g. 'select 
col1... ' vs. 'select 3'. Please note I am not talking about poor performance, 
I am talking about poor scaling of performance to multicore. There should be no 
reason for this when read-locks are being taken on the table, and no reason for 
this when it is combined with e.g. a bunch of pl/pgsql work in a function. 
However the impact of this problem is only seen above 8 cores where performance 
crashes. 

- Using pl/pgsql itself intensively (e.g. anything non-trivial) causes 
horrifically bad scaling above 2 cores on the systems I've tested, and 
performance crashes very hard soon after. This matches what I've seen elsewhere 
in big projects and in par_psql's tests (a sketch of the kind of function I 
mean follows below). 
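
A minimal sketch of such a function (an illustrative, hypothetical example, not 
the actual par_psql test code):

   CREATE OR REPLACE FUNCTION busy_work(n integer) RETURNS double precision AS $$
   DECLARE
     acc double precision := 0;
     i   integer;
   BEGIN
     -- burn CPU inside the pl/pgsql interpreter itself
     FOR i IN 1..n LOOP
       acc := acc + sqrt(i::double precision);
     END LOOP;
     RETURN acc;
   END;
   $$ LANGUAGE plpgsql;

   -- run SELECT busy_work(1000000); from 1, 2, 4, 8, 16 parallel connections
   -- (e.g. via pgbench -c N) and compare total throughput.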

Of course, it could be some wacky postgresql.conf setting (I doubt it here), so 
I'd be glad if others could give it a try. If you're bored, set the time to 5s 
and run it; from testing I can tell you it shouldn't alter the results. 

The repo will be up in around 30 minutes time on http://github.com/gbb/t, 
and I'm going to submit it as a bug to the pg bugs list. 

Graeme. 


On 06 Jul 2015, at 18:40, Merlin Moncure mmonc...@gmail.com wrote:

 On Fri, Jul 3, 2015 at 9:48 AM, Graeme B. Bell graeme.b...@nibio.no wrote:
 Hi everyone,
 
 I've written a new open source tool for easily parallelising SQL scripts in 
 postgres.   [obligatory plug:   https://github.com/gbb/par_psql   ]
 
 Using it, I'm seeing a problem that I've also seen in other postgres 
 projects involving high degrees of parallelisation in the last 12 months.
 
 Basically:
 
 - I have machines here with up to 16 CPU cores and 128GB memory, very fast 
 SSDs and controller etc, carefully configured kernel/postgresql.conf for 
 high performance.
 
 - Ordinary queries parallelise nearly perfectly (e.g. SELECT some_stuff 
 ...), e.g. almost up to 16x performance improvement.
 
 - Non-DB stuff like GDAL, python etc. parallelise nearly perfectly.
 
 - HOWEVER calls to CPU-intensive user-defined pl/pgsql functions (e.g. 
 SELECT myfunction(some_stuff)) do not parallelise well, even when they are 
 independently defined functions, or accessing tables in a read-only way. 
 They hit a limit of 2.5x performance improvement relative to single-CPU 
 performance (pg9.4) and merely 2x performance (pg9.3) regardless of how many 
 CPU cores I throw at them. This is about 6 times slower than I'm expecting.
 
 I can't see what would be locking. It seems like it's the pl/pgsql 
 environment itself that is somehow locking or incurring some huge frictional 
 costs. Whether I use independently defined functions, independent source 
 tables, independent output tables, makes no difference whatsoever, so it 
 doesn't feel 'lock-related'. It also doesn't seem to be WAL/synchronisation 
 related, as the machines I'm using can hit absurdly high pgbench rates, and 
 I'm using unlogged tables for output.
 
 Take a quick peek here: 
 https://github.com/gbb/par_psql/blob/master/BENCHMARKS.md
 
 I'm wondering what I'm missing here. Any ideas?
 
 I'm not necessarily seeing your results.   via pgbench,
 
 mmoncure@mernix2 11:34 AM ~$ ~/pgdev/bin/pgbench -n -T 60 -f b.sql
 transaction type: Custom query
 scaling factor: 1
 query mode: simple
 number of clients: 1
 number of threads: 1
 duration: 60 s
 number of transactions actually processed: 658833
 latency average: 0.091 ms
 tps = 10980.538470 (including connections establishing)
 tps = 10980.994547 (excluding connections establishing)
 mmoncure@mernix2 11:35 AM ~$ ~/pgdev/bin/pgbench -n -T 60 -c4 -j4 -f b.sql
 transaction type: Custom query
 scaling factor: 1
 query mode: simple
 number of clients: 4
 number of threads: 4
 duration: 60 s
 number of transactions actually processed: 2847631
 latency average: 0.084 ms
 tps = 47460.430447 (including connections establishing)
 tps = 47463.702074 (excluding connections establishing)
 
 b.sql:
 select 

Re: [PERFORM] Hmmm... why does pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-07 Thread Merlin Moncure
On Tue, Jul 7, 2015 at 3:33 PM, Graeme B. Bell graeme.b...@nibio.no wrote:

 Hi Merlin,

 Long story short - thanks for the reply, but you're not measuring anything 
 about the parallelism of code running in a pl/pgsql environment here. You're 
 just measuring whether postgres can parallelise entering that environment and 
 get back out. Don't get me wrong - it's great that this scales well because 
 it affects situations where you have lots of calls to trivial functions.
 However it's not the problem I'm talking about. I mean 'real' pl'pgsql 
 functions. e.g. things that you might find in postgis or similar.

Maybe so.  But it will be a lot easier for me (and others on this
list) if you submit a self-contained test case that runs via pgbench.
From there it's a simple matter of running perf top and other standard
locking diagnostics, and it also rules out any suspicion of 3rd-party
issues.  This will also get better feedback on -bugs.
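
(By that I mean something like the following while the pgbench run is going - 
the pid here is made up, take a real backend pid from pg_stat_activity:

perf top              # system-wide view of where the CPU time goes
perf top -p 12345     # or focus on a single postgres backend

Heavy lock contention tends to show up near the top of that output.)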

merlin




Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Graeme B. Bell

1. Does the sammy nvme have *complete* power loss protection though, for all 
fsync'd data?
I have been very badly burned by my experience with Crucial SSDs and their 'power 
loss protection', which doesn't actually ensure all fsync'd data gets into flash.
It certainly looks pretty with all those capacitors on top in the photos, but 
we need some plug pull tests to be sure. 

2. Apologies for the typo in the previous post, raidz5 should have been raidz1. 

3. Also, something to think about when you start having single disk solutions 
(or non-ZFS raid, for that matter).

SSDs are so unlike HDDs. 

The samsung nvme has a UBER (uncorrectable bit error rate) measured at 1 in 
10^17. That's one bit gone bad in 12500 TB, a good number.  Chances are the 
drive fails before you hit a bit error, and if not, ZFS would catch it.

Whereas current HDDs are at the 1 in 10^14 level. That means an error roughly every 
12 TB read, by the specs. That means, every time you fill your cheap 6-8TB seagate 
drive, it has likely corrupted some of your data *even if it performed according to 
the spec*. (That's also why RAID5 isn't viable for rebuilding large arrays, 
incidentally.)
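
(Back-of-envelope arithmetic, if anyone wants to check those numbers - bits to 
bytes to decimal TB:

echo '10^17 / 8 / 10^12' | bc            # 1-in-10^17 UBER: ~12500 TB read per expected bad bit
echo 'scale=1; 10^14 / 8 / 10^12' | bc   # 1-in-10^14 UBER: ~12.5 TB read per expected bad bit
)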

Graeme Bell


On 07 Jul 2015, at 12:56, Mkrtchyan, Tigran tigran.mkrtch...@desy.de wrote:

 
 
 - Original Message -
 From: Graeme B. Bell graeme.b...@nibio.no
 To: Mkrtchyan, Tigran tigran.mkrtch...@desy.de
 Cc: Graeme B. Bell graeme.b...@nibio.no, Steve Crawford 
 scrawf...@pinpointresearch.com, Wes Vaske (wvaske)
 wva...@micron.com, pgsql-performance pgsql-performance@postgresql.org
 Sent: Tuesday, July 7, 2015 12:38:10 PM
 Subject: Re: [PERFORM] New server: SSD/RAID recommendations?
 
 I am unsure about the performance side but, ZFS is generally very attractive 
 to
 me.
 
 Key advantages:
 
 1) Checksumming and automatic fixing-of-broken-things on every file (not just
 postgres pages, but your scripts, O/S, program files).
 2) Built-in  lightweight compression (doesn't help with TOAST tables, in fact
 may slow them down, but helpful for other things). This may actually be a net
 negative for pg so maybe turn it off.
 3) ZRAID mirroring or ZRAID5/6. If you have trouble persuading someone that 
 it's
 safe to replace a RAID array with a single drive... you can use a couple of
 NVMe SSDs with ZFS mirror or zraid, and  get the same availability you'd get
 from a RAID controller. Slightly better, arguably, since they claim to have
 fixed the raid write-hole problem.
 4) filesystem snapshotting
 
 Despite the costs of checksumming etc., I suspect ZRAID running on a fast CPU
 with multiple NVMe drives will outperform quite a lot of the alternatives, 
 with
 great data integrity guarantees.
 
 
 We are planing to have a test setup as well. For now I have single NVMe SSD 
 on my
 test system:
 
 # lspci | grep NVM
 85:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD 
 Controller 171X (rev 03)
 
 # mount | grep nvm
 /dev/nvme0n1p1 on /var/lib/pgsql/9.5 type ext4 
 (rw,noatime,nodiratime,data=ordered)
 
 
 and quite happy with it. We have write heavy workload on it to see when it 
 will
 break. Postgres Performs very well. About x2.5 faster than with regular disks
 with a single client and almost linear with multiple clients (picture 
 attached.
 On Y number of high level op/s our application does, X number of clients). The
 setup is used last 3 months. Looks promising but for production we need to
 to have disk size twice as big as on the test system. Until today, I was
 planning to use a RAID10 with a HW controller...
 
 Related to ZFS. We use ZFSonlinux and behaviour is not as good as with 
 solaris.
 Let's re-phrase it: performance is unpredictable. We run READZ2 with 30x3TB 
 disks.
 
 Tigran.
 
 
 Haven't built one yet. Hope to, later this year. Steve, I would love to know
 more about how you're getting on with your NVMe disk in postgres!
 
 Graeme.
 
 On 07 Jul 2015, at 12:28, Mkrtchyan, Tigran tigran.mkrtch...@desy.de wrote:
 
 Thanks for the Info.
 
 So if RAID controllers are not an option, what one should use to build
 big databases? LVM with xfs? BtrFs? Zfs?
 
 Tigran.
 
 - Original Message -
 From: Graeme B. Bell graeme.b...@nibio.no
 To: Steve Crawford scrawf...@pinpointresearch.com
 Cc: Wes Vaske (wvaske) wva...@micron.com, pgsql-performance
 pgsql-performance@postgresql.org
 Sent: Tuesday, July 7, 2015 12:22:00 PM
 Subject: Re: [PERFORM] New server: SSD/RAID recommendations?
 
 Completely agree with Steve.
 
 1. Intel NVMe looks like the best bet if you have modern enough hardware 
 for
 NVMe. Otherwise e.g. S3700 mentioned elsewhere.
 
 2. RAID controllers.
 
 We have e.g. 10-12 of these here and e.g. 25-30 SSDs, among various 
 machines.
 This might give people idea about where the risk lies in the path from 
 disk to
 CPU.
 
 We've had 2 RAID card failures in the last 12 months that nuked the array 
 with
 days of downtime, and 2 problems with batteries suddenly becoming useless 
 or
 suddenly reporting wildly varying 

Re: [PERFORM] Sudden connection and load average spikes with postgresql 9.3

2015-07-07 Thread eudald_v
Hello guys!

I finally got rid of it.
It looks like, in the end, it was all due to the transparent_hugepage setting.

I disabled it and the CPU spikes disappeared. I'm sorry, because it's something
I usually disable on postgresql servers, but I forgot to do so on this one
and never thought about it.
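
For reference, this is roughly what I did (the sysfs paths can vary between
kernels/distros, e.g. older RHEL uses /sys/kernel/mm/redhat_transparent_hugepage):

cat /sys/kernel/mm/transparent_hugepage/enabled    # shows e.g. [always] madvise never
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
# add transparent_hugepage=never to the kernel boot parameters so it survives reboots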

Thanks a lot for all your helpful messages!

Eudald







Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Graeme B. Bell
Hi Karl,

Great post, thanks. 

Though I don't think it's against conventional wisdom to aggregate writes into 
larger blocks rather than rely on 4k performance on ssds :-) 

128kb blocks + compression certainly makes sense. But it might make less sense 
I suppose if you had some incredibly high rate of churn in your rows. 
But for the work we do here, we could use 16MB blocks for all the difference it 
would make. (Tip to others: don't do that. 128kb block performance is already 
enough to max out the IO bus to most ssds.)

Do you have your WAL log on a compressed zfs fs? 

Graeme Bell


On 07 Jul 2015, at 13:28, Karl Denninger k...@denninger.net wrote:

 Lz4 compression and standard 128kb block size has shown to be materially 
 faster here than using 8kb blocks and no compression, both with rotating 
 disks and SSDs.
 
 This is workload dependent in my experience but in the applications we put 
 Postgres to there is a very material improvement in throughput using 
 compression and the larger blocksize, which is counter-intuitive and also 
 opposite the conventional wisdom.
 
 For best throughput we use mirrored vdev sets.





Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Karl Denninger
On 7/7/2015 06:52, Graeme B. Bell wrote:
 Hi Karl,

 Great post, thanks. 

 Though I don't think it's against conventional wisdom to aggregate writes 
 into larger blocks rather than rely on 4k performance on ssds :-) 

 128kb blocks + compression certainly makes sense. But it might make less 
 sense I suppose if you had some incredibly high rate of churn in your rows. 
 But for the work we do here, we could use 16MB blocks for all the difference 
 it would make. (Tip to others: don't do that. 128kb block performance is 
 already enough out the IO bus to most ssds)

 Do you have your WAL log on a compressed zfs fs? 

 Graeme Bell
Yes.

Data goes on one mirrored set of vdevs, pg_xlog goes on a second,
separate pool.  WAL goes on a third pool on RaidZ2.  WAL typically goes
on rotating storage since I use it (and a basebackup) as disaster
recovery (and in hot spare apps the source for the syncing hot standbys)
and that's nearly a big-block-write-only data stream.  Rotating media is
fine for that in most applications.  I take a new basebackup at
reasonable intervals and rotate the WAL logs to keep that from growing
without bound.
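
In rough terms the layout is created like this (a sketch only - the pool,
dataset and device names here are invented, not my actual config):

zpool create data mirror da0 da1 mirror da2 da3        # mirrored vdev sets for the main data
zpool create xlog mirror da4 da5                       # separate pool for pg_xlog
zpool create walarch raidz2 da6 da7 da8 da9 da10 da11  # rotating disks for archived WAL / basebackups
zfs create -o recordsize=128k -o compression=lz4 data/pgdata
zfs create -o compression=lz4 xlog/pg_xlog
zfs set atime=off data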

I use LSI host adapters for the drives themselves (no hardware RAID);
I'm currently running on FreeBSD 10.1.  Be aware that ZFS on FreeBSD has
some fairly nasty issues for which I developed (and publish) a patch;
without it some workloads can result in very undesirable behavior where
the working set gets paged out in favor of the ZFS ARC; if that happens your
performance will go straight into the toilet.

Back before FreeBSD 9 when ZFS was simply not stable enough for me I
used ARECA hardware RAID adapters and rotating media with BBUs and large
cache memory installed on them with UFS filesystems.  Hardware adapters
are, however, a net lose in a ZFS environment even when they nominally
work well (and they frequently interact very badly with ZFS during
certain operations making them just flat-out unsuitable.)  All-in I far
prefer ZFS on a host adapter to UFS on a RAID adapter both from a data
integrity and performance standpoint.

My SSD drives of choice are all Intel; for lower-end requirements the
730s work very well; the S3500 is next and if your write volume is high
enough the S3700 has much greater endurance (but at a correspondingly
higher price.)  All three are properly power-fail protected.  All three
are much, much faster than rotating storage.  If you can saturate the
SATA channels and need still more I/O throughput NVMe drives are the
next quantum up in performance; I'm not there with our application at
the present time.

Incidentally while there are people who have questioned the 730 series
power loss protection I've tested it with plug-pulls and in addition it
watchdogs its internal power loss capacitors -- from the smartctl -a
display of one of them on an in-service machine here:

175 Power_Loss_Cap_Test     0x0033   100   100   010    Pre-fail  Always       -       643 (4 6868)


-- 
Karl Denninger
k...@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/




Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Mkrtchyan, Tigran
Thanks for the Info.

So if RAID controllers are not an option, what should one use to build
big databases? LVM with XFS? Btrfs? ZFS?

Tigran.

- Original Message -
 From: Graeme B. Bell graeme.b...@nibio.no
 To: Steve Crawford scrawf...@pinpointresearch.com
 Cc: Wes Vaske (wvaske) wva...@micron.com, pgsql-performance 
 pgsql-performance@postgresql.org
 Sent: Tuesday, July 7, 2015 12:22:00 PM
 Subject: Re: [PERFORM] New server: SSD/RAID recommendations?

 Completely agree with Steve.
 
 1. Intel NVMe looks like the best bet if you have modern enough hardware for
 NVMe. Otherwise e.g. S3700 mentioned elsewhere.
 
 2. RAID controllers.
 
 We have e.g. 10-12 of these here and e.g. 25-30 SSDs, among various machines.
 This might give people idea about where the risk lies in the path from disk to
 CPU.
 
 We've had 2 RAID card failures in the last 12 months that nuked the array with
 days of downtime, and 2 problems with batteries suddenly becoming useless or
 suddenly reporting wildly varying temperatures/overheating. There may have 
 been
 other RAID problems I don't know about.
 
 Our IT dept were replacing Seagate HDDs last year at a rate of 2-3 per week (I
 guess they have 100-200 disks?). We also have about 25-30 Hitachi/HGST HDDs.
 
 So by my estimates:
 30% annual problem rate with RAID controllers
 30-50% failure rate with Seagate HDDs (backblaze saw similar results)
 0% failure rate with HGST HDDs.
 0% failure in our SSDs.   (to be fair, our one samsung SSD apparently has a 
 bug
 in TRIM under linux, which I'll need to investigate to see if we have been
 affected by).
 
 also, RAID controllers aren't free - not just the money but also the 
 management
 of them (ever tried writing a complex install script that interacts work with
 MegaCLI? It can be done but it's not much fun.). Just take a look at the
 MegaCLI manual and ask yourself... is this even worth it (if you have a good
 MTBF on an enterprise SSD).
 
 RAID was meant to be about ensuring availability of data. I have trouble
 believing that these days
 
 Graeme Bell
 
 
 On 06 Jul 2015, at 18:56, Steve Crawford scrawf...@pinpointresearch.com 
 wrote:
 
 
 2. We don't typically have redundant electronic components in our servers. 
 Sure,
 we have dual power supplies and dual NICs (though generally to handle 
 external
 failures) and ECC-RAM but no hot-backup CPU or redundant RAM banks and...no
 backup RAID card. Intel Enterprise SSD already have power-fail protection so 
 I
 don't need a RAID card to give me BBU. Given the MTBF of good enterprise SSD
 I'm left to wonder if placing a RAID card in front merely adds a new point of
 failure and scheduled-downtime-inducing hands-on maintenance (I'm looking at
 you, RAID backup battery).
 
 
 




Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Graeme B. Bell

I am unsure about the performance side but, ZFS is generally very attractive to 
me. 

Key advantages:

1) Checksumming and automatic fixing-of-broken-things on every file (not just 
postgres pages, but your scripts, O/S, program files). 
2) Built-in  lightweight compression (doesn't help with TOAST tables, in fact 
may slow them down, but helpful for other things). This may actually be a net 
negative for pg so maybe turn it off. 
3) ZRAID mirroring or ZRAID5/6. If you have trouble persuading someone that 
it's safe to replace a RAID array with a single drive... you can use a couple 
of NVMe SSDs with ZFS mirror or zraid, and  get the same availability you'd get 
from a RAID controller. Slightly better, arguably, since they claim to have 
fixed the raid write-hole problem. 
4) filesystem snapshotting

Despite the costs of checksumming etc., I suspect ZRAID running on a fast CPU 
with multiple NVMe drives will outperform quite a lot of the alternatives, with 
great data integrity guarantees. 
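
In ZFS terms none of that is exotic, e.g. (a sketch only - pool and device
names invented):

zpool create pg mirror nvd0 nvd1          # two NVMe drives in a ZFS mirror (point 3)
zfs create -o compression=lz4 pg/data     # lightweight compression (point 2)
zpool scrub pg                            # walk every checksum; bad copies get repaired from the mirror (point 1)
zfs snapshot pg/data@before-migration     # instant filesystem snapshot (point 4)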

Haven't built one yet. Hope to, later this year. Steve, I would love to know 
more about how you're getting on with your NVMe disk in postgres!

Graeme. 

On 07 Jul 2015, at 12:28, Mkrtchyan, Tigran tigran.mkrtch...@desy.de wrote:

 Thanks for the Info.
 
 So if RAID controllers are not an option, what one should use to build
 big databases? LVM with xfs? BtrFs? Zfs?
 
 Tigran.
 
 - Original Message -
 From: Graeme B. Bell graeme.b...@nibio.no
 To: Steve Crawford scrawf...@pinpointresearch.com
 Cc: Wes Vaske (wvaske) wva...@micron.com, pgsql-performance 
 pgsql-performance@postgresql.org
 Sent: Tuesday, July 7, 2015 12:22:00 PM
 Subject: Re: [PERFORM] New server: SSD/RAID recommendations?
 
 Completely agree with Steve.
 
 1. Intel NVMe looks like the best bet if you have modern enough hardware for
 NVMe. Otherwise e.g. S3700 mentioned elsewhere.
 
 2. RAID controllers.
 
 We have e.g. 10-12 of these here and e.g. 25-30 SSDs, among various machines.
 This might give people idea about where the risk lies in the path from disk 
 to
 CPU.
 
 We've had 2 RAID card failures in the last 12 months that nuked the array 
 with
 days of downtime, and 2 problems with batteries suddenly becoming useless or
 suddenly reporting wildly varying temperatures/overheating. There may have 
 been
 other RAID problems I don't know about.
 
 Our IT dept were replacing Seagate HDDs last year at a rate of 2-3 per week 
 (I
 guess they have 100-200 disks?). We also have about 25-30 Hitachi/HGST HDDs.
 
 So by my estimates:
 30% annual problem rate with RAID controllers
 30-50% failure rate with Seagate HDDs (backblaze saw similar results)
 0% failure rate with HGST HDDs.
 0% failure in our SSDs.   (to be fair, our one samsung SSD apparently has a 
 bug
 in TRIM under linux, which I'll need to investigate to see if we have been
 affected by).
 
 also, RAID controllers aren't free - not just the money but also the 
 management
 of them (ever tried writing a complex install script that interacts work with
 MegaCLI? It can be done but it's not much fun.). Just take a look at the
 MegaCLI manual and ask yourself... is this even worth it (if you have a good
 MTBF on an enterprise SSD).
 
 RAID was meant to be about ensuring availability of data. I have trouble
 believing that these days
 
 Graeme Bell
 
 
 On 06 Jul 2015, at 18:56, Steve Crawford scrawf...@pinpointresearch.com 
 wrote:
 
 
 2. We don't typically have redundant electronic components in our servers. 
 Sure,
 we have dual power supplies and dual NICs (though generally to handle 
 external
 failures) and ECC-RAM but no hot-backup CPU or redundant RAM banks and...no
 backup RAID card. Intel Enterprise SSD already have power-fail protection 
 so I
 don't need a RAID card to give me BBU. Given the MTBF of good enterprise SSD
 I'm left to wonder if placing a RAID card in front merely adds a new point 
 of
 failure and scheduled-downtime-inducing hands-on maintenance (I'm looking at
 you, RAID backup battery).
 
 
 





Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Mkrtchyan, Tigran


- Original Message -
 From: Graeme B. Bell graeme.b...@nibio.no
 To: Mkrtchyan, Tigran tigran.mkrtch...@desy.de
 Cc: Graeme B. Bell graeme.b...@nibio.no, Steve Crawford 
 scrawf...@pinpointresearch.com, Wes Vaske (wvaske)
 wva...@micron.com, pgsql-performance pgsql-performance@postgresql.org
 Sent: Tuesday, July 7, 2015 12:38:10 PM
 Subject: Re: [PERFORM] New server: SSD/RAID recommendations?

 I am unsure about the performance side but, ZFS is generally very attractive 
 to
 me.
 
 Key advantages:
 
 1) Checksumming and automatic fixing-of-broken-things on every file (not just
 postgres pages, but your scripts, O/S, program files).
 2) Built-in  lightweight compression (doesn't help with TOAST tables, in fact
 may slow them down, but helpful for other things). This may actually be a net
 negative for pg so maybe turn it off.
 3) ZRAID mirroring or ZRAID5/6. If you have trouble persuading someone that 
 it's
 safe to replace a RAID array with a single drive... you can use a couple of
 NVMe SSDs with ZFS mirror or zraid, and  get the same availability you'd get
 from a RAID controller. Slightly better, arguably, since they claim to have
 fixed the raid write-hole problem.
 4) filesystem snapshotting
 
 Despite the costs of checksumming etc., I suspect ZRAID running on a fast CPU
 with multiple NVMe drives will outperform quite a lot of the alternatives, 
 with
 great data integrity guarantees.


We are planning to have a test setup as well. For now I have a single NVMe SSD on my
test system:

# lspci | grep NVM
85:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD 
Controller 171X (rev 03)

# mount | grep nvm
/dev/nvme0n1p1 on /var/lib/pgsql/9.5 type ext4 
(rw,noatime,nodiratime,data=ordered)


and I'm quite happy with it. We run a write-heavy workload on it to see when it will
break. Postgres performs very well: about 2.5x faster than with regular disks
with a single client, and scaling almost linearly with multiple clients (picture attached:
Y is the number of high-level ops/s our application does, X is the number of clients). The
setup has been in use for the last 3 months. It looks promising, but for production we need
a disk twice as big as on the test system. Until today, I was
planning to use RAID10 with a HW controller...

Related to ZFS: we use ZFS on Linux and the behaviour is not as good as with Solaris.
Let me re-phrase that: performance is unpredictable. We run RAIDZ2 with 30x3TB
disks.

Tigran.

 
 Haven't built one yet. Hope to, later this year. Steve, I would love to know
 more about how you're getting on with your NVMe disk in postgres!
 
 Graeme.
 
 On 07 Jul 2015, at 12:28, Mkrtchyan, Tigran tigran.mkrtch...@desy.de wrote:
 
 Thanks for the Info.
 
 So if RAID controllers are not an option, what one should use to build
 big databases? LVM with xfs? BtrFs? Zfs?
 
 Tigran.
 
 - Original Message -
 From: Graeme B. Bell graeme.b...@nibio.no
 To: Steve Crawford scrawf...@pinpointresearch.com
 Cc: Wes Vaske (wvaske) wva...@micron.com, pgsql-performance
 pgsql-performance@postgresql.org
 Sent: Tuesday, July 7, 2015 12:22:00 PM
 Subject: Re: [PERFORM] New server: SSD/RAID recommendations?
 
 Completely agree with Steve.
 
 1. Intel NVMe looks like the best bet if you have modern enough hardware for
 NVMe. Otherwise e.g. S3700 mentioned elsewhere.
 
 2. RAID controllers.
 
 We have e.g. 10-12 of these here and e.g. 25-30 SSDs, among various 
 machines.
 This might give people idea about where the risk lies in the path from disk 
 to
 CPU.
 
 We've had 2 RAID card failures in the last 12 months that nuked the array 
 with
 days of downtime, and 2 problems with batteries suddenly becoming useless or
 suddenly reporting wildly varying temperatures/overheating. There may have 
 been
 other RAID problems I don't know about.
 
 Our IT dept were replacing Seagate HDDs last year at a rate of 2-3 per week 
 (I
 guess they have 100-200 disks?). We also have about 25-30 Hitachi/HGST HDDs.
 
 So by my estimates:
 30% annual problem rate with RAID controllers
 30-50% failure rate with Seagate HDDs (backblaze saw similar results)
 0% failure rate with HGST HDDs.
 0% failure in our SSDs.   (to be fair, our one samsung SSD apparently has a 
 bug
 in TRIM under linux, which I'll need to investigate to see if we have been
 affected by).
 
 also, RAID controllers aren't free - not just the money but also the 
 management
 of them (ever tried writing a complex install script that interacts work 
 with
 MegaCLI? It can be done but it's not much fun.). Just take a look at the
 MegaCLI manual and ask yourself... is this even worth it (if you have a good
 MTBF on an enterprise SSD).
 
 RAID was meant to be about ensuring availability of data. I have trouble
 believing that these days
 
 Graeme Bell
 
 
 On 06 Jul 2015, at 18:56, Steve Crawford scrawf...@pinpointresearch.com 
 wrote:
 
 
 2. We don't typically have redundant electronic components in our servers. 
 Sure,
 we have dual power supplies and dual NICs 

Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Graeme B. Bell

Completely agree with Steve.

1. Intel NVMe looks like the best bet if you have modern enough hardware for 
NVMe. Otherwise e.g. S3700 mentioned elsewhere.

2. RAID controllers. 

We have e.g. 10-12 of these here and e.g. 25-30 SSDs, among various machines. 
This might give people idea about where the risk lies in the path from disk to 
CPU. 

We've had 2 RAID card failures in the last 12 months that nuked the array with 
days of downtime, and 2 problems with batteries suddenly becoming useless or 
suddenly reporting wildly varying temperatures/overheating. There may have been 
other RAID problems I don't know about. 

Our IT dept were replacing Seagate HDDs last year at a rate of 2-3 per week (I 
guess they have 100-200 disks?). We also have about 25-30 Hitachi/HGST HDDs.

So by my estimates:
30% annual problem rate with RAID controllers
30-50% failure rate with Seagate HDDs (backblaze saw similar results)
0% failure rate with HGST HDDs. 
0% failure in our SSDs.   (to be fair, our one samsung SSD apparently has a bug 
in TRIM under linux, which I'll need to investigate to see if we have been 
affected by). 

also, RAID controllers aren't free - not just the money but also the management 
of them (ever tried writing a complex install script that interacts with 
MegaCLI? It can be done but it's not much fun). Just take a look at the 
MegaCLI manual and ask yourself... is this even worth it (if you have a good 
MTBF on an enterprise SSD)?

RAID was meant to be about ensuring availability of data. I have trouble 
believing that these days

Graeme Bell


On 06 Jul 2015, at 18:56, Steve Crawford scrawf...@pinpointresearch.com wrote:

 
 2. We don't typically have redundant electronic components in our servers. 
 Sure, we have dual power supplies and dual NICs (though generally to handle 
 external failures) and ECC-RAM but no hot-backup CPU or redundant RAM banks 
 and...no backup RAID card. Intel Enterprise SSD already have power-fail 
 protection so I don't need a RAID card to give me BBU. Given the MTBF of good 
 enterprise SSD I'm left to wonder if placing a RAID card in front merely adds 
 a new point of failure and scheduled-downtime-inducing hands-on maintenance 
 (I'm looking at you, RAID backup battery).





Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Karl Denninger
On 7/7/2015 05:56, Mkrtchyan, Tigran wrote:

 - Original Message -
 From: Graeme B. Bell graeme.b...@nibio.no
 To: Mkrtchyan, Tigran tigran.mkrtch...@desy.de
 Cc: Graeme B. Bell graeme.b...@nibio.no, Steve Crawford 
 scrawf...@pinpointresearch.com, Wes Vaske (wvaske)
 wva...@micron.com, pgsql-performance pgsql-performance@postgresql.org
 Sent: Tuesday, July 7, 2015 12:38:10 PM
 Subject: Re: [PERFORM] New server: SSD/RAID recommendations?
 I am unsure about the performance side but, ZFS is generally very attractive 
 to
 me.

 Key advantages:

 1) Checksumming and automatic fixing-of-broken-things on every file (not just
 postgres pages, but your scripts, O/S, program files).
 2) Built-in  lightweight compression (doesn't help with TOAST tables, in fact
 may slow them down, but helpful for other things). This may actually be a net
 negative for pg so maybe turn it off.
 3) ZRAID mirroring or ZRAID5/6. If you have trouble persuading someone that 
 it's
 safe to replace a RAID array with a single drive... you can use a couple of
 NVMe SSDs with ZFS mirror or zraid, and  get the same availability you'd get
 from a RAID controller. Slightly better, arguably, since they claim to have
 fixed the raid write-hole problem.
 4) filesystem snapshotting

 Despite the costs of checksumming etc., I suspect ZRAID running on a fast CPU
 with multiple NVMe drives will outperform quite a lot of the alternatives, 
 with
 great data integrity guarantees.
Lz4 compression and the standard 128kb block size have proven to be materially
faster here than using 8kb blocks and no compression, both with rotating
disks and SSDs.

This is workload dependent in my experience but in the applications we
put Postgres to there is a very material improvement in throughput using
compression and the larger blocksize, which is counter-intuitive and
also opposite the conventional wisdom.

For best throughput we use mirrored vdev sets.

-- 
Karl Denninger
k...@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/




[PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-07 Thread Graeme B. Bell
Hi everyone,

I've written a new open source tool for easily parallelising SQL scripts in 
postgres.   [obligatory plug:   https://github.com/gbb/par_psql   ]

Using it, I'm seeing a problem I've seen in other postgres projects involving 
parallelisation in the last 12 months.

Basically:

- I have machines here with up to 16 CPUs and 128GB memory, very fast SSDs and 
controller etc, carefully configured kernel/postgresql.conf for high 
performance.

- Ordinary queries parallelise nearly perfectly (e.g. SELECT some_stuff ...), 
e.g. almost up to 16x performance improvement.

- Calls to CPU-intensive user-defined pl/pgsql functions (e.g. SELECT 
myfunction(some_stuff)) do not parallelise well, even when they are independent 
or accessing tables in a read-only way. They hit a limit at 2.5x performance 
improvement relative to single-CPU performance (pg9.4) and 2x performance 
(pg9.3). This is about 6 times slower than I'm expecting. 

- Can't see what would be locking. It seems like it's the pl/pgsql environment 
itself that is somehow locking or incurring some huge frictional costs. Whether 
I use independently defined functions, independent source tables, independent 
output tables, makes no difference whatsoever, so it doesn't feel 'locky'. It 
also doesn't seem to be WAL/synchronisation related, as the machines I'm using 
can hit absurdly high pgbench rates, and I'm using unlogged tables.

Curious? Take a quick peek here: 
https://github.com/gbb/par_psql/blob/master/BENCHMARKS.md

Wondering what I'm missing here. Any ideas?

Graeme. 



Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-07 Thread Craig James
On Thu, Jul 2, 2015 at 9:15 AM, Graeme B. Bell graeme.b...@nibio.no wrote:

 Hi everyone,

 I've written a new open source tool for easily parallelising SQL scripts
 in postgres.   [obligatory plug:   https://github.com/gbb/par_psql   ]

 Using it, I'm seeing a problem I've seen in other postgres projects
 involving parallelisation in the last 12 months.

 Basically:

 - I have machines here with up to 16 CPUs and 128GB memory, very fast SSDs
 and controller etc, carefully configured kernel/postgresql.conf for high
 performance.

 - Ordinary queries parallelise nearly perfectly (e.g. SELECT some_stuff
 ...), e.g. almost up to 16x performance improvement.

 - Calls to CPU-intensive user-defined pl/pgsql functions (e.g. SELECT
 myfunction(some_stuff)) do not parallelise well, even when they are
 independent or accessing tables in a read-only way. They hit a limit at
 2.5x performance improvement relative to single-CPU performance (pg9.4) and
 2x performance (pg9.3). This is about 6 times slower than I'm expecting.

 - Can't see what would be locking. It seems like it's the pl/pgsql
 environment itself that is somehow locking or incurring some huge
 frictional costs. Whether I use independently defined functions,
 independent source tables, independent output tables, makes no difference
 whatsoever, so it doesn't feel 'locky'. It also doesn't seem to be
 WAL/synchronisation related, as the machines I'm using can hit absurdly
 high pgbench rates, and I'm using unlogged tables.

 Curious? Take a quick peek here:
 https://github.com/gbb/par_psql/blob/master/BENCHMARKS.md

 Wondering what I'm missing here. Any ideas?


No ideas, but I ran into the same thing. I have a set of C/C++ functions
that put some chemistry calculations into Postgres as extensions (things
like, calculate the molecular weight of this molecule). As SQL functions,
the whole thing bogged down, and we never got the scalability we needed. On
our 8-CPU setup, we couldn't get more than 2 CPUs busy at the same time,
even with dozens of clients.

When I moved these same functions into an Apache fast-CGI HTTP service
(exact same code, same network overhead), I could easily scale up and use
the full 100% of all eight CPUs.

I have no idea why, and never investigated further. The convenience of
having the functions in SQL wasn't that important.

Craig



 Graeme.





-- 
-
Craig A. James
Chief Technology Officer
eMolecules, Inc.
-


Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Wes Vaske (wvaske)
The M500/M550/M600 are consumer class drives that don't have power protection 
for all inflight data.* (like the Samsung 8x0 series and the Intel 3x0  5x0 
series).

The M500DC has full power protection for inflight data and is an 
enterprise-class drive (like the Samsung 845DC or Intel S3500  S3700 series).

So any drive without the capacitors to protect inflight data will suffer from 
data loss if you're using disk write cache and you pull the power.

*Big addendum:
There are two issues on powerloss that will mess with Postgres. Data Loss and 
Data Corruption. The micron consumer drives will have power loss protection 
against Data Corruption and the enterprise drive will have power loss 
protection against BOTH.

https://www.micron.com/~/media/documents/products/white-paper/wp_ssd_power_loss_protection.pdf
 

The Data Corruption problem is only an issue in non-SLC NAND but it's industry 
wide. And even though some drives will protect against that, the protection of 
inflight data that's been fsync'd is more important and should disqualify *any* 
consumer drives from *any* company from consideration for use with Postgres.

Wes Vaske | Senior Storage Solutions Engineer
Micron Technology 

-Original Message-
From: Graeme B. Bell [mailto:graeme.b...@nibio.no] 
Sent: Tuesday, July 07, 2015 8:26 AM
To: Merlin Moncure
Cc: Wes Vaske (wvaske); Craig James; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] New server: SSD/RAID recommendations?


As I have warned elsewhere,

The M500/M550 from $SOME_COMPANY is NOT SUITABLE for postgres unless you have a 
RAID controller with BBU to protect yourself.
The M500/M550 are NOT plug-pull safe despite the 'power loss protection' 
claimed on the packaging. Not all fsync'd data is preserved in the event of a 
power loss, which completely undermines postgres's sanity. 

I would be extremely skeptical about the M500DC given the name and 
manufacturer. 

I went to quite a lot of trouble to provide $SOME_COMPANYs engineers with the 
full details of this fault after extensive testing (we have e.g. 20-25 of these 
disks) on multiple machines and controllers, at their request. Result: they 
stopped replying to me, and soon after I saw their PR reps talking about how 
'power loss protection isn't about protecting all data during a power loss'. 

The only safe way to use an M500/M550 with postgres is:

a) disable the disk cache, which will cripple performance to about 3-5% of 
normal.
b) use a battery backed or cap-backed RAID controller, which will generally 
hurt performance, by limiting you to the peak performance of the flash on the 
raid controller. 

If you are buying such a drive, I strongly recommend buying only one and doing 
extensive plug pull testing before commiting to several. 
For myself, my time is valuable enough that it will be cheaper to buy intel in 
future. 

Graeme.

On 07 Jul 2015, at 15:12, Merlin Moncure mmonc...@gmail.com wrote:

 On Thu, Jul 2, 2015 at 1:00 PM, Wes Vaske (wvaske) wva...@micron.com wrote:
 Storage Review has a pretty good process and reviewed the M500DC when it 
 released last year. 
 http://www.storagereview.com/micron_m500dc_enterprise_ssd_review
 
  
 
 The only database-specific info we have available are for Cassandra and MSSQL:
 
 http://www.micron.com/~/media/documents/products/technical-marketing-brief/cassandra_and_m500dc_enterprise_ssd_tech_brief.pdf
 
 http://www.micron.com/~/media/documents/products/technical-marketing-brief/sql_server_2014_and_m500dc_raid_configuration_tech_brief.pdf
 
  
 
 (some of that info might be relevant)
 
  
 
 In terms of endurance, the M500DC is rated to 2 Drive Writes Per Day (DWPD) 
 for 5-years. For comparison:
 
 Micron M500DC (20nm) - 2 DWPD
 
 Intel S3500 (20nm) - 0.3 DWPD
 
 Intel S3510 (16nm) - 0.3 DWPD
 
 Intel S3710 (20nm) - 10 DWPD
 
  
 
 They're all great drives, the question is how write-intensive is the workload.
 
 
 
 
 Intel added a new product, the 3610, that is rated for 3 DWPD.  Pricing looks 
 to be around 1.20$/GB.
 
 merlin 





Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Graeme B. Bell

Thanks, this is very useful to know about the 730. When you say 'tested it with 
plug-pulls', you were using diskchecker.pl, right?

Graeme.
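
P.S. For anyone who hasn't used it: the usual diskchecker.pl plug-pull procedure
is roughly the following (a sketch - host name, path and size are made up; check
the script's own usage text):

# on a second machine that stays powered up (the 'server'):
./diskchecker.pl -l

# on the machine with the SSD under test:
./diskchecker.pl -s otherhost create /mnt/ssd/test_file 500
#   ...pull the plug mid-run, power back up, then:
./diskchecker.pl -s otherhost verify /mnt/ssd/test_file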

On 07 Jul 2015, at 14:39, Karl Denninger k...@denninger.net wrote:

 
 Incidentally while there are people who have questioned the 730 series power 
 loss protection I've tested it with plug-pulls and in addition it watchdogs 
 its internal power loss capacitors -- from the smartctl -a display of one of 
 them on an in-service machine here:
 
 175 Power_Loss_Cap_Test 0x0033   100   100   010Pre-fail  Always  
  -   643 (4 6868)





Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Merlin Moncure
On Thu, Jul 2, 2015 at 1:00 PM, Wes Vaske (wvaske) wva...@micron.com
wrote:

  Storage Review has a pretty good process and reviewed the M500DC when it
 released last year.
 http://www.storagereview.com/micron_m500dc_enterprise_ssd_review



 The only database-specific info we have available are for Cassandra and
 MSSQL:


 http://www.micron.com/~/media/documents/products/technical-marketing-brief/cassandra_and_m500dc_enterprise_ssd_tech_brief.pdf


 http://www.micron.com/~/media/documents/products/technical-marketing-brief/sql_server_2014_and_m500dc_raid_configuration_tech_brief.pdf



 (some of that info might be relevant)



 In terms of endurance, the M500DC is rated to 2 Drive Writes Per Day
 (DWPD) for 5-years. For comparison:

 Micron M500DC (20nm) – 2 DWPD

 Intel S3500 (20nm) – 0.3 DWPD

 Intel S3510 (16nm) – 0.3 DWPD

 Intel S3710 (20nm) – 10 DWPD



 They’re all great drives, the question is how write-intensive is the
 workload.



Intel added a new product, the 3610, that is rated for 3 DWPD.  Pricing
looks to be around 1.20$/GB.

merlin


Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Graeme B. Bell

As I have warned elsewhere,

The M500/M550 from $SOME_COMPANY is NOT SUITABLE for postgres unless you have a 
RAID controller with BBU to protect yourself.
The M500/M550 are NOT plug-pull safe despite the 'power loss protection' 
claimed on the packaging. Not all fsync'd data is preserved in the event of a 
power loss, which completely undermines postgres's sanity. 

I would be extremely skeptical about the M500DC given the name and 
manufacturer. 

I went to quite a lot of trouble to provide $SOME_COMPANYs engineers with the 
full details of this fault after extensive testing (we have e.g. 20-25 of these 
disks) on multiple machines and controllers, at their request. Result: they 
stopped replying to me, and soon after I saw their PR reps talking about how 
'power loss protection isn't about protecting all data during a power loss'. 

The only safe way to use an M500/M550 with postgres is:

a) disable the disk cache, which will cripple performance to about 3-5% of 
normal.
b) use a battery backed or cap-backed RAID controller, which will generally 
hurt performance, by limiting you to the peak performance of the flash on the 
raid controller. 
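
(For (a), on Linux that's typically something like the following - device name
made up, and remember the performance cost mentioned above:

hdparm -W 0 /dev/sdX    # switch off the drive's volatile write cache
hdparm -W /dev/sdX      # confirm the current setting
)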

If you are buying such a drive, I strongly recommend buying only one and doing 
extensive plug-pull testing before committing to several. 
For myself, my time is valuable enough that it will be cheaper to buy intel in 
future. 

Graeme.

On 07 Jul 2015, at 15:12, Merlin Moncure mmonc...@gmail.com wrote:

 On Thu, Jul 2, 2015 at 1:00 PM, Wes Vaske (wvaske) wva...@micron.com wrote:
 Storage Review has a pretty good process and reviewed the M500DC when it 
 released last year. 
 http://www.storagereview.com/micron_m500dc_enterprise_ssd_review
 
  
 
 The only database-specific info we have available are for Cassandra and MSSQL:
 
 http://www.micron.com/~/media/documents/products/technical-marketing-brief/cassandra_and_m500dc_enterprise_ssd_tech_brief.pdf
 
 http://www.micron.com/~/media/documents/products/technical-marketing-brief/sql_server_2014_and_m500dc_raid_configuration_tech_brief.pdf
 
  
 
 (some of that info might be relevant)
 
  
 
 In terms of endurance, the M500DC is rated to 2 Drive Writes Per Day (DWPD) 
 for 5-years. For comparison:
 
 Micron M500DC (20nm) – 2 DWPD
 
 Intel S3500 (20nm) – 0.3 DWPD
 
 Intel S3510 (16nm) – 0.3 DWPD
 
 Intel S3710 (20nm) – 10 DWPD
 
  
 
 They’re all great drives, the question is how write-intensive is the workload.
 
 
 
 
 Intel added a new product, the 3610, that is rated for 3 DWPD.  Pricing looks 
 to be around 1.20$/GB.
 
 merlin 





Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-07 Thread Joshua D. Drake


On 07/07/2015 08:05 PM, Craig James wrote:



No ideas, but I ran into the same thing. I have a set of C/C++ functions
that put some chemistry calculations into Postgres as extensions (things
like, calculate the molecular weight of this molecule). As SQL
functions, the whole thing bogged down, and we never got the scalability
we needed. On our 8-CPU setup, we couldn't get more than 2 CPUs busy at
the same time, even with dozens of clients.

When I moved these same functions into an Apache fast-CGI HTTP service
(exact same code, same network overhead), I could easily scale up and
use the full 100% of all eight CPUs.

I have no idea why, and never investigated further. The convenience of
having the functions in SQL wasn't that important.


I admit that I haven't read this whole thread but:

Using Apache Fast-CGI, you are going to fork a process for each instance 
of the function being executed and that in turn will use all CPUs up to 
the max available resource.


With PostgreSQL, that isn't going to happen unless you are running (at 
least) 8 functions across 8 connections.


JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing I'm offended is basically telling the world you can't
control your own emotions, so everyone else should do it for you.




Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Heikki Linnakangas

On 07/07/2015 05:15 PM, Wes Vaske (wvaske) wrote:

The M500/M550/M600 are consumer class drives that don't have power
protection for all inflight data.* (like the Samsung 8x0 series and
the Intel 3x0  5x0 series).

The M500DC has full power protection for inflight data and is an
enterprise-class drive (like the Samsung 845DC or Intel S3500  S3700
series).

So any drive without the capacitors to protect inflight data will
suffer from data loss if you're using disk write cache and you pull
the power.


Wow, I would be pretty angry if I installed an SSD in my desktop, and it 
lost a file that I saved just before pulling the power plug.



*Big addendum: There are two issues on powerloss that will mess with
Postgres. Data Loss and Data Corruption. The micron consumer drives
will have power loss protection against Data Corruption and the
enterprise drive will have power loss protection against BOTH.

https://www.micron.com/~/media/documents/products/white-paper/wp_ssd_power_loss_protection.pdf

 The Data Corruption problem is only an issue in non-SLC NAND but
it's industry wide. And even though some drives will protect against
that, the protection of inflight data that's been fsync'd is more
important and should disqualify *any* consumer drives from *any*
company from consideration for use with Postgres.


So it lies about fsync()... The next question is, does it nevertheless 
enforce the correct ordering of persisting fsync'd data? If you write to 
file A and fsync it, then write to another file B and fsync it too, is 
it guaranteed that if B is persisted, A is as well? Because if it isn't, 
you can end up with filesystem (or database) corruption anyway.
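
(The crude way to probe that is something along these lines, repeated many times 
with the power pulled at a random point - which is roughly what diskchecker.pl 
automates:

dd if=/dev/urandom of=A bs=8k count=1 conv=fsync   # write file A and fsync it
dd if=/dev/urandom of=B bs=8k count=1 conv=fsync   # then write file B and fsync it

If, after a crash, B is ever found intact while A is missing or garbled, the 
ordering was not preserved.)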


- Heikki


