Re: Why HDD performance is better than SSD in this case

2018-07-17 Thread Neto pr
2018-07-17 22:13 GMT-03:00 Neto pr :
> 2018-07-17 20:04 GMT-03:00 Mark Kirkwood :
>> Ok, so dropping the cache is good.
>>
>> How are you ensuring that you have one test setup on the HDDs and one on the
>> SSDs? i.e do you have 2 postgres instances? or are you using one instance
>> with tablespaces to locate the relevant tables? If the 2nd case then you
>> will get pollution of shared_buffers if you don't restart between the HDD
>> and SSD tests. If you have 2 instances then you need to carefully check the
>> parameters are set the same (and probably shut the HDD instance down when
>> testing the SSD etc).
>>
> Dear Mark,
> To ensure the test is fair and that the O.S. and DBMS configuration is
> identical, the O.S. and the DBMS are both installed on the SSD.
> I have a single DBMS instance with two databases:
> - a database called tpch40gnorhdd with its tablespace on the HDD disk.
> - a database called tpch40gnorssd with its tablespace on the SSD disk.
> See below:
>
> postgres=# \l
>                                    List of databases
>      Name      |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
> ---------------+----------+----------+-------------+-------------+-----------------------
>  postgres      | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
>  template0     | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
>                |          |          |             |             | postgres=CTc/postgres
>  template1     | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
>                |          |          |             |             | postgres=CTc/postgres
>  tpch40gnorhdd | user1    | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
>  tpch40gnorssd | user1    | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
> (5 rows)
>
> postgres=#
>
> After 7 query executions against the tpch40gnorhdd database, I restart the
> DBMS (/etc/init.d/pg101norssd restart) and drop the O.S. cache, then run
> the same test against the tpch40gnorssd database.
> Do you think there is shared_buffers pollution in this case?
> Why do you think having the O.S. on the SSD is bad? Could you explain further?
>
> Best regards
> []`s Neto
>

One more piece of information about the Samsung EVO SSD:

 Model: 850 Evo 500 GB SATA III 6Gb/s -
http://www.samsung.com/semiconductor/minisite/ssd/product/consumer/850evo/
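For the record, the model and its reported TRIM support can also be read from the O.S. (a sketch; /dev/sda is an example device name, and it needs root plus smartmontools/hdparm installed):

```shell
# Confirm the drive model and TRIM support from the O.S. itself.
# /dev/sda is an example device name; adjust to your disks.
smartctl -i /dev/sda                       # prints model and firmware (smartmontools)
hdparm -I /dev/sda | grep -i trim          # shows "Data Set Management TRIM supported" if present
lsblk -d -o NAME,ROTA,DISC-GRAN,DISC-MAX   # ROTA=0 for SSD; DISC-* > 0 if discard works
```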


>> I can see a couple of things in your setup that might pessimize the SSD
>> case:
>> - you have the OS on the SSD - if your tests make the system swap then this will
>> wreck the SSD result
>> - you have RAID 0 SSD...some of the cheaper ones slow down when you do this.
>> maybe test with a single SSD
>>
>> regards
>> Mark
>>
>> On 18/07/18 01:04, Neto pr wrote (note snippage):
>>
>>> echo 3 > /proc/sys/vm/drop_caches
>>>
>>> disks:
>>> - 2 Samsung Evo SSD 500 GB units (in RAID 0)
>>> - 2 SATA 7500 rpm HDD units - 1TB (in RAID 0)
>>>
>>> - The Operating System and the PostgreSQL DBMS are installed on the SSD
>>> disk.
>>>
>>>
>>



Re: Why HDD performance is better than SSD in this case

2018-07-17 Thread Neto pr
2018-07-17 20:04 GMT-03:00 Mark Kirkwood :
> Ok, so dropping the cache is good.
>
> How are you ensuring that you have one test setup on the HDDs and one on the
> SSDs? i.e do you have 2 postgres instances? or are you using one instance
> with tablespaces to locate the relevant tables? If the 2nd case then you
> will get pollution of shared_buffers if you don't restart between the HDD
> and SSD tests. If you have 2 instances then you need to carefully check the
> parameters are set the same (and probably shut the HDD instance down when
> testing the SSD etc).
>
Dear Mark,
To ensure the test is fair and that the O.S. and DBMS configuration is
identical, the O.S. and the DBMS are both installed on the SSD.
I have a single DBMS instance with two databases:
- a database called tpch40gnorhdd with its tablespace on the HDD disk.
- a database called tpch40gnorssd with its tablespace on the SSD disk.
See below:

postgres=# \l
                                   List of databases
     Name      |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
---------------+----------+----------+-------------+-------------+-----------------------
 postgres      | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0     | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
               |          |          |             |             | postgres=CTc/postgres
 template1     | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
               |          |          |             |             | postgres=CTc/postgres
 tpch40gnorhdd | user1    | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 tpch40gnorssd | user1    | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
(5 rows)

postgres=#

After 7 query executions against the tpch40gnorhdd database, I restart the
DBMS (/etc/init.d/pg101norssd restart) and drop the O.S. cache, then run
the same test against the tpch40gnorssd database.
Do you think there is shared_buffers pollution in this case?
Why do you think having the O.S. on the SSD is bad? Could you explain further?

Best regards
[]`s Neto
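The per-batch reset described above could be scripted roughly as follows (a sketch; pg101norssd is the init script named in this message, and the commands need root):

```shell
#!/bin/sh
# Reset between benchmark batches: flush and drop the O.S. page cache,
# then restart the single Postgres instance to clear shared_buffers.
sync                                  # write dirty pages first, or the drop is incomplete
echo 3 > /proc/sys/vm/drop_caches     # drop page cache, dentries and inodes
/etc/init.d/pg101norssd restart       # init script name taken from this message
```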

> I can see a couple of things in your setup that might pessimize the SSD
> case:
> - you have the OS on the SSD - if your tests make the system swap then this will
> wreck the SSD result
> - you have RAID 0 SSD...some of the cheaper ones slow down when you do this.
> maybe test with a single SSD
>
> regards
> Mark
>
> On 18/07/18 01:04, Neto pr wrote (note snippage):
>
>> echo 3 > /proc/sys/vm/drop_caches
>>
>> disks:
>> - 2 Samsung Evo SSD 500 GB units (in RAID 0)
>> - 2 SATA 7500 rpm HDD units - 1TB (in RAID 0)
>>
>> - The Operating System and the PostgreSQL DBMS are installed on the SSD
>> disk.
>>
>>
>



Re: Why HDD performance is better than SSD in this case

2018-07-17 Thread Mark Kirkwood

Yeah,

A +1 to telling us the model. In particular, the later EVOs use TLC NAND 
with a small SLC cache... and when you exhaust the SLC cache the 
performance can be worse than an HDD...



On 18/07/18 01:44, Nicolas Charles wrote:

Hi Neto,

You should also list the SSD model - there are plenty of Samsung EVO 
drives - and they are not professional grade.


Among the possible issues, the most likely (from my point of view) 
are:


- The TRIM command doesn't go through the RAID (which is really likely), 
so the SSD controller thinks it's full and keeps pushing blocks around 
for wear leveling, causing massive perf degradation - please check this 
config on your RAID driver/adapter


- TRIM is not configured at the OS level for the SSD

- The partitions are not correctly aligned with the SSD blocks


With so few details on your system, we can only try to guess the 
real issue
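The TRIM and alignment points above can be checked roughly like this (a sketch; /dev/sda and partition 1 are examples, and the commands need root):

```shell
# Check whether discard/TRIM reaches the device through the whole stack:
lsblk -D                                      # DISC-GRAN/DISC-MAX of 0 means no TRIM
cat /sys/block/sda/queue/discard_max_bytes    # 0 = the device (or RAID layer) blocks discard
# Check partition alignment against the optimal I/O boundary:
parted /dev/sda align-check optimal 1         # reports "1 aligned" if partition 1 is aligned
```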



Nicolas

Nicolas CHARLES
Le 17/07/2018 à 15:19, Neto pr a écrit :

2018-07-17 10:04 GMT-03:00 Neto pr :

Sorry.. I replied in the wrong message before ...
follows my response.
-

Thanks all, but I still have not figured it out.
This is really strange because the tests were done on the same machine
(I use an HP ProLiant ML110, 8 GB RAM, Xeon 2.8 GHz processor (4
cores)), and PostgreSQL 10.1.
- Only the mentioned query was running at the time of the test.
- I repeated the query 7 times and the results did not change.
- Before running each batch of 7 executions, I dropped the Operating
System cache and restarted the DBMS like this:
echo 3 > /proc/sys/vm/drop_caches

disks:
- 2 Samsung Evo SSD 500 GB units (in RAID 0)
- 2 SATA 7500 rpm HDD units - 1TB (in RAID 0)

- The Operating System and the PostgreSQL DBMS are installed on the 
SSD disk.



One more piece of information:
I used the default postgresql.conf configuration.
The only exception is:
random_page_cost on the SSD is 1.1



Best Regards
[ ]`s Neto
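That single exception could also be scoped per database, so that only the SSD-backed database sees the lower cost (a sketch; database names taken from this thread):

```shell
# Scope random_page_cost so each database matches its storage:
psql -U postgres -c "ALTER DATABASE tpch40gnorssd SET random_page_cost = 1.1;"
psql -U postgres -c "ALTER DATABASE tpch40gnorhdd SET random_page_cost = 4.0;"  # the default, kept explicit for the HDD
```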

2018-07-17 1:08 GMT-07:00 Fabio Pardi :
As already mentioned by Robert, please let us know if you made sure that
nothing was served from RAM during the faster test.

In other words, make sure that all caches are dropped between one test
and another.

Also, to better picture the situation, it would be good to know:

- which SSD (brand/model) are you using?
- which HDD?
- how are the disks configured? RAID? or not?
- on which OS?
- what are the mount options? SSD requires tuning
- did you make sure that no other query was running at the time of the
bench?
- are you making a comparison on the same machine?
- is it HW or VM? benchmarks should preferably run on bare metal to avoid
results pollution (eg: other VMs on the same hypervisor using the disk,
host caching and so on)
- how many times did you run the tests?
- did you change postgres configuration over tests?
- can you post postgres config?
- what about vacuums or maintenance tasks running in the background?

Also, to benchmark disks i would not use a custom query but pgbench.

Be aware: running benchmarks is a science, and therefore needs a scientific
approach :)

regards

fabio pardi
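A minimal pgbench run along those lines might look like this (a sketch; the scale factor and duration are arbitrary, database names taken from this thread):

```shell
# Initialize pgbench tables, then run the same timed benchmark against
# the SSD-backed and HDD-backed databases (drop caches and restart in between).
pgbench -i -s 100 tpch40gnorssd            # -s 100 creates roughly 1.5 GB of data
pgbench -c 4 -j 2 -T 300 tpch40gnorssd     # 4 clients, 2 threads, 5 minutes
pgbench -i -s 100 tpch40gnorhdd
pgbench -c 4 -j 2 -T 300 tpch40gnorhdd
```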



On 07/17/2018 07:00 AM, Neto pr wrote:

Dear all,
Can some of you help me understand this?

This query plan is executed in the query below (query 9 of TPC-H
Benchmark, with scale 40, database with approximately 40 gb).

The experiment consisted of running the query on an HDD (RAID 0).
Then the same query was executed on an SSD (RAID 0).

Why did the HDD (7200 rpm)  perform better?
HDD - TIME 9 MINUTES
SSD - TIME 15 MINUTES

As far as I know, the SSD has a read speed that is 300 times faster 
than the HDD.


--- Execution  Plans---
ssd 40g
https://explain.depesz.com/s/rHkh

hdd 40g
https://explain.depesz.com/s/l4sq

Query:

select
    nation,
    o_year,
    sum(amount) as sum_profit
from
    (
        select
            n_name as nation,
            extract(year from o_orderdate) as o_year,
            l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
        from
            part,
            supplier,
            lineitem,
            partsupp,
            orders,
            nation
        where
            s_suppkey = l_suppkey
            and ps_suppkey = l_suppkey
            and ps_partkey = l_partkey
            and p_partkey = l_partkey
            and o_orderkey = l_orderkey
            and s_nationkey = n_nationkey
            and p_name like '%orchid%'
    ) as profit
group by
    nation,
    o_year
order by
    nation,
    o_year desc









Re: Why HDD performance is better than SSD in this case

2018-07-17 Thread Mark Kirkwood

Ok, so dropping the cache is good.

How are you ensuring that you have one test setup on the HDDs and one on 
the SSDs? i.e do you have 2 postgres instances? or are you using one 
instance with tablespaces to locate the relevant tables? If the 2nd case 
then you will get pollution of shared_buffers if you don't restart 
between the HDD and SSD tests. If you have 2 instances then you need to 
carefully check the parameters are set the same (and probably shut the 
HDD instance down when testing the SSD etc).


I can see a couple of things in your setup that might pessimize the SSD 
case:
- you have the OS on the SSD - if your tests make the system swap then this 
will wreck the SSD result
- you have RAID 0 SSD...some of the cheaper ones slow down when you do 
this. maybe test with a single SSD


regards
Mark

On 18/07/18 01:04, Neto pr wrote (note snippage):

echo 3 > /proc/sys/vm/drop_caches

disks:
- 2 Samsung Evo SSD 500 GB units (in RAID 0)
- 2 SATA 7500 rpm HDD units - 1TB (in RAID 0)

- The Operating System and the PostgreSQL DBMS are installed on the SSD disk.







Re: Improving Performance of Query ~ Filter by A, Sort by B

2018-07-17 Thread Jeff Janes
On Mon, Jul 16, 2018 at 5:29 PM, Lincoln Swaine-Moore <
lswainemo...@gmail.com> wrote:

> Tom and Jeff,
>
> Thanks very much for the suggestions!
>
> Here's what I've found so far after playing around for a few more days:
>
> What is your default_statistics_target?  What can you tell us about the
>> distribution of parent_id?  (exponential, power law, etc?).  Can you show
>> the results for select * from pg_stats where tablename='a' and
>> attname='parent_id' \x\g\x  ?
>
>
> The default_statistics_target is 500, which I agree seems quite
> insufficient for these purposes. I bumped this up to 2000, and saw some
> improvement in the row count estimation, but pretty much the same query
> plans. Unfortunately the distribution of counts is not intended to be
> correlated to parent_id, which is one reason I imagine the histograms might
> not be particularly effective unless there's one bucket for every value.
> Here is the output you requested:
>
> select * from pg_stats where tablename='a' and attname='parent_id';
>
> schemaname | public
> tablename  | a
> attname| parent_id
> inherited  | t
> null_frac  | 0
> avg_width  | 4
> n_distinct | 18871
> most_common_vals   | {15503,49787,49786,24595,49784,17549, ...} (2000
> values)
> most_common_freqs  | {0.0252983,0.02435,0.0241317,
> 0.02329,0.019095,0.0103967,0.00758833,0.004245, ...} (2000 values)
>

You showed the 8 most common frequencies.  But could you also show the last
couple of them?  When your queried parent_id value is not on the MCV list,
it is the frequency of the least frequent one on the list which puts an
upper limit on how frequent the one you queried for can be.



> A few questions re: statistics:
>  1) would it be helpful to bump column statistics to, say, 20k (the number
> of distinct values of parent_id)?
>

Only one way to find out...
However you can only go up to 10k, not 20k.
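Raising the per-column target to that 10k ceiling would look like this (a sketch; the table and column names come from this thread, assuming the partitioned parent is `a`):

```shell
# Raise the statistics target for parent_id to the 10000 maximum, then
# re-analyze so the larger MCV list is actually collected.
psql -c "ALTER TABLE a ALTER COLUMN parent_id SET STATISTICS 10000;"
psql -c "ANALYZE a;"
```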



>  2) is the discrepancy between the statistics on the parent and child
> table be expected? certainly I would think that the statistics would be
> different, but I would've imagined they would have histograms of the same
> size given the settings being the same.
>

Is the n_distinct estimate accurate for the partition?  There is an
algorithm (which will change in v11) to stop the MCV from filling the
entire statistics target size if it thinks adding more won't be useful.
But I don't know why the histogram boundary list would be short.  But, I
doubt that that is very important here.  The histogram is only used for
inequality/range, not for equality/set membership.



>  3) is there a way to manually specify the distribution of rows to be
> even? that is, set the frequency of each value to be ~ n_rows/n_distinct.
> This isn't quite accurate, but is a reasonable assumption about the
> distribution, and might generate better query plans.
>


This would be going in the wrong direction.  Your queries seem to
preferentially use rare parent_ids, not typical parent_ids.  In fact, it
seems like many of your hard-coded parent_ids don't exist in the table at
all.  That certainly isn't going to help the planner any.  Could you
somehow remove those before constructing the query?

You might also take a step back, where is that list of parent_ids coming
from in the first place, and why couldn't you convert the list of literals
into a query that returns that list naturally?


> You could try reversing the order and adding a column to be (tmstmp,
>> parent_id, id) and keeping the table well vacuumed.  This would allow the
>> slow plan to still walk the indexes in tmstmp order but do it as an
>> index-only scan, so it could omit the extra trip to the table. That trip to
>> the table must be awfully slow to explain the numbers you show later in the
>> thread.
>
>
> Just to clarify, do you mean building indexes like:
> CREATE INDEX "a_tmstmp_parent_id_id_idx_[PART_KEY]" on
> "a_partition[PART_KEY]" USING btree("tmstmp", "parent_id", "id")
> That seems promising! Is the intuition here that we want the first key of
> the index to be the one we are ultimately ordering by? Sounds like I may
> have had that flipped initially. My understanding of this whole situation
> (and please do correct me if this doesn't make sense) is the big bottleneck
> here is reading pages from disk (when looking at stopped up queries, the
> wait_event is DataFileRead), and so anything that can be done to minimize
> the pages read will be valuable. Which is why I would love to get the query
> plan to use the tmstmp index without having to filter thereafter by
> parent_id.
>

Yes, that is the index.

You really want it to filter by parent_id in the index, rather than going
to the table to do the filter on parent_id.  The index pages with tmstmp as
the leading column are going to be more tightly packed with potentially
relevant rows, while the table pages are less likely to be densely packed.
So filtering in the 

Re: Why HDD performance is better than SSD in this case

2018-07-17 Thread Fabio Pardi



On 07/17/2018 04:05 PM, Neto pr wrote:
> 2018-07-17 10:55 GMT-03:00 Fabio Pardi :

>> Also I think it makes not much sense testing on RAID 0. I would start
>> performing tests on a single disk, bypassing RAID (or, as mentioned, at
>> least disabling the cache).
>>
> 
> But in my case, both the 2 SSDs and the 2 HDDs are in RAID 0.
> Would this not be a valid test, given that both environments are
> in RAID 0?
> 
> 

In theory, probably yes; maybe not.
In RAID 0, data is (usually) striped in a round-robin fashion, so you
can rely on the fact that, on average, data is spread 50% on each
disk. For the sake of knowledge, you can check which algorithm your RAID
controller actually uses to spread data over RAID 0.

But you might be in an unlucky case in which more data is on one disk
than on the other.
Unlucky, or created by events: for instance, you deleted the records
which are on disk 0 and you are only querying those on disk 1.

The fact is, the more complexity you add to your test, the further the
results will be from your expectations.

Since you are testing disks, and not RAID, I would start empirically and
perform the test straight on one disk.
A simple test, like the dd I mentioned above.
If dd, or other more tailored disk tests, show that the SSD is way slower,
then you can focus on tuning your disk, or trashing it :)

When you are satisfied with your results, you can build up complexity
from the reliable/consolidated level you reached.

As a side note: why run a test on a setup you can never use in production?

regards,

fabio pardi
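The dd read test referred to above could look like this (a sketch; the mount point and file size are examples, the file should exceed the machine's 8 GB of RAM, and the cache drop needs root):

```shell
#!/bin/sh
# Sequential-read test of a single disk, outside Postgres and RAID.
dd if=/dev/zero of=/mnt/ssd/big_enough_file bs=1M count=10240  # ~10 GB > 8 GB RAM
sync
echo 3 > /proc/sys/vm/drop_caches                              # make sure the read comes from disk
dd if=/mnt/ssd/big_enough_file of=/dev/null bs=1M              # dd reports throughput at the end
```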





Re: Why HDD performance is better than SSD in this case

2018-07-17 Thread Nicolas Charles

Le 17/07/2018 à 16:00, Neto pr a écrit :

2018-07-17 10:44 GMT-03:00 Nicolas Charles :

Hi Neto,

You should also list the SSD model - there are plenty of Samsung EVO 
drives - and they are not professional grade.

Among the possible issues, the most likely (from my point of view) are:

- The TRIM command doesn't go through the RAID (which is really likely), so the
SSD controller thinks it's full and keeps pushing blocks around for wear
leveling, causing massive perf degradation - please check this config on your
RAID driver/adapter

- TRIM is not configured at the OS level for the SSD

- The partitions are not correctly aligned with the SSD blocks


With so few details on your system, we can only try to guess the real
issue


Thank you Nicolas, for your tips.
I believe your assumption is right.

This SSD really is not professional-grade, even if Samsung's
advertising says it is. If I have to buy another SSD I will prefer Intel SSDs.

I had a previous problem with it (Samsung EVO): it lost in
performance to a SAS HDD; however, the SAS HDD had a 12 Gb/s
transfer rate and the SSD 6 Gb/s.

But now I tested against an HDD (7200 RPM) that has the same 6 Gb/s
transfer rate as the SSD, and the SSD still lost in performance.

Maybe it's the unconfigured TRIM.

Could you give me some help on how to check whether my RAID is
configured for this? I use hardware RAID via HP software (HP Storage
Provider at boot).
And on the Debian 8 Operating System, how could I check the TRIM configuration?

Best
[]'s Neto


I'm no expert in HP systems, but you can have a look at this thread and 
the referenced links.
For the TRIM option in Debian, you need to define the mount options of 
your partition, in /etc/fstab, to include "discard" (see 
https://wiki.archlinux.org/index.php/Solid_State_Drive#Continuous_TRIM )
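On Debian 8 that boils down to either a mount option or a periodic trim (a sketch; the mount point and UUID are placeholders):

```shell
# Continuous TRIM: add "discard" to the SSD filesystem's /etc/fstab entry, e.g.
#   UUID=xxxx-xxxx  /mnt/ssd  ext4  defaults,noatime,discard  0  2
# Periodic TRIM (usually cheaper): trim free blocks on demand or from cron.
fstrim -v /mnt/ssd      # prints how many bytes were trimmed; needs root
```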


Regards,
Nicolas


Nicolas

Nicolas CHARLES

Le 17/07/2018 à 15:19, Neto pr a écrit :

2018-07-17 10:04 GMT-03:00 Neto pr :

Sorry.. I replied in the wrong message before ...
follows my response.
-

Thanks all, but I still have not figured it out.
This is really strange because the tests were done on the same machine
(I use an HP ProLiant ML110, 8 GB RAM, Xeon 2.8 GHz processor (4
cores)), and PostgreSQL 10.1.
- Only the mentioned query was running at the time of the test.
- I repeated the query 7 times and the results did not change.
- Before running each batch of 7 executions, I dropped the Operating
System cache and restarted the DBMS like this:
echo 3 > /proc/sys/vm/drop_caches

disks:
- 2 Samsung Evo SSD 500 GB units (in RAID 0)
- 2 SATA 7500 rpm HDD units - 1TB (in RAID 0)

- The Operating System and the PostgreSQL DBMS are installed on the SSD
disk.


One more piece of information:
I used the default postgresql.conf configuration.
The only exception is:
random_page_cost on the SSD is 1.1



Best Regards
[ ]`s Neto

2018-07-17 1:08 GMT-07:00 Fabio Pardi :

As already mentioned by Robert, please let us know if you made sure that
nothing was served from RAM during the faster test.

In other words, make sure that all caches are dropped between one test
and another.

Also, to better picture the situation, it would be good to know:

- which SSD (brand/model) are you using?
- which HDD?
- how are the disks configured? RAID? or not?
- on which OS?
- what are the mount options? SSD requires tuning
- did you make sure that no other query was running at the time of the
bench?
- are you making a comparison on the same machine?
- is it HW or VM? benchmarks should preferably run on bare metal to avoid
results pollution (eg: other VMs on the same hypervisor using the disk,
host caching and so on)
- how many times did you run the tests?
- did you change postgres configuration over tests?
- can you post postgres config?
- what about vacuums or maintenance tasks running in the background?

Also, to benchmark disks i would not use a custom query but pgbench.

Be aware: running benchmarks is a science, and therefore needs a scientific
approach :)

regards

fabio pardi



On 07/17/2018 07:00 AM, Neto pr wrote:

Dear all,
Can some of you help me understand this?

This query plan is executed in the query below (query 9 of TPC-H
Benchmark, with scale 40, database with approximately 40 gb).

The experiment consisted of running the query on an HDD (RAID 0).
Then the same query was executed on an SSD (RAID 0).

Why did the HDD (7200 rpm)  perform better?
HDD - TIME 9 MINUTES
SSD - TIME 15 MINUTES

As far as I know, the SSD has a read speed that is 300 times faster than
the HDD.

--- Execution  Plans---
ssd 40g
https://explain.depesz.com/s/rHkh

hdd 40g
https://explain.depesz.com/s/l4sq

Query 

select
  nation,
  o_year,
  sum(amount) as sum_profit
from
  (
  select
  n_name as nation,
  extract(year from o_orderdate) as o_year,
  l_extendedprice * (1 - l_discount) - ps_supplycost *
l_quantity as amount
  from
  part,
  

Re: Why HDD performance is better than SSD in this case

2018-07-17 Thread Neto pr
2018-07-17 10:55 GMT-03:00 Fabio Pardi :
> If you have a RAID cache, I would disable it, since we are only focusing
> on the disks. Cache can give you inconsistent data (even if it looks like
> that is not the case here).
>
> Also, we can do a step backward, and exclude postgres from the picture
> for the moment.
>
> try to perform a dd test in reading from disk, and let us know.
>
> like:
>
> - create big_enough_file
> - empty OS cache
> - dd if=big_enough_file of=/dev/null
>
> and post the results for both disks.
>
> Also I think it makes not much sense testing on RAID 0. I would start
> performing tests on a single disk, bypassing RAID (or, as mentioned, at
> least disabling the cache).
>

But in my case, both the 2 SSDs and the 2 HDDs are in RAID 0.
Would this not be a valid test, given that both environments are
in RAID 0?



> The findings should narrow the focus
>
>
> regards,
>
> fabio pardi
>
> On 07/17/2018 03:19 PM, Neto pr wrote:
>> 2018-07-17 10:04 GMT-03:00 Neto pr :
>>> Sorry.. I replied in the wrong message before ...
>>> follows my response.
>>> -
>>>
>>> Thanks all, but I still have not figured it out.
>>> This is really strange because the tests were done on the same machine
>>> (I use  HP ML110 Proliant 8gb RAM - Xeon 2.8 ghz processor (4
>>> cores), and POSTGRESQL 10.1.
>>> - Only the mentioned query running at the time of the test.
>>> - I repeated the query 7 times and did not change the results.
>>> - Before running each batch of 7 executions, I discarded the Operating
>>> System cache and restarted DBMS like this:
>>> echo 3 > /proc/sys/vm/drop_caches
>>>
>>> disks:
>>> - 2 Samsung Evo SSD 500 GB units (in RAID 0)
>>> - 2 SATA 7500 rpm HDD units - 1TB (in RAID 0)
>>>
>>> - The Operating System and the PostgreSQL DBMS are installed on the SSD 
>>> disk.
>>>
>>
>> One more information.
>> I used default configuration to Postgresql.conf
>> Only exception is to :
>> random_page_cost on SSD is 1.1
>>
>>
>>> Best Regards
>>> [ ]`s Neto
>>>
>>> 2018-07-17 1:08 GMT-07:00 Fabio Pardi :
 As already mentioned by Robert, please let us know if you made sure that
 nothing was fished from RAM, over the faster test.

 In other words, make sure that all caches are dropped between one test
 and another.

 Also, to better picture the situation, it would be good to know:

 - which SSD (brand/model) are you using?
 - which HDD?
 - how are the disks configured? RAID? or not?
 - on which OS?
 - what are the mount options? SSD requires tuning
 - did you make sure that no other query was running at the time of the
 bench?
 - are you making a comparison on the same machine?
 - is it HW or VM? benchmarks should preferably run on bare metal to avoid
 results pollution (eg: other VMS on the same hypervisor using the disk,
 host caching and so on)
 - how many times did you run the tests?
 - did you change postgres configuration over tests?
 - can you post postgres config?
 - what about vacuums or maintenance tasks running in the background?

 Also, to benchmark disks i would not use a custom query but pgbench.

 Be aware: running benchmarks is a science, therefore needs a scientific
 approach :)

 regards

 fabio pardi



 On 07/17/2018 07:00 AM, Neto pr wrote:
> Dear,
> Some of you can help me understand this.
>
> This query plan is executed in the query below (query 9 of TPC-H
> Benchmark, with scale 40, database with approximately 40 gb).
>
> The experiment consisted of running the query on a HDD (Raid zero).
> Then the same query is executed on an SSD (Raid Zero).
>
> Why did the HDD (7200 rpm)  perform better?
> HDD - TIME 9 MINUTES
> SSD - TIME 15 MINUTES
>
> As far as I know, the SSD has a read speed that is 300 times faster than the HDD.
>
> --- Execution  Plans---
> ssd 40g
> https://explain.depesz.com/s/rHkh
>
> hdd 40g
> https://explain.depesz.com/s/l4sq
>
> Query 
>
> select
> nation,
> o_year,
> sum(amount) as sum_profit
> from
> (
> select
> n_name as nation,
> extract(year from o_orderdate) as o_year,
> l_extendedprice * (1 - l_discount) - ps_supplycost *
> l_quantity as amount
> from
> part,
> supplier,
> lineitem,
> partsupp,
> orders,
> nation
> where
> s_suppkey = l_suppkey
> and ps_suppkey = l_suppkey
> and ps_partkey = l_partkey
> and p_partkey = l_partkey
> and o_orderkey = l_orderkey
> and s_nationkey = n_nationkey
> and p_name like '%orchid%'
> ) as 

Re: Why HDD performance is better than SSD in this case

2018-07-17 Thread Neto pr
2018-07-17 10:44 GMT-03:00 Nicolas Charles :
> Hi Neto,
>
> You should also list the SSD model - there are plenty of Samsung EVO drives
> - and they are not professional grade.
>
> Among the possible issues, the most likely (from my point of view) are:
>
> - The TRIM command doesn't go through the RAID (which is really likely), so the
> SSD controller thinks it's full and keeps pushing blocks around for wear
> leveling, causing massive perf degradation - please check this config on your
> RAID driver/adapter
>
> - TRIM is not configured at the OS level for the SSD
>
> - The partitions are not correctly aligned with the SSD blocks
>
>
> With so few details on your system, we can only try to guess the real
> issue
>

Thank you Nicolas, for your tips.
I believe your assumption is right.

This SSD really is not professional-grade, even if Samsung's
advertising says it is. If I have to buy another SSD I will prefer Intel SSDs.

I had a previous problem with it (Samsung EVO): it lost in
performance to a SAS HDD; however, the SAS HDD had a 12 Gb/s
transfer rate and the SSD 6 Gb/s.

But now I tested against an HDD (7200 RPM) that has the same 6 Gb/s
transfer rate as the SSD, and the SSD still lost in performance.

Maybe it's the unconfigured TRIM.

Could you give me some help on how to check whether my RAID is
configured for this? I use hardware RAID via HP software (HP Storage
Provider at boot).
And on the Debian 8 Operating System, how could I check the TRIM configuration?

Best
[]'s Neto
>
> Nicolas
>
> Nicolas CHARLES
>
> Le 17/07/2018 à 15:19, Neto pr a écrit :
>>
>> 2018-07-17 10:04 GMT-03:00 Neto pr :
>>>
>>> Sorry.. I replied in the wrong message before ...
>>> follows my response.
>>> -
>>>
>>> Thanks all, but I still have not figured it out.
>>> This is really strange because the tests were done on the same machine
>>> (I use  HP ML110 Proliant 8gb RAM - Xeon 2.8 ghz processor (4
>>> cores), and POSTGRESQL 10.1.
>>> - Only the mentioned query running at the time of the test.
>>> - I repeated the query 7 times and did not change the results.
>>> - Before running each batch of 7 executions, I discarded the Operating
>>> System cache and restarted DBMS like this:
>>> echo 3 > /proc/sys/vm/drop_caches
>>>
>>> disks:
>>> - 2 Samsung Evo SSD 500 GB units (in RAID 0)
>>> - 2 SATA 7500 rpm HDD units - 1TB (in RAID 0)
>>>
>>> - The Operating System and the PostgreSQL DBMS are installed on the SSD
>>> disk.
>>>
>> One more information.
>> I used default configuration to Postgresql.conf
>> Only exception is to :
>> random_page_cost on SSD is 1.1
>>
>>
>>> Best Regards
>>> [ ]`s Neto
>>>
>>> 2018-07-17 1:08 GMT-07:00 Fabio Pardi :

 As already mentioned by Robert, please let us know if you made sure that
 nothing was fished from RAM, over the faster test.

 In other words, make sure that all caches are dropped between one test
 and another.

 Also, to better picture the situation, it would be good to know:

 - which SSD (brand/model) are you using?
 - which HDD?
 - how are the disks configured? RAID? or not?
 - on which OS?
 - what are the mount options? SSD requires tuning
 - did you make sure that no other query was running at the time of the
 bench?
 - are you making a comparison on the same machine?
 - is it HW or VM? benchmarks should preferably run on bare metal to avoid
 results pollution (eg: other VMS on the same hypervisor using the disk,
 host caching and so on)
 - how many times did you run the tests?
 - did you change postgres configuration over tests?
 - can you post postgres config?
 - what about vacuums or maintenance tasks running in the background?

 Also, to benchmark disks i would not use a custom query but pgbench.

 Be aware: running benchmarks is a science, therefore needs a scientific
 approach :)

 regards

 fabio pardi



 On 07/17/2018 07:00 AM, Neto pr wrote:
>
> Dear,
> Some of you can help me understand this.
>
> This query plan is executed in the query below (query 9 of TPC-H
> Benchmark, with scale 40, database with approximately 40 gb).
>
> The experiment consisted of running the query on a HDD (Raid zero).
> Then the same query is executed on an SSD (Raid Zero).
>
> Why did the HDD (7200 rpm)  perform better?
> HDD - TIME 9 MINUTES
> SSD - TIME 15 MINUTES
>
> As far as I know, the SSD has a read speed that is 300 times faster than
> the HDD.
>
> --- Execution  Plans---
> ssd 40g
> https://explain.depesz.com/s/rHkh
>
> hdd 40g
> https://explain.depesz.com/s/l4sq
>
> Query 
>
> select
>  nation,
>  o_year,
>  sum(amount) as sum_profit
> from
>  (
>  select
>  n_name as nation,
>  extract(year 

Re: Why HDD performance is better than SSD in this case

2018-07-17 Thread Fabio Pardi
If you have a RAID cache, I would disable it, since we are only focusing
on the disks. Cache can give you inconsistent data (even if it looks like
that is not the case here).

Also, we can do a step backward, and exclude postgres from the picture
for the moment.

try to perform a dd test in reading from disk, and let us know.

like:

- create big_enough_file
- empty OS cache
- dd if=big_enough_file of=/dev/null

and post the results for both disks.
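
The three steps above can be sketched as follows (the file path, size, and
location are examples, not from the thread; dropping the page cache needs root):

```shell
# Sketch of the suggested dd read test for one disk.
# 1) create a big enough file on the disk under test
dd if=/dev/zero of=/tmp/big_enough_file bs=1M count=64 conv=fsync 2>/dev/null

# 2) drop the OS page cache so the read actually hits the disk (root only)
sync
if [ "$(id -u)" = "0" ]; then echo 3 > /proc/sys/vm/drop_caches; fi

# 3) sequential read; dd reports the achieved throughput on stderr
dd if=/tmp/big_enough_file of=/dev/null bs=1M

rm -f /tmp/big_enough_file
```

Run it once with the file placed on the HDD array and once on the SSD array,
and compare the MB/s figures dd prints.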

Also, I think it does not make much sense to test on RAID 0. I would start
by performing tests on a single disk, bypassing the RAID (or, as mentioned,
at least disabling its cache).

The findings should narrow the focus.


regards,

fabio pardi

On 07/17/2018 03:19 PM, Neto pr wrote:
> 2018-07-17 10:04 GMT-03:00 Neto pr :
>> Sorry.. I replied in the wrong message before ...
>> follows my response.
>> -
>>
>> Thanks all, but I still have not figured it out.
>> This is really strange because the tests were done on the same machine
>> (an HP ProLiant ML110: 8 GB RAM, a Xeon 2.8 GHz processor with 4
>> cores, and PostgreSQL 10.1).
>> - Only the mentioned query was running at the time of the test.
>> - I repeated the query 7 times and the results did not change.
>> - Before running each batch of 7 executions, I dropped the operating
>> system cache and restarted the DBMS like this:
>> (echo 3 > /proc/sys/vm/drop_caches)
>>
>> disks:
>> - 2 x Samsung Evo SSD, 500 GB (in RAID 0)
>> - 2 x SATA 7500 rpm HDD, 1 TB (in RAID 0)
>>
>> - The operating system and the PostgreSQL DBMS are installed on the SSD.
>>
> 
> One more piece of information:
> I used the default postgresql.conf configuration.
> The only exception is:
> random_page_cost on the SSD database is 1.1
> 
> 
>> Best Regards
>> [ ]`s Neto
>>
>> 2018-07-17 1:08 GMT-07:00 Fabio Pardi :
>>> As already mentioned by Robert, please let us know whether you made sure
>>> that nothing was fetched from RAM during the faster test.
>>>
>>> In other words, make sure that all caches are dropped between one test
>>> and another.
>>>
>>> Also, to get a better picture of the situation, it would be good to know:
>>>
>>> - which SSD (brand/model) are you using?
>>> - which HDD?
>>> - how are the disks configured? RAID? or not?
>>> - on which OS?
>>> - what are the mount options? SSDs require tuning
>>> - did you make sure that no other query was running at the time of the
>>> bench?
>>> - are you making the comparison on the same machine?
>>> - is it HW or a VM? benchmarks are better run on bare metal to avoid
>>> polluted results (e.g. other VMs on the same hypervisor using the disk,
>>> host caching and so on)
>>> - how many times did you run the tests?
>>> - did you change the postgres configuration between tests?
>>> - can you post your postgres config?
>>> - what about vacuums or maintenance tasks running in the background?
>>>
>>> Also, to benchmark disks I would not use a custom query but pgbench.
>>>
>>> Be aware: running benchmarks is a science, and therefore needs a scientific
>>> approach :)
>>>
>>> regards
>>>
>>> fabio pardi
>>>
>>>
>>>
>>> On 07/17/2018 07:00 AM, Neto pr wrote:
 Dear,
 Can some of you help me understand this?

 The execution plans below are for the query shown (query 9 of the TPC-H
 benchmark, scale factor 40, a database of approximately 40 GB).

 The experiment consisted of running the query on an HDD (RAID 0),
 then running the same query on an SSD (RAID 0).

 Why did the HDD (7200 rpm) perform better?
 HDD - TIME: 9 MINUTES
 SSD - TIME: 15 MINUTES

 As far as I know, the SSD's read speed should be 300 times faster than the HDD's.

 --- Execution  Plans---
 ssd 40g
 https://explain.depesz.com/s/rHkh

 hdd 40g
 https://explain.depesz.com/s/l4sq

 Query 

 select
 nation,
 o_year,
 sum(amount) as sum_profit
 from
 (
 select
 n_name as nation,
 extract(year from o_orderdate) as o_year,
 l_extendedprice * (1 - l_discount) - ps_supplycost *
 l_quantity as amount
 from
 part,
 supplier,
 lineitem,
 partsupp,
 orders,
 nation
 where
 s_suppkey = l_suppkey
 and ps_suppkey = l_suppkey
 and ps_partkey = l_partkey
 and p_partkey = l_partkey
 and o_orderkey = l_orderkey
 and s_nationkey = n_nationkey
 and p_name like '%orchid%'
 ) as profit
 group by
 nation,
 o_year
 order by
 nation,
 o_year desc

>>>



Re: Why HDD performance is better than SSD in this case

2018-07-17 Thread Robert Zenz
On 17.07.2018 15:44, Nicolas Charles wrote:
> - Partitions is not correctly aligned on the SSD blocks

Does that really make a noticeable difference? If yes, have you got some further
reading material on that?

Re: Why HDD performance is better than SSD in this case

2018-07-17 Thread Jeff Janes
On Tue, Jul 17, 2018 at 1:00 AM, Neto pr  wrote:

> Dear,
> Can some of you help me understand this?
>
> The execution plans below are for the query shown (query 9 of the TPC-H
> benchmark, scale factor 40, a database of approximately 40 GB).
>
> The experiment consisted of running the query on an HDD (RAID 0),
> then running the same query on an SSD (RAID 0).
>
> Why did the HDD (7200 rpm) perform better?
> HDD - TIME: 9 MINUTES
> SSD - TIME: 15 MINUTES
>
> As far as I know, the SSD's read speed should be 300 times faster than the HDD's.
>

Is the 300 times faster comparing random to random, or sequential to
sequential?  Maybe your SSD simply fails to perform as advertised.  This
would not surprise me at all.

To remove some confounding variables, can you turn off parallelism and
repeat the queries?  (Yes, they will probably get slower.  But are the
relative timings still the same?)  Also, turn on track_io_timing and
repeat the "EXPLAIN (ANALYZE, BUFFERS)", perhaps with TIMING OFF.
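
That re-run could look like this in a psql session (a sketch: the two
settings and the EXPLAIN options are standard PostgreSQL; the query body
itself is elided and stands for TPC-H query 9 from the thread):

```sql
-- Disable parallel query and enable I/O timing for this session only,
-- then re-run the query collecting buffer statistics without per-row timing.
SET max_parallel_workers_per_gather = 0;
SET track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS, TIMING OFF)
SELECT ...;  -- the TPC-H query 9 text goes here
```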

Also, see how long it takes to read the entire database, or just the
largest table, outside of postgres.

Something like:

time tar -cf - $PGDATA/base | wc -c

or

time cat $PGDATA/base//* | wc -c

Cheers,

Jeff


Re: Why HDD performance is better than SSD in this case

2018-07-17 Thread Neto pr
2018-07-17 10:04 GMT-03:00 Neto pr :
> Sorry.. I replied in the wrong message before ...
> follows my response.
> -
>
> Thanks all, but I still have not figured it out.
> This is really strange because the tests were done on the same machine
> (an HP ProLiant ML110: 8 GB RAM, a Xeon 2.8 GHz processor with 4
> cores, and PostgreSQL 10.1).
> - Only the mentioned query was running at the time of the test.
> - I repeated the query 7 times and the results did not change.
> - Before running each batch of 7 executions, I dropped the operating
> system cache and restarted the DBMS like this:
> (echo 3 > /proc/sys/vm/drop_caches)
>
> disks:
> - 2 x Samsung Evo SSD, 500 GB (in RAID 0)
> - 2 x SATA 7500 rpm HDD, 1 TB (in RAID 0)
>
> - The operating system and the PostgreSQL DBMS are installed on the SSD.
>

One more piece of information:
I used the default postgresql.conf configuration.
The only exception is:
random_page_cost on the SSD database is 1.1
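
For reference, that exception can be applied either instance-wide or only to
the SSD tablespace (a sketch using standard PostgreSQL syntax; the tablespace
name is hypothetical):

```sql
-- Instance-wide override (written to postgresql.auto.conf):
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();

-- Or scoped to the SSD tablespace only (tablespace name is an example):
ALTER TABLESPACE tblspc_ssd SET (random_page_cost = 1.1);
```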


> Best Regards
> [ ]`s Neto
>
> 2018-07-17 1:08 GMT-07:00 Fabio Pardi :
>> As already mentioned by Robert, please let us know whether you made sure
>> that nothing was fetched from RAM during the faster test.
>>
>> In other words, make sure that all caches are dropped between one test
>> and another.
>>
>> Also, to get a better picture of the situation, it would be good to know:
>>
>> - which SSD (brand/model) are you using?
>> - which HDD?
>> - how are the disks configured? RAID? or not?
>> - on which OS?
>> - what are the mount options? SSDs require tuning
>> - did you make sure that no other query was running at the time of the
>> bench?
>> - are you making the comparison on the same machine?
>> - is it HW or a VM? benchmarks are better run on bare metal to avoid
>> polluted results (e.g. other VMs on the same hypervisor using the disk,
>> host caching and so on)
>> - how many times did you run the tests?
>> - did you change the postgres configuration between tests?
>> - can you post your postgres config?
>> - what about vacuums or maintenance tasks running in the background?
>>
>> Also, to benchmark disks I would not use a custom query but pgbench.
>>
>> Be aware: running benchmarks is a science, and therefore needs a scientific
>> approach :)
>>
>> regards
>>
>> fabio pardi
>>
>>
>>
>> On 07/17/2018 07:00 AM, Neto pr wrote:
>>> Dear,
>>> Can some of you help me understand this?
>>>
>>> The execution plans below are for the query shown (query 9 of the TPC-H
>>> benchmark, scale factor 40, a database of approximately 40 GB).
>>>
>>> The experiment consisted of running the query on an HDD (RAID 0),
>>> then running the same query on an SSD (RAID 0).
>>>
>>> Why did the HDD (7200 rpm) perform better?
>>> HDD - TIME: 9 MINUTES
>>> SSD - TIME: 15 MINUTES
>>>
>>> As far as I know, the SSD's read speed should be 300 times faster than the HDD's.
>>>
>>> --- Execution  Plans---
>>> ssd 40g
>>> https://explain.depesz.com/s/rHkh
>>>
>>> hdd 40g
>>> https://explain.depesz.com/s/l4sq
>>>
>>> Query 
>>>
>>> select
>>> nation,
>>> o_year,
>>> sum(amount) as sum_profit
>>> from
>>> (
>>> select
>>> n_name as nation,
>>> extract(year from o_orderdate) as o_year,
>>> l_extendedprice * (1 - l_discount) - ps_supplycost *
>>> l_quantity as amount
>>> from
>>> part,
>>> supplier,
>>> lineitem,
>>> partsupp,
>>> orders,
>>> nation
>>> where
>>> s_suppkey = l_suppkey
>>> and ps_suppkey = l_suppkey
>>> and ps_partkey = l_partkey
>>> and p_partkey = l_partkey
>>> and o_orderkey = l_orderkey
>>> and s_nationkey = n_nationkey
>>> and p_name like '%orchid%'
>>> ) as profit
>>> group by
>>> nation,
>>> o_year
>>> order by
>>> nation,
>>> o_year desc
>>>
>>



Re: Why HDD performance is better than SSD in this case

2018-07-17 Thread Neto pr
Sorry.. I replied in the wrong message before ...
follows my response.
-

Thanks all, but I still have not figured it out.
This is really strange because the tests were done on the same machine
(an HP ProLiant ML110: 8 GB RAM, a Xeon 2.8 GHz processor with 4
cores, and PostgreSQL 10.1).
- Only the mentioned query was running at the time of the test.
- I repeated the query 7 times and the results did not change.
- Before running each batch of 7 executions, I dropped the operating
system cache and restarted the DBMS like this:
(echo 3 > /proc/sys/vm/drop_caches)

disks:
- 2 x Samsung Evo SSD, 500 GB (in RAID 0)
- 2 x SATA 7500 rpm HDD, 1 TB (in RAID 0)

- The operating system and the PostgreSQL DBMS are installed on the SSD.

Best Regards
[ ]`s Neto

2018-07-17 1:08 GMT-07:00 Fabio Pardi :
> As already mentioned by Robert, please let us know whether you made sure
> that nothing was fetched from RAM during the faster test.
>
> In other words, make sure that all caches are dropped between one test
> and another.
>
> Also, to get a better picture of the situation, it would be good to know:
>
> - which SSD (brand/model) are you using?
> - which HDD?
> - how are the disks configured? RAID? or not?
> - on which OS?
> - what are the mount options? SSDs require tuning
> - did you make sure that no other query was running at the time of the
> bench?
> - are you making the comparison on the same machine?
> - is it HW or a VM? benchmarks are better run on bare metal to avoid
> polluted results (e.g. other VMs on the same hypervisor using the disk,
> host caching and so on)
> - how many times did you run the tests?
> - did you change the postgres configuration between tests?
> - can you post your postgres config?
> - what about vacuums or maintenance tasks running in the background?
>
> Also, to benchmark disks I would not use a custom query but pgbench.
>
> Be aware: running benchmarks is a science, and therefore needs a scientific
> approach :)
>
> regards
>
> fabio pardi
>
>
>
> On 07/17/2018 07:00 AM, Neto pr wrote:
>> Dear,
>> Can some of you help me understand this?
>>
>> The execution plans below are for the query shown (query 9 of the TPC-H
>> benchmark, scale factor 40, a database of approximately 40 GB).
>>
>> The experiment consisted of running the query on an HDD (RAID 0),
>> then running the same query on an SSD (RAID 0).
>>
>> Why did the HDD (7200 rpm) perform better?
>> HDD - TIME: 9 MINUTES
>> SSD - TIME: 15 MINUTES
>>
>> As far as I know, the SSD's read speed should be 300 times faster than the HDD's.
>>
>> --- Execution  Plans---
>> ssd 40g
>> https://explain.depesz.com/s/rHkh
>>
>> hdd 40g
>> https://explain.depesz.com/s/l4sq
>>
>> Query 
>>
>> select
>> nation,
>> o_year,
>> sum(amount) as sum_profit
>> from
>> (
>> select
>> n_name as nation,
>> extract(year from o_orderdate) as o_year,
>> l_extendedprice * (1 - l_discount) - ps_supplycost *
>> l_quantity as amount
>> from
>> part,
>> supplier,
>> lineitem,
>> partsupp,
>> orders,
>> nation
>> where
>> s_suppkey = l_suppkey
>> and ps_suppkey = l_suppkey
>> and ps_partkey = l_partkey
>> and p_partkey = l_partkey
>> and o_orderkey = l_orderkey
>> and s_nationkey = n_nationkey
>> and p_name like '%orchid%'
>> ) as profit
>> group by
>> nation,
>> o_year
>> order by
>> nation,
>> o_year desc
>>
>



Re: HDD vs SSD without explanation

2018-07-17 Thread Neto pr
Thanks all, but I still have not figured it out.
This is really strange because the tests were done on the same machine
(an HP ProLiant ML110: 8 GB RAM, a Xeon 2.8 GHz processor with 4
cores, and PostgreSQL 10.1).
- Only the mentioned query was running at the time of the test.
- I repeated the query 7 times and the results did not change.
- Before running each batch of 7 executions, I dropped the operating
system cache and restarted the DBMS like this:
(echo 3 > /proc/sys/vm/drop_caches)

disks:
- 2 x Samsung Evo SSD, 500 GB (in RAID 0)
- 2 x SATA 7500 rpm HDD, 1 TB (in RAID 0)

- The operating system and the PostgreSQL DBMS are installed on the SSD.

Best Regards
[ ]`s Neto


2018-01-16 13:24 GMT-08:00 Mark Kirkwood :
>
>
> On 16/01/18 23:14, Neto pr wrote:
>>
>> 2018-01-15 20:04 GMT-08:00 Mark Kirkwood :
>>>
>>> On 16/01/18 13:18, Fernando Hevia wrote:
>>>


 The 6 Gb/s interface is capable of a maximum throughput of around 600
 MB/s. None of your drives can achieve that, so I don't think you are
 limited by the interface speed. The 12 Gb/s interface's speed advantage
 kicks in when there are several drives installed; it won't make a
 difference in a single-drive or even a two-drive system.

 But don't take my word for it. Test your drives throughput with the
 command Justin suggested so you know exactly what each drive is capable
 of:

  Can you reproduce the speed difference using dd ?
  time sudo dd if=/dev/sdX of=/dev/null bs=1M count=32K
  skip=$((128*$RANDOM/32)) # set bs to optimal_io_size


 While common sense says an SSD drive should outperform the mechanical one,
 your test scenario (large-volume sequential reads) evens out the field a
 lot. Still, I would have expected somewhat similar results, so yes, it is
 weird that the SAS drive doubles the SSD's performance. That is why I think
 there must be something else going on during your tests on the SSD server.
 It can also be that the SSD isn't working properly, or that you are running
 a suboptimal OS+server+controller configuration for the drive.

>>> I would second the analysis above - unless you see your read MB/s slammed
>>> up against 580-600 MB/s continuously, the interface speed is not the issue.
>>> We have some similar servers where we replaced 12x SAS with 1x SATA 6 Gbit/s
>>> (Intel DC S3710) SSD... and the latter way outperforms the original 12 SAS
>>> drives.
>>>
>>> I suspect the problem is the particular SSD you have - I have benchmarked
>>> the 256 GB EVO variant and was underwhelmed by its performance. These
>>> (budget) triple-cell NAND SSDs seem to have highly variable read and write
>>> performance (the write side is all about when the SLC NAND cache gets
>>> full)... the read side I'm not so sure about - but it could be a crappy
>>> chipset/firmware combination. In short, I'd recommend *not* using that
>>> particular SSD for a database workload. I'd recommend one of the Intel
>>> Datacenter DC range (FWIW I'm not affiliated with Intel in any way... but
>>> their DC stuff works well).
>>>
>>> regards
>>>
>>> Mark
>>
>> Hi Mark
>> In other forums someone told me that on the Samsung Evo the partition
>> should be aligned to 3072, not the default 2048, so it starts on an
>> erase-block boundary. And the filesystem block size should be 8 KB. I am
>> studying this too. Some DBAs have reported, in other situations, that
>> SSDs become very slow when they are nearly full. Mine is 85% full, so
>> maybe that is also an influence. I'm disappointed with this Samsung SSD
>> because, in theory, the read speed of an SSD should be more than 300
>> times faster than an HDD's, and that is not happening.
>>
>>
>
> Interesting - I didn't try changing the alignment. However, I could get the
> rated write and read performance in simple benchmarks (provided it was in a
> PCIe v3 slot)... so I figured it was OK with the default alignment. However,
> once more complex workloads were attempted (databases and a distributed
> object store), the performance was disappointing.
>
> If the SSD is 85% full, that will not help either (also look at the expected
> lifetime of these EVOs - not that great for a server)!
>
> One thing worth trying is messing about with the I/O scheduler: if you are
> using noop, then try deadline (like I said, crappy firmware)...
>
> Realistically, I'd recommend getting an enterprise/DC SSD (put the EVO in
> your workstation, it will be quite nice there)!
>
> Cheers
> Mark



Re: Why HDD performance is better than SSD in this case

2018-07-17 Thread Fabio Pardi
As already mentioned by Robert, please let us know whether you made sure
that nothing was fetched from RAM during the faster test.

In other words, make sure that all caches are dropped between one test
and another.

Also, to get a better picture of the situation, it would be good to know:

- which SSD (brand/model) are you using?
- which HDD?
- how are the disks configured? RAID? or not?
- on which OS?
- what are the mount options? SSDs require tuning
- did you make sure that no other query was running at the time of the
bench?
- are you making the comparison on the same machine?
- is it HW or a VM? benchmarks are better run on bare metal to avoid
polluted results (e.g. other VMs on the same hypervisor using the disk,
host caching and so on)
- how many times did you run the tests?
- did you change the postgres configuration between tests?
- can you post your postgres config?
- what about vacuums or maintenance tasks running in the background?

Also, to benchmark disks I would not use a custom query but pgbench.

Be aware: running benchmarks is a science, and therefore needs a scientific
approach :)

regards

fabio pardi



On 07/17/2018 07:00 AM, Neto pr wrote:
> Dear,
> Can some of you help me understand this?
>
> The execution plans below are for the query shown (query 9 of the TPC-H
> benchmark, scale factor 40, a database of approximately 40 GB).
>
> The experiment consisted of running the query on an HDD (RAID 0),
> then running the same query on an SSD (RAID 0).
>
> Why did the HDD (7200 rpm) perform better?
> HDD - TIME: 9 MINUTES
> SSD - TIME: 15 MINUTES
>
> As far as I know, the SSD's read speed should be 300 times faster than the HDD's.
> 
> --- Execution  Plans---
> ssd 40g
> https://explain.depesz.com/s/rHkh
> 
> hdd 40g
> https://explain.depesz.com/s/l4sq
> 
> Query 
> 
> select
> nation,
> o_year,
> sum(amount) as sum_profit
> from
> (
> select
> n_name as nation,
> extract(year from o_orderdate) as o_year,
> l_extendedprice * (1 - l_discount) - ps_supplycost *
> l_quantity as amount
> from
> part,
> supplier,
> lineitem,
> partsupp,
> orders,
> nation
> where
> s_suppkey = l_suppkey
> and ps_suppkey = l_suppkey
> and ps_partkey = l_partkey
> and p_partkey = l_partkey
> and o_orderkey = l_orderkey
> and s_nationkey = n_nationkey
> and p_name like '%orchid%'
> ) as profit
> group by
> nation,
> o_year
> order by
> nation,
> o_year desc
> 



Re: Why HDD performance is better than SSD in this case

2018-07-17 Thread Robert Zenz
> Why did the HDD (7200 rpm)  perform better?

Are these different systems? Have you ruled out that during the HDD test the
data was available in memory?

Re: Why HDD performance is better than SSD in this case

2018-07-17 Thread Mark Kirkwood
Can you post the make and model of the SSD concerned? In general, the cheaper
consumer-grade ones cannot do sustained reads/writes at anything like
their quoted maximum values.


regards

Mark


On 17/07/18 17:00, Neto pr wrote:

Dear,
Can some of you help me understand this?

The execution plans below are for the query shown (query 9 of the TPC-H
benchmark, scale factor 40, a database of approximately 40 GB).

The experiment consisted of running the query on an HDD (RAID 0),
then running the same query on an SSD (RAID 0).

Why did the HDD (7200 rpm) perform better?
HDD - TIME: 9 MINUTES
SSD - TIME: 15 MINUTES

As far as I know, the SSD's read speed should be 300 times faster than the HDD's.

--- Execution  Plans---
ssd 40g
https://explain.depesz.com/s/rHkh

hdd 40g
https://explain.depesz.com/s/l4sq

Query 

select
 nation,
 o_year,
 sum(amount) as sum_profit
from
 (
 select
 n_name as nation,
 extract(year from o_orderdate) as o_year,
 l_extendedprice * (1 - l_discount) - ps_supplycost *
l_quantity as amount
 from
 part,
 supplier,
 lineitem,
 partsupp,
 orders,
 nation
 where
 s_suppkey = l_suppkey
 and ps_suppkey = l_suppkey
 and ps_partkey = l_partkey
 and p_partkey = l_partkey
 and o_orderkey = l_orderkey
 and s_nationkey = n_nationkey
 and p_name like '%orchid%'
 ) as profit
group by
 nation,
 o_year
order by
 nation,
 o_year desc