Re: [GENERAL] clustered index benchmark comparing Postgresql vs Mariadb

2017-08-31 Thread Merlin Moncure
On Wed, Aug 30, 2017 at 9:03 PM, 유상지  wrote:

> I want to get help with Postgresql.
>
> I investigated that Postgresql could be rather fast in an environment
> using a secondary index. but It came up with different results on benckmark.
>
> The database I compared was mariadb, and the benchmark tool was sysbench
> 1.0.8 with the postgresql driver.
>
> Server environment: vmware, Ubuntu 17.04, processor: 4, RAM: 4 GB,
> Harddisk: 40 GB, Mariadb (v10.3), PostgreSQL (v9.6.4)
>
mysql and other systems (for example sql server) use a technique where the
table is automatically clustered around an index-- generally the primary
key.  This technique has some tradeoffs; the main upside is that lookups on
the pkey are somewhat faster whereas lookups on any other index are
somewhat slower and insertions can be slower in certain cases (especially
if guids are the pkey).  I would call this technique 'index organized
table'.  The technique exploits the persistent organization so that the
index is implied and does not have to be kept separate from the heap (the
main table data storage).

postgres 'cluster' command currently is a one time pass over the table that
organizes the table physically in index order but does not maintain the
table in that order nor does it exploit the ordering to eliminate the
primary key index in the manner that other systems do.   From a postgres
point of view, the main advantage is that scans (not single record lookups)
over the key will be sequential physical reads and will tend to have to
read less physical pages since adjacent key records logically will also be
adjacent physically.

For my part, I generally prefer the postgres style of organization for most
workloads, particularly for the surrogate key pattern. I would definitely
like to have the option of having the indexed organized style however.
It's definitely possible to tease out the tradeoffs in synthetic
benchmarking but in the gross aggregate I suspect (but can't obviously
prove) the technique is a loser since as database models mature the kinds
of ways tables are indexed looked up and joined tends to proliferate.

merlin


Re: [GENERAL] clustered index benchmark comparing Postgresql vs Mariadb

2017-08-31 Thread Thomas Kellerer
유상지 schrieb am 31.08.2017 um 04:03:
> Cluster secondary indexes were faster than those without cluster indexes in 
> pg, but slower than mariadb.

There is no such thing as a "clustered index" in Postgres. 

The Postgres "cluster" command physically sorts the rows of a table according 
to the sort order of an index, but that is something completely different then 
a "clustered index". The data is still stored in the index and the table. 

A clustered index in MariaDB/MySQL stores the entire table data. So the table 
and the index is the same thing (Oracle calls that an "index organized table" 
which describes this a lot better). As the table and index are the same thing 
you can't have two clustered indexes on the same table. 

An index in Postgres only stores the data of the indexed columns (plus an 
internal row identifier). There is no technical difference between a primary 
key index and any other index. The structure and storage is always the same. So 
the term "secondary index" does not really make sense in Postgres.






-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] clustered index benchmark comparing Postgresql vs Mariadb

2017-08-30 Thread Melvin Davidson
On Wed, Aug 30, 2017 at 10:03 PM, 유상지  wrote:

>
>
> I want to get help with Postgresql.
>
> I investigated that Postgresql could be rather fast in an environment
> using a secondary index. but It came up with different results on benckmark.
>
> The database I compared was mariadb, and the benchmark tool was sysbench
> 1.0.8 with the postgresql driver.
>
> Server environment: vmware, Ubuntu 17.04, processor: 4, RAM: 4 GB,
> Harddisk: 40 GB, Mariadb (v10.3), PostgreSQL (v9.6.4)
>
>
>
> The created sysbench progress statement is as follows.
>
> Sysbench /usr/share/sysbench/oltp_read_only.lua --db-driver = mysql
> --mysql-host = 127.0.0.1 --mysql-port = 3306 --mysql-user = root
> --mysql-password = ajdajddl75 - Mysql-db = sysbench --tables = 3
> --table_size = 10 --report-interval = 10 --secondary = on --time = 60
>
>
>
> Used options
>
> Select only, Num of threads = 1, num of tables = 3, table-size = 10
> and Table-size = 100, secondary index select instead of primary key.
>
>
>
>
>
>
>
>  My hypothesis was that  selecting by secondary index in postgresql is
> faster than in Mariadb. However, the results depend on table size.
>
>
>
> Postgresql was faster than Mariadb when the table size was 100, but
> slower at 10.
>
>
>
> Cluster secondary indexes were faster than those without cluster indexes
> in pg, but slower than mariadb.
>
>
>
> I'd like to see the difference in architecture rather than optimization,
> so every benchmark executed with default options except for clustered index.
>
>  I wonder if there are any settings I missed.
>
>
>
> I would be very pleased if someone could explain why these results came up.
>

>Postgresql was faster than Mariadb when the table size was 100, but
slower at 10.

You made a general statement, but you left out a lot of important
information.

A. Did you do an ANALYZE table_name BEFORE running your test?
B. Did you verify the index was being used with EXPLAIN your_query?
C. What was the exact query you used?
D. Most important, what is the structure of the table and index?
E. How much system memory is there?
F. In postgresql.conf What are the settings for
1. shared_memory
2. work_memory
3. All Planner Cost Constants values, All Genetic Query Optimizer values
-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


[GENERAL] clustered index benchmark comparing Postgresql vs Mariadb

2017-08-30 Thread 유상지
 
I want to get help with Postgresql.
I investigated that Postgresql could be rather fast in an environment using a 
secondary index. but It came up with different results on benckmark.
The database I compared was mariadb, and the benchmark tool was sysbench 1.0.8 
with the postgresql driver.
Server environment: vmware, Ubuntu 17.04, processor: 4, RAM: 4 GB, Harddisk: 40 
GB, Mariadb (v10.3), PostgreSQL (v9.6.4)
 
The created sysbench progress statement is as follows.
Sysbench /usr/share/sysbench/oltp_read_only.lua --db-driver = mysql 
--mysql-host = 127.0.0.1 --mysql-port = 3306 --mysql-user = root 
--mysql-password = ajdajddl75 - Mysql-db = sysbench --tables = 3 --table_size = 
10 --report-interval = 10 --secondary = on --time = 60 
 
Used options
Select only, Num of threads = 1, num of tables = 3, table-size = 10 and 
Table-size = 100, secondary index select instead of primary key.
 
 
 
 My hypothesis was that  selecting by secondary index in postgresql is faster 
than in Mariadb. However, the results depend on table size.
 
Postgresql was faster than Mariadb when the table size was 100, but slower 
at 10.
 
Cluster secondary indexes were faster than those without cluster indexes in pg, 
but slower than mariadb.
 
I'd like to see the difference in architecture rather than optimization, so 
every benchmark executed with default options except for clustered index.
 I wonder if there are any settings I missed.
 
I would be very pleased if someone could explain why these results came up.