Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)

2016-08-15 Thread Andrew Borodin
> So on average in a large randomly filled index, pages
> spend more time nearer 50% full than 100% full.

I think we can make this number more... controllable.
Before splitting, we can check whether the left and right pages are both in
shared buffers and, if they are well under a certain fillfactor, say
under 75%, we can move some of the data there instead of splitting. This
will slow down insertions a bit, but we can have any fillfactor we
want for random inserts. For sure, someone could construct bad
input to force a low fillfactor, as with qsort (
http://www.cs.dartmouth.edu/~doug/mdmspe.pdf ), but no real
scenario will trigger this behavior.

But then we have to think about locks, WALs etc.


Best regards, Andrey Borodin, Octonica & Ural Federal University.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)

2016-08-15 Thread Mark Kirkwood

On 13/08/16 05:44, Jeff Janes wrote:

> On Fri, Aug 12, 2016 at 1:40 AM, Mark Kirkwood wrote:
>
>> However your index rebuild gets you from 5 to 3 GB - does that really help
>> performance significantly?
>
> It can make a big difference, depending on how much RAM you have.



Yeah - I suspect this is the issue - loading up a similar type of schema 
with records with a primary key of the form 'user<x>' for (uniformly) 
randomly distributed x... (I was gonna use the Yahoo benchmark but 
it is so slow...). Also I'm using 10000000 rows instead of 100000000 
to avoid waiting a long time (10000000 should be enough to show the point):


prefix=# \d prefix
            Table "public.prefix"
 Column |         Type          | Modifiers
--------+-----------------------+-----------
 uid    | character varying(30) | not null
 filler | character(255)        |
Indexes:
    "prefix_pkey" PRIMARY KEY, btree (uid)

Doing an uncached indexed read by forcing a buffer cache clear:

# echo 3 > /proc/sys/vm/drop_caches

prefix=# SELECT relfilenode, relname, reltuples,
                pg_relation_size(oid)/1024/1024 AS mb
         FROM pg_class WHERE relname LIKE 'prefix%';
 relfilenode |   relname   | reltuples | mb
-------------+-------------+-----------+-----
     6017817 | prefix      |     1e+07 | 422
     6017819 | prefix_pkey |     1e+07 | 391
(2 rows)

prefix=# EXPLAIN ANALYZE SELECT count(*)
         FROM prefix WHERE uid='user1';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=8.46..8.46 rows=1 width=0) (actual time=3.408..3.408 rows=1 loops=1)
   ->  Index Only Scan using prefix_pkey on prefix  (cost=0.43..8.45 rows=1 width=0) (actual time=3.406..3.406 rows=0 loops=1)
         Index Cond: (uid = 'user1'::text)
         Heap Fetches: 0
 Planning time: 19.362 ms
 Execution time: 3.429 ms
(6 rows)

Repeating this after REINDEX:

# echo 3 > /proc/sys/vm/drop_caches

prefix=# SELECT relfilenode, relname, reltuples,
                pg_relation_size(oid)/1024/1024 AS mb
         FROM pg_class WHERE relname LIKE 'prefix%';
 relfilenode |   relname   | reltuples | mb
-------------+-------------+-----------+-----
     6017817 | prefix      |     1e+07 | 422
     6017819 | prefix_pkey |     1e+07 | 300
(2 rows)

prefix=# EXPLAIN ANALYZE SELECT count(*)
         FROM prefix WHERE uid='user1';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=8.46..8.46 rows=1 width=0) (actual time=3.868..3.868 rows=1 loops=1)
   ->  Index Only Scan using prefix_pkey on prefix  (cost=0.43..8.45 rows=1 width=0) (actual time=3.866..3.866 rows=0 loops=1)
         Index Cond: (uid = 'user1'::text)
         Heap Fetches: 0
 Planning time: 19.366 ms
 Execution time: 3.889 ms
(6 rows)

So certainly not significantly *slower* with the physically bigger 
index. This suggests that Jeff's analysis was spot on - likely that the 
larger index didn't fit in RAM.


Cheers

Mark




Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)

2016-08-12 Thread Greg Stark
On Sat, Aug 13, 2016 at 1:18 AM, Andrew Gierth wrote:
>
> Hmm? The code in _bt_findsplitloc and _bt_checksplitloc doesn't seem to
> agree with this.
>
> (Inserting on the high leaf page is a special case, which is where the
> fillfactor logic kicks in; that's why sequentially filled indexes are
> (by default) 90% full rather than 100%. But other pages split into
> roughly equal halves.)

Hm, I was going from this lore. I didn't realize it was only for
inserts near the end of the index. That's cleverer than I realized.

commit 1663f3383849968415d29965ef9bfdf5aac4d358
Author: Tom Lane 
Date:   Sat Sep 29 23:49:51 2001 +

Tweak btree page split logic so that when splitting a page that is
rightmost on its tree level, we split 2/3 to the left and 1/3 to the
new right page, rather than the even split we use elsewhere.  The idea
is that when faced with a steadily increasing series of inserted keys
(such as sequence or timestamp values), we'll end up with a btree that's
about 2/3ds full not 1/2 full, which is much closer to the desired
steady-state load for a btree.  Per suggestion from Ann Harrison of
IBPhoenix.
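The effect of that rightmost-page split rule is easy to see in a toy model (Python, illustrative only; the page capacity is an assumption and interior pages are ignored). With monotonically increasing keys, only the rightmost leaf ever receives inserts, so every left-behind page keeps exactly the share of tuples the split rule gave it:

```python
# Sequentially increasing keys only ever land on the rightmost leaf, so
# the steady-state density is set entirely by how the rightmost page is
# split: each finished (left-behind) page keeps `left_share` tuples.
CAPACITY = 100  # tuples per leaf page (an assumption, for illustration)

def sequential_fill(n_tuples, left_share):
    """Average fill of the finished pages when each split of the
    rightmost leaf leaves `left_share` tuples on the left page."""
    pages = []   # tuple counts of finished leaves
    right = 0    # tuples on the current rightmost leaf
    for _ in range(n_tuples):
        right += 1
        if right == CAPACITY:
            pages.append(left_share)        # left page is done
            right = CAPACITY - left_share   # rest stay on the new right page
    return sum(pages) / (len(pages) * CAPACITY)

print(f"even 50/50 split        : {sequential_fill(1_000_000, 50):.0%}")  # 50%
print(f"2/3 left (2001 tweak)   : {sequential_fill(1_000_000, 67):.0%}")  # 67%
print(f"fillfactor 90 (default) : {sequential_fill(1_000_000, 90):.0%}")  # 90%
```

The 2/3 rule from the commit gives roughly 67%; the modern btree fillfactor default of 90 gives 90%; an even split would leave a sequentially loaded index only half full.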


-- 
greg




Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)

2016-08-12 Thread Andrew Gierth
> "Greg" == Greg Stark  writes:

 >> No, because as the pages split, they fill more slowly (because there
 >> are now more pages). So on average in a large randomly filled index,
 >> pages spend more time nearer 50% full than 100% full. This is easy
 >> to demonstrate by creating a table with an indexed float8 column and
 >> adding batches of random() values to it, checking with pgstatindex
 >> at intervals - the average leaf density will rarely exceed 70%.
 >> 
 >> However, worst case conditions can give lower leaf densities;
 >> obviously the worst case is if the data is loaded in an order and
 >> quantity that just happens to leave every leaf page recently split.

 Greg> btree pages don't split 50/50 either. They split biased to assume
 Greg> the greater side of the split will receive more inserts -- iirc
 Greg> 70/30.

Hmm? The code in _bt_findsplitloc and _bt_checksplitloc doesn't seem to
agree with this.

(Inserting on the high leaf page is a special case, which is where the
fillfactor logic kicks in; that's why sequentially filled indexes are
(by default) 90% full rather than 100%. But other pages split into
roughly equal halves.)

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)

2016-08-12 Thread Greg Stark
On Fri, Aug 12, 2016 at 8:13 PM, Andrew Gierth wrote:
> No, because as the pages split, they fill more slowly (because there are
> now more pages). So on average in a large randomly filled index, pages
> spend more time nearer 50% full than 100% full. This is easy to
> demonstrate by creating a table with an indexed float8 column and adding
> batches of random() values to it, checking with pgstatindex at intervals -
> the average leaf density will rarely exceed 70%.
>
> However, worst case conditions can give lower leaf densities; obviously
> the worst case is if the data is loaded in an order and quantity that
> just happens to leave every leaf page recently split.


btree pages don't split 50/50 either. They split biased on the assumption
that the greater side of the split will receive more inserts -- iirc
70/30. So if they're loaded sequentially you should get a btree that's
70% full, but the worst case is in theory closer to 30%, though I think
the insert order would have to be pretty perverse to be worse than 50%.

-- 
greg




Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)

2016-08-12 Thread Andrew Gierth
> "Jeff" == Jeff Janes  writes:

 Jeff> But shouldn't that still leave us with a 75% full index, rather
 Jeff> than slightly over 50% full?

Average is usually about 67%-70%. (For capacity estimation I always
assume 66% for a non-sequentially-filled btree.)

 Jeff> The leaf pages start at 50%, grow to 100%, then split back to
 Jeff> 50%, then grow back to 100%.  So the average should be about 75%.

No, because as the pages split, they fill more slowly (because there are
now more pages). So on average in a large randomly filled index, pages
spend more time nearer 50% full than 100% full. This is easy to
demonstrate by creating a table with an indexed float8 column and adding
batches of random() values to it, checking with pgstatindex at intervals -
the average leaf density will rarely exceed 70%.

However, worst case conditions can give lower leaf densities; obviously
the worst case is if the data is loaded in an order and quantity that
just happens to leave every leaf page recently split.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)

2016-08-12 Thread Jeff Janes
On Fri, Aug 12, 2016 at 1:40 AM, Mark Kirkwood wrote:
> After examining the benchmark design - I see we are probably not being
> helped by the repeated insertion of keys all of form 'userxxx' leading
> to some page splitting.

But shouldn't that still leave us with a 75% full index, rather than
slightly over 50% full?

The leaf pages start at 50%, grow to 100%, then split back to 50%,
then grow back to 100%.  So the average should be about 75%.


> However your index rebuild gets you from 5 to 3 GB - does that really help
> performance significantly?

It can make a big difference, depending on how much RAM you have.

Cheers,

Jeff




Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)

2016-08-12 Thread Kisung Kim
You're right. Reindexing improves the performance of the benchmark workloads
dramatically.
I'm gathering the results and will announce them.

But I think we should note that the results before reindexing are poorer
than MongoDB's.
It seems that this is because of Btree bloat (perhaps not the exact term).
Lookup performance on the Btree is the most crucial factor in the results,
because the workload is selects by primary key.
So a larger Btree could mean fewer cache hits and slower query performance.
I think that PG packs leaf nodes not to 90% but to half, to leave room for
future insertions, and because of this PG's btree is larger than MongoDB's
(I turned off prefix compression for the WiredTiger index and block compression
for storage).
But MongoDB (actually WiredTiger) seems to do it differently.

Is my speculation right? I'm not sure because I haven't reviewed the btree
code of PG yet.

And I want to point out that the loading performance of MongoDB is better than
PG's.
If PG left more space for future insertions, could we then at least get
faster loading performance?
And can we then conclude that there is room to improve the Btree of PG?

Best Regards,



On Fri, Aug 12, 2016 at 5:40 PM, Mark Kirkwood <mark.kirkw...@catalyst.net.nz> wrote:

> After examining the benchmark design - I see we are probably not being
> helped by the repeated insertion of keys all of form 'userxxx' leading
> to some page splitting.
>
> However your index rebuild gets you from 5 to 3 GB - does that really help
> performance significantly?
>
> regards
>
> Mark
>
>
> On 11/08/16 16:08, Kisung Kim wrote:
>
>> Thank you for your information.
>> Here is the result:
>>
>> After insertions:
>>
>> ycsb=# select * from pgstatindex('usertable_pkey');
>>  version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation
>> ---------+------------+------------+---------------+----------------+------------+-------------+---------------+------------------+--------------------
>>        2 |          3 | 5488721920 |         44337 |           4464 |     665545 |           0 |             0 |               52 |                 11
>> (1 row)
>>
>> After rebuild:
>>
>>
>> ycsb=# select * from pgstatindex('usertable_pkey');
>>  version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation
>> ---------+------------+------------+---------------+----------------+------------+-------------+---------------+------------------+--------------------
>>        2 |          3 | 3154296832 |         41827 |           1899 |     383146 |           0 |             0 |            90.08 |                  0
>> (1 row)
>>
>>
>> It seems like that rebuild has an effect to reduce the number of internal
>> and leaf_pages and make more dense leaf pages.
>>
>>
>>
>>
>


-- 




Bitnine Global Inc., Kisung Kim, Ph.D
https://sites.google.com/site/kisungresearch/
E-mail : ks...@bitnine.net
Office phone : 070-4800-5890, 408-606-8602
US Mobile phone : 408-805-2192


Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)

2016-08-12 Thread Mark Kirkwood
After examining the benchmark design - I see we are probably not being 
helped by the repeated insertion of keys all of form 'userxxx' 
leading to some page splitting.


However your index rebuild gets you from 5 to 3 GB - does that really 
help performance significantly?


regards

Mark

On 11/08/16 16:08, Kisung Kim wrote:

Thank you for your information.
Here is the result:

After insertions:

ycsb=# select * from pgstatindex('usertable_pkey');
 version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation
---------+------------+------------+---------------+----------------+------------+-------------+---------------+------------------+--------------------
       2 |          3 | 5488721920 |         44337 |           4464 |     665545 |           0 |             0 |               52 |                 11
(1 row)

After rebuild:


ycsb=# select * from pgstatindex('usertable_pkey');
 version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation
---------+------------+------------+---------------+----------------+------------+-------------+---------------+------------------+--------------------
       2 |          3 | 3154296832 |         41827 |           1899 |     383146 |           0 |             0 |            90.08 |                  0
(1 row)



It seems like that rebuild has an effect to reduce the number of 
internal and leaf_pages and make more dense leaf pages.










Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)

2016-08-10 Thread Kisung Kim
Thank you for your information.
Here is the result:

After insertions:

ycsb=# select * from pgstatindex('usertable_pkey');
 version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation
---------+------------+------------+---------------+----------------+------------+-------------+---------------+------------------+--------------------
       2 |          3 | 5488721920 |         44337 |           4464 |     665545 |           0 |             0 |               52 |                 11
(1 row)

After rebuild:


ycsb=# select * from pgstatindex('usertable_pkey');
 version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation
---------+------------+------------+---------------+----------------+------------+-------------+---------------+------------------+--------------------
       2 |          3 | 3154296832 |         41827 |           1899 |     383146 |           0 |             0 |            90.08 |                  0
(1 row)


It seems that the rebuild reduces the number of internal and leaf
pages and makes the leaf pages denser.



On Wed, Aug 10, 2016 at 6:47 PM, Lukas Fittl  wrote:

> On Wed, Aug 10, 2016 at 4:24 PM, Kisung Kim  wrote:
>>
>> When I used the index bloating estimation script in
>> https://github.com/ioguix/pgsql-bloat-estimation,
>> the result is as follows:
>>
>
> Regardless of the issue at hand, it might make sense to verify these
> statistics using pgstattuple - those bloat estimation queries can be wildly
> off at times.
>
> See also https://www.postgresql.org/docs/9.5/static/pgstattuple.html as
> well as https://www.keithf4.com/checking-for-postgresql-bloat/
>
> Best,
> Lukas
>
> --
> Lukas Fittl
>
> Skype: lfittl
> Phone: +1 415 321 0630
>



-- 




Bitnine Global Inc., Kisung Kim, Ph.D
https://sites.google.com/site/kisungresearch/
E-mail : ks...@bitnine.net
Office phone : 070-4800-5890, 408-606-8602
US Mobile phone : 408-805-2192


Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)

2016-08-10 Thread Lukas Fittl
On Wed, Aug 10, 2016 at 4:24 PM, Kisung Kim  wrote:
>
> When I used the index bloating estimation script in
> https://github.com/ioguix/pgsql-bloat-estimation,
> the result is as follows:
>

Regardless of the issue at hand, it might make sense to verify these
statistics using pgstattuple - those bloat estimation queries can be wildly
off at times.

See also https://www.postgresql.org/docs/9.5/static/pgstattuple.html as
well as https://www.keithf4.com/checking-for-postgresql-bloat/

Best,
Lukas

-- 
Lukas Fittl

Skype: lfittl
Phone: +1 415 321 0630


[HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)

2016-08-10 Thread Kisung Kim
Hi,

I've run YCSB (Yahoo! Cloud Serving Benchmark) on PostgreSQL and on MongoDB
with WiredTiger,
and I found some interesting results and some issues (maybe) in the Btree index
of PostgreSQL.

Here are my experiments and results.
YCSB is a document store benchmark, and I built the following schema in PG:

CREATE TABLE usertable (
YCSB_KEY varchar(30) primary key,
FIELDS jsonb);

The benchmark generated JSON documents of about 300 bytes on average and
loaded 100M rows into PG and Mongo.

First I found a size difference between PG and Mongo
(I configured Mongo not to use any compression for either storage or index):

MongoDB index size: 2.1 GB
PostgreSQL index size: 5.5 GB

When I used the index bloating estimation script in
https://github.com/ioguix/pgsql-bloat-estimation,
the result is as follows:
 current_database | schemaname |  tblname  |    idxname     | real_size  | extra_size |   extra_ratio    | fillfactor | bloat_size |    bloat_ratio    | is_na
------------------+------------+-----------+----------------+------------+------------+------------------+------------+------------+-------------------+-------
 ycsb             | public     | usertable | usertable_pkey | 5488852992 | 2673713152 | 48.7116917850949 |         90 | 2362122240 |  43.0348971532448 | f

It says that the bloat_ratio is 43 for the index.

So I rebuilt the index and the result changed:

 current_database | schemaname |  tblname  |    idxname     | real_size  | extra_size |   extra_ratio    | fillfactor | bloat_size |    bloat_ratio    | is_na
------------------+------------+-----------+----------------+------------+------------+------------------+------------+------------+-------------------+-------
 ycsb             | public     | usertable | usertable_pkey | 3154264064 |  339132416 | 10.7515543758863 |         90 |   27533312 | 0.872891788428275 | f


I am curious about the results:
1) Why was the index bloated even though rows were only inserted, never
removed or updated?
2) And why is the bloat gone after the index is rebuilt?

I guess that the splits of Btree nodes during row insertion caused the
bloat, but I don't know the exact reasons.
And given that MongoDB's index size is much smaller than PG's after they
handled the same workload (insert only),
is there any chance to improve PG's index behavior?

Thank you very much.


-- 




Bitnine Global Inc., Kisung Kim, Ph.D
https://sites.google.com/site/kisungresearch/
E-mail : ks...@bitnine.net
Office phone : 070-4800-5890, 408-606-8602
US Mobile phone : 408-805-2192