Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-10-01 Thread Daniel Gustafsson
> On 06 Sep 2017, at 08:42, Fabien COELHO  wrote:
> 
> Hello Alik,
> 
> Applies, compiles, works for me.
> 
> Some minor comments and suggestions.
> 
> Two typos:
> - "usinng" -> "using"
> - "a rejection method used" -> "a rejection method is used"
> 
> I'm not sure about "least_recently_used_i"; this naming style is not used in
> pgbench. "least_recently_used" would be ok.
> 
> "..nb_cells.. != ZIPF_CACHE_SIZE", ISTM that "<" is more logical,
> even if the result is the same?
> 
> I would put the parameter value check in getZipfianRand, so that if someone 
> reuses the function elsewhere the check is also performed. That would also 
> simplify a bit the already very large expression evaluation function.
> 
> When/if the pgbench tap test patch gets through, coverage tests should
> be added.
> 
> Maybe the cache overflow could be counted, to allow for a possible warning 
> message in the final report?

Since this patch has been Waiting for author and no update on this patch has
been posted during the commitfest, it is Returned with feedback.  When you have
a new version of the patch, please re-submit to a future commitfest.

cheers ./daniel



Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-09-06 Thread Fabien COELHO


Hello Alik,

Applies, compiles, works for me.

Some minor comments and suggestions.

Two typos:
 - "usinng" -> "using"
 - "a rejection method used" -> "a rejection method is used"

I'm not sure about "least_recently_used_i"; this naming style is not used in
pgbench. "least_recently_used" would be ok.


"..nb_cells.. != ZIPF_CACHE_SIZE", ISTM that "<" is more logical,
even if the result is the same?

I would put the parameter value check in getZipfianRand, so that if 
someone reuses the function elsewhere the check is also performed. That 
would also simplify a bit the already very large expression evaluation 
function.


When/if the pgbench tap test patch gets through, coverage tests should
be added.

Maybe the cache overflow could be counted, to allow for a possible warning 
message in the final report?


--
Fabien.




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-09-02 Thread Alik Khilazhev
Hello Fabien,

Thank you for detailed review. I hope I have fixed all the issues you mentioned 
in your letter.



pgbench-zipf-08v.patch
Description: Binary data

—
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-08-23 Thread Fabien COELHO


Hello Alik,


I am attaching patch v7.


Patch generates multiple warnings with "git apply", apparently because of 
end-of-line spaces, and fails:


  pgbench-zipf-07v.patch:52: trailing whitespace.
{
  pgbench-zipf-07v.patch:53: trailing whitespace.
"random_zipfian", 3, PGBENCH_RANDOM_ZIPFIAN
  pgbench-zipf-07v.patch:54: trailing whitespace.
},
  pgbench-zipf-07v.patch:66: trailing whitespace.
  #define ZIPF_CACHE_SIZE 15  /* cache cells number */
  pgbench-zipf-07v.patch:67: trailing whitespace.

  error: patch failed: doc/src/sgml/ref/pgbench.sgml:1013
  error: doc/src/sgml/ref/pgbench.sgml: patch does not apply
  error: patch failed: src/bin/pgbench/exprparse.y:191
  error: src/bin/pgbench/exprparse.y: patch does not apply
  error: patch failed: src/bin/pgbench/pgbench.c:93
  error: src/bin/pgbench/pgbench.c: patch does not apply
  error: patch failed: src/bin/pgbench/pgbench.h:75
  error: src/bin/pgbench/pgbench.h: patch does not apply

It also complains with "patch", although it seems to work:

 (Stripping trailing CRs from patch; use --binary to disable.)
 patching file doc/src/sgml/ref/pgbench.sgml
 (Stripping trailing CRs from patch; use --binary to disable.)
 patching file src/bin/pgbench/exprparse.y
 (Stripping trailing CRs from patch; use --binary to disable.)
 patching file src/bin/pgbench/pgbench.c
 (Stripping trailing CRs from patch; use --binary to disable.)
 patching file src/bin/pgbench/pgbench.h
 patch unexpectedly ends in middle of line
 patch unexpectedly ends in middle of line

Please do not put spaces at the end of lines, eg there is a lonely tab on 
the second line of "computeHarmonicZipfian".


It seems that "CR" characters are used:

  sh> sha1sum ~/pgbench-zipf-07v.patch
  439ad1de0a960b3b415ff6de9412b3cc4d6922e7

  sh> file ~/pgbench-zipf-07v.patch
  pgbench-zipf-07v.patch: unified diff output, ASCII text, with CRLF line 
terminators

Compilation issues one warning:

 pgbench.c: In function ‘computeHarmonicZipfian’:
 pgbench.c:894:2: warning: ISO C90 forbids mixed declarations and code 
[-Wdeclaration-after-statement]
double  uniform = pg_erand48(thread->random_state);

Please conform to pg standard for portability, even if it is a very old one:-)


About the documentation:

I would suggest to replace 0.5 in the first example by 1.5, as a zipfian 
distribution with a lower-than-one parameter is not that well defined.


I'm fine with using references to papers or books for the algorithm.

The documentation lines in the sgml file are too long. At least
limit text paragraphs to maybe 80 columns as done elsewhere.

typo: " Zipfian" (remove space)

typo: "used(described" (missing space)

typo: "numbers(algorithm" (again)

The sentence "It's recommended to pick parameter in range 
(0, 1) for better accuracy." does not make sense with the updated 
generation algorithm.


The text would benefit from a review by a native English speaker, who I am 
not. Anyway here is an attempt at improving/simplifying the text (I have 
removed sgml formatting). If some native English speaker could review it, 
that would be great.


"""
  "random_zipfian" generates an approximated bounded zipfian distribution.
  For parameter in (0,1), an approximated algorithm is taken from
  "Quickly Generating Billion-Record Synthetic Databases", Jim Gray et al, 
SIGMOD 1994.
  For parameter in (1, 1000), a rejection method is used, based on
  "Non-Uniform Random Variate Generation", Luc Devroye, p. 550-551, Springer 
1986.
  The distribution is not defined when the parameter's value is 1.0.
  The drawing performance is poor for parameter values close to and above 1.0
  and on a small range.

  "parameter" defines how skewed the distribution is.
  The larger the "parameter", the more frequently values close to the beginning
  of the interval are drawn.
  The closer to 0 "parameter" is, the flatter (more uniform) the access 
distribution.
"""


English in comments and code:

"inited" is not English, I think you want "initialized". Maybe "nb_cells" 
would do.


"it is not exist" (multiple instances) -> "it does not exist".

I think that the references to the paper/book should appear as comments in 
the code, for instance in front of the functions which implement the 
algorithm.


"current_time", "last_touch_time" are counter based, they are not "time". 
I suggest updating the names to "current" and "last_touch" or "last_used".


"time when cell was used last time" -> "last used logical time"

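For illustration, the cell lookup could then look something like this (just a
sketch, not tested; the struct and field names are assumptions modeled on the
patch, with the renames suggested above, int64 being the usual pg typedef):

  #define ZIPF_CACHE_SIZE 15

  typedef struct
  {
      double    s;          /* zipfian parameter */
      int64     n;          /* number of possible values */
      double    harmonicn;  /* cached generalized harmonic number H_{n,s} */
      int64     last_used;  /* last used logical time */
  } ZipfCell;

  typedef struct
  {
      ZipfCell  cells[ZIPF_CACHE_SIZE]; /* constant size: a plain array */
      int       nb_cells;               /* number of initialized cells */
      int64     current;                /* logical time, ticks on each lookup */
      int64     overflows;              /* evictions, could feed a warning */
  } ZipfCache;

  static ZipfCell *
  zipfCacheLookup(ZipfCache *cache, double s, int64 n)
  {
      int  least_recently_used = 0;
      int  i;

      cache->current++;
      for (i = 0; i < cache->nb_cells; i++)
      {
          if (cache->cells[i].s == s && cache->cells[i].n == n)
          {
              cache->cells[i].last_used = cache->current;
              return &cache->cells[i];  /* hit */
          }
          if (cache->cells[i].last_used <
              cache->cells[least_recently_used].last_used)
              least_recently_used = i;
      }

      /* miss: take a free cell if any, else evict the least recently used */
      if (cache->nb_cells < ZIPF_CACHE_SIZE)
          least_recently_used = cache->nb_cells++;
      else
          cache->overflows++;

      cache->cells[least_recently_used].s = s;
      cache->cells[least_recently_used].n = n;
      cache->cells[least_recently_used].last_used = cache->current;
      /* harmonicn would be (re)computed here for the new (s, n) */
      return &cache->cells[least_recently_used];
  }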

About the code:

I would prefer that you use "1.0" instead of "1." for double constants.

I think that getZipfianRand should be symmetric. Maybe a direct formula 
could do:


  return min - 1 + ((s > 1) ? xxx() : yyy());

or at least use an explicit "else" for symmetry.
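
Eg something along these lines (a sketch only, reusing the computeXXXZipfian
names from the patch; MAX_ZIPFIAN_PARAM is a placeholder for whatever bound
is chosen):

  static int64
  getZipfianRand(TState *thread, int64 min, int64 max, double s)
  {
      int64  n = max - min + 1;

      /* the parameter value check suggested above would live here */
      Assert(s > 0.0 && s != 1.0 && s <= MAX_ZIPFIAN_PARAM);

      return min - 1 + ((s > 1.0)
                        ? computeIterativeZipfian(thread, n, s)
                        : computeHarmonicZipfian(thread, n, s));
  }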

Function "zipfn" asks for s and n as arguments, but ISTM that they are
already included as fields in cell, so this seems redundant. Moreover I do
not think that the zipfn function is that useful. I would suggest to
Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-08-22 Thread Alik Khilazhev
Hello, Fabien

I am attaching patch v7.

> Yes, I agree. a >= 1 does not make much sense... If you want uniform you 
> should use random(), not call random_zipfian with a = 1. Basically it 
> suggests that too large values of "a" should be rejected. Not sure where to 
> put the limit, though.

I set the upper bound for the parameter to 1000.
> 
> Yes, as a general principle I think that the documentation should reflect the 
> implementation.

The documentation has been updated: I have removed the algorithm descriptions
and put references to them there.



pgbench-zipf-07v.patch
Description: Binary data
—
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-08-13 Thread Fabien COELHO


Hello Alik,


Now “a” does not have an upper bound; that’s why, when the iterative algorithm
is used with a >= 1, the program will get stuck in an infinite loop because of
the following line of code:
double b = pow(2.0, s - 1.0);
since after overflow “b” becomes “+Inf”.


Yep, overflow can happen.


So should an upper bound for “a” be set?


Yes, I agree. a >= 1 does not make much sense... If you want uniform 
you should use random(), not call random_zipfian with a = 1. Basically 
it suggests that too large values of "a" should be rejected. Not sure 
where to put the limit, though.


Should I mention in the docs that two algorithms are used
depending on the value of a (s/theta)?


Yes, as a general principle I think that the documentation should reflect 
the implementation.


In the attached patch, I have added a computeIterativeZipfian method and its
usage in getZipfianRand. Would it be better to move the cache-based
computation to a new method, so that getZipfianRand will contain only the 2
computeXXXZipfian method calls?


I have not looked in detail, but from what you say I would agree that the 
implementation should be symmetric, so having one function calling one 
method or the other sounds good.


--
Fabien.


Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-08-13 Thread Alik Khilazhev
Hello Fabien,

> 
> I think that this method should be used for a>1, and the other very rough one 
> can be kept for parameter a in [0, 1), a case which does not make much sense 
> to a mathematician as it diverges if unbounded.

Now “a” does not have an upper bound; that’s why, when the iterative algorithm
is used with a >= 1, the program will get stuck in an infinite loop because of
the following line of code:
double b = pow(2.0, s - 1.0);
since after overflow “b” becomes “+Inf”.
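
Here is a small standalone illustration of that failure mode (the value of s
is hypothetical, just large enough to overflow a double):

  #include <math.h>
  #include <stdio.h>

  int
  main(void)
  {
      double  s = 2000.0;             /* hypothetical, absurdly large value */
      double  b = pow(2.0, s - 1.0);  /* exceeds DBL_MAX, so b == +Inf */

      /* the acceptance test of the rejection method can then never succeed */
      printf("b = %g, isinf(b) = %d\n", b, isinf(b));
      return 0;
  }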

So should an upper bound for “a” be set?

Should I mention in the docs that two algorithms are used depending on the
value of a (s/theta)?

In the attached patch, I have added a computeIterativeZipfian method and its
usage in getZipfianRand.
Would it be better to move the cache-based computation to a new method, so that
getZipfianRand will contain only the 2 computeXXXZipfian method calls?


pgbench-zipf-06v.patch
Description: Binary data
—
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-08-06 Thread Alik Khilazhev
Hello Fabien,


> On 5 Aug 2017, at 12:15, Fabien COELHO  wrote:
> 
> 
> Hello Alik,
> 
> I've done some math investigations, which consisted in spending one hour with 
> Christian, a statistician colleague of mine. He took an old book out of a 
> shelf, opened it to page 550 (roughly in the middle), and explained to me how 
> to build a real zipfian distribution random generator.
> 
> The iterative method is for parameter a>1 and works for unbounded values. It 
> is simple to add a bound. In practice the iterative method is quite 
> effective, i.e. number of iterations is typically small, at least if the 
> bound is large and if parameter a is not too close to 1.

> I've attached a python3 script which implements the algorithm. It looks like 
> magic. Beware that a C implementation should take care of float and int 
> overflows.
> 
Thank you for the script. I will rewrite it in C and add it to the patch soon.

—
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-08-05 Thread Fabien COELHO


Hello Alik,

So I would be in favor of expanding the documentation but not 
restricting the parameter beyond avoiding value 1.0.


I have removed the restriction and expanded the documentation in the attached patch v5.


I've done some math investigations, which consisted in spending one hour 
with Christian, a statistician colleague of mine. He took an old book out 
of a shelf, opened it to page 550 (roughly in the middle), and explained 
to me how to build a real zipfian distribution random generator.


The iterative method is for parameter a>1 and works for unbounded values. 
It is simple to add a bound. In practice the iterative method is quite 
effective, i.e. number of iterations is typically small, at least if the 
bound is large and if parameter a is not too close to 1.


I've attached a python3 script which implements the algorithm. It looks 
like magic. Beware that a C implementation should take care of float and 
int overflows.


  # usage: a, #values, #tests

  sh> zipf.py 1.5 1000 100
  # after 1.7 seconds
  c = [391586, 138668, 75525, 49339, 35222, 26621, ...
   ... 11, 13, 12, 11, 16] (1.338591 iterations per draw)

  sh> zipf.py 1.1 1000 100
  # after 3.1 seconds
  c = [179302, 83927, 53104, 39015, 30557, 25164, ...
   ... 82, 95, 93, 81, 80] (2.681451 iterations per draw)

I think that this method should be used for a>1, and the other very rough 
one can be kept for parameter a in [0, 1), a case which does not make much 
sense to a mathematician as it diverges if unbounded.


--
Fabien.

#! /usr/bin/env python3
#
# generate Zipf distribution
#
# method taken from:
#   Luc Devroye,
#   "Non-Uniform Random Variate Generation"
#   p. 550-551.
#   Springer 1986
#
# the method works for an infinite bound, the finite bound condition has been
# added.

a = 1.1
N = 100
M = 1

import sys
if len(sys.argv) >= 3:
    a = float(sys.argv[1])
    N = int(sys.argv[2])
if len(sys.argv) >= 4:
    M = int(sys.argv[3])

# beware, a close to 1 and n small (eg 100) leads to large number of iterations
# i.e. rejection probability is high when a -> 1
# - 1.001: 280
# - 1.002: 139.2
# - 1.005:  55.9
# - 1.010:  28.4
# - 1.020:  14.8
# - 1.050:   6.2
# - 1.100:   3.5
# however if n is larger the number of iterations decreases significantly

from random import random
from math import exp

def zipfgen(a, N):
    assert a > 1.0, "a must be greater than 1"
    b = 2.0 ** (a - 1.0)
    i = 0  # count iterations
    while True:
        i += 1
        u, v = random(), random()
        try:
            x = int(u ** (- 1.0 / (a - 1.0)))
            t = (1.0 + 1.0 / x) ** (a - 1.0)
            # reject if too large or out of bound
            if v * x * (t - 1.0) / (b - 1.0) <= t / b and x <= N:
                break
        except OverflowError:  # on u ** ...
            pass
    return (x, i)

if M == 1:
    x, i = zipfgen(a, N)
    print("X = %d (%d)" % (x, i))
else:
    c = [0 for i in range(0, N)]
    cost = 0
    for i in range(0, M):
        x, i = zipfgen(a, N)
        # assert 1 <= x and x <= N, "x = %d" % x
        cost += i
        c[x-1] += 1
    print("c = %s (%f iterations per draw)" % (c, cost/M))



Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-08-05 Thread Fabien COELHO


Hello Peter,

I think that it would also be nice if there was an option to make 
functions like random_zipfian() actually return a value that has 
undergone perfect hashing. When this option is used, any given value 
that the function returns would actually be taken from a random mapping 
to some other value in the same range. So, you can potentially get a 
Zipfian distribution without the locality.


I definitely agree. This is a standard problem with all non-uniform random
generators in pgbench, namely random_{gaussian,exponential}.


However hashing is not a good solution on a finite domain because of the 
significant collision rate, so that typically 1/3 of values are empty and 
collisions cause spikes. Also, collisions would break PKs.


The solution is to provide a (good) pseudo-random parametric permutation, 
which is non trivial especially for non powers of two, so ISTM that it 
should be a patch on its own.
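
To give the flavour (a toy sketch only, with nowhere near enough mixing to be
a good permutation): an affine map is a bijection on [0, n) whenever
gcd(mult, n) == 1, so unlike hashing it is collision-free by construction.

  #include <stdint.h>

  /* toy parametric permutation on [0, n); assumes x < n and that
   * mult * x does not overflow uint64_t */
  static uint64_t
  affine_permute(uint64_t x, uint64_t n, uint64_t mult, uint64_t shift)
  {
      return (mult * x + shift) % n;
  }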


The good news is that it is on my todo list and I have some ideas on how 
to do it.


The bad news is that given the rate at which I succeed in getting things 
committed in pgbench, it might take some years:-( For instance, simplistic 
functions and operators to extend the current set have been in the pipe 
for 15 months and missed pg10.


--
Fabien.




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-08-04 Thread Peter Geoghegan
On Fri, Jul 21, 2017 at 4:51 AM, Alik Khilazhev
 wrote:
> (Latest version of pgbench Zipfian patch)

While I'm +1 on this idea, I think that it would also be nice if there
was an option to make functions like random_zipfian() actually return
a value that has undergone perfect hashing. When this option is used,
any given value that the function returns would actually be taken from
a random mapping to some other value in the same range. So, you can
potentially get a Zipfian distribution without the locality.

While I think that Zipfian distributions are common, it's probably
less common for the popular items to be clustered together within the
primary key, unless the PK has multiple attributes, and the first is
the "popular attribute". For example, there would definitely be a lot
of interest among contiguous items within an index on "(artist,
album)" where "artist" is a popular artist, which is certainly quite
possible. But, if "album" is the primary key, and it's a SERIAL PK,
then there will only be weak temporal locality within the PK, and only
because today's fads are more popular than the fads from previous
years.

Just another idea, that doesn't have to hold this patch up.

-- 
Peter Geoghegan




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-21 Thread Alik Khilazhev
Hello!

I realized that I was sending emails as HTML and the latest patch is not
visible in the archive now.
That’s why I am attaching it again.

I am sorry for that.



pgbench-zipf-05v.patch
Description: Binary data

—
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-21 Thread Alik Khilazhev
> Hmmm. On second thought, maybe one or the other is enough, either restrict
> the parameter to values where the approximation is good, or put out a clear
> documentation about when the approximation is not very good, but it may be
> still useful even if not precise.
>
> So I would be in favor of expanding the documentation but not restricting
> the parameter beyond avoiding value 1.0.

I have removed the restriction and expanded the documentation in the attached
patch v5. I have also recorded the patch in CF 2017-09:
https://commitfest.postgresql.org/14/1206/

pgbench-zipf-05v.patch
Description: Binary data
—
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-20 Thread Alik Khilazhev



> I think that developing a test would be much simpler with the improved tap 
> test infrastructure, so I would suggest waiting for the outcome of the 
> corresponding patch.

Ok, I will wait then.

> Also, could you record the patch in CF 2017-09?
> https://commitfest.postgresql.org/14/ 

Yea, I will send it.

—
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-20 Thread Fabien COELHO


Hello Alik,

About the maths: As already said, I'm not at ease with a random_zipfian 
function which does not display a (good) zipfian distribution. At the 
minimum the documentation should be clear about the approximations 
implied depending on the parameter value.


I added one more sentence to the documentation to emphasize that the degree
of proximity depends on the parameter. And I also made a restriction on the
parameter: it can now only be in the range (0; 1).


Hmmm. On second thought, maybe one or the other is enough, either restrict 
the parameter to values where the approximation is good, or put out a 
clear documentation about when the approximation is not very good, but it 
may be still useful even if not precise.


So I would be in favor of expanding the documentation but not restricting 
the parameter beyond avoiding value 1.0.



[...]
Now it prints a warning message if the array overflowed. To print the message
only once, it uses a global flag, which is shared by all threads. And
theoretically the message can be printed more than once. [...]

So, should I spend time on solving this issue?


No. Just write a comment.

If the zipf cache is constant size, there is no point in using dynamic 
allocation, just declare an array…


Fixed. Is ZIPF_CACHE_SIZE = 15 ok?


My guess is yes, till someone complains that it is not the case;-)


There should be non-regression tests somehow. If the "improve pgbench
tap test infrastructure" gets through, things should be added there.


I will send tests later, as a separate patch.


I think that developing a test would be much simpler with the improved
tap test infrastructure, so I would suggest waiting for the outcome of
the corresponding patch.


Also, could you record the patch in CF 2017-09?
https://commitfest.postgresql.org/14/

--
Fabien.


Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-20 Thread Alik Khilazhev
Hello Fabien,

I am attaching patch v4.

> On 19 Jul 2017, at 17:21, Fabien COELHO  wrote:
> 
> About the maths: As already said, I'm not at ease with a random_zipfian
> function which does not display a (good) zipfian distribution. At the
> minimum the documentation should be clear about the approximations implied
> depending on the parameter value.

I added one more sentence to the documentation to emphasize that the degree
of proximity depends on the parameter. And I also made a restriction on the
parameter: it can now only be in the range (0; 1).

> In the literature the theta parameter seems to be often called alpha
> or s (eg see https://en.wikipedia.org/wiki/Zipf%27s_law). I would suggest to
> stick to "s" instead of "theta"?

I have renamed it to "s".

> Function zipfZeta(n, theta) does not really compute the zeta(n) function,
> so I think that a better name should be chosen. It seems to compute
> H_{n,theta}, the generalized harmonic number. Idem "thetan" field in struct.

Renamed zipfZeta to zipfGeneralizedHarmonic, and zetan to harmonicn.

> The handling of cache overflow by randomly removing one entry looks like
> a strange idea. Rather remove the oldest entry?

Replaced with a Least Recently Used replacement algorithm.

> ISTM that it should print a warning once if the cache array overflows as
> performance would drop heavily.

Now it prints a warning message if the array overflowed. To print the message
only once, it uses a global flag, which is shared by all threads. And
theoretically the message can be printed more than once. It could be solved
easily using pg_atomic_test_set_flag() from src/include/port/atomics.h, but
that cannot be used in pgbench because of the following lines of code there:

  #ifdef FRONTEND
  #error "atomics.h may not be included from frontend code"
  #endif

Or it can be fixed by using mutexes from pthread, but I think the code becomes
less readable and more complex in this case.

So, should I spend time on solving this issue?

> If the zipf cache is constant size, there is no point in using dynamic
> allocation, just declare an array...

Fixed. Is ZIPF_CACHE_SIZE = 15 ok?

> There should be non-regression tests somehow. If the "improve pgbench
> tap test infrastructure" gets through, things should be added there.

I will send tests later, as a separate patch.

pgbench-zipf-04v.patch
Description: Binary data
—
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-19 Thread Fabien COELHO


Hello Alik,


I am attaching patch v3.


Among other things I fixed a small typo in the description of the
random_exponential function in the pgbench.sgml file.


Ok. Probably this typo should be committed separately and independently.

A few comments about v3:

Patch applies cleanly, compiles, works.

About the maths: As already said, I'm not at ease with a random_zipfian 
function which does not display a (good) zipfian distribution. At the 
minimum the documentation should be clear about the approximations implied 
depending on the parameter value.


In the literature the theta parameter seems to be often called alpha
or s (eg see https://en.wikipedia.org/wiki/Zipf%27s_law). I would suggest to
stick to "s" instead of "theta"?

About the code: looks simpler than the previous version, which is good.

Double constants should be used when the underlying type is a double,
instead of relying on implicit int to double promotion (0 -> 0.0, 1 -> 1.0).

Function zipfZeta(n, theta) does not really compute the zeta(n) function,
so I think that a better name should be chosen. It seems to compute
H_{n,theta}, the generalized harmonic number. Idem "thetan" field in struct.
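
That is, it computes something like this (a sketch under the clearer name,
int64 being the usual pg typedef):

  #include <math.h>

  /* H_{n,s} = sum_{i=1}^{n} 1 / i^s, the generalized harmonic number */
  static double
  generalizedHarmonicNumber(int64 n, double s)
  {
      double  ans = 0.0;
      int64   i;

      for (i = n; i > 1; i--)  /* summing small terms first helps accuracy */
          ans += pow(i, -s);
      return ans + 1.0;        /* the i == 1 term */
  }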

The handling of cache overflow by randomly removing one entry looks like
a strange idea. Rather remove the oldest entry?

ISTM that it should print a warning once if the cache array overflows as 
performance would drop heavily.


If the zipf cache is constant size, there is no point in using dynamic 
allocation, just declare an array...


Comments need updating: eg "theta parameter of previous execution", which
dates back to when there was only one value.

There should be a comment to explain that the structure is in the thread
for thread safety.

There should be non-regression tests somehow. If the "improve pgbench
tap test infrastructure" gets through, things should be added there.

--
Fabien.




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-17 Thread Fabien COELHO


Hello,

Is this bias expected from the drawing method, say because it is 
approximated and the approximation is weak at some points, or is there 
an issue with its implementation, say some shift which gets smoothed 
down for higher indexes?


I have checked the paper where this implementation was proposed, and there
theta is allowed only in the range between 0 and 1. It seems it is not
guaranteed to work well when theta is more than 1.


Ok.

I see a significant issue with having a random_zipfian function which does
not really return a zipfian distribution under some parameter values. If
there is no better alternative, I would suggest to restrict the parameter
to values between 0 and 1, or to find a better approximation for theta >= 1.



I am attaching the paper; see page 23.


Thanks for the paper. It reminds me that I intended to propose a 
parametric pseudo-random permutation for pgbench, some day.


--
Fabien.




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-17 Thread Alik Khilazhev

> On 17 Jul 2017, at 13:51, Fabien COELHO  wrote:
> 
> 
> Is this bias expected from the drawing method, say because it is approximated 
> and the approximation is weak at some points, or is there an issue with its 
> implementation, say some shift which gets smoothed down for higher indexes?
> 

I have checked the paper where this implementation was proposed, and there
theta is allowed only in the range between 0 and 1. It seems it is not
guaranteed to work well when theta is more than 1.

I am attaching the paper; see page 23.


syntheticdatagen.pdf
Description: Adobe PDF document

—
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-17 Thread Fabien COELHO



Ok, so you did not get the large bias for i=3. Strange.


I got a large bias for i=3 and theta > 1 even with a million outcomes,


Ok. So this is similar to what I got.

Is this bias expected from the drawing method, say because it is 
approximated and the approximation is weak at some points, or is there an 
issue with its implementation, say some shift which gets smoothed down 
for higher indexes?


I am attaching patch v3. Among other things I fixed a small typo in the
description of the random_exponential function in the pgbench.sgml file.


I'll look into that.

--
Fabien.




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-17 Thread Alik Khilazhev
Hello Fabien,

> On 14 Jul 2017, at 17:51, Fabien COELHO  wrote:
> 
> Ok, so you did not get the large bias for i=3. Strange.

I got a large bias for i=3 and theta > 1 even with a million outcomes, but
for theta < 1 (I have tested theta = 0.1 and 0.3) it showed quite good
results.

I am attaching patch v3. Among other things I fixed a small typo in the
description of the random_exponential function in the pgbench.sgml file.

pgbench-zipf-03v.patch
Description: Binary data
—
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-14 Thread Peter Geoghegan
On Fri, Jul 14, 2017 at 6:37 AM, Alik Khilazhev
 wrote:
> I am attaching results of tests for 32 and 128 clients that were running for
> 10 minutes, and TPS remained at 305 and 115 ktps respectively.
>
> Tests were executed with a configuration set up for YCSB. And autovacuum is
> configured very aggressively; this can be the reason why no decline appears
> with increasing running time.
> Autovacuum config:
>
> autovacuum_max_workers = 8
> autovacuum_naptime = 10s
> autovacuum_vacuum_scale_factor = 0.1
> autovacuum_vacuum_cost_delay = 0ms
> autovacuum_vacuum_cost_limit = 1

I think that what this probably comes down to, more than anything
else, is that you have leftmost hot/bloated leaf pages like this:


          idx          | level | l_item | blkno | btpo_prev | btpo_next | btpo_flags | type | live_items | dead_items | avg_item_size | page_size | free_size |         highkey
-----------------------+-------+--------+-------+-----------+-----------+------------+------+------------+------------+---------------+-----------+-----------+-------------------------
 ...
 pgbench_accounts_pkey |     0 |      1 |     1 |         0 |      2751 |         65 | l    |        100 |         41 |            16 |      8192 |      5328 | 11 00 00 00 00 00 00 00
 pgbench_accounts_pkey |     0 |      2 |  2751 |         1 |      2746 |         65 | l    |         48 |         90 |            16 |      8192 |      5388 | 32 00 00 00 00 00 00 00
...

The high key for the leftmost leaf page shows that only values below 0x11
belong on the first page. This is about 16 or 17 possible distinct values,
and yet the page has 100 live items, and 41 dead items; in total,
there is room for 367 items. It's only slightly better with other
nearby pages. This is especially bad because once the keyspace gets
split up this finely, it's *impossible* to reverse it -- it's more or
less a permanent problem, at least until a REINDEX. You cannot merge
pages back together until one is completely empty, which in this case
and many cases will in fact never happen. Aggressive vacuuming is
probably helpful in part because it prevents the problem from ever
getting completely out of hand. That doesn't seem like a very
practical solution, though.

We should probably be treating unique indexes in a special way, since
inserting a new conflicting tuple necessarily supersedes whatever it
conflicted with. Postgres holds on to the old tuple today, but only
because the transaction might abort, or because an old snapshot might
be interested in old tuple versions. However, the general pattern with
unique indexes is that there must be one tuple visible to new
snapshots, and old versions are almost certainly going to become
garbage very quickly. Unique indexes really are quite special, which
nbtree doesn't take advantage of at all. If you read the coverage of
B-Trees within "Transaction Processing: Concepts and Techniques", and
many other publications, the general trend seems to be that unique
indexes have truly unique keys, based only on the user-visible key
values. They make a sharp distinction between primary and secondary
indexes, which doesn't really exist in Postgres (at least, not at the
access method level).

I suspect that the best long term solution is to add GIN-style
duplicate handling within nbtree for unique indexes, with special
pruning style optimizations to the posting list structure. This would
probably only happen with unique indexes. The useful thing about this
approach is it separates these two problems:

1. Representing what values are in the index for lookup, and their
latest row version.

2. Finding old row versions, in the less common case where you have an
old snapshot.

With a design like that, nbtree would never "cut up the keyspace too
finely" because of a temporary surge of UPDATE insertions. You still
get bloat, but you add it to a place where garbage collection can be
much better targeted. Under this scheme, it doesn't really matter so
much if garbage collection doesn't happen very frequently, because
there could be LP_DEAD style hints for the auxiliary posting list
structure. That could probably work based on atomic ops, and would
greatly reduce the number of pages that UPDATE workloads like this
dirtied.

It probably wouldn't make sense to add things like GIN's compression
of item pointers, since most data within the auxiliary posting list
structure is removed very quickly. I wouldn't expect btree_gin to be
faster for this workload today, because it doesn't have
kill_prior_tuple/LP_DEAD support, and because it doesn't support
unique indexes, and so cannot exploit the special situation that
exists with unique indexes.

-- 
Peter Geoghegan




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-14 Thread Fabien COELHO


The algorithm works with theta less than 1. The only problem here is that
theta cannot be 1, because of the following line of code:

cell->alpha = 1. / (1 - theta);

That’s why I put in such a restriction. Now I see 2 possible solutions:
1) Exclude 1, and allow everything in the range (0;+∞).


Yep.


2) Or just increase/decrease theta by a very small number if it is 1.


Nope, this seems quite arbitrary.

I've executed the scripts that you attached with different theta and numbers
of outcomes (not n; n remains the same = 100) and I found out that for
theta = 0.1 and a big number of outcomes it gives a distribution very
similar to zipfian (for number of outcomes = 100 000, the bias is -6% to 8%
in the whole range, and for NOO = 1 000 000, the bias is -2% to 2%).


Ok, so you did not get the large bias for i=3. Strange.

By number of outcomes (NOO) I mean how many times random_zipfian was
called. For example: pgbench -f compte_bench.sql -t 10. So, I think
it works, but works worse for a small number of outcomes. And also we
need to find the optimal theta for better results.


Hmmm. I've run one million outcomes each time.

I'll check on the next version.

--
Fabien.


Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-14 Thread Alik Khilazhev

> On 13 Jul 2017, at 23:13, Peter Geoghegan  wrote:
> 
> I just figured out that "result.txt" is only a 60 second pgbench run.
> Is the same true for other tests?

Yes, the other tests ran for 60 seconds too.

> 
> It would be much more interesting to see runs that lasted 10 minutes
> or more.


I am attaching results of tests for 32 and 128 clients that were running for
10 minutes, and TPS remained at 305 and 115 ktps respectively.

Tests were executed with a configuration set up for YCSB. And autovacuum is
configured very aggressively; this can be the reason why no decline appears
with increasing running time.
Autovacuum config:

autovacuum_max_workers = 8 
autovacuum_naptime = 10s
autovacuum_vacuum_scale_factor = 0.1
autovacuum_vacuum_cost_delay = 0ms
autovacuum_vacuum_cost_limit = 1

          idx          | level | l_item | blkno | btpo_prev | btpo_next | btpo_flags | type | live_items | dead_items | avg_item_size | page_size | free_size |         highkey
-----------------------+-------+--------+-------+-----------+-----------+------------+------+------------+------------+---------------+-----------+-----------+-------------------------
 pgbench_accounts_pkey |     2 |      1 |   290 |         0 |         0 |          2 | r    |         10 |          0 |            15 |      8192 |      7956 |
 pgbench_accounts_pkey |     1 |      1 |     3 |         0 |       289 |          0 | i    |        296 |          0 |            15 |      8192 |      2236 | 09 96 01 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      2 |   289 |         3 |       575 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 11 2c 03 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      3 |   575 |       289 |       860 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 19 c2 04 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      4 |   860 |       575 |      1145 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 21 58 06 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      5 |  1145 |       860 |      1430 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 29 ee 07 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      6 |  1430 |      1145 |      1715 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 31 84 09 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      7 |  1715 |      1430 |      2000 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 39 1a 0b 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      8 |  2000 |      1715 |      2285 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 41 b0 0c 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      9 |  2285 |      2000 |      2570 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 49 46 0e 00 00 00 00 00
 pgbench_accounts_pkey |     1 |     10 |  2570 |      2285 |         0 |          0 | i    |        177 |          0 |            15 |      8192 |      4616 |
 pgbench_accounts_pkey |     0 |      1 |     1 |         0 |      2751 |         65 | l    |        100 |         41 |            16 |      8192 |      5328 | 11 00 00 00 00 00 00 00
 pgbench_accounts_pkey |     0 |      2 |  2751 |         1 |      2746 |         65 | l    |         48 |         90 |            16 |      8192 |      5388 | 32 00 00 00 00 00 00 00
 pgbench_accounts_pkey |     0 |      3 |  2746 |      2751 |      2749 |         65 | l    |         47 |          6 |            16 |      8192 |      7088 | 5f 00 00 00 00 00 00 00
 pgbench_accounts_pkey |     0 |      4 |  2749 |      2746 |      2745 |         65 | l    |         57 |          5 |            16 |      8192 |      6908 | 97 00 00 00 00 00 00 00
 pgbench_accounts_pkey |     0 |      5 |  2745 |      2749 |      2748 |         65 | l    |         91 |         11 |            16 |      8192 |      6108 | eb 00 00 00 00 00 00 00
 pgbench_accounts_pkey |     0 |      6 |  2748 |      2745 |      2752 |          1 | l    |         65 |          0 |            16 |      8192 |      6848 | 28 01 00 00 00 00 00 00
 pgbench_accounts_pkey |     0 |      7 |  2752 |      2748 |         2 |         65 | l    |         73 |          1 |            16 |      8192 |      6668 | 6f 01 00 00 00 00 00 00
 pgbench_accounts_pkey |     0 |      8 |     2 |      2752 |      2753 |         65 | l    |         69 |          5 |            16 |      8192 |      6668 | b3 01 00 00 00 00 00 00
 pgbench_accounts_pkey |     0 |      9 |  2753 |         2 |      2747 |         65 | l    |         91 |          5 |            16 |      8192 |      6228 | 0a 02 00 00 00 00 00 00
 pgbench_accounts_pkey |     0 |     10 |  2747 |      2753 |      2754 |         65 | l    |         87 |

Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-14 Thread Alik Khilazhev

> On 13 Jul 2017, at 19:14, Fabien COELHO  wrote:
> 
> Documentation says that the closer theta is to 0 the flatter the distribution,
> but the implementation requires at least 1, including strange error messages:
> 
>  zipfian parameter must be greater than 1.00 (not 1.00)
> 
> Could theta be allowed between 0 & 1 ? I've tried forcing with theta = 0.1
> and it worked well, so I'm not sure that I understand the restriction.
> I also tried with theta=0.001 but it seemed less good.

The algorithm works with theta less than 1. The only problem here is that
theta cannot be 1, because of the following line of code:

cell->alpha = 1. / (1 - theta);

That’s why I put in such a restriction. Now I see 2 possible solutions:
1) Exclude 1, and allow everything in the range (0;+∞).
2) Or just increase/decrease theta by a very small number if it is 1.
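
For context, the approximated draw from the paper has this shape (a sketch
only; the function name is mine, zetan is the generalized harmonic number
H_{n,theta}, and u is uniform in [0, 1)). The alpha = 1 / (1 - theta) term
is exactly what breaks for theta = 1:

  #include <math.h>

  static long
  zipfDraw(double u, long n, double theta, double zetan)
  {
      double  alpha = 1.0 / (1.0 - theta);  /* undefined for theta == 1 */
      double  eta = (1.0 - pow(2.0 / n, 1.0 - theta))
                  / (1.0 - (1.0 + pow(0.5, theta)) / zetan);
      double  uz = u * zetan;

      if (uz < 1.0)
          return 1;
      if (uz < 1.0 + pow(0.5, theta))
          return 2;
      return 1 + (long) (n * pow(eta * u - eta + 1.0, alpha));
  }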

> I have also tried to check the distribution wrt the explanations, with the 
> attached scripts, n=100, theta=1.01/1.5/3.0: It does not seem to work, 
> there is a repeatable +15% bias on i=3 and a repeatable -3% to -30% bias for
> values in i=10-100, this for different values of theta (1.01, 1.5, 3.0).
> 
> If you try the script, beware to set parameters (theta, n) consistently.

I've executed the scripts that you attached with different theta and numbers
of outcomes (not n; n remains the same = 100) and I found out that for
theta = 0.1 and a big number of outcomes it gives a distribution very similar
to zipfian (for number of outcomes = 100 000, the bias is -6% to 8% in the
whole range, and for NOO = 1 000 000, the bias is -2% to 2%).

By number of outcomes (NOO) I mean how many times random_zipfian was called.
For example:
pgbench -f compte_bench.sql -t 10

So, I think it works, but works worse for a small number of outcomes. And also
we need to find the optimal theta for better results.

— 
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com 
The Russian Postgres Company

Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-13 Thread Peter Geoghegan
On Thu, Jul 13, 2017 at 12:49 PM, Peter Geoghegan  wrote:
> To reiterate what I say above:
>
> The number of leaf pages with dead items is 20 with this most recent
> run (128 clients, patched + unpatched). The leftmost internal page one
> level up from the leaf level contains 289 items. Whereas last time it
> was 353 items.
>
> That's a difference between having 20 hot/bloated leaf pages, and
> probably 84 hot/bloated pages, which I infer must have been the total
> number of bloated leaf pages within "result.txt". I think that
> something about all the "pgbench_index_*txt" tests is very different
> from what we see within "result.txt". It's as if there was a problem
> when "result.txt" ran, but that problem somehow didn't come up when
> you did new tests.

I just figured out that "result.txt" is only a 60 second pgbench run.
Is the same true for other tests?

It would be much more interesting to see runs that lasted 10 minutes
or more. Maybe with pgbench-tools. I would expect that a decline in
performance due to bloating the index could happen only after a
certain threshold was crossed. Things like HOT pruning and LP_DEAD
cleanup could be pretty effective, until finally a tipping point is
reached and they're much less effective.

-- 
Peter Geoghegan




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-13 Thread Peter Geoghegan
On Thu, Jul 13, 2017 at 10:02 AM, Peter Geoghegan  wrote:
> The number of leaf pages at the left hand side of the leaf level seems
> to be ~50 less than the unpatched 128 client case was the first time
> around, which seems like a significant difference. I wonder why. Maybe
> autovacuum ran at the right/wrong time this time around?

To reiterate what I say above:

The number of leaf pages with dead items is 20 with this most recent
run (128 clients, patched + unpatched). The leftmost internal page one
level up from the leaf level contains 289 items. Whereas last time it
was 353 items.

That's a difference between having 20 hot/bloated leaf pages, and
probably 84 hot/bloated pages, which I infer must have been the total
number of bloated leaf pages within "result.txt". I think that
something about all the "pgbench_index_*txt" tests is very different
from what we see within "result.txt". It's as if there was a problem
when "result.txt" ran, but that problem somehow didn't come up when
you did new tests.


-- 
Peter Geoghegan




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-13 Thread Peter Geoghegan
On Thu, Jul 13, 2017 at 4:38 AM, Alik Khilazhev
 wrote:
> I am attaching results of the test for 32 and 128 clients, for the original
> and patched (_bt_doinsert) variants.

Thanks.

The number of leaf pages at the left hand side of the leaf level seems
to be ~50 less than the unpatched 128 client case was the first time
around, which seems like a significant difference. I wonder why. Maybe
autovacuum ran at the right/wrong time this time around?

Did you happen to record TPS for this most recent set of tests?

I notice one possibly interesting thing from these new numbers: the 32
client case is slightly more bloated when unique index enforcement is
removed.

-- 
Peter Geoghegan




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-13 Thread Fabien COELHO


Hello Alik,

A few comments about the patch v2.

Patch applies and compiles.

Documentation says that the closer theta is to 0 the flatter the distribution,
but the implementation requires at least 1, including strange error messages:

  zipfian parameter must be greater than 1.00 (not 1.00)

Could theta be allowed between 0 & 1 ? I've tried forcing with theta = 0.1
and it worked well, so I'm not sure that I understand the restriction.
I also tried with theta=0.001 but it seemed less good.

I have also tried to check the distribution wrt the explanations, with the
attached scripts, n=100, theta=1.01/1.5/3.0: It does not seem to work,
there is a repeatable +15% bias on i=3 and a repeatable -3% to -30% bias for
values in i=10-100, this for different values of theta (1.01, 1.5, 3.0).


If you try the script, beware to set parameters (theta, n) consistently.
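
(For reference, the expected zipfian frequency the scripts check against is
p(i) = (1 / i^theta) / H_{n,theta}, with H_{n,theta} = sum_{i=1..n} 1/i^theta.)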

About the code:

Remove spaces and tabs at end of lines or on empty lines.

zipfn: I would suggest to move the pg_erand48 out and pass the result
instead of passing the thread. The erand call would move to getZipfianRand.

I'm wondering whether the "nearest" hack could be removed so as to simplify
the cache functions code...

Avoid editing lines without changes (spaces/tabs?)
  -  thread->logfile = NULL; /* filled in later */
  +  thread->logfile = NULL; /* filled in later */

The documentation explaining the intuitive distribution talks about an N
but does not provide any hint about its value depending on the parameters.

There is a useless empty line before "" after that.

--
Fabien.

compte_init.sql
Description: application/sql


compte_bench.sql
Description: application/sql


compte_expected.sql
Description: application/sql


compte_results.sql
Description: application/sql



Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-13 Thread Alik Khilazhev
> On 13 Jul 2017, at 00:20, Peter Geoghegan  wrote:
> 
> Actually, I mean that I wonder how much of a difference it would make
> if this entire block was commented out within _bt_doinsert():
> 
> if (checkUnique != UNIQUE_CHECK_NO)
> {
> …
> }

I am attaching results of the test for 32 and 128 clients, for the original
and patched (_bt_doinsert) variants.

—
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company

          idx          | level | l_item | blkno | btpo_prev | btpo_next | btpo_flags | type | live_items | dead_items | avg_item_size | page_size | free_size |         highkey
-----------------------+-------+--------+-------+-----------+-----------+------------+------+------------+------------+---------------+-----------+-----------+-------------------------
 pgbench_accounts_pkey |     2 |      1 |   290 |         0 |         0 |          2 | r    |         10 |          0 |            15 |      8192 |      7956 |
 pgbench_accounts_pkey |     1 |      1 |     3 |         0 |       289 |          0 | i    |        298 |          0 |            15 |      8192 |      2196 | 09 96 01 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      2 |   289 |         3 |       575 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 11 2c 03 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      3 |   575 |       289 |       860 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 19 c2 04 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      4 |   860 |       575 |      1145 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 21 58 06 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      5 |  1145 |       860 |      1430 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 29 ee 07 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      6 |  1430 |      1145 |      1715 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 31 84 09 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      7 |  1715 |      1430 |      2000 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 39 1a 0b 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      8 |  2000 |      1715 |      2285 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 41 b0 0c 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      9 |  2285 |      2000 |      2570 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 49 46 0e 00 00 00 00 00
 pgbench_accounts_pkey |     1 |     10 |  2570 |      2285 |         0 |          0 | i    |        177 |          0 |            15 |      8192 |      4616 |
 pgbench_accounts_pkey |     0 |      1 |     1 |         0 |      2751 |         65 | l    |         21 |        225 |            16 |      8192 |      3228 | 14 00 00 00 00 00 00 00
 pgbench_accounts_pkey |     0 |      2 |  2751 |         1 |      2746 |         65 | l    |         40 |        182 |            16 |      8192 |      3708 | 3b 00 00 00 00 00 00 00
 pgbench_accounts_pkey |     0 |      3 |  2746 |      2751 |      2750 |         65 | l    |         43 |        240 |            16 |      8192 |      2488 | 65 00 00 00 00 00 00 00
 pgbench_accounts_pkey |     0 |      4 |  2750 |      2746 |      2745 |         65 | l    |         51 |         57 |            16 |      8192 |      5988 | 97 00 00 00 00 00 00 00
 pgbench_accounts_pkey |     0 |      5 |  2745 |      2750 |      2756 |         65 | l    |         42 |         47 |            16 |      8192 |      6368 | c0 00 00 00 00 00 00 00
 pgbench_accounts_pkey |     0 |      6 |  2756 |      2745 |      2748 |         65 | l    |         54 |        139 |            16 |      8192 |      4288 | f5 00 00 00 00 00 00 00
 pgbench_accounts_pkey |     0 |      7 |  2748 |      2756 |      2755 |         65 | l    |         57 |        333 |            16 |      8192 |       348 | 2d 01 00 00 00 00 00 00
 pgbench_accounts_pkey |     0 |      8 |  2755 |      2748 |         2 |         65 | l    |         67 |        308 |            16 |      8192 |       648 | 6f 01 00 00 00 00 00 00
 pgbench_accounts_pkey |     0 |      9 |     2 |      2755 |      2753 |         65 | l    |         75 |        280 |            16 |      8192 |      1048 | b9 01 00 00 00 00 00 00
 pgbench_accounts_pkey |     0 |     10 |  2753 |         2 |      2747 |         65 | l    |         83 |        260 |            16 |      8192 |      1288 | 0b 02 00 00 00 00 00 00
 pgbench_accounts_pkey |     0 |     11 |  2747 |      2753 |      2754 |         65 | l    |         91 |        192 |            16 |      8192 |      2488 | 65 02 00 00 00 00 00 00
 pgbench_accounts_pkey |     0 |     12 |  2754 |      2747 |         4 |         65 | l    |        121 |        196 |            16

Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-12 Thread Peter Geoghegan
On Wed, Jul 12, 2017 at 2:17 PM, Peter Geoghegan  wrote:
> I'd be interested in seeing the difference it makes if Postgres is
> built with the call to _bt_check_unique() commented out within
> nbtinsert.c.

Actually, I mean that I wonder how much of a difference it would make
if this entire block was commented out within _bt_doinsert():

if (checkUnique != UNIQUE_CHECK_NO)
{
...
}


-- 
Peter Geoghegan




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-12 Thread Peter Geoghegan
On Wed, Jul 12, 2017 at 1:55 PM, Alvaro Herrera
 wrote:
> Not to mention work done with a "buffer cleanup lock" held -- which is
> compounded by the fact that acquiring such a lock is prone to starvation
> if there are many scanners of that index.  I've seen a case where a very
> hot table is scanned so heavily that vacuum is starved for days waiting
> to acquire cleanup on a single page (vacuum was only able to finish
> because the app using the table was restarted).  I'm sure that a uniform
> distribution of keys, with a uniform distribution of values scanned,
> would give a completely different behavior than a highly skewed
> distribution where a single key receives a large fraction of the scans.

Good point.

I'd be interested in seeing the difference it makes if Postgres is
built with the call to _bt_check_unique() commented out within
nbtinsert.c. The UPDATE query doesn't ever change indexed columns, so
there should be no real need for the checking it does in this case.
We're seeing a lot of duplicates in at least part of the keyspace, and
yet the queries themselves should be as HOT-safe as possible.

Another possibly weird thing that I noticed is latency standard
deviation. This is from Alik's response to my request to run that SQL
query:

latency average = 1.375 ms
tps = 93085.250384 (including connections establishing)
tps = 93125.724773 (excluding connections establishing)
SQL script 1: /home/nglukhov/ycsb_read_zipf.sql
 - weight: 1 (targets 50.0% of total)
 - 2782999 transactions (49.8% of total, tps = 46364.447705)
 - latency average = 0.131 ms
 - latency stddev = 0.087 ms
SQL script 2: /home/nglukhov/ycsb_update_zipf.sql
 - weight: 1 (targets 50.0% of total)
 - 2780197 transactions (49.8% of total, tps = 46317.766703)
 - latency average = 2.630 ms
 - latency stddev = 14.092 ms

This is from the 128 client case -- the slow case.

Notice that the standard deviation is very high for
ycsb_update_zipf.sql. I wonder if this degrades because of some kind
of feedback loop, that reaches a kind of tipping point at higher
client counts.

-- 
Peter Geoghegan




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-12 Thread Alvaro Herrera
Peter Geoghegan wrote:

> Now, that might not seem like that much of a difference, but if you
> consider how duplicates are handled in the B-Tree code, and how unique
> index enforcement works, I think it could be. It could lead to heavy
> buffer lock contention, because we sometimes do a lot of work with an
> exclusive buffer lock held.

Not to mention work done with a "buffer cleanup lock" held -- which is
compounded by the fact that acquiring such a lock is prone to starvation
if there are many scanners of that index.  I've seen a case where a very
hot table is scanned so heavily that vacuum is starved for days waiting
to acquire cleanup on a single page (vacuum was only able to finish
because the app using the table was restarted).  I'm sure that a uniform
distribution of keys, with a uniform distribution of values scanned,
would give a completely different behavior than a highly skewed
distribution where a single key receives a large fraction of the scans.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-12 Thread Peter Geoghegan
On Wed, Jul 12, 2017 at 12:30 PM, Peter Geoghegan  wrote:
> On Wed, Jul 12, 2017 at 4:28 AM, Alik Khilazhev
>  wrote:
>> I am attaching the results of the query that you sent. It shows that
>> nothing has changed after executing the tests.
>
> But something did change! In the case where performance was good, all
> internal pages on the level above the leaf level have exactly 285
> items, excluding the rightmost page, which unsurprisingly didn't fill.
> However, after increasing client count to get the slow case, the "hot"
> part of the keyspace (the leaf pages owned by the first/leftmost
> internal page on that level) has 353 items rather than 285.

Could you please run the query again for both cases, but this time
change it to get items from the leaf level too, and not just internal
levels? Just remove the "where type != 'l'" clause in one of the CTEs.
That should do it.

It's probably the case that the hot part of the B-tree is actually
significantly less than 353 items (or 285 items), and so the "before"
and "after" difference in how many pages are used for the hot part of
the keyspace could be quite a lot larger than my initial estimate. It
could be that the granularity that is shown on the internal pages is
insufficient to see just how bad the problem is.

-- 
Peter Geoghegan




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-12 Thread Peter Geoghegan
On Wed, Jul 12, 2017 at 4:28 AM, Alik Khilazhev
 wrote:
> I am attaching the results of the query that you sent. It shows that
> nothing has changed after executing the tests.

But something did change! In the case where performance was good, all
internal pages on the level above the leaf level have exactly 285
items, excluding the rightmost page, which unsurprisingly didn't fill.
However, after increasing client count to get the slow case, the "hot"
part of the keyspace (the leaf pages owned by the first/leftmost
internal page on that level) has 353 items rather than 285.

Now, that might not seem like that much of a difference, but if you
consider how duplicates are handled in the B-Tree code, and how unique
index enforcement works, I think it could be. It could lead to heavy
buffer lock contention, because we sometimes do a lot of work with an
exclusive buffer lock held.

This is something that I go into a lot of detail on in the Wiki page
on Key normalization:
https://wiki.postgresql.org/wiki/Key_normalization#Avoiding_unnecessary_unique_index_enforcement

Now, I'm not saying that I know for sure that that's what it is. It
seems like a good area to investigate, though. Even if it wasn't
buffer lock contention, we're still talking about the difference
between the hot part of the B-Tree being about 353 pages, versus 285.
Buffer lock contention could naturally limit the growth in size to
"only" 353, by slowing everything down.

-- 
Peter Geoghegan




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-12 Thread Alik Khilazhev
Hello!

I want to say that our company is already investigating the causes of the 
problem and working on solutions. We also have a few experimental patches 
that increase performance for 1000 clients by several times.
 
In addition, I have fixed thread-safety issues and implemented a per-thread 
cache for zeta values. See the attached patch.
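
To give the idea: the cache is just a small fixed-size array of
(n, s, zetan) triples owned by each thread, so no locking is needed at all.
A rough sketch -- the names and the cache size here are illustrative, the
attached patch may differ in detail:

    #include <math.h>
    #include <stdint.h>

    #define ZIPF_CACHE_SIZE 15      /* assumed number of cells per thread */

    typedef struct
    {
        int64_t n;          /* number of elements the sum was built for */
        double  s;          /* distribution parameter */
        double  zetan;      /* cached zeta(n, s) */
    } ZipfCell;

    typedef struct
    {
        int      nb_cells;  /* number of filled cells */
        ZipfCell cells[ZIPF_CACHE_SIZE];
    } ZipfCache;            /* one instance per pgbench thread */

    static double
    zipfGetZetan(ZipfCache *cache, int64_t n, double s)
    {
        double  zetan = 0.0;

        /* hit: this thread has already computed zeta(n, s) */
        for (int i = 0; i < cache->nb_cells; i++)
            if (cache->cells[i].n == n && cache->cells[i].s == s)
                return cache->cells[i].zetan;

        /* miss: compute it, accumulating the small terms first */
        for (int64_t i = n; i >= 1; i--)
            zetan += pow(1.0 / i, s);

        if (cache->nb_cells < ZIPF_CACHE_SIZE)
        {
            cache->cells[cache->nb_cells].n = n;
            cache->cells[cache->nb_cells].s = s;
            cache->cells[cache->nb_cells].zetan = zetan;
            cache->nb_cells++;
        }
        /* on overflow the value is recomputed on every call; evicting
         * the least recently used cell would be the obvious refinement */
        return zetan;
    }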


pgbench-zipf-02v.patch
Description: Binary data
—
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-12 Thread Alik Khilazhev
On 7 Jul 2017, at 21:53, Peter Geoghegan wrote:
> Is it possible for you to instrument the number of B-Tree page
> accesses using custom instrumentation for pgbench_accounts_pkey?
> 
> If that seems like too much work, then it would still be interesting
> to see what the B-Tree keyspace looks like before and after varying
> the "nclient" count from, say, 32 to 128. Maybe there is a significant
> difference in how balanced or skewed it is in each case. Or, the index
> could simply be more bloated.
> 
> There is a query that I sometimes use, that itself uses pageinspect,
> to summarize the keyspace quickly. It shows you the highkey for every
> internal page, starting from the root and working down to the lowest
> internal page level (the one just before the leaf level -- level 1),
> in logical/keyspace order. You can use it to visualize the
> distribution of values. It could easily include the leaf level, too,
> but that's less interesting and tends to make the query take ages. I
> wonder what the query will show here.
> 
> Here is the query: …

I am attaching the results of the query that you sent. It shows that
nothing has changed after executing the tests.

before 128

          idx          | level | l_item | blkno | btpo_prev | btpo_next | btpo_flags | type | live_items | dead_items | avg_item_size | page_size | free_size |         highkey
-----------------------+-------+--------+-------+-----------+-----------+------------+------+------------+------------+---------------+-----------+-----------+-------------------------
 pgbench_accounts_pkey |     2 |      1 |   290 |         0 |         0 |          2 | r    |         10 |          0 |            15 |      8192 |      7956 | 
 pgbench_accounts_pkey |     1 |      1 |     3 |         0 |       289 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 09 96 01 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      2 |   289 |         3 |       575 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 11 2c 03 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      3 |   575 |       289 |       860 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 19 c2 04 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      4 |   860 |       575 |      1145 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 21 58 06 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      5 |  1145 |       860 |      1430 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 29 ee 07 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      6 |  1430 |      1145 |      1715 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 31 84 09 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      7 |  1715 |      1430 |      2000 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 39 1a 0b 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      8 |  2000 |      1715 |      2285 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 41 b0 0c 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      9 |  2285 |      2000 |      2570 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 49 46 0e 00 00 00 00 00
 pgbench_accounts_pkey |     1 |     10 |  2570 |      2285 |         0 |          0 | i    |        177 |          0 |            15 |      8192 |      4616 | 
(11 rows)

latency average = 1.375 ms
tps = 93085.250384 (including connections establishing)
tps = 93125.724773 (excluding connections establishing)
SQL script 1: /home/nglukhov/ycsb_read_zipf.sql
 - weight: 1 (targets 50.0% of total)
 - 2782999 transactions (49.8% of total, tps = 46364.447705)
 - latency average = 0.131 ms
 - latency stddev = 0.087 ms
SQL script 2: /home/nglukhov/ycsb_update_zipf.sql
 - weight: 1 (targets 50.0% of total)
 - 2780197 transactions (49.8% of total, tps = 46317.766703)
 - latency average = 2.630 ms
 - latency stddev = 14.092 ms

after 128

          idx          | level | l_item | blkno | btpo_prev | btpo_next | btpo_flags | type | live_items | dead_items | avg_item_size | page_size | free_size |         highkey
-----------------------+-------+--------+-------+-----------+-----------+------------+------+------------+------------+---------------+-----------+-----------+-------------------------
 pgbench_accounts_pkey |     2 |      1 |   290 |         0 |         0 |          2 | r    |         10 |          0 |            15 |      8192 |      7956 | 
 pgbench_accounts_pkey |     1 |      1 |     3 |         0 |       289 |          0 | i    |        353 |          0 |            15 |      8192 |      1096 | 09 96 01 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      2 |   289 |         3 |       575 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 11 2c 03 00 00 00 00

Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-10 Thread Fabien COELHO


Hello Alik,

Your description is not very precise. What version of Postgres is used? 
If there is a decline, compared to which version? Is there a link to 
these results?


Benchmarks have been done on master (v10). I am attaching an image with the results:
.


Ok, thanks.

More precision would be helpful, such as the exact pgbench options used 
(e.g. how many clients per thread in pgbench, how long it runs, prepared 
transactions, ...).


Intuitively, contention should explain a saturation of the tps 
performance, because additional clients do not improve tps while they 
wait for other clients, and the latency would degrade.


But it is unclear to me why the tps would be reduced even with lock 
contention, so something seems amiss.


Performance debugging by mail is an uneasy task.

Maybe you could try zipf with unlogged tables, to check whether skipping 
the WAL write does something.


Also, Amit's advice about the perf report looks useful.

Given the explanations, the random draw mostly hits values at the 
beginning of the interval, so when the number of clients goes higher one 
just gets locking contention on the updated row?


Yes, exactly.


Ok. The uniform distribution run, if all other parameters are equal, gives 
a hint about the potential performance when the performance bottleneck is 
hit.


On Workload A with uniform distribution PostgreSQL shows better results 
than MongoDB and MySQL (see attachment). Also you can notice that for a 
small number of clients the type of distribution does not affect tps on 
MySQL.


Ok. I assume that you use pgbench for pg and other ad-hoc tools for the 
other targets (mysql & mongodb).


--
Fabien.




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-10 Thread Amit Kapila
On Mon, Jul 10, 2017 at 12:19 PM, Alik Khilazhev  wrote:

> Hello, Fabien!
>
> Your description is not very precise. What version of Postgres is used? If
> there is a decline, compared to which version? Is there a link to these
> results?
>
>
> Benchmarks have been done on master (v10). I am attaching an image with the results:
> .
>
>
It will be interesting to see what the profiling data with perf says about
this for PostgreSQL.  Can you try to get the perf report?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-10 Thread Alik Khilazhev
Hello, Fabien!

> Your description is not very precise. What version of Postgres is used? If 
> there is a decline, compared to which version? Is there a link to these 
> results?

Benchmarks have been done on master (v10). I am attaching an image with the results:
.

> Indeed, the function computation is overly expensive, and the numerical 
> precision of the implementation is doubtful.
> 
> If there is no better way to compute this function, ISTM that it should be 
> summed in reverse order to accumulate small values first, from (1/n)^s + ... 
> + (1/2)^s. As 1/1 == 1, the corresponding term is 1, no point in calling pow 
> for this one, so it could be:
> 
>   double ans = 0.0;
>   for (i = n; i >= 2; i--)
>     ans += pow(1. / i, theta);
>   return 1.0 + ans;

You are right, it’s better to reverse the order.

> If the function, when actually used, is likely to be called with different 
> parameters, then some caching beyond the last value would seem in order. 
> Maybe a small fixed-size array?
> 
> However, it should be somehow thread-safe, which does not seem to be the case 
> with the current implementation. Maybe a per-thread cache? Or use a lock only 
> to update a shared cache? At least it should avoid locking to read values…

Yeah, I forgot about thread-safety. I will implement a per-thread cache with 
a small fixed-size array.

> Given the explanations, the random draw mostly hits values at the beginning 
> of the interval, so when the number of clients goes higher one just gets 
> locking contention on the updated row?

Yes, exactly. 

> ISTM that also having the tps achieved with a flat distribution would 
> allow checking this hypothesis.

On Workload A with uniform distribution PostgreSQL shows better results than 
MongoDB and MySQL (see attachment). Also you can notice that for a small 
number of clients the type of distribution does not affect tps on MySQL. 



And it’s important to mention that postgres was run with the option 
synchronous_commit=off, to match the durability of MongoDB with 
writeConcern=1, journaled=false. In this mode there is a possibility of 
losing all changes from the last second. If we ran postgres with maximum 
durability, MongoDB would lag far behind. 
---
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com 
The Russian Postgres Company



Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-07 Thread Peter Geoghegan
On Fri, Jul 7, 2017 at 12:59 PM, Alvaro Herrera
 wrote:
> Hmm, this seems potentially very useful.  Care to upload it to
> https://wiki.postgresql.org/wiki/Category:Snippets ?

Sure. I've added it here, under "index maintenance":

https://wiki.postgresql.org/wiki/Index_Maintenance#Summarize_keyspace_of_a_B-Tree_index

It would be a nice additional touch if there was an easy way of taking
the on-disk representation of index tuples (in this case that would be
little-endian signed integers from bt_page_items()), and from that
output actual typed values. Maybe just for a few select datatypes.
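
In the meantime, doing it by hand is easy enough for integer keys. A
throwaway sketch (standalone, not part of any patch) that reinterprets the
space-separated hex bytes from the "data" column as a little-endian signed
integer:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int
    main(void)
    {
        /* first highkey from the "before" output earlier in the thread */
        const char *data = "09 96 01 00 00 00 00 00";
        int64_t     value = 0;
        int         shift = 0;
        unsigned int byte;

        for (const char *p = data; sscanf(p, "%2x", &byte) == 1;)
        {
            value |= (int64_t) byte << shift;
            shift += 8;
            p = strchr(p, ' ');
            if (p == NULL || shift >= 64)
                break;
            p++;                /* step over the separating space */
        }
        printf("%lld\n", (long long) value);    /* prints 103945 */
        return 0;
    }

So the leftmost internal page's highkey decodes to aid 103945, which lines
up with its 285 downlinks covering roughly 365 leaf tuples per page.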

-- 
Peter Geoghegan




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-07 Thread Alvaro Herrera
Peter Geoghegan wrote:

> Here is the query:
> 
> with recursive index_details as (
>   select
> 'pgbench_accounts_pkey'::text idx
> ), [...]

Hmm, this seems potentially very useful.  Care to upload it to
https://wiki.postgresql.org/wiki/Category:Snippets ?

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-07 Thread Peter Geoghegan
On Fri, Jul 7, 2017 at 12:45 AM, Alik Khilazhev
 wrote:
> On scale = 10 (1 million rows) it gives the following results on a machine
> with 144 cores (with synchronous_commit=off):
>
> nclients  tps
> 1         8842.401870
> 2         18358.140869
> 4         45999.378785
> 8         88713.743199
> 16        170166.998212
> 32        290069.221493
> 64        178128.030553
> 128       88712.825602
> 256       38364.937573
> 512       13512.765878
> 1000      6188.136736

Is it possible for you to instrument the number of B-Tree page
accesses using custom instrumentation for pgbench_accounts_pkey?

If that seems like too much work, then it would still be interesting
to see what the B-Tree keyspace looks like before and after varying
the "nclient" count from, say, 32 to 128. Maybe there is a significant
difference in how balanced or skewed it is in each case. Or, the index
could simply be more bloated.

There is a query that I sometimes use, that itself uses pageinspect,
to summarize the keyspace quickly. It shows you the highkey for every
internal page, starting from the root and working down to the lowest
internal page level (the one just before the leaf level -- level 1),
in logical/keyspace order. You can use it to visualize the
distribution of values. It could easily include the leaf level, too,
but that's less interesting and tends to make the query take ages. I
wonder what the query will show here.

Here is the query:

with recursive index_details as (
  select
'pgbench_accounts_pkey'::text idx
),
size_in_pages_index as (
  select
(pg_relation_size(idx::regclass) / (2^13))::int4 size_pages
  from
index_details
),
page_stats as (
  select
index_details.*,
stats.*
  from
index_details,
size_in_pages_index,
lateral (select i from generate_series(1, size_pages - 1) i) series,
lateral (select * from bt_page_stats(idx, i)) stats),
internal_page_stats as (
  select
*
  from
page_stats
  where
type != 'l'),
meta_stats as (
  select
*
  from
index_details s,
lateral (select * from bt_metap(s.idx)) meta),
internal_items as (
  select
*
  from
internal_page_stats
  order by
btpo desc),
-- XXX: Note ordering dependency within this CTE, on internal_items
ordered_internal_items(item, blk, level) as (
  select
1,
blkno,
btpo
  from
internal_items
  where
btpo_prev = 0
and btpo = (select level from meta_stats)
  union
  select
case when level = btpo then o.item + 1 else 1 end,
blkno,
btpo
  from
internal_items i,
ordered_internal_items o
  where
i.btpo_prev = o.blk or (btpo_prev = 0 and btpo = o.level - 1)
)
select
  idx,
  btpo as level,
  item as l_item,
  blkno,
  btpo_prev,
  btpo_next,
  btpo_flags,
  type,
  live_items,
  dead_items,
  avg_item_size,
  page_size,
  free_size,
  -- Only non-rightmost pages have high key.
  case when btpo_next != 0 then (select data from bt_page_items(idx, blkno)
    where itemoffset = 1) end as highkey
from
  ordered_internal_items o
  join internal_items i on o.blk = i.blkno
order by btpo desc, item;

-- 
Peter Geoghegan




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-07 Thread Peter Geoghegan
On Fri, Jul 7, 2017 at 5:17 AM, Robert Haas  wrote:
> How is that possible?  In a Zipfian distribution, no matter how big
> the table is, almost all of the updates will be concentrated on a
> handful of rows - and updates to any given row are necessarily
> serialized, or so I would think.  Maybe MongoDB can be fast there
> since there are no transactions, so it can just lock the row, slam in
> the new value, and unlock the row, all (I suppose) without writing WAL
> or doing anything hard.

If you're not using the WiredTiger storage engine, then the locking
is at the document level, which means that a Zipfian distribution is
no worse than any other as far as lock contention goes. That's one
possible explanation. Another is that index-organized tables
naturally have much better locality, which matters at every level of
the memory hierarchy.

> I'm more curious about why we're performing badly than I am about a
> general-purpose random_zipfian function.  :-)

I'm interested in both. I think that a random_zipfian function would
be quite helpful for modeling certain kinds of performance problems,
like CPU cache misses incurred at the page level.

-- 
Peter Geoghegan




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-07 Thread Robert Haas
On Fri, Jul 7, 2017 at 3:45 AM, Alik Khilazhev
 wrote:
> PostgreSQL shows very bad results in YCSB Workload A (50% SELECT and 50% 
> UPDATE of a random row by PK) when benchmarking with a big number of clients 
> using a Zipfian distribution. MySQL also has a decline but it is not as 
> significant as it is in PostgreSQL. MongoDB does not have a decline at all.

How is that possible?  In a Zipfian distribution, no matter how big
the table is, almost all of the updates will be concentrated on a
handful of rows - and updates to any given row are necessarily
serialized, or so I would think.  Maybe MongoDB can be fast there
since there are no transactions, so it can just lock the row, slam in
the new value, and unlock the row, all (I suppose) without writing WAL
or doing anything hard.  But MySQL is going to have to hold the row
lock until transaction commit just like we do, or so I would think.
Is it just that their row locking is way faster than ours?

I'm more curious about why we're performing badly than I am about a
general-purpose random_zipfian function.  :-)
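
To put a number on "concentrated on a handful of rows": the probability of
drawing one of the k hottest keys is H(k,s)/H(n,s), the ratio of partial
generalized harmonic sums. A quick standalone sketch, assuming the usual
YCSB exponent s = 0.99 and the 1M-row table from this thread:

    #include <math.h>
    #include <stdio.h>

    /* H(n, s) = sum over i = 1..n of (1/i)^s, small terms first */
    static double
    harmonic(long n, double s)
    {
        double sum = 0.0;

        for (long i = n; i >= 1; i--)
            sum += pow(1.0 / i, s);
        return sum;
    }

    int
    main(void)
    {
        const long   n = 1000000;
        const double s = 0.99;      /* assumed YCSB default exponent */
        const double hn = harmonic(n, s);
        const long   ks[] = {1, 10, 100};

        for (int j = 0; j < 3; j++)
            printf("top %ld key(s): %.1f%% of draws\n",
                   ks[j], 100.0 * harmonic(ks[j], s) / hn);
        return 0;
    }

With these parameters the single hottest key takes roughly 6-7% of all
draws, so at 128 clients several of them are nearly always fighting over
the same row.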

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-07 Thread Fabien COELHO


Hello Alik,

PostgreSQL shows very bad results in YCSB Workload A (50% SELECT and 50% 
UPDATE of a random row by PK) when benchmarking with a big number of 
clients using a Zipfian distribution. MySQL also has a decline but it is 
not as significant as it is in PostgreSQL. MongoDB does not have a decline 
at all. And if pgbench had a Zipfian-distributed random number generator, 
everyone would be able to do research on this topic without 
using YCSB.


Your description is not very precise. What version of Postgres is used? If 
there is a decline, compared to which version? Is there a link to these 
results?



This is the reason why I am currently working on a random_zipfian function.


The bottleneck of the algorithm that I use is that it calculates the zeta 
function (it has linear complexity - 
https://en.wikipedia.org/wiki/Riemann_zeta_function). It may cause 
problems when generating a huge amount of big numbers.


Indeed, the function computation is overly expensive, and the numerical 
precision of the implementation is doubtful.
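
For reference, once zetan is known the draw itself is cheap. I assume the
generator being ported is the approximate one from Gray et al. that YCSB
uses, which is O(1) per draw for 0 < theta < 1. A self-contained sketch,
with u uniform in [0, 1):

    #include <math.h>
    #include <stdint.h>

    /* Zipfian draw over ranks 1..n, following the YCSB/Gray et al.
     * approximation; zetan must be the precomputed zeta(n, theta). */
    static int64_t
    zipfianDraw(int64_t n, double theta, double zetan, double u)
    {
        double alpha = 1.0 / (1.0 - theta);
        double zeta2 = 1.0 + pow(0.5, theta);       /* zeta(2, theta) */
        double eta = (1.0 - pow(2.0 / n, 1.0 - theta))
                   / (1.0 - zeta2 / zetan);
        double uz = u * zetan;

        if (uz < 1.0)
            return 1;                               /* the hottest value */
        if (uz < zeta2)
            return 2;
        return 1 + (int64_t) (n * pow(eta * u - eta + 1.0, alpha));
    }

So the zeta summation is the only expensive part, and it only depends on
(n, theta).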


If there is no better way to compute this function, ISTM that it should be 
summed in reverse order to accumulate small values first, from (1/n)^s + 
... + (1/2)^s. As 1/1 == 1, the corresponding term is 1, no point in 
calling pow for this one, so it could be:


   double ans = 0.0;
   for (i = n; i >= 2; i--)
     ans += pow(1. / i, theta);
   return 1.0 + ans;

That’s why I added caching for the zeta value. And it works well for cases 
when random_zipfian is called with the same parameters in a script. For example:


That’s why I have a question: should I implement support for caching zeta 
values for calls with different parameters, or not?


I do not envision the random_zipfian function being used widely, but if it 
is useful to someone, this is fine with me. Could an inexpensive 
exponential distribution be used instead for the same benchmarking 
purpose?


If the function, when actually used, is likely to be called with different 
parameters, then some caching beyond the last value would seem in order. 
Maybe a small fixed-size array?


However, it should be somehow thread-safe, which does not seem to be the 
case with the current implementation. Maybe a per-thread cache? Or use a 
lock only to update a shared cache? At least it should avoid locking to 
read values...



P.S. I am attaching a patch and scripts - an analogue of YCSB Workload A.
Run the benchmark with the command:
$ pgbench -f ycsb_read_zipf.sql -f ycsb_update_zipf.sql


Given the explanations, the random draw mostly hits values at the 
beginning of the interval, so when the number of clients goes higher one 
just gets locking contention on the updated row?


ISTM that also having the tps achieved with a flat distribution would 
allow checking this hypothesis.


On scale = 10 (1 million rows) it gives the following results on a machine 
with 144 cores (with synchronous_commit=off):



nclients  tps
1         8842.401870
2         18358.140869
4         45999.378785
8         88713.743199
16        170166.998212
32        290069.221493
64        178128.030553
128       88712.825602
256       38364.937573
512       13512.765878
1000      6188.136736


--
Fabien.