Hello Alik,

I am attaching patch v7.

Patch generates multiple warnings with "git apply", apparently because of end-of-line spaces, and fails:

  pgbench-zipf-07v.patch:52: trailing whitespace.
  pgbench-zipf-07v.patch:53: trailing whitespace.
                "random_zipfian", 3, PGBENCH_RANDOM_ZIPFIAN
  pgbench-zipf-07v.patch:54: trailing whitespace.
  pgbench-zipf-07v.patch:66: trailing whitespace.
  #define ZIPF_CACHE_SIZE 15              /* cache cells number */
  pgbench-zipf-07v.patch:67: trailing whitespace.

  error: patch failed: doc/src/sgml/ref/pgbench.sgml:1013
  error: doc/src/sgml/ref/pgbench.sgml: patch does not apply
  error: patch failed: src/bin/pgbench/exprparse.y:191
  error: src/bin/pgbench/exprparse.y: patch does not apply
  error: patch failed: src/bin/pgbench/pgbench.c:93
  error: src/bin/pgbench/pgbench.c: patch does not apply
  error: patch failed: src/bin/pgbench/pgbench.h:75
  error: src/bin/pgbench/pgbench.h: patch does not apply

It also complains with "patch", although it seems to work:

 (Stripping trailing CRs from patch; use --binary to disable.)
 patching file doc/src/sgml/ref/pgbench.sgml
 (Stripping trailing CRs from patch; use --binary to disable.)
 patching file src/bin/pgbench/exprparse.y
 (Stripping trailing CRs from patch; use --binary to disable.)
 patching file src/bin/pgbench/pgbench.c
 (Stripping trailing CRs from patch; use --binary to disable.)
 patching file src/bin/pgbench/pgbench.h
 patch unexpectedly ends in middle of line
 patch unexpectedly ends in middle of line

Please do not put spaces at the end of lines, eg there is a lonely tab on the second line of "computeHarmonicZipfian".

It seems that "CR" characters are used:

  sh> sha1sum ~/pgbench-zipf-07v.patch

  sh> file ~/pgbench-zipf-07v.patch
  pgbench-zipf-07v.patch: unified diff output, ASCII text, with CRLF line 

Compilation issues one warning:

 pgbench.c: In function ‘computeHarmonicZipfian’:
 pgbench.c:894:2: warning: ISO C90 forbids mixed declarations and code 
    double  uniform = pg_erand48(thread->random_state);

Please conform to pg standard for portability, even if it is a very old one:-)

About the documentation:

I would suggest to replace 0.5 in the first example by 1.5, as a zipfian distribution with a lower-than-one parameter is not that well defined.

I'm fine with using references to papers or books for the algorithm.

The documentation lines in the sgml file are too long. At least
limit text paragraphs to maybe 80 columns as done elsewhere.

typo: "<entry> Zipfian" (remove space)

typo: "used(described" (missing space)

typo: "numbers(algorithm" (again)

The sentence "It's recommended to pick <replaceable>parameter</> in range (0, 1) for better accuracy." does not make sense with the updated generation algorithm.

The text would benefit from a review by a native English speaker, who I am not. Anyway here is an attempt at improving/simplifying the text (I have removed sgml formatting). If some native English speaker could review it, that would be great.

  "random_zipfian" generates an approximated bounded zipfian distribution.
  For parameter in (0,1), an approximated algorithm is taken from
  "Quickly Generating Billion-Record Synthetic Databases", Jim Gray et al, 
SIGMOD 1994.
  For parameter in (1, 1000), a rejection method is used, based on
  "Non-Uniform Random Variate Generation", Luc Devroye, p. 550-551, Springer 
  The distribution is not defined when the parameter's value is 1.0.
  The drawing performance is poor for parameter values close and above 1.0
  and on a small range.

  "parameter" defines how skewed the distribution is.
  The larger the "parameter", the more frequently values close to the beginning
  of the interval are drawn.
  The closer to 0 "parameter" is, the flatter (more uniform) the access 

English in comments and code:

"inited" is not English, I think you want "initialized". Maybe "nb_cells" would do.

"it is not exist" (multiple instances) -> "it does not exist".

I think that the references to the paper/book should appear as comments in the code, for instance in front of the functions which implement the algorithm.

"current_time", "last_touch_time" are counter based, they are not "time". I suggest to update the name to "current" and "last_touch" or "last_used".

"time when cell was used last time" -> "last used logical time"

About the code:

I would prefer that you use "1.0" instead of "1." for double constants.

I think that getZipfianRand should be symmetric. Maybe a direct formula could do:

  return min - 1 + (s > 1) ? xxx() : yyy();

or at least use an explicit "else" for symmetry.

Function "zipfn" asks for s and n as arguments, but ISTM that they are already include as fields in cell, so this seems redundant. Moreover I do not think that the zipfn function is that useful. I would suggest to inline it in its caller.

"zipfGeneralizedHarmonic" is not really related to zipf, it is just used there. I suggest to call it "generalizedHarmonicNumber" to reflect what it does.

I suggest to put all LRU management in the getCell* function, that is to move the last_use update from computeHarmonicZipfian to getCell.

computeHarminicZipfian and computeIterativeZipfian should have the same signature, not one with a random state and the other with the thread. I suggest to do the same as other random functions, whatever it is.

Otherwise, no action required:

The patch probably conflicst with other patches in the CF, which means that there will be rebase needed. That is life. The queue is long and slow.

Also, this function deserve some tap tests, that should be added if the "pgbench tap test" patch get through. Otherwise it is as well tested as everything else around, which is basically no test.

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to