Constraint's NO INHERIT option is ignored in CREATE TABLE LIKE statement

2020-02-19 Thread Ildar Musin
Hi hackers,

My colleague Chris Travers discovered something that looks like a bug.
Let's say we have a table with a constraint that is declared as NO INHERIT.

CREATE TABLE test (
x INT CHECK (x > 0) NO INHERIT
);
\d test
Table "public.test"
 Column |  Type   | Collation | Nullable | Default
+-+---+--+-
 x  | integer |   |  |
Check constraints:
"test_x_check1" CHECK (x > 0) NO INHERIT

Now, when we make a copy of the table structure into a new table,
the `NO INHERIT` option is ignored.

CREATE TABLE test2 (LIKE test INCLUDING CONSTRAINTS);
\d test2
   Table "public.test2"
 Column |  Type   | Collation | Nullable | Default
+-+---+--+-
 x  | integer |   |  |
Check constraints:
"test_x_check1" CHECK (x > 0)

Is this a bug or expected behaviour? Just in case I've attached a patch
that fixes this.
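
With the patch applied (it copies the source constraint's ccnoinherit
flag), the copied constraint would be expected to keep the option:

\d test2
Check constraints:
    "test_x_check1" CHECK (x > 0) NO INHERIT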

Regards,
Ildar
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index 2406ca7a5d..d75ec6766a 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -1154,6 +1154,7 @@ transformTableLikeClause(CreateStmtContext *cxt, TableLikeClause *table_like_cla
 			n->contype = CONSTR_CHECK;
 			n->location = -1;
 			n->conname = pstrdup(ccname);
+			n->is_no_inherit = tupleDesc->constr->check[ccnum].ccnoinherit;
 			n->raw_expr = NULL;
 			n->cooked_expr = nodeToString(ccbin_node);
 			cxt->ckconstraints = lappend(cxt->ckconstraints, n);


Compressed pluggable storage experiments

2019-10-10 Thread Ildar Musin
Hi hackers,

I've been experimenting with the pluggable storage API recently and feel
like I can share my first experience. First of all, it's great to have
this API, and that the community now has the opportunity to implement
alternative storage engines. There are a few applications that come to
mind, and compressed storage is one of them.

Recently I've been working on a simple append-only compressed storage
[1]. My first idea was to just store data in compressed 1mb blocks
in a continuous file and keep a separate file for block offsets (similar
to Knizhnik's CFS proposal). But then I realized that I wouldn't be
able to use most of postgres' infrastructure, like WAL-logging, and also
wouldn't be able to implement some of the functions of TableAmRoutine
(like bitmap scan or analyze). So I had to adjust the extension to
utilize standard postgres 8kb blocks: compressed 1mb blocks are split
into chunks and distributed among 8kb blocks. The current page layout
looks like this:

┌───┐
│ metapage  │
└───┘
┌───┐  ┐
│  block 1  │  │
├...┤  │ compressed 1mb block
│  block k  │  │
└───┘  ┘
┌───┐  ┐
│ block k+1 │  │
├...┤  │ another compressed 1mb block
│  block m  │  │
└───┘  ┘

Inside compressed blocks there are regular postgres heap tuples.

The following is the list of things I stumbled upon while implementing
the storage. Since the API has only just come out, there are not many
examples of pluggable storages, and even fewer shipped as external
extensions (I managed to find only blackhole_am by Michael Paquier,
which doesn't do much). So many things I had to figure out by myself.
Hopefully some of these issues have a solution that I just can't see.

1. Unlike the FDW API, in the pluggable storage API there are no routines
like "begin modify table" and "end modify table", and there is no shared
state between insert/update/delete calls. In the context of compressed
storage that means there is no definite moment when we can finalize
writes (compress, split into chunks etc). We can set a callback at the
end of the transaction, but in that case we have to keep the latest
modifications for every table in memory until the transaction ends.
As for shared state, we could also maintain some kind of key-value data
structure with per-relation shared state, but that again requires memory.
Because of this, I have currently only implemented COPY semantics.
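
For illustration, a minimal sketch of the end-of-transaction callback
approach mentioned above. Only RegisterXactCallback() and the XactEvent
values are real PostgreSQL API; the cryo_* buffer-flushing helpers are
hypothetical:

#include "postgres.h"
#include "fmgr.h"
#include "access/xact.h"

PG_MODULE_MAGIC;

/* hypothetical helpers that drain the in-memory per-table buffers */
extern void cryo_flush_pending_writes(void);
extern void cryo_discard_pending_writes(void);

static void
cryo_xact_callback(XactEvent event, void *arg)
{
    switch (event)
    {
        case XACT_EVENT_PRE_COMMIT:
            /* compress accumulated tuples, split into chunks, write pages */
            cryo_flush_pending_writes();
            break;
        case XACT_EVENT_ABORT:
            cryo_discard_pending_writes();
            break;
        default:
            break;
    }
}

void
_PG_init(void)
{
    RegisterXactCallback(cryo_xact_callback, NULL);
}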

2. It looks like I cannot implement custom storage options. E.g. for
compressed storage it makes sense to implement different compression
methods (lz4, zstd etc.) and corresponding options (like compression
level). But as far as I can see, storage options (like fillfactor etc)
are hardcoded and not extensible. A possible solution is to use GUCs,
which would work but is not extremely convenient.
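
As a sketch of that GUC workaround (the pg_cryogen.compression_level name
is hypothetical; DefineCustomIntVariable() is the real extension API):

#include "postgres.h"
#include "fmgr.h"
#include "utils/guc.h"

PG_MODULE_MAGIC;

static int cryo_compression_level = 1;

void
_PG_init(void)
{
    /* settable per session: SET pg_cryogen.compression_level = 5; */
    DefineCustomIntVariable("pg_cryogen.compression_level",
                            "Compression level used when writing new blocks.",
                            NULL,
                            &cryo_compression_level,
                            1,      /* boot value */
                            1, 19,  /* min/max, e.g. zstd levels */
                            PGC_USERSET,
                            0,
                            NULL, NULL, NULL);
}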

3. A somewhat surprising limitation is that in order to use bitmap scans
the maximum number of tuples per page must not exceed 291, due to the
MAX_TUPLES_PER_PAGE macro in tidbitmap.c, which is calculated based on
the 8kb page size. In the case of 1mb pages this restriction feels
really limiting.

4. In order to use WAL-logging, each page must start with a standard
24-byte PageHeaderData even if it is needless for the storage itself.
Not a big deal though. Another (actually documented) WAL-related
limitation is that only generic WAL can be used within an extension. So
unless inserts are made in bulk, it's going to require a lot of disk
space to accommodate the logs and wide bandwidth for replication.
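
For reference, a minimal sketch of logging one page modification through
the generic WAL facility (GenericXLogStart/RegisterBuffer/Finish are the
documented API; write_chunk and its arguments are hypothetical):

#include "postgres.h"
#include "access/generic_xlog.h"

/* caller must hold an exclusive lock on buf */
static void
write_chunk(Relation rel, Buffer buf, const char *chunk, int len, int off)
{
    GenericXLogState *state;
    Page        page;

    state = GenericXLogStart(rel);
    page = GenericXLogRegisterBuffer(state, buf, 0);
    memcpy((char *) page + off, chunk, len);    /* modify the page image */
    GenericXLogFinish(state);   /* computes a delta and emits a WAL record */
}

Each generic record carries a delta (or a full image) of every registered
page, which is why non-bulk insertion patterns inflate the WAL volume as
described above.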

The pg_cryogen extension is still in development, so if other issues
arise I'll post them here. At this point the extension already supports
inserts via COPY, index and bitmap scans, vacuum (only freezing) and
analyze. It uses lz4 compression, and currently I'm working on adding
different compression methods. I'm also willing to work on the
aforementioned issues in the API if the community verifies them as valid.


[1] https://github.com/adjust/pg_cryogen

Thanks,
Ildar


Duplicated LSN in ReorderBuffer

2019-06-25 Thread Ildar Musin
Hi hackers,

I believe we found a bug in logical decoding. It only occurs with
casserts enabled. It was originally discovered and reproduced by Murat
Kabilov and Ildus Kurbangaliev. Here is the stacktrace we've got:

#0  0x7facc66ef82f in raise () from /usr/lib/libc.so.6
#1  0x7facc66da672 in abort () from /usr/lib/libc.so.6
#2  0x00ac4ebf in ExceptionalCondition (
conditionName=0xccdea8 "!(prev_first_lsn < cur_txn->first_lsn)",
errorType=0xccdce4 "FailedAssertion", fileName=0xccdd38
"reorderbuffer.c",
lineNumber=680) at assert.c:54
#3  0x008a9515 in AssertTXNLsnOrder (rb=0x25ca128) at
reorderbuffer.c:680
#4  0x008a900f in ReorderBufferTXNByXid (rb=0x25ca128, xid=65609,
create=true,
is_new=0x0, lsn=211590864, create_as_top=true) at reorderbuffer.c:559
#5  0x008abf0d in ReorderBufferAddNewTupleCids (rb=0x25ca128,
xid=65609,
lsn=211590864, node=..., tid=..., cmin=0, cmax=4294967295,
combocid=4294967295)
at reorderbuffer.c:2098
#6  0x008b096b in SnapBuildProcessNewCid (builder=0x25d0158,
xid=65610,
lsn=211590864, xlrec=0x25d60b8) at snapbuild.c:781
#7  0x0089d01c in DecodeHeap2Op (ctx=0x25ba0b8, buf=0x7ffd0e294da0)
at decode.c:382
#8  0x0089c8ca in LogicalDecodingProcessRecord (ctx=0x25ba0b8,
record=0x25ba378)
at decode.c:125
#9  0x008a124c in DecodingContextFindStartpoint (ctx=0x25ba0b8) at
logical.c:492
#10 0x008b9c3d in CreateReplicationSlot (cmd=0x257be20) at
walsender.c:957
#11 0x008baa60 in exec_replication_command (
cmd_string=0x24f5b08 "CREATE_REPLICATION_SLOT temp_slot_name TEMPORARY
LOGICAL pgoutput USE_SNAPSHOT") at walsender.c:1531
#12 0x00937230 in PostgresMain (argc=1, argv=0x25233a8,
dbname=0x2523380 "postgres",
username=0x24f23c8 "zilder") at postgres.c:4245
#13 0x00881453 in BackendRun (port=0x251a900) at postmaster.c:4431
#14 0x00880b4f in BackendStartup (port=0x251a900) at
postmaster.c:4122
#15 0x0087cbbe in ServerLoop () at postmaster.c:1704
#16 0x0087c34a in PostmasterMain (argc=3, argv=0x24f0330) at
postmaster.c:1377
#17 0x007926b6 in main (argc=3, argv=0x24f0330) at main.c:228

After examining the core dump we see that
`prev_first_lsn == cur_txn->first_lsn`.

The problem seems to be that ReorderBuffer adds two ReorderBufferTXNs
with the same LSN, but different transaction ids: subxid and top-level
xid. See FIX part below.


STEPS TO REPRODUCE
--

We were able to reproduce it on 10, 11 and on the master branch. Postgres
was configured as:

./configure --enable-cassert CFLAGS='-ggdb3 -O0' --prefix=$HOME/pg12

Additional options in postgresql.conf:

wal_level='logical'
max_connections=1000
max_replication_slots=100
max_wal_senders=100
max_logical_replication_workers=100

pgbench scripts:

$ cat create_table.sql
BEGIN;
SAVEPOINT p1;
CREATE temp TABLE t_t (id INT) ON COMMIT DROP;
ROLLBACK TO SAVEPOINT p1;
ROLLBACK;

$ cat create_slot.sql
BEGIN ISOLATION LEVEL REPEATABLE READ READ ONLY;
SELECT pg_create_logical_replication_slot('test' || pg_backend_pid(),
'pgoutput', true);
SELECT pg_drop_replication_slot('test' || pg_backend_pid());
ROLLBACK;

Run in parallel terminals:

$HOME/pg12/bin/pgbench postgres -f create_table.sql -T1000 -c50 -j50
$HOME/pg12/bin/pgbench postgres -f create_slot.sql -T1000 -c50 -j50

It may take some time. On my local machine it breaks in a few seconds.


FIX?


I can't say that I have enough understanding of what's going on in the
logical decoding code. But the one thing I've noticed is an inconsistency
in the xids used for the ReorderBufferTXNByXid() calls:

1. first, in the DecodeHeap2Op() function, ReorderBufferProcessXid() is
called with the subtransaction id; it actually creates a ReorderBufferTXN
and adds it to the reorder buffer's hash table and toplevel_by_lsn list;
2. second, within ReorderBufferXidSetCatalogChanges() it uses the same
subxid to look up the ReorderBufferTXN that was created before,
successfully;
3. now in ReorderBufferAddNewTupleCids() it instead uses the top-level
transaction id for the lookup; it cannot find that xid in the hash table
and tries to add a new record with the same LSN. And it fails since this
LSN is already in the toplevel_by_lsn list.

Attached is a simple patch that uses the subxid instead of the top-level
xid in the ReorderBufferAddNewTupleCids() call. It seems to fix the bug,
but I'm not sure that this is a valid change. Can someone please verify
it or maybe suggest a better solution for the issue?

Best regards,
Ildar
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 3e9d4cd79f..6c03c85768 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -778,7 +778,7 @@ SnapBuildProcessNewCid(SnapBuild *builder, TransactionId xid,
 	 */
 	ReorderBufferXidSetCatalogChanges(builder->reorder, xid, lsn);
 
-	ReorderBufferAddNewTupleCids(builder->reorder, xlrec->top_xid, lsn,
+	ReorderBufferAddNewTupleCids(builder->reorder, xid, lsn,

Re: [HACKERS] Transactions involving multiple postgres foreign servers, take 2

2019-01-29 Thread Ildar Musin
Hello,

The patch needs a rebase as it doesn't apply to the current master. I
applied it to an older commit to test it. It worked fine so far.

I found one bug though, which would cause the resolver to finish by
timeout even though there are unresolved foreign transactions in the
list. The `fdw_xact_exists()` function expects a database id as the first
argument and an xid as the second, but everywhere it is called the
arguments are specified in the opposite order (xid first, then dbid).
Also, the function declaration in the header doesn't match its
definition.

There are some other things I found.
* In `FdwXactResolveAllDanglingTransactions()` the variable `n_resolved`
  is declared as bool but used as an integer.
* In fdwxact.c's module comment the functions
  `FdwXactRegisterForeignTransaction()` and
  `FdwXactMarkForeignTransactionModified()` are mentioned but are not
  there anymore.
* In the documentation (storage.sgml) there is no mention of the
  `pg_fdw_xact` directory.

A couple of stylistic notes.
* In the `FdwXactCtlData` struct both camel case and snake case naming
  are used.
* In `get_fdw_xacts()` `xid != InvalidTransactionId` can be replaced with
  `TransactionIdIsValid(xid)`.
* In `generate_fdw_xact_identifier()` the `fx` prefix could be a part of
  the format string instead of being processed by `sprintf` as an extra
  argument.

I'll continue looking into the patch. Thanks!



On Tue, Nov 20, 2018 at 12:18 PM Masahiko Sawada wrote:

> On Thu, Nov 15, 2018 at 7:36 PM Masahiko Sawada wrote:
> >
> > On Mon, Oct 29, 2018 at 6:03 PM Masahiko Sawada wrote:
> > >
> > > On Mon, Oct 29, 2018 at 10:16 AM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> > > >
> > > > On Wed, Oct 24, 2018 at 9:06 AM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> > > > >
> > > > > On Tue, Oct 23, 2018 at 12:54 PM Kyotaro HORIGUCHI wrote:
> > > > > >
> > > > > > Hello.
> > > > > >
> > > > > > # It took a long time to come here..
> > > > > >
> > > > > > At Fri, 19 Oct 2018 21:38:35 +0900, Masahiko Sawada <sawada.m...@gmail.com> wrote in
> > > > > > > On Wed, Oct 10, 2018 at 1:34 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> > > > > > ...
> > > > > > > * Updated docs, added the new section "Distributed
> Transaction" at
> > > > > > > Chapter 33 to explain the concept to users
> > > > > > >
> > > > > > > * Moved atomic commit codes into src/backend/access/fdwxact
> directory.
> > > > > > >
> > > > > > > * Some bug fixes.
> > > > > > >
> > > > > > > Please reivew them.
> > > > > >
> > > > > > I have some comments, with apologize in advance for possible
> > > > > > duplicate or conflict with others' comments so far.
> > > > >
> > > > > Thank you so much for reviewing this patch!
> > > > >
> > > > > >
> > > > > > 0001:
> > > > > >
> > > > > > This sets XACT_FLAG_WROTENONTEMPREL when RELPERSISTENT_PERMANENT
> > > > > > relation is modified. Isn't it needed when UNLOGGED tables are
> > > > > > modified? It may be better that we have dedicated classification
> > > > > > macro or function.
> > > > >
> > > > > I think even if we do atomic commit for modifying an UNLOGGED
> > > > > table and a remote table, the data will get inconsistent if the
> > > > > local server crashes. For example, if the local server crashes
> > > > > after preparing the transaction on the foreign server but before
> > > > > the local commit, we will lose all data of the local UNLOGGED
> > > > > table whereas the modification of the remote table is rolled
> > > > > back. In case of persistent tables, data consistency is kept. So
> > > > > I think keeping data consistency between remote data and a local
> > > > > unlogged table is difficult and want to leave it as a restriction
> > > > > for now. Am I missing something?
> > > > >
> > > > > >
> > > > > > The flag is handled in heapam.c. I suppose that it should be done
> > > > > > in the upper layer considering coming pluggable storage.
> > > > > > (X_F_ACCESSEDTEMPREL is set in heapam, but..)
> > > > > >
> > > > >
> > > > > Yeah, or we can set the flag after heap_insert in ExecInsert.
> > > > >
> > > > > >
> > > > > > 0002:
> > > > > >
> > > > > > The name FdwXactParticipantsForAC doesn't sound good for me. How
> > > > > > about FdwXactAtomicCommitPartitcipants?
> > > > >
> > > > > +1, will fix it.
> > > > >
> > > > > >
> > > > > > Well, as the file comment of fdwxact.c,
> > > > > > FdwXactRegisterTransaction is called from FDW driver and
> > > > > > F_X_MarkForeignTransactionModified is called from executor. I
> > > > > > think that we should clarify who is responsible to the whole
> > > > > > sequence. Since the state of local tables affects, I suppose
> > > > > > executor is that. Couldn't we do the whole thing within executor
> > > > > > side?  I'm not sure but I feel that
> > > > > > F_X_RegisterForeignTransaction can be a part of
> > > > > > F_X_MarkForeignTransactionModified.  The callers of
> > > > > > MarkForeignTransactionModified can find whether the table is
> > > > > > involved in 2pc by 

Possibly redundant context switch in postgres_fdw

2018-10-18 Thread Ildar Musin
Hi hackers,

ISTM that the context switch in `create_cursor()`:

    if (numParams > 0)
    {
        MemoryContext oldcontext;

        oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);

        process_query_params(econtext,
                             fsstate->param_flinfo,
                             fsstate->param_exprs,
                             values);

        MemoryContextSwitchTo(oldcontext);
    }

is redundant, since we should already be in the `ecxt_per_tuple_memory`
context according to `ForeignNext()`. Am I missing some hidden purpose?
If not, here is a patch that removes it.
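
For reference, ForeignNext() (src/backend/executor/nodeForeignscan.c)
already wraps the IterateForeignScan() call -- which reaches
create_cursor() on the first iteration -- in the per-tuple context;
paraphrased from memory, not quoted verbatim:

static TupleTableSlot *
ForeignNext(ForeignScanState *node)
{
    TupleTableSlot *slot;
    ExprContext *econtext = node->ss.ps.ps_ExprContext;
    MemoryContext oldcontext;

    /* call the FDW in the short-lived per-tuple memory context */
    oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
    slot = node->fdwroutine->IterateForeignScan(node);
    MemoryContextSwitchTo(oldcontext);

    /* ... system-attribute handling elided ... */
    return slot;
}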

Regards,
Ildar Musin
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index fd20aa96aa..65962ea657 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -3083,16 +3083,10 @@ create_cursor(ForeignScanState *node)
 	 */
 	if (numParams > 0)
 	{
-		MemoryContext oldcontext;
-
-		oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
-
 		process_query_params(econtext,
 			 fsstate->param_flinfo,
 			 fsstate->param_exprs,
 			 values);
-
-		MemoryContextSwitchTo(oldcontext);
 	}
 
 	/* Construct the DECLARE CURSOR command */


Re: hostorder and failover_timeout for libpq

2018-09-19 Thread Ildar Musin
Hello Surafel,

On Fri, Sep 14, 2018 at 2:03 PM Surafel Temesgen wrote:

> Hey,
> Here are a few comments.
> +  xreflabel="failover_timeout">
> Here's a typo: ="libpq-connect-falover-timeout"
> +   {"failover_timeout", NULL, NULL, NULL,
> +   "Failover Timeout", "", 10,
> Words are separated by a hyphen in internalPQconninfoOption labels, as in
> the surrounding code.
> +If the value is random, the host to connect to
> +will be randomly picked from the list. It allows load balancing between
> +several cluster nodes.
> I can't think of a use case where randomly picking a node, rather than
> using the user-specified order, load balances the cluster better. Can
> you explain the purpose of this feature more?

Probably load-balancing is the wrong word for this. Think of it as a
connection routing mechanism. Let's say you have 10 servers and 100
clients willing to establish read-only connections. Without this feature
all clients will go to the first specified host (unless they hit the
max_connections limit). And with random `hostorder` they would be split
between the hosts more or less evenly.
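
For example, each client could then use a connection string like the
following (host names are made up):

postgresql://host1,host2,host3/somedb?hostorder=random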


> And in the code I can't see
> a mechanism for preventing picking one host multiple times
>
The original idea was to collect all the IP addresses that we get after
resolving the specified hostnames, put those addresses into a global
array, apply a random permutation to it, and then use a round-robin
algorithm trying to connect to each of them until we succeed. Now I'm
not sure that this approach was the best. There are two concerns:

1. a host name can be resolved to several IP addresses (which in turn can
point to either the same physical server with multiple network interfaces
or to different servers). In the scheme described above each IP address
would be added to the global array. This may lead to a situation where
one host gets a higher chance of being picked because it has more
addresses in the global array than the other hosts.
2. a host may support both ipv4 and ipv6 connections, which again leads
to extra items in the global array and therefore also increases its
chance of being picked.

Another approach would be to leave `pg_conn->connhost` as it is now (i.e.
not to create a global addresses array) and just apply a random
permutation to it if `hostorder=random` is specified, and probably also
permute the address list within each individual host.
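
A sketch of that second approach, assuming libpq's internal pg_conn_host
array (a plain Fisher-Yates shuffle; real code would use libpq's own PRNG
rather than rand()):

#include <stdlib.h>

/* shuffle conn->connhost[0..nhosts-1] in place for hostorder=random */
static void
shuffleHosts(pg_conn_host *hosts, int nhosts)
{
    int         i;

    for (i = nhosts - 1; i > 0; i--)
    {
        int         j = rand() % (i + 1);   /* pick from [0..i] */
        pg_conn_host tmp = hosts[i];

        hosts[i] = hosts[j];
        hosts[j] = tmp;
    }
}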

At this point I'd like to ask the community what, in your opinion, would
be the best course of action, and whether this feature should be
implemented within libpq at all. Because from my POV there are factors
that really depend on the network architecture, and there is probably no
single right solution.

Kind regards,
Ildar


Re: MAP syntax for arrays

2018-05-08 Thread Ildar Musin



On 08.05.2018 17:15, Peter Eisentraut wrote:

On 5/8/18 09:19, Chapman Flack wrote:

On 05/08/2018 08:57 AM, Ildar Musin wrote:


select map (pow(2, x) - 1 for x in array[1,2,3,4,5]);


I wonder how efficient an implementation would be possible
strictly as a function, without grammar changes?


Yeah, you can pass a function to another function (using regprocedure
or just oid), so this should be possible entirely in user space.



The problem with this approach is that an extension would either have a
single map() function with input and output type anyarray, which cannot
be used when the user needs to map int[] to text[] for example; or, the
other way, there would have to be a set of map functions for different
input/output types.

Another thing is that this approach is not as versatile, since the user
needs to create a function before running map, while with the proposed
patch they could run an arbitrary expression over the array directly.

--
Ildar Musin
i.mu...@postgrespro.ru



Re: MAP syntax for arrays

2018-05-08 Thread Ildar Musin


On 08.05.2018 15:49, Ildar Musin wrote:

select map (pow(x, 2) - 1 for x in array[1,2,3]);


Sorry, the example should be:

select map (pow(2, x) - 1 for x in array[1,2,3,4,5]);
   ?column?
---------------
 {1,3,7,15,31}
(1 row)

--
Ildar Musin
i.mu...@postgrespro.ru



Re: MAP syntax for arrays

2018-05-08 Thread Ildar Musin

Hello Tom, Ashutosh,

On 07.05.2018 18:16, Tom Lane wrote:

Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> writes:

Is there a way we can improve unnest() and array_agg() to match
the performance you have specified by let's say optimizing the
cases specially when those two are used together. Identifying that
may be some work, but will not require introducing new syntax.


+1.  The first thing I thought on seeing this proposal was "I wonder
how long it will be before the SQL committee introduces some syntax
that uses the MAP keyword and breaks this".

ISTM the planner could be taught to notice the combination of unnest
and array_agg and produce a special output plan from that.

It is, however, fair to wonder whether this is worth our time to
optimize.  I've not noticed a lot of people complaining about
performance of this sort of thing, at least not since we fixed
array_agg to not be O(N^2).


The main point of this patch was convenience; the performance thing came
out later just as a side effect :) Many users are familiar with the
"map/reduce/filter" concept that many languages (not only functional
ones) utilize. And my thought was that it would be great to have those
in postgres too.

Apparently there is also a case where unnest/array_agg may not produce
the result we're looking for: multidimensional arrays. E.g.

select array_agg(x * 2)
from unnest(array[[1,2],[3,4],[5,6]]) as x;
    array_agg
-----------------
 {2,4,6,8,10,12}
(1 row)

select map(x * 2 for x in array[[1,2],[3,4],[5,6]]);
       ?column?
-----------------------
 {{2,4},{6,8},{10,12}}
(1 row)

array_agg produces plain arrays no matter what the input was.

There is a new version of the patch in the attachment which introduces
arbitrary per-element expressions (instead of a single function call). So
now the user can specify a placeholder representing one element of the
array and use it in the expressions, like the following:

select map (pow(x, 2) - 1 for x in array[1,2,3]);
   ?column?
---------------
 {1,3,7,15,31}
(1 row)


--
Ildar Musin
i.mu...@postgrespro.ru
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index e284fd7..85d76b2 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -886,6 +886,56 @@ ExecInitExprRec(Expr *node, ExprState *state,
 				break;
 			}
 
+		case T_MapExpr:
+			{
+				MapExpr    *map = (MapExpr *) node;
+				ExprState  *elemstate;
+				Oid			resultelemtype;
+
+				ExecInitExprRec(map->arrexpr, state, resv, resnull);
+
+				resultelemtype = get_element_type(map->resulttype);
+				if (!OidIsValid(resultelemtype))
+					ereport(ERROR,
+							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+							 errmsg("target type is not an array")));
+
+				/* Construct a sub-expression for the per-element expression */
+				elemstate = makeNode(ExprState);
+				elemstate->expr = map->elemexpr;
+				elemstate->parent = state->parent;
+				elemstate->ext_params = state->ext_params;
+				elemstate->innermost_caseval = (Datum *) palloc(sizeof(Datum));
+				elemstate->innermost_casenull = (bool *) palloc(sizeof(bool));
+
+				ExecInitExprRec(map->elemexpr, elemstate,
+								&elemstate->resvalue, &elemstate->resnull);
+
+				/* Append a DONE step and ready the subexpression */
+				scratch.opcode = EEOP_DONE;
+				ExprEvalPushStep(elemstate, &scratch);
+				ExecReadyExpr(elemstate);
+
+				scratch.opcode = EEOP_MAP;
+				scratch.d.map.elemexprstate = elemstate;
+				scratch.d.map.resultelemtype = resultelemtype;
+
+				if (elemstate)
+				{
+					/* Set up workspace for array_map */
+					scratch.d.map.amstate =
+						(ArrayMapState *) palloc0(sizeof(ArrayMapState));
+				}
+				else
+				{
+					/* Don't need workspace if there's no subexpression */
+					scratch.d.map.amstate = NULL;
+				}
+
+				ExprEvalPushStep(state, &scratch);
+				break;
+			}
+
 		case T_OpExpr:
 			{
 				OpExpr	   *op = (OpExpr *) node;
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 9d6e25a..b2cbc45 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -328,6 +328,7 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 		&&CASE_EEOP_FUNCEXPR_STRICT,
 		&&CASE_EEOP_FUNCEXPR_FUSAGE,
 		&&CASE_EEOP_FUNCEXPR_STRICT_FUSAGE,
+		&&CASE_EEOP_MAP,
 		&&CASE_EEOP_BOOL_AND_STEP_FIRST,
 		&&CASE_EEOP_BOOL_AND_STEP,
 		&&CASE_EEOP_BOOL_AND_STEP_LAST,
@@ -699,6 +700,13 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 			EEO_NEXT();
 		}
 
+		EEO_CASE(EEOP_MAP)
+		{
+			ExecEvalMapExpr(state, op, econtext);
+
+			EEO_NEXT();
+		}
+
 		/*
 		 * If any of its clauses is FALSE, an AND's result is FALSE regardless
 		 * of the states of the rest of the clauses, so we can stop evaluating
@@ -2230,6 +2238,33 @@ ExecEvalFuncExprStrictFusage(ExprState *state, ExprEvalStep *op,
 }
 
 /*
+ * Evaluate a MapExpr expression.
+ *
+ * Source array is

MAP syntax for arrays

2018-05-04 Thread Ildar Musin

Hello hackers,

Recently I was working with SQL arrays in postgres and it turned out
that postgres doesn't have such convenient functional constructs
as map, reduce and filter. Currently, to map a function over an array,
the user has to write a subquery like:

select u.* from
my_table,
lateral (
select array_agg(lower(elem))
from unnest(arr) as elem
) as u;

Which is not only inconvenient but also not very efficient (see the
'Demo' section below).

When I dug into the code I found that postgres already has the needed
infrastructure for implementing map for arrays; actually array coercion
already works that way (it basically maps a cast function).

In the attached patch there is a simple map implementation which
introduces a new expression type and syntax:

MAP(<function> OVER <array>)

For example:

SELECT MAP(upper OVER array['one', 'two', 'three']::text[]);
    ?column?
-----------------
 {ONE,TWO,THREE}
(1 row)

This is probably not the most useful notation, and it would be better to
have syntax for mapping arbitrary expressions over an array, not just a
function. I'm struggling to come up with a good idea of how it should
look. It could be something like the following:

MAP(<expression> FOR <variable> IN <array>)

For instance:

SELECT MAP(x*2 FOR x IN array[1, 2, 3]::int[]);

Looking forward to the community's suggestions!

Demo


Here is a small comparison between the map and unnest/aggregate ways of
per-element processing of arrays. Given a table with 1K rows which
contains a single column of type text[], each array containing 5/10/100
elements:

create table my_table (arr text[]);
insert into my_table
select array_agg(md5(random()::text))
from generate_series(1, 1000) as rows,
 generate_series(1, 10) as elements
group by rows;

There are two scripts for pgbench. One for 'map' syntax:

select map(upper over arr) from my_table;

And one for unnest/aggregate:

select u.* from my_table,
lateral (
select array_agg(upper(elem))
from unnest(arr) as elem
) as u;

Results are:

 elements per array |  map (tps) | unnest/aggregate (tps)
--------------------+------------+------------------------
                  5 | 139.105359 |              74.434010
                 10 |  74.089743 |              43.622554
                100 |   7.693000 |               5.325805

Apparently map is more efficient for small arrays, and as the size of
the array increases the difference decreases.

I'll be glad of any input from the community. Thanks!

--
Ildar Musin
i.mu...@postgrespro.ru
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index e284fd7..85d76b2 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -886,6 +886,56 @@ ExecInitExprRec(Expr *node, ExprState *state,
 				break;
 			}
 
+		case T_MapExpr:
+			{
+				MapExpr    *map = (MapExpr *) node;
+				ExprState  *elemstate;
+				Oid			resultelemtype;
+
+				ExecInitExprRec(map->arrexpr, state, resv, resnull);
+
+				resultelemtype = get_element_type(map->resulttype);
+				if (!OidIsValid(resultelemtype))
+					ereport(ERROR,
+							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+							 errmsg("target type is not an array")));
+
+				/* Construct a sub-expression for the per-element expression */
+				elemstate = makeNode(ExprState);
+				elemstate->expr = map->elemexpr;
+				elemstate->parent = state->parent;
+				elemstate->ext_params = state->ext_params;
+				elemstate->innermost_caseval = (Datum *) palloc(sizeof(Datum));
+				elemstate->innermost_casenull = (bool *) palloc(sizeof(bool));
+
+				ExecInitExprRec(map->elemexpr, elemstate,
+								&elemstate->resvalue, &elemstate->resnull);
+
+				/* Append a DONE step and ready the subexpression */
+				scratch.opcode = EEOP_DONE;
+				ExprEvalPushStep(elemstate, &scratch);
+				ExecReadyExpr(elemstate);
+
+				scratch.opcode = EEOP_MAP;
+				scratch.d.map.elemexprstate = elemstate;
+				scratch.d.map.resultelemtype = resultelemtype;
+
+				if (elemstate)
+				{
+					/* Set up workspace for array_map */
+					scratch.d.map.amstate =
+						(ArrayMapState *) palloc0(sizeof(ArrayMapState));
+				}
+				else
+				{
+					/* Don't need workspace if there's no subexpression */
+					scratch.d.map.amstate = NULL;
+				}
+
+				ExprEvalPushStep(state, &scratch);
+				break;
+			}
+
 		case T_OpExpr:
 			{
 				OpExpr	   *op = (OpExpr *) node;
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 9d6e25a..b2cbc45 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -328,6 +328,7 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 		&&CASE_EEOP_FUNCEXPR_STRICT,
 		&&CASE_EEOP_FUNCEXPR_FUSAGE,
 		&&CASE_EEOP_FUNCEXPR_STRICT_FUSAGE,
+		&&CASE_EEOP_MAP,
 		&&CASE_EEOP_BOOL_AND_STEP_FIRST,
 		

hostorder and failover_timeout for libpq

2018-04-18 Thread Ildar Musin

Hello hackers,

A couple of years ago Victor Wagner presented a patch [1] that introduced 
a multiple-hosts capability and also hostorder and failover_timeout 
parameters for libpq. Subsequently the multi-host feature was reimplemented 
by Robert Haas and committed. Later the target_session_attrs parameter was 
also added. In this thread I want to revisit the hostorder and 
failover_timeout proposal.


'hostorder' defines the order in which the postgres instances listed in 
the connection string will be tried. Possible values are:

* sequential (default)
* random

Random order can be used, for instance, for load balancing (which is 
particularly useful in a multi-master cluster, but can also be used to 
load-balance read-only connections to standbys).


'failover_timeout' specifies the time span (in seconds) during which libpq 
will continue attempts to connect to the hosts listed in the connection 
string. If failover_timeout is specified, libpq will loop over the hosts 
again and again until either it successfully connects to one of them or it 
runs out of time.


I reimplemented 'hostorder' and 'failover_timeout' parameters in the 
attached patch. I also took some documentation pieces from Victor 
Wagner's original patch. I'll be glad to see any comments and 
suggestions. Thanks!


[1] 
https://www.postgresql.org/message-id/flat/20150818041850.GA5092%40wagner.pp.ru


--
Ildar Musin
i.mu...@postgrespro.ru
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 800e68a..a9ba534 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -845,7 +845,7 @@ postgresql://localhost/mydb
 postgresql://user@localhost
 postgresql://user:secret@localhost
postgresql://other@localhost/otherdb?connect_timeout=10&amp;application_name=myapp
-postgresql://host1:123,host2:456/somedb?target_session_attrs=any&amp;application_name=myapp
+postgresql://host1:123,host2:456/somedb?hostorder=random&amp;target_session_attrs=any&amp;application_name=myapp
 
 Components of the hierarchical part of the URI can also
 be given as parameters.  For example:
@@ -910,14 +910,15 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
  Specifying Multiple Hosts
 
  
-   It is possible to specify multiple hosts to connect to, so that they are
-   tried in the given order. In the Keyword/Value format, the host,
-   hostaddr, and port options accept a comma-separated
-   list of values. The same number of elements must be given in each option, such
-   that e.g. the first hostaddr corresponds to the first host name,
-   the second hostaddr corresponds to the second host name, and so
-   forth. As an exception, if only one port is specified, it
-   applies to all the hosts.
+   It is possible to specify multiple hosts to connect to. In the Keyword/Value
+   format, the host, hostaddr, and
+   port options accept a comma-separated list of values.
+   The same number of elements must be given in each option, such that e.g. the
+   first hostaddr corresponds to the first host name, the second
+   hostaddr corresponds to the second host name, and so
+   forth. As an exception, if only one port is specified,
+   it applies to all the hosts. The order in which hosts are tried may be
+   specified by the hostorder parameter.
  
 
  
@@ -968,8 +969,7 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
 Unix-domain sockets, the default is to connect to localhost.


-A comma-separated list of host names is also accepted, in which case
-each host name in the list is tried in order. See
+A comma-separated list of host names is also accepted. See
  for details.

   
@@ -1039,6 +1039,30 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname

   
 
+  
+   hostorder
+   
+   
+Specifies the order in which hosts from the list of hosts
+specified by the host parameter are
+tried.
+   
+   
+If the value of this argument is sequential (the
+default) connections to the hosts will be attempted in the order in
+which they are given.
+   
+   
+If the value is random, the host to connect to
+will be randomly picked from the list. It allows load balancing between
+several cluster nodes. However, PostgreSQL doesn't currently support
+multimaster clusters. So, without the use of third-party products,
+only read-only connections can take advantage of load balancing.
+See 
+   
+   
+  
+
   
port

@@ -1115,6 +1139,29 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
  
 
+ 
+  failover_timeout
+  
+  
+   Maximum time to cyclically retry all the hosts in the connection string.
+   (as decimal integer number of seconds). If not specified, then
+   hosts are tried just once.
+  
+  
+   If we have replicating cluster, and master node

Re: using index or check in ALTER TABLE SET NOT NULL

2018-03-15 Thread Ildar Musin

Hello Sergei,

On 10.03.2018 12:35, Sergei Kornilov wrote:

Hello My patch does not apply after commit
5748f3a0aa7cf78ac6979010273bd9d50869bb8e. Here is update to current
master. Not null constraint is immutable too, so here is no changes
in PartConstraintImpliedByRelConstraint excepts rename and comments
fix.

In this patch version i also revert tests to v4 state: i use DEBUG
ereport instead INFO and code path not tested. Please tell me if i
must change tests some way.

regards, Sergei



Ok, I can't think of any other way to test it, so I have to agree with
Tom Lane, i.e. rely only on coverage. (There was also another suggestion
to use statistics [select seq_scan from pg_stat_user_tables where
relid='test'::regclass], which show the number of table scans. But
statistics are collected by the stats collector process with some latency
and hence cannot be relied on for tests.)

The patch seems correct to me; it applies and compiles cleanly, the docs
compile as well, and the tests pass. Changed status to Ready for Committer.

--
Ildar Musin
i.mu...@postgrespro.ru



Re: General purpose hashing func in pgbench

2018-03-14 Thread Ildar Musin

Hello Teodor,

On 07.03.2018 16:21, Ildar Musin wrote:

It turned out that the only big-endian machine I could run tests on is
out of order.


I finally managed to perform this test on a SPARC V9 machine, which is a
64-bit big-endian architecture. I ran the pgbench script (see previous
message) with default_seed=123 on both the x86-64 and SPARC machines and
visualized the results. You can find them in the attached chart. Both
images show the same distribution, so endianness isn't a concern here.

Thanks!

--
Ildar Musin
i.mu...@postgrespro.ru


Re: Failed to request an autovacuum work-item in silence

2018-03-12 Thread Ildar Musin

Hi,

The autovac_get_workitem_name() declaration seems redundant and should be 
removed. The same goes for including "utils/lsyscache.h" in brin.c.


I would again rename the 'requested' variable in brininsert() to 
something like 'success', because a work item is requested anyway; what 
matters is whether the request was satisfied/successful.


Except for those two points everything is fine and works as expected.


--
Ildar Musin
i.mu...@postgrespro.ru



Re: Failed to request an autovacuum work-item in silence

2018-03-08 Thread Ildar Musin
Hello,

The patch applies and compiles cleanly, tests pass. The code is working
as well. I was able to check it by simply creating a BRIN index and
filling the table with data forcing the index to request lots of work items:

create table test (a serial, b text);
create index on test using brin (a) with (pages_per_range=1,
autosummarize=true);
insert into test select i, repeat(md5(random()::text), 30) from
generate_series(1, 3000) i;
LOG:  could not request an autovacuum work item "BrinSummarizeRange" for
"test_a_idx"
LOG:  could not request an autovacuum work item "BrinSummarizeRange" for
"test_a_idx"
...

Just a couple of remarks. I would rename the 'requested' variable in
AutoVacuumRequestWork() to something like 'success' or 'result', because
a request is something the caller does. And I would also rephrase the log
message as follows:

   request for autovacuum work item "%s" for "%s" failed

Thanks!

-- 
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company 




Re: General purpose hashing func in pgbench

2018-03-07 Thread Ildar Musin

Hello Teodor,


1) Seems it's a good idea to add credits to Austin Appleby in the
comments.



Done. Also rebased to the latest master.



I think that both points refer to the fact that the original algorithm
accepts a byte string as input, slices it up by 8 bytes and forms
unsigned int values from it. In that case the order of bytes in the
input string does matter, since it may result in different integers on
different architectures. And it is also a fair requirement for a byte
string to be aligned, as some architectures cannot handle unaligned
data. In this patch though I slightly modified the original algorithm so
that it takes unsigned ints as input (not a byte string), so neither of
these points should be a problem, as it seems to me. But I'll
double-check it on a big-endian machine with strict alignment (SPARC).


It turned out that the only big-endian machine I could run tests on is 
out of order.


Maybe someone has access to a big-endian machine? If so, could you 
please run a basic test and send me the resulting file? I've attached an 
initialization script and a pgbench script which can be run as follows:


psql postgres -f hash_init.sql
pgbench postgres -T60 -f hash_run.sql
psql postgres -c "copy abc to '/tmp/hash_results.csv'"

Thanks!

--
Ildar Musin
i.mu...@postgrespro.ru


hash_init.sql
Description: application/sql


hash_run.sql
Description: application/sql
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 5f28023..f07ddf1 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -874,13 +874,18 @@ pgbench  options  d
 
  
   
-scale 
-   current scale factor
+client_id 
+   unique number identifying the client session (starts from zero)
   
 
   
-client_id 
-   unique number identifying the client session (starts from zero)
+default_seed 
+   seed used in hash functions by default
+  
+
+  
+scale 
+   current scale factor
   
  
 
@@ -1246,6 +1251,27 @@ pgbench  options  d
5
   
   
+   hash(a [, seed ] )
+   integer
+   alias for hash_murmur2()
+   hash(10, 5432)
+   -5817877081768721676
+  
+  
+   hash_fnv1a(a [, seed ] )
+   integer
+   https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function;>FNV-1a hash
+   hash_fnv1a(10, 5432)
+   -7793829335365542153
+  
+  
+   hash_murmur2(a [, seed ] )
+   integer
+   https://en.wikipedia.org/wiki/MurmurHash;>MurmurHash2 hash
+   hash_murmur2(10, 5432)
+   -5817877081768721676
+  
+  
int(x)
integer
cast to int
@@ -1424,6 +1450,31 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /

 
   
+Hash functions hash, hash_murmur2 and
+hash_fnv1a accept an input value and an optional seed parameter.
+In case the seed isn't provided the value of :default_seed
+is used, which is initialized randomly unless set by the command-line
+-D option. Hash functions can be used to scatter the
+distribution of random functions such as random_zipfian or
+random_exponential. For instance, the following pgbench
+script simulates possible real world workload typical for social media and
+blogging platforms where few accounts generate excessive load:
+
+
+\set r random_zipfian(0, 1, 1.07)
+\set k abs(hash(:r)) % 100
+
+
+In some cases several distinct distributions are needed which don't correlate
+with each other and this is when implicit seed parameter comes in handy:
+
+
+\set k1 abs(hash(:r), :default_seed + 123) % 100
+\set k2 abs(hash(:r), :default_seed + 321) % 100
+
+  
+
+  
As an example, the full definition of the built-in TPC-B-like
transaction is:
 
diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y
index e23ca51..fc42c47 100644
--- a/src/bin/pgbench/exprparse.y
+++ b/src/bin/pgbench/exprparse.y
@@ -16,6 +16,10 @@
 
 #include "pgbench.h"
 
+#define PGBENCH_NARGS_VARIABLE	(-1)
+#define PGBENCH_NARGS_CASE		(-2)
+#define PGBENCH_NARGS_HASH		(-3)
+
 PgBenchExpr *expr_parse_result;
 
 static PgBenchExprList *make_elist(PgBenchExpr *exp, PgBenchExprList *list);
@@ -226,9 +230,13 @@ make_uop(yyscan_t yyscanner, const char *operator, PgBenchExpr *expr)
 /*
  * List of available functions:
  * - fname: function name, "!..." for special internal functions
- * - nargs: number of arguments
- *			-1 is a special value for least & greatest meaning #args >= 1
- *			-2 is for the "CASE WHEN ..." function, which has #args >= 3 and odd
+ * - nargs: number of arguments. Special cases:
+ *			- PGBENCH_NARGS_VARIABLE is a special value for least & greatest
+ *			  meaning #args >= 1;
+ *			- PGBENCH_NARGS_CASE is for the "CASE WHEN ..." function, which
+ *			  has #args >= 3 and odd;
+ * 			- PGBENCH_NARGS_HASH is f

Re: using index or check in ALTER TABLE SET NOT NULL

2018-03-06 Thread Ildar Musin



On 06.03.2018 16:12, Sergei Kornilov wrote:

Hello thank you for review!


Adding check constraint will also force the full table scan. So I
think it would be better to rephrased it as follows:

Agree. I updated docs in new attached patch slightly different


Regarding regression tests it may be useful to set
client_min_messages to 'debug1' before setting "NOT NULL" attribute
for a column. In this case you can tell for sure that
NotNullImpliedByRelConstraints() returned true (i.e. your code
actually did the job) as the special debug message is printed to
the log.

I cannot find any debug messages in tests (grep client_min_messages
-irn src/test/ shows only some at warning level and a few at error). So
in the regression tests I verify only the top-level behavior. Or was
this paragraph for other people, not for the tests?



Sorry, probably I didn't explain it clearly enough. I meant that your
regression tests can't distinguish the cases when the full table scan was
actually performed from the ones when it was skipped due to the
NotNullImpliedByRelConstraints() check. For instance, consider this
piece from the test suite:


# create table atacc1 (test_a int, test_b int);
CREATE TABLE
...

# alter table atacc1 add constraint atacc1_constr_a_valid check(test_a
is not null);
ALTER TABLE

# alter table atacc1 alter test_a set not null;
ALTER TABLE


It is not obvious that the full table scan was omitted. But if we set
client_min_messages to 'debug1', we'll be able to see what really
happened from the different debug messages. For example:


# create table atacc1 (test_a int, test_b int);
CREATE TABLE
...

# set client_min_messages to 'debug1';
SET

# alter table atacc1 alter test_a set not null;
DEBUG:  verifying table "atacc1"   <<<< full table scan!
ALTER TABLE

# alter table atacc1 alter test_a drop not null;
ALTER TABLE

# alter table atacc1 add constraint atacc1_constr_a_valid check(test_a
is not null);
DEBUG:  verifying table "atacc1"
ALTER TABLE

# alter table atacc1 alter test_a set not null;
DEBUG:  verifying table "atacc1" NOT NULL constraint on test_a attribute
by existed constraints   <<<< full scan was skipped!
ALTER TABLE


--
Ildar Musin
i.mu...@postgrespro.ru



Re: General purpose hashing func in pgbench

2018-03-06 Thread Ildar Musin

Hello Teodor,

Thank you for reviewing this patch.

On 06.03.2018 15:53, Teodor Sigaev wrote:

Patch applies, compiles, pgbench & global "make check" ok, doc
built ok.


Agree.

If I understand upthread correctly, the implementation of the Murmur
hash algorithm is based on Austin Appleby's work:
https://github.com/aappleby/smhasher/blob/master/src/MurmurHash2.cpp

If so, I have notes and objections:

1) Seems it's a good idea to add credits to Austin Appleby in the
comments.



Sounds fair, I'll send an updated version soon.


2) The reference implementation directly says (link above): // 2. It
will not produce the same results on little-endian and big-endian //
machines.

I don't think that is a good thing for testing and benchmarking, for
several reasons: it could produce different data collections, different
selects, different distributions.

3) Again, from the comments of the reference implementation: // Note -
This code makes a few assumptions about how your machine behaves - // 1.
We can read a 4-byte value from any address without crashing

That's not true for all supported platforms. Any box with strict
alignment will get SIGBUS here.



I think that both points refer to the fact that the original algorithm
accepts a byte string as input, slices it up by 8 bytes and forms
unsigned int values from it. In that case the order of bytes in the
input string does matter, since it may result in different integers on
different architectures. And it is also a fair requirement for a byte
string to be aligned, as some architectures cannot handle unaligned
data. In this patch though I slightly modified the original algorithm so
that it takes unsigned ints as input (not a byte string), so neither of
these points should be a problem, as it seems to me. But I'll
double-check it on a big-endian machine with strict alignment (SPARC).
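
For reference, a sketch of that modification: MurmurHash2's 64-bit mixing
and finalization applied to a single 8-byte integer, so there is no
byte-string traversal at all (the constants are the published
MurmurHash64A ones; the function name is mine):

#include <stdint.h>

static int64_t
murmur2_int64(int64_t val, int64_t seed)
{
    const uint64_t m = UINT64_C(0xc6a4a7935bd1e995);
    const int   r = 47;
    uint64_t    h = (uint64_t) seed ^ (sizeof(int64_t) * m);
    uint64_t    k = (uint64_t) val;

    /* mix the single 8-byte "block" */
    k *= m;
    k ^= k >> r;
    k *= m;
    h ^= k;
    h *= m;

    /* finalize */
    h ^= h >> r;
    h *= m;
    h ^= h >> r;

    return (int64_t) h;
}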

Thanks!

--
Ildar Musin
i.mu...@postgrespro.ru



Re: using index or check in ALTER TABLE SET NOT NULL

2018-03-05 Thread Ildar Musin

Hello Sergei,

I couldn't find any case when your code doesn't work properly. So it 
seems ok to me.



@@ -220,6 +220,13 @@ WITH ( MODULUS numeric_literal, REM
  

  
+  Full table scan is performed to check that no existing row
+  in the table has null values in given column.  It is possible to avoid
+  this scan by adding a valid CHECK constraint to
+  the table that would allow only NOT NULL values for given column.
+ 


Adding a check constraint will also force a full table scan. So I think 
it would be better to rephrase it as follows:


"Full table scan is performed to check that column doesn't contain NULL 
values unless there are valid check constraints that prohibit NULL 
values for specified column. In the latter case table scan is skipped."


A native English speaker's input would be helpful here.

Regarding regression tests, it may be useful to set client_min_messages 
to 'debug1' before setting the NOT NULL attribute for a column. In this 
case you can tell for sure that NotNullImpliedByRelConstraints() 
returned true (i.e. your code actually did the job), as the special debug 
message is printed to the log.


Thanks!

--
Ildar Musin
i.mu...@postgrespro.ru



Re: Proposal: partition pruning by secondary attributes

2018-02-09 Thread Ildar Musin



On 08.02.2018 21:01, Andres Freund wrote:

On 2018-02-08 14:48:34 -0300, Alvaro Herrera wrote:

Ildar Musin wrote:


The idea is to store min and max values of secondary attributes
(like 'id' in the example above) for each partition somewhere in
catalog and use it for partition pruning along with partitioning
key. You can think of it as somewhat like BRIN index but for
partitions.


What is the problem with having a BRIN index?


Building plans to scan the individual partitions, lock them, open
the relevant files, etc is often going to be significantly more
expensive than pruning at plan time.

But there also seems to be a number of fairly nasty locking issues
with this proposal, leaving the amount of required code aside.



Sorry, I probably didn't describe it right. I wasn't talking about using
a BRIN index for partition pruning or something like that; I just used it
as a reference for the idea. I'll try to explain it in a more detailed
way.

Let's say we have a table storing some events, partitioned by a
timestamp column:

CREATE TABLE events (
id serial,
dt timestamp,
...
) PARTITION BY RANGE (dt);

In some cases it is queried by the 'dt' column, and partition pruning
works fine because 'dt' is the partitioning key:

EXPLAIN (COSTS OFF) SELECT ... FROM events WHERE dt >= '2018-01-01' AND
dt < '2018-02-01';
 Append
   ->  Seq Scan on events_0
 Filter: ((dt >= '2018-01-01 00:00:00'::timestamp without time
zone) AND (dt < '2018-02-01 00:00:00'::timestamp without time zone))

But if we filter the table by 'id' then the planner has no other way but
to append every partition to the plan.

EXPLAIN (COSTS OFF) SELECT * FROM events WHERE id = 123;
 Append
   ->  Seq Scan on events_0
 Filter: (id = 123)
   ->  Seq Scan on events_1
 Filter: (id = 123)
   ->  Seq Scan on events_2
 Filter: (id = 123)

We can see though that the values of 'dt' and 'id' both monotonically
increase over time, and so we can potentially use the 'id' column to do
partition pruning at plan time too. To do so we need to store the min and
max values of the 'id' column per partition somewhere in the catalog and
use them to decide which partitions should be added to the plan, by
matching them against the query restrictions.

Each time the table is updated we must check whether the new value
exceeds the stored min/max values, and update those too if needed. This
raises a few issues. One of them, as Ashutosh Bapat mentioned, is the
need to change the catalog very often, which could result in high catalog
contention. I can't think of a comprehensive solution for this problem,
but for numerical and datetime types we could shift the min or max bounds
with a margin so that not every update results in a catalog change. The
other is the need to change a partition's min/max if rows were removed,
which is less of a problem since we can postpone it and do it later
(during autovacuum, for instance).

A new command for ALTER TABLE should also be introduced to specify a
column or expression which is not a partition key but can be used for
partition pruning as described above. This command would scan each
partition, gather the min/max values and store them in the catalog.
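
To make the idea concrete, a hypothetical sketch (this syntax does not
exist anywhere; the command name is made up):

ALTER TABLE events ADD PRUNING COLUMN id;

-- with per-partition min/max of "id" stored in the catalog, the planner
-- could then prune down to the single matching partition:
EXPLAIN (COSTS OFF) SELECT * FROM events WHERE id = 123;
 Append
   ->  Seq Scan on events_1
         Filter: (id = 123)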

What do you think?

--
Ildar Musin
i.mu...@postgrespro.ru



Proposal: partition pruning by secondary attributes

2018-02-08 Thread Ildar Musin

Hello, hackers!

Sorry if this has already been discussed. I had this idea some time
ago and then successfully forgot about it until pgconf.ru, where I had a
conversation with one of the postgres users. His situation could be
described like this: they have a table with id, timestamp and some other
attributes, which is partitioned by (let's say) the timestamp column. In
different contexts they may want to filter the table either by the id or
by the timestamp attribute (but not both). In this case pruning will only
work for the timestamp column.

The idea is to store the min and max values of secondary attributes (like
'id' in the example above) for each partition somewhere in the catalog
and use them for partition pruning along with the partitioning key. You
can think of it as something like a BRIN index, but for partitions, and
it will have similar limitations. For example, we may benefit if the
secondary attribute values monotonically increase or decrease, but it
would be unhelpful if they are scattered, or if the table wasn't
partitioned by range.

I wanted to ask the community's opinion on whether this would be worth
considering. Thanks!

--
Ildar Musin
i.mu...@postgrespro.ru



Re: [HACKERS] Custom compression methods

2018-01-29 Thread Ildar Musin

Hello Ildus,

On 29.01.2018 14:44, Ildus Kurbangaliev wrote:


Thanks! Attached new version of the patch.



Patch applies cleanly, builds without any warnings, documentation builds 
ok, all tests pass.


A remark for the committers. The patch is quite big, so I really wish 
more reviewers looked into it for a more comprehensive review. Also, a 
native English speaker should check the documentation and comments. 
Another thing is that the tests don't cover the cmdrop method, because the 
built-in pglz compression doesn't use it (I know there is a jsonbd 
extension [1] based on this patch which should benefit from the cmdrop 
method, but it doesn't test it yet either).


I think I did what I could, and so am passing this patch to the committers 
for review. Changed status to "Ready for committer".



[1] https://github.com/postgrespro/jsonbd

--
Ildar Musin
i.mu...@postgrespro.ru



Re: General purpose hashing func in pgbench

2018-01-29 Thread Ildar Musin



On 29.01.2018 15:03, Fabien COELHO wrote:



Patch applies, compiles, pgbench & global "make check" ok, doc built ok.

Ok for me, switched to "Ready".



Thank you for the thorough review!

--
Ildar Musin
i.mu...@postgrespro.ru



Re: General purpose hashing func in pgbench

2018-01-27 Thread Ildar Musin
Hello Fabien,


26/01/2018 09:28, Fabien COELHO wrote:
>
> Hello Ildar,
>
> Applies, compiles, runs.
>
> I still have a few very minor comments, sorry for this (hopefully)
> last iteration from my part. I'm kind of iterative...
>
> The XML documentation source should avoid a paragraph on one very long
> line, but rather be indented like other sections.
>
> I'd propose simplify the second part:
>
>   Hash functions can be used, for example, to modify the distribution of
>   random_zipfian or
> random_exponential
>   functions in order to obtain scattered distribution.
>   Thus the following pgbench script simulates possible real world
> workload
>   typical for social media and blogging platforms where few accounts
>   generate excessive load:
>
> ->
>
>   Hash functions can be used to scatter the distribution of random
>   functions such as random_zipfian or
>   random_exponential.
>   For instance, the following pgbench script simulates possible real
>   world workload typical for social media and blogging platforms where
>   few accounts generate excessive load:
>
> Comment the Assert(0) as an internal error that cannot happen.
>
> I'd suggest to compact the execution code by declaring int64 variable
> and coerce to int in one go, like the integer bitwise functions. I'm
> in favor to keeping them in their own case and not reuse this one.
>
I did everything you mention here and attached a new version of the patch.
Thank you!

-- 
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company 

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 3dd492c..503dd79 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -874,13 +874,18 @@ pgbench  options  d
 
  
   
-scale 
-   current scale factor
+client_id 
+   unique number identifying the client session (starts from zero)
   
 
   
-client_id 
-   unique number identifying the client session (starts from zero)
+default_seed 
+   random seed used in hash functions by default
+  
+
+  
+scale 
+   current scale factor
   
  
 
@@ -1246,6 +1251,27 @@ pgbench  options  d
5
   
   
+   hash(a [, seed ] )
+   integer
+   alias for hash_murmur2()
+   hash(10, 5432)
+   -5817877081768721676
+  
+  
+   hash_fnv1a(a [, seed ] )
+   integer
+   https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function;>FNV-1a hash
+   hash_fnv1a(10, 5432)
+   -7793829335365542153
+  
+  
+   hash_murmur2(a [, seed ] )
+   integer
+   https://en.wikipedia.org/wiki/MurmurHash;>MurmurHash2 hash
+   hash_murmur2(10, 5432)
+   -5817877081768721676
+  
+  

int(x)
integer
cast to int
@@ -1423,6 +1449,30 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /

 
   
+Hash functions hash, hash_murmur2 and
+hash_fnv1a accept an input value and an optional seed parameter.
+In case the seed isn't provided the value of :default_seed
+is used, which is initialized randomly unless set by the command-line
+-D option. Hash functions can be used to scatter the
+distribution of random functions such as random_zipfian 
or
+random_exponential. For instance, the following pgbench
+script simulates possible real world workload typical for social media and
+blogging platforms where few accounts generate excessive load:
+
+
+\set r random_zipfian(0, 1, 1.07)
+\set k abs(hash(:r)) % 100
+
+
+In some cases several distinct distributions are needed which don't 
correlate with each other and this is when implicit seed parameter comes in 
handy:
+
+
+\set k1 abs(hash(:r), :default_seed + 123) % 100
+\set k2 abs(hash(:r), :default_seed + 321) % 100
+
+  
+
+  
As an example, the full definition of the built-in TPC-B-like
transaction is:
 
diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y
index e23ca51..fc42c47 100644
--- a/src/bin/pgbench/exprparse.y
+++ b/src/bin/pgbench/exprparse.y
@@ -16,6 +16,10 @@
 
 #include "pgbench.h"
 
+#define PGBENCH_NARGS_VARIABLE (-1)
+#define PGBENCH_NARGS_CASE (-2)
+#define PGBENCH_NARGS_HASH (-3)
+
 PgBenchExpr *expr_parse_result;
 
 static PgBenchExprList *make_elist(PgBenchExpr *exp, PgBenchExprList *list);
@@ -226,9 +230,13 @@ make_uop(yyscan_t yyscanner, const char *operator, PgBenchExpr *expr)
 /*
  * List of available functions:
  * - fname: function name, "!..." for special internal functions
- * - nargs: number of arguments
- * -1 is a special value for least & greatest meaning #args >= 1
- * -2 is for the "CASE WHEN ..." function,

Re: [HACKERS] Custom compression methods

2018-01-26 Thread Ildar Musin

Hello Ildus,

I'm continuing my review of your patch. Here are some thoughts.

1. When I set column storage to EXTERNAL, I cannot set compression.
Seems reasonable:
create table test(id serial, msg text);
alter table test alter column msg set storage external;
alter table test alter column msg set compression pg_lz4;
ERROR:  storage for "msg" should be MAIN or EXTENDED

But if I reorder commands then it's ok:
create table test(id serial, msg text);
alter table test alter column msg set compression pg_lz4;
alter table test alter column msg set storage external;
\d+ test
Table "public.test"
 Column |  Type   |  ...  | Storage  | Compression
+-+  ...  +--+-
 id | integer |  ...  | plain|
 msg| text|  ...  | external | pg_lz4

So we could either allow users to set compression settings even when
storage is EXTERNAL (perhaps with a warning), or prohibit setting
compression and external storage at the same time. The same applies to
setting storage PLAIN.
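
If we go the second way, the reordered sequence would presumably have to
fail the same way, e.g. (a sketch of the expected behaviour, not what the
patch currently does):

create table test(id serial, msg text);
alter table test alter column msg set compression pg_lz4;
alter table test alter column msg set storage external;
ERROR:  storage for "msg" should be MAIN or EXTENDED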



2. I think the TOAST_COMPRESS_SET_RAWSIZE macro could be rewritten as
follows to prevent overwriting the higher bits of 'info':

((toast_compress_header *) (ptr))->info = \
((toast_compress_header *) (ptr))->info & ~RAWSIZEMASK | (len);

It may not matter at the moment since it is only used once, but
it could save some effort for other developers in the future.
In TOAST_COMPRESS_SET_CUSTOM(), instead of changing individual bits, you
may do something like this:

#define TOAST_COMPRESS_SET_CUSTOM(ptr) \
do { \
	((toast_compress_header *) (ptr))->info = \
		(((toast_compress_header *) (ptr))->info & RAWSIZEMASK) | \
		((uint32) 0x02 << 30); \
} while (0)

Also it would be nice if bit flags were explained and maybe replaced by
a macro.
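
For illustration, a sketch of how the flag could be named (RAWSIZEMASK is
from the patch; the flag macro name here is hypothetical):

#define TOAST_COMPRESS_FLAG_CUSTOM ((uint32) 0x02 << 30)

#define TOAST_COMPRESS_SET_CUSTOM(ptr) \
do { \
	((toast_compress_header *) (ptr))->info = \
		(((toast_compress_header *) (ptr))->info & RAWSIZEMASK) | \
		TOAST_COMPRESS_FLAG_CUSTOM; \
} while (0)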


3. In AlteredTableInfo, BulkInsertStateData and some functions (e.g.
toast_insert_or_update) there is a hash table used to keep the list of
preserved compression methods per attribute. I think a simple array of
List* would be sufficient in this case.


4. In optionListToArray() you can use list_qsort() to sort the options list
instead of converting it manually into an array and then back to a list.



5. Redundant #includes:

In heap.c:
#include "access/reloptions.h"
In tsvector.c:
#include "catalog/pg_type.h"
#include "common/pg_lzcompress.h"
In relcache.c:
#include "utils/datum.h"

6. Just a minor thing: no reason to change formatting in copy.c
-   heap_insert(resultRelInfo->ri_RelationDesc, tuple, mycid,
-   hi_options, bistate);
+   heap_insert(resultRelInfo->ri_RelationDesc, tuple,
+   mycid, hi_options, bistate);

7. Also, in utility.c an extra newline was added, which isn't relevant
to this patch.


8. In parse_utilcmd.h the 'extern' keyword was removed from the
transformRuleStmt declaration, which doesn't make sense in this patch.


9. Comments. Again, they should be reviewed by a native speaker. Just a
few suggestions:

toast_prepare_varlena() - comment needed
invalidate_amoptions_cache() - comment format doesn't match other 
functions in the file


In htup_details.h:
/* tuple contain custom compressed
 * varlenas */
should be "contains"

--
Ildar Musin
i.mu...@postgrespro.ru



Re: [HACKERS] Custom compression methods

2018-01-25 Thread Ildar Musin

Hello Ildus,

On 23.01.2018 16:04, Ildus Kurbangaliev wrote:

On Mon, 22 Jan 2018 23:26:31 +0300
Ildar Musin <i.mu...@postgrespro.ru> wrote:

Thanks for review! Attached new version of the patch. Fixed few bugs,
added more documentation and rebased to current master.


You need to rebase to the latest master; there are some conflicts.
I've applied it to a three-day-old master to try it.


Done.



As far as I can see, the documentation is not yet complete. For example, there
is no section for ALTER COLUMN ... SET COMPRESSION in ddl.sgml; and
section "Compression Access Method Functions" in compression-am.sgml
hasn't been finished.


Not sure about ddl.sgml, it contains more common things, but since
postgres contains only pglz by default there is not much to show.



I've implemented an extension [1] to understand the way a developer
would go about working with the new infrastructure, and to me it seems
clear. (Except that it took me some effort to wrap my mind around the
varlena macros, but that is probably a different topic.)

I noticed that you haven't covered 'cmdrop' in the regression tests, and
I saw the previous discussion about it. Have you considered using
event triggers to handle the drop of column compression instead of the
'cmdrop' function? This way you would kill two birds with one stone:
it still provides sufficient infrastructure to catch those events
(and it is something postgres already has for different kinds of ddl
commands) and it would be easier to test.


I have added support for event triggers for ALTER SET COMPRESSION in
the current version. An event trigger on ALTER can be used to replace the
cmdrop function, but it will be far from trivial. There is no easy way
to tell that the attribute compression is really being dropped in the
command.



I've encountered unexpected behavior in the command 'CREATE TABLE ... (LIKE
...)'. It seems that it copies the compression settings of the table
attributes no matter which INCLUDING options are specified. E.g.


create table xxx(id serial, msg text compression pg_lz4);
alter table xxx alter column msg set storage external;
\d+ xxx
Table "public.xxx"
 Column |  Type   |  ...  | Storage  | Compression |
+-+  ...  +--+-+
 id | integer |  ...  | plain| |
 msg| text|  ...  | external | pg_lz4  |

Now copy the table structure with "INCLUDING ALL":

create table yyy (like xxx including all);
\d+ yyy
Table "public.yyy"
 Column |  Type   |  ...  | Storage  | Compression |
+-+  ...  +--+-+
 id | integer |  ...  | plain| |
 msg| text|  ...  | external | pg_lz4  |

And now copy without "INCLUDING ALL":

create table zzz (like xxx);
\d+ zzz
Table "public.zzz"
 Column |  Type   |  ...  | Storage  | Compression |
+-+  ...  +--+-+
 id | integer |  ...  | plain| |
 msg| text|  ...  | extended | pg_lz4  |

As you can see, the compression option is copied either way. I suggest
adding a new INCLUDING COMPRESSION option to let users explicitly specify
whether or not to copy compression settings.
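
A sketch of how that could look (hypothetical syntax, not implemented in
the patch):

create table yyy (like xxx including compression);  -- copy compression
create table zzz (like xxx);                        -- keep the default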



I found a few phrases in the documentation that can be improved, but the
documentation should be checked by a native speaker.


In compression-am.sgml:
"an compression access method" -> "a compression access method"
"compression method method" -> "compression method"
"compability" -> "compatibility"
Probably "local-backend cached state" would be better to replace with 
"per backend cached state"?
"Useful to store the parsed view of the compression options" -> "It 
could be useful for example to cache compression options"

"and stores result of" -> "and stores the result of"
"Called when CompressionAmOptions is creating." -> "Called when 
CompressionAmOptions is being initialized"


"Note that in any system cache invalidation related with 
pg_attr_compression relation the options will be cleaned" -> "Note that 
any pg_attr_compression relation invalidation will 
cause all the cached acstate options cleared."

"Function used to ..." -> "Function is used to ..."

I think it would be nice to mention custom compression methods in
storage.sgml. At the moment it only mentions the built-in pglz compression.


--
Ildar Musin
i.mu...@postgrespro.ru



Re: General purpose hashing func in pgbench

2018-01-25 Thread Ildar Musin

Hello Fabien,

On 18.01.2018 12:06, Fabien COELHO wrote:


My intention was to introduce seed variable which potentially could be
used in different contexts, not only for hash functions.


Yes, good point. I'd need it for the pseudo-random permutation function.


I renamed it to 'hash_seed' for now. But what do you think about
naming it simply 'seed' or choosing some other general name?


ISTM "seed" that is prone to being used for something else in a script.
What about ":default_seed", which says it all?



Sounds good, renamed to "default_seed".



Some minor suggestions:

In "make_func", I'd assert that nargs is positive in the default branch.


Added an assert for non-negative nargs (since there is a pi() function
with zero arguments).




The default seed may be created with just one assignment instead of
repeated |= ops. Or not:-)



Tried this, but after applying pgindent it turns into a mess : ) So I 
left it in the initial form.



In evalStandardFunc, I'd suggest to avoid the ? construction and use an
if and a direct setIntValue in both case, removing the "result"
variable, as done in other branches (eg bitwise operators...). Maybe
something like:

  if (func == murmur2)
setIntValue(retval, getHashXXX(val, seed));
  else if (...)
...
  else
Assert(0);

so that it is structurally ready for other hash functions if needed.



Done. If more functions are added, it would probably make sense to change
it to a "switch".



Documentation:

In the table, use official names in the description, and add wikipedia
links, something like:

FNV hash ->
  https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function;>FNV-1a
hash

murmur2 hash ->
  https://en.wikipedia.org/wiki/MurmurHash;>MurmurHash2
hash


In the text:

"Hash functions accepts" -> "Hash functions hash,
<..> and <> accept*NO S*"

"... provided hash_seed value is used" => "... provided the value of
:hash_seed is used, which is initialized randomly
unless set by the command-line -D option."

ISTM that the Automatic Variable table should be in alphabetical order.



Updated the documentation. Thanks!

--
Ildar Musin
i.mu...@postgrespro.ru
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 3dd492c..f3a4a4f 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -874,13 +874,18 @@ pgbench options dbname
 
  
   
-scale 
-   current scale factor
+client_id 
+   unique number identifying the client session (starts from zero)
   
 
   
-client_id 
-   unique number identifying the client session (starts from zero)
+default_seed 
+   random seed used in hash functions by default
+  
+
+  
+scale 
+   current scale factor
   
  
 
@@ -1246,6 +1251,27 @@ pgbench options dbname
5
   
   
+   hash(a [, seed ] )
+   integer
+   alias for hash_murmur2()
+   hash(10, 5432)
+   -5817877081768721676
+  
+  
+   hash_fnv1a(a [, seed ] )
+   integer
+   FNV-1a hash (https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function)
+   hash_fnv1a(10, 5432)
+   -7793829335365542153
+  
+  
+   hash_murmur2(a [, seed ] )
+   integer
+   MurmurHash2 hash (https://en.wikipedia.org/wiki/MurmurHash)
+   hash_murmur2(10, 5432)
+   -5817877081768721676
+  
+  
int(x)
integer
cast to int
@@ -1423,6 +1449,22 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /

 
   
+Hash functions hash, hash_murmur2 and hash_fnv1a accept an input value and an optional seed parameter. In case the seed isn't provided the value of :default_seed is used, which is initialized randomly unless set by the command-line -D option. Hash functions can be used, for example, to modify the distribution of random_zipfian or random_exponential functions in order to obtain scattered distribution. Thus the following pgbench script simulates possible real world workload typical for social media and blogging platforms where few accounts generate excessive load:
+
+
+\set r random_zipfian(0, 1, 1.07)
+\set k abs(hash(:r)) % 100
+
+
+In some cases several distinct distributions are needed which don't correlate with each other and this is when implicit seed parameter comes in handy:
+
+
+\set k1 abs(hash(:r, :default_seed + 123)) % 100
+\set k2 abs(hash(:r, :default_seed + 321)) % 100
+
+  
+
+  
As an example, the full definition of the built-in TPC-B-like
transaction is:
 
diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y
index e23ca51..fc42c47 100644
--- a/src/bin/pgbench/exprparse.y
+++ b/src/bin/pgbench/exprparse.y
@@ -16,6 +16,10 @@
 
 #include "pgbench.h"
 
+#define PGBENCH_NARGS_VARIABLE

Re: [HACKERS] Custom compression methods

2018-01-22 Thread Ildar Musin
Hello Ildus,


15/01/2018 00:49, Ildus Kurbangaliev wrote:
> Attached a new version of the patch. Main changes:
>
> * compression as an access method
> * pglz as default compression access method.
> * PRESERVE syntax for tables rewrite control.
> * pg_upgrade fixes
> * support partitioned tables.
> * more tests.
>
You need to rebase to the latest master; there are some conflicts. I've
applied it to a three-day-old master to try it.

As far as I can see, the documentation is not yet complete. For example, there
is no section for ALTER COLUMN ... SET COMPRESSION in ddl.sgml; and
section "Compression Access Method Functions" in compression-am.sgml
hasn't been finished.

I've implemented an extension [1] to understand the way a developer would
go about working with the new infrastructure, and to me it seems clear.
(Except that it took me some effort to wrap my mind around the varlena
macros, but that is probably a different topic.)

I noticed that you haven't covered 'cmdrop' in the regression tests, and I
saw the previous discussion about it. Have you considered using event
triggers to handle the drop of column compression instead of the 'cmdrop'
function? This way you would kill two birds with one stone: it still
provides sufficient infrastructure to catch those events (and it is
something postgres already has for different kinds of ddl commands) and
it would be easier to test.
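
A minimal sketch of the kind of event trigger I have in mind (the function
and trigger names are made up):

create function report_compression_ddl() returns event_trigger as $$
begin
    raise notice 'compression-related command: %', tg_tag;
end;
$$ language plpgsql;

create event trigger compression_ddl on ddl_command_end
    when tag in ('ALTER TABLE')
    execute procedure report_compression_ddl();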

Thanks!

[1] https://github.com/zilder/pg_lz4

-- 
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company 




Re: General purpose hashing func in pgbench

2018-01-17 Thread Ildar Musin
Hello Fabien,


17/01/2018 10:52, Fabien COELHO wrote:

>> Here is a new version of the patch. I've split it into two parts. The
>> first one is almost the same as v4 from [1] with some refactoring.
>> The second part introduces a random_seed variable as you proposed.
>
> Patch 1 applies. Compilations fails, there are two "hash_seed"
> declared in "pgbench.c".
>
> Patch 2 applies cleanly on top of the previous patch and compiles,
> because the variable is removed...
>
> If an ":hash_seed" pgbench variable is used, ISTM that there is no
> need for a global variable at all, so the two patches are going back
> and forth, which is unhelpful. ISTM better to provide just one
> combined patch for the feature.
>
> If the hash_seed variable really needs to be kept, it should be an
> "int64" variable, like other pgbench values.
>
> The len < 1 || len > 2 is checked twice, once in the "switch", on in
> an "if" just after the "switch". Once is enough.

I totally messed up doing the git rebase and didn't double-check the code.
*facepalm* There shouldn't be a hash_seed variable or the second 'len < 1
|| len > 2' check. Sorry for that; fixed in the attached patch.

> Calling random just usually initializes about 31 bits, so random
> should be called 2 or maybe 3 times? Or maybe use the internal getrand
> which has 48 pseudorandom bits?
Done. I used code from get_random_uint64() as an example.
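
Roughly like this (a sketch of the idea, assuming random() yields at most
31 random bits per call):

	/* assemble a 64-bit default seed from several random() results */
	hash_seed = (((uint64) random()) << 33) ^
				(((uint64) random()) << 11) ^
				((uint64) random());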
>
> For me "random seed" is what is passed to srandom, so the variable
> should rather be named hash_seed because there could also be a random
> seed (actually, there is in another parallel patch:-). Moreover, this
> seed may or may not be random if set, so calling it "random_seed" is
> not desirable.
>
My intention was to introduce a seed variable which could potentially be
used in different contexts, not only for hash functions. I renamed it to
'hash_seed' for now. But what do you think about naming it simply 'seed'
or choosing some other general name?
>> I didn't do the executor simplification thing yet because I'm a
>> little concerned about inventive users, who may want to change
>> random_seed variable in runtime (which is possible since pgbench
>> doesn't have read only variables aka constants AFAIK).
>
> If the user choses to overide hash_seed in their script, it is their
> decision, the documentation has only to be clear about :hash_seed
> being the default seed. I see no clear reason to work around this
> possibility by evaluating the seed at parse time, especially as the
> variable may not have its final value yet depending on the option
> order. I'd suggest to just use make_variable("hash_seed") for the
> default second argument and simplify the executor.
That is a great idea, I didn't see that possibility. Done.
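
In the parser that boils down to something like this (a sketch of the idea,
reusing make_variable() and make_elist() from exprparse.y; not the exact
code of the attached patch):

	case PGBENCH_NARGS_HASH:
		/* hash functions take one required and one optional argument */
		if (len < 1 || len > 2)
			expr_yyerror_more(yyscanner, "unexpected number of arguments",
							  PGBENCH_FUNCTIONS[fnumber].fname);

		/* insert the default seed variable when the seed is omitted */
		if (len == 1)
			args = make_elist(make_variable("hash_seed"), args);
		break;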
>
> The seed variable is not tested explicitely in the script, you could add
> a "hash(5432) == hash(5432, :hash_seed)" for instance.
>
> It would be nice if an native English speaker could proofread the
> documentation text. I'd suggest: "*an* optional seed parameter". "In
> case *the* seed...". ":hash_seed". "shared for" ->
> "shared by". "following listing" -> "following pgbench script". "few
> accounts generates" -> "few accounts generate".
>
Done as well.
> For the document example, I'd use larger values for the random &
> modulo, eg 1 and 100. The drawback is that zipfian does a
> costly computation of the generalized harmonic number when the
> parameter is lower than 1.0. For cities, the parameter found by Zipf
> was 1.07 (says Wikipedia). Maybe use this historical value. Or maybe
> use an exponential distribution in the example.
>
Changed parameter to 1.07.

Thanks!

-- 
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company 

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 3dd492c..c31254c 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -882,6 +882,11 @@ pgbench options dbname
 client_id 
	unique number identifying the client session (starts from zero)
   
+
+  
+hash_seed 
+   random seed used in hash functions by default
+  
  
 

@@ -1246,6 +1251,27 @@ pgbench options dbname
	5
   
   
+   hash(a [, seed ] )
+   integer
+   alias for hash_murmur2()
+   hash(10, 5432)
+   -5817877081768721676
+  
+  
+   hash_fnv1a(a [, seed ] )
+   integer
+   FNV hash
+   hash_fnv1a(10, 5432)
+   -7793829335365542153
+  
+  
+   hash_murmur2(a [, seed ] )
+   integer
+   

Re: General purpose hashing func in pgbench

2018-01-16 Thread Ildar Musin
Hi Fabien,


13/01/2018 11:16, Fabien COELHO wrote:
>
> Hello Ildar,
>
>>> so that different instances of hash function within one script would
>>> have different seeds. Yes, that is a good idea, I can do that.
>>>
>> Added this feature in the attached patch. But on second thought, this
>> could be something that users won't expect.
>> pgbench with two scripts:
>> - the first one updates row by key that is a hashed random_zipfian
>> value;
>> - the second one reads row by key generated the same way
>> (that is actually what YCSB workloads A and B do)
>>
>> It feels natural to write something like this:
>> \set rnd random_zipfian(0, 100, 0.99)
>> \set key abs(hash(:rnd)) % 1000
>> in both scripts and expect that they both would have the same
>> distribution. But they wouldn't. We could of course describe this
>> implicit behaviour in documentation, but ISTM that shared seed would be
>> more clear.
>
> I think that it depends on the use case, that both can be useful, so
> there should be a way to do either.
>
> With "always different" default seed, distinct distributions are achieved
> with:
>
>    -- DIFF distinct seeds inside and between runs
>    \set i1 abs(hash(:r1)) % 1000
>    \set j1 abs(hash(:r2)) % 1000
>
> and the same distribution can be done with an explicit seed:
>
>    -- DIFF same seed inside and between runs
>    \set i1 abs(hash(:r1), 5432) % 1000
>    \set j1 abs(hash(:r2), 5432) % 1000
>
> The drawback is that the same seed is used between runs in this case,
> which is not desirable. This could be circumvented by adding the
> random seed as an automatic variable and using it, eg:
>
>    -- DIFF same seed inside run, distinct between runs
>    \set i1 abs(hash(:r1), :random_seed + 5432) % 1000
>    \set j1 abs(hash(:r2), :random_seed + 2345) % 1000
>
>
> Now with a shared hash_seed the same distribution is by default:
>
>    -- SHARED same underlying hash_seed inside run, distinct between runs
>    \set i1 abs(hash(:r1)) % 1000
>    \set j1 abs(hash(:r2)) % 1000
>
> However some trick is needed now to get distinct seeds. With
>
>    -- SHARED distinct seed inside run, but same between runs
>    \set i1 abs(hash(:r1, 5432)) % 1000
>    \set j1 abs(hash(:r2, 2345)) % 1000
>
> We are back to the same issue has the previous case because then the
> distribution is the same from one run to the next, which is not
> desirable. I found this workaround trick:
>
>    -- SHARED distinct seeds inside and between runs
>    \set i1 abs(hash(:r1, hash(5432))) % 1000
>    \set j1 abs(hash(:r2, hash(2345))) % 1000
>
> Or with a new :hash_seed or :random_seed automatic variable, we could
> also have:
>
>    -- SHARED distinct seeds inside and between runs
>    \set i1 abs(hash(:r1, :hash_seed + 5432)) % 1000
>    \set j1 abs(hash(:r2, :hash_seed + 2345)) % 1000
>
> It provides controllable distinct seeds between runs but equal one
> between if desired, by reusing the same value to be hashed as a seed.
>
> I also agree with your argument that the user may reasonably expect
> that hash(5432) == hash(5432) inside and between scripts, at least on
> the same run, so would be surprised that it is not the case.
>
> So I've changed my mind, I'm sorry for making you going back and forth
> on the subject. I'm now okay with one shared 64 bit hash seed, with a
> clear documentation about the fact, and an outline of the trick to
> achieve distinct distributions inside a run if desired and why it
> would be desirable to avoid correlations. Also, I think that providing
> the seed as automatic variable (:hash_seed or :hseed or whatever)
> would make some sense as well. Maybe this could be used as a way to
> fix the seed explicitely, eg:
>
>    pgbench -D hash_seed=1234 ...
>
> Would use this value instead of the random generated one. Also, with
> that the default inserted second argument could be simply
> ":hash_seed", which would simplify the executor which would not have
> to do check for an optional second argument.
>
Here is a new version of the patch. I've split it into two parts. The
first one is almost the same as v4 from [1] with some refactoring. The
second part introduces the random_seed variable as you proposed. I didn't
do the executor simplification yet because I'm a little concerned about
inventive users who may want to change the random_seed variable at
runtime (which is possible since pgbench doesn't have read-only
variables, aka constants, AFAIK).

[1]
https://www.postgresql.org/message-id/43a8fbbb-32fa-6478-30a9-f64041adf019%40postgrespro.ru

-- 
Ildar Musin
Postgres Professional: http://www.postgrespro.com

Re: General purpose hashing func in pgbench

2018-01-12 Thread Ildar Musin
Hello Fabien,

11/01/2018 19:21, Ildar Musin wrote:
>
> 10/01/2018 21:42, Fabien COELHO wrote:
>> Hmm. I do not think that we should want a shared seed value. The seed
>> should be different for each call so as to avoid undesired
>> correlations. If wanted, correlation could be obtained by using an
>> explicit identical seed.
>>
>> ISTM that the best way to add the seed is to call random() when the
>> second arg is missing in make_func. Also, this means that the executor
>> would always get its two arguments, so it would simplify the code there.
>>
> Ok, I think I understand what you meant. You mean a case like the following:
>
> \set x random(1, 100)
> \set h1 hash(:x)
> \set h2 hash(:x)  -- will have different seed from h1
>
> so that different instances of hash function within one script would
> have different seeds. Yes, that is a good idea, I can do that.
>
Added this feature in the attached patch. But on second thought, this could
be something that users won't expect. For example, they may want to run
pgbench with two scripts:
- the first one updates row by key that is a hashed random_zipfian value;
- the second one reads row by key generated the same way
(that is actually what YCSB workloads A and B do)

It feels natural to write something like this:
\set rnd random_zipfian(0, 100, 0.99)
\set key abs(hash(:rnd)) % 1000
in both scripts and expect that they both would have the same
distribution. But they wouldn't. We could of course describe this
implicit behaviour in the documentation, but ISTM that a shared seed
would be clearer.

Thanks!

-- 
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company 

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 3dd492c..c575f19 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -1246,6 +1246,27 @@ pgbench options dbname
	5
   
   
+   hash(a [, seed ] )
+   integer
+   alias for hash_murmur2()
+   hash(10, 5432)
+   -5817877081768721676
+  
+  
+   hash_fnv1a(a [, seed ] )
+   integer
+   FNV hash
+   hash_fnv1a(10, 5432)
+   -7793829335365542153
+  
+  
+   hash_murmur2(a [, seed ] )
+   integer
+   murmur2 hash
+   hash_murmur2(10, 5432)
+   -5817877081768721676
+  
+  

int(x)
integer
cast to int
diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y
index e23ca51..36cad30 100644
--- a/src/bin/pgbench/exprparse.y
+++ b/src/bin/pgbench/exprparse.y
@@ -16,6 +16,10 @@
 
 #include "pgbench.h"
 
+#define PGBENCH_NARGS_VARIABLE (-1)
+#define PGBENCH_NARGS_CASE (-2)
+#define PGBENCH_NARGS_HASH (-3)
+
 PgBenchExpr *expr_parse_result;
 
 static PgBenchExprList *make_elist(PgBenchExpr *exp, PgBenchExprList *list);
@@ -226,9 +230,13 @@ make_uop(yyscan_t yyscanner, const char *operator, PgBenchExpr *expr)
 /*
  * List of available functions:
  * - fname: function name, "!..." for special internal functions
- * - nargs: number of arguments
- * -1 is a special value for least & greatest meaning #args >= 1
- * -2 is for the "CASE WHEN ..." function, which has #args >= 3 and odd
+ * - nargs: number of arguments. Special cases:
+ * - PGBENCH_NARGS_VARIABLE is a special value for least & greatest
+ *   meaning #args >= 1;
+ * - PGBENCH_NARGS_CASE is for the "CASE WHEN ..." function, which
+ *   has #args >= 3 and odd;
+ * - PGBENCH_NARGS_HASH is for hash functions, which have one required
+ *   and one optional argument;
  * - tag: function identifier from PgBenchFunction enum
  */
 static const struct
@@ -259,10 +267,10 @@ static const struct
"abs", 1, PGBENCH_ABS
},
{
-   "least", -1, PGBENCH_LEAST
+   "least", PGBENCH_NARGS_VARIABLE, PGBENCH_LEAST
},
{
-   "greatest", -1, PGBENCH_GREATEST
+   "greatest", PGBENCH_NARGS_VARIABLE, PGBENCH_GREATEST
},
{
"debug", 1, PGBENCH_DEBUG
@@ -347,7 +355,16 @@ static const struct
},
/* "case when ... then ... else ... end" construction */
{
-   "!case_end", -2, PGBENCH_CASE
+   "!case_end", PGBENCH_NARGS_CASE, PGBENCH_CASE
+   },
+   {
+   "hash", PGBENCH_NARGS_HASH, PGBENCH_HASH_MURMUR2
+   },
+   {
+   "hash_murmur2", PGBENCH_NARGS_HASH, PGBENCH_HASH_MURMUR2
+   },
+   {
+   "hash_fnv1a", PGBENCH_NARGS_HASH,

Re: General purpose hashing func in pgbench

2018-01-11 Thread Ildar Musin


10/01/2018 21:42, Fabien COELHO wrote:
>
> Hmm. I do not think that we should want a shared seed value. The seed
> should be different for each call so as to avoid undesired
> correlations. If wanted, correlation could be obtained by using an
> explicit identical seed.
>
> ISTM that the best way to add the seed is to call random() when the
> second arg is missing in make_func. Also, this means that the executor
> would always get its two arguments, so it would simplify the code there.
>
Ok, I think I understand what you meant. You mean a case like the following:

\set x random(1, 100)
\set h1 hash(:x)
\set h2 hash(:x)  -- will have different seed from h1

so that different instances of hash function within one script would
have different seeds. Yes, that is a good idea, I can do that.

-- 
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company 




Re: General purpose hashing func in pgbench

2018-01-11 Thread Ildar Musin
Hello Fabien,


10/01/2018 21:42, Fabien COELHO wrote:
>>>> Should we probably add some infrastructure for optional arguments?
>>>
>>> You can look at the handling of "CASE" which may or may not have an
>>> "ELSE" clause.
>>>
>>> I'd suggest you use a new negative argument with the special meaning
>>> for hash, and create the seed value when missing when building the
>>> function, so as to simplify the executor code.
>
>> Added a new nargs option -3 for hash functions and moved arguments check
>> to parser. It's starting to look a bit odd and I'm thinking about
>> replacing bare numbers (-1, -2, -3) with #defined macros. E.g.:
>>
>> #define PGBENCH_NARGS_VARIABLE (-1)
>> #define PGBENCH_NARGS_CASE (-2)
>> #define PGBENCH_NARGS_HASH (-3)
>
> Yes, I'm more than fine with improving readability.
>
Added macros.
>>> Instead of 0, I'd consider providing a random default so that the
>>> hashing behavior is not the same from one run to the next. What do you
>>> think?
>>
Makes sense, since each thread also initializes its random_state with
random numbers before starting. So I added a global variable 'hash_seed'
and initialized it with random() before the threads are spawned.
>
> Hmm. I do not think that we should want a shared seed value. The seed
> should be different for each call so as to avoid undesired
> correlations. If wanted, correlation could be obtained by using an
> explicit identical seed.
Probably I'm missing something, but I cannot see the point. If we change
the seed on every invocation then we get a uniform-like distribution (see
attached image). And we don't get the same hash value for the same input,
which is the whole point of hash functions. Maybe I didn't understand
you correctly.

Anyway I've attached a new version with some tests and docs added.

-- 
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company 

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 3dd492c..c575f19 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -1246,6 +1246,27 @@ pgbench options dbname
	5
   
   
+   hash(a [, seed ] )
+   integer
+   alias for hash_murmur2()
+   hash(10, 5432)
+   -5817877081768721676
+  
+  
+   hash_fnv1a(a [, seed ] )
+   integer
+   FNV hash
+   hash_fnv1a(10, 5432)
+   -7793829335365542153
+  
+  
+   hash_murmur2(a [, seed ] )
+   integer
+   murmur2 hash
+   hash_murmur2(10, 5432)
+   -5817877081768721676
+  
+  

int(x)
integer
cast to int
diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y
index e23ca51..9668b3d 100644
--- a/src/bin/pgbench/exprparse.y
+++ b/src/bin/pgbench/exprparse.y
@@ -16,6 +16,10 @@
 
 #include "pgbench.h"
 
+#define PGBENCH_NARGS_VARIABLE (-1)
+#define PGBENCH_NARGS_CASE (-2)
+#define PGBENCH_NARGS_HASH (-3)
+
 PgBenchExpr *expr_parse_result;
 
 static PgBenchExprList *make_elist(PgBenchExpr *exp, PgBenchExprList *list);
@@ -226,9 +230,13 @@ make_uop(yyscan_t yyscanner, const char *operator, PgBenchExpr *expr)
 /*
  * List of available functions:
  * - fname: function name, "!..." for special internal functions
- * - nargs: number of arguments
- * -1 is a special value for least & greatest meaning #args >= 1
- * -2 is for the "CASE WHEN ..." function, which has #args >= 3 and odd
+ * - nargs: number of arguments. Special cases:
+ * - PGBENCH_NARGS_VARIABLE is a special value for least & greatest
+ *   meaning #args >= 1;
+ * - PGBENCH_NARGS_CASE is for the "CASE WHEN ..." function, which
+ *   has #args >= 3 and odd;
+ * - PGBENCH_NARGS_HASH is for hash functions, which have one required
+ *   and one optional argument;
  * - tag: function identifier from PgBenchFunction enum
  */
 static const struct
@@ -259,10 +267,10 @@ static const struct
"abs", 1, PGBENCH_ABS
},
{
-   "least", -1, PGBENCH_LEAST
+   "least", PGBENCH_NARGS_VARIABLE, PGBENCH_LEAST
},
{
-   "greatest", -1, PGBENCH_GREATEST
+   "greatest", PGBENCH_NARGS_VARIABLE, PGBENCH_GREATEST
},
{
"debug", 1, PGBENCH_DEBUG
@@ -347,7 +355,16 @@ static const struct
},
/* "case when ... then ... else ... end" construction */
{
-   "!case_end", -2, PGBENCH_CASE
+   "!ca

Re: General purpose hashing func in pgbench

2018-01-10 Thread Ildar Musin

10/01/2018 16:35, Ildar Musin wrote:
> 09/01/2018 23:11, Fabien COELHO wrote:
>> Hello Ildar,
>>
>>> Sorry for the long delay. I've added a hash() function which is just an
>>> alias for murmur2. I've also utilized the variable-arguments feature from
>>> the least()/greatest() functions to make the seed parameter optional, but
>>> I consider this a hack.
>> Patch needs a rebase after Teodor push for a set of pgbench functions.
> Done. Congratulations on your patch finally being committed : )
>>> Should we probably add some infrastructure for optional arguments?
>> You can look at the handling of "CASE" which may or may not have an
>> "ELSE" clause.
>>
>> I'd suggest you use a new negative argument with the special meaning
>> for hash, and create the seed value when missing when building the
>> function, so as to simplify the executor code.
> Added a new nargs option -3 for hash functions and moved arguments check
> to parser. It's starting to look a bit odd and I'm thinking about
> replacing bare numbers (-1, -2, -3) with #defined macros. E.g.:
>
> #define PGBENCH_NARGS_VARIABLE (-1)
> #define PGBENCH_NARGS_CASE (-2)
> #define PGBENCH_NARGS_HASH (-3)
>> Instead of 0, I'd consider providing a random default so that the
>> hashing behavior is not the same from one run to the next. What do you
>> think?
>>
> Makes sense, since each thread also initializes its random_state with
> random numbers before starting. So I added a global variable 'hash_seed'
> and initialized it with random() before the threads are spawned.
>> Like the previous version, murmur2 with seed looks much more random
>> than fnv with seed...
>>
Sorry, here is the patch

-- 
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company 

diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y
index e23ca51..847b011 100644
--- a/src/bin/pgbench/exprparse.y
+++ b/src/bin/pgbench/exprparse.y
@@ -349,6 +349,15 @@ static const struct
{
"!case_end", -2, PGBENCH_CASE
},
+   {
+   "hash", -3, PGBENCH_HASH_MURMUR2
+   },
+   {
+   "hash_murmur2", -3, PGBENCH_HASH_MURMUR2
+   },
+   {
+   "hash_fnv1a", -3, PGBENCH_HASH_FNV1A
+   },
/* keep as last array element */
{
NULL, 0, 0
@@ -447,6 +456,15 @@ make_func(yyscan_t yyscanner, int fnumber, PgBenchExprList *args)
		expr_yyerror_more(yyscanner, "odd and >= 3 number of arguments expected",
						  "case control structure");
	}
+   /* special case: hash functions with optional arguments */
+   if (PGBENCH_FUNCTIONS[fnumber].nargs == -3)
+   {
+   int len = elist_length(args);
+
+   if (len < 1 || len > 2)
+   expr_yyerror_more(yyscanner, "unexpected number of arguments",
+ PGBENCH_FUNCTIONS[fnumber].fname);
+   }
 
expr->etype = ENODE_FUNCTION;
expr->u.function.function = PGBENCH_FUNCTIONS[fnumber].tag;
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 31ea6ca..6bf94cc 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -61,6 +61,14 @@
 #define ERRCODE_UNDEFINED_TABLE  "42P01"
 
 /*
+ * Hashing constants
+ */
+#define FNV_PRIME 0x10001b3
+#define FNV_OFFSET_BASIS 0xcbf29ce484222325
+#define MM2_MUL 0xc6a4a7935bd1e995
+#define MM2_ROT 47
+
+/*
  * Multi-platform pthread implementations
  */
 
@@ -439,6 +447,8 @@ static int  num_scripts;	/* number of scripts in sql_script[] */
 static int num_commands = 0;   /* total number of Command structs */
 static int64 total_weight = 0;
 
+static int hash_seed;  /* default seed used in hash functions */
+
 static int debug = 0;  /* debug flag */
 
 /* Builtin test scripts */
@@ -915,6 +925,51 @@ getZipfianRand(TState *thread, int64 min, int64 max, double s)
 }
 
 /*
+ * FNV-1a hash function
+ */
+static int64
+getHashFnv1a(int64 val, uint64 seed)
+{
+   int64   result;
+   int i;
+
+   result = FNV_OFFSET_BASIS ^ seed;
+   for (i = 0; i < 8; ++i)
+   {
+   int32 octet = val & 0xff;
+
+   val = val >> 8;
+   result = result ^ octet;
+   result = result * FNV_PRIME;
+   }
+
+   return result;
+}
+
+/*
+ * Murmur2 hash function
+ */
+static int64
+getHashMurmur2(int64 val, uint64 seed)
+{
+   uint64  result = seed ^ (sizeof(int64) * MM2_MUL);
+   uint64  k = (uint64) val;
+
+   k *= MM2_MUL;
+   k ^= k

Re: General purpose hashing func in pgbench

2018-01-09 Thread Ildar Musin
Hello Fabien,


25/12/2017 19:17, Fabien COELHO wrote:
>
>>> However, the key can be used if controlled so that different values do
>>> not have the same randomization in different part of the script, so as
>>> to avoid using the same patterns for different things if not desirable.
>>
>> The original murmur algorithm accepts a seed as a parameter, which can be
>> used for this purpose. I used the value itself as a seed in the patch, but
>> I can turn it into a function argument.
>
> Hmmm. Possibly. However it needs a default, something like
>
>   x = hash(v[, key])
>
> with key some pseudo-random value?
>
Sorry for the long delay. I've added a hash() function which is just an
alias for murmur2. I've also utilized the variable-arguments feature from
the least()/greatest() functions to make the seed parameter optional, but
I consider this a hack. Should we perhaps add some infrastructure for
optional arguments? Docs and tests are on their way.

Thanks!

-- 
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company 

diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y
index 26494fd..340c020 100644
--- a/src/bin/pgbench/exprparse.y
+++ b/src/bin/pgbench/exprparse.y
@@ -131,7 +131,8 @@ make_op(yyscan_t yyscanner, const char *operator,
  * List of available functions:
  * - fname: function name
  * - nargs: number of arguments
- * -1 is a special value for least & greatest meaning #args >= 1
+ * -1 is a special value for least, greatest & hash functions
+ *  meaning variable arguments number
  * - tag: function identifier from PgBenchFunction enum
  */
 static const struct
@@ -200,6 +201,15 @@ static const struct
{
"power", 2, PGBENCH_POW
},
+   {
+   "hash", -1, PGBENCH_HASH_MURMUR2
+   },
+   {
+   "hash_murmur2", -1, PGBENCH_HASH_MURMUR2
+   },
+   {
+   "hash_fnv1a", -1, PGBENCH_HASH_FNV1A
+   },
/* keep as last array element */
{
NULL, 0, 0
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index fc2c734..71e4814 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -61,6 +61,14 @@
 #define ERRCODE_UNDEFINED_TABLE  "42P01"
 
 /*
+ * Hashing constants
+ */
+#define FNV_PRIME 0x10001b3
+#define FNV_OFFSET_BASIS 0xcbf29ce484222325
+#define MM2_MUL 0xc6a4a7935bd1e995
+#define MM2_ROT 47
+
+/*
  * Multi-platform pthread implementations
  */
 
@@ -912,6 +920,51 @@ getZipfianRand(TState *thread, int64 min, int64 max, double s)
 }
 
 /*
+ * FNV-1a hash function
+ */
+static int64
+getHashFnv1a(int64 val, uint64 seed)
+{
+   int64   result;
+   int i;
+
+   result = FNV_OFFSET_BASIS ^ seed;
+   for (i = 0; i < 8; ++i)
+   {
+   int32 octet = val & 0xff;
+
+   val = val >> 8;
+   result = result ^ octet;
+   result = result * FNV_PRIME;
+   }
+
+   return result;
+}
+
+/*
+ * Murmur2 hash function
+ */
+static int64
+getHashMurmur2(int64 val, uint64 seed)
+{
+   uint64  result = seed ^ (sizeof(int64) * MM2_MUL);
+   uint64  k = (uint64) val;
+
+   k *= MM2_MUL;
+   k ^= k >> MM2_ROT;
+   k *= MM2_MUL;
+
+   result ^= k;
+   result *= MM2_MUL;
+
+   result ^= result >> MM2_ROT;
+   result *= MM2_MUL;
+   result ^= result >> MM2_ROT;
+
+   return (int64) result;
+}
+
+/*
  * Initialize the given SimpleStats struct to all zeroes
  */
 static void
@@ -1868,6 +1921,37 @@ evalFunc(TState *thread, CState *st,
return true;
}
 
+   /* hashing */
+   case PGBENCH_HASH_FNV1A:
+   case PGBENCH_HASH_MURMUR2:
+   {
+   int64   val;
+   int64   seed = 0;
+   int64   result;
+
+   Assert(nargs >= 1);
+
+   if (!coerceToInt(&args[0], &val))
+   return false;
+
+   if (nargs > 2)
+   {
+   fprintf(stderr,
+   "hash function accepts maximum of two arguments\n");
+   return false;
+   }
+
+   /* read optional seed value */
+   if (nargs > 1)
+   if (!coerceToInt(&args[1], &seed))
+   return false;
+
+   result

Re: General purpose hashing func in pgbench

2017-12-25 Thread Ildar Musin

25/12/2017 17:12, Fabien COELHO wrote:
>
> However, the key can be used if controlled so that different values do
> not have the same randomization in different part of the script, so as
> to avoid using the same patterns for different things if not desirable.
The original murmur algorithm accepts a seed as a parameter, which can be
used for this purpose. I used the value itself as a seed in the patch, but
I can turn it into a function argument.
>
> For the pgbench pseudo-random permutation I'm looking at, the key can
> be specified to control whether the same pattern or not should be used.
>
Just curious, which algorithm do you intend to choose?

-- 
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company 




Re: General purpose hashing func in pgbench

2017-12-25 Thread Ildar Musin
Hello Fabien,


24/12/2017 11:12, Fabien COELHO wrote:
>
> Yep. The ugliness is significantly linked to the choice of name. With
> MM2_MUL and MM2_ROT ISTM that it is more readable:
>
>>     k *= MM2_MUL;
>>     k ^= k >> MM2_ROT;
>>     k *= MM2_MUL;
>>     result ^= k;
>>     result *= MM2_MUL;
Ok, will do.
>
>> [...] So I'd better leave it the way it is. Actually I was thinking
>> to do the same to fnv1a too : )
>
> I think that the implementation style should be homogeneous, so I'd
> suggest at least to stick to one style.
>
> I noticed from the source of all human knowledege (aka Wikipedia:-)
> that there seems to be a murmur3 successor. Have you considered it?
> One good reason to skip it would be that the implementation is long
> and complex. I'm not sure about a 8-byte input simplified version.
Murmur2 naturally supports 8-byte data. Murmur3 has 32- and 128-bit
versions. So to handle int64 I could
1) split the input value into two halves and somehow combine the results
of the 32-bit version, or
2) use the 128-bit version and discard the higher bytes.

Btw, postgres core already has a 32-bit murmur3 implementation, but it
only uses the finalization part of the algorithm (see murmurhash32). As my
colleague Yura Sokolov told me in a recent conversation, it alone
provides pretty good randomization. I haven't tried it yet though.
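
For reference, the finalizer in question is tiny (a sketch of the 32-bit
murmur3 finalization step; applying it to each half of an int64 and
combining the two results would be one hypothetical way to cover 8-byte
input):

static uint32
fmix32(uint32 h)
{
	h ^= h >> 16;
	h *= 0x85ebca6b;
	h ^= h >> 13;
	h *= 0xc2b2ae35;
	h ^= h >> 16;
	return h;
}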

>
> Just a question: Have you looked at SipHash24?
>
> https://en.wikipedia.org/wiki/SipHash
>
> The interesting point is that it can use a key and seems somehow
> cryptographically secure, for a similar cost. However the how to
> decide for/control the key is unclear.
>
Not yet. As far as I understand from the wiki, its main feature is
preventing attacks with crafted input data. How would it be useful in
benchmarking, unless it shows superior performance and randomization?

-- 
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company 




Re: General purpose hashing func in pgbench

2017-12-22 Thread Ildar Musin

21/12/2017 18:26, Fabien COELHO wrote:
>
>> I think it is not commitfest ready yet -- I need to add some
>> documentation and tests first.
>
> Yes, doc & test are missing.
>
> From your figures, the murmur2 algorithm output looks way better. I'm
> wondering whether it makes sense to provide a bad hash function if a
> good/better one is available, unless the bad one actually appears in
> some benchmark... So I would suggest to remove fnv1a.
Actually the "bad" one appears in YCSB. But if we should choose the only
one I would stick to murmur too given it provides better results while
having similar computational complexity.
>
> One implementation put constants in defines, the other one uses "const
> int". The practice in pgbench seems to use defines (eg
> MIN_GAUSSIAN_PARAM...), so I would suggest to stick to this style.
That had been my intention from the start, until I coded it that way and
it looked ugly and hard to read (IMHO), like:

    k *= MURMUR2_M;
    k ^= k >> MURMUR2_R;
    k *= MURMUR2_M;
    result ^= k;
    result *= MURMUR2_M;

...etc. Besides, it is not a parameter to be tuned, nor something
someone would ever want to change; it appears in just a single function.
So I'd rather leave it the way it is. Actually, I was thinking of doing
the same to fnv1a too : )

-- 
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company 




Re: General purpose hashing func in pgbench

2017-12-21 Thread Ildar Musin
21/12/2017 15:44, Fabien COELHO wrote:
>
>>> Add the submission to the next CF?
>> I think it is not commitfest ready yet -- I need to add some
>> documentation and tests first.
>
> It just helps that the thread is referenced, and the review process
> has started anyway.
>
You are right, I've submitted the patch to the upcoming commitfest.

Thanks!

-- 
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company