A case for UPDATE DISTINCT attribute

2018-12-14 Thread Gajus Kuizinas
I have observed that the following pattern is repeating in our data
management programs:

UPDATE
  event
SET
  fuid = ${fuid},
  venue_id = ${venueId},
  url = ${url}
WHERE
  id = ${id} AND
  fuid IS != ${fuid} AND
  venue_id IS != ${venueId} AND
  url IS DISTINCT FROM ${url};

Note: "url" can be null. Therefore, using IS DISTINCT FROM.

The reasons we are using this pattern are multiple:

   - an empty update will trigger matching triggers.
   - an empty update will be WAL-logged
   - an empty update create dead tuples that will need to be cleaned up by
   AUTOVACUUM

In cases where the data does not change, all of these are undesirable side
effects.

Meanwhile, a WHERE condition that excludes rows with matching values makes
this into a noop in case of matching target column values.

It appears this that this pattern should be encouraged, but the verbosity
(and the accompanying risk of introducing logical error, e.g. accidentally
using = comparison on a NULLable column) makes this a rarely used pattern.

I suggest that introducing an attribute such as "UPDATE DISTINCT", e.g.

UPDATE DISTINCT
  event
SET
  fuid = ${fuid},
  venue_id = ${venueId},
  url = ${url}
WHERE
  id = ${id}

would encourage greater adoption of such pattern.

Is there a technical reason this does not existing already?

ᐧ


IMMUTABLE and PARALLEL SAFE function markings

2018-11-26 Thread Gajus Kuizinas
I have defined a function "is_not_distinct" for the purpose of creating a
custom operator "===".

CREATE FUNCTION is_not_distinct(a anyelement, b anyelement)
returns boolean
language sql as $$
  select a is not distinct from b;
$$
IMMUTABLE;

CREATE OPERATOR === (
  LEFTARG = anyelement,
  RIGHTARG = anyelement,
  PROCEDURE = is_not_distinct,
  NEGATOR = !==
);

I have observed that the resulting queries were executed without using
parallelisation.

I have learned by asking on Freenode that the reason my queries are not
using parallelisation is because I have not configured PARALLEL SAFE.

I find it counter-intuitive that a function with IMMUTABLE marking would
require an explicit PARALLEL SAFE. It would seem logical that in all cases
where a function is appropriately market as IMMUTABLE it would also
be PARALLEL SAFE.

I was wondering what is the reason IMMUTABLE functions are not by
default PARALLEL SAFE and if the default behaviour could be changed to make
IMMUTABLE functions PARALLEL SAFE?

Thank you,

Gajus
ᐧ