Fwd: Advice about preloaded libraries

2023-10-10 Thread Esteban Zimanyi
MobilityDB
https://github.com/MobilityDB/MobilityDB
is a PostgreSQL extension that depends on PostGIS.

Bradford Boyle, who has been working on packaging MobilityDB
https://www.postgresql.org/message-id/capqrbe716d3gpd0jdbafab72elajrppg1luzvobelnbgl3r...@mail.gmail.com
highlighted the question of which of the GUCs shared_preload_libraries,
local_preload_libraries, and session_preload_libraries should be used to
load the postgis-3 library.

The information in the manual
https://www.postgresql.org/docs/16/runtime-config-client.html#GUC-SESSION-PRELOAD-LIBRARIES
does not give us a clear-cut answer to this question. We are looking for
advice on which of the three options mentioned above should be used.

MobilityDB requires PostGIS to be loaded before any MobilityDB query can be
issued. For example, commenting out the following line
#shared_preload_libraries = 'postgis-3'
in postgresql.conf produces the following:

$ psql test
psql (15.3)
Type "help" for help.

test=# select tgeompoint 'Point(1 1)@2000-01-01';
2023-10-03 16:41:25.980 CEST [8683] ERROR:  could not load library
"/usr/local/pgsql/15/lib/libMobilityDB-1.1.so": /usr/local/pgsql/15/lib/
libMobilityDB-1.1.so: undefined symbol: ST_Intersects at character 19
2023-10-03 16:41:25.980 CEST [8683] STATEMENT:  select tgeompoint 'Point(1
1)@2000-01-01';
ERROR:  could not load library "/usr/local/pgsql/15/lib/libMobilityDB-1.1.so":
/usr/local/pgsql/15/lib/libMobilityDB-1.1.so: undefined symbol:
ST_Intersects
LINE 1: select tgeompoint 'Point(1 1)@2000-01-01';
  ^
test=# select st_point(1,1);
                   st_point
----------------------------------------------
 010100F03FF03F
(1 row)

test=# select tgeompoint 'Point(1 1)@2000-01-01';
                 tgeompoint
----------------------------------------------
 010100F03FF03F@2000-01-01 00:00:00+01
(1 row)

test=#


As can be seen above, setting shared_preload_libraries = 'postgis-3' is not
strictly mandatory, but without it the user is responsible for issuing a
query that loads PostGIS (the select st_point(1,1) above) before being able
to execute MobilityDB queries.
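For reference, the per-database alternative we are aware of (a sketch; whether this is the recommended practice is precisely our question):

```sql
-- Sketch: preload PostGIS for one database only; unlike
-- shared_preload_libraries, this takes effect at the start of each new
-- session and needs no server restart (database name "test" assumed).
ALTER DATABASE test SET session_preload_libraries = 'postgis-3';
```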

Thanks for your advice.


How to define template types in PostgreSQL

2023-01-07 Thread Esteban Zimanyi
Dear all

MobilityDB (https://github.com/MobilityDB/MobilityDB) defines four template
types at the C level: Set, Span, SpanSet, and Temporal. The type Set is akin
to PostgreSQL's ArrayType restricted to one dimension, but enforces the
constraint that sets contain no duplicates. The types Span and SpanSet are
akin to PostgreSQL's RangeType and MultirangeType, but enforce the
constraints that span types are of fixed length and that empty spans and
infinite bounds are not allowed. The type Temporal is used to manipulate
time-varying values.

These template types need to be instantiated at the SQL level with base
types (int, bigint, float, timestamptz, text, ...), and because of this
MobilityDB needs to define numerous SQL functions that all call the same C
function. Taking the Set type as an example, we need to define, e.g.,

CREATE FUNCTION intset_eq(intset, intset) RETURNS bool AS
'MODULE_PATHNAME', 'Set_eq' ...
CREATE FUNCTION bigintset_eq(bigintset, bigintset) RETURNS bool AS
'MODULE_PATHNAME', 'Set_eq' ...
CREATE FUNCTION floatset_eq(floatset, floatset) RETURNS bool AS
'MODULE_PATHNAME', 'Set_eq' ...
CREATE FUNCTION textset_eq(textset, textset) RETURNS bool AS
'MODULE_PATHNAME', 'Set_eq' ...
...

CREATE FUNCTION intset_ne(intset, intset) RETURNS bool AS
'MODULE_PATHNAME', 'Set_ne' ...
CREATE FUNCTION bigintset_ne(bigintset, bigintset) RETURNS bool AS
'MODULE_PATHNAME', 'Set_ne' ...
CREATE FUNCTION floatset_ne(floatset, floatset) RETURNS bool AS
'MODULE_PATHNAME', 'Set_ne' ...
CREATE FUNCTION textset_ne(textset, textset) RETURNS bool AS
'MODULE_PATHNAME', 'Set_ne' ...
...

In the case of arrays, ranges, and multiranges, PostgreSQL avoids this
redundancy using pseudo-types such as anyarray, anyrange, anymultirange, ...

Is there a possibility for us to also define pseudo-types such as anyset,
anyspan, anyspanset, anytemporal, ...?

This would considerably reduce the number of SQL functions to define.
Currently, given the high number of functions in MobilityDB, creating the
extension takes a long time.
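To make the request concrete, here is a sketch of the kind of definition such pseudo-types would enable (anyset is hypothetical; it does not exist in PostgreSQL today):

```sql
-- Hypothetical: with an anyset pseudo-type, one polymorphic definition
-- would replace all the per-base-type variants above.
CREATE FUNCTION set_eq(anyset, anyset) RETURNS bool
AS 'MODULE_PATHNAME', 'Set_eq'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
```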

Regards

Esteban


Re: Storage for multiple variable-length attributes in a single row

2022-02-07 Thread Esteban Zimanyi
Dear David

There are two approaches for storing temporal information in a relational
database, explored since the 1980s following the work of Richard Snodgrass
http://www2.cs.arizona.edu/~rts/publications.html
namely tuple timestamping vs attribute timestamping. The SQL standard
adopted the tuple-timestamping approach, but in MobilityDB we decided to use
the attribute-timestamping approach. As you rightly pointed out,
tuple timestamping follows traditional relational normalization theory.

The main advantage of attribute timestamping for mobility data is that we
only need to store the changes in value of a temporal attribute. In the
example of the gear of a car, even if we receive high-frequency
observations, there will be very few gear changes during a trip, while there
will be many more position changes. Therefore, MobilityDB only stores
changes in value (e.g., no change of position is stored during a red light
or traffic jam), which constitutes a huge lossless compression with respect
to the raw format that stores every observation in a single row. We have
observed 450% lossless compression on real IoT data.

In addition, MobilityDB provides all temporal operations and allows
determining the value of any temporal attribute at any timestamp (e.g.,
using linear interpolation between observations for speed or GPS position),
independently of the actually stored observations.

I hope this clarifies things a little.


Re: Storage for multiple variable-length attributes in a single row

2022-02-07 Thread Esteban Zimanyi
Many thanks for your prompt reply David. Allow me then to restate the
questions, hoping that this better fits this mailing list.

MobilityDB is a time-series extension to PostgreSQL/PostGIS in which
time-varying attributes (e.g., the gear or GPS location of a car) are
semantically grouped into "units" (e.g., a trip of a car) and stored as
temporal functions, e.g., a set of pairs (integer, timestamptz) for the gear
(a temporal integer) or a set of triples (lon, lat, timestamptz) for the GPS
location (a temporal point). All temporal types are stored using extended
storage, e.g.,
CREATE TYPE tint (
  internallength = variable,
  [...]
  storage = extended,
  alignment = double,
  [...]
);
When ingesting mobility (IoT) data into MobilityDB, we receive very wide (2K
attributes) high-frequency (every tenth of a second) data in flat format
(e.g., CSV), and we need to store it in PostgreSQL tables using MobilityDB
temporal types. In this scenario, the values of these temporal types can be
very wide (on average 30K timestamped pairs/triples per trip).

As David suggested, this goes beyond the "traditional" usage of PostgreSQL.
Therefore my questions are:
* What is the suggested strategy for splitting these 2K attributes into
vertically partitioned tables linked by the primary key (e.g., the trip
number in the example above)? Are there any limitations or best practices
regarding the number/size of TOASTed attributes a table should contain?
* In each partitioned table containing N TOASTed attributes, given the above
requirements, are there any limitations or best practices for storing them
using extended storage, or an alternative such as external?
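As an illustration of the second question, the kind of per-column change we are considering (table and column names are purely illustrative):

```sql
-- Sketch: keep the temporal attribute out of line but uncompressed
-- (external) instead of out of line and compressed (extended), to avoid
-- decompression costs when accessing wide values.
ALTER TABLE trips ALTER COLUMN speed SET STORAGE EXTERNAL;
```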

Many thanks for your insight

Esteban


Re: Storage for multiple variable-length attributes in a single row

2022-02-07 Thread Esteban Zimanyi
Dear all

May I kindly ask for your insight on a question I posted a month ago, to
which I never received an answer?

Many thanks

On Thu, Jan 6, 2022 at 4:05 PM Esteban Zimanyi 
wrote:

> Dear all
>
> When ingesting mobility (IoT) data into MobilityDB
> https://mobilitydb.com/
> we transform very wide (2K attributes) car mobility data of high frequency
> (every tenth of a second) from flat format (e.g. CSV) into MobilityDB
> format in which there is a single record per trip and each of the signals
> is transformed into a temporal attribute (tbool, tint, tfloat, ttext,
> tgeompoint, tgeogpoint), which are temporal extensions of the corresponding
> PostgreSQL/PostGIS base types (bool, int, float, text, geometry,
> geography). All temporal types are stored using extended format, e.g.,
> CREATE TYPE tfloat (
>   internallength = variable,
>   [...]
>   storage = extended,
>   alignment = double,
>   [...]
> );
>
> Given that each temporal value can be very wide (on average 30K
> timestamped points/floats/text/... per trip), our first question is:
> * Is extended the right storage for this?
>
> Our second question is how all the 2K temporal attributes should be
> stored, which may be
> * in a single tablespace
> * in one tablespace per attribute
> which, in other words, relates to the question of row vs column storage.
>
> Many thanks for your insight
>
> Esteban
>
>


Storage for multiple variable-length attributes in a single row

2022-01-06 Thread Esteban Zimanyi
Dear all

When ingesting mobility (IoT) data into MobilityDB
https://mobilitydb.com/
we transform very wide (2K attributes) car mobility data of high frequency
(every tenth of a second) from flat format (e.g. CSV) into MobilityDB
format in which there is a single record per trip and each of the signals
is transformed into a temporal attribute (tbool, tint, tfloat, ttext,
tgeompoint, tgeogpoint), which are temporal extensions of the corresponding
PostgreSQL/PostGIS base types (bool, int, float, text, geometry,
geography). All temporal types are stored using extended format, e.g.,
CREATE TYPE tfloat (
  internallength = variable,
  [...]
  storage = extended,
  alignment = double,
  [...]
);

Given that each temporal value can be very wide (on average 30K
timestamped points/floats/text/... per trip), our first question is:
* Is extended the right storage for this?

Our second question is how all the 2K temporal attributes should be stored,
which may be
* in a single tablespace
* in one tablespace per attribute
which, in other words, relates to the question of row vs column storage.
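For concreteness, this is how we inspect which storage strategy is in effect for each attribute (the table name trips is illustrative; attstorage is 'x' for extended and 'e' for external):

```sql
-- Sketch: list the storage strategy of every user column of a table.
SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'trips'::regclass
  AND attnum > 0 AND NOT attisdropped;
```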

Many thanks for your insight

Esteban


Fwd: Problem with Unix sockets when porting MobilityDB for Windows

2021-09-05 Thread Esteban Zimanyi
Windows 10 supports Unix sockets as reported, e.g., here
https://devblogs.microsoft.com/commandline/af_unix-comes-to-windows/

We run the MobilityDB tests using an ephemeral instance that is created by
the test suite and torn down afterwards:
https://github.com/MobilityDB/MobilityDB/blob/develop/test/scripts/test.sh
For this we use Unix sockets, and thus the pg_ctl command is configured as
follows

PGCTL="${BIN_DIR}/pg_ctl -w -D ${DBDIR} -l ${WORKDIR}/log/postgres.log -o
-k -o ${WORKDIR}/lock -o -h -o ''"

The log file reports that things are working as expected:

2021-09-05 14:10:53.366 CEST [32170] LOG:  starting PostgreSQL 13.3 on
x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0,
64-bit
2021-09-05 14:10:53.372 CEST [32170] LOG:  listening on Unix socket
"/home/esteban/src/MobilityDB/build/tmptest/lock/.s.PGSQL.5432"
2021-09-05 14:10:53.394 CEST [32171] LOG:  database system was shut down at
2021-09-05 14:10:52 CEST
2021-09-05 14:10:53.412 CEST [32170] LOG:  database system is ready to
accept connections

We are trying to port MobilityDB to Windows using msys2. In this case the
above command does not work, as reported in the corresponding log:

2021-09-05 14:34:10.553 CEST [19060] LOG:  starting PostgreSQL 13.4 on
x86_64-w64-mingw32, compiled by gcc.exe (Rev5, Built by MSYS2 project)
10.3.0, 64-bit
2021-09-05 14:34:10.558 CEST [19060] LOG:  could not translate host name
"''", service "5432" to address: Unknown host
2021-09-05 14:34:10.558 CEST [19060] WARNING:  could not create listen
socket for "''"
2021-09-05 14:34:10.558 CEST [19060] FATAL:  could not create any TCP/IP
sockets
2021-09-05 14:34:10.560 CEST [19060] LOG:  database system is shut down

Any ideas on how to solve this?
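A workaround we are considering (an untested sketch): rather than passing the empty -h argument through pg_ctl -o, where msys2/Windows turns it into the literal string '', put the equivalent settings directly in the postgresql.conf of the ephemeral instance:

```
# Sketch for the ephemeral instance: no TCP/IP listener, Unix socket only.
listen_addresses = ''
unix_socket_directories = '/home/esteban/src/MobilityDB/build/tmptest/lock'
```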

Esteban


Regression tests for MobilityDB: Continuous shutdowns at a random step

2021-08-25 Thread Esteban Zimanyi
Hello

While executing the regression tests for MobilityDB, I load a predefined
database on which I run the tests and then compare the obtained results with
the expected ones. All the tests are driven by the following bash file
https://github.com/MobilityDB/MobilityDB/blob/develop/test/scripts/test.sh

However, at a random step in the process I repeatedly receive the following
error in the log file:

2021-08-25 16:48:13.608 CEST [22375] LOG:  received fast shutdown request
2021-08-25 16:48:13.622 CEST [22375] LOG:  aborting any active transactions
2021-08-25 16:48:13.622 CEST [22375] LOG:  background worker "logical
replication launcher" (PID 22382) exited with exit code 1
2021-08-25 16:48:13.623 CEST [22377] LOG:  shutting down
2021-08-25 16:48:13.971 CEST [22375] LOG:  database system is shut down

and sometimes I need to relaunch the whole CMake build process
https://github.com/MobilityDB/MobilityDB/blob/develop/CMakeLists.txt
numerous times for the tests to complete:

/* While on MobilityDB/build directory */
rm -rf *
cmake ..
make
make test

Any idea where I should begin looking for the problem?

Thanks for your help

Esteban


Re: How to disable the autovacuum ?

2021-05-31 Thread Esteban Zimanyi
Many thanks Tom for your feedback. I appreciate it.

Actually, the tests do work in parallel with autovacuum; I just wanted to
minimize the test time, since autovacuum launches in the middle of the many
regression and robustness tests. I will follow your advice.

Regards

Esteban


On Mon, May 31, 2021 at 3:49 PM Tom Lane  wrote:

> Esteban Zimanyi  writes:
> > Any idea how to disable the autovacuum during the regression and coverage
> > tests for the MobilityDB extension ?
>
> TBH, this seems like a pretty bad idea.  If your extension doesn't
> behave stably with autovacuum it's not going to be much use in the
> real world.
>
> In the core tests, we sometimes disable autovac for individual
> tables using a per-table storage option, but that's a last resort.
>
> regards, tom lane
>


Re: How to disable the autovacuum ?

2021-05-31 Thread Esteban Zimanyi
Dear Christoph

Many thanks for your prompt reply !

Is there a step-by-step procedure specified somewhere?

For example, before launching the tests there is a load.sql file that loads
all the test tables. The file starts as follows

SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET client_min_messages = warning;
SET row_security = off;
SET default_tablespace = '';
SET default_with_oids = false;

--
-- Name: tbl_tbool; Type: TABLE; Schema: public; Owner: -
--

DROP TABLE IF EXISTS public.tbl_tbool;
CREATE TABLE public.tbl_tbool (
k integer,
temp tbool
);
ALTER TABLE tbl_tbool SET (autovacuum_enabled = false);

[... many more table definitions added after which the load of these tables
starts ...]

COPY public.tbl_tbool (k,temp) FROM stdin;
1  f@2001-05-31 20:25:00+02
2  f@2001-06-13 00:50:00+02
[...]
\.

[... load of the other tables ...]

I wonder whether this is the best way to do it, or whether it is better to
disable autovacuum at the start for all the tests.
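If disabling it globally for the test instance turns out to be acceptable, a sketch of the alternative would be to pass the setting at server start in our test script, instead of per-table options in load.sql:

```shell
# Sketch: start the ephemeral instance with autovacuum disabled everywhere.
${BIN_DIR}/pg_ctl -w -D ${DBDIR} -l ${WORKDIR}/log/postgres.log \
  -o "-c autovacuum=off" start
```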

Thanks for your help !

On Mon, May 31, 2021 at 10:47 AM Christoph Moench-Tegeder <
c...@burggraben.net> wrote:

> ## Esteban Zimanyi (ezima...@ulb.ac.be):
>
> > I have tried
> > alter system set autovacuum = off;
> > but it does not seem to work.
>
> Did you reload the configuration ("SELECT pg_reload_conf()" etc) after
> that? If not, that's your problem right there.
>
> Regards,
> Christoph
>
> --
> Spare Space
>


How to disable the autovacuum ?

2021-05-31 Thread Esteban Zimanyi
Dear all

Any idea how to disable autovacuum during the regression and coverage tests
for the MobilityDB extension?

I have tried
alter system set autovacuum = off;
but it does not seem to work.

Any suggestions are much appreciated.

Esteban


Re: How to launch parallel aggregations ?

2021-05-18 Thread Esteban Zimanyi
Thanks a lot! It works!

On Tue, May 18, 2021 at 11:15 AM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Tue, May 18, 2021 at 2:32 PM Esteban Zimanyi 
> wrote:
> >
> > Dear all
> >
> > In MobilityDB we have defined parallel aggregations with a combine
> function, e.g.,
> >
> > CREATE AGGREGATE extent(tbox) (
> >   SFUNC = tbox_extent_transfn,
> >   STYPE = tbox,
> >   COMBINEFUNC = tbox_extent_combinefn,
> >   PARALLEL = safe
> > );
> >
> > We would like to trigger the combine functions in the coverage tests but
> for this it is required that the tables are VERY big. In particular for the
> above aggregation, the combine function only is triggered when the table
> has more than 300K rows.
> >
> > As it is not very effective to have such a big table in the test
> database used for the regression and the coverage tests I wonder whether it
> is possible to set some parameters to launch the combine functions with
> tables of, e.g., 10K rows, which are the bigger tables in our regression
> test database.
> >
> > Many thanks for your insights !
>
> You could do something like below, just before your test:
>
> -- encourage use of parallel plans
> set parallel_setup_cost=0;
> set parallel_tuple_cost=0;
> set min_parallel_table_scan_size=0;
> set max_parallel_workers_per_gather=2;
>
> And after the test you can reset all of the above parameters.
>
> Hope that helps!
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>


How to launch parallel aggregations ?

2021-05-18 Thread Esteban Zimanyi
Dear all

In MobilityDB we have defined parallel aggregations with a combine
function, e.g.,

CREATE AGGREGATE extent(tbox) (
  SFUNC = tbox_extent_transfn,
  STYPE = tbox,
  COMBINEFUNC = tbox_extent_combinefn,
  PARALLEL = safe
);

We would like to trigger the combine functions in the coverage tests, but
for this the tables must be VERY big. In particular, for the above
aggregation the combine function is only triggered when the table has more
than 300K rows.

As it is not very practical to have such a big table in the test database
used for the regression and coverage tests, I wonder whether it is possible
to set some parameters so that the combine functions are launched with
tables of, e.g., 10K rows, which are the biggest tables in our regression
test database.

Many thanks for your insights !

Esteban


Re: Error when defining a set returning function

2021-04-17 Thread Esteban Zimanyi
> If you build with pgxs it should supply the appropriate compiler flags.
> Alternatively, get the right settings from pg_config. In general rolling
> your own is a bad idea.
>

I didn't know about pgxs. Many thanks Andrew for pointing this out.


Re: Error when defining a set returning function

2021-04-16 Thread Esteban Zimanyi
Many thanks Tom for your help !

I removed the -fshort-enums flag and everything works fine!

On Fri, Apr 16, 2021 at 7:04 PM Tom Lane  wrote:

> Esteban Zimanyi  writes:
> > When debugging the function with gdb, I noticed that the rsinfo variable
> of
> > the PostgreSQL function ExecMakeFunctionResultSet  is modified in the
> > macro  SRF_RETURN_NEXT causing the problem. Any idea how to solve this?
>
> Well, what SRF_RETURN_NEXT thinks it's doing is
>
> rsi->isDone = ExprMultipleResult; \
>
> which surely shouldn't change the returnMode field.  At this point
> I'm guessing that you are compiling the PG headers with some compiler
> pragma that changes the struct packing rules.  Don't do that.
>
> regards, tom lane
>


Re: Error when defining a set returning function

2021-04-16 Thread Esteban Zimanyi
Dear Tom

Many thanks for answering my question so quickly. Following your answer, I
downloaded brand new versions of PostgreSQL 13.2 and PostGIS 2.5.5 and
compiled/installed them with the standard parameters. I did not get any
error messages in the build. I then recompiled MobilityDB and got the same
error message.

When debugging the function with gdb, I noticed that the rsinfo variable of
the PostgreSQL function ExecMakeFunctionResultSet is modified in the macro
SRF_RETURN_NEXT, causing the problem. Any idea how to solve this?

4353  SRF_RETURN_NEXT(funcctx, Int32GetDatum(result));
(gdb) up
#1  0x55b8a871fc56 in ExecMakeFunctionResultSet (fcache=0x55b8a8e6d9a0,
econtext=0x55b8a8e6cfa0,
argContext=0x55b8a9d00dd0, isNull=0x55b8a8e6d930, isDone=0x55b8a8e6d988)
at
/home/esteban/src/postgresql-13.2/build_dir/../src/backend/executor/execSRF.c:614
614 result = FunctionCallInvoke(fcinfo);
(gdb) p rsinfo
$5 = {type = T_ReturnSetInfo, econtext = 0x55b8a8e6cfa0, expectedDesc =
0x55b8a8e6e8f0, allowedModes = 3,
  returnMode = SFRM_ValuePerCall, isDone = ExprSingleResult, setResult =
0x0, setDesc = 0x0}
(gdb) n
4354}
(gdb)
ExecMakeFunctionResultSet (fcache=0x55b8a8e6d9a0, econtext=0x55b8a8e6cfa0,
argContext=0x55b8a9d00dd0,
isNull=0x55b8a8e6d930, isDone=0x55b8a8e6d988)
at
/home/esteban/src/postgresql-13.2/build_dir/../src/backend/executor/execSRF.c:615
615 *isNull = fcinfo->isnull;
(gdb) p rsinfo
$6 = {type = T_ReturnSetInfo, econtext = 0x55b8a8e6cfa0, expectedDesc =
0x55b8a8e6e8f0, allowedModes = 3,
  returnMode = (SFRM_ValuePerCall | unknown: 256), isDone =
ExprSingleResult, setResult = 0x0, setDesc = 0x0}
(gdb)


Error when defining a set returning function

2021-04-16 Thread Esteban Zimanyi
Dear all

Since I was receiving an error when defining a set returning function, I
borrowed a function from PostgreSQL as follows

/* C definition */
typedef struct testState
{
  int current;
  int finish;
  int step;
} testState;

/**
* test_srf(startval int, endval int, step int)
*/
PG_FUNCTION_INFO_V1(test_srf);
Datum test_srf(PG_FUNCTION_ARGS)
{
  FuncCallContext *funcctx;
  testState *fctx;
  int result; /* the actual return value */

  if (SRF_IS_FIRSTCALL())
  {
/* Get input values */
int start = PG_GETARG_INT32(0);
int finish = PG_GETARG_INT32(1);
int step = PG_GETARG_INT32(2);
MemoryContext oldcontext;

/* create a function context for cross-call persistence */
funcctx = SRF_FIRSTCALL_INIT();

/* switch to memory context appropriate for multiple function calls */
oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);

/* quick opt-out if we get nonsensical inputs  */
if (step <= 0 || start == finish)
{
  funcctx = SRF_PERCALL_SETUP();
  SRF_RETURN_DONE(funcctx);
}

/* allocate memory for function context */
fctx = (testState *) palloc0(sizeof(testState));
fctx->current = start;
fctx->finish = finish;
fctx->step = step;

funcctx->user_fctx = fctx;
MemoryContextSwitchTo(oldcontext);
  }

  /* stuff done on every call of the function */
  funcctx = SRF_PERCALL_SETUP();

  /* get state */
  fctx = funcctx->user_fctx;

  result = fctx->current;
  fctx->current += fctx->step;
  /* Stop when we have generated all values */
  if (fctx->current > fctx->finish)
  {
SRF_RETURN_DONE(funcctx);
  }

  SRF_RETURN_NEXT(funcctx, Int32GetDatum(result));
}

/* SQL definition */
CREATE OR REPLACE FUNCTION testSRF(startval int, endval int, step int)
  RETURNS SETOF integer
  AS 'MODULE_PATHNAME', 'test_srf'
  LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

When I execute this function I obtain

select testSRF(1,10, 2);
ERROR:  unrecognized table-function returnMode: 257

select version();
 PostgreSQL 13.2 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit

Any idea what could be wrong ?

Thanks for your help

Esteban


Adding constructors for path and polygon geometric types

2020-09-28 Thread Esteban Zimanyi
Dear all

I have always used geometric types as a pedagogical example for
understanding how indexes work in PostgreSQL, so that I can program our
indexes in MobilityDB
https://github.com/MobilityDB/MobilityDB
However, this requires creating tables with a large number of geometries.
Currently there are SQL constructors for all geometric types except path and
polygon, and the current string-based input format is definitely not ideal
for this purpose. I therefore created constructors for these types, which
are exemplified next.

SELECT path(ARRAY[point(1,2),point(3,4),point(5,6)]);
SELECT polygon(ARRAY[point(1,2),point(3,4),point(5,6),point(1,2)]);

From this I can create random generators such as

CREATE OR REPLACE FUNCTION random_polygon(low float, high float, maxcard
int)
  RETURNS polygon AS $$
DECLARE
  ptarr point[];
BEGIN
  SELECT array_agg(random_point(low, high)) INTO ptarr
  FROM generate_series (1, random_int(1, maxcard)) AS x;
  ptarr = array_append(ptarr, ptarr[1]);
  RETURN polygon(ptarr);
END;
$$ LANGUAGE 'plpgsql' STRICT;

which allows me to generate random tables with an arbitrary number of rows,
such as

DROP TABLE IF EXISTS tbl_polygon;
CREATE TABLE tbl_polygon AS
SELECT k, random_polygon(1, 100, 10) AS poly
FROM generate_series(1, 1e5) AS k;

Now I can analyze, e.g., how indexes work for KNN queries

CREATE INDEX tbl_polygon_spgist_idx ON tbl_polygon USING spgist(poly);
EXPLAIN SELECT t1.k FROM tbl_polygon t1
ORDER BY t1.poly <-> point(1, 1) LIMIT 3;

                            QUERY PLAN
------------------------------------------------------------------
 Limit  (cost=0.28..0.70 rows=3 width=14)
   ->  Index Scan using tbl_polygon_spgist_idx on tbl_polygon t1
         (cost=0.28..13932.28 rows=10 width=14)
         Order By: (poly <-> '(1,1)'::point)
(3 rows)

I wanted to create a patch with these constructors, but I am unable to do so
due to a problem in the bootstrap phase that I do not understand.

I have simply added the definition of the path constructor in the file
geo_ops.c as follows

/**
 * PATH Constructor
 */
Datum
path_constructor(PG_FUNCTION_ARGS)
{
ArrayType *array = PG_GETARG_ARRAYTYPE_P(0);
...
PG_RETURN_PATH_P(path);
}

and added to the pg_proc.dat file the following

# additional constructors for geometric types
{ oid => '4226', descr => 'convert array of points to path',
  proname => 'path', prorettype => 'path', proargtypes => '_point',
  prosrc => 'path_constructor' },

It compiled and installed without any problem but when executing initdb I
get the following error.

esteban@ESTEBAN-WORKSTATION:~/src/postgresql-13.0-geo$
/usr/local/pgsql/13/bin/initdb -D /usr/local/pgsql/13/data
The files belonging to this database system will be owned by user "esteban".
This user must also own the server process.

The database cluster will be initialized with locale "C.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /usr/local/pgsql/13/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Europe/Brussels
creating configuration files ... ok
running bootstrap script ... 2020-09-28 15:00:57.961 CEST [647] FATAL:
 invalid input syntax for type oid: "path"
2020-09-28 15:00:57.961 CEST [647] PANIC:  cannot abort transaction 1, it
was already committed
Aborted
child process exited with exit code 134
initdb: removing contents of data directory "/usr/local/pgsql/13/data"
esteban@ESTEBAN-WORKSTATION:~/src/postgresql-13.0-geo$

Any idea how to solve this ?

Esteban


Re: Fwd: Extending range type operators to cope with elements

2020-09-27 Thread Esteban Zimanyi
Dear all

After a long time (as you can imagine, this year everything has been upside
down...), you will find enclosed the patch extending the range operators so
they can cope with range <op> element and element <op> range, in addition to
the existing range <op> range.
Best regards

Esteban


Prof. Esteban Zimanyi
Department of Computer & Decision Engineering  (CoDE) CP 165/15
Universite Libre de Bruxelles
Avenue F. D. Roosevelt 50
B-1050 Brussels, Belgium
fax: + 32.2.650.47.13
tel: + 32.2.650.31.85
e-mail: ezima...@ulb.ac.be
Internet: http://cs.ulb.ac.be/members/esteban/


On Tue, Sep 17, 2019 at 5:18 AM David Fetter  wrote:

> On Sun, Sep 15, 2019 at 04:30:52PM +0200, Esteban Zimanyi wrote:
> > > So yes, I've had a need for those operators in the past. What I don't
> > know is whether adding these functions will be worth the catalog clutter.
> >
> > The operators are tested and running within MobilityDB. It concerns lines
> > 231-657 for the C code in file
> >
> https://github.com/MobilityDB/MobilityDB/blob/master/src/rangetypes_ext.c
> >
> > and lines 32-248 for the SQL code in file
> >
> https://github.com/ULB-CoDE-WIT/MobilityDB/blob/master/src/sql/07_rangetypes_ext.in.sql
> >
> > Since you don't really use PR, please let me know whether I can be of
> > any help.
>
> It's not done by pull request at this time. Instead, it is done by sending
> patches to this mailing list.
>
> http://wiki.postgresql.org/wiki/Development_information
> http://wiki.postgresql.org/wiki/Submitting_a_Patch
> https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer%3F
> http://www.interdb.jp/pg/
>
> Best,
> David.
> --
> David Fetter  http://fetter.org/
> Phone: +1 415 235 3778
>
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate
>


range-ext.patch
Description: Binary data


Pedagogical example for KNN usage in GiST indexes?

2020-09-27 Thread Esteban Zimanyi
Dear all

Is there a pedagogical example showing KNN processing in GiST indexes? I am
looking for something equivalent to the example in the file geo_spgist.c.
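In the meantime, this is the minimal kind of setup I use to exercise KNN, shown here with the built-in GiST opclass for points (a sketch; table name and sizes are arbitrary):

```sql
-- Random point table, GiST index, and a KNN query; EXPLAIN should show an
-- index scan with "Order By: (p <-> '(50,50)'::point)".
CREATE TABLE tbl_point AS
  SELECT k, point(random() * 100, random() * 100) AS p
  FROM generate_series(1, 100000) AS k;
CREATE INDEX tbl_point_gist_idx ON tbl_point USING gist(p);
EXPLAIN SELECT k FROM tbl_point ORDER BY p <-> point(50, 50) LIMIT 3;
```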

Thanks for your help

Esteban


Making the function range_union_internal available to other extensions

2020-08-17 Thread Esteban Zimanyi
Dear all

In MobilityDB
https://github.com/MobilityDB/MobilityDB
we make extensive use of range types.

Would it be possible to make the function range_union_internal available to
other extensions? Otherwise we need to copy/paste it verbatim, for example
lines 114-153 in
https://github.com/MobilityDB/MobilityDB/blob/develop/src/rangetypes_ext.c
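For reference, the semantics we need is exactly that of the SQL-level union operator; we would simply like to call its C implementation directly, without going through the fmgr:

```sql
-- The SQL-level range union backed by range_union_internal.
SELECT int4range(1, 5) + int4range(3, 8);  -- yields [1,8)
```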

Regards

Esteban


Re: Fwd: Extending range type operators to cope with elements

2019-09-21 Thread Esteban Zimanyi
On Tue, Sep 17, 2019 at 5:18 AM David Fetter  wrote:
> It's not done by pull request at this time. Instead, it is done by sending
> patches to this mailing list.

Dear all

You will find enclosed the patch that extends the range type operators so
they cope with elements.

Any comments most welcome.

Esteban
diff -urdN postgresql-11.5-orig/doc/src/sgml/func.sgml postgresql-11.5-ranges/doc/src/sgml/func.sgml
--- postgresql-11.5-orig/doc/src/sgml/func.sgml	2019-09-21 11:28:11.836309263 +0200
+++ postgresql-11.5-ranges/doc/src/sgml/func.sgml	2019-09-21 10:32:53.320004000 +0200
@@ -13228,6 +13228,20 @@

 

+  
+strictly left of element
+int8range(1,10)  100
+t
+   
+
+   
+  
+element strictly left of
+10  int8range(100,110)
+t
+   
+
+   
   
 strictly right of
 int8range(50,60)  int8range(20,30)
@@ -13235,6 +13249,20 @@

 

+  
+strictly right of element
+int8range(50,60)  20
+t
+   
+
+   
+  
+element strictly right of
+50  int8range(20,30)
+t
+   
+
+   
   
 does not extend to the right of
 int8range(1,20)  int8range(18,20)
@@ -13242,6 +13270,20 @@

 

+  
+does not extend to the right of element
+int8range(1,20)  20
+t
+   
+
+   
+  
+element does not extend to the right of
+19  int8range(18,20)
+t
+   
+
+   
   
 does not extend to the left of
 int8range(7,20)  int8range(5,10)
@@ -13249,12 +13291,40 @@

 

+  
+does not extend to the left of element
+int8range(7,20)  5
+t
+   
+
+   
+  
+element does not extend to the left of
+7  int8range(5,10)
+t
+   
+
+   
  -|- 
 is adjacent to
 numrange(1.1,2.2) -|- numrange(2.2,3.3)
 t

 
+   
+ -|- 
+is adjacent to element
+numrange(1.1,2.2) -|- 2.2
+t
+   
+
+   
+ -|- 
+element is adjacent to
+2.2 -|- numrange(2.2,3.3, '()')
+t
+   
+

  + 
 union
diff -urdN postgresql-11.5-orig/src/backend/utils/adt/rangetypes.c postgresql-11.5-ranges/src/backend/utils/adt/rangetypes.c
--- postgresql-11.5-orig/src/backend/utils/adt/rangetypes.c	2019-09-21 11:28:11.628205263 +0200
+++ postgresql-11.5-ranges/src/backend/utils/adt/rangetypes.c	2019-09-21 10:32:53.320004000 +0200
@@ -548,6 +548,322 @@
 	PG_RETURN_BOOL(range_contains_elem_internal(typcache, r, val));
 }
 
+/* strictly left of element? (internal version) */
+bool
+range_before_elem_internal(TypeCacheEntry *typcache, RangeType *r, Datum val)
+{
+	RangeBound	lower,
+				upper;
+	bool		empty;
+	int32		cmp;
+
+	range_deserialize(typcache, r, &lower, &upper, &empty);
+
+	/* An empty range is neither left nor right of any other range */
+	if (empty)
+		return false;
+
+	if (!upper.infinite)
+	{
+		cmp = DatumGetInt32(FunctionCall2Coll(&typcache->rng_cmp_proc_finfo,
+											  typcache->rng_collation,
+											  upper.val, val));
+		if (cmp < 0 ||
+			(cmp == 0 && !upper.inclusive))
+			return true;
+	}
+
+	return false;
+}
+
+/* strictly left of element? */
+Datum
+range_before_elem(PG_FUNCTION_ARGS)
+{
+	RangeType  *r = PG_GETARG_RANGE_P(0);
+	Datum		val = PG_GETARG_DATUM(1);
+	TypeCacheEntry *typcache;
+
+	typcache = range_get_typcache(fcinfo, RangeTypeGetOid(r));
+
+	PG_RETURN_BOOL(range_before_elem_internal(typcache, r, val));
+}
+
+/* does not extend to right of element? (internal version) */
+bool
+range_overleft_elem_internal(TypeCacheEntry *typcache, RangeType *r, Datum val)
+{
+	RangeBound	lower,
+				upper;
+	bool		empty;
+
+	range_deserialize(typcache, r, &lower, &upper, &empty);
+
+	/* An empty range is neither left nor right of any element */
+	if (empty)
+		return false;
+
+	if (!upper.infinite)
+	{
+		if (DatumGetInt32(FunctionCall2Coll(&typcache->rng_cmp_proc_finfo,
+											typcache->rng_collation,
+											upper.val, val)) <= 0)
+			return true;
+	}
+
+	return false;
+}
+
+/* does not extend to right of element? */
+Datum
+range_overleft_elem(PG_FUNCTION_ARGS)
+{
+	RangeType  *r = PG_GETARG_RANGE_P(0);
+	Datum		val = PG_GETARG_DATUM(1);
+	TypeCacheEntry *typcache;
+
+	typcache = range_get_typcache(fcinfo, RangeTypeGetOid(r));
+
+	PG_RETURN_BOOL(range_overleft_elem_internal(typcache, r, val));
+}
+
+/* strictly right of element? (internal version) */
+bool
+range_after_elem_internal(TypeCacheEntry *typcache, RangeType *r, Datum val)
+{
+	RangeBound	lower,
+				upper;
+	bool		empty;
+	int32		cmp;
+
+	range_deserialize(typcache, r, &lower, &upper, &empty);
+
+	/* An empty range is neither left nor right of any other range */
+	if (empty)
+		return false;
+
+	if (!lower.infinite)
+	{
+		cmp = DatumGetInt32(FunctionCall2Coll(&typcache->rng_cmp_proc_finfo,
+											  typcache->rng_collation,
+	
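For readers skimming the patch, the bound tests in range_before_elem_internal and range_overleft_elem_internal can be modeled outside the backend. Below is a minimal Python sketch (my own illustration, not PostgreSQL code): ranges are plain tuples (lower, upper, lower_inc, upper_inc, empty) with finite bounds, standing in for RangeBound and the type-cache comparison machinery.

```python
def range_before_elem(r, val):
    """r << val: every value in r lies strictly below val."""
    lower, upper, lower_inc, upper_inc, empty = r
    if empty:
        # An empty range is neither left nor right of any element
        return False
    return upper < val or (upper == val and not upper_inc)

def range_overleft_elem(r, val):
    """r &< val: r does not extend to the right of val."""
    lower, upper, lower_inc, upper_inc, empty = r
    if empty:
        return False
    return upper <= val

# int8range(1,10) is lower-inclusive, upper-exclusive in PostgreSQL
r = (1, 10, True, False, False)
print(range_before_elem(r, 100))    # True, as in the doc table example
print(range_overleft_elem(r, 10))   # True: the range stops at or before 10
```

As in the patch, an inclusive upper bound equal to the element defeats << but not &<.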

Re: Extending range type operators to cope with elements

2019-09-16 Thread Esteban Zimanyi
>
>
> So yes, I've had a need for those operators in the past. What I don't know
> is whether adding these functions will be worth the catalog clutter.
>

The operators are tested and running within MobilityDB. They correspond to
lines 231-657 of the C code in
https://github.com/ULB-CoDE-WIT/MobilityDB/blob/master/src/rangetypes_ext.c

and to lines 32-248 of the SQL code in
https://github.com/ULB-CoDE-WIT/MobilityDB/blob/master/src/sql/07_rangetypes_ext.in.sql


Since you don't really use PRs, please let me know whether I can be of
any help.

Regards

Esteban

-- 
----
Prof. Esteban Zimanyi
Department of Computer & Decision Engineering  (CoDE) CP 165/15
Universite Libre de Bruxelles
Avenue F. D. Roosevelt 50
B-1050 Brussels, Belgium
fax: + 32.2.650.47.13
tel: + 32.2.650.31.85
e-mail: ezima...@ulb.ac.be
Internet: http://code.ulb.ac.be/



Fwd: Extending range type operators to cope with elements

2019-09-15 Thread Esteban Zimanyi
> So yes, I've had a need for those operators in the past. What I don't
know is whether adding these functions will be worth the catalog clutter.

The operators are tested and running within MobilityDB. They correspond to
lines 231-657 of the C code in
https://github.com/ULB-CoDE-WIT/MobilityDB/blob/master/src/rangetypes_ext.c

and to lines 32-248 of the SQL code in
https://github.com/ULB-CoDE-WIT/MobilityDB/blob/master/src/sql/07_rangetypes_ext.in.sql


Since you don't really use PRs, please let me know whether I can be of
any help.

Regards
Esteban




Extending range type operators to cope with elements

2019-09-13 Thread Esteban Zimanyi
Dear all

While developing MobilityDB we needed to extend the range type operators so
that they cope with elements. In the same way that the range types currently
support both
- @> contains range/element
- <@ element/range is contained by
we extended the left (<<), overleft (&<), right (>>), and overright (&>)
operators so that they accept both elements and ranges on the left- or
right-hand side. These can be seen on GitHub:
https://github.com/ULB-CoDE-WIT/MobilityDB/blob/master/src/rangetypes_ext.c

If you think that these extensions could be useful for the community at
large, I can prepare a PR. Please let me know.

Esteban


Re: Specifying attribute slot for storing/reading statistics

2019-09-12 Thread Esteban Zimanyi
>
> So these are 4 different data types (or classes of data types) that you
> introduce in your extension? Or is that just a conceptual view and it's
> stored in some other way (e.g. normalized in some way)?
>

At the SQL level these 4 durations are not distinguishable. For example, for
a tfloat (temporal float) we can have:

select tfloat '1@2000-01-01' -- Instant duration
select tfloat '{1@2000-01-01, 2@2000-01-02, 1@2000-01-03}' -- InstantSet duration
select tfloat '[1@2000-01-01, 2@2000-01-02, 1@2000-01-03)' -- Sequence
duration, with left-inclusive and right-exclusive bounds
select tfloat '{[1@2000-01-01, 2@2000-01-02, 1@2000-01-03], [1@2000-01-04,
1@2000-01-05]}' -- SequenceSet duration

Nevertheless, it is possible to restrict a column to a specific duration
with a typmod specifier, as in

create table test (..., measure tfloat(Instant), ...) -- only Instant durations accepted

At the C level these 4 durations are distinguished and implemented as
something equivalent to a template abstract class Temporal with four
subclasses TemporalInst, TemporalI, TemporalSeq, and TemporalS. Indeed, the
algorithms for manipulating these 4 durations are completely different.
They are called template classes since they keep the Oid of the base type
(float for tfloat, geometry for tgeompoint) in the same way arrays or
ranges do.
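To make the Sequence duration concrete: its defining operation is linear interpolation between successive instants. The following Python sketch is my own illustration, not MobilityDB's C implementation; plain numbers stand in for timestamps, and the function name is made up.

```python
from bisect import bisect_right

def tfloat_seq_value_at(instants, t):
    """Value of a Sequence-duration temporal float at time t.

    instants is a list of (t, v) pairs sorted by t; values between
    successive instants are linearly interpolated, as in MobilityDB's
    Sequence duration. Returns None outside the sequence's period.
    """
    times = [ti for ti, _ in instants]
    if t < times[0] or t > times[-1]:
        return None
    i = bisect_right(times, t) - 1          # last instant at or before t
    if i == len(instants) - 1:
        return instants[-1][1]
    (t0, v0), (t1, v1) = instants[i], instants[i + 1]
    frac = (t - t0) / (t1 - t0)
    return v0 + frac * (v1 - v0)

# tfloat '[1@t0, 2@t1, 1@t2]' with t0=0, t1=1, t2=2
seq = [(0, 1.0), (1, 2.0), (2, 1.0)]
print(tfloat_seq_value_at(seq, 0.5))   # 1.5, halfway between 1 and 2
```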

For more information, please refer to the manual on GitHub:
https://github.com/ULB-CoDE-WIT/MobilityDB/


> I don't think we're strongly against changing the code to allow this, as
> long as it does not break existing extensions/code (unnecessarily).
>
> >If you want I can prepare a PR in order to understand the implications of
> >these changes. Please let me know.
> >
>
> I think having an actual patch to look at would be helpful.
>

I am preparing a first patch for the files selfuncs.h and selfuncs.c, and
thus for Instant-duration selectivity. It basically
1) Moves some prototypes of the static functions from the .c file to the .h
file so that the functions are exported.
2) Passes the operator from the top-level functions down to the inner
functions such as mcv_selectivity or ineq_histogram_selectivity.

This allows me to call the functions twice, once for the value component
and once for the time component, as follows.

else if (cachedOp == CONTAINED_OP || cachedOp == OVERLAPS_OP)
{
    /* Enable the addition of the selectivity of the value and time
     * dimensions since either may be missing */
    double selec_value = 1.0, selec_time = 1.0;

    /* Selectivity for the value dimension */
    if (MOBDB_FLAGS_GET_X(box->flags))
    {
        operator = oper_oid(LT_OP, valuetypid, valuetypid);
        selec_value = scalarineqsel(root, operator, false, false, vardata,
                                    Float8GetDatum(box->xmin), valuetypid);
        operator = oper_oid(GT_OP, valuetypid, valuetypid);
        selec_value += scalarineqsel(root, operator, true, false, vardata,
                                     Float8GetDatum(box->xmax), valuetypid);
        selec_value = 1 - selec_value;
    }
    /* Selectivity for the time dimension */
    if (MOBDB_FLAGS_GET_T(box->flags))
    {
        operator = oper_oid(LT_OP, T_TIMESTAMPTZ, T_TIMESTAMPTZ);
        selec_time = scalarineqsel(root, operator, false, false, vardata,
                                   TimestampTzGetDatum(box->tmin), TIMESTAMPTZOID);
        operator = oper_oid(GT_OP, T_TIMESTAMPTZ, T_TIMESTAMPTZ);
        selec_time += scalarineqsel(root, operator, true, false, vardata,
                                    TimestampTzGetDatum(box->tmax), TIMESTAMPTZOID);
        selec_time = 1 - selec_time;
    }
    selec = selec_value * selec_time;
}
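The structure of the fragment above -- each dimension's selectivity computed as one minus the fractions lying strictly below the box minimum and strictly above the box maximum, then multiplied under the independence assumption -- can be checked on toy data. Here is a Python sketch with hypothetical helper names; it does not call the backend's scalarineqsel, it only mimics the arithmetic on an explicit list of values.

```python
def ineq_sel(values, bound, greater):
    """Fraction of values strictly below bound (greater=False)
    or strictly above it (greater=True)."""
    hits = [v for v in values if (v > bound if greater else v < bound)]
    return len(hits) / len(values)

def overlap_sel_1d(values, lo, hi):
    """Selectivity of 'value falls within [lo, hi]' in one dimension:
    1 - (fraction below lo + fraction above hi)."""
    return 1.0 - (ineq_sel(values, lo, greater=False)
                  + ineq_sel(values, hi, greater=True))

# Independent value and time dimensions, as the selectivity code assumes
xs = [0.0, 2.0, 4.0, 6.0, 8.0]          # value dimension sample
ts = [10.0, 20.0, 30.0, 40.0]           # time dimension sample
selec = overlap_sel_1d(xs, 2.0, 6.0) * overlap_sel_1d(ts, 15.0, 35.0)
print(selec)
```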

Regards

Esteban


Re: Specifying attribute slot for storing/reading statistics

2019-09-06 Thread Esteban Zimanyi
Dear Tom

Many thanks for your quick reply. Indeed, both solutions you proposed can be
combined to solve all the problems. However, changes in the code are needed.
Let me first elaborate on the solution combining stakind/staop; I will then
turn to adding a new kind identifier.

In order to understand the setting, let me explain a little more about the
different kinds of temporal types. As explained in my previous email, these
are types whose values are composed of elements v@t, where v is a
PostgreSQL/PostGIS type (float or geometry) and t is a TimestampTz. There
are four kinds of temporal types, depending on their duration:
* Instant: Values of the form v@t. These are used, for example, to represent
car accidents, as in Point(0 0)@2000-01-01 08:30
* InstantSet: A set of values {v1@t1, ..., vn@tn} where the values between
the instants are unknown. These are used, for example, to represent
check-ins in Foursquare or RFID readings.
* Sequence: A sequence of values [v1@t1, ..., vn@tn] where the values
between two successive instants vi@ti and vj@tj are (linearly) interpolated.
These are used, for example, to represent GPS tracks.
* SequenceSet: A set of sequences {s1, ..., sn} with temporal gaps between
them. These are used, for example, to represent GPS tracks where the signal
was lost during some time period.

To compute the selectivity of temporal types we assume that the value and
time dimensions are independent, and thus we can reuse all the existing
analyze and selectivity infrastructure in PostgreSQL/PostGIS. For the
various durations this amounts to:
* Instant: Use the functions in analyze.c and selfuncs.c independently for
the value and time dimensions
* InstantSet: Use the functions in array_typanalyze.c and array_selfuncs.c
independently for the value and time dimensions
* Sequence and SequenceSet: To simplify, we do not take the gaps into
account, and thus use the functions in rangetypes_typanalyze.c and
rangetypes_selfuncs.c independently for the value and time dimensions

However, this requires that the analyze and selectivity functions in all
the above files satisfy the following:
* Set the staop when computing statistics. For example, in
rangetypes_typanalyze.c the staop is set for
STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM but not for
STATISTIC_KIND_BOUNDS_HISTOGRAM.
* Always call get_attstatsslot with the operator Oid, not with InvalidOid.
For example, of the 17 calls to this function in selfuncs.c, only two pass
an operator. This also requires passing the operator as an additional
parameter to several functions; for example, it should be passed to
ineq_histogram_selectivity in selfuncs.c.
* Export several top-level functions which are currently static. For
example, var_eq_const, ineq_histogram_selectivity, eqjoinsel_inner, and
several others in selfuncs.c should be exported.

That would solve all the problems except for
STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM, since in that case the staop will
always be Float8LessOperator, independently of whether we are computing
lengths of value ranges or of tstzranges. This could be solved by using a
different stakind for the value and time dimensions.

If you want I can prepare a PR in order to understand the implications of
these changes. Please let me know.

Esteban


On Thu, Sep 5, 2019 at 5:11 PM Tom Lane  wrote:

> Esteban Zimanyi  writes:
> > We are developing the analyze/selectivity functions for those types. Our
> > approach is to use the standard PostgreSQL/PostGIS functions for the
> value
> > and the time dimensions where the slots starting from 0 will be used for
> > the value dimension, and the slots starting from 2 will be used for the
> > time dimension. For example, for tfloat we use range_typanalyze and
> related
> > functions for
> > * collecting in slots 0 and 1, STATISTIC_KIND_BOUNDS_HISTOGRAM
> > and STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM for the float ranges of the
> value
> > dimension
> > *  collecting in slots 2 and 3, STATISTIC_KIND_BOUNDS_HISTOGRAM
> > and STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM for the periods (similar to
> > tstzranges) of the time dimension
>
> IMO this is fundamentally wrong, or at least contrary to the design
> of pg_statistic.  It is not supposed to matter which "slot" a given
> statistic type is actually stored in; rather, readers are supposed to
> search for the desired statistic type using the stakindN, staopN and
> (if relevant) stacollN fields.
>
> In this case it seems like it'd be reasonable to rely on the staop
> fields to distinguish between the value and time dimensions, since
> (IIUC) they're of different types.
>
> Another idea is to invent your own slot kind identifiers instead of
> using built-in ones.  I'm not sure that there's any point in using
> the built-in kind values, since (a) 

Specifying attribute slot for storing/reading statistics

2019-09-05 Thread Esteban Zimanyi
Dear all

We are developing MobilityDB, an open source PostgreSQL/PostGIS extension
that provides temporal and spatio-temporal types. The source code, manuals,
and related publications are available at the address
https://github.com/ULB-CoDE-WIT/MobilityDB/
<https://github.com/ULB-CoDE-WIT/MobilityDB/tree/stats>

In MobilityDB, temporal types are types derived from PostgreSQL/PostGIS
types to which a time dimension is added. MobilityDB provides the following
temporal types: tbool (temporal boolean), tint (temporal integer), tfloat
(temporal float), ttext (temporal text), tgeompoint (temporal geometric
point), and tgeogpoint (temporal geographic point). For example, we can
define a tfloat and a tgeompoint as follows:

SELECT tfloat '[1.5@2000-01-01, 2.5@2000-01-02, 1.5@2000-01-03]';
SELECT tgeompoint '[Point(0 0)@2000-01-01 08:00, Point(1 0)@2000-01-02
08:05, Point(1 1)@2000-01-03 08:10]';

We are developing the analyze/selectivity functions for those types. Our
approach is to use the standard PostgreSQL/PostGIS functions for the value
and the time dimensions where the slots starting from 0 will be used for
the value dimension, and the slots starting from 2 will be used for the
time dimension. For example, for tfloat we use range_typanalyze and related
functions for
* collecting in slots 0 and 1, STATISTIC_KIND_BOUNDS_HISTOGRAM
and STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM for the float ranges of the value
dimension
*  collecting in slots 2 and 3, STATISTIC_KIND_BOUNDS_HISTOGRAM
and STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM for the periods (similar to
tstzranges) of the time dimension

However, we end up copying several PostgreSQL functions to which we only
add an additional parameter stating the slot number from which the specific
statistic kind should be found (either 0 or 2)

bool
get_attstatsslot_mobdb(AttStatsSlot *sslot, HeapTuple statstuple,
                       int reqkind, Oid reqop, int flags, int startslot)
{
    [...]
    for (i = startslot; i < STATISTIC_NUM_SLOTS; i++)
    {
        if ((&stats->stakind1)[i] == reqkind &&
            (reqop == InvalidOid || (&stats->staop1)[i] == reqop))
            break;
    }
    [...]
}
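The only behavioral change relative to the stock get_attstatsslot is the startslot lower bound on the slot search. A toy Python model of that lookup (slots represented as (stakind, staop) pairs instead of a pg_statistic tuple; the kind and Oid constants below are made up for illustration):

```python
STATISTIC_NUM_SLOTS = 5
INVALID_OID = 0

def find_stats_slot(slots, reqkind, reqop=INVALID_OID, startslot=0):
    """Index of the first slot matching reqkind (and reqop, unless reqop
    is InvalidOid), scanning from startslot; -1 if no slot matches."""
    for i in range(startslot, min(len(slots), STATISTIC_NUM_SLOTS)):
        kind, op = slots[i]
        if kind == reqkind and (reqop == INVALID_OID or op == reqop):
            return i
    return -1

# Value-dimension stats in slots 0-1, time-dimension stats in slots 2-3,
# both stored under the same stakind -- the situation described above.
BOUNDS_HISTOGRAM = 6
slots = [(6, 100), (7, 100), (6, 200), (7, 200), (0, 0)]
print(find_stats_slot(slots, BOUNDS_HISTOGRAM, startslot=0))  # → 0
print(find_stats_slot(slots, BOUNDS_HISTOGRAM, startslot=2))  # → 2
```

As Tom Lane points out downthread, distinguishing by staop (here, 100 vs 200) makes the startslot parameter unnecessary when the two dimensions have different operator Oids.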

double
var_eq_const_mobdb(VariableStatData *vardata, Oid operator, Datum constval,
                   bool negate, int startslot)
{
    [...]
}

Selectivity
scalarineqsel_mobdb(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
                    VariableStatData *vardata, Datum constval, Oid consttype,
                    int startslot)
{
    [...]
}

static Selectivity
mcv_selectivity_mobdb(VariableStatData *vardata, FmgrInfo *opproc,
                      Datum constval, Oid atttype, bool varonleft,
                      double *sumcommonp, int startslot)
{
    [...]
}

static double
ineq_histogram_selectivity_mobdb(PlannerInfo *root, VariableStatData *vardata,
                                 FmgrInfo *opproc, bool isgt, bool iseq,
                                 Datum constval, Oid consttype, int startslot)
{
    [...]
}

in addition to copying other functions needed by the ones above, since they
are not exported (they are defined as static):

static bool
get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
Oid sortop, Datum *min, Datum *max)

static bool
get_actual_variable_endpoint(Relation heapRel,
Relation indexRel, ScanDirection indexscandir,
ScanKey scankeys, int16 typLen,
bool typByVal, MemoryContext outercontext,
Datum *endpointDatum)

[...]

Is there a better way to do this?

Is there any chance that the API for accessing the typanalyze and
selectivity functions will be enhanced in a future release?

Regards

Esteban




Parameters in user-defined aggregate final functions

2018-01-11 Thread Esteban Zimanyi
I am creating a user-defined aggregate function that needs an additional
parameter. More precisely, it is a cumulative (aka window) minimum that
takes as second parameter a time interval defining the window. Since the
aggregate function operates on my user-defined data types, I have contrived
a dummy example that concatenates the last n values of a text column. I am
aware that I can solve this dummy problem directly in PostgreSQL; the
purpose of the example is only to highlight my problem.

CREATE FUNCTION lastNconcat_transfn(state text[], next text, n integer)
RETURNS text[] AS $$
BEGIN
RETURN array_append(state, next);
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION lastNconcat_combinefn(state1 text[], state2 text[], n integer)
RETURNS text[] AS $$
BEGIN
RETURN array_cat(state1, state2);
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION lastNconcat_finalfn(state text[], n integer)
RETURNS text AS $$
DECLARE
card integer;
result text;
BEGIN
result := '';
card := array_length(state, 1);
FOR i IN greatest(1,card-n+1)..card
LOOP
result := result || state[i];
END LOOP;
RETURN result;
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT PARALLEL SAFE;

CREATE AGGREGATE lastNconcat(text, integer) (
SFUNC = lastNconcat_transfn,
STYPE = text[],
INITCOND = '{}',
COMBINEFUNC = lastNconcat_combinefn,
FINALFUNC = lastNconcat_finalfn,
PARALLEL = SAFE
);

I receive the following error

ERROR: function lastnconcat_finalfn(text[]) does not exist
SQL state: 42883

How can I tell PostgreSQL that my final function also needs a parameter? I
am working on PostgreSQL 10.1. I know that, according to the documentation,
direct parameters are only allowed for ordered-set aggregates, but I would
also need a direct parameter for "normal" aggregates.

Notice that the solution proposed here
https://stackoverflow.com/questions/48189751/direct-arguments-in-postgresql-user-defined-aggregate-functions/48190288?noredirect=1#comment83364017_48190288
is neither ideal nor efficient.
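For reference, the semantics the final function is after -- concatenate only the last n accumulated values -- is tiny once the extra parameter can be passed. A Python sketch of that final step (my own illustration of the intended behavior, not the plpgsql code above):

```python
def lastnconcat_final(state, n):
    """Concatenate the last n elements of the accumulated array,
    mirroring lastNconcat_finalfn's loop from greatest(1, card-n+1)
    to card. Returns '' for an empty state or non-positive n."""
    if not state or n <= 0:
        return ''
    return ''.join(state[-n:])   # slicing clamps n > len(state), like greatest()

print(lastnconcat_final(['a', 'b', 'c', 'd'], 2))   # prints "cd"
```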

IMHO since combine functions accept parameters I don't see why final
functions should not also accept parameters.
