Re: CLUSTER, VACUUM, and TABLESPACEs (oh my)

2021-01-24 Thread Demitri Muna


> On Jan 24, 2021, at 9:03 PM, Guillaume Lelarge  wrote:
> 
>> VACUUM FULL unclusters the table??
> 
> It will rebuild the table without sorting the data according to the index 
> used with CLUSTER (without any sorting actually).

Thank you for the clarification; that’s very helpful. For the case above: if I 
CLUSTER a table, add or delete no rows, and then VACUUM, will the existing order 
remain or be changed?

Demitri



CLUSTER, VACUUM, and TABLESPACEs (oh my)

2021-01-24 Thread Demitri Muna
Hi,

I would like to ask for a little clarification on the CLUSTER and VACUUM 
commands. Part of my use case is that my disk has run out of space and I want to 
move a table to a newly created tablespace. The questions below somewhat 
overlap. Let’s say I am starting with a table that is not CLUSTERed on a given 
index, but I want it to be.

* If I run “CLUSTER table USING idx” on a table, is VACUUM FULL required/useful 
afterwards, or should I assume that the cluster operation did the equivalent of 
a VACUUM FULL?

* If I have previously run a CLUSTER command on a table, will future VACUUM 
FULL commands rewrite the table in the order specified in the previous CLUSTER?

* If I want to move a table to a new tablespace, is it possible to 
CLUSTER+VACUUM in the same step, since the whole table will be rewritten anyway? 
This would be very useful in low-disk-space scenarios. I did find the answer 
below, but it is several years old and I was hoping for something better 
supported. 
https://dba.stackexchange.com/a/87457/121020

The documentation is somewhat silent on these details, so I thought I’d ask 
here. Right now I move a table to a new tablespace, cluster on an index, and 
then do a full vacuum, which results in three full copies of the table being 
written; as far as I understand things, only one should be needed.
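
For reference, the current workflow looks roughly like this (a sketch; the 
table, index, and tablespace names are hypothetical), with each of the three 
steps rewriting the table in full:

ALTER TABLE mytable SET TABLESPACE data2;   -- rewrite 1: copy to the new tablespace
CLUSTER mytable USING mytable_idx;          -- rewrite 2: reorder by the index
VACUUM FULL mytable;                        -- rewrite 3: compact the table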

Cheers,
Demitri



Re: Getting "could not read block" error when creating an index on a function.

2021-01-03 Thread Demitri Muna


Thank you for the responses! I was going to go with a materialized view, but 
then realized that since the dataset is static it’s really no different from 
just creating a new table and indexing that. The suggestions provide useful 
advice for the future though.
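
A minimal sketch of that route (hypothetical names; first_event_timestamp is 
the function discussed earlier in the thread): precompute the values once into 
a new table, then index the plain column instead of the function call.

CREATE TABLE person_first_event AS
    SELECT p.id AS person_id,
           first_event_timestamp(p.id) AS first_event
    FROM person p;

CREATE INDEX person_first_event_time_idx ON person_first_event (first_event);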

Cheers,
Demitri

> On Dec 30, 2020, at 3:14 PM, Tom Lane  wrote:
> 
> Karsten Hilbert  writes:
>> Am Wed, Dec 30, 2020 at 02:37:59PM -0500 schrieb Demitri Muna:
>>> I want to index the results of these repeated, unchanging calculations to 
>>> speed up other queries. Which mechanism would be best to do this? Create 
>>> additional columns? Create another table?
> 
>> A materialized view ?
> 
> Yeah, or you might be able to do something with a before-insert-or-update
> trigger that computes whatever desired value you want and fills it into a
> derived column.  Indexing that column then gives the same results as
> indexing the derived expression; but it sidesteps the semantic problems
> because the time of computation of the expression is well-defined, even
> if it's not immutable.
> 
> You might try to avoid a handwritten trigger by defining a generated
> column instead, but we insist that generation expressions be immutable
> so it won't really work.  (Of course, you could still lie about the
> mutability of the expression, but I can't recommend that.  Per Henry
> Spencer's well-known dictum, "If you lie to the compiler, it will get its
> revenge".  He was speaking of C compilers, I suppose, but the principle
> applies to database query optimizers too.)
> 
>   regards, tom lane





Re: Getting "could not read block" error when creating an index on a function.

2020-12-30 Thread Demitri Muna
Hi Tom,

> On Dec 30, 2020, at 11:50 AM, Tom Lane  wrote:
> 
> I would call this a bug if it were a supported case, but really you are
> doing something you are not allowed to.  Functions in indexed expressions
> are required to be immutable, and a function that looks at the contents of
> a table --- particularly the very table that the index is on --- is simply
> not going to be that.  Marking such a function immutable to try to end-run
> around the restriction is unsafe.

Thank you, that makes perfect sense. In my mind it was immutable since the 
database is read-only, but I can see that to PG it’s not. Can you suggest an 
alternative approach for what I’m trying to do? Given this schema (a “person” 
has a number of “events”):

CREATE TABLE person (
    id SERIAL,
    ...
);

CREATE TABLE event (
    id SERIAL,
    patient_id INTEGER,
    event_timestamp TIMESTAMP,
    …
);

I have a function (the one I was trying to index) that returns the earliest 
event for a person. I’m scanning another table with ~10B rows several times 
using a few of these “constant” values:

* first_event_timestamp(person_id) + ‘1 month’
* first_event_timestamp(person_id) + ‘13 months’
* etc.

I want to index the results of these repeated, unchanging calculations to speed 
up other queries. Which mechanism would be best to do this? Create additional 
columns? Create another table?
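
One shape the “additional columns” option could take (a sketch with 
hypothetical names): a regular column filled in by a trigger, which can then be 
indexed directly.

ALTER TABLE person ADD COLUMN first_event TIMESTAMP;

CREATE FUNCTION set_first_event() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- hypothetical: fill the derived column using the existing function
    NEW.first_event := first_event_timestamp(NEW.id);
    RETURN NEW;
END;
$$;

CREATE TRIGGER person_first_event_trg
    BEFORE INSERT OR UPDATE ON person
    FOR EACH ROW EXECUTE PROCEDURE set_first_event();

CREATE INDEX person_first_event_idx ON person (first_event);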

Thanks again,
Demitri






Getting "could not read block" error when creating an index on a function.

2020-12-30 Thread Demitri Muna
Hello,

I’m getting stuck on a problem I was hoping I could get some help with. I’m 
trying to create an index on the results of a function that touches two tables, 
like this, and I get the following error:

CREATE INDEX my_idx ON mytable (first_time(id));
ERROR:  could not read block 0 in file "base/16386/42868": read only 0 of 8192 
bytes
CONTEXT:  SQL function "first_time" during startup

Every time I run this, the last number in the file path increments by one, e.g.

ERROR:  could not read block 0 in file "base/16386/42869": read only 0 of 8192 
bytes
ERROR:  could not read block 0 in file "base/16386/42870": read only 0 of 8192 
bytes
ERROR:  could not read block 0 in file "base/16386/42871": read only 0 of 8192 
bytes
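
For context, first_time is a SQL function of roughly this shape (a hypothetical 
reconstruction; it reads from other tables and is declared IMMUTABLE so that 
the index definition will accept it):

CREATE FUNCTION first_time(p_id integer) RETURNS timestamp
LANGUAGE sql IMMUTABLE AS $$
    -- hypothetical body: it reads from another table, so it is not truly immutable
    SELECT min(e.event_timestamp)
    FROM event e
    WHERE e.patient_id = p_id
$$;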

The database is sitting on two tablespaces (pg_default and ‘data2’). When I try 
to use the second, I get:

CREATE INDEX my_idx ON mytable (first_time(id)) TABLESPACE data2;
ERROR:  could not read block 0 in file 
"pg_tblspc/17007/PG_13_202007201/16386/42870": read only 0 of 8192 bytes
CONTEXT:  SQL function "first_time" during startup

with the last number similarly incrementing upon repeated attempts.

Relevant details:

* PostgreSQL version 13.1 running on Ubuntu 20.04.1 on an AWS instance using 2 
x 8TB EBS storage.
* The database is ~15TB in size.
* I am not worried about data loss; the database can be considered read-only 
and I have all of the files needed to recreate any table.
* I am hoping to not recreate the whole database from scratch since doing so 
and creating the required indices will take more than a week.
* I used these settings while importing the files to improve import speed, 
since I was not worried about data loss (all were turned back on after the 
import):

autovacuum = off
synchronous_commit=off
fsync = off
full_page_writes = off

* I will not do the above again. :)
* The postmaster server crashed at least twice during the process due to 
running out of disk space.
* Creating any number of new indices on bare columns is no problem.
* I DROPped and recreated the functions with no change.
* This statement returns no results (but maybe I am looking at the wrong thing):

select n.nspname AS schema, c.relname AS relation
from pg_class c
inner join pg_namespace n on (c.relnamespace = n.oid)
where c.relfilenode = '16386';
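
A related lookup (a sketch): in a path like "base/16386/42868" the first number 
is the database OID and the second is a relfilenode, so pg_filenode_relation() 
can map a (tablespace, filenode) pair back to a relation, if one still exists:

SELECT pg_filenode_relation(0, 42868);      -- 0 = the database's default tablespace
SELECT pg_filenode_relation(17007, 42870);  -- for files under pg_tblspc/17007/...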

From what I’ve read, this error (missing block files) suggests some database 
corruption, which is fine, as I can easily delete anything problematic and 
recreate it. I’ve deleted and recreated the indices related to the function, but 
the same error remains. Accessing the related tables seems OK and I don’t get 
any errors, but with that much data I can’t guarantee everything is intact.

Any help would be appreciated!

Cheers,
Demitri





Re: ddl_command_end not firing DROP events

2018-07-08 Thread Demitri Muna
Hi,

On Jul 8, 2018, at 2:19 PM, Alvaro Herrera  wrote:

> As I recall, handling of DROP events in the ddl_command_end event is not
> completely consistent. You may even find that some DROP events are not
> returned by pg_event_trigger_ddl_commands.  I suggest you stick to
> pg_event_trigger_dropped_objects() instead (for which you need
> sql_drop).

That's what I'm seeing as well, and I have a workaround in place that does just 
that. But tracking schema changes then requires two triggers and two functions. 
That's fine and it works, but I'm wondering whether this is a documentation 
issue or an outright bug.
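
For reference, the workaround looks roughly like this (a sketch; the function 
and trigger names are hypothetical): a second event trigger on sql_drop that 
reports dropped objects via pg_event_trigger_dropped_objects().

CREATE FUNCTION log_dropped_objects() RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
    obj record;
BEGIN
    FOR obj IN SELECT * FROM pg_event_trigger_dropped_objects()
    LOOP
        RAISE NOTICE 'dropped % ''%''', obj.object_type, obj.object_identity;
    END LOOP;
END;
$$;

CREATE EVENT TRIGGER log_dropped_objects
    ON sql_drop
    EXECUTE PROCEDURE log_dropped_objects();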

Cheers,
Demitri



ddl_command_end not firing DROP events

2018-07-08 Thread Demitri Muna
Hi,

I’ve created an event trigger with which I want to capture schema-changing 
events. I’m finding that DROP events do not fire the trigger when using 
"ddl_command_end". The test case below demonstrates this. I am running 
PostgreSQL 10.4. The trigger fires for the CREATE events, but not for DROP TYPE 
or DROP TABLE.

I came across the same question on the mailing list, but the solution was to 
use the “sql_drop” trigger instead.

https://www.postgresql.org/message-id/CAHE3wgjX-N%3DX9mccp4Bs-y-%3DHE%3DqayHEpXpCs1jMT%3DW0ZsYc5A%40mail.gmail.com

The documentation suggests that one should be able to use “ddl_command_end” 
instead of creating two triggers (i.e. one for DROPs, another for everything 
else).

Test:

CREATE FUNCTION test_event_trigger_for_drops()
RETURNS event_trigger LANGUAGE plpgsql AS $$
DECLARE
    obj record;
BEGIN
    FOR obj IN SELECT * FROM pg_event_trigger_ddl_commands()
    LOOP
        RAISE NOTICE 'caught % event on ''%''',
                     obj.command_tag,
                     obj.object_identity;
    END LOOP;
END
$$;

CREATE EVENT TRIGGER test_event_trigger_for_drops
    ON ddl_command_end
    EXECUTE PROCEDURE test_event_trigger_for_drops();

CREATE TYPE foo AS (f1 int, f2 text);
DROP TYPE foo;
CREATE TABLE drop_test ( pk int ) WITH ( OIDS = FALSE );
DROP TABLE drop_test;

-- for ease of copy/paste to delete the above
DROP EVENT TRIGGER test_event_trigger_for_drops;
DROP FUNCTION test_event_trigger_for_drops;

Cheers,
Demitri




Re: Location to specify C compiler option in C extension

2018-02-23 Thread Demitri Muna
Hi Tom,

> PG_CPPFLAGS ought to work.  One point is that you need to set that before
> including pgxs.mk; maybe it's an ordering problem?

On closer inspection, it was working, but the pg Makefile was specifically 
including “-Wdeclaration-after-statement” which I think was overriding the 
‘-std=c99’ part. The line below fixed the problem.

PG_CPPFLAGS += -std=c99 -Wno-declaration-after-statement

Thanks, sorry for the noise.

Demitri



Location to specify C compiler option in C extension

2018-02-23 Thread Demitri Muna
Hi,

I’m writing a C extension and have successfully based my Makefile on existing 
templates. This conveniently hides the details of the full Makefile provided by 
"pg_config --pgxs”. Which variable would be the appropriate one to specify the 
‘-std=c99’ compiler flag? I’ve tried many and still get warnings that this flag 
should suppress (specifically, "ISO C90 forbids mixed declarations and code”). 
These are the variables I currently define:

EXTENSION
DATA
PGFILEDESC
MODULE_big
PG_CPPFLAGS (this one seemed like the right place)
SHLIB_LINK
OBJS
PG_CONFIG
PGXS

A workaround has been:

OBJS = file1.o file2.o
CPPFLAGS += -Wdeclaration-after-statement

but it’s not exactly the same thing.

Thanks,
Demitri




Re: Getting a primitive numeric value from "DatumGetNumeric"?

2018-02-23 Thread Demitri Muna
Hi,

On Feb 22, 2018, at 9:31 PM, Michael Paquier  wrote:

> PG_GETARG_NUMERIC(), no?

That function returns an object of datatype “Numeric”, which still requires 
some (not immediately obvious) conversion to a double (or whatever primitive C 
type).

> When working on implementing your own data
> types or when hacking out functions which manipulate arguments of an
> existing datatype, looking at the input and output functions help a
> lot.  In your case, numeric_in and numeric_out in
> src/backend/utils/adt/numeric.c is full of hints.

I spent an hour diving into the code out of curiosity. I found useful functions 
like this:

double numeric_to_double_no_overflow(Numeric n)

They’re available in the PostgreSQL main source code, but not exposed in the 
public headers. (Maybe I was missing something.) There was enough there that I 
could see a way to copy/paste or otherwise link to those functions, but as Tom 
pointed out, implicit coercion handles what I need, so I’ll stick with that.

Cheers,
Demitri




Re: Getting a primitive numeric value from "DatumGetNumeric"?

2018-02-22 Thread Demitri Muna
Hi Tom,

On Feb 20, 2018, at 10:54 PM, Tom Lane  wrote:

> Well, the implicit coercions work in your favor in this particular case.

Ah, I wasn’t aware of implicit coercion. Yes, that solves the problem 
perfectly, thanks.
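
Concretely, the fix is just in the SQL-level declaration (a sketch; the 
function name is hypothetical, and the CREATE FUNCTION is assumed to live in an 
extension script where MODULE_PATHNAME is substituted): with a float8[] 
parameter, numeric array input is coerced implicitly, so callers no longer need 
the explicit ::float8[] cast.

CREATE FUNCTION my_mathy_function(vals float8[]) RETURNS float8
    AS 'MODULE_PATHNAME', 'my_mathy_function'
    LANGUAGE C IMMUTABLE STRICT;

-- numeric array input is implicitly coerced to float8[], no cast needed
SELECT my_mathy_function(ARRAY[[1.5,2.5],[3.5,4.5]]);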

Out of curiosity, how does one read a numeric type from within a C extension 
(i.e. get a number value out of the Datum type)? I ask as I was unable to find 
an example and there are a few open questions on Stack Overflow (e.g. 
https://stackoverflow.com/questions/12588554/postgres-c-function-passing-returning-numerics).

Thanks,
Demitri




Getting a primitive numeric value from "DatumGetNumeric"?

2018-02-20 Thread Demitri Muna
Hi,

I’m writing a C extension for PostgreSQL. One possible input datatype for my 
function is a numeric array, e.g. ARRAY[[1.5,2.5],[3.5,4.5]]. I can use 
“DatumGetNumeric” to extract a “Numeric” data type from the data, but at some 
point I need to convert this to a number (e.g. double) so that I can do mathy 
things with it. How does one convert a “Numeric” to, say, a double?

I have a workaround in that I can pass this to my function:

ARRAY[[1.5,2.5],[3.5,4.5]]::float8[]

but I’d rather have the code handle that than require the user to remember the 
cast.

Thanks,
Demitri