Re: Getting "could not read block" error when creating an index on a function.

2020-12-30 Thread Tom Lane
Karsten Hilbert  writes:
> On Wed, Dec 30, 2020 at 02:37:59PM -0500, Demitri Muna wrote:
>> I want to index the results of these repeated, unchanging calculations to 
>> speed up other queries. Which mechanism would be best to do this? Create 
>> additional columns? Create another table?

> A materialized view ?

Yeah, or you might be able to do something with a before-insert-or-update
trigger that computes whatever desired value you want and fills it into a
derived column.  Indexing that column then gives the same results as
indexing the derived expression; but it sidesteps the semantic problems
because the time of computation of the expression is well-defined, even
if it's not immutable.
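
A minimal sketch of that trigger approach, reusing the first_time() function
and mytable from earlier in the thread (the column and trigger names here are
made up for illustration):

```sql
-- Hypothetical derived column, filled in by a trigger and then indexed.
ALTER TABLE mytable ADD COLUMN first_time_cached timestamp;

CREATE FUNCTION fill_first_time() RETURNS trigger AS $$
BEGIN
    -- Computed at row insert/update time, so the time of computation
    -- is well-defined even though first_time() is not immutable.
    NEW.first_time_cached := first_time(NEW.id);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER set_first_time
    BEFORE INSERT OR UPDATE ON mytable
    FOR EACH ROW EXECUTE FUNCTION fill_first_time();

-- Indexing a plain column sidesteps the immutability requirement
-- that applies to expression indexes.
CREATE INDEX my_idx ON mytable (first_time_cached);
```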

You might try to avoid a handwritten trigger by defining a generated
column instead, but we insist that generation expressions be immutable
so it won't really work.  (Of course, you could still lie about the
mutability of the expression, but I can't recommend that.  Per Henry
Spencer's well-known dictum, "If you lie to the compiler, it will get its
revenge".  He was speaking of C compilers, I suppose, but the principle
applies to database query optimizers too.)

regards, tom lane




Re: Getting "could not read block" error when creating an index on a function.

2020-12-30 Thread Christophe Pettus



> On Dec 30, 2020, at 11:48, Christophe Pettus  wrote:
> 
> This might be a good use for a generated column.
> 
>   https://www.postgresql.org/docs/current/ddl-generated-columns.html

I take that back; the generation formula has to be immutable as well.  Perhaps 
a column populated by a trigger?
--
-- Christophe Pettus
   x...@thebuild.com





Re: Getting "could not read block" error when creating an index on a function.

2020-12-30 Thread Christophe Pettus



> On Dec 30, 2020, at 11:37, Demitri Muna  wrote:
> I want to index the results of these repeated, unchanging calculations to 
> speed up other queries. Which mechanism would be best to do this? Create 
> additional columns? Create another table?

This might be a good use for a generated column.

https://www.postgresql.org/docs/current/ddl-generated-columns.html
--
-- Christophe Pettus
   x...@thebuild.com





Re: Getting "could not read block" error when creating an index on a function.

2020-12-30 Thread Karsten Hilbert
On Wed, Dec 30, 2020 at 02:37:59PM -0500, Demitri Muna wrote:

> I want to index the results of these repeated, unchanging calculations to 
> speed up other queries. Which mechanism would be best to do this? Create 
> additional columns? Create another table?

A materialized view ?
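
For instance (names assumed from elsewhere in the thread):

```sql
-- Precompute the expensive values once; refresh only when needed.
CREATE MATERIALIZED VIEW person_first_event AS
SELECT p.id AS person_id,
       first_time(p.id) AS first_event_timestamp
FROM person p;

-- A materialized view can be indexed like an ordinary table.
CREATE UNIQUE INDEX ON person_first_event (person_id);

-- Re-run the calculations only when the underlying data changes.
REFRESH MATERIALIZED VIEW person_first_event;
```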

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B




Re: Getting "could not read block" error when creating an index on a function.

2020-12-30 Thread Demitri Muna
Hi Tom,

> On Dec 30, 2020, at 11:50 AM, Tom Lane  wrote:
> 
> I would call this a bug if it were a supported case, but really you are
> doing something you are not allowed to.  Functions in indexed expressions
> are required to be immutable, and a function that looks at the contents of
> a table --- particularly the very table that the index is on --- is simply
> not going to be that.  Marking such a function immutable to try to end-run
> around the restriction is unsafe.

Thank you, that makes perfect sense. In my mind it was immutable since the 
database is read-only, but I can see that to PG it’s not. Can you suggest an 
alternative for what I’m trying to do? Given this schema (a “person” has a number 
of “events”):

CREATE TABLE person (
    id SERIAL,
    ...
);

CREATE TABLE event (
    id SERIAL,
    patient_id INTEGER,
    event_timestamp TIMESTAMP,
    ...
);

I have a function (the one I was trying to index) that returns the earliest 
event for a person. I’m scanning another table with ~10B rows several times 
using a few of these “constant” values:

* first_event_timestamp(person_id) + '1 month'
* first_event_timestamp(person_id) + '13 months'
* etc.

I want to index the results of these repeated, unchanging calculations to speed 
up other queries. Which mechanism would be best to do this? Create additional 
columns? Create another table?
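
One sketch of the extra-table variant, based on the schema above (the table
and column names beyond those are invented here):

```sql
-- Store each person's precomputed timestamps once, then join against it.
CREATE TABLE person_milestone (
    person_id      INTEGER PRIMARY KEY,
    first_event    TIMESTAMP,
    plus_1_month   TIMESTAMP,
    plus_13_months TIMESTAMP
);

INSERT INTO person_milestone
SELECT patient_id,
       MIN(event_timestamp),
       MIN(event_timestamp) + INTERVAL '1 month',
       MIN(event_timestamp) + INTERVAL '13 months'
FROM event
GROUP BY patient_id;

-- Plain-column indexes, so no immutability requirement applies.
CREATE INDEX ON person_milestone (plus_1_month);
```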

Thanks again,
Demitri






Re: Getting "could not read block" error when creating an index on a function.

2020-12-30 Thread Tom Lane
Demitri Muna  writes:
> I’m getting stuck on a problem I was hoping I could get some help with. I’m 
> trying to create an index for the results of a function that touches two 
> tables like this and get the following error:

> CREATE INDEX my_idx ON mytable (first_time(id));
> ERROR:  could not read block 0 in file "base/16386/42868": read only 0 of 
> 8192 bytes
> CONTEXT:  SQL function "first_time" during startup

What's apparently happening is that some query in the function is trying
to examine the newly-created index before it's been populated.

I would call this a bug if it were a supported case, but really you are
doing something you are not allowed to.  Functions in indexed expressions
are required to be immutable, and a function that looks at the contents of
a table --- particularly the very table that the index is on --- is simply
not going to be that.  Marking such a function immutable to try to end-run
around the restriction is unsafe.
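
To illustrate the point: a function that consults a table can honestly be at
most STABLE, since its result may change between statements, and STABLE
functions cannot back an expression index. A sketch of how such a function
would be declared (the body here is a guess at what first_time() might look
like):

```sql
-- Correctly marked STABLE, not IMMUTABLE: its result depends on
-- the contents of "event", so CREATE INDEX will refuse to use it.
CREATE FUNCTION first_time(p_id integer) RETURNS timestamp AS $$
    SELECT min(event_timestamp) FROM event WHERE patient_id = p_id;
$$ LANGUAGE sql STABLE;
```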

regards, tom lane




Re: Problem with ssl and psql in Postgresql 13

2020-12-30 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> I think we'd be best off to always override KRB5_KTNAME if we have a
>> nonempty krb_server_keyfile setting, so the attached proposed patch
>> makes both functions do it the same way.  (I did not make an effort
>> to remove the dependency on setenv, given the nearby thread to
>> standardize on that.)

> +1.

Done, thanks for looking at the patch.

>> I'm not sure whether there's any documentation change that needs to
>> be made.  The docs don't suggest that you're allowed to set
>> krb_server_keyfile to an empty string in the first place, so maybe
>> we needn't explain what happens if you do.

> Perhaps saying something about 'system default' or 'taken from the
> environment' might make sense.

I went with "If this parameter is set to an empty string, it is ignored
and a system-dependent default is used."  I don't think we need to go
into more detail than that, since as you say it's unlikely to be a
useful case.
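
In postgresql.conf terms, the two cases discussed come out roughly as
(the keytab path is only an example):

```
# Explicit keytab: KRB5_KTNAME is always overridden with this value.
krb_server_keyfile = 'FILE:/usr/local/pgsql/etc/krb5.keytab'

# Empty string: the setting is ignored and a system-dependent
# default (e.g. an inherited KRB5_KTNAME environment variable) is used.
krb_server_keyfile = ''
```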

regards, tom lane




Getting "could not read block" error when creating an index on a function.

2020-12-30 Thread Demitri Muna
Hello,

I’m getting stuck on a problem I was hoping I could get some help with. I’m 
trying to create an index for the results of a function that touches two tables 
like this and get the following error:

CREATE INDEX my_idx ON mytable (first_time(id));
ERROR:  could not read block 0 in file "base/16386/42868": read only 0 of 8192 
bytes
CONTEXT:  SQL function "first_time" during startup

Every time I run this, the last number in the file path increments by one, e.g.

ERROR:  could not read block 0 in file "base/16386/42869": read only 0 of 8192 
bytes
ERROR:  could not read block 0 in file "base/16386/42870": read only 0 of 8192 
bytes
ERROR:  could not read block 0 in file "base/16386/42871": read only 0 of 8192 
bytes

The database is sitting on two tablespaces (pg_default and ‘data2’). When I try 
to use the second, I get:

CREATE INDEX my_idx ON mytable (first_time(id)) TABLESPACE data2;
ERROR:  could not read block 0 in file 
"pg_tblspc/17007/PG_13_202007201/16386/42870": read only 0 of 8192 bytes
CONTEXT:  SQL function "first_time" during startup

with the last number similarly incrementing upon repeated attempts.

Relevant details:

* PostgreSQL version 13.1 running on Ubuntu 20.04.1 on an AWS instance using 2 
x 8TB EBS storage.
* The database is ~15TB in size.
* I am not worried about data loss; the database can be considered read-only 
and I have all of the files needed to recreate any table.
* I am hoping to not recreate the whole database from scratch since doing so 
and creating the required indices will take more than a week.
* I used these settings while importing the files to speed up the process, since 
I was not worried about data loss (all were turned back on after the import):

autovacuum = off
synchronous_commit = off
fsync = off
full_page_writes = off

* I will not do the above again. :)
* The postmaster server crashed at least twice during the process due to 
running out of disk space.
* Creating any number of new indices on bare columns is no problem.
* I DROPped and recreated the functions with no change.
* This statement returns no results (but maybe I am looking at the wrong thing):

select n.nspname AS schema, c.relname AS relation from pg_class c inner join 
pg_namespace n on (c.relnamespace = n.oid) where c.relfilenode = '16386';

From what I have read, this error (missing block files) suggests some database 
corruption, which is fine as I can easily delete and recreate anything 
problematic. I’ve DROPped and recreated the indices related to the function, but 
the same error remains. Accessing the related tables seems OK and produces no 
errors, though with that much data I can’t verify everything.

Any help would be appreciated!

Cheers,
Demitri





Re: Problem with ssl and psql in Postgresql 13

2020-12-30 Thread Stephen Frost
Greetings,

First, thanks a lot for working on all of this and improving things!

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> I've got one last complaint about the backend GSS code: we are doing
> things randomly differently in the two places that install
> krb_server_keyfile as the active KRB5_KTNAME value.  secure_open_gssapi()
> sets KRB5_KTNAME unconditionally (and doesn't bother to check for error,
> either, not a good thing in a security-critical operation).  But the older
> code in pg_GSS_recvauth() is written to not override KRB5_KTNAME if it's
> already set.  This of-course-totally-undocumented behavior seems like a
> fairly bad idea to me: as things stand, the client-side choice of whether
> to initiate GSS encryption or not could result in two different server
> keytabs being used.

I agree that we should be consistent and that always setting the
environment variable if krb_server_keyfile has been set is the right
approach.

> I think we'd be best off to always override KRB5_KTNAME if we have a
> nonempty krb_server_keyfile setting, so the attached proposed patch
> makes both functions do it the same way.  (I did not make an effort
> to remove the dependency on setenv, given the nearby thread to
> standardize on that.)

+1.

> I'm not sure whether there's any documentation change that needs to
> be made.  The docs don't suggest that you're allowed to set
> krb_server_keyfile to an empty string in the first place, so maybe
> we needn't explain what happens if you do.

Perhaps saying something about 'system default' or 'taken from the
environment' might make sense.  That said, the system default almost
always ends up not being usable since it'll be owned by root and we
won't run as root.  Perhaps there are some who prefer to set it via the
environment variable, though I don't think I've ever seen that myself.

I didn't look too closely at the patch itself but on a once-over it
seemed fine to me.

Thanks,

Stephen

