date:20200521

Re: Query returns no rows in pg_basebackup cluster

2020-05-21 Thread Andrus


Hi!


Database in Windows is in read-only (recovery) mode so it cannot changed.

Then you might as well just rm -rf it (or whatever the equivalent Windows
incantation is).  On Windows, that database is broken and useless.


Backup in created in Windows from Linux server using pg_receivewal and 
pg_basebackup .
Can this backup used for PITR in Linux ?

Andrus.

Suggestion to improve query performance of data validation in proc.

2020-05-21 Thread postgann2020 s

Hi Team,
Thanks for your support.

Could you please suggest on below query.

We have multiple long procs that are having 100s of data validations and
currently we have written as below.

***

if (SELECT 1  FROM SCHEMA.TABLE WHERE column=data AND column=data) then
statements
etc..

***

Are there any other ways to validate the data, which will help us to
improve the performance of the query?.

Thanks for your support.

Regards,
PostgAnn.

Re: Query returns no rows in pg_basebackup cluster

2020-05-21 Thread Andrus


Hi!


Main server is in Linux and backup server is in windows.

This is not a supported setup if you want to run a physical backup.
Your backup and your primary need to be the same - software and hardware.  
Consider anything that is working to be a false
negative – assume >something will break or simply give incorrect results.


This base backup should used for recovery. Taking new base backup in Linux does 
not allow to recover to earlier date.
Both servers have Intel 64 bit CPUs.
I understand that only issue is the index structure and that REINDEX will fix 
this.
What other issues may occur ?

Will pg_dump/pg_restore in Windows server fix all issues.

Andrus.

Re: Query returns no rows in pg_basebackup cluster

2020-05-21 Thread David G. Johnston

On Thu, May 21, 2020 at 10:41 PM Andrus  wrote:

> Main server is in Linux and backup server is in windows.
>

This is not a supported setup if you want to run a physical backup.

Your backup and your primary need to be the same - software and hardware.
Consider anything that is working to be a false negative - assume something
will break or simply give incorrect results.

David J.

Re: Query returns no rows in pg_basebackup cluster

2020-05-21 Thread Andrus


Hi!


No, what it sounds like is the OP tried to physically replicate a
database on another platform with completely different sorting rules.


The sorting rules for this locale must be the same in both platforms.
Only locale names are different. It looks like windows server does not 
recognize Linux locale name.


Which means all his text indexes are corrupt according to the
destination platform's sorting rules, which easily explains the
observed misbehavior (ie, index searches not finding the expected rows).


Lot of queries seems working properly.

REINDEX would fix it. 


REINDEX throws error

ERROR:  cannot execute REINDEX during recovery
SQL state: 25006


But the major point here is you can't just ignore
a collation mismatch, which in turn implies that you can't do physical
replication from Linux to Windows, or vice versa (and most other
cross-platform cases are just as dangerous).


Database is used in recovery mode to find proper recovery point and to get data 
from it in this point.
Locales are actually same. In windows Postgres does not recognize Linux locale 
name.


Database in Windows is in read-only (recovery) mode so it cannot changed.

Then you might as well just rm -rf it (or whatever the equivalent Windows
incantation is).  On Windows, that database is broken and useless.


Most queries seems to work. 
Database should examined to get accidently deleted data from it.


Is making it read-write and index only solution or can it fixed in read-only 
database also, e-q forcing same local in postgres.conf

Andrus.

Re: Query returns no rows in pg_basebackup cluster

2020-05-21 Thread Andrus


Hi!


The LIKE query probably doesn't use an index and thus finds the relevant
data via sequential scan and equality checks on each record.



Yeah, exactly.  An equality condition will use a btree index if
available.  LIKE, however, sees the "_" as a wildcard so it cannot
use an index and resorts to a seqscan --- which will work fine.
It's just index searches (and index-based sorts) that are broken.
Of course, if there isn't an index on the column in question
then this theory falls to the ground.


There is composite index on baas column

CREATE TABLE public.desktop
(
   id integer NOT NULL DEFAULT nextval('desktop_id_seq'::regclass),
   recordtype character(5) COLLATE pg_catalog."default" NOT NULL,
   klass character(1) COLLATE pg_catalog."default",
   baas character(8) COLLATE pg_catalog."default" NOT NULL,
   liigid character(1) COLLATE pg_catalog."default" NOT NULL DEFAULT ''::bpchar,
   jrk numeric(4,0) NOT NULL DEFAULT 0,
...
   CONSTRAINT desktop_pkey PRIMARY KEY (id),
   CONSTRAINT desktop_baas_not_empty CHECK (baas <> ''::bpchar),
   CONSTRAINT desktop_id_check CHECK (id > 0),
   CONSTRAINT desktop_recordtype_check CHECK (recordtype = 'Aken'::bpchar OR 
recordtype = 'Veerg'::bpchar)
)

TABLESPACE pg_default;
CREATE INDEX desktop_baas_liigid_idx
   ON public.desktop USING btree
   (baas COLLATE pg_catalog."default" ASC NULLS LAST, liigid COLLATE 
pg_catalog."default" ASC NULLS LAST)
   TABLESPACE pg_default;

Maybe it is possible to force postgres in windows to use the same locale as in 
Linux. Locales are actually the same.

Andrus.

Re: Query returns no rows in pg_basebackup cluster

2020-05-21 Thread Andrus


Hi!


Are you referring to two different instances of Postgres on Windows?


No. 
Main server is in Linux and backup server is in windows.


Andrus.

Re: Help understanding SIReadLock growing without bound on completed transaction

2020-05-21 Thread Mike Klaas

On Thu, May 21, 2020 at 5:19 PM, Thomas Munro < thomas.mu...@gmail.com > wrote:

> 
> 
> 
> On Fri, May 22, 2020 at 7:48 AM Mike Klaas < mike@ superhuman. com (
> m...@superhuman.com ) > wrote:
> 
> 
> 
>> 
>> 
>> pid:2263461
>> 
>> 
>> 
> 
> 
> 
> That's an unusually high looking pid. Is that expected, for example did
> you crank Linux's pid_max right up, or is this AIX, or something?
> 
> 
> 
> 

Unfortunately I'm not sure exactly what it's running on as it's a 
cloud-provided database instance running on google cloud:

=> select version();

PostgreSQL 9.6.16 on x86_64-pc-linux-gnu, compiled by clang version 
7.0.0-3~ubuntu0.18.04.1 (tags/RELEASE_700/final), 64-bit

-Mike

Re: Help understanding SIReadLock growing without bound on completed transaction

2020-05-21 Thread Thomas Munro

On Fri, May 22, 2020 at 7:48 AM Mike Klaas  wrote:
> locktype: page
> relation::regclass::text: _pkey
> virtualtransaction: 36/296299968
> granted:t
> pid:2263461

That's an unusually high looking pid.  Is that expected, for example
did you crank Linux's pid_max right up, or is this AIX, or something?

Re: Should I use JSON?

2020-05-21 Thread Tim Cross

stan  writes:

> Worming on a small project, and have been doing a lot of Perl scripting to
> parse various types of files to populate the database. Now I need to get
> data from a cloud services provider (time-keeping). They have a REST API
> that returns data in a JSOSN format.
>
> So here is the question, should I just manually parse this data, as I have
> been doing to insert into appropriate entities into the database? Or should I
> insert the JSON data, and use some queries in the database to populate my
> tables from the JSON tables?

Given you plan to store your data in 'normal' tables and you are already
using a scripting language to get the data from the remote API
and your already processing data in various forms using Perl, I
would not bother. All you will really do is add another layer of
complexity and skill requirement (i.e. JSON in the database and writing
JSON queries using PG's SQL JSON support). 

-- 
Tim Cross

Re: Query returns no rows in pg_basebackup cluster

2020-05-21 Thread Tom Lane

"David G. Johnston"  writes:
> On Thu, May 21, 2020 at 3:57 PM Adrian Klaver 
> wrote:
>> Well what I was trying to figure out was:
>> "Windows server this query returns 0 rows.
>> In Windows server same query using like
>> select *  from firma1.desktop where baas like '_LOGIFAI'
>> returns properly 16 rows. "

> The LIKE query probably doesn't use an index and thus finds the relevant
> data via sequential scan and equality checks on each record.

Yeah, exactly.  An equality condition will use a btree index if
available.  LIKE, however, sees the "_" as a wildcard so it cannot
use an index and resorts to a seqscan --- which will work fine.
It's just index searches (and index-based sorts) that are broken.

Of course, if there isn't an index on the column in question
then this theory falls to the ground.

regards, tom lane

Re: Query returns no rows in pg_basebackup cluster

2020-05-21 Thread Adrian Klaver

On 5/21/20 4:06 PM, David G. Johnston wrote:
On Thu, May 21, 2020 at 3:57 PM Adrian Klaver > wrote:

On 5/21/20 3:47 PM, Tom Lane wrote:
 > Adrian Klaver mailto:adrian.kla...@aklaver.com>> writes:
 >> On 5/21/20 1:20 PM, Andrus wrote:
 >>> In windows pg_basebackup was used to create base backup from
Linux server.
 >
 >> Are you referring to two different instances of Postgres on Windows?
 >
 > No, what it sounds like is the OP tried to physically replicate a
 > database on another platform with completely different sorting rules.
 > Which means all his text indexes are corrupt according to the
 > destination platform's sorting rules, which easily explains the
 > observed misbehavior (ie, index searches not finding the expected
rows).

Well what I was trying to figure out was:

"Windows server this query returns 0 rows.

In Windows server same query using like

select *  from firma1.desktop where baas like '_LOGIFAI'

returns properly 16 rows. "

My suspicion is that first case is for the replicated database and
failed for the reasons you mentioned and that the second case is for a
'native' Windows instance. Just trying to get confirmation.

Nothing in the OP's text suggests a different server is involved - 
rather same server but LIKE vs equals.

Aah, missed that.

The LIKE query probably doesn't use an index and thus finds the relevant 
data via sequential scan and equality checks on each record.

David J.

--
Adrian Klaver
adrian.kla...@aklaver.com

Re: Query returns no rows in pg_basebackup cluster

2020-05-21 Thread David G. Johnston

On Thu, May 21, 2020 at 3:57 PM Adrian Klaver 
wrote:

> On 5/21/20 3:47 PM, Tom Lane wrote:
> > Adrian Klaver  writes:
> >> On 5/21/20 1:20 PM, Andrus wrote:
> >>> In windows pg_basebackup was used to create base backup from Linux
> server.
> >
> >> Are you referring to two different instances of Postgres on Windows?
> >
> > No, what it sounds like is the OP tried to physically replicate a
> > database on another platform with completely different sorting rules.
> > Which means all his text indexes are corrupt according to the
> > destination platform's sorting rules, which easily explains the
> > observed misbehavior (ie, index searches not finding the expected rows).
>
> Well what I was trying to figure out was:
>
> "Windows server this query returns 0 rows.
>
> In Windows server same query using like
>
> select *  from firma1.desktop where baas like '_LOGIFAI'
>
> returns properly 16 rows. "
>
> My suspicion is that first case is for the replicated database and
> failed for the reasons you mentioned and that the second case is for a
> 'native' Windows instance. Just trying to get confirmation.
>

Nothing in the OP's text suggests a different server is involved - rather
same server but LIKE vs equals.

The LIKE query probably doesn't use an index and thus finds the relevant
data via sequential scan and equality checks on each record.

David J.

Re: Query returns no rows in pg_basebackup cluster

2020-05-21 Thread Adrian Klaver


On 5/21/20 3:47 PM, Tom Lane wrote:

Adrian Klaver  writes:

On 5/21/20 1:20 PM, Andrus wrote:

In windows pg_basebackup was used to create base backup from Linux server.



Are you referring to two different instances of Postgres on Windows?


No, what it sounds like is the OP tried to physically replicate a
database on another platform with completely different sorting rules.
Which means all his text indexes are corrupt according to the
destination platform's sorting rules, which easily explains the
observed misbehavior (ie, index searches not finding the expected rows).


Well what I was trying to figure out was:

"Windows server this query returns 0 rows.

In Windows server same query using like

select *  from firma1.desktop where baas like '_LOGIFAI'

returns properly 16 rows. "

My suspicion is that first case is for the replicated database and 
failed for the reasons you mentioned and that the second case is for a 
'native' Windows instance. Just trying to get confirmation.




REINDEX would fix it.  But the major point here is you can't just ignore
a collation mismatch, which in turn implies that you can't do physical
replication from Linux to Windows, or vice versa (and most other
cross-platform cases are just as dangerous).


Database in Windows is in read-only (recovery) mode so it cannot changed.


Then you might as well just rm -rf it (or whatever the equivalent Windows
incantation is).  On Windows, that database is broken and useless.

regards, tom lane




--
Adrian Klaver
adrian.kla...@aklaver.com

Re: Query returns no rows in pg_basebackup cluster

2020-05-21 Thread Tom Lane

Adrian Klaver  writes:
> On 5/21/20 1:20 PM, Andrus wrote:
>> In windows pg_basebackup was used to create base backup from Linux server.

> Are you referring to two different instances of Postgres on Windows?

No, what it sounds like is the OP tried to physically replicate a
database on another platform with completely different sorting rules.
Which means all his text indexes are corrupt according to the
destination platform's sorting rules, which easily explains the
observed misbehavior (ie, index searches not finding the expected rows).

REINDEX would fix it.  But the major point here is you can't just ignore
a collation mismatch, which in turn implies that you can't do physical
replication from Linux to Windows, or vice versa (and most other
cross-platform cases are just as dangerous).

>> Database in Windows is in read-only (recovery) mode so it cannot changed.

Then you might as well just rm -rf it (or whatever the equivalent Windows
incantation is).  On Windows, that database is broken and useless.

regards, tom lane

Re: Query returns no rows in pg_basebackup cluster

2020-05-21 Thread Adrian Klaver


On 5/21/20 1:20 PM, Andrus wrote:

Hi!

In windows pg_basebackup was used to create base backup from Linux server.
baas column data type is character(8)

In Linux server  query
select *  from firma1.desktop where baas='_LOGIFAI'

returns 16 rows.

Windows server this query returns 0 rows.

In Windows server same query using like

select *  from firma1.desktop where baas like '_LOGIFAI'

returns properly 16 rows.


Are you referring to two different instances of Postgres on Windows?



Maybe this is because database locale is not known in windows:

CREATE DATABASE sba
    WITH    OWNER = sba_owner
    ENCODING = 'UTF8'
    LC_COLLATE = 'et_EE.UTF-8'
    LC_CTYPE = 'et_EE.UTF-8'
    TABLESPACE = pg_default
    CONNECTION LIMIT = -1;

Correct encoding for windows should be

LC_COLLATE = 'Estonian_Estonia.1257'
LC_CTYPE = 'Estonian_Estonia.1257'

IF so how to to fix windows cluster so that query returns proper result 
in windows also?

Database in Windows is in read-only (recovery) mode so it cannot changed.
Postgres 12 is used.

Andrus.





--
Adrian Klaver
adrian.kla...@aklaver.com

Re: Table partitioning for cloud service?

2020-05-21 Thread Adrian Klaver


On 5/21/20 1:23 PM, Israel Brewster wrote:
On May 21, 2020, at 12:12 PM, Sándor Daku > wrote:

Hi,

On Thu, 21 May 2020 at 18:14, Christopher Browne > wrote:


On Thu, 21 May 2020 at 11:53, Israel Brewster
mailto:isr...@brewstersoft.com>> wrote:




- Table-based tenancy (e.g. - each table has a "tenant_id" and
queries need to specify the tenant)


The database/schema per tenant solution can be tedious when you want 
to modify something on the structure and you have numerous tenants.
Therefore I used the "tables with tenant_id" version in a similar 
situation but with a slight twist. One of the biggest issue of this 
solution is that if you forget to add the tenant_id to the where 
clause you are going to reveal one tenant's data to another.
I came up with the solution that the database user have no privileges 
for accessing the base tables. Instead of that I generate views for 
each tenant and they can access their own data in the underlying table 
through these views. Now if forget to address the right tenant in my 
client code(it still happens sometimes) and try to directly access the 
base tables I get a strongly worded reminder from the server.


Nice solution! I think I may go to something like that once I upgrade to 
a cloud solution that lets me add multiple users to the DB (the free 
tier of Heroku does not). In the meantime, while I just have the single 
customer, I can fake it easily enough.


Is there any shortcuts for referencing the proper views, or do you just 
append/prepend something to every table reference in your SQL? One nice 
thing about the database/schema approach is that I can just specify the 
search_path (or database) in the connection command, and then all the 
table references remain the same for all tenants. Also helps avoid the 
situation you mentioned where you forget to address the right tenant, 
since you only have to do it in one place. Of course, as you said, it 
can be tedious when you want to modify the structure. I’ll have to think 
about that a bit more.


If you want to take the tedium out of it take a look at Sqitch:

https://sqitch.org/

Then all you have to do is create the change once and deploy to the 
targets.




---
Israel Brewster
BrewsterSoft Development
http://www.brewstersoft.com 
Home of EZPunch and Lyrics Presenter



Regards,
Sándor







--
Adrian Klaver
adrian.kla...@aklaver.com

Re: pg_basebackup + incremental base backups

2020-05-21 Thread Stephen Frost

Greetings,

* Christopher Pereira (krip...@imatronix.cl) wrote:
> On 21-May-20 08:43, Stephen Frost wrote:
> >* Christopher Pereira (krip...@imatronix.cl) wrote:
> >>Is there some way to rebuild the standby cluster by doing a differential
> >>backup of the primary cluster directly?
> >We've contemplated adding support for something like this to pgbackrest,
> >since all the pieces are there, but there hasn't been a lot of demand
> >for it and it kind of goes against the idea of having a proper backup
> >solution, really..  It'd also create quite a bit of load on the primary
> >to checksum all the files to do the comparison against what's on the
> >replica that you're trying to update, so not something you'd probably
> >want to do a lot more than necessary.
> 
> We have backups of the whole server and only need a efficient way to rebuild
> the hot-standby cluster when pg_rewind is not able to do so.

Personally, I find myself more confident in what pgbackrest does to
remaster a former primary (using a delta restore), but a lot of that
really comes down to the question of: why did the primary fail?  If you
don't know that, I really wouldn't recommend using pg_rewind.

> I agree with your concerns about the increased load on the primary server,
> but this rebuilding process would only be done in case of emergency or
> during low load hours.
> 
> pg_basebackup works fine but does not support differential/incremental
> backups which is a blocker.

pg_basebackup is missing an awful lot of other things- managing of
backup rotation, WAL expiration, the ability to parallelize, encryption
support, ability to push backups/fetch backups to/from cloud storage
solutions, ability to resume from failed backups, delta restore (which
is more-or-less what you're asking for), parallel archiving/fetching of
WAL..

> Do you know any alternative software that is able to rebuild the standby PG
> data dir using rsync or similar while the primary is still online?

pgbackrest can certainly rebuild the standby, if you're using it for
backups, and do so very quickly thanks to delta restore and it's
parallelism.  I'm not aware of anything that does exactly what you're
looking for.

> It seems a simple pg_start_backup + rsync + pg_stop_backup (maybe combined
> with a LVM snapshot) would do, but we would prefer to use some existing
> tool.

I'd strongly recommend that you use an existing tool, there's an awful
lot of complications and you absolutely can *not* use rsync for that
unless you are doing it with checksums enabled, and even then it's
complicated- you probably don't want to sync across unlogged tables but
it's not easy to exclude those, or temp files/tables, you have to make
sure to manage the WAL properly, ensure that the appropriate information
makes it into the backup_label (you shouldn't be using exclusive backup
because a reboot of the primary at the wrong time will result in PG not
starting up on the primary...), etc, etc.

> We just tried barman, but it also seems to require a restore from the backup
> before being able to start the standby server (?), and we are afraid this
> would require double storage, IO and time for rebuilding the standby
> cluster.

I really think you should reconsider whatever backup solution you're
using today and rather than keeping it independent, make it part of the
solution to rebuilding replicas.

Maybe it isn't clear, so I'll try to explain- pgbackrest, if you use it
for your backups, will be able to restore over top of an existing PG
cluster, updating only those files which are different from what's in
the backup (based on checksums that it calculates), and is able to do so
in parallel, and then you can replay WAL from your pgbackrest repo,
right up until the replica is able to reconnect to the primary and
resume replaying WAL.  It's a pretty common approach and is supported by
HA solutions like patroni.

Thanks,

Stephen

signature.asc
Description: PGP signature

Re: Table partitioning for cloud service?

2020-05-21 Thread Ravi Krishna

> 
> The database/schema per tenant solution can be tedious when you want to 
> modify something on the structure and you have numerous tenants.
> Therefore I used the "tables with tenant_id" version in a similar situation 
> but with a slight twist. One of the biggest issue of this solution is that if 
> you forget to add the tenant_id to the where clause you are going to reveal 
> one tenant's data to another.
> I came up with the solution that the database user have no privileges for 
> accessing the base tables. Instead of that I generate views for each tenant 
> and they can access their own data in the underlying table through these 
> views. Now if forget to address the right tenant in my client code(it still 
> happens sometimes) and try to directly access the base tables I get a 
> strongly worded reminder from the server.

1. If you have 50 tables and say 100 tenants, we are talking about 5,000 views. 
I am not sure 
whether it is any more elegant than having 100 schemas.
2. In your approach you can do any phased DDL upgrade.  It is typical to do 
rolling upgrades
in a multi tenant databases, starting with least risky tenant.

Re: Strategy for materialisation and centralisation of data

2020-05-21 Thread Rory Campbell-Lange

On 21/05/20, Michael Stephenson (domehead...@gmail.com) wrote:
> On Thu, May 21, 2020 at 10:36 AM Rory Campbell-Lange wrote

> > Presently I've been thinking of using triggers or materialized views in
> > each database to materialise data into a "matview" schema which is then
> > shipped via logical replication to an organisation database when
> > required. New columns in the matview schema tables would ensure replica
> > identity uniqueness and allow the data to be safely stored in common
> > tables in the organisation database.

> You might find Materialize interesting:
> 
> https://materialize.io/

Hi Michael. Thanks for the pointer. Materialize.io seems cool but we
don't deal with streaming data, or am I missing something?

Cheers
Rory

Re: Strategy for materialisation and centralisation of data

2020-05-21 Thread Michael Stephenson

You might find Materialize interesting:

https://materialize.io/

https://youtu.be/zWSdkGq1XWk


On Thu, May 21, 2020 at 10:36 AM Rory Campbell-Lange <
r...@campbell-lange.net> wrote:

> We have quite a few databases of type a and many of type b in a cluster.
> Both a and b types are fairly complex and are different solutions to a
> similar problem domain. All the databases are very read-centric, and all
> database interaction is currently through plpgsql with no materialised
> data.
>
> Some organisations have several type a and many type b databases, and
> need to query these in a homogeneous manner. We presently do this with
> many middleware requests or pl/proxy. An a or b type database belongs to
> 0 or 1 organisations.
>
> Making a and b generally the same would be a very big project.
> Consequently I'm discussing materialising a subset of data in a common
> format between the two database types and shipping that data to
> organisation databases. This would have the benefit of providing a
> common data interface and speeding up queries for all database types.
> Users would have faster queries, and it would be a big time saver for
> our development team, who presently have to deal with three quite
> different data APIs.
>
> Presently I've been thinking of using triggers or materialized views in
> each database to materialise data into a "matview" schema which is then
> shipped via logical replication to an organisation database when
> required. New columns in the matview schema tables would ensure replica
> identity uniqueness and allow the data to be safely stored in common
> tables in the organisation database.
>
> A few issues I foresee with this approach include:
>
> * requiring two to three times current storage for materialisation
>   (the cluster is currently ~250GB)
>
> * having to have many logical replication slots
>   (we sometimes suffer from pl/proxy connection storms)
>
> Commentary gratefully received,
> Rory
>
>
>
>
>
>

RE: Should I use JSON?

2020-05-21 Thread Zahir Lalani

We make lots of use JSON – but in specific contexts.

If we need to pull data out for listing view – always raw fields. If these are 
detail view only and we need dynamic content depending on record types, JSON is 
a life saver

Z

From: pabloa98 
Sent: 21 May 2020 20:28
Cc: pgsql-gene...@postgresql.org
Subject: Re: Should I use JSON?



On Thu, May 21, 2020 at 8:37 AM stan mailto:st...@panix.com>> 
wrote:
Worming on a small project, and have been doing a lot of Perl scripting to
parse various types of files to populate the database. Now I need to get
data from a cloud services provider (time-keeping). They have a REST API
that returns data in a JSOSN format.

So here is the question, should I just manually parse this data, as I have
been doing to insert into appropriate entities into the database? Or should I
insert the JSON data, and use some queries in the database to populate my
tables from the JSON tables?

That depends of how advanced is your analysis of the solution you want to 
implement.
If you are still exploring, I would suggest you store JSON in JSONB columns + 
some id column to search it.

When your program/solution knows what properties you are going to use, perhaps 
you want to convert those in columns.

In any case, data could be indexed in both, columns and JSONB

So it is up to you :)

Pablo

Re: Strategy for materialisation and centralisation of data

2020-05-21 Thread Karsten Hilbert

On Thu, May 21, 2020 at 09:52:02PM +0100, Rory Campbell-Lange wrote:

> > Might postgres_fdw help in any way ?
>
> Thanks for the suggestion. As I noted we are already using pl/proxy and
> it works well, although we are soaking up a lot of connections with it.
> >From my reading of the postgres_fdw docs it is much more featureful than
> pl/proxy but it is likely to have the same connection characteristics.
>
> The main issues we're trying to solve is standardising data access
> through (I think) materialisation and centralisation.

While plausible I still would not be entirely surprised if
testing were to reveal something different.

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B

Re: Strategy for materialisation and centralisation of data

2020-05-21 Thread Rory Campbell-Lange

On 21/05/20, Karsten Hilbert (karsten.hilb...@gmx.net) wrote:
> On Thu, May 21, 2020 at 03:35:59PM +0100, Rory Campbell-Lange wrote:
> 
> > We have quite a few databases of type a and many of type b in a cluster.
> > Both a and b types are fairly complex and are different solutions to a
> > similar problem domain. All the databases are very read-centric, and all
> > database interaction is currently through plpgsql with no materialised
> > data.
> >
> > Some organisations have several type a and many type b databases, and
> > need to query these in a homogeneous manner. We presently do this with
> > many middleware requests or pl/proxy. An a or b type database belongs to
> > 0 or 1 organisations.
> 
> Might postgres_fdw help in any way ?

Thanks for the suggestion. As I noted we are already using pl/proxy and
it works well, although we are soaking up a lot of connections with it.
>From my reading of the postgres_fdw docs it is much more featureful than
pl/proxy but it is likely to have the same connection characteristics.

The main issues we're trying to solve is standardising data access
through (I think) materialisation and centralisation.

Rory

Re: Table partitioning for cloud service?

2020-05-21 Thread Israel Brewster

On May 21, 2020, at 12:12 PM, Sándor Daku  wrote:
> Hi,
> 
> On Thu, 21 May 2020 at 18:14, Christopher Browne  > wrote:
> On Thu, 21 May 2020 at 11:53, Israel Brewster  > wrote:
>> 
> 
> - Table-based tenancy (e.g. - each table has a "tenant_id" and queries need 
> to specify the tenant)
> 
> The database/schema per tenant solution can be tedious when you want to 
> modify something on the structure and you have numerous tenants.
> Therefore I used the "tables with tenant_id" version in a similar situation 
> but with a slight twist. One of the biggest issue of this solution is that if 
> you forget to add the tenant_id to the where clause you are going to reveal 
> one tenant's data to another.
> I came up with the solution that the database user have no privileges for 
> accessing the base tables. Instead of that I generate views for each tenant 
> and they can access their own data in the underlying table through these 
> views. Now if forget to address the right tenant in my client code(it still 
> happens sometimes) and try to directly access the base tables I get a 
> strongly worded reminder from the server.

Nice solution! I think I may go to something like that once I upgrade to a 
cloud solution that lets me add multiple users to the DB (the free tier of 
Heroku does not). In the meantime, while I just have the single customer, I can 
fake it easily enough.

Is there any shortcuts for referencing the proper views, or do you just 
append/prepend something to every table reference in your SQL? One nice thing 
about the database/schema approach is that I can just specify the search_path 
(or database) in the connection command, and then all the table references 
remain the same for all tenants. Also helps avoid the situation you mentioned 
where you forget to address the right tenant, since you only have to do it in 
one place. Of course, as you said, it can be tedious when you want to modify 
the structure. I’ll have to think about that a bit more.

---
Israel Brewster
BrewsterSoft Development
http://www.brewstersoft.com 
Home of EZPunch and Lyrics Presenter
> 
> 
> Regards,
> Sándor
> 
>   
> 
>

Query returns no rows in pg_basebackup cluster

2020-05-21 Thread Andrus


Hi!

In windows pg_basebackup was used to create base backup from Linux server.
baas column data type is character(8)

In Linux server  query 


select *  from firma1.desktop where baas='_LOGIFAI'

returns 16 rows.

Windows server this query returns 0 rows.

In Windows server same query using like

select *  from firma1.desktop where baas like '_LOGIFAI'

returns properly 16 rows. 


Maybe this is because database locale is not known in windows:

CREATE DATABASE sba
   WITH 
   OWNER = sba_owner

   ENCODING = 'UTF8'
   LC_COLLATE = 'et_EE.UTF-8'
   LC_CTYPE = 'et_EE.UTF-8'
   TABLESPACE = pg_default
   CONNECTION LIMIT = -1;

Correct encoding for windows should be

LC_COLLATE = 'Estonian_Estonia.1257'
LC_CTYPE = 'Estonian_Estonia.1257'

IF so how to to fix windows cluster so that query returns proper result in 
windows also?
Database in Windows is in read-only (recovery) mode so it cannot changed.
Postgres 12 is used.

Andrus.

Re: Table partitioning for cloud service?

2020-05-21 Thread Sándor Daku

Hi,

On Thu, 21 May 2020 at 18:14, Christopher Browne  wrote:

> On Thu, 21 May 2020 at 11:53, Israel Brewster 
> wrote:
>
>>
>> - Table-based tenancy (e.g. - each table has a "tenant_id" and queries
> need to specify the tenant)
>

The database/schema per tenant solution can be tedious when you want to
modify something on the structure and you have numerous tenants.
Therefore I used the "tables with tenant_id" version in a similar situation
but with a slight twist. One of the biggest issue of this solution is that
if you forget to add the tenant_id to the where clause you are going to
reveal one tenant's data to another.
I came up with the solution that the database user have no privileges for
accessing the base tables. Instead of that I generate views for each tenant
and they can access their own data in the underlying table through these
views. Now if forget to address the right tenant in my client code(it still
happens sometimes) and try to directly access the base tables I get a
strongly worded reminder from the server.

Regards,
Sándor

Help understanding SIReadLock growing without bound on completed transaction

2020-05-21 Thread Mike Klaas

We recently experienced an issue where a transaction that was finished and no 
longer existed kept slowly accumulating SIReadLocks over the period of a week.  
The only way we could remove the locks was by restarting postgresql.

The entries in pg_locks resembled:

mode: SIReadLock

locktype: page

relation::regclass::text: _pkey

virtualtransaction: 36/296299968

granted:t

pid: 2263461

count(1): 5559 (when grouped)

Note that this pid did not exist in pg_stat_activity.  I understand that it is 
normal for SSI locks to persist after a transaction is finished.  There are 
however two aspects to this that I don't understand:

* It's my understanding that these locks should be cleared when there are no 
conflicting transactions.  These locks had existed for > 1 week and we have no 
transactions that last more than a few seconds (the oldest transaction in 
pg_stat_activity is always < 1minute old).

* Why would a transaction that is finished continue accumulating locks over 
time?

If it helps, here is some more information about the state of the system:

* There were a total of six pids in pg_locks that didn't exist in 
pg_stat_activity.  They held a variety of SIReadLocks, but they weren't 
increasing in number over time.  I'm not sure how long they were present; I 
only know that the problematic pid existed for a week due to its continual 
growth reflecting in our internal lock monitoring system.

* I tried finding overlapping SIReadLocks (see query below), but none were 
returned (I realize that the SSI conflict resolution algo is much more involved 
than this simple query)

* PG version: 9.6.17

I would appreciate any hints of what I could've done to investigate this 
further or how I could've resolved the issue without restarting the db (and 
thus experiencing downtime).

thank you in advance,
-Mike

SELECT

waiting.locktype AS w_locktype,

LEFT(waiting.relation::regclass::text,25) AS waiting_table,

COALESCE(waiting_stm.query,'?') AS w_query,

waiting.page ( http://waiting.page/ ) AS w_page,

waiting.tuple AS w_tuple,

waiting.pid ( http://waiting.pid/ ) AS w_pid,

other.locktype AS o_locktype,

LEFT(other.relation::regclass::text,15) AS other_table,

LEFT(COALESCE(other_stm.query, '?'), 50) AS other_query,

other.page ( http://other.page/ ) AS o_page,

other.tuple AS o_tuple,

other.pid ( http://other.pid/ ) AS other_pid,

other.GRANTED AS o_granted

FROM

pg_catalog.pg_locks AS waiting

LEFT JOIN

pg_catalog.pg_stat_activity AS waiting_stm

ON waiting_stm.pid ( http://waiting_stm.pid/ ) = waiting.pid ( 
http://waiting.pid/ )

JOIN

pg_catalog.pg_locks AS other

ON (

(

waiting."database" = other."database"

AND waiting.relation = other.relation

and waiting.locktype = other.locktype

AND ( CASE WHEN other.locktype = 'page' THEN waiting.page ( 
http://waiting.page/ ) IS NOT DISTINCT FROM other.page ( http://other.page/ )

WHEN other.locktype = 'tuple' THEN waiting.page ( http://waiting.page/ ) IS NOT 
DISTINCT FROM other.page ( http://other.page/ ) and waiting.tuple IS NOT 
DISTINCT FROM other.tuple

ELSE true END

)

)

OR waiting.transactionid = other.transactionid

) AND waiting.pid ( http://waiting.pid/ ) <> other.pid ( http://other.pid/ )

LEFT JOIN

pg_catalog.pg_stat_activity AS other_stm

ON other_stm.pid ( http://other_stm.pid/ ) = other.pid ( http://other.pid/ )

WHERE waiting.pid ( http://waiting.pid/ ) IN (2263461, 2263276, 2263283, 
2263284, 2263459, 2263527 )

Re: Strategy for materialisation and centralisation of data

2020-05-21 Thread Karsten Hilbert

On Thu, May 21, 2020 at 03:35:59PM +0100, Rory Campbell-Lange wrote:


> We have quite a few databases of type a and many of type b in a cluster.
> Both a and b types are fairly complex and are different solutions to a
> similar problem domain. All the databases are very read-centric, and all
> database interaction is currently through plpgsql with no materialised
> data.
>
> Some organisations have several type a and many type b databases, and
> need to query these in a homogeneous manner. We presently do this with
> many middleware requests or pl/proxy. An a or b type database belongs to
> 0 or 1 organisations.

Might postgres_fdw help in any way ?

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B

Re: Should I use JSON?

2020-05-21 Thread pabloa98

On Thu, May 21, 2020 at 8:37 AM stan  wrote:

> Worming on a small project, and have been doing a lot of Perl scripting to
> parse various types of files to populate the database. Now I need to get
> data from a cloud services provider (time-keeping). They have a REST API
> that returns data in a JSOSN format.
>
> So here is the question, should I just manually parse this data, as I have
> been doing to insert into appropriate entities into the database? Or
> should I
> insert the JSON data, and use some queries in the database to populate my
> tables from the JSON tables?
>
>
That depends of how advanced is your analysis of the solution you want to
implement.
If you are still exploring, I would suggest you store JSON in JSONB columns
+ some id column to search it.

When your program/solution knows what properties you are going to use,
perhaps you want to convert those in columns.

In any case, data could be indexed in both, columns and JSONB

So it is up to you :)

Pablo

Re: Table partitioning for cloud service?

2020-05-21 Thread Michael Lewis

On Thu, May 21, 2020 at 11:41 AM Adam Brusselback 
wrote:

> As an optimization I just worked on for my database earlier this week, I
> decided to logically replicate that table from my main authentication
> database into a each cluster, and I replaced all references to the FDW for
> read-only queries to use the logically replicated table. All write queries
> still hit the FDW as before.
>

Perhaps you considered this, but if you had not wanted to deal with the
administration side with replication, and the centralized data is changed
infrequently from only one application/source perhaps, then updating the
source and then refreshing a materialized view on each local db that pulls
in the foreign data could be a good option. Some chance of stale data since
the refresh must be triggered, but for some use cases it may be the
simplest setup.

Re: Table partitioning for cloud service?

2020-05-21 Thread Adam Brusselback

>  An interesting answer, if there needs to be shared data, is for the
shared data to go in its own database, and use a Foreign Data Wrapper to
have each tenants' database access it <
https://www.postgresql.org/docs/12/postgres-fdw.html>

For my application I went the schema-per-tenant route, but I have a need to
have a single login which will work for all tenants you've been given
access to. Not all tenants are required to be on the same database host, so
I broke that piece out into it's own database and used postgres fdw to make
it seem local to each tenant.

So i've got first hand experience with this for the past ~5 years, but this
approach has serious tradeoffs. Queries that need to access the remote
table can just fall on their face sometimes.  You will also need to deal
with practically every connection spawning 1-or-more new connections which
will stay open taking resources the first time a query is issued that
accesses foreign data.

As an optimization I just worked on for my database earlier this week, I
decided to logically replicate that table from my main authentication
database into a each cluster, and I replaced all references to the FDW for
read-only queries to use the logically replicated table. All write queries
still hit the FDW as before.

This was acceptable for my use case, and drastically improved performance
for some queries where I had previously had to use a CTE to force
materialization to get acceptable performance due to the nature of going
over the FDW for that data.

It's a very cool tool, just be careful about how it can impact performance
if you don't measure for your specific use case.

Re: Table partitioning for cloud service?

2020-05-21 Thread Israel Brewster



> On May 21, 2020, at 7:57 AM, Adrian Klaver  wrote:
> 
> On 5/21/20 8:53 AM, Israel Brewster wrote:
>>> On May 21, 2020, at 7:36 AM, Adrian Klaver >> > wrote:
>>> 
>>> On 5/21/20 8:29 AM, Israel Brewster wrote:
 I’m working on my first cloud service, which will be backed by a 
 postgresql database. Currently I only have a single customer, but of 
 course I want to design with the possibility of multiple customers in 
 mind. In that vein, I’m wondering what is “typical” in terms of designing 
 the DB structure to make sure that one customer doesn’t “accidentally" get 
 data for another customer? At the moment I am leaning towards giving each 
 customer their own set of tables, with a unique prefix for each. This 
 would provide a “hard” separation for the data,
>>>  I think that is called a schema:)
>> Ok. That’s probably an option. Although it looks from a cursory perusal that 
>> for that to work, I would have to have separate DB users with different 
>> permissions. Which would be fine, except that I don’t have permissions to 
>> create users.
>>> Or set up a separate database for each in the cluster.
>> Same as above - no permissions.
>> At the moment, I am running on Heroku, which gives me a postgresql database, 
>> but not full control over the cluster. I may need to move to something more 
>> robust, if having completely separate databases is the best option. I was 
>> hoping to avoid SysAdmin stuff as much as possible, and focus on the 
>> software side, but obviously some sys admin is required.
> 
> You can't use this?:
> 
> https://devcenter.heroku.com/articles/heroku-postgresql-credentials 
> 

Wasn’t aware of that. I *did* mention this is my first cloud project. Done 
plenty of DB/web/application development, but not cloud/multi-customer. Thanks 
for the pointer.

> 
>> ---
>> Israel Brewster
>> BrewsterSoft Development
>> http://www.brewstersoft.com
>> Home of EZPunch and Lyrics Presenter
>>> 
 but would also increase maintenance efforts, as if I needed to add a field 
 I would have to add it to every table. On the other hand, keeping 
 everything in the same set of tables would mean having to be VERY careful 
 with my SQL to make sure no customer could access another’s data.
 How is this typically done?
 ---
 Israel Brewster
 BrewsterSoft Development
 http://www.brewstersoft.com 
 Home of EZPunch and Lyrics Presenter
>>> 
>>> 
>>> -- 
>>> Adrian Klaver
>>> adrian.kla...@aklaver.com  
>>> >
>>> 
> 
> 
> -- 
> Adrian Klaver
> adrian.kla...@aklaver.com

Re: Table partitioning for cloud service?

2020-05-21 Thread Christopher Browne

On Thu, 21 May 2020 at 11:53, Israel Brewster 
wrote:

> On May 21, 2020, at 7:36 AM, Adrian Klaver 
> wrote:
>
> On 5/21/20 8:29 AM, Israel Brewster wrote:
>
> I’m working on my first cloud service, which will be backed by a
> postgresql database. Currently I only have a single customer, but of course
> I want to design with the possibility of multiple customers in mind. In
> that vein, I’m wondering what is “typical” in terms of designing the DB
> structure to make sure that one customer doesn’t “accidentally" get data
> for another customer? At the moment I am leaning towards giving each
> customer their own set of tables, with a unique prefix for each. This would
> provide a “hard” separation for the data,
>
>  I think that is called a schema:)
>
>
> Ok. That’s probably an option. Although it looks from a cursory perusal
> that for that to work, I would have to have separate DB users with
> different permissions. Which would be fine, except that I don’t have
> permissions to create users.
>
> Or set up a separate database for each in the cluster.
>
>
> Same as above - no permissions.
>
> At the moment, I am running on Heroku, which gives me a postgresql
> database, but not full control over the cluster. I may need to move to
> something more robust, if having completely separate databases is the best
> option. I was hoping to avoid SysAdmin stuff as much as possible, and focus
> on the software side, but obviously some sys admin is required.
>

There's a whole lot of "that depends" to this.

If there is not much data shared across customers, then it's a pretty good
answer to create a database for each one.  This is especially good if they
are only occasionally connected.

If there is a LOT of shared data, then "life gets more complicated."

It's a decently well documented problem out there; I just searched for
"multitenant database design" which showed up a number of decent
(not-Postgres-specific) answers

https://www.google.com/search?client=firefox-b-d&q=multitenant+database+design

Some common patterns include:
- A database per tenant (on Postgres, that means that PGDATABASE and/or
connection URIs change for each tenant)
- A schema per tenant (on Postgres, that means each time a tenant is added,
you need "CREATE NAMESPACE" to establish the tenancy and "CREATE TABLE" for
each table in that tenancy, and connections use "set
search_path=tenantname;" to select data from the right tenant)
- Table-based tenancy (e.g. - each table has a "tenant_id" and queries need
to specify the tenant)

An interesting answer, if there needs to be shared data, is for the shared
data to go in its own database, and use a Foreign Data Wrapper to have each
tenants' database access it <
https://www.postgresql.org/docs/12/postgres-fdw.html>

There are lots of tradeoffs involved in each case; each of the above
patterns has merits and demerits particularly as the number of tenants
scales, as well as when you discover there are both tiny and large tenants
with differing requirements.

You need to look at it from various perspectives:
- How do application schema changes get handled as the application evolves?
- What are the security concerns about data sharing across tenants?
- What issues come up when managing storage across tenants?  (Some
approaches are easier to cope with than others)

If you don't have a fair bit of technical expertise locally, then
sophisticated choices will cause you problems that you won't be able to
solve.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

Re: Table partitioning for cloud service?

2020-05-21 Thread Adrian Klaver


On 5/21/20 8:53 AM, Israel Brewster wrote:
On May 21, 2020, at 7:36 AM, Adrian Klaver > wrote:


On 5/21/20 8:29 AM, Israel Brewster wrote:
I’m working on my first cloud service, which will be backed by a 
postgresql database. Currently I only have a single customer, but of 
course I want to design with the possibility of multiple customers in 
mind. In that vein, I’m wondering what is “typical” in terms of 
designing the DB structure to make sure that one customer doesn’t 
“accidentally" get data for another customer? At the moment I am 
leaning towards giving each customer their own set of tables, with a 
unique prefix for each. This would provide a “hard” separation for 
the data,

 I think that is called a schema:)


Ok. That’s probably an option. Although it looks from a cursory perusal 
that for that to work, I would have to have separate DB users with 
different permissions. Which would be fine, except that I don’t have 
permissions to create users.



Or set up a separate database for each in the cluster.


Same as above - no permissions.

At the moment, I am running on Heroku, which gives me a postgresql 
database, but not full control over the cluster. I may need to move to 
something more robust, if having completely separate databases is the 
best option. I was hoping to avoid SysAdmin stuff as much as possible, 
and focus on the software side, but obviously some sys admin is required.


You can't use this?:

https://devcenter.heroku.com/articles/heroku-postgresql-credentials



---
Israel Brewster
BrewsterSoft Development
http://www.brewstersoft.com
Home of EZPunch and Lyrics Presenter



but would also increase maintenance efforts, as if I needed to add a 
field I would have to add it to every table. On the other hand, 
keeping everything in the same set of tables would mean having to be 
VERY careful with my SQL to make sure no customer could access 
another’s data.

How is this typically done?
---
Israel Brewster
BrewsterSoft Development
http://www.brewstersoft.com 
Home of EZPunch and Lyrics Presenter



--
Adrian Klaver
adrian.kla...@aklaver.com 






--
Adrian Klaver
adrian.kla...@aklaver.com

Re: Should I use JSON?

2020-05-21 Thread David G. Johnston

On Thu, May 21, 2020 at 8:37 AM stan  wrote:

> So here is the question, should I just manually parse this data, as I have
> been doing to insert into appropriate entities into the database? Or
> should I
> insert the JSON data, and use some queries in the database to populate my
> tables from the JSON tables?
>

The simplest thing that works is usually a good starting point.

I consider it a requirement that I can get to the original data as supplied
by the vendor.  So, if you are capturing that elsewhere, going directly to
structured tables is sufficient.  If you don't have any other place where
that gets saved I would save it to a table first then convert it to the
final structured tables.

My current setup is using "jq" to perform an intermediate transform of the
source json to table-specific files and then using psql and jsonb_to_record
insert the json data into the tables.

David J.

Re: Table partitioning for cloud service?

2020-05-21 Thread Israel Brewster

> On May 21, 2020, at 7:36 AM, Adrian Klaver  wrote:
> 
> On 5/21/20 8:29 AM, Israel Brewster wrote:
>> I’m working on my first cloud service, which will be backed by a postgresql 
>> database. Currently I only have a single customer, but of course I want to 
>> design with the possibility of multiple customers in mind. In that vein, I’m 
>> wondering what is “typical” in terms of designing the DB structure to make 
>> sure that one customer doesn’t “accidentally" get data for another customer? 
>> At the moment I am leaning towards giving each customer their own set of 
>> tables, with a unique prefix for each. This would provide a “hard” 
>> separation for the data, 
>  I think that is called a schema:)

Ok. That’s probably an option. Although it looks from a cursory perusal that 
for that to work, I would have to have separate DB users with different 
permissions. Which would be fine, except that I don’t have permissions to 
create users.

> Or set up a separate database for each in the cluster.

Same as above - no permissions.

At the moment, I am running on Heroku, which gives me a postgresql database, 
but not full control over the cluster. I may need to move to something more 
robust, if having completely separate databases is the best option. I was 
hoping to avoid SysAdmin stuff as much as possible, and focus on the software 
side, but obviously some sys admin is required.

---
Israel Brewster
BrewsterSoft Development
http://www.brewstersoft.com 
Home of EZPunch and Lyrics Presenter

> 
>> but would also increase maintenance efforts, as if I needed to add a field I 
>> would have to add it to every table. On the other hand, keeping everything 
>> in the same set of tables would mean having to be VERY careful with my SQL 
>> to make sure no customer could access another’s data.
>> How is this typically done?
>> ---
>> Israel Brewster
>> BrewsterSoft Development
>> http://www.brewstersoft.com 
>> Home of EZPunch and Lyrics Presenter
> 
> 
> -- 
> Adrian Klaver
> adrian.kla...@aklaver.com
>

Re: Should I use JSON?

2020-05-21 Thread Adrian Klaver


On 5/21/20 8:37 AM, stan wrote:

Worming on a small project, and have been doing a lot of Perl scripting to
parse various types of files to populate the database. Now I need to get
data from a cloud services provider (time-keeping). They have a REST API
that returns data in a JSOSN format.

So here is the question, should I just manually parse this data, as I have
been doing to insert into appropriate entities into the database? Or should I
insert the JSON data, and use some queries in the database to populate my
tables from the JSON tables?



I use Python for this sort of thing. A JSON array of objects maps so 
nicely to a Python list of dicts that I just do the parsing in Python 
and INSERT the parsed data into the table.







--
Adrian Klaver
adrian.kla...@aklaver.com

Should I use JSON?

2020-05-21 Thread stan

Worming on a small project, and have been doing a lot of Perl scripting to
parse various types of files to populate the database. Now I need to get
data from a cloud services provider (time-keeping). They have a REST API
that returns data in a JSOSN format.

So here is the question, should I just manually parse this data, as I have
been doing to insert into appropriate entities into the database? Or should I
insert the JSON data, and use some queries in the database to populate my
tables from the JSON tables?


-- 
"They that would give up essential liberty for temporary safety deserve
neither liberty nor safety."
-- Benjamin Franklin

Re: Table partitioning for cloud service?

2020-05-21 Thread Adrian Klaver


On 5/21/20 8:29 AM, Israel Brewster wrote:
I’m working on my first cloud service, which will be backed by a 
postgresql database. Currently I only have a single customer, but of 
course I want to design with the possibility of multiple customers in 
mind. In that vein, I’m wondering what is “typical” in terms of 
designing the DB structure to make sure that one customer doesn’t 
“accidentally" get data for another customer? At the moment I am leaning 
towards giving each customer their own set of tables, with a unique 
prefix for each. This would provide a “hard” separation for the data, 

 I think that is called a schema:)
Or set up a separate database for each in the cluster.

but would also increase maintenance efforts, as if I needed to add a 
field I would have to add it to every table. On the other hand, keeping 
everything in the same set of tables would mean having to be VERY 
careful with my SQL to make sure no customer could access another’s data.


How is this typically done?

---
Israel Brewster
BrewsterSoft Development
http://www.brewstersoft.com 
Home of EZPunch and Lyrics Presenter






--
Adrian Klaver
adrian.kla...@aklaver.com

Table partitioning for cloud service?

2020-05-21 Thread Israel Brewster

I’m working on my first cloud service, which will be backed by a postgresql 
database. Currently I only have a single customer, but of course I want to 
design with the possibility of multiple customers in mind. In that vein, I’m 
wondering what is “typical” in terms of designing the DB structure to make sure 
that one customer doesn’t “accidentally" get data for another customer? At the 
moment I am leaning towards giving each customer their own set of tables, with 
a unique prefix for each. This would provide a “hard” separation for the data, 
but would also increase maintenance efforts, as if I needed to add a field I 
would have to add it to every table. On the other hand, keeping everything in 
the same set of tables would mean having to be VERY careful with my SQL to make 
sure no customer could access another’s data.

How is this typically done?

---
Israel Brewster
BrewsterSoft Development
http://www.brewstersoft.com 
Home of EZPunch and Lyrics Presenter

Re: Suggestion on index creation for TEXT data field

2020-05-21 Thread postgann2020 s

Hi David, Adrian,

Thanks for the information.
Sure, will post on PostGIS community.

Regards,
PostgAnn.

On Thu, May 21, 2020 at 8:21 PM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Thu, May 21, 2020 at 7:45 AM postgann2020 s 
> wrote:
>
>> >And what type of data exactly are we talking about.  ==> Column is
>> stroing GIS data.
>>
>
> GIS data isn't really TEXT and isn't a core datatype of PostgreSQL so this
> is maybe better posted to the PostGIS community directly...
>
> David J.
>
>

Re: Suggestion on index creation for TEXT data field

2020-05-21 Thread David G. Johnston

On Thu, May 21, 2020 at 7:45 AM postgann2020 s 
wrote:

> >And what type of data exactly are we talking about.  ==> Column is
> stroing GIS data.
>

GIS data isn't really TEXT and isn't a core datatype of PostgreSQL so this
is maybe better posted to the PostGIS community directly...

David J.

Re: Suggestion on table analyze

2020-05-21 Thread postgann2020 s

Hi Adrian,

Thanks, I'll check it out.

Regards,
PostgAnn.

On Thu, May 21, 2020 at 8:11 PM Adrian Klaver 
wrote:

> On 5/21/20 7:18 AM, postgann2020 s wrote:
> > Hi Team,
> >
> > Thanks for your support.
> >
> > Could you please suggest on below query.
> >
> > Environment
> > PostgreSQL: 9.5.15
> > Postgis: 2.2.7
> >
> > Mostly table contain GIS data.
> >
> > While analyzing the table getting below NOTICE. It seems is pretty
> > understanding, but needs help on the below points.
> >
> > 1 . What might be the reason for getting the NOTICE?.
> > 2. Is this lead to any problems in the future?.
> >
> > ANALYZE SCHEMA.TABLE;
> >
> > NOTICE:  no non-null/empty features, unable to compute statistics
> > NOTICE:  no non-null/empty features, unable to compute statistics
> > Query returned successfully with no result in 1.1 secs.
>
> This is coming from PostGIS:
>
> postgis/gserialized_estimate.c:
> /* If there's no useful features, we can't work out stats */
>  if ( ! notnull_cnt )
>  {
>  elog(NOTICE, "no non-null/empty features, unable to
> compute statistics");
>  stats->stats_valid = false;
>  return;
>  }
>
>
>
> You might find more information from here:
>
> https://postgis.net/support/
>
> Though FYI PostGIS 2.2.7 is past EOL:
>
> https://postgis.net/source/
>
> >
> > Thanks for your support.
> >
> > Regards,
> > PostgAnn.
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>

Re: Suggestion on index creation for TEXT data field

2020-05-21 Thread Adrian Klaver


On 5/21/20 7:27 AM, postgann2020 s wrote:

Hi Team,
Thanks for your support.
Could you please suggest on below query.

Environment

PostgreSQL: 9.5.15
Postgis: 2.2.7
Mostly table contains GIS data and we are trying to creating an index on 
the column which is having an avg width of 149bytes.


  CREATE INDEX index_idx
   ON SCHEMA.TABLE
   USING btree
   (column);

ERROR:  index row size 2976 exceeds maximum 2712 for index "index_idx"
HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full-text 
indexing.^^

Hint supplies answer to 1) and 2) below.



Could you please suggest on below queries.
1. How to solve the issue?.
2. What type of index is the best suited for this type of data?.

Thanks for your support.

Regards,
PostgAnn.



--
Adrian Klaver
adrian.kla...@aklaver.com

Re: Suggestion on index creation for TEXT data field

2020-05-21 Thread postgann2020 s

Hi David,

Thanks for your email.

>And what type of data exactly are we talking about.  ==> Column is stroing
GIS data.

Regards,
PostgAnn.

On Thu, May 21, 2020 at 8:06 PM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Thu, May 21, 2020 at 7:28 AM postgann2020 s 
> wrote:
>
>> which is having an avg width of 149bytes.
>>
>
> The average is meaningless if your maximum value exceeds a limit.
>
> 2. What type of index is the best suited for this type of data?.
>>
>
> And what type of data exactly are we talking about.  "TEXT" is not a
> useful answer.
>
> If the raw data is too large no index is going to be "best" -  as the hint
> suggests you either need to drop the idea of indexing the column altogether
> or apply some function to the raw data and then index the result.
>
> David J.
>
>

Re: Suggestion on table analyze

2020-05-21 Thread Adrian Klaver


On 5/21/20 7:18 AM, postgann2020 s wrote:

Hi Team,

Thanks for your support.

Could you please suggest on below query.

Environment
PostgreSQL: 9.5.15
Postgis: 2.2.7

Mostly table contain GIS data.

While analyzing the table getting below NOTICE. It seems is pretty 
understanding, but needs help on the below points.


1 . What might be the reason for getting the NOTICE?.
2. Is this lead to any problems in the future?.

ANALYZE SCHEMA.TABLE;

NOTICE:  no non-null/empty features, unable to compute statistics
NOTICE:  no non-null/empty features, unable to compute statistics
Query returned successfully with no result in 1.1 secs.


This is coming from PostGIS:

postgis/gserialized_estimate.c:
/* If there's no useful features, we can't work out stats */
if ( ! notnull_cnt )
{
elog(NOTICE, "no non-null/empty features, unable to 
compute statistics");

stats->stats_valid = false;
return;
}



You might find more information from here:

https://postgis.net/support/

Though FYI PostGIS 2.2.7 is past EOL:

https://postgis.net/source/



Thanks for your support.

Regards,
PostgAnn.



--
Adrian Klaver
adrian.kla...@aklaver.com

Re: Suggestion on index creation for TEXT data field

2020-05-21 Thread David G. Johnston

On Thu, May 21, 2020 at 7:28 AM postgann2020 s 
wrote:

> which is having an avg width of 149bytes.
>

The average is meaningless if your maximum value exceeds a limit.

2. What type of index is the best suited for this type of data?.
>

And what type of data exactly are we talking about.  "TEXT" is not a useful
answer.

If the raw data is too large no index is going to be "best" -  as the hint
suggests you either need to drop the idea of indexing the column altogether
or apply some function to the raw data and then index the result.

David J.

Strategy for materialisation and centralisation of data

2020-05-21 Thread Rory Campbell-Lange

We have quite a few databases of type a and many of type b in a cluster.
Both a and b types are fairly complex and are different solutions to a
similar problem domain. All the databases are very read-centric, and all
database interaction is currently through plpgsql with no materialised
data.

Some organisations have several type a and many type b databases, and
need to query these in a homogeneous manner. We presently do this with
many middleware requests or pl/proxy. An a or b type database belongs to
0 or 1 organisations.

Making a and b generally the same would be a very big project.
Consequently I'm discussing materialising a subset of data in a common
format between the two database types and shipping that data to
organisation databases. This would have the benefit of providing a
common data interface and speeding up queries for all database types.
Users would have faster queries, and it would be a big time saver for
our development team, who presently have to deal with three quite
different data APIs.

Presently I've been thinking of using triggers or materialized views in
each database to materialise data into a "matview" schema which is then
shipped via logical replication to an organisation database when
required. New columns in the matview schema tables would ensure replica
identity uniqueness and allow the data to be safely stored in common
tables in the organisation database.

A few issues I foresee with this approach include:

* requiring two to three times current storage for materialisation
  (the cluster is currently ~250GB)

* having to have many logical replication slots
  (we sometimes suffer from pl/proxy connection storms)

Commentary gratefully received,
Rory

Re: Behaviour of failed Primary

2020-05-21 Thread Amit Kapila

On Thu, May 21, 2020 at 5:38 PM Santhosh Kumar  wrote:
>
> Hi Forum,
>  If I have a cluster with Synchronous replication enabled with three nodes, 
> for eg:
>
> [primary] [hot stand by 1] [host stand by 2]
>
> And for some unforeseen reasons, if primary fails, the failover will kick in 
> and hot stand by 1 will become new primary and  cluster setup will look like 
> this
>
> [new primary (hot stand by1)] [host stand by 2]
>
> My question here is, what will happen if the original primary which has 
> failed comes back. Will it become part of this high available replica cluster 
> automatically or it will be stale and disconnected from the cluster?
>

It won't become standby automatically as it would have diverged from
the new master.

> How can we automatically make the failed primary to be part of the cluster 
> with hot standby role? It would be of great help, if you can direct me to any 
> references details. Thank you, upfront.
>

I think pg_rewind can help in such situations.  See the docs of pg_rewind [1].


[1] - https://www.postgresql.org/docs/devel/app-pgrewind.html

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Suggestion on index creation for TEXT data field

2020-05-21 Thread postgann2020 s

Hi Team,
Thanks for your support.
Could you please suggest on below query.

Environment

PostgreSQL: 9.5.15
Postgis: 2.2.7
Mostly table contains GIS data and we are trying to creating an index on
the column which is having an avg width of 149bytes.

 CREATE INDEX index_idx
  ON SCHEMA.TABLE
  USING btree
  (column);

ERROR:  index row size 2976 exceeds maximum 2712 for index "index_idx"
HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full-text
indexing.

Could you please suggest on below queries.
1. How to solve the issue?.
2. What type of index is the best suited for this type of data?.

Thanks for your support.

Regards,
PostgAnn.

Suggestion on table analyze

2020-05-21 Thread postgann2020 s

Hi Team,

Thanks for your support.

Could you please suggest on below query.

Environment
PostgreSQL: 9.5.15
Postgis: 2.2.7

Mostly table contain GIS data.

While analyzing the table getting below NOTICE. It seems is pretty
understanding, but needs help on the below points.

1 . What might be the reason for getting the NOTICE?.
2. Is this lead to any problems in the future?.

ANALYZE SCHEMA.TABLE;

NOTICE:  no non-null/empty features, unable to compute statistics
NOTICE:  no non-null/empty features, unable to compute statistics
Query returned successfully with no result in 1.1 secs.

Thanks for your support.

Regards,
PostgAnn.

Re: pg_basebackup + incremental base backups

2020-05-21 Thread Stephen Frost

Greetings,

* Christopher Pereira (krip...@imatronix.cl) wrote:
> >>In case of big databases, can we do incremental backups with pg_basebackup?
> >pg_basebackup doesn't support incremental backups, though there's been
> >discussion of adding some kind of support for it, check -hackers if
> >you're curious.
> >
> >>Is there any alternative?
> >There's a few different alternatives for PG backup software, some of
> >which support incremental backups and restores.  I'm personally involved
> >with pgbackrest- https://pgbackrest.org though there's other options out
> >there also (wal-g, barman, and more).
> 
> We tried pgbackrest as a solution to rebuild the standby cluster, by making
> a differential backup of the primary cluster.
> 
> But it seems that pgbackrest is only able to rebuild the standby cluster by
> restoring an existing backup from a repository cluster, and not directly
> from the primary cluster.

Yes, pgbackrest is a full solution for backups, not just a clone sync
tool.

> We are afraid that creating a backup on the repository cluster is an
> additional step that could be avoided.

Ideally you'd be taking regular backups and therefore would have one to
use to perform the restore (or delta restore if you're trying to catch
up a replica that's fallen behind).  Note that with pgbackrest, you
don't need to take a *new* backup, if you're already using it for your
regular backups- just do a delta restore and pgbackrest will
automatically pick the latest backup (of whatever type) and will restore
using it.

> Is there some way to rebuild the standby cluster by doing a differential
> backup of the primary cluster directly?

We've contemplated adding support for something like this to pgbackrest,
since all the pieces are there, but there hasn't been a lot of demand
for it and it kind of goes against the idea of having a proper backup
solution, really..  It'd also create quite a bit of load on the primary
to checksum all the files to do the comparison against what's on the
replica that you're trying to update, so not something you'd probably
want to do a lot more than necessary.

Thanks,

Stephen

signature.asc
Description: PGP signature

Behaviour of failed Primary

2020-05-21 Thread Santhosh Kumar

Hi Forum,
 If I have a cluster with Synchronous replication enabled with three nodes,
for eg:

[primary] [hot stand by 1] [host stand by 2]

And for some unforeseen reasons, if primary fails, the failover will kick
in and hot stand by 1 will become new primary and  cluster setup will look
like this

[new primary (hot stand by1)] [host stand by 2]

My question here is, what will happen if the original primary which has
failed comes back. Will it become part of this high available replica
cluster automatically or it will be stale and disconnected from the
cluster?

How can we automatically make the failed primary to be part of the
cluster with hot standby role? It would be of great help, if you can direct
me to any references details. Thank you, upfront.

Regards,
Santhosh

Re: pg_basebackup + incremental base backups

2020-05-21 Thread Christopher Pereira





In case of big databases, can we do incremental backups with pg_basebackup?

pg_basebackup doesn't support incremental backups, though there's been
discussion of adding some kind of support for it, check -hackers if
you're curious.


Is there any alternative?

There's a few different alternatives for PG backup software, some of
which support incremental backups and restores.  I'm personally involved
with pgbackrest- https://pgbackrest.org though there's other options out
there also (wal-g, barman, and more).

Thanks,

Stephen


Hi Stephen,

We tried pgbackrest as a solution to rebuild the standby cluster, by 
making a differential backup of the primary cluster.


But it seems that pgbackrest is only able to rebuild the standby cluster 
by restoring an existing backup from a repository cluster, and not 
directly from the primary cluster.


We are afraid that creating a backup on the repository cluster is an 
additional step that could be avoided.


Is there some way to rebuild the standby cluster by doing a differential 
backup of the primary cluster directly?

55 matches

Mail list logo