Re: Table partitioning for cloud service?

2020-05-23 Thread Edson Richter


-- Mensagem original--
De: Ravi Krishna
Data: qui, 21 de mai de 2020 18:50
Para:
Cc:PostgreSQL Mailing Lists;
Assunto:Re: Table partitioning for cloud service?


>
> The database/schema per tenant solution can be tedious when you want to 
> modify something on the structure and you have numerous tenants.
> Therefore I used the "tables with tenant_id" version in a similar situation 
> but with a slight twist. One of the biggest issue of this solution is that if 
> you forget to add the tenant_id to the where clause you are going to reveal 
> one tenant's data to another.
> I came up with the solution that the database user have no privileges for 
> accessing the base tables. Instead of that I generate views for each tenant 
> and they can access their own data in the underlying table through these 
> views. Now if forget to address the right tenant in my client code(it still 
> happens sometimes) and try to directly access the base tables I get a 
> strongly worded reminder from the server.

1. If you have 50 tables and say 100 tenants, we are talking about 5,000 views. 
I am not sure
whether it is any more elegant than having 100 schemas.
2. In your approach you can do any phased DDL upgrade.  It is typical to do 
rolling upgrades
in a multi tenant databases, starting with least risky tenant.


Be carefull to plan your backup/recovery strategy. How do you plan to recover 
one customer from backup without interfering with the others?

What will be your disaster recover strategy? Have you considered replication?

Large databases take longer to backup.

Keep schemas in sync can be easily solved.


Regards,


Edson


Re: Table partitioning for cloud service?

2020-05-21 Thread Adrian Klaver

On 5/21/20 1:23 PM, Israel Brewster wrote:
On May 21, 2020, at 12:12 PM, Sándor Daku > wrote:

Hi,

On Thu, 21 May 2020 at 18:14, Christopher Browne > wrote:


On Thu, 21 May 2020 at 11:53, Israel Brewster
mailto:isr...@brewstersoft.com>> wrote:




- Table-based tenancy (e.g. - each table has a "tenant_id" and
queries need to specify the tenant)


The database/schema per tenant solution can be tedious when you want 
to modify something on the structure and you have numerous tenants.
Therefore I used the "tables with tenant_id" version in a similar 
situation but with a slight twist. One of the biggest issue of this 
solution is that if you forget to add the tenant_id to the where 
clause you are going to reveal one tenant's data to another.
I came up with the solution that the database user have no privileges 
for accessing the base tables. Instead of that I generate views for 
each tenant and they can access their own data in the underlying table 
through these views. Now if forget to address the right tenant in my 
client code(it still happens sometimes) and try to directly access the 
base tables I get a strongly worded reminder from the server.


Nice solution! I think I may go to something like that once I upgrade to 
a cloud solution that lets me add multiple users to the DB (the free 
tier of Heroku does not). In the meantime, while I just have the single 
customer, I can fake it easily enough.


Is there any shortcuts for referencing the proper views, or do you just 
append/prepend something to every table reference in your SQL? One nice 
thing about the database/schema approach is that I can just specify the 
search_path (or database) in the connection command, and then all the 
table references remain the same for all tenants. Also helps avoid the 
situation you mentioned where you forget to address the right tenant, 
since you only have to do it in one place. Of course, as you said, it 
can be tedious when you want to modify the structure. I’ll have to think 
about that a bit more.


If you want to take the tedium out of it take a look at Sqitch:

https://sqitch.org/

Then all you have to do is create the change once and deploy to the 
targets.




---
Israel Brewster
BrewsterSoft Development
http://www.brewstersoft.com 
Home of EZPunch and Lyrics Presenter



Regards,
Sándor







--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Table partitioning for cloud service?

2020-05-21 Thread Ravi Krishna
> 
> The database/schema per tenant solution can be tedious when you want to 
> modify something on the structure and you have numerous tenants.
> Therefore I used the "tables with tenant_id" version in a similar situation 
> but with a slight twist. One of the biggest issue of this solution is that if 
> you forget to add the tenant_id to the where clause you are going to reveal 
> one tenant's data to another.
> I came up with the solution that the database user have no privileges for 
> accessing the base tables. Instead of that I generate views for each tenant 
> and they can access their own data in the underlying table through these 
> views. Now if forget to address the right tenant in my client code(it still 
> happens sometimes) and try to directly access the base tables I get a 
> strongly worded reminder from the server.

1. If you have 50 tables and say 100 tenants, we are talking about 5,000 views. 
I am not sure 
whether it is any more elegant than having 100 schemas.
2. In your approach you can do any phased DDL upgrade.  It is typical to do 
rolling upgrades
in a multi tenant databases, starting with least risky tenant.





Re: Table partitioning for cloud service?

2020-05-21 Thread Israel Brewster
On May 21, 2020, at 12:12 PM, Sándor Daku  wrote:
> Hi,
> 
> On Thu, 21 May 2020 at 18:14, Christopher Browne  > wrote:
> On Thu, 21 May 2020 at 11:53, Israel Brewster  > wrote:
>> 
> 
> - Table-based tenancy (e.g. - each table has a "tenant_id" and queries need 
> to specify the tenant)
> 
> The database/schema per tenant solution can be tedious when you want to 
> modify something on the structure and you have numerous tenants.
> Therefore I used the "tables with tenant_id" version in a similar situation 
> but with a slight twist. One of the biggest issue of this solution is that if 
> you forget to add the tenant_id to the where clause you are going to reveal 
> one tenant's data to another.
> I came up with the solution that the database user have no privileges for 
> accessing the base tables. Instead of that I generate views for each tenant 
> and they can access their own data in the underlying table through these 
> views. Now if forget to address the right tenant in my client code(it still 
> happens sometimes) and try to directly access the base tables I get a 
> strongly worded reminder from the server.

Nice solution! I think I may go to something like that once I upgrade to a 
cloud solution that lets me add multiple users to the DB (the free tier of 
Heroku does not). In the meantime, while I just have the single customer, I can 
fake it easily enough.

Is there any shortcuts for referencing the proper views, or do you just 
append/prepend something to every table reference in your SQL? One nice thing 
about the database/schema approach is that I can just specify the search_path 
(or database) in the connection command, and then all the table references 
remain the same for all tenants. Also helps avoid the situation you mentioned 
where you forget to address the right tenant, since you only have to do it in 
one place. Of course, as you said, it can be tedious when you want to modify 
the structure. I’ll have to think about that a bit more.

---
Israel Brewster
BrewsterSoft Development
http://www.brewstersoft.com 
Home of EZPunch and Lyrics Presenter
> 
> 
> Regards,
> Sándor
> 
>   
> 
>  



Re: Table partitioning for cloud service?

2020-05-21 Thread Sándor Daku
Hi,

On Thu, 21 May 2020 at 18:14, Christopher Browne  wrote:

> On Thu, 21 May 2020 at 11:53, Israel Brewster 
> wrote:
>
>>
>> - Table-based tenancy (e.g. - each table has a "tenant_id" and queries
> need to specify the tenant)
>

The database/schema per tenant solution can be tedious when you want to
modify something on the structure and you have numerous tenants.
Therefore I used the "tables with tenant_id" version in a similar situation
but with a slight twist. One of the biggest issue of this solution is that
if you forget to add the tenant_id to the where clause you are going to
reveal one tenant's data to another.
I came up with the solution that the database user have no privileges for
accessing the base tables. Instead of that I generate views for each tenant
and they can access their own data in the underlying table through these
views. Now if forget to address the right tenant in my client code(it still
happens sometimes) and try to directly access the base tables I get a
strongly worded reminder from the server.


Regards,
Sándor


Re: Table partitioning for cloud service?

2020-05-21 Thread Michael Lewis
On Thu, May 21, 2020 at 11:41 AM Adam Brusselback 
wrote:

> As an optimization I just worked on for my database earlier this week, I
> decided to logically replicate that table from my main authentication
> database into a each cluster, and I replaced all references to the FDW for
> read-only queries to use the logically replicated table. All write queries
> still hit the FDW as before.
>

Perhaps you considered this, but if you had not wanted to deal with the
administration side with replication, and the centralized data is changed
infrequently from only one application/source perhaps, then updating the
source and then refreshing a materialized view on each local db that pulls
in the foreign data could be a good option. Some chance of stale data since
the refresh must be triggered, but for some use cases it may be the
simplest setup.


Re: Table partitioning for cloud service?

2020-05-21 Thread Adam Brusselback
>  An interesting answer, if there needs to be shared data, is for the
shared data to go in its own database, and use a Foreign Data Wrapper to
have each tenants' database access it <
https://www.postgresql.org/docs/12/postgres-fdw.html>

For my application I went the schema-per-tenant route, but I have a need to
have a single login which will work for all tenants you've been given
access to. Not all tenants are required to be on the same database host, so
I broke that piece out into it's own database and used postgres fdw to make
it seem local to each tenant.

So i've got first hand experience with this for the past ~5 years, but this
approach has serious tradeoffs. Queries that need to access the remote
table can just fall on their face sometimes.  You will also need to deal
with practically every connection spawning 1-or-more new connections which
will stay open taking resources the first time a query is issued that
accesses foreign data.

As an optimization I just worked on for my database earlier this week, I
decided to logically replicate that table from my main authentication
database into a each cluster, and I replaced all references to the FDW for
read-only queries to use the logically replicated table. All write queries
still hit the FDW as before.

This was acceptable for my use case, and drastically improved performance
for some queries where I had previously had to use a CTE to force
materialization to get acceptable performance due to the nature of going
over the FDW for that data.

It's a very cool tool, just be careful about how it can impact performance
if you don't measure for your specific use case.


Re: Table partitioning for cloud service?

2020-05-21 Thread Israel Brewster


> On May 21, 2020, at 7:57 AM, Adrian Klaver  wrote:
> 
> On 5/21/20 8:53 AM, Israel Brewster wrote:
>>> On May 21, 2020, at 7:36 AM, Adrian Klaver >> > wrote:
>>> 
>>> On 5/21/20 8:29 AM, Israel Brewster wrote:
 I’m working on my first cloud service, which will be backed by a 
 postgresql database. Currently I only have a single customer, but of 
 course I want to design with the possibility of multiple customers in 
 mind. In that vein, I’m wondering what is “typical” in terms of designing 
 the DB structure to make sure that one customer doesn’t “accidentally" get 
 data for another customer? At the moment I am leaning towards giving each 
 customer their own set of tables, with a unique prefix for each. This 
 would provide a “hard” separation for the data,
>>>  I think that is called a schema:)
>> Ok. That’s probably an option. Although it looks from a cursory perusal that 
>> for that to work, I would have to have separate DB users with different 
>> permissions. Which would be fine, except that I don’t have permissions to 
>> create users.
>>> Or set up a separate database for each in the cluster.
>> Same as above - no permissions.
>> At the moment, I am running on Heroku, which gives me a postgresql database, 
>> but not full control over the cluster. I may need to move to something more 
>> robust, if having completely separate databases is the best option. I was 
>> hoping to avoid SysAdmin stuff as much as possible, and focus on the 
>> software side, but obviously some sys admin is required.
> 
> You can't use this?:
> 
> https://devcenter.heroku.com/articles/heroku-postgresql-credentials 
> 

Wasn’t aware of that. I *did* mention this is my first cloud project. Done 
plenty of DB/web/application development, but not cloud/multi-customer. Thanks 
for the pointer.

> 
>> ---
>> Israel Brewster
>> BrewsterSoft Development
>> http://www.brewstersoft.com
>> Home of EZPunch and Lyrics Presenter
>>> 
 but would also increase maintenance efforts, as if I needed to add a field 
 I would have to add it to every table. On the other hand, keeping 
 everything in the same set of tables would mean having to be VERY careful 
 with my SQL to make sure no customer could access another’s data.
 How is this typically done?
 ---
 Israel Brewster
 BrewsterSoft Development
 http://www.brewstersoft.com 
 Home of EZPunch and Lyrics Presenter
>>> 
>>> 
>>> -- 
>>> Adrian Klaver
>>> adrian.kla...@aklaver.com  
>>> >
>>> 
> 
> 
> -- 
> Adrian Klaver
> adrian.kla...@aklaver.com 


Re: Table partitioning for cloud service?

2020-05-21 Thread Christopher Browne
On Thu, 21 May 2020 at 11:53, Israel Brewster 
wrote:

> On May 21, 2020, at 7:36 AM, Adrian Klaver 
> wrote:
>
> On 5/21/20 8:29 AM, Israel Brewster wrote:
>
> I’m working on my first cloud service, which will be backed by a
> postgresql database. Currently I only have a single customer, but of course
> I want to design with the possibility of multiple customers in mind. In
> that vein, I’m wondering what is “typical” in terms of designing the DB
> structure to make sure that one customer doesn’t “accidentally" get data
> for another customer? At the moment I am leaning towards giving each
> customer their own set of tables, with a unique prefix for each. This would
> provide a “hard” separation for the data,
>
>  I think that is called a schema:)
>
>
> Ok. That’s probably an option. Although it looks from a cursory perusal
> that for that to work, I would have to have separate DB users with
> different permissions. Which would be fine, except that I don’t have
> permissions to create users.
>
> Or set up a separate database for each in the cluster.
>
>
> Same as above - no permissions.
>
> At the moment, I am running on Heroku, which gives me a postgresql
> database, but not full control over the cluster. I may need to move to
> something more robust, if having completely separate databases is the best
> option. I was hoping to avoid SysAdmin stuff as much as possible, and focus
> on the software side, but obviously some sys admin is required.
>

There's a whole lot of "that depends" to this.

If there is not much data shared across customers, then it's a pretty good
answer to create a database for each one.  This is especially good if they
are only occasionally connected.

If there is a LOT of shared data, then "life gets more complicated."

It's a decently well documented problem out there; I just searched for
"multitenant database design" which showed up a number of decent
(not-Postgres-specific) answers

https://www.google.com/search?client=firefox-b-d=multitenant+database+design

Some common patterns include:
- A database per tenant (on Postgres, that means that PGDATABASE and/or
connection URIs change for each tenant)
- A schema per tenant (on Postgres, that means each time a tenant is added,
you need "CREATE NAMESPACE" to establish the tenancy and "CREATE TABLE" for
each table in that tenancy, and connections use "set
search_path=tenantname;" to select data from the right tenant)
- Table-based tenancy (e.g. - each table has a "tenant_id" and queries need
to specify the tenant)

An interesting answer, if there needs to be shared data, is for the shared
data to go in its own database, and use a Foreign Data Wrapper to have each
tenants' database access it <
https://www.postgresql.org/docs/12/postgres-fdw.html>

There are lots of tradeoffs involved in each case; each of the above
patterns has merits and demerits particularly as the number of tenants
scales, as well as when you discover there are both tiny and large tenants
with differing requirements.

You need to look at it from various perspectives:
- How do application schema changes get handled as the application evolves?
- What are the security concerns about data sharing across tenants?
- What issues come up when managing storage across tenants?  (Some
approaches are easier to cope with than others)

If you don't have a fair bit of technical expertise locally, then
sophisticated choices will cause you problems that you won't be able to
solve.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: Table partitioning for cloud service?

2020-05-21 Thread Adrian Klaver

On 5/21/20 8:53 AM, Israel Brewster wrote:
On May 21, 2020, at 7:36 AM, Adrian Klaver > wrote:


On 5/21/20 8:29 AM, Israel Brewster wrote:
I’m working on my first cloud service, which will be backed by a 
postgresql database. Currently I only have a single customer, but of 
course I want to design with the possibility of multiple customers in 
mind. In that vein, I’m wondering what is “typical” in terms of 
designing the DB structure to make sure that one customer doesn’t 
“accidentally" get data for another customer? At the moment I am 
leaning towards giving each customer their own set of tables, with a 
unique prefix for each. This would provide a “hard” separation for 
the data,

 I think that is called a schema:)


Ok. That’s probably an option. Although it looks from a cursory perusal 
that for that to work, I would have to have separate DB users with 
different permissions. Which would be fine, except that I don’t have 
permissions to create users.



Or set up a separate database for each in the cluster.


Same as above - no permissions.

At the moment, I am running on Heroku, which gives me a postgresql 
database, but not full control over the cluster. I may need to move to 
something more robust, if having completely separate databases is the 
best option. I was hoping to avoid SysAdmin stuff as much as possible, 
and focus on the software side, but obviously some sys admin is required.


You can't use this?:

https://devcenter.heroku.com/articles/heroku-postgresql-credentials



---
Israel Brewster
BrewsterSoft Development
http://www.brewstersoft.com
Home of EZPunch and Lyrics Presenter



but would also increase maintenance efforts, as if I needed to add a 
field I would have to add it to every table. On the other hand, 
keeping everything in the same set of tables would mean having to be 
VERY careful with my SQL to make sure no customer could access 
another’s data.

How is this typically done?
---
Israel Brewster
BrewsterSoft Development
http://www.brewstersoft.com 
Home of EZPunch and Lyrics Presenter



--
Adrian Klaver
adrian.kla...@aklaver.com 






--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Table partitioning for cloud service?

2020-05-21 Thread Israel Brewster
> On May 21, 2020, at 7:36 AM, Adrian Klaver  wrote:
> 
> On 5/21/20 8:29 AM, Israel Brewster wrote:
>> I’m working on my first cloud service, which will be backed by a postgresql 
>> database. Currently I only have a single customer, but of course I want to 
>> design with the possibility of multiple customers in mind. In that vein, I’m 
>> wondering what is “typical” in terms of designing the DB structure to make 
>> sure that one customer doesn’t “accidentally" get data for another customer? 
>> At the moment I am leaning towards giving each customer their own set of 
>> tables, with a unique prefix for each. This would provide a “hard” 
>> separation for the data, 
>  I think that is called a schema:)

Ok. That’s probably an option. Although it looks from a cursory perusal that 
for that to work, I would have to have separate DB users with different 
permissions. Which would be fine, except that I don’t have permissions to 
create users.

> Or set up a separate database for each in the cluster.

Same as above - no permissions.

At the moment, I am running on Heroku, which gives me a postgresql database, 
but not full control over the cluster. I may need to move to something more 
robust, if having completely separate databases is the best option. I was 
hoping to avoid SysAdmin stuff as much as possible, and focus on the software 
side, but obviously some sys admin is required.

---
Israel Brewster
BrewsterSoft Development
http://www.brewstersoft.com 
Home of EZPunch and Lyrics Presenter

> 
>> but would also increase maintenance efforts, as if I needed to add a field I 
>> would have to add it to every table. On the other hand, keeping everything 
>> in the same set of tables would mean having to be VERY careful with my SQL 
>> to make sure no customer could access another’s data.
>> How is this typically done?
>> ---
>> Israel Brewster
>> BrewsterSoft Development
>> http://www.brewstersoft.com 
>> Home of EZPunch and Lyrics Presenter
> 
> 
> -- 
> Adrian Klaver
> adrian.kla...@aklaver.com
> 



Re: Table partitioning for cloud service?

2020-05-21 Thread Adrian Klaver

On 5/21/20 8:29 AM, Israel Brewster wrote:
I’m working on my first cloud service, which will be backed by a 
postgresql database. Currently I only have a single customer, but of 
course I want to design with the possibility of multiple customers in 
mind. In that vein, I’m wondering what is “typical” in terms of 
designing the DB structure to make sure that one customer doesn’t 
“accidentally" get data for another customer? At the moment I am leaning 
towards giving each customer their own set of tables, with a unique 
prefix for each. This would provide a “hard” separation for the data, 

 I think that is called a schema:)
Or set up a separate database for each in the cluster.

but would also increase maintenance efforts, as if I needed to add a 
field I would have to add it to every table. On the other hand, keeping 
everything in the same set of tables would mean having to be VERY 
careful with my SQL to make sure no customer could access another’s data.


How is this typically done?

---
Israel Brewster
BrewsterSoft Development
http://www.brewstersoft.com 
Home of EZPunch and Lyrics Presenter






--
Adrian Klaver
adrian.kla...@aklaver.com




Table partitioning for cloud service?

2020-05-21 Thread Israel Brewster
I’m working on my first cloud service, which will be backed by a postgresql 
database. Currently I only have a single customer, but of course I want to 
design with the possibility of multiple customers in mind. In that vein, I’m 
wondering what is “typical” in terms of designing the DB structure to make sure 
that one customer doesn’t “accidentally" get data for another customer? At the 
moment I am leaning towards giving each customer their own set of tables, with 
a unique prefix for each. This would provide a “hard” separation for the data, 
but would also increase maintenance efforts, as if I needed to add a field I 
would have to add it to every table. On the other hand, keeping everything in 
the same set of tables would mean having to be VERY careful with my SQL to make 
sure no customer could access another’s data.

How is this typically done?

---
Israel Brewster
BrewsterSoft Development
http://www.brewstersoft.com 
Home of EZPunch and Lyrics Presenter