Re: Logical Replication Custom Column Expression

2023-02-06 Thread Stavros Koureas
>> And yes, probably you need to change the way you reply to email on
>> this list. Top-posting is generally avoided. See
>> https://wiki.postgresql.org/wiki/Mailing_Lists.

>Thanks for bringing this into the discussion :)

Thinking these days more about this topic, subscriber name is not a bad
idea, although it makes sense to be able to give your own value even on
subscriber level, for example an integer.
Having a custom integer value is better as definitely this integer will
participate later in all joins beside the tables and for sure joining with
an integer it would be quicker rather than joining on a character varying
(plus the rest of the columns).
In addition, discussing with other people and also on Stack
Overflow/DBAExchange I have found that other people think it is a great
enhancement for analytical purposes.

Στις Τετ 30 Νοε 2022 στις 10:39 π.μ., ο/η Stavros Koureas <
koureasstav...@gmail.com> έγραψε:

>
>
> Στις Τρί 29 Νοε 2022 στις 3:27 μ.μ., ο/η Ashutosh Bapat <
> ashutosh.bapat@gmail.com> έγραψε:
> > That would be too restrictive - not necessarily in your application
> > but generally. There could be some tables where consolidating rows
> > with same PK from different publishers into a single row in subscriber
> > would be desirable. I think we need to enable the property for every
> > subscriber that intends to add publisher column to the desired and
> > subscribed tables. But there should be another option per table which
> > will indicate that receiver should add publisher when INSERTING row to
> > that table.
>
> So we are discussing the scope level of this property, if this property
> will be implemented on subscriber level or on subscriber table.
> In that case I am not sure how this will be implemented as currently
> postgres subscribers can have multiple tables streamed from a single
> publisher.
> In that case we may have an additional syntax on subscriber, for example:
>
> CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost port=5432
> user=postgres password=XX dbname=publisher1' PUBLICATION pub1 with
> (enabled = false, create_slot = false, slot_name = NONE, tables =
> {tableA:union, tableB:none, });
>
> Something like this?
>
> > And yes, probably you need to change the way you reply to email on
> > this list. Top-posting is generally avoided. See
> > https://wiki.postgresql.org/wiki/Mailing_Lists.
>
> Thanks for bringing this into the discussion :)
>


Re: Logical Replication Custom Column Expression

2022-12-02 Thread Ashutosh Bapat
On Wed, Nov 30, 2022 at 2:09 PM Stavros Koureas
 wrote:
>
>
>
> Στις Τρί 29 Νοε 2022 στις 3:27 μ.μ., ο/η Ashutosh Bapat 
>  έγραψε:
> > That would be too restrictive - not necessarily in your application
> > but generally. There could be some tables where consolidating rows
> > with same PK from different publishers into a single row in subscriber
> > would be desirable. I think we need to enable the property for every
> > subscriber that intends to add publisher column to the desired and
> > subscribed tables. But there should be another option per table which
> > will indicate that receiver should add publisher when INSERTING row to
> > that table.
>
> So we are discussing the scope level of this property, if this property will 
> be implemented on subscriber level or on subscriber table.
> In that case I am not sure how this will be implemented as currently postgres 
> subscribers can have multiple tables streamed from a single publisher.
> In that case we may have an additional syntax on subscriber, for example:
>
> CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost port=5432 user=postgres 
> password=XX dbname=publisher1' PUBLICATION pub1 with (enabled = false, 
> create_slot = false, slot_name = NONE, tables = {tableA:union, tableB:none, 
> });
>
> Something like this?

Nope, I think we will need to add a table level property through table
options or receiver can infer it by looking at the table columns -
e.g. existence of origin_id column or some such thing.


-- 
Best Wishes,
Ashutosh Bapat




Re: Logical Replication Custom Column Expression

2022-11-30 Thread Stavros Koureas
Στις Τρί 29 Νοε 2022 στις 3:27 μ.μ., ο/η Ashutosh Bapat <
ashutosh.bapat@gmail.com> έγραψε:
> That would be too restrictive - not necessarily in your application
> but generally. There could be some tables where consolidating rows
> with same PK from different publishers into a single row in subscriber
> would be desirable. I think we need to enable the property for every
> subscriber that intends to add publisher column to the desired and
> subscribed tables. But there should be another option per table which
> will indicate that receiver should add publisher when INSERTING row to
> that table.

So we are discussing the scope level of this property, if this property
will be implemented on subscriber level or on subscriber table.
In that case I am not sure how this will be implemented as currently
postgres subscribers can have multiple tables streamed from a single
publisher.
In that case we may have an additional syntax on subscriber, for example:

CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost port=5432 user=postgres
password=XX dbname=publisher1' PUBLICATION pub1 with (enabled = false,
create_slot = false, slot_name = NONE, tables = {tableA:union, tableB:none,
});

Something like this?

> And yes, probably you need to change the way you reply to email on
> this list. Top-posting is generally avoided. See
> https://wiki.postgresql.org/wiki/Mailing_Lists.

Thanks for bringing this into the discussion :)


Re: Logical Replication Custom Column Expression

2022-11-29 Thread Ashutosh Bapat
On Fri, Nov 25, 2022 at 4:13 PM Stavros Koureas
 wrote:
>
> Yes, if the property is on the subscription side then it should be applied 
> for all the tables that the connected publication is exposing.
> So if the property is enabled you should be sure that this origin column 
> exists to all of the tables that the publication is exposing...
>

That would be too restrictive - not necessarily in your application
but generally. There could be some tables where consolidating rows
with same PK from different publishers into a single row in subscriber
would be desirable. I think we need to enable the property for every
subscriber that intends to add publisher column to the desired and
subscribed tables. But there should be another option per table which
will indicate that receiver should add publisher when INSERTING row to
that table.


> Sure this is the complete idea, that the subscriber should match the PK of 
> origin, 
> As the subscription table will contain same key values from different 
> origins, for example:
>

And yes, probably you need to change the way you reply to email on
this list. Top-posting is generally avoided. See
https://wiki.postgresql.org/wiki/Mailing_Lists.

-- 
Best Wishes,
Ashutosh Bapat




Re: Logical Replication Custom Column Expression

2022-11-28 Thread Stavros Koureas
Sure I understand and neither do I have good knowledge of what else could
be influenced by such a change.
If the value of the column is the subscriber name has no benefit to this
idea of merging multiple upstreams with same primary keys, later you
describe the "connection dbname", yes this could be a possibility.
I do not fully understand that part "how will the initial tablesync COPY
efficiently assign these subscriber name column values?"
Why is difficult that during the initial sync put everywhere the same value
for all rows of the same origin?

Στις Δευ 28 Νοε 2022 στις 10:16 π.μ., ο/η Peter Smith 
έγραψε:

> On Fri, Nov 25, 2022 at 9:43 PM Stavros Koureas
>  wrote:
> >
> > Yes, if the property is on the subscription side then it should be
> applied for all the tables that the connected publication is exposing.
> > So if the property is enabled you should be sure that this origin column
> exists to all of the tables that the publication is exposing...
> >
> > Sure this is the complete idea, that the subscriber should match the PK
> of origin, 
> > As the subscription table will contain same key values from different
> origins, for example:
> >
> > For publisher1 database table
> > id pk integer | value character varying
> > 1   | testA1
> > 2   | testA2
> >
> > For publisher2 database table
> > id pk integer | value character varying
> > 1   | testB1
> > 2   | testB2
> >
> > For subscriber database table
> > origin pk character varying | id pk integer | value character varying
> > publisher1   | 1   | testA1
> > publisher1   | 2   | testA2
> > publisher2   | 1   | testB1
> > publisher2   | 2   | testB2
> >
> > All statements INSERT, UPDATE, DELETE should always include the
> predicate of the origin.
> >
>
> This sounds similar to what I had posted [1] although I was saying the
> generated column value might be the *subscriber* name, not the origin
> publisher name. (where are you getting that value from -- somehow from
> the subscriptions' CONNECTION dbname?)
>
> Anyway, regardless of the details, please note -- my idea was really
> intended just as a discussion starting point to demonstrate that
> required functionality might be achieved using a simpler syntax than
> what had been previously suggested. But in practice there may be some
> problems with this approach -- e.g. how will the initial tablesync
> COPY efficiently assign these subscriber name column values?
>
> --
> [1]
> https://www.postgresql.org/message-id/CAHut%2BPuZowXd7Aa7t0nqjP6afHMwJarngzeMq%2BQP0vE2KKLOgQ%40mail.gmail.com
>
> Kind Regards,
> Peter Smith.
> Fujitsu Australia.
>


Re: Logical Replication Custom Column Expression

2022-11-28 Thread Peter Smith
On Fri, Nov 25, 2022 at 9:43 PM Stavros Koureas
 wrote:
>
> Yes, if the property is on the subscription side then it should be applied 
> for all the tables that the connected publication is exposing.
> So if the property is enabled you should be sure that this origin column 
> exists to all of the tables that the publication is exposing...
>
> Sure this is the complete idea, that the subscriber should match the PK of 
> origin, 
> As the subscription table will contain same key values from different 
> origins, for example:
>
> For publisher1 database table
> id pk integer | value character varying
> 1   | testA1
> 2   | testA2
>
> For publisher2 database table
> id pk integer | value character varying
> 1   | testB1
> 2   | testB2
>
> For subscriber database table
> origin pk character varying | id pk integer | value character varying
> publisher1   | 1   | testA1
> publisher1   | 2   | testA2
> publisher2   | 1   | testB1
> publisher2   | 2   | testB2
>
> All statements INSERT, UPDATE, DELETE should always include the predicate of 
> the origin.
>

This sounds similar to what I had posted [1] although I was saying the
generated column value might be the *subscriber* name, not the origin
publisher name. (where are you getting that value from -- somehow from
the subscriptions' CONNECTION dbname?)

Anyway, regardless of the details, please note -- my idea was really
intended just as a discussion starting point to demonstrate that
required functionality might be achieved using a simpler syntax than
what had been previously suggested. But in practice there may be some
problems with this approach -- e.g. how will the initial tablesync
COPY efficiently assign these subscriber name column values?

--
[1] 
https://www.postgresql.org/message-id/CAHut%2BPuZowXd7Aa7t0nqjP6afHMwJarngzeMq%2BQP0vE2KKLOgQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: Logical Replication Custom Column Expression

2022-11-25 Thread Stavros Koureas
Yes, if the property is on the subscription side then it should be applied
for all the tables that the connected publication is exposing.
So if the property is enabled you should be sure that this origin column
exists to all of the tables that the publication is exposing...

Sure this is the complete idea, that the subscriber should match the PK of
origin, 
As the subscription table will contain same key values from different
origins, for example:

*For publisher1 database **table*
id pk integer | value character varying
1   | testA1
2   | testA2

*For publisher2 database **table*
id pk integer | value character varying
1   | testB1
2   | testB2

*For subscriber database table*
origin *pk *character varying | id *pk *integer | value character varying
publisher1   | 1   | testA1
publisher1   | 2   | testA2
publisher2   | 1   | testB1
publisher2   | 2   | testB2

All statements INSERT, UPDATE, DELETE should always include the predicate
of the origin.

Στις Παρ 25 Νοε 2022 στις 12:21 μ.μ., ο/η Ashutosh Bapat <
ashutosh.bapat@gmail.com> έγραψε:

> On Wed, Nov 23, 2022 at 4:54 AM Peter Smith  wrote:
> >
> > On Wed, Nov 23, 2022 at 7:38 AM Stavros Koureas
> >  wrote:
> > >
> > > Reading more carefully what you described, I think you are interested
> in getting something you call origin from publishers, probably some
> metadata from the publications.
> > >
> > > This identifier in those metadata maybe does not have business value
> on the reporting side. The idea is to use a value which has specific
> meaning to the user at the end.
> > >
> > > For example assigning 1 for tenant 1, 2 for tenant 2 and so one, at
> the end based on a dimension table which holds this mapping the user would
> be able to filter the data. So programmatically the user can set the id
> value of the column plus creating the mapping table from an application
> let’s say and be able to distinguish the data.
> > >
> > > In addition this column should have the ability to be part of the
> primary key on the subscription table in order to not conflict with lines
> from other tenants having the same keys.
> > >
> > >
> >
> > I was wondering if a simpler syntax solution might also work here.
> >
> > Imagine another SUBSCRIPTION parameter that indicates to write the
> > *name* of the subscription to some pre-defined table column:
> > e.g. CREATE SUBSCRIPTION subname FOR PUBLICATION pub_tenant_1
> > CONNECTION '...' WITH (subscription_column);
> >
> > Logical Replication already allows the subscriber table to have extra
> > columns, so you just need to manually create the extra 'subscription'
> > column up-front.
> >
> > Then...
> >
> > ~~
> >
> > On Publisher:
> >
> > test_pub=# CREATE TABLE tab(id int primary key, description varchar);
> > CREATE TABLE
> >
> > test_pub=# INSERT INTO tab VALUES (1,'one'),(2,'two'),(3,'three');
> > INSERT 0 3
> >
> > test_pub=# CREATE PUBLICATION tenant1 FOR ALL TABLES;
> > CREATE PUBLICATION
> >
> > ~~
> >
> > On Subscriber:
> >
> > test_sub=# CREATE TABLE tab(id int, description varchar, subscription
> varchar);
> > CREATE TABLE
> >
> > test_sub=# CREATE SUBSCRIPTION sub_tenant1 CONNECTION 'host=localhost
> > dbname=test_pub' PUBLICATION tenant1 WITH (subscription_column);
> > CREATE SUBSCRIPTION
> >
> > test_sub=# SELECT * FROM tab;
> >  id | description | subscription
> > +-+--
> >   1 | one | sub_tenant1
> >   2 | two | sub_tenant1
> >   3 | three   | sub_tenant1
> > (3 rows)
> >
> > ~~
> >
> Thanks for the example. This is more concrete than just verbal description.
>
> In this example, do all the tables that a subscription subscribes to
> need that additional column or somehow the pglogical receiver will
> figure out which tables have that column and populate rows
> accordingly?
>
> My further fear is that the subscriber will also need to match the
> subscription column along with the rest of PK so as not to update rows
> from other subscriptions.
> --
> Best Wishes,
> Ashutosh Bapat
>


Re: Logical Replication Custom Column Expression

2022-11-25 Thread Ashutosh Bapat
On Wed, Nov 23, 2022 at 4:54 AM Peter Smith  wrote:
>
> On Wed, Nov 23, 2022 at 7:38 AM Stavros Koureas
>  wrote:
> >
> > Reading more carefully what you described, I think you are interested in 
> > getting something you call origin from publishers, probably some metadata 
> > from the publications.
> >
> > This identifier in those metadata maybe does not have business value on the 
> > reporting side. The idea is to use a value which has specific meaning to 
> > the user at the end.
> >
> > For example assigning 1 for tenant 1, 2 for tenant 2 and so one, at the end 
> > based on a dimension table which holds this mapping the user would be able 
> > to filter the data. So programmatically the user can set the id value of 
> > the column plus creating the mapping table from an application let’s say 
> > and be able to distinguish the data.
> >
> > In addition this column should have the ability to be part of the primary 
> > key on the subscription table in order to not conflict with lines from 
> > other tenants having the same keys.
> >
> >
>
> I was wondering if a simpler syntax solution might also work here.
>
> Imagine another SUBSCRIPTION parameter that indicates to write the
> *name* of the subscription to some pre-defined table column:
> e.g. CREATE SUBSCRIPTION subname FOR PUBLICATION pub_tenant_1
> CONNECTION '...' WITH (subscription_column);
>
> Logical Replication already allows the subscriber table to have extra
> columns, so you just need to manually create the extra 'subscription'
> column up-front.
>
> Then...
>
> ~~
>
> On Publisher:
>
> test_pub=# CREATE TABLE tab(id int primary key, description varchar);
> CREATE TABLE
>
> test_pub=# INSERT INTO tab VALUES (1,'one'),(2,'two'),(3,'three');
> INSERT 0 3
>
> test_pub=# CREATE PUBLICATION tenant1 FOR ALL TABLES;
> CREATE PUBLICATION
>
> ~~
>
> On Subscriber:
>
> test_sub=# CREATE TABLE tab(id int, description varchar, subscription 
> varchar);
> CREATE TABLE
>
> test_sub=# CREATE SUBSCRIPTION sub_tenant1 CONNECTION 'host=localhost
> dbname=test_pub' PUBLICATION tenant1 WITH (subscription_column);
> CREATE SUBSCRIPTION
>
> test_sub=# SELECT * FROM tab;
>  id | description | subscription
> +-+--
>   1 | one | sub_tenant1
>   2 | two | sub_tenant1
>   3 | three   | sub_tenant1
> (3 rows)
>
> ~~
>
Thanks for the example. This is more concrete than just verbal description.

In this example, do all the tables that a subscription subscribes to
need that additional column or somehow the pglogical receiver will
figure out which tables have that column and populate rows
accordingly?

My further fear is that the subscriber will also need to match the
subscription column along with the rest of PK so as not to update rows
from other subscriptions.
-- 
Best Wishes,
Ashutosh Bapat




Re: Logical Replication Custom Column Expression

2022-11-23 Thread Stavros Koureas
Just one correction for the subscriber
On Subscriber:

test_sub=# CREATE TABLE tab(id int *pkey*, description varchar,
subscription varchar *pkey*);
CREATE TABLE

The subscription table should have the same primary key columns as the
publisher plus one more.
We need to make sure that on update only the same origin data is
being updated.

Στις Τετ 23 Νοε 2022 στις 1:24 π.μ., ο/η Peter Smith 
έγραψε:

> On Wed, Nov 23, 2022 at 7:38 AM Stavros Koureas
>  wrote:
> >
> > Reading more carefully what you described, I think you are interested in
> getting something you call origin from publishers, probably some metadata
> from the publications.
> >
> > This identifier in those metadata maybe does not have business value on
> the reporting side. The idea is to use a value which has specific meaning
> to the user at the end.
> >
> > For example assigning 1 for tenant 1, 2 for tenant 2 and so one, at the
> end based on a dimension table which holds this mapping the user would be
> able to filter the data. So programmatically the user can set the id value
> of the column plus creating the mapping table from an application let’s say
> and be able to distinguish the data.
> >
> > In addition this column should have the ability to be part of the
> primary key on the subscription table in order to not conflict with lines
> from other tenants having the same keys.
> >
> >
>
> I was wondering if a simpler syntax solution might also work here.
>
> Imagine another SUBSCRIPTION parameter that indicates to write the
> *name* of the subscription to some pre-defined table column:
> e.g. CREATE SUBSCRIPTION subname FOR PUBLICATION pub_tenant_1
> CONNECTION '...' WITH (subscription_column);
>
> Logical Replication already allows the subscriber table to have extra
> columns, so you just need to manually create the extra 'subscription'
> column up-front.
>
> Then...
>
> ~~
>
> On Publisher:
>
> test_pub=# CREATE TABLE tab(id int primary key, description varchar);
> CREATE TABLE
>
> test_pub=# INSERT INTO tab VALUES (1,'one'),(2,'two'),(3,'three');
> INSERT 0 3
>
> test_pub=# CREATE PUBLICATION tenant1 FOR ALL TABLES;
> CREATE PUBLICATION
>
> ~~
>
> On Subscriber:
>
> test_sub=# CREATE TABLE tab(id int, description varchar, subscription
> varchar);
> CREATE TABLE
>
> test_sub=# CREATE SUBSCRIPTION sub_tenant1 CONNECTION 'host=localhost
> dbname=test_pub' PUBLICATION tenant1 WITH (subscription_column);
> CREATE SUBSCRIPTION
>
> test_sub=# SELECT * FROM tab;
>  id | description | subscription
> +-+--
>   1 | one | sub_tenant1
>   2 | two | sub_tenant1
>   3 | three   | sub_tenant1
> (3 rows)
>
> ~~
>
> Subscriptions to different tenants would be named differently.
>
> And using other SQL you can map/filter those names however your
> application wants.
>
> --
> Kind Regards,
> Peter Smith.
> Fujitsu Australia
>


Re: Logical Replication Custom Column Expression

2022-11-23 Thread Stavros Koureas
It's easy to answer this question.

Imagine that in a software company who sells the product and also offers
reporting solutions, the ERP tables will not have this additional column to
all the tables.
Now the reporting department comes and needs to consolidate all that data
from different databases (publishers) and create one multitenant database
to have all the data.
So in an ERP like NAV or anything else you cannot suggest change all the
code to all of the tables plus all functions to add one additional column
to this table, even that was possible then you cannot work with integers
but you need to work with GUIDs as this column should be predefined to each
ERP. Then joining with GUID in the second phase for reporting
definitely will slow down the performance.

In summary:

   1. Cannot touch the underlying source (important)
   2. GUID identifier column will slow down the reporting performance


Στις Τετ 23 Νοε 2022 στις 5:19 π.μ., ο/η Amit Kapila <
amit.kapil...@gmail.com> έγραψε:

> On Wed, Nov 23, 2022 at 1:40 AM Stavros Koureas
>  wrote:
> >
> > Reading more carefully what you described, I think you are interested in
> getting something you call origin from publishers, probably some metadata
> from the publications.
> >
> > This identifier in those metadata maybe does not have business value on
> the reporting side. The idea is to use a value which has specific meaning
> to the user at the end.
> >
> > For example assigning 1 for tenant 1, 2 for tenant 2 and so one, at the
> end based on a dimension table which holds this mapping the user would be
> able to filter the data. So programmatically the user can set the id value
> of the column plus creating the mapping table from an application let’s say
> and be able to distinguish the data.
> >
>
> In your example, are different tenants represent different publisher
> nodes? If so, why can't we have a predefined column and value for the
> required tables on each publisher rather than logical replication
> generate that value while replicating data?
>
> --
> With Regards,
> Amit Kapila.
>


Re: Logical Replication Custom Column Expression

2022-11-22 Thread Amit Kapila
On Wed, Nov 23, 2022 at 1:40 AM Stavros Koureas
 wrote:
>
> Reading more carefully what you described, I think you are interested in 
> getting something you call origin from publishers, probably some metadata 
> from the publications.
>
> This identifier in those metadata maybe does not have business value on the 
> reporting side. The idea is to use a value which has specific meaning to the 
> user at the end.
>
> For example assigning 1 for tenant 1, 2 for tenant 2 and so one, at the end 
> based on a dimension table which holds this mapping the user would be able to 
> filter the data. So programmatically the user can set the id value of the 
> column plus creating the mapping table from an application let’s say and be 
> able to distinguish the data.
>

In your example, are different tenants represent different publisher
nodes? If so, why can't we have a predefined column and value for the
required tables on each publisher rather than logical replication
generate that value while replicating data?

-- 
With Regards,
Amit Kapila.




Re: Logical Replication Custom Column Expression

2022-11-22 Thread Amit Kapila
On Tue, Nov 22, 2022 at 6:22 PM Stavros Koureas
 wrote:
>
> Sure, this can be implemented as a subscription option, and it will cover 
> this use case scenario as each subscriber points only to one database.
> I also have some more analytical/reporting use-cases which need additions in 
> logical-replication, I am not sure if I need to open different discussions 
> for each one, all ideas are for publication/subscription.
>

I think to some extent it depends on how unique each idea is but
initially you may want to post here and then we can spin off different
threads for a discussion if required. Are you interested in working on
one or more of those ideas to make them reality or do you want others
to pick up based on their interest?

-- 
With Regards,
Amit Kapila.




Re: Logical Replication Custom Column Expression

2022-11-22 Thread Peter Smith
On Wed, Nov 23, 2022 at 7:38 AM Stavros Koureas
 wrote:
>
> Reading more carefully what you described, I think you are interested in 
> getting something you call origin from publishers, probably some metadata 
> from the publications.
>
> This identifier in those metadata maybe does not have business value on the 
> reporting side. The idea is to use a value which has specific meaning to the 
> user at the end.
>
> For example assigning 1 for tenant 1, 2 for tenant 2 and so one, at the end 
> based on a dimension table which holds this mapping the user would be able to 
> filter the data. So programmatically the user can set the id value of the 
> column plus creating the mapping table from an application let’s say and be 
> able to distinguish the data.
>
> In addition this column should have the ability to be part of the primary key 
> on the subscription table in order to not conflict with lines from other 
> tenants having the same keys.
>
>

I was wondering if a simpler syntax solution might also work here.

Imagine another SUBSCRIPTION parameter that indicates to write the
*name* of the subscription to some pre-defined table column:
e.g. CREATE SUBSCRIPTION subname FOR PUBLICATION pub_tenant_1
CONNECTION '...' WITH (subscription_column);

Logical Replication already allows the subscriber table to have extra
columns, so you just need to manually create the extra 'subscription'
column up-front.

Then...

~~

On Publisher:

test_pub=# CREATE TABLE tab(id int primary key, description varchar);
CREATE TABLE

test_pub=# INSERT INTO tab VALUES (1,'one'),(2,'two'),(3,'three');
INSERT 0 3

test_pub=# CREATE PUBLICATION tenant1 FOR ALL TABLES;
CREATE PUBLICATION

~~

On Subscriber:

test_sub=# CREATE TABLE tab(id int, description varchar, subscription varchar);
CREATE TABLE

test_sub=# CREATE SUBSCRIPTION sub_tenant1 CONNECTION 'host=localhost
dbname=test_pub' PUBLICATION tenant1 WITH (subscription_column);
CREATE SUBSCRIPTION

test_sub=# SELECT * FROM tab;
 id | description | subscription
+-+--
  1 | one | sub_tenant1
  2 | two | sub_tenant1
  3 | three   | sub_tenant1
(3 rows)

~~

Subscriptions to different tenants would be named differently.

And using other SQL you can map/filter those names however your
application wants.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Logical Replication Custom Column Expression

2022-11-22 Thread Stavros Koureas
Reading more carefully what you described, I think you are interested in 
getting something you call origin from publishers, probably some metadata from 
the publications.

This identifier in those metadata maybe does not have business value on the 
reporting side. The idea is to use a value which has specific meaning to the 
user at the end.

For example assigning 1 for tenant 1, 2 for tenant 2 and so one, at the end 
based on a dimension table which holds this mapping the user would be able to 
filter the data. So programmatically the user can set the id value of the 
column plus creating the mapping table from an application let’s say and be 
able to distinguish the data.

In addition this column should have the ability to be part of the primary key 
on the subscription table in order to not conflict with lines from other 
tenants having the same keys.

> 
> 22 Νοε 2022, 14:52, ο χρήστης «Stavros Koureas » 
> έγραψε:
> 
> 
> Sure, this can be implemented as a subscription option, and it will cover 
> this use case scenario as each subscriber points only to one database.
> I also have some more analytical/reporting use-cases which need additions in 
> logical-replication, I am not sure if I need to open different discussions 
> for each one, all ideas are for publication/subscription.
> 
> Στις Τρί 22 Νοε 2022 στις 2:22 μ.μ., ο/η Amit Kapila 
>  έγραψε:
>> On Mon, Nov 21, 2022 at 5:05 PM Ashutosh Bapat
>>  wrote:
>> >
>> > On Sat, Nov 19, 2022 at 6:47 PM Stavros Koureas
>> >  wrote:
>> > >
>> > > What does not support is the option for defining custom column 
>> > > expressions, as keys or values, into the upstream (publication). This 
>> > > will give more flexibility into making replication from multiple 
>> > > upstreams into less downstreams adding more logic. For instance, in a 
>> > > project for analytical purposes there is the need to consolidate data 
>> > > from multiple databases into one and at the same time keep the origin of 
>> > > each replicated data identified by a tenanant_id column. In this case we 
>> > > also need the ability to define the new column as an additional key 
>> > > which will participate into the destination table.
>> > >
>> > > Tenant 1 table
>> > > id serial pk
>> > > description varchar
>> > >
>> > > Tenant 2 table
>> > > id integer pk
>> > > description varchar
>> > >
>> > > Group table
>> > > tenant integer pk
>> > > id integer pk
>> > > description varchar
>> > >
>> > > Possible syntax to archive that
>> > > CREATE PUBLICATION pb_test FOR TABLE test ({value:datatype:iskey:alias} 
>> > > ,"id", "name")
>> > >
>> > > Example
>> > > CREATE PUBLICATION pb_test FOR TABLE test ({1:integer:true:tenant} 
>> > > ,"id", "name")
>> >
>> > I think that's a valid usecase.
>> >
>> > This looks more like a subscription option to me. In multi-subscriber
>> > multi-publisher scenarios, on one subscriber a given upstream may be
>> > tenant 1 but on some other it could be 2. But I don't think we allow
>> > specifying subscription options for a single table. AFAIU, the origin
>> > ids are available as part of the commit record which contained this
>> > change; that's how conflict resolution is supposed to know it. So
>> > somehow the subscriber will need to fetch those from there and set the
>> > tenant.
>> >
>> 
>> Yeah, to me also it appears that we can handle it on the subscriber
>> side. We have the provision of sending origin information in proto.c.
>> But note that by default publishers won't have any origin associated
>> with change unless someone has defined it. I think this work needs
>> more thought but sounds to be an interesting feature.
>> 
>> -- 
>> With Regards,
>> Amit Kapila.


Re: Logical Replication Custom Column Expression

2022-11-22 Thread Stavros Koureas
Sure, this can be implemented as a subscription option, and it will cover
this use case scenario as each subscriber points only to one database.
I also have some more analytical/reporting use-cases which need additions
in logical-replication, I am not sure if I need to open
different discussions for each one, all ideas are for
publication/subscription.

Στις Τρί 22 Νοε 2022 στις 2:22 μ.μ., ο/η Amit Kapila <
amit.kapil...@gmail.com> έγραψε:

> On Mon, Nov 21, 2022 at 5:05 PM Ashutosh Bapat
>  wrote:
> >
> > On Sat, Nov 19, 2022 at 6:47 PM Stavros Koureas
> >  wrote:
> > >
> > > What does not support is the option for defining custom column
> expressions, as keys or values, into the upstream (publication). This will
> give more flexibility into making replication from multiple upstreams into
> less downstreams adding more logic. For instance, in a project for
> analytical purposes there is the need to consolidate data from multiple
> databases into one and at the same time keep the origin of each replicated
> data identified by a tenanant_id column. In this case we also need the
> ability to define the new column as an additional key which will
> participate into the destination table.
> > >
> > > Tenant 1 table
> > > id serial pk
> > > description varchar
> > >
> > > Tenant 2 table
> > > id integer pk
> > > description varchar
> > >
> > > Group table
> > > tenant integer pk
> > > id integer pk
> > > description varchar
> > >
> > > Possible syntax to archive that
> > > CREATE PUBLICATION pb_test FOR TABLE test
> ({value:datatype:iskey:alias} ,"id", "name")
> > >
> > > Example
> > > CREATE PUBLICATION pb_test FOR TABLE test ({1:integer:true:tenant}
> ,"id", "name")
> >
> > I think that's a valid usecase.
> >
> > This looks more like a subscription option to me. In multi-subscriber
> > multi-publisher scenarios, on one subscriber a given upstream may be
> > tenant 1 but on some other it could be 2. But I don't think we allow
> > specifying subscription options for a single table. AFAIU, the origin
> > ids are available as part of the commit record which contained this
> > change; that's how conflict resolution is supposed to know it. So
> > somehow the subscriber will need to fetch those from there and set the
> > tenant.
> >
>
> Yeah, to me also it appears that we can handle it on the subscriber
> side. We have the provision of sending origin information in proto.c.
> But note that by default publishers won't have any origin associated
> with change unless someone has defined it. I think this work needs
> more thought but sounds to be an interesting feature.
>
> --
> With Regards,
> Amit Kapila.
>


Re: Logical Replication Custom Column Expression

2022-11-22 Thread Amit Kapila
On Mon, Nov 21, 2022 at 5:05 PM Ashutosh Bapat
 wrote:
>
> On Sat, Nov 19, 2022 at 6:47 PM Stavros Koureas
>  wrote:
> >
> > What does not support is the option for defining custom column expressions, 
> > as keys or values, into the upstream (publication). This will give more 
> > flexibility into making replication from multiple upstreams into less 
> > downstreams adding more logic. For instance, in a project for analytical 
> > purposes there is the need to consolidate data from multiple databases into 
> > one and at the same time keep the origin of each replicated data identified 
> > by a tenanant_id column. In this case we also need the ability to define 
> > the new column as an additional key which will participate into the 
> > destination table.
> >
> > Tenant 1 table
> > id serial pk
> > description varchar
> >
> > Tenant 2 table
> > id integer pk
> > description varchar
> >
> > Group table
> > tenant integer pk
> > id integer pk
> > description varchar
> >
> > Possible syntax to archive that
> > CREATE PUBLICATION pb_test FOR TABLE test ({value:datatype:iskey:alias} 
> > ,"id", "name")
> >
> > Example
> > CREATE PUBLICATION pb_test FOR TABLE test ({1:integer:true:tenant} ,"id", 
> > "name")
>
> I think that's a valid usecase.
>
> This looks more like a subscription option to me. In multi-subscriber
> multi-publisher scenarios, on one subscriber a given upstream may be
> tenant 1 but on some other it could be 2. But I don't think we allow
> specifying subscription options for a single table. AFAIU, the origin
> ids are available as part of the commit record which contained this
> change; that's how conflict resolution is supposed to know it. So
> somehow the subscriber will need to fetch those from there and set the
> tenant.
>

Yeah, to me also it appears that we can handle it on the subscriber
side. We have the provision of sending origin information in proto.c.
But note that by default publishers won't have any origin associated
with change unless someone has defined it. I think this work needs
more thought but sounds to be an interesting feature.

-- 
With Regards,
Amit Kapila.




Re: Logical Replication Custom Column Expression

2022-11-21 Thread Ashutosh Bapat
On Sat, Nov 19, 2022 at 6:47 PM Stavros Koureas
 wrote:
>
> Hi all,
>
> Working with PostgreSQL Logical Replication is just great! It helps a lot 
> doing real time replication for analytical purposes without using any other 
> 3d party service. Although all these years working as product architect of 
> reporting i have noted a few requirements which are always a challenge and 
> may help enhance logical replication even better.
>
> To the point:
> PostgreSQL14 Logical Replication allows replication of a table to another 
> table that exists in another database or even in another host. It also allows 
> multiple upstream tables using the same structure to downstream into a single 
> table.
> CREATE PUBLICATION pb_test FOR TABLE test
>
> PostgreSQL15 Logical Replication allows even better replication options, like 
> selecting subsets of the columns from publisher tables. It also supports 
> plenty of options like disable_on_error etc.
> CREATE PUBLICATION pb_test FOR TABLE test ("id", "name")
>
> What does not support is the option for defining custom column expressions, 
> as keys or values, into the upstream (publication). This will give more 
> flexibility into making replication from multiple upstreams into less 
> downstreams adding more logic. For instance, in a project for analytical 
> purposes there is the need to consolidate data from multiple databases into 
> one and at the same time keep the origin of each replicated data identified 
> by a tenanant_id column. In this case we also need the ability to define the 
> new column as an additional key which will participate into the destination 
> table.
>
> Tenant 1 table
> id serial pk
> description varchar
>
> Tenant 2 table
> id integer pk
> description varchar
>
> Group table
> tenant integer pk
> id integer pk
> description varchar
>
> Possible syntax to archive that
> CREATE PUBLICATION pb_test FOR TABLE test ({value:datatype:iskey:alias} 
> ,"id", "name")
>
> Example
> CREATE PUBLICATION pb_test FOR TABLE test ({1:integer:true:tenant} ,"id", 
> "name")

I think that's a valid usecase.

This looks more like a subscription option to me. In multi-subscriber
multi-publisher scenarios, on one subscriber a given upstream may be
tenant 1 but on some other it could be 2. But I don't think we allow
specifying subscription options for a single table. AFAIU, the origin
ids are available as part of the commit record which contained this
change; that's how conflict resolution is supposed to know it. So
somehow the subscriber will need to fetch those from there and set the
tenant.

-- 
Best Wishes,
Ashutosh Bapat




Logical Replication Custom Column Expression

2022-11-19 Thread Stavros Koureas
Hi all,

Working with PostgreSQL Logical Replication is just great! It helps a lot
doing real time replication for analytical purposes without using any other
3d party service. Although all these years working as product architect of
reporting i have noted a few requirements which are always a challenge and
may help enhance logical replication even better.

To the point:
PostgreSQL14 Logical Replication allows replication of a table to another
table that exists in another database or even in another host. It also
allows multiple upstream tables using the same structure to downstream into
a single table.
*CREATE PUBLICATION pb_test FOR TABLE test*

PostgreSQL15 Logical Replication allows even better replication options,
like selecting subsets of the columns from publisher tables. It also
supports plenty of options like disable_on_error etc.
*CREATE PUBLICATION pb_test FOR TABLE test ("id", "name")*

What does not support is the option for defining custom column expressions,
as keys or values, into the upstream (publication). This will give more
flexibility into making replication from multiple upstreams into less
downstreams adding more logic. For instance, in a project for analytical
purposes there is the need to consolidate data from multiple databases into
one and at the same time keep the origin of each replicated data
identified by a tenanant_id column. In this case we also need the ability
to define the new column as an additional key which will participate into
the destination table.

Tenant 1 table
id serial pk
description varchar

Tenant 2 table
id integer pk
description varchar

Group table
tenant integer pk
id integer pk
description varchar

Possible syntax to archive that
*CREATE PUBLICATION pb_test FOR TABLE test ({value:datatype:iskey:alias}
,"id", "name")*

Example
*CREATE PUBLICATION pb_test FOR TABLE test ({1:integer:true:tenant} ,"id",
"name")*

I suppose the column definition should exist in the publication syntax as
the publication should know from before the datatype and if is a key before
being consumed by a subscriber which may already have the column.

So making an insert or update or delete statement into a tenant 1 database:
INSERT INTO test (id, description) VALUES (5, 'data')
UPDATE test SET description = 'data' WHERE id = 5
DELETE FROM test WHERE id = 5
Will be reflected into subscriber as the following
INSERT INTO test (tenant, id, description) VALUES (1, 5, 'data')
UPDATE test SET description = 'data' WHERE tenant=1 AND id = 5
DELETE FROM test WHERE tenant=1 AND id = 5

For more clarifications please reach me at koureasstav...@gmail.com
Thanks!