Re: Backfill bgworker Extension?

2017-12-17 Thread Jeremy Finzel
On Sat, Dec 16, 2017 at 8:31 AM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 12/15/17 23:50, Jeremy Finzel wrote:
> > The common ground is some column in some table needs to be bulk updated.
> > I may not be explaining well, but in our environment we have done
> > hundreds of these using a generic framework to build a backfill. So I’m
> > not sure what you are questioning about the need? We have had to build a
> > worker to accomplish this because it can’t be done as a sql script alone.
>
> I'm trying to identify the independently useful pieces in your use case.
>  A background worker to backfill large tables is a very specific use
> case.  If instead we had a job/scheduler mechanism and a way to have
> server-side scripts that can control transactions, then that might
> satisfy your requirements as well (I'm not sure), but it would also
> potentially address many other uses.


I follow you now.  Yes, I think it probably would.  I think it would at
least provide a framework on which to build the tool I want.  It would be
great to have a "worker-capable" tool inside postgres than always having to
write external logic to do things like this.

> I’m not sure what you mean by a stored procedure in the background.
> > Since it would not be a single transaction, it doesn’t fit as a stored
> > procedure at least in Postgres when a function is 1 transaction.
>
> In progress: https://commitfest.postgresql.org/16/1360/


Looking forward to this.  I think this will help.  A stored procedure with
subtransactions could have the logic for the backfill in it, but would
still need an external worker that could retry it in case of failure
especially when things like a server restart happens.

Thanks,
Jeremy


Re: Backfill bgworker Extension?

2017-12-16 Thread Peter Eisentraut
On 12/15/17 23:50, Jeremy Finzel wrote:
> The common ground is some column in some table needs to be bulk updated.
> I may not be explaining well, but in our environment we have done
> hundreds of these using a generic framework to build a backfill. So I’m
> not sure what you are questioning about the need? We have had to build a
> worker to accomplish this because it can’t be done as a sql script alone.

I'm trying to identify the independently useful pieces in your use case.
 A background worker to backfill large tables is a very specific use
case.  If instead we had a job/scheduler mechanism and a way to have
server-side scripts that can control transactions, then that might
satisfy your requirements as well (I'm not sure), but it would also
potentially address many other uses.

> I’m not sure what you mean by a stored procedure in the background.
> Since it would not be a single transaction, it doesn’t fit as a stored
> procedure at least in Postgres when a function is 1 transaction.

In progress: https://commitfest.postgresql.org/16/1360/

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Backfill bgworker Extension?

2017-12-15 Thread Jeremy Finzel
On Tue, Dec 12, 2017 at 2:26 PM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 12/12/17 13:03, Jeremy Finzel wrote:
> > To be clear, what I mean is batch updating a large set of data in small
> > pieces so as to avoid things like lock contention and replication lags.
> > Sometimes these have a driving table that has the source data to update
> > in a destination table based on a key column, but sometimes it is
> > something like setting just a single specific value for a huge table.
> >
> > I would love instead to have a Postgres extension that uses postgres
> > background workers to accomplish this, especially if it were part of
> > core.  Before I venture into exploring writing something like this as an
> > extension, would this ever be considered something appropriate as an
> > extension in Postgres core?  Would that be appropriate?
>
> I don't see what the common ground between different variants of this
> use case would be.  Aren't you basically just looking to execute a
> use-case-specific stored procedure in the background?
>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


The common ground is some column in some table needs to be bulk updated. I
may not be explaining well, but in our environment we have done hundreds of
these using a generic framework to build a backfill. So I’m not sure what
you are questioning about the need? We have had to build a worker to
accomplish this because it can’t be done as a sql script alone.

I’m not sure what you mean by a stored procedure in the background. Since
it would not be a single transaction, it doesn’t fit as a stored procedure
at least in Postgres when a function is 1 transaction.

Sorry if I’m misunderstanding.

Thanks,
Jeremy


Re: Backfill bgworker Extension?

2017-12-12 Thread Peter Eisentraut
On 12/12/17 13:03, Jeremy Finzel wrote:
> To be clear, what I mean is batch updating a large set of data in small
> pieces so as to avoid things like lock contention and replication lags. 
> Sometimes these have a driving table that has the source data to update
> in a destination table based on a key column, but sometimes it is
> something like setting just a single specific value for a huge table.
> 
> I would love instead to have a Postgres extension that uses postgres
> background workers to accomplish this, especially if it were part of
> core.  Before I venture into exploring writing something like this as an
> extension, would this ever be considered something appropriate as an
> extension in Postgres core?  Would that be appropriate?

I don't see what the common ground between different variants of this
use case would be.  Aren't you basically just looking to execute a
use-case-specific stored procedure in the background?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Backfill bgworker Extension?

2017-12-12 Thread Jeremy Finzel
One of our challenges we have is that our engineers have written frameworks
to backfill data in several different DSLs, and every time they adopt a new
language, they maybe need to write another one.

To be clear, what I mean is batch updating a large set of data in small
pieces so as to avoid things like lock contention and replication lags.
Sometimes these have a driving table that has the source data to update in
a destination table based on a key column, but sometimes it is something
like setting just a single specific value for a huge table.

I would love instead to have a Postgres extension that uses postgres
background workers to accomplish this, especially if it were part of core.
Before I venture into exploring writing something like this as an
extension, would this ever be considered something appropriate as an
extension in Postgres core?  Would that be appropriate?

Thanks,
Jeremy