Re: [GENERAL] plpgsql function with offset - Postgres 9.1

2017-06-15 Thread Peter J. Holzer
On 2017-06-16 10:19:45 +1200, Patrick B wrote:
> 2017-05-29 19:27 GMT+12:00 Albe Laurenz :
> Patrick B wrote:
> > I am running a background task on my DB, which will copy data from 
> tableA
> to tableB. For
> > that, I'm writing a PL/PGSQL function which basically needs to do the
> following:
> >
> >
> > 1.    Select the data from tableA
> > 2.    The limit will be put when calling the function
> > 3.    insert the selected data on Step 1 onto new table
[...]
> >               FOR row IN EXECUTE '
> >                           SELECT
> >                                   id,
> >                                   path,
> >                                   name,
> >                                   name_last,
> >                                   created_at
> >                           FROM
> >                                   tablea
> >                           WHERE
> >                                   ready = true
> >                           ORDER BY 1 LIMIT ' || rows || ' OFFSET ' ||
> rows || ''
> 
> '... LIMIT ' || p_limit || ' OFFSET ' || p_offset
> 
> >               LOOP
> 
> num_rows := num_rows + 1;
>
> >               INSERT INTO tableB (id,path,name,name_last,created_at)
> >               VALUES (row.id,row.path,row.name,row.
> name_last,row.created_at);
> >
> >               END LOOP;
[...]
> 
> There are two problems with this approach:
> 
> 1. It will do the wrong thing if rows are added or deleted in "tablea"
> while
>    you process it.
> 
> 
> 
> There will be actually records being inserted in tablea while processing the
> migration Any ideas here?

Is id monotonically increasing? You might be able to use that, as Albe
suggests:

> The solution is to avoid OFFSET and to use "keyset pagination":
> http://use-the-index-luke.com/no-offset

But it works only if rows cannot become ready after their id range has
already been processed. Otherwise you will miss them.

> I can add another column in tablea, like example: row_migrated boolean --> if
> that helps

Yes that's probably the best way. Instead of using an additional column
you could also make ready tristate: New -> ready_for_migration -> migrated.

> 2. Queries with hight OFFSET values have bad performance.
> 
> 
> No problem. The plan is to perform 2k rows at once, which is not much.

Are rows deleted from tablea after they are migrated? Otherwise you will
have a problem:

select ... limit 2000 offset 1234000

will have to retrieve 1236000 rows and then discard 1234000 of them.

hp

-- 
   _  | Peter J. Holzer| we build much bigger, better disasters now
|_|_) || because we have much more sophisticated
| |   | h...@hjp.at | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson 


signature.asc
Description: Digital signature


Re: [GENERAL] plpgsql function with offset - Postgres 9.1

2017-06-15 Thread David G. Johnston
On Thu, Jun 15, 2017 at 3:49 PM, Patrick B  wrote:

> 2017-06-16 10:35 GMT+12:00 David G. Johnston :
>
>> On Thu, Jun 15, 2017 at 3:19 PM, Patrick B 
>> wrote:
>>
>>> 2017-05-29 19:27 GMT+12:00 Albe Laurenz :
>>>
 Patrick B wrote:
 > I am running a background task on my DB, which will copy data from
 tableA to tableB. For
 > that, I'm writing a PL/PGSQL function which basically needs to do the
 following:
 >
>>>
>>>
>> ​https://www.postgresql.org/message-id/CY1PR18MB0490632A9A73
>> E64973F66D6BAFCE0%40CY1PR18MB0490.namprd18.prod.outlook.com
>>
>> My problem is when using LIMIT to select the data I get none. I assume
> because the first 3000 rows (i'm using ORDER BY 1) are not candidates for
> the migration.
>

Your where clause should only allow candidates to be selected.  If it does
then your theory is wrong.

David J.​


Re: [GENERAL] plpgsql function with offset - Postgres 9.1

2017-06-15 Thread Patrick B
2017-06-16 10:35 GMT+12:00 David G. Johnston :

> On Thu, Jun 15, 2017 at 3:19 PM, Patrick B 
> wrote:
>
>> 2017-05-29 19:27 GMT+12:00 Albe Laurenz :
>>
>>> Patrick B wrote:
>>> > I am running a background task on my DB, which will copy data from
>>> tableA to tableB. For
>>> > that, I'm writing a PL/PGSQL function which basically needs to do the
>>> following:
>>> >
>>
>>
> ​https://www.postgresql.org/message-id/CY1PR18MB0490632A9A73E64973F66
> D6BAFCE0%40CY1PR18MB0490.namprd18.prod.outlook.com
>
> David J​
>


My problem is when using LIMIT to select the data I get none. I assume
because the first 3000 rows (i'm using ORDER BY 1) are not candidates for
the migration.


Re: [GENERAL] plpgsql function with offset - Postgres 9.1

2017-06-15 Thread David G. Johnston
On Thu, Jun 15, 2017 at 3:19 PM, Patrick B  wrote:

> 2017-05-29 19:27 GMT+12:00 Albe Laurenz :
>
>> Patrick B wrote:
>> > I am running a background task on my DB, which will copy data from
>> tableA to tableB. For
>> > that, I'm writing a PL/PGSQL function which basically needs to do the
>> following:
>> >
>
>
​
https://www.postgresql.org/message-id/CY1PR18MB0490632A9A73E64973F66D6BAFCE0%40CY1PR18MB0490.namprd18.prod.outlook.com

David J​


Re: [GENERAL] plpgsql function with offset - Postgres 9.1

2017-06-15 Thread Patrick B
2017-05-29 19:27 GMT+12:00 Albe Laurenz :

> Patrick B wrote:
> > I am running a background task on my DB, which will copy data from
> tableA to tableB. For
> > that, I'm writing a PL/PGSQL function which basically needs to do the
> following:
> >
> >
> > 1.Select the data from tableA
> > 2.The limit will be put when calling the function
> > 3.insert the selected data on Step 1 onto new table
> >
> > Question:
> >
> > * When I stop it and start it again, how can the query "know" that
> it has already
> > processed some rows so it won't do it twice on the same rows? If it
> stopped on row number
> > 100, I need it to continue on row number 101, for example.
> >
> > * How can I ask the function to return the number of processed rows?
> >
> >
> > I can add a column on TableB if needed, but not on tableA.
> >
> > This is what I've done so far:
>
> >   CREATE or REPLACE FUNCTION data_copy(rows integer)
> >   RETURNS SETOF bigint AS $$
>
> CREATE or REPLACE FUNCTION data_copy(p_limit integer, p_offset integer)
> RETURNS integer;
>
> >   declare
> >   row record;
> >   offset_num integer;
>
> num_rows integer := 0;
>
> >   BEGIN
> >   FOR row IN EXECUTE '
> >   SELECT
> >   id,
> >   path,
> >   name,
> >   name_last,
> >   created_at
> >   FROM
> >   tablea
> >   WHERE
> >   ready = true
> >   ORDER BY 1 LIMIT ' || rows || ' OFFSET ' ||
> rows || ''
>
> '... LIMIT ' || p_limit || ' OFFSET ' || p_offset
>
> >   LOOP
>
> num_rows := num_rows + 1;
>
> >   INSERT INTO tableB (id,path,name,name_last,created_at)
> >   VALUES (row.id,row.path,row.name,row.
> name_last,row.created_at);
> >
> >   END LOOP;
>
> RETURN num_rows;
>
> >   END
> >   $$ language 'plpgsql';
>
> There are two problems with this approach:
>
> 1. It will do the wrong thing if rows are added or deleted in "tablea"
> while
>you process it.
>


There will be actually records being inserted in tablea while processing
the migration Any ideas here?

I can add another column in tablea, like example: row_migrated boolean -->
if that helps


>
> 2. Queries with hight OFFSET values have bad performance.
>

No problem. The plan is to perform 2k rows at once, which is not much.



>
> The solution is to avoid OFFSET and to use "keyset pagination":
> http://use-the-index-luke.com/no-offset
>
>
>

Thanks
Patrick


Re: [GENERAL] plpgsql function with offset - Postgres 9.1

2017-05-29 Thread Albe Laurenz
Patrick B wrote:
> I am running a background task on my DB, which will copy data from tableA to 
> tableB. For
> that, I'm writing a PL/PGSQL function which basically needs to do the 
> following:
> 
> 
> 1.Select the data from tableA
> 2.The limit will be put when calling the function
> 3.insert the selected data on Step 1 onto new table
> 
> Question:
> 
> * When I stop it and start it again, how can the query "know" that it has 
> already
> processed some rows so it won't do it twice on the same rows? If it stopped 
> on row number
> 100, I need it to continue on row number 101, for example.
> 
> * How can I ask the function to return the number of processed rows?
> 
> 
> I can add a column on TableB if needed, but not on tableA.
> 
> This is what I've done so far:

>   CREATE or REPLACE FUNCTION data_copy(rows integer)
>   RETURNS SETOF bigint AS $$

CREATE or REPLACE FUNCTION data_copy(p_limit integer, p_offset integer)
RETURNS integer;

>   declare
>   row record;
>   offset_num integer;

num_rows integer := 0;

>   BEGIN
>   FOR row IN EXECUTE '
>   SELECT
>   id,
>   path,
>   name,
>   name_last,
>   created_at
>   FROM
>   tablea
>   WHERE
>   ready = true
>   ORDER BY 1 LIMIT ' || rows || ' OFFSET ' || rows || 
> ''

'... LIMIT ' || p_limit || ' OFFSET ' || p_offset

>   LOOP

num_rows := num_rows + 1;

>   INSERT INTO tableB (id,path,name,name_last,created_at)
>   VALUES (row.id,row.path,row.name,row.name_last,row.created_at);
> 
>   END LOOP;

RETURN num_rows;

>   END
>   $$ language 'plpgsql';

There are two problems with this approach:

1. It will do the wrong thing if rows are added or deleted in "tablea" while
   you process it.

2. Queries with hight OFFSET values have bad performance.

The solution is to avoid OFFSET and to use "keyset pagination":
http://use-the-index-luke.com/no-offset

Yours,
Laurenz Albe

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] plpgsql function with offset - Postgres 9.1

2017-05-28 Thread Patrick B
Hi guys,

I am running a background task on my DB, which will copy data from tableA
to tableB. For that, I'm writing a PL/PGSQL function which basically needs
to do the following:


   1. Select the data from tableA
   2. The limit will be put when calling the function
   3. insert the selected data on Step 1 onto new table

Question:

   - When I stop it and start it again, how can the query "know" that it
   has already processed some rows so it won't do it twice on the same rows?
   If it stopped on row number 100, I need it to continue on row number 101,
   for example.
   - How can I ask the function to return the number of processed rows?


I can add a column on TableB if needed, but not on tableA.

This is what I've done so far:

select data_copy(500);


CREATE or REPLACE FUNCTION data_copy(rows integer)

RETURNS SETOF bigint AS $$


declare

row record;

offset_num integer;


BEGIN


FOR row IN EXECUTE '

SELECT

id,

path,

name,

name_last,

created_at

FROM

tablea

WHERE

ready = true

ORDER BY 1 LIMIT ' || rows || ' OFFSET ' || rows || ''

LOOP


INSERT INTO tableB (id,path,name,name_last,created_at)

VALUES (row.id,row.path,row.name,row.name_last,row.created_at);



END LOOP;


END


$$ language 'plpgsql';