On Thu, Jun 2, 2016 at 5:03 PM, Patrick Baker <patrickbake...@gmail.com>
wrote:

>
>
> 2016-06-03 2:10 GMT+12:00 David G. Johnston <david.g.johns...@gmail.com>:
>
>> On Thu, Jun 2, 2016 at 1:04 AM, Patrick Baker <patrickbake...@gmail.com>
>> wrote:
>>
>>>
>>> It's all working, except the LIMIT... If possible, can you please give
>>> me an example of that LIMIT in some of those queries?
>>>
>>> ​​
>> You should also use ORDER BY when using LIMIT and OFFSET, though
>> depending on the setup it can be omitted.  Usually, as long as the second
>> execution cannot select any of the records the first execution touched, you
>> can accept a random set.  But if you want random then using OFFSET is
>> pointless.
>>
>> SELECT *
>> FROM generate_series(1, 10)
>> ORDER BY 1
>> LIMIT 5
>> OFFSET 3
>>
>> generate_series
>> ----------------------
>> 4
>> 5
>> 6
>> 7
>> 8
>>
>> You are going to have difficulty finding people willing to help when you
>> cannot put together a self-contained and syntax-error-free example (I think
>> the last one is...) of what you want to do.  The PostgreSQL parser is very
>> good at reading code and telling you what it doesn't like.  I'm not
>> inclined to spend time reading queries that obviously cannot run and point
>> out those same problems.  If you get a particular error you don't
>> understand I'll be happy to try and explain what it is trying to tell you.
>>
>> You probably need to reformulate your update to read:
>>
>> UPDATE tbl
>> SET ...  -- UPDATE requires a SET clause
>> FROM (
>>     SELECT 50 RECORDS
>> ) src
>> WHERE src = tbl;
>>
>> And ensure that the 50 being selected each time through are a different
>> 50.
>>
>> Writeable CTEs will probably help here.
>>
>> https://www.postgresql.org/docs/current/static/queries-with.html
>>
>> David J.
>>
>>
>
> Hi David.
>
> The SQL statements inside the function work....
>

Really?  You seem to have lost your FOR loop for starters, and your
RETURN statement, and a semi-colon after END, and I doubt crtRow.file_id
works; should I go on?  So yes, you can run the four individual SQL
statements correctly, but the function itself is bogus.


> I'm just having a problem limiting the query to the number of rows I
> want, and also teaching the update SQL to only touch the records the other
> SQL statements inside the function have touched.
>
> This is the updated function:
>
> CREATE or REPLACE FUNCTION function_data_1(rows integer)
> RETURNS INTEGER AS $$
> declare
>   completed integer;
>   offset_num integer;
>   crtRow record;
> BEGIN
>   offset_num = 0;
>
>   INSERT INTO table2_y_b (note_id, size, file_id, full_path)
>   (
>     SELECT t1.note_id, t1.size, t1.file_id, t1.full_path
>     FROM table1_n_b t1
>     JOIN table3_n_b t3 ON t3.file_id = t1.file_id
>   );
>
>   UPDATE table2_y_b t2 SET segment_data =
>   (
>     SELECT o1.data
>     FROM original_table1_b o1
>     JOIN table3_n_b t3 ON t3.file_id = o1.file_id
>     WHERE t2.migrated = 0
>     AND t2.file_id = o1.file_id
>   );
>
>   UPDATE table2_y_b SET migrated = 1
>   WHERE file_id = crtRow.file_id AND migrated = 0;
>
>   UPDATE original_table1_b SET data = NULL WHERE file_id = crtRow.file_id;
> END
> $$ language 'plpgsql';
>
> - As you can see, the first *insert* inserts data into a new table from
> another select. This query must be limited by the number of rows I
> provide when calling the function; example:
>
> select function_data_1(5000);
> select function_data_1(60000);
> select function_data_1(15000);
>
>
> - The first *update* copies the BLOBs from the original_table1_b table
> into the new one (as above). Here, I also need the query to only touch
> those records that have been touched by the above query.
>
>
> - The second *update* sets the table2_y_b.migrated column from 0 to 1,
> telling me that that record has been touched by the query. So the next
> call ( select function_data_1(60000); ) will already know that it does not
> need to touch that record; example:
>
> WHERE
>     t2.migrated = 0
>
>
> - The third and last *update* deletes (sets the blob column to null) the
> blobs that have already been touched by the above queries.... Still, I
> don't know how to tell postgres to only touch the rows that have been
> touched by the above queries....
>
>
Here's a fish - though you will still need to clean it.

This is not tested, and I haven't ever built this exact query for real,
but it should work in theory...

-- assumes that to-be-migrated records have previously had their
-- migrated flag set to 0

CREATE FUNCTION name(number_of_rows_to_process integer)
RETURNS SETOF bigint -- returns the affected ids
LANGUAGE sql -- this no longer requires procedural logic so no need for plpgsql
AS $$
WITH the_records_I_want_to_affect AS (
    -- pick N records to process
    SELECT id, ...
    FROM source_tbl
    WHERE migrated = 0
    ORDER BY ...
    LIMIT number_of_rows_to_process  -- your function argument goes here
    FOR UPDATE
),
migrate_the_data AS (
    -- place a copy of them into the archive table
    INSERT INTO migration_table
    SELECT id, ...
    FROM the_records_I_want_to_affect
    RETURNING *
),
mark_as_migrated AS (
    -- mark them as having been archived and nullify the blob data
    UPDATE source_tbl
    SET migrated = 1, data = null
    FROM migrate_the_data recs
    WHERE recs.id = source_tbl.id
    RETURNING source_tbl.id
)
SELECT id FROM mark_as_migrated;
$$;
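
It returns the ids it touched, so you would call it much the way you were
calling yours, e.g.:

SELECT * FROM name(5000);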

I am sure a fully working version of this idiom is present in one or more
places on the internet.  Feel free to search out fully working examples
with additional commentary.

You can make a FOR loop version of this work, as one had to many years ago
before writable CTEs were implemented.
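
For comparison, a rough, untested plpgsql sketch of that older approach
(same hypothetical table and column names as the CTE version above):

CREATE OR REPLACE FUNCTION name_loop(number_of_rows_to_process integer)
RETURNS SETOF bigint AS $$
DECLARE
  crtRow record;
BEGIN
  FOR crtRow IN
    SELECT id FROM source_tbl
    WHERE migrated = 0
    ORDER BY id
    LIMIT number_of_rows_to_process
    FOR UPDATE
  LOOP
    -- copy the row to the archive table, then mark it and drop the blob
    INSERT INTO migration_table SELECT * FROM source_tbl WHERE id = crtRow.id;
    UPDATE source_tbl SET migrated = 1, data = null WHERE id = crtRow.id;
    RETURN NEXT crtRow.id;
  END LOOP;
END;
$$ LANGUAGE plpgsql;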

David J.
