Re: [GENERAL] how to monitor the progress of really large bulk operations?

2016-09-28 Thread Pavel Stehule
2016-09-28 14:34 GMT+02:00 Mike Sofen :

> *From:* Pavel Stehule*Sent:* Tuesday, September 27, 2016 9:18 PM
> 2016-09-28 6:13 GMT+02:00 Pavel Stehule :
>
> Hi
>
> 2016-09-27 23:03 GMT+02:00 Mike Sofen :
>
> Hi gang,
>
> how to view the state of a transaction in flight, seeing how many rows
> have been read or inserted (possible for a transaction in flight?), memory
> allocations across the various PG processes, etc.
>
> some years ago I used a trick http://okbob.blogspot.cz/2014/
> 09/nice-unix-filter-pv.html#links
>
>
>
> pltoolbox has counter function https://github.com/okbob/
> pltoolbox/blob/master/utils.c
>
> pavel=# insert into omega2 select (x.xx).*
>
>from (select pst.counter(omega,20, true) xx
>
> from omega
>
> ) x;
>
> NOTICE:  processed 20 rows, current value is '(5,8)'
>
> NOTICE:  processed 20 rows, current value is '(5,8)'
>
> Regards
>
> Pavel
>
>
>
>
>
> Pavel - That’s a very interesting function and thanks for sharing your
> toolbox.  The big question of course, is what is the impact on performance,
> scalability and stability?  Would it work inside of a stored function that
> would allow me write out the progress to a tracking table?
>

When a IO is bottleneck then counter has zero overhead. The usage is same
as any PostgreSQL set returning function. This function should be stable -
it is pretty simple

Regards

Pavel


>
>
> Mike
>


Re: [GENERAL] how to monitor the progress of really large bulk operations?

2016-09-28 Thread Mike Sofen
From: Pavel StehuleSent: Tuesday, September 27, 2016 9:18 PM
2016-09-28 6:13 GMT+02:00 Pavel Stehule  >:

Hi

2016-09-27 23:03 GMT+02:00 Mike Sofen  >:

Hi gang,

how to view the state of a transaction in flight, seeing how many rows have 
been read or inserted (possible for a transaction in flight?), memory 
allocations across the various PG processes, etc.

some years ago I used a trick 
http://okbob.blogspot.cz/2014/09/nice-unix-filter-pv.html#links

 

pltoolbox has counter function 
https://github.com/okbob/pltoolbox/blob/master/utils.c

pavel=# insert into omega2 select (x.xx).* 
   from (select pst.counter(omega,20, true) xx 
from omega
) x;
NOTICE:  processed 20 rows, current value is '(5,8)'
NOTICE:  processed 20 rows, current value is '(5,8)'

Regards

Pavel

 

 

Pavel - That’s a very interesting function and thanks for sharing your toolbox. 
 The big question of course, is what is the impact on performance, scalability 
and stability?  Would it work inside of a stored function that would allow me 
write out the progress to a tracking table?  

 

Mike



Re: [GENERAL] how to monitor the progress of really large bulk operations?

2016-09-27 Thread Pavel Stehule
2016-09-28 6:13 GMT+02:00 Pavel Stehule :

> Hi
>
> 2016-09-27 23:03 GMT+02:00 Mike Sofen :
>
>> Hi gang,
>>
>>
>>
>> On PG 9.5.1, linux, I’m running some large ETL operations, migrate data
>> from a legacy mysql system into PG, upwards of 250m rows in a transaction
>> (it’s on a big box).  It’s always a 2 step operation – extract raw mysql
>> data and pull it to the target big box into staging tables that match the
>> source, the second step being read the landed dataset and transform it into
>> the final formats, linking to newly generated ids, compressing big subsets
>> into jsonb documents, etc.
>>
>>
>>
>> While I could break it into smaller chunks, it hasn’t been necessary, and
>> it doesn’t eliminate my need:  how to view the state of a transaction in
>> flight, seeing how many rows have been read or inserted (possible for a
>> transaction in flight?), memory allocations across the various PG
>> processes, etc.
>>
>>
>>
>> Possible or a hallucination?
>>
>>
>>
>> Mike Sofen (Synthetic Genomics)
>>
>
> some years ago I used a trick http://okbob.blogspot.cz/2014/
> 09/nice-unix-filter-pv.html#links
>

pltoolbox has counter function
https://github.com/okbob/pltoolbox/blob/master/utils.c

pavel=# insert into omega2 select (x.xx).*
   from (select pst.counter(omega,20, true) xx
from omega
) x;
NOTICE:  processed 20 rows, current value is '(5,8)'
NOTICE:  processed 20 rows, current value is '(5,8)'
NOTICE:  processed 40 rows, current value is '(6,8)'
NOTICE:  processed 40 rows, current value is '(6,8)'
NOTICE:  processed 60 rows, current value is '(7,8)'
NOTICE:  processed 60 rows, current value is '(7,8)'
NOTICE:  processed 80 rows, current value is '(1,8)'
NOTICE:  processed 80 rows, current value is '(1,8)'
NOTICE:  processed 100 rows, current value is '(5,8)'
NOTICE:  processed 100 rows, current value is '(5,8)'




>
>
> Regards
>
> Pavel
>
>


Re: [GENERAL] how to monitor the progress of really large bulk operations?

2016-09-27 Thread Pavel Stehule
Hi

2016-09-27 23:03 GMT+02:00 Mike Sofen :

> Hi gang,
>
>
>
> On PG 9.5.1, linux, I’m running some large ETL operations, migrate data
> from a legacy mysql system into PG, upwards of 250m rows in a transaction
> (it’s on a big box).  It’s always a 2 step operation – extract raw mysql
> data and pull it to the target big box into staging tables that match the
> source, the second step being read the landed dataset and transform it into
> the final formats, linking to newly generated ids, compressing big subsets
> into jsonb documents, etc.
>
>
>
> While I could break it into smaller chunks, it hasn’t been necessary, and
> it doesn’t eliminate my need:  how to view the state of a transaction in
> flight, seeing how many rows have been read or inserted (possible for a
> transaction in flight?), memory allocations across the various PG
> processes, etc.
>
>
>
> Possible or a hallucination?
>
>
>
> Mike Sofen (Synthetic Genomics)
>

some years ago I used a trick
http://okbob.blogspot.cz/2014/09/nice-unix-filter-pv.html#links

Regards

Pavel


Re: [GENERAL] how to monitor the progress of really large bulk operations?

2016-09-27 Thread Melvin Davidson
On Tue, Sep 27, 2016 at 5:03 PM, Mike Sofen  wrote:

> Hi gang,
>
>
>
> On PG 9.5.1, linux, I’m running some large ETL operations, migrate data
> from a legacy mysql system into PG, upwards of 250m rows in a transaction
> (it’s on a big box).  It’s always a 2 step operation – extract raw mysql
> data and pull it to the target big box into staging tables that match the
> source, the second step being read the landed dataset and transform it into
> the final formats, linking to newly generated ids, compressing big subsets
> into jsonb documents, etc.
>
>
>
> While I could break it into smaller chunks, it hasn’t been necessary, and
> it doesn’t eliminate my need:  how to view the state of a transaction in
> flight, seeing how many rows have been read or inserted (possible for a
> transaction in flight?), memory allocations across the various PG
> processes, etc.
>
>
>
> Possible or a hallucination?
>
>
>
> Mike Sofen (Synthetic Genomics)
>

AFAIK, it is not currently possible to monitor the progress/status of a
query. However, I do see that this is planned for "sometime in the  future".
*https://postgresql.uservoice.com/forums/21853-general/suggestions/249745-query-progress-reported-by-percentage-in-pg-stat
*

-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


[GENERAL] how to monitor the progress of really large bulk operations?

2016-09-27 Thread Mike Sofen
Hi gang,

 

On PG 9.5.1, linux, I'm running some large ETL operations, migrate data from
a legacy mysql system into PG, upwards of 250m rows in a transaction (it's
on a big box).  It's always a 2 step operation - extract raw mysql data and
pull it to the target big box into staging tables that match the source, the
second step being read the landed dataset and transform it into the final
formats, linking to newly generated ids, compressing big subsets into jsonb
documents, etc.

 

While I could break it into smaller chunks, it hasn't been necessary, and it
doesn't eliminate my need:  how to view the state of a transaction in
flight, seeing how many rows have been read or inserted (possible for a
transaction in flight?), memory allocations across the various PG processes,
etc.

 

Possible or a hallucination?

 

Mike Sofen (Synthetic Genomics)