Re: [HACKERS] [PATCH v3] pg_progress() SQL function to monitor progression of long running SQL queries/utilities

Pavel Stehule Tue, 01 Aug 2017 13:08:16 -0700

2017-08-01 18:35 GMT+02:00 Remi Colinet <remi.coli...@gmail.com>:

> I did it in version 2 of the patch.
> The patch could yield TEXT, JSON, and XML ouput.
>
> For below query, it gives:
>
> => Terminal 1
> test=# select * from t_10m, t_1m where t_10m.md5 like '%cb%';
>
> => Terminal 2
> test=# \watch PROGRESS 9546;
>                                Wed 10 May 2017 06:29:59 PM CEST (every 1s)
>
>                                               PLAN
> PROGRESS
> ------------------------------------------------------------
> ---------------------------------------------
>  status: <query running>
>  query: select * from t_10m, t_1m where t_10m.md5 like '%cb%';
>  time used (s): 10
>  Nested Loop
>    ->  Seq Scan on t_1m => rows 7/1000000 0% => blks 8334/8334 100%
>    ->  Materialize => file read readptrcount=1 rows write=1189285
> read=6584854 disk use (bytes) 53842965
>          ->  Seq Scan on t_10m => rows 1145596/738172 155%
>                Filter: (md5 ~~ '%cb%'::text)
>  total disk space used (MB) 51
> (9 rows)
>
> => Terminal 2
> test=# PROGRESS (FORMAT JSON) 9546;
>                             PLAN PROGRESS
> ----------------------------------------------------------------------
>  [                                                                   +
>    "status": "<query running>",                                      +
>    "query": "select * from t_10m, t_1m where t_10m.md5 like '%cb%';",+
>    "time used (s)": 0,                                               +
>    "single worker": {                                                +
>      "Node Type": "Nested Loop",                                     +
>      "Partial Mode": "",                                             +
>      "Operation": "single worker",                                   +
>      "Parent Relationship": "single worker",                         +
>      "Custom Plan Provider": "(@\u0004\u0001",                       +
>      "Parallel Aware": false,                                        +
>      "Outer": {                                                      +
>        "Node Type": "Seq Scan",                                      +
>        "Strategy": "",                                               +
>        "Partial Mode": "single worker",                              +
>        "Operation": "Outer",                                         +
>        "Parent Relationship": "Outer",                               +
>        "Custom Plan Provider": "(@\u0004\u0001",                     +
>        "Parallel Aware": false,                                      +
>        "relation": "t_1m",                                           +
>        "rows fetched": 1,                                            +
>        "rows total": 1000000,                                        +
>        "rows percent": 0,                                            +
>        "blocks fetched": 8334,                                       +
>        "blocks total": 8334,                                         +
>        "blocks percent": 100                                         +
>      },                                                              +
>      "Inner": {                                                      +
>        "Node Type": "Materialize",                                   +
>        "Strategy": "",                                               +
>        "Partial Mode": "single worker",                              +
>        "Operation": "Inner",                                         +
>        "Parent Relationship": "Inner",                               +
>        "Custom Plan Provider": "(@\u0004\u0001",                     +
>        "Parallel Aware": false,                                      +
>        "file store": "write",                                        +
>        "readptrcount": 1,                                            +
>        "rows write": 297256,                                         +
>        "rows read": 0,                                               +
>        "disk use (bytes)": 11911168,                                 +
>        "Outer": {                                                    +
>          "Node Type": "Seq Scan",                                    +
>          "Strategy": "",                                             +
>          "Partial Mode": "Inner",                                    +
>          "Operation": "Outer",                                       +
>          "Parent Relationship": "Outer",                             +
>          "Custom Plan Provider": "HtFH\b[]\u000f\u001f",             +
>          "Parallel Aware": false,                                    +
>          "relation": "t_10m",                                        +
>          "rows fetched": 253566,                                     +
>          "rows total": 738172,                                       +
>          "rows percent": 34,                                         +
>          "blocks fetched": 18436,                                    +
>          "blocks total": 83334,                                      +
>          "blocks percent": 22,                                       +
>          "Filter": "(md5 ~~ '%cb%'::text)"                           +
>        }                                                             +
>      }                                                               +
>    },                                                                +
>    "unit": "MB",                                                     +
>    "total disk space used": 11                                       +
>  ]
> (1 row)
>
> test=#
>
> I'am skeptical about the use of JSON, XML, and others in such output.
>
> Does anyone use these formats (XML, JSON, YAML) for EXPLAIN output?
> I suspect only TEXT format is being used.
>
> This looks better with a SQL table made of name/value fields to be used
> both by administrators and by utilities.
>


I disagree. It should be consistent with EXPLAIN - there is zero benefit
from introduction new format.

I know few utils that uses JSON format.

Regards

Pavel




> And this is much more inline with SQL. No need to parse JSON, XML or any
> other format. Output can be controlled with WHERE clause.
>
> Rgds
> Remi
>
> 2017-07-31 12:10 GMT+02:00 Pavel Stehule <pavel.steh...@gmail.com>:
>
>>
>>
>> 2017-07-31 11:09 GMT+02:00 Remi Colinet <remi.coli...@gmail.com>:
>>
>>>
>>>
>>> 2017-07-26 15:27 GMT+02:00 Robert Haas <robertmh...@gmail.com>:
>>>
>>>> On Wed, Jun 21, 2017 at 10:01 AM, Remi Colinet <remi.coli...@gmail.com>
>>>> wrote:
>>>> > test=# SELECT  pid, ppid, bid, concat(repeat(' ', 3 * indent),name),
>>>> value,
>>>> > unit FROM pg_progress(0,0);
>>>> >   pid  | ppid | bid |      concat      |      value       |  unit
>>>> > -------+------+-----+------------------+------------------+---------
>>>> >  14106 |    0 |   4 | status           | query running    |
>>>> >  14106 |    0 |   4 | relationship     | progression      |
>>>> >  14106 |    0 |   4 |    node name     | Sort             |
>>>> >  14106 |    0 |   4 |    sort status   | on tapes writing |
>>>> >  14106 |    0 |   4 |    completion    | 0                | percent
>>>> >  14106 |    0 |   4 |    relationship  | Outer            |
>>>> >  14106 |    0 |   4 |       node name  | Seq Scan         |
>>>> >  14106 |    0 |   4 |       scan on    | t_10m            |
>>>> >  14106 |    0 |   4 |       fetched    | 25049            | block
>>>> >  14106 |    0 |   4 |       total      | 83334            | block
>>>> >  14106 |    0 |   4 |       completion | 30               | percent
>>>> > (11 rows)
>>>> >
>>>> > test=#
>>>>
>>>> Somehow I imagined that the output would look more like what EXPLAIN
>>>> produces.
>>>>
>>>
>>>
>>> I had initially used the same output as for the ANALYZE command:
>>>
>>> test=# PROGRESS 14611;
>>>                                       PLAN
>>> PROGRESS
>>> ------------------------------------------------------------
>>> -----------------------------
>>>  Gather Merge
>>>    ->  Sort=> dumping tuples to tapes
>>>          rows r/w merge 0/0 rows r/w effective 0/1464520 0%
>>>          Sort Key: md5
>>>          ->  Parallel Seq Scan on t_10m => rows 1464520/4166700 35% blks
>>> 36011/83334 43%
>>> (5 rows)
>>>
>>> test=#
>>>
>>> But this restricts the use to "human consumers". Using a table output
>>> with name/value pairs, allows the use by utilities for instance, without
>>> parsing. This is less handy for administrators, but far better for 3rd
>>> party utilities. One solution is otherwise to create a PL/SQL command on
>>> top of pg_progress() SQL function to produce an output similar to the one
>>> of the ANALYZE command.
>>>
>>
>> you can support XML, JSON output format like EXPLAIN does.
>>
>>  https://www.postgresql.org/docs/current/static/sql-explain.html
>>
>> Regards
>>
>> pavel
>>
>>>
>>>
>>>> > If the one shared memory page is not enough for the whole progress
>>>> report,
>>>> > the progress report transfert between the 2 backends is done with a
>>>> series
>>>> > of request/response. Before setting the latch, the monitored backend
>>>> write
>>>> > the size of the data dumped in shared memory and set a status to
>>>> indicate
>>>> > that more data is to be sent through the shared memory page. The
>>>> monitoring
>>>> > backends get the result and sends an other signal, and then wait for
>>>> the
>>>> > latch again. The monitored backend does not collect a new progress
>>>> report
>>>> > but continues to dump the already collected report. And the exchange
>>>> goes on
>>>> > until the full progress report has been dumped.
>>>>
>>>> This is basically what shm_mq does.  We probably don't want to
>>>> reinvent that code, as it has taken a surprising amount of debugging
>>>> to get it fully working.
>>>>
>>>
>>> Yes, I had once considered this solution but then moved away as I was
>>> unsure of the exact need for the transfert of the progress report between
>>> the monitored and the monitoring backends.
>>> I'am going to switch to shm_mq.
>>>
>>> Thx & Rgds
>>>
>>>
>>>
>>>>
>>>> --
>>>> Robert Haas
>>>> EnterpriseDB: http://www.enterprisedb.com
>>>> The Enterprise PostgreSQL Company
>>>>
>>>
>>>
>>
>

Re: [HACKERS] [PATCH v3] pg_progress() SQL function to monitor progression of long running SQL queries/utilities

Reply via email to