subject:"Re\: \[HACKERS\] ANALYZE command progress checker"

Re: [HACKERS] ANALYZE command progress checker

2017-10-01 Thread Daniel Gustafsson

> On 05 Apr 2017, at 03:17, Masahiko Sawada  wrote:
> 
> On Wed, Apr 5, 2017 at 1:49 AM, Robert Haas  wrote:
>> On Tue, Apr 4, 2017 at 4:57 AM, Amit Langote
>>  wrote:
>>> Hmm, you're right.  It could be counted with a separate variable
>>> initialized to 0 and incremented every time we decide to add a row to the
>>> final set of sampled rows, although different implementations of
>>> AcquireSampleRowsFunc have different ways of deciding if a given row will
>>> be part of the final set of sampled rows.
>>> 
>>> On the other hand, if we decide to count progress in terms of blocks as
>>> you suggested afraid, I'm afraid that FDWs won't be able to report the
>>> progress.
>> 
>> I think it may be time to push this patch out to v11.  It was
>> submitted one day before the start of the last CommitFest, the design
>> wasn't really right, and it's not clear even now that we know what the
>> right design is.  And we're pretty much out of time.
> 
> +1
> We're encountering the design issue and it takes more time to find out
> right design including FDWs support.

I’m marking this patch Returned with Feedback since it has been Waiting for
author during the commitfest without updates.

cheers ./daniel

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-09-06 Thread Tatsuro Yamada


On 2017/04/05 10:17, Masahiko Sawada wrote:

On Wed, Apr 5, 2017 at 1:49 AM, Robert Haas  wrote:

On Tue, Apr 4, 2017 at 4:57 AM, Amit Langote
 wrote:

Hmm, you're right.  It could be counted with a separate variable
initialized to 0 and incremented every time we decide to add a row to the
final set of sampled rows, although different implementations of
AcquireSampleRowsFunc have different ways of deciding if a given row will
be part of the final set of sampled rows.

On the other hand, if we decide to count progress in terms of blocks as
you suggested afraid, I'm afraid that FDWs won't be able to report the
progress.


I think it may be time to push this patch out to v11.  It was
submitted one day before the start of the last CommitFest, the design
wasn't really right, and it's not clear even now that we know what the
right design is.  And we're pretty much out of time.



+1
We're encountering the design issue and it takes more time to find out
right design including FDWs support.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Hi Vinayak,

I found a typo in your patch: s/taht/that/

sampling.c
   /* Report total number of blocks taht will be sampled */

Then, regarding FDWs support, I believe it means postgres_fdw support (is my 
understanding right?).
You might better to put pgstat_progress_update_param() into these functions, 
maybe.
  postgres_fdw.c
- postgresAnalyzeForeignTable
- postgresAcquireSampleRowsFunc

And PgFdwAnalyzeState has these variables, hopefully you can get current number 
of rows
as a progress indicator.

  sturuct PgFdwAnalyzeState
  ...
/* collected sample rows */
HeapTuple  *rows;   /* array of size targrows */
int targrows;   /* target # of sample rows */
int numrows;/* # of sample rows collected */

/* for random sampling */
double  samplerows; /* # of rows fetched */
double  rowstoskip; /* # of rows to skip before next sample */
  ...

I hope it will help you.

Regards,
Tatsuro Yamada



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-04-04 Thread Masahiko Sawada

On Wed, Apr 5, 2017 at 1:49 AM, Robert Haas  wrote:
> On Tue, Apr 4, 2017 at 4:57 AM, Amit Langote
>  wrote:
>> Hmm, you're right.  It could be counted with a separate variable
>> initialized to 0 and incremented every time we decide to add a row to the
>> final set of sampled rows, although different implementations of
>> AcquireSampleRowsFunc have different ways of deciding if a given row will
>> be part of the final set of sampled rows.
>>
>> On the other hand, if we decide to count progress in terms of blocks as
>> you suggested afraid, I'm afraid that FDWs won't be able to report the
>> progress.
>
> I think it may be time to push this patch out to v11.  It was
> submitted one day before the start of the last CommitFest, the design
> wasn't really right, and it's not clear even now that we know what the
> right design is.  And we're pretty much out of time.
>

+1
We're encountering the design issue and it takes more time to find out
right design including FDWs support.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-04-04 Thread Robert Haas

On Tue, Apr 4, 2017 at 4:57 AM, Amit Langote
 wrote:
> Hmm, you're right.  It could be counted with a separate variable
> initialized to 0 and incremented every time we decide to add a row to the
> final set of sampled rows, although different implementations of
> AcquireSampleRowsFunc have different ways of deciding if a given row will
> be part of the final set of sampled rows.
>
> On the other hand, if we decide to count progress in terms of blocks as
> you suggested afraid, I'm afraid that FDWs won't be able to report the
> progress.

I think it may be time to push this patch out to v11.  It was
submitted one day before the start of the last CommitFest, the design
wasn't really right, and it's not clear even now that we know what the
right design is.  And we're pretty much out of time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-04-04 Thread Amit Langote

On 2017/04/04 15:30, Masahiko Sawada wrote:
>> We can report progress in terms of individual blocks only inside
>> acquire_sample_rows(), which seems undesirable when one thinks that we
>> will be resetting the target for every child table.  We should have a
>> global target that considers all child tables in the inheritance
>> hierarchy, which maybe is possible if we count them beforehand in
>> acquire_inheritance_sample_rows().  But why not use target sample rows,
>> which remains the same for both when we're collecting sample rows from one
>> table and from the whole inheritance hierarchy.  We can keep the count of
>> already collected rows in a struct that is used across calls for all the
>> child tables and increment upward from that count when we start collecting
>> from a new child table.
> 
> An another option I came up with is that support new pgstat progress
> function, say pgstat_progress_incr_param, which increments index'th
> member in st_progress_param[]. That way we just need to report a delta
> using that function.

That's an interesting idea.  It could be made to work and would not
require changing the interface of AcquireSampleRowsFunc, which seems very
desirable.

 /*
  * The first targrows sample rows are simply copied into the
  * reservoir. Then we start replacing tuples in the sample
  * until we reach the end of the relation.  This algorithm is
  * from Jeff Vitter's paper (see full citation below). It
  * works by repeatedly computing the number of tuples to skip
  * before selecting a tuple, which replaces a randomly chosen
  * element of the reservoir (current set of tuples).  At all
  * times the reservoir is a true random sample of the tuples
  * we've passed over so far, so when we fall off the end of
  * the relation we're done.
  */
>>
>> It seems that we could use samplerows instead of numrows to count the
>> progress (if we choose to count progress in terms of sample rows collected).
>>
> 
> I guess it's hard to count progress in terms of sample rows collected
> even if we use samplerows instead, because samplerows can be
> incremented independently of the target number of sampling rows. The
> samplerows can be incremented up to the total number of rows of
> relation.

Hmm, you're right.  It could be counted with a separate variable
initialized to 0 and incremented every time we decide to add a row to the
final set of sampled rows, although different implementations of
AcquireSampleRowsFunc have different ways of deciding if a given row will
be part of the final set of sampled rows.

On the other hand, if we decide to count progress in terms of blocks as
you suggested afraid, I'm afraid that FDWs won't be able to report the
progress.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-04-04 Thread Masahiko Sawada

On Tue, Apr 4, 2017 at 2:12 PM, Amit Langote
 wrote:
> On 2017/03/30 17:39, Masahiko Sawada wrote:
>> On Wed, Mar 29, 2017 at 5:38 PM, vinayak wrote:
> I have updated the patch.
>>
>> I reviewed v7 patch.
>>
>> When I ran ANALYZE command to the table having 5 millions rows with 3
>> child tables, the progress report I got shows the strange result. The
>> following result was output after sampled all target rows from child
>> tables (i.g, after finished acquire_inherited_sample_rows). I think
>> the progress information should report 100% complete at this point.
>>
>
> [...]
>
>>
>> I guess the cause of this is that you always count the number of
>> sampled rows in acquire_sample_rows staring from 0 for each child
>> table. num_rows_sampled is reset whenever starting collecting sample
>> rows.
>> Also, even if table has child table, the total number of sampling row
>> is not changed. I think that main differences between analyzing on
>> normal table and on partitioned table is where we fetch the sample row
>> from. So I'm not sure if adding "computing inherited statistics" phase
>> is desirable. If you want to report progress of collecting sample rows
>> on child table I think that you might want to add the information
>> showing which child table is being analyzed.
>
> Two kinds of statistics are collected if the table is a inheritance parent.
>
> First kind considers the table by itself and calculates regular
> single-table statistics using rows sampled from the only available heap
> (by calling do_analyze_rel() with inh=false).
>
> The second kind are called inheritance statistics, which consider rows
> sampled from the whole inheritance hierarchy (by calling do_analyze_rel()
> with inh=true).  Note that we are still collecting statistics for the
> parent table, so we not really "analyzing" child tables; we just collect
> sample rows from them to calculate whole-tree statistics for the root
> parent table.

Agreed.

>  It might still be possible to show the child table being
> sampled though.
>
>> --
>> pg_stat_progress_analyze.num_target_sample_rows is set while ignoring
>> the number of rows the target table has actually. So If the table has
>> rows less than target number of rows, the num_rows_sampled never reach
>> to num_target_sample_rows.
>>
>> --
>> @@ -1180,6 +1213,9 @@ acquire_sample_rows(Relation onerel, int elevel,
>> }
>>
>> samplerows += 1;
>> +
>> +   /* Report number of rows sampled so far */
>> +
>> pgstat_progress_update_param(PROGRESS_ANALYZE_NUM_ROWS_SAMPLED,
>> numrows);
>> }
>> }
>>
>> I think that it's wrong to use numrows variable to report the progress
>> of the number of rows sampled. As the following comment mentioned, we
>> first save row into rows[] and increment numrows until numrows <
>> targrows. And then we could replace stored sample row with new sampled
>> row. That is, after collecting "numrows" rows, analyze can still take
>> a long time to get and replace the sample row. To computing progress
>> of collecting sample row, IMO reporting the number of blocks we
>> scanned so far is better than number of sample rows. Thought?
>
> We can report progress in terms of individual blocks only inside
> acquire_sample_rows(), which seems undesirable when one thinks that we
> will be resetting the target for every child table.  We should have a
> global target that considers all child tables in the inheritance
> hierarchy, which maybe is possible if we count them beforehand in
> acquire_inheritance_sample_rows().  But why not use target sample rows,
> which remains the same for both when we're collecting sample rows from one
> table and from the whole inheritance hierarchy.  We can keep the count of
> already collected rows in a struct that is used across calls for all the
> child tables and increment upward from that count when we start collecting
> from a new child table.

An another option I came up with is that support new pgstat progress
function, say pgstat_progress_incr_param, which increments index'th
member in st_progress_param[]. That way we just need to report a delta
using that function.

>
>>> /*
>>>  * The first targrows sample rows are simply copied into the
>>>  * reservoir. Then we start replacing tuples in the sample
>>>  * until we reach the end of the relation.  This algorithm is
>>>  * from Jeff Vitter's paper (see full citation below). It
>>>  * works by repeatedly computing the number of tuples to skip
>>>  * before selecting a tuple, which replaces a randomly chosen
>>>  * element of the reservoir (current set of tuples).  At all
>>>  * times the reservoir is a true random sample of the tuples
>>>  * we've passed over so far, so when we fall off the end of
>>>  * the relation we're done.
>>>  */
>
> It seems that we could use samplerows instead

Re: [HACKERS] ANALYZE command progress checker

2017-04-03 Thread Amit Langote

On 2017/04/04 14:15, Amit Langote wrote:
> On 2017/04/04 14:12, Amit Langote wrote:
>> Two kinds of statistics are collected if the table is a inheritance parent.
>>
>> First kind considers the table by itself and calculates regular
>> single-table statistics using rows sampled from the only available heap
>> (by calling do_analyze_rel() with inh=false).
> 
> Oops, should have also mentioned that this part is not true for the new
> partitioned tables.  There are single-table statistics for partitioned
> tables, because it contains no data.

Sorry again, I meant *no* single-table statistics partitioned tables...

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-04-03 Thread Amit Langote

On 2017/04/04 14:12, Amit Langote wrote:
> Two kinds of statistics are collected if the table is a inheritance parent.
> 
> First kind considers the table by itself and calculates regular
> single-table statistics using rows sampled from the only available heap
> (by calling do_analyze_rel() with inh=false).

Oops, should have also mentioned that this part is not true for the new
partitioned tables.  There are single-table statistics for partitioned
tables, because it contains no data.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-04-03 Thread Amit Langote

On 2017/03/30 17:39, Masahiko Sawada wrote:
> On Wed, Mar 29, 2017 at 5:38 PM, vinayak wrote:
 I have updated the patch.
> 
> I reviewed v7 patch.
> 
> When I ran ANALYZE command to the table having 5 millions rows with 3
> child tables, the progress report I got shows the strange result. The
> following result was output after sampled all target rows from child
> tables (i.g, after finished acquire_inherited_sample_rows). I think
> the progress information should report 100% complete at this point.
> 

[...]

> 
> I guess the cause of this is that you always count the number of
> sampled rows in acquire_sample_rows staring from 0 for each child
> table. num_rows_sampled is reset whenever starting collecting sample
> rows.
> Also, even if table has child table, the total number of sampling row
> is not changed. I think that main differences between analyzing on
> normal table and on partitioned table is where we fetch the sample row
> from. So I'm not sure if adding "computing inherited statistics" phase
> is desirable. If you want to report progress of collecting sample rows
> on child table I think that you might want to add the information
> showing which child table is being analyzed.

Two kinds of statistics are collected if the table is a inheritance parent.

First kind considers the table by itself and calculates regular
single-table statistics using rows sampled from the only available heap
(by calling do_analyze_rel() with inh=false).

The second kind are called inheritance statistics, which consider rows
sampled from the whole inheritance hierarchy (by calling do_analyze_rel()
with inh=true).  Note that we are still collecting statistics for the
parent table, so we not really "analyzing" child tables; we just collect
sample rows from them to calculate whole-tree statistics for the root
parent table.  It might still be possible to show the child table being
sampled though.

> --
> pg_stat_progress_analyze.num_target_sample_rows is set while ignoring
> the number of rows the target table has actually. So If the table has
> rows less than target number of rows, the num_rows_sampled never reach
> to num_target_sample_rows.
> 
> --
> @@ -1180,6 +1213,9 @@ acquire_sample_rows(Relation onerel, int elevel,
> }
> 
> samplerows += 1;
> +
> +   /* Report number of rows sampled so far */
> +
> pgstat_progress_update_param(PROGRESS_ANALYZE_NUM_ROWS_SAMPLED,
> numrows);
> }
> }
> 
> I think that it's wrong to use numrows variable to report the progress
> of the number of rows sampled. As the following comment mentioned, we
> first save row into rows[] and increment numrows until numrows <
> targrows. And then we could replace stored sample row with new sampled
> row. That is, after collecting "numrows" rows, analyze can still take
> a long time to get and replace the sample row. To computing progress
> of collecting sample row, IMO reporting the number of blocks we
> scanned so far is better than number of sample rows. Thought?

We can report progress in terms of individual blocks only inside
acquire_sample_rows(), which seems undesirable when one thinks that we
will be resetting the target for every child table.  We should have a
global target that considers all child tables in the inheritance
hierarchy, which maybe is possible if we count them beforehand in
acquire_inheritance_sample_rows().  But why not use target sample rows,
which remains the same for both when we're collecting sample rows from one
table and from the whole inheritance hierarchy.  We can keep the count of
already collected rows in a struct that is used across calls for all the
child tables and increment upward from that count when we start collecting
from a new child table.

>> /*
>>  * The first targrows sample rows are simply copied into the
>>  * reservoir. Then we start replacing tuples in the sample
>>  * until we reach the end of the relation.  This algorithm is
>>  * from Jeff Vitter's paper (see full citation below). It
>>  * works by repeatedly computing the number of tuples to skip
>>  * before selecting a tuple, which replaces a randomly chosen
>>  * element of the reservoir (current set of tuples).  At all
>>  * times the reservoir is a true random sample of the tuples
>>  * we've passed over so far, so when we fall off the end of
>>  * the relation we're done.
>>  */

It seems that we could use samplerows instead of numrows to count the
progress (if we choose to count progress in terms of sample rows collected).

Thanks,
Amit

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-04-03 Thread vinayak



On 2017/03/30 17:39, Masahiko Sawada wrote:

On Wed, Mar 29, 2017 at 5:38 PM, vinayak
 wrote:

On 2017/03/25 4:30, Robert Haas wrote:

On Fri, Mar 24, 2017 at 3:41 AM, vinayak
 wrote:

I have updated the patch.

You can't change the definition of AcquireSampleRowsFunc without
updating the documentation in fdwhandler.sgml, but I think I don't
immediately understand why that's a thing we want to do at all if
neither file_fdw nor postgres_fdw are going to use the additional
argument.  It seems to be entirely pointless because the new value
being passed down is always zero and even if the callee were to update
it, do_analyze_rel() would just ignore the change on return.  Am I
missing something here?  Also, the whitespace-only changes to fdwapi.h
related to AcquireSampleRowsFunc going to get undone by pgindent, so
even if there's some reason to change that you should leave the lines
that don't need changing untouched.

I reviewed v7 patch.

When I ran ANALYZE command to the table having 5 millions rows with 3
child tables, the progress report I got shows the strange result. The
following result was output after sampled all target rows from child
tables (i.g, after finished acquire_inherited_sample_rows). I think
the progress information should report 100% complete at this point.

#= select * from pg_stat_progress_analyze ;
   pid  | datid | datname  | relid |  phase   |
num_target_sample_rows | num_rows_sampled
---+---+--+---+--++--
  81719 | 13215 | postgres | 16440 | collecting inherited sample rows |
300 |  180
(1 row)

I guess the cause of this is that you always count the number of
sampled rows in acquire_sample_rows staring from 0 for each child
table. num_rows_sampled is reset whenever starting collecting sample
rows.
Also, even if table has child table, the total number of sampling row
is not changed. I think that main differences between analyzing on
normal table and on partitioned table is where we fetch the sample row
from. So I'm not sure if adding "computing inherited statistics" phase
is desirable. If you want to report progress of collecting sample rows
on child table I think that you might want to add the information
showing which child table is being analyzed.

--
pg_stat_progress_analyze.num_target_sample_rows is set while ignoring
the number of rows the target table has actually. So If the table has
rows less than target number of rows, the num_rows_sampled never reach
to num_target_sample_rows.

--
@@ -1180,6 +1213,9 @@ acquire_sample_rows(Relation onerel, int elevel,
 }

 samplerows += 1;
+
+   /* Report number of rows sampled so far */
+
pgstat_progress_update_param(PROGRESS_ANALYZE_NUM_ROWS_SAMPLED,
numrows);
 }
 }

I think that it's wrong to use numrows variable to report the progress
of the number of rows sampled. As the following comment mentioned, we
first save row into rows[] and increment numrows until numrows <
targrows. And then we could replace stored sample row with new sampled
row. That is, after collecting "numrows" rows, analyze can still take
a long time to get and replace the sample row. To computing progress
of collecting sample row, IMO reporting the number of blocks we
scanned so far is better than number of sample rows. Thought?


 /*
  * The first targrows sample rows are simply copied into the
  * reservoir. Then we start replacing tuples in the sample
  * until we reach the end of the relation.  This algorithm is
  * from Jeff Vitter's paper (see full citation below). It
  * works by repeatedly computing the number of tuples to skip
  * before selecting a tuple, which replaces a randomly chosen
  * element of the reservoir (current set of tuples).  At all
  * times the reservoir is a true random sample of the tuples
  * we've passed over so far, so when we fall off the end of
  * the relation we're done.
  */

Looks good to me.
In the attached patch I have reported number of blocks scanned so far 
instead of number of sample rows.


In the 'collecting inherited sample rows' phase, child_relid is reported 
as a separate column.
The child_relid is reported its value only in 'collecting inherited 
sample rows' phase. For single relation it return 0.

I am not sure whether this column would helpful or not.
Any suggestions are welcome.

+/* Reset rows processed to zero for the next column */
+
pgstat_progress_update_param(PROGRESS_ANALYZE_NUM_ROWS_PROCESSED,
0);

This seems like a bad design.  If you see a value other than zero in
this field, you still won't know how much progress analyze has made,
because you don't know which column you're processing.  I also feel
like

Re: [HACKERS] ANALYZE command progress checker

2017-03-31 Thread Masahiko Sawada

On Thu, Mar 30, 2017 at 6:19 PM, vinayak
 wrote:
>
> On 2017/03/30 17:39, Masahiko Sawada wrote:
>>
>> On Wed, Mar 29, 2017 at 5:38 PM, vinayak
>>  wrote:
>>>
>>> On 2017/03/25 4:30, Robert Haas wrote:

 On Fri, Mar 24, 2017 at 3:41 AM, vinayak
  wrote:
>
> I have updated the patch.

 You can't change the definition of AcquireSampleRowsFunc without
 updating the documentation in fdwhandler.sgml, but I think I don't
 immediately understand why that's a thing we want to do at all if
 neither file_fdw nor postgres_fdw are going to use the additional
 argument.  It seems to be entirely pointless because the new value
 being passed down is always zero and even if the callee were to update
 it, do_analyze_rel() would just ignore the change on return.  Am I
 missing something here?  Also, the whitespace-only changes to fdwapi.h
 related to AcquireSampleRowsFunc going to get undone by pgindent, so
 even if there's some reason to change that you should leave the lines
 that don't need changing untouched.
>>
>> I reviewed v7 patch.
>
> Thank you for reviewing the patch.
>>
>>
>> When I ran ANALYZE command to the table having 5 millions rows with 3
>> child tables, the progress report I got shows the strange result. The
>> following result was output after sampled all target rows from child
>> tables (i.g, after finished acquire_inherited_sample_rows). I think
>> the progress information should report 100% complete at this point.
>>
>> #= select * from pg_stat_progress_analyze ;
>>pid  | datid | datname  | relid |  phase   |
>> num_target_sample_rows | num_rows_sampled
>>
>> ---+---+--+---+--++--
>>   81719 | 13215 | postgres | 16440 | collecting inherited sample rows |
>> 300 |  180
>> (1 row)
>>
>> I guess the cause of this is that you always count the number of
>> sampled rows in acquire_sample_rows staring from 0 for each child
>> table. num_rows_sampled is reset whenever starting collecting sample
>> rows.
>> Also, even if table has child table, the total number of sampling row
>> is not changed. I think that main differences between analyzing on
>> normal table and on partitioned table is where we fetch the sample row
>> from. So I'm not sure if adding "computing inherited statistics" phase
>> is desirable. If you want to report progress of collecting sample rows
>> on child table I think that you might want to add the information
>> showing which child table is being analyzed.
>
> Yes. I think showing child table information would be good to user/DBA.
>
>> --
>> pg_stat_progress_analyze.num_target_sample_rows is set while ignoring
>> the number of rows the target table has actually. So If the table has
>> rows less than target number of rows, the num_rows_sampled never reach
>> to num_target_sample_rows.
>>
>> --
>> @@ -1180,6 +1213,9 @@ acquire_sample_rows(Relation onerel, int elevel,
>>  }
>>
>>  samplerows += 1;
>> +
>> +   /* Report number of rows sampled so far */
>> +
>> pgstat_progress_update_param(PROGRESS_ANALYZE_NUM_ROWS_SAMPLED,
>> numrows);
>>  }
>>  }
>>
>> I think that it's wrong to use numrows variable to report the progress
>> of the number of rows sampled. As the following comment mentioned, we
>> first save row into rows[] and increment numrows until numrows <
>> targrows. And then we could replace stored sample row with new sampled
>> row. That is, after collecting "numrows" rows, analyze can still take
>> a long time to get and replace the sample row. To computing progress
>> of collecting sample row, IMO reporting the number of blocks we
>> scanned so far is better than number of sample rows. Thought?
>>
>>>  /*
>>>   * The first targrows sample rows are simply copied into the
>>>   * reservoir. Then we start replacing tuples in the sample
>>>   * until we reach the end of the relation.  This algorithm is
>>>   * from Jeff Vitter's paper (see full citation below). It
>>>   * works by repeatedly computing the number of tuples to skip
>>>   * before selecting a tuple, which replaces a randomly chosen
>>>   * element of the reservoir (current set of tuples).  At all
>>>   * times the reservoir is a true random sample of the tuples
>>>   * we've passed over so far, so when we fall off the end of
>>>   * the relation we're done.
>>>   */
>
> I think we can either report number of blocks scanned so far or number of
> sample rows.
> But I agree with you that reporting the number of blocks scanned so far
> would be better than reporting number of sample rows.
>
>>> I Understood that we could not change the

Re: [HACKERS] ANALYZE command progress checker

2017-03-30 Thread vinayak



On 2017/03/30 17:39, Masahiko Sawada wrote:

On Wed, Mar 29, 2017 at 5:38 PM, vinayak
 wrote:

On 2017/03/25 4:30, Robert Haas wrote:

On Fri, Mar 24, 2017 at 3:41 AM, vinayak
 wrote:

I have updated the patch.

You can't change the definition of AcquireSampleRowsFunc without
updating the documentation in fdwhandler.sgml, but I think I don't
immediately understand why that's a thing we want to do at all if
neither file_fdw nor postgres_fdw are going to use the additional
argument.  It seems to be entirely pointless because the new value
being passed down is always zero and even if the callee were to update
it, do_analyze_rel() would just ignore the change on return.  Am I
missing something here?  Also, the whitespace-only changes to fdwapi.h
related to AcquireSampleRowsFunc going to get undone by pgindent, so
even if there's some reason to change that you should leave the lines
that don't need changing untouched.

I reviewed v7 patch.

Thank you for reviewing the patch.


When I ran ANALYZE command to the table having 5 millions rows with 3
child tables, the progress report I got shows the strange result. The
following result was output after sampled all target rows from child
tables (i.g, after finished acquire_inherited_sample_rows). I think
the progress information should report 100% complete at this point.

#= select * from pg_stat_progress_analyze ;
   pid  | datid | datname  | relid |  phase   |
num_target_sample_rows | num_rows_sampled
---+---+--+---+--++--
  81719 | 13215 | postgres | 16440 | collecting inherited sample rows |
300 |  180
(1 row)

I guess the cause of this is that you always count the number of
sampled rows in acquire_sample_rows staring from 0 for each child
table. num_rows_sampled is reset whenever starting collecting sample
rows.
Also, even if table has child table, the total number of sampling row
is not changed. I think that main differences between analyzing on
normal table and on partitioned table is where we fetch the sample row
from. So I'm not sure if adding "computing inherited statistics" phase
is desirable. If you want to report progress of collecting sample rows
on child table I think that you might want to add the information
showing which child table is being analyzed.

Yes. I think showing child table information would be good to user/DBA.

--
pg_stat_progress_analyze.num_target_sample_rows is set while ignoring
the number of rows the target table has actually. So If the table has
rows less than target number of rows, the num_rows_sampled never reach
to num_target_sample_rows.

--
@@ -1180,6 +1213,9 @@ acquire_sample_rows(Relation onerel, int elevel,
 }

 samplerows += 1;
+
+   /* Report number of rows sampled so far */
+
pgstat_progress_update_param(PROGRESS_ANALYZE_NUM_ROWS_SAMPLED,
numrows);
 }
 }

I think that it's wrong to use numrows variable to report the progress
of the number of rows sampled. As the following comment mentioned, we
first save row into rows[] and increment numrows until numrows <
targrows. And then we could replace stored sample row with new sampled
row. That is, after collecting "numrows" rows, analyze can still take
a long time to get and replace the sample row. To computing progress
of collecting sample row, IMO reporting the number of blocks we
scanned so far is better than number of sample rows. Thought?


 /*
  * The first targrows sample rows are simply copied into the
  * reservoir. Then we start replacing tuples in the sample
  * until we reach the end of the relation.  This algorithm is
  * from Jeff Vitter's paper (see full citation below). It
  * works by repeatedly computing the number of tuples to skip
  * before selecting a tuple, which replaces a randomly chosen
  * element of the reservoir (current set of tuples).  At all
  * times the reservoir is a true random sample of the tuples
  * we've passed over so far, so when we fall off the end of
  * the relation we're done.
  */
I think we can either report number of blocks scanned so far or number 
of sample rows.
But I agree with you that reporting the number of blocks scanned so far 
would be better than reporting number of sample rows.

I Understood that we could not change the definition of
AcquireSampleRowsFunc without updating the documentation in fdwhandler.sgml.
In the last patch I have modified the definition of AcquireSampleRowsFunc to
handle the inheritance case.
If the table being analyzed has one or more children, ANALYZE will gather
statistics twice:
once on the rows of parent table only and second on the rows of the parent
table with all of its children.
So while reporting

Re: [HACKERS] ANALYZE command progress checker

2017-03-30 Thread Masahiko Sawada

On Wed, Mar 29, 2017 at 5:38 PM, vinayak
 wrote:
>
> On 2017/03/25 4:30, Robert Haas wrote:
>>
>> On Fri, Mar 24, 2017 at 3:41 AM, vinayak
>>  wrote:
>>>
>>> I have updated the patch.
>>
>> You can't change the definition of AcquireSampleRowsFunc without
>> updating the documentation in fdwhandler.sgml, but I think I don't
>> immediately understand why that's a thing we want to do at all if
>> neither file_fdw nor postgres_fdw are going to use the additional
>> argument.  It seems to be entirely pointless because the new value
>> being passed down is always zero and even if the callee were to update
>> it, do_analyze_rel() would just ignore the change on return.  Am I
>> missing something here?  Also, the whitespace-only changes to fdwapi.h
>> related to AcquireSampleRowsFunc going to get undone by pgindent, so
>> even if there's some reason to change that you should leave the lines
>> that don't need changing untouched.

I reviewed v7 patch.

When I ran ANALYZE command to the table having 5 millions rows with 3
child tables, the progress report I got shows the strange result. The
following result was output after sampled all target rows from child
tables (i.g, after finished acquire_inherited_sample_rows). I think
the progress information should report 100% complete at this point.

#= select * from pg_stat_progress_analyze ;
  pid  | datid | datname  | relid |  phase   |
num_target_sample_rows | num_rows_sampled
---+---+--+---+--++--
 81719 | 13215 | postgres | 16440 | collecting inherited sample rows |
   300 |  180
(1 row)

I guess the cause of this is that you always count the number of
sampled rows in acquire_sample_rows staring from 0 for each child
table. num_rows_sampled is reset whenever starting collecting sample
rows.
Also, even if table has child table, the total number of sampling row
is not changed. I think that main differences between analyzing on
normal table and on partitioned table is where we fetch the sample row
from. So I'm not sure if adding "computing inherited statistics" phase
is desirable. If you want to report progress of collecting sample rows
on child table I think that you might want to add the information
showing which child table is being analyzed.

--
pg_stat_progress_analyze.num_target_sample_rows is set while ignoring
the number of rows the target table has actually. So If the table has
rows less than target number of rows, the num_rows_sampled never reach
to num_target_sample_rows.

--
@@ -1180,6 +1213,9 @@ acquire_sample_rows(Relation onerel, int elevel,
}

samplerows += 1;
+
+   /* Report number of rows sampled so far */
+
pgstat_progress_update_param(PROGRESS_ANALYZE_NUM_ROWS_SAMPLED,
numrows);
}
}

I think that it's wrong to use numrows variable to report the progress
of the number of rows sampled. As the following comment mentioned, we
first save row into rows[] and increment numrows until numrows <
targrows. And then we could replace stored sample row with new sampled
row. That is, after collecting "numrows" rows, analyze can still take
a long time to get and replace the sample row. To computing progress
of collecting sample row, IMO reporting the number of blocks we
scanned so far is better than number of sample rows. Thought?

> /*
>  * The first targrows sample rows are simply copied into the
>  * reservoir. Then we start replacing tuples in the sample
>  * until we reach the end of the relation.  This algorithm is
>  * from Jeff Vitter's paper (see full citation below). It
>  * works by repeatedly computing the number of tuples to skip
>  * before selecting a tuple, which replaces a randomly chosen
>  * element of the reservoir (current set of tuples).  At all
>  * times the reservoir is a true random sample of the tuples
>  * we've passed over so far, so when we fall off the end of
>  * the relation we're done.
>  */

> I Understood that we could not change the definition of
> AcquireSampleRowsFunc without updating the documentation in fdwhandler.sgml.
> In the last patch I have modified the definition of AcquireSampleRowsFunc to
> handle the inheritance case.
> If the table being analyzed has one or more children, ANALYZE will gather
> statistics twice:
> once on the rows of parent table only and second on the rows of the parent
> table with all of its children.
> So while reporting the progress the value of number of target sample rows
> and number of rows sampled is mismatched.
> For single relation it works fine.
>
> In the attached patch I have not change the definition of
> AcquireSampleRowsFunc.
> I have updated inheritance case in the the documentation.
>
>>
>> +

Re: [HACKERS] ANALYZE command progress checker

2017-03-29 Thread vinayak



On 2017/03/25 4:30, Robert Haas wrote:

On Fri, Mar 24, 2017 at 3:41 AM, vinayak
 wrote:

I have updated the patch.

You can't change the definition of AcquireSampleRowsFunc without
updating the documentation in fdwhandler.sgml, but I think I don't
immediately understand why that's a thing we want to do at all if
neither file_fdw nor postgres_fdw are going to use the additional
argument.  It seems to be entirely pointless because the new value
being passed down is always zero and even if the callee were to update
it, do_analyze_rel() would just ignore the change on return.  Am I
missing something here?  Also, the whitespace-only changes to fdwapi.h
related to AcquireSampleRowsFunc going to get undone by pgindent, so
even if there's some reason to change that you should leave the lines
that don't need changing untouched.
I Understood that we could not change the definition of 
AcquireSampleRowsFunc without updating the documentation in fdwhandler.sgml.
In the last patch I have modified the definition of 
AcquireSampleRowsFunc to handle the inheritance case.
If the table being analyzed has one or more children, ANALYZE will 
gather statistics twice:
once on the rows of parent table only and second on the rows of the 
parent table with all of its children.
So while reporting the progress the value of number of target sample 
rows and number of rows sampled is mismatched.

For single relation it works fine.

In the attached patch I have not change the definition of 
AcquireSampleRowsFunc.

I have updated inheritance case in the the documentation.



+/* Reset rows processed to zero for the next column */
+pgstat_progress_update_param(PROGRESS_ANALYZE_NUM_ROWS_PROCESSED,
0);

This seems like a bad design.  If you see a value other than zero in
this field, you still won't know how much progress analyze has made,
because you don't know which column you're processing.  I also feel
like you're not paying enough attention to a key point here that I
made in my last review, which is that you need to look at where
ANALYZE actually spends most of the time.  If the process of computing
the per-row statistics consumes significant CPU time, then maybe
there's some point in trying to instrument it, but does it?  Unless
you know the answer to that question, you don't know whether there's
even a point to trying to instrument this.

Understood. The computing statistics phase takes long time so I am 
looking at the code.


Regards,
Vinayak Pokale
NTT Open Source Software Center


pg_stat_progress_analyze_v7.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-03-24 Thread Robert Haas

On Fri, Mar 24, 2017 at 3:41 AM, vinayak
 wrote:
> I have updated the patch.

You can't change the definition of AcquireSampleRowsFunc without
updating the documentation in fdwhandler.sgml, but I think I don't
immediately understand why that's a thing we want to do at all if
neither file_fdw nor postgres_fdw are going to use the additional
argument.  It seems to be entirely pointless because the new value
being passed down is always zero and even if the callee were to update
it, do_analyze_rel() would just ignore the change on return.  Am I
missing something here?  Also, the whitespace-only changes to fdwapi.h
related to AcquireSampleRowsFunc going to get undone by pgindent, so
even if there's some reason to change that you should leave the lines
that don't need changing untouched.

+/* Reset rows processed to zero for the next column */
+pgstat_progress_update_param(PROGRESS_ANALYZE_NUM_ROWS_PROCESSED,
0);

This seems like a bad design.  If you see a value other than zero in
this field, you still won't know how much progress analyze has made,
because you don't know which column you're processing.  I also feel
like you're not paying enough attention to a key point here that I
made in my last review, which is that you need to look at where
ANALYZE actually spends most of the time.  If the process of computing
the per-row statistics consumes significant CPU time, then maybe
there's some point in trying to instrument it, but does it?  Unless
you know the answer to that question, you don't know whether there's
even a point to trying to instrument this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-03-24 Thread vinayak

On 2017/03/23 15:04, Haribabu Kommi wrote:

On Wed, Mar 22, 2017 at 8:11 PM, vinayak 
> wrote:

On 2017/03/21 21:25, Haribabu Kommi wrote:

On Tue, Mar 21, 2017 at 3:41 PM, vinayak
> wrote:

Thank you for testing the patch on Windows platform.

Thanks for the updated patch.

It works good for a normal relation.  But for a relation that
contains child tables,
the PROGRESS_ANALYZE_NUM_ROWS_SAMPLED produces wrong results.

Thank you for reviewing the patch.
The attached patch implements a way to report sample rows count
from acquire_sample_rows() even if called for child tables.

How about adding another phase called
PROGRESS_ANALYZE_PHASE_COLLECT_INHERIT_SAMPLE_ROWS
and set this phase only when it is an inheritance analyze
operation. And adding
some explanation of ROWS_SAMPLED phase about inheritance tables
how these sampled rows are calculated will provide good analyze
progress of
relation that contains child relations also.

I have added the phase called
PROGRESS_ANALYZE_PHASE_COLLECT_INH_SAMPLE_ROWS.
I have also updated the documentation.

Thanks for the updated patch. I will check it.

The ANALYZE command takes long time in computing statistics
phase.So I think we can add some column or phase so that user can
easily understand the progress.
How about adding new column like "num_rows_processed" will compute
the statistics of specified column?

I prefer a column with rows processed instead of a phase.
Because we already set the phase of compute stats and showing
the progress there will number of rows processed will be good.

How about separate the computing "inheritance statistics" phase
from the computing regular "single table" statistics.
Comment?

Yes, this will be good to show both that states of inheritance of 
sampled rows and
compute inheritance stats, so that it will be clearly visible to the 
user the current

status.

I have updated the patch.
I have added column "num_rows_processed" and phase "computing inherited 
statistics" in the view.

=# \d+ pg_stat_progress_analyze
 View "pg_catalog.pg_stat_progress_analyze"
 Column |  Type   | Collation | Nullable | Default | 
Storage  | Description

+-+---+--+-+--+-
 pid| integer |   |  | | 
plain|
 datid  | oid |   |  | | 
plain|
 datname| name|   |  | | 
plain|
 relid  | oid |   |  | | 
plain|
 phase  | text|   |  | | 
extended |
 num_target_sample_rows | bigint  |   |  | | 
plain|
 num_rows_sampled   | bigint  |   |  | | 
plain|
 num_rows_processed | bigint  |   |  | | 
plain|

View definition:
 SELECT s.pid,
s.datid,
d.datname,
s.relid,
CASE s.param1
WHEN 0 THEN 'initializing'::text
WHEN 1 THEN 'collecting sample rows'::text
WHEN 2 THEN 'collecting inherited sample rows'::text
WHEN 3 THEN 'computing statistics'::text
WHEN 4 THEN 'computing inherited statistics'::text
ELSE NULL::text
END AS phase,
s.param2 AS num_target_sample_rows,
s.param3 AS num_rows_sampled,
s.param4 AS num_rows_processed
   FROM pg_stat_get_progress_info('ANALYZE'::text) s(pid, datid, relid, 
param1, param2, param3, param4, param5, param6, param7, param8, param9, 
param10)

 LEFT JOIN pg_database d ON s.datid = d.oid;

Regards,
Vinayak Pokale
NTT Open Source Software Center

pg_stat_progress_analyze_v6.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-03-23 Thread Haribabu Kommi

On Wed, Mar 22, 2017 at 8:11 PM, vinayak 
wrote:

>
> On 2017/03/21 21:25, Haribabu Kommi wrote:
>
>
>
> On Tue, Mar 21, 2017 at 3:41 PM, vinayak 
> wrote:
>
>> Thank you for testing the patch on Windows platform.
>>
>>
> Thanks for the updated patch.
>
> It works good for a normal relation.  But for a relation that contains
> child tables,
> the PROGRESS_ANALYZE_NUM_ROWS_SAMPLED produces wrong results.
>
> Thank you for reviewing the patch.
> The attached patch implements a way to report sample rows count from
> acquire_sample_rows() even if called for child tables.
>
> How about adding another phase called PROGRESS_ANALYZE_PHASE_
> COLLECT_INHERIT_SAMPLE_ROWS
> and set this phase only when it is an inheritance analyze operation. And
> adding
> some explanation of ROWS_SAMPLED phase about inheritance tables
> how these sampled rows are calculated will provide good analyze progress of
> relation that contains child relations also.
>
> I have added the phase called PROGRESS_ANALYZE_PHASE_
> COLLECT_INH_SAMPLE_ROWS.
> I have also updated the documentation.
>

Thanks for the updated patch. I will check it.


> The ANALYZE command takes long time in computing statistics phase.So I
> think we can add some column or phase so that user can easily understand
> the progress.
> How about adding new column like "num_rows_processed" will compute the
> statistics of specified column?
>

I prefer a column with rows processed instead of a phase.
Because we already set the phase of compute stats and showing
the progress there will number of rows processed will be good.


> How about separate the computing "inheritance statistics" phase from the
> computing regular "single table" statistics.
> Comment?
>

Yes, this will be good to show both that states of inheritance of sampled
rows and
compute inheritance stats, so that it will be clearly visible to the user
the current
status.


Regards,
Hari Babu
Fujitsu Australia

Re: [HACKERS] ANALYZE command progress checker

2017-03-22 Thread vinayak



On 2017/03/21 21:25, Haribabu Kommi wrote:



On Tue, Mar 21, 2017 at 3:41 PM, vinayak 
> wrote:


Thank you for testing the patch on Windows platform.


Thanks for the updated patch.

It works good for a normal relation.  But for a relation that contains 
child tables,

the PROGRESS_ANALYZE_NUM_ROWS_SAMPLED produces wrong results.


Thank you for reviewing the patch.
The attached patch implements a way to report sample rows count from 
acquire_sample_rows() even if called for child tables.
How about adding another phase called 
PROGRESS_ANALYZE_PHASE_COLLECT_INHERIT_SAMPLE_ROWS
and set this phase only when it is an inheritance analyze operation. 
And adding

some explanation of ROWS_SAMPLED phase about inheritance tables
how these sampled rows are calculated will provide good analyze 
progress of

relation that contains child relations also.
I have added the phase called 
PROGRESS_ANALYZE_PHASE_COLLECT_INH_SAMPLE_ROWS.

I have also updated the documentation.

The ANALYZE command takes long time in computing statistics phase.So I 
think we can add some column or phase so that user can easily understand 
the progress.
How about adding new column like "num_rows_processed" will compute the 
statistics of specified column?
How about separate the computing "inheritance statistics" phase from the 
computing regular "single table" statistics.

Comment?

Regards,
Vinayak Pokale
NTT Open Source Software Center


pg_stat_progress_analyze_v5.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-03-21 Thread Haribabu Kommi

On Tue, Mar 21, 2017 at 3:41 PM, vinayak 
wrote:

> Thank you for testing the patch on Windows platform.
>
>
Thanks for the updated patch.

It works good for a normal relation.  But for a relation that contains
child tables,
the PROGRESS_ANALYZE_NUM_ROWS_SAMPLED produces wrong results.

How about adding another phase called
PROGRESS_ANALYZE_PHASE_COLLECT_INHERIT_SAMPLE_ROWS
and set this phase only when it is an inheritance analyze operation. And
adding
some explanation of ROWS_SAMPLED phase about inheritance tables
how these sampled rows are calculated will provide good analyze progress of
relation that contains child relations also.

Regards,
Hari Babu
Fujitsu Australia

Re: [HACKERS] ANALYZE command progress checker

2017-03-20 Thread vinayak


Hi Ashutosh,

On 2017/03/19 17:56, Ashutosh Sharma wrote:

Hi,

I didn't find any major issues with the patch. It works as expected.
However, I have few minor comments that I would like to share,

+ 
+   Total number of sample rows. The sample it reads is taken randomly.
+   Its size depends on the default_statistics_target
parameter value.
+ 

1) I think it would be better if you could specify reference link to
the guc variable 'default_statistics_target'. Something like this,

.

If you go through monitoring.sgml, you would find that there is
reference link to all guc variables being used.

+1. Updated in the attached patch.


2) I feel that the 'computing_statistics' phase could have been
splitted into 'computing_column_stats', 'computing_index_stats'
Please let me know your thoughts on this.

Initially I have spitted this phase as 'computing heap stats' and 
'computing index stats' but
I agreed with Roberts suggestion to merge two phases into one as 
'computing statistics'.

+   certain commands during command execution.  Currently, the command
+   which supports progress reporting is VACUUM and
ANALYZE.  This may be
 expanded in the future.

3) I think above needs to be rephrased. Something like...Currently,
the supported progress reporting commands are 'VACUUM' and
'ANALYZE'.

+1. Updated in the attached patch.

Moreover, I also ran your patch on Windows platform and didn't find
any issues. Thanks.

Thank you for testing the patch on Windows platform.

Regards,
Vinayak Pokale
NTT Open Source Software Center


pg_stat_progress_analyze_v4.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-03-20 Thread vinayak


Hi Ashutosh,

Thank you for reviewing the patch.

On 2017/03/18 21:00, Ashutosh Sharma wrote:

Hi Vinayak,

Here are couple of review comments that may need your attention.

1. Firstly, I am seeing some trailing whitespace errors when trying to
apply your v3 patch using git apply command.

[ashu@localhost postgresql]$ git apply pg_stat_progress_analyze_v3.patch
pg_stat_progress_analyze_v3.patch:155: trailing whitespace.
CREATE VIEW pg_stat_progress_analyze AS
pg_stat_progress_analyze_v3.patch:156: trailing whitespace.
 SELECT
pg_stat_progress_analyze_v3.patch:157: trailing whitespace.
 S.pid AS pid, S.datid AS datid, D.datname AS datname,
pg_stat_progress_analyze_v3.patch:158: trailing whitespace.
 S.relid AS relid,
pg_stat_progress_analyze_v3.patch:159: trailing whitespace.
 CASE S.param1 WHEN 0 THEN 'initializing'
error: patch failed: doc/src/sgml/monitoring.sgml:521
I have tried to apply patch using "git apply" and it works fine in my 
environment.

I use below command to apply the patch.
patch -p1 < pg_stat_progress_analyze_v3.patch

2) The test-case 'rules' is failing.  I think it's failing because in
rules.sql 'ORDERBY viewname' is used which will put
'pg_stat_progress_analyze' ahead of 'pg_stat_progress_vacuum' in the
list of catalog views. You may need to correct rules.out file
accordingly. This is what i could see in rules.sql,

SELECT viewname, definition FROM pg_views WHERE schemaname <>
'information_schema' ORDER BY viewname;

I am still reviewing your patch and doing some testing. Will update if
i find any issues. Thanks.

Understood. I have fixed it.

Regards,
Vinayak Pokale
NTT Open Source Software Center


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-03-19 Thread Ashutosh Sharma

Hi,

I didn't find any major issues with the patch. It works as expected.
However, I have few minor comments that I would like to share,

+ 
+   Total number of sample rows. The sample it reads is taken randomly.
+   Its size depends on the default_statistics_target
parameter value.
+ 

1) I think it would be better if you could specify reference link to
the guc variable 'default_statistics_target'. Something like this,

.

If you go through monitoring.sgml, you would find that there is
reference link to all guc variables being used.

2) I feel that the 'computing_statistics' phase could have been
splitted into 'computing_column_stats', 'computing_index_stats'
Please let me know your thoughts on this.


+   certain commands during command execution.  Currently, the command
+   which supports progress reporting is VACUUM and
ANALYZE.  This may be
expanded in the future.

3) I think above needs to be rephrased. Something like...Currently,
the supported progress reporting commands are 'VACUUM' and
'ANALYZE'.

Moreover, I also ran your patch on Windows platform and didn't find
any issues. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

On Sat, Mar 18, 2017 at 5:30 PM, Ashutosh Sharma  wrote:
> Hi Vinayak,
>
> Here are couple of review comments that may need your attention.
>
> 1. Firstly, I am seeing some trailing whitespace errors when trying to
> apply your v3 patch using git apply command.
>
> [ashu@localhost postgresql]$ git apply pg_stat_progress_analyze_v3.patch
> pg_stat_progress_analyze_v3.patch:155: trailing whitespace.
> CREATE VIEW pg_stat_progress_analyze AS
> pg_stat_progress_analyze_v3.patch:156: trailing whitespace.
> SELECT
> pg_stat_progress_analyze_v3.patch:157: trailing whitespace.
> S.pid AS pid, S.datid AS datid, D.datname AS datname,
> pg_stat_progress_analyze_v3.patch:158: trailing whitespace.
> S.relid AS relid,
> pg_stat_progress_analyze_v3.patch:159: trailing whitespace.
> CASE S.param1 WHEN 0 THEN 'initializing'
> error: patch failed: doc/src/sgml/monitoring.sgml:521
>
> 2) The test-case 'rules' is failing.  I think it's failing because in
> rules.sql 'ORDERBY viewname' is used which will put
> 'pg_stat_progress_analyze' ahead of 'pg_stat_progress_vacuum' in the
> list of catalog views. You may need to correct rules.out file
> accordingly. This is what i could see in rules.sql,
>
> SELECT viewname, definition FROM pg_views WHERE schemaname <>
> 'information_schema' ORDER BY viewname;
>
> I am still reviewing your patch and doing some testing. Will update if
> i find any issues. Thanks.
>
> --
> With Regards,
> Ashutosh Sharma
> EnterpriseDB:http://www.enterprisedb.com
>
> On Fri, Mar 17, 2017 at 3:16 PM, vinayak
>  wrote:
>>
>> On 2017/03/17 10:38, Robert Haas wrote:
>>>
>>> On Fri, Mar 10, 2017 at 2:46 AM, vinayak
>>>  wrote:

 Thank you for reviewing the patch.

 The attached patch incorporated Michael and Amit comments also.
>>>
>>> I reviewed this tonight.
>>
>> Thank you for reviewing the patch.
>>>
>>> +/* Report compute index stats phase */
>>> +pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
>>> +
>>> PROGRESS_ANALYZE_PHASE_COMPUTE_INDEX_STATS);
>>>
>>> Hmm, is there really any point to this?  And is it even accurate?  It
>>> doesn't look to me like we are computing any index statistics here;
>>> we're only allocating a few in-memory data structures here, I think,
>>> which is not a statistics computation and probably doesn't take any
>>> significant time.
>>>
>>> +/* Report compute heap stats phase */
>>> +pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
>>> +
>>> PROGRESS_ANALYZE_PHASE_COMPUTE_HEAP_STATS);
>>>
>>> The phase you label as computing heap statistics also includes the
>>> computation of index statistics.  I wouldn't bother separating the
>>> computation of heap and index statistics; I'd just have a "compute
>>> statistics" phase that begins right after the comment that starts
>>> "Compute the statistics."
>>
>> Understood. Fixed in the attached patch.
>>>
>>>
>>> +/* Report that we are now doing index cleanup */
>>> +pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
>>> +PROGRESS_ANALYZE_PHASE_INDEX_CLEANUP);
>>>
>>> OK, this doesn't make any sense either.  We are not cleaning up the
>>> indexes here.  We are just closing them and releasing resources.  I
>>> don't see why you need this phase at all; it can't last more than some
>>> tiny fraction of a second.  It seems like you're trying to copy the
>>> exact same phases that exist for vacuum instead of thinking about what
>>> makes sense for ANALYZE.
>>
>> Understood. I have removed this phase.
>>>
>>>
>>> +/* Report number of heap blocks scaned so far*/
>>> +pgstat_progress_update_param(PROGRESS_ANALYZE_HEAP_BLKS_SCANNED,

Re: [HACKERS] ANALYZE command progress checker

2017-03-18 Thread Ashutosh Sharma

Hi Vinayak,

Here are couple of review comments that may need your attention.

1. Firstly, I am seeing some trailing whitespace errors when trying to
apply your v3 patch using git apply command.

[ashu@localhost postgresql]$ git apply pg_stat_progress_analyze_v3.patch
pg_stat_progress_analyze_v3.patch:155: trailing whitespace.
CREATE VIEW pg_stat_progress_analyze AS
pg_stat_progress_analyze_v3.patch:156: trailing whitespace.
SELECT
pg_stat_progress_analyze_v3.patch:157: trailing whitespace.
S.pid AS pid, S.datid AS datid, D.datname AS datname,
pg_stat_progress_analyze_v3.patch:158: trailing whitespace.
S.relid AS relid,
pg_stat_progress_analyze_v3.patch:159: trailing whitespace.
CASE S.param1 WHEN 0 THEN 'initializing'
error: patch failed: doc/src/sgml/monitoring.sgml:521

2) The test-case 'rules' is failing.  I think it's failing because in
rules.sql 'ORDERBY viewname' is used which will put
'pg_stat_progress_analyze' ahead of 'pg_stat_progress_vacuum' in the
list of catalog views. You may need to correct rules.out file
accordingly. This is what i could see in rules.sql,

SELECT viewname, definition FROM pg_views WHERE schemaname <>
'information_schema' ORDER BY viewname;

I am still reviewing your patch and doing some testing. Will update if
i find any issues. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

On Fri, Mar 17, 2017 at 3:16 PM, vinayak
 wrote:
>
> On 2017/03/17 10:38, Robert Haas wrote:
>>
>> On Fri, Mar 10, 2017 at 2:46 AM, vinayak
>>  wrote:
>>>
>>> Thank you for reviewing the patch.
>>>
>>> The attached patch incorporated Michael and Amit comments also.
>>
>> I reviewed this tonight.
>
> Thank you for reviewing the patch.
>>
>> +/* Report compute index stats phase */
>> +pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
>> +
>> PROGRESS_ANALYZE_PHASE_COMPUTE_INDEX_STATS);
>>
>> Hmm, is there really any point to this?  And is it even accurate?  It
>> doesn't look to me like we are computing any index statistics here;
>> we're only allocating a few in-memory data structures here, I think,
>> which is not a statistics computation and probably doesn't take any
>> significant time.
>>
>> +/* Report compute heap stats phase */
>> +pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
>> +
>> PROGRESS_ANALYZE_PHASE_COMPUTE_HEAP_STATS);
>>
>> The phase you label as computing heap statistics also includes the
>> computation of index statistics.  I wouldn't bother separating the
>> computation of heap and index statistics; I'd just have a "compute
>> statistics" phase that begins right after the comment that starts
>> "Compute the statistics."
>
> Understood. Fixed in the attached patch.
>>
>>
>> +/* Report that we are now doing index cleanup */
>> +pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
>> +PROGRESS_ANALYZE_PHASE_INDEX_CLEANUP);
>>
>> OK, this doesn't make any sense either.  We are not cleaning up the
>> indexes here.  We are just closing them and releasing resources.  I
>> don't see why you need this phase at all; it can't last more than some
>> tiny fraction of a second.  It seems like you're trying to copy the
>> exact same phases that exist for vacuum instead of thinking about what
>> makes sense for ANALYZE.
>
> Understood. I have removed this phase.
>>
>>
>> +/* Report number of heap blocks scaned so far*/
>> +pgstat_progress_update_param(PROGRESS_ANALYZE_HEAP_BLKS_SCANNED,
>> targblock);
>>
>> While I don't think this is necessarily a bad thing to report, I find
>> it very surprising that you're not reporting what seems to me like the
>> most useful thing here - namely, the number of blocks that will be
>> sampled in total and the number of those that you've selected.
>> Instead, you're just reporting the length of the relation and the
>> last-sampled block, which is a less-accurate guide to total progress.
>> There are enough counters to report both things, so maybe we should.
>>
>> +/* Report total number of sample rows*/
>> +pgstat_progress_update_param(PROGRESS_ANALYZE_TOTAL_SAMPLE_ROWS,
>> numrows);
>>
>> As an alternative to the previous paragraph, yet another thing you
>> could report is number of rows sampled out of total rows to be
>> sampled.  But this isn't the way to do it: you are reporting the
>> number of rows you sampled only once you've finished sampling.  The
>> point of progress reporting is to report progress -- that is, to
>> report this value as it's updated, not just to report it when ANALYZE
>> is practically done and the value has reached its maximum.
>
> Understood.
> I have reported number of rows sampled out of total rows to be sampled.
> I have not reported the number of blocks that will be sampled in total.
> So currect pg_stat_progress_analyze view looks like:
>
> =# \d+ pg_stat_progress_analyze
>

Re: [HACKERS] ANALYZE command progress checker

2017-03-17 Thread vinayak



On 2017/03/17 10:38, Robert Haas wrote:

On Fri, Mar 10, 2017 at 2:46 AM, vinayak
 wrote:

Thank you for reviewing the patch.

The attached patch incorporated Michael and Amit comments also.

I reviewed this tonight.

Thank you for reviewing the patch.

+/* Report compute index stats phase */
+pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+
PROGRESS_ANALYZE_PHASE_COMPUTE_INDEX_STATS);

Hmm, is there really any point to this?  And is it even accurate?  It
doesn't look to me like we are computing any index statistics here;
we're only allocating a few in-memory data structures here, I think,
which is not a statistics computation and probably doesn't take any
significant time.

+/* Report compute heap stats phase */
+pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+PROGRESS_ANALYZE_PHASE_COMPUTE_HEAP_STATS);

The phase you label as computing heap statistics also includes the
computation of index statistics.  I wouldn't bother separating the
computation of heap and index statistics; I'd just have a "compute
statistics" phase that begins right after the comment that starts
"Compute the statistics."

Understood. Fixed in the attached patch.


+/* Report that we are now doing index cleanup */
+pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+PROGRESS_ANALYZE_PHASE_INDEX_CLEANUP);

OK, this doesn't make any sense either.  We are not cleaning up the
indexes here.  We are just closing them and releasing resources.  I
don't see why you need this phase at all; it can't last more than some
tiny fraction of a second.  It seems like you're trying to copy the
exact same phases that exist for vacuum instead of thinking about what
makes sense for ANALYZE.

Understood. I have removed this phase.


+/* Report number of heap blocks scaned so far*/
+pgstat_progress_update_param(PROGRESS_ANALYZE_HEAP_BLKS_SCANNED,
targblock);

While I don't think this is necessarily a bad thing to report, I find
it very surprising that you're not reporting what seems to me like the
most useful thing here - namely, the number of blocks that will be
sampled in total and the number of those that you've selected.
Instead, you're just reporting the length of the relation and the
last-sampled block, which is a less-accurate guide to total progress.
There are enough counters to report both things, so maybe we should.

+/* Report total number of sample rows*/
+pgstat_progress_update_param(PROGRESS_ANALYZE_TOTAL_SAMPLE_ROWS, numrows);

As an alternative to the previous paragraph, yet another thing you
could report is number of rows sampled out of total rows to be
sampled.  But this isn't the way to do it: you are reporting the
number of rows you sampled only once you've finished sampling.  The
point of progress reporting is to report progress -- that is, to
report this value as it's updated, not just to report it when ANALYZE
is practically done and the value has reached its maximum.

Understood.
I have reported number of rows sampled out of total rows to be sampled.
I have not reported the number of blocks that will be sampled in total.
So currect pg_stat_progress_analyze view looks like:

=# \d+ pg_stat_progress_analyze
 View "pg_catalog.pg_stat_progress_analyze"
 Column |  Type   | Collation | Nullable | Default | 
Storage  | Descripti

on
+-+---+--+-+--+--
---
 pid| integer |   |  | | 
plain|
 datid  | oid |   |  | | 
plain|
 datname| name|   |  | | 
plain|
 relid  | oid |   |  | | 
plain|
 phase  | text|   |  | | 
extended |
 num_target_sample_rows | bigint  |   |  | | 
plain|
 num_rows_sampled   | bigint  |   |  | | 
plain|

View definition:
 SELECT s.pid,
s.datid,
d.datname,
s.relid,
CASE s.param1
WHEN 0 THEN 'initializing'::text
WHEN 1 THEN 'collecting sample rows'::text
WHEN 2 THEN 'computing statistics'::text
ELSE NULL::text
END AS phase,
s.param2 AS num_target_sample_rows,
s.param3 AS num_rows_sampled
   FROM pg_stat_get_progress_info('ANALYZE'::text) s(pid, datid, relid, 
param1, param2, p

aram3, param4, param5, param6, param7, param8, param9, param10)
 LEFT JOIN pg_database d ON s.datid = d.oid;

Comment?

The documentation for this patch isn't very good, either.  You forgot
to update the part of the documentation that says that VACUUM is the
only command that currently supports progress reporting, and the
descriptions of the phases and progress counters are brief and not

Re: [HACKERS] ANALYZE command progress checker

2017-03-16 Thread Robert Haas

On Fri, Mar 10, 2017 at 2:46 AM, vinayak
 wrote:
> Thank you for reviewing the patch.
>
> The attached patch incorporated Michael and Amit comments also.

I reviewed this tonight.

+/* Report compute index stats phase */
+pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+
PROGRESS_ANALYZE_PHASE_COMPUTE_INDEX_STATS);

Hmm, is there really any point to this?  And is it even accurate?  It
doesn't look to me like we are computing any index statistics here;
we're only allocating a few in-memory data structures here, I think,
which is not a statistics computation and probably doesn't take any
significant time.

+/* Report compute heap stats phase */
+pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+PROGRESS_ANALYZE_PHASE_COMPUTE_HEAP_STATS);

The phase you label as computing heap statistics also includes the
computation of index statistics.  I wouldn't bother separating the
computation of heap and index statistics; I'd just have a "compute
statistics" phase that begins right after the comment that starts
"Compute the statistics."

+/* Report that we are now doing index cleanup */
+pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+PROGRESS_ANALYZE_PHASE_INDEX_CLEANUP);

OK, this doesn't make any sense either.  We are not cleaning up the
indexes here.  We are just closing them and releasing resources.  I
don't see why you need this phase at all; it can't last more than some
tiny fraction of a second.  It seems like you're trying to copy the
exact same phases that exist for vacuum instead of thinking about what
makes sense for ANALYZE.

+/* Report number of heap blocks scaned so far*/
+pgstat_progress_update_param(PROGRESS_ANALYZE_HEAP_BLKS_SCANNED,
targblock);

While I don't think this is necessarily a bad thing to report, I find
it very surprising that you're not reporting what seems to me like the
most useful thing here - namely, the number of blocks that will be
sampled in total and the number of those that you've selected.
Instead, you're just reporting the length of the relation and the
last-sampled block, which is a less-accurate guide to total progress.
There are enough counters to report both things, so maybe we should.

+/* Report total number of sample rows*/
+pgstat_progress_update_param(PROGRESS_ANALYZE_TOTAL_SAMPLE_ROWS, numrows);

As an alternative to the previous paragraph, yet another thing you
could report is number of rows sampled out of total rows to be
sampled.  But this isn't the way to do it: you are reporting the
number of rows you sampled only once you've finished sampling.  The
point of progress reporting is to report progress -- that is, to
report this value as it's updated, not just to report it when ANALYZE
is practically done and the value has reached its maximum.

The documentation for this patch isn't very good, either.  You forgot
to update the part of the documentation that says that VACUUM is the
only command that currently supports progress reporting, and the
descriptions of the phases and progress counters are brief and not
entirely accurate - e.g. heap_blks_scanned is not actually the number
of heap blocks scanned, because we are sampling, not reading all the
blocks.

When adding calls to the progress reporting functions, please consider
whether a blank line should be added before or after the new code, or
both, or neither.  I'd say some blank lines are needed in a few places
where you didn't add them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-03-14 Thread Haribabu Kommi

On Fri, Mar 10, 2017 at 6:46 PM, vinayak 
wrote:

>
> + /* Report total number of heap blocks and collectinf sample row phase*/
> + initprog_val[0] = PROGRESS_ANALYZE_PHASE_COLLECT_HEAP_SAMPLE_ROWS;
> + initprog_val[1] = totalblocks;
> + pgstat_progress_update_multi_param(2, initprog_index, initprog_val);
>
> acquire_sample_rows function is called from acquire_inherited_sample_rows
> function, so adding the phase in that function will provide wrong info.
>
> I agree with you.
>
>
> +#define PROGRESS_ANALYZE_PHASE_COLLECT_INH_SAMPLE_ROWS 2
>
> why there is no code added for the phase, any specific reason?
>
> I am thinking how to report this phase. Do you have any suggestion?
>

Thanks for the update patch.

Actually the analyze operation is done in two stages for the relation that
contains child relations,
1. For parent relation
2. All child relations

So the progress with start each time for the above two stages. This needs
to clearly mentioned in the docs.

The acquire_sample_rows function collects statistics of the relation
that is provided, updating the analyze details like number of rows and etc
works fine for a single relation, but if it contains many child relations,
the
data will be misleading.

Apart from the above, some more comments.

+ /* Report compute index stats phase */
+ pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+ PROGRESS_ANALYZE_PHASE_COMPUTE_INDEX_STATS);

The above block of the code should be set when it actually doing the
compute_index_stats.

+ /* Report that we are now doing index cleanup */
+ pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+ PROGRESS_ANALYZE_PHASE_INDEX_CLEANUP);

The above code block should be inside  if (!(options & VACOPT_VACUUM))
block.

Regards,
Hari Babu
Fujitsu Australia

Re: [HACKERS] ANALYZE command progress checker

2017-03-13 Thread Robert Haas

On Fri, Mar 10, 2017 at 2:57 PM, Jim Nasby  wrote:
> Oh, I wasn't suggesting a single SRF for everything. Hopefully users will
> eventually figure out a good formula to drive a "progress bar" for each type
> of monitor, which is what you really want anyway (at least 99% of the time).
> If we got there we could have a single view that gave the % complete for
> every command that was providing feedback. If someone wanted details they
> could hit the individual SRF.

So, the point, at least with the VACUUM progress indicator and perhaps
here also, is that we really can't just give a single measure of
progress -- we are 58.32% done.  That's almost impossible to estimate
accurately.  For example, consider VACUUM.  maintenance_work_mem is
69% used, and we are 74% done vacuuming the table.  What percentage
done are we?  Well, it depends.  If the latter part of the table has
exactly the same density of dead tuples we've already witnessed, we
will need only one index vac pass, but there is a good chance that
there's more activity toward the tail end of the table and we will
need two index vac passes.  If the table has several large indexes,
that's a big difference, so if we guess wrong about whether we're
going to run out of memory, we will be way off in estimating overall
progress.  That's why the vacuum progress checker reports the facts we
actually know -- like how many blocks of the relation we've scanned,
and how many dead tuples we've accumulated so far, and how many dead
tuples we are able to store before triggering index vacuuming --
rather than just a percentage.  You can try to derive a percentage
from that if you want, and you can use any formula you like to derive
that, but what the system reports are straight-up facts, not
predictions.

I don't really want to get into the prediction business.  The in-core
progress reporting structure can spit out enough data for people to
write queries or tools that attempt to predict stuff, and *maybe*
someday somebody will write one of those tools that is enough unlike
https://xkcd.com/612/ that we'll feel moved to put it in core.  But my
guess is most likely not.  It's easy to get the easy cases right in
such a prediction tool, but AFAICS getting the hard cases right
requires a crystal ball.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-03-10 Thread Jim Nasby


On 3/10/17 1:06 PM, Andres Freund wrote:

Hi,

On 2017-03-10 02:11:18 -0600, Jim Nasby wrote:

Perhaps instead of adding more clutter to \dvS we could just have a SRF for
now.


I don't see that as clutter, it's useful information, and keeping it
discoverable is good, not bad.


If we keep adding status reporting commands at some point it's going to 
get unwieldy. Though, if they were in their own schema...



At over 2800 rows currently, you're not going to notice one more
addition to \dfS.


I think it's hard to design a good SRF for this. Because the fields for
different types of progress are different / empty, you can't just
trivially return them as rows.  You'd have to do some EAV like
'command, field_name1, field_value1, ...' type of thing - not
particularly pretty / easy to use.



Oh, I wasn't suggesting a single SRF for everything. Hopefully users 
will eventually figure out a good formula to drive a "progress bar" for 
each type of monitor, which is what you really want anyway (at least 99% 
of the time). If we got there we could have a single view that gave the 
% complete for every command that was providing feedback. If someone 
wanted details they could hit the individual SRF.

--
Jim Nasby, Chief Data Architect, OpenSCG
http://OpenSCG.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-03-10 Thread Andres Freund

Hi,

On 2017-03-10 02:11:18 -0600, Jim Nasby wrote:
> Perhaps instead of adding more clutter to \dvS we could just have a SRF for
> now.

I don't see that as clutter, it's useful information, and keeping it
discoverable is good, not bad.

> At over 2800 rows currently, you're not going to notice one more
> addition to \dfS.

I think it's hard to design a good SRF for this. Because the fields for
different types of progress are different / empty, you can't just
trivially return them as rows.  You'd have to do some EAV like
'command, field_name1, field_value1, ...' type of thing - not
particularly pretty / easy to use.

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-03-10 Thread Jim Nasby


On 3/6/17 12:49 AM, Michael Paquier wrote:

On Sat, Mar 4, 2017 at 5:33 AM, David Steele  wrote:

I think the idea of a general progress view is very valuable and there
are a ton of operations it could be used for:  full table scans, index
rebuilds, vacuum, copy, etc.

However, I feel that this proposal is not flexible enough and comes too
late in the release cycle to allow development into something that could
be committed.


Well, each command really has its own requirements in terms of data to
store, so we either finish with a bunch of small tables that anyone
could query and join as they wish or a somewhat unique table that is
bloated with all the information, with a set of views on top of it to
query all the information. For extensibility's sake of each command
(for example imagine that REINDEX could be extended with a
CONCURRENTLY option and multiple phases), I would think that having a
table per command type would not be that bad.


Well, the ideal scenario is that someone uses the raw data to come up 
with a good way to just provide ye olde 0-100% progress bar. At that 
point a single view would do the trick.


Perhaps instead of adding more clutter to \dvS we could just have a SRF 
for now. At over 2800 rows currently, you're not going to notice one 
more addition to \dfS.

--
Jim Nasby, Chief Data Architect, OpenSCG
http://OpenSCG.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-03-09 Thread vinayak


Thank you for reviewing the patch.

The attached patch incorporated Michael and Amit comments also.

On 2017/03/07 15:45, Haribabu Kommi wrote:



On Tue, Mar 7, 2017 at 5:01 PM, Michael Paquier 
> wrote:



@@ -496,7 +499,6 @@ do_analyze_rel(Relation onerel, int options,
VacuumParams *params,
numrows = (*acquirefunc) (onerel, elevel,
  rows, targrows,
  , );
-
/*
Useless diff.


Fixed.


+ 
+   ANALYZE is currently collecting the sample rows.
+   The sample it reads is taken randomly.Its size depends on
+   the default_statistics_target parameter value.
+ 
This should use a  markup for default_statistics_target.


Fixed.



@@ -203,6 +204,8 @@ analyze_rel(Oid relid, RangeVar *relation, int
options,
if (onerel->rd_rel->relkind == RELKIND_RELATION ||
onerel->rd_rel->relkind == RELKIND_MATVIEW)
{
+   pgstat_progress_start_command(PROGRESS_COMMAND_ANALYZE,
+  RelationGetRelid(onerel));
It seems to me that the report should begin in do_analyze_rel().


Fixed.


some more comments,

+/* Report compute heap stats phase */
+pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+PROGRESS_ANALYZE_PHASE_COMPUTE_HEAP_STATS);

The above analyze phase is updated inside a for loop, instead just set 
it above once.

Fixed.


+ /* Report compute index stats phase */
+ pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+ PROGRESS_ANALYZE_PHASE_COMPUTE_INDEX_STATS);

Irrespective of whether the index exists on the table or not, the 
above analyze phase
is set. It is better to set the above phase and index cleanup phase 
only when there

are indexes on the table.


Agreed. Fixed.

+/* Report total number of heap blocks and collectinf sample row phase*/

Typo? collecting?


Fixed.


+/* Report total number of heap blocks and collectinf sample row phase*/
+initprog_val[0] = PROGRESS_ANALYZE_PHASE_COLLECT_HEAP_SAMPLE_ROWS;
+initprog_val[1] = totalblocks;
+pgstat_progress_update_multi_param(2, initprog_index, initprog_val);
acquire_sample_rows function is called from acquire_inherited_sample_rows
function, so adding the phase in that function will provide wrong info.


I agree with you.


+#define PROGRESS_ANALYZE_PHASE_COLLECT_INH_SAMPLE_ROWS2

why there is no code added for the phase, any specific reason?

I am thinking how to report this phase. Do you have any suggestion?


+/* Phases of analyze */

Can be written as following for better understanding, and also
similar like vacuum.

/* Phases of analyze (as advertised via PROGRESS_ANALYZE_PHASE) */


Done.

Regards,
Vinayak Pokale
NTT Open Source Software Center


pg_stat_progress_analyze_v2.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-03-07 Thread Robert Haas

On Wed, Mar 1, 2017 at 1:25 PM, Andres Freund  wrote:
> On 2017-03-01 10:20:41 -0800, David Fetter wrote:
>> On Wed, Mar 01, 2017 at 09:45:40AM -0500, Peter Eisentraut wrote:
>> > On 2/28/17 04:24, vinayak wrote:
>> > > The view provides the information of analyze command progress details as
>> > > follows
>> > > postgres=# \d pg_stat_progress_analyze
>> > >View "pg_catalog.pg_stat_progress_analyze"
>> >
>> > Hmm, do we want a separate "progress" system view for every kind of
>> > command?  What if someone comes up with a progress checker for CREATE
>> > INDEX, REINDEX, CLUSTER, etc.?
>
> I don't think that'd be that bad, otherwise the naming of the fields is
> complicated.

+1.

I suppose if it gets really out of control we might have to rethink,
but 2 is not 50, and having appropriate column names is worth a lot.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-03-06 Thread Haribabu Kommi

On Tue, Mar 7, 2017 at 5:01 PM, Michael Paquier 
wrote:

>
> @@ -496,7 +499,6 @@ do_analyze_rel(Relation onerel, int options,
> VacuumParams *params,
> numrows = (*acquirefunc) (onerel, elevel,
>   rows, targrows,
>   , );
> -
> /*
> Useless diff.
>
> + 
> +   ANALYZE is currently collecting the sample rows.
> +   The sample it reads is taken randomly.Its size depends on
> +   the default_statistics_target parameter value.
> + 
> This should use a  markup for default_statistics_target.
>
> @@ -203,6 +204,8 @@ analyze_rel(Oid relid, RangeVar *relation, int options,
> if (onerel->rd_rel->relkind == RELKIND_RELATION ||
> onerel->rd_rel->relkind == RELKIND_MATVIEW)
> {
> +   pgstat_progress_start_command(PROGRESS_COMMAND_ANALYZE,
> +   RelationGetRelid(onerel));
> It seems to me that the report should begin in do_analyze_rel().


some more comments,

+ /* Report compute heap stats phase */
+ pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+ PROGRESS_ANALYZE_PHASE_COMPUTE_HEAP_STATS);

The above analyze phase is updated inside a for loop, instead just set it
above once.

+ /* Report compute index stats phase */
+ pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+ PROGRESS_ANALYZE_PHASE_COMPUTE_INDEX_STATS);

Irrespective of whether the index exists on the table or not, the above
analyze phase
is set. It is better to set the above phase and index cleanup phase only
when there
are indexes on the table.

+ /* Report total number of heap blocks and collectinf sample row phase*/

Typo? collecting?


+ /* Report total number of heap blocks and collectinf sample row phase*/
+ initprog_val[0] = PROGRESS_ANALYZE_PHASE_COLLECT_HEAP_SAMPLE_ROWS;
+ initprog_val[1] = totalblocks;
+ pgstat_progress_update_multi_param(2, initprog_index, initprog_val);

acquire_sample_rows function is called from acquire_inherited_sample_rows
function, so adding the phase in that function will provide wrong info.


+#define PROGRESS_ANALYZE_PHASE_COLLECT_INH_SAMPLE_ROWS 2

why there is no code added for the phase, any specific reason?


+/* Phases of analyze */

Can be written as following for better understanding, and also
similar like vacuum.

/* Phases of analyze (as advertised via PROGRESS_ANALYZE_PHASE) */


Regards,
Hari Babu
Fujitsu Australia

Re: [HACKERS] ANALYZE command progress checker

2017-03-06 Thread Michael Paquier

On Mon, Mar 6, 2017 at 3:58 PM, Andres Freund  wrote:
> On 2017-03-03 15:33:15 -0500, David Steele wrote:
>> On 3/1/17 1:25 PM, Andres Freund wrote:
>> > On 2017-03-01 10:20:41 -0800, David Fetter wrote:
>> >> On Wed, Mar 01, 2017 at 09:45:40AM -0500, Peter Eisentraut wrote:
>> >>> On 2/28/17 04:24, vinayak wrote:
>>  The view provides the information of analyze command progress details as
>>  follows
>>  postgres=# \d pg_stat_progress_analyze
>> View "pg_catalog.pg_stat_progress_analyze"
>> >>>
>> >>> Hmm, do we want a separate "progress" system view for every kind of
>> >>> command?  What if someone comes up with a progress checker for CREATE
>> >>> INDEX, REINDEX, CLUSTER, etc.?
>> >
>> > I don't think that'd be that bad, otherwise the naming of the fields is
>> > complicated.  I guess the alternative (or do both?) would be to to have
>> > a pivoted table, but that'd painful to query.  Do you have a better idea?
>> >
>> >
>> >> Some kind of design for progress seems like a good plan.  Some ideas:
>> >>
>> >> - System view(s)
>> >>
>> >> This has the advantage of being shown to work at least to a PoC by
>> >> this patch, and is similar to extant system views like
>> >> pg_stat_activity in the sense of capturing a moment in time.
>> >>
>> >> - NOTIFY
>> >>
>> >> Event-driven model as opposed to a polling one.  This is
>> >> attractive on efficiency grounds, less so on reliability ones.
>> >>
>> >> - Something added to the wire protocol
>> >>
>> >> More specialized, limits the information to the session where the
>> >> command was issued
>> >>
>> >> - Other things not named here
>> >
>> > We now have a framework for this [1] (currently used by vacuum, but
>> > extensible). The question is about presentation.  I'm fairly sure that
>> > we shouldn't just add yet another framework, and I doubt that that's
>> > what's proposed by Peter.
>>
>> I think the idea of a general progress view is very valuable and there
>> are a ton of operations it could be used for:  full table scans, index
>> rebuilds, vacuum, copy, etc.
>> However, I feel that this proposal is not flexible enough and comes too
>> late in the release cycle to allow development into something that could
>> be committed.
>
> I'm not following. As I pointed out, we already have this framework?
> This patch is just a short one using that framework?
>
>
>> I propose we move this to the 2017-07 CF so the idea can be more fully
>> developed.
>
> I don't see that being warranted in this case, we're really not talking
> about something complicated:
>  doc/src/sgml/monitoring.sgml |  135 
> +++
>  src/backend/catalog/system_views.sql |   16 
>  src/backend/commands/analyze.c   |   34 
>  src/backend/utils/adt/pgstatfuncs.c  |2
>  src/include/commands/progress.h  |   13 +++
>  src/include/pgstat.h |3
>  src/test/regress/expected/rules.out  |   18 
>  7 files changed, 219 insertions(+), 2 deletions(-)
> excepting tests and docs, this is very little actual code.

Or 35 lines just for the backend portion, it is hard to something smaller.

@@ -496,7 +499,6 @@ do_analyze_rel(Relation onerel, int options,
VacuumParams *params,
numrows = (*acquirefunc) (onerel, elevel,
  rows, targrows,
  , );
-
/*
Useless diff.

+ 
+   ANALYZE is currently collecting the sample rows.
+   The sample it reads is taken randomly.Its size depends on
+   the default_statistics_target parameter value.
+ 
This should use a  markup for default_statistics_target.

@@ -203,6 +204,8 @@ analyze_rel(Oid relid, RangeVar *relation, int options,
if (onerel->rd_rel->relkind == RELKIND_RELATION ||
onerel->rd_rel->relkind == RELKIND_MATVIEW)
{
+   pgstat_progress_start_command(PROGRESS_COMMAND_ANALYZE,
+   RelationGetRelid(onerel));
It seems to me that the report should begin in do_analyze_rel().
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-03-06 Thread David Steele

On 3/6/17 1:58 AM, Andres Freund wrote:
> On 2017-03-03 15:33:15 -0500, David Steele wrote:
> 
>> I propose we move this to the 2017-07 CF so the idea can be more fully
>> developed.
> 
> I don't see that being warranted in this case, we're really not talking
> about something complicated:

<...>

> excepting tests and docs, this is very little actual code.

Fair enough.  From my read through it appeared a redesign was
required/requested.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-03-06 Thread vinayak



On 2017/03/06 17:02, Amit Langote wrote:

Hi Vinayak,

On 2017/02/28 18:24, vinayak wrote:

The attached patch reports the different phases of analyze command.
Added this patch to CF 2017-03.

In the updated monitoring.sgml:

+
+ computing heap stats
+ 
+   VACUUM is currently computing heap stats.
+ 
+
+
+ computing index stats
+ 
+   VACUUM is currently computing index stats.
+ 
+
+
+ cleaning up indexes
+ 
+   ANALYZE is currently cleaning up indexes.
+ 
+
+   
+   
+  

The entries mentioning VACUUM should actually say ANALYZE.

Yes. Thank you.
I will fix it.

Regards,
Vinayak Pokale
NTT Open Source Software Center


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-03-06 Thread Amit Langote

Hi Vinayak,

On 2017/02/28 18:24, vinayak wrote:
> The attached patch reports the different phases of analyze command.
> Added this patch to CF 2017-03.

In the updated monitoring.sgml:

+
+ computing heap stats
+ 
+   VACUUM is currently computing heap stats.
+ 
+
+
+ computing index stats
+ 
+   VACUUM is currently computing index stats.
+ 
+
+
+ cleaning up indexes
+ 
+   ANALYZE is currently cleaning up indexes.
+ 
+
+   
+   
+  

The entries mentioning VACUUM should actually say ANALYZE.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-03-05 Thread Andres Freund

On 2017-03-03 15:33:15 -0500, David Steele wrote:
> On 3/1/17 1:25 PM, Andres Freund wrote:
> > On 2017-03-01 10:20:41 -0800, David Fetter wrote:
> >> On Wed, Mar 01, 2017 at 09:45:40AM -0500, Peter Eisentraut wrote:
> >>> On 2/28/17 04:24, vinayak wrote:
>  The view provides the information of analyze command progress details as 
>  follows
>  postgres=# \d pg_stat_progress_analyze
> View "pg_catalog.pg_stat_progress_analyze"
> >>>
> >>> Hmm, do we want a separate "progress" system view for every kind of
> >>> command?  What if someone comes up with a progress checker for CREATE
> >>> INDEX, REINDEX, CLUSTER, etc.?
> > 
> > I don't think that'd be that bad, otherwise the naming of the fields is
> > complicated.  I guess the alternative (or do both?) would be to to have
> > a pivoted table, but that'd painful to query.  Do you have a better idea?
> > 
> > 
> >> Some kind of design for progress seems like a good plan.  Some ideas:
> >>
> >> - System view(s)
> >>
> >> This has the advantage of being shown to work at least to a PoC by
> >> this patch, and is similar to extant system views like
> >> pg_stat_activity in the sense of capturing a moment in time.
> >>
> >> - NOTIFY
> >>
> >> Event-driven model as opposed to a polling one.  This is
> >> attractive on efficiency grounds, less so on reliability ones.
> >>
> >> - Something added to the wire protocol
> >>
> >> More specialized, limits the information to the session where the
> >> command was issued
> >>
> >> - Other things not named here
> > 
> > We now have a framework for this [1] (currently used by vacuum, but
> > extensible). The question is about presentation.  I'm fairly sure that
> > we shouldn't just add yet another framework, and I doubt that that's
> > what's proposed by Peter.
> 
> I think the idea of a general progress view is very valuable and there
> are a ton of operations it could be used for:  full table scans, index
> rebuilds, vacuum, copy, etc.
> However, I feel that this proposal is not flexible enough and comes too
> late in the release cycle to allow development into something that could
> be committed.

I'm not following. As I pointed out, we already have this framework?
This patch is just a short one using that framework?


> I propose we move this to the 2017-07 CF so the idea can be more fully
> developed.

I don't see that being warranted in this case, we're really not talking
about something complicated:
 doc/src/sgml/monitoring.sgml |  135 +++
 src/backend/catalog/system_views.sql |   16 
 src/backend/commands/analyze.c   |   34 
 src/backend/utils/adt/pgstatfuncs.c  |2 
 src/include/commands/progress.h  |   13 +++
 src/include/pgstat.h |3 
 src/test/regress/expected/rules.out  |   18 
 7 files changed, 219 insertions(+), 2 deletions(-)
excepting tests and docs, this is very little actual code.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-03-05 Thread Michael Paquier

On Sat, Mar 4, 2017 at 5:33 AM, David Steele  wrote:
> I think the idea of a general progress view is very valuable and there
> are a ton of operations it could be used for:  full table scans, index
> rebuilds, vacuum, copy, etc.
>
> However, I feel that this proposal is not flexible enough and comes too
> late in the release cycle to allow development into something that could
> be committed.

Well, each command really has its own requirements in terms of data to
store, so we either finish with a bunch of small tables that anyone
could query and join as they wish or a somewhat unique table that is
bloated with all the information, with a set of views on top of it to
query all the information. For extensibility's sake of each command
(for example imagine that REINDEX could be extended with a
CONCURRENTLY option and multiple phases), I would think that having a
table per command type would not be that bad.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-03-03 Thread David Steele

On 3/1/17 1:25 PM, Andres Freund wrote:
> On 2017-03-01 10:20:41 -0800, David Fetter wrote:
>> On Wed, Mar 01, 2017 at 09:45:40AM -0500, Peter Eisentraut wrote:
>>> On 2/28/17 04:24, vinayak wrote:
 The view provides the information of analyze command progress details as 
 follows
 postgres=# \d pg_stat_progress_analyze
View "pg_catalog.pg_stat_progress_analyze"
>>>
>>> Hmm, do we want a separate "progress" system view for every kind of
>>> command?  What if someone comes up with a progress checker for CREATE
>>> INDEX, REINDEX, CLUSTER, etc.?
> 
> I don't think that'd be that bad, otherwise the naming of the fields is
> complicated.  I guess the alternative (or do both?) would be to to have
> a pivoted table, but that'd painful to query.  Do you have a better idea?
> 
> 
>> Some kind of design for progress seems like a good plan.  Some ideas:
>>
>> - System view(s)
>>
>> This has the advantage of being shown to work at least to a PoC by
>> this patch, and is similar to extant system views like
>> pg_stat_activity in the sense of capturing a moment in time.
>>
>> - NOTIFY
>>
>> Event-driven model as opposed to a polling one.  This is
>> attractive on efficiency grounds, less so on reliability ones.
>>
>> - Something added to the wire protocol
>>
>> More specialized, limits the information to the session where the
>> command was issued
>>
>> - Other things not named here
> 
> We now have a framework for this [1] (currently used by vacuum, but
> extensible). The question is about presentation.  I'm fairly sure that
> we shouldn't just add yet another framework, and I doubt that that's
> what's proposed by Peter.

I think the idea of a general progress view is very valuable and there
are a ton of operations it could be used for:  full table scans, index
rebuilds, vacuum, copy, etc.

However, I feel that this proposal is not flexible enough and comes too
late in the release cycle to allow development into something that could
be committed.

I propose we move this to the 2017-07 CF so the idea can be more fully
developed.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-03-01 Thread Andres Freund



On March 1, 2017 11:34:48 AM PST, David Fetter  wrote:
>I notice that that commit has no SGML component.  Should it have one?

Don't think so.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-03-01 Thread David Fetter

On Wed, Mar 01, 2017 at 10:28:23AM -0800, Andres Freund wrote:
> On 2017-03-01 10:25:49 -0800, Andres Freund wrote:
> > We now have a framework for this [1] (currently used by vacuum, but
> > extensible). The question is about presentation.  I'm fairly sure that
> > we shouldn't just add yet another framework, and I doubt that that's
> > what's proposed by Peter.
> > 
> > [1]
> > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=c16dc1aca5e
> 
> Majority of that is actually in
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b6fb6471f6afaf649e52f38269fd8c5c60647669

If that's even vaguely usable, I'd say we should use it for this.

I notice that that commit has no SGML component.  Should it have one?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-03-01 Thread Andres Freund

On 2017-03-01 10:25:49 -0800, Andres Freund wrote:
> We now have a framework for this [1] (currently used by vacuum, but
> extensible). The question is about presentation.  I'm fairly sure that
> we shouldn't just add yet another framework, and I doubt that that's
> what's proposed by Peter.
> 
> [1]
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=c16dc1aca5e

Majority of that is actually in
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b6fb6471f6afaf649e52f38269fd8c5c60647669


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-03-01 Thread Andres Freund

On 2017-03-01 10:20:41 -0800, David Fetter wrote:
> On Wed, Mar 01, 2017 at 09:45:40AM -0500, Peter Eisentraut wrote:
> > On 2/28/17 04:24, vinayak wrote:
> > > The view provides the information of analyze command progress details as 
> > > follows
> > > postgres=# \d pg_stat_progress_analyze
> > >View "pg_catalog.pg_stat_progress_analyze"
> > 
> > Hmm, do we want a separate "progress" system view for every kind of
> > command?  What if someone comes up with a progress checker for CREATE
> > INDEX, REINDEX, CLUSTER, etc.?

I don't think that'd be that bad, otherwise the naming of the fields is
complicated.  I guess the alternative (or do both?) would be to to have
a pivoted table, but that'd painful to query.  Do you have a better idea?


> Some kind of design for progress seems like a good plan.  Some ideas:
> 
> - System view(s)
> 
> This has the advantage of being shown to work at least to a PoC by
> this patch, and is similar to extant system views like
> pg_stat_activity in the sense of capturing a moment in time.
> 
> - NOTIFY
> 
> Event-driven model as opposed to a polling one.  This is
> attractive on efficiency grounds, less so on reliability ones.
> 
> - Something added to the wire protocol
> 
> More specialized, limits the information to the session where the
> command was issued
> 
> - Other things not named here

We now have a framework for this [1] (currently used by vacuum, but
extensible). The question is about presentation.  I'm fairly sure that
we shouldn't just add yet another framework, and I doubt that that's
what's proposed by Peter.

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=c16dc1aca5e


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-03-01 Thread David Fetter

On Wed, Mar 01, 2017 at 09:45:40AM -0500, Peter Eisentraut wrote:
> On 2/28/17 04:24, vinayak wrote:
> > The view provides the information of analyze command progress details as 
> > follows
> > postgres=# \d pg_stat_progress_analyze
> >View "pg_catalog.pg_stat_progress_analyze"
> 
> Hmm, do we want a separate "progress" system view for every kind of
> command?  What if someone comes up with a progress checker for CREATE
> INDEX, REINDEX, CLUSTER, etc.?

Some kind of design for progress seems like a good plan.  Some ideas:

- System view(s)

This has the advantage of being shown to work at least to a PoC by
this patch, and is similar to extant system views like
pg_stat_activity in the sense of capturing a moment in time.

- NOTIFY

Event-driven model as opposed to a polling one.  This is
attractive on efficiency grounds, less so on reliability ones.

- Something added to the wire protocol

More specialized, limits the information to the session where the
command was issued

- Other things not named here

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

2017-03-01 Thread Peter Eisentraut

On 2/28/17 04:24, vinayak wrote:
> The view provides the information of analyze command progress details as 
> follows
> postgres=# \d pg_stat_progress_analyze
>View "pg_catalog.pg_stat_progress_analyze"

Hmm, do we want a separate "progress" system view for every kind of
command?  What if someone comes up with a progress checker for CREATE
INDEX, REINDEX, CLUSTER, etc.?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

46 matches

Mail list logo