Re: [HACKERS] ANALYZE command progress checker
> On 05 Apr 2017, at 03:17, Masahiko Sawadawrote: > > On Wed, Apr 5, 2017 at 1:49 AM, Robert Haas wrote: >> On Tue, Apr 4, 2017 at 4:57 AM, Amit Langote >> wrote: >>> Hmm, you're right. It could be counted with a separate variable >>> initialized to 0 and incremented every time we decide to add a row to the >>> final set of sampled rows, although different implementations of >>> AcquireSampleRowsFunc have different ways of deciding if a given row will >>> be part of the final set of sampled rows. >>> >>> On the other hand, if we decide to count progress in terms of blocks as >>> you suggested afraid, I'm afraid that FDWs won't be able to report the >>> progress. >> >> I think it may be time to push this patch out to v11. It was >> submitted one day before the start of the last CommitFest, the design >> wasn't really right, and it's not clear even now that we know what the >> right design is. And we're pretty much out of time. > > +1 > We're encountering the design issue and it takes more time to find out > right design including FDWs support. I’m marking this patch Returned with Feedback since it has been Waiting for author during the commitfest without updates. cheers ./daniel -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
On 2017/04/05 10:17, Masahiko Sawada wrote: On Wed, Apr 5, 2017 at 1:49 AM, Robert Haaswrote: On Tue, Apr 4, 2017 at 4:57 AM, Amit Langote wrote: Hmm, you're right. It could be counted with a separate variable initialized to 0 and incremented every time we decide to add a row to the final set of sampled rows, although different implementations of AcquireSampleRowsFunc have different ways of deciding if a given row will be part of the final set of sampled rows. On the other hand, if we decide to count progress in terms of blocks as you suggested afraid, I'm afraid that FDWs won't be able to report the progress. I think it may be time to push this patch out to v11. It was submitted one day before the start of the last CommitFest, the design wasn't really right, and it's not clear even now that we know what the right design is. And we're pretty much out of time. +1 We're encountering the design issue and it takes more time to find out right design including FDWs support. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center Hi Vinayak, I found a typo in your patch: s/taht/that/ sampling.c /* Report total number of blocks taht will be sampled */ Then, regarding FDWs support, I believe it means postgres_fdw support (is my understanding right?). You might better to put pgstat_progress_update_param() into these functions, maybe. postgres_fdw.c - postgresAnalyzeForeignTable - postgresAcquireSampleRowsFunc And PgFdwAnalyzeState has these variables, hopefully you can get current number of rows as a progress indicator. sturuct PgFdwAnalyzeState ... /* collected sample rows */ HeapTuple *rows; /* array of size targrows */ int targrows; /* target # of sample rows */ int numrows;/* # of sample rows collected */ /* for random sampling */ double samplerows; /* # of rows fetched */ double rowstoskip; /* # of rows to skip before next sample */ ... I hope it will help you. Regards, Tatsuro Yamada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
On Wed, Apr 5, 2017 at 1:49 AM, Robert Haaswrote: > On Tue, Apr 4, 2017 at 4:57 AM, Amit Langote > wrote: >> Hmm, you're right. It could be counted with a separate variable >> initialized to 0 and incremented every time we decide to add a row to the >> final set of sampled rows, although different implementations of >> AcquireSampleRowsFunc have different ways of deciding if a given row will >> be part of the final set of sampled rows. >> >> On the other hand, if we decide to count progress in terms of blocks as >> you suggested afraid, I'm afraid that FDWs won't be able to report the >> progress. > > I think it may be time to push this patch out to v11. It was > submitted one day before the start of the last CommitFest, the design > wasn't really right, and it's not clear even now that we know what the > right design is. And we're pretty much out of time. > +1 We're encountering the design issue and it takes more time to find out right design including FDWs support. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
On Tue, Apr 4, 2017 at 4:57 AM, Amit Langotewrote: > Hmm, you're right. It could be counted with a separate variable > initialized to 0 and incremented every time we decide to add a row to the > final set of sampled rows, although different implementations of > AcquireSampleRowsFunc have different ways of deciding if a given row will > be part of the final set of sampled rows. > > On the other hand, if we decide to count progress in terms of blocks as > you suggested afraid, I'm afraid that FDWs won't be able to report the > progress. I think it may be time to push this patch out to v11. It was submitted one day before the start of the last CommitFest, the design wasn't really right, and it's not clear even now that we know what the right design is. And we're pretty much out of time. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
On 2017/04/04 15:30, Masahiko Sawada wrote: >> We can report progress in terms of individual blocks only inside >> acquire_sample_rows(), which seems undesirable when one thinks that we >> will be resetting the target for every child table. We should have a >> global target that considers all child tables in the inheritance >> hierarchy, which maybe is possible if we count them beforehand in >> acquire_inheritance_sample_rows(). But why not use target sample rows, >> which remains the same for both when we're collecting sample rows from one >> table and from the whole inheritance hierarchy. We can keep the count of >> already collected rows in a struct that is used across calls for all the >> child tables and increment upward from that count when we start collecting >> from a new child table. > > An another option I came up with is that support new pgstat progress > function, say pgstat_progress_incr_param, which increments index'th > member in st_progress_param[]. That way we just need to report a delta > using that function. That's an interesting idea. It could be made to work and would not require changing the interface of AcquireSampleRowsFunc, which seems very desirable. /* * The first targrows sample rows are simply copied into the * reservoir. Then we start replacing tuples in the sample * until we reach the end of the relation. This algorithm is * from Jeff Vitter's paper (see full citation below). It * works by repeatedly computing the number of tuples to skip * before selecting a tuple, which replaces a randomly chosen * element of the reservoir (current set of tuples). At all * times the reservoir is a true random sample of the tuples * we've passed over so far, so when we fall off the end of * the relation we're done. */ >> >> It seems that we could use samplerows instead of numrows to count the >> progress (if we choose to count progress in terms of sample rows collected). >> > > I guess it's hard to count progress in terms of sample rows collected > even if we use samplerows instead, because samplerows can be > incremented independently of the target number of sampling rows. The > samplerows can be incremented up to the total number of rows of > relation. Hmm, you're right. It could be counted with a separate variable initialized to 0 and incremented every time we decide to add a row to the final set of sampled rows, although different implementations of AcquireSampleRowsFunc have different ways of deciding if a given row will be part of the final set of sampled rows. On the other hand, if we decide to count progress in terms of blocks as you suggested afraid, I'm afraid that FDWs won't be able to report the progress. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
On Tue, Apr 4, 2017 at 2:12 PM, Amit Langotewrote: > On 2017/03/30 17:39, Masahiko Sawada wrote: >> On Wed, Mar 29, 2017 at 5:38 PM, vinayak wrote: > I have updated the patch. >> >> I reviewed v7 patch. >> >> When I ran ANALYZE command to the table having 5 millions rows with 3 >> child tables, the progress report I got shows the strange result. The >> following result was output after sampled all target rows from child >> tables (i.g, after finished acquire_inherited_sample_rows). I think >> the progress information should report 100% complete at this point. >> > > [...] > >> >> I guess the cause of this is that you always count the number of >> sampled rows in acquire_sample_rows staring from 0 for each child >> table. num_rows_sampled is reset whenever starting collecting sample >> rows. >> Also, even if table has child table, the total number of sampling row >> is not changed. I think that main differences between analyzing on >> normal table and on partitioned table is where we fetch the sample row >> from. So I'm not sure if adding "computing inherited statistics" phase >> is desirable. If you want to report progress of collecting sample rows >> on child table I think that you might want to add the information >> showing which child table is being analyzed. > > Two kinds of statistics are collected if the table is a inheritance parent. > > First kind considers the table by itself and calculates regular > single-table statistics using rows sampled from the only available heap > (by calling do_analyze_rel() with inh=false). > > The second kind are called inheritance statistics, which consider rows > sampled from the whole inheritance hierarchy (by calling do_analyze_rel() > with inh=true). Note that we are still collecting statistics for the > parent table, so we not really "analyzing" child tables; we just collect > sample rows from them to calculate whole-tree statistics for the root > parent table. Agreed. > It might still be possible to show the child table being > sampled though. > >> -- >> pg_stat_progress_analyze.num_target_sample_rows is set while ignoring >> the number of rows the target table has actually. So If the table has >> rows less than target number of rows, the num_rows_sampled never reach >> to num_target_sample_rows. >> >> -- >> @@ -1180,6 +1213,9 @@ acquire_sample_rows(Relation onerel, int elevel, >> } >> >> samplerows += 1; >> + >> + /* Report number of rows sampled so far */ >> + >> pgstat_progress_update_param(PROGRESS_ANALYZE_NUM_ROWS_SAMPLED, >> numrows); >> } >> } >> >> I think that it's wrong to use numrows variable to report the progress >> of the number of rows sampled. As the following comment mentioned, we >> first save row into rows[] and increment numrows until numrows < >> targrows. And then we could replace stored sample row with new sampled >> row. That is, after collecting "numrows" rows, analyze can still take >> a long time to get and replace the sample row. To computing progress >> of collecting sample row, IMO reporting the number of blocks we >> scanned so far is better than number of sample rows. Thought? > > We can report progress in terms of individual blocks only inside > acquire_sample_rows(), which seems undesirable when one thinks that we > will be resetting the target for every child table. We should have a > global target that considers all child tables in the inheritance > hierarchy, which maybe is possible if we count them beforehand in > acquire_inheritance_sample_rows(). But why not use target sample rows, > which remains the same for both when we're collecting sample rows from one > table and from the whole inheritance hierarchy. We can keep the count of > already collected rows in a struct that is used across calls for all the > child tables and increment upward from that count when we start collecting > from a new child table. An another option I came up with is that support new pgstat progress function, say pgstat_progress_incr_param, which increments index'th member in st_progress_param[]. That way we just need to report a delta using that function. > >>> /* >>> * The first targrows sample rows are simply copied into the >>> * reservoir. Then we start replacing tuples in the sample >>> * until we reach the end of the relation. This algorithm is >>> * from Jeff Vitter's paper (see full citation below). It >>> * works by repeatedly computing the number of tuples to skip >>> * before selecting a tuple, which replaces a randomly chosen >>> * element of the reservoir (current set of tuples). At all >>> * times the reservoir is a true random sample of the tuples >>> * we've passed over so far, so when we fall off the end of >>> * the relation we're done. >>> */ > > It seems that we could use samplerows instead
Re: [HACKERS] ANALYZE command progress checker
On 2017/04/04 14:15, Amit Langote wrote: > On 2017/04/04 14:12, Amit Langote wrote: >> Two kinds of statistics are collected if the table is a inheritance parent. >> >> First kind considers the table by itself and calculates regular >> single-table statistics using rows sampled from the only available heap >> (by calling do_analyze_rel() with inh=false). > > Oops, should have also mentioned that this part is not true for the new > partitioned tables. There are single-table statistics for partitioned > tables, because it contains no data. Sorry again, I meant *no* single-table statistics partitioned tables... Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
On 2017/04/04 14:12, Amit Langote wrote: > Two kinds of statistics are collected if the table is a inheritance parent. > > First kind considers the table by itself and calculates regular > single-table statistics using rows sampled from the only available heap > (by calling do_analyze_rel() with inh=false). Oops, should have also mentioned that this part is not true for the new partitioned tables. There are single-table statistics for partitioned tables, because it contains no data. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
On 2017/03/30 17:39, Masahiko Sawada wrote: > On Wed, Mar 29, 2017 at 5:38 PM, vinayak wrote: I have updated the patch. > > I reviewed v7 patch. > > When I ran ANALYZE command to the table having 5 millions rows with 3 > child tables, the progress report I got shows the strange result. The > following result was output after sampled all target rows from child > tables (i.g, after finished acquire_inherited_sample_rows). I think > the progress information should report 100% complete at this point. > [...] > > I guess the cause of this is that you always count the number of > sampled rows in acquire_sample_rows staring from 0 for each child > table. num_rows_sampled is reset whenever starting collecting sample > rows. > Also, even if table has child table, the total number of sampling row > is not changed. I think that main differences between analyzing on > normal table and on partitioned table is where we fetch the sample row > from. So I'm not sure if adding "computing inherited statistics" phase > is desirable. If you want to report progress of collecting sample rows > on child table I think that you might want to add the information > showing which child table is being analyzed. Two kinds of statistics are collected if the table is a inheritance parent. First kind considers the table by itself and calculates regular single-table statistics using rows sampled from the only available heap (by calling do_analyze_rel() with inh=false). The second kind are called inheritance statistics, which consider rows sampled from the whole inheritance hierarchy (by calling do_analyze_rel() with inh=true). Note that we are still collecting statistics for the parent table, so we not really "analyzing" child tables; we just collect sample rows from them to calculate whole-tree statistics for the root parent table. It might still be possible to show the child table being sampled though. > -- > pg_stat_progress_analyze.num_target_sample_rows is set while ignoring > the number of rows the target table has actually. So If the table has > rows less than target number of rows, the num_rows_sampled never reach > to num_target_sample_rows. > > -- > @@ -1180,6 +1213,9 @@ acquire_sample_rows(Relation onerel, int elevel, > } > > samplerows += 1; > + > + /* Report number of rows sampled so far */ > + > pgstat_progress_update_param(PROGRESS_ANALYZE_NUM_ROWS_SAMPLED, > numrows); > } > } > > I think that it's wrong to use numrows variable to report the progress > of the number of rows sampled. As the following comment mentioned, we > first save row into rows[] and increment numrows until numrows < > targrows. And then we could replace stored sample row with new sampled > row. That is, after collecting "numrows" rows, analyze can still take > a long time to get and replace the sample row. To computing progress > of collecting sample row, IMO reporting the number of blocks we > scanned so far is better than number of sample rows. Thought? We can report progress in terms of individual blocks only inside acquire_sample_rows(), which seems undesirable when one thinks that we will be resetting the target for every child table. We should have a global target that considers all child tables in the inheritance hierarchy, which maybe is possible if we count them beforehand in acquire_inheritance_sample_rows(). But why not use target sample rows, which remains the same for both when we're collecting sample rows from one table and from the whole inheritance hierarchy. We can keep the count of already collected rows in a struct that is used across calls for all the child tables and increment upward from that count when we start collecting from a new child table. >> /* >> * The first targrows sample rows are simply copied into the >> * reservoir. Then we start replacing tuples in the sample >> * until we reach the end of the relation. This algorithm is >> * from Jeff Vitter's paper (see full citation below). It >> * works by repeatedly computing the number of tuples to skip >> * before selecting a tuple, which replaces a randomly chosen >> * element of the reservoir (current set of tuples). At all >> * times the reservoir is a true random sample of the tuples >> * we've passed over so far, so when we fall off the end of >> * the relation we're done. >> */ It seems that we could use samplerows instead of numrows to count the progress (if we choose to count progress in terms of sample rows collected). Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
On 2017/03/30 17:39, Masahiko Sawada wrote: On Wed, Mar 29, 2017 at 5:38 PM, vinayakwrote: On 2017/03/25 4:30, Robert Haas wrote: On Fri, Mar 24, 2017 at 3:41 AM, vinayak wrote: I have updated the patch. You can't change the definition of AcquireSampleRowsFunc without updating the documentation in fdwhandler.sgml, but I think I don't immediately understand why that's a thing we want to do at all if neither file_fdw nor postgres_fdw are going to use the additional argument. It seems to be entirely pointless because the new value being passed down is always zero and even if the callee were to update it, do_analyze_rel() would just ignore the change on return. Am I missing something here? Also, the whitespace-only changes to fdwapi.h related to AcquireSampleRowsFunc going to get undone by pgindent, so even if there's some reason to change that you should leave the lines that don't need changing untouched. I reviewed v7 patch. When I ran ANALYZE command to the table having 5 millions rows with 3 child tables, the progress report I got shows the strange result. The following result was output after sampled all target rows from child tables (i.g, after finished acquire_inherited_sample_rows). I think the progress information should report 100% complete at this point. #= select * from pg_stat_progress_analyze ; pid | datid | datname | relid | phase | num_target_sample_rows | num_rows_sampled ---+---+--+---+--++-- 81719 | 13215 | postgres | 16440 | collecting inherited sample rows | 300 | 180 (1 row) I guess the cause of this is that you always count the number of sampled rows in acquire_sample_rows staring from 0 for each child table. num_rows_sampled is reset whenever starting collecting sample rows. Also, even if table has child table, the total number of sampling row is not changed. I think that main differences between analyzing on normal table and on partitioned table is where we fetch the sample row from. So I'm not sure if adding "computing inherited statistics" phase is desirable. If you want to report progress of collecting sample rows on child table I think that you might want to add the information showing which child table is being analyzed. -- pg_stat_progress_analyze.num_target_sample_rows is set while ignoring the number of rows the target table has actually. So If the table has rows less than target number of rows, the num_rows_sampled never reach to num_target_sample_rows. -- @@ -1180,6 +1213,9 @@ acquire_sample_rows(Relation onerel, int elevel, } samplerows += 1; + + /* Report number of rows sampled so far */ + pgstat_progress_update_param(PROGRESS_ANALYZE_NUM_ROWS_SAMPLED, numrows); } } I think that it's wrong to use numrows variable to report the progress of the number of rows sampled. As the following comment mentioned, we first save row into rows[] and increment numrows until numrows < targrows. And then we could replace stored sample row with new sampled row. That is, after collecting "numrows" rows, analyze can still take a long time to get and replace the sample row. To computing progress of collecting sample row, IMO reporting the number of blocks we scanned so far is better than number of sample rows. Thought? /* * The first targrows sample rows are simply copied into the * reservoir. Then we start replacing tuples in the sample * until we reach the end of the relation. This algorithm is * from Jeff Vitter's paper (see full citation below). It * works by repeatedly computing the number of tuples to skip * before selecting a tuple, which replaces a randomly chosen * element of the reservoir (current set of tuples). At all * times the reservoir is a true random sample of the tuples * we've passed over so far, so when we fall off the end of * the relation we're done. */ Looks good to me. In the attached patch I have reported number of blocks scanned so far instead of number of sample rows. In the 'collecting inherited sample rows' phase, child_relid is reported as a separate column. The child_relid is reported its value only in 'collecting inherited sample rows' phase. For single relation it return 0. I am not sure whether this column would helpful or not. Any suggestions are welcome. +/* Reset rows processed to zero for the next column */ + pgstat_progress_update_param(PROGRESS_ANALYZE_NUM_ROWS_PROCESSED, 0); This seems like a bad design. If you see a value other than zero in this field, you still won't know how much progress analyze has made, because you don't know which column you're processing. I also feel like
Re: [HACKERS] ANALYZE command progress checker
On Thu, Mar 30, 2017 at 6:19 PM, vinayakwrote: > > On 2017/03/30 17:39, Masahiko Sawada wrote: >> >> On Wed, Mar 29, 2017 at 5:38 PM, vinayak >> wrote: >>> >>> On 2017/03/25 4:30, Robert Haas wrote: On Fri, Mar 24, 2017 at 3:41 AM, vinayak wrote: > > I have updated the patch. You can't change the definition of AcquireSampleRowsFunc without updating the documentation in fdwhandler.sgml, but I think I don't immediately understand why that's a thing we want to do at all if neither file_fdw nor postgres_fdw are going to use the additional argument. It seems to be entirely pointless because the new value being passed down is always zero and even if the callee were to update it, do_analyze_rel() would just ignore the change on return. Am I missing something here? Also, the whitespace-only changes to fdwapi.h related to AcquireSampleRowsFunc going to get undone by pgindent, so even if there's some reason to change that you should leave the lines that don't need changing untouched. >> >> I reviewed v7 patch. > > Thank you for reviewing the patch. >> >> >> When I ran ANALYZE command to the table having 5 millions rows with 3 >> child tables, the progress report I got shows the strange result. The >> following result was output after sampled all target rows from child >> tables (i.g, after finished acquire_inherited_sample_rows). I think >> the progress information should report 100% complete at this point. >> >> #= select * from pg_stat_progress_analyze ; >>pid | datid | datname | relid | phase | >> num_target_sample_rows | num_rows_sampled >> >> ---+---+--+---+--++-- >> 81719 | 13215 | postgres | 16440 | collecting inherited sample rows | >> 300 | 180 >> (1 row) >> >> I guess the cause of this is that you always count the number of >> sampled rows in acquire_sample_rows staring from 0 for each child >> table. num_rows_sampled is reset whenever starting collecting sample >> rows. >> Also, even if table has child table, the total number of sampling row >> is not changed. I think that main differences between analyzing on >> normal table and on partitioned table is where we fetch the sample row >> from. So I'm not sure if adding "computing inherited statistics" phase >> is desirable. If you want to report progress of collecting sample rows >> on child table I think that you might want to add the information >> showing which child table is being analyzed. > > Yes. I think showing child table information would be good to user/DBA. > >> -- >> pg_stat_progress_analyze.num_target_sample_rows is set while ignoring >> the number of rows the target table has actually. So If the table has >> rows less than target number of rows, the num_rows_sampled never reach >> to num_target_sample_rows. >> >> -- >> @@ -1180,6 +1213,9 @@ acquire_sample_rows(Relation onerel, int elevel, >> } >> >> samplerows += 1; >> + >> + /* Report number of rows sampled so far */ >> + >> pgstat_progress_update_param(PROGRESS_ANALYZE_NUM_ROWS_SAMPLED, >> numrows); >> } >> } >> >> I think that it's wrong to use numrows variable to report the progress >> of the number of rows sampled. As the following comment mentioned, we >> first save row into rows[] and increment numrows until numrows < >> targrows. And then we could replace stored sample row with new sampled >> row. That is, after collecting "numrows" rows, analyze can still take >> a long time to get and replace the sample row. To computing progress >> of collecting sample row, IMO reporting the number of blocks we >> scanned so far is better than number of sample rows. Thought? >> >>> /* >>> * The first targrows sample rows are simply copied into the >>> * reservoir. Then we start replacing tuples in the sample >>> * until we reach the end of the relation. This algorithm is >>> * from Jeff Vitter's paper (see full citation below). It >>> * works by repeatedly computing the number of tuples to skip >>> * before selecting a tuple, which replaces a randomly chosen >>> * element of the reservoir (current set of tuples). At all >>> * times the reservoir is a true random sample of the tuples >>> * we've passed over so far, so when we fall off the end of >>> * the relation we're done. >>> */ > > I think we can either report number of blocks scanned so far or number of > sample rows. > But I agree with you that reporting the number of blocks scanned so far > would be better than reporting number of sample rows. > >>> I Understood that we could not change the
Re: [HACKERS] ANALYZE command progress checker
On 2017/03/30 17:39, Masahiko Sawada wrote: On Wed, Mar 29, 2017 at 5:38 PM, vinayakwrote: On 2017/03/25 4:30, Robert Haas wrote: On Fri, Mar 24, 2017 at 3:41 AM, vinayak wrote: I have updated the patch. You can't change the definition of AcquireSampleRowsFunc without updating the documentation in fdwhandler.sgml, but I think I don't immediately understand why that's a thing we want to do at all if neither file_fdw nor postgres_fdw are going to use the additional argument. It seems to be entirely pointless because the new value being passed down is always zero and even if the callee were to update it, do_analyze_rel() would just ignore the change on return. Am I missing something here? Also, the whitespace-only changes to fdwapi.h related to AcquireSampleRowsFunc going to get undone by pgindent, so even if there's some reason to change that you should leave the lines that don't need changing untouched. I reviewed v7 patch. Thank you for reviewing the patch. When I ran ANALYZE command to the table having 5 millions rows with 3 child tables, the progress report I got shows the strange result. The following result was output after sampled all target rows from child tables (i.g, after finished acquire_inherited_sample_rows). I think the progress information should report 100% complete at this point. #= select * from pg_stat_progress_analyze ; pid | datid | datname | relid | phase | num_target_sample_rows | num_rows_sampled ---+---+--+---+--++-- 81719 | 13215 | postgres | 16440 | collecting inherited sample rows | 300 | 180 (1 row) I guess the cause of this is that you always count the number of sampled rows in acquire_sample_rows staring from 0 for each child table. num_rows_sampled is reset whenever starting collecting sample rows. Also, even if table has child table, the total number of sampling row is not changed. I think that main differences between analyzing on normal table and on partitioned table is where we fetch the sample row from. So I'm not sure if adding "computing inherited statistics" phase is desirable. If you want to report progress of collecting sample rows on child table I think that you might want to add the information showing which child table is being analyzed. Yes. I think showing child table information would be good to user/DBA. -- pg_stat_progress_analyze.num_target_sample_rows is set while ignoring the number of rows the target table has actually. So If the table has rows less than target number of rows, the num_rows_sampled never reach to num_target_sample_rows. -- @@ -1180,6 +1213,9 @@ acquire_sample_rows(Relation onerel, int elevel, } samplerows += 1; + + /* Report number of rows sampled so far */ + pgstat_progress_update_param(PROGRESS_ANALYZE_NUM_ROWS_SAMPLED, numrows); } } I think that it's wrong to use numrows variable to report the progress of the number of rows sampled. As the following comment mentioned, we first save row into rows[] and increment numrows until numrows < targrows. And then we could replace stored sample row with new sampled row. That is, after collecting "numrows" rows, analyze can still take a long time to get and replace the sample row. To computing progress of collecting sample row, IMO reporting the number of blocks we scanned so far is better than number of sample rows. Thought? /* * The first targrows sample rows are simply copied into the * reservoir. Then we start replacing tuples in the sample * until we reach the end of the relation. This algorithm is * from Jeff Vitter's paper (see full citation below). It * works by repeatedly computing the number of tuples to skip * before selecting a tuple, which replaces a randomly chosen * element of the reservoir (current set of tuples). At all * times the reservoir is a true random sample of the tuples * we've passed over so far, so when we fall off the end of * the relation we're done. */ I think we can either report number of blocks scanned so far or number of sample rows. But I agree with you that reporting the number of blocks scanned so far would be better than reporting number of sample rows. I Understood that we could not change the definition of AcquireSampleRowsFunc without updating the documentation in fdwhandler.sgml. In the last patch I have modified the definition of AcquireSampleRowsFunc to handle the inheritance case. If the table being analyzed has one or more children, ANALYZE will gather statistics twice: once on the rows of parent table only and second on the rows of the parent table with all of its children. So while reporting
Re: [HACKERS] ANALYZE command progress checker
On Wed, Mar 29, 2017 at 5:38 PM, vinayakwrote: > > On 2017/03/25 4:30, Robert Haas wrote: >> >> On Fri, Mar 24, 2017 at 3:41 AM, vinayak >> wrote: >>> >>> I have updated the patch. >> >> You can't change the definition of AcquireSampleRowsFunc without >> updating the documentation in fdwhandler.sgml, but I think I don't >> immediately understand why that's a thing we want to do at all if >> neither file_fdw nor postgres_fdw are going to use the additional >> argument. It seems to be entirely pointless because the new value >> being passed down is always zero and even if the callee were to update >> it, do_analyze_rel() would just ignore the change on return. Am I >> missing something here? Also, the whitespace-only changes to fdwapi.h >> related to AcquireSampleRowsFunc going to get undone by pgindent, so >> even if there's some reason to change that you should leave the lines >> that don't need changing untouched. I reviewed v7 patch. When I ran ANALYZE command to the table having 5 millions rows with 3 child tables, the progress report I got shows the strange result. The following result was output after sampled all target rows from child tables (i.g, after finished acquire_inherited_sample_rows). I think the progress information should report 100% complete at this point. #= select * from pg_stat_progress_analyze ; pid | datid | datname | relid | phase | num_target_sample_rows | num_rows_sampled ---+---+--+---+--++-- 81719 | 13215 | postgres | 16440 | collecting inherited sample rows | 300 | 180 (1 row) I guess the cause of this is that you always count the number of sampled rows in acquire_sample_rows staring from 0 for each child table. num_rows_sampled is reset whenever starting collecting sample rows. Also, even if table has child table, the total number of sampling row is not changed. I think that main differences between analyzing on normal table and on partitioned table is where we fetch the sample row from. So I'm not sure if adding "computing inherited statistics" phase is desirable. If you want to report progress of collecting sample rows on child table I think that you might want to add the information showing which child table is being analyzed. -- pg_stat_progress_analyze.num_target_sample_rows is set while ignoring the number of rows the target table has actually. So If the table has rows less than target number of rows, the num_rows_sampled never reach to num_target_sample_rows. -- @@ -1180,6 +1213,9 @@ acquire_sample_rows(Relation onerel, int elevel, } samplerows += 1; + + /* Report number of rows sampled so far */ + pgstat_progress_update_param(PROGRESS_ANALYZE_NUM_ROWS_SAMPLED, numrows); } } I think that it's wrong to use numrows variable to report the progress of the number of rows sampled. As the following comment mentioned, we first save row into rows[] and increment numrows until numrows < targrows. And then we could replace stored sample row with new sampled row. That is, after collecting "numrows" rows, analyze can still take a long time to get and replace the sample row. To computing progress of collecting sample row, IMO reporting the number of blocks we scanned so far is better than number of sample rows. Thought? > /* > * The first targrows sample rows are simply copied into the > * reservoir. Then we start replacing tuples in the sample > * until we reach the end of the relation. This algorithm is > * from Jeff Vitter's paper (see full citation below). It > * works by repeatedly computing the number of tuples to skip > * before selecting a tuple, which replaces a randomly chosen > * element of the reservoir (current set of tuples). At all > * times the reservoir is a true random sample of the tuples > * we've passed over so far, so when we fall off the end of > * the relation we're done. > */ > I Understood that we could not change the definition of > AcquireSampleRowsFunc without updating the documentation in fdwhandler.sgml. > In the last patch I have modified the definition of AcquireSampleRowsFunc to > handle the inheritance case. > If the table being analyzed has one or more children, ANALYZE will gather > statistics twice: > once on the rows of parent table only and second on the rows of the parent > table with all of its children. > So while reporting the progress the value of number of target sample rows > and number of rows sampled is mismatched. > For single relation it works fine. > > In the attached patch I have not change the definition of > AcquireSampleRowsFunc. > I have updated inheritance case in the the documentation. > >> >> +
Re: [HACKERS] ANALYZE command progress checker
On 2017/03/25 4:30, Robert Haas wrote: On Fri, Mar 24, 2017 at 3:41 AM, vinayakwrote: I have updated the patch. You can't change the definition of AcquireSampleRowsFunc without updating the documentation in fdwhandler.sgml, but I think I don't immediately understand why that's a thing we want to do at all if neither file_fdw nor postgres_fdw are going to use the additional argument. It seems to be entirely pointless because the new value being passed down is always zero and even if the callee were to update it, do_analyze_rel() would just ignore the change on return. Am I missing something here? Also, the whitespace-only changes to fdwapi.h related to AcquireSampleRowsFunc going to get undone by pgindent, so even if there's some reason to change that you should leave the lines that don't need changing untouched. I Understood that we could not change the definition of AcquireSampleRowsFunc without updating the documentation in fdwhandler.sgml. In the last patch I have modified the definition of AcquireSampleRowsFunc to handle the inheritance case. If the table being analyzed has one or more children, ANALYZE will gather statistics twice: once on the rows of parent table only and second on the rows of the parent table with all of its children. So while reporting the progress the value of number of target sample rows and number of rows sampled is mismatched. For single relation it works fine. In the attached patch I have not change the definition of AcquireSampleRowsFunc. I have updated inheritance case in the the documentation. +/* Reset rows processed to zero for the next column */ +pgstat_progress_update_param(PROGRESS_ANALYZE_NUM_ROWS_PROCESSED, 0); This seems like a bad design. If you see a value other than zero in this field, you still won't know how much progress analyze has made, because you don't know which column you're processing. I also feel like you're not paying enough attention to a key point here that I made in my last review, which is that you need to look at where ANALYZE actually spends most of the time. If the process of computing the per-row statistics consumes significant CPU time, then maybe there's some point in trying to instrument it, but does it? Unless you know the answer to that question, you don't know whether there's even a point to trying to instrument this. Understood. The computing statistics phase takes long time so I am looking at the code. Regards, Vinayak Pokale NTT Open Source Software Center pg_stat_progress_analyze_v7.patch Description: binary/octet-stream -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
On Fri, Mar 24, 2017 at 3:41 AM, vinayakwrote: > I have updated the patch. You can't change the definition of AcquireSampleRowsFunc without updating the documentation in fdwhandler.sgml, but I think I don't immediately understand why that's a thing we want to do at all if neither file_fdw nor postgres_fdw are going to use the additional argument. It seems to be entirely pointless because the new value being passed down is always zero and even if the callee were to update it, do_analyze_rel() would just ignore the change on return. Am I missing something here? Also, the whitespace-only changes to fdwapi.h related to AcquireSampleRowsFunc going to get undone by pgindent, so even if there's some reason to change that you should leave the lines that don't need changing untouched. +/* Reset rows processed to zero for the next column */ +pgstat_progress_update_param(PROGRESS_ANALYZE_NUM_ROWS_PROCESSED, 0); This seems like a bad design. If you see a value other than zero in this field, you still won't know how much progress analyze has made, because you don't know which column you're processing. I also feel like you're not paying enough attention to a key point here that I made in my last review, which is that you need to look at where ANALYZE actually spends most of the time. If the process of computing the per-row statistics consumes significant CPU time, then maybe there's some point in trying to instrument it, but does it? Unless you know the answer to that question, you don't know whether there's even a point to trying to instrument this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
On 2017/03/23 15:04, Haribabu Kommi wrote: On Wed, Mar 22, 2017 at 8:11 PM, vinayak> wrote: On 2017/03/21 21:25, Haribabu Kommi wrote: On Tue, Mar 21, 2017 at 3:41 PM, vinayak > wrote: Thank you for testing the patch on Windows platform. Thanks for the updated patch. It works good for a normal relation. But for a relation that contains child tables, the PROGRESS_ANALYZE_NUM_ROWS_SAMPLED produces wrong results. Thank you for reviewing the patch. The attached patch implements a way to report sample rows count from acquire_sample_rows() even if called for child tables. How about adding another phase called PROGRESS_ANALYZE_PHASE_COLLECT_INHERIT_SAMPLE_ROWS and set this phase only when it is an inheritance analyze operation. And adding some explanation of ROWS_SAMPLED phase about inheritance tables how these sampled rows are calculated will provide good analyze progress of relation that contains child relations also. I have added the phase called PROGRESS_ANALYZE_PHASE_COLLECT_INH_SAMPLE_ROWS. I have also updated the documentation. Thanks for the updated patch. I will check it. The ANALYZE command takes long time in computing statistics phase.So I think we can add some column or phase so that user can easily understand the progress. How about adding new column like "num_rows_processed" will compute the statistics of specified column? I prefer a column with rows processed instead of a phase. Because we already set the phase of compute stats and showing the progress there will number of rows processed will be good. How about separate the computing "inheritance statistics" phase from the computing regular "single table" statistics. Comment? Yes, this will be good to show both that states of inheritance of sampled rows and compute inheritance stats, so that it will be clearly visible to the user the current status. I have updated the patch. I have added column "num_rows_processed" and phase "computing inherited statistics" in the view. =# \d+ pg_stat_progress_analyze View "pg_catalog.pg_stat_progress_analyze" Column | Type | Collation | Nullable | Default | Storage | Description +-+---+--+-+--+- pid| integer | | | | plain| datid | oid | | | | plain| datname| name| | | | plain| relid | oid | | | | plain| phase | text| | | | extended | num_target_sample_rows | bigint | | | | plain| num_rows_sampled | bigint | | | | plain| num_rows_processed | bigint | | | | plain| View definition: SELECT s.pid, s.datid, d.datname, s.relid, CASE s.param1 WHEN 0 THEN 'initializing'::text WHEN 1 THEN 'collecting sample rows'::text WHEN 2 THEN 'collecting inherited sample rows'::text WHEN 3 THEN 'computing statistics'::text WHEN 4 THEN 'computing inherited statistics'::text ELSE NULL::text END AS phase, s.param2 AS num_target_sample_rows, s.param3 AS num_rows_sampled, s.param4 AS num_rows_processed FROM pg_stat_get_progress_info('ANALYZE'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10) LEFT JOIN pg_database d ON s.datid = d.oid; Regards, Vinayak Pokale NTT Open Source Software Center pg_stat_progress_analyze_v6.patch Description: binary/octet-stream -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
On Wed, Mar 22, 2017 at 8:11 PM, vinayakwrote: > > On 2017/03/21 21:25, Haribabu Kommi wrote: > > > > On Tue, Mar 21, 2017 at 3:41 PM, vinayak > wrote: > >> Thank you for testing the patch on Windows platform. >> >> > Thanks for the updated patch. > > It works good for a normal relation. But for a relation that contains > child tables, > the PROGRESS_ANALYZE_NUM_ROWS_SAMPLED produces wrong results. > > Thank you for reviewing the patch. > The attached patch implements a way to report sample rows count from > acquire_sample_rows() even if called for child tables. > > How about adding another phase called PROGRESS_ANALYZE_PHASE_ > COLLECT_INHERIT_SAMPLE_ROWS > and set this phase only when it is an inheritance analyze operation. And > adding > some explanation of ROWS_SAMPLED phase about inheritance tables > how these sampled rows are calculated will provide good analyze progress of > relation that contains child relations also. > > I have added the phase called PROGRESS_ANALYZE_PHASE_ > COLLECT_INH_SAMPLE_ROWS. > I have also updated the documentation. > Thanks for the updated patch. I will check it. > The ANALYZE command takes long time in computing statistics phase.So I > think we can add some column or phase so that user can easily understand > the progress. > How about adding new column like "num_rows_processed" will compute the > statistics of specified column? > I prefer a column with rows processed instead of a phase. Because we already set the phase of compute stats and showing the progress there will number of rows processed will be good. > How about separate the computing "inheritance statistics" phase from the > computing regular "single table" statistics. > Comment? > Yes, this will be good to show both that states of inheritance of sampled rows and compute inheritance stats, so that it will be clearly visible to the user the current status. Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] ANALYZE command progress checker
On 2017/03/21 21:25, Haribabu Kommi wrote: On Tue, Mar 21, 2017 at 3:41 PM, vinayak> wrote: Thank you for testing the patch on Windows platform. Thanks for the updated patch. It works good for a normal relation. But for a relation that contains child tables, the PROGRESS_ANALYZE_NUM_ROWS_SAMPLED produces wrong results. Thank you for reviewing the patch. The attached patch implements a way to report sample rows count from acquire_sample_rows() even if called for child tables. How about adding another phase called PROGRESS_ANALYZE_PHASE_COLLECT_INHERIT_SAMPLE_ROWS and set this phase only when it is an inheritance analyze operation. And adding some explanation of ROWS_SAMPLED phase about inheritance tables how these sampled rows are calculated will provide good analyze progress of relation that contains child relations also. I have added the phase called PROGRESS_ANALYZE_PHASE_COLLECT_INH_SAMPLE_ROWS. I have also updated the documentation. The ANALYZE command takes long time in computing statistics phase.So I think we can add some column or phase so that user can easily understand the progress. How about adding new column like "num_rows_processed" will compute the statistics of specified column? How about separate the computing "inheritance statistics" phase from the computing regular "single table" statistics. Comment? Regards, Vinayak Pokale NTT Open Source Software Center pg_stat_progress_analyze_v5.patch Description: binary/octet-stream -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
On Tue, Mar 21, 2017 at 3:41 PM, vinayakwrote: > Thank you for testing the patch on Windows platform. > > Thanks for the updated patch. It works good for a normal relation. But for a relation that contains child tables, the PROGRESS_ANALYZE_NUM_ROWS_SAMPLED produces wrong results. How about adding another phase called PROGRESS_ANALYZE_PHASE_COLLECT_INHERIT_SAMPLE_ROWS and set this phase only when it is an inheritance analyze operation. And adding some explanation of ROWS_SAMPLED phase about inheritance tables how these sampled rows are calculated will provide good analyze progress of relation that contains child relations also. Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] ANALYZE command progress checker
Hi Ashutosh, On 2017/03/19 17:56, Ashutosh Sharma wrote: Hi, I didn't find any major issues with the patch. It works as expected. However, I have few minor comments that I would like to share, + + Total number of sample rows. The sample it reads is taken randomly. + Its size depends on the default_statistics_target parameter value. + 1) I think it would be better if you could specify reference link to the guc variable 'default_statistics_target'. Something like this, . If you go through monitoring.sgml, you would find that there is reference link to all guc variables being used. +1. Updated in the attached patch. 2) I feel that the 'computing_statistics' phase could have been splitted into 'computing_column_stats', 'computing_index_stats' Please let me know your thoughts on this. Initially I have spitted this phase as 'computing heap stats' and 'computing index stats' but I agreed with Roberts suggestion to merge two phases into one as 'computing statistics'. + certain commands during command execution. Currently, the command + which supports progress reporting is VACUUM and ANALYZE. This may be expanded in the future. 3) I think above needs to be rephrased. Something like...Currently, the supported progress reporting commands are 'VACUUM' and 'ANALYZE'. +1. Updated in the attached patch. Moreover, I also ran your patch on Windows platform and didn't find any issues. Thanks. Thank you for testing the patch on Windows platform. Regards, Vinayak Pokale NTT Open Source Software Center pg_stat_progress_analyze_v4.patch Description: binary/octet-stream -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
Hi Ashutosh, Thank you for reviewing the patch. On 2017/03/18 21:00, Ashutosh Sharma wrote: Hi Vinayak, Here are couple of review comments that may need your attention. 1. Firstly, I am seeing some trailing whitespace errors when trying to apply your v3 patch using git apply command. [ashu@localhost postgresql]$ git apply pg_stat_progress_analyze_v3.patch pg_stat_progress_analyze_v3.patch:155: trailing whitespace. CREATE VIEW pg_stat_progress_analyze AS pg_stat_progress_analyze_v3.patch:156: trailing whitespace. SELECT pg_stat_progress_analyze_v3.patch:157: trailing whitespace. S.pid AS pid, S.datid AS datid, D.datname AS datname, pg_stat_progress_analyze_v3.patch:158: trailing whitespace. S.relid AS relid, pg_stat_progress_analyze_v3.patch:159: trailing whitespace. CASE S.param1 WHEN 0 THEN 'initializing' error: patch failed: doc/src/sgml/monitoring.sgml:521 I have tried to apply patch using "git apply" and it works fine in my environment. I use below command to apply the patch. patch -p1 < pg_stat_progress_analyze_v3.patch 2) The test-case 'rules' is failing. I think it's failing because in rules.sql 'ORDERBY viewname' is used which will put 'pg_stat_progress_analyze' ahead of 'pg_stat_progress_vacuum' in the list of catalog views. You may need to correct rules.out file accordingly. This is what i could see in rules.sql, SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schema' ORDER BY viewname; I am still reviewing your patch and doing some testing. Will update if i find any issues. Thanks. Understood. I have fixed it. Regards, Vinayak Pokale NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
Hi, I didn't find any major issues with the patch. It works as expected. However, I have few minor comments that I would like to share, + + Total number of sample rows. The sample it reads is taken randomly. + Its size depends on the default_statistics_target parameter value. + 1) I think it would be better if you could specify reference link to the guc variable 'default_statistics_target'. Something like this, . If you go through monitoring.sgml, you would find that there is reference link to all guc variables being used. 2) I feel that the 'computing_statistics' phase could have been splitted into 'computing_column_stats', 'computing_index_stats' Please let me know your thoughts on this. + certain commands during command execution. Currently, the command + which supports progress reporting is VACUUM and ANALYZE. This may be expanded in the future. 3) I think above needs to be rephrased. Something like...Currently, the supported progress reporting commands are 'VACUUM' and 'ANALYZE'. Moreover, I also ran your patch on Windows platform and didn't find any issues. Thanks. -- With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com On Sat, Mar 18, 2017 at 5:30 PM, Ashutosh Sharmawrote: > Hi Vinayak, > > Here are couple of review comments that may need your attention. > > 1. Firstly, I am seeing some trailing whitespace errors when trying to > apply your v3 patch using git apply command. > > [ashu@localhost postgresql]$ git apply pg_stat_progress_analyze_v3.patch > pg_stat_progress_analyze_v3.patch:155: trailing whitespace. > CREATE VIEW pg_stat_progress_analyze AS > pg_stat_progress_analyze_v3.patch:156: trailing whitespace. > SELECT > pg_stat_progress_analyze_v3.patch:157: trailing whitespace. > S.pid AS pid, S.datid AS datid, D.datname AS datname, > pg_stat_progress_analyze_v3.patch:158: trailing whitespace. > S.relid AS relid, > pg_stat_progress_analyze_v3.patch:159: trailing whitespace. > CASE S.param1 WHEN 0 THEN 'initializing' > error: patch failed: doc/src/sgml/monitoring.sgml:521 > > 2) The test-case 'rules' is failing. I think it's failing because in > rules.sql 'ORDERBY viewname' is used which will put > 'pg_stat_progress_analyze' ahead of 'pg_stat_progress_vacuum' in the > list of catalog views. You may need to correct rules.out file > accordingly. This is what i could see in rules.sql, > > SELECT viewname, definition FROM pg_views WHERE schemaname <> > 'information_schema' ORDER BY viewname; > > I am still reviewing your patch and doing some testing. Will update if > i find any issues. Thanks. > > -- > With Regards, > Ashutosh Sharma > EnterpriseDB:http://www.enterprisedb.com > > On Fri, Mar 17, 2017 at 3:16 PM, vinayak > wrote: >> >> On 2017/03/17 10:38, Robert Haas wrote: >>> >>> On Fri, Mar 10, 2017 at 2:46 AM, vinayak >>> wrote: Thank you for reviewing the patch. The attached patch incorporated Michael and Amit comments also. >>> >>> I reviewed this tonight. >> >> Thank you for reviewing the patch. >>> >>> +/* Report compute index stats phase */ >>> +pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE, >>> + >>> PROGRESS_ANALYZE_PHASE_COMPUTE_INDEX_STATS); >>> >>> Hmm, is there really any point to this? And is it even accurate? It >>> doesn't look to me like we are computing any index statistics here; >>> we're only allocating a few in-memory data structures here, I think, >>> which is not a statistics computation and probably doesn't take any >>> significant time. >>> >>> +/* Report compute heap stats phase */ >>> +pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE, >>> + >>> PROGRESS_ANALYZE_PHASE_COMPUTE_HEAP_STATS); >>> >>> The phase you label as computing heap statistics also includes the >>> computation of index statistics. I wouldn't bother separating the >>> computation of heap and index statistics; I'd just have a "compute >>> statistics" phase that begins right after the comment that starts >>> "Compute the statistics." >> >> Understood. Fixed in the attached patch. >>> >>> >>> +/* Report that we are now doing index cleanup */ >>> +pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE, >>> +PROGRESS_ANALYZE_PHASE_INDEX_CLEANUP); >>> >>> OK, this doesn't make any sense either. We are not cleaning up the >>> indexes here. We are just closing them and releasing resources. I >>> don't see why you need this phase at all; it can't last more than some >>> tiny fraction of a second. It seems like you're trying to copy the >>> exact same phases that exist for vacuum instead of thinking about what >>> makes sense for ANALYZE. >> >> Understood. I have removed this phase. >>> >>> >>> +/* Report number of heap blocks scaned so far*/ >>> +pgstat_progress_update_param(PROGRESS_ANALYZE_HEAP_BLKS_SCANNED,
Re: [HACKERS] ANALYZE command progress checker
Hi Vinayak, Here are couple of review comments that may need your attention. 1. Firstly, I am seeing some trailing whitespace errors when trying to apply your v3 patch using git apply command. [ashu@localhost postgresql]$ git apply pg_stat_progress_analyze_v3.patch pg_stat_progress_analyze_v3.patch:155: trailing whitespace. CREATE VIEW pg_stat_progress_analyze AS pg_stat_progress_analyze_v3.patch:156: trailing whitespace. SELECT pg_stat_progress_analyze_v3.patch:157: trailing whitespace. S.pid AS pid, S.datid AS datid, D.datname AS datname, pg_stat_progress_analyze_v3.patch:158: trailing whitespace. S.relid AS relid, pg_stat_progress_analyze_v3.patch:159: trailing whitespace. CASE S.param1 WHEN 0 THEN 'initializing' error: patch failed: doc/src/sgml/monitoring.sgml:521 2) The test-case 'rules' is failing. I think it's failing because in rules.sql 'ORDERBY viewname' is used which will put 'pg_stat_progress_analyze' ahead of 'pg_stat_progress_vacuum' in the list of catalog views. You may need to correct rules.out file accordingly. This is what i could see in rules.sql, SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schema' ORDER BY viewname; I am still reviewing your patch and doing some testing. Will update if i find any issues. Thanks. -- With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com On Fri, Mar 17, 2017 at 3:16 PM, vinayakwrote: > > On 2017/03/17 10:38, Robert Haas wrote: >> >> On Fri, Mar 10, 2017 at 2:46 AM, vinayak >> wrote: >>> >>> Thank you for reviewing the patch. >>> >>> The attached patch incorporated Michael and Amit comments also. >> >> I reviewed this tonight. > > Thank you for reviewing the patch. >> >> +/* Report compute index stats phase */ >> +pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE, >> + >> PROGRESS_ANALYZE_PHASE_COMPUTE_INDEX_STATS); >> >> Hmm, is there really any point to this? And is it even accurate? It >> doesn't look to me like we are computing any index statistics here; >> we're only allocating a few in-memory data structures here, I think, >> which is not a statistics computation and probably doesn't take any >> significant time. >> >> +/* Report compute heap stats phase */ >> +pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE, >> + >> PROGRESS_ANALYZE_PHASE_COMPUTE_HEAP_STATS); >> >> The phase you label as computing heap statistics also includes the >> computation of index statistics. I wouldn't bother separating the >> computation of heap and index statistics; I'd just have a "compute >> statistics" phase that begins right after the comment that starts >> "Compute the statistics." > > Understood. Fixed in the attached patch. >> >> >> +/* Report that we are now doing index cleanup */ >> +pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE, >> +PROGRESS_ANALYZE_PHASE_INDEX_CLEANUP); >> >> OK, this doesn't make any sense either. We are not cleaning up the >> indexes here. We are just closing them and releasing resources. I >> don't see why you need this phase at all; it can't last more than some >> tiny fraction of a second. It seems like you're trying to copy the >> exact same phases that exist for vacuum instead of thinking about what >> makes sense for ANALYZE. > > Understood. I have removed this phase. >> >> >> +/* Report number of heap blocks scaned so far*/ >> +pgstat_progress_update_param(PROGRESS_ANALYZE_HEAP_BLKS_SCANNED, >> targblock); >> >> While I don't think this is necessarily a bad thing to report, I find >> it very surprising that you're not reporting what seems to me like the >> most useful thing here - namely, the number of blocks that will be >> sampled in total and the number of those that you've selected. >> Instead, you're just reporting the length of the relation and the >> last-sampled block, which is a less-accurate guide to total progress. >> There are enough counters to report both things, so maybe we should. >> >> +/* Report total number of sample rows*/ >> +pgstat_progress_update_param(PROGRESS_ANALYZE_TOTAL_SAMPLE_ROWS, >> numrows); >> >> As an alternative to the previous paragraph, yet another thing you >> could report is number of rows sampled out of total rows to be >> sampled. But this isn't the way to do it: you are reporting the >> number of rows you sampled only once you've finished sampling. The >> point of progress reporting is to report progress -- that is, to >> report this value as it's updated, not just to report it when ANALYZE >> is practically done and the value has reached its maximum. > > Understood. > I have reported number of rows sampled out of total rows to be sampled. > I have not reported the number of blocks that will be sampled in total. > So currect pg_stat_progress_analyze view looks like: > > =# \d+ pg_stat_progress_analyze >
Re: [HACKERS] ANALYZE command progress checker
On 2017/03/17 10:38, Robert Haas wrote: On Fri, Mar 10, 2017 at 2:46 AM, vinayakwrote: Thank you for reviewing the patch. The attached patch incorporated Michael and Amit comments also. I reviewed this tonight. Thank you for reviewing the patch. +/* Report compute index stats phase */ +pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE, + PROGRESS_ANALYZE_PHASE_COMPUTE_INDEX_STATS); Hmm, is there really any point to this? And is it even accurate? It doesn't look to me like we are computing any index statistics here; we're only allocating a few in-memory data structures here, I think, which is not a statistics computation and probably doesn't take any significant time. +/* Report compute heap stats phase */ +pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE, +PROGRESS_ANALYZE_PHASE_COMPUTE_HEAP_STATS); The phase you label as computing heap statistics also includes the computation of index statistics. I wouldn't bother separating the computation of heap and index statistics; I'd just have a "compute statistics" phase that begins right after the comment that starts "Compute the statistics." Understood. Fixed in the attached patch. +/* Report that we are now doing index cleanup */ +pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE, +PROGRESS_ANALYZE_PHASE_INDEX_CLEANUP); OK, this doesn't make any sense either. We are not cleaning up the indexes here. We are just closing them and releasing resources. I don't see why you need this phase at all; it can't last more than some tiny fraction of a second. It seems like you're trying to copy the exact same phases that exist for vacuum instead of thinking about what makes sense for ANALYZE. Understood. I have removed this phase. +/* Report number of heap blocks scaned so far*/ +pgstat_progress_update_param(PROGRESS_ANALYZE_HEAP_BLKS_SCANNED, targblock); While I don't think this is necessarily a bad thing to report, I find it very surprising that you're not reporting what seems to me like the most useful thing here - namely, the number of blocks that will be sampled in total and the number of those that you've selected. Instead, you're just reporting the length of the relation and the last-sampled block, which is a less-accurate guide to total progress. There are enough counters to report both things, so maybe we should. +/* Report total number of sample rows*/ +pgstat_progress_update_param(PROGRESS_ANALYZE_TOTAL_SAMPLE_ROWS, numrows); As an alternative to the previous paragraph, yet another thing you could report is number of rows sampled out of total rows to be sampled. But this isn't the way to do it: you are reporting the number of rows you sampled only once you've finished sampling. The point of progress reporting is to report progress -- that is, to report this value as it's updated, not just to report it when ANALYZE is practically done and the value has reached its maximum. Understood. I have reported number of rows sampled out of total rows to be sampled. I have not reported the number of blocks that will be sampled in total. So currect pg_stat_progress_analyze view looks like: =# \d+ pg_stat_progress_analyze View "pg_catalog.pg_stat_progress_analyze" Column | Type | Collation | Nullable | Default | Storage | Descripti on +-+---+--+-+--+-- --- pid| integer | | | | plain| datid | oid | | | | plain| datname| name| | | | plain| relid | oid | | | | plain| phase | text| | | | extended | num_target_sample_rows | bigint | | | | plain| num_rows_sampled | bigint | | | | plain| View definition: SELECT s.pid, s.datid, d.datname, s.relid, CASE s.param1 WHEN 0 THEN 'initializing'::text WHEN 1 THEN 'collecting sample rows'::text WHEN 2 THEN 'computing statistics'::text ELSE NULL::text END AS phase, s.param2 AS num_target_sample_rows, s.param3 AS num_rows_sampled FROM pg_stat_get_progress_info('ANALYZE'::text) s(pid, datid, relid, param1, param2, p aram3, param4, param5, param6, param7, param8, param9, param10) LEFT JOIN pg_database d ON s.datid = d.oid; Comment? The documentation for this patch isn't very good, either. You forgot to update the part of the documentation that says that VACUUM is the only command that currently supports progress reporting, and the descriptions of the phases and progress counters are brief and not
Re: [HACKERS] ANALYZE command progress checker
On Fri, Mar 10, 2017 at 2:46 AM, vinayakwrote: > Thank you for reviewing the patch. > > The attached patch incorporated Michael and Amit comments also. I reviewed this tonight. +/* Report compute index stats phase */ +pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE, + PROGRESS_ANALYZE_PHASE_COMPUTE_INDEX_STATS); Hmm, is there really any point to this? And is it even accurate? It doesn't look to me like we are computing any index statistics here; we're only allocating a few in-memory data structures here, I think, which is not a statistics computation and probably doesn't take any significant time. +/* Report compute heap stats phase */ +pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE, +PROGRESS_ANALYZE_PHASE_COMPUTE_HEAP_STATS); The phase you label as computing heap statistics also includes the computation of index statistics. I wouldn't bother separating the computation of heap and index statistics; I'd just have a "compute statistics" phase that begins right after the comment that starts "Compute the statistics." +/* Report that we are now doing index cleanup */ +pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE, +PROGRESS_ANALYZE_PHASE_INDEX_CLEANUP); OK, this doesn't make any sense either. We are not cleaning up the indexes here. We are just closing them and releasing resources. I don't see why you need this phase at all; it can't last more than some tiny fraction of a second. It seems like you're trying to copy the exact same phases that exist for vacuum instead of thinking about what makes sense for ANALYZE. +/* Report number of heap blocks scaned so far*/ +pgstat_progress_update_param(PROGRESS_ANALYZE_HEAP_BLKS_SCANNED, targblock); While I don't think this is necessarily a bad thing to report, I find it very surprising that you're not reporting what seems to me like the most useful thing here - namely, the number of blocks that will be sampled in total and the number of those that you've selected. Instead, you're just reporting the length of the relation and the last-sampled block, which is a less-accurate guide to total progress. There are enough counters to report both things, so maybe we should. +/* Report total number of sample rows*/ +pgstat_progress_update_param(PROGRESS_ANALYZE_TOTAL_SAMPLE_ROWS, numrows); As an alternative to the previous paragraph, yet another thing you could report is number of rows sampled out of total rows to be sampled. But this isn't the way to do it: you are reporting the number of rows you sampled only once you've finished sampling. The point of progress reporting is to report progress -- that is, to report this value as it's updated, not just to report it when ANALYZE is practically done and the value has reached its maximum. The documentation for this patch isn't very good, either. You forgot to update the part of the documentation that says that VACUUM is the only command that currently supports progress reporting, and the descriptions of the phases and progress counters are brief and not entirely accurate - e.g. heap_blks_scanned is not actually the number of heap blocks scanned, because we are sampling, not reading all the blocks. When adding calls to the progress reporting functions, please consider whether a blank line should be added before or after the new code, or both, or neither. I'd say some blank lines are needed in a few places where you didn't add them. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
On Fri, Mar 10, 2017 at 6:46 PM, vinayakwrote: > > + /* Report total number of heap blocks and collectinf sample row phase*/ > + initprog_val[0] = PROGRESS_ANALYZE_PHASE_COLLECT_HEAP_SAMPLE_ROWS; > + initprog_val[1] = totalblocks; > + pgstat_progress_update_multi_param(2, initprog_index, initprog_val); > > acquire_sample_rows function is called from acquire_inherited_sample_rows > function, so adding the phase in that function will provide wrong info. > > I agree with you. > > > +#define PROGRESS_ANALYZE_PHASE_COLLECT_INH_SAMPLE_ROWS 2 > > why there is no code added for the phase, any specific reason? > > I am thinking how to report this phase. Do you have any suggestion? > Thanks for the update patch. Actually the analyze operation is done in two stages for the relation that contains child relations, 1. For parent relation 2. All child relations So the progress with start each time for the above two stages. This needs to clearly mentioned in the docs. The acquire_sample_rows function collects statistics of the relation that is provided, updating the analyze details like number of rows and etc works fine for a single relation, but if it contains many child relations, the data will be misleading. Apart from the above, some more comments. + /* Report compute index stats phase */ + pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE, + PROGRESS_ANALYZE_PHASE_COMPUTE_INDEX_STATS); The above block of the code should be set when it actually doing the compute_index_stats. + /* Report that we are now doing index cleanup */ + pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE, + PROGRESS_ANALYZE_PHASE_INDEX_CLEANUP); The above code block should be inside if (!(options & VACOPT_VACUUM)) block. Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] ANALYZE command progress checker
On Fri, Mar 10, 2017 at 2:57 PM, Jim Nasbywrote: > Oh, I wasn't suggesting a single SRF for everything. Hopefully users will > eventually figure out a good formula to drive a "progress bar" for each type > of monitor, which is what you really want anyway (at least 99% of the time). > If we got there we could have a single view that gave the % complete for > every command that was providing feedback. If someone wanted details they > could hit the individual SRF. So, the point, at least with the VACUUM progress indicator and perhaps here also, is that we really can't just give a single measure of progress -- we are 58.32% done. That's almost impossible to estimate accurately. For example, consider VACUUM. maintenance_work_mem is 69% used, and we are 74% done vacuuming the table. What percentage done are we? Well, it depends. If the latter part of the table has exactly the same density of dead tuples we've already witnessed, we will need only one index vac pass, but there is a good chance that there's more activity toward the tail end of the table and we will need two index vac passes. If the table has several large indexes, that's a big difference, so if we guess wrong about whether we're going to run out of memory, we will be way off in estimating overall progress. That's why the vacuum progress checker reports the facts we actually know -- like how many blocks of the relation we've scanned, and how many dead tuples we've accumulated so far, and how many dead tuples we are able to store before triggering index vacuuming -- rather than just a percentage. You can try to derive a percentage from that if you want, and you can use any formula you like to derive that, but what the system reports are straight-up facts, not predictions. I don't really want to get into the prediction business. The in-core progress reporting structure can spit out enough data for people to write queries or tools that attempt to predict stuff, and *maybe* someday somebody will write one of those tools that is enough unlike https://xkcd.com/612/ that we'll feel moved to put it in core. But my guess is most likely not. It's easy to get the easy cases right in such a prediction tool, but AFAICS getting the hard cases right requires a crystal ball. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
On 3/10/17 1:06 PM, Andres Freund wrote: Hi, On 2017-03-10 02:11:18 -0600, Jim Nasby wrote: Perhaps instead of adding more clutter to \dvS we could just have a SRF for now. I don't see that as clutter, it's useful information, and keeping it discoverable is good, not bad. If we keep adding status reporting commands at some point it's going to get unwieldy. Though, if they were in their own schema... At over 2800 rows currently, you're not going to notice one more addition to \dfS. I think it's hard to design a good SRF for this. Because the fields for different types of progress are different / empty, you can't just trivially return them as rows. You'd have to do some EAV like 'command, field_name1, field_value1, ...' type of thing - not particularly pretty / easy to use. Oh, I wasn't suggesting a single SRF for everything. Hopefully users will eventually figure out a good formula to drive a "progress bar" for each type of monitor, which is what you really want anyway (at least 99% of the time). If we got there we could have a single view that gave the % complete for every command that was providing feedback. If someone wanted details they could hit the individual SRF. -- Jim Nasby, Chief Data Architect, OpenSCG http://OpenSCG.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
Hi, On 2017-03-10 02:11:18 -0600, Jim Nasby wrote: > Perhaps instead of adding more clutter to \dvS we could just have a SRF for > now. I don't see that as clutter, it's useful information, and keeping it discoverable is good, not bad. > At over 2800 rows currently, you're not going to notice one more > addition to \dfS. I think it's hard to design a good SRF for this. Because the fields for different types of progress are different / empty, you can't just trivially return them as rows. You'd have to do some EAV like 'command, field_name1, field_value1, ...' type of thing - not particularly pretty / easy to use. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
On 3/6/17 12:49 AM, Michael Paquier wrote: On Sat, Mar 4, 2017 at 5:33 AM, David Steelewrote: I think the idea of a general progress view is very valuable and there are a ton of operations it could be used for: full table scans, index rebuilds, vacuum, copy, etc. However, I feel that this proposal is not flexible enough and comes too late in the release cycle to allow development into something that could be committed. Well, each command really has its own requirements in terms of data to store, so we either finish with a bunch of small tables that anyone could query and join as they wish or a somewhat unique table that is bloated with all the information, with a set of views on top of it to query all the information. For extensibility's sake of each command (for example imagine that REINDEX could be extended with a CONCURRENTLY option and multiple phases), I would think that having a table per command type would not be that bad. Well, the ideal scenario is that someone uses the raw data to come up with a good way to just provide ye olde 0-100% progress bar. At that point a single view would do the trick. Perhaps instead of adding more clutter to \dvS we could just have a SRF for now. At over 2800 rows currently, you're not going to notice one more addition to \dfS. -- Jim Nasby, Chief Data Architect, OpenSCG http://OpenSCG.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
Thank you for reviewing the patch. The attached patch incorporated Michael and Amit comments also. On 2017/03/07 15:45, Haribabu Kommi wrote: On Tue, Mar 7, 2017 at 5:01 PM, Michael Paquier> wrote: @@ -496,7 +499,6 @@ do_analyze_rel(Relation onerel, int options, VacuumParams *params, numrows = (*acquirefunc) (onerel, elevel, rows, targrows, , ); - /* Useless diff. Fixed. + + ANALYZE is currently collecting the sample rows. + The sample it reads is taken randomly.Its size depends on + the default_statistics_target parameter value. + This should use a markup for default_statistics_target. Fixed. @@ -203,6 +204,8 @@ analyze_rel(Oid relid, RangeVar *relation, int options, if (onerel->rd_rel->relkind == RELKIND_RELATION || onerel->rd_rel->relkind == RELKIND_MATVIEW) { + pgstat_progress_start_command(PROGRESS_COMMAND_ANALYZE, + RelationGetRelid(onerel)); It seems to me that the report should begin in do_analyze_rel(). Fixed. some more comments, +/* Report compute heap stats phase */ +pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE, +PROGRESS_ANALYZE_PHASE_COMPUTE_HEAP_STATS); The above analyze phase is updated inside a for loop, instead just set it above once. Fixed. + /* Report compute index stats phase */ + pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE, + PROGRESS_ANALYZE_PHASE_COMPUTE_INDEX_STATS); Irrespective of whether the index exists on the table or not, the above analyze phase is set. It is better to set the above phase and index cleanup phase only when there are indexes on the table. Agreed. Fixed. +/* Report total number of heap blocks and collectinf sample row phase*/ Typo? collecting? Fixed. +/* Report total number of heap blocks and collectinf sample row phase*/ +initprog_val[0] = PROGRESS_ANALYZE_PHASE_COLLECT_HEAP_SAMPLE_ROWS; +initprog_val[1] = totalblocks; +pgstat_progress_update_multi_param(2, initprog_index, initprog_val); acquire_sample_rows function is called from acquire_inherited_sample_rows function, so adding the phase in that function will provide wrong info. I agree with you. +#define PROGRESS_ANALYZE_PHASE_COLLECT_INH_SAMPLE_ROWS2 why there is no code added for the phase, any specific reason? I am thinking how to report this phase. Do you have any suggestion? +/* Phases of analyze */ Can be written as following for better understanding, and also similar like vacuum. /* Phases of analyze (as advertised via PROGRESS_ANALYZE_PHASE) */ Done. Regards, Vinayak Pokale NTT Open Source Software Center pg_stat_progress_analyze_v2.patch Description: binary/octet-stream -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
On Wed, Mar 1, 2017 at 1:25 PM, Andres Freundwrote: > On 2017-03-01 10:20:41 -0800, David Fetter wrote: >> On Wed, Mar 01, 2017 at 09:45:40AM -0500, Peter Eisentraut wrote: >> > On 2/28/17 04:24, vinayak wrote: >> > > The view provides the information of analyze command progress details as >> > > follows >> > > postgres=# \d pg_stat_progress_analyze >> > >View "pg_catalog.pg_stat_progress_analyze" >> > >> > Hmm, do we want a separate "progress" system view for every kind of >> > command? What if someone comes up with a progress checker for CREATE >> > INDEX, REINDEX, CLUSTER, etc.? > > I don't think that'd be that bad, otherwise the naming of the fields is > complicated. +1. I suppose if it gets really out of control we might have to rethink, but 2 is not 50, and having appropriate column names is worth a lot. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
On Tue, Mar 7, 2017 at 5:01 PM, Michael Paquierwrote: > > @@ -496,7 +499,6 @@ do_analyze_rel(Relation onerel, int options, > VacuumParams *params, > numrows = (*acquirefunc) (onerel, elevel, > rows, targrows, > , ); > - > /* > Useless diff. > > + > + ANALYZE is currently collecting the sample rows. > + The sample it reads is taken randomly.Its size depends on > + the default_statistics_target parameter value. > + > This should use a markup for default_statistics_target. > > @@ -203,6 +204,8 @@ analyze_rel(Oid relid, RangeVar *relation, int options, > if (onerel->rd_rel->relkind == RELKIND_RELATION || > onerel->rd_rel->relkind == RELKIND_MATVIEW) > { > + pgstat_progress_start_command(PROGRESS_COMMAND_ANALYZE, > + RelationGetRelid(onerel)); > It seems to me that the report should begin in do_analyze_rel(). some more comments, + /* Report compute heap stats phase */ + pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE, + PROGRESS_ANALYZE_PHASE_COMPUTE_HEAP_STATS); The above analyze phase is updated inside a for loop, instead just set it above once. + /* Report compute index stats phase */ + pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE, + PROGRESS_ANALYZE_PHASE_COMPUTE_INDEX_STATS); Irrespective of whether the index exists on the table or not, the above analyze phase is set. It is better to set the above phase and index cleanup phase only when there are indexes on the table. + /* Report total number of heap blocks and collectinf sample row phase*/ Typo? collecting? + /* Report total number of heap blocks and collectinf sample row phase*/ + initprog_val[0] = PROGRESS_ANALYZE_PHASE_COLLECT_HEAP_SAMPLE_ROWS; + initprog_val[1] = totalblocks; + pgstat_progress_update_multi_param(2, initprog_index, initprog_val); acquire_sample_rows function is called from acquire_inherited_sample_rows function, so adding the phase in that function will provide wrong info. +#define PROGRESS_ANALYZE_PHASE_COLLECT_INH_SAMPLE_ROWS 2 why there is no code added for the phase, any specific reason? +/* Phases of analyze */ Can be written as following for better understanding, and also similar like vacuum. /* Phases of analyze (as advertised via PROGRESS_ANALYZE_PHASE) */ Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] ANALYZE command progress checker
On Mon, Mar 6, 2017 at 3:58 PM, Andres Freundwrote: > On 2017-03-03 15:33:15 -0500, David Steele wrote: >> On 3/1/17 1:25 PM, Andres Freund wrote: >> > On 2017-03-01 10:20:41 -0800, David Fetter wrote: >> >> On Wed, Mar 01, 2017 at 09:45:40AM -0500, Peter Eisentraut wrote: >> >>> On 2/28/17 04:24, vinayak wrote: >> The view provides the information of analyze command progress details as >> follows >> postgres=# \d pg_stat_progress_analyze >> View "pg_catalog.pg_stat_progress_analyze" >> >>> >> >>> Hmm, do we want a separate "progress" system view for every kind of >> >>> command? What if someone comes up with a progress checker for CREATE >> >>> INDEX, REINDEX, CLUSTER, etc.? >> > >> > I don't think that'd be that bad, otherwise the naming of the fields is >> > complicated. I guess the alternative (or do both?) would be to to have >> > a pivoted table, but that'd painful to query. Do you have a better idea? >> > >> > >> >> Some kind of design for progress seems like a good plan. Some ideas: >> >> >> >> - System view(s) >> >> >> >> This has the advantage of being shown to work at least to a PoC by >> >> this patch, and is similar to extant system views like >> >> pg_stat_activity in the sense of capturing a moment in time. >> >> >> >> - NOTIFY >> >> >> >> Event-driven model as opposed to a polling one. This is >> >> attractive on efficiency grounds, less so on reliability ones. >> >> >> >> - Something added to the wire protocol >> >> >> >> More specialized, limits the information to the session where the >> >> command was issued >> >> >> >> - Other things not named here >> > >> > We now have a framework for this [1] (currently used by vacuum, but >> > extensible). The question is about presentation. I'm fairly sure that >> > we shouldn't just add yet another framework, and I doubt that that's >> > what's proposed by Peter. >> >> I think the idea of a general progress view is very valuable and there >> are a ton of operations it could be used for: full table scans, index >> rebuilds, vacuum, copy, etc. >> However, I feel that this proposal is not flexible enough and comes too >> late in the release cycle to allow development into something that could >> be committed. > > I'm not following. As I pointed out, we already have this framework? > This patch is just a short one using that framework? > > >> I propose we move this to the 2017-07 CF so the idea can be more fully >> developed. > > I don't see that being warranted in this case, we're really not talking > about something complicated: > doc/src/sgml/monitoring.sgml | 135 > +++ > src/backend/catalog/system_views.sql | 16 > src/backend/commands/analyze.c | 34 > src/backend/utils/adt/pgstatfuncs.c |2 > src/include/commands/progress.h | 13 +++ > src/include/pgstat.h |3 > src/test/regress/expected/rules.out | 18 > 7 files changed, 219 insertions(+), 2 deletions(-) > excepting tests and docs, this is very little actual code. Or 35 lines just for the backend portion, it is hard to something smaller. @@ -496,7 +499,6 @@ do_analyze_rel(Relation onerel, int options, VacuumParams *params, numrows = (*acquirefunc) (onerel, elevel, rows, targrows, , ); - /* Useless diff. + + ANALYZE is currently collecting the sample rows. + The sample it reads is taken randomly.Its size depends on + the default_statistics_target parameter value. + This should use a markup for default_statistics_target. @@ -203,6 +204,8 @@ analyze_rel(Oid relid, RangeVar *relation, int options, if (onerel->rd_rel->relkind == RELKIND_RELATION || onerel->rd_rel->relkind == RELKIND_MATVIEW) { + pgstat_progress_start_command(PROGRESS_COMMAND_ANALYZE, + RelationGetRelid(onerel)); It seems to me that the report should begin in do_analyze_rel(). -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
On 3/6/17 1:58 AM, Andres Freund wrote: > On 2017-03-03 15:33:15 -0500, David Steele wrote: > >> I propose we move this to the 2017-07 CF so the idea can be more fully >> developed. > > I don't see that being warranted in this case, we're really not talking > about something complicated: <...> > excepting tests and docs, this is very little actual code. Fair enough. From my read through it appeared a redesign was required/requested. -- -David da...@pgmasters.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
On 2017/03/06 17:02, Amit Langote wrote: Hi Vinayak, On 2017/02/28 18:24, vinayak wrote: The attached patch reports the different phases of analyze command. Added this patch to CF 2017-03. In the updated monitoring.sgml: + + computing heap stats + + VACUUM is currently computing heap stats. + + + + computing index stats + + VACUUM is currently computing index stats. + + + + cleaning up indexes + + ANALYZE is currently cleaning up indexes. + + + + + The entries mentioning VACUUM should actually say ANALYZE. Yes. Thank you. I will fix it. Regards, Vinayak Pokale NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
Hi Vinayak, On 2017/02/28 18:24, vinayak wrote: > The attached patch reports the different phases of analyze command. > Added this patch to CF 2017-03. In the updated monitoring.sgml: + + computing heap stats + + VACUUM is currently computing heap stats. + + + + computing index stats + + VACUUM is currently computing index stats. + + + + cleaning up indexes + + ANALYZE is currently cleaning up indexes. + + + + + The entries mentioning VACUUM should actually say ANALYZE. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
On 2017-03-03 15:33:15 -0500, David Steele wrote: > On 3/1/17 1:25 PM, Andres Freund wrote: > > On 2017-03-01 10:20:41 -0800, David Fetter wrote: > >> On Wed, Mar 01, 2017 at 09:45:40AM -0500, Peter Eisentraut wrote: > >>> On 2/28/17 04:24, vinayak wrote: > The view provides the information of analyze command progress details as > follows > postgres=# \d pg_stat_progress_analyze > View "pg_catalog.pg_stat_progress_analyze" > >>> > >>> Hmm, do we want a separate "progress" system view for every kind of > >>> command? What if someone comes up with a progress checker for CREATE > >>> INDEX, REINDEX, CLUSTER, etc.? > > > > I don't think that'd be that bad, otherwise the naming of the fields is > > complicated. I guess the alternative (or do both?) would be to to have > > a pivoted table, but that'd painful to query. Do you have a better idea? > > > > > >> Some kind of design for progress seems like a good plan. Some ideas: > >> > >> - System view(s) > >> > >> This has the advantage of being shown to work at least to a PoC by > >> this patch, and is similar to extant system views like > >> pg_stat_activity in the sense of capturing a moment in time. > >> > >> - NOTIFY > >> > >> Event-driven model as opposed to a polling one. This is > >> attractive on efficiency grounds, less so on reliability ones. > >> > >> - Something added to the wire protocol > >> > >> More specialized, limits the information to the session where the > >> command was issued > >> > >> - Other things not named here > > > > We now have a framework for this [1] (currently used by vacuum, but > > extensible). The question is about presentation. I'm fairly sure that > > we shouldn't just add yet another framework, and I doubt that that's > > what's proposed by Peter. > > I think the idea of a general progress view is very valuable and there > are a ton of operations it could be used for: full table scans, index > rebuilds, vacuum, copy, etc. > However, I feel that this proposal is not flexible enough and comes too > late in the release cycle to allow development into something that could > be committed. I'm not following. As I pointed out, we already have this framework? This patch is just a short one using that framework? > I propose we move this to the 2017-07 CF so the idea can be more fully > developed. I don't see that being warranted in this case, we're really not talking about something complicated: doc/src/sgml/monitoring.sgml | 135 +++ src/backend/catalog/system_views.sql | 16 src/backend/commands/analyze.c | 34 src/backend/utils/adt/pgstatfuncs.c |2 src/include/commands/progress.h | 13 +++ src/include/pgstat.h |3 src/test/regress/expected/rules.out | 18 7 files changed, 219 insertions(+), 2 deletions(-) excepting tests and docs, this is very little actual code. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
On Sat, Mar 4, 2017 at 5:33 AM, David Steelewrote: > I think the idea of a general progress view is very valuable and there > are a ton of operations it could be used for: full table scans, index > rebuilds, vacuum, copy, etc. > > However, I feel that this proposal is not flexible enough and comes too > late in the release cycle to allow development into something that could > be committed. Well, each command really has its own requirements in terms of data to store, so we either finish with a bunch of small tables that anyone could query and join as they wish or a somewhat unique table that is bloated with all the information, with a set of views on top of it to query all the information. For extensibility's sake of each command (for example imagine that REINDEX could be extended with a CONCURRENTLY option and multiple phases), I would think that having a table per command type would not be that bad. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
On 3/1/17 1:25 PM, Andres Freund wrote: > On 2017-03-01 10:20:41 -0800, David Fetter wrote: >> On Wed, Mar 01, 2017 at 09:45:40AM -0500, Peter Eisentraut wrote: >>> On 2/28/17 04:24, vinayak wrote: The view provides the information of analyze command progress details as follows postgres=# \d pg_stat_progress_analyze View "pg_catalog.pg_stat_progress_analyze" >>> >>> Hmm, do we want a separate "progress" system view for every kind of >>> command? What if someone comes up with a progress checker for CREATE >>> INDEX, REINDEX, CLUSTER, etc.? > > I don't think that'd be that bad, otherwise the naming of the fields is > complicated. I guess the alternative (or do both?) would be to to have > a pivoted table, but that'd painful to query. Do you have a better idea? > > >> Some kind of design for progress seems like a good plan. Some ideas: >> >> - System view(s) >> >> This has the advantage of being shown to work at least to a PoC by >> this patch, and is similar to extant system views like >> pg_stat_activity in the sense of capturing a moment in time. >> >> - NOTIFY >> >> Event-driven model as opposed to a polling one. This is >> attractive on efficiency grounds, less so on reliability ones. >> >> - Something added to the wire protocol >> >> More specialized, limits the information to the session where the >> command was issued >> >> - Other things not named here > > We now have a framework for this [1] (currently used by vacuum, but > extensible). The question is about presentation. I'm fairly sure that > we shouldn't just add yet another framework, and I doubt that that's > what's proposed by Peter. I think the idea of a general progress view is very valuable and there are a ton of operations it could be used for: full table scans, index rebuilds, vacuum, copy, etc. However, I feel that this proposal is not flexible enough and comes too late in the release cycle to allow development into something that could be committed. I propose we move this to the 2017-07 CF so the idea can be more fully developed. -- -David da...@pgmasters.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
On March 1, 2017 11:34:48 AM PST, David Fetterwrote: >I notice that that commit has no SGML component. Should it have one? Don't think so. -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
On Wed, Mar 01, 2017 at 10:28:23AM -0800, Andres Freund wrote: > On 2017-03-01 10:25:49 -0800, Andres Freund wrote: > > We now have a framework for this [1] (currently used by vacuum, but > > extensible). The question is about presentation. I'm fairly sure that > > we shouldn't just add yet another framework, and I doubt that that's > > what's proposed by Peter. > > > > [1] > > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=c16dc1aca5e > > Majority of that is actually in > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b6fb6471f6afaf649e52f38269fd8c5c60647669 If that's even vaguely usable, I'd say we should use it for this. I notice that that commit has no SGML component. Should it have one? Best, David. -- David Fetterhttp://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
On 2017-03-01 10:25:49 -0800, Andres Freund wrote: > We now have a framework for this [1] (currently used by vacuum, but > extensible). The question is about presentation. I'm fairly sure that > we shouldn't just add yet another framework, and I doubt that that's > what's proposed by Peter. > > [1] > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=c16dc1aca5e Majority of that is actually in https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b6fb6471f6afaf649e52f38269fd8c5c60647669 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
On 2017-03-01 10:20:41 -0800, David Fetter wrote: > On Wed, Mar 01, 2017 at 09:45:40AM -0500, Peter Eisentraut wrote: > > On 2/28/17 04:24, vinayak wrote: > > > The view provides the information of analyze command progress details as > > > follows > > > postgres=# \d pg_stat_progress_analyze > > >View "pg_catalog.pg_stat_progress_analyze" > > > > Hmm, do we want a separate "progress" system view for every kind of > > command? What if someone comes up with a progress checker for CREATE > > INDEX, REINDEX, CLUSTER, etc.? I don't think that'd be that bad, otherwise the naming of the fields is complicated. I guess the alternative (or do both?) would be to to have a pivoted table, but that'd painful to query. Do you have a better idea? > Some kind of design for progress seems like a good plan. Some ideas: > > - System view(s) > > This has the advantage of being shown to work at least to a PoC by > this patch, and is similar to extant system views like > pg_stat_activity in the sense of capturing a moment in time. > > - NOTIFY > > Event-driven model as opposed to a polling one. This is > attractive on efficiency grounds, less so on reliability ones. > > - Something added to the wire protocol > > More specialized, limits the information to the session where the > command was issued > > - Other things not named here We now have a framework for this [1] (currently used by vacuum, but extensible). The question is about presentation. I'm fairly sure that we shouldn't just add yet another framework, and I doubt that that's what's proposed by Peter. [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=c16dc1aca5e -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
On Wed, Mar 01, 2017 at 09:45:40AM -0500, Peter Eisentraut wrote: > On 2/28/17 04:24, vinayak wrote: > > The view provides the information of analyze command progress details as > > follows > > postgres=# \d pg_stat_progress_analyze > >View "pg_catalog.pg_stat_progress_analyze" > > Hmm, do we want a separate "progress" system view for every kind of > command? What if someone comes up with a progress checker for CREATE > INDEX, REINDEX, CLUSTER, etc.? Some kind of design for progress seems like a good plan. Some ideas: - System view(s) This has the advantage of being shown to work at least to a PoC by this patch, and is similar to extant system views like pg_stat_activity in the sense of capturing a moment in time. - NOTIFY Event-driven model as opposed to a polling one. This is attractive on efficiency grounds, less so on reliability ones. - Something added to the wire protocol More specialized, limits the information to the session where the command was issued - Other things not named here Best, David. -- David Fetterhttp://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
On 2/28/17 04:24, vinayak wrote: > The view provides the information of analyze command progress details as > follows > postgres=# \d pg_stat_progress_analyze >View "pg_catalog.pg_stat_progress_analyze" Hmm, do we want a separate "progress" system view for every kind of command? What if someone comes up with a progress checker for CREATE INDEX, REINDEX, CLUSTER, etc.? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers