Re: Most-common value docs in PG 12
On Wed, Aug 28, 2019 at 12:22:38PM -0400, Bruce Momjian wrote: Our docs for most-common values in PG 12 has: --> CREATE STATISTICS stts3 (mcv) ON state, city FROM zipcodes; ANALYZE zipcodes; SELECT m.* FROM pg_statistic_ext join pg_statistic_ext_data on (oid = stxoid), pg_mcv_list_items(stxdmcv) m WHERE stxname = 'stts3'; index | values | nulls | frequency | base_frequency ---++---+---+ --> 0 | {Washington, DC} | {f,f} | 0.003467 |2.7e-05 1 | {Apo, AE} | {f,f} | 0.003067 |1.9e-05 2 | {Houston, TX} | {f,f} | 0.002167 | 0.000133 3 | {El Paso, TX} | {f,f} | 0.002 | 0.000113 It seems pg_mcv_list_items() reports the column names in the order they appear in the table, not in the order they appear in the CREATE STATISTICS statement. Same for psql \d: \d zipcodes Table "public.zipcodes" Column | Type | Collation | Nullable | Default -+--+---+--+- city| text | | | state | text | | | zipcode | text | | | Statistics objects: --> "public"."stts3" (mcv) ON city, state FROM zipcodes If this is so, why don't we show the CREATE STATISTICS example as city/state, and not state/city? Yes, we deduplicate the attributes and store them sorted by attnum. I'm not sure it makes sense to change the example to match this order, which is mostly an implementation detail, though. It might be better to point out the order may not exactly match CREATE STATISTICS, and point users to what e.g. "\d" shows (because that will show the order as stored in the system catalog). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Most-common value docs in PG 12
On Thu, Sep 26, 2019 at 04:20:59PM -0400, Bruce Momjian wrote: On Thu, Sep 26, 2019 at 05:17:55PM -0300, Alvaro Herrera wrote: On 2019-Sep-26, Bruce Momjian wrote: > On Wed, Sep 11, 2019 at 06:15:22PM -0300, Alvaro Herrera wrote: > > On 2019-Aug-30, Bruce Momjian wrote: > > > > > OK, how is this patch? I didn't mention psql since I think everyone > > > expects psql to show all information about tables and indexes. > > > > Why would you change perform.sgml? It seems unnecessary; the commands > > shown work fine. > > I realize they work fine, but the ordering in the examples not matching > the defined order suggests that ordering matters, but it does not. Well, I mean exactly the other way around: the fact that the orders don't match illustrates that the order is not important. And that is reinforced by the explanation indicating explicitly that it does not matter: Uh, people normally list things in defined order, so you would usually not list them in non-defined order unless there is a purpose. Doing that just to illustrate the order doesn't matter seems odd. Well, that assumes there is a definition, and I don't think the zipcodes table is defined anywhere. So how do you know in what order are those columns defined? Now, maybe the table should be defined somewhere in perform.sgml - I don't recall why exactly I chose not to do that, maybe because there is no universal definition (one country uses text, another number, ...). I do however agree that had there been such definition, it's probably natural to list columns in the same order. We know the order is not important, the proposed patch states that explicitly, but this just feels natural. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Most-common value docs in PG 12
On Thu, Sep 26, 2019 at 05:31:07PM -0400, Bruce Momjian wrote: On Thu, Sep 26, 2019 at 11:03:54PM +0200, Tomas Vondra wrote: On Thu, Sep 26, 2019 at 04:20:59PM -0400, Bruce Momjian wrote: > Uh, people normally list things in defined order, so you would usually > not list them in non-defined order unless there is a purpose. Doing > that just to illustrate the order doesn't matter seems odd. > Well, that assumes there is a definition, and I don't think the zipcodes table is defined anywhere. So how do you know in what order are those columns defined? In the USA, it is usually specific to general, i.e., city, state. I'd probably define it the same way, but for example the zipcode data sets I usually use for my talks [1] defines it like this: postal code : varchar(20) place name: varchar(180) admin name1 : 1. order subdivision (state) varchar(100) admin code1 : 1. order subdivision (state) varchar(20) admin name2 : 2. order subdivision (county/province) varchar(100) admin code2 : 2. order subdivision (county/province) varchar(20) admin name3 : 3. order subdivision (community) varchar(100) admin code3 : 3. order subdivision (community) varchar(20) latitude : estimated latitude (wgs84) longitude : estimated longitude (wgs84) accuracy : accuracy of lat/lng so in this case it's a bit of a mix of specific vs. general first. [1] http://download.geonames.org/export/zip/ Now, maybe the table should be defined somewhere in perform.sgml - I don't recall why exactly I chose not to do that, maybe because there is no universal definition (one country uses text, another number, ...) Yeah, doesn't seem worth adding. OK. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: DISTINCT term in aggregate function
Sorry, I'm not reading pgsql-docs very often, so I missed the post. Yeah, we should probably add an indexterm to the other places too. regards On 3/18/21 7:03 PM, Pantelis Theodosiou wrote: > Hi, I didn't think of including you in this suggestion. > Or the pdsql-docs was not the right list to post? I didn't want to mix > it with the GROUP BY DISTINCT patch. > > Please check my suggestion. > > Best regards > Pantelis Theodosiou > > > > > -- Forwarded message - > From: *Pantelis Theodosiou* mailto:yperc...@gmail.com>> > Date: Sat, Mar 13, 2021 at 1:03 AM > Subject: Fwd: GROUP BY DISTINCT > To: <mailto:pgsql-docs@lists.postgresql.org>> > > > > -- Forwarded message - > From: *Tomas Vondra* <mailto:tomas.von...@enterprisedb.com>> > Date: Fri, Mar 12, 2021 at 11:33 PM > Subject: Re: GROUP BY DISTINCT > To: Vik Fearing <mailto:v...@postgresfriends.org>>, Georgios Kokolatos > mailto:gkokola...@protonmail.com>>, > <mailto:pgsql-hack...@lists.postgresql.org>> > Cc: Erik Rijkers mailto:e...@xs4all.nl>> > > > Hi Vik, > > The patch seems quite ready, I have just two comments. > > 1) Shouldn't this add another for DISTINCT, somewhere in the > documentation? Now the index points just to the SELECT DISTINCT part. > > . > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com <http://www.enterprisedb.com> > The Enterprise PostgreSQL Company > > > > After reading the above thread in hackers, I noticed that the index does > not point to aggrgeate functions either and DISTINCT is not mentioned in > the aggregate functions page > either: https://www.postgresql.org/docs/current/functions-aggregate.html > <https://www.postgresql.org/docs/current/functions-aggregate.html> > Shouldn't it be mentioned with an example of COUNT(DISTINCT ...) or > aggregate_function(DISTINCT ...) in general ? > > Best regards > > Pantelis Theodosiou -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Parallel index build for BRIN
On 12/8/24 16:00, Egor Rogov wrote: > Hi, > > > On 17.11.2024 11:28, Egor Rogov wrote: >> Hi everyone, >> >> This thread doesn't seem to have attracted attention, so let me try >> again. Two documentation pages claim that B-tree is the only access >> method that supports parallel building, which is no longer true. I >> propose to fix it in a way like this: >> >> diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml >> index d54f9049569..b5b1580dee7 100644 >> --- a/doc/src/sgml/config.sgml >> +++ b/doc/src/sgml/config.sgml >> @@ -2835,7 +2835,7 @@ include_dir 'conf.d' >> Sets the maximum number of parallel workers that can be >> started by a single utility command. Currently, the parallel >> utility commands that support the use of parallel workers are >> - CREATE INDEX only when building a B-tree >> index, >> + CREATE INDEX when building a B-tree or >> BRIN index, >> and VACUUM without FULL >> option. Parallel workers are taken from the pool of processes >> established by , >> limited >> diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/ >> create_index.sgml >> index 621bc0e253c..208389e8006 100644 >> --- a/doc/src/sgml/ref/create_index.sgml >> +++ b/doc/src/sgml/ref/create_index.sgml >> @@ -808,7 +808,7 @@ Indexes: >> leveraging multiple CPUs in order to process the table rows faster. >> This feature is known as parallel index >> build. For index methods that support building indexes >> - in parallel (currently, only B-tree), >> + in parallel (currently, B-tree and BRIN), >> maintenance_work_mem specifies the maximum >> amount of memory that can be used by each index build operation as >> a whole, regardless of how many worker processes were started. > > > I've spotted another mention of B-tree being the only AM that supports > parallel builds: comment in src/backend/catalog/index.c. As this mention > is not visible to the users, I'd propose removing it altogether rather > than fixing it. Updated patch is attached. > Thanks for noticing this and the patches. You're right, this should have been updated with the BRIN parallel builds. I'll get this committed sometime the week. regards -- Tomas Vondra
Re: Parallel index build for BRIN
On 12/9/24 19:54, Tomas Vondra wrote: > On 12/8/24 16:00, Egor Rogov wrote: >> Hi, >> >> ... >> >> I've spotted another mention of B-tree being the only AM that supports >> parallel builds: comment in src/backend/catalog/index.c. As this mention >> is not visible to the users, I'd propose removing it altogether rather >> than fixing it. Updated patch is attached. >> > > Thanks for noticing this and the patches. You're right, this should have > been updated with the BRIN parallel builds. I'll get this committed > sometime the week. > I've pushed the doc fix, and backpatched it to PG 17. Thanks for the patience! -- Tomas Vondra
Re: Parallel index build for BRIN
On 12/17/24 07:40, Egor Rogov wrote: > On 16.12.2024 21:24, Tomas Vondra wrote: > >> On 12/9/24 19:54, Tomas Vondra wrote: >>> On 12/8/24 16:00, Egor Rogov wrote: >>>> Hi, >>>> >>>> ... >>>> >>>> I've spotted another mention of B-tree being the only AM that supports >>>> parallel builds: comment in src/backend/catalog/index.c. As this >>>> mention >>>> is not visible to the users, I'd propose removing it altogether rather >>>> than fixing it. Updated patch is attached. >>>> >>> Thanks for noticing this and the patches. You're right, this should have >>> been updated with the BRIN parallel builds. I'll get this committed >>> sometime the week. >>> >> I've pushed the doc fix, and backpatched it to PG 17. > > > Thanks so much, Tomas! > > Please note that the comment in src/backend/catalog/index.c remains > unchanged: > > > --- a/src/backend/catalog/index.c > +++ b/src/backend/catalog/index.c > @@ -2988,8 +2988,7 @@ index_build(Relation heapRelation, > Assert(PointerIsValid(indexRelation->rd_indam->ambuildempty)); > > /* > - * Determine worker process details for parallel CREATE INDEX. > Currently, > - * only btree has support for parallel builds. > + * Determine worker process details for parallel CREATE INDEX. > * > * Note that planner considers parallel safety for us. > */ > Oh, thanks for noticing that. I'm not sure why I missed that, I must have used the first patch by mistake. Anyway, I found another obsolete comment in planner.c, so I fixed that too and pushed. Thanks -- Tomas Vondra