Re: [GENERAL] BRIN indexes and ORDER BY
On Wed, Oct 5, 2016 at 3:27 PM, Stephen Frostwrote: > Darren, > > * Darren Lafreniere (dlafreni...@onezero.com) wrote: >> Tom Lane wrote: >> > > Gavin Wahl wrote: >> > >> It seems trivial to accelerate a MAX or MIN query with a BRIN index. You >> > >> just find the page range with the largest/smallest value, and then only >> > >> scan that one. Would that be hard to implement? I'm interested in >> > working >> > >> on it if someone can give me some pointers. >> > >> > I think this proposal is fairly broken anyway. The page range with the >> > largest max-value may once have contained the largest live row, but >> > there's no guarantee that it still does. It might even be completely >> > empty. You could imagine an algorithm like this: >> > >> > 1. Find page-range with largest max. Scan it to identify live row with >> > largest value. If *no* live values, find page-range with next largest >> > max, repeat until no page ranges remain (whereupon return NULL). >> > >> > 2. For each remaining page-range whose indexed max exceeds the value >> > currently in hand, scan that page-range to see if any value exceeds >> > the one in hand, replacing the value if so. >> > >> > This'd probably allow you to omit scanning some of the page-ranges >> > in the table, but in a lot of cases you'd end up scanning many of them; >> > and you'd need a lot of working state to remember which ranges you'd >> > already looked at. It'd certainly always be a lot more expensive than >> > answering the same question with a btree index, because in no case do >> > you get to avoid scanning the entire contents of the index. > [...] >> A b-tree index would certainly be faster for ordering. But in scenarios >> where you have huge datasets that can't afford the space or update time >> required for b-tree, could such a BRIN-accelerated ordering algorithm at >> least be faster than ordering with no index? > > For at least some of the common BRIN use-cases, where the rows are > inserted in-order and never/very-rarely modified or deleted, this > approach would work very well. > > Certainly, using this would be much cheaper than a seqscan/top-N sort, > for small values of 'N', relative to the number of rows in the table, > in those cases. > > In general, I like the idea of supporting this as BRIN indexes strike me > as very good for very large tables which have highly clumped data in > them and being able to do a top-N query on those can be very useful at > times. Yeah. If the brin average page overlap and % dead tuple coefficients are low it absolutely makes sense to drive top N with brin. It will never beat a btree but typically brin is used when the btree index is no good for various reasons. brin indexes are pretty neat; they can provide stupefying amounts of optimization in many common warehousing workloads. They even beat out index only scans for a tiny fraction of the storage. Of course, you have to work around the limitations... :-) merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] BRIN indexes and ORDER BY
Stephen Frostwrote: > For at least some of the common BRIN use-cases, where the rows are > inserted in-order and never/very-rarely modified or deleted, this > approach would work very well. > Thanks Stephen, this is exactly our use case.
Re: [GENERAL] BRIN indexes and ORDER BY
Darren, * Darren Lafreniere (dlafreni...@onezero.com) wrote: > Tom Lanewrote: > > > Gavin Wahl wrote: > > >> It seems trivial to accelerate a MAX or MIN query with a BRIN index. You > > >> just find the page range with the largest/smallest value, and then only > > >> scan that one. Would that be hard to implement? I'm interested in > > working > > >> on it if someone can give me some pointers. > > > > I think this proposal is fairly broken anyway. The page range with the > > largest max-value may once have contained the largest live row, but > > there's no guarantee that it still does. It might even be completely > > empty. You could imagine an algorithm like this: > > > > 1. Find page-range with largest max. Scan it to identify live row with > > largest value. If *no* live values, find page-range with next largest > > max, repeat until no page ranges remain (whereupon return NULL). > > > > 2. For each remaining page-range whose indexed max exceeds the value > > currently in hand, scan that page-range to see if any value exceeds > > the one in hand, replacing the value if so. > > > > This'd probably allow you to omit scanning some of the page-ranges > > in the table, but in a lot of cases you'd end up scanning many of them; > > and you'd need a lot of working state to remember which ranges you'd > > already looked at. It'd certainly always be a lot more expensive than > > answering the same question with a btree index, because in no case do > > you get to avoid scanning the entire contents of the index. [...] > A b-tree index would certainly be faster for ordering. But in scenarios > where you have huge datasets that can't afford the space or update time > required for b-tree, could such a BRIN-accelerated ordering algorithm at > least be faster than ordering with no index? For at least some of the common BRIN use-cases, where the rows are inserted in-order and never/very-rarely modified or deleted, this approach would work very well. Certainly, using this would be much cheaper than a seqscan/top-N sort, for small values of 'N', relative to the number of rows in the table, in those cases. In general, I like the idea of supporting this as BRIN indexes strike me as very good for very large tables which have highly clumped data in them and being able to do a top-N query on those can be very useful at times. Thanks! Stephen signature.asc Description: Digital signature
Re: [GENERAL] BRIN indexes and ORDER BY
Tom Lanewrote: > > Gavin Wahl wrote: > >> It seems trivial to accelerate a MAX or MIN query with a BRIN index. You > >> just find the page range with the largest/smallest value, and then only > >> scan that one. Would that be hard to implement? I'm interested in > working > >> on it if someone can give me some pointers. > > I think this proposal is fairly broken anyway. The page range with the > largest max-value may once have contained the largest live row, but > there's no guarantee that it still does. It might even be completely > empty. You could imagine an algorithm like this: > > 1. Find page-range with largest max. Scan it to identify live row with > largest value. If *no* live values, find page-range with next largest > max, repeat until no page ranges remain (whereupon return NULL). > > 2. For each remaining page-range whose indexed max exceeds the value > currently in hand, scan that page-range to see if any value exceeds > the one in hand, replacing the value if so. > > This'd probably allow you to omit scanning some of the page-ranges > in the table, but in a lot of cases you'd end up scanning many of them; > and you'd need a lot of working state to remember which ranges you'd > already looked at. It'd certainly always be a lot more expensive than > answering the same question with a btree index, because in no case do > you get to avoid scanning the entire contents of the index. > > regards, tom lane > Thanks Tom, A b-tree index would certainly be faster for ordering. But in scenarios where you have huge datasets that can't afford the space or update time required for b-tree, could such a BRIN-accelerated ordering algorithm at least be faster than ordering with no index? Darren Lafreniere
Re: [GENERAL] BRIN indexes and ORDER BY
Alvaro Herrerawrites: > Darren Lafreniere wrote: >> We found a pgsql-hackers thread from about a year ago about optimizing >> ORDER BY for BRIN indexes. Tom Lane suggested that he was working on it: >> https://www.postgresql.org/message-id/11881.1443393360%40sss.pgh.pa.us > Tom said he was working on some infrastructure planner changes > ("upper-planner path-ification"), not that he was working on improving > usage of BRIN indexes. As far as I know, nobody has worked on that. Alvaro's reading is correct; I wasn't planning to work on any such thing, and still am not. Looking again at the original thread: > Gavin Wahl wrote: >> It seems trivial to accelerate a MAX or MIN query with a BRIN index. You >> just find the page range with the largest/smallest value, and then only >> scan that one. Would that be hard to implement? I'm interested in working >> on it if someone can give me some pointers. I think this proposal is fairly broken anyway. The page range with the largest max-value may once have contained the largest live row, but there's no guarantee that it still does. It might even be completely empty. You could imagine an algorithm like this: 1. Find page-range with largest max. Scan it to identify live row with largest value. If *no* live values, find page-range with next largest max, repeat until no page ranges remain (whereupon return NULL). 2. For each remaining page-range whose indexed max exceeds the value currently in hand, scan that page-range to see if any value exceeds the one in hand, replacing the value if so. This'd probably allow you to omit scanning some of the page-ranges in the table, but in a lot of cases you'd end up scanning many of them; and you'd need a lot of working state to remember which ranges you'd already looked at. It'd certainly always be a lot more expensive than answering the same question with a btree index, because in no case do you get to avoid scanning the entire contents of the index. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] BRIN indexes and ORDER BY
Ahh, yes. I misread that. Thank you for the clarification. On Wed, Oct 5, 2016 at 2:27 PM, Alvaro Herrerawrote: > Darren Lafreniere wrote: > > > "In addition to simply finding the rows to be returned by a query, an > index > > may be able to deliver them in a specific sorted order. This allows a > > query's ORDER BY specification to be honored without a separate sorting > > step. Of the index types currently supported by PostgreSQL, only B-tree > can > > produce sorted output — the other index types return matching rows in an > > unspecified, implementation-dependent order." > > > > We found a pgsql-hackers thread from about a year ago about optimizing > > ORDER BY for BRIN indexes. Tom Lane suggested that he was working on it: > > https://www.postgresql.org/message-id/11881.1443393360%40sss.pgh.pa.us > > Tom said he was working on some infrastructure planner changes > ("upper-planner path-ification"), not that he was working on improving > usage of BRIN indexes. As far as I know, nobody has worked on that. > > -- > Álvaro Herrerahttps://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >
Re: [GENERAL] BRIN indexes and ORDER BY
Darren Lafreniere wrote: > "In addition to simply finding the rows to be returned by a query, an index > may be able to deliver them in a specific sorted order. This allows a > query's ORDER BY specification to be honored without a separate sorting > step. Of the index types currently supported by PostgreSQL, only B-tree can > produce sorted output — the other index types return matching rows in an > unspecified, implementation-dependent order." > > We found a pgsql-hackers thread from about a year ago about optimizing > ORDER BY for BRIN indexes. Tom Lane suggested that he was working on it: > https://www.postgresql.org/message-id/11881.1443393360%40sss.pgh.pa.us Tom said he was working on some infrastructure planner changes ("upper-planner path-ification"), not that he was working on improving usage of BRIN indexes. As far as I know, nobody has worked on that. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] BRIN indexes and ORDER BY
Hello, We're curious about the current behavior in 9.5.4, and possible future enhancements, of BRIN indexes with respect to ordering. In the docs, section 11.4. "Indexes and ORDER BY" ( https://www.postgresql.org/docs/9.5/static/indexes-ordering.html) is clear that anything other than B-tree indexes have unspecified ordering: "In addition to simply finding the rows to be returned by a query, an index may be able to deliver them in a specific sorted order. This allows a query's ORDER BY specification to be honored without a separate sorting step. Of the index types currently supported by PostgreSQL, only B-tree can produce sorted output — the other index types return matching rows in an unspecified, implementation-dependent order." We found a pgsql-hackers thread from about a year ago about optimizing ORDER BY for BRIN indexes. Tom Lane suggested that he was working on it: https://www.postgresql.org/message-id/11881.1443393360%40sss.pgh.pa.us Our current test shows that ordering by a BRIN indexed column still performs an unoptimized sort: SELECT generate_series(1, 1000) AS id INTO test; CREATE INDEX idx_test_id ON test USING BRIN (id); EXPLAIN SELECT id FROM test ORDER BY id DESC LIMIT 20; Limit (cost=410344.40..410344.45 rows=20 width=4) -> Sort (cost=410344.40..435344.40 rows=100 width=4)" Sort Key: id DESC -> Seq Scan on test (cost=0.00..144248.00 rows=1000 width=4) Is there anything we're missing to speed this up? Or is it still a future feature? Thank you, Darren Lafreniere
Re: [GENERAL] BRIN indexes
Felipe Santos wrote: > brin_db=# EXPLAIN ANALYSE SELECT count(*) FROM orders WHERE order_date > BETWEEN '2012-01-04 09:00:00' and '2014-01-04 14:30:00'; > > QUERY PLAN > > -- > - > Aggregate (cost=2408269.34..2408269.35 rows=1 width=0) (actual > time=14164.923..14164.923 rows=1 loops=1) >-> Bitmap Heap Scan on orders (cost=326808.28..2328609.76 > rows=31863832 width=0) (actual time=42.742..11430.562 rows=31589101 loops=1) > Recheck Cond: ((order_date >= '2012-01-04 09:00:00-02'::timestamp > with time zone) AND (order_date <= '2014-01-04 14:30:00-02'::timestamp with > time zone)) > Rows Removed by Index Recheck: 21907 > Heap Blocks: lossy=201344 > -> Bitmap Index Scan on idx_order_date_brin > (cost=0.00..318842.32 rows=31863832 width=0) (actual time=36.151..36.151 > rows=2013440 loops=1) >Index Cond: ((order_date >= '2012-01-04 > 09:00:00-02'::timestamp with time zone) AND (order_date <= '2014-01-04 > 14:30:00-02'::timestamp > with time zone)) > Planning time: 0.297 ms > Execution time: 14164.985 ms > (9 rows) The number of blocks read from the heap is a key number to watch for, because when you get concurrency that's what going to matter the most. Here you have 201k buffer reads, versus, uh, I don't know how many pages read (because you didn't use the VERBOSE explain option). I think it's worth comparing how many buffer accesses your query had to read. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] BRIN indexes
From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Felipe Santos Sent: Thursday, January 28, 2016 1:17 PM To: Joshua D. Drake <j...@commandprompt.com> Cc: Melvin Davidson <melvin6...@gmail.com>; David Rowley <david.row...@2ndquadrant.com>; pgsql-general@postgresql.org; Thomas Kellerer <spam_ea...@gmx.net> Subject: Re: [GENERAL] BRIN indexes "Further to the point, it is self defeating to have more than one BRIN index on the table if the columns involved would have mutually non-adjacent pages." Not really, if both columns are ordered, BRIN will work "Therefore, it actually would be good to state that in the documentation, even it were just a comment." It is = "BRIN is designed for handling very large tables in which certain columns have some natural correlation with their physical location within the table" Link: http://www.postgresql.org/docs/devel/static/brin-intro.html Also, I did some tests and here are the results I got: Query with no index = completion time 43s Same Query with BRIN = completion time 14s / index size 0,5 MB Same Query without BRIN and with BTREE = completion time 10s / index size 5.000,00 MB As you can see, BRIN can save 99% of disk space for just a slightly worse performance. It seems like a huge improvement, given that your data fits BRIN's use case. Felipe, What kind of queries you used in your test? Where they based on clustering columns? Regards Igor Neyman
Re: [GENERAL] BRIN indexes
Emre Hasegeli wrote: > >> From the above, may I presume that it is best to cluster (or sort), the > >> table based on the intended > >> BRIN column(s) before actually creating the index to insure the pages are > >> adjacent? If so, should > >> that not be included in the documentation, instead of implied? > > The same question is asked to me at PGConf.DE. I think it would be > nice to address it in the documentation somehow. Maybe, we should > also explain how the table is physically organised. It is not clear > to users what kind of operations would make BRIN more useful. Grumble. > > I don't have faith in CLUSTER anyway. Taking exclusive locks and all. > > It also requires a btree index. If you can afford to have btree, you > probably don't need BRIN anyway. Something lighter than CLUSTER which > can use BRIN would be useful. What I think would be useful is a way for the BRIN index to guide location of a new tuple, so that it's put in the right spot right from the start, instead of having it be moved later. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] BRIN indexes
2016-01-28 16:33 GMT-02:00 Igor Neyman <iney...@perceptron.com>: > > > *From:* pgsql-general-ow...@postgresql.org [mailto: > pgsql-general-ow...@postgresql.org] *On Behalf Of *Felipe Santos > *Sent:* Thursday, January 28, 2016 1:17 PM > *To:* Joshua D. Drake <j...@commandprompt.com> > *Cc:* Melvin Davidson <melvin6...@gmail.com>; David Rowley < > david.row...@2ndquadrant.com>; pgsql-general@postgresql.org; Thomas > Kellerer <spam_ea...@gmx.net> > *Subject:* Re: [GENERAL] BRIN indexes > > > > "Further to the point, it is self defeating to have more than one BRIN > index on the table if the columns involved would have mutually > non-adjacent pages." > > > >Not really, if both columns are ordered, BRIN will work > > > > "Therefore, it actually would be good to state that in the documentation, > even it were just a comment." > > > >It is = "BRIN is designed for handling very large tables in which > certain columns have some natural correlation with their physical location > within the table" > >Link: http://www.postgresql.org/docs/devel/static/brin-intro.html > > > > > > Also, I did some tests and here are the results I got: > > > > Query with no index = completion time 43s > > Same Query with BRIN = completion time 14s / index size 0,5 MB > > Same Query without BRIN and with BTREE = completion time 10s / index size > 5.000,00 MB > > > > As you can see, BRIN can save 99% of disk space for just a slightly worse > performance. > > > > It seems like a huge improvement, given that your data fits BRIN's use > case. > > > > Felipe, > > > > What kind of queries you used in your test? > > Where they based on clustering columns? > > > > Regards > > Igor Neyman > Hello Igor, I took the sample BRIN test from the new release's wiki and added the BTREE test: https://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.5#BRIN_Indexes The results today may vary from the reported above but are still in the same levels of performance gain: brin_db=# CREATE TABLE orders ( brin_db(# id int, brin_db(# order_date timestamptz, brin_db(# item text); CREATE TABLE brin_db=# INSERT INTO orders (order_date, item) brin_db-# SELECT x, 'dfiojdso' brin_db-# FROM generate_series('2000-01-01 00:00:00'::timestamptz, '2015-03-01 00:00:00'::timestamptz,'2 seconds'::interval) a(x); INSERT 0 239243401 brin_db=# \dt+ orders List of relations Schema | Name | Type | Owner | Size | Description ++---+--+---+- public | orders | table | postgres | 12 GB | (1 row) brin_db=# EXPLAIN ANALYSE SELECT count(*) FROM orders WHERE order_date BETWEEN '2012-01-04 09:00:00' and '2014-01-04 14:30:00'; QUERY PLAN -- --- Aggregate (cost=4108912.01..4108912.02 rows=1 width=0) (actual time=81116.722..81116.722 rows=1 loops=1) -> Seq Scan on orders (cost=0.00..4106759.58 rows=860972 width=0) (actual time=60173.531..78566.113 rows=31589101 loops=1) Filter: ((order_date >= '2012-01-04 09:00:00-02'::timestamp with time zone) AND (order_date <= '2014-01-04 14:30:00-02'::timestamp with time zone)) Rows Removed by Filter: 207654300 Planning time: 0.443 ms Execution time: 81118.168 ms (6 rows) brin_db=# CREATE INDEX idx_order_date_brin brin_db-#ON orders brin_db-#USING BRIN (order_date); CREATE INDEX brin_db=# \di+ idx_order_date_brin List of relations Schema |Name | Type | Owner | Table | Size | Description +-+---+--+++- public | idx_order_date_brin | index | postgres | orders | 432 kB | (1 row) brin_db=# EXPLAIN ANALYSE SELECT count(*) FROM orders WHERE order_date BETWEEN '2012-01-04 09:00:00' and '2014-01-04 14:30:00'; QUERY PLAN -- - Aggregate (cost=2408269.34..2408269.35 rows=1 width=0) (actual time=14164.923..14164.923 rows=1 loops=1) -> Bitmap Heap Scan on orders (cost=326808.28..2328609.76 rows=31863832 width=0) (actual time=42.742..11430.562 rows=31589101 loops=1) Recheck Cond: ((order_date >= '2012-01-04 09:00:00-02'::timestamp with time zone) AND (order_date <= '2014-01-04 14:30:00-02'::timestamp with time zone)) Rows Removed by Index Recheck: 21907 Heap Blocks: lossy=201344 -> Bitmap Index Scan on idx_order_date_brin (cost=0.00..3
Re: [GENERAL] BRIN indexes
>> From the above, may I presume that it is best to cluster (or sort), the >> table based on the intended >> BRIN column(s) before actually creating the index to insure the pages are >> adjacent? If so, should >> that not be included in the documentation, instead of implied? The same question is asked to me at PGConf.DE. I think it would be nice to address it in the documentation somehow. Maybe, we should also explain how the table is physically organised. It is not clear to users what kind of operations would make BRIN more useful. > I don't have faith in CLUSTER anyway. Taking exclusive locks and all. It also requires a btree index. If you can afford to have btree, you probably don't need BRIN anyway. Something lighter than CLUSTER which can use BRIN would be useful. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] BRIN indexes
Melvin Davidson wrote: > With regard to BRIN indexes: > > http://www.postgresql.org/docs/9.5/interactive/brin-intro.html > > 62.1. Introduction > > "A block range is a group of pages that are physically adjacent in the > table; for each block range, some summary info is stored by the index." > > From the above, may I presume that it is best to cluster (or sort), the > table based on the intended > BRIN column(s) before actually creating the index to insure the pages are > adjacent? If so, should > that not be included in the documentation, instead of implied? The issue is that you cannot normally afford to cluster a table every once in a while; if the natural order in which data is loaded isn't good for BRIN, then perhaps you shouldn't consider BRIN at all. If you're bulk-loading and then create a BRIN index, then it's better to load the data in order of the columns. But perhaps you have reasons to have the table sorted in some other order, in which case trying to satisfy BRIN would be worse. All in all, I think there are enough caveats about this that I'm not sure about putting it up in the doc. I don't have faith in CLUSTER anyway. Taking exclusive locks and all. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] BRIN indexes
Reposting because I received no reply from a previous inquiry as "9.5 new features". With regard to BRIN indexes: http://www.postgresql.org/docs/9.5/interactive/brin-intro.html 62.1. Introduction "A block range is a group of pages that are physically adjacent in the table; for each block range, some summary info is stored by the index." >From the above, may I presume that it is best to cluster (or sort), the table based on the intended BRIN column(s) before actually creating the index to insure the pages are adjacent? If so, should that not be included in the documentation, instead of implied? -- *Melvin Davidson* I reserve the right to fantasize. Whether or not you wish to share my fantasy is entirely up to you.
Re: [GENERAL] BRIN indexes
2016-01-28 16:03 GMT-02:00 Joshua D. Drake: > On 01/28/2016 09:41 AM, Melvin Davidson wrote: > >> So, IOW, and the answer to my question is yes, it should be insured that >> all pages involved are physically adjacent (by design or by pre-sort) >> before creating a BRIN on them. >> Further to the point, it is self defeating to have more than one BRIN >> index on the table if the columns involved would have mutually >> non-adjacent pages. >> Therefore, it actually would be good to state that in the documentation, >> even it were just a comment. >> > > BRIN indexes are best used on INSERT only tables with a sequence of > numbers as a PK or indexed column that will be queried against. At least as > I understand it. > > JD > > -- > Command Prompt, Inc. http://the.postgres.company/ > +1-503-667-4564 > PostgreSQL Centered full stack support, consulting and development. > > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > "Further to the point, it is self defeating to have more than one BRIN index on the table if the columns involved would have mutually non-adjacent pages." Not really, if both columns are ordered, BRIN will work "Therefore, it actually would be good to state that in the documentation, even it were just a comment." It is = "BRIN is designed for handling very large tables in which certain columns have some natural correlation with their physical location within the table" Link: http://www.postgresql.org/docs/devel/static/brin-intro.html Also, I did some tests and here are the results I got: Query with no index = completion time 43s Same Query with BRIN = completion time 14s / index size 0,5 MB Same Query without BRIN and with BTREE = completion time 10s / index size 5.000,00 MB As you can see, BRIN can save 99% of disk space for just a slightly worse performance. It seems like a huge improvement, given that your data fits BRIN's use case.
Re: [GENERAL] BRIN indexes
So, IOW, and the answer to my question is yes, it should be insured that all pages involved are physically adjacent (by design or by pre-sort) before creating a BRIN on them. Further to the point, it is self defeating to have more than one BRIN index on the table if the columns involved would have mutually non-adjacent pages. Therefore, it actually would be good to state that in the documentation, even it were just a comment. On Thu, Jan 28, 2016 at 12:31 PM, David Rowleywrote: > On 29 January 2016 at 06:10, Melvin Davidson wrote: > > With regard to BRIN indexes: > > > > http://www.postgresql.org/docs/9.5/interactive/brin-intro.html > > > > 62.1. Introduction > > > > "A block range is a group of pages that are physically adjacent in the > table; for each block range, some summary info is stored by the index." > > > > From the above, may I presume that it is best to cluster (or sort), the > table based on the intended > > BRIN column(s) before actually creating the index to insure the pages > are adjacent? If so, should > > that not be included in the documentation, instead of implied? > > I personally think the second sentence of the link to the > documentation covers this quite well. Namely "BRIN is designed for > handling very large tables in which certain columns have some natural > correlation with their physical location within the table." > > Examples of this might be something like an "orders" table, where you > have an orderdate column, probably you'll insert into this table as > orders are received, so quite possibly the table will be naturally > ordered in ascending orderdate order. Although UPDATEs might create > new tuples in some free space elsewhere in the relation, but it's not > hard to imagine other cases where there's no updates and "natural > correlation" is persisted. > > -- > David Rowley http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services > -- *Melvin Davidson* I reserve the right to fantasize. Whether or not you wish to share my fantasy is entirely up to you.
Re: [GENERAL] BRIN indexes
On 29 January 2016 at 06:10, Melvin Davidsonwrote: > With regard to BRIN indexes: > > http://www.postgresql.org/docs/9.5/interactive/brin-intro.html > > 62.1. Introduction > > "A block range is a group of pages that are physically adjacent in the > table; for each block range, some summary info is stored by the index." > > From the above, may I presume that it is best to cluster (or sort), the table > based on the intended > BRIN column(s) before actually creating the index to insure the pages are > adjacent? If so, should > that not be included in the documentation, instead of implied? I personally think the second sentence of the link to the documentation covers this quite well. Namely "BRIN is designed for handling very large tables in which certain columns have some natural correlation with their physical location within the table." Examples of this might be something like an "orders" table, where you have an orderdate column, probably you'll insert into this table as orders are received, so quite possibly the table will be naturally ordered in ascending orderdate order. Although UPDATEs might create new tuples in some free space elsewhere in the relation, but it's not hard to imagine other cases where there's no updates and "natural correlation" is persisted. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] BRIN indexes
On 01/28/2016 09:41 AM, Melvin Davidson wrote: So, IOW, and the answer to my question is yes, it should be insured that all pages involved are physically adjacent (by design or by pre-sort) before creating a BRIN on them. Further to the point, it is self defeating to have more than one BRIN index on the table if the columns involved would have mutually non-adjacent pages. Therefore, it actually would be good to state that in the documentation, even it were just a comment. BRIN indexes are best used on INSERT only tables with a sequence of numbers as a PK or indexed column that will be queried against. At least as I understand it. JD -- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564 PostgreSQL Centered full stack support, consulting and development. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general