Re: [GENERAL] BRIN indexes and ORDER BY

2016-10-06 Thread Merlin Moncure
On Wed, Oct 5, 2016 at 3:27 PM, Stephen Frost  wrote:
> Darren,
>
> * Darren Lafreniere (dlafreni...@onezero.com) wrote:
>> Tom Lane  wrote:
>> > > Gavin Wahl wrote:
>> > >> It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
>> > >> just find the page range with the largest/smallest value, and then only
>> > >> scan that one. Would that be hard to implement? I'm interested in
>> > working
>> > >> on it if someone can give me some pointers.
>> >
>> > I think this proposal is fairly broken anyway.  The page range with the
>> > largest max-value may once have contained the largest live row, but
>> > there's no guarantee that it still does.  It might even be completely
>> > empty.  You could imagine an algorithm like this:
>> >
>> > 1. Find page-range with largest max.  Scan it to identify live row with
>> > largest value.  If *no* live values, find page-range with next largest
>> > max, repeat until no page ranges remain (whereupon return NULL).
>> >
>> > 2. For each remaining page-range whose indexed max exceeds the value
>> > currently in hand, scan that page-range to see if any value exceeds
>> > the one in hand, replacing the value if so.
>> >
>> > This'd probably allow you to omit scanning some of the page-ranges
>> > in the table, but in a lot of cases you'd end up scanning many of them;
>> > and you'd need a lot of working state to remember which ranges you'd
>> > already looked at.  It'd certainly always be a lot more expensive than
>> > answering the same question with a btree index, because in no case do
>> > you get to avoid scanning the entire contents of the index.
> [...]
>> A b-tree index would certainly be faster for ordering. But in scenarios
>> where you have huge datasets that can't afford the space or update time
>> required for b-tree, could such a BRIN-accelerated ordering algorithm at
>> least be faster than ordering with no index?
>
> For at least some of the common BRIN use-cases, where the rows are
> inserted in-order and never/very-rarely modified or deleted, this
> approach would work very well.
>
> Certainly, using this would be much cheaper than a seqscan/top-N sort,
> for small values of 'N', relative to the number of rows in the table,
> in those cases.
>
> In general, I like the idea of supporting this as BRIN indexes strike me
> as very good for very large tables which have highly clumped data in
> them and being able to do a top-N query on those can be very useful at
> times.

Yeah.  If the brin average page overlap and % dead tuple coefficients
are low it absolutely makes sense to drive top N with brin.  It will
never beat a btree but typically brin is used when the btree index is
no good for various reasons.

brin indexes are pretty neat; they can provide stupefying amounts of
optimization in many common warehousing workloads.   They even beat
out index only scans for a tiny fraction of the storage.  Of course,
you have to work around the limitations... :-)

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] BRIN indexes and ORDER BY

2016-10-05 Thread Darren Lafreniere
Stephen Frost  wrote:

> For at least some of the common BRIN use-cases, where the rows are
> inserted in-order and never/very-rarely modified or deleted, this
> approach would work very well.
>

Thanks Stephen, this is exactly our use case.


Re: [GENERAL] BRIN indexes and ORDER BY

2016-10-05 Thread Stephen Frost
Darren,

* Darren Lafreniere (dlafreni...@onezero.com) wrote:
> Tom Lane  wrote:
> > > Gavin Wahl wrote:
> > >> It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
> > >> just find the page range with the largest/smallest value, and then only
> > >> scan that one. Would that be hard to implement? I'm interested in
> > working
> > >> on it if someone can give me some pointers.
> >
> > I think this proposal is fairly broken anyway.  The page range with the
> > largest max-value may once have contained the largest live row, but
> > there's no guarantee that it still does.  It might even be completely
> > empty.  You could imagine an algorithm like this:
> >
> > 1. Find page-range with largest max.  Scan it to identify live row with
> > largest value.  If *no* live values, find page-range with next largest
> > max, repeat until no page ranges remain (whereupon return NULL).
> >
> > 2. For each remaining page-range whose indexed max exceeds the value
> > currently in hand, scan that page-range to see if any value exceeds
> > the one in hand, replacing the value if so.
> >
> > This'd probably allow you to omit scanning some of the page-ranges
> > in the table, but in a lot of cases you'd end up scanning many of them;
> > and you'd need a lot of working state to remember which ranges you'd
> > already looked at.  It'd certainly always be a lot more expensive than
> > answering the same question with a btree index, because in no case do
> > you get to avoid scanning the entire contents of the index.
[...]
> A b-tree index would certainly be faster for ordering. But in scenarios
> where you have huge datasets that can't afford the space or update time
> required for b-tree, could such a BRIN-accelerated ordering algorithm at
> least be faster than ordering with no index?

For at least some of the common BRIN use-cases, where the rows are
inserted in-order and never/very-rarely modified or deleted, this
approach would work very well.

Certainly, using this would be much cheaper than a seqscan/top-N sort,
for small values of 'N', relative to the number of rows in the table,
in those cases.

In general, I like the idea of supporting this as BRIN indexes strike me
as very good for very large tables which have highly clumped data in
them and being able to do a top-N query on those can be very useful at
times.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] BRIN indexes and ORDER BY

2016-10-05 Thread Darren Lafreniere
Tom Lane  wrote:

> > Gavin Wahl wrote:
> >> It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
> >> just find the page range with the largest/smallest value, and then only
> >> scan that one. Would that be hard to implement? I'm interested in
> working
> >> on it if someone can give me some pointers.
>
> I think this proposal is fairly broken anyway.  The page range with the
> largest max-value may once have contained the largest live row, but
> there's no guarantee that it still does.  It might even be completely
> empty.  You could imagine an algorithm like this:
>
> 1. Find page-range with largest max.  Scan it to identify live row with
> largest value.  If *no* live values, find page-range with next largest
> max, repeat until no page ranges remain (whereupon return NULL).
>
> 2. For each remaining page-range whose indexed max exceeds the value
> currently in hand, scan that page-range to see if any value exceeds
> the one in hand, replacing the value if so.
>
> This'd probably allow you to omit scanning some of the page-ranges
> in the table, but in a lot of cases you'd end up scanning many of them;
> and you'd need a lot of working state to remember which ranges you'd
> already looked at.  It'd certainly always be a lot more expensive than
> answering the same question with a btree index, because in no case do
> you get to avoid scanning the entire contents of the index.
>
> regards, tom lane
>


Thanks Tom,

A b-tree index would certainly be faster for ordering. But in scenarios
where you have huge datasets that can't afford the space or update time
required for b-tree, could such a BRIN-accelerated ordering algorithm at
least be faster than ordering with no index?

Darren Lafreniere


Re: [GENERAL] BRIN indexes and ORDER BY

2016-10-05 Thread Tom Lane
Alvaro Herrera  writes:
> Darren Lafreniere wrote:
>> We found a pgsql-hackers thread from about a year ago about optimizing
>> ORDER BY for BRIN indexes. Tom Lane suggested that he was working on it:
>> https://www.postgresql.org/message-id/11881.1443393360%40sss.pgh.pa.us

> Tom said he was working on some infrastructure planner changes
> ("upper-planner path-ification"), not that he was working on improving
> usage of BRIN indexes.  As far as I know, nobody has worked on that.

Alvaro's reading is correct; I wasn't planning to work on any such thing,
and still am not.

Looking again at the original thread:

> Gavin Wahl wrote:
>> It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
>> just find the page range with the largest/smallest value, and then only
>> scan that one. Would that be hard to implement? I'm interested in working
>> on it if someone can give me some pointers.

I think this proposal is fairly broken anyway.  The page range with the
largest max-value may once have contained the largest live row, but
there's no guarantee that it still does.  It might even be completely
empty.  You could imagine an algorithm like this:

1. Find page-range with largest max.  Scan it to identify live row with
largest value.  If *no* live values, find page-range with next largest
max, repeat until no page ranges remain (whereupon return NULL).

2. For each remaining page-range whose indexed max exceeds the value
currently in hand, scan that page-range to see if any value exceeds
the one in hand, replacing the value if so.

This'd probably allow you to omit scanning some of the page-ranges
in the table, but in a lot of cases you'd end up scanning many of them;
and you'd need a lot of working state to remember which ranges you'd
already looked at.  It'd certainly always be a lot more expensive than
answering the same question with a btree index, because in no case do
you get to avoid scanning the entire contents of the index.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] BRIN indexes and ORDER BY

2016-10-05 Thread Darren Lafreniere
Ahh, yes. I misread that. Thank you for the clarification.

On Wed, Oct 5, 2016 at 2:27 PM, Alvaro Herrera 
wrote:

> Darren Lafreniere wrote:
>
> > "In addition to simply finding the rows to be returned by a query, an
> index
> > may be able to deliver them in a specific sorted order. This allows a
> > query's ORDER BY specification to be honored without a separate sorting
> > step. Of the index types currently supported by PostgreSQL, only B-tree
> can
> > produce sorted output — the other index types return matching rows in an
> > unspecified, implementation-dependent order."
> >
> > We found a pgsql-hackers thread from about a year ago about optimizing
> > ORDER BY for BRIN indexes. Tom Lane suggested that he was working on it:
> > https://www.postgresql.org/message-id/11881.1443393360%40sss.pgh.pa.us
>
> Tom said he was working on some infrastructure planner changes
> ("upper-planner path-ification"), not that he was working on improving
> usage of BRIN indexes.  As far as I know, nobody has worked on that.
>
> --
> Álvaro Herrerahttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [GENERAL] BRIN indexes and ORDER BY

2016-10-05 Thread Alvaro Herrera
Darren Lafreniere wrote:

> "In addition to simply finding the rows to be returned by a query, an index
> may be able to deliver them in a specific sorted order. This allows a
> query's ORDER BY specification to be honored without a separate sorting
> step. Of the index types currently supported by PostgreSQL, only B-tree can
> produce sorted output — the other index types return matching rows in an
> unspecified, implementation-dependent order."
> 
> We found a pgsql-hackers thread from about a year ago about optimizing
> ORDER BY for BRIN indexes. Tom Lane suggested that he was working on it:
> https://www.postgresql.org/message-id/11881.1443393360%40sss.pgh.pa.us

Tom said he was working on some infrastructure planner changes
("upper-planner path-ification"), not that he was working on improving
usage of BRIN indexes.  As far as I know, nobody has worked on that.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] BRIN indexes and ORDER BY

2016-10-05 Thread Darren Lafreniere
Hello,

We're curious about the current behavior in 9.5.4, and possible future
enhancements, of BRIN indexes with respect to ordering.

In the docs, section 11.4. "Indexes and ORDER BY" (
https://www.postgresql.org/docs/9.5/static/indexes-ordering.html) is clear
that anything other than B-tree indexes have unspecified ordering:

"In addition to simply finding the rows to be returned by a query, an index
may be able to deliver them in a specific sorted order. This allows a
query's ORDER BY specification to be honored without a separate sorting
step. Of the index types currently supported by PostgreSQL, only B-tree can
produce sorted output — the other index types return matching rows in an
unspecified, implementation-dependent order."

We found a pgsql-hackers thread from about a year ago about optimizing
ORDER BY for BRIN indexes. Tom Lane suggested that he was working on it:
https://www.postgresql.org/message-id/11881.1443393360%40sss.pgh.pa.us


Our current test shows that ordering by a BRIN indexed column still
performs an unoptimized sort:

SELECT generate_series(1, 1000) AS id INTO test;
CREATE INDEX idx_test_id ON test USING BRIN (id);
EXPLAIN SELECT id FROM test ORDER BY id DESC LIMIT 20;

Limit  (cost=410344.40..410344.45 rows=20 width=4)
  ->  Sort  (cost=410344.40..435344.40 rows=100 width=4)"
Sort Key: id DESC
->  Seq Scan on test  (cost=0.00..144248.00 rows=1000 width=4)

Is there anything we're missing to speed this up? Or is it still a future
feature?

Thank you,
Darren Lafreniere


Re: [GENERAL] BRIN indexes

2016-01-30 Thread Alvaro Herrera
Felipe Santos wrote:

> brin_db=# EXPLAIN ANALYSE SELECT count(*) FROM orders WHERE order_date
> BETWEEN '2012-01-04 09:00:00' and '2014-01-04 14:30:00';
> 
>   QUERY PLAN
> 
> --
> -
>  Aggregate  (cost=2408269.34..2408269.35 rows=1 width=0) (actual
> time=14164.923..14164.923 rows=1 loops=1)
>->  Bitmap Heap Scan on orders  (cost=326808.28..2328609.76
> rows=31863832 width=0) (actual time=42.742..11430.562 rows=31589101 loops=1)
>  Recheck Cond: ((order_date >= '2012-01-04 09:00:00-02'::timestamp
> with time zone) AND (order_date <= '2014-01-04 14:30:00-02'::timestamp with
>  time zone))
>  Rows Removed by Index Recheck: 21907
>  Heap Blocks: lossy=201344
>  ->  Bitmap Index Scan on idx_order_date_brin
>  (cost=0.00..318842.32 rows=31863832 width=0) (actual time=36.151..36.151
> rows=2013440 loops=1)
>Index Cond: ((order_date >= '2012-01-04
> 09:00:00-02'::timestamp with time zone) AND (order_date <= '2014-01-04
> 14:30:00-02'::timestamp
> with time zone))
>  Planning time: 0.297 ms
>  Execution time: 14164.985 ms
> (9 rows)

The number of blocks read from the heap is a key number to watch for,
because when you get concurrency that's what going to matter the most.
Here you have 201k buffer reads, versus, uh, I don't know how many pages
read (because you didn't use the VERBOSE explain option).  I think it's
worth comparing how many buffer accesses your query had to read.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] BRIN indexes

2016-01-30 Thread Igor Neyman

From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Felipe Santos
Sent: Thursday, January 28, 2016 1:17 PM
To: Joshua D. Drake <j...@commandprompt.com>
Cc: Melvin Davidson <melvin6...@gmail.com>; David Rowley 
<david.row...@2ndquadrant.com>; pgsql-general@postgresql.org; Thomas Kellerer 
<spam_ea...@gmx.net>
Subject: Re: [GENERAL] BRIN indexes

"Further to the point, it is self defeating to have more than one BRIN index on 
the table if the columns involved would have mutually  non-adjacent pages."

   Not really, if both columns are ordered, BRIN will work

"Therefore, it actually would be good to state that in the documentation, even 
it were just a comment."

   It is = "BRIN is designed for handling very large tables in which certain 
columns have some natural correlation with their physical location within the 
table"
   Link: http://www.postgresql.org/docs/devel/static/brin-intro.html


Also, I did some tests and here are the results I got:

Query with no index = completion time 43s
Same Query with BRIN = completion time 14s / index size 0,5 MB
Same Query without BRIN and with BTREE = completion time 10s / index size 
5.000,00 MB

As you can see, BRIN can save 99% of disk space for just a slightly worse 
performance.

It seems like a huge improvement, given that your data fits BRIN's use case.

Felipe,

What kind of queries you used in your test?
Where they based on clustering columns?

Regards
Igor Neyman


Re: [GENERAL] BRIN indexes

2016-01-29 Thread Alvaro Herrera
Emre Hasegeli wrote:
> >> From the above, may I presume that it is best to cluster (or sort), the
> >> table based on the intended
> >> BRIN column(s) before actually creating the index to insure the pages are
> >> adjacent? If so, should
> >>  that not be included in the documentation, instead of implied?
> 
> The same question is asked to me at PGConf.DE.  I think it would be
> nice to address it in the documentation somehow.  Maybe, we should
> also explain how the table is physically organised.  It is not clear
> to users what kind of operations would make BRIN more useful.

Grumble.

> > I don't have faith in CLUSTER anyway.  Taking exclusive locks and all.
> 
> It also requires a btree index.  If you can afford to have btree, you
> probably don't need BRIN anyway.  Something lighter than CLUSTER which
> can use BRIN would be useful.

What I think would be useful is a way for the BRIN index to guide
location of a new tuple, so that it's put in the right spot right from
the start, instead of having it be moved later.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] BRIN indexes

2016-01-29 Thread Felipe Santos
2016-01-28 16:33 GMT-02:00 Igor Neyman <iney...@perceptron.com>:

>
>
> *From:* pgsql-general-ow...@postgresql.org [mailto:
> pgsql-general-ow...@postgresql.org] *On Behalf Of *Felipe Santos
> *Sent:* Thursday, January 28, 2016 1:17 PM
> *To:* Joshua D. Drake <j...@commandprompt.com>
> *Cc:* Melvin Davidson <melvin6...@gmail.com>; David Rowley <
> david.row...@2ndquadrant.com>; pgsql-general@postgresql.org; Thomas
> Kellerer <spam_ea...@gmx.net>
> *Subject:* Re: [GENERAL] BRIN indexes
>
>
>
> "Further to the point, it is self defeating to have more than one BRIN
> index on the table if the columns involved would have mutually
> non-adjacent pages."
>
>
>
>Not really, if both columns are ordered, BRIN will work
>
>
>
> "Therefore, it actually would be good to state that in the documentation,
> even it were just a comment."
>
>
>
>It is = "BRIN is designed for handling very large tables in which
> certain columns have some natural correlation with their physical location
> within the table"
>
>Link: http://www.postgresql.org/docs/devel/static/brin-intro.html
>
>
>
>
>
> Also, I did some tests and here are the results I got:
>
>
>
> Query with no index = completion time 43s
>
> Same Query with BRIN = completion time 14s / index size 0,5 MB
>
> Same Query without BRIN and with BTREE = completion time 10s / index size
> 5.000,00 MB
>
>
>
> As you can see, BRIN can save 99% of disk space for just a slightly worse
> performance.
>
>
>
> It seems like a huge improvement, given that your data fits BRIN's use
> case.
>
>
>
> Felipe,
>
>
>
> What kind of queries you used in your test?
>
> Where they based on clustering columns?
>
>
>
> Regards
>
> Igor Neyman
>


Hello Igor,

I took the sample BRIN test from the new release's wiki and added the BTREE
test:
https://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.5#BRIN_Indexes

The results today may vary from the reported above but are still in the
same levels of performance gain:

brin_db=# CREATE TABLE orders (
brin_db(#  id int,
brin_db(#  order_date timestamptz,
brin_db(#  item text);
CREATE TABLE

brin_db=# INSERT INTO orders (order_date, item)
brin_db-#  SELECT x, 'dfiojdso'
brin_db-#  FROM generate_series('2000-01-01 00:00:00'::timestamptz,
'2015-03-01 00:00:00'::timestamptz,'2 seconds'::interval) a(x);
INSERT 0 239243401

brin_db=# \dt+ orders
List of relations
 Schema |  Name  | Type  |  Owner   | Size  | Description
++---+--+---+-
 public | orders | table | postgres | 12 GB |
(1 row)

brin_db=# EXPLAIN ANALYSE SELECT count(*) FROM orders WHERE order_date
BETWEEN '2012-01-04 09:00:00' and '2014-01-04 14:30:00';

 QUERY PLAN


--
---
 Aggregate  (cost=4108912.01..4108912.02 rows=1 width=0) (actual
time=81116.722..81116.722 rows=1 loops=1)
   ->  Seq Scan on orders  (cost=0.00..4106759.58 rows=860972 width=0)
(actual time=60173.531..78566.113 rows=31589101 loops=1)
 Filter: ((order_date >= '2012-01-04 09:00:00-02'::timestamp with
time zone) AND (order_date <= '2014-01-04 14:30:00-02'::timestamp with time
zone))
 Rows Removed by Filter: 207654300
 Planning time: 0.443 ms
 Execution time: 81118.168 ms
(6 rows)

brin_db=#  CREATE INDEX idx_order_date_brin
brin_db-#ON orders
brin_db-#USING BRIN (order_date);
CREATE INDEX

brin_db=# \di+ idx_order_date_brin
List of relations
 Schema |Name | Type  |  Owner   | Table  |  Size  |
Description
+-+---+--+++-
 public | idx_order_date_brin | index | postgres | orders | 432 kB |
(1 row)

brin_db=# EXPLAIN ANALYSE SELECT count(*) FROM orders WHERE order_date
BETWEEN '2012-01-04 09:00:00' and '2014-01-04 14:30:00';

  QUERY PLAN

--
-
 Aggregate  (cost=2408269.34..2408269.35 rows=1 width=0) (actual
time=14164.923..14164.923 rows=1 loops=1)
   ->  Bitmap Heap Scan on orders  (cost=326808.28..2328609.76
rows=31863832 width=0) (actual time=42.742..11430.562 rows=31589101 loops=1)
 Recheck Cond: ((order_date >= '2012-01-04 09:00:00-02'::timestamp
with time zone) AND (order_date <= '2014-01-04 14:30:00-02'::timestamp with
 time zone))
 Rows Removed by Index Recheck: 21907
 Heap Blocks: lossy=201344
 ->  Bitmap Index Scan on idx_order_date_brin
 (cost=0.00..3

Re: [GENERAL] BRIN indexes

2016-01-29 Thread Emre Hasegeli
>> From the above, may I presume that it is best to cluster (or sort), the
>> table based on the intended
>> BRIN column(s) before actually creating the index to insure the pages are
>> adjacent? If so, should
>>  that not be included in the documentation, instead of implied?

The same question is asked to me at PGConf.DE.  I think it would be
nice to address it in the documentation somehow.  Maybe, we should
also explain how the table is physically organised.  It is not clear
to users what kind of operations would make BRIN more useful.

> I don't have faith in CLUSTER anyway.  Taking exclusive locks and all.

It also requires a btree index.  If you can afford to have btree, you
probably don't need BRIN anyway.  Something lighter than CLUSTER which
can use BRIN would be useful.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] BRIN indexes

2016-01-28 Thread Alvaro Herrera
Melvin Davidson wrote:

> With regard to BRIN indexes:
> 
> http://www.postgresql.org/docs/9.5/interactive/brin-intro.html
> 
> 62.1. Introduction
> 
>   "A block range is a group of pages that are physically adjacent in the
> table; for each block range, some summary info is stored by the index."
> 
> From the above, may I presume that it is best to cluster (or sort), the
> table based on the intended
> BRIN column(s) before actually creating the index to insure the pages are
> adjacent? If so, should
>  that not be included in the documentation, instead of implied?

The issue is that you cannot normally afford to cluster a table every
once in a while; if the natural order in which data is loaded isn't good
for BRIN, then perhaps you shouldn't consider BRIN at all.  If you're
bulk-loading and then create a BRIN index, then it's better to load the
data in order of the columns.  But perhaps you have reasons to have the
table sorted in some other order, in which case trying to satisfy BRIN
would be worse.

All in all, I think there are enough caveats about this that I'm not
sure about putting it up in the doc.

I don't have faith in CLUSTER anyway.  Taking exclusive locks and all.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] BRIN indexes

2016-01-28 Thread Melvin Davidson
Reposting because I received no reply from a previous inquiry as "9.5 new
features".

With regard to BRIN indexes:

http://www.postgresql.org/docs/9.5/interactive/brin-intro.html

62.1. Introduction

  "A block range is a group of pages that are physically adjacent in the
table; for each block range, some summary info is stored by the index."

>From the above, may I presume that it is best to cluster (or sort), the
table based on the intended
BRIN column(s) before actually creating the index to insure the pages are
adjacent? If so, should
 that not be included in the documentation, instead of implied?

-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: [GENERAL] BRIN indexes

2016-01-28 Thread Felipe Santos
2016-01-28 16:03 GMT-02:00 Joshua D. Drake :

> On 01/28/2016 09:41 AM, Melvin Davidson wrote:
>
>> So, IOW, and the answer to my question is yes, it should be insured that
>> all pages involved are physically adjacent (by design or by pre-sort)
>> before creating a BRIN on them.
>> Further to the point, it is self defeating to have more than one BRIN
>> index on the table if the columns involved would have mutually
>> non-adjacent pages.
>> Therefore, it actually would be good to state that in the documentation,
>> even it were just a comment.
>>
>
> BRIN indexes are best used on INSERT only tables with a sequence of
> numbers as a PK or indexed column that will be queried against. At least as
> I understand it.
>
> JD
>
> --
> Command Prompt, Inc.  http://the.postgres.company/
>  +1-503-667-4564
> PostgreSQL Centered full stack support, consulting and development.
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


"Further to the point, it is self defeating to have more than one BRIN
index on the table if the columns involved would have mutually
non-adjacent pages."

   Not really, if both columns are ordered, BRIN will work

"Therefore, it actually would be good to state that in the documentation,
even it were just a comment."

   It is = "BRIN is designed for handling very large tables in which
certain columns have some natural correlation with their physical location
within the table"
   Link: http://www.postgresql.org/docs/devel/static/brin-intro.html


Also, I did some tests and here are the results I got:

Query with no index = completion time 43s
Same Query with BRIN = completion time 14s / index size 0,5 MB
Same Query without BRIN and with BTREE = completion time 10s / index size
5.000,00 MB

As you can see, BRIN can save 99% of disk space for just a slightly worse
performance.

It seems like a huge improvement, given that your data fits BRIN's use case.


Re: [GENERAL] BRIN indexes

2016-01-28 Thread Melvin Davidson
So, IOW, and the answer to my question is yes, it should be insured that
all pages involved are physically adjacent (by design or by pre-sort)
before creating a BRIN on them.
Further to the point, it is self defeating to have more than one BRIN index
on the table if the columns involved would have mutually  non-adjacent
pages.
Therefore, it actually would be good to state that in the documentation,
even it were just a comment.

On Thu, Jan 28, 2016 at 12:31 PM, David Rowley  wrote:

> On 29 January 2016 at 06:10, Melvin Davidson  wrote:
> > With regard to BRIN indexes:
> >
> > http://www.postgresql.org/docs/9.5/interactive/brin-intro.html
> >
> > 62.1. Introduction
> > 
> >   "A block range is a group of pages that are physically adjacent in the
> table; for each block range, some summary info is stored by the index."
> >
> > From the above, may I presume that it is best to cluster (or sort), the
> table based on the intended
> > BRIN column(s) before actually creating the index to insure the pages
> are adjacent? If so, should
> >  that not be included in the documentation, instead of implied?
>
> I personally think the second sentence of the link to the
> documentation covers this quite well. Namely "BRIN is designed for
> handling very large tables in which certain columns have some natural
> correlation with their physical location within the table."
>
> Examples of this might be something like an "orders" table, where you
> have an orderdate column, probably you'll insert into this table as
> orders are received, so quite possibly the table will be naturally
> ordered in ascending orderdate order. Although UPDATEs might create
> new tuples in some free space elsewhere in the relation, but it's not
> hard to imagine other cases where there's no updates and "natural
> correlation" is persisted.
>
> --
> David Rowley   http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>



-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: [GENERAL] BRIN indexes

2016-01-28 Thread David Rowley
On 29 January 2016 at 06:10, Melvin Davidson  wrote:
> With regard to BRIN indexes:
>
> http://www.postgresql.org/docs/9.5/interactive/brin-intro.html
>
> 62.1. Introduction
> 
>   "A block range is a group of pages that are physically adjacent in the 
> table; for each block range, some summary info is stored by the index."
>
> From the above, may I presume that it is best to cluster (or sort), the table 
> based on the intended
> BRIN column(s) before actually creating the index to insure the pages are 
> adjacent? If so, should
>  that not be included in the documentation, instead of implied?

I personally think the second sentence of the link to the
documentation covers this quite well. Namely "BRIN is designed for
handling very large tables in which certain columns have some natural
correlation with their physical location within the table."

Examples of this might be something like an "orders" table, where you
have an orderdate column, probably you'll insert into this table as
orders are received, so quite possibly the table will be naturally
ordered in ascending orderdate order. Although UPDATEs might create
new tuples in some free space elsewhere in the relation, but it's not
hard to imagine other cases where there's no updates and "natural
correlation" is persisted.

-- 
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] BRIN indexes

2016-01-28 Thread Joshua D. Drake

On 01/28/2016 09:41 AM, Melvin Davidson wrote:

So, IOW, and the answer to my question is yes, it should be insured that
all pages involved are physically adjacent (by design or by pre-sort)
before creating a BRIN on them.
Further to the point, it is self defeating to have more than one BRIN
index on the table if the columns involved would have mutually
non-adjacent pages.
Therefore, it actually would be good to state that in the documentation,
even it were just a comment.


BRIN indexes are best used on INSERT only tables with a sequence of 
numbers as a PK or indexed column that will be queried against. At least 
as I understand it.


JD

--
Command Prompt, Inc.  http://the.postgres.company/
 +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general