Re: [HACKERS] Improving executor performance - tidbitmap

2016-07-17 Thread Andres Freund
On 2016-07-17 08:32:17 -0400, Robert Haas wrote:
> On Wed, Jul 13, 2016 at 11:06 PM, Andres Freund  wrote:
> > The major issue with the simplehash implementation in the path is
> > probably the deletion; which should rather move cells around, rather
> > than use toombstones. But that was too complex for a POC ;). Also, it'd
> > likely need a proper iterator interface.
>
> Do we ever need to delete from a TIDBitmap?  Probably not, but I'm
> guessing you have other uses for this in mind.

We do, via BitmapAnd.


> > FWIW, the dynahash usage in nodeAgg.c is a major bottleneck in a number
> > of other queries.
>
> Can we use this implementation for that as well, or are we going to
> need yet another one?

I've not tested it, but I'd assume that something like the simplehash
should work there. It's a bit more complicated of a scenario though.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improving executor performance - tidbitmap

2016-07-17 Thread Robert Haas
On Wed, Jul 13, 2016 at 11:06 PM, Andres Freund  wrote:
> On 2016-06-24 16:29:53 -0700, Andres Freund wrote:
>> 4) Various missing micro optimizations have to be performed, for more
>>architectural issues to become visible. E.g. [2] causes such bad
>>slowdowns in hash-agg workloads, that other bottlenecks are hidden.
>
> One such issue is the usage of dynahash.c in tidbitmap.c. In many
> queries, e.g. tpch q7, the bitmapscan is often the bottleneck. Profiling
> shows that this is largely due to dynahash.c being slow. Primary issues
> are: a) two level structure doubling the amount of indirect lookups b)
> indirect function calls c) using separate chaining based conflict
> resolution d) being too general.
>
> I've quickly hacked up an alternative linear addressing hashtable
> implementation. And the improvements are quite remarkable.

Nice!

> I'm wondering whether we can do 'macro based templates' or
> something. I.e. have something like the simplehash in the patch in
> simplehash.h, but the key/value widths, the function names, are all
> determined by macros (oh, this would be easier with C++ templates...).
>
> Does anybody have a better idea?

No.

> The major issue with the simplehash implementation in the path is
> probably the deletion; which should rather move cells around, rather
> than use toombstones. But that was too complex for a POC ;). Also, it'd
> likely need a proper iterator interface.

Do we ever need to delete from a TIDBitmap?  Probably not, but I'm
guessing you have other uses for this in mind.

> FWIW, the dynahash usage in nodeAgg.c is a major bottleneck in a number
> of other queries.

Can we use this implementation for that as well, or are we going to
need yet another one?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improving executor performance - tidbitmap

2016-07-14 Thread Peter Geoghegan
On Thu, Jul 14, 2016 at 8:45 PM, Andres Freund  wrote:
> Brin indexes IIRC always end up using tidbitmap.c, so the benefits
> should be there as well ;)

Right. Might the improvement be even more pronounced, though?

I'm not sure how a BRIN index with a suitable physical/logical
correlation performs compared to a bitmap index scan of a B-Tree (due
to a less selective bitmap index scan qual, as in your example), but
offhand I guess that could be faster in general, making the bottleneck
you're addressing relatively greater there.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improving executor performance - tidbitmap

2016-07-14 Thread Andres Freund
On 2016-07-14 20:41:21 -0700, Peter Geoghegan wrote:
> On Wed, Jul 13, 2016 at 8:06 PM, Andres Freund  wrote:
> > I've quickly hacked up an alternative linear addressing hashtable
> > implementation. And the improvements are quite remarkable.
> >
> > Example Query:
> > EXPLAIN ANALYZE SELECT SUM(l_extendedprice) FROM lineitem WHERE (l_shipdate 
> > >= '1995-01-01'::date) AND (l_shipdate <= '1996-12-31'::date);
> 
> > timing without analyze: 4136.425 4101.873 4177.441
> >
> > after:
> 
> > timing without analyze: 2647.364 2674.456 2680.197
> >
> > as you can see the the time for the bitmap index scan goes from 2461.622
> > to 952.161.
> 
> That's pretty great. I wonder what this would look like with a BRIN
> index, since l_shipdate looks like a good candidate for BRIN indexing.

Brin indexes IIRC always end up using tidbitmap.c, so the benefits
should be there as well ;)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improving executor performance - tidbitmap

2016-07-14 Thread Peter Geoghegan
On Wed, Jul 13, 2016 at 8:06 PM, Andres Freund  wrote:
> I've quickly hacked up an alternative linear addressing hashtable
> implementation. And the improvements are quite remarkable.
>
> Example Query:
> EXPLAIN ANALYZE SELECT SUM(l_extendedprice) FROM lineitem WHERE (l_shipdate 
> >= '1995-01-01'::date) AND (l_shipdate <= '1996-12-31'::date);

> timing without analyze: 4136.425 4101.873 4177.441
>
> after:

> timing without analyze: 2647.364 2674.456 2680.197
>
> as you can see the the time for the bitmap index scan goes from 2461.622
> to 952.161.

That's pretty great. I wonder what this would look like with a BRIN
index, since l_shipdate looks like a good candidate for BRIN indexing.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improving executor performance - tidbitmap

2016-07-13 Thread Andres Freund
Hi,

On 2016-06-24 16:29:53 -0700, Andres Freund wrote:
> 4) Various missing micro optimizations have to be performed, for more
>architectural issues to become visible. E.g. [2] causes such bad
>slowdowns in hash-agg workloads, that other bottlenecks are hidden.

One such issue is the usage of dynahash.c in tidbitmap.c. In many
queries, e.g. tpch q7, the bitmapscan is often the bottleneck. Profiling
shows that this is largely due to dynahash.c being slow. Primary issues
are: a) two level structure doubling the amount of indirect lookups b)
indirect function calls c) using separate chaining based conflict
resolution d) being too general.

I've quickly hacked up an alternative linear addressing hashtable
implementation. And the improvements are quite remarkable.

Example Query:
EXPLAIN ANALYZE SELECT SUM(l_extendedprice) FROM lineitem WHERE (l_shipdate >= 
'1995-01-01'::date) AND (l_shipdate <= '1996-12-31'::date);

before:
┌──┐
│QUERY PLAN 
   │
├──┤
│ Aggregate  (cost=942046.72..942046.73 rows=1 width=8) (actual 
time=4647.908..4647.909 rows=1 loops=1) 
   │
│   ->  Bitmap Heap Scan on lineitem  (cost=193514.84..919246.17 rows=9120222 
width=8) (actual time=2667.821..3885.598 rows=9113219 loops=1)   │
│ Recheck Cond: ((l_shipdate >= '1995-01-01'::date) AND (l_shipdate <= 
'1996-12-31'::date))│
│ Heap Blocks: exact=585542 
   │
│ ->  Bitmap Index Scan on i_l_shipdate  (cost=0.00..191234.78 
rows=9120222 width=0) (actual time=2461.622..2461.622 rows=9113219 loops=1) │
│   Index Cond: ((l_shipdate >= '1995-01-01'::date) AND (l_shipdate 
<= '1996-12-31'::date))│
│ Planning time: 0.170 ms   
   │
│ Execution time: 4648.921 ms   
   │
└──┘

timing without analyze: 4136.425 4101.873 4177.441

after:
┌┐
│   QUERY PLAN  
 │
├┤
│ Aggregate  (cost=942046.72..942046.73 rows=1 width=8) (actual 
time=3218.422..3218.423 rows=1 loops=1) 
 │
│   ->  Bitmap Heap Scan on lineitem  (cost=193514.84..919246.17 rows=9120222 
width=8) (actual time=1153.707..2430.500 rows=9113219 loops=1) │
│ Recheck Cond: ((l_shipdate >= '1995-01-01'::date) AND (l_shipdate <= 
'1996-12-31'::date))  │
│ Heap Blocks: exact=585542 
 │
│ ->  Bitmap Index Scan on i_l_shipdate  (cost=0.00..191234.78 
rows=9120222 width=0) (actual time=952.161..952.161 rows=9113219 loops=1) │
│   Index Cond: ((l_shipdate >= '1995-01-01'::date) AND (l_shipdate 
<= '1996-12-31'::date))  │
│ Planning time: 1.075 ms   
 │
│ Execution time: 3221.861 ms   
 │
└┘
(8 rows)

timing without analyze: 2647.364 2674.456 2680.197

as you can see the the time for the bitmap index scan goes from 2461.622
to 952.161.

I'm not proposing to apply the patch as-is, but I think it's a good
starting point to discuss how to evolve our use of hash tables.

I'm wondering whether we can do 'macro based templates' or
something. I.e. have something like the simplehash in the patch in
simplehash.h,