On Wed, Feb 26, 2020 at 7:59 PM Konstantin Knizhnik <k.knizh...@postgrespro.ru> wrote:
> On 26.02.2020 13:11, Hubert Zhang wrote:
>
>> and with JIT:
>>
>>  13.88%  postgres  postgres             [.] tts_buffer_heap_getsomeattrs
>>   7.15%  postgres  vectorize_engine.so  [.] vfloat8_accum
>>   6.03%  postgres  postgres             [.] HeapTupleSatisfiesVisibility
>>   5.55%  postgres  postgres             [.] bpchareq
>>   4.42%  postgres  vectorize_engine.so  [.] VExecStoreColumns
>>   4.19%  postgres  postgres             [.] hashbpchar
>>   4.09%  postgres  vectorize_engine.so  [.] vfloat8pl
>>
>> I also tested Q1 with your latest code. The result of the vectorized version is still slow.
>> PG13 native:  38 secs
>> PG13 Vec:     30 secs
>> PG13 JIT:     23 secs
>> PG13 JIT+Vec: 27 secs
>
> It is strange that your results are much slower than mine and the profile is
> very different. Which postgres configuration are you using?

./configure CFLAGS="-O3 -g -march=native" --prefix=/usr/local/pgsql/ --disable-cassert --enable-debug --with-llvm

I also use `PGXS := $(shell $(PG_CONFIG) --pgxs)` to compile vectorize_engine, so it shares the same compile configuration. My perf result is as below. There are three parts:

>> 1. lookup_hash_entry (43.5%): this part is not vectorized yet.
>
> It is vectorized in some sense: lookup_hash_entry performs the bulk of the
> hash lookups and passes an array with the results of those lookups to the
> aggregate transition functions. It would be possible to significantly
> increase the speed of HashAgg if we stored data in the order of the grouping
> attributes and used RLE (run-length encoding) to perform just one hash
> lookup per group of values. But that requires creating special partitions
> (as is done in Vertica and VOPS).

Yes, Vertica's partitions need to be pre-sorted on user-defined columns, so for TPC-H Q1 on Postgres we cannot rely on that assumption. Also, my Q1 plan uses HashAgg instead of GroupAgg based on cost.

>> 2. scan part: fetch_input_tuple (36%)
>> 3. vadvance_aggregates part (20%)
>>
>> I also perfed the PG96 vectorized version and got similar perf results, and
>> the running times of vectorized PG96 and PG13 are also similar.
>> But PG13 is much faster than PG96. So I just wonder whether we merged all
>> the latest executor code of PG13 into the vectorized PG13 branch?
>
> Sorry, I do not understand the question. vectorize_executor contains patched
> versions of nodeSeqscan and nodeAgg from the standard executor. When porting
> to PG13, I took the latest version of nodeAgg and tried to apply your patches
> to it. It was not always possible, and I had to rewrite a lot of places.
> Concerning nodeSeqscan, I took the old version from vectorize_executor and
> ported it to PG13.
>
> It is strange that I am not seeing lookup_hash_entry in the profile in my
> case.

So you already have the PG13 nodeAgg; that is good. Yes, it is strange. Hash table probing is always the costly part. My perf command is `perf record --call-graph dwarf -p pid`.

Could you share your lineitem schema and Q1 query? My schema and Q1 query are:

CREATE TABLE lineitem (
    l_orderkey      BIGINT NOT NULL,
    l_partkey       INTEGER NOT NULL,
    l_suppkey       INTEGER NOT NULL,
    l_linenumber    INTEGER NOT NULL,
    l_quantity      double precision NOT NULL,
    l_extendedprice double precision NOT NULL,
    l_discount      double precision NOT NULL,
    l_tax           double precision NOT NULL,
    l_returnflag    CHAR(1) NOT NULL,
    l_linestatus    CHAR(1) NOT NULL,
    l_shipdate      DATE NOT NULL,
    l_commitdate    DATE NOT NULL,
    l_receiptdate   DATE NOT NULL,
    l_shipinstruct  CHAR(25) NOT NULL,
    l_shipmode      CHAR(10) NOT NULL,
    l_comment       VARCHAR(44) NOT NULL
);

select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(l_discount) as count_order
from
    lineitem
where
    l_shipdate <= date '1998-12-01' - interval '106 day'
group by
    l_returnflag,
    l_linestatus;

--
Thanks

Hubert Zhang