Hi, Thanks for the updated patch. It's a quite massive amount of code - I I don't think we had many 2MB patches in the past, so this is by no means a full review.
1) the psql_1.out is missing a bit of expected output (due to 098fb0079) 2) I'm getting crashes in intarray contrib, due to hitting this error in lwlock.c (backtrace attached): /* Ensure we will have room to remember the lock */ if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS) elog(ERROR, "too many LWLocks taken"); I haven't investigates this too much, but it's regular build with asserts and TAP tests, so it should be simple to reproduce using "make check-world" I guess. 3) I did a very simple benchmark, loading a TPC-H data (for 75GB), followed by pg_dump, and the duration (in seconds) looks like this: master zedstore/pglz zedstore/lz4 ------------------------------------------------- copy 1855 68092 2131 dump 751 905 811 And the size of the lineitem table (as shown by \d+) is: master: 64GB zedstore/pglz: 51GB zedstore/lz4: 20GB It's mostly expected lz4 beats pglz in performance and compression ratio, but this seems a bit too extreme I guess. Per past benchmarks (e.g. [1] and [2]) the difference in compression/decompression time should be maybe 1-2x or something like that, not 35x like here. [1] https://www.postgresql.org/message-id/20130621000900.GA12425%40alap2.anarazel.de [2] https://www.postgresql.org/message-id/20130605150144.GD28067%40alap2.anarazel.de Furthermore, the pglz compression is not consuming the most CPU, at least that's what perf says: 24.82% postgres [.] encode_chunk_varlen 20.49% postgres [.] decode_chunk 13.01% postgres [.] merge_attstream_guts.isra.0 12.68% libc-2.32.so [.] __memmove_avx_unaligned_erms 8.72% postgres [.] encode_chunk_fixed 6.16% postgres [.] pglz_compress 4.36% postgres [.] decode_attstream_cont 2.27% postgres [.] 0x00000000000baff0 1.84% postgres [.] AllocSetAlloc 0.79% postgres [.] append_attstream 0.70% postgres [.] palloc So I wonder if this is a sign of a deeper issue - maybe the lower compression ratio (for pglz) triggers some sort of feedback loop in zedstore, or something like that? Not sure, but this seems strange. 4) I looked at some of the code, like merge_attstream etc. and I wonder if this might be related to some of the FIXME comments. For example this bit in merge_attstream seems interesting: * FIXME: we don't actually pay attention to the compression anymore. * We never repack. * FIXME: this is backwords, the normal fast path is if (firsttid1 > lasttid2) But I suppose that should affect both pglz and lz4, and I'm not sure how up to date those comments actually are. BTW the comments in general need updating and tidying up, to make reviews easier. For example the merge_attstream comment references attstream1 and attstream2, but those are not the current parameters of the function. 5) IHMO there should be a #define specifying the maximum number of items per chunk (60). Currently there are literal constants used in various places, sometimes 60, sometimes 59 etc. which makes it harder to understand the code. FWIW 60 seems a bit low, but maybe it's OK. 6) I do think ZSAttStream should track which compression is used by the stream, for two main reasons. Firstly, there's another patch to support "custom compression" methods, which (also) allows multiple compression methods per column. It'd be a bit strange to support that for varlena columns in heap table, and not here, I guess. Secondly, I think one of the interesting columnstore features down the road will be execution on compressed data, which however requires compression method designed for that purpose, and it's often datatype-specific (delta encoding, ...). I don't think we need to go as far as supporting "custom" compression methods here, but I think we should allow different built-in compression methods for different attstreams. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Core was generated by `postgres: user contrib_regression [local] CREATE INDEX '. Program terminated with signal SIGABRT, Aborted. #0 0x000074f9452239e5 in raise () from /lib64/libc.so.6 Missing separate debuginfos, use: dnf debuginfo-install glibc-2.31-4.fc32.x86_64 (gdb) bt #0 0x000074f9452239e5 in raise () from /lib64/libc.so.6 #1 0x000074f94520c895 in abort () from /lib64/libc.so.6 #2 0x0000000000936246 in errfinish (filename=<optimized out>, lineno=<optimized out>, funcname=0xab4660 <__func__.5> "LWLockAcquire") at elog.c:592 #3 0x00000000007fa895 in LWLockAcquire (lock=0x74f93bcb0400, mode=LW_EXCLUSIVE) at lwlock.c:1240 #4 0x000000000053c850 in AdvanceXLInsertBuffer (upto=28450816, opportunistic=false) at xlog.c:2203 #5 0x000000000053cbf7 in GetXLogBuffer (ptr=28450816) at xlog.c:1962 #6 0x000000000053d622 in CopyXLogRecordToWAL (EndPos=28557992, StartPos=27456680, rdata=0x74f93bb55f60, isLogSwitch=false, write_len=1098071) at xlog.c:1561 #7 XLogInsertRecord (rdata=<optimized out>, fpw_lsn=<optimized out>, flags=<optimized out>, num_fpi=199) at xlog.c:1125 #8 0x0000000000548dec in XLogInsert (rmid=rmid@entry=0 '\000', info=info@entry=176 '\260') at xloginsert.c:818 #9 0x0000000000549716 in log_newpage_range (rel=0x74f93bbaa508, forkNum=MAIN_FORKNUM, startblk=<optimized out>, endblk=4261, page_std=<optimized out>) at xloginsert.c:1182 #10 0x00000000004b2b9e in gistbuild (heap=<optimized out>, index=0x74f93bbaa508, indexInfo=<optimized out>) at gistbuild.c:330 #11 0x0000000000581d52 in index_build (heapRelation=heapRelation@entry=0x74f93bba4208, indexRelation=indexRelation@entry=0x74f93bbaa508, indexInfo=indexInfo@entry=0x21748a0, isreindex=isreindex@entry=false, parallel=parallel@entry=true) at index.c:3067 #12 0x00000000005831d6 in index_create (heapRelation=<optimized out>, indexRelationName=<optimized out>, indexRelationId=<optimized out>, parentIndexRelid=<optimized out>, parentConstraintId=<optimized out>, relFileNode=<optimized out>, indexInfo=<optimized out>, indexColNames=<optimized out>, accessMethodObjectId=<optimized out>, tableSpaceId=<optimized out>, collationObjectId=<optimized out>, classObjectId=<optimized out>, coloptions=<optimized out>, reloptions=<optimized out>, flags=<optimized out>, constr_flags=<optimized out>, allow_system_table_mods=<optimized out>, is_internal=<optimized out>, constraintId=<optimized out>) at index.c:1256 #13 0x0000000000626d10 in DefineIndex (relationId=16507, stmt=<optimized out>, indexRelationId=0, parentIndexId=0, parentConstraintId=0, is_alter_table=false, check_rights=true, check_not_in_use=true, skip_build=false, quiet=false) at indexcmds.c:1118 #14 0x00000000008135c9 in ProcessUtilitySlow (pstate=pstate@entry=0x21b55a0, pstmt=pstmt@entry=0x2092a70, queryString=queryString@entry=0x2091920 "CREATE INDEX text_idx on test__int using gist (a gist__intbig_ops(siglen = 2024));", context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=params@entry=0x0, queryEnv=queryEnv@entry=0x0, qc=0x7fffc1132580, dest=0x2092b60) at utility.c:1521 #15 0x00000000008125b9 in standard_ProcessUtility (pstmt=0x2092a70, queryString=0x2091920 "CREATE INDEX text_idx on test__int using gist (a gist__intbig_ops(siglen = 2024));", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x2092b60, qc=0x7fffc1132580) at utility.c:1071 #16 0x000000000080fb36 in PortalRunUtility (portal=0x20f9a50, pstmt=0x2092a70, isTopLevel=<optimized out>, setHoldSnapshot=<optimized out>, dest=0x2092b60, qc=0x7fffc1132580) at pquery.c:1159 #17 0x00000000008105e4 in PortalRunMulti (portal=portal@entry=0x20f9a50, isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x2092b60, altdest=altdest@entry=0x2092b60, qc=qc@entry=0x7fffc1132580) at pquery.c:1305 #18 0x00000000008114d6 in PortalRun (portal=0x20f9a50, count=9223372036854775807, isTopLevel=<optimized out>, run_once=<optimized out>, dest=0x2092b60, altdest=0x2092b60, qc=0x7fffc1132580) at pquery.c:779 #19 0x000000000080cd16 in exec_simple_query (query_string=0x2091920 "CREATE INDEX text_idx on test__int using gist (a gist__intbig_ops(siglen = 2024));") at postgres.c:1239 #20 0x000000000080ea1f in PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=<optimized out>, username=<optimized out>) at postgres.c:4305 #21 0x00000000007804ab in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4488 #22 BackendStartup (port=<optimized out>) at postmaster.c:4210 #23 ServerLoop () at postmaster.c:1727 #24 0x000000000078142c in PostmasterMain (argc=<optimized out>, argv=0x208b5f0) at postmaster.c:1400 #25 0x00000000004838f0 in main (argc=8, argv=0x208b5f0) at main.c:209
98.96% 0.00% postgres [unknown] [.] 0x495641002713333d | ---0x495641002713333d __libc_start_main startup_hacks PostmasterMain ServerLoop BackendStartup ExitPostmaster PostgresMain exec_simple_query PortalRun PortalRunMulti PortalRunUtility ProcessUtility standard_ProcessUtility DoCopy | --98.95%--CopyFrom | |--96.09%--CopyMultiInsertInfoFlush | | | --96.09%--CopyMultiInsertBufferFlush | | | --96.05%--table_multi_insert | | | --96.05%--zedstoream_multi_insert | | | --96.03%--zsbt_tuplebuffer_spool_slots | | | --95.82%--zsbt_attbuffer_spool | | | --95.51%--zsbt_attbuffer_flush | | | --95.19%--zsbt_attr_add | | | |--86.14%--merge_attstream | | | | | --85.70%--merge_attstream_guts | | | | | |--40.57%--append_attstream | | | | | | | --39.48%--encode_chunk | | | | | | | |--25.36%--encode_chunk_varlen | | | | | | | |--11.59%--encode_chunk_fixed | | | | | | | --2.10%--__memmove_avx_unaligned_erms | | | | | |--27.24%--decode_attstream_cont | | | | | | | --23.14%--decode_chunk | | | | | | | |--12.71%--decode_chunk_varlen | | | | | | | | | --2.34%--palloc | | | | | | | | | --1.86%--AllocSetAlloc | | | | | | | |--6.43%--decode_chunk_fixed | | | | | | | --3.44%--__memmove_avx_unaligned_erms | | | | | --1.36%--__memmove_avx_unaligned_erms | | | --8.72%--zsbt_attr_pack_attstream | | | --8.44%--zs_compress_destSize | | | --8.41%--pglz_compress | | | --6.16%--pglz_find_match | --2.60%--NextCopyFrom | |--1.57%--InputFunctionCall | | | |--0.60%--date_in | | | --0.56%--numeric_in | --0.83%--NextCopyFromRawFields | --0.59%--CopyReadLine