Both $SUBJECT functions pass to hash_create() an expected hash table size of 10000 * attstattarget. Based on header comments, this represents a near-worst case. These typanalyze functions scan the hash tables sequentially, thereby visiting the entire allocation. Per the recommendation in comments at hash_create(), we should be more conservative. On my system, this tiny test case runs in 2.8s and dirties 1.0 GiB of local memory:
CREATE TEMP TABLE t AS SELECT '{}'::int[]; SET default_statistics_target = 10000; ANALYZE t; Rather arbitrarily, I reduced the hash_create() size hint by 99.9%, to the width of the histograms destined for pg_statistic. This streamlined the test case to <20ms runtime and 2 MiB of memory. To verify that nothing awful happens when the hash table sees considerable dynamic growth, I used a subject table entailing a 9M-element hash table at ANALYZE time: CREATE UNLOGGED TABLE t AS SELECT array[n,3000000+n,6000000+n] FROM generate_series(1,3000000) t(n); Unpatched master takes 15s and dirties 2.1 GiB; patched takes 15s and dirties 1.2 GiB. The timing noise overlapped any systematic difference, but the patched version might have been around 500ms slower. Based on that, I'm comfortable trusting that improving smaller cases in this way will not greatly harm larger cases. The lack of field complaints about ts_typanalyze() resource usage does argue against the need for a change here, but I think adding array_typanalyze() in PostgreSQL 9.2 significantly increases our risk exposure. Sites may have cranked up the statistics target on array columns to compensate for the lack of explicit statistical support. Every cluster has several array columns in the system catalogs. The size hint I chose is fairly arbitrary. Any suggestions for principled alternatives? Thanks, nm
*** a/src/backend/tsearch/ts_typanalyze.c --- b/src/backend/tsearch/ts_typanalyze.c *************** *** 186,192 **** compute_tsvector_stats(VacAttrStats *stats, hash_ctl.match = lexeme_match; hash_ctl.hcxt = CurrentMemoryContext; lexemes_tab = hash_create("Analyzed lexemes table", ! bucket_width * 7, &hash_ctl, HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT); --- 186,192 ---- hash_ctl.match = lexeme_match; hash_ctl.hcxt = CurrentMemoryContext; lexemes_tab = hash_create("Analyzed lexemes table", ! num_mcelem, &hash_ctl, HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT); *** a/src/backend/utils/adt/array_typanalyze.c --- b/src/backend/utils/adt/array_typanalyze.c *************** *** 282,288 **** compute_array_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc, elem_hash_ctl.match = element_match; elem_hash_ctl.hcxt = CurrentMemoryContext; elements_tab = hash_create("Analyzed elements table", ! bucket_width * 7, &elem_hash_ctl, HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT); --- 282,288 ---- elem_hash_ctl.match = element_match; elem_hash_ctl.hcxt = CurrentMemoryContext; elements_tab = hash_create("Analyzed elements table", ! num_mcelem, &elem_hash_ctl, HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers