Re: [HACKERS] Extreme bloating of intarray GiST indexes

2011-05-04 Thread Alexander Korotkov
I have another hypothesis about index bloat cause. AFAIK, vaccum procedure on GiST don't have any storage utilization guarantee. For example, if only one live item is in some page, then only one item will be left in this page. I.e. there is no index reroganization during vacuum. If there wouldn't

Re: [HACKERS] Extreme bloating of intarray GiST indexes

2011-05-04 Thread Tom Lane
Alexander Korotkov aekorot...@gmail.com writes: I have another hypothesis about index bloat cause. AFAIK, vaccum procedure on GiST don't have any storage utilization guarantee. For example, if only one live item is in some page, then only one item will be left in this page. I.e. there is no

Re: [HACKERS] Extreme bloating of intarray GiST indexes

2011-05-04 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes: (c) we *think* that other columns of the table, including other indexed columns, are changed much more frequently than the intarray column is. Currently doing analysis on that. Yeah, I noticed that your statistics for the table showed far more updates than

Re: [HACKERS] Extreme bloating of intarray GiST indexes

2011-05-04 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes: Can you provide the data in the column that's indexed? Attached. This is for the index which is 90% free space. I tried loading this data in fresh, and then creating a gist__intbig_ops index on it. I got these pgstattuple numbers (in 8.4.8): table_len

Re: [HACKERS] Extreme bloating of intarray GiST indexes

2011-05-04 Thread Josh Berkus
On 5/4/11 11:29 AM, Tom Lane wrote: which is up in the same ballpark with your problem. Now probably your client's app is not updating all rows at once, but still this is a pretty wasteful update pattern. Is there a reason not to update all the columns in a single update? Yeah, really

Re: [HACKERS] Extreme bloating of intarray GiST indexes

2011-05-04 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes: On 5/4/11 11:29 AM, Tom Lane wrote: If you can't change the app, I'd suggest more aggressive autovacuuming as the least painful fix. Will test that. It's not clear that vacuuming is helping at all. Well, you realize of course that you need a REINDEX to

Re: [HACKERS] Extreme bloating of intarray GiST indexes

2011-05-03 Thread Josh Berkus
Tom, Alexander, So, some data: corp=# select indexname, pg_size_pretty(pg_relation_size(indexname::text)) as indexsize, pg_size_pretty(pg_relation_size(tablename::text)) as tablesize from pg_indexes where indexname like '%__listings_features' order by pg_relation_size(indexname::text) desc;

Re: [HACKERS] Extreme bloating of intarray GiST indexes

2011-05-03 Thread Josh Berkus
All, Some trending data, since there's a lot of bloated indexes here: select 'index_' || ( row_number() over ( order by free_percent desc ) ) as index, * from ( select (public.pgstattuple(indexname::text)).free_percent, round(( n_tup_upd )::numeric / n_tup_ins, 2) as update_ratio, round((

Re: [HACKERS] Extreme bloating of intarray GiST indexes

2011-05-03 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes: So, some data: corp=# select * from public.pgstattuple('idx__listings_features'); -[ RECORD 1 ]--+- table_len | 54190080 tuple_count| 7786 tuple_len | 2117792 tuple_percent | 3.91 dead_tuple_count | 0

Re: [HACKERS] Extreme bloating of intarray GiST indexes

2011-05-02 Thread Josh Berkus
Tom, Alexander, So we are using gist_intbig_ops, so that's not the issue. Using pgstattuple might be a bit of a challenge. The client doesn't have it installed, and I can't pull it from Yum without also upgrading PostgreSQL, since Yum doesn't stock old versions AFAIK. Maybe we should consider

Re: [HACKERS] Extreme bloating of intarray GiST indexes

2011-05-02 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes: Tom, Alexander, So we are using gist_intbig_ops, so that's not the issue. Using pgstattuple might be a bit of a challenge. The client doesn't have it installed, and I can't pull it from Yum without also upgrading PostgreSQL, since Yum doesn't stock old

Re: [HACKERS] Extreme bloating of intarray GiST indexes

2011-04-29 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes: Uh, no, the picksplit bugs we fixed were in cube and seg --- there's no reason to think that updating will help this. But 8.4's pgstattuple does appear to support gist indexes, so please run that and see what you get. There's also gevel that I used to

[HACKERS] Extreme bloating of intarray GiST indexes

2011-04-28 Thread Josh Berkus
Hackers, I'm currently looking at a database which has some extreme bloating of intarray GiST indexes. As in 1000% bloating in only a few months. This is not a particularly high-transaction-rate database, so the bloating is a little surprising; I can only explain it if vacuum wasn't cleaning

Re: [HACKERS] Extreme bloating of intarray GiST indexes

2011-04-28 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes: I'm currently looking at a database which has some extreme bloating of intarray GiST indexes. As in 1000% bloating in only a few months. This is not a particularly high-transaction-rate database, so the bloating is a little surprising; I can only explain

Re: [HACKERS] Extreme bloating of intarray GiST indexes

2011-04-28 Thread Josh Berkus
1. What PG version? 8.4.4, so it has the broken picksplit. 2. If new enough to have contrib/pgstattuple, what does pgstattuple() have to say about the index? Will check. I'm suspicious that this might be bloat caused by a bad picksplit function, not from having a lot of dead entries

Re: [HACKERS] Extreme bloating of intarray GiST indexes

2011-04-28 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes: 1. What PG version? 8.4.4, so it has the broken picksplit. ... Yeah, I'll test updating to 8.4.8. Uh, no, the picksplit bugs we fixed were in cube and seg --- there's no reason to think that updating will help this. But 8.4's pgstattuple does appear to

Re: [HACKERS] Extreme bloating of intarray GiST indexes

2011-04-28 Thread Alexander Korotkov
On Thu, Apr 28, 2011 at 11:11 PM, Josh Berkus j...@agliodbs.com wrote: I'm currently looking at a database which has some extreme bloating of intarray GiST indexes. As in 1000% bloating in only a few months. This is not a particularly high-transaction-rate database, so the bloating is a

Re: [HACKERS] Extreme bloating of intarray GiST indexes

2011-04-28 Thread Tom Lane
Alexander Korotkov aekorot...@gmail.com writes: What opclass is used for GiST index: gist__int_ops or gist__intbig_ops? Do you take into account that gist__int_ops is very inefficient for large datasets? I seem to recall some discussion recently about documenting where you should cut over to

Re: [HACKERS] Extreme bloating of intarray GiST indexes

2011-04-28 Thread Alexander Korotkov
On Fri, Apr 29, 2011 at 1:27 AM, Tom Lane t...@sss.pgh.pa.us wrote: I seem to recall some discussion recently about documenting where you should cut over to using gist__intbig_ops --- IIRC, it wasn't all that big by modern standards. But it doesn't look like any such change made it into the