According to this link http://postgresql.nabble.com/NULL-saves-disk-space-td4344106.html NULL values take no extra disk space as long as at least one other column in the row is NULL (the row's NULL bitmap is allocated either way, and a NULL stores no data bytes). A boolean takes 1 byte whereas a smallint takes 2 bytes. So disk space is no longer a problem with boolean columns that are 95% NULL.
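A rough way to check this on a scratch database (table and column names here are made up for illustration; note that alignment padding can mask small per-column differences, so the two sizes may come out closer than expected):

```sql
-- Two hypothetical layouts: one NULL in a row already forces the NULL
-- bitmap, so additional NULL booleans add no per-row data bytes.
CREATE TABLE fact_bool (id bigint, flag_a boolean,  flag_b boolean);
CREATE TABLE fact_int  (id bigint, flag_a smallint, flag_b smallint);

INSERT INTO fact_bool SELECT g, true, NULL FROM generate_series(1, 100000) g;
INSERT INTO fact_int  SELECT g, 1,    NULL FROM generate_series(1, 100000) g;

-- Compare the on-disk size of the two variants.
SELECT pg_size_pretty(pg_total_relation_size('fact_bool')) AS bool_size,
       pg_size_pretty(pg_total_relation_size('fact_int'))  AS int_size;
```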
One thing that is really great with PostgreSQL is transactional DDL for DROP TABLE ... CASCADE, which lets a ROLLBACK restore all the dependent stuff (indexes, views) if a problem appears during loading. I hope using one transaction to drop/load many tables is not a performance issue?

Nicolas PARIS

2015-03-06 11:25 GMT+01:00 Nicolas Paris <nipari...@gmail.com>:

> Thanks Jim,
>
>> My suggestion is to test using nothing but plain views and plain indexes
>> on the base table
>
> Actually the way I choose the subset of rows for each view is complicated
> in terms of query, so using simple views without partial indexes is
> terrible in terms of performance (I have tested that).
>
>> You mean the materialized views, right?
>
> Well, I have tested matviews, views without partial indexes, views with a
> hash join on a key, ...
>
>> I think you'll ultimately be unhappy trying to go down this route, for
>> the reasons you mention, plus the very large amount of extra space you'll
>> be using. 800 booleans is 800 extra bytes for every row in your fact table.
>> That's a lot. Even if you used a bitmap instead (which means you have to
>> mess around with tracking which bit means what and probably other problems
>> as well) you're still looking at 100 bytes per row. That's nothing to
>> sneeze at.
>
> Since each subset is about 5% of the fact table (and this number decreases
> as the number of views increases), most of the boolean values are NULL.
> This means 5% of 800 extra bytes, right? I had chosen smallint because it
> makes the bulk-load CSV smaller ("1" vs "true").
>
>> For the views that are too slow, look at what the expensive part of the
>> view is and materialize *only that*.
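The transactional drop/reload pattern mentioned at the top of this message could be sketched like this (the table name and file path are hypothetical, just for illustration):

```sql
BEGIN;

-- Dropping the fact table also drops its dependent views and indexes.
DROP TABLE fact_table CASCADE;

CREATE TABLE fact_table (id bigint, measure numeric);

-- Bulk load; COPY FROM a server-side file needs appropriate privileges.
COPY fact_table FROM '/path/to/load.csv' WITH (FORMAT csv);

-- Recreate the dependent views and indexes here. Because DDL is
-- transactional in PostgreSQL, a ROLLBACK at any point before COMMIT
-- restores the old table together with its indexes and views.
COMMIT;
```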
>
> It would be great if I could, but everything will be automatic, so it
> will be difficult to apply rules that demand human analysis and manual
> database modification for a single subset.
>
> I hope I have understood you correctly.
>
> Nicolas PARIS
>
> 2015-03-06 10:32 GMT+01:00 Jim Nasby <jim.na...@bluetreble.com>:
>
>> On 3/6/15 2:16 AM, Nicolas Paris wrote:
>>
>>> If you have that many different views I doubt you want that many
>>> indexes anyway.
>>>
>>> It's a data warehouse, so each view is used by many users for every
>>> query. Those views must be subsets of the huge materialized table.
>>> All the indexes are needed.
>>
>> Yes, but they don't have to be partial.
>>
>>> Have you tried just hitting the base table and indexes directly,
>>> either through plain views or just direct SQL?
>>>
>>> I have tried each. Performance is worse querying a subset (the views)
>>> than querying the whole huge table using its big indexes.
>>
>> You mean the materialized views, right? If so, that makes sense: instead
>> of having all your users hitting one common set of data (your fact table)
>> you had them hitting a bunch of other data (the mat views). But you still
>> had other stuff hitting the fact table, so now you were dealing with a lot
>> more data than if you just stuck to the single fact table.
>>
>>> => this is the solution I am implementing. (800 is not true, but in 10
>>> years it maybe will be)
>>
>> In 10 years we'll all be using quantum computers anyway... ;P
>>
>>> Actually, I have added a boolean column on the huge table for each
>>> view. This is the way each view is a subset of the huge table (CREATE
>>> VIEW ... AS SELECT * FROM hugeTable WHERE columnX IS TRUE -- etc., 800
>>> times). Then I create 800 partial indexes on those columns (CREATE
>>> INDEX ... WHERE columnX IS TRUE), one for each view.
>>> This works great, as the query planner chooses the partial indexes when
>>> querying the small subset of the huge table (potentially 20 billion
>>> rows).
>>>
>>> This is better than materialized views for a few reasons:
>>> - it saves space on disk (columnX is just a boolean plus the same
>>> indexes, minus the data of the matviews)
>>> - it saves the time of generating the materialized views
>>
>> But this isn't better than the mat views because of a bunch of booleans;
>> it's better because it means less strain on the disk cache.
>>
>>> It is quite a bit more complicated, because in this project the number
>>> of views is growing dynamically:
>>> - adding new mat views would be simple
>>> - adding a new view means adding a new column to the huge table. It can
>>> take a long time to update the boolean for each tuple, so I need to
>>> truncate/bulk-load all the data each time I add a new view. Another
>>> problem is that a table with a dynamic number of columns was a bit
>>> tricky to implement in an ETL tool such as Talend, but I hope the
>>> benefits are worth it.
>>
>> I think you'll ultimately be unhappy trying to go down this route, for
>> the reasons you mention, plus the very large amount of extra space you'll
>> be using. 800 booleans is 800 extra bytes for every row in your fact
>> table. That's a lot. Even if you used a bitmap instead (which means you
>> have to mess around with tracking which bit means what and probably other
>> problems as well) you're still looking at 100 bytes per row. That's
>> nothing to sneeze at.
>>
>> My suggestion is to test using nothing but plain views and plain indexes
>> on the base table. I expect that some of those views will not perform
>> adequately, but many (or most) of them will be fine. For the views that
>> are too slow, look at what the expensive part of the view is and
>> materialize *only that*.
>> I suspect that when you do that, you'll discover that several views are
>> slow because of the same thing, so if you materialize that one thing once
>> you can then use it to speed up several views.
>>
>> Using that approach means you'll have a lot less data that you have to
>> read.
>>
>> --
>> Jim Nasby, Data Architect, Blue Treble Consulting
>> Data in Trouble? Get it in Treble! http://BlueTreble.com
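The boolean-column plus partial-index scheme discussed in this thread could be sketched as follows (all names are made up for illustration; hugetable stands in for the fact table and some_key for a typical filter/join key):

```sql
-- One boolean flag column per subset; rows outside the subset stay NULL,
-- which costs no data bytes once the row's NULL bitmap exists.
ALTER TABLE hugetable ADD COLUMN in_view_42 boolean;

-- Partial index: only the ~5% of rows belonging to the subset are indexed,
-- so it stays small even on a multi-billion-row table.
CREATE INDEX idx_hugetable_view_42
    ON hugetable (some_key)
    WHERE in_view_42 IS TRUE;

-- The view exposes the subset; the planner can pick the partial index
-- because the view's predicate matches the index predicate exactly.
CREATE VIEW view_42 AS
    SELECT * FROM hugetable WHERE in_view_42 IS TRUE;
```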