On Tue, Apr 16, 2019 at 9:15 AM Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
> > I'm not sure it's that clear cut, actually. Sure, it's not the usual > (block,item) pair so it's not possible to jump to the exact location, so > it's not the raw physical identifier as regular TID. But the data are > organized in a btree, with the TID as a key, so it does actually provide > some information about the location. > >From representation perspective its logical identifier. But yes since is used as used as key to layout datum's, there exists pretty good correlation between TIDs and physical location. Can consider it as clustered based on TID. I've asked about BRIN indexes elsewhere in this thread, which I think is > related to this question, because that index type relies on TID providing > sufficient information about location. And I think BRIN indexes are going > to be rather important for colstores (and formats like ORC have something > very similar built-in). > > But maybe all we'll have to do is define the ranges differently - instead > of "number of pages" we may define them as "number of rows" and it might > be working. > BRIN indexes work for zedstore right now. A block range maps to just a range of TIDs in zedstore, as pointed out above. When one converts a zstid to an ItemPointer, can get the "block number" from the ItemPointer, like from a normal heap TID. It doesn't mean the direct physical location of the row in zedstore, but that's fine. It might be sub-optimal in some cases. For example if one zedstore page contains TIDs 1-1000, and another 1000-2000, and the entry in the BRIN index covers TIDs 500-1500, have to access both zedstore pages. Would be better if the cutoff points in the BRIN index would match the physical pages of the zedstore. But it still works, and is probably fine in practice. Plan is to add integrated BRIN index in zedstore, means keep min-max values for appropriate columns within page. This will not help to eliminate the IO as external BRIN index does but helps to skip uncompression and visibility checks etc... for blocks not matching the conditions. Just to showcase brin works for zedstore, played with hands-on example mentioned in [1]. With btree index on zedstore QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=4351.50..4351.51 rows=1 width=32) (actual time=1267.140..1267.140 rows=1 loops=1) -> Index Scan using idx_ztemperature_log_log_timestamp on ztemperature_log (cost=0.56..4122.28 rows=91686 width=4) (actual time=0.117..1244.112 rows=86400 loops=1) Index Cond: ((log_timestamp >= '2016-04-04 00:00:00'::timestamp without time zone) AND (log_timestamp < '2016-04-05 00:00:00'::timestamp without time zone)) Planning Time: 0.240 ms Execution Time: 1269.016 ms (5 rows) With brin index on zedstore. Note: Bitmap index for zedstore currently scans all the columns. Scanning only required columns for query is yet to be implemented. QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Finalize Aggregate (cost=217538.85..217538.86 rows=1 width=32) (actual time=54.167..54.167 rows=1 loops=1) -> Gather (cost=217538.63..217538.84 rows=2 width=32) (actual time=53.967..55.184 rows=3 loops=1) Workers Planned: 2 Workers Launched: 2 -> Partial Aggregate (cost=216538.63..216538.64 rows=1 width=32) (actual time=42.956..42.957 rows=1 loops=3) -> Parallel Bitmap Heap Scan on ztemperature_log (cost=59.19..216446.98 rows=36660 width=4) (actual time=3.571..35.904 rows=28800 loops=3) Recheck Cond: ((log_timestamp >= '2016-04-04 00:00:00'::timestamp without time zone) AND (log_timestamp < '2016-04-05 00:00:00'::timestamp without time zone)) Rows Removed by Index Recheck: 3968 Heap Blocks: lossy=381 -> Bitmap Index Scan on idx_ztemperature_log_log_timestamp (cost=0.00..37.19 rows=98270 width=0) (actual time=1.201..1.201 rows=7680 loops=1) Index Cond: ((log_timestamp >= '2016-04-04 00:00:00'::timestamp without time zone) AND (log_timestamp < '2016-04-05 00:00:00'::timestamp without time zone)) Planning Time: 0.240 ms Execution Time: 55.341 ms (13 rows) schema_name | index_name | index_ratio | index_size | table_size -------------+------------------------------------+-------------+------------+------------ public | idx_ztemperature_log_log_timestamp | 0 | 80 kB | 1235 MB (1 row) 1] https://www.postgresql.fastware.com/blog/brin-indexes-what-are-they-and-how-do-you-use-them