Eh, I checked, you are right. Something terribly wrong with hash index in 
PostgreSQL. But there are another hash index gin(jsonb_path_ops), may be 
correctly say gin+hash index. Looked like it is the best for this purpose.

> 26 окт. 2018 г., в 19:27, Jeff Janes <jeff.ja...@gmail.com> написал(а):
> 
> On Thu, Oct 25, 2018 at 9:36 AM Олег Самойлов <spl...@ya.ru 
> <mailto:spl...@ya.ru>> wrote:
> 
>> 17 окт. 2018 г., в 13:46, Ravi Krishna <srkrish...@aol.com 
>> <mailto:srkrish...@aol.com>> написал(а):
>> 
>> In  
>> https://www.cybertec-postgresql.com/en/ideas-for-scaling-postgresql-to-multi-terabyte-and-beyond/
>>  
>> <https://www.cybertec-postgresql.com/en/ideas-for-scaling-postgresql-to-multi-terabyte-and-beyond/>
>> 
>> it is mentioned:
>> 
>> "GIN, the most know non-default index type perhaps, has been actually around 
>> for ages (full-text search) and in short is perfect for indexing columns 
>> where there are lot of repeating values – think all kinds of statuses or 
>> good old Mr/Mrs/Miss. GIN only stores every unique column value only once as 
>> for the default B-tree you’ll have e.g. 1 millon leaf nodes with the integer 
>> “1” in it."
>> 
>> 
>> Does it mean that GIN is a very good choice for low cardinality columns.  
> 
> Not necessary. There is other index which also don’t keep column value in an 
> every leaf. Hash, for instance. 
> 
> For smallish values (which low cardinality columns tend to be) the per-tuple 
> overhead and the pointer itself is probably much larger than the value, so 
> hash won't save you much if any space.  The GIN index removes not just the 
> value, but the per-tuple overhead.  And also compresses the point list to 
> further save space.  
> 
> Here is a real-world example from one of my databases where each value is 
> about 17 characters long, and is present about 20 times:
> 
> gin: 411 MB
> btree: 2167 MB 
> hash: 2159 MB
> 
> Cheers,
> 
> Jeff

Reply via email to