I have a database which stores receiver to indicate which account the data 
relates to. This has led to tons of duplication of data, as one set of data may 
create 3 separate rows, where the only difference is the receiver column.

|---------------------|------------------|---------------------|------------------|
|      Receiver       |       Event      |         Date        |      Location  
  |
|---------------------|------------------|---------------------|------------------|
|       Alpha         |         3        |          12         |         USA    
  |
|---------------------|------------------|---------------------|------------------|
|       Bravo         |         3        |          12         |         USA    
  |
|---------------------|------------------|---------------------|------------------|
|       Charlie       |         3        |          12         |         USA    
  |
|---------------------|------------------|---------------------|------------------|

While redesigning the database, I have considered using an array with a GIN 
index instead of the current B-Tree index on receiver. My proposed new table 
would look like this:
|-------------------------------|--------------|------------|-------------------|
|           Receivers           |     Event    |    Date    |     Location      
|
|-------------------------------|--------------|------------|-------------------|
| ["Alpha", "Bravo", "Charlie"] |       3      |     12     |         USA       
|
|-------------------------------|--------------|------------|-------------------|

More Information:
·         Receiver names are of the type (a-z, 1-5, .)
·         95% of all queries currently look like this: SELECT * FROM table 
WHERE Receiver = Alpha, with the new format this would be SELECT * FROM table 
WHERE receivers @> '"Alpha"'::jsonb;
·         The table currently contains over 4 billion rows (with duplication) 
and the new proposed schema would cut it down to under 2 billion rows.
·
Question:
1.      Does it make more sense to use Postgres Native Text Array?
2.      Would a jsonb_path_ops GIN index on receivers make sense here?
3.      Which option is more efficient? Which is faster?


Reply via email to