I need to support the following queries:
1. Give me all documents where attrib X='value'.
2. Give me all documents where attrib X='value' and attrib Y='value2'.

There are about 10,000,000,000 distinct attribute values across about 10 different types (X, Y, etc.), so on average about 1,000 M per type. Each attribute value may appear in 10-20 documents. The model has to be optimized for fast reads and writes.

Here are two models I was thinking of:

Option 1: Use an RDBMS (PG). One big table (partitioned by type) with an index on value and an index on document id, and columns: attr type, attr value, document id.
For query 1, it is a simple query.
For query 2, do a self join.

Option 2: Same as option 1, but hold all document ids in one string. For example: 'host', 'myhost' appears in '3,5,6,7,8'.
For query 2, do one query with a union, for example: select document ids from ... where (attr='X' and value='value') union select document ids from ... where (attr='Y' and value='value2'), and then do the set merging.

Other options? Using btree_gin? Elasticsearch?
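
To make the two options concrete, here is a minimal sketch of both queries. It uses Python's built-in sqlite3 as a stand-in for PG so it can be run as-is. The table name doc_attr, the column names, the composite index and the sample rows are all assumptions for illustration, not a tuned design, and it says nothing about how either option behaves at 10,000,000,000 rows.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the schema below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE doc_attr ("
    " attr_type TEXT NOT NULL,"
    " attr_value TEXT NOT NULL,"
    " doc_id INTEGER NOT NULL)"
)
# Assumed composite index so a (type, value) lookup can return doc ids directly.
conn.execute(
    "CREATE INDEX doc_attr_lookup ON doc_attr (attr_type, attr_value, doc_id)"
)

# Made-up sample data: one row per (attribute type, attribute value, document).
rows = [
    ("host", "myhost", 3), ("host", "myhost", 5), ("host", "myhost", 6),
    ("host", "myhost", 7), ("host", "myhost", 8),
    ("env", "prod", 5), ("env", "prod", 8), ("env", "prod", 9),
    ("host", "other", 9),
]
conn.executemany("INSERT INTO doc_attr VALUES (?, ?, ?)", rows)


def docs_with(conn, attr_type, attr_value):
    """Query 1: all documents where one attribute has the given value."""
    cur = conn.execute(
        "SELECT doc_id FROM doc_attr"
        " WHERE attr_type = ? AND attr_value = ? ORDER BY doc_id",
        (attr_type, attr_value),
    )
    return [r[0] for r in cur]


def docs_with_both_join(conn, a, b):
    """Query 2, option 1: self join on doc_id inside the database."""
    cur = conn.execute(
        "SELECT DISTINCT a.doc_id FROM doc_attr AS a"
        " JOIN doc_attr AS b ON b.doc_id = a.doc_id"
        " WHERE a.attr_type = ? AND a.attr_value = ?"
        "   AND b.attr_type = ? AND b.attr_value = ?"
        " ORDER BY a.doc_id",
        (*a, *b),
    )
    return [r[0] for r in cur]


def intersect_packed(ids_a, ids_b):
    """Query 2, option 2: merge two comma-separated id strings in the client."""
    def to_set(packed):
        return {int(part) for part in packed.split(",") if part.strip()}
    return sorted(to_set(ids_a) & to_set(ids_b))
```

For example, docs_with_both_join(conn, ("host", "myhost"), ("env", "prod")) and intersect_packed("3,5,6,7,8", "5,8,9") both give the documents that carry both attributes. Note that the merge for query 2 is an intersection of the two id lists, not a union; in PostgreSQL the same thing can also be written with INTERSECT between the two selects instead of a self join.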