Hi, I've spent a few hours stress-testing this a bit - loading a mail archive with ~1M of messages (with headers stored in a jsonb column) and then doing queries on that. Good news - no crashes or any such issues so far. The queries that I ran manually seem to return sane results.
The only problem I ran into is with limited index row size with GIN indexes. I understand it's not a bug, but I admit I haven't realized I might run into it in this case ... The data I used for testing is just a bunch of e-mail messages, with headers stored as jsonb, so each row has something like this in "headers" column: { "from" : "John Doe <j...@example.com>", "to" : ["Jane Doe <j...@example.com>", "Jack Doe <j...@example.com>"], "cc" : ..., "bcc" : ..., ... various other headers ... } The snag is that some of the header values may be very long, exceeding the limit of 1352 bytes and causing errors like this: ERROR: index row size 1416 exceeds maximum 1352 for index "gin_idx" A good example of such header is "dkim-signature" which basically contains the whole message digitally signed with DKIM. The signature tends to be long and non-compressible, thanks to the signature. I'm wondering what's the best way around this, because I suspect many new users (especially those attracted by jsonb and GIN improvements) will run into this. Maybe not immediately, but eventully they'll try to insert a jsonb with long value, and it will fail ... With btree indexes on text I would probably create an index on substr(column,0,1000) or something like that, but doing that with JSON seems a bit strange. I assume we need to store the actual values in the GIN index (so a hash is not sufficient), right? GIST indexes work, but with that I have to give up the significant performance gains that we got thanks to Alexander's GIN patches. regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers