I couldn't get my hands on the twitter data, so I generated my own. The json template is http://paste2.org/wJ1dfcjw and the data was generated with http://www.json-generator.com/. It has 35 top-level keys, in case anyone is wondering. I generated 10000 random objects and inserted them repeatedly until I had 320k rows.
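
A minimal sketch of the setup, assuming the generated objects were dumped to a one-object-per-line file (the file name and the COPY-based load are my assumptions, not the exact script used):

    -- table matching the queries below: a single jsonb column named "data"
    CREATE TABLE t_json (data jsonb);

    -- load the 10000 generated objects (hypothetical file path)
    COPY t_json (data) FROM '/tmp/objects.json';

    -- double the row count by re-inserting the table into itself;
    -- five runs take 10k rows to 320k rows
    INSERT INTO t_json SELECT data FROM t_json;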
Test query: SELECT data->>'name', data->>'email' FROM t_json
Test storage: EXTERNAL
Test jsonb lengths quartiles: {1278,1587,1731,1871,2231}
Tom's lengths+cache aware: 455ms
HEAD: 440ms

This is a realistic-ish workload in my opinion, and Tom's patch performs within 4% of HEAD.

Due to the overall lengths I couldn't really test compressibility, so I re-ran the test. This time I inserted an array of 2 objects in each row, as in [obj, obj]. The objects were taken in sequence from the 10000 pool, so the contents match in both tests.

Test query: SELECT data #> '{0, name}', data #> '{0, email}', data #> '{1, name}', data #> '{1, email}' FROM t_json
Test storage: EXTENDED

HEAD: 17MB table + 878MB toast
HEAD size quartiles: {2015,2500,2591,2711,3483}
HEAD query runtime: 15s

Tom's: 220MB table + 580MB toast
Tom's size quartiles: {1665,1984,2061,2142.25,2384}
Tom's query runtime: 13s

This is an intriguing edge case in which Tom's patch actually outperforms the base implementation for 3~4kB jsons.
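
For reference, the storage setting and the size figures can be reproduced with something along these lines (a sketch; treating the five quartile values as min/25%/50%/75%/max and measuring the text length of each jsonb are my assumptions, and pg_column_size would give somewhat different numbers):

    -- set before loading the data; use EXTENDED for the second run
    ALTER TABLE t_json ALTER COLUMN data SET STORAGE EXTERNAL;

    -- five-number summary of the per-row jsonb lengths
    SELECT percentile_cont(ARRAY[0, 0.25, 0.5, 0.75, 1])
             WITHIN GROUP (ORDER BY octet_length(data::text))
    FROM t_json;

    -- main table size vs TOAST (plus FSM/VM) size
    SELECT pg_size_pretty(pg_relation_size('t_json'))   AS table_size,
           pg_size_pretty(pg_table_size('t_json')
                          - pg_relation_size('t_json')) AS toast_size;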