Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-20 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes: On 08/15/2014 04:19 PM, Tom Lane wrote: Personally I'd prefer to go to the all-lengths approach, but a large part of that comes from a subjective assessment that the hybrid approach is too messy. Others might well disagree. ... So, that extraction test

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-20 Thread Josh Berkus
On 08/20/2014 08:29 AM, Tom Lane wrote: Since it's looking like this might be the direction we want to go, I took the time to flesh out my proof-of-concept patch. The attached version takes care of cosmetic issues (like fixing the comments), and includes code to avoid O(N^2) penalties in

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-20 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes: This means we need a beta3, no? If we change the on-disk format, I'd say so. So we don't want to wait around too long before deciding. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-20 Thread Josh Berkus
On 08/20/2014 08:29 AM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: On 08/15/2014 04:19 PM, Tom Lane wrote: Personally I'd prefer to go to the all-lengths approach, but a large part of that comes from a subjective assessment that the hybrid approach is too messy. Others might well

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-20 Thread Arthur Silva
What data are you using right now Josh? There's the github archive http://www.githubarchive.org/ Here's some sample data https://gist.github.com/igrigorik/2017462 -- Arthur Silva On Wed, Aug 20, 2014 at 6:09 PM, Josh Berkus j...@agliodbs.com wrote: On 08/20/2014 08:29 AM, Tom Lane wrote:

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-19 Thread Josh Berkus
On 08/15/2014 04:19 PM, Tom Lane wrote: Personally I'd prefer to go to the all-lengths approach, but a large part of that comes from a subjective assessment that the hybrid approach is too messy. Others might well disagree. In case anyone else wants to do measurements on some more data

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-16 Thread Arthur Silva
On Fri, Aug 15, 2014 at 8:19 PM, Tom Lane t...@sss.pgh.pa.us wrote: Arthur Silva arthur...@gmail.com writes: We should add some sort of versionning to the jsonb format. This can be explored in the future in many ways. If we end up making an incompatible change to the jsonb format, I would

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-15 Thread Josh Berkus
On 08/14/2014 07:24 PM, Tom Lane wrote: We can certainly reduce that. The question was whether it would be worth the effort to try. At this point, with three different test data sets having shown clear space savings, I think it is worth the effort. I'll poke into it tomorrow or over the

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-15 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes: On 08/14/2014 07:24 PM, Tom Lane wrote: We can certainly reduce that. The question was whether it would be worth the effort to try. At this point, with three different test data sets having shown clear space savings, I think it is worth the effort.

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-15 Thread Arthur Silva
I'm still getting up to speed on postgres development but I'd like to leave an opinion. We should add some sort of versionning to the jsonb format. This can be explored in the future in many ways. As for the current problem, we should explore the directory at the end option. It should improve

Re: [Bad Attachment] Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-15 Thread Josh Berkus
On 08/15/2014 01:38 PM, Tom Lane wrote: I've been poking at this, and I think the main explanation for your result is that with more JSONB documents being subject to compression, we're spending more time in pglz_decompress. There's no free lunch in that department: if you want compressed

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-15 Thread Tom Lane
Arthur Silva arthur...@gmail.com writes: We should add some sort of versionning to the jsonb format. This can be explored in the future in many ways. If we end up making an incompatible change to the jsonb format, I would support taking the opportunity to stick a version ID in there. But I

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-14 Thread Heikki Linnakangas
On 08/14/2014 04:01 AM, Tom Lane wrote: I wrote: That's a fair question. I did a very very simple hack to replace the item offsets with item lengths -- turns out that that mostly requires removing some code that changes lengths to offsets ;-). I then loaded up Larry's example of a

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-14 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes: For comparison, here's a patch that implements the scheme that Alexander Korotkov suggested, where we store an offset every 8th element, and a length in the others. It compresses Larry's example to 525 bytes. Increasing the stride from 8 to

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-14 Thread Bruce Momjian
On Wed, Aug 13, 2014 at 09:01:43PM -0400, Tom Lane wrote: I wrote: That's a fair question. I did a very very simple hack to replace the item offsets with item lengths -- turns out that that mostly requires removing some code that changes lengths to offsets ;-). I then loaded up Larry's

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-14 Thread Tom Lane
Bruce Momjian br...@momjian.us writes: Uh, can we get compression for actual documents, rather than duplicate strings? [ shrug... ] What's your proposed set of actual documents? I don't think we have any corpus of JSON docs that are all large enough to need compression. This gets back to the

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-14 Thread Bruce Momjian
On Thu, Aug 14, 2014 at 12:22:46PM -0400, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: Uh, can we get compression for actual documents, rather than duplicate strings? [ shrug... ] What's your proposed set of actual documents? I don't think we have any corpus of JSON docs that

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-14 Thread Merlin Moncure
On Thu, Aug 14, 2014 at 11:52 AM, Bruce Momjian br...@momjian.us wrote: On Thu, Aug 14, 2014 at 12:22:46PM -0400, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: Uh, can we get compression for actual documents, rather than duplicate strings? [ shrug... ] What's your proposed set of

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-14 Thread Tom Lane
Bruce Momjian br...@momjian.us writes: On Thu, Aug 14, 2014 at 12:22:46PM -0400, Tom Lane wrote: This gets back to the problem of what test case are we going to consider while debating what solution to adopt. Uh, we just one need one 12k JSON document from somewhere. Clearly this is

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-14 Thread Peter Geoghegan
On Thu, Aug 14, 2014 at 10:57 AM, Tom Lane t...@sss.pgh.pa.us wrote: Maybe this is telling us it's not worth changing the representation, and we should just go do something about the first_success_by threshold and be done. I'm hesitant to draw such conclusions on the basis of a single

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-14 Thread Bruce Momjian
On Thu, Aug 14, 2014 at 01:57:14PM -0400, Tom Lane wrote: Maybe this is telling us it's not worth changing the representation, and we should just go do something about the first_success_by threshold and be done. I'm hesitant to draw such conclusions on the basis of a single use-case though,

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-14 Thread Josh Berkus
On 08/14/2014 11:13 AM, Bruce Momjian wrote: On Thu, Aug 14, 2014 at 01:57:14PM -0400, Tom Lane wrote: Maybe this is telling us it's not worth changing the representation, and we should just go do something about the first_success_by threshold and be done. I'm hesitant to draw such

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-14 Thread Oleg Bartunov
I did quick test on the same bookmarks to test performance of 9.4beta2 and 9.4beta2+patch The query was the same we used in pgcon presentation: SELECT count(*) FROM jb WHERE jb @ '{tags:[{term:NYC}]}'::jsonb; table size | time (ms) 9.4beta2:1374 Mb |

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-14 Thread Claudio Freire
On Thu, Aug 14, 2014 at 3:49 PM, Larry White ljw1...@gmail.com wrote: I attached a json file of approximately 513K. It contains two repetitions of a single json structure. The values are quasi-random. It might make a decent test case of meaningfully sized data. I have a 59M in plain SQL (10M

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-14 Thread Claudio Freire
On Thu, Aug 14, 2014 at 4:24 PM, Claudio Freire klaussfre...@gmail.com wrote: On Thu, Aug 14, 2014 at 3:49 PM, Larry White ljw1...@gmail.com wrote: I attached a json file of approximately 513K. It contains two repetitions of a single json structure. The values are quasi-random. It might make a

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-14 Thread Tom Lane
Peter Geoghegan p...@heroku.com writes: On Thu, Aug 14, 2014 at 10:57 AM, Tom Lane t...@sss.pgh.pa.us wrote: Maybe this is telling us it's not worth changing the representation, and we should just go do something about the first_success_by threshold and be done. I'm hesitant to draw such

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-14 Thread Gavin Flower
On 15/08/14 09:47, Tom Lane wrote: Peter Geoghegan p...@heroku.com writes: On Thu, Aug 14, 2014 at 10:57 AM, Tom Lane t...@sss.pgh.pa.us wrote: Maybe this is telling us it's not worth changing the representation, and we should just go do something about the first_success_by threshold and be

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-14 Thread Josh Berkus
So, here's a destruction test case: 200,000 JSON values (plus 2 key columns) Average width 4K (+/- 1K) 183 keys per JSON value keys 10 to 30 characters 105 float values 70 integer values 8 text and date values no nesting The jsonic table is JSON The

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-14 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes: So, here's a destruction test case: 200,000 JSON values (plus 2 key columns) Average width 4K (+/- 1K) 183 keys per JSON value Is that 183 keys exactly each time, or is 183 the average? If so, what's the min/max number of keys? I ask because 183 would be

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-14 Thread Josh Berkus
On 08/14/2014 04:02 PM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: So, here's a destruction test case: 200,000 JSON values (plus 2 key columns) Average width 4K (+/- 1K) 183 keys per JSON value Is that 183 keys exactly each time, or is 183 the average? Each time exactly. It

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-14 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes: On 08/14/2014 04:02 PM, Tom Lane wrote: It would be useful to see min/max/avg of pg_column_size() in both these cases. Well, this is 9.4, so I can do better than that. How about quartiles? thetype |colsize_distribution

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-14 Thread Josh Berkus
What's more, it looks like the jsonb data is pretty much never getting compressed --- the min is too high for that. So I'm guessing that this example is mostly about the first_success_by threshold preventing any compression from happening. Please, before looking at my other patch, try

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-14 Thread Josh Berkus
On 08/14/2014 04:47 PM, Josh Berkus wrote: thetype |colsize_distribution -+ json| {1777,1803,1890,1940,4424} jsonb | {5902,5926,5978,6002,6208} Just realized my query was counting the whole row size instead of just the column size. Here's just

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-14 Thread Josh Berkus
Before changing to to INT_MAX: thetype |colsize_distribution -+ json| {1741,1767,1854,1904,2292} jsonb | {3551,5866,5910,5958,6168} After: thetype |colsize_distribution -+ json|

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-14 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes: And with Tom's test patch: ... Since that improved things a *lot*, just +40% instead of +200%, I thought I'd test some select queries. That test patch is not meant to be fast, its ambition was only to see what the effects on storage size would be. So I

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-13 Thread Claudio Freire
On Tue, Aug 12, 2014 at 8:00 PM, Bruce Momjian br...@momjian.us wrote: On Mon, Aug 11, 2014 at 01:44:05PM -0700, Peter Geoghegan wrote: On Mon, Aug 11, 2014 at 1:01 PM, Stephen Frost sfr...@snowman.net wrote: We've got a clear example of someone, quite reasonably, expecting their JSONB

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-13 Thread Tom Lane
Bruce Momjian br...@momjian.us writes: Seems we have two issues: 1) the header makes testing for compression likely to fail 2) use of pointers rather than offsets reduces compression potential I understand we are focusing on #1, but how much does compression reduce the storage size with

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-13 Thread Tom Lane
I wrote: That's a fair question. I did a very very simple hack to replace the item offsets with item lengths -- turns out that that mostly requires removing some code that changes lengths to offsets ;-). I then loaded up Larry's example of a noncompressible JSON value, and compared

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-13 Thread Andrew Dunstan
On 08/13/2014 09:01 PM, Tom Lane wrote: I wrote: That's a fair question. I did a very very simple hack to replace the item offsets with item lengths -- turns out that that mostly requires removing some code that changes lengths to offsets ;-). I then loaded up Larry's example of a

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-13 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes: On 08/13/2014 09:01 PM, Tom Lane wrote: That's a fair question. I did a very very simple hack to replace the item offsets with item lengths -- turns out that that mostly requires removing some code that changes lengths to offsets ;-). What does

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-12 Thread Marti Raudsepp
On Fri, Aug 8, 2014 at 10:50 PM, Hannu Krosing ha...@2ndquadrant.com wrote: How hard and how expensive would it be to teach pg_lzcompress to apply a delta filter on suitable data ? So that instead of integers their deltas will be fed to the real compressor Has anyone given this more thought?

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-12 Thread Robert Haas
On Mon, Aug 11, 2014 at 3:35 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: ... I think it would be a better idea to arrange some method by which JSONB (and perhaps other data types) can provide compression hints to pglz. I agree with that as a long-term

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-12 Thread Bruce Momjian
On Mon, Aug 11, 2014 at 01:44:05PM -0700, Peter Geoghegan wrote: On Mon, Aug 11, 2014 at 1:01 PM, Stephen Frost sfr...@snowman.net wrote: We've got a clear example of someone, quite reasonably, expecting their JSONB object to be compressed using the normal TOAST mechanism, and we're failing

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-11 Thread Robert Haas
On Sat, Aug 9, 2014 at 3:51 PM, Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kgri...@ymail.com writes: Stephen Frost sfr...@snowman.net writes: Trying to move the header to the end just for the sake of this doesn't strike me as a good solution as it'll make things quite a bit more

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-11 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: ... I think it would be a better idea to arrange some method by which JSONB (and perhaps other data types) can provide compression hints to pglz. I agree with that as a long-term goal, but not sure if it's sane to push into 9.4. What we could

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-11 Thread Peter Geoghegan
On Mon, Aug 11, 2014 at 12:07 PM, Robert Haas robertmh...@gmail.com wrote: I think that's a good point. I think that there may be something to be said for the current layout. Having adjacent keys and values could take better advantage of CPU cache characteristics. I've heard of approaches to

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-11 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote: Robert Haas robertmh...@gmail.com writes: ... I think it would be a better idea to arrange some method by which JSONB (and perhaps other data types) can provide compression hints to pglz. I agree with that as a long-term goal, but not sure if it's

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-11 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes: * Tom Lane (t...@sss.pgh.pa.us) wrote: What we could conceivably do now is (a) add a datatype OID argument to toast_compress_datum, and (b) hard-wire the selection of a different compression-parameters struct if it's JSONBOID. The actual fix would

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-11 Thread Stephen Frost
* Peter Geoghegan (p...@heroku.com) wrote: If there is to be any effort to make jsonb a more effective target for compression, I imagine that that would have to target redundancy between JSON documents. With idiomatic usage, we can expect plenty of it. While I certainly agree, that's a rather

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-11 Thread Peter Geoghegan
On Mon, Aug 11, 2014 at 1:01 PM, Stephen Frost sfr...@snowman.net wrote: We've got a clear example of someone, quite reasonably, expecting their JSONB object to be compressed using the normal TOAST mechanism, and we're failing to do that in cases where it's actually a win to do so. That's the

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-09 Thread Stephen Frost
Bruce, * Bruce Momjian (br...@momjian.us) wrote: On Fri, Aug 8, 2014 at 08:25:04PM -0400, Stephen Frost wrote: * Bruce Momjian (br...@momjian.us) wrote: FYI, pg_upgrade could be taught to refuse to upgrade from earlier 9.4 betas and report the problem JSONB columns. That is *not* a

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-09 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote: Stephen Frost sfr...@snowman.net writes: Trying to move the header to the end just for the sake of this doesn't strike me as a good solution as it'll make things quite a bit more complicated. Why is that?  How much harder would it be to add a single offset

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-09 Thread Tom Lane
Kevin Grittner kgri...@ymail.com writes: Stephen Frost sfr...@snowman.net writes: Trying to move the header to the end just for the sake of this doesn't strike me as a good solution as it'll make things quite a bit more complicated. Why is that?  How much harder would it be to add a single

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-08 Thread Ashutosh Bapat
On Fri, Aug 8, 2014 at 10:48 AM, Stephen Frost sfr...@snowman.net wrote: * Tom Lane (t...@sss.pgh.pa.us) wrote: I looked into the issue reported in bug #11109. The problem appears to be that jsonb's on-disk format is designed in such a way that the leading portion of any JSON array or

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-08 Thread Andrew Dunstan
On 08/07/2014 11:17 PM, Tom Lane wrote: I looked into the issue reported in bug #11109. The problem appears to be that jsonb's on-disk format is designed in such a way that the leading portion of any JSON array or object will be fairly incompressible, because it consists mostly of a

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-08 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes: * Tom Lane (t...@sss.pgh.pa.us) wrote: I looked into the issue reported in bug #11109. The problem appears to be that jsonb's on-disk format is designed in such a way that the leading portion of any JSON array or object will be fairly incompressible,

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-08 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes: On 08/07/2014 11:17 PM, Tom Lane wrote: I looked into the issue reported in bug #11109. The problem appears to be that jsonb's on-disk format is designed in such a way that the leading portion of any JSON array or object will be fairly

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-08 Thread Andrew Dunstan
On 08/08/2014 11:18 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 08/07/2014 11:17 PM, Tom Lane wrote: I looked into the issue reported in bug #11109. The problem appears to be that jsonb's on-disk format is designed in such a way that the leading portion of any JSON

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-08 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes: On 08/08/2014 11:18 AM, Tom Lane wrote: That's not really the issue here, I think. The problem is that a relatively minor aspect of the representation, namely the choice to store a series of offsets rather than a series of lengths, produces

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-08 Thread John W Higgins
On Fri, Aug 8, 2014 at 8:02 AM, Tom Lane t...@sss.pgh.pa.us wrote: Stephen Frost sfr...@snowman.net writes: * Tom Lane (t...@sss.pgh.pa.us) wrote: I'm rather disinclined to change the on-disk format because of this specific test, that feels a bit like the tail wagging the dog to me,

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-08 Thread Tom Lane
John W Higgins wish...@gmail.com writes: Would an answer be to switch the location of the jsonb header data to the end of the field as opposed to the beginning of the field? That would allow pglz to see what it wants to see early on and go to work when possible? Hm, might work. Seems a bit

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-08 Thread Andrew Dunstan
On 08/08/2014 12:04 PM, John W Higgins wrote: Would an answer be to switch the location of the jsonb header data to the end of the field as opposed to the beginning of the field? That would allow pglz to see what it wants to see early on and go to work when possible? Add an offset at the

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-08 Thread Andrew Dunstan
On 08/08/2014 11:54 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 08/08/2014 11:18 AM, Tom Lane wrote: That's not really the issue here, I think. The problem is that a relatively minor aspect of the representation, namely the choice to store a series of offsets rather

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-08 Thread Teodor Sigaev
value-to-be-compressed. (first_success_by is 1024 in the default set of compression parameters.) Curious idea: we could swap JEntry array and values: values in the begining of type will be catched by pg_lzcompress. But we will need to know offset of JEntry array, so header will grow up till

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-08 Thread Teodor Sigaev
Curious idea: we could swap JEntry array and values: values in the begining of type will be catched by pg_lzcompress. But we will need to know offset of JEntry array, so header will grow up till 8 bytes (actually, it will be a varlena header!) May be I wasn't clear:jsonb type will start from

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-08 Thread Bruce Momjian
On Fri, Aug 8, 2014 at 11:02:26AM -0400, Tom Lane wrote: 2. Are we going to ship 9.4 without fixing this? I definitely don't see replacing pg_lzcompress as being on the agenda for 9.4, whereas changing jsonb is still within the bounds of reason. FYI, pg_upgrade could be taught to refuse to

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-08 Thread Josh Berkus
On 08/08/2014 08:02 AM, Tom Lane wrote: 2. Are we going to ship 9.4 without fixing this? I definitely don't see replacing pg_lzcompress as being on the agenda for 9.4, whereas changing jsonb is still within the bounds of reason. Considering all the hype that's built up around jsonb,

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-08 Thread Alexander Korotkov
On Fri, Aug 8, 2014 at 9:14 PM, Teodor Sigaev teo...@sigaev.ru wrote: Curious idea: we could swap JEntry array and values: values in the begining of type will be catched by pg_lzcompress. But we will need to know offset of JEntry array, so header will grow up till 8 bytes (actually, it will

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-08 Thread Ants Aasma
On Fri, Aug 8, 2014 at 7:35 PM, Andrew Dunstan and...@dunslane.net wrote: I took a quick look and saw that this wouldn't be that easy to get around. As I'd suspected upthread, there are places that do random access into a JEntry array, such as the binary search in findJsonbValueFromContainer().

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-08 Thread Hannu Krosing
On 08/08/2014 06:17 AM, Tom Lane wrote: I looked into the issue reported in bug #11109. The problem appears to be that jsonb's on-disk format is designed in such a way that the leading portion of any JSON array or object will be fairly incompressible, because it consists mostly of a

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-08 Thread Peter Geoghegan
On Fri, Aug 8, 2014 at 12:41 PM, Ants Aasma a...@cybertec.at wrote: I don't think binary search is the main problem here. Objects are usually reasonably sized, while arrays are more likely to be huge. To make matters worse, jsonb - int goes from O(1) to O(n). I don't think it's true that

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-08 Thread Peter Geoghegan
On Fri, Aug 8, 2014 at 12:06 PM, Josh Berkus j...@agliodbs.com wrote: One we ship 9.4, many users are going to load 100's of GB into JSONB fields. Even if we fix the compressability issue in 9.5, those users won't be able to fix the compression without rewriting all their data, which could be

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-08 Thread Larry White
I was not complaining; I think JSONB is awesome. But I am one of those people who would like to put 100's of GB (or more) JSON files into Postgres and I am concerned about file size and possible future changes to the format. On Fri, Aug 8, 2014 at 7:10 PM, Peter Geoghegan p...@heroku.com wrote:

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-08 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote: Stephen Frost sfr...@snowman.net writes: * Tom Lane (t...@sss.pgh.pa.us) wrote: I looked into the issue reported in bug #11109. The problem appears to be that jsonb's on-disk format is designed in such a way that the leading portion of any JSON

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-08 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote: On Fri, Aug 8, 2014 at 11:02:26AM -0400, Tom Lane wrote: 2. Are we going to ship 9.4 without fixing this? I definitely don't see replacing pg_lzcompress as being on the agenda for 9.4, whereas changing jsonb is still within the bounds of reason.

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-08 Thread Stephen Frost
* Josh Berkus (j...@agliodbs.com) wrote: On 08/08/2014 08:02 AM, Tom Lane wrote: 2. Are we going to ship 9.4 without fixing this? I definitely don't see replacing pg_lzcompress as being on the agenda for 9.4, whereas changing jsonb is still within the bounds of reason. Considering all

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-08 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes: What about considering how large the object is when we are analyzing if it compresses well overall? Hmm, yeah, that's a possibility: we could redefine the limit at which we bail out in terms of a fraction of the object size instead of a fixed limit.

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-08 Thread Andrew Dunstan
On 08/08/2014 08:45 PM, Tom Lane wrote: Perhaps another options would be a new storage type which basically says just compress it, no matter what? We'd be able to make that the default for jsonb columns too, no? Meh. We could do that, but it would still require adding arguments to

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-08 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote: Stephen Frost sfr...@snowman.net writes: What about considering how large the object is when we are analyzing if it compresses well overall? Hmm, yeah, that's a possibility: we could redefine the limit at which we bail out in terms of a fraction of

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-08 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes: I agree that we need to avoid changing jsonb's on-disk representation. ... post-release, I assume you mean. Have I missed where a good suggestion has been made about how to do that which preserves the binary-search capabilities and doesn't make the

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-08 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote: Stephen Frost sfr...@snowman.net writes: I agree that we need to avoid changing jsonb's on-disk representation. ... post-release, I assume you mean. Yes. Have I missed where a good suggestion has been made about how to do that which preserves the

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-08 Thread Amit Kapila
On Sat, Aug 9, 2014 at 6:15 AM, Tom Lane t...@sss.pgh.pa.us wrote: Stephen Frost sfr...@snowman.net writes: What about considering how large the object is when we are analyzing if it compresses well overall? Hmm, yeah, that's a possibility: we could redefine the limit at which we bail out

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-08 Thread Bruce Momjian
On Fri, Aug 8, 2014 at 08:25:04PM -0400, Stephen Frost wrote: * Bruce Momjian (br...@momjian.us) wrote: On Fri, Aug 8, 2014 at 11:02:26AM -0400, Tom Lane wrote: 2. Are we going to ship 9.4 without fixing this? I definitely don't see replacing pg_lzcompress as being on the agenda for

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-08 Thread David G Johnston
akapila wrote On Sat, Aug 9, 2014 at 6:15 AM, Tom Lane lt; tgl@.pa gt; wrote: Stephen Frost lt; sfrost@ gt; writes: What about considering how large the object is when we are analyzing if it compresses well overall? Hmm, yeah, that's a possibility: we could redefine the limit at

[HACKERS] jsonb format is pessimal for toast compression

2014-08-07 Thread Tom Lane
I looked into the issue reported in bug #11109. The problem appears to be that jsonb's on-disk format is designed in such a way that the leading portion of any JSON array or object will be fairly incompressible, because it consists mostly of a strictly-increasing series of integer offsets. This

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-07 Thread Larry White
Apologies if this is a ridiculous suggestion, but I believe that swapping out the compression algorithm (for Snappy, for example) has been discussed in the past. I wonder if that algorithm is sufficiently different that it would produce a better result, and if that might not be preferable to some

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-07 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote: I looked into the issue reported in bug #11109. The problem appears to be that jsonb's on-disk format is designed in such a way that the leading portion of any JSON array or object will be fairly incompressible, because it consists mostly of a

<    1   2