Josh Berkus j...@agliodbs.com writes:
On 08/15/2014 04:19 PM, Tom Lane wrote:
Personally I'd prefer to go to the all-lengths approach, but a large
part of that comes from a subjective assessment that the hybrid approach
is too messy. Others might well disagree.
... So, that extraction test
On 08/20/2014 08:29 AM, Tom Lane wrote:
Since it's looking like this might be the direction we want to go, I took
the time to flesh out my proof-of-concept patch. The attached version
takes care of cosmetic issues (like fixing the comments), and includes
code to avoid O(N^2) penalties in
Josh Berkus j...@agliodbs.com writes:
This means we need a beta3, no?
If we change the on-disk format, I'd say so. So we don't want to wait
around too long before deciding.
regards, tom lane
On 08/20/2014 08:29 AM, Tom Lane wrote:
Josh Berkus j...@agliodbs.com writes:
On 08/15/2014 04:19 PM, Tom Lane wrote:
Personally I'd prefer to go to the all-lengths approach, but a large
part of that comes from a subjective assessment that the hybrid approach
is too messy. Others might well disagree.
What data are you using right now, Josh?
There's the github archive http://www.githubarchive.org/
Here's some sample data https://gist.github.com/igrigorik/2017462
--
Arthur Silva
On Wed, Aug 20, 2014 at 6:09 PM, Josh Berkus j...@agliodbs.com wrote:
On 08/20/2014 08:29 AM, Tom Lane wrote:
On 08/15/2014 04:19 PM, Tom Lane wrote:
Personally I'd prefer to go to the all-lengths approach, but a large
part of that comes from a subjective assessment that the hybrid approach
is too messy. Others might well disagree.
In case anyone else wants to do measurements on some more data
On Fri, Aug 15, 2014 at 8:19 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Arthur Silva arthur...@gmail.com writes:
We should add some sort of versioning to the jsonb format. This can be
explored in the future in many ways.
If we end up making an incompatible change to the jsonb format, I would
support taking the opportunity to stick a version ID in there.
On 08/14/2014 07:24 PM, Tom Lane wrote:
We can certainly reduce that. The question was whether it would be
worth the effort to try. At this point, with three different test
data sets having shown clear space savings, I think it is worth
the effort. I'll poke into it tomorrow or over the
Josh Berkus j...@agliodbs.com writes:
On 08/14/2014 07:24 PM, Tom Lane wrote:
We can certainly reduce that. The question was whether it would be
worth the effort to try. At this point, with three different test
data sets having shown clear space savings, I think it is worth
the effort.
I'm still getting up to speed on Postgres development, but I'd like to
offer an opinion.
We should add some sort of versioning to the jsonb format. This can be
explored in the future in many ways.
As for the current problem, we should explore the directory-at-the-end
option. It should improve
On 08/15/2014 01:38 PM, Tom Lane wrote:
I've been poking at this, and I think the main explanation for your result
is that with more JSONB documents being subject to compression, we're
spending more time in pglz_decompress. There's no free lunch in that
department: if you want compressed
Arthur Silva arthur...@gmail.com writes:
We should add some sort of versioning to the jsonb format. This can be
explored in the future in many ways.
If we end up making an incompatible change to the jsonb format, I would
support taking the opportunity to stick a version ID in there. But
I
On 08/14/2014 04:01 AM, Tom Lane wrote:
I wrote:
That's a fair question. I did a very very simple hack to replace the item
offsets with item lengths -- turns out that that mostly requires removing
some code that changes lengths to offsets ;-). I then loaded up Larry's
example of a
Heikki Linnakangas hlinnakan...@vmware.com writes:
For comparison, here's a patch that implements the scheme that Alexander
Korotkov suggested, where we store an offset every 8th element, and a
length in the others. It compresses Larry's example to 525 bytes.
Increasing the stride from 8 to
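To make the tradeoff concrete, here is a minimal standalone sketch of such a
hybrid layout (hypothetical names and layout choices, not the patch's actual
code): every STRIDE-th entry stores the cumulative end offset, the rest store
plain lengths, so recovering any element's start offset costs at most
STRIDE-1 additions instead of a scan of the whole array.

#include <stdint.h>
#include <stdio.h>

#define STRIDE 8                /* an absolute offset every 8th entry */
#define N      20

/* entries[i] holds the cumulative end offset of element i when
 * i % STRIDE == STRIDE-1, and element i's length otherwise. */
static uint32_t entry_start(const uint32_t *entries, int i)
{
    int anchor = (i / STRIDE) * STRIDE - 1;  /* last offset entry before i */
    uint32_t off = 0;
    int j = 0;

    if (anchor >= 0)
    {
        off = entries[anchor];
        j = anchor + 1;
    }
    while (j < i)
        off += entries[j++];                 /* at most STRIDE-1 additions */
    return off;
}

int main(void)
{
    uint32_t lengths[N], entries[N], cum = 0;

    for (int i = 0; i < N; i++)
        lengths[i] = 10 + i % 3;             /* arbitrary element sizes */
    for (int i = 0; i < N; i++)
    {
        cum += lengths[i];
        entries[i] = (i % STRIDE == STRIDE - 1) ? cum : lengths[i];
    }

    /* verify against the naive prefix sum */
    for (int i = 0, naive = 0; i < N; naive += lengths[i], i++)
        if (entry_start(entries, i) != (uint32_t) naive)
            printf("mismatch at %d\n", i);
    printf("ok\n");
    return 0;
}

Increasing the stride trades random-access cost (more additions per lookup)
for compressibility (fewer non-repeating offset words in the header).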
On Wed, Aug 13, 2014 at 09:01:43PM -0400, Tom Lane wrote:
I wrote:
That's a fair question. I did a very very simple hack to replace the item
offsets with item lengths -- turns out that that mostly requires removing
some code that changes lengths to offsets ;-). I then loaded up Larry's
Bruce Momjian br...@momjian.us writes:
Uh, can we get compression for actual documents, rather than duplicate
strings?
[ shrug... ] What's your proposed set of actual documents?
I don't think we have any corpus of JSON docs that are all large
enough to need compression.
This gets back to the
On Thu, Aug 14, 2014 at 12:22:46PM -0400, Tom Lane wrote:
Bruce Momjian br...@momjian.us writes:
Uh, can we get compression for actual documents, rather than duplicate
strings?
[ shrug... ] What's your proposed set of actual documents?
I don't think we have any corpus of JSON docs that
On Thu, Aug 14, 2014 at 11:52 AM, Bruce Momjian br...@momjian.us wrote:
On Thu, Aug 14, 2014 at 12:22:46PM -0400, Tom Lane wrote:
Bruce Momjian br...@momjian.us writes:
Uh, can we get compression for actual documents, rather than duplicate
strings?
[ shrug... ] What's your proposed set of
Bruce Momjian br...@momjian.us writes:
On Thu, Aug 14, 2014 at 12:22:46PM -0400, Tom Lane wrote:
This gets back to the problem of what test case are we going to consider
while debating what solution to adopt.
Uh, we just need one 12k JSON document from somewhere. Clearly this
is
On Thu, Aug 14, 2014 at 10:57 AM, Tom Lane t...@sss.pgh.pa.us wrote:
Maybe this is telling us it's not worth changing the representation,
and we should just go do something about the first_success_by threshold
and be done. I'm hesitant to draw such conclusions on the basis of a
single
On Thu, Aug 14, 2014 at 01:57:14PM -0400, Tom Lane wrote:
Maybe this is telling us it's not worth changing the representation,
and we should just go do something about the first_success_by threshold
and be done. I'm hesitant to draw such conclusions on the basis of a
single use-case though,
On 08/14/2014 11:13 AM, Bruce Momjian wrote:
On Thu, Aug 14, 2014 at 01:57:14PM -0400, Tom Lane wrote:
Maybe this is telling us it's not worth changing the representation,
and we should just go do something about the first_success_by threshold
and be done. I'm hesitant to draw such
I did a quick test on the same bookmarks data to compare the performance of
9.4beta2 and 9.4beta2+patch.
The query was the same one we used in the pgcon presentation:
SELECT count(*) FROM jb WHERE jb @> '{"tags":[{"term":"NYC"}]}'::jsonb;
table size | time (ms)
9.4beta2: 1374 MB |
On Thu, Aug 14, 2014 at 3:49 PM, Larry White ljw1...@gmail.com wrote:
I attached a json file of approximately 513K. It contains two repetitions of
a single json structure. The values are quasi-random. It might make a decent
test case of meaningfully sized data.
I have a 59M in plain SQL (10M
On Thu, Aug 14, 2014 at 4:24 PM, Claudio Freire klaussfre...@gmail.com wrote:
On Thu, Aug 14, 2014 at 3:49 PM, Larry White ljw1...@gmail.com wrote:
I attached a json file of approximately 513K. It contains two repetitions of
a single json structure. The values are quasi-random. It might make a
Peter Geoghegan p...@heroku.com writes:
On Thu, Aug 14, 2014 at 10:57 AM, Tom Lane t...@sss.pgh.pa.us wrote:
Maybe this is telling us it's not worth changing the representation,
and we should just go do something about the first_success_by threshold
and be done. I'm hesitant to draw such
On 15/08/14 09:47, Tom Lane wrote:
Peter Geoghegan p...@heroku.com writes:
On Thu, Aug 14, 2014 at 10:57 AM, Tom Lane t...@sss.pgh.pa.us wrote:
Maybe this is telling us it's not worth changing the representation,
and we should just go do something about the first_success_by threshold
and be
So, here's a destruction test case:
200,000 JSON values (plus 2 key columns)
Average width 4K (+/- 1K)
183 keys per JSON value
keys 10 to 30 characters
105 float values
70 integer values
8 text and date values
no nesting
The jsonic table is JSON
The
Josh Berkus j...@agliodbs.com writes:
So, here's a destruction test case:
200,000 JSON values (plus 2 key columns)
Average width 4K (+/- 1K)
183 keys per JSON value
Is that 183 keys exactly each time, or is 183 the average?
If so, what's the min/max number of keys?
I ask because 183 would be
On 08/14/2014 04:02 PM, Tom Lane wrote:
Josh Berkus j...@agliodbs.com writes:
So, here's a destruction test case:
200,000 JSON values (plus 2 key columns)
Average width 4K (+/- 1K)
183 keys per JSON value
Is that 183 keys exactly each time, or is 183 the average?
Each time exactly.
It
Josh Berkus j...@agliodbs.com writes:
On 08/14/2014 04:02 PM, Tom Lane wrote:
It would be useful to see min/max/avg of pg_column_size() in both
these cases.
Well, this is 9.4, so I can do better than that. How about quartiles?
 thetype | colsize_distribution
What's more, it looks like the jsonb data is pretty much never getting
compressed --- the min is too high for that. So I'm guessing that this
example is mostly about the first_success_by threshold preventing any
compression from happening. Please, before looking at my other patch,
try
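For readers following along, here is a simplified standalone sketch of what
that threshold does (an illustration of the heuristic, not the actual
pg_lzcompress.c code): the compressor gives up unless it finds a
back-reference within the first first_success_by input bytes, and a jsonb
datum that leads with a long run of distinct, strictly increasing offsets
trips the bail-out even when the data further in would compress fine.

#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Give up unless some 4-byte sequence repeats within the first
 * first_success_by input bytes (1024 in the default parameter set). */
static bool worth_compressing(const unsigned char *buf, size_t len,
                              size_t first_success_by)
{
    size_t limit = len < first_success_by ? len : first_success_by;

    for (size_t i = 4; i + 4 <= limit; i++)
        for (size_t j = 0; j + 4 <= i; j++)
            if (memcmp(buf + i, buf + j, 4) == 0)
                return true;    /* an early match: keep compressing */
    return false;               /* no early match: store uncompressed */
}

With an all-offsets header the first kilobyte is mostly unique byte
sequences, so a check like this fails; with all-lengths, repeats show up
almost immediately.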
On 08/14/2014 04:47 PM, Josh Berkus wrote:
 thetype |   colsize_distribution
---------+-----------------------------
 json    | {1777,1803,1890,1940,4424}
 jsonb   | {5902,5926,5978,6002,6208}
Just realized my query was counting the whole row size instead of just
the column size. Here's just
Before changing first_success_by to INT_MAX:
 thetype |   colsize_distribution
---------+-----------------------------
 json    | {1741,1767,1854,1904,2292}
 jsonb   | {3551,5866,5910,5958,6168}
After:
 thetype |   colsize_distribution
---------+-----------------------------
 json    |
Josh Berkus j...@agliodbs.com writes:
And with Tom's test patch:
...
Since that improved things a *lot*, just +40% instead of +200%, I
thought I'd test some select queries.
That test patch is not meant to be fast, its ambition was only to see
what the effects on storage size would be. So I
On Tue, Aug 12, 2014 at 8:00 PM, Bruce Momjian br...@momjian.us wrote:
On Mon, Aug 11, 2014 at 01:44:05PM -0700, Peter Geoghegan wrote:
On Mon, Aug 11, 2014 at 1:01 PM, Stephen Frost sfr...@snowman.net wrote:
We've got a clear example of someone, quite reasonably, expecting their
JSONB
Bruce Momjian br...@momjian.us writes:
Seems we have two issues:
1) the header makes testing for compression likely to fail
2) use of pointers rather than offsets reduces compression potential
I understand we are focusing on #1, but how much does compression reduce
the storage size with
I wrote:
That's a fair question. I did a very very simple hack to replace the item
offsets with item lengths -- turns out that that mostly requires removing
some code that changes lengths to offsets ;-). I then loaded up Larry's
example of a noncompressible JSON value, and compared
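As a way to see why the change helps, here is a small self-contained
illustration (assumed element sizes, not data from the actual tests):
identical-length elements make the lengths array one repeating 4-byte
pattern, while the equivalent offsets array never repeats a word, which is
exactly what a byte-oriented compressor cares about.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define N 100
#define ELEM_SIZE 20   /* pretend every element is 20 bytes long */

int main(void)
{
    uint32_t offsets[N], lengths[N];
    for (int i = 0; i < N; i++)
    {
        lengths[i] = ELEM_SIZE;
        offsets[i] = (uint32_t)(i + 1) * ELEM_SIZE;  /* strictly increasing */
    }
    /* A generic LZ compressor looks for repeated byte runs.  The lengths
     * array is one 4-byte pattern repeated N times; the offsets array
     * never repeats, so its header bytes are effectively incompressible. */
    int rep_off = 0, rep_len = 0;
    for (int i = 1; i < N; i++)
    {
        rep_off += memcmp(&offsets[i], &offsets[i - 1], 4) == 0;
        rep_len += memcmp(&lengths[i], &lengths[i - 1], 4) == 0;
    }
    printf("repeated words: offsets=%d lengths=%d\n", rep_off, rep_len);
    return 0;
}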
On 08/13/2014 09:01 PM, Tom Lane wrote:
I wrote:
That's a fair question. I did a very very simple hack to replace the item
offsets with item lengths -- turns out that that mostly requires removing
some code that changes lengths to offsets ;-). I then loaded up Larry's
example of a
Andrew Dunstan and...@dunslane.net writes:
On 08/13/2014 09:01 PM, Tom Lane wrote:
That's a fair question. I did a very very simple hack to replace the item
offsets with item lengths -- turns out that that mostly requires removing
some code that changes lengths to offsets ;-).
What does
On Fri, Aug 8, 2014 at 10:50 PM, Hannu Krosing ha...@2ndquadrant.com wrote:
How hard and how expensive would it be to teach pg_lzcompress to
apply a delta filter on suitable data?
So that instead of the integers themselves, their deltas would be fed
to the real compressor
Has anyone given this more thought?
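A minimal sketch of what such a filter could look like (hypothetical helper
names; nothing like this exists in pg_lzcompress today): delta-encode the
offset array before compression and reverse it after decompression, so a run
of equal-sized elements becomes a run of identical deltas.

#include <stdint.h>
#include <stdio.h>

/* Replace each value with its difference from the predecessor before
 * handing the bytes to the real compressor, and reverse on decompress.
 * Equal-sized elements then yield a run of identical deltas, which any
 * LZ-family compressor handles well. */
static void delta_encode(uint32_t *v, int n)
{
    for (int i = n - 1; i > 0; i--)
        v[i] -= v[i - 1];
}

static void delta_decode(uint32_t *v, int n)
{
    for (int i = 1; i < n; i++)
        v[i] += v[i - 1];
}

int main(void)
{
    uint32_t offs[] = {20, 40, 60, 80, 100, 120};
    int n = 6;

    delta_encode(offs, n);            /* -> 20 20 20 20 20 20 */
    for (int i = 0; i < n; i++) printf("%u ", offs[i]);
    printf("\n");
    delta_decode(offs, n);            /* round-trips to the original */
    for (int i = 0; i < n; i++) printf("%u ", offs[i]);
    printf("\n");
    return 0;
}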
On Mon, Aug 11, 2014 at 3:35 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Robert Haas robertmh...@gmail.com writes:
... I think it would be a better idea to arrange some method by
which JSONB (and perhaps other data types) can provide compression
hints to pglz.
I agree with that as a long-term
On Mon, Aug 11, 2014 at 01:44:05PM -0700, Peter Geoghegan wrote:
On Mon, Aug 11, 2014 at 1:01 PM, Stephen Frost sfr...@snowman.net wrote:
We've got a clear example of someone, quite reasonably, expecting their
JSONB object to be compressed using the normal TOAST mechanism, and
we're failing
On Sat, Aug 9, 2014 at 3:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Kevin Grittner kgri...@ymail.com writes:
Stephen Frost sfr...@snowman.net writes:
Trying to move the header to the end just for the sake of this
doesn't strike me as a good solution as it'll make things quite
a bit more complicated.
Robert Haas robertmh...@gmail.com writes:
... I think it would be a better idea to arrange some method by
which JSONB (and perhaps other data types) can provide compression
hints to pglz.
I agree with that as a long-term goal, but not sure if it's sane to
push into 9.4.
What we could
On Mon, Aug 11, 2014 at 12:07 PM, Robert Haas robertmh...@gmail.com wrote:
I think that's a good point.
I think that there may be something to be said for the current layout.
Having adjacent keys and values could take better advantage of CPU
cache characteristics. I've heard of approaches to
* Tom Lane (t...@sss.pgh.pa.us) wrote:
Robert Haas robertmh...@gmail.com writes:
... I think it would be a better idea to arrange some method by
which JSONB (and perhaps other data types) can provide compression
hints to pglz.
I agree with that as a long-term goal, but not sure if it's
Stephen Frost sfr...@snowman.net writes:
* Tom Lane (t...@sss.pgh.pa.us) wrote:
What we could conceivably do now is (a) add a datatype OID argument to
toast_compress_datum, and (b) hard-wire the selection of a different
compression-parameters struct if it's JSONBOID. The actual fix would
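A standalone sketch of the shape of that idea (hypothetical code, not
PostgreSQL's actual toast_compress_datum): route the type OID to a parameter
set, hard-wiring a never-bail-out strategy for jsonb.

#include <stdint.h>

typedef struct
{
    int32_t first_success_by;   /* bail out if no match by this many bytes */
} CompressionParams;

#define JSONBOID 3802           /* jsonb's type OID */

static const CompressionParams default_params = { 1024 };
static const CompressionParams jsonb_params   = { INT32_MAX }; /* never bail */

static const CompressionParams *params_for_type(uint32_t typeoid)
{
    return (typeoid == JSONBOID) ? &jsonb_params : &default_params;
}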
* Peter Geoghegan (p...@heroku.com) wrote:
If there is to be any effort to make jsonb a more effective target for
compression, I imagine that that would have to target redundancy
between JSON documents. With idiomatic usage, we can expect plenty of
it.
While I certainly agree, that's a rather
On Mon, Aug 11, 2014 at 1:01 PM, Stephen Frost sfr...@snowman.net wrote:
We've got a clear example of someone, quite reasonably, expecting their
JSONB object to be compressed using the normal TOAST mechanism, and
we're failing to do that in cases where it's actually a win to do so.
That's the
Bruce,
* Bruce Momjian (br...@momjian.us) wrote:
On Fri, Aug 8, 2014 at 08:25:04PM -0400, Stephen Frost wrote:
* Bruce Momjian (br...@momjian.us) wrote:
FYI, pg_upgrade could be taught to refuse to upgrade from earlier 9.4
betas and report the problem JSONB columns.
That is *not* a
Tom Lane t...@sss.pgh.pa.us wrote:
Stephen Frost sfr...@snowman.net writes:
Trying to move the header to the end just for the sake of this
doesn't strike me as a good solution as it'll make things quite
a bit more complicated.
Why is that? How much harder would it be to add a single offset
Kevin Grittner kgri...@ymail.com writes:
Stephen Frost sfr...@snowman.net writes:
Trying to move the header to the end just for the sake of this
doesn't strike me as a good solution as it'll make things quite
a bit more complicated.
Why is that? How much harder would it be to add a single
On Fri, Aug 8, 2014 at 10:48 AM, Stephen Frost sfr...@snowman.net wrote:
* Tom Lane (t...@sss.pgh.pa.us) wrote:
I looked into the issue reported in bug #11109. The problem appears to
be
that jsonb's on-disk format is designed in such a way that the leading
portion of any JSON array or
On 08/07/2014 11:17 PM, Tom Lane wrote:
I looked into the issue reported in bug #11109. The problem appears to be
that jsonb's on-disk format is designed in such a way that the leading
portion of any JSON array or object will be fairly incompressible, because
it consists mostly of a
Stephen Frost sfr...@snowman.net writes:
* Tom Lane (t...@sss.pgh.pa.us) wrote:
I looked into the issue reported in bug #11109. The problem appears to be
that jsonb's on-disk format is designed in such a way that the leading
portion of any JSON array or object will be fairly incompressible,
Andrew Dunstan and...@dunslane.net writes:
On 08/07/2014 11:17 PM, Tom Lane wrote:
I looked into the issue reported in bug #11109. The problem appears to be
that jsonb's on-disk format is designed in such a way that the leading
portion of any JSON array or object will be fairly
On 08/08/2014 11:18 AM, Tom Lane wrote:
Andrew Dunstan and...@dunslane.net writes:
On 08/07/2014 11:17 PM, Tom Lane wrote:
I looked into the issue reported in bug #11109. The problem appears to be
that jsonb's on-disk format is designed in such a way that the leading
portion of any JSON
Andrew Dunstan and...@dunslane.net writes:
On 08/08/2014 11:18 AM, Tom Lane wrote:
That's not really the issue here, I think. The problem is that a
relatively minor aspect of the representation, namely the choice to store
a series of offsets rather than a series of lengths, produces
On Fri, Aug 8, 2014 at 8:02 AM, Tom Lane t...@sss.pgh.pa.us wrote:
Stephen Frost sfr...@snowman.net writes:
* Tom Lane (t...@sss.pgh.pa.us) wrote:
I'm rather disinclined to change the on-disk format because of this
specific test, that feels a bit like the tail wagging the dog to me,
John W Higgins wish...@gmail.com writes:
Would an answer be to switch the location of the jsonb header data to the
end of the field as opposed to the beginning of the field? That would allow
pglz to see what it wants to see early on and go to work when possible?
Hm, might work. Seems a bit
On 08/08/2014 12:04 PM, John W Higgins wrote:
Would an answer be to switch the location of the jsonb header data
to the end of the field as opposed to the beginning of the field? That
would allow pglz to see what it wants to see early on and go to work
when possible?
Add an offset at the
On 08/08/2014 11:54 AM, Tom Lane wrote:
Andrew Dunstan and...@dunslane.net writes:
On 08/08/2014 11:18 AM, Tom Lane wrote:
That's not really the issue here, I think. The problem is that a
relatively minor aspect of the representation, namely the choice to store
a series of offsets rather
value-to-be-compressed. (first_success_by is 1024 in the default set of
compression parameters.)
Curious idea: we could swap the JEntry array and the values: values at the
beginning of the datum will be caught by pg_lzcompress. But we will need to
know the offset of the JEntry array, so the header will grow to 8 bytes
Curious idea: we could swap the JEntry array and the values: values at the
beginning of the datum will be caught by pg_lzcompress. But we will need to
know the offset of the JEntry array, so the header will grow to 8 bytes
(actually, it will be a varlena header!)
Maybe I wasn't clear: the jsonb type will start from
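In other words (a hypothetical layout sketch, using C struct notation for
the on-disk picture): the values come first so pg_lzcompress sees
compressible bytes early, and one extra 4-byte field, the second half of the
8-byte header, says where the relocated JEntry array starts.

#include <stdint.h>

/* Hypothetical swapped jsonb layout per Teodor's suggestion: */
typedef struct
{
    uint32_t vl_len_;        /* varlena length word, as in any TOASTed datum */
    uint32_t jentry_offset;  /* where the JEntry array begins (the extra
                              * 4 bytes that grow the header to 8) */
    /* unsigned char values[];   element data, placed first              */
    /* uint32_t      entries[];  offsets/lengths array, moved to the end */
} JsonbSwapped;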
On Fri, Aug 8, 2014 at 11:02:26AM -0400, Tom Lane wrote:
2. Are we going to ship 9.4 without fixing this? I definitely don't see
replacing pg_lzcompress as being on the agenda for 9.4, whereas changing
jsonb is still within the bounds of reason.
FYI, pg_upgrade could be taught to refuse to
On 08/08/2014 08:02 AM, Tom Lane wrote:
2. Are we going to ship 9.4 without fixing this? I definitely don't see
replacing pg_lzcompress as being on the agenda for 9.4, whereas changing
jsonb is still within the bounds of reason.
Considering all the hype that's built up around jsonb,
On Fri, Aug 8, 2014 at 9:14 PM, Teodor Sigaev teo...@sigaev.ru wrote:
Curious idea: we could swap the JEntry array and the values: values at the
beginning of the datum will be caught by pg_lzcompress. But we will need to
know the offset of the JEntry array, so the header will grow to 8 bytes
(actually, it will be a varlena header!)
On Fri, Aug 8, 2014 at 7:35 PM, Andrew Dunstan and...@dunslane.net wrote:
I took a quick look and saw that this wouldn't be that easy to get around.
As I'd suspected upthread, there are places that do random access into a
JEntry array, such as the binary search in findJsonbValueFromContainer().
On 08/08/2014 06:17 AM, Tom Lane wrote:
I looked into the issue reported in bug #11109. The problem appears to be
that jsonb's on-disk format is designed in such a way that the leading
portion of any JSON array or object will be fairly incompressible, because
it consists mostly of a
On Fri, Aug 8, 2014 at 12:41 PM, Ants Aasma a...@cybertec.at wrote:
I don't think binary search is the main problem here. Objects are
usually reasonably sized, while arrays are more likely to be huge. To
make matters worse, jsonb -> int goes from O(1) to O(n).
I don't think it's true that
On Fri, Aug 8, 2014 at 12:06 PM, Josh Berkus j...@agliodbs.com wrote:
Once we ship 9.4, many users are going to load hundreds of GB into JSONB
fields. Even if we fix the compressibility issue in 9.5, those users
won't be able to fix the compression without rewriting all their data,
which could be
I was not complaining; I think JSONB is awesome.
But I am one of those people who would like to put hundreds of GB (or more)
of JSON files into Postgres, and I am concerned about file size and possible
future changes to the format.
On Fri, Aug 8, 2014 at 7:10 PM, Peter Geoghegan p...@heroku.com wrote:
* Tom Lane (t...@sss.pgh.pa.us) wrote:
Stephen Frost sfr...@snowman.net writes:
* Tom Lane (t...@sss.pgh.pa.us) wrote:
I looked into the issue reported in bug #11109. The problem appears to be
that jsonb's on-disk format is designed in such a way that the leading
portion of any JSON
* Bruce Momjian (br...@momjian.us) wrote:
On Fri, Aug 8, 2014 at 11:02:26AM -0400, Tom Lane wrote:
2. Are we going to ship 9.4 without fixing this? I definitely don't see
replacing pg_lzcompress as being on the agenda for 9.4, whereas changing
jsonb is still within the bounds of reason.
* Josh Berkus (j...@agliodbs.com) wrote:
On 08/08/2014 08:02 AM, Tom Lane wrote:
2. Are we going to ship 9.4 without fixing this? I definitely don't see
replacing pg_lzcompress as being on the agenda for 9.4, whereas changing
jsonb is still within the bounds of reason.
Considering all
Stephen Frost sfr...@snowman.net writes:
What about considering how large the object is when we are analyzing if
it compresses well overall?
Hmm, yeah, that's a possibility: we could redefine the limit at which
we bail out in terms of a fraction of the object size instead of a fixed
limit.
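A tiny sketch of that redefinition (the 1/8 fraction is an assumption for
illustration, not a number from the thread): scale the bail-out point with
the input instead of fixing it at 1024 bytes.

/* Hypothetical: let the bail-out threshold track the datum size. */
static int adaptive_first_success_by(int input_len)
{
    int scaled = input_len / 8;             /* assumed fraction: 1/8 */
    return scaled > 1024 ? scaled : 1024;   /* keep today's floor    */
}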
On 08/08/2014 08:45 PM, Tom Lane wrote:
Perhaps another option would be a new storage type which basically says
just compress it, no matter what? We'd be able to make that the
default for jsonb columns too, no?
Meh. We could do that, but it would still require adding arguments to
* Tom Lane (t...@sss.pgh.pa.us) wrote:
Stephen Frost sfr...@snowman.net writes:
What about considering how large the object is when we are analyzing if
it compresses well overall?
Hmm, yeah, that's a possibility: we could redefine the limit at which
we bail out in terms of a fraction of
Stephen Frost sfr...@snowman.net writes:
I agree that we need to avoid changing jsonb's on-disk representation.
... post-release, I assume you mean.
Have I missed where a good suggestion has been made about how to do that
which preserves the binary-search capabilities and doesn't make the
* Tom Lane (t...@sss.pgh.pa.us) wrote:
Stephen Frost sfr...@snowman.net writes:
I agree that we need to avoid changing jsonb's on-disk representation.
... post-release, I assume you mean.
Yes.
Have I missed where a good suggestion has been made about how to do that
which preserves the
On Sat, Aug 9, 2014 at 6:15 AM, Tom Lane t...@sss.pgh.pa.us wrote:
Stephen Frost sfr...@snowman.net writes:
What about considering how large the object is when we are analyzing if
it compresses well overall?
Hmm, yeah, that's a possibility: we could redefine the limit at which
we bail out
On Fri, Aug 8, 2014 at 08:25:04PM -0400, Stephen Frost wrote:
* Bruce Momjian (br...@momjian.us) wrote:
On Fri, Aug 8, 2014 at 11:02:26AM -0400, Tom Lane wrote:
2. Are we going to ship 9.4 without fixing this? I definitely don't see
replacing pg_lzcompress as being on the agenda for
akapila wrote:
On Sat, Aug 9, 2014 at 6:15 AM, Tom Lane <tgl@...> wrote:
Stephen Frost <sfrost@...> writes:
What about considering how large the object is when we are analyzing if
it compresses well overall?
Hmm, yeah, that's a possibility: we could redefine the limit at
I looked into the issue reported in bug #11109. The problem appears to be
that jsonb's on-disk format is designed in such a way that the leading
portion of any JSON array or object will be fairly incompressible, because
it consists mostly of a strictly-increasing series of integer offsets.
This
Apologies if this is a ridiculous suggestion, but I believe that swapping
out the compression algorithm (for Snappy, for example) has been discussed
in the past. I wonder if that algorithm is sufficiently different that it
would produce a better result, and if that might not be preferable to some
* Tom Lane (t...@sss.pgh.pa.us) wrote:
I looked into the issue reported in bug #11109. The problem appears to be
that jsonb's on-disk format is designed in such a way that the leading
portion of any JSON array or object will be fairly incompressible, because
it consists mostly of a