Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-29 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 On Thu, Sep 25, 2014 at 02:39:37PM -0400, Tom Lane wrote:
 BTW, it seems like there is consensus that we ought to reorder the items
 in a jsonb object to have keys first and then values, independently of the
 other issues under discussion.  This means we *will* be breaking on-disk
 compatibility with 9.4beta2, which means pg_upgrade will need to be taught
 to refuse an upgrade if the database contains any jsonb columns.  Bruce,
 do you have time to crank out a patch for that?

 Yes, I can do that easily.  Tell me when you want it --- I just need a
 catalog version number to trigger on.

Done --- 201409291 is the cutover point.

regards, tom lane




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-29 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Bruce Momjian br...@momjian.us writes:
  On Thu, Sep 25, 2014 at 02:39:37PM -0400, Tom Lane wrote:
  BTW, it seems like there is consensus that we ought to reorder the items
  in a jsonb object to have keys first and then values, independently of the
  other issues under discussion.  This means we *will* be breaking on-disk
  compatibility with 9.4beta2, which means pg_upgrade will need to be taught
  to refuse an upgrade if the database contains any jsonb columns.  Bruce,
  do you have time to crank out a patch for that?
 
  Yes, I can do that easily.  Tell me when you want it --- I just need a
  catalog version number to trigger on.
 
 Done --- 201409291 is the cutover point.

Just to clarify- the commit bumped the catversion to 201409292, so
version <= 201409291 has the old format while version > 201409291 has
the new format.  There was no 201409291, so I suppose it doesn't matter
too much, but technically 'version = 201409291' wouldn't be accurate.
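
In pg_upgrade terms the test presumably comes out like this (a sketch
only; JSONB_FORMAT_CHANGE_CAT_VER being the constant Bruce's patch
defines for the cutover value):

    /* old-format jsonb iff the 9.4 cluster's catversion predates the bump */
    #define JSONB_FORMAT_CHANGE_CAT_VER 201409291

    if (GET_MAJOR_VERSION(old_cluster.major_version) == 904 &&
        old_cluster.controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
        check_for_jsonb_9_4_usage(&old_cluster);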

I'm guessing this all makes sense for how pg_upgrade works, but I found
it a bit surprising that the version mentioned as the cutover point
wasn't the catversion committed.

Thanks,

Stephen




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-29 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Stephen Frost sfr...@snowman.net writes:
  * Tom Lane (t...@sss.pgh.pa.us) wrote:
  Done --- 201409291 is the cutover point.
 
  Just to clarify- the commit bumped the catversion to 201409292, so
  version <= 201409291 has the old format while version > 201409291 has
  the new format.  There was no 201409291, so I suppose it doesn't matter
  too much, but technically 'version = 201409291' wouldn't be accurate.
 
 Nope.  See my response to Andrew: ...1 is the cutover commit Bruce
 should use, because that's what it is in 9.4.

Yup, makes sense.

Thanks!

Stephen




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-29 Thread Josh Berkus
On 09/29/2014 11:49 AM, Arthur Silva wrote:
 What's the call on the stride length? Are we going to keep it hardcoded?

Please, yes.  The complications caused by a variable stride length would
be horrible.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-29 Thread Tom Lane
Arthur Silva arthur...@gmail.com writes:
 What's the call on the stride length? Are we going to keep it hardcoded?

At the moment it's 32, but we could change it without forcing a new
initdb.  I ran a simple test that seemed to show 32 was a good choice,
but if anyone else wants to try other cases, go for it.
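
To illustrate why the stride needn't be baked into the on-disk format:
readers reconstruct each offset from the per-entry flag rather than
from a constant.  A minimal sketch, modeled on the patch's
getJsonbOffset() (names follow the patch; uint32 as in PostgreSQL's
c.h):

    typedef uint32 JEntry;

    #define JENTRY_OFFLENMASK 0x0FFFFFFF        /* low 28 bits: len or off */
    #define JENTRY_HAS_OFF    0x80000000        /* high bit: entry is an offset */
    #define JBE_OFFLENFLD(je) ((je) & JENTRY_OFFLENMASK)
    #define JBE_HAS_OFF(je)   (((je) & JENTRY_HAS_OFF) != 0)

    /* Walk back from child 'index', summing lengths until an entry that
     * carries a stored offset is found.  The reader never needs to know
     * the stride itself, so it can change without an initdb. */
    static uint32
    getChildOffset(const JEntry *children, int index)
    {
        uint32      offset = 0;
        int         i;

        for (i = index - 1; i >= 0; i--)
        {
            offset += JBE_OFFLENFLD(children[i]);
            if (JBE_HAS_OFF(children[i]))
                break;
        }
        return offset;
    }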

regards, tom lane




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-29 Thread Bruce Momjian
On Mon, Sep 29, 2014 at 12:30:40PM -0400, Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  On Thu, Sep 25, 2014 at 02:39:37PM -0400, Tom Lane wrote:
  BTW, it seems like there is consensus that we ought to reorder the items
  in a jsonb object to have keys first and then values, independently of the
  other issues under discussion.  This means we *will* be breaking on-disk
  compatibility with 9.4beta2, which means pg_upgrade will need to be taught
  to refuse an upgrade if the database contains any jsonb columns.  Bruce,
  do you have time to crank out a patch for that?
 
  Yes, I can do that easily.  Tell me when you want it --- I just need a
  catalog version number to trigger on.
 
 Done --- 201409291 is the cutover point.

Attached patch applied to head, and backpatched to 9.4.  I think we need
to keep this in all future pg_upgrade versions in case someone from the
beta tries to jump versions, e.g. 9.4 beta1 to 9.5.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
new file mode 100644
index 88fe12d..bbfcab7
*** a/contrib/pg_upgrade/check.c
--- b/contrib/pg_upgrade/check.c
*** static void check_is_install_user(Cluste
*** 24,29 
--- 24,30 
  static void check_for_prepared_transactions(ClusterInfo *cluster);
  static void check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster);
  static void check_for_reg_data_type_usage(ClusterInfo *cluster);
+ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
  static void get_bin_version(ClusterInfo *cluster);
  static char *get_canonical_locale_name(int category, const char *locale);
  
*** check_and_dump_old_cluster(bool live_che
*** 99,104 
--- 100,108 
  	check_for_prepared_transactions(&old_cluster);
  	check_for_reg_data_type_usage(&old_cluster);
  	check_for_isn_and_int8_passing_mismatch(&old_cluster);
+ 	if (GET_MAJOR_VERSION(old_cluster.major_version) == 904 &&
+ 		old_cluster.controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
+ 		check_for_jsonb_9_4_usage(&old_cluster);
  
  	/* Pre-PG 9.4 had a different 'line' data type internal format */
  	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 903)
*** check_for_reg_data_type_usage(ClusterInf
*** 911,916 
--- 915,1010 
   "    %s\n\n", output_path);
  	}
  	else
+ 		check_ok();
+ }
+ 
+ 
+ /*
+  * check_for_jsonb_9_4_usage()
+  *
+  *	JSONB changed its storage format during 9.4 beta, so check for it.
+  */
+ static void
+ check_for_jsonb_9_4_usage(ClusterInfo *cluster)
+ {
+ 	int			dbnum;
+ 	FILE	   *script = NULL;
+ 	bool		found = false;
+ 	char		output_path[MAXPGPATH];
+ 
+ 	prep_status("Checking for JSONB user data types");
+ 
+ 	snprintf(output_path, sizeof(output_path), "tables_using_jsonb.txt");
+ 
+ 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+ 	{
+ 		PGresult   *res;
+ 		bool		db_used = false;
+ 		int			ntups;
+ 		int			rowno;
+ 		int			i_nspname,
+ 	i_relname,
+ 	i_attname;
+ 		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
+ 		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+ 
+ 		/*
+ 		 * While several relkinds don't store any data, e.g. views, they can
+ 		 * be used to define data types of other columns, so we check all
+ 		 * relkinds.
+ 		 */
+ 		res = executeQueryOrDie(conn,
+ "SELECT n.nspname, c.relname, a.attname "
+ "FROM	pg_catalog.pg_class c, "
+ "		pg_catalog.pg_namespace n, "
+ "		pg_catalog.pg_attribute a "
+ "WHERE	c.oid = a.attrelid AND "
+ "		NOT a.attisdropped AND "
+ "		a.atttypid = 'pg_catalog.jsonb'::pg_catalog.regtype AND "
+ "		c.relnamespace = n.oid AND "
+ 		/* exclude possible orphaned temp tables */
+ "  		n.nspname !~ '^pg_temp_' AND "
+ "			  		n.nspname NOT IN ('pg_catalog', 'information_schema')");
+ 
+ 		ntups = PQntuples(res);
+ 		i_nspname = PQfnumber(res, "nspname");
+ 		i_relname = PQfnumber(res, "relname");
+ 		i_attname = PQfnumber(res, "attname");
+ 		for (rowno = 0; rowno < ntups; rowno++)
+ 		{
+ 			found = true;
+ 			if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
+ pg_fatal("Could not open file \"%s\": %s\n",
+ 		 output_path, getErrorText(errno));
+ 			if (!db_used)
+ 			{
+ fprintf(script, "Database: %s\n", active_db->db_name);
+ db_used = true;
+ 			}
+ 			fprintf(script, "  %s.%s.%s\n",
+ 	PQgetvalue(res, rowno, i_nspname),
+ 	PQgetvalue(res, rowno, i_relname),
+ 	PQgetvalue(res, rowno, i_attname));
+ 		}
+ 
+ 		PQclear(res);
+ 
+ 		PQfinish(conn);
+ 	}
+ 
+ 	if (script)
+ 		fclose(script);
+ 
+ 	if (found)
+ 	{
+ 		pg_log(PG_REPORT, "fatal\n");
+ 		pg_fatal("Your installation contains one of the JSONB data types in user tables.\n"
+ 		 "The internal format of JSONB changed during 9.4 beta so this cluster cannot currently\n"
+  "be upgraded.  You can remove the problem tables 

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-28 Thread Josh Berkus
On 09/26/2014 06:20 PM, Josh Berkus wrote:
 Overall, I'm satisfied with the performance of the length-and-offset
 patch.

Oh, also ... no bugs found.

So, can we get Beta3 out now?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-28 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 So, can we get Beta3 out now?

If nobody else steps up and says they want to do some performance
testing, I'll push the latest lengths+offsets patch tomorrow.

Are any of the other open items listed at
https://wiki.postgresql.org/wiki/PostgreSQL_9.4_Open_Items
things that we must-fix-before-beta3?

regards, tom lane




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-26 Thread Josh Berkus
All,

So these results have become a bit complex.  So spreadsheet time.

https://docs.google.com/spreadsheets/d/1Mokpx3EqlbWlFDIkF9qzpM7NneN9z-QOXWSzws3E-R4

Some details:

The Length-and-Offset test was performed using a more recent 9.4
checkout than the other two tests.  This was regrettable (the result of
a mistake with git), since the results tell me that there have been some
other changes in the meantime.

I added two new datasets:

errlog2 is a simple, 4-column error log in JSON format, with 2 small
values and 2 large values in each datum.  It was there to check if any
of our changes affected the performance or size of such simple
structures (answer: no).

processed_b is a synthetic version of Mozilla Socorro's crash dumps,
about 900,000 of them, with nearly identical JSON on each row. These are
large json values (around 4KB each) with a broad mix of values and 5
levels of nesting.  However, none of the levels have very many keys per
level; the max is that the top level has up to 40 keys.  Unlike the
other data sets, I can provide a copy of processed_b for the asking.

So, some observations:

* Data sizes with lengths-and-offsets are slightly (3%) larger than
all-lengths for the pathological case (jsonbish) and unaffected for
other cases.

* Even large, complex JSON (processed_b) gets better compression with
the two patches than with head, although only slightly better (16%)

* This better compression for processed_b leads to slightly slower
extraction (6-7%), and surprisingly slower extraction for
length-and-offset than for all-lengths (about 2%).

* in the pathological case, length-and-offset was notably faster on Q1
than all-lengths (24%), and somewhat slower on Q2 (8%).  I think this
shows me that I don't understand which JSON keys are at the end.

* notably, length-and-offset when uncompressed (EXTERNAL) was faster on
Q1 than head!  This was surprising enough that I retested it.

Overall, I'm satisfied with the performance of the length-and-offset
patch.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-25 Thread Andres Freund
On 2014-09-19 15:40:14 +0300, Heikki Linnakangas wrote:
 On 09/18/2014 09:27 PM, Heikki Linnakangas wrote:
 I'll try to write a more polished patch tomorrow. We'll then see what it
 looks like, and can decide if we want it.
 
 Ok, here are two patches. One is a refined version of my earlier patch, and
 the other implements the separate offsets array approach. They are both
 based on Tom's jsonb-lengths-merged.patch, so they include all the
 whitespace fixes etc. he mentioned.
 
 There is no big difference in terms of code complexity between the patches.
 IMHO the separate offsets array is easier to understand, but it makes for
 more complicated accessor macros to find the beginning of the
 variable-length data.

I personally am pretty clearly in favor of Heikki's version. I think it
could stand to slightly expand the reasoning behind the mixed
length/offset format; it's not immediately obvious why the offsets are
problematic for compression.  (The short version: within a datum the
stored end offsets form an ever-increasing sequence, so the entry array
contains few repeated byte patterns for pglz to exploit, whereas the
lengths of similar values repeat constantly and compress well.)
Otherwise, based on a cursory look, it looks good.

But independent of which version is chosen, we *REALLY* need to make the
decision soon. This issue has held up the next beta (like jsonb has
blocked previous beta) for *weeks*.

Personally it doesn't make me very happy that Heikki and Tom had to be
the people stepping up to fix this.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-25 Thread Josh Berkus
On 09/25/2014 09:01 AM, Andres Freund wrote:
 But independent of which version is chosen, we *REALLY* need to make the
 decision soon. This issue has held up the next beta (like jsonb has
 blocked previous beta) for *weeks*.

Yes, please!

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-25 Thread Bruce Momjian
On Thu, Sep 25, 2014 at 06:01:08PM +0200, Andres Freund wrote:
 But independent of which version is chosen, we *REALLY* need to make the
 decision soon. This issue has held up the next beta (like jsonb has
 blocked previous beta) for *weeks*.
 
 Personally it doesn't make me very happy that Heikki and Tom had to be
 the people stepping up to fix this.

I think there are a few reasons this has been delayed, aside from the
scheduling ones:

1.  compression issues were a surprise, and we are wondering if
there are any other surprises
2.  pg_upgrade makes future data format changes problematic
3.  9.3 multi-xact bugs spooked us into being more careful

I am not sure what we can do to increase our speed based on these items.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-25 Thread Josh Berkus
On 09/25/2014 10:14 AM, Bruce Momjian wrote:
 On Thu, Sep 25, 2014 at 06:01:08PM +0200, Andres Freund wrote:
 But independent of which version is chosen, we *REALLY* need to make the
 decision soon. This issue has held up the next beta (like jsonb has
 blocked previous beta) for *weeks*.

 Personally it doesn't make me very happy that Heikki and Tom had to be
 the people stepping up to fix this.
 
 I think there are a few reasons this has been delayed, aside from the
 scheduling ones:
 
   1.  compression issues were a surprise, and we are wondering if
   there are any other surprises
   2.  pg_upgrade makes future data format changes problematic
   3.  9.3 multi-xact bugs spooked us into being more careful
 
 I am not sure what we can do to increase our speed based on these items.

Alternately, this is delayed because:

1. We have one tested patch to fix the issue.

2. However, people are convinced that there's a better patch possible.

3. But nobody is working on this better patch except in their spare time.

Given this, I once again vote for releasing based on Tom's lengths-only
patch, which is done, tested, and ready to go.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-25 Thread Andres Freund
On 2014-09-25 10:18:24 -0700, Josh Berkus wrote:
 On 09/25/2014 10:14 AM, Bruce Momjian wrote:
  On Thu, Sep 25, 2014 at 06:01:08PM +0200, Andres Freund wrote:
  But independent of which version is chosen, we *REALLY* need to make the
  decision soon. This issue has held up the next beta (like jsonb has
  blocked previous beta) for *weeks*.
 
  Personally it doesn't make me very happy that Heikki and Tom had to be
  the people stepping up to fix this.
  
  I think there are a few reasons this has been delayed, aside from the
  scheduling ones:
  
  1.  compression issues were a surprise, and we are wondering if
  there are any other surprises
  2.  pg_upgrade makes future data format changes problematic
  3.  9.3 multi-xact bugs spooked us into being more careful
  
  I am not sure what we can do to increase our speed based on these items.
 
 Alternately, this is delayed because:
 
 1. We have one tested patch to fix the issue.
 
 2. However, people are convinced that there's a better patch possible.
 
 3. But nobody is working on this better patch except in their spare time.
 
 Given this, I once again vote for releasing based on Tom's lengths-only
 patch, which is done, tested, and ready to go.

Heikki's patch is there and polished.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-25 Thread Josh Berkus
On 09/25/2014 10:20 AM, Andres Freund wrote:
 On 2014-09-25 10:18:24 -0700, Josh Berkus wrote:
 On 09/25/2014 10:14 AM, Bruce Momjian wrote:
 On Thu, Sep 25, 2014 at 06:01:08PM +0200, Andres Freund wrote:
 But independent of which version is chosen, we *REALLY* need to make the
 decision soon. This issue has held up the next beta (like jsonb has
 blocked previous beta) for *weeks*.

 Personally it doesn't make me very happy that Heikki and Tom had to be
 the people stepping up to fix this.

 I think there are a few reasons this has been delayed, aside from the
 scheduling ones:

 1.  compression issues were a surprise, and we are wondering if
 there are any other surprises
 2.  pg_upgrade makes future data format changes problematic
 3.  9.3 multi-xact bugs spooked us into being more careful

 I am not sure what we can do to increase our speed based on these items.

 Alternately, this is delayed because:

 1. We have one tested patch to fix the issue.

 2. However, people are convinced that there's a better patch possible.

 3. But nobody is working on this better patch except in their spare time.

 Given this, I once again vote for releasing based on Tom's lengths-only
 patch, which is done, tested, and ready to go.
 
 Heikki's patch is there and polished.

If Heikki says it's ready, I'll test.  So far he's said that it wasn't
done yet.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-25 Thread Josh Berkus
On 09/25/2014 10:26 AM, Andres Freund wrote:
 On 2014-09-25 10:25:24 -0700, Josh Berkus wrote:
 If Heikki says it's ready, I'll test.  So far he's said that it wasn't
 done yet.
 
 http://www.postgresql.org/message-id/541c242e.3030...@vmware.com

Yeah, and that didn't include some of Tom's bug fixes apparently, per
the succeeding message.  Which is why I asked Heikki if he was done, to
which he has not replied.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-25 Thread Andres Freund
On 2014-09-25 10:29:51 -0700, Josh Berkus wrote:
 On 09/25/2014 10:26 AM, Andres Freund wrote:
  On 2014-09-25 10:25:24 -0700, Josh Berkus wrote:
  If Heikki says it's ready, I'll test.  So far he's said that it wasn't
  done yet.
  
  http://www.postgresql.org/message-id/541c242e.3030...@vmware.com
 
 Yeah, and that didn't include some of Tom's bug fixes apparently, per
 the succeeding message.  Which is why I asked Heikki if he was done, to
 which he has not replied.

Well, Heikki said he doesn't see any fixes in Tom's patch. But either
way, this isn't anything that should prevent you from testing.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-25 Thread Andres Freund
On 2014-09-25 10:25:24 -0700, Josh Berkus wrote:
 If Heikki says it's ready, I'll test.  So far he's said that it wasn't
 done yet.

http://www.postgresql.org/message-id/541c242e.3030...@vmware.com

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-25 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 On 09/25/2014 10:26 AM, Andres Freund wrote:
 On 2014-09-25 10:25:24 -0700, Josh Berkus wrote:
 If Heikki says it's ready, I'll test.  So far he's said that it wasn't
 done yet.

 http://www.postgresql.org/message-id/541c242e.3030...@vmware.com

 Yeah, and that didn't include some of Tom's bug fixes apparently, per
 the succeeding message.  Which is why I asked Heikki if he was done, to
 which he has not replied.

I took a quick look at the two patches Heikki posted.  I find the
separate offsets array approach unappealing.  It takes more space
than the other approaches, and that space will be filled with data
that we already know will not be at all compressible.  Moreover,
AFAICS we'd have to engrave the stride on stone tablets, which as
I already mentioned I'd really like to not do.

The offsets-and-lengths patch seems like the approach we ought to
compare to my patch, but it looks pretty unfinished to me: AFAICS it
includes logic to understand offsets sprinkled into a mostly-lengths
array, but no logic that would actually *store* any such offsets,
which means it's going to act just like my patch for performance
purposes.

In the interests of pushing this forward, I will work today on
trying to finish and review Heikki's offsets-and-lengths patch
so that we have something we can do performance testing on.
I doubt that the performance testing will tell us anything we
don't expect, but we should do it anyway.

regards, tom lane




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-25 Thread Josh Berkus
On 09/25/2014 11:22 AM, Tom Lane wrote:
 In the interests of pushing this forward, I will work today on
 trying to finish and review Heikki's offsets-and-lengths patch
 so that we have something we can do performance testing on.
 I doubt that the performance testing will tell us anything we
 don't expect, but we should do it anyway.

OK.  I'll spend some time trying to get Socorro with JSONB working so
that I'll have a second test case.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-25 Thread Tom Lane
BTW, it seems like there is consensus that we ought to reorder the items
in a jsonb object to have keys first and then values, independently of the
other issues under discussion.  This means we *will* be breaking on-disk
compatibility with 9.4beta2, which means pg_upgrade will need to be taught
to refuse an upgrade if the database contains any jsonb columns.  Bruce,
do you have time to crank out a patch for that?

regards, tom lane




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-25 Thread Bruce Momjian
On Thu, Sep 25, 2014 at 02:39:37PM -0400, Tom Lane wrote:
 BTW, it seems like there is consensus that we ought to reorder the items
 in a jsonb object to have keys first and then values, independently of the
 other issues under discussion.  This means we *will* be breaking on-disk
 compatibility with 9.4beta2, which means pg_upgrade will need to be taught
 to refuse an upgrade if the database contains any jsonb columns.  Bruce,
 do you have time to crank out a patch for that?

Yes, I can do that easily.  Tell me when you want it --- I just need a
catalog version number to trigger on.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-25 Thread Andres Freund
On 2014-09-25 14:46:18 -0400, Bruce Momjian wrote:
 On Thu, Sep 25, 2014 at 02:39:37PM -0400, Tom Lane wrote:
  BTW, it seems like there is consensus that we ought to reorder the items
  in a jsonb object to have keys first and then values, independently of the
  other issues under discussion.  This means we *will* be breaking on-disk
  compatibility with 9.4beta2, which means pg_upgrade will need to be taught
  to refuse an upgrade if the database contains any jsonb columns.  Bruce,
  do you have time to crank out a patch for that?
 
 Yes, I can do that easily.  Tell me when you want it --- I just need a
 catalog version number to trigger on.

Do you plan to make it conditional on jsonb being used in the database?
That'd not be bad to reduce the pain for testers that haven't used jsonb.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-25 Thread Bruce Momjian
On Thu, Sep 25, 2014 at 09:00:07PM +0200, Andres Freund wrote:
 On 2014-09-25 14:46:18 -0400, Bruce Momjian wrote:
  On Thu, Sep 25, 2014 at 02:39:37PM -0400, Tom Lane wrote:
   BTW, it seems like there is consensus that we ought to reorder the items
   in a jsonb object to have keys first and then values, independently of the
   other issues under discussion.  This means we *will* be breaking on-disk
   compatibility with 9.4beta2, which means pg_upgrade will need to be taught
   to refuse an upgrade if the database contains any jsonb columns.  Bruce,
   do you have time to crank out a patch for that?
  
  Yes, I can do that easily.  Tell me when you want it --- I just need a
  catalog version number to trigger on.
 
 Do you plan to make it conditional on jsonb being used in the database?
 That'd not be bad to reduce the pain for testers that haven't used jsonb.

Yes, I already have code that scans pg_attribute looking for columns
with problematic data types and output them to a file, and then throw an
error.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-25 Thread Alvaro Herrera
Bruce Momjian wrote:

   3.  9.3 multi-xact bugs spooked us into being more careful

Uh.  Multixact changes in 9.3 were infinitely more invasive than the
jsonb changes will ever be.  a) they touched basic visibility design and
routines, which are complex, understood by very few people, and have remained
which are complex, understood by very few people, and have remained
mostly unchanged for ages; b) they changed on-disk format for an
underlying support structure, requiring pg_upgrade to handle the
conversion; c) they added new catalog infrastructure to keep track of
required freezing; d) they introduced new uint32 counters subject to
wraparound; e) they introduced a novel user of slru.c with 5-char long
filenames; f) they messed with tuple locking protocol and EvalPlanQual
logic for traversing update chains.  Maybe I'm forgetting others.

JSONB has none of these properties.  As far as I can see, the only hairy
issue here (other than getting Josh Berkus to actually test the proposed
patches) is that JSONB is changing on-disk format; but we're avoiding
most issues there by dictating that people with existing JSONB databases
need to pg_dump them, i.e. there is no conversion step being written for
pg_upgrade.

It's good to be careful; it's even better to be more careful.  I too
have learned a lesson there.

Anyway I have no opinion on the JSONB stuff, other than considering that
ignoring performance for large arrays and large objects seems to run
counter to the whole point of JSONB in the first place (and of course
failing to compress is part of that, too.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-25 Thread Tom Lane
I wrote:
 The offsets-and-lengths patch seems like the approach we ought to
 compare to my patch, but it looks pretty unfinished to me: AFAICS it
 includes logic to understand offsets sprinkled into a mostly-lengths
 array, but no logic that would actually *store* any such offsets,
 which means it's going to act just like my patch for performance
 purposes.

 In the interests of pushing this forward, I will work today on
 trying to finish and review Heikki's offsets-and-lengths patch
 so that we have something we can do performance testing on.
 I doubt that the performance testing will tell us anything we
 don't expect, but we should do it anyway.

I've now done that, and attached is what I think would be a committable
version.  Having done this work, I no longer think that this approach
is significantly messier code-wise than the all-lengths version, and
it does have the merit of not degrading on very large objects/arrays.
So at the moment I'm leaning to this solution not the all-lengths one.

To get a sense of the compression effects of varying the stride distance,
I repeated the compression measurements I'd done on 14 August with Pavel's
geometry data (24077.1408052...@sss.pgh.pa.us).  The upshot of that was

                                         min     max     avg

external text representation             220  172685   880.3
JSON representation (compressed text)    224   78565   541.3
pg_column_size, JSONB HEAD repr.         225   82540   639.0
pg_column_size, all-lengths repr.        225   66794   531.1

Here's what I get with this patch and different stride distances:

JB_OFFSET_STRIDE = 8                     225   68551   559.7
JB_OFFSET_STRIDE = 16                    225   67601   552.3
JB_OFFSET_STRIDE = 32                    225   67120   547.4
JB_OFFSET_STRIDE = 64                    225   66886   546.9
JB_OFFSET_STRIDE = 128                   225   66879   546.9
JB_OFFSET_STRIDE = 256                   225   66846   546.8

So at least for that test data, 32 seems like the sweet spot.
We are giving up a couple percent of space in comparison to the
all-lengths version, but this is probably an acceptable tradeoff
for not degrading on very large arrays.

I've not done any speed testing.

regards, tom lane

diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index 2fd87fc..9beebb3 100644
*** a/src/backend/utils/adt/jsonb.c
--- b/src/backend/utils/adt/jsonb.c
*** jsonb_from_cstring(char *json, int len)
*** 196,207 
  static size_t
  checkStringLen(size_t len)
  {
! 	if (len > JENTRY_POSMASK)
  		ereport(ERROR,
  (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
   errmsg("string too long to represent as jsonb string"),
   errdetail("Due to an implementation restriction, jsonb strings cannot exceed %d bytes.",
! 		   JENTRY_POSMASK)));
  
  	return len;
  }
--- 196,207 
  static size_t
  checkStringLen(size_t len)
  {
! 	if (len > JENTRY_OFFLENMASK)
  		ereport(ERROR,
  (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
   errmsg("string too long to represent as jsonb string"),
   errdetail("Due to an implementation restriction, jsonb strings cannot exceed %d bytes.",
! 		   JENTRY_OFFLENMASK)));
  
  	return len;
  }
diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index 04f35bf..f157df3 100644
*** a/src/backend/utils/adt/jsonb_util.c
--- b/src/backend/utils/adt/jsonb_util.c
***
*** 26,40 
   * in MaxAllocSize, and the number of elements (or pairs) must fit in the bits
   * reserved for that in the JsonbContainer.header field.
   *
!  * (the total size of an array's elements is also limited by JENTRY_POSMASK,
!  * but we're not concerned about that here)
   */
  #define JSONB_MAX_ELEMS (Min(MaxAllocSize / sizeof(JsonbValue), JB_CMASK))
  #define JSONB_MAX_PAIRS (Min(MaxAllocSize / sizeof(JsonbPair), JB_CMASK))
  
! static void fillJsonbValue(JEntry *array, int index, char *base_addr,
  			   JsonbValue *result);
! static bool	equalsJsonbScalarValue(JsonbValue *a, JsonbValue *b);
  static int	compareJsonbScalarValue(JsonbValue *a, JsonbValue *b);
  static Jsonb *convertToJsonb(JsonbValue *val);
  static void convertJsonbValue(StringInfo buffer, JEntry *header, JsonbValue *val, int level);
--- 26,41 
   * in MaxAllocSize, and the number of elements (or pairs) must fit in the bits
   * reserved for that in the JsonbContainer.header field.
   *
!  * (The total size of an array's or object's elements is also limited by
!  * JENTRY_OFFLENMASK, but we're not concerned about that here.)
   */
  #define JSONB_MAX_ELEMS (Min(MaxAllocSize / sizeof(JsonbValue), JB_CMASK))
  #define JSONB_MAX_PAIRS (Min(MaxAllocSize / sizeof(JsonbPair), JB_CMASK))
  
! static void fillJsonbValue(JsonbContainer *container, int index,
! 			   char *base_addr, uint32 offset,
  			   JsonbValue *result);
! static bool equalsJsonbScalarValue(JsonbValue *a, JsonbValue *b);
  static int	

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-25 Thread Josh Berkus
On 09/25/2014 08:10 PM, Tom Lane wrote:
 I wrote:
 The offsets-and-lengths patch seems like the approach we ought to
 compare to my patch, but it looks pretty unfinished to me: AFAICS it
 includes logic to understand offsets sprinkled into a mostly-lengths
 array, but no logic that would actually *store* any such offsets,
 which means it's going to act just like my patch for performance
 purposes.
 
 In the interests of pushing this forward, I will work today on
 trying to finish and review Heikki's offsets-and-lengths patch
 so that we have something we can do performance testing on.
 I doubt that the performance testing will tell us anything we
 don't expect, but we should do it anyway.
 
 I've now done that, and attached is what I think would be a committable
 version.  Having done this work, I no longer think that this approach
 is significantly messier code-wise than the all-lengths version, and
 it does have the merit of not degrading on very large objects/arrays.
 So at the moment I'm leaning to this solution not the all-lengths one.
 
 To get a sense of the compression effects of varying the stride distance,
 I repeated the compression measurements I'd done on 14 August with Pavel's
 geometry data (24077.1408052...@sss.pgh.pa.us).  The upshot of that was
 
                                           min     max     avg
 
 external text representation              220  172685   880.3
 JSON representation (compressed text)     224   78565   541.3
 pg_column_size, JSONB HEAD repr.          225   82540   639.0
 pg_column_size, all-lengths repr.         225   66794   531.1
 
 Here's what I get with this patch and different stride distances:
 
 JB_OFFSET_STRIDE = 8                      225   68551   559.7
 JB_OFFSET_STRIDE = 16                     225   67601   552.3
 JB_OFFSET_STRIDE = 32                     225   67120   547.4
 JB_OFFSET_STRIDE = 64                     225   66886   546.9
 JB_OFFSET_STRIDE = 128                    225   66879   546.9
 JB_OFFSET_STRIDE = 256                    225   66846   546.8
 
 So at least for that test data, 32 seems like the sweet spot.
 We are giving up a couple percent of space in comparison to the
 all-lengths version, but this is probably an acceptable tradeoff
 for not degrading on very large arrays.
 
 I've not done any speed testing.

I'll do some tommorrow.  I should have some different DBs to test on, too.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-24 Thread Heikki Linnakangas

On 09/24/2014 08:16 AM, Tom Lane wrote:

Jan Wieck j...@wi3ck.info writes:

On 09/15/2014 09:46 PM, Craig Ringer wrote:

Anyway - this is looking like the change will go in, and with it a
catversion bump. Introduction of a jsonb version/flags byte might be
worthwhile at the same time. It seems likely that there'll be more room
for improvement in jsonb, possibly even down to using different formats
for different data.

Is it worth paying a byte per value to save on possible upgrade pain?



If there indeed has to be a catversion bump in the process of this, then
I agree with Craig.


FWIW, I don't really.  To begin with, it wouldn't be a byte per value,
it'd be four bytes, because we need word-alignment of the jsonb contents
so there's noplace to squeeze in an ID byte for free.  Secondly, as I
wrote in 15378.1408548...@sss.pgh.pa.us:

: There remains the
: question of whether to take this opportunity to add a version ID to the
: binary format.  I'm not as excited about that idea as I originally was;
: having now studied the code more carefully, I think that any expansion
: would likely happen by adding more type codes and/or commandeering the
: currently-unused high-order bit of JEntrys.  We don't need a version ID
: in the header for that.  Moreover, if we did have such an ID, it would be
: notationally painful to get it to most of the places that might need it.

Heikki's patch would eat up the high-order JEntry bits, but the other
points remain.


If we don't need to be backwards-compatible with the 9.4beta on-disk 
format, we don't necessarily need to eat the high-order JEntry bit. You 
can just assume that every nth element is stored as an offset, and 
the rest as lengths. Although it would be nice to have the flag for it 
explicitly.


There are also a few free bits in the JsonbContainer header that can be 
used as a version ID in the future. So I don't think we need to change 
the format to add an explicit version ID field.
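
For reference, here is roughly the existing bit budget in jsonb.h (the 
free bits being the high-order one plus the unused flag combinations):

    /* JsonbContainer.header: low 28 bits count elements or pairs,
     * three flag bits describe the container, the top bit is unused. */
    #define JB_CMASK    0x0FFFFFFF      /* mask for count field */
    #define JB_FSCALAR  0x10000000      /* flag bits */
    #define JB_FOBJECT  0x20000000
    #define JB_FARRAY   0x40000000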


- Heikki





Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-24 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 On 09/24/2014 08:16 AM, Tom Lane wrote:
 Heikki's patch would eat up the high-order JEntry bits, but the other
 points remain.

 If we don't need to be backwards-compatible with the 9.4beta on-disk 
 format, we don't necessarily need to eat the high-order JEntry bit. You 
  can just assume that every nth element is stored as an offset, and 
 the rest as lengths. Although it would be nice to have the flag for it 
 explicitly.

If we go with this approach, I think that we *should* eat the high bit
for it.  The main reason I want to do that is that it avoids having to
engrave the value of N on stone tablets.  I think that we should use
a pretty large value of N --- maybe 32 or so --- and having the freedom
to change it later based on experience seems like a good thing.
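
For concreteness, the JEntry bit budget this implies would be something
like (a sketch; names as in the offsets-and-lengths patch):

    #define JENTRY_OFFLENMASK  0x0FFFFFFF  /* length, or end offset if HAS_OFF */
    #define JENTRY_TYPEMASK    0x70000000  /* value type */
    #define JENTRY_HAS_OFF     0x80000000  /* offset-vs-length switch */

Since each entry carries its own switch, nothing on disk records N
itself, and a future release could store offsets more or less densely
while still reading old data.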

regards, tom lane




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-23 Thread Peter Geoghegan
On Fri, Sep 19, 2014 at 5:40 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 I think we should bite the bullet and break compatibility with 9.4beta2
 format, even if we go with my patch. In a jsonb object, it makes sense to
 store all the keys first, like Tom did, because of cache benefits, and the
 future possibility to do smart EXTERNAL access. Also, even if we can make
 the on-disk format compatible, it's weird that you can get different runtime
 behavior with datums created with a beta version. Seems more clear to just
 require a pg_dump + restore.

I vote for going with your patch, and breaking compatibility for the
reasons stated here (though I'm skeptical of the claims about cache
benefits, FWIW).

-- 
Peter Geoghegan




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-23 Thread Tom Lane
Peter Geoghegan p...@heroku.com writes:
 On Fri, Sep 19, 2014 at 5:40 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 I think we should bite the bullet and break compatibility with 9.4beta2
 format, even if we go with my patch. In a jsonb object, it makes sense to
 store all the keys first, like Tom did, because of cache benefits, and the
 future possibility to do smart EXTERNAL access. Also, even if we can make
 the on-disk format compatible, it's weird that you can get different runtime
 behavior with datums created with a beta version. Seems more clear to just
 require a pg_dump + restore.

 I vote for going with your patch, and breaking compatibility for the
 reasons stated here (though I'm skeptical of the claims about cache
 benefits, FWIW).

I'm also skeptical of that, but I think the potential for smart EXTERNAL
access is a valid consideration.

I've not had time to read Heikki's updated patch yet --- has anyone
else compared the two patches for code readability?  If they're fairly
close on that score, then I'd agree his approach is the best solution.
(I will look at his code, but I'm not sure I'm the most unbiased
observer.)

regards, tom lane




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-23 Thread Jan Wieck

On 09/15/2014 09:46 PM, Craig Ringer wrote:

On 09/16/2014 07:44 AM, Peter Geoghegan wrote:

FWIW, I am slightly concerned about weighing use cases around very
large JSON documents too heavily. Having enormous jsonb documents just
isn't going to work out that well, but neither will equivalent designs
in popular document database systems for similar reasons. For example,
the maximum BSON document size supported by MongoDB is 16 megabytes,
and that seems to be something that their users don't care too much
about. Having 270 pairs in an object isn't unreasonable, but it isn't
going to be all that common either.


Also, at a certain size the fact that Pg must rewrite the whole document
for any change to it starts to introduce other practical changes.

Anyway - this is looking like the change will go in, and with it a
catversion bump. Introduction of a jsonb version/flags byte might be
worthwhile at the same time. It seems likely that there'll be more room
for improvement in jsonb, possibly even down to using different formats
for different data.

Is it worth paying a byte per value to save on possible upgrade pain?



This comment seems to have drowned in the discussion.

If there indeed has to be a catversion bump in the process of this, then 
I agree with Craig.



Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-23 Thread Peter Geoghegan
On Tue, Sep 23, 2014 at 10:02 PM, Jan Wieck j...@wi3ck.info wrote:
 Is it worth paying a byte per value to save on possible upgrade pain?


 This comment seems to have drowned in the discussion.

 If there indeed has to be a catversion bump in the process of this, then I
 agree with Craig.

-1. We already have a reserved bit.


-- 
Peter Geoghegan




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-23 Thread Tom Lane
Jan Wieck j...@wi3ck.info writes:
 On 09/15/2014 09:46 PM, Craig Ringer wrote:
 Anyway - this is looking like the change will go in, and with it a
 catversion bump. Introduction of a jsonb version/flags byte might be
 worthwhile at the same time. It seems likely that there'll be more room
 for improvement in jsonb, possibly even down to using different formats
 for different data.
 
 Is it worth paying a byte per value to save on possible upgrade pain?

 If there indeed has to be a catversion bump in the process of this, then 
 I agree with Craig.

FWIW, I don't really.  To begin with, it wouldn't be a byte per value,
it'd be four bytes, because we need word-alignment of the jsonb contents
so there's noplace to squeeze in an ID byte for free.  Secondly, as I
wrote in 15378.1408548...@sss.pgh.pa.us:

: There remains the
: question of whether to take this opportunity to add a version ID to the
: binary format.  I'm not as excited about that idea as I originally was;
: having now studied the code more carefully, I think that any expansion
: would likely happen by adding more type codes and/or commandeering the
: currently-unused high-order bit of JEntrys.  We don't need a version ID
: in the header for that.  Moreover, if we did have such an ID, it would be
: notationally painful to get it to most of the places that might need it.

Heikki's patch would eat up the high-order JEntry bits, but the other
points remain.

regards, tom lane




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-22 Thread Josh Berkus
On 09/19/2014 07:07 AM, Tom Lane wrote:
 Heikki Linnakangas hlinnakan...@vmware.com writes:
 Tom: You mentioned earlier that your patch fixes some existing bugs. 
 What were they?
 
 What I remember at the moment (sans caffeine) is that the routines for
 assembling jsonb values out of field data were lacking some necessary
 tests for overflow of the size/offset fields.  If you like I can apply
 those fixes separately, but I think they were sufficiently integrated with
 other changes in the logic that it wouldn't really help much for patch
 reviewability.

Where are we on this?  Do we have a patch ready for testing?


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-19 Thread Heikki Linnakangas

On 09/18/2014 09:27 PM, Heikki Linnakangas wrote:

On 09/18/2014 07:53 PM, Josh Berkus wrote:

On 09/16/2014 08:45 PM, Tom Lane wrote:

We're somewhat comparing apples and oranges here, in that I pushed my
approach to something that I think is of committable quality (and which,
not incidentally, fixes some existing bugs that we'd need to fix in any
case); while Heikki's patch was just proof-of-concept.  It would be worth
pushing Heikki's patch to committable quality so that we had a more
complete understanding of just what the complexity difference really is.


Is anyone actually working on this?

If not, I'm voting for the all-lengths patch so that we can get 9.4 out
the door.


I'll try to write a more polished patch tomorrow. We'll then see what it
looks like, and can decide if we want it.


Ok, here are two patches. One is a refined version of my earlier patch, 
and the other implements the separate offsets array approach. They are 
both based on Tom's jsonb-lengths-merged.patch, so they include all the 
whitespace fixes etc. he mentioned.


There is no big difference in terms of code complexity between the 
patches. IMHO the separate offsets array is easier to understand, but it 
makes for more complicated accessor macros to find the beginning of the 
variable-length data.


Unlike Tom's patch, these patches don't cache any offsets when doing a 
binary search. Doesn't seem worth it, when the access time is O(1) anyway.


Both of these patches have a #define JB_OFFSET_STRIDE for the stride 
size. For the separate offsets array, the offsets array has one element 
for every JB_OFFSET_STRIDE children. For the other patch, every 
JB_OFFSET_STRIDE child stores the end offset, while others store the 
length. A smaller value makes random access faster, at the cost of 
compressibility / on-disk size. I haven't done any measurements to find 
the optimal value, the values in the patches are arbitrary.
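
As a sketch of the writer-side rule, using the names from the patch: 
every JB_OFFSET_STRIDE-th child stores its cumulative end offset and 
sets the flag bit, while the others store a plain length:

    if ((i % JB_OFFSET_STRIDE) == 0)
        meta = (meta & JENTRY_TYPEMASK) | totallen | JENTRY_HAS_OFF;
    else
        meta = (meta & JENTRY_TYPEMASK) | len;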


I think we should bite the bullet and break compatibility with 9.4beta2 
format, even if we go with my patch. In a jsonb object, it makes sense 
to store all the keys first, like Tom did, because of cache benefits, 
and the future possibility to do smart EXTERNAL access. Also, even if we 
can make the on-disk format compatible, it's weird that you can get 
different runtime behavior with datums created with a beta version. 
Seems more clear to just require a pg_dump + restore.
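
To spell out the keys-first object layout being proposed, a rough 
sketch in comment form:

    /*
     * Keys-first jsonb object layout:
     *
     *   JsonbContainer header
     *   JEntry[2 * npairs]    -- entries for all keys, then all values
     *   key1 ... keyN         -- key strings stored adjacently
     *   val1 ... valN         -- values follow
     */

A key-existence probe then touches only the keys region, and a smart 
EXTERNAL fetch could some day read just that prefix of the datum.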


Tom: You mentioned earlier that your patch fixes some existing bugs. 
What were they? There were a bunch of whitespace and comment fixes that 
we should apply in any case, but I couldn't see any actual bugs. I think 
we should apply those fixes separately, to make sure we don't forget 
about them, and to make it easier to review these patches.


- Heikki

diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index 2fd87fc..456011a 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -196,12 +196,12 @@ jsonb_from_cstring(char *json, int len)
 static size_t
 checkStringLen(size_t len)
 {
-	if (len > JENTRY_POSMASK)
+	if (len > JENTRY_LENMASK)
 		ereport(ERROR,
 (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
  errmsg("string too long to represent as jsonb string"),
  errdetail("Due to an implementation restriction, jsonb strings cannot exceed %d bytes.",
-		   JENTRY_POSMASK)));
+		   JENTRY_LENMASK)));
 
 	return len;
 }
diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index 04f35bf..7f7ed4f 100644
--- a/src/backend/utils/adt/jsonb_util.c
+++ b/src/backend/utils/adt/jsonb_util.c
@@ -26,15 +26,16 @@
  * in MaxAllocSize, and the number of elements (or pairs) must fit in the bits
  * reserved for that in the JsonbContainer.header field.
  *
- * (the total size of an array's elements is also limited by JENTRY_POSMASK,
- * but we're not concerned about that here)
+ * (The total size of an array's or object's elements is also limited by
+ * JENTRY_LENMASK, but we're not concerned about that here.)
  */
 #define JSONB_MAX_ELEMS (Min(MaxAllocSize / sizeof(JsonbValue), JB_CMASK))
 #define JSONB_MAX_PAIRS (Min(MaxAllocSize / sizeof(JsonbPair), JB_CMASK))
 
-static void fillJsonbValue(JEntry *array, int index, char *base_addr,
+static void fillJsonbValue(JEntry *children, int index,
+			   char *base_addr, uint32 offset,
 			   JsonbValue *result);
-static bool	equalsJsonbScalarValue(JsonbValue *a, JsonbValue *b);
+static bool equalsJsonbScalarValue(JsonbValue *a, JsonbValue *b);
 static int	compareJsonbScalarValue(JsonbValue *a, JsonbValue *b);
 static Jsonb *convertToJsonb(JsonbValue *val);
 static void convertJsonbValue(StringInfo buffer, JEntry *header, JsonbValue *val, int level);
@@ -42,7 +43,7 @@ static void convertJsonbArray(StringInfo buffer, JEntry *header, JsonbValue *val
 static void convertJsonbObject(StringInfo buffer, JEntry *header, JsonbValue *val, int level);
 static void convertJsonbScalar(StringInfo buffer, JEntry *header, JsonbValue 

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-19 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 Tom: You mentioned earlier that your patch fixes some existing bugs. 
 What were they?

What I remember at the moment (sans caffeine) is that the routines for
assembling jsonb values out of field data were lacking some necessary
tests for overflow of the size/offset fields.  If you like I can apply
those fixes separately, but I think they were sufficiently integrated with
other changes in the logic that it wouldn't really help much for patch
reviewability.

regards, tom lane




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-18 Thread Josh Berkus
On 09/16/2014 08:45 PM, Tom Lane wrote:
 We're somewhat comparing apples and oranges here, in that I pushed my
 approach to something that I think is of committable quality (and which,
 not incidentally, fixes some existing bugs that we'd need to fix in any
 case); while Heikki's patch was just proof-of-concept.  It would be worth
 pushing Heikki's patch to committable quality so that we had a more
 complete understanding of just what the complexity difference really is.

Is anyone actually working on this?

If not, I'm voting for the all-lengths patch so that we can get 9.4 out
the door.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-18 Thread Heikki Linnakangas

On 09/18/2014 07:53 PM, Josh Berkus wrote:

On 09/16/2014 08:45 PM, Tom Lane wrote:

We're somewhat comparing apples and oranges here, in that I pushed my
approach to something that I think is of committable quality (and which,
not incidentally, fixes some existing bugs that we'd need to fix in any
case); while Heikki's patch was just proof-of-concept.  It would be worth
pushing Heikki's patch to committable quality so that we had a more
complete understanding of just what the complexity difference really is.


Is anyone actually working on this?

If not, I'm voting for the all-lengths patch so that we can get 9.4 out
the door.


I'll try to write a more polished patch tomorrow. We'll then see what it 
looks like, and can decide if we want it.


- Heikki





Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-16 Thread Robert Haas
On Mon, Sep 15, 2014 at 7:44 PM, Peter Geoghegan p...@heroku.com wrote:
 On Mon, Sep 15, 2014 at 4:05 PM, Josh Berkus j...@agliodbs.com wrote:
 Actually, having the keys all at the same level *is* relevant for the
 issue we're discussing.  If those 270 keys are organized in a tree, it's
 not the same as having them all on one level (and not as problematic).

 I believe Robert meant that the 270 keys are not at the top level, but
 are at some level (in other words, some object has 270 pairs). That is
 equivalent to having them at the top level for the purposes of this
 discussion.

Yes, that's exactly what I meant.

 FWIW, I am slightly concerned about weighing use cases around very
 large JSON documents too heavily. Having enormous jsonb documents just
 isn't going to work out that well, but neither will equivalent designs
 in popular document database systems for similar reasons. For example,
 the maximum BSON document size supported by MongoDB is 16 megabytes,
 and that seems to be something that their users don't care too much
 about. Having 270 pairs in an object isn't unreasonable, but it isn't
 going to be all that common either.

The JSON documents in this case were not particularly large.  These
objects were < 100kB; they just had a lot of keys.  I'm a little
baffled by the apparent theme that people think that (object size) /
(# of keys) will tend to be large.  Maybe there will be some instances
where that's the case, but it's not what I'd expect.  I would expect
people to use JSON to serialize structured data in situations where
normalizing would be unwieldy.

For example, pick your favorite Facebook or Smartphone game - Plants
vs. Zombies, Farmville, Candy Crush Saga, whatever.  Or even a
traditional board game like chess.  Think about what the game state
looks like as an abstract object.  Almost without exception, you've
got some kind of game board with a bunch of squares and then you have
a bunch of pieces (plants, crops, candies, pawns) that are positioned
on those squares.  Now you want to store this in a database.  You're
certainly not going to have a table column per square, and EAV would
be stupid, so what's left?  You could use an array, but an array of
strings might not be descriptive enough; for a square in Farmville,
for example, you might need to know the type of crop, and whether it
was fertilized with special magic fertilizer, and when it's going to
be ready to harvest, and when it'll wither if not harvested.  So a
JSON is a pretty natural structure: an array of arrays of objects.  If
you have a 30x30 farm, you'll have 900 keys.  If you have a 50x50
farm, which probably means you're spending real money to buy imaginary
plants, you'll have 2500 keys.

(For the record, I have no actual knowledge of how any of these games
are implemented under the hood.  I'm just speculating on how I would
have done it.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-16 Thread Josh Berkus
On 09/16/2014 06:31 AM, Robert Haas wrote:
 On Mon, Sep 15, 2014 at 7:44 PM, Peter Geoghegan p...@heroku.com wrote:
 On Mon, Sep 15, 2014 at 4:05 PM, Josh Berkus j...@agliodbs.com wrote:
 Actually, having the keys all at the same level *is* relevant for the
 issue we're discussing.  If those 270 keys are organized in a tree, it's
 not the same as having them all on one level (and not as problematic).

 I believe Robert meant that the 270 keys are not at the top level, but
 are at some level (in other words, some object has 270 pairs). That is
 equivalent to having them at the top level for the purposes of this
 discussion.
 
 Yes, that's exactly what I meant.
 
 FWIW, I am slightly concerned about weighing use cases around very
 large JSON documents too heavily. Having enormous jsonb documents just
 isn't going to work out that well, but neither will equivalent designs
 in popular document database systems for similar reasons. For example,
 the maximum BSON document size supported by MongoDB is 16 megabytes,
 and that seems to be something that their users don't care too much
 about. Having 270 pairs in an object isn't unreasonable, but it isn't
 going to be all that common either.

Well, I can only judge from the use cases I personally have, none of
which involve more than 100 keys at any level for most rows.  So far
I've seen some people argue hypothetical use cases involving hundreds of
keys per level, but nobody who *actually* has such a use case.  Also,
note that we currently don't know where the last value extraction
becomes a performance problem at this stage, except that it's somewhere
between 200 and 100,000.  Also, we don't have a test which shows the
hybrid approach (Heikki's patch) performing better with 1000's of keys.

Basically, if someone is going to make a serious case for Heikki's
hybrid approach over the simpler lengths-only approach, then please post
some test data showing the benefit ASAP, since I can't demonstrate it.
Otherwise, let's get beta 3 out the door so we can get the 9.4 release
train moving again.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-16 Thread Robert Haas
On Tue, Sep 16, 2014 at 12:47 PM, Josh Berkus j...@agliodbs.com wrote:
 On 09/16/2014 06:31 AM, Robert Haas wrote:
 On Mon, Sep 15, 2014 at 7:44 PM, Peter Geoghegan p...@heroku.com wrote:
 On Mon, Sep 15, 2014 at 4:05 PM, Josh Berkus j...@agliodbs.com wrote:
 Actually, having the keys all at the same level *is* relevant for the
 issue we're discussing.  If those 270 keys are organized in a tree, it's
 not the same as having them all on one level (and not as problematic).

 I believe Robert meant that the 270 keys are not at the top level, but
 are at some level (in other words, some object has 270 pairs). That is
 equivalent to having them at the top level for the purposes of this
 discussion.

 Yes, that's exactly what I meant.

 FWIW, I am slightly concerned about weighing use cases around very
 large JSON documents too heavily. Having enormous jsonb documents just
 isn't going to work out that well, but neither will equivalent designs
 in popular document database systems for similar reasons. For example,
 the maximum BSON document size supported by MongoDB is 16 megabytes,
 and that seems to be something that their users don't care too much
 about. Having 270 pairs in an object isn't unreasonable, but it isn't
 going to be all that common either.

 Well, I can only judge from the use cases I personally have, none of
 which involve more than 100 keys at any level for most rows.  So far
 I've seen some people argue hypothetical use cases involving hundreds of
 keys per level, but nobody who *actually* has such a use case.

I already told you that I did, and that it was the one and only app I
had written for JSONB.

 Also,
 note that we currently don't know where the last value extraction
 becomes a performance problem at this stage, except that it's somewhere
 between 200 and 100,000.  Also, we don't have a test which shows the
 hybrid approach (Heikki's patch) performing better with 1000's of keys.

Fair point.

 Basically, if someone is going to make a serious case for Heikki's
 hybrid approach over the simpler lengths-only approach, then please post
 some test data showing the benefit ASAP, since I can't demonstrate it.
 Otherwise, let's get beta 3 out the door so we can get the 9.4 release
 train moving again.

I don't personally care about this enough to spend more time on it.  I
told you my extremely limited experience because it seems to
contradict your broader experience.  If you don't care, you don't
care.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-16 Thread Josh Berkus
On 09/16/2014 09:54 AM, Robert Haas wrote:
 On Tue, Sep 16, 2014 at 12:47 PM, Josh Berkus j...@agliodbs.com wrote:
 On 09/16/2014 06:31 AM, Robert Haas wrote:
 On Mon, Sep 15, 2014 at 7:44 PM, Peter Geoghegan p...@heroku.com wrote:
 On Mon, Sep 15, 2014 at 4:05 PM, Josh Berkus j...@agliodbs.com wrote:
 Actually, having the keys all at the same level *is* relevant for the
 issue we're discussing.  If those 270 keys are organized in a tree, it's
 not the same as having them all on one level (and not as problematic).

 I believe Robert meant that the 270 keys are not at the top level, but
 are at some level (in other words, some object has 270 pairs). That is
 equivalent to having them at the top level for the purposes of this
 discussion.

 Yes, that's exactly what I meant.

 FWIW, I am slightly concerned about weighing use cases around very
 large JSON documents too heavily. Having enormous jsonb documents just
 isn't going to work out that well, but neither will equivalent designs
 in popular document database systems for similar reasons. For example,
 the maximum BSON document size supported by MongoDB is 16 megabytes,
 and that seems to be something that their users don't care too much
 about. Having 270 pairs in an object isn't unreasonable, but it isn't
 going to be all that common either.

 Well, I can only judge from the use cases I personally have, none of
 which involve more than 100 keys at any level for most rows.  So far
 I've seen some people argue hypothetical use cases involving hundreds of
 keys per level, but nobody who *actually* has such a use case.
 
 I already told you that I did, and that it was the one and only app I
 had written for JSONB.

Ah, ok, I thought yours was a test case.  Did you check how it performed
on the two patches at all?  My tests with 185 keys didn't show any
difference, including for a last key case.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-16 Thread Heikki Linnakangas

On 09/16/2014 07:47 PM, Josh Berkus wrote:

On 09/16/2014 06:31 AM, Robert Haas wrote:

On Mon, Sep 15, 2014 at 7:44 PM, Peter Geoghegan p...@heroku.com wrote:

On Mon, Sep 15, 2014 at 4:05 PM, Josh Berkus j...@agliodbs.com wrote:

Actually, having the keys all at the same level *is* relevant for the
issue we're discussing.  If those 270 keys are organized in a tree, it's
not the same as having them all on one level (and not as problematic).


I believe Robert meant that the 270 keys are not at the top level, but
are at some level (in other words, some object has 270 pairs). That is
equivalent to having them at the top level for the purposes of this
discussion.


Yes, that's exactly what I meant.


FWIW, I am slightly concerned about weighing use cases around very
large JSON documents too heavily. Having enormous jsonb documents just
isn't going to work out that well, but neither will equivalent designs
in popular document database systems for similar reasons. For example,
the maximum BSON document size supported by MongoDB is 16 megabytes,
and that seems to be something that their users don't care too much
about. Having 270 pairs in an object isn't unreasonable, but it isn't
going to be all that common either.


Well, I can only judge from the use cases I personally have, none of
which involve more than 100 keys at any level for most rows.  So far
I've seen some people argue hypothetical use cases involving hundreds of
keys per level, but nobody who *actually* has such a use case.  Also,
note that we currently don't know where the last value extraction
becomes a performance problem at this stage, except that it's somewhere
between 200 and 100,000.  Also, we don't have a test which shows the
hybrid approach (Heikki's patch) performing better with 1000's of keys.

Basically, if someone is going to make a serious case for Heikki's
hybrid approach over the simpler lengths-only approach, then please post
some test data showing the benefit ASAP, since I can't demonstrate it.
Otherwise, let's get beta 3 out the door so we can get the 9.4 release
train moving again.


Are you looking for someone with a real life scenario, or just synthetic 
test case? The latter is easy to do.


See attached test program. It's basically the same I posted earlier. 
Here are the results from my laptop with Tom's jsonb-lengths-merged.patch:


postgres=# select * from testtimes ;
 elem | duration_ms
--+-
   11 |0.289508
   12 |0.288122
   13 |0.290558
   14 |0.287889
   15 |0.286303
   17 |0.290415
   19 |0.289829
   21 |0.289783
   23 |0.287104
   25 |0.289834
   28 |0.290735
   31 |0.291844
   34 |0.293454
   37 |0.293866
   41 |0.291217
   45 |0.289243
   50 |0.290385
   55 |0.292085
   61 |0.290892
   67 |0.292335
   74 |0.292561
   81 |0.291416
   89 |0.295714
   98 | 0.29844
  108 |0.297421
  119 |0.299471
  131 |0.299877
  144 |0.301604
  158 |0.303365
  174 |0.304203
  191 |0.303596
  210 |0.306526
  231 |0.304189
  254 |0.307782
  279 |0.307372
  307 |0.306873
  338 |0.310471
  372 |  0.3151
  409 |0.320354
  450 | 0.32038
  495 |0.322127
  545 |0.323256
  600 |0.330419
  660 |0.334226
  726 |0.336951
  799 | 0.34108
  879 |0.347746
  967 |0.354275
 1064 |0.356696
 1170 |0.366906
 1287 |0.375352
 1416 |0.392952
 1558 |0.392907
 1714 |0.402157
 1885 |0.412384
 2074 |0.425958
 2281 |0.435415
 2509 | 0.45301
 2760 |0.469983
 3036 |0.487329
 3340 |0.505505
 3674 |0.530412
 4041 |0.552585
 4445 |0.581815
 4890 |0.610509
 5379 |0.642885
 5917 |0.680395
 6509 |0.713849
 7160 |0.757561
 7876 |0.805225
 8664 |0.856142
 9530 |0.913255
(72 rows)

That's up to 9530 elements; it's pretty easy to extrapolate from there
to higher counts, since it's O(n).


With unpatched git master, the runtime is flat, regardless of which 
element is queried, at about 0.29 ms. With 
jsonb-with-offsets-and-lengths-2.patch, there's no difference that I 
could measure.


The difference starts to be meaningful at around 500 entries. In 
practice, I doubt anyone's going to notice until you start talking about 
tens of thousands of entries.


I'll leave it up to the jury to decide if we care or not. It seems like 
a fairly unusual use case, where you push around large enough arrays or 
objects to notice. Then again, I'm sure *someone* will do it. People do 
strange things, and they find ways to abuse the features that the 
original developers didn't think of.


- Heikki



jsonb-lengths.sql
Description: application/sql



Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-16 Thread Claudio Freire
On Tue, Sep 16, 2014 at 3:12 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 I'll leave it up to the jury to decide if we care or not. It seems like a
 fairly unusual use case, where you push around large enough arrays or
 objects to notice. Then again, I'm sure *someone* will do it. People do
 strange things, and they find ways to abuse the features that the original
 developers didn't think of.

Again, it's not abusing the feature. It's using it. Jsonb is
supposed to be fast for this.




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-16 Thread Robert Haas
On Tue, Sep 16, 2014 at 1:11 PM, Josh Berkus j...@agliodbs.com wrote:
 Well, I can only judge from the use cases I personally have, none of
 which involve more than 100 keys at any level for most rows.  So far
 I've seen some people argue hypothetical use cases involving hundreds of
 keys per level, but nobody who *actually* has such a use case.

  I already told you that I did, and that it was the one and only app I
 had written for JSONB.

 Ah, ok, I thought yours was a test case.  Did you check how it performed
 on the two patches at all?  My tests with 185 keys didn't show any
 difference, including for a last key case.

No, I didn't test it.   But I think Heikki's test results pretty much
tell us everything there is to see here.  This isn't really that
complicated; I've read a few papers on index compression over the
years and they seem to often use techniques that have the same general
flavor as what Heikki did here, adding complexity in the data format
to gain other advantages.  So I don't think we should be put off.

Basically, I think that if we make a decision to use Tom's patch
rather than Heikki's patch, we're deciding that the initial decision,
by the folks who wrote the original jsonb code, to make array access
less than O(n) was misguided.  While that could be true, I'd prefer to
bet that those folks knew what they were doing.  The only reason
we're even considering changing it is that the array of lengths
doesn't compress well, and we've got an approach that fixes that
problem while preserving the advantages of fast lookup.  We should
have a darn fine reason to say no to that approach, and "it didn't
benefit my particular use case" is not it.

In practice, I'm not very surprised that the impact doesn't seem too
bad when you're running SQL queries from the client.  There's so much
other overhead, for de-TOASTing and client communication and even just
planner and executor costs, that this gets lost in the noise.  But
think about a PL/pgsql procedure, say, where somebody might loop over
all of the elements in array.  If those operations go from O(1) to
O(n), then the loop goes from O(n) to O(n^2).  I will bet you a
beverage of your choice that somebody will find that behavior within a
year of release and be dismayed by it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-16 Thread Josh Berkus
Heikki, Robert:

On 09/16/2014 11:12 AM, Heikki Linnakangas wrote:
 Are you looking for someone with a real life scenario, or just synthetic
 test case? The latter is easy to do.
 
 See attached test program. It's basically the same I posted earlier.
 Here are the results from my laptop with Tom's jsonb-lengths-merged.patch:

Thanks for that!

 postgres=# select * from testtimes ;
  elem | duration_ms
 --+-
  3674 |0.530412
  4041 |0.552585
  4445 |0.581815

This looks like the level at which the difference gets to be really
noticeable.  Note that this is completely swamped by the difference
between compressed vs. uncompressed though.

 With unpatched git master, the runtime is flat, regardless of which
 element is queried, at about 0.29 ms. With
 jsonb-with-offsets-and-lengths-2.patch, there's no difference that I
 could measure.

OK, thanks.

 The difference starts to be meaningful at around 500 entries. In
 practice, I doubt anyone's going to notice until you start talking about
 tens of thousands of entries.
 
 I'll leave it up to the jury to decide if we care or not. It seems like
 a fairly unusual use case, where you push around large enough arrays or
 objects to notice. Then again, I'm sure *someone* will do it. People do
 strange things, and they find ways to abuse the features that the
 original developers didn't think of.

Right, but the question is whether it's worth having more complex code
and data structures in order to support what certainly *seems* to be a
fairly obscure use-case, that is, more than 4000 keys at the same level.
 And it's not like it stops working or becomes completely unresponsive
at that level; it's just double the response time.

On 09/16/2014 12:20 PM, Robert Haas wrote:
 Basically, I think that if we make a decision to use Tom's patch
 rather than Heikki's patch, we're deciding that the initial decision,
 by the folks who wrote the original jsonb code, to make array access
 less than O(n) was misguided.  While that could be true, I'd prefer to
 bet that those folks knew what they were doing.  The only reason
 we're even considering changing it is that the array of lengths
 doesn't compress well, and we've got an approach that fixes that
 problem while preserving the advantages of fast lookup.  We should
 have a darn fine reason to say no to that approach, and "it didn't
 benefit my particular use case" is not it.

Do you feel that way *as a code maintainer*?  That is, if you ended up
maintaining the JSONB code, would you still feel that it's worth the
extra complexity?  Because that will be the main cost here.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-16 Thread Robert Haas
On Tue, Sep 16, 2014 at 3:24 PM, Josh Berkus j...@agliodbs.com wrote:
 Do you feel that way *as a code maintainer*?  That is, if you ended up
 maintaining the JSONB code, would you still feel that it's worth the
 extra complexity?  Because that will be the main cost here.

I feel that Heikki doesn't have a reputation for writing or committing
unmaintainable code.

I haven't reviewed the patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-16 Thread Petr Jelinek

On 16/09/14 21:20, Robert Haas wrote:

In practice, I'm not very surprised that the impact doesn't seem too
bad when you're running SQL queries from the client.  There's so much
other overhead, for de-TOASTing and client communication and even just
planner and executor costs, that this gets lost in the noise.  But
think about a PL/pgSQL procedure, say, where somebody might loop over
all of the elements in an array.  If those operations go from O(1) to
O(n), then the loop goes from O(n) to O(n^2).  I will bet you a
beverage of your choice that somebody will find that behavior within a
year of release and be dismayed by it.



As somebody who did see a server melt (quite literally, that time,
unfortunately) thanks to the CPU overhead of operations on varlena
arrays: +1 (in fact +many).


Especially if we are trying to promote the json improvements in 9.4 as
a "best of both worlds" kind of thing.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-16 Thread Heikki Linnakangas

On 09/16/2014 10:37 PM, Robert Haas wrote:

On Tue, Sep 16, 2014 at 3:24 PM, Josh Berkus j...@agliodbs.com wrote:

Do you feel that way *as a code maintainer*?  That is, if you ended up
maintaining the JSONB code, would you still feel that it's worth the
extra complexity?  Because that will be the main cost here.


I feel that Heikki doesn't have a reputation for writing or committing
unmaintainable code.

I haven't reviewed the patch.


The patch I posted was not pretty, but I'm sure it could be refined to 
something sensible.


There are many possible variations of the basic scheme of storing mostly 
lengths, but an offset for every N elements. I replaced the length with
an offset on some elements and used a flag bit to indicate which it is.
Perhaps a simpler approach would be to store lengths, but also store a 
separate smaller array of offsets, after the lengths array.


I can write a patch if we want to go that way.

- Heikki





Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-16 Thread Arthur Silva
On Tue, Sep 16, 2014 at 4:20 PM, Robert Haas robertmh...@gmail.com wrote:

 On Tue, Sep 16, 2014 at 1:11 PM, Josh Berkus j...@agliodbs.com wrote:
  Well, I can only judge from the use cases I personally have, none of
  which involve more than 100 keys at any level for most rows.  So far
  I've seen some people argue hypothetical use cases involving hundreds of
  keys per level, but nobody who *actually* has such a use case.
 
   I already told you that I did, and that it was the one and only app I
  had written for JSONB.
 
  Ah, ok, I thought yours was a test case.  Did you check how it performed
  on the two patches at all?  My tests with 185 keys didn't show any
  difference, including for a last key case.

 No, I didn't test it.   But I think Heikki's test results pretty much
 tell us everything there is to see here.  This isn't really that
 complicated; I've read a few papers on index compression over the
 years and they seem to often use techniques that have the same general
 flavor as what Heikki did here, adding complexity in the data format
 to gain other advantages.  So I don't think we should be put off.


I second this reasoning. Even though I ran a couple of very realistic test
cases that support all-lengths, I do feel that the hybrid approach would be
better as it covers all bases. To put things in perspective, Tom's latest
patch isn't much simpler either.

Since it would still be a breaking change we should consider changing the
layout to key-key-key-value-value-value as it seems to pay off.


 Basically, I think that if we make a decision to use Tom's patch
 rather than Heikki's patch, we're deciding that the initial decision,
 by the folks who wrote the original jsonb code, to make array access
 less than O(n) was misguided.  While that could be true, I'd prefer to
  bet that those folks knew what they were doing.  The only reason
  we're even considering changing it is that the array of lengths
  doesn't compress well, and we've got an approach that fixes that
  problem while preserving the advantages of fast lookup.  We should
  have a darn fine reason to say no to that approach, and "it didn't
  benefit my particular use case" is not it.

 In practice, I'm not very surprised that the impact doesn't seem too
 bad when you're running SQL queries from the client.  There's so much
 other overhead, for de-TOASTing and client communication and even just
 planner and executor costs, that this gets lost in the noise.  But
  think about a PL/pgSQL procedure, say, where somebody might loop over
  all of the elements in an array.  If those operations go from O(1) to
 O(n), then the loop goes from O(n) to O(n^2).  I will bet you a
 beverage of your choice that somebody will find that behavior within a
 year of release and be dismayed by it.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company





Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-16 Thread Любен Каравелов

- Quoting Robert Haas (robertmh...@gmail.com), on 16.09.2014 at 22:20 -
 
 In practice, I'm not very surprised that the impact doesn't seem too
 bad when you're running SQL queries from the client.  There's so much
 other overhead, for de-TOASTing and client communication and even just
 planner and executor costs, that this gets lost in the noise.  But
 think about a PL/pgSQL procedure, say, where somebody might loop over
 all of the elements in an array.  If those operations go from O(1) to
 O(n), then the loop goes from O(n) to O(n^2).  I will bet you a
 beverage of your choice that somebody will find that behavior within a
 year of release and be dismayed by it.
 


Hi,

I can imagine a situation exactly like that. We could use a jsonb object
to represent sparse vectors in the database, where the key is the
dimension and the value is the value. So they could easily grow to
thousands of dimensions. Once you have that in the database it is easy
to go and write some simple numeric computations on these vectors; let's
say you want a dot product of 2 sparse vectors. If the random access
inside one vector goes to O(n^2), then the dot product computation goes
to O(n^2*m^2), so it's not pretty.

I am not saying that the DB is the right place to do this type of
computation, but it is sometimes convenient to have it also in the DB.
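
For illustration only, a naive version of that dot product (the function
name and the {"dimension": value} representation are mine, not an
existing API) could look like:

CREATE OR REPLACE FUNCTION jsonb_dot(a jsonb, b jsonb) RETURNS numeric AS $$
DECLARE
    k   text;
    v   text;
    acc numeric := 0;
BEGIN
    -- Walk the dimensions of a; each probe of b is one element access,
    -- which is where O(1) vs. O(n) lookup would make the difference.
    FOR k, v IN SELECT key, value FROM jsonb_each_text(a) LOOP
        IF b ? k THEN
            acc := acc + v::numeric * (b ->> k)::numeric;
        END IF;
    END LOOP;
    RETURN acc;
END;
$$ LANGUAGE plpgsql;

SELECT jsonb_dot('{"3": 1.5, "40": 2}', '{"3": 2, "7": 9}');  -- returns 3.0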

Regards,
luben





Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-16 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 On 09/16/2014 10:37 PM, Robert Haas wrote:
 On Tue, Sep 16, 2014 at 3:24 PM, Josh Berkus j...@agliodbs.com wrote:
 Do you feel that way *as a code maintainer*?  That is, if you ended up
 maintaining the JSONB code, would you still feel that it's worth the
 extra complexity?  Because that will be the main cost here.

 I feel that Heikki doesn't have a reputation for writing or committing
 unmaintainable code.
 I haven't reviewed the patch.

 The patch I posted was not pretty, but I'm sure it could be refined to 
 something sensible.

We're somewhat comparing apples and oranges here, in that I pushed my
approach to something that I think is of committable quality (and which,
not incidentally, fixes some existing bugs that we'd need to fix in any
case); while Heikki's patch was just proof-of-concept.  It would be worth
pushing Heikki's patch to committable quality so that we had a more
complete understanding of just what the complexity difference really is.

 There are many possible variations of the basic scheme of storing mostly 
 lengths, but an offset for every N elements. I replaced the length with
 an offset on some elements and used a flag bit to indicate which it is.

Aside from the complexity issue, a demerit of Heikki's solution is that it
eats up a flag bit that we may well wish we had back later.  On the other
hand, there's definitely something to be said for not breaking
pg_upgrade-ability of 9.4beta databases.

 Perhaps a simpler approach would be to store lengths, but also store a 
 separate smaller array of offsets, after the lengths array.

That way would also give up on-disk compatibility, and I'm not sure it's
any simpler in practice than your existing solution.

regards, tom lane




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-15 Thread Josh Berkus
On 09/12/2014 01:30 PM, Heikki Linnakangas wrote:
 
 Performance was one argument for sure. It's not hard to come up with a
 case where the all-lengths approach is much slower: take a huge array
 with, say, a million elements, and fetch the last element in a tight loop.
 And do that in a PL/pgSQL function without storing the datum to disk, so
 that it doesn't get toasted. Not a very common thing to do in real life,
 although something like that might come up if you do a lot of json
 processing in PL/pgSQL. But storing offsets makes that faster.

While I didn't post the results (because they were uninteresting), I did
specifically test the last element in a set of 200 elements for
all-lengths vs. original offsets for JSONB, and the results were not
statistically different.

I did not test against your patch; is there some reason why your patch
would be faster for the last element case than the original offsets
version?

If not, I think the corner case is so obscure as to be not worth
optimizing for.  I can't imagine that more than a tiny minority of our
users are going to have thousands of keys per datum.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-15 Thread Claudio Freire
On Mon, Sep 15, 2014 at 2:12 PM, Josh Berkus j...@agliodbs.com wrote:
 If not, I think the corner case is so obscure as to be not worth
 optimizing for.  I can't imagine that more than a tiny minority of our
 users are going to have thousands of keys per datum.

Worst case is cost scaling linearly with the number of keys; how
expensive that is in practice depends on the number of keys.

It would have an effect only on uncompressed jsonb, since compressed
jsonb already pays a linear cost for decompression.

I'd suggest testing performance with a large number of small keys in
uncompressed form. It's bound to have a noticeable regression there.

Now, "a large number" could be 200 or 2000, or even 20k. I'd guess
several sizes should be tested to find the shape of the curve.
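
Generating test data for a few points on that curve is cheap; a sketch
(names invented; the json_object_agg/text cast is to stay 9.4-compatible):

-- One row per key count, each holding an object with that many small keys.
CREATE TABLE t_curve AS
SELECT k,
       (SELECT json_object_agg('k_' || i, i)::text::jsonb
          FROM generate_series(1, k) i) AS data
  FROM unnest(ARRAY[200, 2000, 20000]) AS k;

-- Compare first-key vs. last-key extraction time at each size.
SELECT k,
       data -> 'k_1'       AS first_key,
       data -> ('k_' || k) AS last_key
  FROM t_curve;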




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-15 Thread Josh Berkus
On 09/15/2014 10:23 AM, Claudio Freire wrote:
 Now, "a large number" could be 200 or 2000, or even 20k. I'd guess
 several sizes should be tested to find the shape of the curve.

Well, we know that it's not noticeable with 200, and that it is
noticeable with 100K.  It's only worth testing further if we think that
having more than 200 top-level keys in one JSONB value is going to be a
use case for more than 0.1% of our users.  I personally do not.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-15 Thread Claudio Freire
On Mon, Sep 15, 2014 at 4:09 PM, Josh Berkus j...@agliodbs.com wrote:
 On 09/15/2014 10:23 AM, Claudio Freire wrote:
 Now, "a large number" could be 200 or 2000, or even 20k. I'd guess
 several sizes should be tested to find the shape of the curve.

 Well, we know that it's not noticeable with 200, and that it is
 noticeable with 100K.  It's only worth testing further if we think that
 having more than 200 top-level keys in one JSONB value is going to be a
 use case for more than 0.1% of our users.  I personally do not.

Yes, but bear in mind that the worst case is exactly at the use case
jsonb was designed to speed up: element access within relatively big
json documents.

Having them uncompressed is to be expected because people using jsonb will
often favor speed over compactness if it's a tradeoff (otherwise
they'd use plain json).

So while you're right that it's perhaps above what would be a common
use case, the range somewhere between 200 and 100K for the tipping
point seems overly imprecise to me.




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-15 Thread Josh Berkus
On 09/15/2014 12:15 PM, Claudio Freire wrote:
 So while you're right that it's perhaps above what would be a common
 use case, the range somewhere between 200 and 100K for the tipping
 point seems overly imprecise to me.

Well, then, you know how to solve that.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-15 Thread Claudio Freire
On Mon, Sep 15, 2014 at 4:17 PM, Josh Berkus j...@agliodbs.com wrote:
 On 09/15/2014 12:15 PM, Claudio Freire wrote:
 So while you're right that it's perhaps above what would be a common
 use case, the range somewhere between 200 and 100K for the tipping
 point seems overly imprecise to me.

 Well, then, you know how to solve that.


I was hoping testing with other numbers was as simple as hitting a key
for someone else.

But sure. I'll set something up.




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-15 Thread Josh Berkus
On 09/15/2014 12:25 PM, Claudio Freire wrote:
 On Mon, Sep 15, 2014 at 4:17 PM, Josh Berkus j...@agliodbs.com wrote:
 On 09/15/2014 12:15 PM, Claudio Freire wrote:
 So while you're right that it's perhaps above what would be a common
 use case, the range somewhere between 200 and 100K for the tipping
 point seems overly imprecise to me.

 Well, then, you know how to solve that.
 
 
 I was hoping testing with other numbers was a simple hitting a key for
 someone else.

Nope.  My test case has a fixed size.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-15 Thread Robert Haas
On Mon, Sep 15, 2014 at 3:09 PM, Josh Berkus j...@agliodbs.com wrote:
 On 09/15/2014 10:23 AM, Claudio Freire wrote:
 Now, "a large number" could be 200 or 2000, or even 20k. I'd guess
 several sizes should be tested to find the shape of the curve.

 Well, we know that it's not noticeable with 200, and that it is
 noticeable with 100K.  It's only worth testing further if we think that
 having more than 200 top-level keys in one JSONB value is going to be a
 use case for more than 0.1% of our users.  I personally do not.

FWIW, I have written one (1) application that uses JSONB and it has
one sub-object (not the top-level object) that in the most typical
configuration contains precisely 270 keys.  Now, granted, that is not
the top-level object, if that distinction is actually relevant here,
but color me just a bit skeptical of this claim anyway.  This was just
a casual thing I did for my own use, not anything industrial strength,
so it's hard to believe I'm stressing the system more than 99.9% of
users will.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-15 Thread Josh Berkus
On 09/15/2014 02:16 PM, Robert Haas wrote:
 On Mon, Sep 15, 2014 at 3:09 PM, Josh Berkus j...@agliodbs.com wrote:
 On 09/15/2014 10:23 AM, Claudio Freire wrote:
 Now, "a large number" could be 200 or 2000, or even 20k. I'd guess
 several sizes should be tested to find the shape of the curve.

 Well, we know that it's not noticeable with 200, and that it is
 noticeable with 100K.  It's only worth testing further if we think that
 having more than 200 top-level keys in one JSONB value is going to be a
 use case for more than 0.1% of our users.  I personally do not.
 
 FWIW, I have written one (1) application that uses JSONB and it has
 one sub-object (not the top-level object) that in the most typical
 configuration contains precisely 270 keys.  Now, granted, that is not
 the top-level object, if that distinction is actually relevant here,
 but color me just a bit skeptical of this claim anyway.  This was just
 a casual thing I did for my own use, not anything industrial strength,
 so it's hard to believe I'm stressing the system more than 99.9% of
 users will.

Actually, having the keys all at the same level *is* relevant for the
issue we're discussing.  If those 270 keys are organized in a tree, it's
not the same as having them all on one level (and not as problematic).

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-15 Thread Peter Geoghegan
On Mon, Sep 15, 2014 at 4:05 PM, Josh Berkus j...@agliodbs.com wrote:
 Actually, having the keys all at the same level *is* relevant for the
 issue we're discussing.  If those 270 keys are organized in a tree, it's
 not the same as having them all on one level (and not as problematic).

I believe Robert meant that the 270 keys are not at the top level, but
are at some level (in other words, some object has 270 pairs). That is
equivalent to having them at the top level for the purposes of this
discussion.

FWIW, I am slightly concerned about weighing use cases around very
large JSON documents too heavily. Having enormous jsonb documents just
isn't going to work out that well, but neither will equivalent designs
in popular document database systems for similar reasons. For example,
the maximum BSON document size supported by MongoDB is 16 megabytes,
and that seems to be something that their users don't care too much
about. Having 270 pairs in an object isn't unreasonable, but it isn't
going to be all that common either.

-- 
Peter Geoghegan




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-15 Thread Arthur Silva
I couldn't get my hands on the twitter data but I'm generating my own. The
json template is http://paste2.org/wJ1dfcjw and data was generated with
http://www.json-generator.com/. It has 35 top level keys, just in case
someone is wondering.
I generated a pool of random objects and inserted them repeatedly until I
got 320k rows.
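
The setup amounts to something like this (t_json is the table name used in
the queries below; the rest is assumed):

CREATE TABLE t_json (data jsonb);
-- ...load the generated sample objects into t_json, then double up:
INSERT INTO t_json SELECT data FROM t_json;  -- repeat until ~320k rows
SELECT count(*) FROM t_json;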

Test query: SELECT data->'name', data->'email' FROM t_json
Test storage: EXTERNAL
Test jsonb lengths quartiles: {1278,1587,1731,1871,2231}
Tom's lengths+cache aware: 455ms
HEAD: 440ms

This is a realistic-ish workload in my opinion and Tom's patch performs
within 4% of HEAD.

Due to the overall lengths I couldn't really test compressibility, so I
re-ran the test. This time I inserted an array of 2 objects in each row, as
in: [obj, obj];
The objects were taken in sequence from the same pool so contents match
in both tests.

Test query: SELECT data #> '{0, name}', data #> '{0, email}', data #> '{1,
name}', data #> '{1, email}' FROM t_json
Test storage: EXTENDED
HEAD: 17mb table + 878mb toast
HEAD size quartiles: {2015,2500,2591,2711,3483}
HEAD query runtime: 15s
Tom's: 220mb table + 580mb toast
Tom's size quartiles: {1665,1984,2061,2142.25,2384}
Tom's query runtime: 13s

This is an intriguing edge case where Tom's patch actually outperforms
the base implementation for 3~4kb jsons.


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-15 Thread Craig Ringer
On 09/16/2014 07:44 AM, Peter Geoghegan wrote:
 FWIW, I am slightly concerned about weighing use cases around very
 large JSON documents too heavily. Having enormous jsonb documents just
 isn't going to work out that well, but neither will equivalent designs
 in popular document database systems for similar reasons. For example,
 the maximum BSON document size supported by MongoDB is 16 megabytes,
 and that seems to be something that their users don't care too much
 about. Having 270 pairs in an object isn't unreasonable, but it isn't
 going to be all that common either.

Also, at a certain size the fact that Pg must rewrite the whole document
for any change to it starts to introduce other practical problems.

Anyway - this is looking like the change will go in, and with it a
catversion bump. Introduction of a jsonb version/flags byte might be
worthwhile at the same time. It seems likely that there'll be more room
for improvement in jsonb, possibly even down to using different formats
for different data.

Is it worth paying a byte per value to save on possible upgrade pain?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-12 Thread Josh Berkus
On 09/11/2014 06:56 PM, Arthur Silva wrote:
 
  In my tests with the github archive data the savings vs.
  performance-penalty trade-off was fine, but I'm not confident in those results
 since there were only 8 top level keys.

Well, we did want to see that the patch doesn't create a regression with
data which doesn't fall into the problem case area, and your test did
that nicely.

 For comparison, some twitter api objects[1] have 30+ top level keys. If
 I have time in the next couple of days I'll conduct some tests with
 the public twitter fire-hose data.

Yah, if we have enough time for me to get the Mozilla Socorro test
environment working, I can also test with Mozilla crash data.  That has
some deep nesting and very large values.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-12 Thread Robert Haas
On Thu, Sep 11, 2014 at 9:01 PM, Josh Berkus j...@agliodbs.com wrote:
 So, I finally got time to test Tom's latest patch on this.

 TLDR: we want to go with Tom's latest patch and release beta3.

 Figures:

 So I tested HEAD against the latest lengths patch.  Per Arthur Silva, I
 checked uncompressed times for JSONB against compressed times.  This
 changed the picture considerably.

Did you


-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-12 Thread Robert Haas
On Fri, Sep 12, 2014 at 1:00 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Sep 11, 2014 at 9:01 PM, Josh Berkus j...@agliodbs.com wrote:
 So, I finally got time to test Tom's latest patch on this.

 TLDR: we want to go with Tom's latest patch and release beta3.

 Figures:

 So I tested HEAD against the latest lengths patch.  Per Arthur Silva, I
 checked uncompressed times for JSONB against compressed times.  This
 changed the picture considerably.

 Did you

Blah.

Did you test Heikki's patch from here?

http://www.postgresql.org/message-id/53ec8194.4020...@vmware.com

Tom didn't like it, but I thought it was rather clever.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-12 Thread Josh Berkus
On 09/12/2014 10:00 AM, Robert Haas wrote:
 On Fri, Sep 12, 2014 at 1:00 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Sep 11, 2014 at 9:01 PM, Josh Berkus j...@agliodbs.com wrote:
 So, I finally got time to test Tom's latest patch on this.

 TLDR: we want to go with Tom's latest patch and release beta3.

 Figures:

 So I tested HEAD against the latest lengths patch.  Per Arthur Silva, I
 checked uncompressed times for JSONB against compressed times.  This
 changed the picture considerably.

 Did you
 
 Blah.
 
 Did you test Heikki's patch from here?
 
 http://www.postgresql.org/message-id/53ec8194.4020...@vmware.com
 
 Tom didn't like it, but I thought it was rather clever.
 

Yes, I posted the results for that a couple weeks ago; Tom had posted a
cleaned-up version of that patch, but materially it made no difference
in sizes or extraction times compared with Tom's lengths-only patch.
Same for Arthur's tests.

It's certainly possible that there is a test case for which Heikki's
approach is superior, but if so we haven't seen it.  And since its
approach is also more complicated, sticking with the simpler
lengths-only approach seems like the way to go.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-12 Thread Robert Haas
On Fri, Sep 12, 2014 at 1:11 PM, Josh Berkus j...@agliodbs.com wrote:
 On 09/12/2014 10:00 AM, Robert Haas wrote:
 On Fri, Sep 12, 2014 at 1:00 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Sep 11, 2014 at 9:01 PM, Josh Berkus j...@agliodbs.com wrote:
 So, I finally got time to test Tom's latest patch on this.

 TLDR: we want to go with Tom's latest patch and release beta3.

 Figures:

 So I tested HEAD against the latest lengths patch.  Per Arthur Silva, I
 checked uncompressed times for JSONB against compressed times.  This
 changed the picture considerably.

 Did you

 Blah.

 Did you test Heikki's patch from here?

 http://www.postgresql.org/message-id/53ec8194.4020...@vmware.com

 Tom didn't like it, but I thought it was rather clever.

 Yes, I posted the results for that a couple weeks ago; Tom had posted a
 cleaned-up version of that patch, but materially it made no difference
 in sizes or extraction times compared with Tom's lengths-only patch.
 Same for Arthur's tests.

 It's certainly possible that there is a test case for which Heikki's
 approach is superior, but if so we haven't seen it.  And since its
 approach is also more complicated, sticking with the simpler
 lengths-only approach seems like the way to go.

Huh, OK.  I'm slightly surprised, but that's why we benchmark these things.

Thanks for following up on this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-12 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Fri, Sep 12, 2014 at 1:11 PM, Josh Berkus j...@agliodbs.com wrote:
 It's certainly possible that there is a test case for which Heikki's
 approach is superior, but if so we haven't seen it.  And since its
 approach is also more complicated, sticking with the simpler
 lengths-only approach seems like the way to go.

 Huh, OK.  I'm slightly surprised, but that's why we benchmark these things.

The argument for Heikki's patch was never that it would offer better
performance; it's obvious (at least to me) that it won't.  The argument
was that it'd be upward-compatible with what we're doing now, so that
we'd not have to force an on-disk compatibility break with 9.4beta2.

regards, tom lane




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-12 Thread Heikki Linnakangas

On 09/12/2014 08:52 PM, Tom Lane wrote:

Robert Haas robertmh...@gmail.com writes:

On Fri, Sep 12, 2014 at 1:11 PM, Josh Berkus j...@agliodbs.com wrote:

It's certainly possible that there is a test case for which Heikki's
approach is superior, but if so we haven't seen it.  And since its
approach is also more complicated, sticking with the simpler
lengths-only approach seems like the way to go.



Huh, OK.  I'm slightly surprised, but that's why we benchmark these things.


The argument for Heikki's patch was never that it would offer better
performance; it's obvious (at least to me) that it won't.


Performance was one argument for sure. It's not hard to come up with a 
case where the all-lengths approach is much slower: take a huge array 
with, say, a million elements, and fetch the last element in a tight loop. 
And do that in a PL/pgSQL function without storing the datum to disk, so 
that it doesn't get toasted. Not a very common thing to do in real life, 
although something like that might come up if you do a lot of json 
processing in PL/pgSQL. But storing offsets makes that faster.


IOW, something like this:

do $$
declare
  ja jsonb;
  i int4;
begin
  select json_agg(g) into ja from generate_series(1, 10) g;
  for i in 1..10 loop
perform ja -> 9;
  end loop;
end;
$$;

should perform much better with current git master or my patch, than 
with the all-lengths patch.


I'm OK with going for the all-lengths approach anyway; it's simpler, and 
working with huge arrays is hopefully not that common. But it's not a 
completely open-and-shut case.


- Heikki





Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-11 Thread Josh Berkus
So, I finally got time to test Tom's latest patch on this.

TLDR: we want to go with Tom's latest patch and release beta3.

Figures:

So I tested HEAD against the latest lengths patch.  Per Arthur Silva, I
checked uncompressed times for JSONB against compressed times.  This
changed the picture considerably.

TABLE SIZES
---

HEAD

  ?column?   | pg_size_pretty
-+
 json text format| 393 MB
 jsonb: compressed   | 1147 MB
 jsonb: uncompressed | 1221 MB

PATCHED

  ?column?   | pg_size_pretty
-+
 json text format| 394 MB
 jsonb: compressed   | 525 MB
 jsonb: uncompressed | 1200 MB


EXTRACTION TIMES


HEAD

Q1 (search via GIN index followed by extracting 100,000 values from rows):

jsonb compressed: 4000
jsonb uncompressed: 3250


Q2 (seq scan and extract 200,000 values from rows):

json: 11700
jsonb compressed: 3150
jsonb uncompressed: 2700


PATCHED

Q1:

jsonb compressed: 6750
jsonb uncompressed: 3350

Q2:

json: 11796
jsonb compressed: 4700
jsonb uncompressed: 2650

--

Conclusion: with Tom's patch, JSONB is 55% smaller when
compressed (EXTENDED).  Extraction times are 50% to 70% slower, but this
appears to be almost entirely due to decompression overhead.  When not
compressing (EXTERNAL), extraction times for patch versions are
statistically the same as HEAD, and file sizes are similar to HEAD.
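
The exact queries aren't reproduced here, but shapes like the following
(table, index, and key names are placeholders of mine) match the
descriptions above:

-- Q1-style: GIN index probe, then extract a value from the matching rows.
-- Assumes something like: CREATE INDEX ON docs USING gin (data);
SELECT data -> 'title' FROM docs WHERE data @> '{"status": "open"}';

-- Q2-style: sequential scan extracting a value from every row.
SELECT data -> 'title' FROM docs;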

USER REACTION
-

I polled at both PDXpgDay and at FOSS4G, asking some ~80 Postgres
users how they would feel about a compression vs. extraction time
tradeoff.  The audience was evenly split.

However, with the current patch, the user can choose.  Users who know
enough for performance tuning can set JSONB columns to EXTERNAL, and get
the same performance as the unpatched version.
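
That tuning is a single statement per column; for example (placeholder
table and column names):

-- Keep the jsonb column out-of-line but uncompressed.
ALTER TABLE docs ALTER COLUMN data SET STORAGE EXTERNAL;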

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-11 Thread Stephen Frost
* Josh Berkus (j...@agliodbs.com) wrote:
 TLDR: we want to go with Tom's latest patch and release beta3.

Having not even read the rest- yes please.  We really need to get beta3
out and figure out when we're going to actually release 9.4...
Admittedly, the last month has been good and we've been fixing issues,
but it'd really be good to wrap 9.4 up.

 Conclusion: with Tom's patch, JSONB is 55% smaller when compressed
 (EXTENDED).  Extraction times are 50% to 70% slower, but this
 appears to be almost entirely due to decompression overhead.  When not
 compressing (EXTERNAL), extraction times for patched versions are
 statistically the same as HEAD, and file sizes are similar to HEAD.

Not really a surprise.

 I polled at both PDXpgDay and at FOSS4G, asking some ~80 Postgres
 users how they would feel about a compression vs. extraction time
 tradeoff.  The audience was evenly split.

Also not a surprise.

 However, with the current patch, the user can choose.  Users who know
 enough for performance tuning can set JSONB columns to EXTERNAL, and get
 the same performance as the unpatched version.

Agreed.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-11 Thread Arthur Silva
On Thu, Sep 11, 2014 at 10:01 PM, Josh Berkus j...@agliodbs.com wrote:

 So, I finally got time to test Tom's latest patch on this.

 TLDR: we want to go with Tom's latest patch and release beta3.

 [snip]



The compression ratio difference is exaggerated in this case, but it does
support the conclusion that the extraction penalty of Tom's patch can be
alleviated.

In my tests with the github archive data the savings vs. performance-penalty
tradeoff was fine, but I'm not confident in those results since there were
only 8 top-level keys.
For comparison, some Twitter API objects[1] have 30+ top-level keys. If I
have time in the next couple of days I'll run some tests with the public
Twitter firehose data.

[1] https://dev.twitter.com/rest/reference/get/statuses/home_timeline


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-06 Thread David E. Wheeler
On Sep 4, 2014, at 7:26 PM, Jan Wieck j...@wi3ck.info wrote:

 This is only because the input data was exact copies of the same strings over 
 and over again. PGLZ can very well compress slightly less identical strings 
 of varying lengths too. Not as well, but well enough. But I suspect such 
 input data would make it fail again, even with lengths.

We had a bit of discussion about JSONB compression at PDXPUG Day this morning. 
Josh polled the room, and about half thought we should apply the patch for 
better compression, while the other half seemed to want faster access 
operations. (Some folks no doubt voted for both.) But in the ensuing 
discussion, I started to think that maybe we should leave it as it is, for two 
reasons:

1. There has been a fair amount of discussion about ways to better deal with 
this in future releases, such as hints to TOAST about how to compress, or the 
application of different compression algorithms (or pluggable compression). I’m 
assuming that leaving it as-is does not remove those possibilities.

2. The major advantage of JSONB is fast access operations. If those are not as 
important for a given use case as storage space, there’s still the JSON type, 
which *does* compress reasonably well. IOW, we already have a JSON alternative 
that compresses well. So why make the same (or similar) trade-offs with JSONB?

Just my $0.02. I would like to see some consensus on this, soon, though, as I 
am eager to get 9.4 and JSONB, regardless of the outcome!

Best,

David



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-04 Thread Jan Wieck

On 08/12/2014 10:58 AM, Robert Haas wrote:

What would really be ideal here is if the JSON code could inform the
toast compression code "this many initial bytes are likely
incompressible, just pass them through without trying, and then start
compressing at byte N", where N is the byte following the TOC.  But I
don't know that there's a reasonable way to implement that.



Sorry, being late for the party.

Anyhow, this strikes me as a good basic direction of thought. But I 
think we should put the burden on the data type, not on toast. To do 
that data types could have an optional toast_hint_values() function, 
which the toast code can call with the actual datum at hand and its 
default parameter array. The hint values function then can modify that 
parameter array, telling toast how much to skip, how hard to try (or not 
at all) and so on. A data type specific function should know much better 
how to figure out how compressible a particular datum may be.
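
A sketch of what that might look like for jsonb -- every name here
(ToastHintValues, jsonb_toast_hint_values, the struct fields) is invented
for illustration, not an existing or proposed API:

/* sketch; assumes postgres.h and utils/jsonb.h */
typedef struct ToastHintValues
{
	int32		skip_prefix;		/* bytes at the front not worth compressing */
	int32		first_success_by;	/* how far to scan before giving up */
	bool		attempt_compression;
} ToastHintValues;

static void
jsonb_toast_hint_values(struct varlena *datum, ToastHintValues *hints)
{
	JsonbContainer *jc = (JsonbContainer *) VARDATA(datum);
	uint32		nentries = jc->header & JB_CMASK;

	hints->attempt_compression = true;
	/* skip the JEntry TOC, which is nearly incompressible
	 * (an object would have nentries * 2 JEntries) */
	hints->skip_prefix = offsetof(JsonbContainer, children) +
		nentries * sizeof(JEntry);
	hints->first_success_by = 1024;		/* keep the current default */
}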


Certainly nothing for 9.4, but it might require changing the toast API 
in a different way than just handing it an oid and hard-coding the 
JSONBOID case into toast for 9.4. If we are going to change the API, we 
might as well do it right.



Regards,
Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-04 Thread Jan Wieck

On 08/08/2014 10:21 AM, Andrew Dunstan wrote:


On 08/07/2014 11:17 PM, Tom Lane wrote:

I looked into the issue reported in bug #11109.  The problem appears to be
that jsonb's on-disk format is designed in such a way that the leading
portion of any JSON array or object will be fairly incompressible, because
it consists mostly of a strictly-increasing series of integer offsets.
This interacts poorly with the code in pglz_compress() that gives up if
it's found nothing compressible in the first first_success_by bytes of a
value-to-be-compressed.  (first_success_by is 1024 in the default set of
compression parameters.)


[snip]


There is plenty of compressible data once we get into the repetitive
strings in the payload part --- but that starts at offset 944, and up to
that point there is nothing that pg_lzcompress can get a handle on.  There
are, by definition, no sequences of 4 or more repeated bytes in that area.
I think in principle pg_lzcompress could decide to compress the 3-byte
sequences consisting of the high-order 24 bits of each offset; but it
doesn't choose to do so, probably because of the way its lookup hash table
works:

  * pglz_hist_idx -
  *
  * Computes the history table slot for the lookup by the next 4
  * characters in the input.
  *
  * NB: because we use the next 4 characters, we are not guaranteed to
  * find 3-character matches; they very possibly will be in the wrong
  * hash list.  This seems an acceptable tradeoff for spreading out the
  * hash keys more.

For jsonb header data, the next 4 characters are *always* different, so
only a chance hash collision can result in a match.  There is therefore a
pretty good chance that no compression will occur before it gives up
because of first_success_by.
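
(A rough way to reproduce the effect; the table name and element count are
invented here, and aggregating a constant just builds a highly repetitive
array:)

CREATE TABLE jb_bug AS
  SELECT (SELECT json_agg('0123456789'::text)
          FROM generate_series(1, 100000))::jsonb AS j;
-- stays close to the raw size, because pglz gives up inside the
-- incompressible JEntry prefix:
SELECT pg_column_size(j) FROM jb_bug;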

I'm not sure if there is any easy fix for this.  We could possibly change
the default first_success_by value, but I think that'd just be postponing
the problem to larger jsonb objects/arrays, and it would hurt performance
for genuinely incompressible data.  A somewhat painful, but not yet
out-of-the-question, alternative is to change the jsonb on-disk
representation.  Perhaps the JEntry array could be defined as containing
element lengths instead of element ending offsets.  Not sure though if
that would break binary searching for JSON object keys.
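
(To illustrate the offsets-versus-lengths point with invented numbers, for
consecutive 14-byte elements:

ending offsets: 14, 28, 42, 56, 70, ...   -- strictly increasing, never repeats
lengths:        14, 14, 14, 14, 14, ...   -- folds into one back-reference

the lengths form is what pglz can actually exploit.)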





Ouch.

Back when this structure was first presented at pgCon 2013, I wondered
if we shouldn't extract the strings into a dictionary, because of key
repetition, and convinced myself that this shouldn't be necessary
because in significant cases TOAST would take care of it.

Maybe we should have pglz_compress() look at the *last* 1024 bytes if it
can't find anything worth compressing in the first, for values larger
than a certain size.

It's worth noting that this is a fairly pathological case. AIUI the
example you constructed has an array with 100k string elements. I don't
think that's typical. So I suspect that unless I've misunderstood the
statement of the problem we're going to find that almost all the jsonb
we will be storing is still compressible.


I also think that a substantial part of the problem of coming up with a 
representative data sample is that the size of the incompressible 
data at the beginning is somewhat tied to the overall size of the datum 
itself. This may or may not be true in any particular use case, but as a 
general rule of thumb I would assume that the larger the JSONB document, 
the larger the offset array at the beginning.


Would changing 1024 to a fraction of the datum length for the time being 
give us enough room to come up with a proper solution for 9.5?
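
A minimal sketch of that idea, with invented variable names rather than the
actual ones in pg_lzcompress.c (Max() is the usual postgres macro):

	/* scale the give-up point with the input size instead of a fixed 1024 */
	int32		give_up_at = Max(1024, slen / 8);

	if (!found_match && bytes_scanned >= give_up_at)
		return false;			/* looks incompressible; store raw */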



Regards,
Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-04 Thread Jan Wieck

On 08/08/2014 11:18 AM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

On 08/07/2014 11:17 PM, Tom Lane wrote:

I looked into the issue reported in bug #11109.  The problem appears to be
that jsonb's on-disk format is designed in such a way that the leading
portion of any JSON array or object will be fairly incompressible, because
it consists mostly of a strictly-increasing series of integer offsets.



Ouch.



Back when this structure was first presented at pgCon 2013, I wondered
if we shouldn't extract the strings into a dictionary, because of key
repetition, and convinced myself that this shouldn't be necessary
because in significant cases TOAST would take care of it.


That's not really the issue here, I think.  The problem is that a
relatively minor aspect of the representation, namely the choice to store
a series of offsets rather than a series of lengths, produces
nonrepetitive data even when the original input is repetitive.


This is only because the input data was exact copies of the same strings 
over and over again. PGLZ can very well compress slightly less identical 
strings of varying lengths too. Not as well, but well enough. But I 
suspect such input data would make it fail again, even with lengths.



Regards,
Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-27 Thread Arthur Silva
On Wed, Aug 27, 2014 at 1:09 AM, Arthur Silva arthur...@gmail.com wrote:

 It won't be faster by any means, but it should definitely be incorporated
 if any format changes are made (like Tom already suggested).

 I think it's important we gather at least 2 more things before making any
 calls:
 * Josh tests w/ cache aware patch, which should confirm cache aware is
 indeed preferred
 * Tests with toast hacked to use lz4 instead, which might ease any
 decisions


 --
 Arthur Silva



 On Wed, Aug 27, 2014 at 12:53 AM, Peter Geoghegan p...@heroku.com wrote:

 On Tue, Aug 26, 2014 at 8:41 PM, Arthur Silva arthur...@gmail.com
 wrote:
  The difference is small but it's definitely faster, which makes sense
 since
  cache line misses are probably slightly reduced.
  As in the previous runs, I ran the query a dozen times and took the
 average
  after excluding runs with a high deviation.

 I'm not surprised that it hasn't beaten HEAD. I haven't studied the
 problem in detail, but I don't think that the cache awareness of the
 new revision is necessarily a distinct advantage.

 --
 Peter Geoghegan



I'm attaching a quick-n-dirty patch that uses lz4 compression instead of
pglz in case someone wants to experiment with it. Seems to work in my test
env; I'll run more tests when I get home.

PS: gotta love gmail's fixed defaults of top-posting...


lz4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-26 Thread Heikki Linnakangas

On 08/16/2014 02:19 AM, Tom Lane wrote:

I think the realistic alternatives at this point are either to
switch to all-lengths as in my test patch, or to use the hybrid approach
of Heikki's test patch.  IMO the major attraction of Heikki's patch
is that it'd be upward compatible with existing beta installations,
ie no initdb required (but thus, no opportunity to squeeze in a version
identifier either).  It's not showing up terribly well in the performance
tests I've been doing --- it's about halfway between HEAD and my patch on
that extract-a-key-from-a-PLAIN-stored-column test.  But, just as with my
patch, there are things that could be done to micro-optimize it by
touching a bit more code.

I did some quick stats comparing compressed sizes for the delicio.us
data, printing quartiles as per Josh's lead:

all-lengths {440,569,609,655,1257}
Heikki's patch  {456,582,624,671,1274}
HEAD{493,636,684,744,1485}

(As before, this is pg_column_size of the jsonb within a table whose rows
are wide enough to force tuptoaster.c to try to compress the jsonb;
otherwise many of these values wouldn't get compressed.)  These documents
don't have enough keys to trigger the first_success_by issue, so that
HEAD doesn't look too awful, but still there's about an 11% gain from
switching from offsets to lengths.  Heikki's method captures much of
that but not all.

Personally I'd prefer to go to the all-lengths approach, but a large
part of that comes from a subjective assessment that the hybrid approach
is too messy.  Others might well disagree.


It's not too pretty, no. But it would be nice to not have to make a 
tradeoff between lookup speed and compressibility.


Yet another idea is to store all lengths, but add an additional array of 
offsets to JsonbContainer. The array would contain the offset of, say, 
every 16th element. It would be very small compared to the lengths 
array, but would greatly speed up random access on a large array/object.
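
A rough sketch of that scheme; all names are hypothetical, and the JEntry
here is simplified to a bare length word:

typedef uint32 JEntry;

#define JB_OFFSET_STRIDE 16

/*
 * offset_index[k] holds the byte offset of child k * JB_OFFSET_STRIDE,
 * so it has only nchildren / 16 entries, yet random access costs at
 * most 15 length additions instead of a walk over the whole array.
 */
static uint32
child_offset(const JEntry *lengths, const uint32 *offset_index, int i)
{
	uint32		off = offset_index[i / JB_OFFSET_STRIDE];
	int			j;

	for (j = (i / JB_OFFSET_STRIDE) * JB_OFFSET_STRIDE; j < i; j++)
		off += lengths[j];
	return off;
}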


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-26 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 On 08/16/2014 02:19 AM, Tom Lane wrote:
 I think the realistic alternatives at this point are either to
 switch to all-lengths as in my test patch, or to use the hybrid approach
 of Heikki's test patch. ...
 Personally I'd prefer to go to the all-lengths approach, but a large
 part of that comes from a subjective assessment that the hybrid approach
 is too messy.  Others might well disagree.

 It's not too pretty, no. But it would be nice to not have to make a 
 tradeoff between lookup speed and compressibility.

 Yet another idea is to store all lengths, but add an additional array of 
 offsets to JsonbContainer. The array would contain the offset of, say, 
 every 16th element. It would be very small compared to the lengths 
 array, but would greatly speed up random access on a large array/object.

That does nothing to address my basic concern about the patch, which is
that it's too complicated and therefore bug-prone.  Moreover, it'd lose
on-disk compatibility which is really the sole saving grace of the
proposal.

My feeling about it at this point is that the apparent speed gain from
using offsets is illusory: in practically all real-world cases where there
are enough keys or array elements for it to matter, costs associated with
compression (or rather failure to compress) will dominate any savings we
get from offset-assisted lookups.  I agree that the evidence for this
opinion is pretty thin ... but the evidence against it is nonexistent.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-26 Thread Josh Berkus
On 08/26/2014 07:51 AM, Tom Lane wrote:
 My feeling about it at this point is that the apparent speed gain from
 using offsets is illusory: in practically all real-world cases where there
 are enough keys or array elements for it to matter, costs associated with
 compression (or rather failure to compress) will dominate any savings we
 get from offset-assisted lookups.  I agree that the evidence for this
 opinion is pretty thin ... but the evidence against it is nonexistent.

Well, I have shown one test case where lengths is a net
penalty.  However, for that to be the case, you have to have the
following conditions *all* be true:

* lots of top-level keys
* short values
* rows which are on the borderline for TOAST
* table which fits in RAM

... so that's a special case and if it's sub-optimal, no biggie.  Also,
it's not like it's an order-of-magnitude slower.

Anyway, I called for feedback on my blog, and have gotten some:

http://www.databasesoup.com/2014/08/the-great-jsonb-tradeoff.html

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-26 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 Anyway, I called for feedback on my blog, and have gotten some:
 http://www.databasesoup.com/2014/08/the-great-jsonb-tradeoff.html

I was hoping you'd get some useful data from that, but so far it seems
like a rehash of points made in the on-list thread :-(

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-26 Thread Josh Berkus
On 08/26/2014 11:40 AM, Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
 Anyway, I called for feedback on my blog, and have gotten some:
 http://www.databasesoup.com/2014/08/the-great-jsonb-tradeoff.html
 
 I was hoping you'd get some useful data from that, but so far it seems
 like a rehash of points made in the on-list thread :-(
 
   regards, tom lane

yah, me too. :-(

Unfortunately even the outside commenters don't seem to understand that
storage size *is* related to speed: it's exchanging I/O speed for CPU speed.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-26 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 On 08/26/2014 11:40 AM, Tom Lane wrote:
 I was hoping you'd get some useful data from that, but so far it seems
 like a rehash of points made in the on-list thread :-(

 Unfortunately even the outside commenters don't seem to understand that
 storage size *is* related to speed: it's exchanging I/O speed for CPU speed.

Yeah, exactly.  Given current hardware trends, data compression is
becoming more of a win not less as time goes on: CPU cycles are cheap
even compared to main memory access, let alone mass storage.  So I'm
thinking we want to adopt a compression-friendly data format even if
it measures out as a small loss currently.

I wish it were cache-friendly too, per the upthread tangent about having
to fetch keys from all over the place within a large JSON object.

... and while I was typing that sentence, lightning struck.  The existing
arrangement of object subfields with keys and values interleaved is just
plain dumb.  We should rearrange that as all the keys in order, then all
the values in the same order.  Then the keys are naturally adjacent in
memory and object-key searches become much more cache-friendly: you
probably touch most of the key portion of the object, but none of the
values portion, until you know exactly what part of the latter to fetch.
This approach might complicate the lookup logic marginally but I bet not
very much; and it will be a huge help if we ever want to do smart access
to EXTERNAL (non-compressed) JSON values.
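
Schematically (an invented picture, not the actual byte layout), for
{"a":1, "b":2, "c":3}:

interleaved (today):    a 1 b 2 c 3
keys first (proposed):  a b c 1 2 3

so a key search touches only the front half of the entries.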

I will go prototype that just to see how much code rearrangement is
required.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-26 Thread Andres Freund
On 2014-08-26 15:01:27 -0400, Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
  On 08/26/2014 11:40 AM, Tom Lane wrote:
  I was hoping you'd get some useful data from that, but so far it seems
  like a rehash of points made in the on-list thread :-(
 
  Unfortunately even the outside commenters don't seem to understand that
  storage size *is* related to speed: it's exchanging I/O speed for CPU speed.
 
 Yeah, exactly.  Given current hardware trends, data compression is
 becoming more of a win not less as time goes on: CPU cycles are cheap
 even compared to main memory access, let alone mass storage.  So I'm
 thinking we want to adopt a compression-friendly data format even if
 it measures out as a small loss currently.

On the other hand the majority of databases these day fit into main
memory due to its increasing sizes and postgres is more often CPU than
IO bound.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-26 Thread Laurence Rowe
On 26 August 2014 11:34, Josh Berkus j...@agliodbs.com wrote:

 On 08/26/2014 07:51 AM, Tom Lane wrote:
  My feeling about it at this point is that the apparent speed gain from
  using offsets is illusory: in practically all real-world cases where
 there
  are enough keys or array elements for it to matter, costs associated with
  compression (or rather failure to compress) will dominate any savings we
  get from offset-assisted lookups.  I agree that the evidence for this
  opinion is pretty thin ... but the evidence against it is nonexistent.

 Well, I have shown one test case where lengths is a net
 penalty.  However, for that to be the case, you have to have the
 following conditions *all* be true:

 * lots of top-level keys
 * short values
 * rows which are on the borderline for TOAST
 * table which fits in RAM

 ... so that's a special case and if it's sub-optimal, no bigee.  Also,
 it's not like it's an order-of-magnitude slower.

 Anyway, I called for feedback on my blog, and have gotten some:

 http://www.databasesoup.com/2014/08/the-great-jsonb-tradeoff.html


It would be really interesting to see your results with column STORAGE
EXTERNAL for that benchmark. I think it is important to separate out the
slowdown due to decompression now being needed vs. that inherent in the new
format, since we can always switch off compression on a per-column basis
using STORAGE EXTERNAL.


My JSON data has smallish objects with a small number of keys; it barely
compresses at all with the patch and shows similar results to Arthur's
data. Across ~500K rows I get:

encoded=# select count(properties->'submitted_by') from compressed;
 count

 431948
(1 row)

Time: 250.512 ms

encoded=# select count(properties->'submitted_by') from uncompressed;
 count

 431948
(1 row)

Time: 218.552 ms


Laurence


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-26 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-08-26 15:01:27 -0400, Tom Lane wrote:
 Yeah, exactly.  Given current hardware trends, data compression is
 becoming more of a win not less as time goes on: CPU cycles are cheap
 even compared to main memory access, let alone mass storage.  So I'm
 thinking we want to adopt a compression-friendly data format even if
 it measures out as a small loss currently.

 On the other hand the majority of databases these day fit into main
 memory due to its increasing sizes and postgres is more often CPU than
 IO bound.

Well, better data compression helps make that true ;-).  And don't forget
cache effects; actual main memory is considered slow these days.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-26 Thread Claudio Freire
On Tue, Aug 26, 2014 at 4:01 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Josh Berkus j...@agliodbs.com writes:
 On 08/26/2014 11:40 AM, Tom Lane wrote:
 I was hoping you'd get some useful data from that, but so far it seems
 like a rehash of points made in the on-list thread :-(

  Unfortunately even the outside commenters don't seem to understand that
  storage size *is* related to speed: it's exchanging I/O speed for CPU speed.

 Yeah, exactly.  Given current hardware trends, data compression is
 becoming more of a win not less as time goes on: CPU cycles are cheap
 even compared to main memory access, let alone mass storage.  So I'm
 thinking we want to adopt a compression-friendly data format even if
 it measures out as a small loss currently.

 I wish it were cache-friendly too, per the upthread tangent about having
 to fetch keys from all over the place within a large JSON object.


What about my earlier proposal?

An in-memory compressed representation would greatly help cache
locality, more so if you pack keys as you mentioned.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-26 Thread Andres Freund
On 2014-08-26 15:17:13 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-08-26 15:01:27 -0400, Tom Lane wrote:
  Yeah, exactly.  Given current hardware trends, data compression is
  becoming more of a win not less as time goes on: CPU cycles are cheap
  even compared to main memory access, let alone mass storage.  So I'm
  thinking we want to adopt a compression-friendly data format even if
  it measures out as a small loss currently.
 
  On the other hand the majority of databases these day fit into main
  memory due to its increasing sizes and postgres is more often CPU than
  IO bound.
 
 Well, better data compression helps make that true ;-).

People disable toast compression though because it results in better
performance :(. Part of that could be fixed by a faster compression
method, part of it by decompressing less often. But still.

 And don't forget cache effects; actual main memory is considered slow
 these days.

Right. But that plays the other way round too. Compressed datums need to
be copied to be accessed uncompressed, whereas for uncompressed inline
datums that's not necessary.

Anyway, that's just to say that I don't really agree that CPU overhead
is a worthy price to pay for storage efficiency if the gains are small.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-26 Thread Peter Geoghegan
On Tue, Aug 26, 2014 at 12:27 PM, Andres Freund and...@2ndquadrant.com wrote:
 Anyway, that's just to say that I don't really agree that CPU overhead
 is a worthy price to pay for storage efficiency if the gains are small.

+1

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-26 Thread Josh Berkus
On 08/26/2014 12:27 PM, Andres Freund wrote:
 Anyway, that's just to say that I don't really agree that CPU overhead
 is a worthy price to pay for storage efficiency if the gains are small.

But in this case the gains aren't small; we're talking up to 60% smaller
storage.

Testing STORAGE EXTENDED soon.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-26 Thread Tom Lane
I wrote:
 I wish it were cache-friendly too, per the upthread tangent about having
 to fetch keys from all over the place within a large JSON object.

 ... and while I was typing that sentence, lightning struck.  The existing
 arrangement of object subfields with keys and values interleaved is just
 plain dumb.  We should rearrange that as all the keys in order, then all
 the values in the same order.  Then the keys are naturally adjacent in
 memory and object-key searches become much more cache-friendly: you
 probably touch most of the key portion of the object, but none of the
 values portion, until you know exactly what part of the latter to fetch.
 This approach might complicate the lookup logic marginally but I bet not
 very much; and it will be a huge help if we ever want to do smart access
 to EXTERNAL (non-compressed) JSON values.

 I will go prototype that just to see how much code rearrangement is
 required.

This looks pretty good from a coding point of view.  I have not had time
yet to see if it affects the speed of the benchmark cases we've been
trying.  I suspect that it won't make much difference in them.  I think
if we do decide to make an on-disk format change, we should seriously
consider including this change.

The same concept could be applied to offset-based storage of course,
although I rather doubt that we'd make that combination of choices since
it would be giving up on-disk compatibility for benefits that are mostly
in the future.

Attached are two patches: one is a delta against the last jsonb-lengths
patch I posted, and the other is a merged patch showing the total change
from HEAD, for ease of application.

regards, tom lane

diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index e47eaea..4e7fe67 100644
*** a/src/backend/utils/adt/jsonb_util.c
--- b/src/backend/utils/adt/jsonb_util.c
***
*** 26,33 
   * in MaxAllocSize, and the number of elements (or pairs) must fit in the bits
   * reserved for that in the JsonbContainer.header field.
   *
!  * (the total size of an array's elements is also limited by JENTRY_LENMASK,
!  * but we're not concerned about that here)
   */
  #define JSONB_MAX_ELEMS (Min(MaxAllocSize / sizeof(JsonbValue), JB_CMASK))
  #define JSONB_MAX_PAIRS (Min(MaxAllocSize / sizeof(JsonbPair), JB_CMASK))
--- 26,33 
   * in MaxAllocSize, and the number of elements (or pairs) must fit in the bits
   * reserved for that in the JsonbContainer.header field.
   *
!  * (The total size of an array's or object's elements is also limited by
!  * JENTRY_LENMASK, but we're not concerned about that here.)
   */
  #define JSONB_MAX_ELEMS (Min(MaxAllocSize / sizeof(JsonbValue), JB_CMASK))
  #define JSONB_MAX_PAIRS (Min(MaxAllocSize / sizeof(JsonbPair), JB_CMASK))
*** findJsonbValueFromContainer(JsonbContain
*** 294,303 
  {
  	JEntry	   *children = container->children;
  	int			count = (container->header & JB_CMASK);
! 	JsonbValue *result = palloc(sizeof(JsonbValue));
  
  	Assert((flags & ~(JB_FARRAY | JB_FOBJECT)) == 0);
  
  	if (flags & JB_FARRAY & container->header)
  	{
  		char	   *base_addr = (char *) (children + count);
--- 294,309 
  {
  	JEntry	   *children = container->children;
  	int			count = (container->header & JB_CMASK);
! 	JsonbValue *result;
  
  	Assert((flags & ~(JB_FARRAY | JB_FOBJECT)) == 0);
  
+ 	/* Quick out without a palloc cycle if object/array is empty */
+ 	if (count <= 0)
+ 		return NULL;
+ 
+ 	result = palloc(sizeof(JsonbValue));
+ 
  	if (flags & JB_FARRAY & container->header)
  	{
  		char	   *base_addr = (char *) (children + count);
*** findJsonbValueFromContainer(JsonbContain
*** 323,329 
  		char	   *base_addr = (char *) (children + count * 2);
  		uint32	   *offsets;
  		uint32		lastoff;
! 		int			lastoffpos;
  		uint32		stopLow = 0,
  	stopHigh = count;
  
--- 329,335 
  		char	   *base_addr = (char *) (children + count * 2);
  		uint32	   *offsets;
  		uint32		lastoff;
! 		int			i;
  		uint32		stopLow = 0,
  	stopHigh = count;
  
*** findJsonbValueFromContainer(JsonbContain
*** 332,379 
  
  		/*
  		 * We use a cache to avoid redundant getJsonbOffset() computations
! 		 * inside the search loop.  Note that count may well be zero at this
! 		 * point; to avoid an ugly special case for initializing lastoff and
! 		 * lastoffpos, we allocate one extra array element.
  		 */
! 		offsets = (uint32 *) palloc((count * 2 + 1) * sizeof(uint32));
! 		offsets[0] = lastoff = 0;
! 		lastoffpos = 0;
  
  		/* Binary search on object/pair keys *only* */
  		while (stopLow < stopHigh)
  		{
  			uint32		stopMiddle;
- 			int			index;
  			int			difference;
  			JsonbValue	candidate;
  
  			stopMiddle = stopLow + (stopHigh - stopLow) / 2;
  
- 			/*
- 			 * Compensate for the fact that we're searching through pairs (not
- 			 * entries).
- 			 */
- 			index = stopMiddle * 2;
- 
- 			/* Update the offsets cache through at least 

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-26 Thread Arthur Silva
Tom, here are the results with the github data (8 top-level keys) only. Here's a
sample object: https://gist.github.com/igrigorik/2017462

All-Lengths + Cache-Aware EXTERNAL
Query 1: 516ms
Query 2: 350ms

The difference is small but it's definitely faster, which makes sense since
cache line misses are probably slightly reduced.
As in the previous runs, I ran the query a dozen times and took the average
after excluding runs with a high deviation.

compare to (copied from my previous email)

HEAD (aka, all offsets) EXTERNAL
Test query 1 runtime: 505ms
Test query 2 runtime: 350ms


All Lengths (Tom Lane patch) EXTERNAL
Test query 1 runtime: 525ms
Test query 2 runtime: 355ms


--
Arthur Silva



On Tue, Aug 26, 2014 at 7:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 I wrote:
  I wish it were cache-friendly too, per the upthread tangent about having
  to fetch keys from all over the place within a large JSON object.

  ... and while I was typing that sentence, lightning struck.  The existing
  arrangement of object subfields with keys and values interleaved is just
  plain dumb.  We should rearrange that as all the keys in order, then all
  the values in the same order.  Then the keys are naturally adjacent in
  memory and object-key searches become much more cache-friendly: you
  probably touch most of the key portion of the object, but none of the
  values portion, until you know exactly what part of the latter to fetch.
  This approach might complicate the lookup logic marginally but I bet not
  very much; and it will be a huge help if we ever want to do smart access
  to EXTERNAL (non-compressed) JSON values.

  I will go prototype that just to see how much code rearrangement is
  required.

 This looks pretty good from a coding point of view.  I have not had time
 yet to see if it affects the speed of the benchmark cases we've been
 trying.  I suspect that it won't make much difference in them.  I think
 if we do decide to make an on-disk format change, we should seriously
 consider including this change.

 The same concept could be applied to offset-based storage of course,
 although I rather doubt that we'd make that combination of choices since
 it would be giving up on-disk compatibility for benefits that are mostly
 in the future.

 Attached are two patches: one is a delta against the last jsonb-lengths
 patch I posted, and the other is a merged patch showing the total change
 from HEAD, for ease of application.

 regards, tom lane




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-26 Thread Peter Geoghegan
On Tue, Aug 26, 2014 at 8:41 PM, Arthur Silva arthur...@gmail.com wrote:
 The difference is small but it's definitely faster, which makes sense since
 cache line misses are probably slightly reduced.
 As in the previous runs, I ran the query a dozen times and took the average
 after excluding runs with a high deviation.

I'm not surprised that it hasn't beaten HEAD. I haven't studied the
problem in detail, but I don't think that the cache awareness of the
new revision is necessarily a distinct advantage.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-26 Thread Tom Lane
Peter Geoghegan p...@heroku.com writes:
 I'm not surprised that it hasn't beaten HEAD. I haven't studied the
 problem in detail, but I don't think that the cache awareness of the
 new revision is necessarily a distinct advantage.

I doubt it's a significant advantage in the current state of the code;
I'm happy if it's not a loss.  I was looking ahead to someday fetching key
values efficiently from large EXTERNAL (ie out-of-line-but-not-compressed)
JSON values, analogously to the existing optimization for fetching text
substrings from EXTERNAL text values.  As mentioned upthread, the current
JSONB representation would be seriously unfriendly to such a thing.
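
(That existing optimization is what lets a query like the following, with
invented names, pull back only the TOAST chunks covering the requested
slice of an EXTERNAL text column:)

ALTER TABLE docs ALTER COLUMN body SET STORAGE EXTERNAL;
SELECT substr(body, 1, 100) FROM docs;  -- fetches a chunk or two, not the whole value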

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-26 Thread Arthur Silva
It won't be faster by any means, but it should definitely be incorporated
if any format changes are made (like Tom already suggested).

I think it's important we gather at least 2 more things before making any
calls:
* Josh tests w/ cache aware patch, which should confirm cache aware is
indeed preferred
* Tests with toast hacked to use lz4 instead, which might ease any decisions


--
Arthur Silva



On Wed, Aug 27, 2014 at 12:53 AM, Peter Geoghegan p...@heroku.com wrote:

 On Tue, Aug 26, 2014 at 8:41 PM, Arthur Silva arthur...@gmail.com wrote:
  The difference is small but it's definitely faster, which makes sense
 since
  cache line misses are probably slightly reduced.
  As in the previous runs, I ran the query a dozen times and took the
 average
  after excluding runs with a high deviation.

 I'm not surprised that it hasn't beaten HEAD. I haven't studied the
 problem in detail, but I don't think that the cache awareness of the
 new revision is necessarily a distinct advantage.

 --
 Peter Geoghegan



Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-24 Thread Arthur Silva
On Thu, Aug 21, 2014 at 6:20 PM, Josh Berkus j...@agliodbs.com wrote:

 On 08/20/2014 03:42 PM, Arthur Silva wrote:
  What data are you using right now Josh?

 The same data as upthread.

 Can you test the three patches (9.4 head, 9.4 with Tom's cleanup of
 Heikki's patch, and 9.4 with Tom's latest lengths-only) on your workload?

 I'm concerned that my workload is unusual and don't want us to make this
 decision based entirely on it.

 --
 Josh Berkus
 PostgreSQL Experts Inc.
 http://pgexperts.com



Here are my test results so far with the github archive data.

It's important to keep in mind that the PushEvent objects that I use in
the queries only contain a small number of keys (8 to be precise), so
these tests don't really stress the changed code.

Anyway, in this dataset (with the small objects) using the all-lengths
patch provides small compression savings, but the overhead is minimal.



Test data: 610MB of JSON -- 341969 items

Index size (jsonb_ops): 331MB

Test query 1: SELECT data->'url', data->'actor' FROM t_json WHERE data @>
'{"type": "PushEvent"}'
Test query 1 items: 169732

Test query 2: SELECT data FROM t_json WHERE data @> '{"type": "PushEvent"}'
Test query 2 items:


HEAD (aka, all offsets) EXTENDED
Size: 374MB
Toast Size: 145MB

Test query 1 runtime: 680ms
Test query 2 runtime: 405ms

HEAD (aka, all offsets) EXTERNAL
Size: 366MB
Toast Size: 333MB

Test query 1 runtime: 505ms
Test query 2 runtime: 350ms

All Lengths (Tom Lane patch) EXTENDED
Size: 379MB
Toast Size: 108MB

Test query 1 runtime: 720ms
Test query 2 runtime: 420ms

All Lengths (Tom Lane patch) EXTERNAL
Size: 366MB
Toast Size: 333MB

Test query 1 runtime: 525ms
Test query 2 runtime: 355ms


--
Arthur Silva


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-21 Thread Josh Berkus
On 08/20/2014 03:42 PM, Arthur Silva wrote:
 What data are you using right now Josh?

The same data as upthread.

Can you test the three patches (9.4 head, 9.4 with Tom's cleanup of
Heikki's patch, and 9.4 with Tom's latest lengths-only) on your workload?

I'm concerned that my workload is unusual and don't want us to make this
decision based entirely on it.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

