On 05/06/2014 11:30 PM, Peter Geoghegan wrote:
On Tue, May 6, 2014 at 12:48 PM, Andres Freund <and...@anarazel.de> wrote:
Enthusiatically seconded. I've asked for that about three times without much 
success. If it had been my decision the patch wouldn't have been merged without 
that and other adjustments.

I'm almost certain that the only feedback of yours that I didn't
incorporate was that I didn't change the name of JsonbValue, a
decision I stand by, and also that I didn't add ascii art to
illustrate the on-disk format. I can write a patch that adds the
latter soon.

That would be great.

I found the serialization routine, convertJsonb() to be a bit of a mess. It's maintaining a custom stack of levels, which can be handy if you need to avoid recursion, but it's also relying on the native stack. And I didn't understand the point of splitting it into the "walk" and "put" functions; the division of labor between the two was far from clear IMHO. I started refactoring that, and ended up with the attached.

One detail that I found scary is that the estSize field in JsonbValue is not just any rough estimate. It's used ín the allocation of the output buffer for convertJsonb(), so it has to be large enough or you hit an assertion or buffer overflow. I believe it was correct as it was, but that kind of programming is always scary. I refactored the convertJsonb() function to use a StringInfo buffer instead, and removed estSize altogether.

This is still work-in-progress, but I thought I'd post this now to let people know I'm working on it. For example, the StringInfo isn't actually very well suited for this purpose, it might be better to have a custom buffer that's enlarged when needed.

For my own sanity, I started writing some docs on the on-disk format. See the comments in jsonb.h for my understanding of it. I moved around the structs a bit in jsonb.h, to make the format clearer, but the actual on-disk format is unchanged.

- Heikki

diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index cf5d6f2..8413da7 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -65,7 +65,7 @@ jsonb_recv(PG_FUNCTION_ARGS)
 	if (version == 1)
 		str = pq_getmsgtext(buf, buf->len - buf->cursor, &nbytes);
 	else
-		elog(ERROR, "Unsupported jsonb version number %d", version);
+		elog(ERROR, "unsupported jsonb version number %d", version);
 
 	return jsonb_from_cstring(str, nbytes);
 }
@@ -249,7 +249,6 @@ jsonb_in_object_field_start(void *pstate, char *fname, bool isnull)
 	v.type = jbvString;
 	v.val.string.len = checkStringLen(strlen(fname));
 	v.val.string.val = pnstrdup(fname, v.val.string.len);
-	v.estSize = sizeof(JEntry) + v.val.string.len;
 
 	_state->res = pushJsonbValue(&_state->parseState, WJB_KEY, &v);
 }
@@ -290,8 +289,6 @@ jsonb_in_scalar(void *pstate, char *token, JsonTokenType tokentype)
 	JsonbInState *_state = (JsonbInState *) pstate;
 	JsonbValue	v;
 
-	v.estSize = sizeof(JEntry);
-
 	switch (tokentype)
 	{
 
@@ -300,7 +297,6 @@ jsonb_in_scalar(void *pstate, char *token, JsonTokenType tokentype)
 			v.type = jbvString;
 			v.val.string.len = checkStringLen(strlen(token));
 			v.val.string.val = pnstrdup(token, v.val.string.len);
-			v.estSize += v.val.string.len;
 			break;
 		case JSON_TOKEN_NUMBER:
 
@@ -312,7 +308,6 @@ jsonb_in_scalar(void *pstate, char *token, JsonTokenType tokentype)
 			v.type = jbvNumeric;
 			v.val.numeric = DatumGetNumeric(DirectFunctionCall3(numeric_in, CStringGetDatum(token), 0, -1));
 
-			v.estSize += VARSIZE_ANY(v.val.numeric) +sizeof(JEntry) /* alignment */ ;
 			break;
 		case JSON_TOKEN_TRUE:
 			v.type = jbvBool;
diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index 1caaa4a..49a1d4d 100644
--- a/src/backend/utils/adt/jsonb_util.c
+++ b/src/backend/utils/adt/jsonb_util.c
@@ -37,49 +37,16 @@
 #define JSONB_MAX_PAIRS (Min(MaxAllocSize / sizeof(JsonbPair), \
 							 JENTRY_POSMASK))
 
-/*
- * State used while converting an arbitrary JsonbValue into a Jsonb value
- * (4-byte varlena uncompressed representation of a Jsonb)
- *
- * ConvertLevel:  Bookkeeping around particular level when converting.
- */
-typedef struct convertLevel
-{
-	uint32		i;				/* Iterates once per element, or once per pair */
-	uint32	   *header;			/* Pointer to current container header */
-	JEntry	   *meta;			/* This level's metadata */
-	char	   *begin;			/* Pointer into convertState.buffer */
-} convertLevel;
-
-/*
- * convertState:  Overall bookkeeping state for conversion
- */
-typedef struct convertState
-{
-	/* Preallocated buffer in which to form varlena/Jsonb value */
-	Jsonb	   *buffer;
-	/* Pointer into buffer */
-	char	   *ptr;
-
-	/* State for  */
-	convertLevel *allState,		/* Overall state array */
-			   *contPtr;		/* Cur container pointer (in allState) */
-
-	/* Current size of buffer containing allState array */
-	Size		levelSz;
-
-} convertState;
-
 static int	compareJsonbScalarValue(JsonbValue *a, JsonbValue *b);
 static int	lexicalCompareJsonbStringValue(const void *a, const void *b);
-static Size convertJsonb(JsonbValue *val, Jsonb *buffer);
-static inline short addPaddingInt(convertState *cstate);
-static void walkJsonbValueConversion(JsonbValue *val, convertState *cstate,
-						 uint32 nestlevel);
-static void putJsonbValueConversion(convertState *cstate, JsonbValue *val,
-						uint32 flags, uint32 level);
-static void putScalarConversion(convertState *cstate, JsonbValue *scalarVal,
-					uint32 level, uint32 i);
+static Jsonb *convertJsonb(JsonbValue *val);
+static inline short addPaddingInt(StringInfo buffer);
+static void walkJsonbValueConversion(JsonbValue *val, StringInfo buffer, JEntry *header, int level);
+static void walkJsonbArrayConversion(JsonbValue *val, StringInfo buffer, JEntry *header, int level);
+static void walkJsonbObjectConversion(JsonbValue *val, StringInfo buffer, JEntry *header, int level);
+static void putScalarConversion(StringInfo buffer, JsonbValue *scalarVal,
+								JEntry *header);
+static int reserveStringInfo(StringInfo str, int datalen);
 static void iteratorFromContainerBuf(JsonbIterator *it, char *buffer);
 static bool formIterIsContainer(JsonbIterator **it, JsonbValue *val,
 					JEntry *ent, bool skipNested);
@@ -110,7 +77,6 @@ Jsonb *
 JsonbValueToJsonb(JsonbValue *val)
 {
 	Jsonb	   *out;
-	Size		sz;
 
 	if (IsAJsonbScalar(val))
 	{
@@ -127,17 +93,11 @@ JsonbValueToJsonb(JsonbValue *val)
 		pushJsonbValue(&pstate, WJB_ELEM, val);
 		res = pushJsonbValue(&pstate, WJB_END_ARRAY, NULL);
 
-		out = palloc(VARHDRSZ + res->estSize);
-		sz = convertJsonb(res, out);
-		Assert(sz <= res->estSize);
-		SET_VARSIZE(out, sz + VARHDRSZ);
+		out = convertJsonb(res);
 	}
 	else if (val->type == jbvObject || val->type == jbvArray)
 	{
-		out = palloc(VARHDRSZ + val->estSize);
-		sz = convertJsonb(val, out);
-		Assert(sz <= val->estSize);
-		SET_VARSIZE(out, VARHDRSZ + sz);
+		out = convertJsonb(val);
 	}
 	else
 	{
@@ -337,28 +297,22 @@ findJsonbValueFromSuperHeader(JsonbSuperHeader sheader, uint32 flags,
 			if (JBE_ISNULL(*e) && key->type == jbvNull)
 			{
 				result->type = jbvNull;
-				result->estSize = sizeof(JEntry);
 			}
 			else if (JBE_ISSTRING(*e) && key->type == jbvString)
 			{
 				result->type = jbvString;
 				result->val.string.val = data + JBE_OFF(*e);
 				result->val.string.len = JBE_LEN(*e);
-				result->estSize = sizeof(JEntry) + result->val.string.len;
 			}
 			else if (JBE_ISNUMERIC(*e) && key->type == jbvNumeric)
 			{
 				result->type = jbvNumeric;
 				result->val.numeric = (Numeric) (data + INTALIGN(JBE_OFF(*e)));
-
-				result->estSize = 2 * sizeof(JEntry) +
-					VARSIZE_ANY(result->val.numeric);
 			}
 			else if (JBE_ISBOOL(*e) && key->type == jbvBool)
 			{
 				result->type = jbvBool;
 				result->val.boolean = JBE_ISBOOL_TRUE(*e) != 0;
-				result->estSize = sizeof(JEntry);
 			}
 			else
 				continue;
@@ -395,7 +349,6 @@ findJsonbValueFromSuperHeader(JsonbSuperHeader sheader, uint32 flags,
 			candidate.type = jbvString;
 			candidate.val.string.val = data + JBE_OFF(*entry);
 			candidate.val.string.len = JBE_LEN(*entry);
-			candidate.estSize = sizeof(JEntry) + candidate.val.string.len;
 
 			difference = lengthCompareJsonbStringValue(&candidate, key, NULL);
 
@@ -410,28 +363,22 @@ findJsonbValueFromSuperHeader(JsonbSuperHeader sheader, uint32 flags,
 				if (JBE_ISNULL(*v))
 				{
 					result->type = jbvNull;
-					result->estSize = sizeof(JEntry);
 				}
 				else if (JBE_ISSTRING(*v))
 				{
 					result->type = jbvString;
 					result->val.string.val = data + JBE_OFF(*v);
 					result->val.string.len = JBE_LEN(*v);
-					result->estSize = sizeof(JEntry) + result->val.string.len;
 				}
 				else if (JBE_ISNUMERIC(*v))
 				{
 					result->type = jbvNumeric;
 					result->val.numeric = (Numeric) (data + INTALIGN(JBE_OFF(*v)));
-
-					result->estSize = 2 * sizeof(JEntry) +
-						VARSIZE_ANY(result->val.numeric);
 				}
 				else if (JBE_ISBOOL(*v))
 				{
 					result->type = jbvBool;
 					result->val.boolean = JBE_ISBOOL_TRUE(*v) != 0;
-					result->estSize = sizeof(JEntry);
 				}
 				else
 				{
@@ -443,7 +390,6 @@ findJsonbValueFromSuperHeader(JsonbSuperHeader sheader, uint32 flags,
 					result->val.binary.data = data + INTALIGN(JBE_OFF(*v));
 					result->val.binary.len = JBE_LEN(*v) -
 						(INTALIGN(JBE_OFF(*v)) - JBE_OFF(*v));
-					result->estSize = 2 * sizeof(JEntry) + result->val.binary.len;
 				}
 
 				return result;
@@ -500,34 +446,28 @@ getIthJsonbValueFromSuperHeader(JsonbSuperHeader sheader, uint32 i)
 	if (JBE_ISNULL(*e))
 	{
 		result->type = jbvNull;
-		result->estSize = sizeof(JEntry);
 	}
 	else if (JBE_ISSTRING(*e))
 	{
 		result->type = jbvString;
 		result->val.string.val = data + JBE_OFF(*e);
 		result->val.string.len = JBE_LEN(*e);
-		result->estSize = sizeof(JEntry) + result->val.string.len;
 	}
 	else if (JBE_ISNUMERIC(*e))
 	{
 		result->type = jbvNumeric;
 		result->val.numeric = (Numeric) (data + INTALIGN(JBE_OFF(*e)));
-
-		result->estSize = 2 * sizeof(JEntry) + VARSIZE_ANY(result->val.numeric);
 	}
 	else if (JBE_ISBOOL(*e))
 	{
 		result->type = jbvBool;
 		result->val.boolean = JBE_ISBOOL_TRUE(*e) != 0;
-		result->estSize = sizeof(JEntry);
 	}
 	else
 	{
 		result->type = jbvBinary;
 		result->val.binary.data = data + INTALIGN(JBE_OFF(*e));
 		result->val.binary.len = JBE_LEN(*e) - (INTALIGN(JBE_OFF(*e)) - JBE_OFF(*e));
-		result->estSize = result->val.binary.len + 2 * sizeof(JEntry);
 	}
 
 	return result;
@@ -558,7 +498,6 @@ pushJsonbValue(JsonbParseState **pstate, int seq, JsonbValue *scalarVal)
 			*pstate = pushState(pstate);
 			result = &(*pstate)->contVal;
 			(*pstate)->contVal.type = jbvArray;
-			(*pstate)->contVal.estSize = 3 * sizeof(JEntry);
 			(*pstate)->contVal.val.array.nElems = 0;
 			(*pstate)->contVal.val.array.rawScalar = (scalarVal &&
 											 scalarVal->val.array.rawScalar);
@@ -580,7 +519,6 @@ pushJsonbValue(JsonbParseState **pstate, int seq, JsonbValue *scalarVal)
 			*pstate = pushState(pstate);
 			result = &(*pstate)->contVal;
 			(*pstate)->contVal.type = jbvObject;
-			(*pstate)->contVal.estSize = 3 * sizeof(JEntry);
 			(*pstate)->contVal.val.object.nPairs = 0;
 			(*pstate)->size = 4;
 			(*pstate)->contVal.val.object.pairs = palloc(sizeof(JsonbPair) *
@@ -1209,244 +1147,209 @@ lexicalCompareJsonbStringValue(const void *a, const void *b)
 					  vb->val.string.len, DEFAULT_COLLATION_OID);
 }
 
+
 /*
- * Given a JsonbValue, convert to Jsonb and store in preallocated Jsonb buffer
- * sufficiently large to fit the value
+ * Reserve 'datalen' bytes at the end of StringInfo, enlarging the underlying
+ * buffer if necessary. Returns the offset to the reserved area.
  */
-static Size
-convertJsonb(JsonbValue *val, Jsonb *buffer)
+static int
+reserveStringInfo(StringInfo str, int datalen)
 {
-	convertState state;
-	Size		len;
+	int		offset;
 
-	/* Should not already have binary representation */
-	Assert(val->type != jbvBinary);
+	/* Make more room if needed */
+	enlargeStringInfo(str, datalen);
 
-	state.buffer = buffer;
-	/* Start from superheader */
-	state.ptr = VARDATA(state.buffer);
-	state.levelSz = 8;
-	state.allState = palloc(sizeof(convertLevel) * state.levelSz);
+	/* remember current offset */
+	offset = str->len;
 
-	walkJsonbValueConversion(val, &state, 0);
+	/* reserve the space */
+	str->len += datalen;
 
-	len = state.ptr - VARDATA(state.buffer);
+	/* XXX: we don't bother with the trailing null that normal StringInfo
+	 * functions append */
 
-	Assert(len <= val->estSize);
-	return len;
+	return offset;
 }
 
 /*
- * Walk the tree representation of Jsonb, as part of the process of converting
- * a JsonbValue to a Jsonb.
- *
- * This high-level function takes care of recursion into sub-containers, but at
- * the top level calls putJsonbValueConversion once per sequential processing
- * token (in a manner similar to generic iteration).
+ * Given a JsonbValue, convert to Jsonb. The result is palloc'd.
  */
-static void
-walkJsonbValueConversion(JsonbValue *val, convertState *cstate,
-						 uint32 nestlevel)
+static Jsonb *
+convertJsonb(JsonbValue *val)
 {
-	int			i;
+	StringInfoData buffer;
+	JEntry		header;
+	Jsonb	   *res;
 
-	check_stack_depth();
+	initStringInfo(&buffer);
 
-	if (!val)
-		return;
+	/* Should not already have binary representation */
+	Assert(val->type != jbvBinary);
 
-	switch (val->type)
-	{
-		case jbvArray:
+	/* Reserve the Jsonb header */
+	reserveStringInfo(&buffer, sizeof(VARHDRSZ));
 
-			putJsonbValueConversion(cstate, val, WJB_BEGIN_ARRAY, nestlevel);
-			for (i = 0; i < val->val.array.nElems; i++)
-			{
-				if (IsAJsonbScalar(&val->val.array.elems[i]) ||
-					val->val.array.elems[i].type == jbvBinary)
-					putJsonbValueConversion(cstate, val->val.array.elems + i,
-											WJB_ELEM, nestlevel);
-				else
-					walkJsonbValueConversion(val->val.array.elems + i, cstate,
-											 nestlevel + 1);
-			}
-			putJsonbValueConversion(cstate, val, WJB_END_ARRAY, nestlevel);
+	walkJsonbValueConversion(val, &buffer, &header, 0);
 
-			break;
-		case jbvObject:
+	res = (Jsonb *) buffer.data;
 
-			putJsonbValueConversion(cstate, val, WJB_BEGIN_OBJECT, nestlevel);
-			for (i = 0; i < val->val.object.nPairs; i++)
-			{
-				putJsonbValueConversion(cstate, &val->val.object.pairs[i].key,
-										WJB_KEY, nestlevel);
-
-				if (IsAJsonbScalar(&val->val.object.pairs[i].value) ||
-					val->val.object.pairs[i].value.type == jbvBinary)
-					putJsonbValueConversion(cstate,
-											&val->val.object.pairs[i].value,
-											WJB_VALUE, nestlevel);
-				else
-					walkJsonbValueConversion(&val->val.object.pairs[i].value,
-											 cstate, nestlevel + 1);
-			}
-			putJsonbValueConversion(cstate, val, WJB_END_OBJECT, nestlevel);
+	SET_VARSIZE(res, buffer.len);
 
-			break;
-		default:
-			elog(ERROR, "unknown type of jsonb container");
-	}
+	return res;
 }
 
 /*
- * walkJsonbValueConversion() worker.  Add padding sufficient to int-align our
- * access to conversion buffer.
+ * Convert a single JsonbValue to a Jsonb node. 
+ *
+ * The value is written out to 'buffer'. The JEntry header for this node is
+ * returned in *header. It is filled in with the length of this value, but if
+ * it is stored in an array or an object (which is always, except for the root
+ * node), it is the caller's responsibility to adjust it with the offset
+ * within the container.
+ *
+ * If the value is an array or an object, this recurses. 'level' is only used
+ * for debugging purposes.
  */
-static inline
-short
-addPaddingInt(convertState *cstate)
+static void
+walkJsonbValueConversion(JsonbValue *val, StringInfo buffer, JEntry *header, int level)
 {
-	short		padlen,
-				p;
-
-	padlen = INTALIGN(cstate->ptr - VARDATA(cstate->buffer)) -
-		(cstate->ptr - VARDATA(cstate->buffer));
+	check_stack_depth();
 
-	for (p = padlen; p > 0; p--)
-	{
-		*cstate->ptr = '\0';
-		cstate->ptr++;
-	}
+	if (!val)
+		return;
 
-	return padlen;
+	if (IsAJsonbScalar(val) || val->type == jbvBinary)
+		putScalarConversion(buffer, val, header);
+	else if (val->type == jbvArray)
+		walkJsonbArrayConversion(val, buffer, header, level);
+	else if (val->type == jbvObject)
+		walkJsonbObjectConversion(val, buffer, header, level);
+	else
+		elog(ERROR, "unknown type of jsonb container");
 }
 
-/*
- * walkJsonbValueConversion() worker.
- *
- * As part of the process of converting an arbitrary JsonbValue to a Jsonb,
- * copy over an arbitrary individual JsonbValue.  This function may copy any
- * type of value, even containers (Objects/arrays).  However, it is not
- * responsible for recursive aspects of walking the tree (so only top-level
- * Object/array details are handled).
- *
- * No details about their keys/values/elements are handled recursively -
- * rather, the function is called as required for the start of an Object/Array,
- * and the end (i.e.  there is one call per sequential processing WJB_* token).
- */
 static void
-putJsonbValueConversion(convertState *cstate, JsonbValue *val, uint32 flags,
-						uint32 level)
+walkJsonbArrayConversion(JsonbValue *val, StringInfo buffer, JEntry *pheader, int level)
 {
-	if (level == cstate->levelSz)
+	int			offset;
+	int			metaoffset;
+	int			i;
+	int			totallen;
+	JEntry		header;
+
+	/* Initialize pointer into conversion buffer at this level */
+	offset = buffer->len;
+
+	addPaddingInt(buffer);
+
+	/*
+	 * Construct the header Jentry, stored in the beginning of the variable-
+	 * length payload.
+	 */
+	header.header = val->val.array.nElems | JB_FARRAY;
+	if (val->val.array.rawScalar)
 	{
-		cstate->levelSz *= 2;
-		cstate->allState = repalloc(cstate->allState,
-									sizeof(convertLevel) * cstate->levelSz);
+		Assert(val->val.array.nElems == 1);
+		Assert(level == 0);
+		header.header |= JB_FSCALAR;
 	}
 
-	cstate->contPtr = cstate->allState + level;
+	appendBinaryStringInfo(buffer, (char *) &header, sizeof(uint32));
+	/* reserve space for the JEntries of the elements. */
+	metaoffset = reserveStringInfo(buffer, sizeof(JEntry) * val->val.array.nElems);
 
-	if (flags & (WJB_BEGIN_ARRAY | WJB_BEGIN_OBJECT))
+	totallen = 0;
+	for (i = 0; i < val->val.array.nElems; i++)
 	{
-		Assert(((flags & WJB_BEGIN_ARRAY) && val->type == jbvArray) ||
-			   ((flags & WJB_BEGIN_OBJECT) && val->type == jbvObject));
+		JsonbValue *elem = &val->val.array.elems[i];
+		int len;
+		JEntry meta;
+
+		walkJsonbValueConversion(elem, buffer, &meta, level + 1);
+		len = meta.header & JENTRY_POSMASK;
+		totallen += len;
+		if (i == 0)
+			meta.header |= JENTRY_ISFIRST;
+		else
+			meta.header = (meta.header & ~JENTRY_POSMASK) | totallen;
+		memcpy(&buffer->data[metaoffset + sizeof(JEntry) * i], &meta, sizeof(JEntry));
+	}
 
-		/* Initialize pointer into conversion buffer at this level */
-		cstate->contPtr->begin = cstate->ptr;
+	totallen = buffer->len - offset;
 
-		addPaddingInt(cstate);
+	/* Initialize the header of this node, in the container's JEntry array */
+	pheader->header = JENTRY_ISNEST | totallen;
+}
 
-		/* Initialize everything else at this level */
-		cstate->contPtr->header = (uint32 *) cstate->ptr;
-		/* Advance past header */
-		cstate->ptr += sizeof(uint32);
-		cstate->contPtr->meta = (JEntry *) cstate->ptr;
-		cstate->contPtr->i = 0;
+static void
+walkJsonbObjectConversion(JsonbValue *val, StringInfo buffer, JEntry *pheader, int level)
+{
+	JEntry		header;
+	int			offset;
+	int			metaoffset;
+	int			i;
+	int			totallen;
 
-		if (val->type == jbvArray)
-		{
-			*cstate->contPtr->header = val->val.array.nElems | JB_FARRAY;
-			cstate->ptr += sizeof(JEntry) * val->val.array.nElems;
+	/* Initialize pointer into conversion buffer at this level */
+	offset = buffer->len;
 
-			if (val->val.array.rawScalar)
-			{
-				Assert(val->val.array.nElems == 1);
-				Assert(level == 0);
-				*cstate->contPtr->header |= JB_FSCALAR;
-			}
-		}
-		else
-		{
-			*cstate->contPtr->header = val->val.object.nPairs | JB_FOBJECT;
-			cstate->ptr += sizeof(JEntry) * val->val.object.nPairs * 2;
-		}
-	}
-	else if (flags & WJB_ELEM)
-	{
-		putScalarConversion(cstate, val, level, cstate->contPtr->i);
-		cstate->contPtr->i++;
-	}
-	else if (flags & WJB_KEY)
-	{
-		Assert(val->type == jbvString);
+	addPaddingInt(buffer);
 
-		putScalarConversion(cstate, val, level, cstate->contPtr->i * 2);
-	}
-	else if (flags & WJB_VALUE)
-	{
-		putScalarConversion(cstate, val, level, cstate->contPtr->i * 2 + 1);
-		cstate->contPtr->i++;
-	}
-	else if (flags & (WJB_END_ARRAY | WJB_END_OBJECT))
-	{
-		convertLevel *prevPtr;	/* Prev container pointer */
-		uint32		len,
-					i;
+	/* Initialize header */
+	header.header = val->val.object.nPairs | JB_FOBJECT;
+	appendBinaryStringInfo(buffer, (char *) &header, sizeof(uint32));
 
-		Assert(((flags & WJB_END_ARRAY) && val->type == jbvArray) ||
-			   ((flags & WJB_END_OBJECT) && val->type == jbvObject));
+	/* reserve space for the JEntries of the keys and values */
+	metaoffset = reserveStringInfo(buffer, sizeof(JEntry) * val->val.object.nPairs * 2);
 
-		if (level == 0)
-			return;
+	totallen = 0;
+	for (i = 0; i < val->val.object.nPairs; i++)
+	{
+		JsonbPair *pair = &val->val.object.pairs[i];
+		int len;
+		JEntry meta;
 
-		len = cstate->ptr - (char *) cstate->contPtr->begin;
+		/* put key */
+		putScalarConversion(buffer, &pair->key, &meta);
 
-		prevPtr = cstate->contPtr - 1;
+		len = meta.header & JENTRY_POSMASK;
+		totallen += len;
+		if (i == 0)
+			meta.header |= JENTRY_ISFIRST;
+		else
+			meta.header = (meta.header & ~JENTRY_POSMASK) | totallen;
+		memcpy(&buffer->data[metaoffset + sizeof(JEntry) * (i * 2)], &meta, sizeof(JEntry));
+
+		walkJsonbValueConversion(&pair->value, buffer, &meta, level);
+		len = meta.header & JENTRY_POSMASK;
+		totallen += len;
+		meta.header = (meta.header & ~JENTRY_POSMASK) | totallen;
+		memcpy(&buffer->data[metaoffset + sizeof(JEntry) * (i * 2 + 1)], &meta, sizeof(JEntry));
+	}
 
-		if (*prevPtr->header & JB_FARRAY)
-		{
-			i = prevPtr->i;
+	totallen = buffer->len - offset;
 
-			prevPtr->meta[i].header = JENTRY_ISNEST;
+	pheader->header = JENTRY_ISNEST | totallen;
+}
 
-			if (i == 0)
-				prevPtr->meta[0].header |= JENTRY_ISFIRST | len;
-			else
-				prevPtr->meta[i].header |=
-					(prevPtr->meta[i - 1].header & JENTRY_POSMASK) + len;
-		}
-		else if (*prevPtr->header & JB_FOBJECT)
-		{
-			i = 2 * prevPtr->i + 1;		/* Value, not key */
+/*
+ * Append padding, so that the length of the StringInfo is int-aligned.
+ * Returns the number of padding bytes appended.
+ */
+static inline
+short
+addPaddingInt(StringInfo buffer)
+{
+	short		padlen,
+				p;
 
-			prevPtr->meta[i].header = JENTRY_ISNEST;
+	padlen = INTALIGN(buffer->len) - buffer->len;
 
-			prevPtr->meta[i].header |=
-				(prevPtr->meta[i - 1].header & JENTRY_POSMASK) + len;
-		}
-		else
-		{
-			elog(ERROR, "invalid jsonb container type");
-		}
+	for (p = 0; p < padlen; p++)
+		appendStringInfoChar(buffer, '\0');
 
-		Assert(cstate->ptr - cstate->contPtr->begin <= val->estSize);
-		prevPtr->i++;
-	}
-	else
-	{
-		elog(ERROR, "unknown flag encountered during jsonb tree walk");
-	}
+	return padlen;
 }
 
 /*
@@ -1456,64 +1359,42 @@ putJsonbValueConversion(convertState *cstate, JsonbValue *val, uint32 flags,
  * This is a worker function for putJsonbValueConversion() (itself a worker for
  * walkJsonbValueConversion()).  It handles the details with regard to Jentry
  * metadata peculiar to each scalar type.
+ *
+ * It is the callers responsibility to shift the offset if this is stored
+ * in an array or object.
  */
 static void
-putScalarConversion(convertState *cstate, JsonbValue *scalarVal, uint32 level,
-					uint32 i)
+putScalarConversion(StringInfo buffer, JsonbValue *scalarVal, JEntry *header)
 {
 	int			numlen;
 	short		padlen;
 
-	cstate->contPtr = cstate->allState + level;
-
-	if (i == 0)
-		cstate->contPtr->meta[0].header = JENTRY_ISFIRST;
-	else
-		cstate->contPtr->meta[i].header = 0;
-
 	switch (scalarVal->type)
 	{
 		case jbvNull:
-			cstate->contPtr->meta[i].header |= JENTRY_ISNULL;
-
-			if (i > 0)
-				cstate->contPtr->meta[i].header |=
-					cstate->contPtr->meta[i - 1].header & JENTRY_POSMASK;
+			header->header = JENTRY_ISNULL;
 			break;
+
 		case jbvString:
-			memcpy(cstate->ptr, scalarVal->val.string.val, scalarVal->val.string.len);
-			cstate->ptr += scalarVal->val.string.len;
+			appendBinaryStringInfo(buffer, scalarVal->val.string.val, scalarVal->val.string.len);
 
-			if (i == 0)
-				cstate->contPtr->meta[0].header |= scalarVal->val.string.len;
-			else
-				cstate->contPtr->meta[i].header |=
-					(cstate->contPtr->meta[i - 1].header & JENTRY_POSMASK) +
-					scalarVal->val.string.len;
+			header->header = scalarVal->val.string.len;
 			break;
+
 		case jbvNumeric:
 			numlen = VARSIZE_ANY(scalarVal->val.numeric);
-			padlen = addPaddingInt(cstate);
+			padlen = addPaddingInt(buffer);
 
-			memcpy(cstate->ptr, scalarVal->val.numeric, numlen);
-			cstate->ptr += numlen;
+			appendBinaryStringInfo(buffer, (char *) scalarVal->val.numeric, numlen);
 
-			cstate->contPtr->meta[i].header |= JENTRY_ISNUMERIC;
-			if (i == 0)
-				cstate->contPtr->meta[0].header |= padlen + numlen;
-			else
-				cstate->contPtr->meta[i].header |=
-					(cstate->contPtr->meta[i - 1].header & JENTRY_POSMASK)
-					+ padlen + numlen;
+			header->header = JENTRY_ISNUMERIC | (padlen + numlen);
 			break;
+
 		case jbvBool:
-			cstate->contPtr->meta[i].header |= (scalarVal->val.boolean) ?
+			header->header = (scalarVal->val.boolean) ?
 				JENTRY_ISTRUE : JENTRY_ISFALSE;
-
-			if (i > 0)
-				cstate->contPtr->meta[i].header |=
-					cstate->contPtr->meta[i - 1].header & JENTRY_POSMASK;
 			break;
+
 		default:
 			elog(ERROR, "invalid jsonb scalar type");
 	}
@@ -1584,7 +1465,6 @@ formIterIsContainer(JsonbIterator **it, JsonbValue *val, JEntry *ent,
 	if (JBE_ISNULL(*ent))
 	{
 		val->type = jbvNull;
-		val->estSize = sizeof(JEntry);
 
 		return false;
 	}
@@ -1593,7 +1473,6 @@ formIterIsContainer(JsonbIterator **it, JsonbValue *val, JEntry *ent,
 		val->type = jbvString;
 		val->val.string.val = (*it)->dataProper + JBE_OFF(*ent);
 		val->val.string.len = JBE_LEN(*ent);
-		val->estSize = sizeof(JEntry) + val->val.string.len;
 
 		return false;
 	}
@@ -1602,15 +1481,12 @@ formIterIsContainer(JsonbIterator **it, JsonbValue *val, JEntry *ent,
 		val->type = jbvNumeric;
 		val->val.numeric = (Numeric) ((*it)->dataProper + INTALIGN(JBE_OFF(*ent)));
 
-		val->estSize = 2 * sizeof(JEntry) + VARSIZE_ANY(val->val.numeric);
-
 		return false;
 	}
 	else if (JBE_ISBOOL(*ent))
 	{
 		val->type = jbvBool;
 		val->val.boolean = JBE_ISBOOL_TRUE(*ent) != 0;
-		val->estSize = sizeof(JEntry);
 
 		return false;
 	}
@@ -1619,7 +1495,6 @@ formIterIsContainer(JsonbIterator **it, JsonbValue *val, JEntry *ent,
 		val->type = jbvBinary;
 		val->val.binary.data = (*it)->dataProper + INTALIGN(JBE_OFF(*ent));
 		val->val.binary.len = JBE_LEN(*ent) - (INTALIGN(JBE_OFF(*ent)) - JBE_OFF(*ent));
-		val->estSize = val->val.binary.len + 2 * sizeof(JEntry);
 
 		return false;
 	}
@@ -1694,8 +1569,6 @@ appendKey(JsonbParseState *pstate, JsonbValue *string)
 
 	object->val.object.pairs[object->val.object.nPairs].key = *string;
 	object->val.object.pairs[object->val.object.nPairs].order = object->val.object.nPairs;
-
-	object->estSize += string->estSize;
 }
 
 /*
@@ -1710,7 +1583,6 @@ appendValue(JsonbParseState *pstate, JsonbValue *scalarVal)
 	Assert(object->type == jbvObject);
 
 	object->val.object.pairs[object->val.object.nPairs++].value = *scalarVal;
-	object->estSize += scalarVal->estSize;
 }
 
 /*
@@ -1737,7 +1609,6 @@ appendElement(JsonbParseState *pstate, JsonbValue *scalarVal)
 	}
 
 	array->val.array.elems[array->val.array.nElems++] = *scalarVal;
-	array->estSize += scalarVal->estSize;
 }
 
 /*
@@ -1832,11 +1703,7 @@ uniqueifyJsonbObject(JsonbValue *object)
 		while (ptr - object->val.object.pairs < object->val.object.nPairs)
 		{
 			/* Avoid copying over duplicate */
-			if (lengthCompareJsonbStringValue(ptr, res, NULL) == 0)
-			{
-				object->estSize -= ptr->key.estSize + ptr->value.estSize;
-			}
-			else
+			if (lengthCompareJsonbStringValue(ptr, res, NULL) != 0)
 			{
 				res++;
 				if (ptr != res)
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 6b1ce9b..a9fedcb 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -1477,7 +1477,7 @@ each_worker_jsonb(FunctionCallInfo fcinfo, bool as_text)
 						StringInfo	jtext = makeStringInfo();
 						Jsonb	   *jb = JsonbValueToJsonb(&v);
 
-						(void) JsonbToCString(jtext, VARDATA(jb), 2 * v.estSize);
+						(void) JsonbToCString(jtext, VARDATA(jb), 0);
 						sv = cstring_to_text_with_len(jtext->data, jtext->len);
 					}
 
@@ -1797,7 +1797,7 @@ elements_worker_jsonb(FunctionCallInfo fcinfo, bool as_text)
 						StringInfo	jtext = makeStringInfo();
 						Jsonb	   *jb = JsonbValueToJsonb(&v);
 
-						(void) JsonbToCString(jtext, VARDATA(jb), 2 * v.estSize);
+						(void) JsonbToCString(jtext, VARDATA(jb), 0);
 						sv = cstring_to_text_with_len(jtext->data, jtext->len);
 					}
 
diff --git a/src/include/utils/jsonb.h b/src/include/utils/jsonb.h
index dea64ad..2e62c25 100644
--- a/src/include/utils/jsonb.h
+++ b/src/include/utils/jsonb.h
@@ -16,17 +16,6 @@
 #include "utils/array.h"
 #include "utils/numeric.h"
 
-/*
- * JB_CMASK is used to extract count of items
- *
- * It's not possible to get more than 2^28 items into an Jsonb.
- */
-#define JB_CMASK				0x0FFFFFFF
-
-#define JB_FSCALAR				0x10000000
-#define JB_FOBJECT				0x20000000
-#define JB_FARRAY				0x40000000
-
 /* Get information on varlena Jsonb */
 #define JB_ROOT_COUNT(jbp_)		( *(uint32*) VARDATA(jbp_) & JB_CMASK)
 #define JB_ROOT_IS_SCALAR(jbp_) ( *(uint32*) VARDATA(jbp_) & JB_FSCALAR)
@@ -109,35 +98,88 @@ typedef char *JsonbSuperHeader;
  * representation.  Often, JsonbValues are just shims through which a Jsonb
  * buffer is accessed, but they can also be deep copied and passed around.
  *
- * We have an abstraction called a "superheader".  This is a pointer that
- * conventionally points to the first item after our 4-byte uncompressed
- * varlena header, from which we can read flags using bitwise operations.
+ * Jsonb is a tree structure. Each node in the tree consists of a JEntry
+ * header, and a variable-length content.  The JEntry header indicates what
+ * kind of a node it is, e.g. a string or an array (see JENTRY_IS* macros),
+ * and the offset and length of its variable-length portion within the
+ * container.
+ *
+ * The header and the content of a node are not stored physically together.
+ * Instead, the array or object containing the node has an array that holds
+ * the JEntry headers of all the child nodes, followed by their variable-length
+ * portions.
  *
+ * The root node is an exception; it has no parent array or object that could
+ * hold its JEntry. Hence, there is no Jentry header for the root node.
+ * It is implicitly known that the the root node must be an array or an
+ * object. The content of both an array and an object begins with a uint32
+ * header field containing the number of elements, and an JB_FOBJECT or
+ * JB_FARRAY flag. By peeking into that header, we can determine which it is.
+ * When a naked scalar value needs to be stored as a Jsonb value, what we
+ * actually store is an array with one element, with the flags in the array's
+ * header field set to JB_FSCALAR | JB_FARRAY.
+ *
+ * The variable-length data of a container node, an array or an object,
+ * begins with a uint32 header. It contains the number of child nodes,
+ * and a flag indicating if it's an array or an object (JB_* macros).
+ * An array has one child node for each element, and an object has two
+ * child nodes for each  key-value pair. After the uint32 header, there is
+ * an array of JEntry structs, one for each child node, followed by the
+ * variable-length data of each child.
+ *
+ * To encode the length and offset of the variable-length portion of each
+ * node in a compact way, the JEntry stores only the end offset within the
+ * variable-length portion of the container node. For the first JEntry in the
+ * container's JEntry array, that equals to the length of the node data. For
+ * convenience, the JENTRY_ISFIRST flag is set. The begin offset and length
+ * of the rest of the entries can be calculated using the end offset of the
+ * previous JEntry in the array.
+ *
+ *
+ * Alignment
+ * ---------
+ *
+ * Overall, the Jsonb struct requires 4-bytes alignment. Within the struct,
+ * the variable-length portion of some node types is aligned to a 4-byte
+ * boundary, while others are not. When alignment is needed, the padding is
+ * in the beginning of the node that requires it. For example, if a numeric
+ * node is stored after a string node, so that the numeric node begins at
+ * offset 3, the variable-length portion of the numeric node will begin with
+ * one padding byte.
+ *
+ * XXX: old comment below; what does this mean?
  * Frequently, we pass a superheader reference to a function, and it doesn't
  * matter if it points to just after the start of a Jsonb, or to a temp buffer.
  */
+
 typedef struct
 {
-	int32		vl_len_;		/* varlena header (do not touch directly!) */
-	uint32		superheader;
-	/* (array of JEntry follows, size determined using uint32 superheader) */
-} Jsonb;
+	uint32		header;			/* Shares some flags with superheader */
+} JEntry;
 
 /*
- * JEntry: there is one of these for each key _and_ value for objects.  Arrays
- * have one per element.
- *
- * The position offset points to the _end_ so that we can get the length by
- * subtraction from the previous entry.  The JENTRY_ISFIRST flag indicates if
- * there is a previous entry.
+ * A jsonb array or object node.
+ * 
+ * An array has one child for each element. An object has two children for
+ * each key/value pair.
  */
+typedef struct JsonbContainer
+{
+	uint32		header;			/* number of children, and flags (JB_* below) */
+	JEntry		children[1];	/* variable length */
+} JsonbContainer;
+
 typedef struct
 {
-	uint32		header;			/* Shares some flags with superheader */
-} JEntry;
+	int32		vl_len_;		/* varlena header (do not touch directly!) */
+	JsonbContainer root;
+} Jsonb;
 
-#define IsAJsonbScalar(jsonbval)	((jsonbval)->type >= jbvNull && \
-									 (jsonbval)->type <= jbvBool)
+#define JB_CMASK				0x0FFFFFFF
+
+#define JB_FSCALAR				0x10000000
+#define JB_FOBJECT				0x20000000
+#define JB_FARRAY				0x40000000
 
 /*
  * JsonbValue:	In-memory representation of Jsonb.  This is a convenient
@@ -161,8 +203,6 @@ struct JsonbValue
 		jbvBinary
 	}			type;			/* Influences sort order */
 
-	int			estSize;		/* Estimated size of node (including subnodes) */
-
 	union
 	{
 		Numeric numeric;
@@ -194,6 +234,9 @@ struct JsonbValue
 	}			val;
 };
 
+#define IsAJsonbScalar(jsonbval)	((jsonbval)->type >= jbvNull && \
+									 (jsonbval)->type <= jbvBool)
+
 /*
  * Pair within an Object.
  *
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to