Re: Encoding of JCR values in json

Michael Dürig Fri, 13 Apr 2012 02:54:04 -0700


On 13.4.12 7:00, Thomas Mueller wrote:

Hi,

* A JSON value is a double if
value.equals(Double.valueOf(value).toString())
* A JSON value is a long if value.equals(Long.valueOf(value).toString())
* A JSON value is a decimal if it's a JSON number that matches neither
of the above two rules


That's the approach I used in spi2microkernel. However, it has the
drawback that we need to try catch through the cases when using the
valueOf methods for determining the exact numeric type. Or we write our
own code for parsing...
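
To make the try/catch drawback concrete, here is a minimal sketch of the three rules quoted above. The class name JsonNumberKind and the Kind enum are hypothetical and are not taken from spi2microkernel; the sketch also assumes the input has already been recognized as a JSON number token.

```java
import java.math.BigDecimal;

final class JsonNumberKind {
    enum Kind { DOUBLE, LONG, DECIMAL }

    // Applies the quoted rules in order: double, then long, then decimal.
    static Kind classify(String value) {
        try {
            // Rule 1: the text round-trips exactly through Double.
            if (value.equals(Double.valueOf(value).toString())) {
                return Kind.DOUBLE;
            }
        } catch (NumberFormatException e) {
            // Not parseable as a double; fall through to the next rule.
        }
        try {
            // Rule 2: the text round-trips exactly through Long.
            if (value.equals(Long.valueOf(value).toString())) {
                return Kind.LONG;
            }
        } catch (NumberFormatException e) {
            // Not parseable as a long; fall through to the next rule.
        }
        // Rule 3: any other number is a decimal. The BigDecimal constructor
        // throws NumberFormatException if the text is not a number at all.
        new BigDecimal(value);
        return Kind.DECIMAL;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"10", "10.0", "10.00", "1e1"}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

Under these rules "10" is a long, "10.0" is a double, and both "10.00" and "1e1" are decimals, since neither matches what Double.toString() produces.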


Java BigDecimal values can be very large. I'm not sure if, for example,
MongoDB has some kind of limitation on numbers (precision, length). Then,
how would you distinguish between BigDecimal 10, Double 10, and Long 10?
An alternative might be: numbers with an "e" are double ("10e0"), numbers
with a dot are decimal ("10.0"), all other numbers are long ("10"). But
that would require the MicroKernel stores the *exact* JSON representation.
I don't think that's a good idea either.
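
For comparison, the syntax-based alternative described above could be sketched as follows. The names SyntaxNumberKind and Kind are hypothetical, and the sketch assumes the exact JSON text of the number is still available, which is precisely the requirement objected to above.

```java
final class SyntaxNumberKind {
    enum Kind { LONG, DOUBLE, DECIMAL }

    // Decides the type purely from the textual form of the JSON number.
    static Kind classify(String json) {
        // An exponent marker means double, e.g. "10e0".
        if (json.indexOf('e') >= 0 || json.indexOf('E') >= 0) {
            return Kind.DOUBLE;
        }
        // A dot without an exponent means decimal, e.g. "10.0".
        if (json.indexOf('.') >= 0) {
            return Kind.DECIMAL;
        }
        // Everything else is a long, e.g. "10".
        return Kind.LONG;
    }
}
```

This needs no parsing attempts or exception handling, but the result changes as soon as a store normalizes "10.0" to "10".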

The underlying storage mechanism must not be of concern to the user of
the Microkernel API. The Microkernel API uses JSON to transport values.
So the Microkernel implementation is in charge of serializing these into
a format suitable for the underlying storage mechanism. Even if I put a
number with 10 million digits into a JSON property.

If we want/need to make restrictions here, we will have to clarify and
document these. See OAK-11.


So I would prefer explicit typing, except for Long.

I'd be OK with us explicitly *not* supporting special cases like
infinities and NaN values.


I would prefer supporting them, using a well defined syntax (as a String).

Right. There's an additional constraint for binary values in that the
MicroKernel garbage collector needs some way to connect JSON
properties to referenced binaries. It would be useful if the same
convention was used also higher up the stack.


Good point. Maybe Dominique can provide some insight here?


There are two garbage collectors in the MK: the "node data" GC and the
"data store" GC. I guess this is about the data store GC, which I wrote,
not Dominique.

Currently the data store GC is on a high level. Marking binaries that are
still in use is done using MicroKernel.getLength(String blobId). But data
store GC needs to traverse all nodes in all available revisions, so it is
really slow. It would be nice if binaries can be indexed, so that garbage
collection doesn't have to traverse *all* nodes in the repository (in all
revisions). So binary references should be easy to recognize.

Not sure whether I understand. How could the GC possibly know whether a
binary is still in use or not? I could do


String blobId = mk.write(inStream);

and write the returned blobId on a piece of paper. According to the
current Microkernel contract I could come back after a couple of years
and would still be able to retrieve that blob.


Michael

The way I solved this in spi2microkernel [1] is by encoding values by
serializing them to their string representation (Value.getString()) and
prepending the property type (Value.getType()) in radix 16 and a colon (:).
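
A minimal sketch of such a type-prefixed encoding is shown below. It is not the spi2microkernel code: the class TypedValueCodec and its methods are hypothetical, and the exact prefix format (for example, whether the hex digits are padded) is an assumption. To stay self-contained it takes the type as a plain int and the value as a String instead of a javax.jcr.Value; the constants mirror the values of javax.jcr.PropertyType.

```java
final class TypedValueCodec {
    // Same numeric values as javax.jcr.PropertyType.
    static final int STRING = 1;
    static final int LONG = 3;
    static final int DOUBLE = 4;
    static final int DECIMAL = 12;

    // Encodes as "<type in radix 16>:<string value>", e.g. "c:10" for a decimal 10.
    static String encode(int type, String value) {
        return Integer.toString(type, 16) + ':' + value;
    }

    // Reads the property type back from the prefix before the first colon.
    static int decodeType(String encoded) {
        int colon = encoded.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Missing type prefix: " + encoded);
        }
        return Integer.parseInt(encoded.substring(0, colon), 16);
    }

    // Returns everything after the first colon, so values may contain colons.
    static String decodeValue(String encoded) {
        int colon = encoded.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Missing type prefix: " + encoded);
        }
        return encoded.substring(colon + 1);
    }
}
```

With this scheme BigDecimal 10, Double 10 and Long 10 stay distinguishable as "c:10", "4:10" and "3:10", at the cost of every value being transported as a JSON string.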


I think it's a good solution. It has been proven to be robust so far. Even
though, for debugging purposes, I would prefer not to use hex digits. But
that's a minor issue really.

On a related note: what kind of values do we want to expose from
oak-core?
JSON like or JCR like?


I would use JCR like. The (oak-jcr to oak-core) remoting implementation
might use the same JSON conversion as used between oak-core and oak-mk,
but do we really need to define this now?

Regards,
Thomas
