On 12.4.12 19:15, Jukka Zitting wrote:
While String and Boolean are straightforward, double, long and decimal
are already more troublesome.

As basic rules for handling the latter types I'd define something like this:

* A JSON value is a double if value.equals(Double.valueOf(value).toString())
* A JSON value is a long if value.equals(Long.valueOf(value).toString())
* A JSON value is a decimal if it's a JSON number that matches neither
of the above two rules

That's the approach I used in spi2microkernel. However, it has the drawback that we need to try/catch our way through the cases when using the valueOf methods to determine the exact numeric type. Or we write our own parsing code...
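The two rules above can be sketched as a small helper. This is a minimal illustration, not actual spi2microkernel code; the class and enum names are made up, and it shows the try/catch fall-through just mentioned:

```java
public class JsonNumberType {

    enum NumericType { LONG, DOUBLE, DECIMAL }

    /**
     * Determines the numeric type of a raw JSON number token by
     * round-tripping it through the corresponding valueOf/toString pair,
     * per the rules above: double, then long, else decimal.
     */
    static NumericType typeOf(String value) {
        try {
            if (value.equals(Double.valueOf(value).toString())) {
                return NumericType.DOUBLE;
            }
        } catch (NumberFormatException ignore) {
            // not parseable as a double, fall through
        }
        try {
            if (value.equals(Long.valueOf(value).toString())) {
                return NumericType.LONG;
            }
        } catch (NumberFormatException ignore) {
            // not parseable as a long either
        }
        // any other JSON number, e.g. "123.40" or "1e-10"
        return NumericType.DECIMAL;
    }
}
```

Note that the string-equality round trip makes the order of the two checks irrelevant: "123" fails the double rule (Double renders it as "123.0") and matches the long rule, while "123.40" matches neither and falls through to decimal.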

I'd be OK with us explicitly *not* supporting special cases like
infinities and NaN values. We'd just throw ValueFormatExceptions for
them on the oak-jcr level and IllegalArgumentExceptions on the
oak-core (or do something similar). Alternatively we should use
explicit typing information with a well defined syntax for expressing
such special cases.

Ack. Either way is fine with me.


Finally for binary, date, name, path, reference, weakreference and uri
there is no direct correspondence in JSON.

Right. There's an additional constraint for binary values in that the
MicroKernel garbage collector needs some way to connect JSON
properties to referenced binaries. It would be useful if the same
convention was used also higher up the stack.

Good point. Maybe Dominique can provide some insight here?


The way I solved this in spi2microkernel [1] is by encoding values:
serialize them to their string representation (Value.getString()) and
prepend the property type (Value.getType()) in radix 16, followed by a colon (:).

Sounds like a workable solution, though I have some reservations:

* The explicit encoding of numeric constants from JCR seems a bit
troublesome and makes potential extensions more cumbersome.

What extensions come to mind?


* The overloading of normal strings requires that all string values
will need to be checked for whether they need to be escaped.

That's a small penalty:

  if (s.length() >= 2 && s.charAt(1) == ':') { ... }

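For concreteness, here is a minimal sketch of that prefix scheme, assuming the type codes mirror javax.jcr.PropertyType (STRING = 1, LONG = 3, and so on); the class and method names are made up for illustration:

```java
public class TypePrefix {

    // Assumed type codes, mirroring javax.jcr.PropertyType
    static final int STRING = 1;
    static final int LONG = 3;

    /** Prepends the property type in radix 16 and a colon. */
    static String encode(int type, String value) {
        return Integer.toString(type, 16) + ':' + value;
    }

    /**
     * The cheap check from above: does this string carry a
     * one-character type prefix?
     */
    static boolean isEncoded(String s) {
        return s.length() >= 2 && s.charAt(1) == ':';
    }

    static int decodeType(String s) {
        return Integer.parseInt(s.substring(0, 1), 16);
    }

    static String decodeValue(String s) {
        return s.substring(2);
    }
}
```

Since JCR property type codes run from 1 to 12, radix 16 keeps the prefix to a single character, which is what makes the charAt(1) check sufficient. A plain string like "a:string" would trip isEncoded(), which is exactly why all string values need escaping under this scheme.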

An alternative solution would be to use something like the @TypeHint
feature used by the JSON functionality in Sling. Instead of "@", we
should use something like "::" that's invalid in a JCR name to prevent
conflicts. With such a solution the example JSON object would look
like this:

     "example":{
       "long":123,
       "another long":"124",
       "another long::TypeHint":"long",
       "double":"123.4",
       "double::TypeHint":"double",
       "string":"foo",
       "another string":"a:string",
       "another string::TypeHint":"string"
     }

That's a bit verbose, so we could also put the type hint directly into
the relevant property name, like this:

The trouble with that is that type info and value are spread across different properties. Setting a JCR property requires two JSON diff operations here. Worse for JCR observation: the corresponding set-property entries might be spread across the journal.


     "example":{
       "long":123,
       "another long::long":"124",
       "double::double":"123.4",
       "string":"foo",
       "another string::string":"a:string"
     }
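Reading such an object back means splitting each JSON property name into its name and type-hint parts. A minimal sketch of that split, with hypothetical names (not Oak code):

```java
public class TypedName {

    final String name;
    final String type;   // null when the name carries no type hint

    TypedName(String name, String type) {
        this.name = name;
        this.type = type;
    }

    /**
     * Splits a JSON property name like "another long::long" into the
     * JCR name ("another long") and the type hint ("long"). Names
     * without "::" get a null type, i.e. the default JSON typing applies.
     */
    static TypedName parse(String jsonName) {
        int i = jsonName.lastIndexOf("::");
        if (i < 0) {
            return new TypedName(jsonName, null);
        }
        return new TypedName(jsonName.substring(0, i),
                jsonName.substring(i + 2));
    }
}
```

Using lastIndexOf relies on "::" being invalid in a JCR name, so the rightmost occurrence is unambiguously the separator.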

The main downsides of this approach are:

* Name-based property accesses will potentially need to traverse all
properties to find a matching name. That should be manageable since
the implementation can pre-scan all property names and split them
into name and type parts.

* There's a potential for conflicts, for example when a JSON object
contains both "x" and "x::long" properties. That can be dealt with in
a commit validator that prevents such objects from being persisted.
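The conflict check such a validator would run is cheap. A sketch, again with made-up names:

```java
import java.util.Set;

public class ConflictCheck {

    /**
     * Returns true if any property name occurs both plain ("x") and
     * with a type hint ("x::long") in the same JSON object, which a
     * commit validator would reject.
     */
    static boolean hasConflict(Set<String> jsonNames) {
        for (String name : jsonNames) {
            int i = name.lastIndexOf("::");
            if (i >= 0 && jsonNames.contains(name.substring(0, i))) {
                return true;
            }
        }
        return false;
    }
}
```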

So there are three different approaches now: 1) encoding the type into the value, 2) encoding it into a separate property, or 3) encoding it into the name of the property.

I think 2) is the most troublesome for the reasons outlined above. Regarding 1) and 3), we should also think about the consequences for query and indexing. Are there any drawbacks or advantages to either of those? Tom?


On a related note: what kind of values do we want to expose from oak-core?
JSON like or JCR like?

I'd ideally like to keep it JSON-like so we can easily implement a
JavaScript-friendly HTTP mapping directly based on the Oak API without
having to go through extra levels of mapping.

Hmmm, seems reasonable. What about Angela's concerns?

Michael


Implementation wise, would that en/decoding happen inside oak-jcr or oak-core?

I'd put the JSON-JCR type mapping into a shared helper class in
oak-core since it'll be needed by a lot of things like query and node
type handling inside oak-core. But the API interfaces should IMO be
based on JSON types to support cases where JCR typing isn't needed or
wanted.

BR,

Jukka Zitting
