multi-def values vs. security, elucidated and solved

John Rose Wed, 10 Apr 2019 11:25:41 -0700

One recurrent question about inlined value types is
whether they are less secure than regular object types.


The question revolves around a scenario where an
inlined value instance X functions as a security token,
and the value of a private field of X (X.p) must be
secured.  In this scenario, the attacker creates a
series of guesses G1, G2, … which attempt to
replicate the value X, substituting various guessed
values for X.p (G1.p, G2.p, etc.).  If the attacker
finds a guess Gi where Gi==X, then the attacker
has "unlocked" X by exposing the value of X.p,
since it must be the same as Gi.p which the attacker
has already guessed and now has confirmed.

This attack scenario is relatively narrow because
it requires that the possible values of X.p can be
enumerated in the time the attacker has to perform
the attack.  The time complexity of this attack is
thus O(N), where N is the number of possible values
of X.p.
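
The guessing loop can be sketched in present-day Java.  This is
only an emulation under stated assumptions: inline value types are
not available in current Java, so a final class with a field-wise
comparison method stands in for "==" on an inline value.  The names
GuessingAttack, Token, sameValue, and guess are hypothetical.

```java
public class GuessingAttack {
    // Stand-in for an inline value type: equality depends only on field state.
    static final class Token {
        private final int p;               // the secret, X.p
        Token(int p) { this.p = p; }

        // Assumption: emulates "==" on an inline value (all fields equal).
        boolean sameValue(Token other) { return this.p == other.p; }
    }

    // Attacker: forges G0, G1, ... and compares each against x.
    // Returns the recovered secret, or -1 if no candidate in [0, n) matches.
    static int guess(Token x, int n) {
        for (int g = 0; g < n; g++) {
            if (new Token(g).sameValue(x)) {
                return g;                  // Gi == X, so X.p == Gi.p
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Token x = new Token(4711);
        System.out.println(guess(x, 10_000));   // prints 4711
    }
}
```

The loop performs at most N comparisons, which is where the O(N)
bound comes from.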

(If X implements Comparable and X.p is a key in the
comparison, then the attack can be performed in
O(log N).  This is often feasible where the O(N)
attack is not.)

Why is this not a problem with classic indirect
objects (those which have identity)?  Because the
tool for comparing Gi with X, the == operator,
immediately returns false for any of the Gi,
since those were created by the attacker.

(If X is a classic object which implements Comparable,
then the attack is more feasible, even with classic
objects, since the attacker can use the compareTo
operation to bracket the X.p value between positive
and negative results.  This problem applies equally
to classic indirect objects and inline value objects.)
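
A minimal sketch of the bracketing attack, assuming a classic
identity class whose compareTo is keyed on the secret field.  The
names BracketingAttack, Token, and bracket are hypothetical, and
the search range is assumed to be known to the attacker.

```java
public class BracketingAttack {
    // A classic identity object whose ordering depends on the secret field p.
    static final class Token implements Comparable<Token> {
        private final int p;
        Token(int p) { this.p = p; }

        @Override
        public int compareTo(Token other) {
            return Integer.compare(this.p, other.p);
        }
    }

    // Attacker: binary search over [lo, hi] using only compareTo results.
    // Returns the recovered secret, or -1 if it lies outside the range.
    static int bracket(Token x, int lo, int hi) {
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            int c = new Token(mid).compareTo(x);
            if (c == 0) {
                return mid;        // guess matches X.p
            } else if (c < 0) {
                lo = mid + 1;      // guess is too small
            } else {
                hi = mid - 1;      // guess is too large
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Token x = new Token(4711);
        System.out.println(bracket(x, 0, 1_000_000));   // prints 4711
    }
}
```

Object identity plays no part here: the leak comes entirely from
compareTo, so the sketch behaves the same way for an inline value.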

So classic indirect objects are highly resistant to
equality tests against attacker-created indirect objects,
because the equality test will fail unless the attacker
compares X with X itself—which gives the attacker
no new information.
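
For contrast, here is the same kind of guessing loop against a
classic identity object, using the real "==" operator.  The names
IdentityResists, Token, and anyGuessMatches are hypothetical.

```java
public class IdentityResists {
    // A classic identity object; "==" compares references, not fields.
    static final class Token {
        private final int p;
        Token(int p) { this.p = p; }
    }

    // Attacker: every forged Token is a fresh object, so "==" never matches x.
    static boolean anyGuessMatches(Token x, int n) {
        for (int g = 0; g < n; g++) {
            if (new Token(g) == x) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Token x = new Token(4711);
        System.out.println(anyGuessMatches(x, 10_000));   // prints false
        System.out.println(x == x);                       // prints true
    }
}
```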

Meanwhile, inline value objects are not resistant to
equality tests, so the guessing can eventually (in O(N)
time) produce a match against X.

In short, an exact copy of the inline value object X
can be forged (as a lucky Gi) by an independent party.

Pulling back from the attack per se, we can observe
that a classic indirect object possesses an identity
that is created at that object's defining site (a "new X"
expression or bytecode).  No other defining site in
space or time will ever create the same identity.

An inline value object V possesses no such identity,
and, therefore, several defining sites (a "new V"
expression or invokestatic bytecode) can end up
creating the *same* value, over and over again.

All occurrences of the same indirect object have
the same defining site; they are all connected by
a chain of data-flow from definition to use.
Multiple occurrences of the same value may have
*distinct* defining sites, *not* connected by
chains of data flow.  The first time the two copies
of the same value come together might be when
they are first compared.  They will compare equal
(if they are the same value), even though they came
from different data-flow chains of definition to
use (from two different definitions).  This never
happens for classic indirect objects.

This difference between classic indirect and
new inline types suggests a defense against
the attack scenario proposed above.  What if
we could ask a value type to emulate the special
property that a definition-to-use data-flow chain
is the only way for one value (of a given type X)
to be a copy of itself?  Forging a series of guesses
G1, G2, … would then be impossible.

In fact, this is readily done, and without damaging
the other desirable properties of inline value types.
Simply endow the type "X" with an extra private
field "X.q" which is initialized (in the constructor
of X) by the expression, "new Object()".  This
augmented version of X will (drum roll, please)
possess a bona fide *object identity* which cannot
be forged by an attacker.
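
A sketch of the augmented type, again emulating an inline value
with a final class.  The assumption here is that "==" on an inline
value compares primitive fields by value and reference fields by
identity; sameValue is a hypothetical stand-in for that operator,
and IdentityField, Token, and forge are likewise illustrative
names.

```java
public class IdentityField {
    static final class Token {
        private final int p;       // the secret, X.p
        private final Object q;    // aggregated object identity, X.q

        Token(int p) {
            this.p = p;
            this.q = new Object(); // fresh identity at each defining site
        }

        // Assumption: emulates "==" on an inline value.
        boolean sameValue(Token other) {
            return this.p == other.p && this.q == other.q;
        }
    }

    // Attacker: even the correct guess for p carries a different q.
    static boolean forge(Token x, int n) {
        for (int g = 0; g < n; g++) {
            if (new Token(g).sameValue(x)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Token x = new Token(4711);
        Token copy = x;                             // data-flow copy of x
        System.out.println(copy.sameValue(x));      // prints true
        System.out.println(forge(x, 10_000));       // prints false
    }
}
```

Copies that flow from the original definition still compare equal,
while independently constructed guesses do not.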

If you think about this, the status of the JVM's
invisible object header takes on a new aspect,
that of a *field* which carries the *object identity*,
and is *inherited* from the type of all classic
indirect objects.  We have sometimes called this
hypothetical type "RefObject".  The idea here is
that every classic indirect object inherits,
from RefObject, an object identity, notionally
stored in the object header.  (Actually, it's the
address of the object header which is used,
but the point remains that if you have a header,
you can derive an object identity from it, by
taking its address.)  Meanwhile, every inline
value object does *not* have such a header.
(Some of its many copies *may* have headers,
but these headers are prevented from being
significant.)  So an instance of C <: RefObject
*inherits* an object identity from RefObject.

Meanwhile, an inline value instance X is not
an instance of RefObject, and inherits neither
the header nor the object identity.  *But*,
if the instance X wishes to acquire an object
identity, it can do so by *aggregation* instead
of *inheritance*.  Et voila; the upgraded version
of X has no header, but its object identity lives
on, in the field X.q.  Problem solved.

Therefore, if an inline value object is going to
be used as an unforgeable security token, and
the author is worried about an object-forging
attack, the attack can be headed off by adding
an object identity *field*.  There will be a cost
in footprint, but the object will continue to
possess all the other properties of inline values,
including flattenability.  Perhaps the author of
the class is already including a classic indirect
object reference X.a in the class definition.
If that is the case, a quick "clone" operation in
the constructor before setting X.a can smuggle in
an object identity without an increase in footprint.
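
The "clone" variant might look like the following sketch, which
assumes the existing reference field X.a happens to be a byte
array and, as before, emulates inline-value "==" with a
hypothetical sameValue method that compares references by
identity.  ClonedReference and Token are illustrative names.

```java
public class ClonedReference {
    static final class Token {
        private final byte[] a;    // existing reference field, X.a

        Token(byte[] a) {
            // The private copy has a fresh identity that no caller holds.
            this.a = a.clone();
        }

        // Assumption: emulates "==" on an inline value (reference identity).
        boolean sameValue(Token other) {
            return this.a == other.a;
        }
    }

    public static void main(String[] args) {
        byte[] key = {1, 2, 3};
        Token x = new Token(key);
        Token forged = new Token(key);              // same contents, same source array
        System.out.println(forged.sameValue(x));    // prints false
        System.out.println(x.sameValue(x));         // prints true
    }
}
```

No extra field is needed, since the identity rides along on a
reference the class was going to store anyway.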

I think these observations adequately answer the
persistent security concern about forging inline
value objects.  And they also help us understand
more deeply "what's in a value".

— John
