Re: API proposal for - Expose URL for Blob source (OAK-1963)

Michael Dürig Mon, 09 May 2016 01:58:36 -0700

Hi,

I very much share Francesco's concerns here. Unconditionally exposing access to operating system resources underlying Oak's inner workings is troublesome for various reasons:

- who owns the resource? Who coordinates (concurrent) access to it and how? What are the correctness and performance implications here (races, deadlocks, corruption, JCR semantics)?

- it limits implementation freedom and hinders further evolution (chunking, de-duplication, content based addressing, compression, gc, etc.) for data stores.


- it bypasses JCR's security model

Pretty much all of this has been discussed in the scope of https://issues.apache.org/jira/browse/JCR-3534 and https://issues.apache.org/jira/browse/OAK-834. So I suggest reviewing those discussions before we jump to conclusions.

Also, what is the use case requiring such a vast API surface? Can't we come up with an API that allows the blobs to stay under the control of Oak? If not, this is probably an indication that those blobs shouldn't go into Oak, but just references to them, as Francesco already proposed. Anything else is neither fish nor fowl: you can't have the JCR goodies but at the same time access underlying resources at will.


Michael



On 5.5.16 11:00 , Francesco Mari wrote:

This proposal introduces a huge leak of abstractions and has deep security
implications.

I guess that the reason for this proposal is that some users of Oak would
like to perform some operations on binaries in a more performant way by
leveraging the way those binaries are stored. If this is the case, I
suggest that those users evaluate an application-level solution implemented
on top of the JCR API.

If a user needs to store some important binary data (files, images, etc.)
in an S3 bucket or on the file system for performance reasons, this
shouldn't affect how Oak handles blobs internally. If some assets are of
special interest for the user, then the user should bypass Oak and take
care of the storage of those assets directly. Oak can be used to store
*references* to those assets, which can be used in user code to manipulate
the assets in the user's own business logic.

If the scenario I outlined is not what inspired this proposal, I would like
to know more about the reasons why this proposal was brought up. Which
problems are we going to solve with this API? Is there a more concrete use
case that we can use as a driving example?

2016-05-05 10:06 GMT+02:00 Davide Giannella <[email protected]>:

On 04/05/2016 17:37, Ian Boston wrote:

Hi,
If the File or URL is writable, will writing to the location cause issues
for Oak ?
IIRC some Oak DS implementations use a digest of the content to determine
the location in the DS, so changing the content via Oak will change the
location, but changing the content via the File or URL won't. If I didn't
remember correctly, then ignore the concern. Fully supportive of the
approach, as a consumer of Oak. The locations will almost certainly leak
outside the context of an Oak session, so the API contract should make it
clear that the code using a direct location needs to behave responsibly.
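
To make the digest concern concrete, here is a minimal sketch of content-based addressing. It is not Oak's data store code: the class name `ContentAddressSketch`, the method `locationFor`, the choice of SHA-256, and the two-character directory prefix are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration only; not taken from any Oak DataStore implementation.
class ContentAddressSketch {

    // Derives a storage location from the content itself (assumed layout:
    // first two hex characters as a directory, full digest as the file name).
    static String locationFor(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.substring(0, 2) + "/" + hex;
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] original = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] modified = "hello!".getBytes(StandardCharsets.UTF_8);

        String stored = locationFor(original);
        // A write through the store would compute a new location for the new content.
        String expectedAfterChange = locationFor(modified);

        // A write through a leaked File or URL leaves the bytes at 'stored',
        // so the location no longer matches the digest of what it holds.
        System.out.println(stored.equals(expectedAfterChange)); // prints "false"
    }
}
```

Under these assumptions, an in-place write through an exposed File or URL would leave the store holding content whose digest no longer matches its location.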


It's a reasonable concern, and I'm not familiar with the details of the
implementation. It's worth keeping in mind, though: if we want to adapt
to URL or File, we may have to come up with some sort of read-only
version of them.

For the File class, IIRC, we could use the setReadOnly() or
setWritable() methods. I remember those being quite expensive in time,
though.
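
A minimal sketch of the setReadOnly() idea, using only java.io.File and java.nio.file.Files. The class name `ReadOnlyFileSketch` and the helper `readOnlyCopyOf` are hypothetical, and the temporary file stands in for whatever file a data store would actually expose.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Hypothetical illustration only; not part of any proposed Oak API.
class ReadOnlyFileSketch {

    // Writes the content to a temporary file and marks it read-only
    // before handing it out.
    static File readOnlyCopyOf(byte[] content) {
        try {
            File f = File.createTempFile("blob-", ".bin");
            f.deleteOnExit();
            Files.write(f.toPath(), content);
            if (!f.setReadOnly()) {
                throw new IOException("could not mark " + f + " read-only");
            }
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File f = readOnlyCopyOf(new byte[] {1, 2, 3});
        // The flag is only a file-system permission: a privileged process, or
        // any caller allowed to change permissions, can still write to the file.
        System.out.println("canWrite: " + f.canWrite());
    }
}
```

Note that this only changes file-system permissions; it does not prevent a caller with sufficient rights from calling setWritable(true) and modifying the file anyway.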

Davide
