Re: Usecases around Binary handling in Oak

2016-09-07 Thread Alexander Klimetschek
Hi everyone,

late to the game… back from a long leave :) I just wanted to chime in on the
security discussion.

Please be aware that the ReferenceBinary interface [1] already exists (I haven't
seen it mentioned in this or the previous thread; please excuse me if I missed
it). Its String getReference() method, in the case of a file data store,
includes the hash, from which you can calculate the file location. We have
actively used this in a performance optimization for the use case described by
Chetan. See [2] for some code showcasing it.

Yes, this requires knowing an implementation detail (and we fall back to using 
the JCR binary interface in case the file cannot be found), but if you think 
there is a security issue, it exists in Jackrabbit/Oak already.
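The lookup described above, including the fallback, can be sketched with JDK classes only. This is a hypothetical helper, not part of Jackrabbit or Oak: it assumes, as the sample code in [2] does, that the part of the reference before the colon is the content hash and that files are laid out as in Jackrabbit's FileDataStore. The datastore root and the fallback supplier (which would typically wrap Binary.getStream()) are assumptions the caller provides.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class DataStoreLookup {

    /** Supplies the regular JCR binary stream when direct file access is not possible. */
    @FunctionalInterface
    interface InputStreamSupplier {
        InputStream get() throws IOException;
    }

    /**
     * Maps a reference such as "0123456789abcdef:signature" to
     * root/01/23/45/0123456789abcdef (the layout used in the sample code).
     * Returns empty when there is no usable hash.
     */
    static Optional<Path> toDataStorePath(Path root, String reference) {
        if (reference == null) {
            return Optional.empty();
        }
        int colon = reference.indexOf(':');
        String hash = colon >= 0 ? reference.substring(0, colon) : reference;
        if (hash.length() < 6) {
            return Optional.empty();
        }
        return Optional.of(root.resolve(hash.substring(0, 2))
                .resolve(hash.substring(2, 4))
                .resolve(hash.substring(4, 6))
                .resolve(hash));
    }

    /** Reads the datastore file directly if it exists, otherwise uses the fallback. */
    static InputStream open(Path root, String reference, InputStreamSupplier fallback)
            throws IOException {
        Optional<Path> file = toDataStorePath(root, reference);
        if (file.isPresent() && Files.isRegularFile(file.get())) {
            return Files.newInputStream(file.get());
        }
        return fallback.get();
    }

    public static void main(String[] args) {
        // Hypothetical datastore root; the real location depends on the repository configuration.
        Path root = Paths.get("/srv/repository/datastore");
        System.out.println(toDataStorePath(root, "0123456789abcdef:signature").orElse(null));
    }
}
```

Returning an Optional keeps the "no usable reference" case explicit, so the caller always has a clear point at which to fall back to the JCR stream.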

I do understand the performance problem, which can be a big one, so finding a 
secure solution would be great. IMO the important case is bridging the file
data store with applications that are not JCR-API capable (say ImageMagick, S3
URLs/browsers etc.) and cannot be rewritten to use the JCR API; IIUC read-only
access is fine there (UC1 mostly).

Cheers,
Alex

[1] https://jackrabbit.apache.org/api/2.6/org/apache/jackrabbit/api/ReferenceBinary.html

[2] sample code

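import java.io.File;
import javax.jcr.Binary;
import javax.jcr.Node;
import javax.jcr.Property;
import javax.jcr.PropertyType;
import javax.jcr.RepositoryException;
import org.apache.commons.lang.StringUtils; // or org.apache.commons.lang3.StringUtils
import org.apache.jackrabbit.api.ReferenceBinary;

// PN_FILE_DATA is not defined in this snippet; it is assumed to be a constant
// holding the name of the binary property to inspect.

public static File getDataStoreRef(Node ntFile) throws RepositoryException {
    if (ntFile.hasProperty(PN_FILE_DATA)) {
        Property property = ntFile.getProperty(PN_FILE_DATA);
        if (property.getType() == PropertyType.BINARY) {
            Binary binary = property.getBinary();
            if (binary instanceof ReferenceBinary) {
                String ref = ((ReferenceBinary) binary).getReference();
                // Oak reference is "hash:something"; keep the hash part.
                // getReference() returns null if the binary has no reference,
                // and substringBefore(null, ":") then also returns null.
                ref = StringUtils.substringBefore(ref, ":");
                if (ref == null || ref.length() < 6) {
                    // No usable reference. The null case happens when the asset was
                    // created before the file datastore option was configured.
                    // Looks like rendition data is not being extracted for existing assets.
                    return null;
                }
                // Hash to datastore file structure, as in Jackrabbit's FileDataStore.
                // The returned path is relative to the datastore root directory.
                File file = new File(ref.substring(0, 2));
                file = new File(file, ref.substring(2, 4));
                file = new File(file, ref.substring(4, 6));
                file = new File(file, ref);
                return file;
            }
        }
    }
    return null;
}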



Re: Seekable access to a Binary

2016-09-07 Thread Julian Reschke

On 2016-09-07 11:06, Michael Marth wrote:
> Hi,
>
> I believe Oak has no notion of requests - the 1-1 binding of a request to a
> session is done in Sling.
> However, having said that: I was not aware of all the complexities you mention.
> To add one more: probably the design would have to account for different
> clustered Sling instances (that share 1 repository) that receive chunks
> belonging to the same binary. Is that right?
>
> Afaik branches are not exposed into userland, but are an implementation detail.
> When I made my comment below, I did not realize that in order for this to work
> branches would have to be exposed. I am not sure if that's a good idea. Also not
> sure if it would even solve the problem.
> Maybe a better approach could be to persist the chunks in a temp space, similar
> to what Marcel suggested. But maybe that temp space could be a functionality of
> the datastore (I believe Marcel suggested that the user create a temp location
> via the JCR API).
>
> Michael
>
> Sent from a mobile device



Maybe we could have an Oak-specific InputStream implementation that wraps
a series of existing Binary implementations, and which Oak could leverage
when writing a new binary (by not actually reading the binaries, but just
copying references around)?
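A minimal sketch of that idea, assuming a hypothetical Chunk interface in place of Oak's internal binary types: appending only copies a list of chunk references, and reads go through java.io.SequenceInputStream, which opens each chunk lazily. None of these names are existing Oak API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.Iterator;
import java.util.List;

public class CompositeBinary {

    /** Hypothetical stand-in for an existing, immutable stored binary. */
    interface Chunk {
        long size();

        InputStream open() throws IOException;
    }

    private final List<Chunk> chunks;

    CompositeBinary(List<Chunk> chunks) {
        this.chunks = Collections.unmodifiableList(new ArrayList<>(chunks));
    }

    /** Returns a new binary with one more chunk; only references are copied. */
    CompositeBinary append(Chunk next) {
        List<Chunk> copy = new ArrayList<>(chunks);
        copy.add(next);
        return new CompositeBinary(copy);
    }

    long getSize() {
        return chunks.stream().mapToLong(Chunk::size).sum();
    }

    /**
     * Concatenated view of all chunks. SequenceInputStream opens the first chunk
     * right away and each later one only when the previous one is exhausted.
     */
    InputStream getStream() {
        Iterator<Chunk> it = chunks.iterator();
        Enumeration<InputStream> lazy = new Enumeration<InputStream>() {
            @Override
            public boolean hasMoreElements() {
                return it.hasNext();
            }

            @Override
            public InputStream nextElement() {
                try {
                    return it.next().open();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        };
        return new SequenceInputStream(lazy);
    }

    /** In-memory chunk, for the example only. */
    static Chunk bytes(String s) {
        byte[] data = s.getBytes(StandardCharsets.UTF_8);
        return new Chunk() {
            @Override
            public long size() {
                return data.length;
            }

            @Override
            public InputStream open() {
                return new ByteArrayInputStream(data);
            }
        };
    }

    public static void main(String[] args) throws IOException {
        CompositeBinary b = new CompositeBinary(List.of(bytes("Hello, "))).append(bytes("Oak"));
        System.out.println(new String(b.getStream().readAllBytes(), StandardCharsets.UTF_8)); // Hello, Oak
        System.out.println(b.getSize()); // 10
    }
}
```

Keeping the composite immutable means a half-finished upload never changes a binary that is already referenced elsewhere; each append yields a new value.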


Best regards, Julian


Re: Seekable access to a Binary

2016-09-07 Thread Michael Marth
Hi,

I believe Oak has no notion of requests - the 1-1 binding of a request to a 
session is done in Sling.
However, having said that: I was not aware of all the complexities you mention. 
To add one more: probably the design would have to account for different
clustered Sling instances (that share 1 repository) that receive chunks
belonging to the same binary. Is that right?

Afaik branches are not exposed into userland, but are an implementation detail.
When I made my comment below, I did not realize that in order for this to work
branches would have to be exposed. I am not sure if that's a good idea. Also not
sure if it would even solve the problem.
Maybe a better approach could be to persist the chunks in a temp space, similar
to what Marcel suggested. But maybe that temp space could be a functionality of
the datastore (I believe Marcel suggested that the user create a temp location
via the JCR API).
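A rough sketch of such a temp space, under the assumption that every chunk carries an upload id and a byte offset, so that chunks received by different cluster instances can be stored independently and in any order. The class and its methods are hypothetical; a ConcurrentHashMap stands in for storage shared by all instances, and chunks are held in memory only for illustration (a datastore-backed version would keep chunk references instead).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

public class TempChunkSpace {

    // uploadId -> (byte offset -> chunk bytes); stands in for a shared temp area
    private final Map<String, TreeMap<Long, byte[]>> uploads = new ConcurrentHashMap<>();

    /** Stores one chunk; chunks may arrive in any order. */
    void putChunk(String uploadId, long offset, byte[] data) {
        TreeMap<Long, byte[]> chunks = uploads.computeIfAbsent(uploadId, id -> new TreeMap<>());
        synchronized (chunks) {
            chunks.put(offset, data.clone());
        }
    }

    /**
     * Returns the assembled binary once the chunks cover [0, totalLength)
     * without gaps or overlaps, and discards the temp entry; otherwise empty.
     */
    Optional<byte[]> complete(String uploadId, long totalLength) {
        TreeMap<Long, byte[]> chunks = uploads.get(uploadId);
        if (chunks == null) {
            return Optional.empty();
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        synchronized (chunks) {
            long expected = 0;
            for (Map.Entry<Long, byte[]> e : chunks.entrySet()) {
                if (e.getKey() != expected) {
                    return Optional.empty(); // missing or overlapping chunk
                }
                out.write(e.getValue(), 0, e.getValue().length);
                expected += e.getValue().length;
            }
            if (expected != totalLength) {
                return Optional.empty();
            }
        }
        uploads.remove(uploadId);
        return Optional.of(out.toByteArray());
    }

    public static void main(String[] args) {
        TempChunkSpace space = new TempChunkSpace();
        space.putChunk("u1", 5, "World".getBytes(StandardCharsets.UTF_8));
        System.out.println(space.complete("u1", 10).isPresent()); // false: bytes 0-4 missing
        space.putChunk("u1", 0, "Hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(space.complete("u1", 10).get(), StandardCharsets.UTF_8)); // HelloWorld
    }
}
```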

Michael

Sent from a mobile device

From: Ian Boston
Sent: Wednesday, September 7, 2016 9:36 AM
Subject: Re: Seekable access to a Binary


Hi,

On 6 September 2016 at 18:00, Michael Marth wrote:

> Hi,
>
> I think it would be neat if we could utilize our existing mechanism rather
> than a new flag. In particular, MVCC and branches for session isolation.
> And also simply use session.save() to indicate that an upload is complete
> (and the branch containing the binaries/chunks can be merged).
>

Do branches and sessions hang around between requests?

Each body part will come from a different request, sometimes separated by
hours and possibly even from different source IP addresses, especially
under upload restart conditions. At present, in streaming mode, a
session.save() is performed as each body part is encountered, to make JCR/Oak
read that part's input stream from the request, since JCR does not expose an
OutputStream (or anything similar) that can be used to write binary data to
the repository.
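Because the parts arrive in separate requests, the upload state has to outlive any single request or session. A hypothetical sketch of the bookkeeping a restartable upload needs: the server records how many bytes it has durably stored, accepts a part only if it starts at that offset, and reports the offset so a client can resume after a restart. Where this state would actually live (repository, shared store) is left open; an in-memory map stands in for it here.

```java
import java.util.HashMap;
import java.util.Map;

public class ResumableUploads {

    // uploadId -> bytes durably stored so far. In a real system this state must
    // survive across requests (and sessions); here it is just an in-memory map.
    private final Map<String, Long> committed = new HashMap<>();

    /** The offset a client should resume from, e.g. after a dropped connection. */
    synchronized long resumeOffset(String uploadId) {
        return committed.getOrDefault(uploadId, 0L);
    }

    /**
     * Accepts a part only if it starts exactly where the stored data ends, so a
     * retried or duplicated part is rejected instead of corrupting the binary.
     */
    synchronized boolean acceptPart(String uploadId, long offset, long length) {
        long stored = committed.getOrDefault(uploadId, 0L);
        if (offset != stored) {
            return false;
        }
        // ... persist the part's bytes here before recording progress ...
        committed.put(uploadId, stored + length);
        return true;
    }

    public static void main(String[] args) {
        ResumableUploads uploads = new ResumableUploads();
        uploads.acceptPart("u1", 0, 1024);
        boolean duplicate = uploads.acceptPart("u1", 0, 1024); // e.g. a retried request
        System.out.println(duplicate + " " + uploads.resumeOffset("u1")); // false 1024
    }
}
```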

Best Regards
Ian



>
> Michael
>
> Sent from a mobile device
>
> On Tue, Sep 6, 2016 at 1:15 PM +0200, "Marcel Reutegger" <mreut...@adobe.com> wrote:
>
> Hi,
>
> On 06/09/16 12:34, Bertrand Delacretaz wrote:
> > On Tue, Sep 6, 2016 at 9:49 AM, Marcel Reutegger wrote:
> >> ...we'd still have to add
> >> Jackrabbit API to support it. E.g. something like:
> >>
> >> valueFactory.createBinary(existingBinary, appendThisInputStream); ...
> >
> > And maybe a way to mark the binary as "in progress" to avoid
> > applications using half-uploaded binaries?
>
> This can easily be prevented if the 'in progress' binary is
> uploaded to a temporary location first and then copied over
> to the correct location once complete. Keep in mind that
> copying a large existing binary in Oak is simply a cheap
> copy of the reference.
>
> Regards
> Marcel
>
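Marcel's point can be illustrated with a toy model in which binary property values are immutable references to datastore content. The BlobRef record and StagedBinaryCopy class are hypothetical, not Oak API; the sketch only shows that publishing a finished upload copies a reference (no bytes), and that nothing is visible at the final path while the upload is still in progress.

```java
import java.util.HashMap;
import java.util.Map;

public class StagedBinaryCopy {

    /** Hypothetical immutable handle to content in a datastore. */
    record BlobRef(String contentId, long length) {}

    // node path -> binary property value; stands in for repository content
    private final Map<String, BlobRef> properties = new HashMap<>();

    void write(String path, BlobRef ref) {
        properties.put(path, ref);
    }

    BlobRef read(String path) {
        return properties.get(path);
    }

    /** Publishes a finished upload: copies the reference (no bytes), then drops the temp entry. */
    void publish(String tempPath, String finalPath) {
        BlobRef ref = properties.get(tempPath);
        if (ref == null) {
            throw new IllegalStateException("nothing uploaded at " + tempPath);
        }
        properties.put(finalPath, ref);
        properties.remove(tempPath);
    }

    public static void main(String[] args) {
        StagedBinaryCopy repo = new StagedBinaryCopy();
        repo.write("/tmp/uploads/u1", new BlobRef("0123abcd", 4_000_000_000L));
        System.out.println(repo.read("/content/video.mp4")); // null: not visible while in progress
        repo.publish("/tmp/uploads/u1", "/content/video.mp4");
        System.out.println(repo.read("/content/video.mp4"));
    }
}
```

The cost of publish() does not depend on the binary's length, which is why staging in a temporary location adds little overhead.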