[DISCUSS] Impact of not updating last-modified time when writing a blob that already exists?

2019-10-02 Thread Matt Ryan
Hi,

I'm working on OAK-8105 which is to update AzureDataStore to use the new
Azure v12 SDK instead of the deprecated v8 SDK, and may have run into a
snag where I could use some input from the team.

The main issue:  The current cloud data store implementations (Azure and S3)
behave as follows:  When a client tries to write a binary that already
exists in blob storage, instead of writing the binary again, the
last-modified time of the existing binary is updated and a record for the
existing binary is returned as the result.  The question is:  What would be
the impact if we were unable to update the last-modified time in this situation?

Background:  AzureDataStore currently allows authentication/authorization
to the Azure storage service two different ways.  One is via an access key
- essentially a shared secret created by the storage service.  The other
way is via a shared access signature, which can be generated via an API
call.  Importantly we don't use "both" in a single instance - we use the
access key if it is provided, and otherwise use the shared access signature.
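
For reference, here is roughly what the two paths look like when building a
client with the v12 SDK.  This is only a sketch - the account, key, SAS token
and container name are placeholders, not our actual configuration:

    import com.azure.storage.blob.BlobContainerClient;
    import com.azure.storage.blob.BlobContainerClientBuilder;
    import com.azure.storage.common.StorageSharedKeyCredential;

    class AzureAuthSketch {
        static BlobContainerClient viaAccessKey(String account, String key) {
            // Path 1: authenticate with the account access key (shared secret).
            return new BlobContainerClientBuilder()
                .endpoint("https://" + account + ".blob.core.windows.net")
                .containerName("oak-datastore")
                .credential(new StorageSharedKeyCredential(account, key))
                .buildClient();
        }

        static BlobContainerClient viaSas(String account, String sasToken) {
            // Path 2: authenticate with a shared access signature instead.
            return new BlobContainerClientBuilder()
                .endpoint("https://" + account + ".blob.core.windows.net")
                .containerName("oak-datastore")
                .sasToken(sasToken)
                .buildClient();
        }
    }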

Azure's API does not allow modifying the last-modified property of a blob
directly.  To do this up until now we have issued a service-side blob copy
instruction to copy the blob to itself, which has the effect of updating
the last-modified value.
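
With the v12 SDK, the equivalent "touch" would look roughly like the sketch
below.  This is only a sketch - the helper is mine, and beginCopy is the v12
service-side copy operation discussed below:

    import com.azure.storage.blob.BlobClient;
    import java.time.Duration;

    class TouchBlobSketch {
        // Sketch: update a blob's last-modified time by issuing a
        // service-side copy of the blob onto itself and waiting for it.
        static void touchLastModified(BlobClient blob) {
            blob.beginCopy(blob.getBlobUrl(), Duration.ofSeconds(1))
                .waitForCompletion();
        }
    }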

However, based on my testing with the new Azure SDK, there are certain API
operations that you cannot perform when you authenticate with a shared
access signature.  One of these is a service-side blob copy.  I am working
with Microsoft directly to try to find a workaround, but if my testing is
correct we may not be able to update the last-modified value when writing an
already existing binary, if a shared access signature is used to
authenticate.

(It is possible this never worked with the old SDK either; I don't think
that particular behavior was ever tested using a shared access signature
before today.)


If we cannot find a workaround I see the following options:
- Don't update the last-modified value if we authenticate using a shared
access signature.  (Or don't worry about it at all if it doesn't actually
matter - but I assume it does matter.)
- Don't allow authentication/authorization with shared access signatures
for AzureDataStore.  (This would potentially break existing implementations
that are using this method to authenticate.)


Sorry for the long email, but I thought the full context was necessary.
Open to thoughts on this.


-MR


Re: [DISCUSS] Branching and release: version numbers

2019-09-27 Thread Matt Ryan
+1.  I was wondering this same thing.


-MR

On Fri, Sep 27, 2019 at 3:57 AM Julian Reschke 
wrote:

> On 04.03.2019 14:29, Davide Giannella wrote:
> > ...
>
> Picking up an old thread...
>
> So we've released 1.12.0, 1.14.0, 1.16.0, and will release 1.18.0 next
> week.
>
> What we apparently did not discuss is what the project version for trunk
> should be in the meantime.
>
> So far, we've been using 1.12-SNAPSHOT, etc, and we are on 1.20-SNAPSHOT
> right now.
>
> This however seems incorrect to me; shouldn't it be 1.19-SNAPSHOT?
>
> For this release I'd like to avoid any changes, but for future releases
> I'd like to document that we're using an odd-numbered version.
>
> Feedback appreciated,
>
> Julian
>
>


Re: [VOTE] Release Apache Jackrabbit Oak 1.18.0

2019-09-25 Thread Matt Ryan
On Wed, Sep 25, 2019 at 1:24 AM Julian Reschke  wrote:

> Please vote on releasing this package as Apache Jackrabbit Oak 1.18.0.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Jackrabbit PMC votes are cast.
>
>  [X] +1 Release this package as Apache Jackrabbit Oak 1.18.0
>
>
Where:

Apache Maven 3.6.0 (97c98ec64a1fdfee7767ce5ffb20918da4f719f3;
2018-10-24T12:41:47-06:00)
[INFO] OS name: "mac os x", version: "10.14.6", arch: "x86_64", family:
"mac"
[INFO] Java version: 11.0.2, vendor: Oracle Corporation, runtime:
/Library/Java/JavaVirtualMachines/jdk-11.0.2.jdk/Contents/Home
[INFO] MAVEN_OPTS:
[INFO]

[INFO] ALL CHECKS OK
[INFO]



Re: [VOTE] Release Apache Jackrabbit Oak 1.10.5

2019-09-13 Thread Matt Ryan
>
> Please vote on releasing this package as Apache Jackrabbit Oak 1.10.5.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Jackrabbit PMC votes are cast.
>
> [X] +1 Release this package as Apache Jackrabbit Oak 1.10.5
> [ ] -1 Do not release this package because...
>
>
>
Where:

[INFO] Apache Maven 3.6.0 (97c98ec64a1fdfee7767ce5ffb20918da4f719f3;
2018-10-24T12:41:47-06:00)
[INFO] OS name: "mac os x", version: "10.14.6", arch: "x86_64", family:
"mac"
[INFO] Java version: 11.0.2, vendor: Oracle Corporation, runtime:
/Library/Java/JavaVirtualMachines/jdk-11.0.2.jdk/Contents/Home
[INFO] MAVEN_OPTS:


-MR


Re: Oak 1.10.5 release plan

2019-09-12 Thread Matt Ryan
Hi,

OAK-8599 is committed now so we are good to go for 1.10 from that point of
view.


-MR

On Wed, Sep 11, 2019 at 9:30 AM Matt Ryan  wrote:

> Hi Nitin,
>
> I need the backport for OAK-8599 to be in 1.10.5 if possible.  I've
> updated the labels.  I should have no problem having this backport done in
> time though.
>
> On Tue, Sep 10, 2019 at 8:50 PM Nitin Gupta  wrote:
>
>> Hello Team,
>>
>> I am planning to cut 1.10.5 for oak on Friday (13th September) .
>>
>> Issues - https://issues.apache.org/jira/projects/OAK/versions/12346002
>> (No
>> open / in progress issues) .
>>
>> If there are any objections , please let me know.
>>
>> Thanks,
>> Nitin
>>
>


Re: Oak 1.10.5 release plan

2019-09-11 Thread Matt Ryan
Hi Nitin,

I need the backport for OAK-8599 to be in 1.10.5 if possible.  I've updated
the labels.  I should have no problem having this backport done in time
though.

On Tue, Sep 10, 2019 at 8:50 PM Nitin Gupta  wrote:

> Hello Team,
>
> I am planning to cut 1.10.5 for oak on Friday (13th September) .
>
> Issues - https://issues.apache.org/jira/projects/OAK/versions/12346002 (No
> open / in progress issues) .
>
> If there are any objections , please let me know.
>
> Thanks,
> Nitin
>


Propose to backport OAK-8599 fix

2019-09-11 Thread Matt Ryan
Hi,

I propose to backport the bugfix for OAK-8599 to 1.10.  This fix makes the
implementation more in line with the Javadoc and is a low-risk fix.

-MR


Intent to backport OAK-8298 to 1.10

2019-08-23 Thread Matt Ryan
Hi,

I propose to backport the fix to OAK-8298 to 1.10.  This is a bug fix for
direct binary access to ensure that binaries added via direct upload are
also tracked via the blob id tracker.
The fix is low risk in my view.


-MR


Input requested for speeding up signed download URI creation

2019-08-19 Thread Matt Ryan
Hi oak-dev,

Creating a signed download URI is itself really fast and does not require
any communication with the cloud storage service (e.g. Azure, S3).
However, in the current implementation we actually make at least three
network calls when we do this (a short sketch of the purely-local signing
follows the list):
1 and 2) When we call BinaryImpl#getUri(), we check to see if we can get a
reference [0] for the binary to ensure the binary is not inlined.  We
cannot create a signed download URI for an inlined binary obviously.
However, calling getReference() actually turns into two network calls, at
least for Azure - one to check if the blob ID exists in the blob store, and
one to get the blob properties that are needed to construct a DataRecord
(e.g. last modified time, content length).  Then the reference is returned
from this record.
3) When the data store backend creates the URI, it checks whether the
binary is actually in cloud storage before creating the URI [1] (see
OAK-7998).  It is possible for a Binary instance to have a reference even
if the binary has not been uploaded yet (i.e. it is still in the cache).
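
To illustrate the first sentence above - the signing itself is purely local.
A minimal sketch with the AWS SDK for Java (v1), assuming an already-configured
client and made-up bucket/blob names:

    import com.amazonaws.HttpMethod;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;
    import com.amazonaws.services.s3.model.GeneratePresignedUrlRequest;
    import java.net.URL;
    import java.util.Date;

    class PresignSketch {
        public static void main(String[] args) {
            AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
            // The URL is signed locally with the configured credentials;
            // no request goes to S3 to produce it.
            GeneratePresignedUrlRequest request =
                new GeneratePresignedUrlRequest("my-bucket", "my-blob-id")
                    .withMethod(HttpMethod.GET)
                    .withExpiration(new Date(System.currentTimeMillis() + 60_000));
            URL downloadUri = s3.generatePresignedUrl(request);
            System.out.println(downloadUri);
        }
    }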

I'm looking for ways to speed this up (see OAK-8551 and OAK-8552).  Some
questions:
* Is there a better way to see if a Binary is stored inline than trying to
get a reference for it?
* For the purposes of getting a reference, could we bypass the creation of
the DataRecord and just ask the storage backend for a reference?  If you
follow the code path you can see the creation of the DataRecord and
therefore the network calls are not actually needed to get the reference.
* Could we instead rely on the BlobIdTracker for the exists() check in the
storage backend?  This of course would require a fix for OAK-8298.

Any other ideas?


[0] -
https://github.com/apache/jackrabbit-oak/blob/2bef22fe5c8ebd69d65bb05fd457c08713f51aa4/oak-store-spi/src/main/java/org/apache/jackrabbit/oak/plugins/value/jcr/BinaryImpl.java#L96
[1] -
https://github.com/apache/jackrabbit-oak/blob/2bef22fe5c8ebd69d65bb05fd457c08713f51aa4/oak-blob-cloud-azure/src/main/java/org/apache/jackrabbit/oak/blob/cloud/azure/blobstorage/AzureBlobStoreBackend.java#L778


-MR


Apache Jackrabbit swag available if interested

2019-08-19 Thread Matt Ryan
Hi,

Since I'm speaking at ApacheCon NA next month, there will be Apache
Jackrabbit and Apache Jackrabbit Oak stickers available at the Apache booth
at the conference.  If you are planning on attending ApacheCon NA next
month in Las Vegas, stop by and pick some up (and look me up too, while you
are at it).

For those not attending, stickers and other stuff (e.g. shirts) are
available online at RedBubble if you are interested in that sort of thing.
Search for works by "Apache Community Development."

I'm not trying to promote the swag, just letting people know if they are
interested.


-MR


Re: New Jackrabbit Committer: Nitin Gupta

2019-08-15 Thread Matt Ryan
Welcome Nitin!


-MR

On Thu, Aug 15, 2019 at 6:51 AM Julian Sedding  wrote:

> Welcome Nitin!
>
> Regards
> Julian
>
> On Wed, Aug 14, 2019 at 3:24 PM Woonsan Ko  wrote:
> >
> > Welcome, Nitin!
> >
> > Cheers,
> >
> > Woonsan
> >
> > On Wed, Aug 14, 2019 at 2:30 AM Tommaso Teofili
> >  wrote:
> > >
> > > Welcome to the team Nitin!
> > >
> > > Regards,
> > > Tommaso
> > >
> > > On Thu, 8 Aug 2019 at 08:31, Marcel Reutegger 
> wrote:
> > >>
> > >> Hi,
> > >>
> > >> Please welcome Nitin Gupta as a new committer and PMC member of
> > >> the Apache Jackrabbit project. The Jackrabbit PMC recently decided to
> > >> offer Nitin committership based on his contributions. I'm happy to
> > >> announce that he accepted the offer and that all the related
> > >> administrative work has now been taken care of.
> > >>
> > >> Welcome to the team, Nitin!
> > >>
> > >> Regards
> > >>  Marcel
> > >>
>


Re: New Jackrabbit Committer: Mohit Kataria

2019-08-15 Thread Matt Ryan
Welcome Mohit!


-MR

On Thu, Aug 15, 2019 at 6:51 AM Julian Sedding  wrote:

> Welcome Mohit!
>
> Regards
> Julian
>
> On Wed, Aug 14, 2019 at 3:25 PM Woonsan Ko  wrote:
> >
> > Welcome, Mohit!
> >
> > Cheers,
> >
> > Woonsan
> >
> > On Wed, Aug 14, 2019 at 2:31 AM Tommaso Teofili
> >  wrote:
> > >
> > > Welcome to the team Mohit!
> > >
> > > Regards,
> > > Tommaso
> > >
> > > On Thu, 8 Aug 2019 at 08:33, Marcel Reutegger 
> wrote:
> > >>
> > >> Hi,
> > >>
> > >> Please welcome Mohit Kataria as a new committer and PMC member of
> > >> the Apache Jackrabbit project. The Jackrabbit PMC recently decided to
> > >> offer Mohit committership based on his contributions. I'm happy to
> > >> announce that he accepted the offer and that all the related
> > >> administrative work has now been taken care of.
> > >>
> > >> Welcome to the team, Mohit!
> > >>
> > >> Regards
> > >>  Marcel
> > >>
>


Re: [VOTE] Release Apache Jackrabbit Oak 1.10.4

2019-08-12 Thread Matt Ryan
Hi,

On Mon, Aug 12, 2019 at 3:55 AM Nitin Gupta  wrote:

> Please vote on releasing this package as Apache Jackrabbit Oak 1.10.4.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Jackrabbit PMC votes are cast.
>
> [X] +1 Release this package as Apache Jackrabbit Oak 1.10.4
> [ ] -1 Do not release this package because...
>

Where:

Apache Maven 3.6.0 (97c98ec64a1fdfee7767ce5ffb20918da4f719f3;
2018-10-24T12:41:47-06:00)
[INFO] OS name: "mac os x", version: "10.14.6", arch: "x86_64", family:
"mac"
[INFO] Java version: 11.0.2, vendor: Oracle Corporation, runtime:
/Library/Java/JavaVirtualMachines/jdk-11.0.2.jdk/Contents/Home
[INFO] MAVEN_OPTS:
[INFO]

[INFO] ALL CHECKS OK


[jira] [Updated] (JCR-4463) Update JavaDoc for completeBinaryUpload explaining idempotency

2019-08-08 Thread Matt Ryan (JIRA)


 [ 
https://issues.apache.org/jira/browse/JCR-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated JCR-4463:
---
Affects Version/s: 2.18.2

> Update JavaDoc for completeBinaryUpload explaining idempotency
> --
>
> Key: JCR-4463
> URL: https://issues.apache.org/jira/browse/JCR-4463
> Project: Jackrabbit Content Repository
>  Issue Type: Improvement
>  Components: jackrabbit-api
>Affects Versions: 2.18.2
>    Reporter: Matt Ryan
>Assignee: Matt Ryan
>Priority: Minor
> Fix For: 2.18.3
>
>
> In OAK-8520 a fix was added in the direct binary upload implementation so 
> that if a client calls {{completeBinaryUpload()}} multiple times with the 
> same upload token, subsequent calls will return the already-uploaded binary 
> without making any change to the backend.  It would be good to reflect this 
> case in the JavaDocs for {{JackrabbitValueFactory.completeBinaryUpload()}}.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (JCR-4463) Update JavaDoc for completeBinaryUpload explaining idempotency

2019-08-08 Thread Matt Ryan (JIRA)


[ 
https://issues.apache.org/jira/browse/JCR-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903281#comment-16903281
 ] 

Matt Ryan commented on JCR-4463:


Fixed in 
[r1864732|https://svn.apache.org/viewvc?view=revision&revision=1864732].

> Update JavaDoc for completeBinaryUpload explaining idempotency
> --
>
> Key: JCR-4463
> URL: https://issues.apache.org/jira/browse/JCR-4463
> Project: Jackrabbit Content Repository
>  Issue Type: Improvement
>  Components: jackrabbit-api
>Reporter: Matt Ryan
>    Assignee: Matt Ryan
>Priority: Minor
> Fix For: 2.18.3
>
>
> In OAK-8520 a fix was added in the direct binary upload implementation so 
> that if a client calls {{completeBinaryUpload()}} multiple times with the 
> same upload token, subsequent calls will return the already-uploaded binary 
> without making any change to the backend.  It would be good to reflect this 
> case in the JavaDocs for {{JackrabbitValueFactory.completeBinaryUpload()}}.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (JCR-4463) Update JavaDoc for completeBinaryUpload explaining idempotency

2019-08-08 Thread Matt Ryan (JIRA)


 [ 
https://issues.apache.org/jira/browse/JCR-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan resolved JCR-4463.

Resolution: Fixed

> Update JavaDoc for completeBinaryUpload explaining idempotency
> --
>
> Key: JCR-4463
> URL: https://issues.apache.org/jira/browse/JCR-4463
> Project: Jackrabbit Content Repository
>  Issue Type: Improvement
>  Components: jackrabbit-api
>Reporter: Matt Ryan
>    Assignee: Matt Ryan
>Priority: Minor
> Fix For: 2.18.3
>
>
> In OAK-8520 a fix was added in the direct binary upload implementation so 
> that if a client calls {{completeBinaryUpload()}} multiple times with the 
> same upload token, subsequent calls will return the already-uploaded binary 
> without making any change to the backend.  It would be good to reflect this 
> case in the JavaDocs for {{JackrabbitValueFactory.completeBinaryUpload()}}.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (JCR-4463) Update JavaDoc for completeBinaryUpload explaining idempotency

2019-08-08 Thread Matt Ryan (JIRA)


 [ 
https://issues.apache.org/jira/browse/JCR-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated JCR-4463:
---
Fix Version/s: 2.18.3

> Update JavaDoc for completeBinaryUpload explaining idempotency
> --
>
> Key: JCR-4463
> URL: https://issues.apache.org/jira/browse/JCR-4463
> Project: Jackrabbit Content Repository
>  Issue Type: Improvement
>  Components: jackrabbit-api
>Reporter: Matt Ryan
>    Assignee: Matt Ryan
>Priority: Minor
> Fix For: 2.18.3
>
>
> In OAK-8520 a fix was added in the direct binary upload implementation so 
> that if a client calls {{completeBinaryUpload()}} multiple times with the 
> same upload token, subsequent calls will return the already-uploaded binary 
> without making any change to the backend.  It would be good to reflect this 
> case in the JavaDocs for {{JackrabbitValueFactory.completeBinaryUpload()}}.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Reopened] (JCR-4463) Update JavaDoc for completeBinaryUpload explaining idempotency

2019-08-08 Thread Matt Ryan (JIRA)


 [ 
https://issues.apache.org/jira/browse/JCR-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan reopened JCR-4463:

  Assignee: Matt Ryan

Needs to be fixed in 2.18 branch.

> Update JavaDoc for completeBinaryUpload explaining idempotency
> --
>
> Key: JCR-4463
> URL: https://issues.apache.org/jira/browse/JCR-4463
> Project: Jackrabbit Content Repository
>  Issue Type: Improvement
>  Components: jackrabbit-api
>Reporter: Matt Ryan
>    Assignee: Matt Ryan
>Priority: Minor
>
> In OAK-8520 a fix was added in the direct binary upload implementation so 
> that if a client calls {{completeBinaryUpload()}} multiple times with the 
> same upload token, subsequent calls will return the already-uploaded binary 
> without making any change to the backend.  It would be good to reflect this 
> case in the JavaDocs for {{JackrabbitValueFactory.completeBinaryUpload()}}.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (JCR-4463) Update JavaDoc for completeBinaryUpload explaining idempotency

2019-08-08 Thread Matt Ryan (JIRA)


 [ 
https://issues.apache.org/jira/browse/JCR-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan resolved JCR-4463.

Resolution: Duplicate

Replaced by OAK-8536.

> Update JavaDoc for completeBinaryUpload explaining idempotency
> --
>
> Key: JCR-4463
> URL: https://issues.apache.org/jira/browse/JCR-4463
> Project: Jackrabbit Content Repository
>  Issue Type: Improvement
>  Components: jackrabbit-api
>Reporter: Matt Ryan
>Priority: Minor
>
> In OAK-8520 a fix was added in the direct binary upload implementation so 
> that if a client calls {{completeBinaryUpload()}} multiple times with the 
> same upload token, subsequent calls will return the already-uploaded binary 
> without making any change to the backend.  It would be good to reflect this 
> case in the JavaDocs for {{JackrabbitValueFactory.completeBinaryUpload()}}.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (JCR-4463) Update JavaDoc for completeBinaryUpload explaining idempotency

2019-08-08 Thread Matt Ryan (JIRA)
Matt Ryan created JCR-4463:
--

 Summary: Update JavaDoc for completeBinaryUpload explaining 
idempotency
 Key: JCR-4463
 URL: https://issues.apache.org/jira/browse/JCR-4463
 Project: Jackrabbit Content Repository
  Issue Type: Improvement
  Components: jackrabbit-api
Reporter: Matt Ryan


In OAK-8520 a fix was added in the direct binary upload implementation so that 
if a client calls {{completeBinaryUpload()}} multiple times with the same 
upload token, subsequent calls will return the already-uploaded binary without 
making any change to the backend.  It would be good to reflect this case in the 
JavaDocs for {{JackrabbitValueFactory.completeBinaryUpload()}}.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


Re: Oak 1.10.4 release plan

2019-08-08 Thread Matt Ryan
Hi Nitin,

I should have OAK-8520 backported to 1.10.4 by the end of the week,
assuming nobody objects to the proposed backport.


-MR

On Thu, Aug 8, 2019 at 2:52 AM Nitin Gupta  wrote:

> Hello Team,
>
>
>
> I am planning to cut 1.10.4 for oak on Monday (12th Aug) .
>
> This is the only issue in Progress for 1.10.4 as of now -
> https://issues.apache.org/jira/browse/OAK-8520 .
>
> There are 6 issues marked as candidates for 1-10
>
> https://issues.apache.org/jira/browse/OAK-8507?jql=labels%20%3D%20candidate_oak_1_10
>
>
>
> If there are any objections please let me know . Otherwise I will
> reschedule any Open/In Progress issues for the next iteration .
>
>
>
>
>
> Also , this is the first time I am cutting an oak release – would be
> following the documentation here
> http://jackrabbit.apache.org/jcr/creating-releases.html .
>
>
>
> Thanks,
>
> Nitin
>


Propose to backport OAK-8520 to 1.10.4

2019-08-06 Thread Matt Ryan
Hi oak-dev,

I propose to backport the OAK-8520 bugfix to 1.10.4.

The fix is pretty low risk.  The issue it fixes can be a problem for the
use case described in the bug.  There is a unit test for the fix for both
S3 and Azure.

-MR


Proposed improvements to direct binary access

2019-07-30 Thread Matt Ryan
Hi oak-dev,

I'm asking for your feedback on two proposed improvements to direct binary
access.

As we've been testing the use of direct binary access we've come across a
couple of interesting edge cases.  I've created proposals for these edge
cases in OAK-8519 and OAK-8520.

OAK-8519 - When requesting a direct download URI, if one cannot be
generated the API returns null.  But there are multiple cases where null
can be returned.  OAK-8519 proposes that we clear up this potential
ambiguity and help clients know better why the URI was not created.

OAK-8520 - If a client issues a call to complete a binary upload more than
one time with the same token, it is possible to overwrite an existing
binary this way.  This proposal is to check to see if a binary already
exists before completing the upload.
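
Not the actual Oak code, but a rough sketch of the kind of guard OAK-8520
proposes in the complete-upload path (the Backend interface and method names
below are made up for illustration):

    class CompleteUploadSketch {
        interface Backend {
            String blobIdFromToken(String uploadToken);
            boolean exists(String blobId);
            String completeUpload(String uploadToken);  // returns the blob id
        }

        // If the upload was already completed for this token, return the
        // existing binary instead of overwriting it.
        static String completeBinaryUpload(Backend backend, String uploadToken) {
            String blobId = backend.blobIdFromToken(uploadToken);
            if (backend.exists(blobId)) {
                return blobId;  // idempotent: no change to the backend
            }
            return backend.completeUpload(uploadToken);
        }
    }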

Please take a minute to read the issues, and if you have thoughts please
chime in on the relevant issue(s).


-MR


Re: Backport OAK-7998 to 1.10.3

2019-07-18 Thread Matt Ryan
Ah yes, I think that was a typo.  Thanks Julian.

On Thu, Jul 18, 2019 at 2:14 AM Julian Reschke 
wrote:

> On 18.07.2019 02:06, Matt Ryan wrote:
> > Hi,
> >
> > I propose to backport the fix for OAK-7998 to 1.10.3.
> >
> > The issue in OAK-7998 is that it is possible to obtain a direct download
> > URI for a binary that doesn't exist in blob storage.  While not usually
> > possible, this situation can arise if the binary in question was added
> via
> > addRecord() and then a download URI is immediately requested, if the
> binary
> > is in cache and is not yet uploaded to cloud storage.  In such a case the
> > binary is "in the repo" but we can't create a valid download URI for it
> > until it is actually in cloud storage.
> >
> > The fix is implemented for 1.16.  It is a low-risk change - a couple of
> > unit test changes and an additional check to see whether the blob ID
> exists
> > in both the S3 and Azure backend implementations.
> >
> >
> > -MR
>
> +1, but note that 1.10.3 was released last week. So it would be
> something for 1.10.4...
>
> Best regards, Julian
>
>


Backport OAK-7998 to 1.10.3

2019-07-17 Thread Matt Ryan
Hi,

I propose to backport the fix for OAK-7998 to 1.10.3.

The issue in OAK-7998 is that it is possible to obtain a direct download
URI for a binary that doesn't exist in blob storage.  While not usually
possible, this situation can arise if the binary in question was added via
addRecord() and then a download URI is immediately requested, if the binary
is in cache and is not yet uploaded to cloud storage.  In such a case the
binary is "in the repo" but we can't create a valid download URI for it
until it is actually in cloud storage.

The fix is implemented for 1.16.  It is a low-risk change - a couple of
unit test changes and an additional check to see whether the blob ID exists
in both the S3 and Azure backend implementations.
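
In rough form the added check is simply a guard before handing out the URI;
the interface and names below are illustrative, not the real backend API:

    class DownloadUriSketch {
        interface CloudBackend {
            boolean blobExists(String blobId);               // provider call
            java.net.URI createSignedDownloadUri(String blobId);
        }

        // Only hand out a download URI once the binary is really in cloud
        // storage, not merely sitting in the local upload cache.
        static java.net.URI getDownloadUri(CloudBackend backend, String blobId) {
            if (!backend.blobExists(blobId)) {
                return null;  // still being uploaded, or missing entirely
            }
            return backend.createSignedDownloadUri(blobId);
        }
    }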


-MR


Next Oakathon scheduled

2019-06-19 Thread Matt Ryan
Hi,

The next Oakathon has now been scheduled for August 19-23, 2019.  It will
take place at the Adobe office at Barfüsserplatz 6 in Basel, Switzerland.
All contributors to Apache Jackrabbit Oak are invited to attend, either in
person or via videoconference.

OAK-8416 has been created for this Oakathon.  Details of the Oakathon will
emerge over the next weeks within that ticket.  Please feel free to add
agenda items via issue comments.

-MR


Re: June 2019 report: draft for review

2019-06-11 Thread Matt Ryan
+1. Good to see mentions of the change to release cadence and wiki use.

-MR

On Tue, Jun 11, 2019 at 8:08 AM Julian Reschke 
wrote:

> On 11.06.2019 13:40, Marcel Reutegger wrote:
> > Hi,
> >
> > The draft for the June 2019 board report is available here:
> > http://jackrabbit.apache.org/jcr/status/board-report-2019-06.html
> >
> > Please review and let me know if something is missing or incorrect.
> >
> > I will submit the report tomorrow.
> >
> > Regards
> >   Marcel
>
> Looks good to me.
>
> Best regards, Julian
>
>


Re: [VOTE] Release Apache Jackrabbit Oak 1.14.0

2019-06-05 Thread Matt Ryan
> Please vote on releasing this package as Apache Jackrabbit Oak 1.14.0.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Jackrabbit PMC votes are cast.
>
> [X] +1 Release this package as Apache Jackrabbit Oak 1.14.0
>
>

Where:

[INFO] Apache Maven 3.6.0 (97c98ec64a1fdfee7767ce5ffb20918da4f719f3;
2018-10-24T12:41:47-06:00)
[INFO] OS name: "mac os x", version: "10.14.5", arch: "x86_64", family:
"mac"
[INFO] Java version: 11.0.2, vendor: Oracle Corporation, runtime:
/Library/Java/JavaVirtualMachines/jdk-11.0.2.jdk/Contents/Home
[INFO] MAVEN_OPTS:
[INFO]

[INFO] ALL CHECKS OK


Unused features in AzureDataStore?

2019-05-28 Thread Matt Ryan
Hi,

I'm in the process of updating AzureDataStore to use the latest Azure SDK -
which requires almost a complete rewrite of AzureBlobStoreBackend.  See
OAK-8105.

In doing this I'm seeing some minor features in the old implementation that
do not appear to have a direct counterpart in the new SDK.  So far there
are two that stand out:
- Allowing the use of a proxy (proxy host and port settings) to communicate
with the cloud storage
- Allowing the definition of the connection using a shared-access signature
rather than via account name and account key

Does anyone know if these features are actually used?  I don't wish to
spuriously deprecate features, especially if they are used in the wild.
However, there is already a serious issue with Azure (OAK-8104) waiting on
this update.


-MR


Using CDNs in Direct Binary Access

2019-04-25 Thread Matt Ryan
Hi oak-dev,

I want to draw your attention to OAK-7702, which discusses adding
capability to the direct binary access feature to use CDN URIs instead of
standard URIs.  Both cloud service providers that Oak supports, Azure and
AWS, offer CDN capabilities to serve content in a blob storage container.
I've done some testing, which I mentioned in OAK-7702, and learned that in
most cases using a signed CDN URI for direct download offers throughput
equal to or better than a standard signed URI.  This appears to be true
even for the first request of a blob (meaning, one that is not yet cached
in the CDN).

If you are interested in this topic please direct your questions or
comments to OAK-7702 for discussion.


Thanks!


-MR


Re: Configuring Oak for direct binary access

2019-03-27 Thread Matt Ryan
On Wed, Mar 27, 2019 at 8:11 AM Marcel Reutegger 
wrote:

> Hi Robert,
>
> The official documentation is here:
> https://jackrabbit.apache.org/oak/docs/features/direct-binary-access.html
>
> If the information you are looking for is missing, then it would probably
> be good
> to file an issue to improve the documentation.
>

+1.  Robert, if the link Marcel gave still doesn't help, please file an
issue (or several if needed :) ) and assign them to me, and I'll try to
fill in the gaps.


-MR


Re: [DISCUSS] Branching and release: version numbers

2019-03-20 Thread Matt Ryan
On Wed, Mar 20, 2019 at 4:53 AM Julian Reschke 
wrote:

> On 20.03.2019 11:36, Davide Giannella wrote:
> > On 05/03/2019 10:18, Davide Giannella wrote:
> >> On 04/03/2019 13:31, Robert Munteanu wrote:
> >>> As you mentioned, we don't need to increase the major version whenever
> >>> we branch. I just wanted to clarify that since in this email thread
> >>> branching seems to be conflated with major version increases and that
> >>> IMO not correct (and your reply seems to support that).
> >> +1
> >>
> >
> > during a chat with Amit I realised that we will still have to release a
> > version number with a revision to `0`.  So we'll have 1.12.0, 1.14.0,
> > 1.16.0 etc.
> >
> > This will make our life easier in OSGi environments when we'll have to
> > branch as the first patch release will be 1.14.1 (for example) which
> > will definitely be greater than 1.14.0.
> >
> > OSGi and maven speaking 1.14 and 1.14.0 are the same version
> >
> > http://versionatorr.appspot.com/?a=1.14&b=1.14.0
> >
> > so we either make sure to release 1.14 and 1.14.1 or we release 1.14.0.
> >
> > Thoughts?
>
> In the spirit of not changing things that do not need to be changed:
> 1.14.0.
>
>
>
+1

-MR


Re: [VOTE] Release Apache Jackrabbit Oak 1.10.2

2019-03-18 Thread Matt Ryan
On Mon, Mar 18, 2019 at 8:08 AM Julian Reschke  wrote:

>
> Please vote on releasing this package as Apache Jackrabbit Oak 1.10.2.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Jackrabbit PMC votes are cast.
>
>  [X] +1 Release this package as Apache Jackrabbit Oak 1.10.2
>  [ ] -1 Do not release this package because...
>
>
>
Where:


[INFO] Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe;
2018-06-17T12:33:14-06:00)
[INFO] OS name: "mac os x", version: "10.14.3", arch: "x86_64", family:
"mac"
[INFO] Java version: 11.0.2, vendor: Oracle Corporation, runtime:
/Library/Java/JavaVirtualMachines/jdk-11.0.2.jdk/Contents/Home
[INFO] MAVEN_OPTS:
[INFO]

[INFO] ALL CHECKS OK


Re: [DISCUSS] Branching and releasing: frequency

2019-03-14 Thread Matt Ryan
+1 from me.  Every two months seems a bit more practical for where we are
now.

Thanks Davide!


-MR

On Tue, Mar 12, 2019 at 8:59 AM Davide Giannella  wrote:

> Good afternoon team,
>
> as we discussed in separate threads about strategies and version numbers
> and we agreed in principles on the way forward we still have to address
> one topic of the release: the frequency.
>
> Was looking at what JDK does and merge it with our own experience.
>
> JDK releases a new one every 6 months and on top there will be a quarter
> release called "update release". We could see them as bugfix for us.
>
> I personally think that such frequency is way too long and while gives
> time to everyone to update to a new version is kind-of shifting us away
> from the model of no-branch.
>
> Probably by looking at the past I'm a bit torn between a monthly or
> every two months release. I'm slightly more in favour of the latter (2
> months). I think it will be the right time for adoption and burden for
> us in terms of release cutting.
>
> Additionally we were never strict on timing and we always accommodated
> shifts in both direction (early as well as late) to make room for what's
> needed.
>
> Any thoughts on the above?
>
> Davide
>


[AzureDataStore] Removing dependency on outdated Azure SDK

2019-03-04 Thread Matt Ryan
Hi,

I've learned that Azure has released a new Java SDK for blob storage that
replaces the SDK originally used to create the AzureDataStore.  The new SDK
is not backwards compatible with the original, but contains a key bug fix
for an Oak bug identified in OAK-8013.

I'd like to have a discussion whether we should consider updating
AzureDataStore to use this latest Azure SDK.  Please take some time to read
and weigh in.

Question 1 - Why move from the old SDK to the new SDK?
The old SDK has a bug which prevents a fix for OAK-8013 (see also
OAK-8104).  In the current state, Oak will not properly support direct
download of binaries with special characters in the filename.  The way to
fix this issue is to move away from the old SDK.

Question 2 - Why is moving to the new SDK a big deal?
The new SDK is completely different from the old SDK.  Beyond the new
classes, the primary difference is a new paradigm - it uses a more
fluent/event-driven, async-style model.  Using this new SDK will require
AzureDataStore to do some tricks to perform the async operations
synchronously, manage conversions from byte buffers to streams, and so on.
So not only is the new SDK not backward compatible, it also uses a
different approach.  This will result in substantial changes to
AzureDataStore, with significant accompanying risk.
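
To illustrate the kind of adaptation I mean - this is generic Java, not the
Azure SDK's actual types - bridging an async, byte-buffer-oriented call back
into the synchronous, stream-oriented model the DataStore expects:

    import java.io.ByteArrayInputStream;
    import java.io.InputStream;
    import java.nio.ByteBuffer;
    import java.util.concurrent.CompletableFuture;

    class AsyncToSyncSketch {
        // Stand-in for an async download API that completes with a ByteBuffer.
        static CompletableFuture<ByteBuffer> downloadAsync(String blobId) {
            return CompletableFuture.completedFuture(
                    ByteBuffer.wrap("hello".getBytes()));
        }

        // Block on the async result and convert the buffer back into the
        // InputStream that the synchronous DataStore contract expects.
        static InputStream downloadAsStream(String blobId) {
            ByteBuffer buffer = downloadAsync(blobId).join();   // async -> sync
            byte[] bytes = new byte[buffer.remaining()];
            buffer.get(bytes);
            return new ByteArrayInputStream(bytes);             // buffer -> stream
        }
    }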
In addition, I've been playing with the new SDK over the past few days and
I have concerns about the SDK itself.  A very basic sample application,
which is nearly a verbatim copy of their online sample, prints warnings to
the console when it is run:

> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by com.microsoft.rest.v2.Validator to
field java.util.HashMap.serialVersionUID
> WARNING: Please consider reporting this to the maintainers of
com.microsoft.rest.v2.Validator
> WARNING: Use --illegal-access=warn to enable warnings of further illegal
reflective access operations
> WARNING: All illegal access operations will be denied in a future release

I've seen other issues, like unhandled exceptions, in other sample apps
I've created, even in code that actually does perform the desired tasks
correctly.

Question 3 - What are our options?
I see three:
1. Stay with the current, deprecated Azure SDK.  We would probably be
unable to fix OAK-8013/OAK-8105 correctly in that case, at least for Azure,
which would mean direct downloads of files with special characters in the
filename would not work.  (I suppose it is theoretically possible that
Microsoft would implement a fix in the deprecated SDK, but considering that
this bug is fixed in the new SDK I think it is unlikely.)
2. Update AzureDataStore to use the latest SDK.  I expect this will be a
significant effort - several weeks probably, at least, due to many unknowns
and the errors and exceptions the code currently produces and trying to
work them out of the code.
3. Rip out the Azure SDK dependencies altogether and instead implement
AzureDataStore directly to the Azure REST endpoints.

The last option is one I'm strongly considering.  Moving away from the SDK
is perhaps not great at first, but it avoids this problem in the future and
we don't have to worry about trying to accommodate an asynchronous API in
our synchronous access model.  I don't expect that the work will be any
greater.  My primary concern is whether we can rely on backwards
compatibility in the REST APIs moving forward.  I'm trying to find this out.

What does everyone else think?  What questions do I need to get answered?
Which option sounds best, or is there another better option I didn't list?


-MR


Re: Intent to backport OAK-8013

2019-02-27 Thread Matt Ryan
For reference:  https://issues.apache.org/jira/browse/OAK-8013


On Wed, Feb 27, 2019 at 4:41 PM Matt Ryan  wrote:

> Hi,
>
> I would like to backport OAK-8013 to Oak 1.10.  This change introduces a
> workaround for an issue with the direct binary access code that is caused
> by a bug in the Azure SDK.
>
> When a client requests a signed direct download URI, Oak includes a
> specification in the signed URI to tell the service provider how it should
> set the Content-Disposition on responses to requests for the signed URI.
> The filename* portion of that header needs to be properly encoded, but the
> Azure SDK does not handle this properly.  This workaround prevents HTTP
> clients from running into errors parsing the response headers due to an
> improperly formatted filename* portion of the Content-Disposition header.
> It is a temporary workaround until we can get a working solution in the
> Azure SDK.
>
> The risk is low, it is limited only to direct binary download use cases,
> and only applies to 1.10 and later.  Please let me know if there are any
> objections.
>
>
> -MR
>
>
>


Intent to backport OAK-8013

2019-02-27 Thread Matt Ryan
Hi,

I would like to backport OAK-8013 to Oak 1.10.  This change introduces a
workaround for an issue with the direct binary access code that is caused
by a bug in the Azure SDK.

When a client requests a signed direct download URI, Oak includes a
specification in the signed URI to tell the service provider how it should
set the Content-Disposition on responses to requests for the signed URI.
The filename* portion of that header needs to be properly encoded, but the
Azure SDK does not handle this properly.  This workaround prevents HTTP
clients from running into errors parsing the response headers due to an
improperly formatted filename* portion of the Content-Disposition header.
It is a temporary workaround until we can get a working solution in the
Azure SDK.

The risk is low, it is limited only to direct binary download use cases,
and only applies to 1.10 and later.  Please let me know if there are any
objections.


-MR


Re: Guidance for OAK-8013

2019-02-07 Thread Matt Ryan
Hi,

I've updated OAK-8013 with a proposal for how to move forward for now.
Please take a look and let me know what you think.


-MR

On Fri, Feb 1, 2019 at 11:52 AM Matt Ryan  wrote:

>
>
> On Fri, Feb 1, 2019 at 11:07 AM Matt Ryan  wrote:
>
>>
>> On Fri, Feb 1, 2019 at 11:05 AM Julian Reschke 
>> wrote:
>>
>>> On 2019-02-01 17:27, Matt Ryan wrote:
>>> > ...
>>> > Looking for feedback on this.  WDYT?
>>> > ...
>>>
>>> Did you already report a bug to Azure?
>>>
>>> Best regards, Julian
>>>
>>
>>
>> Next steps in my plan are to retry this with the latest SDK version; if
>> the bug persists then I will report an issue with them today.
>>
>>
> This is not fixed in the latest SDK version.  I have filed
> https://github.com/Azure/azure-sdk-for-java/issues/2900.
>
> In the meantime, I'm still open to brilliant ideas for workarounds :)
>
>
> -MR
>


Re: Guidance for OAK-8013

2019-02-01 Thread Matt Ryan
On Fri, Feb 1, 2019 at 11:07 AM Matt Ryan  wrote:

>
> On Fri, Feb 1, 2019 at 11:05 AM Julian Reschke 
> wrote:
>
>> On 2019-02-01 17:27, Matt Ryan wrote:
>> > ...
>> > Looking for feedback on this.  WDYT?
>> > ...
>>
>> Did you already report a bug to Azure?
>>
>> Best regards, Julian
>>
>
>
> Next steps in my plan are to retry this with the latest SDK version; if
> the bug persists then I will report an issue with them today.
>
>
This is not fixed in the latest SDK version.  I have filed
https://github.com/Azure/azure-sdk-for-java/issues/2900.

In the meantime, I'm still open to brilliant ideas for workarounds :)


-MR


Re: Guidance for OAK-8013

2019-02-01 Thread Matt Ryan
On Fri, Feb 1, 2019 at 11:05 AM Julian Reschke 
wrote:

> On 2019-02-01 17:27, Matt Ryan wrote:
> > ...
> > Looking for feedback on this.  WDYT?
> > ...
>
> Did you already report a bug to Azure?
>
> Best regards, Julian
>


Next steps in my plan are to retry this with the latest SDK version; if the
bug persists then I will report an issue with them today.

-MR


Guidance for OAK-8013

2019-02-01 Thread Matt Ryan
Hi,

I'm investigating OAK-8013 and looking for more opinions on what to do.
The problem is not as simple to solve as it seems due to inconsistent
behavior in Azure, which appears to be a bug in their SDK, and I'm not sure
how to handle this in the meantime.

The basic issue for OAK-8013 relates to how we specify the value for the
Content-Disposition header to be set on a direct download.  When requesting
a direct binary download URI, both S3 and Azure allow us to specify the
value that the provider should put in the Content-Disposition header when
the URI is requested.

Currently, Oak does not properly encode the "filename*" portion of this
value, which can lead to parsing errors for clients that parse the
response.  See OAK-8013 for details.
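
For anyone not familiar with the header: the "filename*" parameter carries an
RFC 5987/6266 extended value, i.e. a charset plus a percent-encoded filename.
A rough sketch of what proper encoding looks like (URLEncoder only
approximates the RFC's allowed character set, so treat this as illustrative):

    import java.net.URLEncoder;
    import java.nio.charset.StandardCharsets;

    class ContentDispositionSketch {
        // Build a Content-Disposition value carrying both the plain "filename"
        // parameter and the RFC 5987-encoded "filename*" parameter.
        static String contentDisposition(String fileName) throws Exception {
            String encoded = URLEncoder
                    .encode(fileName, StandardCharsets.UTF_8.name())
                    .replace("+", "%20");  // URLEncoder is form-encoding; fix spaces
            return "inline; filename=\"" + fileName + "\"; filename*=UTF-8''" + encoded;
        }
    }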

Properly encoding the "filename*" part works great in S3, but not in Azure
due to some odd behaviors in how they sign the URI and how (and when) they
encode the URI parameters.  See OAK-8013 for details; ultimately this
appears to be a bug in the Azure SDK, but it is probably too early to state
this definitively.

I think the proper way to solve this problem is for me to try to resolve
this with the Azure SDK project.  I have concerns about how long that might
take however, and in the meantime we have an implementation that does not
work well for Azure.

I can see two possible workarounds, neither of which I like:
- Encode the filename in the "filename*" portion of the header twice, let
the Azure SDK mess around with it when signing the URI, then fix it back to
what it should be before sending the URI back to the client.  It is an ugly
hack, but it appears to work in testing.
- Don't include the "filename*" portion of the header at all, which would
mean that filenames with special characters wouldn't be supported

Either fix would probably have to be done to trunk and backported to 1.10,
and then fixed again in both branches (plus any newer releases) once there
was a fix in Azure - if that happens.


Looking for feedback on this.  WDYT?


-MR


Config to disable text extraction (OAK-7996)

2019-01-18 Thread Matt Ryan
Hi,

I've created OAK-7996 [0] to discuss allowing us to disable automatic text
extraction by configuration instead of using a tika.config in an index
definition to do it.

This was originally proposed as a possible Oak change last November, but in
discussion we agreed not to attempt this change at that time due to the
close proximity of the 1.10 release.  Now that 1.10 is out, I wonder if we
could consider this for a future release.

One use case for disabling text extraction would be if a user is performing
the text extraction for a new binary on their own, outside of Oak.

WDYT?


[0] - https://issues.apache.org/jira/browse/OAK-7996


-MR


Re: [VOTE] Release Apache Jackrabbit Oak 1.8.11

2019-01-14 Thread Matt Ryan
On Mon, Jan 14, 2019 at 9:19 AM Julian Reschke  wrote:

> Please vote on releasing this package as Apache Jackrabbit Oak 1.8.11.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Jackrabbit PMC votes are cast.
>
>  [X] +1 Release this package as Apache Jackrabbit Oak 1.8.11

 [ ] -1 Do not release this package because...
>
>
Where:

[INFO] Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe;
2018-06-17T12:33:14-06:00)
[INFO] OS name: "mac os x", version: "10.14.2", arch: "x86_64", family:
"mac"
[INFO] Java version: 1.8.0_77, vendor: Oracle Corporation, runtime:
/Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home/jre
[INFO] MAVEN_OPTS:
[INFO]

[INFO] ALL CHECKS OK

-MR


Re: [VOTE] Release Apache Jackrabbit Oak 1.6.16 (take 2)

2019-01-14 Thread Matt Ryan
On Fri, Jan 11, 2019 at 3:51 PM Davide Giannella  wrote:

> Please vote on releasing this package as Apache Jackrabbit Oak 1.6.16.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Jackrabbit PMC votes are cast.
>
> [X] +1 Release this package as Apache Jackrabbit Oak 1.6.16
> [ ] -1 Do not release this package because...
>

Where:

[INFO] Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe;
2018-06-17T12:33:14-06:00)
[INFO] OS name: "mac os x", version: "10.14.2", arch: "x86_64", family:
"mac"
[INFO] Java version: 1.8.0_77, vendor: Oracle Corporation, runtime:
/Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home/jre
[INFO] MAVEN_OPTS:
[INFO]

[INFO] ALL CHECKS OK


-MR


Re: [VOTE] Release Apache Jackrabbit Oak 1.10.0 (take 2)

2019-01-14 Thread Matt Ryan
On Fri, Jan 11, 2019 at 6:40 PM Davide Giannella  wrote:

> Please vote on releasing this package as Apache Jackrabbit Oak 1.10.0.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Jackrabbit PMC votes are cast.
>
> [X] +1 Release this package as Apache Jackrabbit Oak 1.10.0
> [ ] -1 Do not release this package because...
>
>
Where:

[INFO]

[INFO] Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe;
2018-06-17T12:33:14-06:00)
[INFO] OS name: "mac os x", version: "10.14.2", arch: "x86_64", family:
"mac"
[INFO] Java version: 1.8.0_77, vendor: Oracle Corporation, runtime:
/Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home/jre
[INFO] MAVEN_OPTS:
[INFO]

[INFO] ALL CHECKS OK


Re: [VOTE] Release Apache Jackrabbit Oak 1.10.0

2019-01-11 Thread Matt Ryan
On Thu, Jan 10, 2019 at 8:55 AM Davide Giannella  wrote:

>
> Please vote on releasing this package as Apache Jackrabbit Oak 1.10.0.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Jackrabbit PMC votes are cast.
>
> [X] +1 Release this package as Apache Jackrabbit Oak 1.10.0
> [ ] -1 Do not release this package because...
>
>
Where:

[INFO] Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe;
2018-06-17T12:33:14-06:00)
[INFO] OS name: "mac os x", version: "10.14.2", arch: "x86_64", family:
"mac"
[INFO] Java version: 1.8.0_77, vendor: Oracle Corporation, runtime:
/Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home/jre
[INFO] MAVEN_OPTS:
[INFO]

[INFO] ALL CHECKS OK


-MR


Re: Oak 1.10.0 release plan

2019-01-09 Thread Matt Ryan
The updated release notes show this feature now, thanks Davide.

-MR

On Wed, Jan 9, 2019 at 10:32 AM Matt Ryan  wrote:

> Hi Davide,
>
> On Wed, Jan 9, 2019 at 10:14 AM Davide Giannella 
> wrote:
>
>> I've produced the release notes and will probably produce the official
>> cut tomorrow morning GMT.
>>
>>
>> https://svn.apache.org/repos/asf/jackrabbit/oak/branches/1.10/RELEASE-NOTES.txt
>>
>> have a look and either commit or suggest any required change.
>>
>>
>>
> Sorry, but I don't see the new direct binary access feature in the release
> notes.  It's OAK-7569.
>
>
> -MR
>


Re: Oak 1.10.0 release plan

2019-01-09 Thread Matt Ryan
Hi Davide,

On Wed, Jan 9, 2019 at 10:14 AM Davide Giannella  wrote:

> I've produced the release notes and will probably produce the official
> cut tomorrow morning GMT.
>
>
> https://svn.apache.org/repos/asf/jackrabbit/oak/branches/1.10/RELEASE-NOTES.txt
>
> have a look and either commit or suggest any required change.
>
>
>
Sorry, but I don't see the new direct binary access feature in the release
notes.  It's OAK-7569.


-MR


Re: Oak 1.10.0 release plan

2019-01-04 Thread Matt Ryan
Hi Davide,

It would be nice to include a fix for the documentation issue brought up
on-list by Alex Klimetschek a couple of weeks ago.  It probably shouldn't
block the release, but I'll see if I can get a fix in for that today.


-MR

On Fri, Jan 4, 2019 at 4:02 AM Davide Giannella  wrote:

> Hello team,
>
> I'm planning to branch 1.10 and cut 1.10.0 on Monday 7th Jan.
>
> If there are any objections please let me know. Otherwise I will
> re-schedule any non-resolved issue for the next iteration.
>
> Thanks
> Davide
>
>
>


Re: [VOTE] Release Apache Jackrabbit Oak 1.8.10

2018-12-17 Thread Matt Ryan
On Mon, Dec 17, 2018 at 7:37 AM Davide Giannella  wrote:

> Please vote on releasing this package as Apache Jackrabbit Oak 1.8.10.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Jackrabbit PMC votes are cast.
>
> [X] +1 Release this package as Apache Jackrabbit Oak 1.8.10
>
>
Where:

[INFO]

[INFO] Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe;
2018-06-17T12:33:14-06:00)
[INFO] OS name: "mac os x", version: "10.14.2", arch: "x86_64", family:
"mac"
[INFO] Java version: 1.8.0_77, vendor: Oracle Corporation, runtime:
/Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home/jre
[INFO]

[INFO] ALL CHECKS OK


-MR


Re: How does oak cluster work

2018-12-14 Thread Matt Ryan
Hi,

I believe your concern is:  Content could be uploaded to the cluster via
one Oak instance, and your job to process the content runs in a different
Oak instance, and that there is a possibility that the job to process the
content reads from a MongoDB node that has stale data, so the content is
not available yet.

If I've understood your concern correctly, you are correct that this is
something you have to worry about, that there is a possibility that when
the job runs it gets stale data because where it reads from has not been
updated yet.  However, that's not something being caused by Oak; this would
be something you'd have to deal with whether Oak was there or not, no
matter what type of backing database cluster was being used.

Maybe I'm still missing something in your question.  How are you planning
to trigger your job?



On Fri, Dec 14, 2018 at 1:01 PM ems eril  wrote:

> Hi Matt ,
>
>I was looking for more details on the inner workings . I came across
> this https://markmail.org/message/jbkrsmz3krllqghr where it mentioned that
> changes in the cluster would eventually appear across other nodes and this
> is not a mongo specific issue but something oak has introduced . I can set
> the write concern to majority in mongo but if oak has its own eventual
> consistency model this can cause stale reads from other nodes which would
> be a problem with the distributed job Im trying to create.
>
> Thanks
>
> On Fri, Dec 14, 2018 at 8:02 AM Matt Ryan  wrote:
>
> > Hi Emily,
> >
> > Content is stored in Oak in two different configurable storage services.
> > This is a bit of an oversimplification, but basically the structure of
> > content repository - the content tree, nodes, properties, etc. - is
> stored
> > in a Node Store [0] and the binary content is stored in a Blob Store [1]
> > (you'll also sometimes see the term "data store").  Oak manages all of
> this
> > transparently to external clients.
> >
> > Oak clustering is therefore achieved by configuring Oak instances to use
> > clusterable storage services underneath [2].  For the node store, an
> > implementation of a DocumentNodeStore [3] is needed; one such
> > implementation uses MongoDB [4].  For the blob store, an implementation
> of
> > a SharedDataStore is needed.  For example, both the SharedS3DataStore and
> > AzureDataStore implementations can be used as a data store for an Oak
> > cluster.
> >
> > So, assume you were using MongoDB and S3.  Setting up an Oak cluster then
> > merely means that you have more than one Oak instance, each of which is
> > configured to use the MongoDB cluster as the node store, and S3 as the
> data
> > store.
> >
> >
> > [0] -
> >
> >
> https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/nodestore/overview.md
> > [1] -
> >
> >
> https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/plugins/blobstore.md
> > [2] -
> >
> >
> https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/clustering.md
> > [3] -
> >
> >
> https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/nodestore/documentmk.md
> > [4] -
> >
> >
> https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/nodestore/document/mongo-document-store.md
> >
> >
> > Does that help?
> >
> >
> > -MR
> >
> > On Thu, Dec 13, 2018 at 5:52 PM ems eril  wrote:
> >
> > > Hi Team ,
> > >
> > >Im really interested in understanding how oak cluster works and how
> do
> > > cluster nodes sync up . These are some of the questions I have
> > >
> > > 1) How does the nodes sync
> > > 2) What is the mongo role
> > > 3) How does indexes in cluster work and sync up
> > > 4) What is the distributed model master/slave multi master
> > > 5) What is co-ordinated by the master node
> > > 6) How is master node elected
> > >
> > >One use case I have is to be able to leverage a oak cluster to be
> able
> > > to upload images/videos and have a consumer on one of the nodes process
> > it
> > > in a distributed way . I like to try my best to avoid unnecessary read
> > > checks if possible .
> > >
> > > Thanks
> > >
> > > Emily
> > >
> >
>


Re: How does oak cluster work

2018-12-14 Thread Matt Ryan
Hi Emily,

Content is stored in Oak in two different configurable storage services.
This is a bit of an oversimplification, but basically the structure of
the content repository - the content tree, nodes, properties, etc. - is stored
in a Node Store [0] and the binary content is stored in a Blob Store [1]
(you'll also sometimes see the term "data store").  Oak manages all of this
transparently to external clients.

Oak clustering is therefore achieved by configuring Oak instances to use
clusterable storage services underneath [2].  For the node store, an
implementation of a DocumentNodeStore [3] is needed; one such
implementation uses MongoDB [4].  For the blob store, an implementation of
a SharedDataStore is needed.  For example, both the SharedS3DataStore and
AzureDataStore implementations can be used as a data store for an Oak
cluster.

So, assume you were using MongoDB and S3.  Setting up an Oak cluster then
merely means that you have more than one Oak instance, each of which is
configured to use the MongoDB cluster as the node store, and S3 as the data
store.
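
For a concrete picture, wiring up one such cluster member looks roughly like
this (a sketch based on the Oak documentation; the MongoDB URI is a
placeholder, and every cluster member would run the same setup against the
same MongoDB replica set and shared data store):

    import javax.jcr.Repository;
    import org.apache.jackrabbit.oak.Oak;
    import org.apache.jackrabbit.oak.jcr.Jcr;
    import org.apache.jackrabbit.oak.plugins.document.DocumentMK;
    import org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore;

    class ClusterNodeSketch {
        public static void main(String[] args) {
            // Each Oak instance points at the same MongoDB cluster for the
            // node store (and a shared data store such as S3 for binaries).
            DocumentNodeStore ns = new DocumentMK.Builder()
                    .setMongoDB("mongodb://mongo-host:27017", "oak", 16)
                    .getNodeStore();
            Repository repository = new Jcr(new Oak(ns)).createRepository();
            // ... use the repository ...
            ns.dispose();
        }
    }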


[0] -
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/nodestore/overview.md
[1] -
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/plugins/blobstore.md
[2] -
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/clustering.md
[3] -
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/nodestore/documentmk.md
[4] -
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/nodestore/document/mongo-document-store.md


Does that help?


-MR

On Thu, Dec 13, 2018 at 5:52 PM ems eril  wrote:

> Hi Team ,
>
>Im really interested in understanding how oak cluster works and how do
> cluster nodes sync up . These are some of the questions I have
>
> 1) How does the nodes sync
> 2) What is the mongo role
> 3) How does indexes in cluster work and sync up
> 4) What is the distributed model master/slave multi master
> 5) What is co-ordinated by the master node
> 6) How is master node elected
>
>One use case I have is to be able to leverage a oak cluster to be able
> to upload images/videos and have a consumer on one of the nodes process it
> in a distributed way . I like to try my best to avoid unnecessary read
> checks if possible .
>
> Thanks
>
> Emily
>


Re: [VOTE] Release Apache Jackrabbit Oak 1.9.13

2018-12-12 Thread Matt Ryan
On Mon, Dec 10, 2018 at 9:59 AM Davide Giannella  wrote:

> Please vote on releasing this package as Apache Jackrabbit Oak 1.9.13.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Jackrabbit PMC votes are cast.
>
> [X] +1 Release this package as Apache Jackrabbit Oak 1.9.13
>
>

Where:

[INFO]

[INFO] Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe;
2018-06-17T12:33:14-06:00)
[INFO] OS name: "mac os x", version: "10.14.1", arch: "x86_64", family:
"mac"
[INFO] Java version: 1.8.0_77, vendor: Oracle Corporation, runtime:
/Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home/jre
[INFO]

[INFO] ALL CHECKS OK


Re: Why don't we move to Git?

2018-12-12 Thread Matt Ryan
I'm +1; as Davide said, after Christmas :)

Working out details seems like a good topic for an upcoming Oakathon,
assuming the general feeling is in favor of the change.

-MR

On Mon, Dec 10, 2018 at 2:58 AM Francesco Mari 
wrote:

> Given the recent announcement about gitbox.apache.org, the seamless
> integration with GitHub, and the fact that many of us already work with Git
> in our daily workflow, what about moving our repositories to Git?
>


Re: [VOTE] Release Apache Jackrabbit 2.19.0

2018-12-07 Thread Matt Ryan
On Fri, Dec 7, 2018 at 3:32 AM Julian Reschke  wrote:

> Please vote on releasing this package as Apache Jackrabbit 2.19.0.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Jackrabbit PMC votes are cast.
>
>  [X] +1 Release this package as Apache Jackrabbit 2.19.0
>  [ ] -1 Do not release this package because...
>

Where:

[INFO]

[INFO] Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe;
2018-06-17T12:33:14-06:00)
[INFO] OS name: "mac os x", version: "10.14.1", arch: "x86_64", family:
"mac"
[INFO] Java version: 1.8.0_77, vendor: Oracle Corporation, runtime:
/Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home/jre
[INFO]

[INFO] ALL CHECKS OK
-MR


Re: How to find out if similarity search is active - without doing a search

2018-11-20 Thread Matt Ryan
Hi Bertrand,

On Tue, Nov 20, 2018 at 8:00 AM Bertrand Delacretaz 
wrote:

> Hi,
>
> I need to find out whether the Oak similarity search functionality is
> active. I talked to Tommaso and he recommended doing a search under
> /oak:index [1].
>
> That works fine [2] but I need to use a service user to do that, which
> is suboptimal.
>
> Would it be possible for Oak to provide this capability information in
> a different way that does not require a JCR Session? I suppose the
> functionality is available if a specific version of the oak-lucene
> bundle is installed, so the following options come to mind:
>
> a) Adding an OSGi Provide-Capability header to that bundle, that's a
> trivial change to that bundle's build
>
> b) Providing that information in the JCR Repository object's Descriptors
>

This approach was discussed during the most recent Oakathon a couple of
weeks ago; you can see the notes at [0].  IIUC we had a successful outcome
from the prototypes which are linked in that same section.  I believe the
next step would be to formalize the proposal for supporting this as an Oak
feature.
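
For illustration, option (b) could look something like this from a client's
point of view - note that the descriptor key here is purely hypothetical,
nothing like it is defined in Oak today:

import javax.jcr.Repository;

public class SimilaritySearchCheck {

    // Hypothetical descriptor key; the real key (if we add one) would be
    // defined by Oak as part of the formal proposal.
    private static final String DESCRIPTOR_KEY = "oak.query.similaritySearchSupported";

    // Checks the repository descriptors for the capability, without needing
    // a JCR Session or a service user.
    public static boolean isSimilaritySearchAvailable(Repository repository) {
        return Boolean.parseBoolean(repository.getDescriptor(DESCRIPTOR_KEY));
    }
}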

Does this help?

[0] -
https://wiki.apache.org/jackrabbit/Oakathon%20November%202018#Oak_Capabilities


-MR


Re: [VOTE] Release Apache Jackrabbit 2.12.10

2018-11-01 Thread Matt Ryan
On Wed, Oct 31, 2018 at 3:06 AM Julian Reschke  wrote:

>
> Please vote on releasing this package as Apache Jackrabbit 2.12.10.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Jackrabbit PMC votes are cast.
>
>  [X] +1 Release this package as Apache Jackrabbit 2.12.10
>

Where:

[INFO] Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe;
2018-06-17T12:33:14-06:00)
[INFO] OS name: "mac os x", version: "10.14", arch: "x86_64", family: "mac"
[INFO] Java version: 1.8.0_77, vendor: Oracle Corporation, runtime:
/Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home/jre
[INFO]

[INFO] ALL CHECKS OK


-MR


Re: [VOTE] Release Apache Jackrabbit Oak 1.9.10

2018-11-01 Thread Matt Ryan
On Thu, Nov 1, 2018 at 6:20 AM Davide Giannella  wrote:

>
> Please vote on releasing this package as Apache Jackrabbit Oak 1.9.10.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Jackrabbit PMC votes are cast.
>
> [X] +1 Release this package as Apache Jackrabbit Oak 1.9.10
>
>

Where:

[INFO] Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe;
2018-06-17T12:33:14-06:00)
[INFO] OS name: "mac os x", version: "10.14", arch: "x86_64", family: "mac"
[INFO] Java version: 1.8.0_77, vendor: Oracle Corporation, runtime:
/Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home/jre
[INFO]

[INFO] ALL CHECKS OK


-MR


Re: [VOTE] Release Apache Jackrabbit Oak 1.9.9

2018-10-10 Thread Matt Ryan
On Tue, Oct 9, 2018 at 7:01 AM Julian Reschke  wrote:

> Please vote on releasing this package as Apache Jackrabbit Oak 1.9.9.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Jackrabbit PMC votes are cast.
>
>  [X] +1 Release this package as Apache Jackrabbit Oak 1.9.9
>
>

Where:

[INFO] Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe;
2018-06-17T12:33:14-06:00)
[INFO] OS name: "mac os x", version: "10.13.6", arch: "x86_64", family:
"mac"
[INFO] Java version: 1.8.0_77, vendor: Oracle Corporation, runtime:
/Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home/jre
[INFO]

[INFO] ALL CHECKS OK


Re: Release Apache Jackrabbit 2.17.6

2018-10-01 Thread Matt Ryan
On September 30, 2018 at 10:00:51 PM, Julian Reschke (resc...@apache.org) wrote:

Please vote on releasing this package as Apache Jackrabbit 2.17.6. 
The vote is open for the next 72 hours and passes if a majority of at 
least three +1 Jackrabbit PMC votes are cast. 

[X] +1 Release this package as Apache Jackrabbit 2.17.6 



where:

[INFO] Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 
2018-06-17T12:33:14-06:00)
[INFO] OS name: "mac os x", version: "10.13.6", arch: "x86_64", family: "mac"
[INFO] Java version: 1.8.0_77, vendor: Oracle Corporation, runtime: 
/Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home/jre
[INFO] 
[INFO] ALL CHECKS OK


-MR

Re: [VOTE] Release Apache Jackrabbit Oak 1.4.23

2018-09-26 Thread Matt Ryan
On September 26, 2018 at 7:44:48 AM, Davide Giannella (dav...@apache.org)
wrote:


Please vote on releasing this package as Apache Jackrabbit Oak 1.4.23.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 Jackrabbit PMC votes are cast.

[X] +1 Release this package as Apache Jackrabbit Oak 1.4.23



Where:

[INFO] Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe;
2018-06-17T12:33:14-06:00)
[INFO] OS name: "mac os x", version: "10.13.6", arch: "x86_64", family:
"mac"
[INFO] Java version: 1.8.0_77, vendor: Oracle Corporation, runtime:
/Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home/jre
[INFO]

[INFO] ALL CHECKS OK


-MR


Re: [VOTE] Release Apache Jackrabbit Oak 1.8.8

2018-09-25 Thread Matt Ryan
On September 25, 2018 at 4:33:11 AM, Davide Giannella (dav...@apache.org)
wrote:

Please vote on releasing this package as Apache Jackrabbit Oak 1.8.8.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 Jackrabbit PMC votes are cast.

[X] +1 Release this package as Apache Jackrabbit Oak 1.8.8



With:

[INFO]

[INFO] Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe;
2018-06-17T12:33:14-06:00)
[INFO] OS name: "mac os x", version: "10.13.6", arch: "x86_64", family:
"mac"
[INFO] Java version: 1.8.0_77, vendor: Oracle Corporation, runtime:
/Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home/jre
[INFO]

[INFO] ALL CHECKS OK


-MR


Backporting similarity search to Oak 1.8

2018-09-19 Thread Matt Ryan
Hi oak-dev,

I’ve just created OAK-7769 in which I would like us to consider backporting
the similarity search feature that Tommaso implemented in OAK-7575 to Oak
1.8.  There is a patch file included in the issue which applies cleanly to
1.8 and all unit tests pass.  It includes unit tests written for OAK-7575.
The change is almost all additive, meaning there is little existing
functionality in 1.8 that is being modified.

Tommaso has viewed the patch; IIUC he feels comfortable with it but I will
allow him to speak for himself and comment further one way or the other.


-MR


Re: [VOTE] Release Apache Jackrabbit Oak 1.6.14

2018-09-17 Thread Matt Ryan
On September 12, 2018 at 3:24:34 AM, Davide Giannella (dav...@apache.org)
wrote:


[X] +1 Release this package as Apache Jackrabbit Oak 1.6.14


Where:

Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe;
2018-06-17T12:33:14-06:00)
OS name: "mac os x", version: "10.13.6", arch: "x86_64", family: "mac"
Java version: 1.8.0_77, vendor: Oracle Corporation, runtime:
/Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home/jre


-MR


Re: New Jackrabbit committer: Matt Ryan

2018-09-11 Thread Matt Ryan
On September 10, 2018 at 12:46:27 AM, Michael Dürig (mdue...@apache.org)
wrote:

Hi,

Welcome to the team, Matt!

Michael


Thanks Michael and everyone else on the PMC - happy to be here!


-MR


Re: [VOTE] Release Apache Jackrabbit Oak 1.2.30

2018-09-11 Thread Matt Ryan
On September 11, 2018 at 7:40:45 AM, Davide Giannella (dav...@apache.org)
wrote:


[X] +1 Release this package as Apache Jackrabbit Oak 1.2.30
[ ] -1 Do not release this package because...

where:

Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe;
2018-06-17T12:33:14-06:00)

[INFO] OS name: "mac os x", version: "10.13.6", arch: "x86_64", family:
"mac"

[INFO] Java version: 1.8.0_77, vendor: Oracle Corporation, runtime:
/Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home/jre

[INFO]


[INFO] ALL CHECKS OK


-MR


Re: [VOTE] Release Apache Jackrabbit Oak 1.9.8

2018-08-28 Thread Matt Ryan
On August 28, 2018 at 7:01:30 AM, Davide Giannella (dav...@apache.org)
wrote:


[X] +1 Release this package as Apache Jackrabbit Oak 1.9.8


Where:

[INFO] Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe;
2018-06-17T12:33:14-06:00)

[INFO] OS name: "mac os x", version: "10.13.6", arch: "x86_64", family:
"mac"

[INFO] Java version: 1.8.0_77, vendor: Oracle Corporation, runtime:
/Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home/jre

[INFO]


[INFO] ALL CHECKS OK


(Non-Binding Vote)


-MR


pull request for OAK-7717

2018-08-27 Thread Matt Ryan
Hi,

Earlier today I created OAK-7717 for a request to change documentation on
the direct binary access feature.  I’ve also submitted a pull request,
https://github.com/apache/jackrabbit-oak/pull/98, with a proposed version
of the change.

Please review and let me know if you prefer the changed version to the
original, or if we should just keep the original, or if a different change
should be made to make the documentation clear.


Thanks!


-MR


Re: [VOTE] Release Apache Jackrabbit 2.16.3

2018-07-31 Thread Matt Ryan
On July 31, 2018 at 1:35:47 AM, Julian Reschke (resc...@apache.org) wrote:

[X] +1 Release this package as Apache Jackrabbit 2.16.3


(nonbinding)

where:
[INFO] Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe;
2018-06-17T12:33:14-06:00)
[INFO] OS name: "mac os x", version: "10.13.6", arch: "x86_64", family:
"mac"
[INFO] Java version: 1.8.0_77, vendor: Oracle Corporation, runtime:
/Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home/jre


-MR


Re: [VOTE] Release Apache Jackrabbit Oak 1.8.6

2018-07-31 Thread Matt Ryan
On July 31, 2018 at 9:47:26 AM, Manfred Baedke (manfred.bae...@gmail.com)
wrote:



Please vote on releasing this package as Apache Jackrabbit Oak 1.8.6.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 Jackrabbit PMC votes are cast.

[X] +1 Release this package as Apache Jackrabbit Oak 1.8.6

(nonbinding)

where:

[INFO] Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe;
2018-06-17T12:33:14-06:00)

[INFO] OS name: "mac os x", version: "10.13.6", arch: "x86_64", family:
"mac"

[INFO] Java version: 1.8.0_77, vendor: Oracle Corporation, runtime:
/Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home/jre


-MR


[DISCUSS] Enabling CI for Oak cloud-based features

2018-07-30 Thread Matt Ryan
Hi,

Oak now has a fair few cloud-based modules - meaning, modules that enable
Oak to make use of cloud service provider capabilities in order for the
feature to work - among them being oak-blob-cloud, oak-blob-cloud-azure,
and oak-segment-azure.

I’m not as familiar with oak-segment-azure, but I do know for
oak-blob-cloud and oak-blob-cloud-azure you need an environment set up to
run the tests including credentials for the corresponding cloud service
provider.  The consequence of this is that there is no regular CI testing
run on these modules, IIUC.

I wanted to kick off a discussion to see what everyone else thinks.  I
think coming up with some form of mock for the cloud services would be
nice - or, even better, reusing existing Apache-license-friendly mocks if
any exist - but maybe others have already gone further down this road or
have better ideas?
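
To make that concrete, the pattern these tests follow today is roughly the
following sketch (the system property and properties file name are made up
for illustration): if no credentials are configured, JUnit assumptions skip
the tests, which is why they never run as part of the regular CI builds.

import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Properties;

import org.junit.Assume;
import org.junit.BeforeClass;
import org.junit.Test;

public class CloudDataStoreIT {

    private static Properties credentials;

    @BeforeClass
    public static void loadCredentials() {
        // Hypothetical location of a properties file holding the cloud
        // service provider credentials for the test run.
        String path = System.getProperty("cloud.config", "cloud.properties");
        Properties props = new Properties();
        try (InputStream in = new FileInputStream(path)) {
            props.load(in);
            credentials = props;
        } catch (Exception e) {
            credentials = null;
        }
        // Skip every test in this class if no credentials are available.
        Assume.assumeNotNull(credentials);
    }

    @Test
    public void uploadAndReadBack() {
        // ... exercise the data store against the real cloud service ...
    }
}

A mock (or an Apache-friendly emulator) would let the body of such tests run
everywhere instead of being skipped.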


-MR


Re: [VOTE] Release Apache Jackrabbit 2.17.5

2018-07-26 Thread Matt Ryan
On July 25, 2018 at 10:50:57 AM, Julian Reschke (resc...@apache.org) wrote:



[X] +1 Release this package as Apache Jackrabbit 2.17.5

(NONBINDING)


where


[INFO] Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe;
2018-06-17T12:33:14-06:00)

[INFO] OS name: "mac os x", version: "10.13.6", arch: "x86_64", family:
"mac"

[INFO] Java version: 1.8.0_77, vendor: Oracle Corporation, runtime:
/Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home/jre


-MR


[jira] [Comment Edited] (JCR-4335) API for direct binary access

2018-07-24 Thread Matt Ryan (JIRA)


[ 
https://issues.apache.org/jira/browse/JCR-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554708#comment-16554708
 ] 

Matt Ryan edited comment on JCR-4335 at 7/24/18 7:34 PM:
-

Hmm.  Somehow my last patch file missed an entire Java file, 
{{JackrabbitValueFactory.java}} in {{jackrabbit-api}}.  My apologies for that.  
 [^JCR-4335-v3.patch] contains the missing file.  [~mreutegg] / [~reschke] will 
one of you please apply that file also to the committed changes?


was (Author: mattvryan):
Hmm.  Somehow my last patch file missed an entire Java file, 
`JackrabbitValueFactory.java` in `jackrabbit-api`.  My apologies for that.   
[^JCR-4335-v3.patch] contains the missing file.  [~mreutegg] / [~reschke] will 
one of you please apply that file also to the committed changes?

> API for direct binary access
> 
>
> Key: JCR-4335
> URL: https://issues.apache.org/jira/browse/JCR-4335
> Project: Jackrabbit Content Repository
>  Issue Type: New Feature
>  Components: jackrabbit-api
>Reporter: Marcel Reutegger
>Assignee: Marcel Reutegger
>Priority: Major
> Fix For: 2.18, 2.17.5
>
> Attachments: JCR-4335-v2.patch, JCR-4335-v3.patch, JCR-4335.patch, 
> JCR-4335.patch
>
>
> Jackrabbit Oak proposes to add a new direct binary access capability to the 
> repository. One part of the proposal is to expose this new capability in the 
> Jackrabbit API. For details see OAK-7569 and OAK-7589.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (JCR-4335) API for direct binary access

2018-07-24 Thread Matt Ryan (JIRA)


[ 
https://issues.apache.org/jira/browse/JCR-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554708#comment-16554708
 ] 

Matt Ryan commented on JCR-4335:


Hmm.  Somehow my last patch file missed an entire Java file, 
`JackrabbitValueFactory.java` in `jackrabbit-api`.  My apologies for that.   
[^JCR-4335-v3.patch] contains the missing file.  [~mreutegg] / [~reschke] will 
one of you please apply that file also to the committed changes?

> API for direct binary access
> 
>
> Key: JCR-4335
> URL: https://issues.apache.org/jira/browse/JCR-4335
> Project: Jackrabbit Content Repository
>  Issue Type: New Feature
>  Components: jackrabbit-api
>Reporter: Marcel Reutegger
>Assignee: Marcel Reutegger
>Priority: Major
> Fix For: 2.18, 2.17.5
>
> Attachments: JCR-4335-v2.patch, JCR-4335-v3.patch, JCR-4335.patch, 
> JCR-4335.patch
>
>
> Jackrabbit Oak proposes to add a new direct binary access capability to the 
> repository. One part of the proposal is to expose this new capability in the 
> Jackrabbit API. For details see OAK-7569 and OAK-7589.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (JCR-4335) API for direct binary access

2018-07-24 Thread Matt Ryan (JIRA)


 [ 
https://issues.apache.org/jira/browse/JCR-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated JCR-4335:
---
Attachment: JCR-4335-v3.patch

> API for direct binary access
> 
>
> Key: JCR-4335
> URL: https://issues.apache.org/jira/browse/JCR-4335
> Project: Jackrabbit Content Repository
>  Issue Type: New Feature
>  Components: jackrabbit-api
>Reporter: Marcel Reutegger
>Assignee: Marcel Reutegger
>Priority: Major
> Fix For: 2.18, 2.17.5
>
> Attachments: JCR-4335-v2.patch, JCR-4335-v3.patch, JCR-4335.patch, 
> JCR-4335.patch
>
>
> Jackrabbit Oak proposes to add a new direct binary access capability to the 
> repository. One part of the proposal is to expose this new capability in the 
> Jackrabbit API. For details see OAK-7569 and OAK-7589.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (JCR-4335) API for direct binary access

2018-07-24 Thread Matt Ryan (JIRA)


[ 
https://issues.apache.org/jira/browse/JCR-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554415#comment-16554415
 ] 

Matt Ryan commented on JCR-4335:


Updated [^JCR-4335-v2.patch] attached which matches the latest [pull 
request|https://github.com/apache/jackrabbit/pull/59].

> API for direct binary access
> 
>
> Key: JCR-4335
> URL: https://issues.apache.org/jira/browse/JCR-4335
> Project: Jackrabbit Content Repository
>  Issue Type: New Feature
>  Components: jackrabbit-api
>Reporter: Marcel Reutegger
>Assignee: Marcel Reutegger
>Priority: Major
> Fix For: 2.18, 2.17.5
>
> Attachments: JCR-4335-v2.patch, JCR-4335.patch, JCR-4335.patch
>
>
> Jackrabbit Oak proposes to add a new direct binary access capability to the 
> repository. One part of the proposal is to expose this new capability in the 
> Jackrabbit API. For details see OAK-7569 and OAK-7589.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (JCR-4335) API for direct binary access

2018-07-24 Thread Matt Ryan (JIRA)


 [ 
https://issues.apache.org/jira/browse/JCR-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated JCR-4335:
---
Attachment: JCR-4335-v2.patch

> API for direct binary access
> 
>
> Key: JCR-4335
> URL: https://issues.apache.org/jira/browse/JCR-4335
> Project: Jackrabbit Content Repository
>  Issue Type: New Feature
>  Components: jackrabbit-api
>Reporter: Marcel Reutegger
>Assignee: Marcel Reutegger
>Priority: Major
> Fix For: 2.18, 2.17.5
>
> Attachments: JCR-4335-v2.patch, JCR-4335.patch, JCR-4335.patch
>
>
> Jackrabbit Oak proposes to add a new direct binary access capability to the 
> repository. One part of the proposal is to expose this new capability in the 
> Jackrabbit API. For details see OAK-7569 and OAK-7589.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (JCR-4335) API for direct binary access

2018-07-24 Thread Matt Ryan (JIRA)


[ 
https://issues.apache.org/jira/browse/JCR-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554411#comment-16554411
 ] 

Matt Ryan commented on JCR-4335:


I've updated the [pull request|https://github.com/apache/jackrabbit/pull/59] 
based on feedback from [~mreutegg] and [~reschke].

> API for direct binary access
> 
>
> Key: JCR-4335
> URL: https://issues.apache.org/jira/browse/JCR-4335
> Project: Jackrabbit Content Repository
>  Issue Type: New Feature
>  Components: jackrabbit-api
>Reporter: Marcel Reutegger
>Assignee: Marcel Reutegger
>Priority: Major
> Fix For: 2.18, 2.17.5
>
> Attachments: JCR-4335.patch, JCR-4335.patch
>
>
> Jackrabbit Oak proposes to add a new direct binary access capability to the 
> repository. One part of the proposal is to expose this new capability in the 
> Jackrabbit API. For details see OAK-7569 and OAK-7589.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (JCR-4335) API for direct binary access

2018-07-24 Thread Matt Ryan (JIRA)


[ 
https://issues.apache.org/jira/browse/JCR-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554408#comment-16554408
 ] 

Matt Ryan commented on JCR-4335:


{quote}Q: how much of these upload URIs actually vary? That is, do they have a 
long common prefix? (The part about the URIs being signed might mean "no"...)
{quote}
I'm afraid the answer is "no", as you suspected.  In my experience, the scheme, 
hostname, and path would all remain the same, but from that point on (query 
params) they would begin to vary.  The bulk of the length of these long URIs is 
in the query string - probably about 80% of the total length or more.  The 
parameters can differ (for example, some require different "upload" or "part" 
IDs), and the signature is then different as well.

It might be possible to gain some efficiency at the expense of increased 
complexity - and higher possibility of error - for clients making use of the 
URIs.
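
To make the point about length concrete, a signed multi-part upload URI is
roughly of the following form (wrapped here for readability; the bucket, key,
and values are invented and the signature elided).  Only the path prefix is
shared between parts, while the part number, upload id, and signature differ
for every URI:

https://example-bucket.s3.amazonaws.com/datastore/0a1b2c3d4e5f
    ?uploadId=<opaque-upload-id>
    &partNumber=17
    &X-Amz-Algorithm=AWS4-HMAC-SHA256
    &X-Amz-Credential=<access-key-id>/20180724/us-east-1/s3/aws4_request
    &X-Amz-Date=20180724T000000Z
    &X-Amz-Expires=3600
    &X-Amz-SignedHeaders=host
    &X-Amz-Signature=<64-character-hex-signature>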

> API for direct binary access
> 
>
> Key: JCR-4335
> URL: https://issues.apache.org/jira/browse/JCR-4335
> Project: Jackrabbit Content Repository
>  Issue Type: New Feature
>  Components: jackrabbit-api
>Reporter: Marcel Reutegger
>Assignee: Marcel Reutegger
>Priority: Major
> Fix For: 2.18, 2.17.5
>
> Attachments: JCR-4335.patch, JCR-4335.patch
>
>
> Jackrabbit Oak proposes to add a new direct binary access capability to the 
> repository. One part of the proposal is to expose this new capability in the 
> Jackrabbit API. For details see OAK-7569 and OAK-7589.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (JCR-4335) API for direct binary access

2018-07-20 Thread Matt Ryan (JIRA)


[ 
https://issues.apache.org/jira/browse/JCR-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16550887#comment-16550887
 ] 

Matt Ryan edited comment on JCR-4335 at 7/20/18 11:27 PM:
--

{quote} - do we really need to parametrize sizes and number of parts? I 
understand that the implementation doing the upload needs this, but why does it 
appear in the API?{quote}
I think they are necessary.  There are a few reasons for stating the number of 
parts, but they mostly center on the potential impact of a very large list of 
URIs, for example to a resulting JSON document.

Assume a JavaScript browser client is interacting with a web endpoint that, in 
turn, is invoking this API.  The JavaScript client wants to upload a binary 
directly, so it is requesting instructions on how to do that from the web 
endpoint.  The web endpoint would then call this API and obtain a 
{{BinaryUpload}} object that it then converts into a JSON document to return to 
the JavaScript client.  The JavaScript client or the web endpoint may have 
limitations on the size of the JSON document that it can support.

IIRC, S3 allows up to 10,000 upload parts in a multi-part upload.  Azure is 
even higher at 50,000.  In my testing, I've seen signed URIs over 500 
characters long.  If a client were unable to specify the number of parts, a 
list of 10,000 upload URIs of over 500 characters each would exceed 5MB in a 
JSON document just for the list of URIs itself.  This may or may not be a 
problem; only the client would know whether accepting a document that large is 
problematic.

The expected size of the upload is also needed for similar reasons, based on 
what the service provider capabilities are.  Some service providers require 
multi-part uploads for binaries above a certain size.  Some do not allow 
multi-part uploads of binaries smaller than a certain size.  Both Azure and S3 
have limits as to the maximum size of a binary that can be uploaded.

If the implementation knows the expected upload size and the number of parts 
the client can accept, then it can determine whether it is possible to perform 
this upload directly or whether the client will need to try to upload it 
through the repository as has been done traditionally.  For example if the 
client wants to upload a 300MB binary but does not support multi-part 
uploading, if the service provider requires multi-part uploading above 250MB 
then this upload request will fail so the client cannot upload this binary 
directly to storage.  However, the Oak backend may be able to handle this 
upload without problems so it could be uploaded the traditional way.


was (Author: mattvryan):
{quote} - do we really need to parametrize sizes and number of parts? I 
understand that the implementation doing the upload needs this, but why does it 
appear in the API?{quote}
I think they are necessary.  There are a few reasons for stating the number of 
parts, but they mostly center on the potential impact of a very large list of 
URIs, for example to a resulting JSON document.

Assume a JavaScript browser client is interacting with a web endpoint that, in 
turn, is invoking this API.  The JavaScript client wants to upload a binary 
directly, so it is requesting instructions on how to do that from the web 
endpoint.  The web endpoint would then call this API and obtain a 
{{BinaryUpload}} object that it then converts into a JSON document to return to 
the JavaScript client.  The JavaScript client or the web endpoint may have 
limitations on the size of the JSON document that it can support.

IIRC, Azure allows up to 10,000 upload parts in a multi-part upload.  S3 is 
even higher at 50,000.  In my testing, I've seen signed URIs over 500 
characters long.  If a client were unable to specify the number of parts, a 
list of 10,000 upload URIs of over 500 characters each would exceed 5MB in a 
JSON document just for the list of URIs itself.  This may or may not be a 
problem; only the client would know whether accepting a document that large is 
problematic.

The expected size of the upload is also needed for similar reasons, based on 
what the service provider capabilities are.  Some service providers require 
multi-part uploads for binaries above a certain size.  Some do not allow 
multi-part uploads of binaries smaller than a certain size.  Both Azure and S3 
have limits as to the maximum size of a binary that can be uploaded.

If the implementation knows the expected upload size and the number of parts 
the client can accept, then it can determine whether it is possible to perform 
this upload directly or whether the client will need to try to upload it 
through the repository as has been done traditionally.  For example if the 
client wants to upload a 300MB binary but does not support multi-part 
uploading, if the service provider requires multi-part uploading above 250MB 
then this upload request will fail so the client cannot upload this binary 
directly to storage.  However, the Oak backend may be able to handle this 
upload without problems so it could be uploaded the traditional way.

[jira] [Comment Edited] (JCR-4335) API for direct binary access

2018-07-20 Thread Matt Ryan (JIRA)


[ 
https://issues.apache.org/jira/browse/JCR-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16550887#comment-16550887
 ] 

Matt Ryan edited comment on JCR-4335 at 7/20/18 5:40 PM:
-

{quote} - do we really need to parametrize sizes and number of parts? I 
understand that the implementation doing the upload needs this, but why does it 
appear in the API?{quote}
I think they are necessary.  There are a few reasons for stating the number of 
parts, but they mostly center on the potential impact of a very large list of 
URIs, for example to a resulting JSON document.

Assume a JavaScript browser client is interacting with a web endpoint that, in 
turn, is invoking this API.  The JavaScript client wants to upload a binary 
directly, so it is requesting instructions on how to do that from the web 
endpoint.  The web endpoint would then call this API and obtain a 
{{BinaryUpload}} object that it then converts into a JSON document to return to 
the JavaScript client.  The JavaScript client or the web endpoint may have 
limitations on the size of the JSON document that it can support.

IIRC, Azure allows up to 10,000 upload parts in a multi-part upload.  S3 is 
even higher at 50,000.  In my testing, I've seen signed URIs over 500 
characters long.  If a client were unable to specify the number of parts, a 
list of 10,000 upload URIs of over 500 characters each would exceed 5MB in a 
JSON document just for the list of URIs itself.  This may or may not be a 
problem; only the client would know whether accepting a document that large is 
problematic.

The expected size of the upload is also needed for similar reasons, based on 
what the service provider capabilities are.  Some service providers require 
multi-part uploads for binaries above a certain size.  Some do not allow 
multi-part uploads of binaries smaller than a certain size.  Both Azure and S3 
have limits as to the maximum size of a binary that can be uploaded.

If the implementation knows the expected upload size and the number of parts 
the client can accept, then it can determine whether it is possible to perform 
this upload directly or whether the client will need to try to upload it 
through the repository as has been done traditionally.  For example if the 
client wants to upload a 300MB binary but does not support multi-part 
uploading, if the service provider requires multi-part uploading above 250MB 
then this upload request will fail so the client cannot upload this binary 
directly to storage.  However, the Oak backend may be able to handle this 
upload without problems so it could be uploaded the traditional way.


was (Author: mattvryan):
{quote} - do we really need to parametrize sizes and number of parts? I 
understand that the implementation doing the upload needs this, but why does it 
appear in the API?{quote}
I think they are necessary.  There are a few reasons for stating the number of 
parts, but they mostly center on the impact to a resulting JSON document, for 
example.

Assume a JavaScript browser client is interacting with a web endpoint that, in 
turn, is invoking this API.  The JavaScript client wants to upload a binary 
directly, so it is requesting instructions on how to do that from the web 
endpoint.  The web endpoint would then call this API and obtain a 
{{BinaryUpload}} object that it then converts into a JSON document to return to 
the JavaScript client.  The JavaScript client or the web endpoint may have 
limitations on the size of the JSON document that it can support.

IIRC, Azure allows up to 10,000 upload parts in a multi-part upload.  S3 is 
even higher at 50,000.  In my testing, I've seen signed URIs over 500 
characters long.  If a client were unable to specify the number of parts, a 
list of 10,000 upload URIs of over 500 characters each would exceed 5MB in a 
JSON document just for the list of URIs itself.  This may or may not be a 
problem; only the client would know whether accepting a document that large is 
problematic.

The expected size of the upload is also needed for similar reasons, based on 
what the service provider capabilities are.  Some service providers require 
multi-part uploads for binaries above a certain size.  Some do not allow 
multi-part uploads of binaries smaller than a certain size.  Both Azure and S3 
have limits as to the maximum size of a binary that can be uploaded.

If the implementation knows the expected upload size and the number of parts 
the client can accept, then it can determine whether it is possible to perform 
this upload directly or whether the client will need to try to upload it 
through the repository as has been done traditionally.  For example if the 
client wants to upload a 300MB binary but does not support multi-part 
uploading, if the service provider requires multi-part uploading above 250MB 
then this upload request will fail so the client cannot upload this binary 
directly to storage.  However, the Oak backend may be able to handle this 
upload without problems so it could be uploaded the traditional way.

[jira] [Commented] (JCR-4335) API for direct binary access

2018-07-20 Thread Matt Ryan (JIRA)


[ 
https://issues.apache.org/jira/browse/JCR-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16550887#comment-16550887
 ] 

Matt Ryan commented on JCR-4335:


{quote} - do we really need to parametrize sizes and number of parts? I 
understand that the implementation doing the upload needs this, but why does it 
appear in the API?{quote}
I think they are necessary.  There are a few reasons for stating the number of 
parts, but they mostly center on the impact to a resulting JSON document, for 
example.

Assume a JavaScript browser client is interacting with a web endpoint that, in 
turn, is invoking this API.  The JavaScript client wants to upload a binary 
directly, so it is requesting instructions on how to do that from the web 
endpoint.  The web endpoint would then call this API and obtain a 
{{BinaryUpload}} object that it then converts into a JSON document to return to 
the JavaScript client.  The JavaScript client or the web endpoint may have 
limitations on the size of the JSON document that it can support.

IIRC, Azure allows up to 10,000 upload parts in a multi-part upload.  S3 is 
even higher at 50,000.  In my testing, I've seen signed URIs over 500 
characters long.  If a client were unable to specify the number of parts, a 
list of 10,000 upload URIs of over 500 characters each would exceed 5MB in a 
JSON document just for the list of URIs itself.  This may or may not be a 
problem; only the client would know whether accepting a document that large is 
problematic.

The expected size of the upload is also needed for similar reasons, based on 
what the service provider capabilities are.  Some service providers require 
multi-part uploads for binaries above a certain size.  Some do not allow 
multi-part uploads of binaries smaller than a certain size.  Both Azure and S3 
have limits as to the maximum size of a binary that can be uploaded.

If the implementation knows the expected upload size and the number of parts 
the client can accept, then it can determine whether it is possible to perform 
this upload directly or whether the client will need to try to upload it 
through the repository as has been done traditionally.  For example if the 
client wants to upload a 300MB binary but does not support multi-part 
uploading, if the service provider requires multi-part uploading above 250MB 
then this upload request will fail so the client cannot upload this binary 
directly to storage.  However, the Oak backend may be able to handle this 
upload without problems so it could be uploaded the traditional way.
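
As a rough sketch of how a web endpoint might use these two parameters (the
class and method names follow the current jackrabbit-api proposal and should
be treated as illustrative, not final):

import javax.jcr.Session;
import javax.jcr.ValueFactory;

import org.apache.jackrabbit.api.JackrabbitValueFactory;
import org.apache.jackrabbit.api.binary.BinaryUpload;

public class UploadInstructions {

    // Hypothetical limit chosen by the endpoint, e.g. to keep the resulting
    // JSON document returned to the JavaScript client at a manageable size.
    private static final int MAX_UPLOAD_URIS = 50;

    public static BinaryUpload requestUpload(Session session, long expectedSize)
            throws Exception {
        ValueFactory vf = session.getValueFactory();
        if (!(vf instanceof JackrabbitValueFactory)) {
            return null;  // direct binary upload not supported at all
        }
        // May return null if the upload cannot be satisfied within these
        // constraints (e.g. too large for the allowed number of parts), in
        // which case the endpoint falls back to a traditional upload through
        // the repository.
        return ((JackrabbitValueFactory) vf)
                .initiateBinaryUpload(expectedSize, MAX_UPLOAD_URIS);
    }
}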

> API for direct binary access
> 
>
> Key: JCR-4335
> URL: https://issues.apache.org/jira/browse/JCR-4335
> Project: Jackrabbit Content Repository
>  Issue Type: New Feature
>  Components: jackrabbit-api
>Reporter: Marcel Reutegger
>Assignee: Marcel Reutegger
>Priority: Major
> Attachments: JCR-4335.patch, JCR-4335.patch
>
>
> Jackrabbit Oak proposes to add a new direct binary access capability to the 
> repository. One part of the proposal is to expose this new capability in the 
> Jackrabbit API. For details see OAK-7569 and OAK-7589.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (JCR-4335) API for direct binary access

2018-07-19 Thread Matt Ryan (JIRA)


[ 
https://issues.apache.org/jira/browse/JCR-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16550262#comment-16550262
 ] 

Matt Ryan commented on JCR-4335:


I've attached [^JCR-4335.patch] in case a patch file is preferred to a GitHub 
pull request.

> API for direct binary access
> 
>
> Key: JCR-4335
> URL: https://issues.apache.org/jira/browse/JCR-4335
> Project: Jackrabbit Content Repository
>  Issue Type: New Feature
>  Components: jackrabbit-api
>Reporter: Marcel Reutegger
>Assignee: Marcel Reutegger
>Priority: Major
> Attachments: JCR-4335.patch
>
>
> Jackrabbit Oak proposes to add a new direct binary access capability to the 
> repository. One part of the proposal is to expose this new capability in the 
> Jackrabbit API. For details see OAK-7569 and OAK-7589.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (JCR-4335) API for direct binary access

2018-07-19 Thread Matt Ryan (JIRA)


 [ 
https://issues.apache.org/jira/browse/JCR-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated JCR-4335:
---
Attachment: JCR-4335.patch

> API for direct binary access
> 
>
> Key: JCR-4335
> URL: https://issues.apache.org/jira/browse/JCR-4335
> Project: Jackrabbit Content Repository
>  Issue Type: New Feature
>  Components: jackrabbit-api
>Reporter: Marcel Reutegger
>Assignee: Marcel Reutegger
>Priority: Major
> Attachments: JCR-4335.patch
>
>
> Jackrabbit Oak proposes to add a new direct binary access capability to the 
> repository. One part of the proposal is to expose this new capability in the 
> Jackrabbit API. For details see OAK-7569 and OAK-7589.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (JCR-4335) API for direct binary access

2018-07-19 Thread Matt Ryan (JIRA)


[ 
https://issues.apache.org/jira/browse/JCR-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16550261#comment-16550261
 ] 

Matt Ryan commented on JCR-4335:


The proposed change has been implemented and can be viewed [on 
GitHub|https://github.com/apache/jackrabbit/pull/59].

This proposed change has been reviewed for the past few weeks with 
contributions and input by [~reschke], [~mreutegg], [~mduerig], 
[~alexander.klimetschek] and others.  I humbly propose that we consider this 
change for addition to Jackrabbit.

> API for direct binary access
> 
>
> Key: JCR-4335
> URL: https://issues.apache.org/jira/browse/JCR-4335
> Project: Jackrabbit Content Repository
>  Issue Type: New Feature
>  Components: jackrabbit-api
>Reporter: Marcel Reutegger
>Assignee: Marcel Reutegger
>Priority: Major
>
> Jackrabbit Oak proposes to add a new direct binary access capability to the 
> repository. One part of the proposal is to expose this new capability in the 
> Jackrabbit API. For details see OAK-7569 and OAK-7589.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Supporting direct binary access - PR submitted

2018-07-19 Thread Matt Ryan
On July 18, 2018 at 4:16:28 AM, Julian Reschke (julian.resc...@gmx.de)
wrote:

On 2018-07-18 09:42, Marcel Reutegger wrote:
>> I would say that these are not *necessarily* HTTP-specific, and thus
>> we should try to define them in a generic way.
>
> I agree when we look at file name and content type, these are also
> present in JCR when you have a nt:file/nt:resource node structure, but
> we probably also want to add a content disposition type to the download
> options. How would you define this in a generic way?
> ...

Adding the disposition type here is a new concept. Why is it needed now
and not before?


Well, RFC-6266 defines that the disposition type is part of the
Content-Disposition header (as you know Julian, since you wrote the RFC) -
a required part of that header, as I understand the RFC.  I assumed that a
client would need a way to specify whether they want “inline” or
“attachment”.

For example, using “attachment” as a default disposition type seems to make
sense to me based on section 4.2 of that RFC (interpreting that section as
it applies to this case).  If the user wanted to generate a web page with
binaries that are rendered inline, they’d need a way to do so.

Do you have a different view?


AFAIU, some applications on top of the JCR API add the "attachment"
disposition type in order to prevent browsers from running user-provided
HTML/JS. But those do that in a servlet filter, right? We'd have to find
out how to address this use case with the new API...

Best regards, Julian


IIUC doing this in a servlet filter wouldn’t work in this case because the
URI has to be signed for the way it is requested (no modifying the URI in
code after it is obtained) and the request for the binary won’t go through
the JVM (no filtering available).
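
For what it's worth, the shape I have in mind on the Oak side is roughly the
following (the option and class names are illustrative and still being
discussed in OAK-7637, so please don't read them as final API):

import java.net.URI;

import javax.jcr.Binary;

import org.apache.jackrabbit.api.binary.BinaryDownload;
import org.apache.jackrabbit.api.binary.BinaryDownloadOptions;

public class DownloadUriExample {

    public static URI getAttachmentUri(Binary binary) throws Exception {
        if (!(binary instanceof BinaryDownload)) {
            return null;  // direct download not available for this binary
        }
        // The response headers have to be baked into the signed URI up front,
        // because the request for the binary never passes through the JVM.
        BinaryDownloadOptions options = BinaryDownloadOptions.builder()
                .withMediaType("image/jpeg")      // Content-Type of the response
                .withFileName("photo.jpg")        // file name for Content-Disposition
                .withDispositionTypeAttachment()  // "attachment" rather than "inline"
                .build();
        return ((BinaryDownload) binary).getURI(options);
    }
}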


-MR


Re: Supporting direct binary access - PR submitted

2018-07-17 Thread Matt Ryan
I’ve changed the code to use URIs instead of URLs.

Regarding the HTTP-specific naming:  The latest iteration of the API
requires some HTTP-specific elements that need to be able to be set on
download URIs.  More specifically, these are elements that the
implementation directs a service provider to set headers to certain values
on responses to requests for the signed download URIs.  These are things
like being able to set the response Content-Type or Content-Disposition
header for example.  (This is discussed further in OAK-7637.)

As a result I do think it is evolving to be a bit HTTP-specific.

Are there specific concerns with the naming as it is?


On June 28, 2018 at 4:09:26 AM, Julian Reschke (julian.resc...@gmx.de)
wrote:

On 2018-06-27 12:22, Bertrand Delacretaz wrote:
> On Wed, Jun 27, 2018 at 9:43 AM Julian Reschke 
wrote:
>> ...Is the feature really tied to HTTP(s)? I don't think so. And if a
future
>> platform used a different protocol, the API wouldn't really need to
change...
>
> +1, this is probably more about URIs than HTTP URLs.
>
> -Bertrand

Well, switching into pedantic mode, ...

The term "URL" is not the issue here. Even if it wasn't an HTTP(s) URL,
it would still need to be a locator-typed URI for the functionality to
work.

Best regards, Julian


Re: Oak Direct Binary Access pull request

2018-07-16 Thread Matt Ryan
Hi oak-dev,

I’ve added a new ticket [0] as a subtask to the main ticket for Oak Direct
Binary Access, pertaining to headers that may need to be set in the
responses to download URIs.  Please take a look and chime in on that
discussion if you have an opinion, I think it is an important one to get
right.


[0] - https://issues.apache.org/jira/browse/OAK-7637


-MR

On June 26, 2018 at 5:40:26 PM, Matt Ryan (o...@mvryan.org) wrote:

Hi oak-dev,

Here is the latest on this proposed change.

- I’ve made most of the minor fixes requested in the main pull request:
https://github.com/apache/jackrabbit-oak/pull/88
- Marcel has asked that I submit a separate pull request for one change in
PR #88, namely to add a filter to exclude
“org.apache.jackrabbit.oak.plugins.value.jcr” from BND evaluation in
oak-parent/pom.xml.  I’ve done this and made a new pull request:
https://github.com/apache/jackrabbit-oak/pull/89
- Julian has raised a concern in PR #88 in which he expresses a desire to
use the URI class instead of the URL class.  While PR #88 still uses URL,
I’ve made another pull request using URI instead:
https://github.com/apache/jackrabbit-oak/pull/90  If you have an opinion on
this matter please weigh in at OAK-7574.
- Marcel has asked that API changes in PR #88 be moved out of oak-jcr and
into another location.  In OAK-7589 he expresses this in more detail and
expressed a preference to move these API changes into jackrabbit-api.  I’ve
submitted a pull request to jackrabbit-api with this change:
https://github.com/apache/jackrabbit/pull/59  Since this change would
require modifications to my original pull request to work, I submitted
another pull request to Oak which relies on the jackrabbit-api changes.
This new pull request is at:
https://github.com/apache/jackrabbit-oak/pull/91

Michael has also asked me to try to simplify the original pull request to
make it easier to follow.  I’ve intended to do so but simply have not had
the time, I apologize.

Can progress be made with things as they are currently?  Maybe there are
still some issues to be resolved, but if some of the supporting pull
requests can be accepted at least that would be a good start.


Thanks


-MR

On June 21, 2018 at 9:24:44 PM, Matt Ryan (o...@mvryan.org) wrote:

On June 21, 2018 at 6:53:44 AM, Marcel Reutegger (mreut...@adobe.com.invalid)
wrote:

Hi Matt,

New files in your pull request have a different format for the Apache
License header. Can you please change them to match the format of
existing source files?

Yes - I believe I have fixed this now, let me know if I missed any.



As mentioned in an offline conversation with you already, I'm a bit
concerned about the impact this optional feature has on nearly all layers
of Oak. SessionImpl implements HttpBinaryProvider, MutableRoot
implements HttpBlobProvider, SegmentNodeStore implements
HttpBlobProvider, DocumentNodeStore implements HttpBlobProvider. E.g.
the last two just pass through calls they are not concerned with.

Alternatively, could you do the required plumbing on construction time?
That is, if the BlobStore implements HttpBlobProvider register it with
that interface as well and use it to construct the repository. Something
like:

BlobStore bs = ...
NodeStore ns = ...
Jcr jcr = new Jcr(ns)
if (bs instanceof HttpBlobProvider)
jcr.with((HttpBlobProvider) bs)
Repository r = jcr.createRepository()

By default, the Jcr factory would have a HttpBlobProvider implementation
that doesn't support the feature, which also relieves the repository
implementation from checking the type or for null on every call to the
new feature (as is the case in SessionImpl, MutableRoot,
DocumentNodeStore, SegmentNodeStore).

I added OAK-7570 to discuss this.




I would also prefer if the API used by the client is moved to a separate
module that can be release independently. Yes, we don't do this right
now in Oak, but this may be a good opportunity to try this again.
Releasing the API independently with a stable version lowers the barrier
for consumers to adopt it.

I added OAK-7571 to discuss this.



-MR


Re: Supporting direct binary access - PR submitted

2018-07-13 Thread Matt Ryan
Hi,

I have updated the PR based on feedback from Julian and Bertrand as well as
review from the Oak team with some API changes.  Please let me know what
more I need to do, thanks!

https://github.com/apache/jackrabbit/pull/59



On June 28, 2018 at 4:09:26 AM, Julian Reschke (julian.resc...@gmx.de)
wrote:

On 2018-06-27 12:22, Bertrand Delacretaz wrote:
> On Wed, Jun 27, 2018 at 9:43 AM Julian Reschke 
wrote:
>> ...Is the feature really tied to HTTP(s)? I don't think so. And if a
future
>> platform used a different protocol, the API wouldn't really need to
change...
>
> +1, this is probably more about URIs than HTTP URLs.
>
> -Bertrand

Well, switching into pedantic mode, ...

The term "URL" is not the issue here. Even if it wasn't an HTTP(s) URL,
it would still need to be a locator-typed URI for the functionality to
work.

Best regards, Julian


Re: Oak Direct Binary Access pull request

2018-06-27 Thread Matt Ryan
Hi Bertrand,

On June 27, 2018 at 4:33:05 AM, Bertrand Delacretaz (bdelacre...@apache.org)
wrote:

Hi Matt,

>From the Sling clients perspective I'm interested in making this
somewhat transparent, maybe something like:

For downloads, a client requests
http://my.sling.instance/somebinary.jpg and is redirected to
https://somecloudprovider/23874623748623746234782634273846237846723864.jpg

For uploads, it's a bit more complicated - maybe the client POSTing to
Sling receives a 307 status with a JSON document that describes
where/how to upload. In this case the client requires some knowledge
of this new API, unless someone has a better idea.

Do you see any obstacles in implementing something like this on top of
your suggested API?


It seems to me the download case should work as you’ve described.  Sling
could ask for a download URL, and if it gets one Sling can send a redirect
to that URL; if not, Sling can then issue the request as is currently done
today.

Upload is more complicated because of multi-part uploads.  For example,
Azure requires that a multi-part upload be performed for any binary larger
than 256MB [0].  Both Azure and AWS require multi-part uploads to be done
using a distinct URL for each part (instead of allowing the reuse of the
same URL with Content-Range like Google does [1]).  Thus the new Oak API
needs to support multi-part uploading via distinct URLs.  I’m not sure how
Sling would manage to hide that away from a client via a redirect when
there are potentially multiple URLs involved, without creating a stateful
session or something like that.

Of course Sling could take the return value from the call to initiate the
upload and turn it into a JSON document that the client can then consume.
As you say the client will need to have some knowledge of the new API to do
this.
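
For the download case, something along these lines is what I picture on the
Sling side (a rough sketch only; how the signed URI is actually resolved from
the requested resource is left as a placeholder, not a worked-out
integration):

import java.io.IOException;
import java.net.URI;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class BinaryDownloadServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        // Placeholder: look up the requested resource and ask the data store
        // for a signed download URI; null means direct download is unavailable.
        URI signedUri = resolveSignedDownloadUri(req.getPathInfo());

        if (signedUri != null) {
            // Redirect the client straight to the storage service.
            resp.sendRedirect(signedUri.toString());
        } else {
            // Fall back to streaming the binary through the repository.
            streamThroughRepository(req, resp);
        }
    }

    private URI resolveSignedDownloadUri(String path) {
        return null;  // placeholder for the actual lookup
    }

    private void streamThroughRepository(HttpServletRequest req,
            HttpServletResponse resp) throws IOException {
        // placeholder for the existing download path
    }
}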


[0] -
https://docs.microsoft.com/en-us/rest/api/storageservices/put-blob#remarks
[1] -
https://cloud.google.com/storage/docs/json_api/v1/how-tos/resumable-upload


-MR


Supporting direct binary access - PR submitted

2018-06-26 Thread Matt Ryan
Hi,

I  have submitted a pull request for a new API feature in jackrabbit-api at
[0].  This addition is intended for use by a new proposed feature for Oak
that allows a client to directly upload or download binaries to/from a
supporting configured blob storage, as described at [1].  A correlated Oak
pull request using this new capability of Jackrabbit is at [2].  (Of course
the PR at [2] won’t build without the change; the original PR prior to
making the change to jackrabbit-api is at [3] if you want to pull it and
try to run it to see how it works.)

Please take a look and let me know what I need to do for this change to be
accepted.


[0] - https://github.com/apache/jackrabbit/pull/59
[1] - https://wiki.apache.org/jackrabbit/Direct%20Binary%20Access
[2] - https://github.com/apache/jackrabbit-oak/pull/91
[3] - https://github.com/apache/jackrabbit-oak/pull/88



Thanks


-MR


Re: Oak Direct Binary Access pull request

2018-06-26 Thread Matt Ryan
Hi oak-dev,

Here is the latest on this proposed change.

- I’ve made most of the minor fixes requested in the main pull request:
https://github.com/apache/jackrabbit-oak/pull/88
- Marcel has asked that I submit a separate pull request for one change in
PR #88, namely to add a filter to exclude
“org.apache.jackrabbit.oak.plugins.value.jcr” from BND evaluation in
oak-parent/pom.xml.  I’ve done this and made a new pull request:
https://github.com/apache/jackrabbit-oak/pull/89
- Julian has raised a concern in PR #88 in which he expresses a desire to
use the URI class instead of the URL class.  While PR #88 still uses URL,
I’ve made another pull request using URI instead:
https://github.com/apache/jackrabbit-oak/pull/90  If you have an opinion on
this matter please weigh in at OAK-7574.
- Marcel has asked that API changes in PR #88 be moved out of oak-jcr and
into another location.  In OAK-7589 he expresses this in more detail and
expressed a preference to move these API changes into jackrabbit-api.  I’ve
submitted a pull request to jackrabbit-api with this change:
https://github.com/apache/jackrabbit/pull/59  Since this change would
require modifications to my original pull request to work, I submitted
another pull request to Oak which relies on the jackrabbit-api changes.
This new pull request is at:
https://github.com/apache/jackrabbit-oak/pull/91

Michael has also asked me to try to simplify the original pull request to
make it easier to follow.  I’ve intended to do so but simply have not had
the time, I apologize.

Can progress be made with things as they are currently?  Maybe there are
still some issues to be resolved, but if some of the supporting pull
requests can be accepted at least that would be a good start.


Thanks


-MR

On June 21, 2018 at 9:24:44 PM, Matt Ryan (o...@mvryan.org) wrote:

On June 21, 2018 at 6:53:44 AM, Marcel Reutegger (mreut...@adobe.com.invalid)
wrote:

Hi Matt,

New files in your pull request have a different format for the Apache
License header. Can you please change them to match the format of
existing source files?

Yes - I believe I have fixed this now, let me know if I missed any.



As mentioned in an offline conversation with you already, I'm a bit
concerned about the impact this optional feature has on nearly all layers
of Oak. SessionImpl implements HttpBinaryProvider, MutableRoot
implements HttpBlobProvider, SegmentNodeStore implements
HttpBlobProvider, DocumentNodeStore implements HttpBlobProvider. E.g.
the last two just pass through calls they are not concerned with.

Alternatively, could you do the required plumbing on construction time?
That is, if the BlobStore implements HttpBlobProvider register it with
that interface as well and use it to construct the repository. Something
like:

BlobStore bs = ...
NodeStore ns = ...
Jcr jcr = new Jcr(ns)
if (bs instanceof HttpBlobProvider)
jcr.with((HttpBlobProvider) bs)
Repository r = jcr.createRepository()

By default, the Jcr factory would have a HttpBlobProvider implementation
that doesn't support the feature, which also relieves the repository
implementation from checking the type or for null on every call to the
new feature (as is the case in SessionImpl, MutableRoot,
DocumentNodeStore, SegmentNodeStore).

I added OAK-7570 to discuss this.




I would also prefer if the API used by the client is moved to a separate
module that can be release independently. Yes, we don't do this right
now in Oak, but this may be a good opportunity to try this again.
Releasing the API independently with a stable version lowers the barrier
for consumers to adopt it.

I added OAK-7571 to discuss this.



-MR


Re: Oak Direct Binary Access pull request

2018-06-21 Thread Matt Ryan
On June 21, 2018 at 6:53:44 AM, Marcel Reutegger (mreut...@adobe.com.invalid)
wrote:

Hi Matt,

New files in your pull request have a different format for the Apache
License header. Can you please change them to match the format of
existing source files?

Yes - I believe I have fixed this now, let me know if I missed any.



As mentioned in an offline conversation with you already, I'm a bit
concerned about the impact this optional feature has on nearly all layers
of Oak. SessionImpl implements HttpBinaryProvider, MutableRoot
implements HttpBlobProvider, SegmentNodeStore implements
HttpBlobProvider, DocumentNodeStore implements HttpBlobProvider. E.g.
the last two just pass through calls they are not concerned with.

Alternatively, could you do the required plumbing on construction time?
That is, if the BlobStore implements HttpBlobProvider register it with
that interface as well and use it to construct the repository. Something
like:

BlobStore bs = ...
NodeStore ns = ...
Jcr jcr = new Jcr(ns)
if (bs instanceof HttpBlobProvider)
jcr.with((HttpBlobProvider) bs)
Repository r = jcr.createRepository()

By default, the Jcr factory would have a HttpBlobProvider implementation
that doesn't support the feature, which also relieves the repository
implementation from checking the type or for null on every call to the
new feature (as is the case in SessionImpl, MutableRoot,
DocumentNodeStore, SegmentNodeStore).

I added OAK-7570 to discuss this.




I would also prefer if the API used by the client is moved to a separate
module that can be release independently. Yes, we don't do this right
now in Oak, but this may be a good opportunity to try this again.
Releasing the API independently with a stable version lowers the barrier
for consumers to adopt it.

I added OAK-7571 to discuss this.



-MR


Re: Oak Direct Binary Access pull request

2018-06-21 Thread Matt Ryan
On June 21, 2018 at 1:35:30 AM, Michael Dürig (mdue...@apache.org) wrote:


Hi,

Any chance for cleaning up the history? This will make it much easier to
review an to maintain once applied.

Certainly; I will try.


I know that this can be a bit of a pain. But in my eyes the revision
history is part of the "code" as much as the code itself and it should
be as easy to read as possible.


Agreed.  I’m also trying to find a way to do that and maintain the history
of who made which changes (since there are multiple authors).


-MR


Re: Oak Direct Binary Access pull request

2018-06-21 Thread Matt Ryan
On June 20, 2018 at 10:25:20 PM, Julian Reschke (julian.resc...@gmx.de)
wrote:

On 2018-06-21 01:21, Matt Ryan wrote:
> Hi,
>
> A pull request [0] has been submitted containing a proposal for a Direct
> Binary Access feature in Oak.

...


>
> Regards,
>
> -MR

Hi Matt,

it would be helpful if you could link to example client code taking
advantage of this extension.

Best regards, Julian



Sure Julian.  There are some integration tests at [1].  Are you looking for
something more than that or does that address your question?

[1] -
https://github.com/mattvryan/jackrabbit-oak/blob/f46f5802e3dc48e1e3c26e2a5f89cbf3abe0ed8a/oak-jcr/src/test/java/org/apache/jackrabbit/oak/jcr/binary/HttpBinaryIT.java


-MR


Re: Oak Direct Binary Access pull request

2018-06-21 Thread Matt Ryan
Hi,

A JIRA issue has been created:
https://issues.apache.org/jira/browse/OAK-7569

At Marcel’s suggestion I have created subtasks for each of the points where
discussions may occur, and will add more as needed.  Feel free to add your
own if you have an item that you think merits further discussion than a
quick resolution on-list.

-MR


On June 20, 2018 at 5:21:39 PM, Matt Ryan (o...@mvryan.org) wrote:

Hi,

A pull request [0] has been submitted containing a proposal for a Direct
Binary Access feature in Oak.  The proposed feature is described at [1].
In a nutshell, it outlines a mechanism by which direct access to binary
data in a cloud-based Oak data store can be made available via signed URLs
with short TTLs.  Such a capability would have a significant positive
impact on Oak scalability.

I’m emailing to request review and discussion based on the proposal.  As
acknowledged in the wiki, there is some similarity to discussions we’ve had
in the past ([2], [3], [4]) but the approach in this proposal is slightly
different.


[0] - https://github.com/apache/jackrabbit-oak/pull/88
[1] - https://wiki.apache.org/jackrabbit/Direct%20Binary%20Access
[2] - https://issues.apache.org/jira/browse/OAK-6575
[3] - https://markmail.org/thread/7eiwvkuv3ybv2vyz
[4] - https://markmail.org/thread/zh6zxdxytnyonqms


Regards,

-MR


Oak Direct Binary Access pull request

2018-06-20 Thread Matt Ryan
Hi,

A pull request [0] has been submitted containing a proposal for a Direct
Binary Access feature in Oak.  The proposed feature is described at [1].
In a nutshell, it outlines a mechanism by which direct access to binary
data in a cloud-based Oak data store can be made available via signed URLs
with short TTLs.  Such a capability would have a significant positive
impact on Oak scalability.

I’m emailing to request review and discussion based on the proposal.  As
acknowledged in the wiki, there is some similarity to discussions we’ve had
in the past ([2], [3], [4]) but the approach in this proposal is slightly
different.


[0] - https://github.com/apache/jackrabbit-oak/pull/88
[1] - https://wiki.apache.org/jackrabbit/Direct%20Binary%20Access
[2] - https://issues.apache.org/jira/browse/OAK-6575
[3] - https://markmail.org/thread/7eiwvkuv3ybv2vyz
[4] - https://markmail.org/thread/zh6zxdxytnyonqms


Regards,

-MR
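
For the upload direction described above, a corresponding sketch against the
JackrabbitValueFactory / BinaryUpload interfaces in jackrabbit-api. This is
only an illustration: the part count, paths, and surrounding class are
assumptions, not part of the proposal itself.

import javax.jcr.Binary;
import javax.jcr.Node;
import javax.jcr.Session;
import org.apache.jackrabbit.api.JackrabbitValueFactory;
import org.apache.jackrabbit.api.binary.BinaryUpload;

public class DirectUploadSketch {

    // Step 1: ask the repository for signed upload URIs and hand them (plus
    // the upload token) to the remote client, which PUTs the binary parts
    // directly to cloud storage.
    public static BinaryUpload initiate(Session session, long expectedSize) throws Exception {
        // In Oak the ValueFactory implements JackrabbitValueFactory when the
        // data store supports direct upload.
        JackrabbitValueFactory vf = (JackrabbitValueFactory) session.getValueFactory();
        return vf.initiateBinaryUpload(expectedSize, 50); // allow up to 50 part URIs
    }

    // Step 2: once the remote client reports completion, turn the upload
    // token back into a Binary and attach it to an nt:file node.
    public static void complete(Session session, String uploadToken, String filePath) throws Exception {
        JackrabbitValueFactory vf = (JackrabbitValueFactory) session.getValueFactory();
        Binary binary = vf.completeBinaryUpload(uploadToken);
        Node content = session.getNode(filePath).getNode("jcr:content"); // hypothetical path
        content.setProperty("jcr:data", binary);
        session.save();
    }
}

As with the download case, the binary bytes themselves never pass through
Oak; only the signed URIs and the upload token do.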


Re: [VOTE] Release Apache Jackrabbit Oak 1.9.2

2018-05-22 Thread Matt Ryan
+1 (non-binding)


On May 21, 2018 at 7:55:24 AM, Davide Giannella (dav...@apache.org) wrote:



A candidate for the Jackrabbit Oak 1.9.2 release is available at:

https://dist.apache.org/repos/dist/dev/jackrabbit/oak/1.9.2/

The release candidate is a zip archive of the sources in:


https://svn.apache.org/repos/asf/jackrabbit/oak/tags/jackrabbit-oak-1.9.2/

The SHA1 checksum of the archive is
6b6e546fc31072e994b7202d175329c828c0bd90.

A staged Maven repository is available for review at:

https://repository.apache.org/

The command for running automated checks against this release candidate is:

# run in SVN checkout of https://dist.apache.org/repos/dist/dev/jackrabbit
$ sh check-release.sh oak 1.9.2 6b6e546fc31072e994b7202d175329c828c0bd90

Please vote on releasing this package as Apache Jackrabbit Oak 1.9.2.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 Jackrabbit PMC votes are cast.

[ ] +1 Release this package as Apache Jackrabbit Oak 1.9.2
[ ] -1 Do not release this package because...


Re: revision-/audit-proof archive

2018-04-10 Thread Matt Ryan
Hi Oliver,

Can you provide a bit more detail about what you are looking for?

What is unclear to me is whether you are speaking about multiple Oak
instances or just one.  I’m guessing what you have in mind is something
like this: one Oak instance can write to a repo, and a second Oak instance
can access the repo read-only, allowing the second to present the content
while also meeting the “revision-/audit-proof” requirement.

If that resembles what you have in mind, the next question is whether the
second Oak instance needs to also support making changes in a separate
repository or whether it should be strictly read-only.

If that’s not what you were thinking of can you please explain it in a bit
more detail?


HTH


-MR


On April 10, 2018 at 3:07:27 AM, Oliver Lietz (apa...@oliverlietz.de) wrote:

hi,

I'm looking for info on how to use Oak in a revision-/audit-proof manner to
archive versioned assets in a (read-only) repo. The only related document
I've found so far is the talk "Binary Data Management Features in Oak 1.8"
from Matt and Conrad. Any other sources or ideas?

Thanks,
O.
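
If the setup is the one sketched in the reply above (a second, read-only Oak
instance presenting the archived content), one possible building block is
opening a TarMK segment store in read-only mode. This is only a sketch,
assuming oak-segment-tar and oak-jcr on the classpath and a locally
accessible segment store directory; whether this alone satisfies a formal
revision-/audit-proof requirement is a separate question.

import java.io.File;
import javax.jcr.Repository;
import org.apache.jackrabbit.oak.jcr.Jcr;
import org.apache.jackrabbit.oak.segment.SegmentNodeStoreBuilders;
import org.apache.jackrabbit.oak.segment.file.FileStoreBuilder;
import org.apache.jackrabbit.oak.segment.file.ReadOnlyFileStore;
import org.apache.jackrabbit.oak.spi.state.NodeStore;

public class ReadOnlyRepositorySketch {

    // Opens an existing segment store so that no new segments can be written
    // through this instance; JCR sessions built on top of it can read and
    // present the archived content.
    public static Repository open(File segmentStoreDir) throws Exception {
        ReadOnlyFileStore fileStore =
                FileStoreBuilder.fileStoreBuilder(segmentStoreDir).buildReadOnly();
        NodeStore nodeStore = SegmentNodeStoreBuilders.builder(fileStore).build();
        return new Jcr(nodeStore).createRepository();
    }
}

Write attempts through such an instance should fail when the store refuses to
persist them, but a real audit setup would presumably also restrict
permissions at the JCR level.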


Re: oak-search module

2018-04-04 Thread Matt Ryan
+1 (non-binding)


On April 4, 2018 at 7:51:24 AM, Chetan Mehrotra (chetan.mehro...@gmail.com)
wrote:

+1. In addition we should also include a common set of test cases which
can be used to validate the SPI implementations. Also we can leave
oak-lucene as is for now, just create the new module, and implement
oak-lucene-v2 based on that. Once it reaches feature parity we can
remove the oak-lucene bundle.
Chetan Mehrotra


On Wed, Apr 4, 2018 at 5:43 PM, Thomas Mueller wrote:
> +1
>
> On 04.04.18, 10:23, "Tommaso Teofili"  wrote:
>
> Hi all,
>
> In the context of creating an (abstract) implementation for Oak full text
> indexes [1], I'd like to create a new module called _oak-search_.
> Such a module will contain:
> - implementation agnostic utilities for full text search (e.g.
> aggregation utilities)
> - implementation agnostic SPIs to be extended by implementors (currently
> we expose SPIs in oak-lucene whose signatures include Lucene specific APIs)
> - abstract full text editor / query index implementations
> - text extraction utilities
>
> Please share your feedback / opinions / concerns.
>
> Regards,
> Tommaso
>
> [1] : https://issues.apache.org/jira/browse/OAK-3336
>
>

