Intent to backport OAK-8071

2019-02-28 Thread Michael Dürig



Hi,

I intend to backport the changes we did for OAK-8071: 
http://svn.apache.org/viewvc?rev=1854515&view=rev


This adds warning logging for specific cases where a commit is 
blocked for a long time (configurable via 
-Doak.segmentNodeStore.commitWaitWarnMillis) waiting on a commit that is 
already in progress. The risk is low as this is pure logging; however, it 
touches some critical code paths.
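
For illustration, the threshold could be set like this before the repository starts (a sketch; the value 30000 is an arbitrary example, and milliseconds as the unit is an assumption based on the property name):

// Sketch: raise the commit-wait warn threshold to 30 seconds.
// Must be set before the SegmentNodeStore is initialised.
System.setProperty("oak.segmentNodeStore.commitWaitWarnMillis", "30000");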


Michael


Intent to backport OAK-8069

2019-02-21 Thread Michael Dürig



Hi,

I would like to backport OAK-8069 to Oak 1.10 and 1.8. This introduced 
some logging to catch cases where many direct child nodes are added to a 
node transiently. Risk is relatively low as there are no functional 
changes, just logging.


Michael


Re: Intent to backport OAK-8033 to Oak 1.10, 1.8 and 1.6

2019-02-18 Thread Michael Dürig
Merged into 1.6 at
http://svn.apache.org/viewvc?rev=1853814&view=rev
Merged into 1.8 at
http://svn.apache.org/viewvc?rev=1853813&view=rev
Merged into 1.10 at
http://svn.apache.org/viewvc?rev=1853812&view=rev

Michael
On Fri, 15 Feb 2019 at 09:58, Michael Dürig  wrote:
>
>
> Hi,
>
> I intend to backport OAK-8033 [1] to the branches mentioned in the
> subject. This fixes a regression introduced with OAK-7867 [2] that could
> cause data loss after running full compaction.
>
> The risk is relatively low as the fix is quite simple and has been shown
> to resolve occasional test failures of
> CompactionAndCleanupIT.testMixedSegments. In addition I added
> CompactionAndCleanupIT.testMixedSegmentsGCGeneration, which fully and
> deterministically covers this issue.
>
>
> [1] https://issues.apache.org/jira/browse/OAK-8033
> [2] https://issues.apache.org/jira/browse/OAK-7867


Intent to backport OAK-8033 to Oak 1.10, 1.8 and 1.6

2019-02-15 Thread Michael Dürig



Hi,

I intend to backport OAK-8033 [1] to the branches mentioned in the 
subject. This fixes a regression introduced with OAK-7867 [2] that could 
cause data loss after running full compaction.


The risk is relatively low as the fix is quite simple and has been shown 
to resolve occasional test failures of 
CompactionAndCleanupIT.testMixedSegments. In addition I added 
CompactionAndCleanupIT.testMixedSegmentsGCGeneration, which fully and 
deterministically covers this issue.



[1] https://issues.apache.org/jira/browse/OAK-8033
[2] https://issues.apache.org/jira/browse/OAK-7867


Re: svn commit: r1851789 - in /jackrabbit/oak/trunk: oak-run/src/main/java/org/apache/jackrabbit/oak/run/ oak-segment-azure/src/main/java/org/apache/jackrabbit/oak/segment/azure/tool/ oak-segment-azur

2019-01-25 Thread Michael Dürig



Hi,

I think the quoted implementation of migrateSegments is overly complex. 
Since we apparently need / want to keep the order of the segments in the 
archive across migration, there is no way to parallelize writing the 
segments. However, it took me a while to figure this out from the 
current implementation.


I would suggest implementing this by reading in parallel and putting the 
respective futures into a list. Writing is then done by subsequently 
waiting on those futures for completion:


private void migrateSegments(
        SegmentArchiveReader reader,
        SegmentArchiveWriter writer)
throws ExecutionException, InterruptedException, IOException {

    // Read the segments in parallel, keeping the futures in archive order
    List<Future<Segment>> futures = new ArrayList<>();
    for (SegmentArchiveEntry entry : reader.listSegments()) {
        futures.add(executor.submit(() -> {
            Segment segment = new Segment(entry);
            segment.read(reader);
            return segment;
        }));
    }

    // Write the segments sequentially, preserving the archive order
    for (Future<Segment> future : futures) {
        Segment segment = future.get();
        segment.write(writer);
    }
}


Michael


On 22.01.19 10:25, adulce...@apache.org wrote:

+    private void migrateSegments(SegmentArchiveReader reader, SegmentArchiveWriter writer)
+            throws InterruptedException, ExecutionException {
+        BlockingDeque<Segment> readDeque = new LinkedBlockingDeque<>(READ_THREADS);
+        BlockingDeque<Segment> writeDeque = new LinkedBlockingDeque<>(READ_THREADS);
+        AtomicBoolean processingFinished = new AtomicBoolean(false);
+        AtomicBoolean exception = new AtomicBoolean(false);
+        List<Future<Void>> futures = new ArrayList<>();
+        for (int i = 0; i < READ_THREADS; i++) {
+            futures.add(executor.submit(() -> {
+                try {
+                    while (!exception.get() && !(readDeque.isEmpty() && processingFinished.get())) {
+                        Segment segment = readDeque.poll(100, TimeUnit.MILLISECONDS);
+                        if (segment != null) {
+                            segment.read(reader);
+                        }
+                    }
+                    return null;
+                } catch (Exception e) {
+                    exception.set(true);
+                    throw e;
+                }
+            }));
+        }
+        futures.add(executor.submit(() -> {
+            try {
+                while (!exception.get() && !(writeDeque.isEmpty() && processingFinished.get())) {
+                    Segment segment = writeDeque.poll(100, TimeUnit.MILLISECONDS);
+                    if (segment != null) {
+                        while (segment.data == null && !exception.get()) {
+                            Thread.sleep(10);
+                        }
+                        segment.write(writer);
+                    }
+                }
+                return null;
+            } catch (Exception e) {
+                exception.set(true);
+                throw e;
+            }
+        }));
+        for (SegmentArchiveEntry entry : reader.listSegments()) {
+            Segment segment = new Segment(entry);
+            readDeque.putLast(segment);
+            writeDeque.putLast(segment);
+        }
+        processingFinished.set(true);
+        for (Future<Void> future : futures) {
+            future.get();
+        }
+    }


Intent to backport OAK-7867 to Oak 1.8 and 1.6

2018-11-19 Thread Michael Dürig
Hi,

I intend to backport OAK-7867 to Oak 1.8 and 1.6. This fixes an issue
that can cause severe data loss with the segment node store. There is a
medium to high risk with this backport as it touches some of the core
parts of the segment node store. To mitigate the risk we ran a 14 days
longevity test internally (Adobe), which did not show any significant
difference with any of the tracked metrics. Furthermore I plan to run
each of those backports through the same longevity test before including
them in a release.

Michael


Intent to backport OAK-7838

2018-10-30 Thread Michael Dürig




Hi,

I intend to backport https://issues.apache.org/jira/browse/OAK-7838. 
This fixes an unclosed executor (by removing it). The fix only affects 
monitoring code and is thus rather low risk.



Michael


Intent to backport OAK-7854

2018-10-24 Thread Michael Dürig



Hi,

I intend to backport https://issues.apache.org/jira/browse/OAK-7854. 
This issue adds an additional monitoring endpoint to detect the case 
where the flush thread fails. Although the fix affects the file store, 
the changes are simple and low risk: the addition of a timer update 
whenever a flush is scheduled.


Michael


Intent to backport OAK-7837

2018-10-24 Thread Michael Dürig



Hi,

I intend to backport https://issues.apache.org/jira/browse/OAK-7837. 
This is a simple fix in tooling (oak-run check) preventing it from crashing 
under certain circumstances. The risk is low.


Michael


Intent to backport OAK-7853

2018-10-24 Thread Michael Dürig



Hi,

I intend to backport https://issues.apache.org/jira/browse/OAK-7853.

This fixes an issue that could cause data loss under rare circumstances. 
The fix touches critical core code of the segment store. However, the 
changes are very limited and confined to code paths that are rarely 
executed. There is a regression test covering this issue.


Michael


Jira component for oak-segment-azure

2018-10-04 Thread Michael Dürig



Hi,

With more work going into the Azure Segment Store, should we start 
tracking this via a dedicated Jira component?


If there are no objections I suggest adding a segment-azure component.

Michael


New Jackrabbit Committer: Woonsan Ko

2018-09-24 Thread Michael Dürig

Hi,

Please welcome Woonsan Ko as a new committer and PMC member of
the Apache Jackrabbit project. The Jackrabbit PMC recently decided to
offer Woonsan committership based on his contributions. I'm happy to
announce that he accepted the offer and that all the related
administrative work has now been taken care of.

Welcome to the team, Woonsan!

Michael


New Jackrabbit committer: Matt Ryan

2018-09-09 Thread Michael Dürig

Hi,

Please welcome Matt Ryan as a new committer and PMC member of
the Apache Jackrabbit project. The Jackrabbit PMC recently decided to
offer Matt committership based on his contributions. I'm happy to
announce that he accepted the offer and that all the related
administrative work has now been taken care of.

Welcome to the team, Matt!

Michael


Re: Intent to backport OAK-6890

2018-09-06 Thread Michael Dürig




On 05.09.18 11:23, Francesco Mari wrote:

I intend to backport OAK-6890 to the 1.6 branch. The fix keeps some background
threads alive in the face of unexpected failures. All of these threads are
critical for the correctness of the Segment Store.



+1

Michael


Re: Intent to backport OAK-7721

2018-09-04 Thread Michael Dürig




On 04.09.18 11:14, Francesco Mari wrote:

I intend to backport OAK-7721 to the 1.8 and 1.6 branches. The fix prevents
the segment buffers from being corrupted when overly large records are
persisted. The corruption is only detected when the buffer is flushed to
disk, when it's too late to detect the code path that led to the
corruption. The fix fails earlier and louder, preventing the segment buffer
from being corrupted and allowing us to identify the defective code path.



+1. I think this is an important fix as it helps us to prevent corruptions.

Michael


Intent to backport OAK-6648

2018-08-17 Thread Michael Dürig



Hi,

I would like to backport OAK-6648 to Oak 1.6. This fixes an issue with 
offline revision cleanup causing tar files not to be removed under 
some circumstances.


The risk is low as it only contains minor changes to code that is 
executed when the repository is shut down.


Michael




Re: [DISCUSS] Enabling CI for Oak cloud-based features

2018-07-31 Thread Michael Dürig



I wonder how other communities address such issues. E.g. Apache jclouds 
would need to be tested against a rather broad variety of different 
cloud vendors. Maybe it's worth doing some research on their approach.


Michael


On 31.07.18 10:01, Amit Jain wrote:

There's one provided by Adobe as well - https://github.com/adobe/S3Mock
But these would have to be enhanced to support the upload/download urls
which I don't think they support. Also, I am not aware of any similar
utility for Azure.

Thanks
Amit

On Tue, Jul 31, 2018 at 1:24 PM Julian Reschke 
wrote:


On 2018-07-30 19:26, Matt Ryan wrote:

Hi,

Oak now has a fair few cloud-based modules - meaning, modules that enable
Oak to make use of cloud service provider capabilities in order for the
feature to work - among them being oak-blob-cloud, oak-blob-cloud-azure,
and oak-segment-azure.

I’m not as familiar with oak-segment-azure, but I do know for
oak-blob-cloud and oak-blob-cloud-azure you need an environment set up to
run the tests including credentials for the corresponding cloud service
provider.  The consequence of this is that there is no regular CI testing
run on these modules, IIUC.

I wanted to kick off a discussion to see what everyone else thinks.  I
think coming up with some form of mock for the cloud objects might be nice,
or even better to use existing Apache-license-friendly ones if there are
some, but maybe others have already gone down this road further or have
better ideas?
...


FWIW, this has been concerning me as well for quite some time.

Maybe it's worth trying out ?

Best regards, Julian





Re: [VOTE] Release Apache Jackrabbit Oak 1.9.0

2018-04-23 Thread Michael Dürig



On 23.04.18 14:53, Davide Giannella wrote:

     [X] +1 Release this package as Apache Jackrabbit Oak 1.9.0


Michael


Re: Azure Segment Store

2018-03-05 Thread Michael Dürig
> How does it perform compared to TarMK
> a) when the entire repo doesn't fit into RAM allocated to the container ?
> b) when the working set doesn't fit into RAM allocated to the container ?

I think this is one of the things we need to find out along the way.
Currently my thinking is to move from off-heap caching (mmap) to on-heap
caching (leveraging the segment cache). For that to work we likely need
to better understand the locality of the working set (see
https://issues.apache.org/jira/browse/OAK-5655) and rethink the
granularity of the cached items. There will likely be many more issues
coming through Jira regarding this.

Michael

On 2 March 2018 at 09:45, Ian Boston  wrote:
> Hi Tomek,
> Thank you for the pointers and the description in OAK-6922. It all makes
> sense and seems like a reasonable approach. I assume the description is
> up to date.
>
> How does it perform compared to TarMK
> a) when the entire repo doesn't fit into RAM allocated to the container ?
> b) when the working set doesn't fit into RAM allocated to the container ?
>
> Since you mentioned cost, have you done a cost based analysis of RAM vs
> attached disk, assuming that TarMK has already been highly optimised to
> cope with deployments where the working set may only just fit into RAM ?
>
> IIRC the Azure attached disks mount Azure Blobs behind a kernel block
> device driver and use local SSD to optimise caching (in read and write
> through mode). Since there are a kernel block device they also benefit from
> the linux kernel VFS Disk Cache and support memory mapping via the page
> cache. So An Azure attached disk often behaves like a local SSD (IIUC). I
> realise that some containerisation frameworks in Azure dont yet support
> easy native Azure disk mounting (eg Mesos), but others do (eg AKS[1])
>
> Best regards
> Ian
>
>
> 1 https://azure.microsoft.com/en-us/services/container-service/
> https://docs.microsoft.com/en-us/azure/aks/azure-files-dynamic-pv
>
>
>
> On 1 March 2018 at 18:40, Matt Ryan  wrote:
>
>> Hi Tomek,
>>
>> Some time ago (November 2016 Oakathon IIRC) some people explored a similar
>> concept using AWS (S3) instead of Azure.  If you haven’t discussed with
>> them already it may be worth doing so.  IIRC Stefan Egli and I believe
>> Michael Duerig were involved and probably some others as well.
>>
>> -MR
>>
>>
>> On March 1, 2018 at 5:42:07 AM, Tomek Rekawek (reka...@adobe.com.invalid)
>> wrote:
>>
>> Hi Tommaso,
>>
>> so, the goal is to run Oak in a cloud, in this case Azure. In order to
>> do this in a scalable way (eg. multiple instances on a single VM,
>> containerized), we need to take care of provisioning the sufficient amount
>> of space for the segmentstore. Mounting the physical SSD/HDD disks (in
>> Azure they’re called “Managed Disks” aka EBS in Amazon) has two drawbacks:
>>
>> * it’s expensive,
>> * it’s complex (each disk is a separate /dev/sdX that has to be formatted,
>> mounted, etc.)
>>
>> The point of the Azure Segment Store is to deal with these two issues, by
>> replacing the need for a local file system space with a remote service,
>> that will be (a) cheaper and (b) easier to provision (as it’ll be
>> configured on the application layer rather than VM layer).
>>
>> Another option would be using the Azure File Storage (which mounts the SMB
>> file system, not the “physical” disk). However, in this case we’d have a
>> remote storage that emulates a local one and SegmentMK doesn’t really
>> expect this. Rather than that it’s better to create a full-fledged remote
>> storage implementation, so we can work out the issues caused by the higher
>> latency, etc.
>>
>> Regards,
>> Tomek
>>
>> --
>> Tomek Rękawek | Adobe Research | www.adobe.com
>> reka...@adobe.com
>>
>> > On 1 Mar 2018, at 11:16, Tommaso Teofili 
>> wrote:
>> >
>> > Hi Tomek,
>> >
>> > While I think it's an interesting feature, I'd be also interested to hear
>> > about the user story behind your prototype.
>> >
>> > Regards,
>> > Tommaso
>> >
>> >
>> > On Thu 1 Mar 2018 at 10:31, Tomek Rękawek
>> > wrote:
>> >
>> >> Hello,
>> >>
>> >> I prepared a prototype for the Azure-based Segment Store, which allows
>> to
>> >> persist all the SegmentMK-related resources (segments, journal,
>> manifest,
>> >> etc.) on a remote service, namely the Azure Blob Storage [1]. The whole
>> >> description of the approach, data structure, etc. as well as the patch
>> can
>> >> be found in OAK-6922. It uses the extension points introduced in the
>> >> OAK-6921.
>> >>
>> >> While it’s still an experimental code, I’d like to commit it to trunk
>> >> rather sooner than later. The patch is already pretty big and I’d like
>> to
>> >> avoid developing it “privately” on my own branch. It’s a new, optional
>> >> Maven module, which doesn’t change any existing behaviour of Oak or
>> >> SegmentMK. The only change it makes externally is adding a few exports
>> to
>> >> the oak-segment-tar, so it can use the SPI introduced in the OAK-6921.
>> We
>> >> may

Re: [SegmentStore] Blobs under 16 KB always inlined in tar files?

2018-02-21 Thread Michael Dürig



On 15.02.18 22:06, Alexander Klimetschek wrote:


I would agree on first sight. However, there might be good reasons for the 
current design and these concerns would not be true in practice. The same 
setting is essentially used for both STRING and BINARY properties - maybe it 
makes a lot of sense for Strings, but not so much for immutable binaries?

Could someone shed some light?


The current threshold is based on some statistics collected early on in 
the history of Oak. Numbers might have changed in the meantime, so 
re-evaluating this makes sense.



IIUC, it also makes the minRecordLength config [3] of the datastore(s) have no 
effect, since that should probably be rather low (default is 100 bytes), given 
it encodes the binary in the blob id itself. But since only binaries larger 
than 16KB will ever reach the blob store (for a segment store setup), all 
binaries will effectively always be larger than minRecordLength.


That configuration is about the blob store. The segment store can make 
its own decisions independently of that setting on whether to inline a 
binary or not.


Michael


Re: Intent to backport OAK-6373 to 1.8

2018-01-31 Thread Michael Dürig


+1. The fix only affects tooling code. With this change we increase 
coverage for detecting corruptions we previously missed.


Michael

On 31.01.18 12:23, Andrei Dulceanu wrote:

Hi All,

I intend to backport OAK-6373 to 1.8 branch. This issue enhances the
behaviour of oak-run check command to traverse all, some or none of the
checkpoints.

Andrei



Re: Fwd: dump content of segment tar files

2018-01-30 Thread Michael Dürig


Hi,

Unfortunately there is currently no OOTB tooling apart from oak-run 
check followed by a manual roll back to repair a repository.


What were you doing with oak-run when the disk ran full?

Michael

On 30.01.18 11:14, Torgeir Veimo wrote:

Is there a tool to inspect the content of segment tar files?

I've had a case of oak-run corrupting a repository due to the disk going
full, and need to see if there's any data for the last 24 hours that i can
get back from the segment files remaining.




Intent to backport OAK-7132 to 1.8

2018-01-22 Thread Michael Dürig


Hi,

I intend to backport OAK-7132 to Oak 1.8. This is a fix to a critical 
error in the TarMK persistence layer that could lead to data loss. The 
fix was evaluated in trunk through an internal (to Adobe) longevity test 
for 5 days.


Michael


Re: Oak 1.8.1 release plan

2018-01-19 Thread Michael Dürig


I'm still working on OAK-7132, which is a blocker for 1.8.1. A fix is 
currently in longevity testing. Looking good so far. If it survives the 
weekend I'll merge it into 1.8 first thing Mon. morning. Will keep you 
posted.


Michael

On 18.01.18 11:12, Davide Giannella wrote:

Hello team,

I'm planning to cut Oak on Monday 22nd Jan.

If there are any objections please let me know. Otherwise I will
re-schedule any non-resolved issue for the next iteration.

Thanks
Davide




Re: Intent to backport OAK-7157 to 1.8

2018-01-16 Thread Michael Dürig


+1. This is also relevant for OAK-7132 in the broader scope.

Michael

On 16.01.18 16:45, Francesco Mari wrote:

I intend to backport OAK-7157 to the 1.8 branch. The fix implements an
optimisation for cold standby instances. With the fix in place,
standby instances only retain the latest generation, instead of the
last two generations. This allows a cold standby instance to remove
old segments more aggressively and save space on disk.



Re: Intent to backport OAK-7158 to 1.8

2018-01-16 Thread Michael Dürig


+1. This is in the broader sense part of OAK-7132.

Michael

On 16.01.18 13:53, Francesco Mari wrote:

I intend to backport OAK-7158 to the 1.8 branch. The fix is about
disallowing users from changing the number of generations retained by
the FileStore. Setting the number of retained generations to a value
different than its default might cause data loss due to the way
cleanup works.



Re: [VOTE] Release Apache Jackrabbit Oak 1.8.0

2018-01-09 Thread Michael Dürig



On 09.01.18 12:53, Davide Giannella wrote:

  [X] +1 Release this package as Apache Jackrabbit Oak 1.8.0


Michael


Re: [VOTE] Release Apache Jackrabbit Oak 1.7.14

2017-12-21 Thread Michael Dürig



On 20.12.17 18:46, Davide Giannella wrote:

[X] +1 Release this package as Apache Jackrabbit Oak 1.7.14


Michael


Re: Build failure due to out of heap in oak-solr

2017-12-15 Thread Michael Dürig



On 15.12.17 12:41, Robert Munteanu wrote:

On Fri, 2017-12-15 at 15:18 +0530, Chetan Mehrotra wrote:

Caused by: java.lang.OutOfMemoryError: Java heap space


The build is failing due to OOM in oak-solr-core
https://builds.apache.org/job/Jackrabbit%20Oak/1090/


FWIW, the windows build does not fail ( yay?)

   https://builds.apache.org/job/Jackrabbit-Oak-Windows/

Granted, I have not set any -Xmx flag for this job so I don't know how
much memory it's taking. I see the 'main' job uses -Xmx2g, so maybe we
can bump it up to -Xmx3g, just as a test?



We shouldn't add any memory settings in the Jenkins jobs but rely on the 
test.opts.memory property of the parent pom.xml. This is intentionally 
set to 512M so we find out about potential regressions early.


Michael


Re: Experimental build for Oak on Windows

2017-12-07 Thread Michael Dürig


Thanks Robert for taking this up again. Almost exactly a year ago I 
spent some time understanding Jenkins issues [1]. This showed that 
back then infrastructure problems prevailed by a large margin. Only a few 
issues reported by Jenkins were actual regressions. I would be interested 
to see whether and how the situation has changed in the meanwhile.


Michael

[1] 
https://lists.apache.org/thread.html/8f5734bc8a70c6a85f566a7fc98efed088cb55e05ce9dde864625473@%3Coak-dev.jackrabbit.apache.org%3E


On 06.12.17 12:48, Robert Munteanu wrote:

Hi,

I set up yesterday an experimental build for Oak on Windows

   https://builds.apache.org/job/Jackrabbit-Oak-Windows/

It _seems_ to be working fine, but I've marked it as experimental given
the historical stability issues with ASF Windows bots. Feel free to
double-check with it in case you have doubts regarding the status of
the build on Windows.

I'll keep it alive for a couple of weeks to assess its stability, and
then we can discuss whether we want to promote it to a 'proper' job
that we actually pay attention to and that sends notifications.

Thanks,

Robert



Re: identify abandoned oak modules

2017-11-21 Thread Michael Dürig


Not exactly retiring but what about moving oak-pojosr under oak-examples?

Michael

On 21.11.17 16:53, Alex Deparvu wrote:

I think we can also add 'oak-http' to the list.

alex

On Tue, Nov 21, 2017 at 4:04 PM, Francesco Mari 
wrote:


I'm in favour of retiring oak-remote. It is not currently used and it
didn't receive much attention in the recent past.

On Tue, Nov 21, 2017 at 3:56 PM, Angela Schreiber
 wrote:

hi oak devs

looking at the list of modules we have in oak/trunk i get the impression
that some are not actively worked on or maintained.
would it make sense or be possible to retire some of the modules that

were

originally started for productive usage and have been abandoned in the
mean time?

kind regards
angela







Re: Intent to backport OAK-6784

2017-11-21 Thread Michael Dürig



On 21.11.17 15:37, Francesco Mari wrote:

I would like to backport OAK-6784 to 1.6. The Compact tool backend
swallows the exceptions thrown during its execution. The issue is about
propagating the exceptions forward, so that the tool frontend might
handle them properly.


+1

Michael


Intent to backport OAK-6931 to 1.6

2017-11-14 Thread Michael Dürig

https://issues.apache.org/jira/browse/OAK-6931

This fixes an issue in the offline compaction tool that prevented the 
cache size from being set via the command line.


Michael


Re: Oak 1.7.11 release plan

2017-11-06 Thread Michael Dürig


I made https://issues.apache.org/jira/browse/OAK-6894 a blocker. It has 
the potential to break the offline compaction tool causing data loss. We 
need to have a better understanding of the risks before we go forward. 
I'll keep you posted.


Michael

On 02.11.17 11:43, Davide Giannella wrote:

Hello team,

I'm planning to cut Oak on Monday 6th Nov.

If there are any objections please let me know. Otherwise I will
re-schedule any non-resolved issue for the next iteration.

Thanks
Davide




Re: BUILD FAILURE: Jackrabbit Oak - Build # 878 - Still Failing

2017-10-16 Thread Michael Dürig


Hi,

Recent failures seem to be caused by the JIRA plugin. See 
https://issues.apache.org/jira/browse/INFRA-15290#


I disabled that plugin for now: 
https://builds.apache.org/job/Jackrabbit%20Oak/jobConfigHistory/showDiffFiles?timestamp1=2017-10-01_03-35-21&timestamp2=2017-10-16_11-54-27


Michael

On 16.10.17 13:29, Apache Jenkins Server wrote:

The Apache Jenkins build system has built Jackrabbit Oak (build #878)

Status: Still Failing

Check console output at https://builds.apache.org/job/Jackrabbit%20Oak/878/ to 
view the results.

Changes:
[chetanm] OAK-6832 - Synchronous nodetype lucene index support

[chetanm] OAK-6831 - Nodetype index support in Lucene Index

Ensure that if jcr:primaryType is indexed then jcr:mixins is also indexed
This is required to ensure consistency when multiple index rules for different
types are defined

[chetanm] OAK-6831 - Nodetype index support in Lucene Index

-- For 'nodeTypeIndex' case disable aggregates

[chetanm] OAK-6831 - Nodetype index support in Lucene Index

-- Support for 'nodeTypeIndex' property at index definition node
-- Support for 'sync' indexRules to indicate that nodeTypeIndex for that
node is to be indexed in 'sync' mode

  


Test results:
All tests passed



Re: Oak Session save behavior

2017-09-27 Thread Michael Dürig


Hi,

As Chetan mentioned JCR sessions are synchronous. However IIUC, you are 
wrapping operations on sessions into futures in your code. So I would 
guess you would have to use the future API to determine the state of the 
wrapped operation (e.g. via a completion handler).
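
For example, something along these lines (a sketch using the standard
java.util.concurrent.CompletableFuture API against your own deleteBlob
method; the redirect and logger calls are hypothetical placeholders):

// Redirect only once the wrapped save operation has completed.
deleteBlob(blobId).whenComplete((result, error) -> {
    if (error != null) {
        // the save failed; surface the error instead of redirecting
        logger.error("delete failed", error);
    } else {
        // the save succeeded; a new session will now see the change
        redirectToUpdatedPage();
    }
});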


Michael

On 26.09.17 20:24, yogesh upadhyay wrote:

Hello,

We are working on a small CMS using Oak-Core (Version 1.6.0) with Play
framework and RDBMS as datastore.

"mysqlConnectorJava": "mysql:mysql-connector-java:5.1.38",

"oak-jcr": "org.apache.jackrabbit:oak-jcr:1.6.0",



We use the following code for session handle,

public <T> CompletableFuture<T> withSession(final FunctionWithCE<Session, T> operation) {

    final CompletableFuture<T> completableFuture = new CompletableFuture<>();
    CompletableFuture.runAsync(() -> {
        Session session = null;
        try {
            session = _repository.login(_credentials);
            completableFuture.complete(operation.apply(session));
        } catch (Exception e) {
            logger.error("Something went wrong while using session {}", e);
            completableFuture.completeExceptionally(e);
        } finally {
            if (session != null) {
                session.logout();
            }
        }
    }, (Runnable command) -> {
        ExecutionContexts.repositoryOperation().execute(command);
    });

    return completableFuture;
}



  And this is how repository component is set up

@Inject
public RepositoryComponentImpl(final ApplicationLifecycle applicationLifecycle, Database database,
        Configuration configuration) {
    final String jcrUsername = configuration.getString("jcrUsername");
    final String jcrPassword = configuration.getString("jcrPassword");
    if (jcrPassword != null || jcrUsername != null) {
        _credentials = new SimpleCredentials(jcrUsername, jcrPassword.toCharArray());
        final DocumentMK.Builder documentMkBuilder = new DocumentMK.Builder();
        documentMkBuilder.setAsyncDelay(3000);
        documentMkBuilder.setPrefetchExternalChanges(true);
        final DocumentNodeStore documentNodeStore =
                documentMkBuilder.setRDBConnection(database.getDataSource()).getNodeStore();
        _repository = new Jcr(new Oak(documentNodeStore)).createRepository();
        final CompletableFuture<Void> setupCompletable =
                withSession((Session session) -> setupRepository(session));
        setupCompletable.exceptionally(e -> {
            /*
             * If there is an exception, log the message and rethrow, otherwise it would be
             * lost in the other thread and not surfaced
             */
            logger.error(e.getMessage(), e);
            throw new CompletionException(e);
        });
        applicationLifecycle.addStopHook(() -> {
            documentNodeStore.dispose();
            return F.Promise.pure(null);
        });
    } else {
        throw new IllegalStateException("Unable to get repository credentials.");
    }
}



And this is how session is being used in save operation,

public CompletableFuture<Void> deleteBlob(String blobId) {
    return _repositoryComponent.withSession((Session session) -> {
        session.getNode("/asset/").getNode(blobId).remove();
        session.save();
        return (null);
    });
}


For read operations, we use new session again.

Now the problem we are running into is that after the save operation is
performed, there is no good way for us to know if the save was done (since
save in Oak is async). So after the save we refresh the page, and the user
sees old content.

We can provide some kind of delay on the redirect to remedy this issue, but it
won't be a good user experience.

We are wondering, if there Is any good way to find if after
"session.save()" is called, data is saved in RDBMS and now all new session
will get updated data? Or we are doing anything wrong here.


Yogesh



Re: Access to NodeStore instance within benchmark

2017-09-26 Thread Michael Dürig


I tend to agree with Davide. Especially since the use case is for 
benchmarking only. Isn't there an alternative way where the node store 
could be exposed via the benchmark's fixture?


Michael

On 25.09.17 11:00, Davide Giannella wrote:

On 25/09/2017 07:05, Chetan Mehrotra wrote:

One way can be to expose NodeStore instance from the Oak class. Would
that be ok to do?


I don't like it very much as it would make it too easy to access the
NodeStore from a consumer pov. `Oak` itself is already low level
therefore I'm not strongly objecting for adding such feature. However as
the Oak object accept the NodeStore already as part of the constructor I
would see if it is feasible to keep a reference to the NodeStore in the
AbstractTest itself and use it during repository construction.

HTH
Davide




Re: chroot-like content segregation?

2017-09-21 Thread Michael Dürig


Hi,

I agree that on the NodeStore level this is probably easy enough. 
However, mind you that the JCR semantics demand *a lot* of shared 
global state, all of which is implemented on top of the NodeStore. It is 
this global state that complicated the composite store implementation 
and in fact required drastic limitations in some places.


If the segregation is security relevant we probably need to come up with 
similar limitations in order to properly sandbox the individual parts. 
E.g. don't leak through observation events, indexes, node type 
constraints leading to exceptions that could leak sensitive information. 
Avoid information exposure through the version store. Etc, etc...


Michael


On 22.09.17 07:03, Tomek Rekawek wrote:

Hello Bertrand,

this seems like an opposite of the composite node store - rather than combining 
multiple repositories together, we’re trying to split one repository into many 
jails. Maybe I’m too optimistic, but I think the implementation should be quite 
easy if done on the node store level.

The node states / builders - the basic objects representing the data on the 
lowest abstraction level - don’t know anything about their parents and their 
paths. The API client just calls NodeStore#getRoot() and gets a node state 
representing the root. If we have the JailedNodeStore, it can go to the 
underlying segment- or document node store and return basically any node (eg. 
/jails/foo). The node store implementation have to take care of transforming it 
back to the right place when the client calls NodeStore#merge().

For instance, the structure for the SegmentMK repository is as follows:

/
/root
/root/content
/root/home
/root/...
/checkpoints/foo/root
/checkpoints/bar/root

Where the /root represents the actual repository root and the /checkpoints 
subtree represents the checkpoints and is not accessible directly. This shows 
how easy it is to return some part of the tree as the root and block the API 
client from accessing other parts lying higher up.

Regards,
Tomek
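
To make the idea concrete, here is a minimal and entirely hypothetical
sketch of the read path (JailedNodeStore is not an existing class;
NodeStore and NodeState are the real Oak SPI types, and merge(),
checkpoints, etc. are omitted):

// Hypothetical: expose /jails/<name> of the underlying store as the root.
public class JailedNodeStore {
    private final NodeStore delegate;   // the real segment/document store
    private final String jailName;      // e.g. "foo" for /jails/foo

    public JailedNodeStore(NodeStore delegate, String jailName) {
        this.delegate = delegate;
        this.jailName = jailName;
    }

    public NodeState getRoot() {
        return delegate.getRoot().getChildNode("jails").getChildNode(jailName);
    }
}

On merge(), the changes made below this jailed root would have to be
rebased back under /jails/<name> before being applied to the underlying
store.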



Re: Segment Store GC failing on Windows

2017-09-13 Thread Michael Dürig


Hi,

This is a known issue for some Windows environments. A workaround is to 
set tarmk.mode=32 in the configuration of the SegmentNodeStoreService. 
See also the "Tar storage" section at 
https://helpx.adobe.com/experience-manager/kb/performance-tuning-tips.html.
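
For reference, a sketch of what that workaround could look like as an OSGi
configuration (PID and property name as discussed above; verify against
your Oak version before applying):

# org.apache.jackrabbit.oak.segment.SegmentNodeStoreService.config
tarmk.mode="32"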


Michael


On 13.09.17 14:27, Yegor Kozlov wrote:

Hi

Every time Segment Store GC runs I get a bunch of these exceptions:

04.09.2017 07:41:53.157 *WARN* [TarMK filer reaper
[C:\Users\yegor\aem\segmentstore]]
org.apache.jackrabbit.oak.segment.file.FileReaper Unable to remove file
C:\Users\yegor\aem\segmentstore\data00163a.tar
java.nio.file.FileSystemException:
E:\Inetpub\adobe\aem\authorrepository\repository\segmentstore\data00163a.tar:
The process cannot access the file because it is being used by another
process.
at
sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
at
sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
at
sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102)
at
sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:269)
at
sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(AbstractFileSystemProvider.java:108)
at java.nio.file.Files.deleteIfExists(Files.java:1165)
at
org.apache.jackrabbit.oak.segment.file.FileReaper.reap(FileReaper.java:73)
at
org.apache.jackrabbit.oak.segment.file.FileStore$3.run(FileStore.java:245)
at
org.apache.jackrabbit.oak.segment.file.SafeRunnable.run(SafeRunnable.java:67)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
...


I'm running Oak 1.6.2 from AEM 6.3 SP1 on 64-bit Windows 7. The TarMK mode
is 64 which defaults from the 'sun.arch.data.model' system property.
The problem is reproducible on Windows Server 2012 as well.

It smells like I'm hitting OAK-4274. I perfectly understand it is a JDK
issue and it cannot be fixed from the Oak code. My point is that
SegmentNodeStoreService picks up a wrong TarMK mode on Windows. In the
current implementation the default value is taken from the
'sun.arch.data.model' system property which is always 64 on 64-bit
platforms. IMO it should always default to 32 on Windows and use
'sun.arch.data.model' on other operating systems.


Regards,
Yegor



Intent to backport OAK-6110 to 1.6

2017-09-11 Thread Michael Dürig


Hi,

I intend to backport OAK-6110 to Oak 1.6. See OAK-6642 for a patch.
The fix is very simple and showed big performance and scalability 
improvements when compacting repositories containing nodes with many siblings.


Michael


Re: OAK-6575 - A word of caution

2017-09-07 Thread Michael Dürig




See https://github.com/mduerig/jackrabbit-oak/commit/2709c097b01a006784b7011135efcbbe3ce1ba88
for a *really* quickly hacked together and entirely untested POC. But it
should get the idea across.




Thank you.
That makes sense.
I think it only needs the
java/org/apache/jackrabbit/oak/blob/cloud/aws/s3/CloudFrontS3SignedUrlAdapterFactory.java
and the API to be inside Oak; everything else can be in Sling.
I'll update my patch and do the 2 options for Sling.




https://github.com/ieb/jackrabbit-oak/compare/trunk...ieb:OAK-6575-3?expand=1

and

https://github.com/apache/sling/compare/trunk...ieb:OAK-6575-3?expand=1

wdyt ?
Obviously the second patch needs to be discussed with Sling dev, but is
should not be too contentious.



+1. I think this approach is lean and mean. Thanks for figuring out the 
things I left out in my initial cruft.


Michael


Re: OAK-6575 - A word of caution

2017-09-06 Thread Michael Dürig



On 06.09.17 23:08, Michael Dürig wrote:


Hi,

On 05.09.17 14:09, Ian Boston wrote:

Repeating the comment to on OAK-6575 here for further discussion. 2 new
Patches exploring both options.


I would actually prefer the original patch 
(https://github.com/ieb/jackrabbit-oak/compare/trunk...ieb:OAK-6575?expand=1) 
in most parts. However I have concerns regarding the generality of the 
new OakConversionService API as mentioned in my previous mail. I would 
be more comfortable if this could be restricted to something that 
resembles more like a "URIProvider", which given a blob returns an URI.


On the implementation side, why do we need to introduce the adaptable 
machinery? Couldn't we re-use the Whiteboard and OSGiWhiteBoard 
mechanisms instead? I think these could be used to track URIProvider 
instances registered by the various blob stores.




See 
https://github.com/mduerig/jackrabbit-oak/commit/2709c097b01a006784b7011135efcbbe3ce1ba88 
for a *really* quickly hacked together and entirely untested POC. But it 
should get the idea across.


Michael


Re: SegmentNodeStore documentation

2017-09-06 Thread Michael Dürig



On 06.09.17 22:27, Jörg Hoh wrote:

Hi Oak-Devs

I wonder about the documentation at
http://jackrabbit.apache.org/oak/docs/nodestore/segmentmk.html

In the section "Node records" it's stated:

"A node that contains more than N properties or M child nodes (exact size
TBD, M ~ 1k) is stored differently, using map records for the properties
and child nodes. This way a node can become arbitrarily large and still
remain reasonably efficient to access and modify. The main downside of this
alternative storage layout is that the ordering of child nodes is lost."


Is this a valid statement for Oak 1.x? I would be really surprised that,
starting at an arbitrary number of child nodes, the ordering of the child
nodes gets lost (for performance reasons). Can someone confirm?


Yes this is a valid statement for all versions of Oak since 1.0.

I think you are misreading it a bit. The sentence "The main downside of 
this alternative storage layout is that the ordering of child nodes is 
lost." does not imply there is some ordering and after a number of nodes 
there is no more ordering. It means that the implementation does not 
make any guarantee regarding the ordering.


Moreover this statement is in no way related to orderability of child 
nodes in JCR. This is a different concern implemented in a layer higher 
up the stack.
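
For completeness, child node ordering at the JCR level is requested
through the node type and manipulated via the standard API, e.g. (a
sketch; the path is hypothetical and the parent must be of an orderable
type such as nt:unstructured):

// JCR-level ordering, independent of the segment storage layout above.
Node parent = session.getNode("/content/list");   // hypothetical path
parent.orderBefore("b", "a");                     // move child 'b' before child 'a'
session.save();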


Michael




Re: OAK-6575 - A word of caution

2017-09-06 Thread Michael Dürig


Hi,

On 05.09.17 14:09, Ian Boston wrote:

Repeating the comment to on OAK-6575 here for further discussion. 2 new
Patches exploring both options.


I would actually prefer the original patch 
(https://github.com/ieb/jackrabbit-oak/compare/trunk...ieb:OAK-6575?expand=1) 
in most parts. However I have concerns regarding the generality of the 
new OakConversionService API as mentioned in my previous mail. I would 
be more comfortable if this could be restricted to something that 
resembles more like a "URIProvider", which given a blob returns an URI.


On the implementation side, why do we need to introduce the adaptable 
machinery? Couldn't we re-use the Whiteboard and OSGiWhiteBoard 
mechanisms instead? I think these could be used to track URIProvider 
instances registered by the various blob stores.


Michael
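
To illustrate, a minimal sketch of such a "URIProvider" (a hypothetical
interface at this point; blob store implementations would register
instances of it via the whiteboard):

// Hypothetical SPI: given a blob, return a (possibly signed, short-lived)
// URI for direct access, or null if this provider cannot resolve it.
public interface URIProvider {
    @Nullable
    URI toURI(Blob blob);
}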


Re: OAK-6575 - A word of caution

2017-09-06 Thread Michael Dürig



On 06.09.17 13:59, Ian Boston wrote:

package for each new conversion Oak supported, greatly simplifying
dependencies downstream, especially where the source and target classes
already exist.

If a concrete method is used, the package will need to be versioned
every time. I suspect OSGi rules will require a minor version number
increment each time, which is going to make a downstream developer's life
painful.

In addition, if an implementation bundle in Oak decides it wants to
optionally support a conversion, it won't need to version the Oak API to
achieve that. With concrete methods, every change, wherever it is and
however experimental, will require a new version of the Oak API.

This was the reason for going for a wildcard method. It allows extension
without any downstream disruption, missing dependencies or out of band
dependencies.

I think this boils down to how much disruption Oak wants to inflict
downstream to get new capabilities added, or inversely, how open Oak is to
requests for API changes from downstream?


I agree with everything said here, but there is another way to look at 
it too: this is sidestepping backward compatibility concerns by moving 
from a statically typed API to a dynamically typed one, effectively 
evading semantic versioning. This eases development and deployment but 
might push problems further out to production.


For this reason (and as I said before) introducing such a general API is 
a significant change and would deserve a discussion on its own revolving 
around the pure API change, mostly decoupled from the "secure URI" 
issue, maybe mentioning it as a use case.






Re: OAK-6575 - A word of caution

2017-09-04 Thread Michael Dürig



On 04.09.17 16:57, Ian Boston wrote:

Hi,
IIUC There are 2 patterns:

1 Emitting a short lived signed URL as per the AWS CloudFront recommended
method of serving private content.


I think this is an area where your patch made a lot of progress. From 
your description my initial concerns in this area have been mostly 
addressed (e.g. no more leakage of implementation details). A missing 
bit would be clarification of how this feature interacts with other 
features like active blob deletion / garbage collection. However I think 
it is fine to address those via documentation once we have a full 
understanding.




2 An Oak internal AdapterFactory/AdapterManager pattern to avoid Oak API
changes.


This is the area where some of us were taken by surprise. I'm fine with 
the current solution ("the conversion service") even though I would have 
preferred "the OakValueFactory" approach, which IMO is more consistent 
with the rest of the Oak API. My main concern with this kind of API is 
that it in a way circumvents static type safety and along with it 
static code analysis (e.g. re. backwards compatibility). OTOH Sling is 
using similar patterns successfully a lot and moving closer to our main 
API consumer is not a bad thing. Also IIUC the conversion service might 
in future line up nicely with (and maybe even be replaced by) similar 
OSGi functionality!?


Michael



Would you be willing state your concerns for each one separately ?

Best Regards
Ian

On 4 September 2017 at 15:43, Francesco Mari 
wrote:


I'm in no position to veto the POC and I'm not willing to. I am well
aware of the importance of this feature. I expressed my concerns and
so did others. As the subject of this thread clearly stated, I only
wanted to point out that I had the feeling that we had a "reboot" of
the conversation for no good reason, with the unpleasant side effect
of proposing once again a pattern that received a lot of criticism in
the past.

On Mon, Sep 4, 2017 at 4:18 PM, Bertrand Delacretaz
 wrote:

On Mon, Sep 4, 2017 at 3:44 PM, Ian Boston  wrote:

...I feel
that Oak is weaker without the ability to offload bulk data streaming to
infrastructure designed for that purpose


FWIW as an Oak user I share that feeling, IMO the use cases described
at https://wiki.apache.org/jackrabbit/JCR%20Binary%20Usecase are
becoming more and more important.

Not being a committer I don't really care about the internals, but
please do not "throw the baby out with the bath water" if the
internals need to change.

-Bertrand






Re: OAK-6575 - A word of caution

2017-09-04 Thread Michael Dürig


Hi,

I think the discussion did move forward between the various issues, but 
this might have been obfuscated because several topics were discussed 
at the same time. To me the two main topics were the exposure of a URI 
for binaries and an API to expose this from Oak.


In the meanwhile I just noted that Ian replied along the same lines 
already. I will follow up there re. these separate concerns.


Michael




On 04.09.17 16:43, Francesco Mari wrote:

I'm in no position to veto the POC and I'm not willing to. I am well
aware of the importance of this feature. I expressed my concerns and
so did others. As the subject of this thread clearly stated, I only
wanted to point out that I had the feeling that we had a "reboot" of
the conversation for no good reason, with the unpleasant side effect
of proposing once again a pattern that received a lot of criticism in
the past.

On Mon, Sep 4, 2017 at 4:18 PM, Bertrand Delacretaz
 wrote:

On Mon, Sep 4, 2017 at 3:44 PM, Ian Boston  wrote:

...I feel
that Oak is weaker without the ability to offload bulk data streaming to
infrastructure designed for that purpose


FWIW as an Oak user I share that feeling, IMO the use cases described
at https://wiki.apache.org/jackrabbit/JCR%20Binary%20Usecase are
becoming more and more important.

Not being a committer I don't really care about the internals, but
please do not "throw the baby out with the bath water" if the
internals need to change.

-Bertrand


Re: OAK-6575 - A word of caution

2017-09-04 Thread Michael Dürig



On 04.09.17 16:18, Bertrand Delacretaz wrote:

On Mon, Sep 4, 2017 at 3:44 PM, Ian Boston  wrote:

...I feel
that Oak is weaker without the ability to offload bulk data streaming to
infrastructure designed for that purpose


FWIW as an Oak user I share that feeling, IMO the use cases described
at https://wiki.apache.org/jackrabbit/JCR%20Binary%20Usecase are
becoming more and more important.

Not being a committer I don't really care about the internals, but
please do not "throw the baby out with the bath water" if the
internals need to change.



I don't think we are doing that. There is progress on the matter but 
since there are many forces it is slow and non-linear. However, I 
actually prefer this over a rushed solution everybody regrets in a 
couple of months from now.


Michael


Re: reading the checkpoint metadata in the SegmentMK

2017-08-29 Thread Michael Dürig


This thread doesn't seem to want to stay on oak-dev@ ;-). I guess this 
was my fault. Including the part of the conversation that dropped off 
the list below.




On 29.08.17 11:18, Tomek Rekawek wrote:

Hi Michael,

these methods don't provide the information about the creation and expiration 
times. I need those to clone the checkpoints in the sidegrade.


Right, got it. So the discussion to have is whether we should reflect 
that information through the checkpoint properties. But let's separate 
this from this issue assuming you are fine with the CheckpointAccessor 
approach for now.


I'll follow up with an issue/discussion regarding the checkpoint 
properties.


Michael
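
For reference, reading checkpoint data through the existing public SPI
mentioned in the quoted mail below looks roughly like this (a sketch;
nodeStore is any org.apache.jackrabbit.oak.spi.state.NodeStore, and
checkpointInfo() only returns the user-supplied properties, not the
creation/expiry timestamps discussed here):

// Enumerate all checkpoints and their (user-supplied) metadata.
for (String checkpoint : nodeStore.checkpoints()) {
    Map<String, String> info = nodeStore.checkpointInfo(checkpoint);
    System.out.println(checkpoint + " -> " + info);
}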





Regards,
Tomek

-- Tomek Rękawek | Adobe Research | www.adobe.com reka...@adobe.com

On 29 Aug 2017, at 08:56, Michael Dürig  wrote:


Hi,

Looking at org.apache.jackrabbit.oak.upgrade.checkpoint.CheckpointRetriever I 
wonder whether this cannot be implemented on top of already existing APIs: 
org.apache.jackrabbit.oak.spi.state.NodeStore#checkpoints and 
org.apache.jackrabbit.oak.spi.state.NodeStore#checkpointInfo?

Michael

On 28.08.17 12:41, Tomek Rekawek wrote:

Hello,
the migration code requires access to the checkpoint metadata: the creation and 
expiry timestamps. They can be read by accessing the checkpoints root node 
(using the method mentioned in the subject). However, the method is 
package-scoped. Can we make it public, so the other modules can use it as well?
Alternatively, we may introduce some general way to read that data for all 
NodeStore implementations. Maybe some extra properties in the checkpoint 
properties?
Regards,
Tomek




Re: reading the checkpoint metadata in the SegmentMK

2017-08-28 Thread Michael Dürig


Hi,

I would prefer to not make the 
org.apache.jackrabbit.oak.segment.SegmentNodeStore#getCheckpoints method 
public as this would bind us to a contract on how to store the metadata. 
Maybe we could add some API that is agnostic to the storage format?


Michael

On 28.08.17 12:41, Tomek Rekawek wrote:

Hello,

the migration code requires access to the checkpoint metadata: the creation and 
expiry timestamps. They can be read by accessing the checkpoints root node 
(using the method mentioned in the subject). However, the method is 
package-scoped. Can we make it public, so the other modules can use it as well?

Alternatively, we may introduce some general way to read that data for all 
NodeStore implementations. Maybe some extra properties in the checkpoint 
properties?

Regards,
Tomek




Fwd: reading the checkpoint metadata in the SegmentMK

2017-08-28 Thread Michael Dürig


Forwarding from jackrabbit-dev. As the Oak list is probably the right 
place to discuss this.


 Forwarded Message 
Subject: reading the checkpoint metadata in the SegmentMK
Date: Mon, 28 Aug 2017 10:41:24 +
From: Tomek Rekawek 
Reply-To: d...@jackrabbit.apache.org, Tomek Rekawek 
To: d...@jackrabbit.apache.org 

Hello,

the migration code requires access to the checkpoint metadata: the 
creation and expiry timestamps. They can be read by accessing the 
checkpoints root node (using the method mentioned in the subject). 
However, the method is package-scoped. Can we make it public, so the 
other modules can use it as well?


Alternatively, we may introduce some general way to read that data for 
all NodeStore implementations. Maybe some extra properties in the 
checkpoint properties?


Regards,
Tomek


--
Tomek Rękawek | Adobe Research | www.adobe.com
reka...@adobe.com



Re: Notification mails on pull request on github

2017-08-25 Thread Michael Dürig



On 25.08.17 09:47, Chetan Mehrotra wrote:

Hi,

Is anyone getting mail notification for any pull request sent on
https://github.com/apache/jackrabbit-oak/. I see such mails for Sling
but not for Oak



Maybe because https://github.com/asfbot is not watching the Oak repo? I 
checked other ASF mirrors and asfbot is watching all of those.


Michael


Re: OAK-6575 - Provide a secure external URL to a DataStore binary.

2017-08-24 Thread Michael Dürig



On 24.08.17 15:33, Chetan Mehrotra wrote:

Inside Oak it would have its own version of an AdapterManager,
AdapterFactory. the DataStore would implement an AdapterFactory and
register it with the AdapterManager. The OakConversionService
implementation would then use the AdapterManager to perform the conversion.
If no AdapterFactory to adapt from JCR Binary to URI existed, then null
would be returned from the OakConversionService.

That's no API changes to Blob, binary or anything. No complex transformation
through multiple layers. No instanceof required and no difference between
Sling and non Sling usage.
It does require an Oak version of the AdapterManager and AdapterFactory
concepts, but does not require anything to implement Adaptable.


Thanks for those details. Much clearer now. So with this we need not add
adaptTo to all the stuff. Instead we provide an OakConversionService which
converts the Binary to the provided type and have the DataStores provide
the AdapterFactory.

This would indeed avoid any new methods in existing objects and
provide a single entry point.

+1 for this approach


Yay!

Michael
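
For illustration, the Oak-internal adapter machinery being agreed on here
could look roughly like this (all names are hypothetical at this point;
nothing of this exists in Oak yet):

// A DataStore registers one of these for the conversions it supports.
public interface AdapterFactory {
    // Returns 'source' adapted to 'type', or null if not supported.
    <T> T adaptTo(Object source, Class<T> type);
}

// Single entry point consulted by API consumers; delegates to all
// registered AdapterFactory instances.
public interface OakConversionService {
    <T> T convert(Object source, Class<T> type);
}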



Chetan Mehrotra


On Thu, Aug 24, 2017 at 6:16 AM, Ian Boston  wrote:

Hi,
I am probably not helping here as there are several layers and I think
they are getting confused between what I am thinking and what you are
thinking.

I was thinking Oak exposed a service to convert along the lines of the OSGi
converter service or the OakConversionService suggested earlier. Both Sling
and other uses of Oak would use it.

Inside Oak it would have its own version of an AdapterManager,
AdapterFactory. the DataStore would implement an AdapterFactory and
register it with the AdapterManager. The OakConversionService
implementation would then use the AdapterManager to perform the conversion.
If no AdapterFactory to adapt from JCR Binary to URI existed, then null
would be returned from the OakConversionService.

That's no API changes to Blob, binary or anything. No complex transformation
through multiple layers. No instanceof required and no difference between
Sling and non Sling usage.
It does require an Oak version of the AdapterManager and AdapterFactory
concepts, but does not require anything to implement Adaptable.

As I showed in the PoC, all the S3 specific implementation fits inside the
S3DataStore which already does everything required to perform the
conversion. It already goes from Binary -> Blob -> ContentIdentifier -> S3
Key -> S3 URL by virtue of
ValueImpl.getBlob((Value)jcrBinary).getContentIdentifier() -> convert to
S3key and then signed URL.

If it would help, I can do a patch to show how it works.
Best Regards
Ian

On 24 August 2017 at 13:05, Chetan Mehrotra 
wrote:


No API changes to any existing Oak APIs,


Some API needs to be exposed. Note again Oak does not depend on Sling
API. Any such integration code is implemented in Sling Base module
[1]. But that module would still require some API in Oak to provide
such an adaptor

The adaptor proposal here is for enabling layers within Oak to allow
conversion of JCR Binary instance to SignedBinary. Now how this is
exposed to end user depends on usage context

Outside Sling
--

Check if binary instanceof Oak Adaptable. If yes then cast it and adapt it

import org.apache.jackrabbit.oak.api.Adaptable;

Binary b = ...;
SignedBinary sb = null;
if (b instanceof Adaptable) {
    sb = ((Adaptable) b).adaptTo(SignedBinary.class);
}




Within Sling


Have an AdapterManager implemented in Sling JCR Base [1] which uses
above approach

Chetan Mehrotra
[1] https://github.com/apache/sling/tree/trunk/bundles/jcr/base


On Thu, Aug 24, 2017 at 4:55 AM, Ian Boston  wrote:

 From the javadoc in [1]

"The adaptable object may be any non-null object and is not required to
implement the Adaptable interface."


On 24 August 2017 at 12:54, Ian Boston  wrote:


Hi,
That would require javax.jcr.Binary to implement Adaptable, which it can't.
(OakBinary could, but it doesn't need to).

Using Sling AdapterFactory/AdapterManager javadoc (to be replaced with Oak's
internal version of the same):

What is needed is an AdapterFactory for javax.jcr.Binary to SignedBinary
provided by the S3DataStore itself.

Since javax.jcr.Binary can't extend Adaptable, it's not possible to call
binary.adaptTo(SignedBinary.class) without a cast; hence
the call is done via the AdapterManager [1]:

SignedBinary signedBinary = adapterManager.getAdapter(binary, SignedBinary.class);

---
You could just jump to
URI uri =  adapterManager.getAdapter(binary, URI.class);

No API changes to any existing Oak APIs,

Best Regards
Ian


1 https://sling.apache.org/apidocs/sling5/org/apache/sling/api/adapter/AdapterManager.html



On 24 August 2017 at 12:38, Chetan Mehrotra 
wrote:


various layers involved. The bit I don't understand is how the adaptable
pattern would make those go away. To me that pattern is just another way to
implement this but it would also need to deal with all those layers.


Re: OAK-6575 - Provide a secure external URL to a DataStore binary.

2017-08-24 Thread Michael Dürig



On 24.08.17 14:47, Chetan Mehrotra wrote:

Which circles back to my initial concern: "According to YAGNI we should stick with 
instanceof checks unless we already have a somewhat clear picture of future 
extensions."


I thought that with all those discussion around JCR Usecases for past
some time we have an agreement for such cases (specially UC3 and UC4).
Hence the push for this approach to enable further work on them going
forward.


I wasn't referring to those use cases but to the adapter pattern. That 
pattern as it was proposed here is far more general than what is 
required for the problem at hand. It represents a paradigm shift from 
what we used to do in Oak. That's why I asked for a "somewhat clear 
picture of future extensions". Introducing the adapter pattern in its 
full generality would IMO even deserve an own discussion thread instead 
of being piggy backed on the secure URL discussion.


Michael




Chetan Mehrotra


On Thu, Aug 24, 2017 at 5:41 AM, Michael Dürig  wrote:



On 24.08.17 14:32, Chetan Mehrotra wrote:


Why not just add a method Blob.getSignedURI()? This would be inline with
getReference() and what we have done with ReferenceBinary.



Can be done. But later if we decide to support adapting to say
FileChannel [1] then would we be adding that to Blob. Though it may
not be related to different Blob types.

Having adaptable support allows to extend this later with minimal changes.



Which circles back to my initial concern: "According to YAGNI we should
stick with instanceof checks unless we already have a somewhat clear
picture of future extensions."

Michael




Chetan Mehrotra
[1] https://wiki.apache.org/jackrabbit/JCR%20Binary%20Usecase#UC4


On Thu, Aug 24, 2017 at 5:25 AM, Michael Dürig  wrote:




On 24.08.17 13:38, Chetan Mehrotra wrote:



various layers involved. The bit I don't understand is how the adaptable
pattern would make those go away. To me that pattern is just another way to
implement this but it would also need to deal with all those layers.




Yes, this adapter support would need to be implemented at all layers.

So call to
1. binary.adaptTo(SignedBinary.class) //binary is JCR Binary
2. results in blob.adaptTo(SignedBinary.class) //blob is Oak Blob.
Blob interface would extend adaptable




Why not just add a method Blob.getSignedURI()? This would be inline with
getReference() and what we have done with ReferenceBinary.

Michael



3. results in SegmentBlob delegating to BlobStoreBlob which
4. delegates to BlobStore // Here just passing the BlobId
5. which delegates to DataStoreBlobStore
6. which delegates to S3DataStore
7. which returns the SignedBinary implementation

However, adapter support would allow us to make this instanceof check
extensible. Otherwise we would be hardcoding the instanceof check to
SignedBinary at each of the above places, though those layers need not
be aware of SignedBinary support (it's specific to the S3 impl).









Re: OAK-6575 - Provide a secure external URL to a DataStore binary.

2017-08-24 Thread Michael Dürig



On 24.08.17 14:32, Chetan Mehrotra wrote:

Why not just add a method Blob.getSignedURI()? This would be inline with 
getReference() and what we have done with ReferenceBinary.


Can be done. But later if we decide to support adapting to say
FileChannel [1] then would we be adding that to Blob. Though it may
not be related to different Blob types.

Having adaptable support allows to extend this later with minimal changes.


Which circles back to my initial concern: "According to YAGNI we should
stick with instanceof checks unless we already have a somewhat clear
picture of future extensions."


Michael




Chetan Mehrotra
[1] https://wiki.apache.org/jackrabbit/JCR%20Binary%20Usecase#UC4


On Thu, Aug 24, 2017 at 5:25 AM, Michael Dürig  wrote:



On 24.08.17 13:38, Chetan Mehrotra wrote:


various layers involved. The bit I don't understand is how the adaptable
pattern would make those go away. To me that pattern is just another way to
implement this but it would also need to deal with all those layers.



Yes, this adapter support would need to be implemented at all layers.

So call to
1. binary.adaptTo(SignedBinary.class) //binary is JCR Binary
2. results in blob.adaptTo(SignedBinary.class) //blob is Oak Blob.
Blob interface would extend adaptable



Why not just add a method Blob.getSignedURI()? This would be inline with
getReference() and what we have done with ReferenceBinary.

Michael



3. results in SegmentBlob delegating to BlobStoreBlob which
4. delegates to BlobStore // Here just passing the BlobId
5. which delegates to DataStoreBlobStore
6. which delegates to S3DataStore
7. which returns the SignedBinary implementation

However, adapter support would allow us to make this instanceof check
extensible. Otherwise we would be hardcoding the instanceof check to
SignedBinary at each of the above places, though those layers need not
be aware of SignedBinary support (it's specific to the S3 impl).






Re: OAK-6575 - Provide a secure external URL to a DataStore binary.

2017-08-24 Thread Michael Dürig



On 24.08.17 13:38, Chetan Mehrotra wrote:

various layers involved. The bit I don't understand is how the adaptable
pattern would make those go away. To me that pattern is just another way to
implement this but it would also need to deal with all those layers.


Yes, this adapter support would need to be implemented at all layers.

So call to
1. binary.adaptTo(SignedBinary.class) //binary is JCR Binary
2. results in blob.adaptTo(SignedBinary.class) //blob is Oak Blob.
Blob interface would extend adaptable


Why not just add a method Blob.getSignedURI()? This would be inline with 
getReference() and what we have done with ReferenceBinary.


Michael


3. results in SegmentBlob delegating to BlobStoreBlob which
4. delegates to BlobStore // Here just passing the BlobId
5. which delegates to DataStoreBlobStore
6. which delegates to S3DataStore
7. which returns the SignedBinary implementation

However, adapter support would allow us to make this instanceof check
extensible. Otherwise we would be hardcoding the instanceof check to
SignedBinary at each of the above places, though those layers need not
be aware of SignedBinary support (it's specific to the S3 impl).





Re: OAK-6575 - Provide a secure external URL to a DataStore binary.

2017-08-24 Thread Michael Dürig



On 24.08.17 13:54, Ian Boston wrote:

You could just jump to
URI uri =  adapterManager.getAdapter(binary, URI.class);

No API changes to any existing Oak APIs,


+1, I think this is what we should aim for.

Michael


Re: OAK-6575 - Provide a secure external URL to a DataStore binary.

2017-08-24 Thread Michael Dürig


I understand the difficulties of implementing this due to the
various layers involved. The bit I don't understand is how the adaptable
pattern would make those go away. To me that pattern is just another way 
to implement this but it would also need to deal with all those layers.


Michael

On 24.08.17 11:08, Chetan Mehrotra wrote:

As explained in the previous mail, the adaptable pattern requirement is to
enable such support within Oak itself, due to the multiple layers
involved.


If it doesn't exist then perhaps Oak could add a Service interface that
deals with conversions, rather than exposing a second adaptable pattern in
Sling, or requiring type casting and instanceof.


We can also expose such a service if that helps. That service
implementation would internally have to use the adaptable pattern anyway:

public class OakConversionService {
    public <AdapterType> AdapterType adaptTo(Binary b, Class<AdapterType> type) {
        if (b instanceof Adaptable) {
            return ((Adaptable) b).adaptTo(type);
        }
        return null;
    }
}

So just another level of abstraction.

Chetan Mehrotra



Re: OAK-6575 - Provide a secure external URL to a DataStore binary.

2017-08-24 Thread Michael Dürig



On 24.08.17 09:27, Ian Boston wrote:

On 24 August 2017 at 08:18, Michael Dürig  wrote:




URI uri = ((OakValueFactory) valueFactory).getSignedURI(binProp);



+1

One point
Users in Sling don't know about Oak, they know about JCR.

URI uri = ((OakValueFactory)
valueFactory).getSignedURI(jcrNode.getProperty("jcr:data"));

No new APIs; let OakValueFactory work it out and return null if it can't do
it. It should also handle a null parameter.
(I assume OakValueFactory already exists)


No, OakValueFactory does not exist as API (yet). But adding it would be 
more inline with how we approached the Oak API traditionally.


I'm not against introducing the adaptable pattern but would like to 
understand whether there is concrete enough use cases beyond the current 
one to warrant it.


Michael



If you want to make it extensible:

    <T> T convertTo(Object source, Class<T> target);

used as

URI uri = ((OakValueFactory)
valueFactory).convertTo(jcrNode.getProperty("jcr:data"), URI.class);

The user doesn't know or need to know the URI is signed; it needs a URI that
can be resolved.
Oak wants it to be signed.

Best Regards
Ian




Michael





A rough sketch of any alternative proposal would be helpful to decide
how to move forward

Chetan Mehrotra






Re: OAK-6575 - Provide a secure external URL to a DataStore binary.

2017-08-24 Thread Michael Dürig



On 24.08.17 09:06, Chetan Mehrotra wrote:

I think the discussion about the adapter pattern is orthogonal to the binary


For me it's tied to how you are going to implement this support. The
adaptable pattern is one way, based on my current understanding of Oak's
design.

At the level of ValueFactory.getBinary we do not know if the Blob can
provide a signed URL. That is deep down in the layering: JCR Binary -> Oak
Blob -> DataStoreBlob -> S3DataStore DataRecord. So a Blob by itself cannot
provide a signed URL; it depends on the backing DataStore. This can be
easily supported via the adapter pattern, where the JCR layer tries to
adapt and the final backing BlobStore impl decides whether to provide the
adaptation.

I do not see how instanceof checks can be expressed across all these layers.


Fair point. So this is more about dynamic adaptability than future
extendibility. But AFAIU this could still be achieved without the full
adaptable machinery:


if (binProp instanceof SignableBin) {
    URI uri = ((SignableBin) binProp).getSignedURI();
    if (uri != null) {
        // resolve URI etc.
    }
}

Or alternatively something along the lines of:

URI uri = ((OakValueFactory) valueFactory).getSignedURI(binProp);


Michael




A rough sketch of any alternative proposal would be helpful to decide
how to move forward

Chetan Mehrotra



Re: OAK-6575 - Provide a secure external URL to a DataStore binary.

2017-08-23 Thread Michael Dürig


Hi,

I think the discussion about the adapter pattern is orthogonal to the
binary issue. According to YAGNI we should stick with instanceof checks
unless we already have a somewhat clear picture of future extensions.


Michael

On 24.08.17 07:28, Chetan Mehrotra wrote:

Based on the feedback so far below is revised proposal

1. Define a new Adaptable interface in 'org.apache.jackrabbit.oak.api'

public interface Adaptable {

    /**
     * Adapts the binary to another type.
     *
     * @param <AdapterType> The generic type to which this type is adapted to
     * @param type The Class object of the target type
     * @return The adapter target or null if the type cannot
     *         adapt to the requested type
     */
    <AdapterType> AdapterType adaptTo(Class<AdapterType> type);
}

2. Have the binary implementation in Oak implement Adaptable
3. Have a minimal implementation in Oak on line of Sling Adaptor support [1]

For current usecase we would provide an adaptation to SignedBinary

public interface SignedBinary {

    URI getUri();
}

Chetan Mehrotra

[1] 
https://github.com/apache/sling/tree/trunk/bundles/api/src/main/java/org/apache/sling/api/adapter


On Wed, Aug 23, 2017 at 10:04 PM, Chetan Mehrotra
 wrote:

Hence, why not simply use  binaryProp instanceof SignedBinary ?


As Julian mentioned, it would be tricky to support multiple
extensions with various permutations. Having adapter support would
simplify the implementation.


No client should be issued a signed url that could be used in the distant
(relatively) future bypassing fresh ACL constraints saved to Oak.


Fair point. Then let's drop the ttl parameter.

Chetan Mehrotra


Re: [VOTE] Release Apache Jackrabbit Oak 1.7.6

2017-08-22 Thread Michael Dürig



On 22.08.17 08:52, Amit Jain wrote:

[x] +1 Release this package as Apache Jackrabbit Oak 1.7.6


+1

Michael


Re: Backporting cold standby chunking to 1.6.x?

2017-08-21 Thread Michael Dürig


Same here. Let's wait for a concrete case. Hopefully by then that
feature will already have had a bit of "real world" coverage.


Michael


On 21.08.17 09:12, Francesco Mari wrote:

I wouldn't backport unless strictly necessary. In my opinion, this is
not a bug but an improvement.

On Mon, Aug 21, 2017 at 9:03 AM, Andrei Dulceanu
 wrote:

Hi all,

With [0] and [1] blob chunking in cold standby was addressed in 1.8. I
think now we have a stable and robust solution which got rid of the 2.14
GB/blob limitation. As a positive side-effect, the memory footprint needed
for a successful sync of a big blob was reduced considerably. While previously
4GB of heap memory were needed for syncing a 1GB blob, now only 512MB are
needed for the same operation.

Considering all the above, I was wondering if it would make sense to
backport these fixes to 1.6.x. I know that traditionally we only backport
bug fixes, but depending on how you look at it, the limitation was also
kind of a bug :). I was only considering 1.6.x as a candidate branch
because the cold standby code in 1.8 and 1.6.x is 98% the same.

Thanks,

Andrei

[0] https://issues.apache.org/jira/browse/OAK-5902

[1] https://issues.apache.org/jira/browse/OAK-6565


Re: Percentile implementation

2017-07-04 Thread Michael Dürig



On 04.07.17 11:15, Francesco Mari wrote:

2017-07-04 10:52 GMT+02:00 Andrei Dulceanu :

Now my question is this: do we have a simple percentile implementation in
Oak (I didn't find one)?


I'm not aware of a percentile implementation in Oak.


If not, would you recommend writing my own or adapting/extracting an
existing one in a utility class?


In the past we copied and pasted source code from other projects in
Oak. As long as the license allows it and proper attribution is given,
it shouldn't be a problem. That said, I'm not a big fan of either
rewriting an implementation from scratch or copying and pasting source
code from other projects. Is exposing a percentile really necessary?
If yes, how big of a problem is embedding of commons-math3?



We should avoid copy paste as we might miss important fixes in later 
releases. I only did this once for some code where we needed a fix that 
wasn't yet released. It was a hassle.
I would just add a dependency to commons-math3. It's a library exposing
the functionality we require, so let's use it.
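
For reference, a minimal sketch of what using it would look like (this
assumes the standard Percentile class from
org.apache.commons.math3.stat.descriptive.rank; the sample data is made up):

import org.apache.commons.math3.stat.descriptive.rank.Percentile;

public class PercentileExample {

    public static void main(String[] args) {
        double[] samples = {12.0, 15.0, 7.0, 42.0, 23.0, 9.0, 31.0};

        // evaluate(values, p) returns the p-th percentile for 0 < p <= 100
        Percentile percentile = new Percentile();
        double median = percentile.evaluate(samples, 50.0);
        double p95 = percentile.evaluate(samples, 95.0);

        System.out.println("median = " + median + ", p95 = " + p95);
    }
}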


Michael


Re: [DiSCUSS] - highly vs rarely used data

2017-06-26 Thread Michael Dürig


Hi,

I agree we should have a better look at access patterns, not only for 
indexing. I recently came across a repository with about 65% of its 
content in the version store. That content is pretty much archived and 
never accessed. Yet it fragments the index and thus impacts general 
access times.


Michael

On 23.06.17 10:22, Tommaso Teofili wrote:

Hi all,

recently I've been at a conference [1] where I attended an interesting
keynote about data management [2] (I think it refers to this 2016 paper
[3]).

Apart from the approaches proposed to solve the data management problem
(e.g. get rid of DBMSs!) I got interested in the discussion about how we
deal with the increasing amount of data that we have to manage (also
because of some issues we have [4]).
In many systems only a very small subset of the data is used because the
amount of information users really need refers only to most recently
ingested data (e.g. social networks); while that doesn't always apply for
content repositories in general (e.g. if you build a CMS on top of it) I
think it's interesting to think about whether we can optimize our
persistence layer to work better with highly used data (e.g. more recent)
and use less space/cpu for data that is used more rarely.

For example, putting this together with the incremental indexing section of
the paper [3] I was thinking (but that's already a solution rather than
"just" a discussion) perhaps we could simply avoid indexing *some* content
until it's needed (e.g. the first time you get traversal, then index so
that next query over same data will be faster) but that's just an example.

What do others think ?
Regards,
Tommaso

[1] : http://www.iccs-meeting.org/iccs2017/
[2] : http://www.iccs-meeting.org/iccs2017/keynote-lectures/#Ailamaki
[3] : https://infoscience.epfl.ch/record/219993/files/p12-pavlovic.pdf
[4] : https://issues.apache.org/jira/browse/OAK-5192



Re: copy on write node store

2017-05-30 Thread Michael Dürig



On 30.05.17 09:34, Tomek Rekawek wrote:

Hello Michael,

thanks for the reply!


On 30 May 2017, at 09:18, Michael Dürig  wrote:
AFAIU from your mail and from looking at the patch this is about a node store 
implementation that can be rolled back to a previous state.

If this is the case, a simpler way to achieve this might be to use the TarMK 
and add functionality for rolling it back.


Indeed, it would be much simpler. However, the main purpose of the new feature 
is testing the blue-green Sling deployments. That’s why we need the DocumentMK 
to support it as well.


Ok I see. I think the fact that these classes are not for production use
should be stated in the Javadoc, along with clarifications of what
can be expected from the store wrt. interleaving of calls to various
mutators (e.g. enableCopyOnWrite() / disableCopyOnWrite() / merge(),
etc.). I foresee a couple of very sneaky race conditions here.


Michael


Re: copy on write node store

2017-05-30 Thread Michael Dürig


Hi Tomek,

AFAIU from your mail and from looking at the patch this is about a node 
store implementation that can be rolled back to a previous state.


If this is the case, a simpler way to achieve this might be to use the
TarMK and add functionality for rolling it back. The cleanest way
would be to add a method SegmentNodeStore.rollBack(String checkpoint),
whose implementation is basically a call to Revisions.setHead(...)
passing the state of the checkpoint.
Implementation-wise there might be a few hoops, as this probably
needs to be looped through the Scheduler. But I guess Andrei might be
able to help out on the details here.
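
For the record, a rough sketch of what I have in mind. This is a method
sketch only, meant to live inside SegmentNodeStore; 'revisions' stands for
the store's Revisions instance, the scheduler loop is omitted, and the
exact wiring is an assumption, not tested code:

public void rollBack(String checkpoint) {
    SegmentNodeState checkpointRoot = (SegmentNodeState) retrieve(checkpoint);
    if (checkpointRoot != null) {
        // setHead(expected, head) only succeeds if the current head still
        // equals 'expected'; a real implementation would retry on failure
        revisions.setHead(revisions.getHead(), checkpointRoot.getRecordId());
    }
}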


Michael


On 29.05.17 10:50, Tomek Rekawek wrote:

Hello,

In OAK-6220 I’m exploring the topic of having a switchable copy-on-write node
store implementation. The idea is that the “main” node store (eg. DocumentMK) 
is wrapped with an extra layer (copy-on-write node store), which can be turned 
on/off in the runtime. When the copy-on-write is turned on, all the new changes 
are not merged with the main store, but kept in a separate, volatile store.

The new mode is meant to be used for testing - so we can perform even
destructive tests and then reverse all the changes seamlessly. It’s especially
useful in the blue-green deployments with CompositeNodeStore and DocumentMK,
since we can test the new version of the application on the new (green)
instance, even if the tests require changes in the node schema. The changes
won’t be propagated to the old (blue) instance as long as the COW mode is on.

Together with other people involved in the issue we had 3 ideas how this can be 
implemented:

1. By copying the / node and its subtree into some private location and then 
mount the COW store on top of it.

This works fine for the SegmentMK (supporting the copy-by-reference), but not 
with the DocumentMK (which actually copied the whole tree). Since the new 
feature is more useful with DocumentMK, we needed to find something else.

2. By storing the data in a NodeBuilder taken from the store without merging it 
back to the main repository.

This seemed to work fine, but because of the DocumentMK limitations
(OAK-1838) it wasn't reliable.

3. By creating a MemoryNodeStore on a top of the recent root state

This is the current implementation, it works fine [1]. The newly created 
MemoryNodeStore didn’t contain any checkpoints, so some extra layer 
(BranchNodeStore) was introduced to inherit the already existing checkpoints 
from the main store. Another layer (CowNodeStore) is being used to dynamically 
switch between the main and the branch node store.

A potential limitation here is that the changes have to fit into memory.
Switching the repository into COW mode and forgetting about it is not a good
idea.

I’d like to merge the [1], so the blue-green Sling deployments can be tested in 
the more robust way. Any thoughts?

Regards,
Tomek

[1] https://issues.apache.org/jira/secure/attachment/12868273/OAK-6220-3.patch



Re: svn commit: r1796624 - /jackrabbit/oak/trunk/oak-store-spi/src/main/java/org/apache/jackrabbit/oak/spi/state/AbstractNodeState.java

2017-05-29 Thread Michael Dürig



On 30.05.17 07:00, Chetan Mehrotra wrote:

On Mon, May 29, 2017 at 6:50 PM,   wrote:

+private static final int CHILDREN_CAP = getInteger("children.cap", 100);


Better to have the system property prefixed with 'oak', i.e. 'oak.children.cap'.



Agreed. Done at  http://svn.apache.org/viewvc?rev=1796728&view=rev

Michael


Re: Oakathon: Oak Hackathon

2017-05-24 Thread Michael Dürig





The page is marked as ImmutablePage to me so I can't edit it. Username
is RobertMunteanu.


I added you as a contributor:
https://wiki.apache.org/jackrabbit/ContributorsGroup. It should work now.


Michael


Re: [VOTE] Release Apache Jackrabbit Oak 1.7.0

2017-05-24 Thread Michael Dürig



On 24.05.17 15:35, Davide Giannella wrote:

[X] +1 Release this package as Apache Jackrabbit Oak 1.7.0


Michael


Oakathon: Oak Hackathon

2017-05-24 Thread Michael Dürig


Hi,

We are going to hold an Oak Hackathon in Basel, Switzerland from August 
21st to August 25th. This is a developers workshop with the goal to 
advance the overall state of Oak by fixing bugs, implementing new 
features or coming up with interesting POCs.


Please indicate your attendance on the Oakathon Wiki page [1]. Feel free 
to propose topics and discussion items in the respective table or 
subscribe for already proposed topics by putting your name in the box 
next to it.


Once closer to the date we will start discussing and finalising the 
agenda here on the list.


Michael


[1] https://wiki.apache.org/jackrabbit/Oakathon%20August%202017


Re: Intent to backport to 1.6 and 1.4: OAK-5741

2017-05-22 Thread Michael Dürig


Back to the list, which I unintentionally dropped.


On 22.05.17 10:19, Julian Reschke wrote:

On 2017-05-22 10:16, Michael Dürig wrote:


AFIK this is a new feature, why are we backporting this?

Michael


Because OAK-5704 relies on it.


But why should we backport that one? It is a minor improvement. Unless
there is a strong reason to do otherwise we should only backport bug fixes.


Michael


Backport OAK-6208 to 1.6

2017-05-15 Thread Michael Dürig


Hi,

I would like to back port OAK-6208 [1] to the 1.6 branch.

Before 1.6 oak-run compact had a way to explicitly enable/disable memory 
mapping of the tar files. Somehow this got lost in oak-segment-tar and 
one has to resort to nasty workarounds like setting 
-Dsun.arch.data.model=32 on the command line.
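
For illustration, with the fix in place the invocation looks roughly like
this (option name taken from OAK-6208; treat the exact syntax as an
assumption):

    java -jar oak-run.jar compact --mmap=false /path/to/segmentstore

instead of having to force 32 bit behaviour via -Dsun.arch.data.model=32.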


This does only affect tooling related code.

@Francesco, Andrei, Alex, please have a look at the fix in trunk and 
share your concerns.


Michael



[1] https://issues.apache.org/jira/browse/OAK-6208


Re: Enforcing minimal test coverage

2017-05-11 Thread Michael Dürig



On 11.05.17 12:05, Angela Schreiber wrote:

- oak-segment-tar: 0.65


I guess this only includes the coverage from running unit tests and does 
not include integration tests. For segment tar the figures are actually 
much better with the latter, which is (another) indicator that segment 
tar needs to be further decoupled to make it easier to unit test.


Michael


Re: [ops] Unify NodeStore/DataStore configurations by using nstab

2017-05-10 Thread Michael Dürig


Hi Arek,

My mistake, my initial message (below) was intended for oak-dev.

Michael

On 10.05.17 11:01, Arek Kita wrote:

Hi Michael,

sorry, but I noticed right now, that the email was directed *only* to
me (not sure if on purpose).

Yes, I guess the approach should not start from how to create and
load such a file, but from the other way round: how we can glue
all NodeStores (DataStores) together first, based on what we have now,
using one Builder/Factory interface that will create and link
everything. Then we can think about a higher-level configuration format
(human configurable, not only dev configurable).

I guess this would be a good topic for an Oakathon because I'm not
involved in Oak heavily right now at any particular story. I guess
however that it would be easier for me to be focused and involved once
for that particular improvement.

Thanks,
Arek

2017-05-02 10:42 GMT+02:00 Michael Dürig :


Hi Arek,

I agree that Oak would benefit from being simpler and more uniform to
configure and run. The current approach with the Oak and Jcr builder classes
has somehow outgrown its initial purpose.

I am somewhat sceptical about (starting with) configuration files though.
Configuration files tend to suffer from weak semantics that are not well
understood and poorly documented. Going forward it can be difficult to
evolve the format in a consistent way while keeping backward compatibility.

I would therefore rather suggest to start experimenting with simpler ways to
set up Oak in code through an internal DSL. From there we can start a
discussion on whether it is worth introducing external bindings and how these
should look. Maybe such bindings would just expose the most common subset of
features and you would need to turn to the internal configuration DSL for
the full expressive power.
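
To make that concrete, today's purely programmatic setup looks roughly like
the following (plain current API using oak-segment-tar; an internal DSL
would streamline this wiring rather than replace it):

import java.io.File;
import javax.jcr.Repository;
import org.apache.jackrabbit.oak.Oak;
import org.apache.jackrabbit.oak.jcr.Jcr;
import org.apache.jackrabbit.oak.segment.SegmentNodeStoreBuilders;
import org.apache.jackrabbit.oak.segment.file.FileStore;
import org.apache.jackrabbit.oak.segment.file.FileStoreBuilder;
import org.apache.jackrabbit.oak.spi.state.NodeStore;

public class OakSetupSketch {

    public static void main(String[] args) throws Exception {
        // SegmentMK backed by a local tar directory
        FileStore fileStore = FileStoreBuilder
                .fileStoreBuilder(new File("segmentstore"))
                .build();
        NodeStore nodeStore = SegmentNodeStoreBuilders.builder(fileStore).build();

        // assemble the JCR repository on top of the node store
        Repository repository = new Jcr(new Oak(nodeStore)).createRepository();

        // ... use the repository, then release resources
        fileStore.close();
    }
}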

Michael


On 28.04.17 12:56, Arek Kita wrote:


Hi,

I've noticed recently that with many different NodeStore
implementations (Segment, Document, Multiplexing) but also DataStore
implementations (File, S3, Azure) and some composite ones
(Hierarchical, Federated - already mentioned in [0]) it
becomes more and more difficult to set up everything correctly and to
know the current persistence state of a repository (especially
with pretty aged repos).

Moreover, the configuration pattern that is based on individual PID of
one service becomes problematic (i.e. recent change for
SegmentNodeStoreService).

  From the operations and user perspective everything should be treated
as a whole IMHO no matter which service handles which fragment of
persistence layout. Oak should know itself how to "autowire" different
parts, obviously with some hints and pointers from users as they want
to run Oak in their own preferred layout.

My proposal would be to integrate everything together using a pretty old
concept called "fstab". For our purposes I would call it "nstab".

This could look like [1] for the most simple case (with internal
blobs), [2] for typical SegmentMK + FDS, [3] for SegmentMK + S3DS, [4]
for MultiplexingNodeStore with some areas of repo set as read only. I
think we could also model Hierarchical and Federated DataStores as
well in the future.

The examples are for illustration purposes, but I guess such a setup will
help with changing the layout without a need to inspect many OSGi
configurations in the current setup and to make sure no conflicting
ones are active.

The schema is also similar to the UNIX way of configuring filesystems, so
it will help Oak users understand the layout (at least better than
it is now).
oak-upgrade for complex cases in the future - user just provides
source nstab and target nstab in order to migrate repository.

The config should also be simpler, avoiding things like customBlobStore
(it will be inferred from context).

WDYT? I have some thoughts how could this be implemented but first I
would like to know your opinions on that.

Thanks in advance for feedback!
Arek


[0] http://oak.markmail.org/thread/22dvuo6b7ab5ib7m
[1]
https://gist.githubusercontent.com/kitarek/f755dab6e889d1dfc5a1c595727f0171/raw/53d41ac7f935886783afd6c85d60e38e565a9259/nstab.1
[2]
https://gist.githubusercontent.com/kitarek/f755dab6e889d1dfc5a1c595727f0171/raw/53d41ac7f935886783afd6c85d60e38e565a9259/nstab.2
[3]
https://gist.githubusercontent.com/kitarek/f755dab6e889d1dfc5a1c595727f0171/raw/53d41ac7f935886783afd6c85d60e38e565a9259/nstab.3
[4]
https://gist.githubusercontent.com/kitarek/f755dab6e889d1dfc5a1c595727f0171/raw/53d41ac7f935886783afd6c85d60e38e565a9259/nstab.4





Re: Merge policy for the 1.6 branch

2017-04-25 Thread Michael Dürig



On 10.04.17 17:59, Michael Dürig wrote:


Hi,

I think we can get a consensus on the following statement:

"Back ports bear a certain risk of introducing regressions to otherwise 
stable branches. Each back ported change should be carefully evaluated 
for its potential impact, risk and possible mitigations. It is the 
responsibility of each committer to assess these and ask for advice or
reviewing on oak-dev@ if uncertain. Whether using RTC or CTR is up to 
the committer."


I will add a statement along these lines to the "Participating" section 
of the Oak documentation unless there are further objections.


Done http://svn.apache.org/viewvc?rev=1792601&view=rev

Michael


Michael


On 14.03.17 11:59, Michael Dürig wrote:


Hi,

Following up on Davide's release plan for Oak 1.6 [1] we should define a
merge policy for the 1.6 branch. I would suggest to be a bit more
conservative here than we have been in the past and ask for reviews of
backports. That is, announce candidates on @oak-dev mentioning the issue
reference, potential risks, mitigations, etc. I don't think we need to
block the actual backport being performed on the outcome of the review
as in the worst case changes can always be reverted. The main aim of the
announcement should be to increase visibility of the backports and
ensure they are eventually reviewed.

In short, announce your backport on @oak-dev and ask for review. If
confident enough that the review will pass anyway, go ahead but be
prepared to revert.

I think this is what we informally did so far already but wanted to
state this a bit more explicitly.

WDYT?

Michael



[1]
https://lists.apache.org/thread.html/e5e71b61de9612d7cac195cbe948e8bdca58ee38ee16e7f124ea742c@%3Coak-dev.jackrabbit.apache.org%3E 





Dangling Java 8 Maven profile!?

2017-04-25 Thread Michael Dürig


Hi,

After OAK-5664 moved us to Java 8 I believe we can remove the java8 
Maven profile as well [1].


Michael

[1] 
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-parent/pom.xml#L960


Re: [m12] Effort to Improve Modularisation of Oak

2017-04-19 Thread Michael Dürig



On 19.04.17 16:26, Angela Schreiber wrote:

I'm actually reluctant about 1) and 2) as renaming established modules
has quite a ripple effect. As with 3) we already have sub-modules in
one place, we should probably start a discussion about switching to a
hierarchical module structure.

makes sense to me.


To address 1) and 2) once the main
modularisation effort stabilised.

to be honest, i think we have more work ahead of us when looking at what
remains in oak-core right now.
i guess we would need another discussion here how we want to proceed with
various plugins (mainly to comply with jcr), document nodestores, query
and security.


Ack, let's leave it at that. I'm mostly fine with the naming and 
renaming existing modules is really not worth the trouble.


Michael


Re: svn commit: r1791885 - /jackrabbit/oak/trunk/oak-examples/webapp/pom.xml

2017-04-19 Thread Michael Dürig



On 19.04.17 12:32, Chetan Mehrotra wrote:

On Wed, Apr 19, 2017 at 2:49 PM,   wrote:

-2.5
+3.0.0


Maybe we should specify the version in the parent pluginManagement and not have
an explicit version in any child pom.


+1

Michael



Chetan Mehrotra



Re: [m12] Effort to Improve Modularisation of Oak

2017-04-19 Thread Michael Dürig



On 18.04.17 14:51, Angela Schreiber wrote:

Hi Michael

Sure... what modules do you think should be renamed? You mentioned
oak-commons-run... anything else?


Apart from renaming oak-commons-run to oak-run-commons there is:

1) oak-authentication-* instead of oak-auth-* as this would be inline 
with oak-authorization-*.


2) Also it is not obvious that oak-segment-tar and oak-store-spi are 
related. From that POC oak-segment-tar should be something like 
oak-store-segment.


3) Further oak-example and oak-exercise: the former already has sub 
modules. Maybe we can rename it to oak-getting-started (or similar) and 
move oak-exercise into the renamed one.


I'm actually reluctant about 1) and 2) as renaming established modules
has quite a ripple effect. As with 3) we already have sub-modules in
one place, we should probably start a discussion about switching to a
hierarchical module structure, and address 1) and 2) once the main
modularisation effort has stabilised.


Michael



Kind regards
Angela

On 18/04/17 08:57, "Michael Dürig"  wrote:




On 13.04.17 15:52, Angela Schreiber wrote:

{quote}
I would suggest to go with a naming scheme that reflects how modules
would be grouped together in a hierarchical structure as much as
possible for now. E.g. rename oak-commons-run to oak-run-commons.
{quote}

I would like to address this separately as it would further expand the
scope of OAK-6073, which will be open for review over the weekend. After
that I would suggest that we incorporate the refactoring into oak-trunk.


Works for me, but let's address it quickly afterwards so that those
"intermediate" module names do not get a chance to "stick around".

Michael




Re: [m12n] A new home for ApproximateCounter, NodeUtil and TreeUtil?

2017-04-18 Thread Michael Dürig


AFAIK ApproximateCounter was intended to be general purpose. So moving 
it close to indexing would not be the right thing. We probably need to 
involve Thomas here to provide his perspective.


But then, I think this (as well as the considerations around
NodeUtil and TreeUtil) should be tackled separately.


Michael

On 18.04.17 08:21, Angela Schreiber wrote:

Hi Robert

While NodeUtil and TreeUtil would naturally fit to plugins.tree, I am not
convinced that ApproximateCounter really belongs there. Afaik it is only
used for query index strategy and counting. I would rather move
'ApproximateCounter' to 'plugins.index'.

Regarding moving 'NodeUtil' and 'TreeUtil': IMHO we have here 2 utility
classes providing almost the same functionality. I would prefer to decide
on the redundancy (and potentially clean it up) before moving it to a
package that already has semantic versioning enabled (in contrast to the
util package where they currently are located).

wdyt?

Kind regards
Angela


On 14/04/17 12:47, "Robert Munteanu"  wrote:


I created a final PR for this as I have somewhat mixed feelings. On
one hand, it finally nukes the util package. On the other hand, it looks
like a lot of noise for 3 classes.


https://github.com/mreutegg/jackrabbit-oak/pull/6

Robert

On Thu, 2017-04-06 at 14:49 +, Angela Schreiber wrote:

Hi Robert

plugins.tree would feel natural to me.
regarding the export: not sure about that either... the plugins.tree
has
some unfortunate dependencies e.g. to oak.core. so probably more work
ahead in that area.

kind regards
angela

On 06/04/17 16:41, "Robert Munteanu"  wrote:


Hi,

Working in the m12n branch [1] I'm trying to get rid of the
o.a.j.oak.util package and the last surviving members are
ApproximateCounter, NodeUtil and TreeUtil.

As I see it these classes are essentially helpers built on top of the
Tree and NodeState APIs. That would make them candidates for either
oak-store-spi or (if we manage to trim down the dependencies) oak-base.

However I am having trouble naming the package which will hold them.
They're not part of the SPI, so I can't put them in spi.state.

Maybe they belong in oak-core in plugins.tree, but I'm not sure if we
want to keep that as a package which is exported outside oak-core.

Thoughts?

Robert

[1]:
https://github.com/mreutegg/jackrabbit-oak/tree/m12n









Re: [m12] Effort to Improve Modularisation of Oak

2017-04-17 Thread Michael Dürig



On 13.04.17 15:52, Angela Schreiber wrote:

{quote}
I would suggest to go with a naming scheme that reflects how modules
would be grouped together in a hierarchical structure as much as
possible for now. E.g. rename oak-commons-run to oak-run-commons.
{quote}

I would like to address this separately as it would further expand the
scope of OAK-6073, which will be open for review over the weekend. After
that I would suggest that we incorporate the refactoring into oak-trunk.


Works for me, but let's address it quickly afterwards so that those 
"intermediate" module names do not get a chance to "stick around".


Michael


Re: [m12] Effort to Improve Modularisation of Oak

2017-04-13 Thread Michael Dürig



I try to describe the changes proposed by the PoC in
https://issues.apache.org/jira/browse/OAK-6073?focusedCommentId=15965623&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15965623.
Additionally I added some step-by-step instructions on how we proceeded.



Thanks, this is very valuable!



When we first looked at it on Tuesday last week, I thought that we would
end the exercise with a "tried hard but failed" summary. So, I am quite
pleased that actually ended up with a working PoC.


So am I, I'm impressed by the progress here!



Looking back I think the biggest issues are

- putting everything in oak-core was obviously convenient but it turned
out to be impossible to protect against boundary violations
- packages sometimes contain classes that do not really belong together, e.g.
  - org.apache.jackrabbit.oak.spi.lifecycle containing OakInitializer
  - oak.apache.jackrabbit.oak.spi.whiteboard containing classes that
should be located with the corresponding feature (e.g. user mgt, index)
- impl specific methods that are not defined by an API contract such as
e.g. ValueImpl.getBlob, ValueImpl.getOakString... this was actually the
only place where we added a new interface and modified existing code
- somehow I got the impression that we didn't manage to make consistent
decisions wrt package naming
  - what should go into a 'spi' package?
  - what should go into 'plugins'-something? and how is that different
from spi? (and what is e.g. the diff between spi.blob and plugins.blob?)
  - when do we create a new package space oak.somethingnew and how are
those packages intended to be used.

Moving forward I think it would help a lot if we had a common
understanding here and came up with some description of what is used for what.
Maybe we also need to take a closer look when adding new stuff to oak-core
going forward.


I think this is a discussion we need to take up again in the aftermath 
of this restructuring. For now I think it is best to create JIRA issue 
for those things you had to somehow work around or leave out.






Quick wins?


Well... for me the biggest win is the fact that 'oak-blob-azure',
'oak-blob-cloud' and 'oak-segment-tar' would no longer depend on oak-core.


+1, I'm especially happy with the result for oak-segment-tar. We already
tried last year to make this module more independent but eventually had
to revert.




Looking at the list of modules, its size and the names, did you consider
switching to a hierarchical module structure?


No, we didn't discuss that.


Or could this make sense later on?


I don't have any strong preference here. We had some discussion on how we
should align the svn structure in general and what would be the best when
we want to start releasing modules individually.


Otherwise can we come up with a naming scheme that implies
grouping (e.g. node store implementations, blob store implementations,
etc.)


Sure, makes a lot of sense to me. :-)


I would suggest to go with a naming scheme that reflects how modules 
would be grouped together in a hierarchical structure as much as 
possible for now. E.g. rename oak-commons-run to oak-run-commons.


Michael




Re. oak-base and oak-commons, these are probably separated to avoid
circular dependencies. Is there any way to otherwise clarify the
difference between the two? I.e. if I implement a new class, which
module should it go into? Would oak-base be something like oak-core-spi
or even oak-spi? This would nicely mirror the oak-store-spi module.


Exactly... and actually I like the name 'oak-core-spi' a lot better. In
OAK-6073 I stated that IMO that module might be a tmp solution as it
currently contains a somewhat loose collection of packages that were in
'oak-core' and didn't really fit into 'oak-commons' from my point of view.
After all I wanted to avoid converting 'oak-commons' into a second
'oak-core' :-).
That module is the one with the least consistency IMO. But things may
clarify if we move on... I definitely would love to move oak.spi.security
and oak.security.* out of oak-core... but that probably requires a second
round *wishful thinking*.



Are there plans to move the document/rdb stores to separate modules or is
this beyond the current scope?


I guess that would be a natural step as we move on... but during the last
week we didn't look into this.

Kind regards
Angela

PS: will attach simplified picture to OAK-6073 to illustrated the big
picture.



Michael

On 12.04.17 11:21, Angela Schreiber wrote:

Hi

As mentioned my Marcel this morning [0] we had some offline discussions
related to the oak-blob-azure module and how we could independently
release it. While we didn't see a satisfying solution for the 1.6
branch,
we concluded that we should pick up the modularisation discussion for
address this in the near future.

Consequently a group of oak devs started to work on a PoC on how to
improve modularisation of Oak (in particular oak-core). As we managed to
get rid of the dependency 

Re: Module to host oak-run console scripts

2017-04-13 Thread Michael Dürig


Hi,

On 13.04.17 11:33, Chetan Mehrotra wrote:

The major benefit of scripts is that they are faster to implement and can be
used in older branches without impacting the runtime.


I agree on this part. But most of this benefit comes from not providing 
the same level of maintenance, testing, compatibility etc. for such 
scripts. This flexibility can be a big advantage but when putting such 
scripts close to our source code most people would probably expect them 
to be "fully supported". To that respect I share Marcel's concerns. 
Maybe we can find another common home for such scripts? And btw, I would 
also like to broaden the scope from oak-console scripts to all kinds of 
useful scripts for Oak (e.g. for scripts for the other Oak console at 
https://github.com/mduerig/script-oak).


Michael


Re: Backport of OAK-4933 to 1.6

2017-04-12 Thread Michael Dürig



On 12.04.17 09:17, Marcel Reutegger wrote:

Hi,

On 28/03/17 09:09, Michael Dürig wrote:

As Marcel mentions on the issue a better approach would be to release
this independently. If this is blocked by dependencies we should make an
effort to sort this out, as now is the time in the release cycle for
doing so.

So for now -1 from my side to back porting this until

a) we have a clear picture of the alternatives


A group of Oak committers discussed alternatives offline last week. I'd
like to summarize this discussion here on the dev list again. Feel free
to correct me or add to below if you think I missed important conclusions.

In general we recognized the need for this module to work with 1.6.
After all, the whole modularization effort is about enabling independent
releases and making it possible to release new features outside of our so
far yearly minor version Oak release cycle. Having to wait for nearly a
year in order to use a rather independent new module with a stable Oak
release seems disproportionate.

An alternative was presented in OAK-4933. Release the new module
independently and then use it in 1.6. The broad agreement in the
discussion last week was, this is not desirable either. We'd end up in a
situation where the 1.6 branch is a mix of multi-module and independent
release modules. The modularization in progress for trunk, with the goal
to have cleaner dependencies also for the new azure module, will make it
more difficult to use the new module in
the 1.6 branch. We definitely don't want to restructure the 1.6 branch
as well.

The proposal therefore is to backport OAK-4933 despite the concerns
mentioned earlier.


Makes sense, thanks for the in-depth analysis! Given there is good 
progress with the modularisation [1] and that I'm confident that we'll 
keep up with this I'm taking back my -1 from earlier. Consider me +1.




b) in the case of backporting, understand how we would ensure quality.
This is new code that was so far never exposed to the level of testing
people would expect from 1.6 code.


I don't have a good answer to this other than thorough code review and
good test coverage. Anything else we can do?


Can we in addition put some sort of a disclaimer into the 1.6 branch for
this module stating its origin? This would help setting users'
expectations right.


Michael

[1] 
https://lists.apache.org/thread.html/73f6916cc920e36c116ba0311bdf602021598c31f1cf0e5e840e379c@%3Coak-dev.jackrabbit.apache.org%3E


Re: [m12] Effort to Improve Modularisation of Oak

2017-04-12 Thread Michael Dürig


Hi Angela,

Thanks for driving this and coming up with a PoC. I like the direction 
this is taking. The module boundaries make sense to me and having 
certain dependencies de-tangled will certainly be helpful going forward.
Could you share a bit of your experience doing this refactoring? What 
were the main difficulties? Quick wins? Is there anything that could be 
controversial?


Looking at the list of modules, its size and the names, did you consider 
switching to a hierarchical module structure? Or could this make sense 
later on? Otherwise can we come up with a naming scheme that implies 
grouping (e.g. node store implementations, blob store implementations, 
etc.)


Re. oak-base and oak-commons, these are probably separated to avoid
circular dependencies. Is there any way to otherwise clarify the
difference between the two? I.e. if I implement a new class, which
module should it go into? Would oak-base be something like oak-core-spi
or even oak-spi? This would nicely mirror the oak-store-spi module.


Are there plans to move the document/rdb stores to separate modules or is
this beyond the current scope?


Michael

On 12.04.17 11:21, Angela Schreiber wrote:

Hi

As mentioned my Marcel this morning [0] we had some offline discussions
related to the oak-blob-azure module and how we could independently
release it. While we didn't see a satisfying solution for the 1.6 branch,
we concluded that we should pick up the modularisation discussion for
address this in the near future.

Consequently a group of oak devs started to work on a PoC on how to
improve modularisation of Oak (in particular oak-core). As we managed to
get rid of the dependency of oak-blob-azure (and oak-segment-tar for that
matter) from oak-core with a reasonable effort, we would like move forward
with this in oak-trunk.

For that matter I created a new epic "Modularisation of Oak" (OAK-6069 at
[1]) and added/linked a initial bunch of issues spotted during the
workshop and earlier. For the 'oak-blob-azure' topic I create a dedicated
task OAK-6073 [2], where I will also add some detailed summary of the
initial effort. The latter can also be looked at on a github fork at [3].

Kind regards
Angela

[0]
http://markmail.org/message/neoiyv5qsffo424e?q=azure+list:org%2Eapache%2Eja
ckrabbit%2Eoak-dev+from:%22Marcel+Reutegger%22&page=1
[1] https://issues.apache.org/jira/browse/OAK-6069
[2] https://issues.apache.org/jira/browse/OAK-6073
[3] https://github.com/mreutegg/jackrabbit-oak/tree/m12n.



Re: Merge policy for the 1.6 branch

2017-04-10 Thread Michael Dürig


Hi,

I think we can get a consensus on the following statement:

"Back ports bear a certain risk of introducing regressions to otherwise 
stable branches. Each back ported change should be carefully evaluated 
for its potential impact, risk and possible mitigations. It is the 
responsibility of each committer to assess these and ask for advice or
reviewing on oak-dev@ if uncertain. Whether using RTC or CTR is up to 
the committer."


I will add a statement along these lines to the "Participating" section 
of the Oak documentation unless there are further objections.


Michael


On 14.03.17 11:59, Michael Dürig wrote:


Hi,

Following up on Davide's release plan for Oak 1.6 [1] we should define a
merge policy for the 1.6 branch. I would suggest to be a bit more
conservative here than we have been in the past and ask for reviews of
backports. That is, announce candidates on @oak-dev mentioning the issue
reference, potential risks, mitigations, etc. I don't think we need to
block the actual backport being performed on the outcome of the review
as in the worst case changes can always be reverted. The main aim of the
announcement should be to increase visibility of the backports and
ensure they are eventually reviewed.

In short, announce your backport on @oak-dev and ask for review. If
confident enough that the review will pass anyway, go ahead but be
prepared to revert.

I think this is what we informally did so far already but wanted to
state this a bit more explicitly.

WDYT?

Michael



[1]
https://lists.apache.org/thread.html/e5e71b61de9612d7cac195cbe948e8bdca58ee38ee16e7f124ea742c@%3Coak-dev.jackrabbit.apache.org%3E



Re: What's the contract of QueryBuilder.impersonates(String name)?

2017-04-03 Thread Michael Dürig


Confirmed, this is the principal name. At least this is what it was built
for in Jackrabbit 2. The string passed is escaped via
org.apache.jackrabbit.oak.commons.QueryUtils#escapeForQuery.
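
For illustration, a sketch of how this is typically used through the user
management query API (the escaping noted above happens inside the default
implementation; this is a usage sketch, not code from Oak):

import java.util.Iterator;
import javax.jcr.RepositoryException;
import org.apache.jackrabbit.api.security.user.Authorizable;
import org.apache.jackrabbit.api.security.user.Query;
import org.apache.jackrabbit.api.security.user.QueryBuilder;
import org.apache.jackrabbit.api.security.user.UserManager;

public class ImpersonationQuery {

    // Finds the authorizables that may be impersonated by the principal
    // with the given principal name.
    public static Iterator<Authorizable> findImpersonated(
            UserManager userManager, final String principalName)
            throws RepositoryException {
        return userManager.findAuthorizables(new Query() {
            @Override
            public <T> void build(QueryBuilder<T> builder) {
                // the argument is a principal name, not an authorizable id
                builder.setCondition(builder.impersonates(principalName));
            }
        });
    }
}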




Michael

On 03.04.17 16:36, Angela Schreiber wrote:

Hi

I don't know how Michael intended it to work originally. Given the fact
that the impersonation setup is established and evaluated using principals
I would expect it to be a principal name, which in the default
implementation just can be any string value.

Kind regards
Angela

On 03/04/17 16:14, "Manfred Baedke"  wrote:


Hi all,

Can anyone clarify the contract of the method
org.apache.jackrabbit.api.security.user.QueryBuilder.impersonates(String
name)?
According to the JavaDoc, the parameter is the "name of an
authorizable". But the interface
org.apache.jackrabbit.api.security.user.Authorizable doesn't have a
name, just an id and a principal (which in turn has a name).
If a principal name is expected here (which seems to be the case
according to the implementations), then it needs to be specified if the
caller has to do any necessary escaping: if the user in question is e.g.
an LDAP user, its principal name may contain backslash characters.

Best regards,
Manfred




Re: Moving to Java 8

2017-04-03 Thread Michael Dürig



On 31.03.17 13:29, Julian Reschke wrote:

On 2017-02-15 12:17, Julian Reschke wrote:

Hi there,

I understand that we might not be able to move to Java 8 just yet, but I
felt it would be good to capture information related to this topic in
Jira (so that we can link other related tickets).

So feel free to provide feedback (and include more information) in

https://issues.apache.org/jira/browse/OAK-5664

Best regards, Julian


So far I didn't see anybody unhappy with a move to required Java 8 for
Oak 1.7 (unstable) and Oak 1.8 (once we get there in ~12 months). Thus,
I propose that we actually make the switch now.


+1 for 1.8.



Does anybody think we need a vote?


No need for a vote IMO, (lazy) consensus here should be sufficient.

Michael



Best regards, Julian




Re: New JIRA component for observation

2017-04-03 Thread Michael Dürig



On 31.03.17 07:06, Chetan Mehrotra wrote:

On Thu, Mar 30, 2017 at 7:55 PM, Thomas Mueller  wrote:

Depending on that, we can use "Maven" module boundaries, or "Logical" module 
boundaries.


My preference is for "Logical" module boundaries and not be bounded by
the Maven module boundaries.



I would prefer to stay aligned with Maven boundaries as much as possible,
as this greatly simplifies bug reporting for parties not deeply involved
with Oak. Most of the apparent need to break out of that scheme is
to me a symptom of missing modularity rather than a cure. If we
introduce logical modules in Jira, I strongly suggest to come up with a
clear and concise definition for them: what exactly belongs to them,
what not? What are the criteria and how are they applied? Can modules
overlap? Do they need to stay aligned with the boundaries of the Maven
modules or can a logical module be part of multiple Maven modules?


Michael


Re: Backport of OAK-4933 to 1.6

2017-03-30 Thread Michael Dürig



On 28.03.17 15:02, Raul-Nicolae Hudea wrote:

Hi,

I wasn’t aware that “releasing independently” is an option, and maybe even 
desired for future modules. This is currently proposed on 
https://issues.apache.org/jira/browse/OAK-4933.

If you have input on how to make it successful, please add your input there 
(I’d be interested in the lessons learned from the previous attempt, which I 
understand it was related to segment-tar).


We decoupled oak-segment-tar for some time from the Oak release cycle 
last year. This gave us a lot of additional flexibility and enhanced 
modularity. We were effectively able to just re-deploy oak-segment-tar 
in AEM. Something that was and is much harder otherwise.


The show stopper later was that with this way of releasing it is
difficult to align the version of oak-run (tooling) with the module. To 
fix this we would have had to also decouple the tooling. Something we 
didn't have the resources for at that point in time.


Michael



Thanks,
Raul

On 28/03/2017, 11:54, "Angela Schreiber"  wrote:

i was about to write pretty much the same thing :-)

regards
angela

On 28/03/17 09:09, "Michael Dürig"  wrote:

>
>As this is a new feature I would be interested in the motivation for
>having to backport this. Generally we should only backport fixes for
>defects.
>
>As Marcel mentions on the issue a better approach would be to release
>this independently. If this is blocked by dependencies we should make an
>effort to sort this out, as now is the time in the release cycle for
>doing so.
>
>So for now -1 from my side to back porting this until
>
>a) we have a clear picture of the alternatives and
>b) in the case of backporting, understand how we would ensure quality.
>This is new code that was so far never exposed to the level of testing
>people would expect from 1.6 code.
>
>Michael
>
>On 27.03.17 11:21, Raul-Nicolae Hudea wrote:
>> Hi,
>>
>> I would like to backport OAK-4933 to 1.6. The impact should be minimal
>>since the changes are about bringing the AzureBlobStore connector to 1.6.
>>
>> Changes are:
>> - new module
>> - changes in oak-run to support the azure data store
>>
>> Thanks,
>> Raul
>>
>>





Re: Breakdown of Jira issues reported by Jenkins

2017-03-30 Thread Michael Dürig


Hi,

The new Jenkins Job I configured [1] doesn't seem to send emails nor 
create issues when it fails. E.g. see build 78 [2]. Could someone else 
have a quick look at the configuration of that job as I cannot figure 
out what is wrong with it.




[1] https://builds.apache.org/view/J/job/Jackrabbit%20Oak/
[2] https://builds.apache.org/view/J/job/Jackrabbit%20Oak/78/

On 21.03.17 09:06, Michael Dürig wrote:


Hi,

As a first reaction to this and to increase our benefit from Jenkins I
disabled email notifications and Jira issue reporting for our Jenkins
Matrix jobs [1, 2]. The jobs are still there and I suggest everyone has
a look at them once in a while.
At the same time I set up a new Jenkins jobs, which is much lighter as
it only runs the unit tests on trunk [3]. The job is triggered at every
commit and usually completes after about 25 minutes. Currently the job
sends a notification to @oak-dev should it fail and I might experiment
with adding Jira issue reporting to it (but might hit INFRA-13599 [4]).
So far this job has proved very stable (no failures in the past 20
builds). The stability together with the quick turnaround should give us
fast feedback on regressions. Any failure reported by this job is thus a
signal for immediate action.

Michael

[1]
https://builds.apache.org/view/All/job/Apache%20Jackrabbit%20Oak%20matrix/
[2] https://builds.apache.org/view/All/job/Oak-Win/
[3] https://builds.apache.org/view/J/job/Jackrabbit%20Oak/
[4] https://issues.apache.org/jira/browse/INFRA-13599


On 28.02.17 12:31, Michael Dürig wrote:


Hi,

To get an overview on what is going on with our Jenkins instances, what
value they provide and how much effort they generate, I broke down the
issues reported by them along various axis.

There were 327 issues reported between 8.12.16 and 28.2.17. With 82
days this amounts to almost 4 issues a day. Note that this number is
quite biased as that time period includes the Christmas break where we
didn't have much activity. The correct numbers are probably closer to 72
days and 4.5 issues per day.

To me the most striking things in the breakdowns below are the high number
of duplicates (256 / 78%) and the high number of infrastructure-related
issues (84 / 26%). This means we are spending too much time
triaging issues and hunting down infrastructure problems.

From the total of 25 fixed issues only 4 were actual regressions, two
of which were caused by missing license headers, a problem that our
release process would also have caught.

Finally all numbers are further biased because the Jenkins Jira
notification plugin itself fails sometimes [1] (frequently?), which
causes build failures not to be reported.

Michael

Issues by resolution:
256 Duplicates (172 test failures / 84 infra issues)
 27 Unresolved ( 21 test failures /  6 infra issues)
 25 Fixed
 15 CI and infra issue
  4 Rare test artefacts

Infra issues (84):
 32 Backing channel disconnected
 20 JVM crash
 12 File name too long
  6 Failed silently
  4 Artifact resolution error
  4 Maven failure
  3 Timeout (120 min.)
  2 Disk full
  1 Checksum mismatch

Fixed issues (25):
  4 bug / regression (OAK-5339, OAK-5540, OAK-5241, OAK-5471)
  7 timing
 14 test artefact



[1]
ERROR: Build step failed with exception
java.lang.NullPointerException
    at hudson.plugins.jira.JiraCreateIssueNotifier.getStatus(JiraCreateIssueNotifier.java:218)
    at hudson.plugins.jira.JiraCreateIssueNotifier.currentBuildResultSuccess(JiraCreateIssueNotifier.java:387)
    at hudson.plugins.jira.JiraCreateIssueNotifier.perform(JiraCreateIssueNotifier.java:159)
    at hudson.tasks.BuildStepMonitor$3.perform(BuildStepMonitor.java:45)
    at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:779)
    at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:720)
    at hudson.model.Build$BuildExecution.post2(Build.java:185)
    at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:665)
    at hudson.model.Run.execute(Run.java:1753)
    at hudson.matrix.MatrixRun.run(MatrixRun.java:146)
    at hudson.model.ResourceController.execute(ResourceController.java:98)
    at hudson.model.Executor.run(Executor.java:404)
Build step 'JIRA: Create issue' marked build as failure
Finished: FAILURE


Re: Backport of OAK-4933 to 1.6

2017-03-28 Thread Michael Dürig


As this is a new feature I would be interested in the motivation for 
having to backport this. Generally we should only backport fixes for 
defects.


As Marcel mentions on the issue a better approach would be to release 
this independently. If this is blocked by dependencies we should make an 
effort to sort this out, as now is the time in the release cycle for 
doing so.


So for now -1 from my side to back porting this until

a) we have a clear picture of the alternatives and
b) in the case of backporting, understand how we would ensure quality. 
This is new code that was so far never exposed to the level of testing 
people would expect from 1.6 code.


Michael

On 27.03.17 11:21, Raul-Nicolae Hudea wrote:

Hi,

I would like to backport OAK-4933 to 1.6. The impact should be minimal since 
the changes are about bringing the AzureBlobStore connector to 1.6.

Changes are:
- new module
- changes in oak-run to support the azure data store

Thanks,
Raul




Re: New JIRA component for observation

2017-03-27 Thread Michael Dürig



On 27.03.17 09:26, Marcel Reutegger wrote:

I'm wondering if this is the best approach. Initially we used the JIRA
component 1:1 for modules we have in SVN. Now we also use them for
sub-modules like 'documentmk', 'mongomk', 'property-index', ...


+1

Michael


Re: Oak 1.0.29 vs 1.4.10 memory mapping.

2017-03-23 Thread Michael Dürig


Hi,


I'm pretty sure that this method is the one introducing the extra
full mapping of the repository:
FileStoreHelper.checkFileStoreVersionOrFail


We should probably run this check with memory mapping disabled anyway. 
There is nothing to gain from mapping here, and it would probably fix 
the double mapping and sideline OAK-4274, which effectively is 
http://bugs.java.com/view_bug.do?bug_id=4724038.
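
A minimal sketch of how that could look, assuming the oak-segment-tar 
FileStoreBuilder API (the builder on the older plugins/segment branches 
is analogous); the helper below is illustrative, not the actual oak-run 
code:

import java.io.File;
import java.io.IOException;

import org.apache.jackrabbit.oak.segment.file.FileStoreBuilder;
import org.apache.jackrabbit.oak.segment.file.InvalidFileStoreVersionException;
import org.apache.jackrabbit.oak.segment.file.ReadOnlyFileStore;

public class VersionCheck {

    // Hypothetical variant of checkFileStoreVersionOrFail: open the store
    // read-only with memory mapping disabled so this one-off check does not
    // leave a second full mapping of all tar files behind.
    public static void checkFileStoreVersionOrFail(File directory)
            throws IOException, InvalidFileStoreVersionException {
        ReadOnlyFileStore store = FileStoreBuilder
                .fileStoreBuilder(directory)
                .withMemoryMapping(false)  // plain file access suffices here
                .buildReadOnly();          // fails on a store version mismatch
        store.close();                     // release the store again
    }
}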


Michael


On 23.03.17 16:13, Alex Parvulescu wrote:

Hi,

To add what I have found so far. This seems related to OAK-4274, but
I think there might be a twist in there somewhere. I'm pretty sure
that this method is the one introducing the extra full mapping of the
repository: FileStoreHelper.checkFileStoreVersionOrFail [0].
Disabling this method takes the 2x mapping away completely.

The reason I'm saying it is related to OAK-4274 is that I looked at
a heap dump to verify what is keeping the references to the readonly
store: there are no live ones, the refs should be GC'ed, but for
some reason they are not.
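
For background on why dropping the references alone does not help: a
mapping created via FileChannel.map is only released once the
corresponding MappedByteBuffer is garbage collected, which is what bug
4724038 mentioned above is about. A small self-contained demo, with a
tar file path passed as the argument:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappingDemo {

    public static void main(String[] args) throws Exception {
        MappedByteBuffer buffer;
        try (RandomAccessFile file = new RandomAccessFile(args[0], "r");
             FileChannel channel = file.getChannel()) {
            buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            buffer.get(0);  // touch the mapping so it shows up in pmap
        }
        // Closing the channel does not unmap the region. Neither does
        // dropping the reference; the mapping only goes away once the
        // buffer object is actually collected.
        buffer = null;
        System.gc();
    }
}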

I'm still poking around, did not create an oak issue yet. Still
pending is to verify if this affects other areas than oak-run.

Feedback is more than welcome!

best, alex

[0]
https://github.com/apache/jackrabbit-oak/blob/1.6/oak-run/src/main/java/org/apache/jackrabbit/oak/plugins/segment/FileStoreHelper.java#L209







On Thu, Mar 23, 2017 at 12:10 PM, Ian Boston  wrote:


Hi,

Based on the page fault behaviour, I think the areas mapped and
reported by pmap are being actively accessed by the JVM. The number
of page faults for Oak 1.4.11 is well over 2x the number of page
faults for Oak 1.0.29 when performing an oak-run offline compaction,
on the same VM with the same repository in the same state. The tar
files are not the same, but one copy of the tar files is 32GB in both
instances; 1.4.11 maps 64GB as mentioned before.

I don't know if it's the behaviour seen in OAK-4274. I have seen
similar in the past. I was not confident that a GC cycle did unmap,
but it would be logical.

Best Regards
Ian

On 23 March 2017 at 09:07, Francesco Mari
 wrote:


You might be hitting OAK-4274, which I discovered quite some time
ago. I'm not aware of a way to resolve this issue at the moment.

2017-03-22 16:47 GMT+01:00 Alex Parvulescu
:

Hi,

To give more background, this came about during an investigation into
a slow offline compaction, but it may affect any running FileStore as
well (to be verified). I don't think it's related to oak-run itself,
but more to the way we map files, and so far it looks like a bug
(there is no reasonable explanation for mapping each tar file twice).

Took a quick look at the TarReader but there are not many changes in
this area between the 1.0 and 1.4 branches. If no one has better
ideas, I'll create an oak issue and investigate this a bit further.

thanks, alex


On Wed, Mar 22, 2017 at 4:28 PM, Ian Boston 
wrote:


Hi,

I am looking at oak-run and I see 2x the mapped memory between 1.0.29
and 1.4.10. It looks like in 1.0.29 each segment file is mapped into
memory once, but in 1.4.10 it's mapped into memory twice.

Is this expected? It's not great for page faults.

Best Regards
Ian









Re: Metrics support in Oak

2017-03-23 Thread Michael Dürig


Hi,

I followed up with https://issues.apache.org/jira/browse/OAK-5973 to 
discuss the particulars.


Michael

On 23.03.17 09:36, Ian Boston wrote:

Hi,

IIRC (a) is doable and the preferred way of naming metrics. Other systems
that use Metrics typically use the package or class name, sometimes an
API classname, in the same way loggers do. This makes it much easier
to process and report on blocks of functionality at the reporting stage.
For instance, when the metrics are ingested into InfluxDB with Grafana as
a front end, they can be filtered effectively on the metric name.

Some background (mostly for the wider community)

 Oak's MetricsRegistry is deployed as a service into Sling with the
name "oak". Sling has its own MetricsRegistry exposed as a service with
the name "sling". The reporting tools aggregate all the MetricsRegistries,
prefixing them with their service name. Hence the Oak MetricsRegistry
metrics will all be prefixed with "oak-" when reported.

That means Oak doesn't need to differentiate itself from other metrics, but
(a) is a good idea to avoid 100s of metrics all in 1 namespace.
MetricsRegistries are designed to scale to 1000s.

Anyone using a MetricsRegistry service should bind to the "sling"
registry service or create their own and register it with a unique name,
as is done here [1]. That's the runtime instrumentation bundle, service
named "woven".

+1 to (a)

Best Regards
Ian


1
https://github.com/ieb/slingmetrics/blob/master/src/main/java/org/apache/sling/metrics/impl/MetricsActivator.java#L79

On 21 March 2017 at 12:53, Michael Dürig  wrote:



Hi,

AFAICS Oak's Metrics support exposes all Stats in a flat namespace under
"org.apache.jackrabbit.oak". I don't think this is desirable. We should (a)
either come up with a way to expose them by grouping related ones together
or at least (b) arrive at a consensus on how we construct the names of the
individual Stats in an unambiguous and standard way. Currently we have
different approaches in the various components, resulting in a confusing
list of items.

My preference would be (a), but I don't know if this is doable.


Michael





Metrics support in Oak

2017-03-21 Thread Michael Dürig


Hi,

AFAICS Oak's Metrics support exposes all Stats in a flat namespace under 
"org.apache.jackrabbit.oak". I don't think this is desirable. We should 
(a) either come up with a way to expose them by grouping related ones 
together or at least (b) arrive at a consensus on how we construct the 
names of the individual Stats in an unambiguous and standard way. 
Currently we have different approaches in the various components, 
resulting in a confusing list of items.


My preference would be (a), but I don't know if this is doable.
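
Either way, a sketch of how related Stats could be grouped via the
Dropwizard API underlying the Stats support, deriving names from the
owning class the way loggers do; the class and metric names here are
made up for illustration:

import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;

public class SegmentStoreStats {

    private final Timer commitTime;

    public SegmentStoreStats(MetricRegistry registry) {
        // MetricRegistry.name() builds dotted names, so related Stats group
        // together under the owning class, e.g.
        // "org.apache.jackrabbit.oak.segment.SegmentStoreStats.commit-time".
        commitTime = registry.timer(
                MetricRegistry.name(SegmentStoreStats.class, "commit-time"));
    }

    public Timer getCommitTime() {
        return commitTime;
    }
}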


Michael


Re: Breakdown of Jira issues reported by Jenkins

2017-03-21 Thread Michael Dürig


Hi,

As a first reaction to this and to increase our benefit from Jenkins I 
disabled email notifications and Jira issue reporting for our Jenkins 
Matrix jobs [1, 2]. The jobs are still there and I suggest everyone has 
a look at them once in a while.
At the same time I set up a new Jenkins job, which is much lighter as 
it only runs the unit tests on trunk [3]. The job is triggered at every 
commit and usually completes after about 25 minutes. Currently the job 
sends a notification to @oak-dev should it fail and I might experiment 
with adding Jira issue reporting to it (but might hit INFRA-13599 [4]). 
So far this job has proved very stable (no failures in the past 20 
builds). The stability together with the quick turnaround should give us 
fast feedback on regressions. Any failure reported by this job is thus a 
signal for immediate action.


Michael

[1] 
https://builds.apache.org/view/All/job/Apache%20Jackrabbit%20Oak%20matrix/

[2] https://builds.apache.org/view/All/job/Oak-Win/
[3] https://builds.apache.org/view/J/job/Jackrabbit%20Oak/
[4] https://issues.apache.org/jira/browse/INFRA-13599


On 28.02.17 12:31, Michael Dürig wrote:


Hi,

To get an overview on what is going on with our Jenkins instances, what
value they provide and how much effort they generate, I broke down the
issues reported by them along various axis.

There were 327 issues reported between 8.12.16 and 28.2.17. With 82
days this amounts to almost 4 issues a day. Note that this number is
quite biased as that time period includes the Christmas break where we
didn't have much activity. The correct numbers are probably closer to 72
days and 4.5 issues per day.

To me the most striking things in the breakdowns below are the high
number of duplicates (256 / 78%) and the high number of
infrastructure-related issues (84 / 26%). To me this means we are
spending too much time triaging issues and hunting down infrastructure
problems.

From the total of 25 fixed issues only 4 were actual regressions, two
of which were caused by missing license headers, a problem that our
release process would also have caught.

Finally all numbers are further biased because the Jenkins Jira
notification plugin itself fails sometimes [1] (frequently?), which
causes build failures not to be reported.

Michael

Issues by resolution:
256 Duplicates (172 test failures / 84 infra issues)
 27 Unresolved ( 21 test failures /  6 infra issues)
 25 Fixed
 15 CI and infra issue
  4 Rare test artefacts

Infra issues (84):
 32 Backing channel disconnected
 20 JVM crash
 12 File name too long
  6 Failed silently
  4 Artifact resolution error
  4 Maven failure
  3 Timeout (120 min.)
  2 Disk full
  1 Checksum mismatch

Fixed issues (25):
  4 bug / regression (OAK-5339, OAK-5540, OAK-5241, OAK-5471)
  7 timing
 14 test artefact



[1]
ERROR: Build step failed with exception
java.lang.NullPointerException
    at hudson.plugins.jira.JiraCreateIssueNotifier.getStatus(JiraCreateIssueNotifier.java:218)
    at hudson.plugins.jira.JiraCreateIssueNotifier.currentBuildResultSuccess(JiraCreateIssueNotifier.java:387)
    at hudson.plugins.jira.JiraCreateIssueNotifier.perform(JiraCreateIssueNotifier.java:159)
    at hudson.tasks.BuildStepMonitor$3.perform(BuildStepMonitor.java:45)
    at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:779)
    at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:720)
    at hudson.model.Build$BuildExecution.post2(Build.java:185)
    at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:665)
    at hudson.model.Run.execute(Run.java:1753)
    at hudson.matrix.MatrixRun.run(MatrixRun.java:146)
    at hudson.model.ResourceController.execute(ResourceController.java:98)
    at hudson.model.Executor.run(Executor.java:404)
Build step 'JIRA: Create issue' marked build as failure
Finished: FAILURE


Re: Merge policy for the 1.6 branch

2017-03-16 Thread Michael Dürig


My interpretation of this is as follows:

1.  I send out a notification before merging changes
2a. I think review should pass anyway -> go ahead and merge
2b. Otherwise: give others some time to look at it before merging,
depending on complexity, availability etc.
3.  Optional: In case of review failure after the fact -> revert again

With 3 we don't limit ourselves to performing the review in a fixed time
frame, which might not be feasible.


That's exactly what I had in mind!

Michael


Wdyt?

Kind regards

Angela

On 16/03/17 09:53, "Davide Giannella"  wrote:


On 14/03/2017 10:59, Michael Dürig wrote:


In short, announce your backport on @oak-dev and ask for review. If
confident enough that the review will pass anyway, go ahead but be
prepared to revert.


+1 if we time box it for each backport. For example 3 days or whatever.
Something like we do for releases. This is to prevent a backport from
stalling for too long. We may even define a vote policy like for
releases but to be taken on the issue itself rather than here in the list.

Davide






Re: Merge policy for the 1.6 branch

2017-03-14 Thread Michael Dürig


I don't think this works well:

On 14.03.17 14:04, Julian Reschke wrote:

Let me suggest something else:

a) follow commit emails,


As outlined in my previous mail, this distributes the effort of figuring 
out the particulars of a backport to every committer, where it would be 
less effort to just write a single short message to @oak-dev. Also, due 
to the large volume of traffic on @commits, it is too easy to miss 
something.



b) when we do a release from a stable branch, actually review what
changed, instead of just ~3 people only running the release checker script.


The release process is mostly about compliance with ASF licence 
requirements [1]. Apart from that, release time is not the appropriate 
moment to discuss individual issues, their potential impact and risks; 
this is too late in the game. Such issues should be discussed close to 
the time when they are being worked on.


Michael

[1] http://www.apache.org/dev/release-publishing.html

