Export org.apache.jackrabbit.oak.cache package from oak-core (OAK-3598)
For OAK-3092, oak-lucene would need to access classes from the org.apache.jackrabbit.oak.cache package. For now it is limited to CacheStats, to expose the cache related statistics. I have opened OAK-3598 for that. Kindly provide your feedback there around whether it is fine to start exporting this package for consumption by oak-lucene. Chetan Mehrotra
Re: Why does oak-core import every package with an optional resolution?
Looking at the history of oak-core/pom.xml, this change was done in [1] for OAK-1708, most likely to support loading of various DB drivers from within oak-core, and was probably a temporary change which was never revisited. That might not be required now, as the DataSource gets injected and oak-core need not be aware of drivers etc. So we can get rid of that. @Julian - Thoughts? Chetan Mehrotra [1] https://github.com/apache/jackrabbit-oak/commit/7f844d5cde52dc53c41cc01aad9079afb275438a On Mon, Oct 26, 2015 at 4:20 PM, Francesco Mari <mari.france...@gmail.com> wrote: > A friendly reminder of this issue. Is there a specific reason why > every dependency in oak-core has an optional resolution? > > 2015-10-22 15:34 GMT+02:00 Francesco Mari <mari.france...@gmail.com>: >> Hi, >> >> can somebody explain me why oak-core has the "Import-Package" >> directive set to "*;resolution:=optional"? >> >> The effect of this directive is that *every* imported package is >> declared optional in the manifest file. Because of this, the OSGi >> framework will always resolve the oak-core bundle, even if some of its >> requirements are not satisfied. In particular, oak-core must always be >> ready to cope with NoClassDefFoundExceptions. >> >> We should definitely fix this.
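If the only reason for the blanket optional resolution was DB driver loading, a narrower instruction could replace it. The sketch below is hypothetical (the driver package names are placeholders, and it assumes the maven-bundle-plugin/bnd `Import-Package` syntax); it marks only the driver imports optional while everything else resolves strictly:

```xml
<!-- Hypothetical maven-bundle-plugin instruction: only the (assumed)
     JDBC driver packages are optional; all other imports stay mandatory,
     so the bundle fails to resolve if a real dependency is missing. -->
<Import-Package>
  org.postgresql.*;resolution:=optional,
  com.mysql.*;resolution:=optional,
  *
</Import-Package>
```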
Re: svn commit: r1710162 - in /jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak: api/PropertyState.java api/Tree.java api/Type.java api/package-info.java plugins/memory/ModifiedNo
On Fri, Oct 23, 2015 at 3:15 PM, <f...@apache.org> wrote: > public final class Type implements Comparable<Type> { > > -private static final Map<String, Type> TYPES = newHashMap(); > +private static final Map<String, Type> TYPES = new HashMap<String, > Type>(); > > private static Type create(int tag, boolean array, String string) > { > Type type = new Type(tag, array, string); > @@ -242,10 +237,23 @@ public final class Type implements Co > > @Override > public int compareTo(@Nonnull Type that) { > -return ComparisonChain.start() > -.compare(tag, that.tag) > -.compareFalseFirst(array, that.array) > -.result(); > +if (tag < that.tag) { > +return -1; > +} > + > +if (tag > that.tag) { > +return 1; > +} > + > +if (!array && that.array) { > +return -1; > +} > + > +if (array && !that.array) { > +return 1; > +} > + > +return 0; > } I am fine with removing the dependency on Guava from the API, but I am not sure if we should remove it from the implementation side. Also, it would be good to have a test case to validate the above logic if we are removing the dependency on a tested Guava utility method. Chetan Mehrotra
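A sketch of such a test case. Hedged: `TypeKey` is a hypothetical stand-in modelling only the (tag, array) state of oak's `Type`, not the real class, so the hand-rolled ordering from the diff can be asserted without the rest of the API:

```java
// Standalone model of the compareTo logic that replaced Guava's
// ComparisonChain: order by tag first, then non-array before array
// (matching compareFalseFirst).
public class TypeKey implements Comparable<TypeKey> {
    final int tag;
    final boolean array;

    TypeKey(int tag, boolean array) {
        this.tag = tag;
        this.array = array;
    }

    @Override
    public int compareTo(TypeKey that) {
        // order by tag first
        if (tag != that.tag) {
            return tag < that.tag ? -1 : 1;
        }
        // then false (non-array) sorts before true (array)
        if (array != that.array) {
            return array ? 1 : -1;
        }
        return 0;
    }

    public static void main(String[] args) {
        check(new TypeKey(1, false).compareTo(new TypeKey(2, false)) < 0);
        check(new TypeKey(2, true).compareTo(new TypeKey(1, true)) > 0);
        check(new TypeKey(1, false).compareTo(new TypeKey(1, true)) < 0);
        check(new TypeKey(1, true).compareTo(new TypeKey(1, false)) > 0);
        check(new TypeKey(3, true).compareTo(new TypeKey(3, true)) == 0);
        System.out.println("compareTo contract ok");
    }

    private static void check(boolean cond) {
        if (!cond) throw new AssertionError("compareTo contract violated");
    }
}
```

In a real test this would construct the actual `Type` instances and could additionally cross-check against the old `ComparisonChain` result for a sample of inputs.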
Re: Reindexing problems
> (a) Hardcode (not rely on the Whiteboard or OSGi) the known indexes

That would not work if the implementation makes use of OSGi features like configuration or DI. For example, the Lucene implementation relies on OSGi config and also exposes certain extension points.

> (b) Where we can't use hardcoding, use hard service references (Whiteboard / OSGi).

+1. That would be preferable. I think we can go for the approach taken in OAK-3201, as depending on the setup even a custom implementation might be required. So just hard references would not help; we would need to make the component which registers the repository aware of all its *required* dependencies.

> (c) If we can't do that, block or fail commits if one of the configured indexes is not available, for example for the Solr index (if such an index is configured).

+1. The current approach is problematic. A missing index provider is more of a setup issue which can be addressed by a system admin, and the repository should not try to handle that. So failing the commit should be fine.

> Additionally, for "synchronous" indexes (property index and so on), I would like to always create and reindex them asynchronously by default,

That might be tricky for DocumentNodeStore, as even if you build them asynchronously, when the final merge happens it might be very expensive to deal with such a large branch commit. Also, for a critical index like the uuid/reference index, it would be better if the system does not start at all; otherwise it would trigger a large traversal if no index was present, or if the previous revision of the index is not usable (due to some corruption).

Chetan Mehrotra

On Wed, Oct 21, 2015 at 2:24 PM, Thomas Mueller <muel...@adobe.com> wrote: > Hi, > > If an index provider is (temporarily) not available, the > MissingIndexProviderStrategy resets the index so it is re-indexed. This is a > problem (OAK-2024, OAK-2203, OAK-2429, OAK-3325, OAK-3366, OAK-3505, > OAK-3512, OAK-3513), because re-indexing is slow and one transaction. 
It can > also cause many threads to concurrently build the index. Currently, > synchronous indexes are built in one "transaction", which is anyway a > performance problem (for new indexes and reindexing). If an index is not > available when running a query, traversal is used, which is also a problem. > > What about: > > * (a) Hardcode (not rely on the Whiteboard or OSGi) the known indexes for > property, reference, nodeType, lucene, counter index. This is for both > writing (IndexEditor) and reading (QueryIndex) . That way, those indexes are > always available, and we never get into a situation where they are > temporarily not available. > > * (b) Where we can't use hardcoding, use hard service references (Whiteboard > / OSGi). > > * (c) If we can't do that, block or fail commits if one of the configured > indexes is not available, for example for the Solr index (if such an index is > configured). > > Additionally, for "synchronous" indexes (property index and so on), I would > like to always create and reindex them asynchronously by default, and only > once they are available switch to sychronous mode. I think (but I'm not sure) > this is OAK-1456. > > What do you think? > > Regards, > Thomas >
Re: jackrabbit-oak build #6619: Errored
Past failures had the following different causes:
- 2 failed with a timeout
- 1 failed with OOM
- 1 failed due to an intermittent test failure in QueryResultTest#testGetSize (OAK-2689)
- 1 failed in a Segment test, which looks like a new one. @Michael/Alex can you have a look?

Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.158 sec <<< FAILURE!
removeSome[0](org.apache.jackrabbit.oak.plugins.segment.CompactionMapTest) Time elapsed: 0.26 sec <<< FAILURE!
java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:92)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertNull(Assert.java:551)
at org.junit.Assert.assertNull(Assert.java:562)
at org.apache.jackrabbit.oak.plugins.segment.CompactionMapTest.removeSome(CompactionMapTest.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
====

Chetan Mehrotra

On Tue, Oct 13, 2015 at 12:16 AM, Travis CI <ju...@apache.org> wrote: > Build Update for apache/jackrabbit-oak > - > > Build: #6619 > Status: Errored > > Duration: 3004 seconds > Commit: 3e80198beb76e65934524d2467ce22e2cf0b7fe9 (1.0) > Author: Chetan Mehrotra > Message: OAK-3504 - CopyOnRead directory should not schedule a copy task for > non existent file > > Merging 1708105 > > > git-svn-id: > https://svn.apache.org/repos/asf/jackrabbit/oak/branches/1.0@1708108 > 13f79535-47bb-0310-9956-ffa450edef68 > > View the changeset: > https://github.com/apache/jackrabbit-oak/compare/ea3cd2c2bd5c...3e80198beb76 > > View the full build log and details: > https://travis-ci.org/apache/jackrabbit-oak/builds/84919313 > > -- > sent by Jukka's Travis notification gateway
Re: svn commit: r1704844 - in /jackrabbit/oak/trunk: oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/counter/jmx/ oak-core/src/main/java/org/apache/jackrabbit/oak/query/ oak-core/src/ma
On Thu, Sep 24, 2015 at 1:56 PM, Thomas Mueller <muel...@adobe.com> wrote: > what about getIndexCostInfo +1 Chetan Mehrotra
Re: svn commit: r1704844 - in /jackrabbit/oak/trunk: oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/counter/jmx/ oak-core/src/main/java/org/apache/jackrabbit/oak/query/ oak-core/src/ma
Hi Thomas, On Wed, Sep 23, 2015 at 6:51 PM, <thom...@apache.org> wrote: > /** > + * Get the index cost. The query must already be prepared. > + * > + * @return the index cost > + */ > +String getIndexCost(); Should this be returning a String? Maybe we should name it better. Chetan Mehrotra
Re: svn commit: r1704655 - in /jackrabbit/oak/trunk/oak-core/src: main/java/org/apache/jackrabbit/oak/plugins/document/ test/java/org/apache/jackrabbit/oak/plugins/document/ test/java/org/apache/jackr
Hi Marcel, A short description of what the fix was would be very helpful for future reference! Chetan Mehrotra On Tue, Sep 22, 2015 at 9:00 PM, <mreut...@apache.org> wrote: > Author: mreutegg > Date: Tue Sep 22 15:30:08 2015 > New Revision: 1704655 > > URL: http://svn.apache.org/viewvc?rev=1704655=rev > Log: > OAK-3433: Commit does not detect conflict when background read happens after > rebase > > Add yet another test, enable existing test and implement fix > > Modified: > > jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStore.java > > jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/LastRevRecoveryAgent.java > > jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/UnsavedModifications.java > > jackrabbit/oak/trunk/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/document/JournalTest.java > > jackrabbit/oak/trunk/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/document/mongo/ClusterConflictTest.java > > Modified: > jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStore.java > URL: > http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStore.java?rev=1704655=1704654=1704655=diff > == > --- > jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStore.java > (original) > +++ > jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStore.java > Tue Sep 22 15:30:08 2015 > @@ -2084,9 +2084,9 @@ public final class DocumentNodeStore > BackgroundWriteStats backgroundWrite() { > return unsavedLastRevisions.persist(this, new > UnsavedModifications.Snapshot() { > @Override > -public void acquiring() { > +public void acquiring(Revision mostRecent) { > if (store.create(JOURNAL, > - > 
singletonList(changes.asUpdateOp(getHeadRevision() { > +singletonList(changes.asUpdateOp(mostRecent { > changes = JOURNAL.newDocument(getDocumentStore()); > } > } > > Modified: > jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/LastRevRecoveryAgent.java > URL: > http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/LastRevRecoveryAgent.java?rev=1704655=1704654=1704655=diff > == > --- > jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/LastRevRecoveryAgent.java > (original) > +++ > jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/LastRevRecoveryAgent.java > Tue Sep 22 15:30:08 2015 > @@ -235,7 +235,7 @@ public class LastRevRecoveryAgent { > unsaved.persist(nodeStore, new UnsavedModifications.Snapshot() { > > @Override > -public void acquiring() { > +public void acquiring(Revision mostRecent) { > if (lastRootRev == null) { > // this should never happen - when unsaved has no > changes > // that is reflected in the 'map' to be empty - in > that > > Modified: > jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/UnsavedModifications.java > URL: > http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/UnsavedModifications.java?rev=1704655=1704654=1704655=diff > == > --- > jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/UnsavedModifications.java > (original) > +++ > jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/UnsavedModifications.java > Tue Sep 22 15:30:08 2015 > @@ -159,7 +159,7 @@ class UnsavedModifications { > time = clock.getTime(); > Map<String, Revision> pending; > try { > -snapshot.acquiring(); > +snapshot.acquiring(getMostRecentRevision()); > pending = Maps.newTreeMap(PathComparator.INSTANCE); > pending.putAll(map); > 
} finally { > @@ -234,14 +234,26 @@ class UnsavedModifications { >
Re: JCR node name and non space whitespace chars like line break etc
Opened OAK-3412 to track this issue. Chetan Mehrotra On Tue, Sep 15, 2015 at 3:24 PM, Julian Reschke <julian.resc...@gmx.de> wrote: > On 2015-09-15 10:03, Chetan Mehrotra wrote: >> >> If I change the code in Namespace#isValidLocalName to use >> Character#isWhitespace then NameValidator starts objecting to node >> names having non space whitespace chars. So looks like current impl >> has a issue and we should fix it. But fixing it would be now a >> backward incompatible change! >> >> Thoughts? >> Chetan Mehrotra > > > We should fix it in any case. It's compatible with the original code, and > with Jackrabbit. > > Best regards, Julian
Re: JCR node name and non space whitespace chars like line break etc
If I change the code in Namespace#isValidLocalName to use Character#isWhitespace, then NameValidator starts objecting to node names having non space whitespace chars. So it looks like the current impl has an issue and we should fix it. But fixing it would now be a backward-incompatible change! Thoughts? Chetan Mehrotra On Mon, Sep 14, 2015 at 4:13 PM, Chetan Mehrotra <chetan.mehro...@gmail.com> wrote: > Micheal Durig mentioned that this has been discussed earlier. So > going back in time this was discussed in OAK-1891 and thread [1]. Oak > used to prevent such nodes from getting created earlier but that logic > was changed as part of OAK-1174 and r1582804 [0] and check was moved > to NameValidator class (see Namespace#isValidLocalName). > > However when that change was done the check initially used > Character#isWhitespace and switched to using Character#isSpaceChar > which is limited to very few checks. Now looks like isSpaceChar is > returning false for '\n', '\r' etc not sure why. > > Chetan Mehrotra > [0] > https://github.com/apache/jackrabbit-oak/commit/342809f7f04221782ca6bbfbde9392ec4ff441c2 > > [1] > http://mail-archives.apache.org/mod_mbox/jackrabbit-oak-dev/201406.mbox/%3ccab+dfin-smo5egc-m2ma_wwhar8eme+czwwdob1wjuvej+n...@mail.gmail.com%3E > > > On Mon, Sep 14, 2015 at 3:28 PM, Michael Marth <mma...@adobe.com> wrote: >> Hi Chetan, >> >> Given that JR2 did not allow those characters I see no good reason why Oak >> should. >> >> my2c >> Michael >> >> >> >> >> On 14/09/15 11:47, "Chetan Mehrotra" <chetan.mehro...@gmail.com> wrote: >> >>>Hi Team, >>> >>>While looking into OAK-3395 it was realized that in Oak we allow node >>>name with non space whitespace chars like \t, \r etc. This is >>>currently causing problem in DocumentNodeStore logic (which can be >>>fixed). >>> >>>However it might be better to prevent such node name to be created as >>>it can cause problem other. 
Specially when JR2 does not allow creation >>>of such node names [1] >>> >>>So the question is >>> >>>Should Oak allow node names with non space whitespace chars like \t, \r etc >>> >>>Chetan Mehrotra >>>[1] >>>https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-spi-commons/src/main/java/org/apache/jackrabbit/spi/commons/conversion/PathParser.java#L257
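The difference between the two checks can be demonstrated with a small standalone snippet (plain JDK, no Oak classes): Character.isSpaceChar only matches the Unicode space-separator categories, while Character.isWhitespace also covers control whitespace such as '\t', '\n' and '\r'.

```java
// Shows why a check based on Character.isSpaceChar lets '\t', '\n', '\r'
// through: isSpaceChar matches only Unicode SPACE_SEPARATOR,
// LINE_SEPARATOR and PARAGRAPH_SEPARATOR categories, whereas
// Character.isWhitespace also matches control whitespace.
public class WhitespaceCheck {
    public static void main(String[] args) {
        char[] candidates = {' ', '\t', '\n', '\r'};
        for (char c : candidates) {
            System.out.printf("U+%04X isSpaceChar=%b isWhitespace=%b%n",
                    (int) c, Character.isSpaceChar(c), Character.isWhitespace(c));
        }
    }
}
```

Running it shows isSpaceChar=false but isWhitespace=true for '\t', '\n' and '\r', matching the behaviour described above.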
Re: JCR node name and non space whitespace chars like line break etc
Michael Dürig mentioned that this has been discussed earlier. Going back in time, this was discussed in OAK-1891 and thread [1]. Oak used to prevent such nodes from getting created, but that logic was changed as part of OAK-1174 and r1582804 [0], and the check was moved to the NameValidator class (see Namespace#isValidLocalName). However, when that change was done, the check initially used Character#isWhitespace and then switched to Character#isSpaceChar, which performs far fewer checks. Now it looks like isSpaceChar returns false for '\n', '\r' etc; not sure why. Chetan Mehrotra [0] https://github.com/apache/jackrabbit-oak/commit/342809f7f04221782ca6bbfbde9392ec4ff441c2 [1] http://mail-archives.apache.org/mod_mbox/jackrabbit-oak-dev/201406.mbox/%3ccab+dfin-smo5egc-m2ma_wwhar8eme+czwwdob1wjuvej+n...@mail.gmail.com%3E On Mon, Sep 14, 2015 at 3:28 PM, Michael Marth <mma...@adobe.com> wrote: > Hi Chetan, > > Given that JR2 did not allow those characters I see no good reason why Oak > should. > > my2c > Michael > > > > > On 14/09/15 11:47, "Chetan Mehrotra" <chetan.mehro...@gmail.com> wrote: > >>Hi Team, >> >>While looking into OAK-3395 it was realized that in Oak we allow node >>name with non space whitespace chars like \t, \r etc. This is >>currently causing problem in DocumentNodeStore logic (which can be >>fixed). >> >>However it might be better to prevent such node name to be created as >>it can cause problem other. Specially when JR2 does not allow creation >>of such node names [1] >> >>So the question is >> >>Should Oak allow node names with non space whitespace chars like \t, \r etc >> >>Chetan Mehrotra >>[1] >>https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-spi-commons/src/main/java/org/apache/jackrabbit/spi/commons/conversion/PathParser.java#L257
JCR node name and non space whitespace chars like line break etc
Hi Team, While looking into OAK-3395 it was realized that Oak allows node names with non space whitespace chars like \t, \r etc. This is currently causing problems in the DocumentNodeStore logic (which can be fixed). However, it might be better to prevent such node names from being created, as they can cause problems elsewhere, especially since JR2 does not allow creation of such node names [1]. So the question is: should Oak allow node names with non space whitespace chars like \t, \r etc? Chetan Mehrotra [1] https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-spi-commons/src/main/java/org/apache/jackrabbit/spi/commons/conversion/PathParser.java#L257
Re: Lucene Property Index and OR condition
How did you perform the test? If you tested with explain, then the current code in 1.0/1.2 returns a misleading result; this got fixed in trunk with OAK-2943. Technically, Oak would convert the OR clause to a union query, and each part of the union should then be able to make use of the index. Chetan Mehrotra On Mon, Sep 7, 2015 at 6:36 PM, Davide Giannella <dav...@apache.org> wrote: > On 07/09/2015 14:32, Burkhard Pauli wrote: >> ... >> Question: Does the Lucene property index support or conditions? I tried >> even with a oak property index without success. >> > > I can be be totally wrong, so please take this carefully, but AFAIR > lucene property index does not support ORs. > > This is mainly used for tests but should be valid for real-life as well > > https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/IndexPlanner.java#L452 > > Davide > >
Re: svn commit: r1700720 - in /jackrabbit/oak/trunk/oak-lucene/src: main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexEditorContext.java test/java/org/apache/jackrabbit/oak/jcr/query
Hi Tommaso, On Wed, Sep 2, 2015 at 1:28 PM, <tomm...@apache.org> wrote: > +Analyzer definitionAnalyzer = definition.getAnalyzer(); > +Map<String, Analyzer> analyzers = new HashMap<String, > Analyzer>(); > +analyzers.put(FieldNames.SPELLCHECK, new > ShingleAnalyzerWrapper(LuceneIndexConstants.ANALYZER, 3)); > +Analyzer analyzer = new > PerFieldAnalyzerWrapper(definitionAnalyzer, analyzers); > +IndexWriterConfig config = new IndexWriterConfig(VERSION, > analyzer); Have a look at IndexDefinition#createAnalyzer, which already creates a PerFieldAnalyzerWrapper. It would be better to move this logic there. Or do you want to customize the analyzer for that field only during indexing? Chetan Mehrotra
Re: [VOTE] Epics in Jira
+1 Chetan Mehrotra On Tue, Sep 1, 2015 at 1:10 PM, Davide Giannella <dav...@apache.org> wrote: > Hello team, > > some of us noticed we lack the epics in our jira so I raised an issue > asking whether that would be possible to have them. > > https://issues.apache.org/jira/browse/INFRA-10185 > > had a I reply (which TBH didn't really understand completely). Feel free > to follow-up on the issue itself if you require more details. > > Can we start a vote session for changing our jira schema to allows epics? > > My vote is +1. > > Cheers > Davide > >
Re: New committer: Francesco Mari
Welcome Francesco! Chetan Mehrotra On Fri, Aug 28, 2015 at 5:18 PM, Michael Dürig mdue...@apache.org wrote: Hi, Please welcome Francesco as a new committer and PMC member of the Apache Jackrabbit project. The Jackrabbit PMC recently decided to offer Francesco committership based on his contributions. I'm happy to announce that he accepted the offer and that all the related administrative work has now been taken care of. Welcome to the team, Francesco! Michael
Re: persistent set of strings
Hi Tomek,

To start with, I think a flat file based approach should be fine. While working on [1] it was observed that 2M blob ids consumed 500MB of memory. As this logic is to be implemented in oak-run, it should probably be fine for now to just use an in-memory HashSet. Later, if that becomes a problem, we can think of some off-heap solution. You can also look into using MVStore, which is being used in DocumentNodeStore for the persistent cache.

Chetan Mehrotra
[1] https://issues.apache.org/jira/browse/OAK-2882?focusedCommentId=14550198page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14550198

On Mon, Aug 24, 2015 at 5:17 PM, Tomek Rekawek reka...@adobe.com wrote: Hello, I started working on OAK-3148, which is a new feature that allows to gradually migrate blobs from one store to another, without turning off the instance. In order to create the SplitBlobStore I need a way to remember (and save) already transferred blob ids. So, basically I need a persistent and mutable set of strings. Do we have something like this in Oak already? I thought about a few custom solutions: 1. Saving blob ids in a file (at the beginning it can be a flat text file, then some b-tree), with a memory cache and/or bloom filter. - but it adds complexity, requires the maintenance, etc. 2. Creating SegmentNodeStore, with bucketing via the hashcode - but running the second segment node store just to persist a bunch of ids seems a little excessive. 3. Custom cache solution, like ehcache - but adding a new, big library just to support this feature doesn’t seem right as we have to deal with dependency versions, embedding, etc. So, maybe there is some lightweight and reliable “4” in the Oak already? Thanks, Tomek -- Tomek Rękawek | Adobe Research | www.adobe.com reka...@adobe.com
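A minimal sketch of the flat-file option (hypothetical class, assuming blob ids never contain line breaks): the full set is kept in an in-memory HashSet and durably appended to a plain text file, which is reloaded on startup.

```java
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.Set;

// Hypothetical persistent set of blob ids: a HashSet mirrored by an
// append-only text file (one id per line). Not the Oak API, just a
// sketch of approach 1 from the thread.
public class PersistedStringSet implements Closeable {
    private final Set<String> ids = new HashSet<>();
    private final BufferedWriter writer;

    public PersistedStringSet(Path file) throws IOException {
        if (Files.exists(file)) {
            // reload previously persisted ids on startup
            ids.addAll(Files.readAllLines(file, StandardCharsets.UTF_8));
        }
        writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // returns true if the id was newly added; the id is appended and flushed
    public synchronized boolean add(String id) throws IOException {
        if (!ids.add(id)) {
            return false;
        }
        writer.write(id);
        writer.newLine();
        writer.flush();
        return true;
    }

    public synchronized boolean contains(String id) {
        return ids.contains(id);
    }

    @Override
    public void close() throws IOException {
        writer.close();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("blob-ids", ".txt");
        try (PersistedStringSet set = new PersistedStringSet(file)) {
            set.add("blob-1");
        }
        try (PersistedStringSet set = new PersistedStringSet(file)) {
            System.out.println(set.contains("blob-1")); // survives reopen
        }
    }
}
```

A bloom filter or an off-heap map could later replace the HashSet if 2M+ ids turn out to be too heavy, without changing the file format.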
Re: [Oak origin/trunk] Apache Jackrabbit Oak matrix - Build # 335 - Still Failing
On Tue, Aug 18, 2015 at 8:12 PM, Michael Dürig mdue...@apache.org wrote:

> This is caused by MBeanIntegrationTest. Any idea what could have caused this? Chetan?

Seen this before as well and was not able to reproduce it locally. Looking again today I realized that it is a resource cleanup issue: some test is not shutting down the repository properly, leaving MBeans registered. The order in which tests get executed on CI and locally differs. It is a bit hard to reproduce the same order of execution, so I had to create a suite and reduce the possible candidates to 2:

import org.junit.runner.RunWith;
import org.junit.runners.Suite;

@RunWith(Suite.class)
@Suite.SuiteClasses({SimpleRepositoryFactoryTest.class, MBeanIntegrationTest.class})
public class TestSuite {
}

Executing that within the IDE or on the command line reproduced the issue: SimpleRepositoryFactoryTest was not closing the created repo. Fixed that now with rev http://svn.apache.org/r1696522. Hopefully this should fix the issue! Chetan Mehrotra
Re: [Oak origin/trunk] Apache Jackrabbit Oak matrix - Build # 337 - Still Failing
The failure is again in the MBeanServerIntegration test. However, the build was done at rev '63c4a4db95b0f39f9ab27f499416c99c918a4955' [1], which is 2 days old and has not yet picked up my changes from yesterday. Let's watch a couple more runs until it fetches the new revision. Btw, I find it strange that the build is based on git and not svn, as the git mirrors might lag behind! Chetan Mehrotra [1] https://github.com/apache/jackrabbit-oak/commit/63c4a4db95b0f39f9ab27f499416c99c918a4955 On Wed, Aug 19, 2015 at 9:35 PM, Apache Jenkins Server jenk...@builds.apache.org wrote: The Apache Jenkins build system has built Apache Jackrabbit Oak matrix (build #337) Status: Still Failing Check console output at https://builds.apache.org/job/Apache%20Jackrabbit%20Oak%20matrix/337/ to view the results.
Recipe for using Oak in standalone environments
Hi,

Of late I have seen quite a few queries from people trying to use Oak in a non-OSGi environment, such as embedding the repository in a webapp deployed on Tomcat, or in a standalone application. Our current documented way [1] is very limited: a repository constructed that way does not expose the full power of the Oak stack, and the user also has to know many internal setup details of Oak to get it working correctly.

Quite a bit of setup logic in Oak now depends on OSGi configuration. Trying to set up Oak without using it would not provide a stable and performant setup. If instead we can have a way to reuse all the OSGi based setup support and still enable users to use Oak in a non-OSGi env, that would provide a more stable setup approach.

Recipe
======

For some time now I have been working on the oak-pojosr module [2]. This module is now stable and can be used to set up Oak with all the OSGi support in a non-OSGi world like a webapp. For an end user the steps required would be:

1. Create a config file for enabling various parts of Oak

{
  "org.apache.felix.jaas.Configuration.factory-LoginModuleImpl": {
    "jaas.controlFlag": "required",
    "jaas.classname": "org.apache.jackrabbit.oak.security.authentication.user.LoginModuleImpl",
    "jaas.ranking": 100
  },
  ...,
  "org.apache.jackrabbit.oak.jcr.osgi.RepositoryManager": {},
  "org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService": {
    "mongouri": "mongodb://${mongodb.host}:${mongodb.port}",
    "db": "${mongodb.name}"
  }
}

2. Add a dependency on oak-pojosr, which pulls in various transitive dependencies like Felix Connect, SCR, ConfigAdmin etc.

3. Construct a repository instance

import org.apache.jackrabbit.commons.JcrUtils;

Map<String, String> config = new HashMap<String, String>();
config.put("org.apache.jackrabbit.repository.home", "/path/to/repo");
config.put("org.apache.jackrabbit.oak.repository.configFile", "/path/to/oak-config.json");
Repository repository = JcrUtils.getRepository(config);

That's all! 
This would construct a full Oak stack based on OSGi config, with all Lucene and Solr support usable.

Examples

WebApp

I have adapted the existing Jackrabbit webapp module to work with Oak and be constructed based on oak-pojosr [3]. You can check out the app and just run

mvn jetty:run

Access the web UI at http://localhost:8080 and create a repository as per the UI. It currently has the following features:

1. The repository is configurable via a JSON file copied to the 'oak' folder (default)
2. Felix WebConsole is integrated - allows developers to view OSGi state, config etc. Check /osgi/system/console
3. Felix Script Console is integrated, to get programmatic access to the repository
4. All Oak MBeans are registered and can be used by the user to perform maintenance tasks

Spring Boot

Clay has been working on an Oak based application [4] which uses Spring Boot [7]. The fork of the same at [5] is now using pojosr to configure a repository to be used in Spring [6]. In addition, the Felix WebConsole etc. would also work. To try it out, check out the application and build it, then run the following command:

java -jar target/com.meta64.mobile-0.0.1-SNAPSHOT.jar --jcrHome=oak --jcrAdminPassword=password --aeskey=password --server.port=8990 --spring.config.location=classpath:/application.properties,classpath:/application-dev.properties

And then access the app at port 8990.

Proposal
========

Do share your feedback on the above proposed approach. 
In particular the following aspect:

Q - Should we make the oak-pojosr based setup one of the recommended/supported approaches for configuring Oak in a non-OSGi env?

Chetan Mehrotra
[1] http://jackrabbit.apache.org/oak/docs/construct.html
[2] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-pojosr
[3] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-examples/webapp
[4] https://github.com/Clay-Ferguson/meta64
[5] https://github.com/chetanmeh/meta64/tree/oak-pojosr
[6] https://github.com/chetanmeh/meta64/blob/oak-pojosr/src/main/java/com/meta64/mobile/repo/OakRepository.java#L218
[7] http://projects.spring.io/spring-boot/
Re: [Oak origin/trunk] Apache Jackrabbit Oak matrix - Build # 331 - Still Failing
There were 2 failures:

Test Result (2 failures / -39)
org.apache.jackrabbit.oak.jcr.OrderableNodesTest.setPrimaryType[0]
org.apache.jackrabbit.oak.run.osgi.MBeanIntegrationTest.jmxIntegration

One of the failures was in PojoSR, and looks like it is due to repository shutdown not completing. Did a fix with OAK-3203. Hopefully that should fix it.

--
Assertion failed: assert mbeans.size() == 1 | | | | 2 false [org.apache.jackrabbit.oak.management.RepositoryManager[org.apache.jackrabbit.oak:name=repository manager,type=RepositoryManagement,id=146], org.apache.jackrabbit.oak.management.RepositoryManager[org.apache.jackrabbit.oak:name=repository manager,type=RepositoryManagement,id=44]]
at org.codehaus.groovy.runtime.InvokerHelper.assertFailed(InvokerHelper.java:398)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.assertFailed(ScriptBytecodeAdapter.java:646)
at org.apache.jackrabbit.oak.run.osgi.MBeanIntegrationTest.jmxIntegration(MBeanIntegrationTest.groovy:47)
--

Chetan Mehrotra On Sat, Aug 15, 2015 at 10:39 AM, Apache Jenkins Server jenk...@builds.apache.org wrote: The Apache Jenkins build system has built Apache Jackrabbit Oak matrix (build #331) Status: Still Failing Check console output at https://builds.apache.org/job/Apache%20Jackrabbit%20Oak%20matrix/331/ to view the results.
Re: [Oak origin/trunk] Apache Jackrabbit Oak matrix - Build # 324 - Still Failing
Builds are failing due to missing snapshot dependency. And as the CI for Jackrabbit is not working (https://builds.apache.org/job/Jackrabbit-trunk/) the snapshots are not there. I am deploying snapshot builds from my system. Hopefully that should get this build going [ERROR] Failed to execute goal on project oak-blob-cloud: Could not resolve dependencies for project org.apache.jackrabbit:oak-blob-cloud:bundle:1.4-SNAPSHOT: The following artifacts could not be resolved: org.apache.jackrabbit:jackrabbit-jcr-commons:jar:2.11.0-SNAPSHOT, org.apache.jackrabbit:jackrabbit-data:jar:2.11.0-SNAPSHOT, org.apache.jackrabbit:jackrabbit-data:jar:tests:2.11.0-SNAPSHOT: Failure to find org.apache.jackrabbit:jackrabbit-jcr-commons:jar:2.11.0-SNAPSHOT in http://repository.apache.org/snapshots was cached in the local repository, resolution will not be reattempted until the update interval of Nexus has elapsed or updates are forced - [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn goals -rf :oak-blob-cloud Chetan Mehrotra On Wed, Aug 12, 2015 at 9:50 AM, Apache Jenkins Server jenk...@builds.apache.org wrote: The Apache Jenkins build system has built Apache Jackrabbit Oak matrix (build #324) Status: Still Failing Check console output at https://builds.apache.org/job/Apache%20Jackrabbit%20Oak%20matrix/324/ to view the results.
Re: [Oak origin/1.0] Apache Jackrabbit Oak matrix - Build # 325 - Still Failing
Now the compile passes but 55 tests fail. The majority of them are from Solr. Opened OAK-3215 to track that Chetan Mehrotra On Wed, Aug 12, 2015 at 10:38 AM, Apache Jenkins Server jenk...@builds.apache.org wrote: The Apache Jenkins build system has built Apache Jackrabbit Oak matrix (build #325) Status: Still Failing Check console output at https://builds.apache.org/job/Apache%20Jackrabbit%20Oak%20matrix/325/ to view the results.
Re: build failure due to oak-pojosr ??
Looks like some issue with my recent work. Would have a look. Thanks for filing the issue and marking the current one as ignored! Chetan Mehrotra On Mon, Aug 10, 2015 at 2:02 PM, Angela Schreiber anch...@adobe.com wrote: hi i get the following failures in oak-pojosr. is it only me? anybody working on this? Running org.apache.jackrabbit.oak.run.osgi.LuceneSupportTest Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 11.447 sec FAILURE! fullTextSearch(org.apache.jackrabbit.oak.run.osgi.LuceneSupportTest) Time elapsed: 11.442 sec ERROR! java.lang.reflect.UndeclaredThrowableException at com.sun.proxy.$Proxy14.shutdown(Unknown Source) at org.apache.jackrabbit.api.JackrabbitRepository$shutdown.call(Unknown Source) at org.apache.jackrabbit.oak.run.osgi.AbstractRepositoryFactoryTest.tearDown(AbstractRepositoryFactoryTest.groovy:61) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:36) at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:46) at org.junit.rules.RunRules.evaluate(RunRules.java:18) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189) at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165) at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.jackrabbit.oak.run.osgi.OakOSGiRepositoryFactory$RepositoryProxy.invoke(OakOSGiRepositoryFactory.java:396) ... 34 more Caused by: java.lang.IllegalStateException: Service already unregistered. at org.apache.felix.connect.felix.framework.ServiceRegistrationImpl.unregister(ServiceRegistrationImpl.java:128) at org.apache.jackrabbit.oak.osgi.OsgiWhiteboard$1.unregister(OsgiWhiteboard.java:75) at org.apache.jackrabbit.oak.spi.whiteboard.CompositeRegistration.unregister(CompositeRegistration.java:43) at org.apache.jackrabbit.oak.Oak$6.close(Oak.java:640) at org.apache.commons.io.IOUtils.closeQuietly(IOUtils.java:303) at org.apache.jackrabbit.oak.jcr.repository.RepositoryImpl.shutdown(RepositoryImpl.java:308) ... 39 more Running org.apache.jackrabbit.oak.run.osgi.NodeStoreConfigTest Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.129 sec Running
Re: Copy jackrabbit-webapp module to Oak as oak-webapp
Is this an official part of Oak or rather an example? For now it's an example, until it's stable and we have a consensus on whether the proposed approach should be the recommended one for configuring Oak in standalone cases. if it's an example we can put it to oak-example. That should be fine. So should the final location be oak/oak-example/webapp? I can then move the current copy to that place. I have some local commits now in my git-svn. Once I am done I would commit them to the current place and then move them to the final place. Would that be OK? Chetan Mehrotra
Updating a single page on the site
At times we need to modify only a single page on the site, and while doing that we often deploy the whole site. The effort can be reduced slightly by using the following approach:
1. In the oak-doc module, make the required changes in the markdown files
2. Run `mvn site-deploy -Dscmpublish.skipCheckin=true`
3. Go to oak-doc/target/scmpublish-checkout
4. Directly commit the changed html files via svn commit
This is also documented under oak-doc/README.md [1] and has proved to be a faster approach for me compared to building and deploying the whole site Chetan Mehrotra [1] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-doc
Re: [VOTE] Release Apache Jackrabbit 2.10.2
On Thu, Aug 6, 2015 at 2:21 PM, Angela Schreiber anch...@adobe.com wrote: i am fine... quite frankly i am surprised to see that there is no 2.10 branch. I think this was discussed earlier [1]. Looks like we would have to revisit that decision and continue with stable/unstable releases Chetan Mehrotra [1] http://markmail.org/thread/p7k6lzbebgrgoz63
Copy jackrabbit-webapp module to Oak as oak-webapp
Hi Team, Currently we do not have a good example of how to run Oak properly in a standalone environment. One good example is the jackrabbit-webapp [1] module, which serves as a blueprint for any user on how to embed Oak. Currently this module only enables running Oak with the Segment store, and that too with the most basic setup. I would like to modify this module to use oak-pojosr [2] to configure the complete Oak stack as we do it in Sling. For that I would like to copy this module to Oak under oak-webapp and then refactor it to run the complete Oak stack. Thoughts? Chetan Mehrotra [1] https://github.com/apache/jackrabbit/tree/trunk/jackrabbit-webapp [2] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-pojosr
Re: Copy jackrabbit-webapp module to Oak as oak-webapp
On Thu, Aug 6, 2015 at 12:16 AM, Davide Giannella dav...@apache.org wrote: Will then mean it will work as http API for Oak? I'm not familiar with jackrabbit-webapp jackrabbit-webapp demonstrates a way to configure a Jackrabbit repository in a standalone env and have it running in a webapp. It also configures the WebDAV servlet and JCR remoting, which work with any repository implementation and thus should work with Oak. Chetan Mehrotra
Re: 1.3.4 cut
Hi Davide, It would be helpful if, while moving the bugs to the next version, we did not add any comment like 'Bulk Move to xxx'. This would reduce the unnecessary noise in the bug comment history. This was recently discussed on the DL at [1] Chetan Mehrotra [1] http://markmail.org/thread/2jvphlkdw4eqaxdh On Mon, Aug 3, 2015 at 11:38 AM, Davide Giannella dav...@apache.org wrote: Good morning team, today I'd like to cut 1.3.4. Ideally around 10AM CEST. We have 46 issues left and none marked as blocker. https://issues.apache.org/jira/issues/?jql=project%20%3D%20OAK%20AND%20fixVersion%20%3D%201.3.4%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC If you want any issue to be included in the cut, as usual simply say it. Please act on your issues if resolved or not and I'll bulk move all the others to 1.3.5. Cheers Davide PS: sorry for short notice. Just back from holidays and was looking at my agenda :)
Re: JCR + MongoDb + Lucene Holy Grail Finally Found
Thanks Clay for putting this together. The current documentation is not good for standalone usage, as quite a bit of the logic for configuring Oak is based on OSGi. Due to that, using Oak as is in a standalone environment is tricky. The oak-pojosr [1] module was intended to enable the use of Oak, with all its OSGi based config, in a non-OSGi env like, say, a war deployment. I need to get some time to finish it and adapt the standalone web example to use that Chetan Mehrotra [1] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-pojosr On Sun, Aug 2, 2015 at 11:26 PM, Clay Ferguson wcl...@gmail.com wrote: Fellow Jackrabbits, I discovered it was a *huge* effort and there is a *huge* lack in the examples related to simply getting MongoDb up and running (as JCR) with Lucene indexes getting properly used. Since this effort took me 2 solid days and there are no great examples that come up via google i'm sharing my example: This code creates a Full Text Search on jcr:content, and an sorting capability on jcr:lastModified: https://github.com/Clay-Ferguson/meta64/blob/master/src/main/java/com/meta64/mobile/repo/OakRepository.java I also just updated meta64 project to be using the 1.0.18 branch of the Jackrabbit code, so it's all up to date stuff. I would highly recommend adding this or a similar example right onto the Lucene page of the Oak docs, because what I'm doing is exactly what everyone else wants, and the documentation itself is just completely confusing and mind boggling without a real example. Cheers, and happy jackrabbiting. Best regards, Clay Ferguson wcl...@gmail.com meta64.com
Re: [VOTE] Release Apache Jackrabbit Oak 1.0.18
On Fri, Jul 31, 2015 at 5:39 PM, Julian Reschke julian.resc...@greenbytes.de wrote: Tests run: 9, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.595 sec FAILURE! copyOnWriteAndLocks(org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditorTest) Time elapsed: 0.182 sec ERROR! org.apache.jackrabbit.oak.api.CommitFailedException: OakLucene0003: Failed to index the node /test That's a known issue on Windows (OAK-3072). Not a blocker for this release Chetan Mehrotra
Re: [VOTE] Release Apache Jackrabbit Oak 1.0.18
+1 Chetan Mehrotra On Fri, Jul 31, 2015 at 3:44 PM, Alex Parvulescu alex.parvule...@gmail.com wrote: [X] +1 Release this package as Apache Jackrabbit Oak 1.0.18 On Fri, Jul 31, 2015 at 11:01 AM, Amit Jain am...@apache.org wrote: A candidate for the Jackrabbit Oak 1.0.18 release is available at: https://dist.apache.org/repos/dist/dev/jackrabbit/oak/1.0.18/ The release candidate is a zip archive of the sources in: https://svn.apache.org/repos/asf/jackrabbit/oak/tags/jackrabbit-oak-1.0.18/ The SHA1 checksum of the archive is ef8e68edfef9b0c1470fe9de4d10d127f741633b. A staged Maven repository is available for review at: https://repository.apache.org/ The command for running automated checks against this release candidate is: $ sh check-release.sh oak 1.0.18 ef8e68edfef9b0c1470fe9de4d10d127f741633b Please vote on releasing this package as Apache Jackrabbit Oak 1.0.18. The vote is open for the next 72 hours and passes if a majority of at least three +1 Jackrabbit PMC votes are cast. [ ] +1 Release this package as Apache Jackrabbit Oak 1.0.18 [ ] -1 Do not release this package because... My vote is +1 Thanks Amit
Do not add comments when bulk moves are performed in JIRA
Hi Team, Currently most of the issues scheduled for a 1.3.x release have comments like 'Bulk Move to xxx'. This creates unnecessary noise in the comment log. Would it be possible to move the issues to the next version silently, i.e. just change the fix version and not add any comment? Chetan Mehrotra
Re: [discuss] Near real time search to account for latency in background indexing
On Fri, Jul 24, 2015 at 12:15 PM, Michael Marth mma...@adobe.com wrote: From your description I am not sure how the indexing would be triggered for local changes. Probably not through the Async Indexer (this would not gain us much, right?). Would this be a Commit Hook? My thought was to use an Observer so as not to add cost to the commit call. The Observer would listen only for local changes and would invoke IndexUpdate on the diff Chetan Mehrotra
Re: [discuss] Near real time search to account for latency in background indexing
On Fri, Jul 24, 2015 at 2:40 PM, Amit Jain am...@ieee.org wrote: Well that would work for a single node deployment when TarMK is used but would still have a lag as based on frequency of AsyncIndexer which we are seeing is having delays of upto 10-20 sec and may vary. For cluster where indexing is happening on a single node it cannot be used. But wouldn't that be a problem for the in-memory index also? Nope. The in-memory index would be managed on each cluster node and has visibility into *local* changes happening on that cluster node. So it would certainly not provide a cluster-wide real time search. But it would help in those cases where a user is performing changes via a session established with a single cluster node and is not able to see the effect of his latest changes. We have seen some problems reported so far, so we can wait to see if the problem affects more usecases and then decide to invest in such a feature! Chetan Mehrotra
Re: [discuss] Near real time search to account for latency in background indexing
Hi Ian, To be clear, the in-memory index is purely ephemeral and is not meant to be persisted. It just complements the persistent index to allow access to recently added/modified entries. So now to your queries How will you deal with JVM failure ? Do nothing. The index as explained is transient. The current AsyncIndex would anyway be performing the usual indexing and is resilient enough How frequently will commits to the persisted index be performed ? This index lives separately. The persisted index managed by AsyncIndex works as is I assume that switching to use ElasticSearch, which delivers NRT reliably in the 0.1s range has been rejected as an option ? No. The problem here is a bit different. Lucene indexes are currently being used for all sorts of indexing in Oak. In many cases Lucene is used as a pure property index. ES makes sense mostly for a global fulltext index and would be overkill for the smaller, more focused property-index types of usecases. If it has, you may find yourself implementing much of the core of ElasticSearch to make NRT work properly in a cluster. Again, the usecase here is not to support NRT as is. Current indexing would work as is and this transient index would complement it. Chetan Mehrotra On Fri, Jul 24, 2015 at 1:01 PM, Ian Boston i...@tfd.co.uk wrote: Hi Chetan, The overall approach looks ok. Some questions about indexing. How will you deal with JVM failure ? and related. How frequently will commits to the persisted index be performed ? I assume that switching to use ElasticSearch, which delivers NRT reliably in the 0.1s range has been rejected as an option ? If it has, you may find yourself implementing much of the core of ElasticSearch to make NRT work properly in a cluster. Best Regards Ian On 24 July 2015 at 08:09, Chetan Mehrotra chetan.mehro...@gmail.com wrote: On Fri, Jul 24, 2015 at 12:15 PM, Michael Marth mma...@adobe.com wrote: From your description I am not sure how the indexing would be triggered for local changes. 
Probably not through the Async Indexer (this would not gain us much, right?). Would this be a Commit Hook? My thought was to use an Observer so as not to add cost to the commit call. The Observer would listen only for local changes and would invoke IndexUpdate on the diff Chetan Mehrotra
Re: [discuss] Near real time search to account for latency in background indexing
On Fri, Jul 24, 2015 at 2:19 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: I think it'd be possible though to make use of Lucene's NRT capability by changing a bit the code that creates an IndexReader [2] to use DirectoryReader#open(IndexWriter,boolean) [3]. Well, that would work for a single-node deployment when TarMK is used, but it would still have a lag based on the frequency of the AsyncIndexer, which we are seeing have delays of up to 10-20 sec, and that may vary. For a cluster where indexing is happening on a single node it cannot be used. Chetan Mehrotra
Re: [discuss] Near real time search to account for latency in background indexing
On Fri, Jul 24, 2015 at 12:56 PM, Michael Marth mma...@adobe.com wrote: Would the indexer need to be Lucene-based? Need not be. The reason I preferred using Lucene is that the current property index only supports single-condition evaluation. Having a Lucene based impl would allow most of the current JCR Query to Lucene Query mapping logic to be reused as is. But yes, it would be an interesting approach where, instead of doing this at the Lucene query index level, the QE could work with both indexes and combine the results. So, an aspect to keep in mind Chetan Mehrotra
Re: Cleanup Callback to IndexEditor in case of exception in processing
On Thu, Jul 23, 2015 at 4:19 PM, Davide Giannella dav...@apache.org wrote: So to have a callback that is always invoked either on success or failure. For now just a callback for failure is missing. The editors currently perform the required cleanup anyway when leaving the root node, which kind of acts like a success callback. If we can get both, then much better! Chetan Mehrotra
[discuss] Near real time search to account for latency in background indexing
Hi Team, As the use of async indexes like lucene is growing, we need to account for the delay in showing updated results due to the async nature of indexing. Depending on system load the async indexer might lag behind the latest state by some margin. We have improved quite a bit in terms of performance, but by design there would be a lag, and with load that lag would increase at times. E.g. a typical flow in content authoring involves the user uploading some asset to the application. After uploading the asset he goes to the authoring view and looks for the uploaded asset via a content-finder kind of ui. That ui relies on a query to show the available assets. Due to the delay introduced by the async indexer it would take some time (10-15 sec) for the asset to show up. To account for that we can go for near real time (NRT*) in-memory indexing which would complement the actual persisted async index and would exploit the fact that requests from the same user in a given session would most likely hit the same cluster node. Below is a brief proposal - this would require changes in the layers above Oak, but for now the focus is on feasibility. Proposal === A - Indexing Side -- The Lucene index can be configured to support an NRT mode. If this mode is enabled, then on each cluster node we would perform async indexing only for local changes. For such an indexer the LuceneIndexEditor would use a RAMDirectory. This directory would only have *recently* modified/added documents. B - Query Side --- On the query side the LucenePropertyIndex would perform the search against two IndexSearchers: 1. An IndexSearcher based on the persisted OakDirectory 2. An IndexSearcher obtained from the current active IndexWriter used with the RAMDirectory The query would be performed against both and a merged cursor [2] would be returned. C - Benefits This approach would allow the user to at least see his modifications appear quickly in search results and would make the accuracy of search results more deterministic. This feature need not be enabled globally but can be enabled on a per-index basis, based on business requirements. D - Challenges --- 1. Ensuring that the RAMDirectory is bounded and only contains recently modified documents. The lower limit can be based on the last indexed time from the AsyncIndexer. Periodically we would need to prune old documents from this RAMDirectory 2. IndexUpdate would need to be adapted to support this hybrid model for the same index type - so something to be looked into Thoughts? Chetan Mehrotra * NRT - Near Real Time is technically a Lucene term https://wiki.apache.org/lucene-java/NearRealtimeSearch. However it is used here as the approach is a bit similar! [2] Such a merged cursor, performing the query against multiple searchers, would anyway be required to support zero-downtime kind of requirements where index content would be split across local and global instances
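To make the query-side idea from (B) concrete, here is a minimal sketch of the "merged cursor" concept. The class and method names are illustrative assumptions, not actual Oak APIs: it only shows how results from the transient in-memory index and the persisted index could be combined while de-duplicating paths.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the "merged cursor" from (B): paths coming from the
// in-memory (recent) index are merged with paths from the persisted index,
// de-duplicated so a recently modified node is reported only once.
public class MergedCursor {

    static List<String> merge(List<String> recentPaths, List<String> persistedPaths) {
        // LinkedHashSet keeps insertion order and drops duplicates, so the
        // recent (in-memory) results take precedence over persisted ones
        Set<String> merged = new LinkedHashSet<>(recentPaths);
        merged.addAll(persistedPaths);
        return new ArrayList<>(merged);
    }
}
```

In the real implementation the merge would of course operate on Lucene documents/score docs rather than plain path strings, but the de-duplication requirement is the same.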
Re: svn commit: r1692367 - /jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreService.java
On Thu, Jul 23, 2015 at 3:08 PM, thom...@apache.org wrote: OAK-260 Avoid the Turkish Locale Problem So the fix version is ... ;) Chetan Mehrotra
Re: svn commit: r1692177 - in /jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins: blob/BlobStoreBlob.java document/DocumentNodeStore.java
On Wed, Jul 22, 2015 at 4:33 PM, Manfred Baedke manfred.bae...@gmail.com wrote: which is just as broken IMHO and failed to write blobs in Oak2Oak migration scenarios. That's why I would like to have a testcase for this. So far the current design has assumed that the BlobStore is a singleton. Supporting blobs from multiple BlobStores would impact other places too. I would also consider two blobs to be equal iff they contain the same binary content, which is also not the contract we use for BlobStoreBlob. Have a look at AbstractBlob#equal for what is considered the blob equality contract, which takes care of that. So probably we should use that here. If we plan to support such a case, then we should clean up this part first and then fix it. wdyt? Chetan Mehrotra
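The content-based equality contract mentioned above (AbstractBlob#equal compares the binary streams, so two blobs are equal iff they yield the same bytes) can be sketched roughly as below. This is a simplified illustration, not the Oak code — the real implementation buffers instead of reading byte by byte:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Rough sketch of a content-based blob equality check in the spirit of
// AbstractBlob#equal: walk both streams and compare byte by byte.
public class BlobContentEquality {

    static boolean sameContent(InputStream a, InputStream b) {
        try {
            int x, y;
            do {
                x = a.read();
                y = b.read();
                if (x != y) {
                    return false; // streams diverge, or one ended first
                }
            } while (x != -1);    // both hit end of stream: equal content
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```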
Re: svn commit: r1692177 - in /jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins: blob/BlobStoreBlob.java document/DocumentNodeStore.java
On Wed, Jul 22, 2015 at 8:01 PM, Manfred Baedke manfred.bae...@gmail.com wrote: verifying that the BlobStoreBlob in question comes from this very instance. It shouldn't use equals(), though. Makes sense. Let's discuss this via a patch on the bug itself then! Chetan Mehrotra
Re: svn commit: r1692177 - in /jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins: blob/BlobStoreBlob.java document/DocumentNodeStore.java
Hi Manfred, On Tue, Jul 21, 2015 at 10:58 PM, bae...@apache.org wrote: +if (bsbBlobStore != null && bsbBlobStore.equals(blobStore)) { +return bsb.getBlobId(); +} Can we have a testcase for this scenario? So far we do not have a requirement to support equality for BlobStore instances. So I would like to understand the usecase, preferably with a testcase. Maybe the problem affects SegmentNodeStore too (not sure though) Chetan Mehrotra
Re: svn commit: r1692177 - in /jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins: blob/BlobStoreBlob.java document/DocumentNodeStore.java
Also note that it's possible that later we may wrap the BlobStore instance, for example adding a wrapper for monitoring, and in such a case this equality condition might fail. A more stable fix would be to check with the registered BlobStore whether it knows the given blobId or not. Chetan Mehrotra On Wed, Jul 22, 2015 at 9:09 AM, Chetan Mehrotra chetan.mehro...@gmail.com wrote: Hi Manfred, On Tue, Jul 21, 2015 at 10:58 PM, bae...@apache.org wrote: +if (bsbBlobStore != null && bsbBlobStore.equals(blobStore)) { +return bsb.getBlobId(); +} Can we have a testcase for this scenario? So far we do not have a requirement to support equality for BlobStore instances. So I would like to understand the usecase, preferably with a testcase. Maybe the problem affects SegmentNodeStore too (not sure though) Chetan Mehrotra
Re: New committer: Stefan Egli
Welcome Stefan! Chetan Mehrotra On Mon, Jul 20, 2015 at 1:56 PM, Michael Dürig mdue...@apache.org wrote: Hi, Please welcome Stefan as a new committer and PMC member of the Apache Jackrabbit project. The Jackrabbit PMC recently decided to offer Stefan committership based on his contributions. I'm happy to announce that he accepted the offer and that all the related administrative work has now been taken care of. Welcome to the team, Stefan! Michael
Utility method for show time duration in words
Hi, At times I feel a need for a utility method which can convert a time in millis to words for logs and JMX. There are two options I see 1. commons-lang DurationFormatUtils [1] - Adding a dependency on the whole of commons-lang might not make sense, so we could probably copy it. It depends on other commons-lang classes though, so copying would be tricky 2. Guava Stopwatch private method [2] - Guava's Stopwatch internally has such a method but it's not exposed. Probably we can copy that and expose it in oak-commons. Thoughts? Chetan Mehrotra [1] https://github.com/apache/commons-lang/blob/master/src/main/java/org/apache/commons/lang3/time/DurationFormatUtils.java [2] https://github.com/google/guava/blob/master/guava/src/com/google/common/base/Stopwatch.java#L216
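For illustration, a sketch of what such a helper in oak-commons could look like, loosely modelled on the kind of formatting Guava's Stopwatch does internally. The class name, method name, and unit boundaries are assumptions, not an existing Oak or Guava API:

```java
// Hypothetical millis-to-words helper for logs and JMX output.
public class DurationFormat {

    static String toWords(long millis) {
        if (millis < 1000) {
            return millis + " ms";
        }
        long secs = millis / 1000;
        if (secs < 60) {
            return secs + " s";
        }
        long mins = secs / 60;
        if (mins < 60) {
            return mins + " min " + (secs % 60) + " s";
        }
        return (mins / 60) + " h " + (mins % 60) + " min";
    }
}
```

Usage would be e.g. `log.info("Indexing took {}", DurationFormat.toWords(elapsedMillis))`.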
Re: svn commit: r1690941 - in /jackrabbit/oak/trunk/oak-core/src: main/java/org/apache/jackrabbit/oak/plugins/document/rdb/ test/java/org/apache/jackrabbit/oak/plugins/document/
Hi Julian, On Tue, Jul 14, 2015 at 7:57 PM, resc...@apache.org wrote: + +List<String> ids = new ArrayList<String>(); +for (T doc : documents) { +ids.add(doc.getId()); +} +LOG.debug("insert of " + ids + " failed", ex); + +// collect additional exceptions +String messages = LOG.isDebugEnabled() ? RDBJDBCTools.getAdditionalMessages(ex) : ""; +if (!messages.isEmpty()) { +LOG.debug("additional diagnostics: " + messages); +} If all that work is to be done for debug logging then probably the whole block should be within an isDebugEnabled check + LOG.debug("insert of " + ids + " failed", ex); Instead of using concatenation it would be better to use placeholders like LOG.debug("insert of {} failed", ids, ex); Chetan Mehrotra
Re: RDBDocumentStore using same table schema for all collections
On Mon, Jul 13, 2015 at 3:53 PM, Julian Reschke julian.resc...@gmx.de wrote: Simplicity and the complete lack of contract. How would the DS implementation *know* what needs to be indexed? Then we should define the contract. What needs to be indexed is important information and should be made available to the DocumentStore explicitly might be large is guesswork, no? The additional columns are all numbers/flags. For now, yes, it is guesswork. But logically this duplication should not happen! In addition, most of such indexes are defined as sparse in Mongo. For RDB I think there would be DB-specific approaches for creating sparse indexes. Currently RDBDocumentStore stores some default value if no value is specified. It might be better to store null there [1] Each *document*? Did you mean collection? Yes, I meant collection there. Chetan Mehrotra [1] http://stackoverflow.com/questions/8764910/sparse-column-vs-indirection
Re: svn commit: r1690861 - /jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/query/ast/SelectorImpl.java
If this looks like a bug (which it appears to be) then it would be better if we created an issue for it and had the fix merged to the branches as well Chetan Mehrotra On Tue, Jul 14, 2015 at 9:28 AM, dbros...@apache.org wrote: Author: dbrosius Date: Tue Jul 14 03:58:28 2015 New Revision: 1690861 URL: http://svn.apache.org/r1690861 Log: fix typo in equals which did not validate that parm was of right type Modified: jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/query/ast/SelectorImpl.java Modified: jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/query/ast/SelectorImpl.java URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/query/ast/SelectorImpl.java?rev=1690861&r1=1690860&r2=1690861&view=diff == --- jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/query/ast/SelectorImpl.java (original) +++ jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/query/ast/SelectorImpl.java Tue Jul 14 03:58:28 2015 @@ -722,7 +722,7 @@ public class SelectorImpl extends Source public boolean equals(Object other) { if (this == other) { return true; -} else if (!(this instanceof SelectorImpl)) { +} else if (!(other instanceof SelectorImpl)) { return false; } return selectorName.equals(((SelectorImpl) other).selectorName);
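A minimal illustration of the typo the commit above fixes, using a hypothetical Point class: inside equals(), `this instanceof Point` is always true, so a wrong-typed argument would fall through to the cast; the type guard must test the *argument*, as the corrected SelectorImpl does.

```java
// Hypothetical class showing the correct equals() type guard.
public class Point {
    private final int x;

    Point(int x) { this.x = x; }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        } else if (!(other instanceof Point)) { // guard "other", not "this"
            return false;
        }
        return x == ((Point) other).x;
    }

    @Override
    public int hashCode() { return x; }
}
```

With the buggy `this instanceof` version, `new Point(1).equals("1")` would throw a ClassCastException instead of returning false.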
Re: [VOTE] Release Apache Jackrabbit Oak 1.0.17
+1. All checks passed Chetan Mehrotra On Fri, Jul 10, 2015 at 1:28 PM, Davide Giannella dav...@apache.org wrote: [X] +1 Release this package as Apache Jackrabbit Oak 1.0.17 Davide
Re: /oak:index (DocumentNodeStore)
On Thu, Jul 9, 2015 at 12:45 PM, Marcel Reutegger mreut...@adobe.com wrote: - Data in Oak is multi-versioned. It must be possible to query nodes at a specific revision of the tree. To add - That also makes it difficult to use Mongo indexes as the index itself is versioned. So instead of just indexing property 'foo' you need to index it for every revision Chetan Mehrotra
RDBDocumentStore using same table schema for all collections
Hi, Looking at RDBDocumentStore, it appears that it is using the same table schema for all collections. E.g. columns like deletedOnce and hasBinary are only required for NodeDocument; however, they are present in all the tables. Any specific reason for doing this and not going for a schema per collection? This is fine for small collections like settings and clusterNodes, but for bigger collections like journal the overhead of such empty columns might be large. It would be better if each Document provided a set of column names along with the types to be indexed, and then RDBDocumentStore could create the correct schema. Chetan Mehrotra
Managing backport work for issues fixed in trunk
Hi Team, Often we consider that some issue needs to be merged to one of the branches, but it is not immediately required. E.g. a practice we have recently started is to implement a new feature in trunk and have it enabled by default there. Once we find it to be stable we enable it by default in the branches. For such work I typically create 2 issues, A (e.g. OAK-3069) for the actual work and B (e.g. OAK-3073) for tracking making it enabled by default. Now #B has to be marked resolved in trunk, but I have to keep a mental note that this needs to be done in the branch too sometime later. This approach is error prone. Instead, if we use labels to mark issues which are suitable _candidates_ for a merge to the branches, then we can track such issues via a JIRA query and revisit them when we cut new releases on a branch. I propose we use the following labels candidate_oak_1_0 candidate_oak_1_2 for issues to be merged to the 1.0 and 1.2 branches. Then later we can query for such issues. Thoughts? Chetan Mehrotra
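To make the tracking concrete, a JQL query along these lines (hypothetical — it assumes the proposed label has been applied and that resolved-in-trunk issues are the ones of interest) would list the pending 1.0 backport candidates:

```
project = OAK AND labels = candidate_oak_1_0 AND status = Resolved
```

Such a query can be saved as a JIRA filter and revisited whenever a release is cut on the branch.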
Re: Baseline warnings
On Wed, Jul 1, 2015 at 12:33 PM, Marcel Reutegger mreut...@adobe.com wrote: I think we should explicitly manage the version of *all* exported packages and add the required package-info.java files now. Yup, that should be done. I thought we had already done this for all such exported packages. For which ones are we seeing the error ... maybe they were introduced later Chetan Mehrotra
Re: [Oak origin/trunk] Apache Jackrabbit Oak matrix - Build # 236 - Still Failing
Most failures in RemoteServerIT due to address already in use. Opened OAK-3065 for that java.net.BindException: Address already in use at sun.nio.ch.Net.bind0(Native Method) at sun.nio.ch.Net.bind(Net.java:444) at sun.nio.ch.Net.bind(Net.java:436) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) at org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187) at org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316) at org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265) at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64) at org.eclipse.jetty.server.Server.doStart(Server.java:291) at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64) at org.apache.jackrabbit.oak.remote.http.handler.RemoteServer.start(RemoteServer.java:54) at org.apache.jackrabbit.oak.remote.http.handler.RemoteServerIT.setUp(RemoteServerIT.java:134) Chetan Mehrotra On Thu, Jul 2, 2015 at 9:51 AM, Apache Jenkins Server jenk...@builds.apache.org wrote: The Apache Jenkins build system has built Apache Jackrabbit Oak matrix (build #236) Status: Still Failing Check console output at https://builds.apache.org/job/Apache%20Jackrabbit%20Oak%20matrix/236/ to view the results.
Re: OSGi configuration lookup
On Tue, Jun 30, 2015 at 12:00 PM, Francesco Mari mari.france...@gmail.com wrote: I suggest to fix OAK-3022 maintaining exactly the same behaviour, and without changing the SegmentNodeStoreService Makes sense. They are two different issues Chetan Mehrotra
Re: OSGi configuration lookup
Looking at the code flow now, yes it differs. The thought behind reading from the framework property first was to provide a simple way to override the config which might be packaged by default. For example, while launching Oak via Sling one can provide the framework property at the command line (using -Doak.mongo.uri) which would supersede the one packaged by default. This simplifies testing. Chetan Mehrotra On Mon, Jun 29, 2015 at 7:01 PM, Davide Giannella dav...@apache.org wrote: On 29/06/2015 10:22, Francesco Mari wrote: ... Is it possible - or does it make sense - to make this behaviour uniform across components? I think it's a good idea to uniform this aspect. Maybe we could put it down as a guideline by setting up a new page on the doc site: code-conventions.md. Somewhere beside: http://jackrabbit.apache.org/oak/docs/dev_getting_started.html Personally I'd go for component first and bundle then, but I'm not too religious about it :) Anyone against it? Davide
Re: OSGi configuration lookup
That can be done but some more details! When properties are read from framework properties the property names are prefixed with 'oak.documentstore.', i.e. kind of namespaced, as framework properties are flat and global. So if we need to do that for Segment also then we should use a similar namespacing. For example, if the property name is 'cache' then when reading from the framework 'oak.documentstore.cache' would be used. oak.mongo.uri and oak.mongo.db are special cased though and do not follow this rule. Chetan Mehrotra On Tue, Jun 30, 2015 at 2:55 AM, Francesco Mari mari.france...@gmail.com wrote: So we should probably adopt this strategy instead. This means that SegmentNodeStoreService is the one that should be modified. 2015-06-29 17:15 GMT+02:00 Chetan Mehrotra chetan.mehro...@gmail.com: Looking at code flow now yes it differs. The thought behind reading from framework property first was to provide a simple way to override the config which might be packaged by default. For e.g. while launching Oak via Sling one can provide the framework property at command line (using -Doak.mongo.uri) which would supercede the one packaged by default. This simplifies the testing. Chetan Mehrotra On Mon, Jun 29, 2015 at 7:01 PM, Davide Giannella dav...@apache.org wrote: On 29/06/2015 10:22, Francesco Mari wrote: ... Is it possible - or does it make sense - to make this behaviour uniform across components? I think it's a good idea to uniform this aspect. Maybe we could put it down as a guideline by setting up a new page on the doc site: code-conventions.md. Somewhere beside: http://jackrabbit.apache.org/oak/docs/dev_getting_started.html Personally I'd go for component first and bundle then, but I'm not too religious about it :) Anyone against it? Davide
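The lookup convention described above (a namespaced framework property first, falling back to the component configuration) can be sketched as follows. This is a minimal illustration with hypothetical names and plain maps, not the actual Oak lookup code:

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigLookup {
    // Hypothetical helper: a component property "cache" is looked up as the
    // framework property "oak.documentstore.cache" first (framework
    // properties are flat and global, hence the prefix), falling back to
    // the packaged component configuration.
    static Object lookup(Map<String, Object> fwkProps,
                         Map<String, Object> componentConfig,
                         String prefix, String name) {
        Object fromFwk = fwkProps.get(prefix + name);
        return fromFwk != null ? fromFwk : componentConfig.get(name);
    }

    public static void main(String[] args) {
        Map<String, Object> fwk = new HashMap<>();
        Map<String, Object> cfg = new HashMap<>();
        cfg.put("cache", 256);
        // packaged default wins when no framework property is set
        System.out.println(lookup(fwk, cfg, "oak.documentstore.", "cache"));
        // a -D framework property supersedes the packaged default
        fwk.put("oak.documentstore.cache", 1024);
        System.out.println(lookup(fwk, cfg, "oak.documentstore.", "cache"));
    }
}
```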
Re: [Oak origin/trunk] Apache Jackrabbit Oak matrix - Build # 232 - Still Failing
Most of the tests in oak-remote/RemoteServerIT fail with the following exception. Opened OAK-3047 to track this -- java.io.FileNotFoundException: /home/jenkins/jenkins-slave/workspace/Apache%20Jackrabbit%20Oak%20matrix/jdk/latest1.7/label/Ubuntu/nsfixtures/SEGMENT_MK/profile/unittesting/oak-remote/target/test-classes/org/apache/jackrabbit/oak/remote/http/handler/addNodeMultiPathProperty.json (No such file or directory) at java.io.FileInputStream.open(Native Method) at java.io.FileInputStream.init(FileInputStream.java:146) at com.google.common.io.Files$FileByteSource.openStream(Files.java:127) at com.google.common.io.Files$FileByteSource.openStream(Files.java:117) at com.google.common.io.ByteSource$AsCharSource.openStream(ByteSource.java:404) at com.google.common.io.CharSource.read(CharSource.java:155) at com.google.common.io.Files.toString(Files.java:391) at org.apache.jackrabbit.oak.remote.http.handler.RemoteServerIT.load(RemoteServerIT.java:119) at org.apache.jackrabbit.oak.remote.http.handler.RemoteServerIT.testPatchLastRevisionAddMultiPathProperty(RemoteServerIT.java:1199) - Chetan Mehrotra On Tue, Jun 30, 2015 at 10:11 AM, Apache Jenkins Server jenk...@builds.apache.org wrote: The Apache Jenkins build system has built Apache Jackrabbit Oak matrix (build #232) Status: Still Failing Check console output at https://builds.apache.org/job/Apache%20Jackrabbit%20Oak%20matrix/232/ to view the results.
Re: [Observation] Question on local events reported as external
This is most likely due to the rate at which events get generated, causing the observation queue to fill up (1000 default size). If the backing JCR listener is slow in processing, the queue would get filled and BackgroundObserver would start compacting/merging the diffs, thus converting local events to external ones. There are two solutions 1. Throttle the commits - CommitRateLimiter 2. Have a non-limiting queue - then you risk an OOM if the gap in processing rates is large Chetan Mehrotra On Thu, Jun 25, 2015 at 8:47 PM, Marius Petria mpet...@adobe.com wrote: Hi, I understand that under high load local events can be reported as external. My question is why does this happen also for tarmk, where there is a single instance active (receiving external events on a single instance seems odd)? Also, is there a way to disable this functionality, meaning on tarmk to always receive the local events with their associated data (specifically the userData)? Regards, Marius
Re: [Observation] Question on local events reported as external
On Thu, Jun 25, 2015 at 10:49 PM, Marius Petria mpet...@adobe.com wrote: AFAIU because the local events are changed to external events it means that they can also be dropped completly under load, is that true? Well, they are not dropped. Observation in Oak works on the basis of a diff of NodeStates. So let's say you have an observation queue of size 3 with 3 local events 1. [ns1, ns2, ci1] 2. [ns2, ns3, ci2] 3. [ns3, ns4, ci3] Each tuple is [base root nodestate, nodestate post change, commit info]. Now if a 4th change event [ns4, ns5, ci4] comes, the BackgroundObserver has the following options 1. Block on put 2. Have an indefinite-length queue and just add to it 3. Pull out the last event and replace it with a merged one The current logic in Oak goes for #3: 1. [ns1, ns2, ci1] 2. [ns2, ns3, ci2] 3. [ns3, ns5, null] Once it has merged/collapsed the content change, you cannot associate any specific commit info with it. In such cases 1. You cannot determine which user made that change. 2. Some changes might not become visible. For example, if a foo property is added in E3 and removed in E4, then the merged content change would not provide any indication that any such change happened. So such merged changes are shown as external, i.e. JackrabbitEvent#isExternal returns true for them currently. The problem is that currently it's not possible to distinguish such collapsed events from truly external events. Maybe we should make that distinction so that components which just rely on *some local change* to react can continue to work, though there is no guarantee that they see *each* local change. Chetan Mehrotra
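The collapse behaviour described above (option #3) can be sketched with a plain bounded queue. This is a simplified stand-in with made-up types, not the real BackgroundObserver code:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class CollapsingQueueDemo {
    // Simplified stand-in for the tuples described above:
    // [base root nodestate, nodestate post change, commit info]
    static class Change {
        final String base, head, commitInfo;
        Change(String base, String head, String commitInfo) {
            this.base = base; this.head = head; this.commitInfo = commitInfo;
        }
        public String toString() {
            return "[" + base + ", " + head + ", " + commitInfo + "]";
        }
    }

    static final int LIMIT = 3;
    static final Deque<Change> queue = new ArrayDeque<Change>();

    // Option #3: when the queue is full, pull out the last event and
    // replace it with a merged one. The merged change spans from the old
    // tail's base to the new head, and the commit info is lost (null),
    // which is why it must be reported as external.
    static void offer(Change c) {
        if (queue.size() < LIMIT) {
            queue.addLast(c);
        } else {
            Change last = queue.removeLast();
            queue.addLast(new Change(last.base, c.head, null));
        }
    }

    public static void main(String[] args) {
        offer(new Change("ns1", "ns2", "ci1"));
        offer(new Change("ns2", "ns3", "ci2"));
        offer(new Change("ns3", "ns4", "ci3"));
        offer(new Change("ns4", "ns5", "ci4")); // queue full: collapse into tail
        System.out.println(queue.getLast());    // [ns3, ns5, null]
    }
}
```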
Re: Observation: External vs local - Load distribution
Just ensure that your Observer is fast, as it's invoked in the critical path. This would probably end up with a design similar to BackgroundObserver. Maybe a better option would be to allow BO to have a non-bounded queue. Chetan Mehrotra On Wed, Jun 17, 2015 at 2:05 PM, Carsten Ziegeler cziege...@apache.org wrote: Ok, just to recap. In Sling we can implement the Observer interface (and not use the BackgroundObserver base class). This will give us reliably user id for all local events. Does anyone see a problem with this approach? Carsten -- Carsten Ziegeler Adobe Research Switzerland cziege...@apache.org
Re: Observation: External vs local - Load distribution
On Mon, Jun 15, 2015 at 1:13 PM, Carsten Ziegeler cziege...@apache.org wrote: Now, with Oak there is still this distinction, however if I remember correctly under heavy load it might happen that local events are reported as external events. And in that case the above pattern fails. Regardless of how rare this situation might be, if it can happen it will eventually happen. This is an implementation detail of BackgroundObserver (BO) which is used by OakResourceListener in Sling. BO keeps a queue of changed NodeState tuples, and if it gets filled it is collapsed. If you want to avoid that at *any* cost then you can use a different impl which uses, say, a LinkedBlockingQueue and does not enforce any limit. That would be similar to how JcrResourceListener works, which uses an unbounded in-memory queue Chetan Mehrotra
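The unbounded alternative mentioned above can be sketched with a LinkedBlockingQueue that has no capacity bound. This is a minimal illustration with made-up types, not the JcrResourceListener implementation; it trades memory for never collapsing events:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class UnboundedObserverDemo {
    // Hypothetical stand-in for a change with its commit info attached.
    static class Change {
        final String commitInfo;
        Change(String ci) { commitInfo = ci; }
    }

    // No capacity argument: the queue grows without bound, so no merging
    // is ever needed and every change keeps its commit info. The cost is
    // a possible OOM if the listener falls far behind the commit rate.
    static final BlockingQueue<Change> queue = new LinkedBlockingQueue<Change>();

    static void contentChanged(Change c) {
        queue.add(c); // never blocks, never collapses
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 5; i++) {
            contentChanged(new Change("ci" + i));
        }
        System.out.println(queue.size()); // all 5 retained, none merged
    }
}
```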
Re: MongoDB collections in MongoDocumentStore
On Fri, Jun 12, 2015 at 5:20 PM, Ian Boston i...@tfd.co.uk wrote: Are all queries expected to query all keys within a collection as it is now, or is there some logical structure to the querying ? Not sure if I get your question. The queries are always for immediate children. E.g. for 1:/a the query is like $query: { _id: { $gt: "2:/a/", $lt: "2:/a0" } } Chetan Mehrotra
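The reason this range captures exactly the immediate children is plain string ordering: DocumentMK keys are "<depth>:<path>", and '0' is the character right after '/' in ASCII, so every key strictly between "2:/a/" and "2:/a0" is a depth-2 path under /a. A small self-contained sketch using the same example keys:

```java
public class KeyRangeDemo {
    // Keys between "2:/a/" (exclusive) and "2:/a0" (exclusive) are exactly
    // the immediate children of /a: the depth prefix "2:" restricts the
    // match to direct children, and '0' is the ASCII successor of '/'.
    static boolean inChildRange(String key) {
        return key.compareTo("2:/a/") > 0 && key.compareTo("2:/a0") < 0;
    }

    public static void main(String[] args) {
        System.out.println(inChildRange("2:/a/b"));   // immediate child of /a
        System.out.println(inChildRange("2:/b/c"));   // child of a different parent
        System.out.println(inChildRange("3:/a/b/c")); // grandchild (depth 3)
    }
}
```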
Re: [VOTE] Release Apache Jackrabbit Oak 1.0.15
On Fri, Jun 12, 2015 at 1:56 PM, Amit Jain am...@ieee.org wrote: [ ] +1 Release this package as Apache Jackrabbit Oak 1.0.15 All checks ok Chetan Mehrotra
Re: MongoDB collections in MongoDocumentStore
On Fri, Jun 12, 2015 at 3:31 PM, Ian Boston i...@tfd.co.uk wrote: I am thinking that the collection name is a fn(key). What problems would that cause elsewhere ? One potential problem is when querying for children. If 2:/a/b and 2:/a/c are mapped to different collections, then querying for the children of 1:/a would become tricky Chetan Mehrotra
Re: MongoDB collections in MongoDocumentStore
On Fri, Jun 12, 2015 at 7:32 PM, Ian Boston i...@tfd.co.uk wrote: Initially I was thinking about the locking behaviour but I realises 2.6.* is still locking at a database level, and that only changes to at a collection level 3.0 with MMAPv1 and row if you switch to WiredTiger [1]. I initially thought the same, and then we benchmarked the throughput by placing the BlobStore in a separate database (OAK-1153), but did not observe any significant gains. So that approach was not pursued further. If we have some benchmark which can demonstrate that write throughput increases if we _shard_ the node collection into separate databases on the same server then we can look further there Chetan Mehrotra
Re: LazyInputStream does not uses BufferedInputStream while creating stream from underlying File
On Thu, May 21, 2015 at 1:31 PM, Thomas Mueller muel...@adobe.com wrote: Yes, it would be better if we wrap it somewhere (not necessarily right there, but somewhere I think we can do that in DataStoreBlobStore. Right? Chetan Mehrotra
LazyInputStream does not uses BufferedInputStream while creating stream from underlying File
While having a look at how the InputStream is opened while accessing content from any binary file, it appears that LazyInputStream creates a FileInputStream [1] from the underlying file instance. Shouldn't it be a BufferedInputStream, or is it the responsibility of the caller to wrap it in the buffered variant? Chetan Mehrotra [1] https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-data/src/main/java/org/apache/jackrabbit/core/data/LazyFileInputStream.java#L102
Re: LazyInputStream does not uses BufferedInputStream while creating stream from underlying File
Opened OAK-2898 to track this. Testing that the stream passed at the Oak/JCR level is buffered would indeed be tricky! Chetan Mehrotra On Thu, May 21, 2015 at 2:16 PM, Thomas Mueller muel...@adobe.com wrote: Hi, Sure. We should probably have some kind of test case to ensure the stream is wrapped. Not sure how to be test it, if the BufferedInputStream is again wrapped in some other way: * Check if markSupported() returns true (it does for BufferedInputStream, but not for FileInputStream; and if BufferedInputStream is again wrapped, typically markSupported() is delegated, at least in FilterInputStream). * Using reflection? :-) * Measureing performance: not a reliable way to test it Regards, Thomas On 21/05/15 10:33, Chetan Mehrotra chetan.mehro...@gmail.com wrote: On Thu, May 21, 2015 at 1:31 PM, Thomas Mueller muel...@adobe.com wrote: Yes, it would be better if we wrap it somewhere (not necessarily right there, but somewhere I think we can do that in DataStoreBlobStore. Right? Chetan Mehrotra
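Both the proposed wrapping and the markSupported() based check Thomas suggests can be sketched in a few lines. The openBuffered helper below is hypothetical, not the actual DataStoreBlobStore code:

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BufferedWrapDemo {
    // Hypothetical helper illustrating the proposed fix: wrap the raw
    // FileInputStream in a BufferedInputStream before handing it out.
    static InputStream openBuffered(File file) throws IOException {
        return new BufferedInputStream(new FileInputStream(file));
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("lazy", ".bin");
        try (OutputStream out = new FileOutputStream(f)) {
            out.write(new byte[] {1, 2, 3});
        }
        try (InputStream in = openBuffered(f)) {
            // BufferedInputStream supports mark/reset while FileInputStream
            // does not -- the check suggested for a test case above.
            System.out.println(in.markSupported());
        }
        f.delete();
    }
}
```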
Re: Build failed in Jenkins: Apache Jackrabbit Oak matrix » latest1.7,Ubuntu,DOCUMENT_NS,unittesting #139
Failure in getSize OAK-2689 On Fri, May 22, 2015 at 9:36 AM, Apache Jenkins Server jenk...@builds.apache.org wrote: Tests run: 218, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 352.443 sec FAILURE! testGetSize(org.apache.jackrabbit.core.query.QueryResultTest) Time elapsed: 4.2 sec FAILURE! junit.framework.AssertionFailedError: Wrong size of NodeIterator in result expected:51 but was:-1 at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) Chetan Mehrotra
Re: Build failed in Jenkins: Apache Jackrabbit Oak matrix » jdk-1.6u45,Ubuntu,DOCUMENT_NS,unittesting #139
Failure due to a class not getting loaded. Looks like the updated Apache DS is compiled for and meant to be used with JDK 1.7 only. In that case we would either need to disable these tests for the 1.6 matrix or look for another option Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0.055 sec FAILURE! org.apache.jackrabbit.oak.security.authentication.ldap.LdapProviderTest Time elapsed: 0.054 sec ERROR! java.lang.UnsupportedClassVersionError: org/apache/directory/server/core/api/DirectoryService : Unsupported major.minor version 51.0 at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631) at java.lang.ClassLoader.defineClass(ClassLoader.java:615) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141) at java.net.URLClassLoader.defineClass(URLClassLoader.java:283) at java.net.URLClassLoader.access$000(URLClassLoader.java:58) at java.net.URLClassLoader$1.run(URLClassLoader.java:197) at java.security.AccessController.doPrivileged(Native Method) Chetan Mehrotra On Fri, May 22, 2015 at 9:48 AM, Apache Jenkins Server jenk...@builds.apache.org wrote: See https://builds.apache.org/job/Apache%20Jackrabbit%20Oak%20matrix/jdk=jdk-1.6u45,label=Ubuntu,nsfixtures=DOCUMENT_NS,profile=unittesting/139/changes Changes: [chetanm] OAK-2247 - CopyOnWriteDirectory implementation for Lucene for use in indexing [thomasm] OAK-2889 Ignore order by jcr:score desc in the query engine (for union queries) [mreutegg] OAK-2899: Update to Jackrabbit 2.10.1 [chetanm] OAK-2895 - Avoid accessing binary content if the mimeType is excluded from indexing Update the docs [chetanm] OAK-2895 - Avoid accessing binary content if the mimeType is excluded from indexing -- Use TypeDetector instead of DefaultDetector to avoid Tika sniffing the mimeType by reading the input stream -- Use a LazyInputStream to lazily load the stream if and when required [chetanm] OAK-2898 - DataStoreBlobStore should expose a buffer input stream for 
getInputStream call [alexparvulescu] OAK-2872 ExternalLoginModule should clear state when login was not successful - added another missing cleanup [tripod] Use correct copyright notice -- [...truncated 1947 lines...] at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189) at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165) at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75) org.apache.jackrabbit.oak.security.authentication.ldap.LdapDefaultLoginModuleTest Time elapsed: 0.006 sec ERROR! java.lang.NoClassDefFoundError: Could not initialize class org.apache.jackrabbit.oak.security.authentication.ldap.LdapLoginTestBase at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:36) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189) at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165) at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115
Re: Faster indexing of binary files - For migration and incremental async indexing
Opened OAK-2892 to track this Chetan Mehrotra On Wed, May 20, 2015 at 2:45 PM, Chetan Mehrotra chetan.mehro...@gmail.com wrote: On Wed, May 20, 2015 at 2:34 PM, Ian Boston i...@tfd.co.uk wrote: And does that apply to all BlobStore implementations including those that use Mongo as the BlobStore It should apply to Mongo also, which guarantees strong consistency. Yes, latency is different from consistency, but any writes done to the Mongo primary would be visible to later reads from other clients. The latency that occurs while reading from the repository is different - that happens due to the way DocumentNodeStore performs background reads. So any change from another cluster node would only be picked up when the background read happens. Chetan Mehrotra
Making release notes more meaningfull and usefull to end users
Hi Team, Currently the release notes provided with every release [1] are just a dump from JIRA. However, looking at them it's not obvious whether some new feature or new setting is exposed which is useful to end users. The titles of JIRA issues are often very brief (given they are titles!) and hence cannot provide the required information completely. We can see how other projects provide their release notes 1. Lucene [2], Solr [4] - They show a brief explanation of each new feature instead of just showing the bug title. Moreover they make the links clickable, hence a user can easily navigate 2. Guava [3] - It provides a brief summary of important changes which a user can quickly see and comprehend; if required they can go to the complete listing As these are used as libraries, such a level of detail makes sense. Given that Oak is mostly used as an application, we can at least focus on providing details around any new config setting or new feature introduced. 1. Document explicitly if any new config option is introduced 2. If any new feature is introduced then a brief description of that This should be done before the actual release is performed. Do share your thoughts on how the release notes can be improved further! Chetan Mehrotra [1] https://svn.apache.org/repos/asf/jackrabbit/oak/tags/jackrabbit-oak-1.0.13/RELEASE-NOTES.txt [2] https://lucene.apache.org/core/4_1_0/changes/Changes.html#4.1.0.new_features [3] https://code.google.com/p/guava-libraries/wiki/Release18 [4] http://lucene.apache.org/solr/4_5_0/changes/Changes.html#v4.5.0.new_features
[docs] Add inner links directly to side bar in Oak Docs
Hi Team, Currently the links shown on the side and top bars only list the top-level links [1] and do not show the inner links. For example, the link to Persistent Cache [2] is mentioned somewhere in the DocumentNodeStore page [3], which is itself not directly listed. So looking at the documentation it's not obvious where the Persistent Cache doc is referred to. Unless we restructure the site like, say, Apache Drill [4] (which shows nested links in the side bar), I think we should also refer to all such inner links directly. Thoughts? Chetan Mehrotra [1] http://jackrabbit.apache.org/oak/docs/ [2] http://jackrabbit.apache.org/oak/docs/nodestore/persistent-cache.html [3] http://jackrabbit.apache.org/oak/docs/nodestore/documentmk.html [4] http://drill.apache.org/docs/
Re: svn commit: r1679959 - in /jackrabbit/oak/trunk: oak-commons/ oak-commons/src/main/java/org/apache/jackrabbit/oak/commons/benchmark/ oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/segmen
Hi Michael, On Mon, May 18, 2015 at 2:02 PM, mdue...@apache.org wrote: </dependency> +<dependency> + <groupId>org.apache.commons</groupId> + <artifactId>commons-math3</artifactId> +</dependency> Is adding a hard dependency on commons-math3 really required? Probably MicroBenchmark can be made part of commons/test and this dependency moved to test scope Chetan Mehrotra
Keeping Oak documentation upto date - Use label 'docs-impacting'
Hi Team, At times we introduce new config settings as part of some bug/feature implementation. The required details often remain only in the bug notes and are not made part of the documentation. I suggest that we mark any such issue with the label 'docs-impacting' and, at release time, make use of that to update the release notes and also ensure that the documentation gets updated Chetan Mehrotra
Re: svn commit: r1676235 - in /jackrabbit/oak/trunk/oak-run/src/main/java/org/apache/jackrabbit/oak: ContinuousRevisionGCTest.java benchmark/BenchmarkRunner.java benchmark/RevisionGCTest.java
On Mon, Apr 27, 2015 at 3:24 PM, mreut...@apache.org wrote: +protected static NodeStore getNodeStore(Oak oak) throws Exception { Field f = Oak.class.getDeclaredField("store"); f.setAccessible(true); return (NodeStore) f.get(oak); } I have also often struggled to get hold of the underlying NodeStore from a given Oak instance. Maybe we should expose it as part of the API itself. After all, each Oak instance would always be backed by a NodeStore Chetan Mehrotra
Re: Active deletion of 'deleted' Lucene index files from DataStore without relying on full scale Blob GC
To avoid missing this issue I opened OAK-2808. Data collected from recent runs suggests that this aspect will need to be looked into going forward Chetan Mehrotra On Tue, Mar 10, 2015 at 9:49 PM, Thomas Mueller muel...@adobe.com wrote: Hi, I think removing binaries directly without going though the GC logic is dangerous, because we can't be sure if there are other references. There is one exception, it is if each file is guaranteed to be unique. For that, we could for example append a unique UUID to each file. The Lucene file system implementation would need to be changed for that (write the UUID, but ignore it when reading and reading the file size). Even in that case, there is still a risk, for example if the binary _reference_ is copied, or if an old revision is accessed. How do we ensure this does not happen? Regards, Thomas On 10/03/15 07:46, Chetan Mehrotra chetan.mehro...@gmail.com wrote: Hi Team, With storing of Lucene index files within DataStore our usage pattern of DataStore has changed between JR2 and Oak. With JR2 the writes were mostly application based i.e. if application stores a pdf/image file then that would be stored in DataStore. JR2 by default would not write stuff to DataStore. Further in deployment where large number of binary content is present then systems tend to share the DataStore to avoid duplication of storage. In such cases running Blob GC is a non trivial task as it involves a manual step and coordination across multiple deployments. Due to this systems tend to delay frequency of GC Now with Oak apart from application the Oak system itself *actively* uses the DataStore to store the index files for Lucene and there the churn might be much higher i.e. frequency of creation and deletion of index file is lot higher. This would accelerate the rate of garbage generation and thus put lot more pressure on the DataStore storage requirements. Any thoughts on how to avoid/reduce the requirement to increase the frequency of Blob GC? 
One possible way would be to provide a special cleanup tool which can look for such old Lucene index files and deletes them directly without going through the full fledged MarkAndSweep logic Thoughts? Chetan Mehrotra
Re: Quickest way of running oak to validate DocumentNodeStore mbeans
On Thu, Apr 23, 2015 at 3:51 PM, Robert Munteanu romb...@apache.org wrote: Some of the specific MBeans do not appear and only seem to be registered by the DocumentNodeStoreService ( see [2] ). That was what lead me to believe that JMX MBean registration is tied to OSGi. Is this available for registration in another way in oak-run? Aah yup, those would get missed out in non-OSGi runs. To see them in a non-OSGi env you would have to make use of the oak-pojosr module. Launch the repo using that and then keep it running. Might require some more tweaks Chetan Mehrotra
Re: Quickest way of running oak to validate DocumentNodeStore mbeans
I assume that this happens because there is no OSGi environment available That's not the case. An MBean would be registered only if an MBeanServer is provided while constructing the Oak instance (in a non-OSGi env). So in oak-run, where the Oak instance is created, you also need to set the MBeanServer. Something like Oak oak = ...; oak.with(ManagementFactory.getPlatformMBeanServer()); This would lead to registration of the MBeans Chetan Mehrotra On Wed, Apr 22, 2015 at 6:37 PM, Robert Munteanu romb...@apache.org wrote: Hi, I've built Oak from trunk and want to access the DocumentNodeStoreMBean. I see that the mbeans are not registered when using oak-run ( I assume that this happens because there is no OSGi environment available ). I can always install a custom version of Oak in Sling, but I was wondering whether there's a faster way of running a locally-built Oak in an OSGi environment. Thanks, Robert
Re: svn commit: r1674107 - /jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/nodetype/TypeEditorProvider.java
On Thu, Apr 16, 2015 at 9:49 PM, resc...@apache.org wrote: +LOG.info("Node type changes: " + modifiedTypes + "; repository scan took " + (System.currentTimeMillis() - start) + " ms" + (exception == null ? "" : "; failed with " + exception.getMessage())); It would be better to make use of PerfLogger here Chetan Mehrotra
Re: TokenLoginModule Spring
On Tue, Apr 14, 2015 at 10:25 PM, Angela Schreiber anch...@adobe.com wrote: Since I initialize the JCR with an instance of the Oak, it would be nice to reach in and get the underlaying oak repo I am seeing a similar requirement at OAK-2760, where the HttpServer has to access both the ContentRepository and the JCR Repository. Should we modify the Jcr class to 1. Not allow more than 1 invocation of createRepository 2. Cache the repo created in createRepository 3. Expose a getter for the ContentRepository instance created in createRepository OR 1. Have a special interface for providing the ContentRepository via RepositoryImpl. Something like RepositoryImpl implements OakRepository, and have it provide an accessor for the backing content repo instance Chetan Mehrotra
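Options 1-3 for the Jcr class could be sketched like this. The class below is a hypothetical stand-in with dummy types, not the real Jcr/Oak API:

```java
public class JcrSketch {
    // Dummy stand-ins for the real Oak types, just to keep the sketch
    // self-contained.
    static class ContentRepository {}
    static class Repository {}

    private ContentRepository contentRepository;
    private Repository repository;

    // (1) at most one invocation, (2) cache what was created
    public synchronized Repository createRepository() {
        if (repository != null) {
            throw new IllegalStateException("createRepository() may only be called once");
        }
        contentRepository = new ContentRepository();
        repository = new Repository();
        return repository;
    }

    // (3) getter for the ContentRepository created in createRepository
    public synchronized ContentRepository getContentRepository() {
        if (contentRepository == null) {
            throw new IllegalStateException("createRepository() not invoked yet");
        }
        return contentRepository;
    }

    public static void main(String[] args) {
        JcrSketch jcr = new JcrSketch();
        jcr.createRepository();
        System.out.println(jcr.getContentRepository() != null);
        try {
            jcr.createRepository();
        } catch (IllegalStateException e) {
            System.out.println("second call rejected");
        }
    }
}
```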
[jira] [Commented] (JCR-3865) Use markdown to generate Jackrabbit Site
[ https://issues.apache.org/jira/browse/JCR-3865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14486716#comment-14486716 ] Chetan Mehrotra commented on JCR-3865: -- bq. note that the site only updates the 'jackrabbit/site/live/jcr' directory. handling the entire site is very slow, since scm-publish checks out the entire tree. In Oak I avoid doing the complete checkin by first running the build with {{-Dscmpublish.skipCheckin=true}} and then just checking in the changed html files. But yes, it still has the drawback of checking out the whole site, which might be slow Use markdown to generate Jackrabbit Site Key: JCR-3865 URL: https://issues.apache.org/jira/browse/JCR-3865 Project: Jackrabbit Content Repository Issue Type: Improvement Components: docs Reporter: Tobias Bocanegra Assignee: Tobias Bocanegra Priority: Minor The current jackrabbit site is nor well maintained, mainly because we need to edit directly the HTML. most of the content is already available as markdown. goal is to automate the site generation via maven and svn pubsub. 1. phase is to reuse the same template/skin as in oak and to get the content right. 2. phase is to beautify it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: jackrabbit-oak build #5374: Broken
Added LogDumper rule with OAK-2721. Now you can make use of that in those tests which fail intermittently to get better details around the failure Chetan Mehrotra On Thu, Apr 2, 2015 at 3:21 PM, Chetan Mehrotra chetan.mehro...@gmail.com wrote: I implemented something on that line to get logs from remote server [1] for Sling. Had plans to get that work for local jvm process also but never got to complete it. Would try to get that done and see if that can be leverage in Oak test. This feature would dump the logs as part of JUnit report rendered in Jenkins. So along with failure stacktrace you can see what all logs were captured in that run Chetan Mehrotra [1] https://plus.google.com/+ChetanMehrotra/posts/Ao1w9SACKSh On Thu, Apr 2, 2015 at 2:43 PM, Davide Giannella dav...@apache.org wrote: On 01/04/2015 07:29, Marcel Reutegger wrote: The test failure was: Failed tests: testProxyFlippedIntermediateByteChange( org.apache.jackrabbit.oak.plugins.se gment.standby.ExternalSharedStoreIT): expected:{ root = { ... } } but was:{ root : { } } It's a bit difficult for me to see what went wrong, because the test forcibly causes exceptions. So, many of the exceptions in the log are kind of expected... Could we catch all the logged exceptions with an ad-hoc appender and then in case output only the relevant ones? An example with custom appender for testing can be found in https://github.com/apache/jackrabbit-oak/blob/105f890e04ee990f0e71d88937955680670d96f7/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/index/property/Oak2077QueriesTest.java -- Davide
PropertyIndex handle large deletions : Possible optimization
Hi Team, Currently when large deletions are performed, i.e. deleting a big subtree where multiple property indexes are configured for nodes in that deleted tree, the deletion is found to be very slow (at least for DocumentMK). Looking at the code it seems that if a deletion is detected the editor still traverses the deleted subtree completely and updates the backing index on a per-node basis. Instead, it could utilize the fact that the index is also managed as a tree, i.e. at least for ContentMirrorStoreStrategy it can just delete the index tree at that path for the various values. LuceneIndexEditor takes a similar approach [1] by issuing a PrefixQuery to drop all Lucene documents under that path Chetan Mehrotra [1] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexEditor.java#L230-255
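The idea of dropping the whole mirrored index subtree in one operation, instead of one index update per deleted node, can be illustrated with a sorted map standing in for the index content. The paths below are hypothetical examples, not real Oak index layout details:

```java
import java.util.TreeMap;

public class IndexSubtreeDelete {
    public static void main(String[] args) {
        // ContentMirrorStoreStrategy mirrors content paths under each index
        // value, so deleting /a in the content can translate to one range
        // removal over the mirrored path instead of per-node updates.
        TreeMap<String, Boolean> index = new TreeMap<String, Boolean>();
        index.put("/oak:index/foo/bar/a", true);
        index.put("/oak:index/foo/bar/a/b", true);
        index.put("/oak:index/foo/bar/a/b/c", true);
        index.put("/oak:index/foo/baz", true);

        // Remove the mirrored subtree under .../a in one shot; "a0" is the
        // exclusive upper bound because '0' is the ASCII successor of '/'.
        index.subMap("/oak:index/foo/bar/a", "/oak:index/foo/bar/a0").clear();

        System.out.println(index.keySet()); // only /oak:index/foo/baz remains
    }
}
```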
Re: jackrabbit-oak build #5374: Broken
I implemented something on those lines to get logs from a remote server [1] for Sling. Had plans to get that working for a local jvm process also but never got to complete it. Would try to get that done and see if it can be leveraged in Oak tests. This feature would dump the logs as part of the JUnit report rendered in Jenkins. So along with the failure stacktrace you can see what all logs were captured in that run Chetan Mehrotra [1] https://plus.google.com/+ChetanMehrotra/posts/Ao1w9SACKSh On Thu, Apr 2, 2015 at 2:43 PM, Davide Giannella dav...@apache.org wrote: On 01/04/2015 07:29, Marcel Reutegger wrote: The test failure was: Failed tests: testProxyFlippedIntermediateByteChange( org.apache.jackrabbit.oak.plugins.se gment.standby.ExternalSharedStoreIT): expected:{ root = { ... } } but was:{ root : { } } It's a bit difficult for me to see what went wrong, because the test forcibly causes exceptions. So, many of the exceptions in the log are kind of expected... Could we catch all the logged exceptions with an ad-hoc appender and then in case output only the relevant ones? An example with custom appender for testing can be found in https://github.com/apache/jackrabbit-oak/blob/105f890e04ee990f0e71d88937955680670d96f7/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/index/property/Oak2077QueriesTest.java -- Davide
[DISCUSS] Enable CopyOnRead feature for Lucene indexes by default
Hi Team, The CopyOnRead feature was provided as part of the 1.0.9 release and has been in use in quite a few customer deployments. Of late we have had to recommend enabling this setting on most deployments where queries were found to be performing slowly, and it provides considerably better performance. I would like to enable this feature by default now [1], both in trunk and in the branch. Would it be fine to do that? Chetan Mehrotra [1] https://issues.apache.org/jira/browse/OAK-2708
Re: New committer: Shashank Gupta
Welcome Shashank! Chetan Mehrotra On Thu, Mar 26, 2015 at 8:56 AM, Amit Jain am...@apache.org wrote: Welcome Shashank!! On Thu, Mar 26, 2015 at 2:20 AM, Michael Dürig mdue...@apache.org wrote: Hi, Please welcome Shashank Gupta as a new committer and PMC member of the Apache Jackrabbit project. The Jackrabbit PMC recently decided to offer Shashank committership based on his contributions. I'm happy to announce that he accepted the offer and that all the related administrative work has now been taken care of. Welcome to the team, Shashank! Michael
[jira] [Commented] (JCR-3862) [FileDataStore]: deleteRecord leaves the parent directories empty
[ https://issues.apache.org/jira/browse/JCR-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375782#comment-14375782 ] Chetan Mehrotra commented on JCR-3862: -- +1. Patch looks fine. [FileDataStore]: deleteRecord leaves the parent directories empty - Key: JCR-3862 URL: https://issues.apache.org/jira/browse/JCR-3862 Project: Jackrabbit Content Repository Issue Type: Bug Components: jackrabbit-data Reporter: Amit Jain Assignee: Amit Jain Attachments: JCR-3862-mreutegg.patch, JCR-3862.patch Calling deleteRecord to delete a particular record does not delete parent directories that are left empty. Oak uses this particular method for garbage collection, due to which after a while a large number of empty directories keep lying around, making the process increasingly slower. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: svn commit: r1668034 - in /jackrabbit/oak/trunk/oak-lucene: ./ src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/ src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/score/ src
On Fri, Mar 20, 2015 at 8:22 PM, thom...@apache.org wrote: +@Activate +private void activate() { +scorerProviderMap.clear(); +} Probably this should only be done in deactivate Chetan Mehrotra
Re: svn commit: r1667590 - /jackrabbit/oak/trunk/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/document/VersionGCDeletionTest.java
Interesting way of constructing the test case scenario, Marcel! Chetan Mehrotra On Wed, Mar 18, 2015 at 10:38 PM, mreut...@apache.org wrote: Author: mreutegg Date: Wed Mar 18 17:08:59 2015 New Revision: 1667590 URL: http://svn.apache.org/r1667590 Log: OAK-2420: DocumentNodeStore revision GC may lead to NPE Test to reproduce the problem Modified: jackrabbit/oak/trunk/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/document/VersionGCDeletionTest.java Modified: jackrabbit/oak/trunk/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/document/VersionGCDeletionTest.java URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/document/VersionGCDeletionTest.java?rev=1667590&r1=1667589&r2=1667590&view=diff == --- jackrabbit/oak/trunk/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/document/VersionGCDeletionTest.java (original) +++ jackrabbit/oak/trunk/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/document/VersionGCDeletionTest.java Wed Mar 18 17:08:59 2015 @@ -22,20 +22,31 @@ package org.apache.jackrabbit.oak.plugin import java.util.Collections; import java.util.Comparator; import java.util.List; +import java.util.concurrent.Callable; +import java.util.concurrent.CountDownLatch; +import java.util.concurrent.Future; +import java.util.concurrent.Semaphore; import java.util.concurrent.TimeUnit; import javax.annotation.Nonnull; +import org.apache.jackrabbit.oak.api.CommitFailedException; +import org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector.VersionGCStats; import org.apache.jackrabbit.oak.plugins.document.memory.MemoryDocumentStore; import org.apache.jackrabbit.oak.spi.commit.CommitInfo; import org.apache.jackrabbit.oak.spi.commit.EmptyHook; +import org.apache.jackrabbit.oak.spi.state.ChildNodeEntry; import org.apache.jackrabbit.oak.spi.state.NodeBuilder; +import org.apache.jackrabbit.oak.spi.state.NodeState; import org.apache.jackrabbit.oak.stats.Clock; import 
org.junit.After; import org.junit.Before; +import org.junit.Ignore; import org.junit.Test; +import static java.util.concurrent.Executors.newSingleThreadExecutor; import static java.util.concurrent.TimeUnit.HOURS; +import static java.util.concurrent.TimeUnit.MINUTES; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNull; import static org.junit.Assert.fail; @@ -140,7 +151,7 @@ public class VersionGCDeletionTest { VersionGarbageCollector gc = store.getVersionGarbageCollector(); gc.setOverflowToDiskThreshold(100); -VersionGarbageCollector.VersionGCStats stats = gc.gc(maxAge * 2, HOURS); +VersionGCStats stats = gc.gc(maxAge * 2, HOURS); assertEquals(noOfDocsToDelete * 2 + 1, stats.deletedDocGCCount); @@ -152,6 +163,88 @@ public class VersionGCDeletionTest { } } +// OAK-2420 +@Ignore +@Test +public void queryWhileDocsAreRemoved() throws Exception { +// Baseline the clock +clock.waitUntil(Revision.getCurrentTimestamp()); + +final Thread currentThread = Thread.currentThread(); +final Semaphore queries = new Semaphore(0); +final CountDownLatch ready = new CountDownLatch(1); +MemoryDocumentStore ms = new MemoryDocumentStore() { +@Override +public <T extends Document> T find(Collection<T> collection, + String key) { +if (Thread.currentThread() != currentThread) { +ready.countDown(); +queries.acquireUninterruptibly(); +} +return super.find(collection, key); +} +}; +store = new DocumentMK.Builder().clock(clock) +.setDocumentStore(ms).setAsyncDelay(0).getNodeStore(); + +// create nodes +NodeBuilder builder = store.getRoot().builder(); +NodeBuilder node = builder.child("node"); +for (int i = 0; i < 100; i++) { +node.child("c-" + i); +} +merge(store, builder); + +clock.waitUntil(clock.getTime() + HOURS.toMillis(1)); + +// remove nodes +builder = store.getRoot().builder(); +node = builder.child("node"); +for (int i = 0; i < 90; i++) { +node.getChildNode("c-" + i).remove(); +} +merge(store, builder); + +store.runBackgroundOperations(); + 
+clock.waitUntil(clock.getTime() + HOURS.toMillis(1)); + +// fill caches +NodeState n = store.getRoot().getChildNode("node"); +for (ChildNodeEntry entry : n.getChildNodeEntries()) { +entry.getName(); +} + +// invalidate the nodeChildren cache only +store.invalidateNodeChildrenCache
Re: Slow running test for oak-lucene and Lucene Suggestor getting created by default
Thanks Tommaso! Let's see how the next build runs: http://ci.apache.org/builders/oak-trunk/builds/1144 Chetan Mehrotra On Thu, Mar 12, 2015 at 2:36 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: I've created https://issues.apache.org/jira/browse/OAK-2611 to track the mentioned issue. Regards, Tommaso 2015-03-12 9:36 GMT+01:00 Tommaso Teofili tommaso.teof...@gmail.com: Hi Chetan, there are 2 things at play there, I think. The first is that for testing purposes the suggester was configured to be updated upon each commit [1]; the other, which is a bug, is that the code you mentioned [2] should actually check whether the useInSuggest property is set before updating the suggester, so at least this check needs to be introduced. For the testing configuration we should probably look for a less intrusive setting. Regards, Tommaso [1] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/test/java/org/apache/jackrabbit/oak/jcr/LuceneOakRepositoryStub.java#L88 [2] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexEditorContext.java#L167 2015-03-12 8:00 GMT+01:00 Chetan Mehrotra chetan.mehro...@gmail.com: Hi Tommaso, The last couple of builds on Apache CI are failing in oak-lucene [1] [2]. Running the tests locally reveals that quite a bit of time is being spent building up the suggester [3]. QueryJcrTest takes some time and is probably the test which hangs in the CI build: Running org.apache.jackrabbit.oak.jcr.query.QueryJcrTest Tests run: 218, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 245.208 sec Looking further at the code [0], it appears that a suggester directory is always created/updated irrespective of whether the user has enabled the suggester for that index or not. I think the suggester should only be built if the index has that feature enabled.
For example, for a normal lucene-property index, building up the suggester would not be useful. Chetan Mehrotra [0] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexEditorContext.java#L167 [1] http://ci.apache.org/builders/oak-trunk/builds/1142/steps/compile/logs/stdio [2] http://ci.apache.org/builders/oak-trunk/builds/1141/steps/compile/logs/stdio [3] Thread-9 prio=10 tid=0x7f1790797000 nid=0x6b6f runnable [0x7f175ef0b000] java.lang.Thread.State: RUNNABLE at org.apache.jackrabbit.oak.plugins.index.lucene.OakDirectory$OakIndexFile.<init>(OakDirectory.java:201) at org.apache.jackrabbit.oak.plugins.index.lucene.OakDirectory$OakIndexFile.<init>(OakDirectory.java:155) at org.apache.jackrabbit.oak.plugins.index.lucene.OakDirectory$OakIndexInput.<init>(OakDirectory.java:340) at org.apache.jackrabbit.oak.plugins.index.lucene.OakDirectory$OakIndexInput.clone(OakDirectory.java:345) at org.apache.jackrabbit.oak.plugins.index.lucene.OakDirectory$OakIndexInput.clone(OakDirectory.java:329) at org.apache.lucene.store.Directory$SlicedIndexInput.clone(Directory.java:288) at org.apache.lucene.store.Directory$SlicedIndexInput.clone(Directory.java:269) at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader.<init>(BlockTreeTermsReader.java:481) at org.apache.lucene.codecs.BlockTreeTermsReader.<init>(BlockTreeTermsReader.java:176) at org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat.fieldsProducer(Lucene41PostingsFormat.java:437) at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:116) at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:96) at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:141) at org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:235) - locked 0xfc700320 (a org.apache.lucene.index.ReadersAndUpdates) at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:100) at 
org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:382) - locked 0xf19fbee0 (a org.apache.lucene.index.IndexWriter) - locked 0xf19fc010 (a java.lang.Object) at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:111) at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditorContext.updateSuggester(LuceneIndexEditorContext.java:185)
Re: svn commit: r1666220 - in /jackrabbit/oak/trunk: oak-commons/ oak-commons/src/main/java/org/apache/jackrabbit/oak/commons/sort/ oak-commons/src/test/java/org/apache/jackrabbit/oak/commons/sort/ oa
Looks like Closer closes the closeables in LIFO manner, due to which the directory containing that file got deleted first. I have changed the logic now. Let me know if the test passes for you on Windows. Chetan Mehrotra On Thu, Mar 12, 2015 at 10:21 PM, Julian Reschke julian.resc...@gmx.de wrote: With this change, I get a reliable test failure on Windows: Tests in error: overflowToDisk(org.apache.jackrabbit.oak.commons.sort.StringSortTest): Unable to delete file: C:\tmp\oak-sorter-1426178913437-0\strings-sorted.txt Best regards, Julian On 2015-03-12 16:22, chet...@apache.org wrote: Author: chetanm Date: Thu Mar 12 15:22:46 2015 New Revision: 1666220 URL: http://svn.apache.org/r1666220 Log: OAK-2557 - VersionGC uses way too much memory if there is a large pile of garbage Added: jackrabbit/oak/trunk/oak-commons/src/main/java/org/apache/jackrabbit/oak/commons/sort/StringSort.java (with props) jackrabbit/oak/trunk/oak-commons/src/test/java/org/apache/jackrabbit/oak/commons/sort/StringSortTest.java (with props) Modified: jackrabbit/oak/trunk/oak-commons/pom.xml jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreService.java jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/VersionGarbageCollector.java jackrabbit/oak/trunk/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/document/VersionGCDeletionTest.java jackrabbit/oak/trunk/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/document/VersionGCWithSplitTest.java Modified: jackrabbit/oak/trunk/oak-commons/pom.xml URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-commons/pom.xml?rev=1666220&r1=1666219&r2=1666220&view=diff == --- jackrabbit/oak/trunk/oak-commons/pom.xml (original) +++ jackrabbit/oak/trunk/oak-commons/pom.xml Thu Mar 12 15:22:46 2015 @@ -93,6 +93,11 @@ <artifactId>oak-mk-api</artifactId> <version>${project.version}</version> </dependency> +<dependency> + <groupId>commons-io</groupId> + <artifactId>commons-io</artifactId> + 
<version>2.4</version> +</dependency> <!-- Test dependencies --> <dependency> Added: jackrabbit/oak/trunk/oak-commons/src/main/java/org/apache/jackrabbit/oak/commons/sort/StringSort.java URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-commons/src/main/java/org/apache/jackrabbit/oak/commons/sort/StringSort.java?rev=1666220&view=auto == --- jackrabbit/oak/trunk/oak-commons/src/main/java/org/apache/jackrabbit/oak/commons/sort/StringSort.java (added) +++ jackrabbit/oak/trunk/oak-commons/src/main/java/org/apache/jackrabbit/oak/commons/sort/StringSort.java Thu Mar 12 15:22:46 2015 @@ -0,0 +1,255 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.jackrabbit.oak.commons.sort; + +import java.io.BufferedWriter; +import java.io.Closeable; +import java.io.File; +import java.io.FileNotFoundException; +import java.io.IOException; +import java.io.Reader; +import java.nio.charset.Charset; +import java.util.Collections; +import java.util.Comparator; +import java.util.Iterator; +import java.util.List; + +import com.google.common.base.Charsets; +import com.google.common.collect.Lists; +import com.google.common.io.Closer; +import com.google.common.io.Files; +import org.apache.commons.io.FileUtils; +import org.apache.commons.io.LineIterator; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * Utility class to store a list of string and perform sort on that. For small size + * the list would be maintained in memory. If the size crosses the required threshold then + * the sorting would be performed externally + */ +public class StringSort implements Closeable { +private final Logger log = LoggerFactory.getLogger(getClass()); +public static final int BATCH_SIZE = 2048; + +private final int overflowToDiskThreshold; +private final Comparator<String> comparator; + +private final List<String> ids = Lists.newArrayList
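The LIFO close order mentioned above (Guava's Closer closes registered Closeables in reverse registration order, so a directory-deleting closeable registered last runs first) can be demonstrated with plain try-with-resources, which closes resources in reverse declaration order the same way. This is a minimal stdlib sketch of the principle, not the Oak fix itself:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Demonstrates reverse (LIFO) close order: the resource declared last
 * is closed first. Guava's Closer behaves the same way, which is why a
 * directory-deleting closeable registered after a file-deleting one
 * runs before it. (Sketch only.)
 */
public class CloseOrderSketch {

    /** Returns the order in which the two resources were closed. */
    public static List<String> closeResources() {
        List<String> closed = new ArrayList<>();
        try (AutoCloseable dir = () -> closed.add("dir");
             AutoCloseable file = () -> closed.add("file")) {
            // resources would be used here
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return closed; // ["file", "dir"] — last declared, first closed
    }
}
```

Registering the closeables in the right order (or closing the file before its parent directory explicitly) avoids the "Unable to delete file" failure Julian saw.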
Slow running test for oak-lucene and Lucene Suggestor getting created by default
Hi Tommaso, The last couple of builds on Apache CI are failing in oak-lucene [1] [2]. Running the tests locally reveals that quite a bit of time is being spent building up the suggester [3]. QueryJcrTest takes some time and is probably the test which hangs in the CI build: Running org.apache.jackrabbit.oak.jcr.query.QueryJcrTest Tests run: 218, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 245.208 sec Looking further at the code [0], it appears that a suggester directory is always created/updated irrespective of whether the user has enabled the suggester for that index or not. I think the suggester should only be built if the index has that feature enabled.
org.apache.lucene.codecs.BlockTreeTermsReader.<init>(BlockTreeTermsReader.java:176) at org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat.fieldsProducer(Lucene41PostingsFormat.java:437) at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:116) at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:96) at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:141) at org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:235) - locked 0xfc700320 (a org.apache.lucene.index.ReadersAndUpdates) at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:100) at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:382) - locked 0xf19fbee0 (a org.apache.lucene.index.IndexWriter) - locked 0xf19fc010 (a java.lang.Object) at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:111) at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditorContext.updateSuggester(LuceneIndexEditorContext.java:185)
Re: Active deletion of 'deleted' Lucene index files from DataStore without relying on full scale Blob GC
On Tue, Mar 10, 2015 at 1:50 PM, Michael Marth mma...@adobe.com wrote: But I wonder: how do you envision that this new index cleanup would locate indexes in the content-addressed DS That's a bit tricky. I have a rough idea of how to approach it, but it would require more thinking. The approach I am thinking of is: 1. Have an index on oak:QueryIndexDefinition 2. Query for all index definition nodes with type=lucene 3. Get the ':data' node and then perform the listing. Each child node is a Lucene index file representation. For Mongo I can easily read the previous revisions of the jcr:blob property and then extract the blobId, which can then be deleted via direct invocation of the GarbageCollectableBlobStore API. For Segment I am not sure how to easily read previous revisions of a given NodeState. Chetan Mehrotra
Re: Active deletion of 'deleted' Lucene index files from DataStore without relying on full scale Blob GC
On Tue, Mar 10, 2015 at 3:33 PM, Michael Dürig mdue...@apache.org wrote: SegmentMK doesn't even have the concept of a previous revision of a NodeState. Yes that is to be thought about. I want to read all previous revision for path /oak:index/lucene/:data. For segment I believe I would need to start at root references for all previous revisions and then read along the required path from those root segments to collect previous revisions. Would that work? Chetan Mehrotra
Re: Parallelize text extraction from binary fields
Is Oak already single instance when it comes to the identification and storage of binaries ? Yes. Oak uses content-addressable storage for binaries. Are the existing TextExtractors also single instance ? No. If the same binary is referenced in multiple places then text extraction would be performed for each such reference of that binary. By Single instance I mean, 1 copy of the binary and its token stream in the repository regardless of how many times it's referenced. So, based on the above, there would be multiple token streams. What's the approach you are thinking ... and would it benefit from a 'Single instance' based design? Chetan Mehrotra On Tue, Mar 10, 2015 at 1:15 PM, Ian Boston i...@tfd.co.uk wrote: Hi, Is Oak already single instance when it comes to the identification and storage of binaries ? Are the existing TextExtractors also single instance ? By Single instance I mean, 1 copy of the binary and its token stream in the repository regardless of how many times it's referenced. Best Regards Ian On 10 March 2015 at 07:05, Chetan Mehrotra chetan.mehro...@gmail.com wrote: LuceneIndexEditor currently extracts the binary contents via Tika in the same thread which is used for processing the commit. Such an approach does not make good use of a multi-processor system, specifically when the index is being built up as part of a migration process. Looking at JR2 I see LazyTextExtractor [1] which I think would help parallelize text extraction. Would it make sense to bring this to Oak? Would that help in improving performance? Chetan Mehrotra [1] https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-core/src/main/java/org/apache/jackrabbit/core/query/lucene/LazyTextExtractorField.java
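The parallelization being discussed can be sketched with a plain ExecutorService: extraction for each binary runs as an independent task while the calling (commit) thread only collects the results. The `extractText` function below is a stand-in for the Tika call, and the class name is illustrative — this is not Oak's LuceneIndexEditor API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Sketch of parallel text extraction: each binary is processed by a
 * worker thread; the caller collects results in submission order.
 * (Illustrative only; extractText stands in for the Tika-based parser.)
 */
public class ParallelExtractionSketch {

    /** Stand-in for the expensive Tika-based extraction of one binary. */
    static String extractText(byte[] binary) {
        return new String(binary).toUpperCase();
    }

    /** Extract text for all binaries using a fixed-size thread pool. */
    public static List<String> extractAll(List<byte[]> binaries, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // submit one extraction task per binary
            List<Future<String>> futures = new ArrayList<>();
            for (byte[] b : binaries) {
                futures.add(pool.submit(() -> extractText(b)));
            }
            // collect results; Future.get() preserves submission order
            List<String> result = new ArrayList<>();
            for (Future<String> f : futures) {
                result.add(f.get());
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Combined with content-addressed deduplication (extracting each distinct binary only once and caching the result by its content hash), this would also avoid the repeated extraction per reference mentioned above.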