Re: svn commit: r1629643 - /jackrabbit/oak/trunk/oak-run/README.md
Hi Michael,

On Mon, Oct 6, 2014 at 6:44 PM, mdue...@apache.org wrote:
> +* upgrade : Upgrade from Jackrabbit 2

The upgrade mode is only supported in the oak-run-jr2 jar (as only that jar packages the JR2 classes) and is documented there [1].

Chetan Mehrotra
[1] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-run#oak-runnable-jar---jr-2
Re: Command line tools documentation
Hi Thomas,

Command line tool options are documented in the README in the oak-run folder. See [1]. Or are you looking for something else?

Chetan Mehrotra
[1] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-run#oak-runnable-jar

On Mon, Oct 6, 2014 at 7:13 PM, Thomas Mueller muel...@adobe.com wrote:
> Hi, I didn't find any official documentation about the command line tools. I have started documenting oak-run: http://jackrabbit.apache.org/oak/docs/command_line.html
> Regards, Thomas
Re: svn commit: r1627052 - in /jackrabbit/oak/trunk/oak-core/src: main/java/org/apache/jackrabbit/oak/plugins/index/IndexUpdate.java test/java/org/apache/jackrabbit/oak/plugins/index/IndexUpdateTest.j
On Wed, Sep 24, 2014 at 1:57 PM, Davide Giannella dav...@apache.org wrote:
> Let's say we want to keep logs of upgrades between the index definitions, I would store that kind of information under a hidden node as well.

It would be better to store them in proper nodes. With that you can access them via standard JCR tooling. The only downside is that they would be picked up in observation.

To enable precise deletion of index data nodes in case of a reindex, we can record those node names in the index definition so that the reindex logic removes only them. Or we can follow a convention such as :data and :index for the storage of index nodes and then remove only those nodes.

Chetan Mehrotra
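For illustration, a minimal sketch of the hidden-node convention mentioned above (the node names and the surrounding builder plumbing are illustrative, not the actual index code):

    import org.apache.jackrabbit.oak.spi.state.NodeBuilder;

    public class HiddenNodeExample {
        // Oak treats child names starting with ':' as hidden: they are stored
        // in the NodeStore but are not visible through the JCR API.
        static void storeIndexData(NodeBuilder indexDefinition) {
            NodeBuilder hidden = indexDefinition.child(":data"); // skipped by JCR tooling
            NodeBuilder visible = indexDefinition.child("data"); // readable via standard JCR tooling
            hidden.setProperty("lastUpgrade", "1.0.6");
            visible.setProperty("lastUpgrade", "1.0.6");
        }
    }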
Re: AggregateIndex and AdvanceQueryIndex
Hi Thomas,

On Tue, Sep 23, 2014 at 3:54 PM, Thomas Mueller muel...@adobe.com wrote:
> Index plans need to be wrapped as well. If there is only one sub-index, then it's easy: just wrap the plans from the sub-index. If there are multiple sub-indexes, then most likely each sub-index will only return one plan. If one of them returns multiple plans (which could be the case for sorting), then all possible combinations could be wrapped and returned. If there are too many combinations, then a warning could be logged and only a subset of the combinations returned. If that really happens, we can still think about a better solution (for example use a greedy algorithm, as is done for join ordering). But personally, I don't think such a reduction will ever be needed.

I am not sure we have to do all that at all. As the baseIndex is not registered with Oak, going for such fine-grained cost calculation is probably not required. We can just propagate the baseIndex plans.

I have updated the patch on OAK-2119 which changes AggregateIndex to implement AdvanceQueryIndex. With those changes (and related changes in Lucene) all tests pass fine. If it looks fine I can go ahead with my change and move forward on OAK-2005.

Note that the initial plan was to
1. Move the current LuceneIndex to AdvanceQueryIndex
2. Then branch off the impl and make a new copy which has the changes done for property index support

Chetan Mehrotra
AggregateIndex and AdvanceQueryIndex
Hi,

For the Lucene based property index (OAK-2005) I need to make LuceneIndex implement AdvanceQueryIndex. As AggregateIndex (AI) wraps LuceneIndex (for fulltext search) it would also need to be adapted to support the same (OAK-2119). However making it do that seems a bit tricky.

A - Cost aggregation
---
AggregateIndex aggregates the cost also. How such a thing should be implemented in terms of IndexPlan is not clear. Also I am not sure if the cost needs to be redefined, as the wrapped index is not registered. Probably AggregateIndex should just return the baseIndex cost.

B - FulltextQueryIndex
--
As FulltextQueryIndex does not extend AdvanceQueryIndex it causes issues in wrapping. Should I create a new AdvanceFulltextQueryIndex like

    public interface AdvanceFulltextQueryIndex
            extends FulltextQueryIndex, AdvancedQueryIndex {
    }

Further I do not understand the AggregateIndex logic very well and am not sure how a fulltext index which also handles property restrictions can be wrapped. Any guidance here would be helpful!

Given that the initial implementation would not support both fulltext queries and property based queries simultaneously, we can take an alternative approach for now (it is a fallback Plan B and only considered as a last option):

1. Have two impls, LuceneIndex and LucenePropertyIndex
2. LuceneIndex would be wrapped by AggregateIndex and would serve fulltext queries
3. LucenePropertyIndex would not be wrapped and would only serve queries which involve property restrictions

With this the existing logic would not be modified and we can move ahead with the Lucene based property index. Later once we unify them we can tackle this issue.

Chetan Mehrotra
Re: [VOTE] Release Apache Jackrabbit Oak 1.0.6
+1 Chetan Mehrotra On Mon, Sep 22, 2014 at 8:55 PM, Michael Dürig mdue...@apache.org wrote: On 22.9.14 6:13 , Amit Jain wrote: [X] +1 Release this package as Apache Jackrabbit Oak 1.0.6 Michael
Checkpoint might not get cleaned up in case of abrupt shutdown and thus stop GC for 3 years!
Currently AsyncIndexUpdate creates a checkpoint with a lifetime of 3 years. Once indexing completes, the checkpoint is moved to point to the revision up to which indexing is done, and the previous checkpoint is released.

Now if the system is killed while an index update is in progress, particularly in the run method where a new checkpoint gets created (line 229), then that checkpoint would not be released and also would not be recorded in the index metadata. Such a checkpoint would prevent GC for a long time.

Is that understanding correct?

Chetan Mehrotra
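A rough sketch of the sequence described above, using the NodeStore checkpoint API (the metadata property name and the surrounding structure are illustrative, not the actual AsyncIndexUpdate code):

    import java.util.concurrent.TimeUnit;
    import org.apache.jackrabbit.oak.spi.state.NodeBuilder;
    import org.apache.jackrabbit.oak.spi.state.NodeStore;

    public class CheckpointLifecycleSketch {
        static void runIndexUpdate(NodeStore store, NodeBuilder asyncMetadata,
                                   String previousCheckpoint) {
            // checkpoint(lifetime) guarantees the checkpointed revision survives
            // GC for the given number of milliseconds - here roughly 3 years
            String checkpoint = store.checkpoint(TimeUnit.DAYS.toMillis(3 * 365));

            // <-- if the JVM is killed here, 'checkpoint' is neither released
            //     nor recorded in the metadata, so it blocks GC until it expires

            // ... perform the index update against the checkpointed revision ...

            asyncMetadata.setProperty("async", checkpoint); // record new checkpoint
            if (previousCheckpoint != null) {
                store.release(previousCheckpoint);          // release the old one
            }
        }
    }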
Re: Checkpoint might not get cleaned up in case of abrupt shutdown and thus stop GC for 3 years!
On Wed, Sep 10, 2014 at 1:33 PM, Alex Parvulescu alex.parvule...@gmail.com wrote:
> Second is the worse in my view, the cp gets created and referenced with a life of 3y, but the async gets shut down and doesn't get a chance to cleanup. This is a node state that will keep the GC from cleaning properly.

Yes, that's the case I was referring to.

> Let's continue the investigation on OAK-2087.

This issue is different from OAK-2087, as that one is only meant for informational purposes. Opened OAK-2088 for this issue and will follow up there.

Chetan Mehrotra
Re: buildbot failure in ASF Buildbot on oak-trunk
Failure in the TarMK failover test:

    Running org.apache.jackrabbit.oak.plugins.segment.failover.MBeanTest
    Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 18.861 sec FAILURE!
    testClientAndServerEmptyConfig(org.apache.jackrabbit.oak.plugins.segment.failover.MBeanTest) Time elapsed: 10.532 sec FAILURE!
    junit.framework.AssertionFailedError: unexpected Statusexception occurred: Connection reset by peer
        at junit.framework.Assert.fail(Assert.java:50)
        at org.apache.jackrabbit.oak.plugins.segment.failover.MBeanTest.testClientAndServerEmptyConfig(MBeanTest.java:185)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

Chetan Mehrotra

On Tue, Sep 9, 2014 at 11:27 AM, build...@apache.org wrote:
> The Buildbot has detected a new failure on builder oak-trunk while building ASF Buildbot. Full details are available at: http://ci.apache.org/builders/oak-trunk/builds/503 Buildbot URL: http://ci.apache.org/ Buildslave for this Build: bb-vm_ubuntu Build Reason: scheduler Build Source Stamp: [branch jackrabbit/oak/trunk] 1623645 Blamelist: chetanm BUILD FAILED: failed compile sincerely, -The Buildbot
Re: Using BlobStore by default with SegmentNodeStore
I have updated OAK-2082 with the test run results. Looking at the results I think FDS does provide a benefit in terms of less storage space. Putting the Lucene index on the file system provides the best storage efficiency, but that would not work once we have TarMK failover implemented.

Chetan Mehrotra

On Tue, Sep 9, 2014 at 12:44 PM, Thomas Mueller muel...@adobe.com wrote:
> Hi,
>
> In addition to or instead of using the BlobStore, we could store the Lucene index to the filesystem (persistence = file, path = ...). But I would probably only do that on a case-by-case basis. I think it would reduce, but not solve, the compaction problem.
>
> Some numbers from a test repository I have (not compacted):
> * 7 million segments in 3 tar files, of which are
> * 4.3 million (146 GB) data segments, and
> * 2.7 million (187 GB) binary segments.
>
> For this case, using external blobs would at most reduce the repository size by around 60% (so 40%, the 146 GB of data segments, would still remain). This change might make it possible to compact more efficiently. But I'm not sure.
>
> Regards, Thomas
>
> On 04/09/14 13:25, Chetan Mehrotra chetan.mehro...@gmail.com wrote:
>> Hi Team, Currently SegmentNodeStore does not use a BlobStore by default and stores the binary data within the data tar files. [...]
Using BlobStore by default with SegmentNodeStore
Hi Team,

Currently SegmentNodeStore does not use a BlobStore by default and stores the binary data within the data tar files. This has the following benefits:

1. Backup is simpler - the user just needs to back up the segmentstore directory
2. No Blob GC - the RevisionGC would also delete the binary content and a separate Blob GC need not be performed
3. Faster IO - the binary content would be fetched via memory mapped files and hence might have better performance compared to streamed IO

However of late we are seeing issues where the repository is not able to reclaim space from deleted binary content as part of normal cleanup, and full scale compaction needs to be performed to reclaim the space. Running compaction has other issues (see OAK-2045) and currently it needs to be performed offline to get optimum results.

In quite a few cases it has been seen that repository growth is mostly due to Lucene index content changes, which lead to the creation of new binary content and also cause fragmentation due to newer revisions. Further, as the Segment logic does not perform de-duplication, any change in a Lucene index file would probably re-create the whole index file (need to confirm).

Given that such repository growth is troublesome it might be better if we configure a BlobStore by default with SegmentNodeStore (or at least for applications like AEM). This should reduce the rate of repository growth due to:

1. De-duplication - BlobStore and DataStore (current impls) implement de-duplication, so adding the same binary would not cause size growth
2. Less fragmentation - as large binary content would not be part of the data tar files, Blob GC would be able to reclaim space. Currently in a cleanup, if even one bulk segment in a data tar has a reference, the cleanup would not be able to remove that tar. That space can only be reclaimed via compaction.

Compared to the benefits mentioned initially:

1. Backup - the user needs to back up two folders
2. Blob GC would need to be run separately
3. Faster IO - that needs to be seen. For Lucene this can be mitigated to an extent with the proposed CopyOnReadDirectory support in OAK-1724

Further we also get the benefit of sharing the BlobStore between multiple instances if required!

Thoughts?

Chetan Mehrotra
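For reference, a sketch of what the proposed default wiring could look like; the FileStore and FileBlobStore constructor signatures shown here reflect the 1.0-era codebase as I recall it and may differ between Oak versions:

    import java.io.File;
    import org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStore;
    import org.apache.jackrabbit.oak.plugins.segment.file.FileStore;
    import org.apache.jackrabbit.oak.spi.blob.BlobStore;
    import org.apache.jackrabbit.oak.spi.blob.FileBlobStore;

    public class SegmentWithBlobStoreSketch {
        public static void main(String[] args) throws Exception {
            // Binaries above the inline threshold go to the BlobStore, so they
            // no longer fragment the data tar files; Blob GC must then be run
            // separately to reclaim their space.
            BlobStore blobStore = new FileBlobStore("/path/to/blobstore");
            FileStore fileStore = new FileStore(blobStore, new File("segmentstore"),
                    256 /* maxFileSizeMB */, true /* memory mapping */);
            SegmentNodeStore nodeStore = new SegmentNodeStore(fileStore);
        }
    }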
Re: svn commit: r1622201 - in /jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document: LastRevRecoveryAgent.java UnsavedModifications.java
Hi Marcel,

On Wed, Sep 3, 2014 at 3:22 PM, mreut...@apache.org wrote:

>          log.info("Updated lastRev of [{}] documents while performing lastRev recovery for " +
> -                "cluster node [{}]", size, clusterId);
> +                "cluster node [{}]: ", size, clusterId, updates);

You would need to add one more '{}' to the log message to log the 'updates' argument.

Chetan Mehrotra
Build bot for 1.0 branch
Hi,

Do we have any build bot running for the 1.0 branch?

Chetan Mehrotra
Re: oak-run public distribution
This was discussed earlier [1] and Jukka mentioned that there were some restrictions on deployment size. I tried pushing a snapshot version some time back and that got deployed fine. So I think we should try to deploy the artifacts again.

Chetan Mehrotra
[1] http://markmail.org/thread/ofvy3z5lyu5cw2i7

On Thu, Aug 28, 2014 at 8:42 PM, Michael Dürig mdue...@apache.org wrote:
> So far we didn't deploy oak-run. I'm not sure why, but I think there were concerns regarding making developer tooling available to end users.
> Michael
>
> On 28.8.14 4:59 , Geoffroy Schneck wrote:
>> Hello, AEM Support releases weekly AEM hotfixes upgrading Oak to 1.0.x publicly on Package Share. We would like to add to this public page a download link to oak-run-1.0.x as well. A link pointing to the same version of oak-run on the Apache Nexus would be good. However, we cannot find oak-run there. Could you help us find oak-run-1.0.5 there, please?
>>
>> Geoffroy Schneck
>> Enterprise Support Engineer – Team Lead, Marketing Cloud Customer Care
>> T: +41 61 226 55 70 M: +41 79 207 45 04
>> email: gschn...@adobe.com
>> Barfuesserplatz 6, CH-4001 Basel, Switzerland
>> www.adobe.com
>> For CQ support and tips, follow us on Twitter: @CQCare
Re: New Jackrabbit committer: Amit Jain
Welcome Amit!! Chetan Mehrotra On Wed, Aug 27, 2014 at 12:51 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: welcome Amit! Regards, Tommaso 2014-08-26 22:06 GMT+02:00 Michael Dürig mdue...@apache.org: Hi, Please welcome Amit Jain as a new committer and PMC member of the Apache Jackrabbit project. The Jackrabbit PMC recently decided to offer Amit committership based on his contributions. I'm happy to announce that he accepted the offer and that all the related administrative work has now been taken care of. Welcome to the team, Amit! Michael
Re: [DISCUSS] supporting faceting in Oak query engine
This looks useful, Tommaso. With OAK-2005 we should be able to support multiple Lucene indexes and manage them easily. If we can abstract all this out and just expose the facet information as a virtual node, that would simplify things for end users. Probably we can have a read-only NodeStore impl to expose the faceted data bound to a system path. Otherwise we would need to expose the Lucene API and OakDirectory.

Chetan Mehrotra

On Tue, Aug 26, 2014 at 1:28 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote:
> 2014-08-25 19:02 GMT+02:00 Lukas Smith sm...@pooteeweet.org:
>> Aloha,
> Aloha!
>> you should definitely talk to the HippoCMS developers. They forked Jackrabbit 2.x to add faceting as virtual nodes. They ran into some performance issues but I am sure they still have valuable feedback on this.
> Cool, thanks for letting us know. If you or anyone else (from Hippo) would like to give some more insight on the pros and cons of such an approach, that'd be very good.
> Regards, Tommaso
>> regards, Lukas Kahwe Smith
>>
>> On 25 Aug 2014, at 18:43, Laurie Byrum lby...@adobe.com wrote:
>>> Hi Tommaso, I am happy to see this thread! Questions: Do you expect to want to support hierarchical or pivoted facets soonish? If so, does that influence this decision? Do you know how ACLs will come into play with your facet implementation? If so, does that influence this decision? :-) Thanks! Laurie
>>>
>>> On 8/25/14 7:08 AM, Tommaso Teofili tommaso.teof...@gmail.com wrote:
>>>> Hi all, since this has been asked every now and then [1], and since I think it's a pretty useful and common feature for search engines nowadays, I'd like to discuss the introduction of facets [2] for the Oak query engine.
>>>> Pros: having facets in search results usually helps filtering (drilling down) the results before browsing all of them, so the main usage would be for client code.
>>>> Impact: probably a change / addition in both the JCR and Oak APIs to support returning other than just nodes (a NodeIterator and a Cursor respectively).
>>>> Right now a couple of ideas on how we could do that come to my mind, both based on the approach of having an Oak index for them:
>>>> 1. a (multivalued) property index for facets, meaning we would store the facets in the repository, so that we would run a query against it to get the facets of an originating query.
>>>> 2. a dedicated QueryIndex implementation, eventually leveraging Lucene faceting capabilities, which could use the Lucene index we already have, together with a sidecar index [3].
>>>> What do you think?
>>>> Regards, Tommaso
>>>> [1] : http://markmail.org/search/?q=oak%20faceting#query:oak%20faceting%20list%3Aorg.apache.jackrabbit.oak-dev+page:1+state:facets
>>>> [2] : http://en.wikipedia.org/wiki/Faceted_search
>>>> [3] : http://lucene.apache.org/core/4_0_0/facet/org/apache/lucene/facet/doc-files/userguide.html
Re: JCR API implementation transparency
On Tue, Aug 26, 2014 at 10:44 AM, Tobias Bocanegra tri...@apache.org wrote: IMO, this should work, even if the value is not a ValueImpl. In this case, it should fall back to the API methods to read the binary. +1 Chetan Mehrotra
Re: testing helper
Probably we can package the test classes as an attached artifact and make use of that. For example oak-lucene uses the oak-core test classes via the dependency

    <dependency>
        <groupId>org.apache.jackrabbit</groupId>
        <artifactId>oak-core</artifactId>
        <version>${project.version}</version>
        <classifier>tests</classifier>
        <scope>test</scope>
    </dependency>

Chetan Mehrotra

On Thu, Aug 21, 2014 at 1:36 PM, Davide Giannella dav...@apache.org wrote:
> On 20/08/2014 10:11, Marcel Reutegger wrote:
>> oops, you are right, that would be a bad idea. I thought this was about a production class and not a test utility. I can see an additional bundle like testing-commons that can be imported with scope test by other projects.
> The pain point here is that the testing helpers (functions, classes, etc.) use part of the oak-core API (NodeBuilder, NodeState, etc.), and without having the exposed API as a bundle separate from the implementation we end up in a loop of oak-core depending on testing-commons and testing-commons depending on oak-core.
> D.
Re: [Document Cache Size] Is it better to have cache size using number of entries
Hi Vikas,

Sizing the cache can be done either by number of entries or by the size taken by the cache. Currently in Oak we limit by size; however, as you mentioned, limiting by count is more deterministic. We use the Guava Cache and it supports either limiting by size or by number of entries, i.e. the two policies are exclusive. So at minimum, if you can provide a patch which allows the admin to choose between the two, it would allow us to experiment and later see how we can put a max cap on the cache size.

Chetan Mehrotra

On Mon, Aug 18, 2014 at 7:55 PM, Vikas Saurabh vikas.saur...@gmail.com wrote:
>>> we can probably have both and the cache respects whichever constraint hits first (sort of min(byte size, entry size)).
>>
>> First of all, I don't know the MongoNS implementation details, so I can be wrong. I'd rather keep the size in bytes as it gives me much more control over the memory I have and what I decide to provide to the application. If we take an extreme example of only 1 document in the cache, and this single document exceeds the amount of available memory, I fear an OOM. On the other hand, having bytes ensures the application keeps working, and it will be the task of a sysadmin to monitor the eventual hit/miss ratio to adjust the cache accordingly.
>
> Yes, a sysadmin can modify the cache size in bytes if the miss ratio increases. But in the current scenario I couldn't figure out a neat way (heuristic/guesswork) to tell whether it's application misbehavior or lack of cache size (notice our issue didn't happen to be related to cache size... but the question still did bug us). On the other hand, a sysadmin can be provided with a rough idea of the number of (frequently used) repo nodes, using which the sysadmin can update the cache size. Also, I do take the point of avoiding OOMs in case of pretty large documents, which is why we can have both properties (byte size and entry count) with the byte constraint being a fail-safe.
>
> Thanks, Vikas
Re: Extending the IndexPlan with custom data
On Tue, Aug 19, 2014 at 11:59 AM, Marcel Reutegger mreut...@adobe.com wrote:
> Maybe this is already sufficient for your requirement?

Yup, that would serve the purpose. For now I do not require that support, after following Thomas's suggestion of using one index instance per implementation. So for multiple Lucene index definitions, multiple LuceneIndex instances would be returned by LuceneIndexProvider and each impl would return a single plan.

Chetan Mehrotra
Re: [Document Cache Size] Is it better to have cache size using number of entries
Hi Thomas,

On Tue, Aug 19, 2014 at 6:13 PM, Thomas Mueller muel...@adobe.com wrote:
> How, or in what way, is it more deterministic?

I missed providing some context there, so here are the details. Currently we limit the cache by total size taken. Now, given a system where you have say 32 GB RAM available, the admin needs to decide how much memory to allocate to the DocumentNodeStore. Currently we do not have a definitive way to tell, as there are a couple of factors to consider:

1. Number of entries in the Document cache - The Document cache (which caches the NodeDocuments) is the most critical cache and is currently allocated 70% of the cache size. We would like to give it as much memory as possible, but we also need to take into account the time taken to perform the consistency check for the entries present in the cache. If the consistency check takes more than 1 sec, it would delay the background job in DocumentNodeStore and hence the root node version would become stale. Now the time taken to perform the consistency check is ~= f(n), where n is the number of entries in the cache and not the size of the cache. With Mongo we can get a good estimate of the time taken to query the modCount for 'n' nodes. Going forward I would add some stats collection to this logic to determine how much time is being spent in the cache consistency check.

2. Effect of GC with larger heaps - As we run the JVM with a higher heap size, we need to take into account the delays that might occur with such large heaps.

So if we can have a cache policy which puts a max cap on the memory taken and also allows limiting the number of entries, that would give more deterministic control for tuning the cache (see the sketch below).

Chetan Mehrotra
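To make the exclusivity concrete, a minimal Guava sketch (the key/value types and weights are placeholders, not Oak's actual cache wiring); configuring both maximumWeight and maximumSize on the same builder throws an IllegalStateException:

    import com.google.common.cache.Cache;
    import com.google.common.cache.CacheBuilder;
    import com.google.common.cache.Weigher;

    public class CacheLimitSketch {
        // Limit by estimated memory: eviction depends on entry weights.
        static Cache<String, String> byWeight() {
            return CacheBuilder.newBuilder()
                    .maximumWeight(256L * 1024 * 1024) // ~256 MB budget
                    .weigher(new Weigher<String, String>() {
                        @Override
                        public int weigh(String key, String value) {
                            return key.length() + value.length(); // rough byte estimate
                        }
                    })
                    .build();
        }

        // Limit by entry count: 'n' is known, so the time for a consistency
        // check over all cached entries is predictable.
        static Cache<String, String> byCount() {
            return CacheBuilder.newBuilder()
                    .maximumSize(100000)
                    .build();
        }
    }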
Re: AW: Cleanup NodeStore and MK implementations (was: Re: AW: Confusions about API's and Flavours in the documentation ...)
On Mon, Aug 18, 2014 at 1:02 PM, Michael Dürig mdue...@apache.org wrote:
> This will affect the ongoing work in these areas, which is all merged back into the 1.0 branch. We should also consider Marcel and Alex, who will probably be most affected by the additional merge effort.

I also think that this refactoring (if it is to be done) should be delayed for some time till we get more stability in the two NodeStore implementations.

Chetan Mehrotra
Re: buildbot failure in ASF Buildbot on oak-trunk-win7
The failed test is in oak-pojosr: TokenAuthenticationTest. It passes locally but has been seen failing a few times on the build bot. I will have a look at that. However, the failure is not related to the changes done in the last commit.

    Running org.apache.jackrabbit.oak.run.osgi.TokenAuthenticationTest
    Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.545 sec FAILURE!
    tokenCreationWithPreAuth(org.apache.jackrabbit.oak.run.osgi.TokenAuthenticationTest) Time elapsed: 4.545 sec ERROR!
    java.lang.reflect.UndeclaredThrowableException
        at $Proxy7.login(Unknown Source)
        at javax.jcr.Repository$login.call(Unknown Source)
        at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:45)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
        at org.apache.jackrabbit.oak.run.osgi.TokenAuthenticationTest.tokenCreationWithPreAuth(TokenAuthenticationTest.groovy:58)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
        at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
        at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
        at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
        at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
        at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
        at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
        at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
        at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
        at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
        at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
        at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
        at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
    Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.jackrabbit.oak.run.osgi.OakOSGiRepositoryFactory$RepositoryProxy.invoke(OakOSGiRepositoryFactory.java:325)
        ... 37 more
    Caused by: javax.jcr.LoginException: Login Failure: all modules ignored
        at org.apache.jackrabbit.oak.jcr.repository.RepositoryImpl.login(RepositoryImpl.java:261)
        at org.apache.jackrabbit.oak.jcr.repository.RepositoryImpl.login(RepositoryImpl.java:219)
        ... 42 more
    Caused by: javax.security.auth.login.LoginException: Login Failure: all modules ignored
        at javax.security.auth.login.LoginContext.invoke(LoginContext.java:921)
        at javax.security.auth.login.LoginContext.access$000(LoginContext.java:186)
        at javax.security.auth.login.LoginContext$5.run(LoginContext.java:706)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.login.LoginContext.invokeCreatorPriv(LoginContext.java:703)
        at javax.security.auth.login.LoginContext.login(LoginContext.java:575)
        at org.apache.jackrabbit.oak.core.ContentRepositoryImpl.login(ContentRepositoryImpl.java:161)
        at org.apache.jackrabbit.oak.jcr.repository.RepositoryImpl.login(RepositoryImpl.java:253)
        ... 43 more

Chetan Mehrotra

On Tue, Aug 12, 2014 at 1:00 PM, build...@apache.org wrote
Re: [Vote] Upgrading junit to 4.11
+1

By the way, IMHO you need not go for a vote for an incremental version change of an existing library. A vote would be helpful when a new library needs to be introduced, particularly in compile time dependencies. For other cases, just opening a JIRA task and making the change against that task should be fine.

Chetan Mehrotra

On Tue, Aug 5, 2014 at 3:40 PM, Davide Giannella dav...@apache.org wrote:
> Currently we use junit 4.10. By looking at the release notes of 4.11 [0] there's a nice improvement to the Assume that we use: being able to provide a custom message.
> (0) https://github.com/junit-team/junit/blob/master/doc/ReleaseNotes4.11.md#improvements-to-assert-and-assume
> I find it very useful because sometimes tests are skipped due to a failing assumption, but we don't get any message back and it's difficult to spot.
> Let's vote for an upgrade and test. If passed I'll file an issue and follow up.
> [ ] +1 let's upgrade and test to junit 4.11
> [ ] -1 no, I don't see any added value
> My vote: +1
> Cheers Davide
Re: Adding a timer in commons
Hi Davide,

Recently Ian pointed to the Metrics [1] project, which is related to such timing measurements. It might be helpful to use it (via a wrapper) to measure timings in critical areas of Oak. So have a look at that as well.

Chetan Mehrotra
[1] http://metrics.codahale.com/

On Sat, Aug 2, 2014 at 2:57 PM, Davide Giannella dav...@apache.org wrote:
> On 23/06/2014 14:26, Davide Giannella wrote:
>> On 23/06/2014 13:57, Michael Dürig wrote:
>>> +1 in general. However,
>>> - although it results in nice code on the client side, I'm a bit reluctant about putting all the code into the instance initialiser.
>> it was my concern as well. But I don't see that much of a difference from something like
>>
>>     timer = new Timer("foobar");
>>     timer.start();
>>     // blabla
>>     timer.trackTime();
>>
>>> - how about reusing org.apache.jackrabbit.oak.stats.Clock instead of using Guava's Stopwatch? If necessary we could still implement Clock based on Stopwatch.
>> Didn't know about it. Will have a look and amend accordingly.
>>> - Timer might not be the best name for the class. Should probably better be something with Log in its name
>> It was an example. It could be LogTimer if you prefer, or whatever. I don't mind the name and I'm very bad at it :)
>
> Had time on the train to work a bit on it. Here's the final class: http://goo.gl/xUcPdY
>
> The idea behind it is to have something for tracking time like
>
>     // ...
>     StopwatchLogger sl = new StopwatchLogger(FooBar.class);
>     // ...
>     public void foo() {
>         sl.start();
>         // perform all sorts of things
>         sl.split("performed some operations");
>         // perform other things
>         sl.stop("operation completed");
>     }
>
> then setting debug level in the logs for o.a.j.oak.stats.StopwatchLogger should do the trick to get some gross numbers around performance.
>
> If ok, I'll create an issue and commit to trunk. Thoughts?
>
> Cheers Davide
Re: [VOTE] Release Apache Jackrabbit Oak 1.0.4
+1 All checks ok Chetan Mehrotra On Fri, Aug 1, 2014 at 3:53 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: +1 Regards, Tommaso 2014-08-01 11:45 GMT+02:00 Thomas Mueller muel...@adobe.com: A candidate for the Jackrabbit Oak 1.0.4 release is available at: https://dist.apache.org/repos/dist/dev/jackrabbit/oak/1.0.4/ The release candidate is a zip archive of the sources in: https://svn.apache.org/repos/asf/jackrabbit/oak/tags/jackrabbit-oak-1.0.4/ The SHA1 checksum of the archive is 84b2b31d9bff0159a2a7ab22f4af3c5d08b0e81e. A staged Maven repository is available for review at: https://repository.apache.org/ The command for running automated checks against this release candidate is: $ sh check-release.sh oak 1.0.4 84b2b31d9bff0159a2a7ab22f4af3c5d08b0e81e Please vote on releasing this package as Apache Jackrabbit Oak 1.0.4. The vote is open for the next 72 hours and passes if a majority of at least three +1 Jackrabbit PMC votes are cast. [ ] +1 Release this package as Apache Jackrabbit Oak 1.0.4 [ ] -1 Do not release this package because... My vote is +1 Regards Thomas
Re: Extending the IndexPlan with custom data
On Thu, Jul 31, 2014 at 5:23 PM, Thomas Mueller muel...@adobe.com wrote:
> You could simply have *multiple* index instances, and each index returns its own cost or plan(s). The query engine will figure out which index to use.

Aah, I missed the fact that QueryProvider can return multiple indexes. That would simplify things!

> As for OAK-2005, you *could* implement the AdvanceQueryIndex. Actually it would be nice if that's implemented (just in general). But you don't *have* to: you could just implement a regular QueryIndex, and return the cost or infinity. If you implement AdvanceQueryIndex, return either none or one IndexPlan. No need to return multiple IndexPlans.

The interest in AdvanceQueryIndex was to support ordering. Filter does not provide any info with respect to the ordering requirement, while IndexPlan provides access to the ordering information, which can be used to order results in Lucene. So I would go with the suggestion of returning one plan per index.

By the way, any plans to add hint support to force usage of a specific index?

Chetan Mehrotra
Re: Extending the IndexPlan with custom data
On Thu, Jul 31, 2014 at 7:01 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote:
> as far as I know, one workaround to trigger usage of a specific index is to use the native language supported by that specific index, but then that, of course, would require writing the query in an implementation specific way (e.g. select * from [nt:base] where native('lucene','title:foo -title:bar') ).

Yup, that would work. So I can have the native type be the name of a Lucene index which, say, indexes only a limited subset of properties tailor-made for a particular query, and then that index can be used. And yes, it would be implementation specific.

Chetan Mehrotra
Order of property restrictions in Query Filter
Suppose we have a query like

    select [jcr:path] from [nt:base] where id = '1' and x = '2'

Currently the property restrictions are maintained as a HashMap in FilterImpl, so the above ordering information is lost. Such ordering information might be useful when querying against a Lucene index: the Boolean query created would maintain the order and might be faster if the result set from the first clause is small.

Would it make sense to retain the order of property restrictions?

Chetan Mehrotra
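A tiny illustration of the difference using plain JDK maps (not FilterImpl itself): switching the backing map to a LinkedHashMap would be enough to keep the clause order.

    import java.util.HashMap;
    import java.util.LinkedHashMap;
    import java.util.Map;

    public class RestrictionOrderSketch {
        public static void main(String[] args) {
            // LinkedHashMap iterates in insertion order, matching the query.
            Map<String, String> ordered = new LinkedHashMap<String, String>();
            ordered.put("id", "1");
            ordered.put("x", "2");
            System.out.println(ordered.keySet());   // always [id, x]

            // HashMap iteration order is undefined, so the clause order is lost.
            Map<String, String> unordered = new HashMap<String, String>();
            unordered.put("id", "1");
            unordered.put("x", "2");
            System.out.println(unordered.keySet()); // order not guaranteed
        }
    }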
Re: h2 dependency still being embedded in oak-core
This is being tracked via OAK-1708. Julian, do we still require it or can it be removed now?

Chetan Mehrotra

On Wed, Jul 30, 2014 at 6:41 PM, Alex Parvulescu alex.parvule...@gmail.com wrote:
> Hi, I noticed that even if the h2 dependency is now included with a 'test' scope [0], we still embed it in the jar file [1]. Is there a need to still do this or was it simply forgotten?
> thanks, alex
> [0] https://github.com/apache/jackrabbit-oak/commit/74cbf1ffb40b452195e944704ecb8a63ab273c80
> [1] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/pom.xml#L43
Re: buildbot failure in ASF Buildbot on oak-trunk
Failure due to clock drift:

    Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.713 sec FAILURE!
    testClockDrift(org.apache.jackrabbit.oak.stats.ClockTest) Time elapsed: 0.597 sec FAILURE!
    junit.framework.AssertionFailedError: Clock.Fast unexpected drift: -77ms (estimated limit was 23ms, measured granularity was 10.0ms)

Chetan Mehrotra

On Fri, Jul 18, 2014 at 4:34 PM, build...@apache.org wrote:
> The Buildbot has detected a new failure on builder oak-trunk while building ASF Buildbot. Full details are available at: http://ci.apache.org/builders/oak-trunk/builds/384 Buildbot URL: http://ci.apache.org/ Buildslave for this Build: bb-vm_ubuntu Build Reason: scheduler Build Source Stamp: [branch jackrabbit/oak/trunk] 1611584 Blamelist: chetanm BUILD FAILED: failed compile sincerely, -The Buildbot
Avoiding intermediate saves while installing packages via File Vault
Hi,

Currently JR FileVault supports a 'noIntermediateSaves' property which indicates that no intermediate save should be performed while a package is being installed. However, it does not appear to work as expected.

1. AutoSave usage - For this it uses the AutoSave class, which commits the changes once a certain threshold is reached. If 'noIntermediateSaves' is set, this threshold is set to Integer.MAX_VALUE, effectively disabling intermediate commits (see the sketch below). I see AutoSave being used at two places in Importer. One of them is guarded by autoSave.needsSave but the other one is not [1]. Would that be a bug?

2. Sub packages - The other place where it does not work properly is when a package contains sub packages, as Vault needs to save details regarding intermediate packages at various stages, which causes any content changes to also get committed. I see quite a few calls to Item.save in the Vault codebase on the path taken by package installation. It might be possible to use a sub-session to perform Vault's internal bookkeeping tasks and use the other session to save the package content, thus avoiding intermediate saves.

Chetan Mehrotra
[1] https://github.com/apache/jackrabbit-filevault/blob/trunk/vault-core/src/main/java/org/apache/jackrabbit/vault/fs/io/Importer.java#L402
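A sketch of the threshold behaviour described in point 1 (the method, field, and variable names here are hypothetical, not FileVault's actual code):

    import javax.jcr.RepositoryException;
    import javax.jcr.Session;

    public class AutoSaveSketch {
        // Hypothetical: 'noIntermediateSaves' pushes the threshold out of
        // reach, so the save below is never triggered by the counter.
        static void maybeSave(Session session, boolean noIntermediateSaves,
                              int configuredThreshold, int numModified)
                throws RepositoryException {
            int threshold = noIntermediateSaves ? Integer.MAX_VALUE : configuredThreshold;
            if (numModified >= threshold) {
                session.save(); // intermediate save
            }
        }
    }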
Re: Mongo connection in debug mode using Eclipse
On Fri, Jun 27, 2014 at 3:42 PM, Raquel Neves raquel.ne...@alert-online.com wrote:
> Timed out while waiting for a server that matches AnyServerSelector

Are you able to connect to Mongo via the Mongo shell? Also, can you try connecting with the host set to 127.0.0.1?

Chetan Mehrotra
Re: Blob and Nodes interaction
Looks like you are using the default MongoBlobStore. Based on that:

> 1) Based on what size data goes in blob collection
For MongoBlobStore that is determined via org.apache.jackrabbit.oak.spi.blob.AbstractBlobStore#blockSizeMin, which defaults to 4096.

> 2) Is there any reference between Nodes and blob collection
The blob ids created for binaries referenced in node properties are stored in the node document. Those blob ids refer to the documents in the blobs collection.

> 3) Is there a flow diagram from where I can find how exactly a content is getting loaded from Mongo to UI
Which UI? These are mostly internal implementation details and you would have to refer to the code for that.

By the way, what's the higher level objective here? That might help us clarify better.

Chetan Mehrotra

On Wed, Jun 25, 2014 at 10:33 PM, Abhijit Mazumder abhijit.mazum...@gmail.com wrote:
> Hi, This is my first mail to the Oak mailing list. I tried to search markmail and the existing documentation to understand the interaction between the blobs and nodes collections, but I am not able to find any reference to understand
> 1) Based on what size data goes in the blob collection
> 2) Is there any reference between the nodes and blob collections
> 3) Is there a flow diagram from where I can find how exactly content is getting loaded from Mongo to the UI
> If somebody can point me to relevant classes or any other reference that would be great.
> Regards, Abhijit
Re: [VOTE] Release Apache Jackrabbit Oak 1.0.1
On trying to verify, I get a test failure on my Linux box:

    Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.099 sec FAILURE!
    testCompactionMap(org.apache.jackrabbit.oak.plugins.segment.CompactionMapTest) Time elapsed: 0.099 sec FAILURE!
    junit.framework.AssertionFailedError: Failed with seed -255091622
        at junit.framework.Assert.fail(Assert.java:50)
        at junit.framework.Assert.assertTrue(Assert.java:20)
        at org.apache.jackrabbit.oak.plugins.segment.CompactionMapTest.testCompactionMap(CompactionMapTest.java:89)

And on re-running the test case, it indeed fails with that seed.

Chetan Mehrotra

On Tue, Jun 17, 2014 at 11:12 AM, Julian Reschke julian.resc...@gmx.de wrote:
> On 2014-06-16 21:45, Jukka Zitting wrote:
>> Hi, A candidate for the Jackrabbit Oak 1.0.1 release is available at: https://dist.apache.org/repos/dist/dev/jackrabbit/oak/1.0.1/ The release candidate is a zip archive of the sources in: https://svn.apache.org/repos/asf/jackrabbit/oak/tags/jackrabbit-oak-1.0.1/ The SHA1 checksum of the archive is 4f38ef7feabfa35eb2bd2749aab2cc4d7d88a78e. A staged Maven repository is available for review at: https://repository.apache.org/content/repositories/orgapachejackrabbit-1017 The command for running automated checks against this release candidate is: $ sh check-release.sh oak 1.0.1 4f38ef7feabfa35eb2bd2749aab2cc4d7d88a78e Please vote on releasing this package as Apache Jackrabbit Oak 1.0.1. The vote is open for the next 72 hours and passes if a majority of at least three +1 Jackrabbit PMC votes are cast. [ ] +1 Release this package as Apache Jackrabbit Oak 1.0.1 [ ] -1 Do not release this package because... My vote is +1.
> [X] +1 Release this package as Apache Jackrabbit Oak 1.0.1
Re: svn commit: r1603155 - in /jackrabbit/oak/trunk/oak-core/src: main/java/org/apache/jackrabbit/oak/plugins/document/mongo/MongoDocumentStore.java test/java/org/apache/jackrabbit/oak/plugins/documen
On Tue, Jun 17, 2014 at 6:33 PM, mreut...@apache.org wrote:

    +    @Ignore("OAK-1897")
    +    @Test
    +    public void cacheConsistency() throws Exception {
    +        mk.commit("/", "+\"node\":{}", null, null);
    +        // add a child node. this will require an update
    +        // of _lastRev on /node
    +        mk.commit("/node", "+\"child\":{}", null, null);
    +
    +        // make sure the document is not cached
    +        store.invalidateCache(NODES, Utils.getIdFromPath("/node"));
    +
    +        Thread t = new Thread(new Runnable() {
    +            @Override
    +            public void run() {
    +                store.query(NODES,
    +                        Utils.getKeyLowerLimit("/"),
    +                        Utils.getKeyUpperLimit("/"), 10);
    +            }
    +        });
    +        // block thread when it tries to convert db objects
    +        store.semaphores.put(t, new Semaphore(0));
    +        t.start();
    +
    +        while (!store.semaphores.get(t).hasQueuedThreads()) {
    +            Thread.sleep(10);
    +        }
    +
    +        // trigger write back of _lastRevs
    +        mk.runBackgroundOperations();
    +
    +        // release thread
    +        store.semaphores.get(t).release();
    +        t.join();
    +
    +        NodeState root = mk.getNodeStore().getRoot();
    +        assertTrue(root.getChildNode("node").getChildNode("child").exists());
    +    }
    +
    +    private static final class TestStore extends MongoDocumentStore {
    +
    +        final Map<Thread, Semaphore> semaphores = Maps.newConcurrentMap();
    +
    +        TestStore(DB db, DocumentMK.Builder builder) {
    +            super(db, builder);
    +        }
    +
    +        @Override
    +        protected <T extends Document> T convertFromDBObject(
    +                @Nonnull Collection<T> collection, @Nullable DBObject n) {
    +            Semaphore s = semaphores.get(Thread.currentThread());
    +            if (s != null) {
    +                s.acquireUninterruptibly();
    +            }
    +            try {
    +                return super.convertFromDBObject(collection, n);
    +            } finally {
    +                if (s != null) {
    +                    s.release();
    +                }
    +            }
    +        }
    +    }
    +
    +}

Interesting test approach, Marcel!!

Chetan Mehrotra
Re: Oak 1.0.1 release plan
On Fri, Jun 13, 2014 at 4:47 AM, Jukka Zitting jukka.zitt...@gmail.com wrote:
> OAK-1645: Route find queries to Mongo secondary in MongoDocumentStore

This part still needs to be tested completely, so I have set the fix version to 1.0.2 for now.

Chetan Mehrotra
Close method on NodeStore API
Hi,

Recently Alex pointed out that we do not close the NodeStore in the oak-run console, which brought up the topic of how to support closing a NodeStore. Currently both SegmentNodeStore and DocumentNodeStore need to be closed properly. There are a couple of options:

1. Make the NodeStore API extend Closeable
2. Have the actual implementations (i.e. SegmentNodeStore and DocumentNodeStore) implement Closeable, and have the close logic do an instanceof check to determine whether the NodeStore has to be closed or not (see the sketch below)

Which approach should we take?

Chetan Mehrotra
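A sketch of option 2, where the NodeStore API stays unchanged and callers probe for Closeable (an illustrative helper, not existing Oak code):

    import java.io.Closeable;
    import java.io.IOException;
    import org.apache.jackrabbit.oak.spi.state.NodeStore;

    public class NodeStoreCloser {
        // Close the store only if the concrete implementation is Closeable.
        static void closeQuietly(NodeStore store) {
            if (store instanceof Closeable) {
                try {
                    ((Closeable) store).close();
                } catch (IOException e) {
                    // log and ignore
                }
            }
        }
    }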
Re: oak-parent: DB2 dependency with system scope
Instead of embedding various such drivers, I would prefer that we include/reference them via the classpath on the command line. So one of the following two approaches can be used:

1. Specify the jars as part of the classpath:

    java -cp oak-run-xxx.jar:driver.jar org.apache.jackrabbit.oak.run.Main benchmark ...

2. Or refer to the jars (with pre-defined names) via the Class-Path attribute of the oak-run manifest and place the required jars in the same directory. In that case you can just run the jar with java -jar.

Chetan Mehrotra

On Wed, Jun 11, 2014 at 11:09 PM, Julian Reschke julian.resc...@gmx.de wrote:
> Hi, we currently have a system-scoped dependency for the DB2 JDBC drivers, because (by copyright) they are not available from Maven repos. Turns out that this doesn't work well with the Maven Shade plugin, which is used to build oak-run. It seems the path of least resistance is to make the DB2 dependency a regular one, and require those who need it to deploy the JARs to their local Maven repo. Can everybody live with that?
> Best regards, Julian
[jira] [Updated] (JCR-3788) S3DataStore require to set endpoint for thirdparty cloud provider (IDCF)
[ https://issues.apache.org/jira/browse/JCR-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chetan Mehrotra updated JCR-3788:
---------------------------------
    Fix Version/s:     (was: 2.7.5)
                       2.9

> S3DataStore require to set endpoint for thirdparty cloud provider (IDCF)
> -------------------------------------------------------------------------
>                 Key: JCR-3788
>                 URL: https://issues.apache.org/jira/browse/JCR-3788
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-data
>    Affects Versions: 2.7.5
>            Reporter: Shashank Gupta
>            Assignee: Chetan Mehrotra
>             Fix For: 2.9
>         Attachments: JCR-3788.patch
>
> For IDCF cloud provider API are compatible to AWS S3 SDK. We can access S3 without any endpoint since S3 API should be designed and include S3 URL or address so that it can access to S3 if there are not any endpoint. However, any address or endpoint needs to be set for 3rd party cloud provider.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (JCR-3788) S3DataStore require to set endpoint for thirdparty cloud provider (IDCF)
[ https://issues.apache.org/jira/browse/JCR-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chetan Mehrotra updated JCR-3788:
---------------------------------
       Resolution: Fixed
         Assignee: Chetan Mehrotra
           Status: Resolved  (was: Patch Available)

Applied the patch in http://svn.apache.org/r1601550. The region would now be specified before the call to check for bucket existence is made.

> S3DataStore require to set endpoint for thirdparty cloud provider (IDCF)
> -------------------------------------------------------------------------
>                 Key: JCR-3788
>                 URL: https://issues.apache.org/jira/browse/JCR-3788
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-data
>    Affects Versions: 2.7.5
>            Reporter: Shashank Gupta
>            Assignee: Chetan Mehrotra
>             Fix For: 2.7.5
>         Attachments: JCR-3788.patch
>
> For IDCF cloud provider API are compatible to AWS S3 SDK. We can access S3 without any endpoint since S3 API should be designed and include S3 URL or address so that it can access to S3 if there are not any endpoint. However, any address or endpoint needs to be set for 3rd party cloud provider.

--
This message was sent by Atlassian JIRA (v6.2#6252)
Using PreAuthentication with Token creation
Hi,

I am trying to use PreAuthentication [2] with token creation support in Oak. For that I have the following LoginModules configured, in this order:

1. TokenLoginModule
2. PreAuthLoginModule
3. LoginModuleImpl

I managed to get preauth along with token creation to work by changing the PreAuthLoginModule from [1] with the following modifications (see the sketch below):

1. Set the .token attribute to an empty value in the SimpleCredentials passed in the shared credentials. This enables the TokenLoginModule to create a token in the commit phase.
2. Copy the token value from the shared credentials back to the custom credentials attribute.

And then access the token value from the passed credentials attribute in the session login call. I wanted to check whether this approach is ok or whether it should be done in a different way.

Chetan Mehrotra
[1] http://jackrabbit.apache.org/oak/docs/security/authentication/preauthentication.html
[2] http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-pojosr/src/test/groovy/org/apache/jackrabbit/oak/run/osgi/TokenAuthenticationTest.groovy?view=markup
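A sketch of modifications 1 and 2 above; ".token" is the credentials attribute the TokenLoginModule inspects, while the surrounding login-module plumbing is omitted:

    import javax.jcr.SimpleCredentials;

    public class TokenPreAuthSketch {
        static String demonstrateTokenFlow() {
            // 1. An empty ".token" attribute signals TokenLoginModule to
            //    create a token during its commit phase.
            SimpleCredentials sc = new SimpleCredentials("admin", new char[0]);
            sc.setAttribute(".token", "");

            // ... expose 'sc' via the shared state and let login/commit run ...

            // 2. TokenLoginModule writes the created token back into the
            //    attribute, from where it can be copied to the custom credentials.
            return (String) sc.getAttribute(".token");
        }
    }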
Re: New Jackrabbit committer: Davide Giannella
Welcome Davide !! Chetan Mehrotra On Wed, Jun 4, 2014 at 6:47 PM, Jukka Zitting jukka.zitt...@gmail.com wrote: Hi, Please welcome Davide Giannella as a new committer and PMC member of the Apache Jackrabbit project. The Jackrabbit PMC recently decided to offer Davide committership based on his contributions and continuing work to Oak. He accepted the offer and has just made his first commit. Welcome to the team, Davide! BR, Jukka Zitting
Re: Embedding Groovy in oak-run for Oak Shell (OAK-1805)
For me, more than size, the problem is the usage of Lucene. To use the full power of Oak we need to include Lucene 4.x and thus would need to drop JR2. So probably have two modules:

1. oak-run - server, benchmarking, console, debugging, scalability, backup
2. oak-migration - upgrade, JR2 specific benchmarking (possibly by including the oak-run classes only)

The various sub-features in oak-run together still do not add much complexity and are neatly separated via the various main methods. So it should be ok for now.

Chetan Mehrotra

On Tue, May 27, 2014 at 7:17 PM, Michael Dürig mdue...@apache.org wrote:
> On 27.5.14 3:33 , Jukka Zitting wrote:
>> Hi, On Mon, May 26, 2014 at 9:12 AM, Michael Dürig mdue...@apache.org wrote:
>>> Apart from that - and this is probably a separate discussion - I also think we should split oak-run up as it is getting too heavy.
>> Too heavy in which way?
> Functionality wise. It is growing into a chief cook and bottle washer. It already does backup, benchmarking, simple console, debugging, server, upgrade, and scalability testing. There is also the Groovy console coming along and probably a couple of repair tools.
> Also I'd prefer Scala or Frege to Groovy. Others probably Clojure or Jython.
> Michael
Re: Embedding Groovy in oak-run for Oak Shell (OAK-1805)
FWIW I would (for a change :)) like to avoid OSGi here and keep things as they are, for the following reasons:

1. JR2 is not OSGi friendly and we use some of its internal classes for upgrade, which would be problematic in OSGi
2. For debug and console we rely on some non-exported packages. Using them in OSGi would again be tricky

Chetan Mehrotra

On Tue, May 27, 2014 at 8:23 PM, Michael Dürig mdue...@apache.org wrote:
> On 27.5.14 4:48 , Jukka Zitting wrote:
>> Hi, On Tue, May 27, 2014 at 10:36 AM, Michael Dürig mdue...@apache.org wrote:
>> We can turn the jar into an OSGi container, but why not ship everything we can by default?
> Because it increases flexibility, tests and showcases the OSGi readiness of Oak, resolves version conflicts (e.g. Lucene), and lets others easily plug in their own stuff (e.g. a scripting language through JSR-223).
> Michael
Re: Embedding Groovy in oak-run for Oak Shell (OAK-1805)
On Fri, May 23, 2014 at 12:05 AM, Marcel Reutegger mreut...@adobe.com wrote:
> but the resulting jar file is indeed quite big. Do we really need all the jars we currently embed?

Yes, currently we are embedding quite a few jars. Looking at oak-run I see it embeds the following major types of deps:

1. JR2 jars (required for benchmark and also upgrade)
2. Lucene 3.6.x jars for JR2
3. H2 and related DBCP jars for RDB
4. Oak jars
5. Logback/jopt etc. required for standalone usage
6. Now Groovy

> Alternatively we may want to consider a new module. E.g. oak-console with only the required jar files to run the console.

It might be better to go this way, as we anyway have to start using Lucene 4.x to allow, say, a command to dump the Lucene directory content. Given that oak-run would be used for benchmark and upgrade, it has to package JR2 and Lucene 3.6.x. So for the pure Oak related feature set we might require a new module.

Chetan Mehrotra
Re: My repository is not indexing PDFs, what am I missing?
Hi Bertrand,

This might be due to OAK-1462. We had to disable the LuceneIndexProvider from getting registered as an OSGi service to handle the case where LuceneIndexProvider was getting registered twice (one default and another for the aggregate case). I would try to resolve this by next week, and then it should work fine.

Chetan Mehrotra

On Wed, May 21, 2014 at 8:58 PM, Bertrand Delacretaz bdelacre...@apache.org wrote:
> Hi, I'm upgrading the OakSlingRepositoryManager used for Sling tests to Oak 1.0, and it's not indexing PDFs anymore - it used to with Oak 0.8. After uploading a text file to /tmp, the /jcr:root/foo//*[jcr:contains(.,'some word')] query finds it, but the same doesn't work with a PDF. My repository setup is in the OakSlingRepositoryManager [1] - am I missing something in there?
> -Bertrand
> [1] https://svn.apache.org/repos/asf/sling/trunk/bundles/jcr/oak-server/src/main/java/org/apache/sling/oak/server/OakSlingRepositoryManager.java
Re: How to activate a SecurityProvider
On Tue, May 20, 2014 at 7:36 PM, Galo Gimenez galo.gime...@gmail.com wrote:
> I am running an old version of Felix, maybe that is the problem?

Looks like you are using an old version of SCR. Try running with a more recent version of SCR.

Chetan Mehrotra
Re: svn commit: r1587286 - in /jackrabbit/oak/trunk: oak-core/pom.xml oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreService.java oak-parent/pom.xml
For the record, I have implemented a DataSource provider bundle [2] (based on the flow below) as part of SLING-3574 [1]. That bundle can be used to configure a DataSource in an OSGi env.

Chetan Mehrotra
[1] https://issues.apache.org/jira/browse/SLING-3574
[2] https://github.com/chetanmeh/sling-datasource

On Tue, Apr 15, 2014 at 12:30 PM, Chetan Mehrotra chetan.mehro...@gmail.com wrote:
>> Register a DataSource where?
> The DataSource would be registered with the OSGi ServiceRegistry.
>> Does this work in an OSGi context?
> Yes, it should work in an OSGi context. I would try to implement the approach by the end of the week if time permits.
>> How does it get the DataSource? Per JNDI?
> The DataSource would be obtained from the OSGi service registry, just like it currently obtains the BlobStore instance.
> Chetan Mehrotra
>
> On Tue, Apr 15, 2014 at 12:00 PM, Julian Reschke julian.resc...@gmx.de wrote:
>> On 2014-04-15 06:10, Chetan Mehrotra wrote:
>>> Hi Julian,
>>> On Tue, Apr 15, 2014 at 12:39 AM, resc...@apache.org wrote:
>>>> - <Embed-Dependency>commons-dbcp,commons-pool,h2,json-simple</Embed-Dependency>
>>>> + <Embed-Dependency>commons-dbcp,commons-pool,h2,json-simple,postgresql,db2,db2-license</Embed-Dependency>
>>>>   <Embed-Transitive>true</Embed-Transitive>
>>> I believe this is a temporary change and would not be required for the final implementation? It would be helpful if we add a TODO/FIXME there such that we remember to remove this later.
>> OK.
>>> Instead of embedding all such types of drivers/dbcp/pool etc. within oak-core it would be better to decouple them. For example, one approach can be:
>>> 1. Have a bundle which embeds commons-dbcp and the required dependencies. It would be responsible for registering a DataSource.
>> Register a DataSource where?
>>> 2. Driver bundles would be fragments with bundle #1 as host. With JDBC 4.0 the Driver classes are provided as part of META-INF/services/java.sql.Driver [1]. For such cases fragment bundles can be avoided by having #1 monitor for such drivers and register them programmatically.
>> Does this work in an OSGi context?
>>> 3. DocumentNodeStoreService should only have a reference to a DataSource and use that.
>> How does it get the DataSource? Per JNDI?
>> Best regards, Julian
Re: NodeStore and BlobStore configurations in OSGi
On Mon, May 19, 2014 at 3:29 PM, Marc Pfaff pfa...@adobe.com wrote:
> For SegmentNodeStore my research ends up in FileBlobStore and for the DocumentNodeStore it appears to be the MongoBlobStore. Is that correct?

SegmentNodeStore does not use a BlobStore by default. Instead all the binary content is stored as part of the segment data itself. So the following points can be noted for BlobStore:

1. SegmentNodeStore does not require a BlobStore by default
2. DocumentNodeStore uses MongoBlobStore by default
3. Both can be configured to use a BlobStore via OSGi config

> * I was only able to find the FileBlobStoreService that registers the FileBlobStore as an OSGi service. I was not able to find more BlobStore implementations to be exposed in OSGi. Are there any more? And how about the MongoBlobStore in particular?

MongoBlobStore is not configured as an explicit service; instead it is used as the default fallback option if no other BlobStore is configured. As it just requires the Mongo connection details, it is currently configured along with MongoDocumentStore in DocumentNodeStore. There is also AbstractDataStoreService, which wraps a JR2 DataStore as a BlobStore and configures and registers it with OSGi. It currently supports FileDataStore and S3DataStore. Note that FileDataStore is currently preferred over FileBlobStore.

> The DocumentNodeStoreService references the same blob store service as the SegmentNodeStoreService. As I'm not able to find the MongoBlobStore exposed as service, does that mean the DocumentNodeStore uses the FileBlobStore

No. The MongoBlobStore is configured implicitly in org.apache.jackrabbit.oak.plugins.document.DocumentMK.Builder#setMongoDB(com.mongodb.DB, int). So unless a BlobStore is explicitly configured, DocumentNodeStore would use MongoBlobStore (see the sketch below).

> Both the SegmentNodeStoreService and the DocumentNodeStoreService appear to check for a 'custom blob store' property, but neither component exposes such a property? And how would they select a specific BlobStore service?

Here there is an assumption that the system has only one BlobStore registered with the OSGi service registry. If multiple BlobStore services are registered, then you can specify a particular one by configuring 'blobStore.target' with the required OSGi service filter (section 112.6 of the OSGi Compendium).

Chetan Mehrotra
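A sketch of the default vs. explicit wiring (DocumentMK.Builder methods as in the 1.0 codebase; the connection details and paths are illustrative):

    import com.mongodb.DB;
    import com.mongodb.MongoClient;
    import org.apache.jackrabbit.oak.plugins.document.DocumentMK;
    import org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore;
    import org.apache.jackrabbit.oak.spi.blob.BlobStore;
    import org.apache.jackrabbit.oak.spi.blob.FileBlobStore;

    public class BlobStoreWiringSketch {
        public static void main(String[] args) throws Exception {
            DB db = new MongoClient("127.0.0.1", 27017).getDB("oak");

            // Default: setMongoDB() wires a MongoBlobStore internally.
            DocumentNodeStore withDefault = new DocumentMK.Builder()
                    .setMongoDB(db)
                    .getNodeStore();

            // Explicit: a configured BlobStore takes precedence over the fallback.
            BlobStore custom = new FileBlobStore("/path/to/blobstore");
            DocumentNodeStore withCustom = new DocumentMK.Builder()
                    .setMongoDB(db)
                    .setBlobStore(custom)
                    .getNodeStore();
        }
    }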
Re: NodeStore and BlobStore configurations in OSGi
On Mon, May 19, 2014 at 8:10 PM, Marc Pfaff pfa...@adobe.com wrote: SegmentNodeStore.getBlob() does not seem to be used when reading binaries through JCR

Yes, when reading via JCR the read is handled via the Blob itself, i.e. a SegmentBlob in this case. A SegmentBlob gets created upon JCR property access: SegmentNodeState.getProperty -> SegmentPropertyState.getValue -> SegmentPropertyState#getValue(Segment, RecordId, Type<T>)

Here there is an assumption that the system has only one BlobStore registered with the OSGi Service Registry. If multiple BlobStore services are registered then you can specify a particular one by configuring 'blobStore.target' to the required OSGi service filter (DS 112.6 of the OSGi Compendium). Assuming the same is true for NodeStore services.

I did not get the question. BTW I have updated the docs at [1] (the update should reflect on GitHub in a couple of hours).

Chetan Mehrotra
[1] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/blobstore.md
Re: How to activate a SecurityProvider
The SecurityProvider should get registered. Do you have Felix running with the WebConsole? What's the status of the 'org.apache.jackrabbit.oak.security.SecurityProviderImpl' component?

Chetan Mehrotra

On Sat, May 17, 2014 at 1:30 AM, Galo Gimenez galo.gime...@gmail.com wrote: Hello, I am setting up Oak on a Felix container, and the RepositoryManager reference to the SecurityProvider does not get satisfied. Looking at the documentation I do not see a way to fix this. I have noticed that the Sling project has a very different way to set up the repository. Should I follow that model, or is there something I am missing that makes the SecurityProvider service not register? -- Galo
Lucene blob size different in trunk and 1.0 branch
Hi, As part of [1] the Lucene blob size was changed to 16KB (from 32KB) to ensure that Lucene blobs are not made part of the FileDataStore when SegmentMK is used. However this revision was not merged to the 1.0 branch. This miss also affects the caching logic in the DataStore (OAK-1726), as there it was assumed that Lucene blobs would be less than 16KB, and hence it only cached binaries up to 16KB. However in the 1.0 branch the Lucene blobs are of size 32KB, which breaks this assumption, and the Lucene blobs would not be cached in memory. This can be fixed via the config setting 'maxCachedBinarySize'. Changing the size to 16KB now in 1.0 would cause upgrade issues. So should the change be reverted in trunk?

Chetan Mehrotra
[1] http://svn.apache.org/viewvc?view=revision&revision=r1587430
[2] http://svn.apache.org/viewvc/jackrabbit/oak/branches/1.0/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/OakDirectory.java?view=markup
Re: buildbot failure in ASF Buildbot on oak-trunk-win7
Failure in ObservationTest:

Tests run: 110, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 179.557 sec FAILURE!
observationDispose[4](org.apache.jackrabbit.oak.jcr.observation.ObservationTest) Time elapsed: 7.138 sec FAILURE!
java.lang.AssertionError
    at org.junit.Assert.fail(Assert.java:92)
    at org.junit.Assert.assertTrue(Assert.java:43)
    at org.junit.Assert.assertFalse(Assert.java:68)
    at org.junit.Assert.assertFalse(Assert.java:79)
    at org.apache.jackrabbit.oak.jcr.observation.ObservationTest.observationDispose(ObservationTest.java:467)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

Chetan Mehrotra

On Tue, May 13, 2014 at 12:10 PM, build...@apache.org wrote: The Buildbot has detected a new failure on builder oak-trunk-win7 while building ASF Buildbot. Full details are available at: http://ci.apache.org/builders/oak-trunk-win7/builds/67 Buildbot URL: http://ci.apache.org/ Buildslave for this Build: bb-win7 Build Reason: scheduler Build Source Stamp: [branch jackrabbit/oak/trunk] 1594128 Blamelist: chetanm BUILD FAILED: failed compile sincerely, -The Buildbot
Re: [VOTE] Release Apache Jackrabbit Oak 1.0.0
[X] +1 Release this package as Apache Jackrabbit Oak 1.0.0

All tests passed and all checks OK.

Chetan Mehrotra

On Mon, May 12, 2014 at 2:15 PM, Davide Giannella giannella.dav...@gmail.com wrote: [X] +1 Release this package as Apache Jackrabbit Oak 1.0.0 Davide
Re: svn commit: r1560611 - in /jackrabbit/oak/trunk: oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/mongomk/util/ oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/mongomk/ oak-jcr/sr
On Fri, Apr 25, 2014 at 11:04 PM, Jukka Zitting jukka.zitt...@gmail.com wrote: The credentials in any case need to be valid for the database that holds the repository, so I don't see why we couldn't use it for this purpose.

As per the docs, the database name tells the Mongo driver the name of the DB in which the user details are stored. Typically in SQL databases the admin user tables are managed in a dedicated schema; probably a similar scheme is followed on the Mongo side. Perhaps we can modify the logic to use the DB name present as part of the URL if no DB name is explicitly provided via 'oak.mongo.db'.

Chetan Mehrotra
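A sketch of the proposed fallback, using MongoClientURI from the Mongo Java driver (the system-property lookup for 'oak.mongo.db' is simplified here for illustration):

    import com.mongodb.MongoClientURI;

    public class MongoDbNameResolver {

        // Use the explicitly configured DB name if present, else fall back
        // to the database embedded in the Mongo URL, else a default.
        public static String resolveDbName(String mongoUrl, String defaultDb) {
            String explicit = System.getProperty("oak.mongo.db");
            if (explicit != null) {
                return explicit;
            }
            String fromUrl = new MongoClientURI(mongoUrl).getDatabase();
            return fromUrl != null ? fromUrl : defaultDb;
        }
    }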
[jira] [Updated] (JCR-3772) Local File cache is not reduced to zero size after specifying in configuration
[ https://issues.apache.org/jira/browse/JCR-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chetan Mehrotra updated JCR-3772:
-
Resolution: Fixed
Assignee: Chetan Mehrotra
Status: Resolved (was: Patch Available)

Applied the patch in http://svn.apache.org/r1589926

Local File cache is not reduced to zero size after specifying in configuration
--
Key: JCR-3772
URL: https://issues.apache.org/jira/browse/JCR-3772
Project: Jackrabbit Content Repository
Issue Type: Bug
Affects Versions: 2.7.5
Reporter: Shashank Gupta
Assignee: Chetan Mehrotra
Priority: Minor
Fix For: 2.8
Attachments: JCR-3772.patch

The local cache size is specified in repository.xml:

{noformat}
<DataStore class="org.apache.jackrabbit.aws.ext.ds.S3DataStore">
    <param name="config" value="${rep.home}/aws.properties"/>
    <param name="secret" value="123456789"/>
    <param name="minRecordLength" value="16384"/>
    <param name="cacheSize" value="68719476736"/>
    <param name="cachePurgeTrigFactor" value="0.95d"/>
    <param name="cachePurgeResizeFactor" value="0.85d"/>
    <param name="continueOnAsyncUploadFailure" value="false"/>
    <param name="concurrentUploadsThreads" value="10"/>
    <param name="asyncUploadLimit" value="100"/>
    <param name="uploadRetries" value="3"/>
</DataStore>
{noformat}

To disable the local cache, {{cacheSize}} is set to 0. Upon setting it to 0 and restarting, the expectation is that all files in the cache would be deleted and the local cache won't be used in any operation. The issue is that the local cache is not resetting to 0 size. *Workaround*: set it to 1.

-- This message was sent by Atlassian JIRA (v6.2#6252)
Adding ProviderType and ConsumerType annotation to interfaces in exported packages
As part of OAK-1741 I was changing the version of exported packages to 1.0.0. Looking at the interfaces which are part of the exported packages, I do not see usage of the ConsumerType/ProviderType annotations [1].

In brief and simple terms: interfaces which are expected to be implemented by users of the Oak API (like org.apache.jackrabbit.oak.plugins.observation.EventHandler) should be marked with the ConsumerType annotation. This enables the bnd tool to generate package import instructions with the stricter range [1.0,1.1). All other interfaces, which are supposed to be provided by Oak, should be marked with ProviderType. This enables bnd to generate the package import instructions with the relaxed range [1.0,2) for our API consumers. This would help us evolve the API more easily in the future.

Currently we have the following interfaces as part of exported packages [2]. Looking at the list I believe most are of ProviderType, i.e. provided by Oak and not required to be implemented by Oak API users. Some, like org.apache.jackrabbit.oak.plugins.observation.EventHandler, are of ConsumerType, as we require the API users to implement them.

Should we add the required annotations for the 1.0 release? If yes, can team members look into the list and set the right type?

Chetan Mehrotra
[1] https://github.com/osgi/design/raw/master/rfcs/rfc0197/rfc-0197-OSGiPackageTypeAnnotations.pdf
[2] https://issues.apache.org/jira/browse/OAK-1741?focusedCommentId=13979465#comment-13979465
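A minimal sketch of the distinction, using the bnd annotations from RFC 197 (the interface bodies are simplified placeholders, not the real Oak signatures):

    import aQute.bnd.annotation.ConsumerType;
    import aQute.bnd.annotation.ProviderType;

    // Implemented by users of the Oak API: importers get the strict
    // range [1.0,1.1), so adding a method is a breaking change for them.
    @ConsumerType
    public interface EventHandler {
        void handleEvent(Object event); // placeholder signature
    }

    // Provided by Oak itself: importers get the relaxed range [1.0,2)
    // and keep resolving across minor API additions.
    @ProviderType
    interface QueryIndex {
        double getCost(String statement); // placeholder signature
    }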
Oak CI notifications not coming
Hi, I was checking the CI status for Oak trunk and it seems builds are not getting run at [1] and [2]. Do we have to enable them somehow? Chetan Mehrotra [1] https://travis-ci.org/apache/jackrabbit-oak/builds [2] http://ci.apache.org/builders/oak-trunk/
Re: plugin.document not exported in OSGi bundle
The preferred approach is to instantiate it via OSGi configuration. So in your OSGi env, create a configuration for DocumentNodeStore with the pid 'org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService' [1]. This would activate the DocumentNodeStoreService [2] component, which would register a DocumentNodeStore against the NodeStore interface.

Chetan Mehrotra
[1] http://jackrabbit.apache.org/oak/docs/osgi_config.html#DocumentNodeStore
[2] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreService.java

On Tue, Apr 22, 2014 at 11:23 PM, Galo Gimenez galo.gime...@gmail.com wrote: Hello, I noticed org.apache.jackrabbit.oak.plugins.document.DocumentMK is not exported in the OSGi bundle. Is there a way to get Oak with the DocumentMK instantiated in OSGi? -- Galo
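A sketch of creating such a configuration programmatically via ConfigurationAdmin (the 'mongouri' and 'db' property names are assumptions here; the authoritative names are in the osgi_config docs [1]):

    import java.util.Dictionary;
    import java.util.Hashtable;

    import org.osgi.service.cm.Configuration;
    import org.osgi.service.cm.ConfigurationAdmin;

    public class DocumentNodeStoreConfigurer {

        public void configure(ConfigurationAdmin configAdmin) throws Exception {
            Configuration config = configAdmin.getConfiguration(
                    "org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService", null);
            Dictionary<String, Object> props = new Hashtable<String, Object>();
            props.put("mongouri", "mongodb://localhost:27017"); // assumed property name
            props.put("db", "oak");                             // assumed property name
            config.update(props);
            // The component then activates and registers a NodeStore service
        }
    }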
[jira] [Resolved] (JCR-3771) Pending async uploads fail to get uploaded on restart.
[ https://issues.apache.org/jira/browse/JCR-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chetan Mehrotra resolved JCR-3771.
--
Resolution: Fixed
Assignee: Chetan Mehrotra

Applied the patch in http://svn.apache.org/r1588850

Pending async uploads fail to get uploaded on restart.
Key: JCR-3771
URL: https://issues.apache.org/jira/browse/JCR-3771
Project: Jackrabbit Content Repository
Issue Type: Bug
Components: jackrabbit-data
Affects Versions: 2.7.5
Reporter: Shashank Gupta
Assignee: Chetan Mehrotra
Priority: Critical
Fix For: 2.8
Attachments: JCR-3771.patch

Steps to reproduce:
# Configure CachingDataStore to use S3Backend
# Upload a few large files to the repository. Note the ongoing uploads in AsyncUploadCache.
# Kill the server
# Start the server
# The ongoing uploads fail to get uploaded to S3
# The expectation is that all ongoing uploads should be synchronously uploaded to S3 on restart

-- This message was sent by Atlassian JIRA (v6.2#6252)
Review currently exported package version for 1.0 release
Hi Team, As part of OAK-1741 [1] I have captured details about the currently exported packages from the various bundles provided as part of Oak. Currently some packages are exported at 0.18, some at 0.16, and some are exported at the bundle version. Should we bump all of them to 1.0.0 for the 1.0 release and ensure they are consistent from there on? It would also be helpful to review the list once, i.e. check whether each package export is required; for example oak-solr-osgi exports quite a bit that is probably not required.

Chetan Mehrotra
[1] https://issues.apache.org/jira/browse/OAK-1741
Re: Using Lucene indexes for property queries
Should we let the user decide whether it's OK to use an asynchronous index for this case

+1 for that. It has been the case with JR2 (I may be wrong here). And when a user is searching for, say, some asset via DAM in Adobe CQ, he would be OK if the result is not for the latest head; a small lag should be acceptable. This would enable scenarios where traversal would be too costly and Lucene can still be used to provide the required results in far less time.

Chetan Mehrotra

On Mon, Apr 14, 2014 at 2:33 PM, Thomas Mueller muel...@adobe.com wrote: Hi, In theory, the Lucene index could be used quite easily. As far as I see, we would only need to change the cost function of the Lucene index (return a reasonable cost even if there is no full-text constraint). One problem might be: the Lucene index is asynchronous, and the user might expect the result to be up-to-date. The user knows this already for full-text constraints, but not for property constraints. Should we let the user decide whether it's OK to use an asynchronous index for this case? For example by specifying an option in the query (for example similar to the order by, at the very end of the query, option async)? So a query that can use an asynchronous index would look like this:

//*[@prop = 'x'] option async

or

//*[@prop = 'x'] order by @otherProperty option async

or

select [jcr:path] from [nt:base] as a where [prop] > 1 option async

Regards, Thomas

On 14/04/14 06:54, Chetan Mehrotra chetan.mehro...@gmail.com wrote: Hi, In JR2 I believe Lucene was used for all types of queries and not only for full text searches. In Oak we have our own property indexes for handling queries involving constraints on properties. This I believe provides a more accurate result, as it is built on top of the MVCC support, so the results obtained are consistent with the session state/revision. However this involves creating an index for the property to be queried. And the way property indexes are currently stored, they consume quite a bit of space (at least in DocumentNodeStore). In comparison, Lucene stores the index content in quite a compact form. In quite a few cases (like a user-choice-based query builder) it might not be known in advance which property the user would use. As we already have all string properties indexed in Lucene, would it be possible to use Lucene for performing such queries? Or allow the user to choose which type of index he wants to use depending on the use case.

Chetan Mehrotra
Re: svn commit: r1587286 - in /jackrabbit/oak/trunk: oak-core/pom.xml oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreService.java oak-parent/pom.xml
Hi Julian,

On Tue, Apr 15, 2014 at 12:39 AM, resc...@apache.org wrote:

- <Embed-Dependency>commons-dbcp,commons-pool,h2,json-simple</Embed-Dependency>
+ <Embed-Dependency>commons-dbcp,commons-pool,h2,json-simple,postgresql,db2,db2-license</Embed-Dependency>
  <Embed-Transitive>true</Embed-Transitive>

I believe this is a temporary change and would not be required for the final implementation? It would be helpful if we add a TODO/FIXME there so that we remember to remove this later.

Instead of embedding all such drivers/dbcp/pool etc. within oak-core it would be better to decouple them. For example, one approach can be:

1. Have a bundle which embeds commons-dbcp and the required dependencies. It would be responsible for registering a DataSource.
2. Driver bundles would be fragments with bundle #1 as host. With JDBC 4.0 the Driver classes are provided as part of META-INF/services/java.sql.Driver [1]. For such cases fragment bundles can be avoided by having #1 monitor for such drivers and register them programmatically.
3. DocumentNodeStoreService should only have a reference to a DataSource and use that.

Chetan Mehrotra
[1] http://docs.oracle.com/javase/7/docs/api/java/sql/DriverManager.html
Using Lucene indexes for property queries
Hi, In JR2 I believe Lucene was used for all types of queries and not only for full text searches. In Oak we have our own property indexes for handling queries involving constraints on properties. This I believe provides a more accurate result, as it is built on top of the MVCC support, so the results obtained are consistent with the session state/revision. However this involves creating an index for the property to be queried. And the way property indexes are currently stored, they consume quite a bit of space (at least in DocumentNodeStore). In comparison, Lucene stores the index content in quite a compact form. In quite a few cases (like a user-choice-based query builder) it might not be known in advance which property the user would use. As we already have all string properties indexed in Lucene, would it be possible to use Lucene for performing such queries? Or allow the user to choose which type of index he wants to use depending on the use case.

Chetan Mehrotra
Re: jackrabbit-oak build #4073: Errored
I'm sorry but your test run exceeded 50.0 minutes.

The build failure is due to a timeout.

Chetan Mehrotra

On Thu, Apr 10, 2014 at 11:49 AM, Travis CI ju...@apache.org wrote: Build Update for apache/jackrabbit-oak - Build: #4073 Status: Errored Duration: 3002 seconds Commit: a653c0f168842a5d9b1de8072fdcc5f6d216ad12 (trunk) Author: Chetan Mehrotra Message: OAK-1716 - Enable passing of a execution context to runTest in multi threaded runs Exposed a protected method `prepareThreadExecutionContext` which subclasses can override to return a context instance which would be used by that thread of execution git-svn-id: https://svn.apache.org/repos/asf/jackrabbit/oak/trunk@1586218 13f79535-47bb-0310-9956-ffa450edef68 View the changeset: https://github.com/apache/jackrabbit-oak/compare/2371ef73a4cd...a653c0f16884 View the full build log and details: https://travis-ci.org/apache/jackrabbit-oak/builds/22666739 -- sent by Jukka's Travis notification gateway
Re: Slow full text query performance and Lucene Index handling in Oak
On Wed, Apr 9, 2014 at 12:25 PM, Marcel Reutegger mreut...@adobe.com wrote: Since the Lucene index is in any case updated asynchronously, it should be fine for us to ignore the base NodeState of the current session and instead use an IndexSearcher based on the last state as updated by the async indexer. This would allow us to reuse the IndexSearcher over multiple queries. I was also wondering if it makes sense to share it across multiple sessions performing a query, to reduce the number of index readers that may be open at the same time. however, this will likely also reduce concurrency because we synchronize access to a single session.

I tried an approach where I used a custom SearcherManager based on the Lucene SearcherManager [2]. It obtains the root NodeState directly from the NodeStore. As the NodeStore can be accessed concurrently, it should not have any impact on session concurrency. With this change there is a slight improvement:

Oak-Tar          1 39 40 40 44 64 1459
Oak-Tar (shared) 1 32 33 34 36 61 1738

So it did not give much of a boost (at least with the approach taken). As I do not have much understanding of the Lucene internals, can someone review the approach taken [1] and see if there are any major issues with it?

Chetan Mehrotra
[1] https://issues.apache.org/jira/secure/attachment/12639366/OAK-1702-shared-indexer.patch
[2] https://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/search/SearcherManager.html
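For reference, a minimal sketch of the plain Lucene SearcherManager pattern the patch builds on (the OakDirectory/NodeStore wiring from the actual patch [1] is omitted):

    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.SearcherFactory;
    import org.apache.lucene.search.SearcherManager;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;

    public class SharedSearcher {

        private final SearcherManager manager;

        public SharedSearcher(Directory directory) throws Exception {
            // One manager per index, shared across queries and sessions
            manager = new SearcherManager(directory, new SearcherFactory());
        }

        public TopDocs search(Query query, int n) throws Exception {
            IndexSearcher searcher = manager.acquire();
            try {
                return searcher.search(query, n);
            } finally {
                manager.release(searcher); // ref-counted; never close it directly
            }
        }

        public void indexUpdated() throws Exception {
            manager.maybeRefresh(); // pick up segments written by the async indexer
        }
    }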
Re: Slow full text query performance and Lucene Index handling in Oak
On Wed, Apr 9, 2014 at 3:00 PM, Alex Parvulescu alex.parvule...@gmail.com wrote: - the patch assumes that there is and will be a single lucene index directly under the root node, which may not necessarily be the case. I agree this assumption holds now, but I would not introduce any changes that take away this flexibility.

That is not a problem per se, as the IndexReader starts with a count of 1, so it would never go to zero. The problem appears to be somewhere else: I modified the code to use a shared IndexSearcher and the native FSDirectory, and still the performance improvement was marginal. The problem occurs because org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndex#query [1] currently does an eager initialization of the cursor, while the testcase only fetches the first result. Compared to this, the JR2 version does a lazy evaluation. If I put a break in the loop (exit after the first result) the results are much better:

Oak-Tar (break, shared searcher, fs) 1 2 2 3 3 170 23204
Oak-Tar (break)                      1 5 5 5 6  90 10593
Jackrabbit                           1 4 4 5 6 231 11385

Now I am not sure if this is a problem with the use case taken, or whether the Lucene index cursor management should be improved, as in many cases there would be multiple results but the client code only makes use of the first few.

Chetan Mehrotra
[1] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndex.java#L381-L409
Re: Slow full text query performance and Lucene Index handling in Oak
On Wed, Apr 9, 2014 at 5:14 PM, Jukka Zitting jukka.zitt...@gmail.com wrote: Is that a common use case? To better simulate a normal usage scenario I'd make the benchmark fetch up to N results (where N is configurable, with default something like 20) and access the path and the title property of the matching nodes.

I changed the logic of the benchmark in http://svn.apache.org/r1585962. With that, JR2 slows down a bit:

# FullTextSearchTest  C min 10% 50% 90% max     N
Oak-Tar               1  34  35  36  39  60  1639
Jackrabbit            1   5   5   6   7  68 10038

Profiling the result shows that quite a bit of time goes into org.apache.lucene.codecs.compressing.LZ4.decompress() (40%). This I think is part of Lucene 4.x and not present in 3.x. Any idea if I can disable compression?

Chetan Mehrotra
Re: Slow full text query performance and Lucene Index handling in Oak
Current update:

1. Tommaso provided a patch (OAK-1702) to disable compression, and that also helps quite a bit.
2. Currently we are storing the full tokenized text in the Lucene index [1]. This causes fetching of the doc fields to be slower. On disabling the storage the numbers improve quite a bit. This was added as part of OAK-319 for supporting MLT.

# FullTextSearchTest      C min 10% 50% 90% max    N
Oak-Tar (codec)           1   9   9  10  12  41 5664
Oak-Tar (codec, mlt off)  1   7   8   8  10  21 6921

Would look further.

Chetan Mehrotra
[1] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/FieldFactory.java#L44

On Wed, Apr 9, 2014 at 7:15 PM, Alex Parvulescu alex.parvule...@gmail.com wrote: Aside from the compression issue, there was another one related to the 'order by' clause. I saw Collections.sort taking up as much as 23% of the perf. I removed the order by temporarily so it doesn't get in the way of the Lucene stuff, but I think the QueryEngine should skip ordering results in this case.

On Wed, Apr 9, 2014 at 3:31 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: I'm looking into the Lucene codecs right now. Tommaso

2014-04-09 15:20 GMT+02:00 Alex Parvulescu alex.parvule...@gmail.com: Profiling the result shows that quite a bit of time goes into org.apache.lucene.codecs.compressing.LZ4.decompress() (40%). This I think is part of Lucene 4.x and not present in 3.x. Any idea if I can disable compression? +1 I noticed that too, we should try to disable compression and compare results. alex

On Wed, Apr 9, 2014 at 3:16 PM, Chetan Mehrotra chetan.mehro...@gmail.com wrote: On Wed, Apr 9, 2014 at 5:14 PM, Jukka Zitting jukka.zitt...@gmail.com wrote: Is that a common use case? To better simulate a normal usage scenario I'd make the benchmark fetch up to N results (where N is configurable, with default something like 20) and access the path and the title property of the matching nodes.

I changed the logic of the benchmark in http://svn.apache.org/r1585962. With that, JR2 slows down a bit:

# FullTextSearchTest  C min 10% 50% 90% max     N
Oak-Tar               1  34  35  36  39  60  1639
Jackrabbit            1   5   5   6   7  68 10038

Profiling the result shows that quite a bit of time goes into org.apache.lucene.codecs.compressing.LZ4.decompress() (40%). This I think is part of Lucene 4.x and not present in 3.x. Any idea if I can disable compression?

Chetan Mehrotra
Slow full text query performance and Lucene Index handling in Oak
Hi, As part of OAK-1702 I have added a benchmark to compare the performance of full text query search with JR2. Based on the approach taken (which might be wrong) I get the following numbers for Apache Jackrabbit Oak 0.21.0-SNAPSHOT:

# FullTextSearchTest  C min 10% 50% 90% max     N
Oak-Mongo             1  58  71 101 119 287   610
Oak-Mongo-FDS         1  50  51  52  58 184  1106
Oak-Tar               1  39  40  40  44  64  1459
Oak-Tar-FDS           1  53  54  55  64 197  1030
Jackrabbit            1   4   4   5   6 231 11385

This shows that JR2 performs a lot better for full text queries, and subsequent queries are quite fast once Lucene has warmed up. Looking at the current usage of Lucene in Oak and the way we store and access the Lucene indexes [2], I have a couple of doubts:

1. Multiple IndexSearcher instances - The current impl would create a new IndexSearcher for every Lucene query, as the OakDirectory used is bound to the NodeState of the executing JCR session. Compared to this, in JR2 we probably had a singleton IndexSearcher which was shared across all query execution paths. This would potentially cause a performance issue, as Lucene is effectively used in a stateless way and has to perform initialization for every call. As per [3], the IndexSearcher must be shared.

2. Index access - Currently we have a custom OakDirectory which provides access to the Lucene indexes stored in the NodeStore. Even with SegmentStore, which has memory-mapped files, the random access used by Lucene would probably be a lot slower with OakDirectory in comparison to the default Lucene MMapDirectory. For small setups where the Lucene index can be accommodated on each node, I think it would be better if the index is accessed from the file system.

Are the above concerns valid, and should we relook at how we are using Lucene in Oak?

Chetan Mehrotra
[1] https://issues.apache.org/jira/browse/OAK-1702
[2] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/OakDirectory.java
[3] http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
[jira] [Updated] (JCR-3754) [jackrabbit-aws-ext] Add retry logic to S3 asynchronous failed upload
[ https://issues.apache.org/jira/browse/JCR-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chetan Mehrotra updated JCR-3754:
-
Resolution: Fixed
Status: Resolved (was: Patch Available)

Applied the patch in
# http://svn.apache.org/r1585459
# http://svn.apache.org/r1585460
# http://svn.apache.org/r1585461

[jackrabbit-aws-ext] Add retry logic to S3 asynchronous failed upload
-
Key: JCR-3754
URL: https://issues.apache.org/jira/browse/JCR-3754
Project: Jackrabbit Content Repository
Issue Type: Improvement
Components: jackrabbit-data
Affects Versions: 2.7.5
Reporter: Shashank Gupta
Assignee: Chetan Mehrotra
Fix For: 2.7.6
Attachments: JCR-3754.patch, JCR-3754V2.patch

Currently S3 asynchronous uploads are not retried after failure. Since the failed upload file is served from the local cache, this doesn't hamper datastore functionality. During the next restart all accumulated failed uploads are uploaded to S3 in a synchronized manner. There should be retry logic for failed S3 asynchronous uploads so that failed uploads do not accumulate.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (JCR-3760) FileDataStore: reduce synchronization
[ https://issues.apache.org/jira/browse/JCR-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chetan Mehrotra updated JCR-3760: - Fix Version/s: 2.7.6 FileDataStore: reduce synchronization - Key: JCR-3760 URL: https://issues.apache.org/jira/browse/JCR-3760 Project: Jackrabbit Content Repository Issue Type: Improvement Components: jackrabbit-data Reporter: Thomas Mueller Assignee: Thomas Mueller Fix For: 2.7.6 The FileDataStore uses the following synchronization: synchronized (this) { if (!file.exists()) { return null; } ... File.exists calls are very slow, it would be better if this check could be done outside of the synchronized block. I don't think this would cause any issues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (JCR-3764) Provide an option to disable use of inUseMap in FileDataStore
Chetan Mehrotra created JCR-3764:

Summary: Provide an option to disable use of inUseMap in FileDataStore
Key: JCR-3764
URL: https://issues.apache.org/jira/browse/JCR-3764
Project: Jackrabbit Content Repository
Issue Type: Improvement
Components: jackrabbit-data
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
Priority: Minor

The JR2 FileDataStore#inUseMap [1] is currently a synchronized map, and that at times causes contention in concurrent environments. This map is used for supporting the blob GC logic in JR2. When used in Oak, the GC logic does not rely on inUseMap. So to reduce overhead, FDS should provide an option to disable the use of this map.

[1] https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-data/src/main/java/org/apache/jackrabbit/core/data/FileDataStore.java#L118

-- This message was sent by Atlassian JIRA (v6.2#6252)
Re: svn commit: r1577449 - in /jackrabbit/oak/trunk/oak-core/src: main/java/org/apache/jackrabbit/oak/plugins/segment/ main/java/org/apache/jackrabbit/oak/plugins/segment/file/ main/java/org/apache/ja
On Wed, Apr 2, 2014 at 11:36 AM, Jukka Zitting jukka.zitt...@gmail.com wrote: I consider this an unfortunate recent development.

Not sure. There are some deployment scenarios where a shared FileDataStore is a must-have requirement, and thus we need to support cases where blobs can be stored separately from the node data. Yes, it adds to the complexity of backup, but if such a feature is required then that cost has to be paid. Default setups currently do not use a FileDataStore or BlobStore with SegmentNodeStore, so as per the defaults the original design is still honored.

Chetan Mehrotra
Re: svn commit: r1583285 - /jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/value/ValueImpl.java
On Wed, Apr 2, 2014 at 12:18 PM, Jukka Zitting jukka.zitt...@gmail.com wrote: The getContentIdentity() method has a specific contract and the return value should generally not be interpreted as a referenceable identifier.

Ack.

If you need a method that exposes the blobId, it would be best to add a separate method for that. But note that not all Blob implementations have a blobId like in BlobStoreBlob.

For now there is no strong requirement for that. If the need arises, I would follow up this way.

Chetan Mehrotra
Re: Question regarding missing _lastRev recovery - OAK-1295
The lease time is set to 1 minute. Would it be ok to check this every minute, from every node?

Adding to that, the default time intervals are:

- asyncDelay = 1 sec - the background operations are performed every 1 sec per cluster node. If nothing changes we would fire 1 query/sec/cluster node to check the head revision
- cluster lease time = 1 min - this is the time after which a cluster lease would be renewed

So we need to decide the time interval for the job detecting the recovery condition.

Chetan Mehrotra

On Wed, Apr 2, 2014 at 4:31 PM, Amit Jain am...@ieee.org wrote: Hi, 1) a cluster node starts up and sees it didn't shut down properly. I'm not sure this information is available, but remember we discussed this once. Yes, this case has been taken care of in the startup. this check could be done in the background operations thread on a regular basis. probably depending on the lease interval. The lease time is set to 1 minute. Would it be ok to check this every minute, from every node? Thanks Amit

On Wed, Apr 2, 2014 at 4:14 PM, Marcel Reutegger mreut...@adobe.com wrote: Hi, I think the recovery should be triggered automatically by the system when: 1) a cluster node starts up and sees it didn't shut down properly. I'm not sure this information is available, but remember we discussed this once. 2) a cluster node sees a lease timeout of another cluster node and initiates the recovery for the failed cluster node. this check could be done in the background operations thread on a regular basis. probably depending on the lease interval. In addition it would probably also be useful to have the recovery operation available as a command in oak-run. that way you can manually trigger it from the command line. WDYT? Regards Marcel

How do we expose the _lastRev recovery operation? This would need to check all the cluster nodes' info and run recovery for those nodes which need it. 1. We either have a scheduled job which checks all the nodes and runs the recovery. What should be the interval to trigger the job? 2. Or, if we want it run only when triggered manually, then expose an appropriate MBean. Thanks Amit
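A sketch of how such a recovery-check job could be scheduled (the recovery logic itself is a hypothetical Runnable here; only the lease-interval cadence is from the discussion above):

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class RecoveryCheckJob {

        private static final long LEASE_TIME_MS = TimeUnit.MINUTES.toMillis(1);

        // Checking once per lease interval bounds the detection delay to one
        // lease time without adding to the per-second background query load.
        public static void schedule(Runnable recoverExpiredLeases) {
            ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
            executor.scheduleAtFixedRate(recoverExpiredLeases,
                    LEASE_TIME_MS, LEASE_TIME_MS, TimeUnit.MILLISECONDS);
        }
    }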
Re: svn commit: r1577449 - in /jackrabbit/oak/trunk/oak-core/src: main/java/org/apache/jackrabbit/oak/plugins/segment/ main/java/org/apache/jackrabbit/oak/plugins/segment/file/ main/java/org/apache/ja
@Chetan: why would the configs not be stored in the repo? I do not see how this relates to non-OSGi environments

Well, that's the basic config required to configure DocumentNodeStore/SegmentNodeStore; such config cannot be stored as content. Other settings, like the security-related config, are currently not read from the NodeStore and in an OSGi env are provided by the OSGi ConfigAdmin. And yet other settings, like index definitions, are currently stored as content.

Chetan Mehrotra

On Wed, Apr 2, 2014 at 3:49 PM, Michael Marth mma...@adobe.com wrote: On 02 Apr 2014, at 08:06, Jukka Zitting jukka.zitt...@gmail.com wrote: That design gets broken if components start storing data separately in the repository folder. Agree with that design principle, but the (shared) file system DS is a valid exception IMO (same for the S3 DS). Later we would probably store the config files when using Oak outside of a std OSGi env, like with PojoSR. @Chetan: why would the configs not be stored in the repo? I do not see how this relates to non-OSGi environments
Re: jackrabbit-oak build #3994: Broken
Test case failure on oak-solr:

Failed tests:
testOffsetAndLimit(org.apache.jackrabbit.core.query.LimitAndOffsetTest): expected:<1> but was:<0>
testOffsetAndLimitWithGetSize(org.apache.jackrabbit.core.query.LimitAndOffsetTest): expected:<2> but was:<0>

Chetan Mehrotra

On Wed, Apr 2, 2014 at 6:04 PM, Travis CI ju...@apache.org wrote: Build Update for apache/jackrabbit-oak - Build: #3994 Status: Broken Duration: 2194 seconds Commit: 0e0a47ec387626e494a65dd143e3a25a3d004abe (trunk) Author: Julian Reschke Message: OAK-1533 - remove JDBC URL specific constructors from -core git-svn-id: https://svn.apache.org/repos/asf/jackrabbit/oak/trunk@1583981 13f79535-47bb-0310-9956-ffa450edef68 View the changeset: https://github.com/apache/jackrabbit-oak/compare/1402e7db17c0...0e0a47ec3876 View the full build log and details: https://travis-ci.org/apache/jackrabbit-oak/builds/22096647 -- sent by Jukka's Travis notification gateway
Re: svn commit: r1583994 - in /jackrabbit/oak/trunk/oak-core/src: main/java/org/apache/jackrabbit/oak/plugins/blob/datastore/OakFileDataStore.java test/java/org/apache/jackrabbit/oak/plugins/blob/data
On Wed, Apr 2, 2014 at 6:30 PM, Jukka Zitting jukka.zitt...@gmail.com wrote: The inUse map is in FileDataStore for a reason.

Ack. From what I have understood of the blob GC logic in Oak, it relies on the blob's last-modified value to distinguish actively used blobs. For performing GC, only those blobs would be considered whose lastModified value is older than, say, 1 day; only these blobs would be candidates for deletion. This ensures that any blob created in the transient space is not considered for GC. So the current logic does make the assumption that 1 day is sufficient time, and is hence not foolproof.

However the current impl of inUse would probably only work for a single-node system and would fail for the shared DataStore scenario, as it is in-memory state and it is hard to determine the inUse state for the whole cluster. For supporting such a case we would have to rely on the lastModified time interval to distinguish actively used blobs.

regards Chetan Mehrotra
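The rule described above boils down to a simple age check; a sketch (the 1 day window is the example value from the discussion, not a fixed constant in Oak):

    import java.util.concurrent.TimeUnit;

    public class BlobGcCandidateCheck {

        // A blob is a GC candidate only if it is older than the safety window,
        // so binaries still sitting in a transient space are never collected.
        public static boolean isGcCandidate(long lastModifiedMs, long nowMs) {
            long safetyWindowMs = TimeUnit.DAYS.toMillis(1); // example window
            return lastModifiedMs < nowMs - safetyWindowMs;
        }
    }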
Re: svn commit: r1583325 - in /jackrabbit/oak/trunk: oak-auth-external/pom.xml oak-core/pom.xml oak-jcr/pom.xml oak-mk-perf/pom.xml oak-mk/pom.xml oak-run/pom.xml oak-upgrade/pom.xml
Might be simpler to define the version in oak-parent.

Chetan Mehrotra

On Mon, Mar 31, 2014 at 6:57 PM, resc...@apache.org wrote:

Author: reschke
Date: Mon Mar 31 13:27:46 2014
New Revision: 1583325
URL: http://svn.apache.org/r1583325
Log: use the latest H2 DB throughout

Modified:
jackrabbit/oak/trunk/oak-auth-external/pom.xml
jackrabbit/oak/trunk/oak-core/pom.xml
jackrabbit/oak/trunk/oak-jcr/pom.xml
jackrabbit/oak/trunk/oak-mk-perf/pom.xml
jackrabbit/oak/trunk/oak-mk/pom.xml
jackrabbit/oak/trunk/oak-run/pom.xml
jackrabbit/oak/trunk/oak-upgrade/pom.xml

Modified: jackrabbit/oak/trunk/oak-auth-external/pom.xml
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-auth-external/pom.xml?rev=1583325&r1=1583324&r2=1583325&view=diff
==
Binary files - no diff available.

Modified: jackrabbit/oak/trunk/oak-core/pom.xml
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-core/pom.xml?rev=1583325&r1=1583324&r2=1583325&view=diff
==
--- jackrabbit/oak/trunk/oak-core/pom.xml (original)
+++ jackrabbit/oak/trunk/oak-core/pom.xml Mon Mar 31 13:27:46 2014
@@ -275,7 +275,7 @@
     <dependency>
       <groupId>com.h2database</groupId>
       <artifactId>h2</artifactId>
-      <version>1.3.158</version>
+      <version>1.3.175</version>
       <optional>true</optional>
     </dependency>
     <dependency>

Modified: jackrabbit/oak/trunk/oak-jcr/pom.xml
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-jcr/pom.xml?rev=1583325&r1=1583324&r2=1583325&view=diff
==
--- jackrabbit/oak/trunk/oak-jcr/pom.xml (original)
+++ jackrabbit/oak/trunk/oak-jcr/pom.xml Mon Mar 31 13:27:46 2014
@@ -294,7 +294,7 @@
     <dependency>
       <groupId>com.h2database</groupId>
       <artifactId>h2</artifactId>
-      <version>1.3.158</version>
+      <version>1.3.175</version>
       <scope>test</scope>
     </dependency>
     <dependency>

Modified: jackrabbit/oak/trunk/oak-mk-perf/pom.xml
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-mk-perf/pom.xml?rev=1583325&r1=1583324&r2=1583325&view=diff
==
--- jackrabbit/oak/trunk/oak-mk-perf/pom.xml (original)
+++ jackrabbit/oak/trunk/oak-mk-perf/pom.xml Mon Mar 31 13:27:46 2014
@@ -111,7 +111,7 @@
     <dependency>
       <groupId>com.h2database</groupId>
       <artifactId>h2</artifactId>
-      <version>1.3.158</version>
+      <version>1.3.175</version>
     </dependency>
     <dependency>
       <groupId>com.cedarsoft.commons</groupId>

Modified: jackrabbit/oak/trunk/oak-mk/pom.xml
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-mk/pom.xml?rev=1583325&r1=1583324&r2=1583325&view=diff
==
--- jackrabbit/oak/trunk/oak-mk/pom.xml (original)
+++ jackrabbit/oak/trunk/oak-mk/pom.xml Mon Mar 31 13:27:46 2014
@@ -114,7 +114,7 @@
     <dependency>
       <groupId>com.h2database</groupId>
       <artifactId>h2</artifactId>
-      <version>1.3.158</version>
+      <version>1.3.175</version>
       <optional>true</optional>
     </dependency>

Modified: jackrabbit/oak/trunk/oak-run/pom.xml
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-run/pom.xml?rev=1583325&r1=1583324&r2=1583325&view=diff
==
--- jackrabbit/oak/trunk/oak-run/pom.xml (original)
+++ jackrabbit/oak/trunk/oak-run/pom.xml Mon Mar 31 13:27:46 2014
@@ -142,7 +142,7 @@
     <dependency>
       <groupId>com.h2database</groupId>
       <artifactId>h2</artifactId>
-      <version>1.3.158</version>
+      <version>1.3.175</version>
     </dependency>
     <dependency>
       <groupId>org.mongodb</groupId>

Modified: jackrabbit/oak/trunk/oak-upgrade/pom.xml
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-upgrade/pom.xml?rev=1583325&r1=1583324&r2=1583325&view=diff
==
--- jackrabbit/oak/trunk/oak-upgrade/pom.xml (original)
+++ jackrabbit/oak/trunk/oak-upgrade/pom.xml Mon Mar 31 13:27:46 2014
@@ -95,7 +95,7 @@
     <dependency>
       <groupId>com.h2database</groupId>
       <artifactId>h2</artifactId>
-      <version>1.3.158</version>
+      <version>1.3.175</version>
       <scope>test</scope>
     </dependency>
     <dependency>
Re: svn commit: r1583325 - in /jackrabbit/oak/trunk: oak-auth-external/pom.xml oak-core/pom.xml oak-jcr/pom.xml oak-mk-perf/pom.xml oak-mk/pom.xml oak-run/pom.xml oak-upgrade/pom.xml
You can define the version in the dependencyManagement section and that would be inherited by the child projects; for example there are entries for junit, easymock etc. In a child project you then just define the groupId and artifactId.

Chetan Mehrotra

On Mon, Mar 31, 2014 at 7:22 PM, Julian Reschke julian.resc...@gmx.de wrote: On 2014-03-31 15:31, Chetan Mehrotra wrote: Might be simpler to define the version in oak-parent. Chetan Mehrotra ... Likely. I quickly looked at oak-parent and couldn't see any test dependencies over there, so decided to leave it alone for now... Best regards, Julian
Re: svn commit: r1583285 - /jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/value/ValueImpl.java
+1. I have also missed having a clean way to get the blobId from a Blob, so adding this method would be useful in other cases also.

Chetan Mehrotra

On Tue, Apr 1, 2014 at 8:05 AM, Jukka Zitting jukka.zitt...@gmail.com wrote: Hi, On Mon, Mar 31, 2014 at 3:25 PM, Michael Dürig mdue...@apache.org wrote: 2nd try: http://svn.apache.org/r1583413 That's more correct, but has horrible performance with any implementation (including BlobStoreBlob and SegmentBlob) that doesn't precompute the hash. As mentioned earlier, a better alternative would be to add an explicit method for this and let the implementations decide what the best identifier would be. For BlobStoreBlob that would likely be:

    public String getContentIdentity() {
        return blobId;
    }

And for SegmentBlob:

    public String getContentIdentity() {
        return getRecordId().toString();
    }

BR, Jukka Zitting
Re: [DocumentNodeStore] Clarify behaviour for Commit.getModified
I think the intention of the method is to return a value in seconds with a five second resolution.

Makes sense. Changed the logic to use seconds and also fixed the method names/constants to reflect that.

Chetan Mehrotra

On Fri, Mar 28, 2014 at 3:28 PM, Marcel Reutegger mreut...@adobe.com wrote: Hi, the fix looks good, but why do you want to convert the seconds to milliseconds again at the end? I think the intention of the method is to return a value in seconds with a five second resolution. we definitively need to add javadoc :-/ Regards Marcel

-Original Message- From: Chetan Mehrotra [mailto:chetan.mehro...@gmail.com] Sent: Donnerstag, 27. März 2014 18:05 To: oak-dev@jackrabbit.apache.org Subject: [DocumentNodeStore] Clarify behaviour for Commit.getModified

Hi, Currently Commit.getModified has the following impl:

    public static long getModified(long timestamp) {
        // 5 second resolution
        return timestamp / 1000 / 5;
    }

The result, when treated as a timestamp, causes the time to be set to 0, i.e. 1970. I intend to fix this with (looking at the comment):

    public static long getModified(long timestamp) {
        long timeInSec = TimeUnit.MILLISECONDS.toSeconds(timestamp);
        timeInSec = timeInSec - timeInSec % 5;
        return TimeUnit.SECONDS.toMillis(timeInSec);
    }

Would that be the correct approach?

Chetan Mehrotra
Re: jackrabbit-oak build #3922: Errored
Travis says I'm sorry but your test run exceeded 50.0 minutes. One possible solution is to split up your test run. - Chetan Mehrotra On Fri, Mar 28, 2014 at 4:49 PM, Travis CI ju...@apache.org wrote: Build Update for apache/jackrabbit-oak - Build: #3922 Status: Errored Duration: 3002 seconds Commit: 4690d169b8469689436d14e3cadfe8f56621f99f (trunk) Author: Marcel Reutegger Message: OAK-1341 - DocumentNodeStore: Implement revision garbage collection (WIP) JavaDoc git-svn-id: https://svn.apache.org/repos/asf/jackrabbit/oak/trunk@1582676 13f79535-47bb-0310-9956-ffa450edef68 View the changeset: https://github.com/apache/jackrabbit-oak/compare/67f784241281...4690d169b846 View the full build log and details: https://travis-ci.org/apache/jackrabbit-oak/builds/21746766 -- sent by Jukka's Travis notification gateway
Remove SynchronizedDocumentStoreWrapper
I see two similar classes:

- org.apache.jackrabbit.oak.plugins.document.rdb.SynchronizedDocumentStoreWrapper
- org.apache.jackrabbit.oak.plugins.document.util.SynchronizingDocumentStoreWrapper

These are not being used anywhere, and further I am not sure what purpose they would serve. Should they be removed? Also, can we implement such wrappers via dynamic proxies? Otherwise the work increases whenever new methods are added to DocumentStore.

Chetan Mehrotra
Re: Remove SynchronizedDocumentStoreWrapper
On Thu, Mar 27, 2014 at 2:30 PM, Julian Reschke julian.resc...@gmx.de wrote: I created the first one, and use it occasionally for debugging. (it's similar to the Logging*Wrapper. Please do not remove.

Okay, but there is already one present in util. Are they different? If not, we should keep only one.

Example? Something like:

    import java.lang.reflect.InvocationHandler;
    import java.lang.reflect.Method;
    import java.lang.reflect.Proxy;

    public class SynchronizedDocumentStoreWrapper2 {

        public static DocumentStore create(DocumentStore documentStore) {
            return (DocumentStore) Proxy.newProxyInstance(
                    SynchronizedDocumentStoreWrapper2.class.getClassLoader(),
                    new Class[] {DocumentStore.class},
                    new DocumentStoreProxy(documentStore));
        }

        private static class DocumentStoreProxy implements InvocationHandler {

            private final DocumentStore delegate;
            private final Object lock = new Object();

            private DocumentStoreProxy(DocumentStore delegate) {
                this.delegate = delegate;
            }

            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                // every DocumentStore call is funnelled through a single lock
                synchronized (lock) {
                    return method.invoke(delegate, args);
                }
            }
        }
    }

Chetan Mehrotra
Re: Remove SynchronizedDocumentStoreWrapper
On Thu, Mar 27, 2014 at 4:00 PM, Julian Reschke julian.resc...@gmx.de wrote: We can kill the one in rdb (I didn't see the other one when I added it). Would do Chetan Mehrotra
Re: AbstractBlobStoreTest
On Thu, Mar 27, 2014 at 10:08 PM, Julian Reschke julian.resc...@gmx.de wrote: if there's a reason not to

It might affect tests related to GC, as the GC logic would clean more than the expected set of blobs.

Chetan Mehrotra
[DocumentNodeStore] Clarify behaviour for Commit.getModified
Hi, Currently Commit.getModified has the following impl:

    public static long getModified(long timestamp) {
        // 5 second resolution
        return timestamp / 1000 / 5;
    }

The result, when treated as a timestamp, causes the time to be set to 0, i.e. 1970. I intend to fix this with (looking at the comment):

    public static long getModified(long timestamp) {
        long timeInSec = TimeUnit.MILLISECONDS.toSeconds(timestamp);
        timeInSec = timeInSec - timeInSec % 5;
        return TimeUnit.SECONDS.toMillis(timeInSec);
    }

Would that be the correct approach?

Chetan Mehrotra
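A quick worked example of the problem (the timestamp value is arbitrary):

    import java.util.Date;
    import java.util.concurrent.TimeUnit;

    public class GetModifiedDemo {

        public static void main(String[] args) {
            long timestamp = 1395792000000L;      // 2014-03-26T00:00:00Z
            long current = timestamp / 1000 / 5;  // 279158400
            // Read back as milliseconds this is ~1970-01-04, hence the 1970 dates
            System.out.println(new Date(current));

            long timeInSec = TimeUnit.MILLISECONDS.toSeconds(timestamp);
            timeInSec = timeInSec - timeInSec % 5; // round down to 5s resolution
            // Read back as seconds this is still in 2014
            System.out.println(new Date(TimeUnit.SECONDS.toMillis(timeInSec)));
        }
    }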
Re: jackrabbit-oak build #3838: Broken
I fixed that issue yesterday, but the build is currently failing in the rat check:

Warning: org.apache.xerces.parsers.SAXParser: Property 'http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit' is not recognized. [INFO] Rat check: Summary of files. Unapproved: 1 unknown: 1 generated: 0 approved: 1031 licence. [INFO] [INFO] Reactor Summary:

rat.txt points to two new files that get created under oak-core/oaknodes.trace.db, which look like H2 db files. Probably the testcase needs to be adjusted to create these files in the target folder.

Chetan Mehrotra

On Tue, Mar 25, 2014 at 6:20 PM, Chetan Mehrotra chetan.mehro...@gmail.com wrote: On Tue, Mar 25, 2014 at 5:49 PM, Michael Dürig mdue...@apache.org wrote: CacheInvalidationIT Looking into it Chetan Mehrotra
Re: jackrabbit-oak build #3838: Broken
Fixed the RDBDocumentStore to create the files in the target folder. However the current approach would cause issues in the test env; would start a separate thread on that.

Chetan Mehrotra

On Wed, Mar 26, 2014 at 12:52 PM, Chetan Mehrotra chetan.mehro...@gmail.com wrote: I fixed that issue yesterday, but the build is currently failing in the rat check: Warning: org.apache.xerces.parsers.SAXParser: Property 'http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit' is not recognized. [INFO] Rat check: Summary of files. Unapproved: 1 unknown: 1 generated: 0 approved: 1031 licence. [INFO] [INFO] Reactor Summary: rat.txt points to two new files that get created under oak-core/oaknodes.trace.db, which look like H2 db files. Probably the testcase needs to be adjusted to create these files in the target folder. Chetan Mehrotra

On Tue, Mar 25, 2014 at 6:20 PM, Chetan Mehrotra chetan.mehro...@gmail.com wrote: On Tue, Mar 25, 2014 at 5:49 PM, Michael Dürig mdue...@apache.org wrote: CacheInvalidationIT Looking into it Chetan Mehrotra
Re: Request for feedback: OSGi Configuration for Query Limits (OAK-1571)
Patch looks fine to me. Probably we can collapse QueryIndexProvider and QueryEngineSettings into a single QueryEngineContext and pass that along till Root.

So: is it worth it to have the 100 KB source code overhead just to make things configurable separately for each Oak instance?

I think there are a couple of benefits:

* Isolation between multiple Oak instances running in the same JVM (minor)
* It opens up the possibility of session-specific settings. If we later require, say, JR2-compatible behaviour for some sessions, those settings can be overlaid with session attributes
* It allows changing the settings at runtime via a GUI, as some of these settings would not require a repository restart and can affect the next query that gets executed. That would be a major win

So this effort now would enable incremental improvements in the QueryEngine in the future!

The Whiteboard is per Oak instance, right?

For the OSGi case, yes.

Chetan Mehrotra

On Wed, Mar 26, 2014 at 2:23 PM, Thomas Mueller muel...@adobe.com wrote: Hi, I'm trying to make some query settings (limits on the number of nodes read) configurable via OSGi. So far, I have a patch of about 100 KB, and this is just wiring together the components (no OSGi / Whiteboard so far). I wonder, is there an easier way to do it? With system properties, it's just a few lines of code. The disadvantage is that all Oak instances in the same JVM use the same settings, but with OSGi configuration I guess in reality it's not much different. So: is it worth it to have the 100 KB source code overhead just to make things configurable separately for each Oak instance? If not, how could it be implemented? The Whiteboard is per Oak instance, right? Regards, Thomas
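A hypothetical shape of the suggested QueryEngineContext (the class does not exist in Oak; QueryIndexProvider and QueryEngineSettings are the existing Oak types, package locations assumed):

    import org.apache.jackrabbit.oak.query.QueryEngineSettings;
    import org.apache.jackrabbit.oak.spi.query.QueryIndexProvider;

    // One object wiring the index provider and the mutable settings
    // through to Root instead of two separate parameters.
    public class QueryEngineContext {

        private final QueryIndexProvider indexProvider;
        private final QueryEngineSettings settings;

        public QueryEngineContext(QueryIndexProvider indexProvider,
                QueryEngineSettings settings) {
            this.indexProvider = indexProvider;
            this.settings = settings;
        }

        public QueryIndexProvider getIndexProvider() {
            return indexProvider;
        }

        // Settings are read per query, so a runtime config change takes
        // effect on the next executed query without a restart.
        public QueryEngineSettings getSettings() {
            return settings;
        }
    }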
Re: jackrabbit-oak build #3838: Broken
On Tue, Mar 25, 2014 at 5:49 PM, Michael Dürig mdue...@apache.org wrote: CacheInvalidationIT Looking into it Chetan Mehrotra
Re: jackrabbit-oak build #3809: Broken
My fault. Looking into it Chetan Mehrotra On Mon, Mar 24, 2014 at 11:58 AM, Travis CI ju...@apache.org wrote: Build Update for apache/jackrabbit-oak - Build: #3809 Status: Broken Duration: 444 seconds Commit: afb6c5335b46067a3ea43ce69c987a46d9a3fd38 (trunk) Author: Chetan Mehrotra Message: OAK-1586 - Implement checkpoint support in DocumentNodeStore Initial implementation which stores the checkpoint data as part of NODES collection -- Using Clock for determining current time to simplify testing -- Custom Clock can be specified via Builder git-svn-id: https://svn.apache.org/repos/asf/jackrabbit/oak/trunk@1580769 13f79535-47bb-0310-9956-ffa450edef68 View the changeset: https://github.com/apache/jackrabbit-oak/compare/6b79fda2e9f2...afb6c5335b46 View the full build log and details: https://travis-ci.org/apache/jackrabbit-oak/builds/21404793 -- sent by Jukka's Travis notification gateway
[jira] [Commented] (JCR-3754) [jackrabbit-aws-ext] Add retry logic to S3 asynchronous failed upload
[ https://issues.apache.org/jira/browse/JCR-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941445#comment-13941445 ]

Chetan Mehrotra commented on JCR-3754:
--

A couple of comments wrt the patch:

{code:java}
public interface AsyncUploadCallback {

    public void call(DataIdentifier identifier, File file, RESULT result,
            Map<String, Object> args);

    public static String EXCEPTION_ARG = "exceptionArg";

    public enum RESULT {
        /**
         * Asynchronous upload has succeeded.
         */
        SUCCESS,
        /**
         * Asynchronous upload has failed.
         */
        FAILED,
        /**
         * Asynchronous upload has been aborted.
         */
        ABORTED
    };
}
{code}

* As {{AsyncUploadCallback}} is part of the API it needs to be documented better. What is the purpose of file and identifier, and what is the callback used for?
* Using a generic argument map should be avoided. I would prefer if the callback is modelled on [FutureCallback|http://docs.guava-libraries.googlecode.com/git-history/v14.0/javadoc/com/google/common/util/concurrent/FutureCallback.html]. The current impl leads to an if-else mode of logic in call. Instead they should be in different methods.

{noformat}
+ */
+protected volatile Map<DataIdentifier, Integer> uploadRetryMap =
+        Collections.synchronizedMap(new HashMap<DataIdentifier, Integer>());
{noformat}

Use final instead of volatile.

{noformat}
+        if (asyncWriteCache.hasEntry(fileName, false)) {
+            synchronized (uploadRetryMap) {
+                Integer retry = uploadRetryMap.get(identifier);
+                if (retry == null) {
+                    retry = new Integer(1);
+                } else {
+                    retry++;
+                }
+                if (retry <= uploadRetries) {
+                    uploadRetryMap.put(identifier, retry);
+                    LOG.info("Retring [" + retry
+                        + "] times failed upload for dataidentifer ["
+                        + identifier + "]");
+                    try {
+                        backend.writeAsync(identifier, file, this);
+                    } catch (DataStoreException e) {
+
+                    }
+                } else {
+                    LOG.info("Retries [" + (retry - 1)
+                        + "] exhausted for dataidentifer ["
+                        + identifier + "]");
+                    uploadRetryMap.remove(identifier);
+                }
+            }
+        }
+    } catch (IOException ie) {
+        LOG.warn("Cannot retry failed async file upload. Dataidentifer ["
+            + identifier + "], file [" + file.getAbsolutePath()
+            + "]", ie);
+    }
{noformat}

* The DataStoreException is getting lost
* The logs should be warnings

[jackrabbit-aws-ext] Add retry logic to S3 asynchronous failed upload
-
Key: JCR-3754
URL: https://issues.apache.org/jira/browse/JCR-3754
Project: Jackrabbit Content Repository
Issue Type: Improvement
Components: jackrabbit-data
Affects Versions: 2.7.5
Reporter: Shashank Gupta
Assignee: Chetan Mehrotra
Fix For: 2.7.6
Attachments: JCR-3754.patch

Currently S3 asynchronous uploads are not retried after failure. Since the failed upload file is served from the local cache, this doesn't hamper datastore functionality. During the next restart all accumulated failed uploads are uploaded to S3 in a synchronized manner. There should be retry logic for failed S3 asynchronous uploads so that failed uploads do not accumulate.

-- This message was sent by Atlassian JIRA (v6.2#6252)
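A sketch of the FutureCallback-style shape suggested above (a hypothetical interface, not part of the patch; each outcome gets its own method instead of a RESULT enum plus an untyped argument map):

    import java.io.File;

    import org.apache.jackrabbit.core.data.DataIdentifier;

    public interface AsyncUploadResultCallback {

        void onSuccess(DataIdentifier identifier, File file);

        void onFailure(DataIdentifier identifier, File file, Throwable cause);

        void onAbort(DataIdentifier identifier, File file);
    }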
Re: friendly reminder about license headers
Roger that! For Java files the IDE takes care of them. Probably we can just exclude test/resources from the rat plugin? Most of the missing headers are probably reported there.

Chetan Mehrotra

On Thu, Mar 20, 2014 at 2:50 PM, Alex Parvulescu alex.parvule...@gmail.com wrote: Yes boys and girls, files need licence headers! Please check new files before committing them; in the last 2 days I found 3 occurrences, probably more than the entire last month put together. When in doubt, run your builds with the pedantic profile activated. (mvn clean install -PintegrationTesting,pedantic) your friendly release manager
[jira] [Assigned] (JCR-3754) [jackrabbit-aws-ext] Add retry logic to S3 asynchronous failed upload
[ https://issues.apache.org/jira/browse/JCR-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chetan Mehrotra reassigned JCR-3754:
-
Assignee: Chetan Mehrotra

[jackrabbit-aws-ext] Add retry logic to S3 asynchronous failed upload
-
Key: JCR-3754
URL: https://issues.apache.org/jira/browse/JCR-3754
Project: Jackrabbit Content Repository
Issue Type: Improvement
Components: jackrabbit-data
Affects Versions: 2.7.5
Reporter: Shashank Gupta
Assignee: Chetan Mehrotra
Fix For: 2.7.6
Attachments: JCR-3754.patch

Currently S3 asynchronous uploads are not retried after failure. Since the failed upload file is served from the local cache, this doesn't hamper datastore functionality. During the next restart all accumulated failed uploads are uploaded to S3 in a synchronized manner. There should be retry logic for failed S3 asynchronous uploads so that failed uploads do not accumulate.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (JCR-3755) Export S3DataStore package to enable osgi resolution
[ https://issues.apache.org/jira/browse/JCR-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chetan Mehrotra updated JCR-3755: - Affects Version/s: 2.7.5 Export S3DataStore package to enable osgi resolution Key: JCR-3755 URL: https://issues.apache.org/jira/browse/JCR-3755 Project: Jackrabbit Content Repository Issue Type: Improvement Components: jackrabbit-data Affects Versions: 2.7.5 Reporter: Amit Jain Assignee: Chetan Mehrotra Priority: Minor Fix For: 2.7.6 Attachments: JCR_3755.patch S3DataStore package org.apache.jackrabbit.ext.ds should be exported from the bundle so that osgi resolution is possible on bundles depending on this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (JCR-3755) Export S3DataStore package to enable osgi resolution
[ https://issues.apache.org/jira/browse/JCR-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chetan Mehrotra updated JCR-3755: - Fix Version/s: 2.7.6 Export S3DataStore package to enable osgi resolution Key: JCR-3755 URL: https://issues.apache.org/jira/browse/JCR-3755 Project: Jackrabbit Content Repository Issue Type: Improvement Components: jackrabbit-data Affects Versions: 2.7.5 Reporter: Amit Jain Assignee: Chetan Mehrotra Priority: Minor Fix For: 2.7.6 Attachments: JCR_3755.patch S3DataStore package org.apache.jackrabbit.ext.ds should be exported from the bundle so that osgi resolution is possible on bundles depending on this. -- This message was sent by Atlassian JIRA (v6.2#6252)