Export org.apache.jackrabbit.oak.cache package from oak-core (OAK-3598)

2015-11-08 Thread Chetan Mehrotra
For OAK-3092, oak-lucene would need to access classes from the
org.apache.jackrabbit.oak.cache package. For now it's limited to
CacheStats, to expose the cache-related statistics.

I have opened OAK-3598 for that. Kindly provide your feedback there
around whether it's fine to start exporting this package for consumption
by oak-lucene.
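
For reference, exporting the package would amount to adding it to oak-core's Export-Package instruction in the maven-bundle-plugin configuration. A sketch only, not the actual pom content (the existing export list is elided):

```xml
<Export-Package>
  ...,
  org.apache.jackrabbit.oak.cache
</Export-Package>
```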

Chetan Mehrotra


Re: Why does oak-core import every package with an optional resolution?

2015-10-26 Thread Chetan Mehrotra
Looking at the history of oak-core/pom.xml, this change was done in [1]
for OAK-1708, most likely to support loading of various DB drivers from
within oak-core, and was probably a temporary change that was never
revisited. It might not be required now, as the DataSource gets injected
and oak-core need not be aware of drivers etc. So we can get rid of
that.

@Julian - Thoughts?


Chetan Mehrotra
[1] 
https://github.com/apache/jackrabbit-oak/commit/7f844d5cde52dc53c41cc01aad9079afb275438a


On Mon, Oct 26, 2015 at 4:20 PM, Francesco Mari
<mari.france...@gmail.com> wrote:
> A friendly reminder of this issue. Is there a specific reason why
> every dependency in oak-core has an optional resolution?
>
> 2015-10-22 15:34 GMT+02:00 Francesco Mari <mari.france...@gmail.com>:
>> Hi,
>>
>> can somebody explain to me why oak-core has the "Import-Package"
>> directive set to "*;resolution:=optional"?
>>
>> The effect of this directive is that *every* imported package is
>> declared optional in the manifest file. Because of this, the OSGi
>> framework will always resolve the oak-core bundle, even if some of its
>> requirements are not satisfied. In particular, oak-core must always be
>> ready to cope with NoClassDefFoundError.
>>
>> We should definitely fix this.
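
For context, the directive in question is set through the maven-bundle-plugin in oak-core's pom.xml. Removing the blanket optional resolution would mean replacing the wildcard instruction with one that marks only genuinely optional imports. A sketch; the package names below (DB driver packages) are illustrative assumptions, not the actual list:

```xml
<!-- current: every import is declared optional -->
<Import-Package>*;resolution:=optional</Import-Package>

<!-- sketch: only the DB driver packages (illustrative names) stay optional -->
<Import-Package>
  com.mongodb.*;resolution:=optional,
  org.h2.*;resolution:=optional,
  *
</Import-Package>
```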


Re: svn commit: r1710162 - in /jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak: api/PropertyState.java api/Tree.java api/Type.java api/package-info.java plugins/memory/ModifiedNo

2015-10-23 Thread Chetan Mehrotra
On Fri, Oct 23, 2015 at 3:15 PM,  <f...@apache.org> wrote:
> public final class Type implements Comparable<Type> {
>
> -private static final Map<String, Type> TYPES = newHashMap();
> +private static final Map<String, Type> TYPES = new HashMap<String, 
> Type>();
>
>  private static  Type create(int tag, boolean array, String string) 
> {
>  Type type = new Type(tag, array, string);
> @@ -242,10 +237,23 @@ public final class Type implements Co
>
>  @Override
>  public int compareTo(@Nonnull Type that) {
> -return ComparisonChain.start()
> -.compare(tag, that.tag)
> -.compareFalseFirst(array, that.array)
> -.result();
> +if (tag < that.tag) {
> +return -1;
> +}
> +
> +if (tag > that.tag) {
> +return 1;
> +}
> +
> +if (!array && that.array) {
> +return -1;
> +}
> +
> +if (array && !that.array) {
> +return 1;
> +}
> +
> +return 0;
>  }

I am fine with removing the dependency on Guava from the API, but I am
not sure if we should remove it from the implementation side.

Also, it would be good to have a test case to validate the above logic,
given that we are removing a dependency on a tested Guava utility method.
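
Such a test could be sketched as follows: a standalone check of the ordering contract, with a simplified stand-in for the tag/array fields of Type (not the actual Oak class):

```java
// Sketch: validates the hand-written compareTo logic from the diff above.
// compare() is a stand-in for Type#compareTo, reduced to tag + array flag.
public class TypeCompareCheck {

    static int compare(int tag, boolean array, int otherTag, boolean otherArray) {
        if (tag < otherTag) return -1;
        if (tag > otherTag) return 1;
        // false sorts before true, matching Guava's compareFalseFirst
        if (!array && otherArray) return -1;
        if (array && !otherArray) return 1;
        return 0;
    }

    public static void main(String[] args) {
        check(compare(1, false, 2, false) < 0, "lower tag sorts first");
        check(compare(2, true, 1, false) > 0, "higher tag sorts last");
        check(compare(1, false, 1, true) < 0, "non-array sorts before array");
        check(compare(1, true, 1, true) == 0, "equal tag and array flag");
        System.out.println("all checks passed");
    }

    static void check(boolean condition, String message) {
        if (!condition) throw new AssertionError(message);
    }
}
```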

Chetan Mehrotra


Re: Reindexing problems

2015-10-21 Thread Chetan Mehrotra
> (a) Hardcode (not rely on the Whiteboard or OSGi) the known indexes

That would not work if the implementation makes use of OSGi features
like configuration or DI. For example, the Lucene implementation relies
on OSGi config and also exposes certain extension points.

> (b) Where we can't use hardcoding, use hard service references (Whiteboard / 
> OSGi).

+1. That would be preferable. I think we can go for the approach taken in
OAK-3201, as depending on the setup even a custom implementation might be
required. So hard references alone would not help; we would also need to
make the component which registers the repository aware of all its
*required* dependencies.

>  (c) If we can't do that, block or fail commits if one of the configured 
> indexes is not available, for example for the Solr index (if such an index is 
> configured).

+1. The current approach is problematic. A missing index provider is more
of a setup issue which can be addressed by the system admin; the
repository should not try to handle that. So failing the commit should be fine.

> Additionally, for "synchronous" indexes (property index and so on), I would 
> like to always create and reindex them asynchronously by default,

That might be tricky for DocumentNodeStore: even if you build them
asynchronously, when the final merge happens it might be very
expensive to deal with such a large branch commit. Also, for a critical
index like the uuid/reference index, it would be better if the system
did not start at all; otherwise a large traversal would be triggered if
no index was present or the previous revision of the index is not usable
(due to some corruption).
Chetan Mehrotra


On Wed, Oct 21, 2015 at 2:24 PM, Thomas Mueller <muel...@adobe.com> wrote:
> Hi,
>
> If an index provider is (temporarily) not available, the 
> MissingIndexProviderStrategy resets the index so it is re-indexed. This is a 
> problem (OAK-2024, OAK-2203, OAK-2429, OAK-3325, OAK-3366, OAK-3505, 
> OAK-3512, OAK-3513), because re-indexing is slow and one transaction. It can 
> also cause many threads to concurrently build the index. Currently, 
> synchronous indexes are built in one "transaction", which is anyway a 
> performance problem (for new indexes and reindexing). If an index is not 
> available when running a query, traversal is used, which is also a problem.
>
> What about:
>
> * (a) Hardcode (not rely on the Whiteboard or OSGi) the known indexes for 
> property, reference, nodeType, lucene, counter index. This is for both 
> writing (IndexEditor) and reading (QueryIndex) . That way, those indexes are 
> always available, and we never get into a situation where they are 
> temporarily not available.
>
> * (b) Where we can't use hardcoding, use hard service references (Whiteboard 
> / OSGi).
>
> * (c) If we can't do that, block or fail commits if one of the configured 
> indexes is not available, for example for the Solr index (if such an index is 
> configured).
>
> Additionally, for "synchronous" indexes (property index and so on), I would 
> like to always create and reindex them asynchronously by default, and only 
> once they are available switch to synchronous mode. I think (but I'm not sure) 
> this is OAK-1456.
>
> What do you think?
>
> Regards,
> Thomas
>


Re: jackrabbit-oak build #6619: Errored

2015-10-12 Thread Chetan Mehrotra
Past failures have the following different reasons:
- 2 failed with a timeout
- 1 failed with an OOM
- 1 failed due to an intermittent test failure in
QueryResultTest#testGetSize (OAK-2689)
- 1 failed in a Segment test, which looks like a new one. @Michael/Alex,
can you have a look?


Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.158
sec <<< FAILURE!
removeSome[0](org.apache.jackrabbit.oak.plugins.segment.CompactionMapTest)
 Time elapsed: 0.26 sec  <<< FAILURE!
java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:92)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertNull(Assert.java:551)
at org.junit.Assert.assertNull(Assert.java:562)
at 
org.apache.jackrabbit.oak.plugins.segment.CompactionMapTest.removeSome(CompactionMapTest.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
====
Chetan Mehrotra


On Tue, Oct 13, 2015 at 12:16 AM, Travis CI <ju...@apache.org> wrote:
> Build Update for apache/jackrabbit-oak
> -
>
> Build: #6619
> Status: Errored
>
> Duration: 3004 seconds
> Commit: 3e80198beb76e65934524d2467ce22e2cf0b7fe9 (1.0)
> Author: Chetan Mehrotra
> Message: OAK-3504 - CopyOnRead directory should not schedule a copy task for 
> non existent file
>
> Merging 1708105
>
>
> git-svn-id: 
> https://svn.apache.org/repos/asf/jackrabbit/oak/branches/1.0@1708108 
> 13f79535-47bb-0310-9956-ffa450edef68
>
> View the changeset: 
> https://github.com/apache/jackrabbit-oak/compare/ea3cd2c2bd5c...3e80198beb76
>
> View the full build log and details: 
> https://travis-ci.org/apache/jackrabbit-oak/builds/84919313
>
> --
> sent by Jukka's Travis notification gateway


Re: svn commit: r1704844 - in /jackrabbit/oak/trunk: oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/counter/jmx/ oak-core/src/main/java/org/apache/jackrabbit/oak/query/ oak-core/src/ma

2015-09-24 Thread Chetan Mehrotra
On Thu, Sep 24, 2015 at 1:56 PM, Thomas Mueller <muel...@adobe.com> wrote:
> what about getIndexCostInfo

+1

Chetan Mehrotra


Re: svn commit: r1704844 - in /jackrabbit/oak/trunk: oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/counter/jmx/ oak-core/src/main/java/org/apache/jackrabbit/oak/query/ oak-core/src/ma

2015-09-23 Thread Chetan Mehrotra
Hi Thomas,

On Wed, Sep 23, 2015 at 6:51 PM,  <thom...@apache.org> wrote:
>  /**
> + * Get the index cost. The query must already be prepared.
> + *
> + * @return the index cost
> + */
> +String getIndexCost();

Should this be returning a string? Maybe we should name it better.

Chetan Mehrotra


Re: svn commit: r1704655 - in /jackrabbit/oak/trunk/oak-core/src: main/java/org/apache/jackrabbit/oak/plugins/document/ test/java/org/apache/jackrabbit/oak/plugins/document/ test/java/org/apache/jackr

2015-09-22 Thread Chetan Mehrotra
Hi Marcel,

A short description of what the fix was would be very helpful for
future reference!
Chetan Mehrotra


On Tue, Sep 22, 2015 at 9:00 PM,  <mreut...@apache.org> wrote:
> Author: mreutegg
> Date: Tue Sep 22 15:30:08 2015
> New Revision: 1704655
>
> URL: http://svn.apache.org/viewvc?rev=1704655=rev
> Log:
> OAK-3433: Commit does not detect conflict when background read happens after 
> rebase
>
> Add yet another test, enable existing test and implement fix
>
> Modified:
> 
> jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStore.java
> 
> jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/LastRevRecoveryAgent.java
> 
> jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/UnsavedModifications.java
> 
> jackrabbit/oak/trunk/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/document/JournalTest.java
> 
> jackrabbit/oak/trunk/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/document/mongo/ClusterConflictTest.java
>
> Modified: 
> jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStore.java
> URL: 
> http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStore.java?rev=1704655=1704654=1704655=diff
> ==
> --- 
> jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStore.java
>  (original)
> +++ 
> jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStore.java
>  Tue Sep 22 15:30:08 2015
> @@ -2084,9 +2084,9 @@ public final class DocumentNodeStore
>  BackgroundWriteStats backgroundWrite() {
>  return unsavedLastRevisions.persist(this, new 
> UnsavedModifications.Snapshot() {
>  @Override
> -public void acquiring() {
> +public void acquiring(Revision mostRecent) {
>  if (store.create(JOURNAL,
> -
> singletonList(changes.asUpdateOp(getHeadRevision() {
> +singletonList(changes.asUpdateOp(mostRecent {
>  changes = JOURNAL.newDocument(getDocumentStore());
>  }
>  }
>
> Modified: 
> jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/LastRevRecoveryAgent.java
> URL: 
> http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/LastRevRecoveryAgent.java?rev=1704655=1704654=1704655=diff
> ==
> --- 
> jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/LastRevRecoveryAgent.java
>  (original)
> +++ 
> jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/LastRevRecoveryAgent.java
>  Tue Sep 22 15:30:08 2015
> @@ -235,7 +235,7 @@ public class LastRevRecoveryAgent {
>  unsaved.persist(nodeStore, new UnsavedModifications.Snapshot() {
>
>  @Override
> -public void acquiring() {
> +public void acquiring(Revision mostRecent) {
>  if (lastRootRev == null) {
>  // this should never happen - when unsaved has no 
> changes
>  // that is reflected in the 'map' to be empty - in 
> that
>
> Modified: 
> jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/UnsavedModifications.java
> URL: 
> http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/UnsavedModifications.java?rev=1704655=1704654=1704655=diff
> ==
> --- 
> jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/UnsavedModifications.java
>  (original)
> +++ 
> jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/UnsavedModifications.java
>  Tue Sep 22 15:30:08 2015
> @@ -159,7 +159,7 @@ class UnsavedModifications {
>  time = clock.getTime();
>  Map<String, Revision> pending;
>  try {
> -snapshot.acquiring();
> +snapshot.acquiring(getMostRecentRevision());
>  pending = Maps.newTreeMap(PathComparator.INSTANCE);
>  pending.putAll(map);
>  } finally {
> @@ -234,14 +234,26 @@ class UnsavedModifications {

Re: JCR node name and non space whitespace chars like line break etc

2015-09-16 Thread Chetan Mehrotra
Opened OAK-3412 to track this issue.
Chetan Mehrotra


On Tue, Sep 15, 2015 at 3:24 PM, Julian Reschke <julian.resc...@gmx.de> wrote:
> On 2015-09-15 10:03, Chetan Mehrotra wrote:
>>
>> If I change the code in Namespace#isValidLocalName to use
>> Character#isWhitespace, then NameValidator starts objecting to node
>> names having non-space whitespace chars. So it looks like the current
>> implementation has an issue and we should fix it. But fixing it would
>> now be a backward-incompatible change!
>>
>> Thoughts?
>> Chetan Mehrotra
>
>
> We should fix it in any case. It's compatible with the original code, and
> with Jackrabbit.
>
> Best regards, Julian


Re: JCR node name and non space whitespace chars like line break etc

2015-09-15 Thread Chetan Mehrotra
If I change the code in Namespace#isValidLocalName to use
Character#isWhitespace, then NameValidator starts objecting to node
names having non-space whitespace chars. So it looks like the current
implementation has an issue and we should fix it. But fixing it would
now be a backward-incompatible change!

Thoughts?
Chetan Mehrotra


On Mon, Sep 14, 2015 at 4:13 PM, Chetan Mehrotra
<chetan.mehro...@gmail.com> wrote:
> Micheal Durig mentioned that this has been discussed earlier.  So
> going back in time this was discussed in OAK-1891 and thread [1]. Oak
> used to prevent such nodes from getting created earlier but that logic
> was changed as part of OAK-1174 and r1582804 [0] and check was moved
> to NameValidator class (see Namespace#isValidLocalName).
>
> However when that change was done the check initially used
> Character#isWhitespace and switched to using Character#isSpaceChar
> which is limited to very few checks. Now looks like isSpaceChar is
> returning false for '\n', '\r' etc not sure why.
>
> Chetan Mehrotra
> [0] 
> https://github.com/apache/jackrabbit-oak/commit/342809f7f04221782ca6bbfbde9392ec4ff441c2
>
> [1] 
> http://mail-archives.apache.org/mod_mbox/jackrabbit-oak-dev/201406.mbox/%3ccab+dfin-smo5egc-m2ma_wwhar8eme+czwwdob1wjuvej+n...@mail.gmail.com%3E
>
>
> On Mon, Sep 14, 2015 at 3:28 PM, Michael Marth <mma...@adobe.com> wrote:
>> Hi Chetan,
>>
>> Given that JR2 did not allow those characters I see no good reason why Oak 
>> should.
>>
>> my2c
>> Michael
>>
>>
>>
>>
>> On 14/09/15 11:47, "Chetan Mehrotra" <chetan.mehro...@gmail.com> wrote:
>>
>>>Hi Team,
>>>
>>>While looking into OAK-3395 it was realized that in Oak we allow node
>>>name with non space whitespace chars like \t, \r etc. This is
>>>currently causing problem in DocumentNodeStore logic (which can be
>>>fixed).
>>>
>>>However it might be better to prevent such node name to be created as
>>>it can cause problem other. Specially when JR2 does not allow creation
>>>of such node names [1]
>>>
>>>So the question is
>>>
>>>Should Oak allow node names with non space whitespace chars like \t, \r etc
>>>
>>>Chetan Mehrotra
>>>[1] 
>>>https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-spi-commons/src/main/java/org/apache/jackrabbit/spi/commons/conversion/PathParser.java#L257


Re: JCR node name and non space whitespace chars like line break etc

2015-09-14 Thread Chetan Mehrotra
Michael Dürig mentioned that this has been discussed earlier. Going
back in time, this was discussed in OAK-1891 and thread [1]. Oak
used to prevent such nodes from getting created, but that logic
was changed as part of OAK-1174 and r1582804 [0], and the check was
moved to the NameValidator class (see Namespace#isValidLocalName).

However, when that change was done, the check initially used
Character#isWhitespace and was then switched to Character#isSpaceChar,
which covers far fewer characters: isSpaceChar returns false for '\n',
'\r' etc. because it only matches the Unicode space-separator
categories, not control characters.
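
The difference between the two checks can be demonstrated with plain JDK code (a standalone sketch, not Oak code):

```java
// Demonstrates why Character.isSpaceChar misses '\n', '\r', '\t':
// those are control characters, not Unicode space separators,
// while Character.isWhitespace covers both groups.
public class WhitespaceCheck {
    public static void main(String[] args) {
        char[] candidates = {' ', '\t', '\n', '\r'};
        for (char c : candidates) {
            System.out.printf("U+%04X isSpaceChar=%b isWhitespace=%b%n",
                    (int) c, Character.isSpaceChar(c), Character.isWhitespace(c));
        }
        // Only the plain space U+0020 is reported true by isSpaceChar;
        // isWhitespace reports true for all four.
    }
}
```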

Chetan Mehrotra
[0] 
https://github.com/apache/jackrabbit-oak/commit/342809f7f04221782ca6bbfbde9392ec4ff441c2

[1] 
http://mail-archives.apache.org/mod_mbox/jackrabbit-oak-dev/201406.mbox/%3ccab+dfin-smo5egc-m2ma_wwhar8eme+czwwdob1wjuvej+n...@mail.gmail.com%3E


On Mon, Sep 14, 2015 at 3:28 PM, Michael Marth <mma...@adobe.com> wrote:
> Hi Chetan,
>
> Given that JR2 did not allow those characters I see no good reason why Oak 
> should.
>
> my2c
> Michael
>
>
>
>
> On 14/09/15 11:47, "Chetan Mehrotra" <chetan.mehro...@gmail.com> wrote:
>
>>Hi Team,
>>
>>While looking into OAK-3395 it was realized that in Oak we allow node
>>name with non space whitespace chars like \t, \r etc. This is
>>currently causing problem in DocumentNodeStore logic (which can be
>>fixed).
>>
>>However it might be better to prevent such node name to be created as
>>it can cause problem other. Specially when JR2 does not allow creation
>>of such node names [1]
>>
>>So the question is
>>
>>Should Oak allow node names with non space whitespace chars like \t, \r etc
>>
>>Chetan Mehrotra
>>[1] 
>>https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-spi-commons/src/main/java/org/apache/jackrabbit/spi/commons/conversion/PathParser.java#L257


JCR node name and non space whitespace chars like line break etc

2015-09-14 Thread Chetan Mehrotra
Hi Team,

While looking into OAK-3395 it was realized that Oak allows node
names with non-space whitespace chars like \t, \r etc. This is
currently causing a problem in DocumentNodeStore logic (which can be
fixed).

However, it might be better to prevent such node names from being
created, as they can cause other problems, especially since JR2 does not
allow the creation of such node names [1].

So the question is:

Should Oak allow node names with non-space whitespace chars like \t, \r etc.?

Chetan Mehrotra
[1] 
https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-spi-commons/src/main/java/org/apache/jackrabbit/spi/commons/conversion/PathParser.java#L257


Re: Lucene Property Index and OR condition

2015-09-07 Thread Chetan Mehrotra
How did you perform the test? If you tested with 'explain', the
current code in 1.0/1.2 returns a misleading result; this got fixed in
trunk with OAK-2943. Technically, Oak would convert the OR clause to a
union query, and each part of the union should then be able to make
use of the index.
Chetan Mehrotra


On Mon, Sep 7, 2015 at 6:36 PM, Davide Giannella <dav...@apache.org> wrote:
> On 07/09/2015 14:32, Burkhard Pauli wrote:
>> ...
>> Question: Does the Lucene property index support or conditions? I tried
>> even with a oak property index without success.
>>
>
> I can be totally wrong, so please take this carefully, but AFAIR the
> lucene property index does not support ORs.
>
> This is mainly used for tests but should be valid for real-life as well
>
> https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/IndexPlanner.java#L452
>
> Davide
>
>


Re: svn commit: r1700720 - in /jackrabbit/oak/trunk/oak-lucene/src: main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexEditorContext.java test/java/org/apache/jackrabbit/oak/jcr/query

2015-09-02 Thread Chetan Mehrotra
Hi Tommaso,

On Wed, Sep 2, 2015 at 1:28 PM,  <tomm...@apache.org> wrote:
> +Analyzer definitionAnalyzer = definition.getAnalyzer();
> +Map<String, Analyzer> analyzers = new HashMap<String, 
> Analyzer>();
> +analyzers.put(FieldNames.SPELLCHECK, new 
> ShingleAnalyzerWrapper(LuceneIndexConstants.ANALYZER, 3));
> +Analyzer analyzer = new 
> PerFieldAnalyzerWrapper(definitionAnalyzer, analyzers);
> +IndexWriterConfig config = new IndexWriterConfig(VERSION, 
> analyzer);

Have a look at IndexDefinition#createAnalyzer, which already creates a
PerFieldAnalyzerWrapper. It would be better to move this logic there,
unless you want to customize the analyzer for that field only during
indexing.

Chetan Mehrotra


Re: [VOTE] Epics in Jira

2015-09-01 Thread Chetan Mehrotra
+1
Chetan Mehrotra


On Tue, Sep 1, 2015 at 1:10 PM, Davide Giannella <dav...@apache.org> wrote:
> Hello team,
>
> some of us noticed we lack the epics in our jira so I raised an issue
> asking whether that would be possible to have them.
>
> https://issues.apache.org/jira/browse/INFRA-10185
>
> had a reply (which TBH I didn't really understand completely). Feel free
> to follow-up on the issue itself if you require more details.
>
> Can we start a vote session for changing our jira schema to allow epics?
>
> My vote is +1.
>
> Cheers
> Davide
>
>


Re: New committer: Francesco Mari

2015-08-30 Thread Chetan Mehrotra
Welcome Francesco!
Chetan Mehrotra


On Fri, Aug 28, 2015 at 5:18 PM, Michael Dürig mdue...@apache.org wrote:
 Hi,

 Please welcome Francesco as a new committer and PMC member of the Apache
 Jackrabbit project. The Jackrabbit PMC recently decided to offer Francesco
 committership based on his contributions. I'm happy to announce that he
 accepted the offer and that all the related administrative work has now been
 taken care of.

 Welcome to the team, Francesco!

 Michael


Re: persistent set of strings

2015-08-24 Thread Chetan Mehrotra
Hi Tomek,

To start with, I think a flat-file-based approach should be fine. While
working on [1] it was observed that 2M blob ids consumed 500MB of
memory. As this logic is to be implemented in oak-run, it should
probably be fine for now to just use an in-memory HashSet.

Later, if that becomes a problem, we can think of some off-heap
solution. You can also look into using MVStore, which is being used in
DocumentNodeStore for the persistent cache.
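
A minimal sketch of option 1 in its simplest form: a hypothetical class (not an Oak one), one blob id per line, append-on-add, with an in-memory HashSet as the working copy, and no b-tree or bloom filter yet:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.Set;

// Sketch: a persistent, mutable set of strings backed by a flat text
// file, one entry per line. PersistentStringSet is a hypothetical name.
public class PersistentStringSet {
    private final Path file;
    private final Set<String> entries = new HashSet<>();

    public PersistentStringSet(Path file) throws IOException {
        this.file = file;
        if (Files.exists(file)) {
            // reload previously persisted ids into the in-memory set
            entries.addAll(Files.readAllLines(file, StandardCharsets.UTF_8));
        }
    }

    public synchronized boolean add(String id) throws IOException {
        if (!entries.add(id)) {
            return false; // already present, nothing to persist
        }
        Files.write(file, (id + "\n").getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return true;
    }

    public synchronized boolean contains(String id) {
        return entries.contains(id);
    }
}
```

A bloom filter or periodic compaction could be layered on top later if the file grows large.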

Chetan Mehrotra
[1] 
https://issues.apache.org/jira/browse/OAK-2882?focusedCommentId=14550198page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14550198


On Mon, Aug 24, 2015 at 5:17 PM, Tomek Rekawek reka...@adobe.com wrote:
 Hello,

 I started working on OAK-3148, which is a new feature that allows to 
 gradually migrate blobs from one store to another, without turning off the 
 instance. In order to create the SplitBlobStore I need a way to remember (and 
 save) already transferred blob ids.

 So, basically I need a persistent and mutable set of strings. Do we have 
 something like this in Oak already? I thought about a few custom solutions:

 1. Saving blob ids in a file (at the beginning it can be a flat text file, 
 then some b-tree), with a memory cache and/or bloom filter.
   - but it adds complexity, requires the maintenance, etc.
 2. Creating SegmentNodeStore, with bucketing via the hashcode
   - but running the second segment node store just to persist a bunch of ids 
 seems a little excessive.
 3. Custom cache solution, like ehcache
   - but adding a new, big library just to support this feature doesn’t seem 
 right as we have to deal with dependency versions, embedding, etc.

 So, maybe there is some lightweight and reliable “4” in the Oak already?

 Thanks,
 Tomek

 --
 Tomek Rękawek | Adobe Research | www.adobe.com
 reka...@adobe.com


Re: [Oak origin/trunk] Apache Jackrabbit Oak matrix - Build # 335 - Still Failing

2015-08-19 Thread Chetan Mehrotra
On Tue, Aug 18, 2015 at 8:12 PM, Michael Dürig mdue...@apache.org wrote:
 This is caused by MBeanIntegrationTest. Any idea what could have caused
 this? Chetan?

I have seen this before as well and was not able to reproduce it locally.

Looking again today I realized that it's a resource cleanup issue:
some test is not shutting down the repository properly, leaving MBeans
registered. The order in which tests get executed on CI differs from
local runs. It's a bit hard to reproduce the same order of execution, so
I had to create a suite and reduce the possible candidates to 2:

import org.junit.runner.RunWith;
import org.junit.runners.Suite;

@RunWith(Suite.class)
@Suite.SuiteClasses({SimpleRepositoryFactoryTest.class,
 MBeanIntegrationTest.class,
})
public class TestSuite
{
}

Executing that within the IDE or on the command line showed the issue:
SimpleRepositoryFactoryTest was not closing the created repository. Fixed
that now with rev http://svn.apache.org/r1696522. Hopefully this
should fix the issue!


Chetan Mehrotra


Re: [Oak origin/trunk] Apache Jackrabbit Oak matrix - Build # 337 - Still Failing

2015-08-19 Thread Chetan Mehrotra
The failure is again in the MBeanServerIntegration test. However, the
build was done at rev '63c4a4db95b0f39f9ab27f499416c99c918a4955' [1],
which is 2 days old and has not yet picked up my changes from the last
day. Let's watch a couple more runs until it fetches the new revision.

Btw, I find it strange that the build is based not on svn but on git, as
the git mirrors might lag behind!

Chetan Mehrotra
[1] 
https://github.com/apache/jackrabbit-oak/commit/63c4a4db95b0f39f9ab27f499416c99c918a4955


On Wed, Aug 19, 2015 at 9:35 PM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 The Apache Jenkins build system has built Apache Jackrabbit Oak matrix (build 
 #337)

 Status: Still Failing

 Check console output at 
 https://builds.apache.org/job/Apache%20Jackrabbit%20Oak%20matrix/337/ to view 
 the results.


Recipe for using Oak in standalone environments

2015-08-18 Thread Chetan Mehrotra
Hi,

Of late I have seen quite a few queries from people trying to use Oak
in non-OSGi environments, like embedding the repository in a webapp
deployed on Tomcat or in a standalone application. Our currently
documented way [1] is very limited: a repository constructed that way
does not expose the full power of the Oak stack, and the user also has
to know many internal setup details of Oak to get it working correctly.

Quite a bit of the setup logic in Oak now depends on OSGi
configuration. Trying to set up Oak without using it would not
provide a stable and performant setup. Instead, if we can have
a way to reuse all the OSGi-based setup support and still enable users
to use Oak in a non-OSGi env, that would provide a more stable setup
approach.

Recipe
==

For some time now I have been working on the oak-pojosr module [2]. This
module is now stable and can be used to set up Oak with all the OSGi
support in a non-OSGi world, like a webapp. For an end user the steps
required would be:

1. Create a config file for enabling various parts of Oak

{
  "org.apache.felix.jaas.Configuration.factory-LoginModuleImpl": {
    "jaas.controlFlag": "required",
    "jaas.classname":
      "org.apache.jackrabbit.oak.security.authentication.user.LoginModuleImpl",
    "jaas.ranking": 100
  },
  ...,
  "org.apache.jackrabbit.oak.jcr.osgi.RepositoryManager": {},
  "org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService": {
    "mongouri": "mongodb://${mongodb.host}:${mongodb.port}",
    "db": "${mongodb.name}"
  }
}

2. Add a dependency on oak-pojosr, which pulls in various transitive
dependencies like Felix Connect, SCR, ConfigAdmin etc.

3. Construct a repository instance

import org.apache.jackrabbit.commons.JcrUtils;

Map<String, String> config = new HashMap<String, String>();
config.put("org.apache.jackrabbit.repository.home", "/path/to/repo");
config.put("org.apache.jackrabbit.oak.repository.configFile",
        "/path/to/oak-config.json");

Repository repository = JcrUtils.getRepository(config);

That's all! This constructs a full Oak stack based on OSGi config,
with all Lucene and Solr support usable.

Examples


WebApp


I have adapted the existing Jackrabbit webapp module to work with Oak
and be constructed based on oak-pojosr [3]. You can check out the app
and just run:

 mvn jetty:run

Access the web UI at http://localhost:8080 and create a repository via
the UI. It currently has the following features:

1. The repository is configurable via a JSON file copied to the 'oak'
folder (default).

2. Felix WebConsole is integrated, allowing developers to view OSGi
state, config etc.
Check /osgi/system/console

3. Felix Script Console is integrated, to get programmatic access to the
repository.

4. All Oak MBeans are registered and can be used to perform
maintenance tasks.

Spring Boot


Clay has been working on an Oak-based application [4] which uses Spring
Boot [7]. The fork of the same at [5] now uses pojosr to configure
a repository to be used in Spring [6]. In addition, Felix WebConsole
etc. also work there.

To try it out, check out the application and build it. Then run the following command:

 java -jar target/com.meta64.mobile-0.0.1-SNAPSHOT.jar --jcrHome=oak 
 --jcrAdminPassword=password --aeskey=password --server.port=8990 
 --spring.config.location=classpath:/application.properties,classpath:/application-dev.properties

And then access the app at 8990 port

Proposal
===

Do share your feedback on the above proposed approach, in particular on
the following aspect:

Q - Should we make the oak-pojosr based setup one of the
recommended/supported approaches for configuring Oak in a non-OSGi env?

Chetan Mehrotra
[1] http://jackrabbit.apache.org/oak/docs/construct.html
[2] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-pojosr
[3] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-examples/webapp
[4] https://github.com/Clay-Ferguson/meta64
[5] https://github.com/chetanmeh/meta64/tree/oak-pojosr
[6] 
https://github.com/chetanmeh/meta64/blob/oak-pojosr/src/main/java/com/meta64/mobile/repo/OakRepository.java#L218
[7] http://projects.spring.io/spring-boot/


Re: [Oak origin/trunk] Apache Jackrabbit Oak matrix - Build # 331 - Still Failing

2015-08-16 Thread Chetan Mehrotra
There were 2 failures

-
Test Result (2 failures / -39)

org.apache.jackrabbit.oak.jcr.OrderableNodesTest.setPrimaryType[0]
org.apache.jackrabbit.oak.run.osgi.MBeanIntegrationTest.jmxIntegration
-

One of the failures was in PojoSR, which looks like it is due to the
repository shutdown not completing. Did a fix with OAK-3203. Hopefully
that should fix it.

--
Assertion failed:

assert mbeans.size() == 1
   |  |  |
   |  2  false
   
[org.apache.jackrabbit.oak.management.RepositoryManager[org.apache.jackrabbit.oak:name=repository
manager,type=RepositoryManagement,id=146],
org.apache.jackrabbit.oak.management.RepositoryManager[org.apache.jackrabbit.oak:name=repository
manager,type=RepositoryManagement,id=44]]

at 
org.codehaus.groovy.runtime.InvokerHelper.assertFailed(InvokerHelper.java:398)
at 
org.codehaus.groovy.runtime.ScriptBytecodeAdapter.assertFailed(ScriptBytecodeAdapter.java:646)
at 
org.apache.jackrabbit.oak.run.osgi.MBeanIntegrationTest.jmxIntegration(MBeanIntegrationTest.groovy:47)
--
Chetan Mehrotra


On Sat, Aug 15, 2015 at 10:39 AM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 The Apache Jenkins build system has built Apache Jackrabbit Oak matrix (build 
 #331)

 Status: Still Failing

 Check console output at 
 https://builds.apache.org/job/Apache%20Jackrabbit%20Oak%20matrix/331/ to view 
 the results.


Re: [Oak origin/trunk] Apache Jackrabbit Oak matrix - Build # 324 - Still Failing

2015-08-11 Thread Chetan Mehrotra
Builds are failing due to missing snapshot dependency. And as the CI
for Jackrabbit is not working
(https://builds.apache.org/job/Jackrabbit-trunk/) the snapshots are
not there. I am deploying snapshot builds from my system. Hopefully
that should get this build going

[ERROR] Failed to execute goal on project oak-blob-cloud: Could not
resolve dependencies for project
org.apache.jackrabbit:oak-blob-cloud:bundle:1.4-SNAPSHOT: The
following artifacts could not be resolved:
org.apache.jackrabbit:jackrabbit-jcr-commons:jar:2.11.0-SNAPSHOT,
org.apache.jackrabbit:jackrabbit-data:jar:2.11.0-SNAPSHOT,
org.apache.jackrabbit:jackrabbit-data:jar:tests:2.11.0-SNAPSHOT:
Failure to find
org.apache.jackrabbit:jackrabbit-jcr-commons:jar:2.11.0-SNAPSHOT in
http://repository.apache.org/snapshots was cached in the local
repository, resolution will not be reattempted until the update
interval of Nexus has elapsed or updates are forced - [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with
the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions,
please read the following articles:
[ERROR] [Help 1]
http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :oak-blob-cloud
Chetan Mehrotra


On Wed, Aug 12, 2015 at 9:50 AM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 The Apache Jenkins build system has built Apache Jackrabbit Oak matrix (build 
 #324)

 Status: Still Failing

 Check console output at 
 https://builds.apache.org/job/Apache%20Jackrabbit%20Oak%20matrix/324/ to view 
 the results.


Re: [Oak origin/1.0] Apache Jackrabbit Oak matrix - Build # 325 - Still Failing

2015-08-11 Thread Chetan Mehrotra
Now the compile passes but 55 tests fail, the majority of them from Solr.
Opened OAK-3215 to track that
Chetan Mehrotra


On Wed, Aug 12, 2015 at 10:38 AM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 The Apache Jenkins build system has built Apache Jackrabbit Oak matrix (build 
 #325)

 Status: Still Failing

 Check console output at 
 https://builds.apache.org/job/Apache%20Jackrabbit%20Oak%20matrix/325/ to view 
 the results.


Re: build failure due to oak-pojosr ??

2015-08-10 Thread Chetan Mehrotra
Looks like some issue with my recent work. Would have a look.

Thanks for filing the issue and marking current one as ignored!
Chetan Mehrotra


On Mon, Aug 10, 2015 at 2:02 PM, Angela Schreiber anch...@adobe.com wrote:
 hi

 i get the following failures in oak-pojosr. is it only me?
 anybody working on this?

 Running org.apache.jackrabbit.oak.run.osgi.LuceneSupportTest
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 11.447 sec <<< FAILURE!
 fullTextSearch(org.apache.jackrabbit.oak.run.osgi.LuceneSupportTest)  Time elapsed: 11.442 sec <<< ERROR!
 java.lang.reflect.UndeclaredThrowableException
 at com.sun.proxy.$Proxy14.shutdown(Unknown Source)
 at 
 org.apache.jackrabbit.api.JackrabbitRepository$shutdown.call(Unknown
 Source)
 at
 org.apache.jackrabbit.oak.run.osgi.AbstractRepositoryFactoryTest.tearDown(A
 bstractRepositoryFactoryTest.groovy:61)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:5
 7)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImp
 l.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod
 .java:45)
 at
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.
 java:15)
 at
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.j
 ava:42)
 at
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:36)
 at 
 org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:46)
 at org.junit.rules.RunRules.evaluate(RunRules.java:18)
 at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
 at
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.ja
 va:68)
 at
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.ja
 va:47)
 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
 at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
 at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
 at
 org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java
 :252)
 at
 org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provid
 er.java:141)
 at
 org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:
 112)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:5
 7)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImp
 l.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at
 org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(Reflec
 tionUtils.java:189)
 at
 org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(Provi
 derFactory.java:165)
 at
 org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFac
 tory.java:85)
 at
 org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBoot
 er.java:115)
 at
 org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
 Caused by: java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:5
 7)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImp
 l.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at
 org.apache.jackrabbit.oak.run.osgi.OakOSGiRepositoryFactory$RepositoryProxy
 .invoke(OakOSGiRepositoryFactory.java:396)
 ... 34 more
 Caused by: java.lang.IllegalStateException: Service already unregistered.
 at
 org.apache.felix.connect.felix.framework.ServiceRegistrationImpl.unregister
 (ServiceRegistrationImpl.java:128)
 at
 org.apache.jackrabbit.oak.osgi.OsgiWhiteboard$1.unregister(OsgiWhiteboard.j
 ava:75)
 at
 org.apache.jackrabbit.oak.spi.whiteboard.CompositeRegistration.unregister(C
 ompositeRegistration.java:43)
 at org.apache.jackrabbit.oak.Oak$6.close(Oak.java:640)
 at org.apache.commons.io.IOUtils.closeQuietly(IOUtils.java:303)
 at
 org.apache.jackrabbit.oak.jcr.repository.RepositoryImpl.shutdown(Repository
 Impl.java:308)
 ... 39 more

 Running org.apache.jackrabbit.oak.run.osgi.NodeStoreConfigTest
 Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.129 sec
 Running

Re: Copy jackrabbit-webapp module to Oak as oak-webapp

2015-08-06 Thread Chetan Mehrotra
 Is this an official part of Oak or rather an example?

For now it's an example until it's stable and we have a consensus on
whether the proposed approach should be the recommended approach for
configuring Oak in standalone cases.

 if it's an example we can put it to oak-example.

That should be fine. So should the final location be oak/oak-example/webapp ?

I can then move the current copy to that place. I have some local
commits now in my git-svn. Once I am done I would commit them to the
current place and then move them to the final place.

Would that be OK?



Chetan Mehrotra


Updating a single page on the site

2015-08-06 Thread Chetan Mehrotra
At times we need to modify only a single page on the site, and while
doing that we often deploy the whole site. The effort can be
reduced slightly by using the following approach

1. In the oak-doc module make the required changes in the markdown files
2. Run `mvn site-deploy -Dscmpublish.skipCheckin=true`
3. Then go to oak-doc/target/scmpublish-checkout
4. and directly commit the changed html files via svn commit

This is also documented under oak-doc/README.md [1] and has proved to
be a faster approach for me compared to building and deploying the
whole site

Chetan Mehrotra
[1] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-doc


Re: [VOTE] Release Apache Jackrabbit 2.10.2

2015-08-06 Thread Chetan Mehrotra
On Thu, Aug 6, 2015 at 2:21 PM, Angela Schreiber anch...@adobe.com wrote:
 i am fine...
 quite frankly i am surprised to see that there is no 2.10
 branch.

I think this was discussed earlier [1]. Looks like we would have to
revisit that decision and continue with stable/unstable releases

Chetan Mehrotra
[1] http://markmail.org/thread/p7k6lzbebgrgoz63


Copy jackrabbit-webapp module to Oak as oak-webapp

2015-08-05 Thread Chetan Mehrotra
Hi Team,

Currently we do not have a good example of how to run Oak properly
in a standalone environment. One good example is the
jackrabbit-webapp [1] module, which serves as a blueprint for any user
on how to embed Oak. Currently this module only enables running Oak
with the Segment store, and that too with the most basic setup.

I would like to modify this module to use oak-pojosr [2] to configure
the complete Oak stack as we do it in Sling. For that I would like to copy
this module to Oak under oak-webapp and then refactor it to run the
complete Oak stack.

Thoughts?

Chetan Mehrotra
[1] https://github.com/apache/jackrabbit/tree/trunk/jackrabbit-webapp
[2] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-pojosr


Re: Copy jackrabbit-webapp module to Oak as oak-webapp

2015-08-05 Thread Chetan Mehrotra
On Thu, Aug 6, 2015 at 12:16 AM, Davide Giannella dav...@apache.org wrote:
 Will then mean it will work as http API for
 Oak? I'm not familiar with jackrabbit-webapp

jackrabbit-webapp demonstrates a way to configure a Jackrabbit
repository in a standalone env and have it running in a webapp. It also
configures the webdav servlet and JCR remoting, which work with any
repository implementation and thus should work with Oak.

Chetan Mehrotra


Re: 1.3.4 cut

2015-08-03 Thread Chetan Mehrotra
Hi Davide,

It would be helpful if, while moving the bugs to the next version, we do
not add any comment like 'Bulk Move to xxx'. This would reduce the
unnecessary noise in the bug comment history.

Recently this was discussed on DL at [1]

Chetan Mehrotra
[1] http://markmail.org/thread/2jvphlkdw4eqaxdh


On Mon, Aug 3, 2015 at 11:38 AM, Davide Giannella dav...@apache.org wrote:
 Good morning team,

 today I'd like to cut 1.3.4. Ideally around 10AM CEST.

 We have 46 issues left and none marked as blocker.

 https://issues.apache.org/jira/issues/?jql=project%20%3D%20OAK%20AND%20fixVersion%20%3D%201.3.4%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC

 If you want any issue to be included in the cut, as usual simply say it.

 Please act on your issues if resolved or not and I'll bulk move all the
 others to 1.3.5.

 Cheers
 Davide

 PS: sorry for short notice. Just back from holidays and was looking at
 my agenda :)




Re: JCR + MongoDb + Lucene Holy Grail Finally Found

2015-08-02 Thread Chetan Mehrotra
Thanks Clay for putting this together. The current documentation is not
good for standalone usage, as quite a bit of the logic of configuring Oak
is based on OSGi. Due to that, using Oak as-is in a standalone
environment is tricky

The oak-pojosr [1] module was intended to enable use of Oak with all its OSGi
based config in a non-OSGi env, like say a war deployment. Need to get some
time to finish it and adapt the standalone web example to use that

Chetan Mehrotra
[1] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-pojosr


On Sun, Aug 2, 2015 at 11:26 PM, Clay Ferguson wcl...@gmail.com wrote:
 Fellow Jackrabbits,

 I discovered it was a *huge* effort and there is a *huge* lack in the
 examples related to simply getting MongoDb up and running (as JCR) with
 Lucene indexes getting properly used. Since this effort took me 2 solid days
 and there are no great examples that come up via google i'm sharing my
 example:

 This code creates a Full Text Search on jcr:content, and a sorting
 capability on jcr:lastModified:

 https://github.com/Clay-Ferguson/meta64/blob/master/src/main/java/com/meta64/mobile/repo/OakRepository.java

 I also just updated meta64 project to be using the 1.0.18 branch of the
 Jackrabbit code, so it's all up to date stuff. I would highly recommend
 adding this or a similar example right onto the Lucene page of the Oak docs,
 because what I'm doing is exactly what everyone else wants, and the
 documentation itself is just completely confusing and mind boggling without
 a real example.

 Cheers, and happy jackrabbiting.

 Best regards,
 Clay Ferguson
 wcl...@gmail.com
 meta64.com



Re: [VOTE] Release Apache Jackrabbit Oak 1.0.18

2015-07-31 Thread Chetan Mehrotra
On Fri, Jul 31, 2015 at 5:39 PM, Julian Reschke
julian.resc...@greenbytes.de wrote:
 Tests run: 9, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.595 sec <<< FAILURE!

 copyOnWriteAndLocks(org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditorTest)
 Time elapsed: 0.182 sec <<< ERROR!
 org.apache.jackrabbit.oak.api.CommitFailedException: OakLucene0003: Failed
 to index the node /test

That's a known issue on Windows (OAK-3072). Not a blocker for this release

Chetan Mehrotra


Re: [VOTE] Release Apache Jackrabbit Oak 1.0.18

2015-07-31 Thread Chetan Mehrotra
+1
Chetan Mehrotra


On Fri, Jul 31, 2015 at 3:44 PM, Alex Parvulescu
alex.parvule...@gmail.com wrote:
 [X] +1 Release this package as Apache Jackrabbit Oak 1.0.18

 On Fri, Jul 31, 2015 at 11:01 AM, Amit Jain am...@apache.org wrote:

 A candidate for the Jackrabbit Oak 1.0.18 release is available at:

 https://dist.apache.org/repos/dist/dev/jackrabbit/oak/1.0.18/

 The release candidate is a zip archive of the sources in:


 https://svn.apache.org/repos/asf/jackrabbit/oak/tags/jackrabbit-oak-1.0.18/

 The SHA1 checksum of the archive is
 ef8e68edfef9b0c1470fe9de4d10d127f741633b.

 A staged Maven repository is available for review at:

 https://repository.apache.org/

 The command for running automated checks against this release candidate is:

 $ sh check-release.sh oak 1.0.18
 ef8e68edfef9b0c1470fe9de4d10d127f741633b

 Please vote on releasing this package as Apache Jackrabbit Oak 1.0.18.
 The vote is open for the next 72 hours and passes if a majority of at
 least three +1 Jackrabbit PMC votes are cast.

 [ ] +1 Release this package as Apache Jackrabbit Oak 1.0.18
 [ ] -1 Do not release this package because...

 My vote is +1

 Thanks
 Amit



Do not add comments when bulk moves are performed in JIRA

2015-07-29 Thread Chetan Mehrotra
Hi Team,

Currently most of the issues scheduled for 1.3.x release have comments
like 'Bulk Move to xxx'. This creates unnecessary noise in the comment
log. Would it be possible to move the issues to next version silently
i.e. just get fix version changed and not add any comment

Chetan Mehrotra


Re: [discuss] Near real time search to account for latency in background indexing

2015-07-24 Thread Chetan Mehrotra
On Fri, Jul 24, 2015 at 12:15 PM, Michael Marth mma...@adobe.com wrote:
 From your description I am not sure how the indexing would be triggered for 
 local changes. Probably not through the Async Indexer (this would not gain us 
 much, right?). Would this be a Commit Hook?

My thought was to use an Observer so as to not add cost to the commit
call. The Observer would listen only for local changes and would invoke
IndexUpdate on the diff

Chetan Mehrotra


Re: [discuss] Near real time search to account for latency in background indexing

2015-07-24 Thread Chetan Mehrotra
On Fri, Jul 24, 2015 at 2:40 PM, Amit Jain am...@ieee.org wrote:
 Well that would work for a single node deployment when TarMK is used
 but would still have a lag as based on frequency of AsyncIndexer which
 we are seeing is having delays of upto 10-20 sec and may vary. For
 cluster where indexing is happening on a single node it cannot be
 used.


 But wouldn't that be a problem for the in-memory index also?

Nopes. The in-memory index would be managed on each cluster node and
has visibility to *local* changes happening on that cluster node. So
it would certainly not provide a cluster-wide real time search. But it
helps in those cases where the user is performing changes via his session
established with a single cluster node and is not able to see the effect
of his latest changes. We have seen some problems reported so far, so
we can wait further to see if the problem affects more usecases and
then decide to invest in such a feature!

Chetan Mehrotra


Re: [discuss] Near real time search to account for latency in background indexing

2015-07-24 Thread Chetan Mehrotra
Hi Ian,

To be clear, the in-memory index is purely ephemeral and is not meant
to be persisted. It just complements the persistent index to allow
access to recently added/modified entries. So now to your queries

 How will you deal with JVM failure ?
Do nothing. The index as explained is transient. The current AsyncIndex
would anyway be performing the usual indexing and is resilient enough

 How frequently will commits to the persisted index be performed ?
This index lives separately. The persisted index managed by AsyncIndex works as is

 I assume that switching to use ElasticSearch, which delivers NRT reliably
in the 0.1s range has been rejected as an option ?

No. The problem here is a bit different. Lucene indexes are being used
for all sorts of indexing currently in Oak. In many cases they are being
used as a pure property index. ES makes sense mostly for a global
fulltext index and would be overkill for the smaller, more focused
property-index types of usecases.

 If it has, you may find yourself implementing much of the core of
ElasticSearch to make NRT work properly in a cluster.

Again, the usecase here is not to support NRT as is. Current indexing would
work as is and this transient index would complement it.
Chetan Mehrotra


On Fri, Jul 24, 2015 at 1:01 PM, Ian Boston i...@tfd.co.uk wrote:
 Hi Chetan,

 The overall approach looks ok.

 Some questions about indexing.

 How will you deal with JVM failure ?
 and related.
 How frequently will commits to the persisted index be performed ?

 I assume that switching to use ElasticSearch, which delivers NRT reliably
 in the 0.1s range has been rejected as an option ?

 If it has, you may find yourself implementing much of the core of
  ElasticSearch to make NRT work properly in a cluster.

 Best Regards
 Ian


 On 24 July 2015 at 08:09, Chetan Mehrotra chetan.mehro...@gmail.com wrote:

 On Fri, Jul 24, 2015 at 12:15 PM, Michael Marth mma...@adobe.com wrote:
  From your description I am not sure how the indexing would be triggered
 for local changes. Probably not through the Async Indexer (this would not
 gain us much, right?). Would this be a Commit Hook?

 My thought was to use an Observor so as to not add cost to commit
 call. Observor would listen only for local changes and would invoke
 IndexUpdate on the diff

 Chetan Mehrotra



Re: [discuss] Near real time search to account for latency in background indexing

2015-07-24 Thread Chetan Mehrotra
On Fri, Jul 24, 2015 at 2:19 PM, Tommaso Teofili
tommaso.teof...@gmail.com wrote:
 I think it'd be possible though to make use of Lucene's NRT capability by
 changing a bit the code that creates an IndexReader [2] to use
 DirectoryReader#open(IndexWriter,boolean) [3].

Well that would work for a single-node deployment when TarMK is used,
but would still have a lag based on the frequency of the AsyncIndexer,
which we are seeing having delays of up to 10-20 sec (and may vary). For a
cluster where indexing is happening on a single node it cannot be
used.

Chetan Mehrotra


Re: [discuss] Near real time search to account for latency in background indexing

2015-07-24 Thread Chetan Mehrotra
On Fri, Jul 24, 2015 at 12:56 PM, Michael Marth mma...@adobe.com wrote:
 Would the indexer need to be Lucene-based?

Need not be. The reason I preferred using Lucene is that the current
property index only supports single-condition evaluation. Having a
Lucene based impl would allow most of the current JCR Query -> Lucene
Query mapping logic to be reused as is.

But yes, it would be an interesting approach where instead of doing
this at the Lucene query index level the QueryEngine can work with both
indexes and combine the results. So an aspect to keep in mind

Chetan Mehrotra


Re: Cleanup Callback to IndexEditor in case of exception in processing

2015-07-23 Thread Chetan Mehrotra
On Thu, Jul 23, 2015 at 4:19 PM, Davide Giannella dav...@apache.org wrote:
 So to have a callback that is always invoked either on
 success or failure.

For now just a callback for failure is missing. The editors currently
perform the required cleanup anyway when leaving the root node, which kind
of acts like a success callback. If we can get both then much better!

Chetan Mehrotra


[discuss] Near real time search to account for latency in background indexing

2015-07-23 Thread Chetan Mehrotra
Hi Team,

As the use of async indexes like lucene is growing we need to
account for the delay in showing updated results due to the async nature
of indexing. Depending on system load the async indexer might lag behind
the latest state by some margin. We have improved quite a bit in terms
of performance but by design there would be a lag, and with load that
lag would increase at times.

For e.g. a typical flow in content authoring involves the user
uploading some asset to the application. After uploading the asset he
goes to the authoring view and looks for that uploaded asset via a
content-finder kind of UI. That UI relies on a query to show the
available assets. Due to the delay introduced by the async indexer it
would take some time (10-15 sec) for the asset to show up

To account for that we can go for near real time (NRT*) in-memory
indexing which would complement the actual persisted async indexer and
would exploit the fact that requests from the same user in a given session
would most likely hit the same cluster node.

Below is a brief proposal - this would require changes in layers above
Oak but for now the focus is on feasibility.

Proposal
===

A - Indexing Side
--

The Lucene index can be configured to support an NRT mode. If this mode
is enabled then on each cluster node we would perform AsyncIndex only
for local changes. For such an indexer the LuceneIndexEditor would use a
RAMDirectory. This directory would only have *recently* modified/added
documents.

B - Query Side
---

On Query side the LucenePropertyIndex would perform search against two
IndexSearcher

1. IndexSearcher based on persisted OakDirectory
2. IndexSearcher obtained from the current active IndexWriter used with
the RAMDirectory

Query would be performed against both and a merged cursor [2] would be
returned back
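A toy, stdlib-only sketch of the merged-cursor idea described in (B) (names like mergedCursor are hypothetical, not Oak API): rows from the persisted searcher and the in-memory searcher are concatenated, de-duplicating by path so a document present in both indexes is returned only once.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class MergedCursorSketch {

    // Combine result rows from the persisted index with rows from the
    // transient in-memory index. The in-memory rows are appended last and
    // duplicates (same path in both indexes) are dropped by the LinkedHashSet.
    static List<String> mergedCursor(List<String> persistedPaths,
                                     List<String> inMemoryPaths) {
        LinkedHashSet<String> merged = new LinkedHashSet<>(persistedPaths);
        merged.addAll(inMemoryPaths);
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        List<String> persisted = List.of("/content/a", "/content/b");
        // "/content/b" was just re-indexed locally, "/content/new-asset" is new
        List<String> recent = List.of("/content/b", "/content/new-asset");
        System.out.println(mergedCursor(persisted, recent));
        // [/content/a, /content/b, /content/new-asset]
    }
}
```

A real implementation would merge lazily over two Lucene IndexSearchers rather than materializing lists, but the de-duplication concern is the same.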

C - Benefits


This approach would allow the user to at least see his modifications
appear quickly in search results and would make the search result
accuracy more deterministic.

This feature need not be enabled globally but can be enabled on a
per-index basis, based on business requirements.

D- Challenges
---
1. Ensuring that the RAMDirectory is bounded and only contains recently
modified documents. The lower limit can be based on the last indexed time
from the AsyncIndexer. Periodically we would need to prune old documents
from this RAMDirectory

2. IndexUpdate would need to be adapted to support this hybrid model
for the same index type - so something to be looked into

Thoughts?

Chetan Mehrotra

* NRT - Near Real Time is technically a Lucene term:
https://wiki.apache.org/lucene-java/NearRealtimeSearch. However it is
used here as the approach is a bit similar!

[2] Such a merged cursor and performing query against multiple
searcher would anyway be required to support zero downtime kind of
requirement where index content would be split across local and global
instance


Re: svn commit: r1692367 - /jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreService.java

2015-07-23 Thread Chetan Mehrotra
On Thu, Jul 23, 2015 at 3:08 PM,  thom...@apache.org wrote:
 OAK-260 Avoid the Turkish Locale Problem

So the fix version is ... ;)

Chetan Mehrotra


Re: svn commit: r1692177 - in /jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins: blob/BlobStoreBlob.java document/DocumentNodeStore.java

2015-07-22 Thread Chetan Mehrotra
On Wed, Jul 22, 2015 at 4:33 PM, Manfred Baedke
manfred.bae...@gmail.com wrote:
 which is just as broken in IMHO and failed to write blobs in Oak2Oak
 migration scenarios.

That's why I would like to have a testcase for this. So far the current
design assumed that BlobStore is a singleton. Supporting blobs from
multiple BlobStores would impact other places also.

 I would also consider two blobs to be equal iff they contain the same binary 
 content, which is also not the contract we use for BlobStoreBlob.

Have a look at AbstractBlob#equal for what is considered the blob
equality contract, which takes care of that. So probably we use that
here

So if we plan to support such a case then we should clean up this part
first and then fix it.

wdyt?

Chetan Mehrotra
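For reference, the AbstractBlob#equal contract mentioned above boils down to comparing the binary content of the two blobs. A stdlib-only sketch of that idea (not Oak's actual implementation, class and method names are made up):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class BlobEqualitySketch {

    // Content-based equality in the spirit of AbstractBlob#equal:
    // two blobs are equal iff their streams yield exactly the same bytes.
    static boolean sameContent(InputStream a, InputStream b) {
        try {
            int x, y;
            do {
                x = a.read();
                y = b.read();
                if (x != y) {
                    return false; // differing byte, or one stream ended early
                }
            } while (x != -1);   // both streams exhausted together
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3};
        System.out.println(sameContent(new ByteArrayInputStream(data),
                                       new ByteArrayInputStream(data.clone()))); // true
        System.out.println(sameContent(new ByteArrayInputStream(data),
                                       new ByteArrayInputStream(new byte[]{1, 2}))); // false
    }
}
```

Oak's real code also short-circuits on blob length and identity where possible; the byte-wise comparison is only the fallback.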


Re: svn commit: r1692177 - in /jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins: blob/BlobStoreBlob.java document/DocumentNodeStore.java

2015-07-22 Thread Chetan Mehrotra
On Wed, Jul 22, 2015 at 8:01 PM, Manfred Baedke
manfred.bae...@gmail.com wrote:
 verifying that the BlobStoreBlob in question comes from this very instance.
 It shouldn't use equals(), though.

Makes sense. Lets discuss this via patch on the bug itself then!

Chetan Mehrotra


Re: svn commit: r1692177 - in /jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins: blob/BlobStoreBlob.java document/DocumentNodeStore.java

2015-07-21 Thread Chetan Mehrotra
Hi Manfred,

On Tue, Jul 21, 2015 at 10:58 PM,  bae...@apache.org wrote:
 +if (bsbBlobStore != null && bsbBlobStore.equals(blobStore)) {
 +return bsb.getBlobId();
 +}

Can we have a testcase for this scenario? So far we do not have a
requirement to support equality for BlobStore instances. So I would like
to understand the usecase, preferably with a testcase. Maybe the
problem affects SegmentNodeStore also (not sure though)

Chetan Mehrotra


Re: svn commit: r1692177 - in /jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins: blob/BlobStoreBlob.java document/DocumentNodeStore.java

2015-07-21 Thread Chetan Mehrotra
Also note that it's possible that later we may be wrapping the
BlobStore instance, for example adding a wrapper for monitoring, and in
such a case this equality condition might fail. A more stable fix
would be to check with the registered BlobStore whether it knows the given
blobId or not.
Chetan Mehrotra


On Wed, Jul 22, 2015 at 9:09 AM, Chetan Mehrotra
chetan.mehro...@gmail.com wrote:
 Hi Manfred,

 On Tue, Jul 21, 2015 at 10:58 PM,  bae...@apache.org wrote:
  +if (bsbBlobStore != null && bsbBlobStore.equals(blobStore)) {
  +return bsb.getBlobId();
  +}

 Can we have a testcase for this scenario? So far we do not have
 requirement to support equality for BlobStore instance. So would like
 to understand the usecase preferably with a testcase. May be the
 problem affect SegmentNodeStore also (not sure though)

 Chetan Mehrotra


Re: New committer: Stefan Egli

2015-07-20 Thread Chetan Mehrotra
Welcome Stefan!
Chetan Mehrotra


On Mon, Jul 20, 2015 at 1:56 PM, Michael Dürig mdue...@apache.org wrote:

 Hi,

 Please welcome Stefan as a new committer and PMC member of the Apache
 Jackrabbit project. The Jackrabbit PMC recently decided to offer Stefan
 committership based on his contributions. I'm happy to announce that he
 accepted the offer and that all the related administrative work has now been
 taken care of.

 Welcome to the team, Stefan!

 Michael


Utility method for show time duration in words

2015-07-14 Thread Chetan Mehrotra
Hi,

At times I feel the need for a utility method which can convert time in
millis to words for logs and JMX

There are two options I see

1. commons-lang DurationFormatUtils [1] - Adding a dependency on the
whole of commons-lang might not make sense, so we could probably copy
it. It depends on other commons-lang classes though, so copying would
be tricky

2. Guava Stopwatch private method [2] - Guava's Stopwatch internally has
such a method but it's not exposed. Probably we can copy that and
expose it in oak-commons.

Thoughts?

Chetan Mehrotra
[1] 
https://github.com/apache/commons-lang/blob/master/src/main/java/org/apache/commons/lang3/time/DurationFormatUtils.java
[2] 
https://github.com/google/guava/blob/master/guava/src/com/google/common/base/Stopwatch.java#L216
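For illustration, a minimal sketch of what such an oak-commons utility could look like, following the unit-selection approach of Guava's private Stopwatch formatting (the class name, the toWords entry point, and the exact output format are assumptions, not Guava's or Oak's code):

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

public class DurationWords {

    // Pick the largest unit in which the duration is still >= 1.
    static TimeUnit chooseUnit(long nanos) {
        if (TimeUnit.DAYS.convert(nanos, TimeUnit.NANOSECONDS) > 0) return TimeUnit.DAYS;
        if (TimeUnit.HOURS.convert(nanos, TimeUnit.NANOSECONDS) > 0) return TimeUnit.HOURS;
        if (TimeUnit.MINUTES.convert(nanos, TimeUnit.NANOSECONDS) > 0) return TimeUnit.MINUTES;
        if (TimeUnit.SECONDS.convert(nanos, TimeUnit.NANOSECONDS) > 0) return TimeUnit.SECONDS;
        if (TimeUnit.MILLISECONDS.convert(nanos, TimeUnit.NANOSECONDS) > 0) return TimeUnit.MILLISECONDS;
        return TimeUnit.NANOSECONDS;
    }

    static String abbreviate(TimeUnit unit) {
        switch (unit) {
            case DAYS: return "d";
            case HOURS: return "h";
            case MINUTES: return "min";
            case SECONDS: return "s";
            case MILLISECONDS: return "ms";
            default: return "ns";
        }
    }

    // Render a millisecond duration in human readable form, e.g. "1.500 s".
    public static String toWords(long millis) {
        long nanos = TimeUnit.MILLISECONDS.toNanos(millis);
        TimeUnit unit = chooseUnit(nanos);
        double value = (double) nanos / TimeUnit.NANOSECONDS.convert(1, unit);
        // 4 significant digits, fixed Locale so the decimal separator is stable
        return String.format(Locale.ENGLISH, "%.4g %s", value, abbreviate(unit));
    }

    public static void main(String[] args) {
        System.out.println(toWords(1500));   // 1.500 s
        System.out.println(toWords(90_000)); // 1.500 min
    }
}
```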


Re: svn commit: r1690941 - in /jackrabbit/oak/trunk/oak-core/src: main/java/org/apache/jackrabbit/oak/plugins/document/rdb/ test/java/org/apache/jackrabbit/oak/plugins/document/

2015-07-14 Thread Chetan Mehrotra
Hi Julian,

On Tue, Jul 14, 2015 at 7:57 PM,  resc...@apache.org wrote:
 +
 +List<String> ids = new ArrayList<String>();
 +for (T doc : documents) {
 +ids.add(doc.getId());
 +}
 +LOG.debug("insert of " + ids + " failed", ex);
 +
 +// collect additional exceptions
 +String messages = LOG.isDebugEnabled() ? RDBJDBCTools.getAdditionalMessages(ex) : "";
 +if (!messages.isEmpty()) {
 +LOG.debug("additional diagnostics: " + messages);
 +}

If all that work is to be done for debug logging then probably the
whole block should be within an isDebugEnabled check

+ LOG.debug("insert of " + ids + " failed", ex);

Instead of using concatenation it would be better to use placeholders like

LOG.debug("insert of {} failed", ids, ex);

Chetan Mehrotra
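The point of the placeholder style can be sketched with a stdlib-only stub (real code would use the slf4j Logger, which supports `{}` placeholders and a trailing throwable natively; the class and method names below are made up):

```java
import java.util.List;
import java.util.regex.Matcher;

public class GuardedLoggingSketch {

    static boolean debugEnabled = false;

    // slf4j-style substitution: each "{}" is replaced by the next argument.
    static String format(String msg, Object... args) {
        for (Object arg : args) {
            msg = msg.replaceFirst("\\{\\}", Matcher.quoteReplacement(String.valueOf(arg)));
        }
        return msg;
    }

    // The message is only formatted when debug logging is actually enabled;
    // that deferred formatting is what placeholders buy over eager concatenation.
    static void debug(String msg, Object... args) {
        if (debugEnabled) {
            System.out.println(format(msg, args));
        }
    }

    // Hypothetical stand-in for expensive diagnostics collection
    // (in the thread above: RDBJDBCTools.getAdditionalMessages(ex)).
    static String expensiveDiagnostics() {
        return "...";
    }

    public static void main(String[] args) {
        List<String> ids = List.of("1:/a", "1:/b");
        debug("insert of {} failed", ids); // disabled: nothing is formatted or printed

        debugEnabled = true;
        debug("insert of {} failed", ids); // prints: insert of [1:/a, 1:/b] failed

        // Work beyond message formatting should additionally sit behind
        // an explicit guard, as suggested in the mail above:
        if (debugEnabled) {
            debug("additional diagnostics: {}", expensiveDiagnostics());
        }
    }
}
```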


Re: RDBDocumentStore using same table schema for all collections

2015-07-13 Thread Chetan Mehrotra
On Mon, Jul 13, 2015 at 3:53 PM, Julian Reschke julian.resc...@gmx.de wrote:
 Simplicity and the complete lack of contract. How would the DS
 implementation *know* what needs to be indexed?

Then we should define the contract. What needs to be indexed is
important information and should be made available to the DocumentStore
explicitly

 might be large is guesswork, no? The additional columns are all 
 numbers/flags.

For now, yes, guesswork. But logically this duplication should not
happen! In addition most such indexes are defined as sparse in
Mongo. For RDB I think there would be DB-specific approaches for
creating a sparse index. Currently RDBDocumentStore stores some default
value if no value is specified. It might be better to store null there
[1]

 Each *document*? Did you mean collection?
Yes I meant collection there.

Chetan Mehrotra
[1] http://stackoverflow.com/questions/8764910/sparse-column-vs-indirection


Re: svn commit: r1690861 - /jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/query/ast/SelectorImpl.java

2015-07-13 Thread Chetan Mehrotra
If this looks like a bug (which it appears to be) then it would be
better to create an issue for it and have the fix merged to the
branches also
Chetan Mehrotra


On Tue, Jul 14, 2015 at 9:28 AM,  dbros...@apache.org wrote:
 Author: dbrosius
 Date: Tue Jul 14 03:58:28 2015
 New Revision: 1690861

 URL: http://svn.apache.org/r1690861
 Log:
 fix typo in equals which did not validate that parm was of right type

 Modified:
 
 jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/query/ast/SelectorImpl.java

 Modified: 
 jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/query/ast/SelectorImpl.java
 URL: 
 http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/query/ast/SelectorImpl.java?rev=1690861r1=1690860r2=1690861view=diff
 ==
 --- 
 jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/query/ast/SelectorImpl.java
  (original)
 +++ 
 jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/query/ast/SelectorImpl.java
  Tue Jul 14 03:58:28 2015
 @@ -722,7 +722,7 @@ public class SelectorImpl extends Source
  public boolean equals(Object other) {
  if (this == other) {
  return true;
 -} else if (!(this instanceof SelectorImpl)) {
 +} else if (!(other instanceof SelectorImpl)) {
  return false;
  }
  return selectorName.equals(((SelectorImpl) other).selectorName);
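
To illustrate why the original line was a bug: `this instanceof SelectorImpl` is always true, so comparing against a non-SelectorImpl object fell through to the cast and threw a ClassCastException instead of returning false. A minimal stand-in class (not the real SelectorImpl) demonstrating both variants:

```java
// Simplified stand-in for SelectorImpl, illustrating the typo fixed above.
public class SelectorDemo {
    private final String selectorName;

    public SelectorDemo(String name) {
        this.selectorName = name;
    }

    // Buggy variant: 'this instanceof SelectorDemo' is always true, so a
    // non-SelectorDemo argument reaches the cast below and throws
    // ClassCastException instead of returning false.
    public boolean brokenEquals(Object other) {
        if (this == other) {
            return true;
        } else if (!(this instanceof SelectorDemo)) { // typo: should test 'other'
            return false;
        }
        return selectorName.equals(((SelectorDemo) other).selectorName);
    }

    // Fixed variant, as committed in r1690861.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        } else if (!(other instanceof SelectorDemo)) {
            return false;
        }
        return selectorName.equals(((SelectorDemo) other).selectorName);
    }

    @Override
    public int hashCode() {
        return selectorName.hashCode();
    }
}
```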




Re: [VOTE] Release Apache Jackrabbit Oak 1.0.17

2015-07-10 Thread Chetan Mehrotra
+1. All checks passed
Chetan Mehrotra


On Fri, Jul 10, 2015 at 1:28 PM, Davide Giannella dav...@apache.org wrote:
 [X] +1 Release this package as Apache Jackrabbit Oak 1.0.17
 Davide




Re: /oak:index (DocumentNodeStore)

2015-07-09 Thread Chetan Mehrotra
On Thu, Jul 9, 2015 at 12:45 PM, Marcel Reutegger mreut...@adobe.com wrote:
 - Data in Oak is multi-versioned. It must be possible to query
   nodes at a specific revision of the tree.

To add - that also makes it difficult to use Mongo indexes, as the
index itself would have to be versioned. So instead of just indexing
property 'foo' you would need to index it for every revision.

Chetan Mehrotra


RDBDocumentStore using same table schema for all collections

2015-07-09 Thread Chetan Mehrotra
Hi,

Looking at RDBDocumentStore it appears that it is using the same table
schema for all collections. For example, columns like deletedOnce and
hasBinary are only required for NodeDocument; however, they are present
in all the tables.

Any specific reason for doing this and not going for a schema per collection?

This is fine for small collections like settings and clusterNodes. But
for a bigger collection like journal the overhead of such empty columns
might be large. It would be better if each Document provided a set of
column names along with the types to be indexed, and RDBDocumentStore
then created the correct schema.

Chetan Mehrotra


Managing backport work for issues fixed in trunk

2015-07-03 Thread Chetan Mehrotra
Hi Team,

Often we decide that some issue needs to be merged to one of the
branches, but not immediately. For example, a practice we have
recently started is to have a new feature implemented in trunk and
enabled by default there. Once we find it to be stable we enable it by
default in the branches. For such work I typically create two issues:
A (e.g. OAK-3069) for the actual work and B (e.g. OAK-3073) for
tracking enabling it by default.

Now #B has to be marked resolved in trunk, but I have to keep a mental
note that this also needs to be done in the branch sometime later. This
approach is error prone.

Instead if we make use of labels to mark issues which are suitable
_candidates_ for merge to branches then we can track such issues via
JIRA query and revisit them when we cut new releases on branch.

I propose we use following labels

candidate_oak_1_0
candidate_oak_1_2

For issues to be merged to 1.0 and 1.2 branches.

Then later we can query for such issues.

Thoughts?

Chetan Mehrotra


Re: Baseline warnings

2015-07-01 Thread Chetan Mehrotra
On Wed, Jul 1, 2015 at 12:33 PM, Marcel Reutegger mreut...@adobe.com wrote:
 I think we should explicitly manage the version of *all* exported
 packages and add the required package-info.java files now.

Yup, that should be done. I thought we had already done this for all
such exported packages. The ones for which we are seeing the error may
have been introduced later.

Chetan Mehrotra


Re: [Oak origin/trunk] Apache Jackrabbit Oak matrix - Build # 236 - Still Failing

2015-07-01 Thread Chetan Mehrotra
Most failures are in RemoteServerIT, due to "address already in use".
Opened OAK-3065 for that.

java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:444)
at sun.nio.ch.Net.bind(Net.java:436)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at 
org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187)
at 
org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316)
at 
org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265)
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
at org.eclipse.jetty.server.Server.doStart(Server.java:291)
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
at 
org.apache.jackrabbit.oak.remote.http.handler.RemoteServer.start(RemoteServer.java:54)
at 
org.apache.jackrabbit.oak.remote.http.handler.RemoteServerIT.setUp(RemoteServerIT.java:134)
Chetan Mehrotra


On Thu, Jul 2, 2015 at 9:51 AM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 The Apache Jenkins build system has built Apache Jackrabbit Oak matrix (build 
 #236)

 Status: Still Failing

 Check console output at 
 https://builds.apache.org/job/Apache%20Jackrabbit%20Oak%20matrix/236/ to view 
 the results.


Re: OSGi configuration lookup

2015-06-30 Thread Chetan Mehrotra
On Tue, Jun 30, 2015 at 12:00 PM, Francesco Mari
mari.france...@gmail.com wrote:
 I suggest to fix OAK-3022 maintaining exactly
 the same behaviour, and without changing the SegmentNodeStoreService

Makes sense. They are two different issues.

Chetan Mehrotra


Re: OSGi configuration lookup

2015-06-29 Thread Chetan Mehrotra
Looking at the code flow now, yes it differs. The thought behind reading
from the framework property first was to provide a simple way to override
the config which might be packaged by default. For example, while
launching Oak via Sling one can provide the framework property on the
command line (using -Doak.mongo.uri), which would supersede the one
packaged by default. This simplifies testing.
Chetan Mehrotra


On Mon, Jun 29, 2015 at 7:01 PM, Davide Giannella dav...@apache.org wrote:
 On 29/06/2015 10:22, Francesco Mari wrote:
 ...

 Is it possible - or does it make sense - to make this behaviour
 uniform across components?

 I think it's a good idea to uniform this aspect. Maybe we could put it
 down as a guideline by setting up a new page on the doc site:
 code-conventions.md. Somewhere beside:
 http://jackrabbit.apache.org/oak/docs/dev_getting_started.html

 Personally I'd go for component first and bundle then, but I'm not too
 religious about it :)

 Anyone against it?

 Davide




Re: OSGi configuration lookup

2015-06-29 Thread Chetan Mehrotra
That can be done, but some more details:

When properties are read from framework properties, the property names
are prefixed with 'oak.documentstore.', i.e. namespaced, because
framework properties are flat and global. So if we need to do that for
Segment also, then we should use similar namespacing. For example, if
the property name is 'cache', then 'oak.documentstore.cache' would be
used when reading from the framework.

oak.mongo.uri and oak.mongo.db are special-cased though and do not
follow this rule.
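
A minimal sketch of the lookup order described above: a namespaced framework property (e.g. -Doak.documentstore.cache) overrides the value packaged in the component configuration. The class and method names here are illustrative, not the actual Oak API.

```java
import java.util.Map;

// Hypothetical sketch of the described precedence: framework property
// (namespaced, since framework properties are flat and global) first,
// then the per-component OSGi configuration.
public class ConfigLookup {
    public static String lookup(Map<String, String> frameworkProps,
                                Map<String, String> componentConfig,
                                String prefix, String name) {
        // 1. Namespaced framework property wins if present,
        //    e.g. "oak.documentstore." + "cache".
        String fwkValue = frameworkProps.get(prefix + name);
        if (fwkValue != null) {
            return fwkValue;
        }
        // 2. Fall back to the component configuration.
        return componentConfig.get(name);
    }
}
```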
Chetan Mehrotra


On Tue, Jun 30, 2015 at 2:55 AM, Francesco Mari
mari.france...@gmail.com wrote:
 So we should probably adopt this strategy instead. This means that
 SegmentNodeStoreService is the one that should be modified.

 2015-06-29 17:15 GMT+02:00 Chetan Mehrotra chetan.mehro...@gmail.com:
 Looking at code flow now yes it differs. The thought behind reading
 from framework property first was to provide a simple way to override
 the config which might be packaged by default. For e.g. while
 launching Oak via Sling one can provide the framework property at
 command line (using -Doak.mongo.uri) which would supercede the one
 packaged by default. This simplifies the testing.
 Chetan Mehrotra


 On Mon, Jun 29, 2015 at 7:01 PM, Davide Giannella dav...@apache.org wrote:
 On 29/06/2015 10:22, Francesco Mari wrote:
 ...

 Is it possible - or does it make sense - to make this behaviour
 uniform across components?

 I think it's a good idea to uniform this aspect. Maybe we could put it
 down as a guideline by setting up a new page on the doc site:
 code-conventions.md. Somewhere beside:
 http://jackrabbit.apache.org/oak/docs/dev_getting_started.html

 Personally I'd go for component first and bundle then, but I'm not too
 religious about it :)

 Anyone against it?

 Davide




Re: [Oak origin/trunk] Apache Jackrabbit Oak matrix - Build # 232 - Still Failing

2015-06-29 Thread Chetan Mehrotra
Most of the tests in oak-remote/RemoteServerIT fail with the following
exception. Opened OAK-3047 to track this

--

java.io.FileNotFoundException:
/home/jenkins/jenkins-slave/workspace/Apache%20Jackrabbit%20Oak%20matrix/jdk/latest1.7/label/Ubuntu/nsfixtures/SEGMENT_MK/profile/unittesting/oak-remote/target/test-classes/org/apache/jackrabbit/oak/remote/http/handler/addNodeMultiPathProperty.json
(No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.init(FileInputStream.java:146)
at com.google.common.io.Files$FileByteSource.openStream(Files.java:127)
at com.google.common.io.Files$FileByteSource.openStream(Files.java:117)
at com.google.common.io.ByteSource$AsCharSource.openStream(ByteSource.java:404)
at com.google.common.io.CharSource.read(CharSource.java:155)
at com.google.common.io.Files.toString(Files.java:391)
at 
org.apache.jackrabbit.oak.remote.http.handler.RemoteServerIT.load(RemoteServerIT.java:119)
at 
org.apache.jackrabbit.oak.remote.http.handler.RemoteServerIT.testPatchLastRevisionAddMultiPathProperty(RemoteServerIT.java:1199)
-
Chetan Mehrotra


On Tue, Jun 30, 2015 at 10:11 AM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 The Apache Jenkins build system has built Apache Jackrabbit Oak matrix (build 
 #232)

 Status: Still Failing

 Check console output at 
 https://builds.apache.org/job/Apache%20Jackrabbit%20Oak%20matrix/232/ to view 
 the results.


Re: [Observation] Question on local events reported as external

2015-06-25 Thread Chetan Mehrotra
This is most likely due to the rate at which events get generated,
causing the observation queue (1000 entries by default) to fill up. If
the backing JCR listener is slow in processing, the queue gets filled
and BackgroundObserver starts compacting/merging the diffs, thus
converting local events to external ones.

There are two solutions:
1. Throttle the commits - CommitRateLimiter
2. Have a non-limiting queue - then you risk an OOM if the gap in
processing rate is large
Chetan Mehrotra


On Thu, Jun 25, 2015 at 8:47 PM, Marius Petria mpet...@adobe.com wrote:
 Hi,

 I understand that under high load local events can be reported as external. 
 My question is why does this happen also for tarmk, where there is a single 
 instance active (receiving external events on a single instance seems odd)? 
 Also, is there a way to disable this functionality, meaning on tarmk to 
 always receive the local events with their associated data (specifically the 
 userData)?

 Regards,
 Marius



Re: [Observation] Question on local events reported as external

2015-06-25 Thread Chetan Mehrotra
On Thu, Jun 25, 2015 at 10:49 PM, Marius Petria mpet...@adobe.com wrote:
 AFAIU because the local events are changed to external events it means that 
 they can also be dropped completly under load, is that true?

Well, they are not dropped. Observation in Oak works on the basis of a
diff of NodeStates. So let's say you have an observation queue of size 3
with 3 local events:

1. [ns1, ns2, ci1]
2. [ns2, ns3, ci2]
3. [ns3, ns4, ci3]

Each tuple is [base root nodestate, nodestate post change, commit
info]. Now if a 4th change event [ns4, ns5, ci4] comes, the
BackgroundObserver has the following options:

1. Block on put
2. Have a indefinite length queue then just add to queue
3. Pull out the last event and replace it with merged

So current logic in Oak goes for #3.

1. [ns1, ns2, ci1]
2. [ns2, ns3, ci2]
3. [ns3, ns5, null]

Now when it merges/collapses the content changes, you cannot
associate any specific commit info with the result. In such cases:

1. You cannot determine which user made that change.
2. Some changes might not be visible. For example, if a foo property
is added in E3 and removed in E4, then the merged content change would
not provide any indication that such a change happened.

So such merged changes are shown as external, i.e.
JackrabbitEvent#isExternal returns true for them currently. The problem
is that it is currently not possible to distinguish such collapsed
events from truly external events. Maybe we should make that
distinction, so that components which just rely on *some local change*
to react can continue to work, though with no guarantee that they see
*each* local change.
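
The collapsing behaviour described above (option #3) can be sketched as follows. This is an illustrative model, not the actual BackgroundObserver code: when the bounded queue is full, the tail element is replaced by a merged change spanning both diffs, and the commit info is dropped, which is why such changes surface as "external".

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch of a bounded queue that collapses its tail when full.
public class CollapsingQueue {
    // A content change: diff from 'base' to 'head', with optional commit info.
    // Strings stand in for Oak's NodeState/CommitInfo.
    public static final class Change {
        public final String base;
        public final String head;
        public final String commitInfo; // null for merged/"external" changes

        public Change(String base, String head, String commitInfo) {
            this.base = base;
            this.head = head;
            this.commitInfo = commitInfo;
        }
    }

    private final Deque<Change> queue = new ArrayDeque<>();
    private final int maxSize;

    public CollapsingQueue(int maxSize) {
        this.maxSize = maxSize;
    }

    public void add(Change change) {
        if (queue.size() < maxSize) {
            queue.addLast(change);
        } else {
            // Collapse: replace the tail with a merged change. The merged
            // diff still covers all content changes, but no single commit
            // info can be associated with it any more.
            Change last = queue.removeLast();
            queue.addLast(new Change(last.base, change.head, null));
        }
    }

    public Change peekLast() {
        return queue.peekLast();
    }

    public int size() {
        return queue.size();
    }
}
```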

Chetan Mehrotra


Re: Observation: External vs local - Load distribution

2015-06-17 Thread Chetan Mehrotra
Just ensure that your Observer is fast, as it is invoked on the critical path.

This would probably end up with a design similar to BackgroundObserver.
Maybe a better option would be to allow BO to have a non-bounded queue.
Chetan Mehrotra


On Wed, Jun 17, 2015 at 2:05 PM, Carsten Ziegeler cziege...@apache.org wrote:
 Ok, just to recap. In Sling we can implement the Observer interface (and
 not use the BackgroundObserver base class). This will give us reliably
 user id for all local events.

 Does anyone see a problem with this approach?

 Carsten
 --
 Carsten Ziegeler
 Adobe Research Switzerland
 cziege...@apache.org


Re: Observation: External vs local - Load distribution

2015-06-15 Thread Chetan Mehrotra
On Mon, Jun 15, 2015 at 1:13 PM, Carsten Ziegeler cziege...@apache.org wrote:
 Now, with Oak there is still this distinction, however if I remember
 correctly under heavy load it might happen that local events are
 reported as external events. And in that case the above pattern fails.
 Regardless of how rare this situation might be, if it can happen it will
 eventually happen.

This is an implementation detail of BackgroundObserver (BO), which is
used by OakResourceListener in Sling. BO keeps a queue of changed
NodeState tuples, and if it gets filled the queue is collapsed. If you
want to avoid that at *any* cost, you can use a different impl which
uses, say, a LinkedBlockingQueue and does not enforce any limit. That
would be similar to how JcrResourceListener works, which uses an
unbounded in-memory queue.
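
A minimal sketch of the unbounded alternative just described — an Observer-like component that enqueues every change on a LinkedBlockingQueue, so local events are never collapsed into external ones (at the price of unbounded memory under sustained load). Types are simplified stand-ins for Oak's NodeState/CommitInfo, not the actual Sling or Oak API.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical unbounded observer: never blocks, never merges changes.
public class UnboundedObserver {
    public static final class ContentChange {
        public final String root;       // stand-in for the root NodeState
        public final String commitInfo; // stand-in for CommitInfo (user id etc.)

        public ContentChange(String root, String commitInfo) {
            this.root = root;
            this.commitInfo = commitInfo;
        }
    }

    private final BlockingQueue<ContentChange> queue = new LinkedBlockingQueue<>();

    // Called on the commit path; must stay fast as it is on the critical path.
    public void contentChanged(String root, String commitInfo) {
        queue.add(new ContentChange(root, commitInfo)); // never merges, grows unboundedly
    }

    // A background consumer drains changes one by one, commit info intact.
    public ContentChange poll() {
        return queue.poll();
    }
}
```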



Chetan Mehrotra


Re: MongoDB collections in MongoDocumentStore

2015-06-12 Thread Chetan Mehrotra
On Fri, Jun 12, 2015 at 5:20 PM, Ian Boston i...@tfd.co.uk wrote:
 Are all queries expected to query all keys within a collection as it is
 now, or is there some logical structure to the querying ?

Not sure if I get your question. The queries are always for immediate
children. For example, for 1:/a the query is like

$query: { _id: { $gt: "2:/a/", $lt: "2:/a0" } }
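
An illustrative sketch of how such child-range bounds can be derived: document keys are "<depth>:<path>", and because '0' is the character immediately after '/' in ASCII, all immediate children of /a sort between "2:/a/" and "2:/a0". The helper names here are hypothetical, not the MongoDocumentStore API.

```java
// Hypothetical helpers computing the key range for immediate children.
public class ChildKeyRange {
    // Inclusive-side lower bound: "<depth+1>:<path>/"
    public static String fromKey(String parentPath, int parentDepth) {
        return (parentDepth + 1) + ":" + parentPath + "/";
    }

    // Exclusive upper bound: '0' is '/' + 1 in ASCII, so "<depth+1>:<path>0"
    // sorts just after every child key.
    public static String toKey(String parentPath, int parentDepth) {
        return (parentDepth + 1) + ":" + parentPath + "0";
    }
}
```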

Chetan Mehrotra


Re: [VOTE] Release Apache Jackrabbit Oak 1.0.15

2015-06-12 Thread Chetan Mehrotra
On Fri, Jun 12, 2015 at 1:56 PM, Amit Jain am...@ieee.org wrote:
   [ ] +1 Release this package as Apache Jackrabbit Oak 1.0.15

All checks ok

Chetan Mehrotra


Re: MongoDB collections in MongoDocumentStore

2015-06-12 Thread Chetan Mehrotra
On Fri, Jun 12, 2015 at 3:31 PM, Ian Boston i...@tfd.co.uk wrote:
 I am thinking that the collection name is a fn(key). What problems would
 that cause elsewhere ?

One potential problem is when querying for children. If 2:/a/b and
2:/a/c are mapped to different collections, then querying for the
children of 1:/a would become tricky.

Chetan Mehrotra


Re: MongoDB collections in MongoDocumentStore

2015-06-12 Thread Chetan Mehrotra
On Fri, Jun 12, 2015 at 7:32 PM, Ian Boston i...@tfd.co.uk wrote:
 Initially I was thinking about the locking behaviour but I realises 2.6.*
 is still locking at a database level, and that only changes to at a
 collection level 3.0 with MMAPv1 and row if you switch to WiredTiger [1].

I initially thought the same, and then we benchmarked the throughput by
placing the BlobStore in a separate database (OAK-1153), but did not
observe any significant gains. So that approach was not pursued
further. If we have a benchmark which can demonstrate that write
throughput increases if we _shard_ the nodes collection into a separate
database on the same server, then we can look further there.

Chetan Mehrotra


Re: LazyInputStream does not uses BufferedInputStream while creating stream from underlying File

2015-05-21 Thread Chetan Mehrotra
On Thu, May 21, 2015 at 1:31 PM, Thomas Mueller muel...@adobe.com wrote:
 Yes, it would be better if we wrap it somewhere (not necessarily right
 there, but somewhere

I think we can do that in DataStoreBlobStore. Right?

Chetan Mehrotra


LazyInputStream does not uses BufferedInputStream while creating stream from underlying File

2015-05-21 Thread Chetan Mehrotra
While having a look at how the InputStream is opened when accessing
content from a binary file, it appears that LazyInputStream creates a
plain FileInputStream [1] from the underlying file instance.

Should it not be a BufferedInputStream, or is it the responsibility of
the caller to wrap it in the buffered variant?

Chetan Mehrotra
[1] 
https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-data/src/main/java/org/apache/jackrabbit/core/data/LazyFileInputStream.java#L102


Re: LazyInputStream does not uses BufferedInputStream while creating stream from underlying File

2015-05-21 Thread Chetan Mehrotra
Opened OAK-2898 to track this.

Testing that the stream passed at the Oak/JCR level is buffered would indeed be tricky!
Chetan Mehrotra


On Thu, May 21, 2015 at 2:16 PM, Thomas Mueller muel...@adobe.com wrote:
 Hi,

 Sure.

 We should probably have some kind of test case to ensure the stream is
 wrapped. Not sure how to be test it, if the BufferedInputStream is again
 wrapped in some other way:

 * Check if markSupported() returns true (it does for
 BufferedInputStream, but not for FileInputStream; and if
 BufferedInputStream is again wrapped, typically markSupported() is
 delegated, at least in FilterInputStream).
 * Using reflection? :-)
 * Measureing performance: not a reliable way to test it


 Regards,
 Thomas

 On 21/05/15 10:33, Chetan Mehrotra chetan.mehro...@gmail.com wrote:

On Thu, May 21, 2015 at 1:31 PM, Thomas Mueller muel...@adobe.com wrote:
 Yes, it would be better if we wrap it somewhere (not necessarily right
 there, but somewhere

I think we can do that in DataStoreBlobStore. Right?

Chetan Mehrotra



Re: Build failed in Jenkins: Apache Jackrabbit Oak matrix » latest1.7,Ubuntu,DOCUMENT_NS,unittesting #139

2015-05-21 Thread Chetan Mehrotra
Failure in getSize OAK-2689

On Fri, May 22, 2015 at 9:36 AM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 Tests run: 218, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 352.443 sec 
  FAILURE!
 testGetSize(org.apache.jackrabbit.core.query.QueryResultTest)  Time elapsed: 
 4.2 sec   FAILURE!
 junit.framework.AssertionFailedError: Wrong size of NodeIterator in result 
 expected:51 but was:-1
 at junit.framework.Assert.fail(Assert.java:50)
 at junit.framework.Assert.failNotEquals(Assert.java:287)
 at junit.framework.Assert.assertEquals(Assert.java:67)



Chetan Mehrotra


Re: Build failed in Jenkins: Apache Jackrabbit Oak matrix » jdk-1.6u45,Ubuntu,DOCUMENT_NS,unittesting #139

2015-05-21 Thread Chetan Mehrotra
Failure due to a class not getting loaded. It looks like the updated
Apache DS is compiled for and meant to be used with JDK 1.7 only. In
that case we would either need to disable these tests for the 1.6
matrix or look for another option.

Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0.055
sec  FAILURE!
org.apache.jackrabbit.oak.security.authentication.ldap.LdapProviderTest
 Time elapsed: 0.054 sec   ERROR!
java.lang.UnsupportedClassVersionError:
org/apache/directory/server/core/api/DirectoryService : Unsupported
major.minor version 51.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
Chetan Mehrotra


On Fri, May 22, 2015 at 9:48 AM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 See 
 https://builds.apache.org/job/Apache%20Jackrabbit%20Oak%20matrix/jdk=jdk-1.6u45,label=Ubuntu,nsfixtures=DOCUMENT_NS,profile=unittesting/139/changes

 Changes:

 [chetanm] OAK-2247 - CopyOnWriteDirectory implementation for Lucene for use 
 in indexing

 [thomasm] OAK-2889 Ignore order by jcr:score desc in the query engine (for 
 union queries)

 [mreutegg] OAK-2899: Update to Jackrabbit 2.10.1

 [chetanm] OAK-2895 - Avoid accessing binary content if the mimeType is 
 excluded from indexing

 Update the docs

 [chetanm] OAK-2895 - Avoid accessing binary content if the mimeType is 
 excluded from indexing

 -- Use TypeDetector instead of DefaultDetector to avoid Tika sniffing the 
 mimeType by reading the input stream
 -- Use a LazyInputStream to lazily load the stream if and when required

 [chetanm] OAK-2898 - DataStoreBlobStore should expose a buffer input stream 
 for getInputStream call

 [alexparvulescu] OAK-2872 ExternalLoginModule should clear state when login 
 was not successful
  - added another missing cleanup

 [tripod] Use correct copyright notice

 --
 [...truncated 1947 lines...]
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
 at 
 org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
 at 
 org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
 at 
 org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
 at 
 org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)

 org.apache.jackrabbit.oak.security.authentication.ldap.LdapDefaultLoginModuleTest
   Time elapsed: 0.006 sec   ERROR!
 java.lang.NoClassDefFoundError: Could not initialize class 
 org.apache.jackrabbit.oak.security.authentication.ldap.LdapLoginTestBase
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
 at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
 at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
 at 
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:36)
 at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
 at 
 org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
 at 
 org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
 at 
 org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
 at 
 org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
 at 
 org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
 at 
 org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115

Re: Faster indexing of binary files - For migration and incremental async indexing

2015-05-20 Thread Chetan Mehrotra
Opened OAK-2892 to track this
Chetan Mehrotra


On Wed, May 20, 2015 at 2:45 PM, Chetan Mehrotra
chetan.mehro...@gmail.com wrote:
 On Wed, May 20, 2015 at 2:34 PM, Ian Boston i...@tfd.co.uk wrote:
 And does that apply to all BlobStore implementations including those that
 use Mongo as the BlobStore

 It should apply to Mongo also which guarantees strong consistency. Yes
 latency is different from consistency but any writes done to Mongo
 primary would be visible to later reads from other clients.

 Latency that occurs while reading from repository is different - That
 happens due to the way DocumentNodeStore performs background reads. So
 any change from other cluster node would only be picked up when the
 background read happens.

 Chetan Mehrotra


Making release notes more meaningfull and usefull to end users

2015-05-20 Thread Chetan Mehrotra
Hi Team,

Currently the release notes provided with every release [1] are just a
dump from JIRA. Looking at them, it is not obvious whether some new
feature or setting has been introduced that is useful to end users. The
titles of JIRA issues are often very brief (given they are titles!) and
hence cannot provide the required information completely.

We can see how other projects provide the release notes

1. Lucene [2], Solr [4] - They show a brief explanation of each new
feature instead of just the bug title. Moreover, the links are
clickable, so a user can easily navigate to them.

2. Guava [3] - It provides a brief summary of important changes which a
user can quickly see and comprehend; if required, one can go to the
complete listing.

Such a level of detail makes sense for projects that are used as
libraries. Given that Oak is mostly used as an application, we can at
least focus on providing details around any new config setting or new
feature introduced:

1. Document explicitly if any new config option is introduced
2. If any new feature is introduced then a brief description of that

This should be done before the actual release is performed.

Do share your thoughts on how the release notes can be improved further!

Chetan Mehrotra
[1] 
https://svn.apache.org/repos/asf/jackrabbit/oak/tags/jackrabbit-oak-1.0.13/RELEASE-NOTES.txt
[2] https://lucene.apache.org/core/4_1_0/changes/Changes.html#4.1.0.new_features
[3] https://code.google.com/p/guava-libraries/wiki/Release18
[4] http://lucene.apache.org/solr/4_5_0/changes/Changes.html#v4.5.0.new_features


[docs] Add inner links directly to side bar in Oak Docs

2015-05-20 Thread Chetan Mehrotra
Hi Team,

Currently the links shown in the side and top bars only list the
top-level pages [1] and do not show the inner links. For example, the
link to Persistent Cache [2] is mentioned somewhere inside the
DocumentNodeStore page [3] and is not directly listed anywhere.

So, looking at the documentation, it is not obvious where the
Persistent Cache doc is referenced.

Unless we restructure the site like say Apache Drill [4] (which shows
nested link in side bar) I think we should also refer to all such
inner links directly.

Thoughts?

Chetan Mehrotra
[1] http://jackrabbit.apache.org/oak/docs/
[2] http://jackrabbit.apache.org/oak/docs/nodestore/persistent-cache.html
[3] http://jackrabbit.apache.org/oak/docs/nodestore/documentmk.html
[4] http://drill.apache.org/docs/


Re: svn commit: r1679959 - in /jackrabbit/oak/trunk: oak-commons/ oak-commons/src/main/java/org/apache/jackrabbit/oak/commons/benchmark/ oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/segmen

2015-05-18 Thread Chetan Mehrotra
Hi Michael,

On Mon, May 18, 2015 at 2:02 PM,  mdue...@apache.org wrote:
  /dependency
 +dependency
 +  groupIdorg.apache.commons/groupId
 +  artifactIdcommons-math3/artifactId
 +/dependency

Is a hard dependency on commons-math3 required? Probably
MicroBenchmark can be made part of commons/test and this dependency
moved to test scope.
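
For reference, a test-scoped declaration of that dependency would look like the following — a sketch of the suggestion, not the actual committed pom:

```xml
<dependency>
  <groupId>org.apache.commons</groupId>
  <artifactId>commons-math3</artifactId>
  <!-- only needed by the MicroBenchmark test utility, so keep it out of
       the compile/runtime classpath of oak-commons consumers -->
  <scope>test</scope>
</dependency>
```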

Chetan Mehrotra


Keeping Oak documentation upto date - Use label 'docs-impacting'

2015-05-05 Thread Chetan Mehrotra
Hi Team,

At times we introduce new config settings as part of a bug fix or
feature implementation. The required details often remain only in the
issue comments and are not made part of the documentation.

I suggest that we mark any such issue with the label 'docs-impacting'
and, at release time, use it to update the release notes and also
ensure that the documentation gets updated.

Chetan Mehrotra


Re: svn commit: r1676235 - in /jackrabbit/oak/trunk/oak-run/src/main/java/org/apache/jackrabbit/oak: ContinuousRevisionGCTest.java benchmark/BenchmarkRunner.java benchmark/RevisionGCTest.java

2015-04-27 Thread Chetan Mehrotra
On Mon, Apr 27, 2015 at 3:24 PM,  mreut...@apache.org wrote:
 +protected static NodeStore getNodeStore(Oak oak) throws Exception {
  Field f = Oak.class.getDeclaredField(store);
  f.setAccessible(true);
  return (NodeStore) f.get(oak);
  }

I have also often struggled to get hold of the underlying NodeStore
from a given Oak instance. Maybe we should expose it as part of the API
itself. After all, each Oak instance is always backed by a NodeStore.


Chetan Mehrotra


Re: Active deletion of 'deleted' Lucene index files from DataStore without relying on full scale Blob GC

2015-04-24 Thread Chetan Mehrotra
To avoid losing track of this issue I opened OAK-2808. Data collected
from recent runs suggests that this aspect will need to be looked into
going forward.
Chetan Mehrotra


On Tue, Mar 10, 2015 at 9:49 PM, Thomas Mueller muel...@adobe.com wrote:
 Hi,

 I think removing binaries directly without going though the GC logic is
 dangerous, because we can't be sure if there are other references. There
 is one exception, it is if each file is guaranteed to be unique. For that,
 we could for example append a unique UUID to each file. The Lucene file
 system implementation would need to be changed for that (write the UUID,
 but ignore it when reading and reading the file size).

 Even in that case, there is still a risk, for example if the binary
 _reference_ is copied, or if an old revision is accessed. How do we ensure
 this does not happen?

 Regards,
 Thomas


 On 10/03/15 07:46, Chetan Mehrotra chetan.mehro...@gmail.com wrote:

Hi Team,

With storing of Lucene index files within DataStore our usage pattern
of DataStore has changed between JR2 and Oak.

With JR2 the writes were mostly application based i.e. if application
stores a pdf/image file then that would be stored in DataStore. JR2 by
default would not write stuff to DataStore. Further in deployment
where large number of binary content is present then systems tend to
share the DataStore to avoid duplication of storage. In such cases
running Blob GC is a non trivial task as it involves a manual step and
coordination across multiple deployments. Due to this systems tend to
delay frequency of GC

Now with Oak apart from application the Oak system itself *actively*
uses the DataStore to store the index files for Lucene and there the
churn might be much higher i.e. frequency of creation and deletion of
index file is lot higher. This would accelerate the rate of garbage
generation and thus put lot more pressure on the DataStore storage
requirements.

Any thoughts on how to avoid/reduce the requirement to increase the
frequency of Blob GC?

One possible way would be to provide a special cleanup tool which can
look for such old Lucene index files and deletes them directly without
going through the full fledged MarkAndSweep logic

Thoughts?

Chetan Mehrotra



Re: Quickest way of running oak to validate DocumentNodeStore mbeans

2015-04-23 Thread Chetan Mehrotra
On Thu, Apr 23, 2015 at 3:51 PM, Robert Munteanu romb...@apache.org wrote:
 Some of the specific MBeans do not appear and only
 seem to be registered by the DocumentNodeStoreService ( see [2] ).
 That was what lead me to believe that JMX MBean registration is tied
 to OSGi. Is this available for registration in another way in oak-run?

Aah, yup, those would get missed in non-OSGi runs. To see them in a
non-OSGi env you would have to make use of the oak-pojosr module:
launch the repo using that and keep it running. Might require some more
tweaks.

Chetan Mehrotra


Re: Quickest way of running oak to validate DocumentNodeStore mbeans

2015-04-22 Thread Chetan Mehrotra
 I assume that this happens because there is no OSGi environment available

Thats not the case. MBean would be registered only if a MBeanServer is
provided while constructing Oak instance (in non OSGi env). So in
oak-run where Oak instance is created if you also set MBeanServer

Something like
Oak oak = ..
oak.with(ManagementFactory.getPlatformMBeanServer());

This would lead to registration of the MBeans.
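For reference, the JMX mechanism behind that `oak.with(...)` call can be sketched with a JDK-only example. This is a hypothetical illustration (the `MBeanDemo`/`Demo` names are made up, and Oak registers its own beans such as the DocumentNodeStore ones), showing how a standard MBean ends up visible on the platform MBeanServer:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class MBeanDemo {

    // Standard MBean contract: the interface name must be the
    // implementation class name plus the "MBean" suffix.
    public interface DemoMBean {
        int getCount();
    }

    public static class Demo implements DemoMBean {
        public int getCount() { return 42; }
    }

    // Registers the bean (idempotently) and reports whether the
    // platform MBeanServer now knows about it.
    public static boolean register() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("org.example:type=Demo");
        if (!server.isRegistered(name)) {
            server.registerMBean(new Demo(), name);
        }
        return server.isRegistered(name);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("registered: " + register());
    }
}
```

Once a bean is registered this way it shows up in any JMX console (e.g. jconsole) attached to the JVM, which is the same path the Oak MBeans take in a non-OSGi run.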
Chetan Mehrotra


On Wed, Apr 22, 2015 at 6:37 PM, Robert Munteanu romb...@apache.org wrote:
 Hi,

 I've built Oak from trunk and want to access the
 DocumentNodeStoreMBean. I see that the mbeans are not registered when
 using oak-run ( I assume that this happens because there is no OSGi
 environment available ).

 I can always install a custom version of Oak in Sling, but I was
 wondering whether there's a faster way of running a locally-built Oak
 in an OSGi environment.

 Thanks,

 Robert


Re: svn commit: r1674107 - /jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/nodetype/TypeEditorProvider.java

2015-04-20 Thread Chetan Mehrotra
On Thu, Apr 16, 2015 at 9:49 PM,  resc...@apache.org wrote:
 +LOG.info("Node type changes: " + modifiedTypes + "; "
 +        + "repository scan took " + (System.currentTimeMillis() - start)
 +        + "ms" + (exception == null ? "" : "; failed with "
 +        + exception.getMessage()));

It would be better to make use of PerfLogger here


Chetan Mehrotra


Re: TokenLoginModule Spring

2015-04-14 Thread Chetan Mehrotra
On Tue, Apr 14, 2015 at 10:25 PM, Angela Schreiber anch...@adobe.com wrote:
 Since I initialize the JCR with
an instance of the Oak, it would be nice to reach in and get the
underlying oak repo

I am seeing similar requirement for that at OAK-2760 where the
HttpServer has to access both ContentRepository and JCR Repository.
Should we modify the Jcr class to

1. To not allow more than 1 invocation for createRepository
2. Cache the repo created in createRepository
3. Expose a getter for the ContentRepository instance created in
createRepository

OR
1. Have a special interface for providing the ContentRepository via
RepositoryImpl, something like RepositoryImpl implements OakRepository,
and have it provide an accessor for the backing content repo instance

Chetan Mehrotra


[jira] [Commented] (JCR-3865) Use markdown to generate Jackrabbit Site

2015-04-08 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/JCR-3865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486716#comment-14486716
 ] 

Chetan Mehrotra commented on JCR-3865:
--

bq. note that the site only updates the 'jackrabbit/site/live/jcr' directory. 
handling the entire site is very slow, since scm-publish checks out the entire 
tree.

In Oak I avoid doing the complete checkin by first running the build with 
{{-Dscmpublish.skipCheckin=true}} and then just checking in the changed html 
files. But yes, it still has the drawback of checking out the whole site, which 
might be slow

 Use markdown to generate Jackrabbit Site
 

 Key: JCR-3865
 URL: https://issues.apache.org/jira/browse/JCR-3865
 Project: Jackrabbit Content Repository
  Issue Type: Improvement
  Components: docs
Reporter: Tobias Bocanegra
Assignee: Tobias Bocanegra
Priority: Minor

 The current jackrabbit site is not well maintained, mainly because we need to 
 edit the HTML directly. Most of the content is already available as markdown.
 The goal is to automate the site generation via maven and svn pubsub.
 Phase 1 is to reuse the same template/skin as in oak and to get the content 
 right.
 Phase 2 is to beautify it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: jackrabbit-oak build #5374: Broken

2015-04-07 Thread Chetan Mehrotra
Added a LogDumper rule with OAK-2721. Now you can make use of that in those
tests which fail intermittently to get better details around the failure

Chetan Mehrotra

On Thu, Apr 2, 2015 at 3:21 PM, Chetan Mehrotra chetan.mehro...@gmail.com
wrote:

 I implemented something on that line to get logs from a remote server [1]
 for Sling. Had plans to get that working for the local jvm process also, but never
 got to complete it. Would try to get that done and see if that can be
 leveraged in Oak tests.

 This feature would dump the logs as part of JUnit report rendered in
 Jenkins. So along with failure stacktrace you can see what all logs were
 captured in that run

 Chetan Mehrotra
 [1] https://plus.google.com/+ChetanMehrotra/posts/Ao1w9SACKSh

 On Thu, Apr 2, 2015 at 2:43 PM, Davide Giannella dav...@apache.org
 wrote:

 On 01/04/2015 07:29, Marcel Reutegger wrote:
  The test failure was:
 
  Failed tests:
  testProxyFlippedIntermediateByteChange(
 org.apache.jackrabbit.oak.plugins.segment.standby.ExternalSharedStoreIT):
  expected:{ root = { ... } } but
  was:{ root : { } }
 
 
  It's a bit difficult for me to see what went wrong, because the
  test forcibly causes exceptions. So, many of the exceptions in
  the log are kind of expected...
 
  Could we catch all the logged exceptions with an ad-hoc appender and
  then, in case of failure, output only the relevant ones?

 An example with custom appender for testing can be found in

 https://github.com/apache/jackrabbit-oak/blob/105f890e04ee990f0e71d88937955680670d96f7/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/index/property/Oak2077QueriesTest.java

 -- Davide






PropertyIndex handle large deletions : Possible optimization

2015-04-06 Thread Chetan Mehrotra
Hi Team,

Currently, in case large deletions are performed, i.e. deleting a big subtree
where multiple property indexes are configured for nodes in that deleted
tree, the deletion is found to be very slow (at least for DocumentMK).

Looking at the code it seems that if a deletion is detected, the editor
still traverses the deleted subtree completely and updates the backing
index on a per-node basis. Instead, it could utilize the fact that the index
is also managed as a tree, i.e. at least for ContentMirrorStoreStrategy it
can just delete the index tree at that path for the various values.

LuceneIndexEditor takes a similar approach [1] by issuing a PrefixQuery to
drop all Lucene documents under that path
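The subtree-delete idea can be illustrated outside Oak with a sorted map keyed by path. This is a hypothetical sketch (`SubtreeDelete` and its method names are made up, not ContentMirrorStoreStrategy's actual code): all descendants of a path occupy one contiguous key range, so they can be removed in a single range operation instead of visiting every node.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of a path-keyed index: drop a whole deleted subtree in one
// range operation instead of traversing it node by node.
public class SubtreeDelete {

    public static int deleteSubtree(NavigableMap<String, String> index, String path) {
        // Descendants of "/a" all sort between "/a" (inclusive) and "/a0"
        // (exclusive), since '0' is the character immediately after '/'.
        NavigableMap<String, String> sub = index.subMap(path, true, path + "0", false);
        int removed = sub.size();
        sub.clear(); // one range removal, no per-node visits
        return removed;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> index = new TreeMap<>();
        index.put("/a", "v");
        index.put("/a/b", "v");
        index.put("/a/b/c", "v");
        index.put("/ab", "v");   // sibling, must survive
        System.out.println(deleteSubtree(index, "/a")); // removes /a, /a/b, /a/b/c
        System.out.println(index.keySet());
    }
}
```

The same bounds trick (`path` to `path + "0"`) is what makes a prefix/range deletion equivalent to the PrefixQuery approach mentioned above for Lucene.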

Chetan Mehrotra
[1]
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexEditor.java#L230-255


Re: jackrabbit-oak build #5374: Broken

2015-04-02 Thread Chetan Mehrotra
I implemented something on that line to get logs from a remote server [1] for
Sling. Had plans to get that working for the local jvm process also, but never got
to complete it. Would try to get that done and see if that can be leveraged
in Oak tests.

This feature would dump the logs as part of JUnit report rendered in
Jenkins. So along with failure stacktrace you can see what all logs were
captured in that run

Chetan Mehrotra
[1] https://plus.google.com/+ChetanMehrotra/posts/Ao1w9SACKSh

On Thu, Apr 2, 2015 at 2:43 PM, Davide Giannella dav...@apache.org wrote:

 On 01/04/2015 07:29, Marcel Reutegger wrote:
  The test failure was:
 
  Failed tests:
  testProxyFlippedIntermediateByteChange(
 org.apache.jackrabbit.oak.plugins.segment.standby.ExternalSharedStoreIT):
  expected:{ root = { ... } } but
  was:{ root : { } }
 
 
  It's a bit difficult for me to see what went wrong, because the
  test forcibly causes exceptions. So, many of the exceptions in
  the log are kind of expected...
 
 Could we catch all the logged exceptions with an ad-hoc appender and
 then, in case of failure, output only the relevant ones?

 An example with custom appender for testing can be found in

 https://github.com/apache/jackrabbit-oak/blob/105f890e04ee990f0e71d88937955680670d96f7/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/index/property/Oak2077QueriesTest.java

 -- Davide





[DISCUSS] Enable CopyOnRead feature for Lucene indexes by default

2015-03-31 Thread Chetan Mehrotra
Hi Team,

The CopyOnRead feature was provided as part of the 1.0.9 release and has been
in use in quite a few customer deployments. Of late we have had to recommend
enabling this setting on most of the deployments where queries are found to
be performing slowly, and it provides considerably better performance.

I would like to enable this feature by default now [1]. Both in trunk and
in branch.

Would it be fine to do that?

Chetan Mehrotra
[1] https://issues.apache.org/jira/browse/OAK-2708


Re: New committer: Shashank Gupta

2015-03-25 Thread Chetan Mehrotra
Welcome Shashank!

Chetan Mehrotra

On Thu, Mar 26, 2015 at 8:56 AM, Amit Jain am...@apache.org wrote:

 Welcome Shashank!!

 On Thu, Mar 26, 2015 at 2:20 AM, Michael Dürig mdue...@apache.org wrote:

 Hi,

 Please welcome Shashank Gupta as a new committer and PMC member of
 the Apache Jackrabbit project. The Jackrabbit PMC recently decided to
 offer Shashank committership based on his contributions. I'm happy to
 announce that he accepted the offer and that all the related
 administrative work has now been taken care of.

 Welcome to the team, Shashank!

 Michael





[jira] [Commented] (JCR-3862) [FileDataStore]: deleteRecord leaves the parent directories empty

2015-03-23 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/JCR-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375782#comment-14375782
 ] 

Chetan Mehrotra commented on JCR-3862:
--

+1. Patch looks fine

 [FileDataStore]: deleteRecord leaves the parent directories empty
 -

 Key: JCR-3862
 URL: https://issues.apache.org/jira/browse/JCR-3862
 Project: Jackrabbit Content Repository
  Issue Type: Bug
  Components: jackrabbit-data
Reporter: Amit Jain
Assignee: Amit Jain
 Attachments: JCR-3862-mreutegg.patch, JCR-3862.patch


 Calling deleteRecord to delete a particular record does not delete any parent 
 directories left empty. Oak uses this particular method for garbage 
 collection, due to which after a while a large number of empty directories keep 
 lying around, making the process increasingly slower.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: svn commit: r1668034 - in /jackrabbit/oak/trunk/oak-lucene: ./ src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/ src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/score/ src

2015-03-20 Thread Chetan Mehrotra
On Fri, Mar 20, 2015 at 8:22 PM, thom...@apache.org wrote:

 +@Activate
 +private void activate() {
 +scorerProviderMap.clear();
 +}


Probably this should only be done in deactivate

Chetan Mehrotra


Re: svn commit: r1667590 - /jackrabbit/oak/trunk/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/document/VersionGCDeletionTest.java

2015-03-18 Thread Chetan Mehrotra
An interesting way of constructing the test case scenario, Marcel!

Chetan Mehrotra

On Wed, Mar 18, 2015 at 10:38 PM, mreut...@apache.org wrote:

 Author: mreutegg
 Date: Wed Mar 18 17:08:59 2015
 New Revision: 1667590

 URL: http://svn.apache.org/r1667590
 Log:
 OAK-2420: DocumentNodeStore revision GC may lead to NPE

 Test to reproduce the problem

 Modified:

 jackrabbit/oak/trunk/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/document/VersionGCDeletionTest.java

 Modified:
 jackrabbit/oak/trunk/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/document/VersionGCDeletionTest.java
 URL:
 http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/document/VersionGCDeletionTest.java?rev=1667590r1=1667589r2=1667590view=diff

 ==
 ---
 jackrabbit/oak/trunk/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/document/VersionGCDeletionTest.java
 (original)
 +++
 jackrabbit/oak/trunk/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/document/VersionGCDeletionTest.java
 Wed Mar 18 17:08:59 2015
 @@ -22,20 +22,31 @@ package org.apache.jackrabbit.oak.plugin
  import java.util.Collections;
  import java.util.Comparator;
  import java.util.List;
 +import java.util.concurrent.Callable;
 +import java.util.concurrent.CountDownLatch;
 +import java.util.concurrent.Future;
 +import java.util.concurrent.Semaphore;
  import java.util.concurrent.TimeUnit;

  import javax.annotation.Nonnull;

 +import org.apache.jackrabbit.oak.api.CommitFailedException;
 +import
 org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector.VersionGCStats;
  import
 org.apache.jackrabbit.oak.plugins.document.memory.MemoryDocumentStore;
  import org.apache.jackrabbit.oak.spi.commit.CommitInfo;
  import org.apache.jackrabbit.oak.spi.commit.EmptyHook;
 +import org.apache.jackrabbit.oak.spi.state.ChildNodeEntry;
  import org.apache.jackrabbit.oak.spi.state.NodeBuilder;
 +import org.apache.jackrabbit.oak.spi.state.NodeState;
  import org.apache.jackrabbit.oak.stats.Clock;
  import org.junit.After;
  import org.junit.Before;
 +import org.junit.Ignore;
  import org.junit.Test;

 +import static java.util.concurrent.Executors.newSingleThreadExecutor;
  import static java.util.concurrent.TimeUnit.HOURS;
 +import static java.util.concurrent.TimeUnit.MINUTES;
  import static org.junit.Assert.assertEquals;
  import static org.junit.Assert.assertNull;
  import static org.junit.Assert.fail;
 @@ -140,7 +151,7 @@ public class VersionGCDeletionTest {
  VersionGarbageCollector gc = store.getVersionGarbageCollector();
  gc.setOverflowToDiskThreshold(100);

 -VersionGarbageCollector.VersionGCStats stats = gc.gc(maxAge * 2,
 HOURS);
 +VersionGCStats stats = gc.gc(maxAge * 2, HOURS);
  assertEquals(noOfDocsToDelete * 2 + 1, stats.deletedDocGCCount);


 @@ -152,6 +163,88 @@ public class VersionGCDeletionTest {
  }
  }

 +// OAK-2420
 +@Ignore
 +@Test
 +public void queryWhileDocsAreRemoved() throws Exception {
 +//Baseline the clock
 +clock.waitUntil(Revision.getCurrentTimestamp());
 +
 +final Thread currentThread = Thread.currentThread();
 +final Semaphore queries = new Semaphore(0);
 +final CountDownLatch ready = new CountDownLatch(1);
 +MemoryDocumentStore ms = new MemoryDocumentStore() {
 +@Override
 +public <T extends Document> T find(Collection<T> collection,
 +   String key) {
 +if (Thread.currentThread() != currentThread) {
 +ready.countDown();
 +queries.acquireUninterruptibly();
 +}
 +return super.find(collection, key);
 +}
 +};
 +store = new DocumentMK.Builder().clock(clock)
 +.setDocumentStore(ms).setAsyncDelay(0).getNodeStore();
 +
 +// create nodes
 +NodeBuilder builder = store.getRoot().builder();
 +NodeBuilder node = builder.child("node");
 +for (int i = 0; i < 100; i++) {
 +node.child("c-" + i);
 +}
 +merge(store, builder);
 +
 +clock.waitUntil(clock.getTime() + HOURS.toMillis(1));
 +
 +// remove nodes
 +builder = store.getRoot().builder();
 +node = builder.child("node");
 +for (int i = 0; i < 90; i++) {
 +node.getChildNode("c-" + i).remove();
 +}
 +merge(store, builder);
 +
 +store.runBackgroundOperations();
 +
 +clock.waitUntil(clock.getTime() + HOURS.toMillis(1));
 +
 +// fill caches
 +NodeState n = store.getRoot().getChildNode("node");
 +for (ChildNodeEntry entry : n.getChildNodeEntries()) {
 +entry.getName();
 +}
 +
 +// invalidate the nodeChildren cache only
 +store.invalidateNodeChildrenCache

Re: Slow running test for oak-lucene and Lucene Suggestor getting created by default

2015-03-12 Thread Chetan Mehrotra
Thanks Tommaso! Let's see how the next build runs
http://ci.apache.org/builders/oak-trunk/builds/1144
Chetan Mehrotra


On Thu, Mar 12, 2015 at 2:36 PM, Tommaso Teofili
tommaso.teof...@gmail.com wrote:
 I've created https://issues.apache.org/jira/browse/OAK-2611 to track the
 mentioned issue.

 Regards,
 Tommaso

 2015-03-12 9:36 GMT+01:00 Tommaso Teofili tommaso.teof...@gmail.com:

 Hi Chetan,

 there are 2 things at play there, I think.
 First thing is that for testing purposes the suggester was configured to
 be updated upon each commit [1]; the other thing, which is a bug, is that
 the code you mentioned [2] should actually check if the useInSuggest
 property is set before eventually updating the suggester, so at least this
 check needs to be introduced. For the testing configuration we should
 probably look for a less intrusive setting.

 Regards,
 Tommaso


 [1] :
 https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/test/java/org/apache/jackrabbit/oak/jcr/LuceneOakRepositoryStub.java#L88
 [2]
 https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexEditorContext.java#L167

 2015-03-12 8:00 GMT+01:00 Chetan Mehrotra chetan.mehro...@gmail.com:

 Hi Tommaso,

 The last couple of builds on Apache CI are failing in oak-lucene [1] [2].
 Running the system locally reveals that quite a bit of time is being
 spent in building up the suggester [3]. QueryJcrTest takes some time and
 is probably the test which hangs in the CI build

 Running org.apache.jackrabbit.oak.jcr.query.QueryJcrTest
 Tests run: 218, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 245.208
 sec

 Further looking at the code [0], it appears that a suggester directory
 would always be created/updated irrespective of whether the user has
 enabled the suggester for that index or not.

 I think the suggester should only be built if the index has that feature
 enabled. For example, for a normal lucene-property index, building up the
 suggester would not be useful.

 Chetan Mehrotra
 [0]
 https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexEditorContext.java#L167
 [1]
 http://ci.apache.org/builders/oak-trunk/builds/1142/steps/compile/logs/stdio
 [2]
 http://ci.apache.org/builders/oak-trunk/builds/1141/steps/compile/logs/stdio
 [3]
 Thread-9 prio=10 tid=0x7f1790797000 nid=0x6b6f runnable
 [0x7f175ef0b000]
java.lang.Thread.State: RUNNABLE
 at
 org.apache.jackrabbit.oak.plugins.index.lucene.OakDirectory$OakIndexFile.<init>(OakDirectory.java:201)
 at
 org.apache.jackrabbit.oak.plugins.index.lucene.OakDirectory$OakIndexFile.<init>(OakDirectory.java:155)
 at
 org.apache.jackrabbit.oak.plugins.index.lucene.OakDirectory$OakIndexInput.<init>(OakDirectory.java:340)
 at
 org.apache.jackrabbit.oak.plugins.index.lucene.OakDirectory$OakIndexInput.clone(OakDirectory.java:345)
 at
 org.apache.jackrabbit.oak.plugins.index.lucene.OakDirectory$OakIndexInput.clone(OakDirectory.java:329)
 at
 org.apache.lucene.store.Directory$SlicedIndexInput.clone(Directory.java:288)
 at
 org.apache.lucene.store.Directory$SlicedIndexInput.clone(Directory.java:269)
 at
 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader.<init>(BlockTreeTermsReader.java:481)
 at
 org.apache.lucene.codecs.BlockTreeTermsReader.<init>(BlockTreeTermsReader.java:176)
 at
 org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat.fieldsProducer(Lucene41PostingsFormat.java:437)
 at
 org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:116)
 at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:96)
 at
 org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:141)
 at
 org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:235)
 - locked <0xfc700320> (a
 org.apache.lucene.index.ReadersAndUpdates)
 at
 org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:100)
 at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:382)
 - locked <0xf19fbee0> (a org.apache.lucene.index.IndexWriter)
 - locked <0xf19fc010> (a java.lang.Object)
 at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:111)
 at
 org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditorContext.updateSuggester(LuceneIndexEditorContext.java:185)





Re: svn commit: r1666220 - in /jackrabbit/oak/trunk: oak-commons/ oak-commons/src/main/java/org/apache/jackrabbit/oak/commons/sort/ oak-commons/src/test/java/org/apache/jackrabbit/oak/commons/sort/ oa

2015-03-12 Thread Chetan Mehrotra
Looks like Closer closes the closeables in LIFO manner, due to which the
directory containing that file got deleted first. I have changed the
logic now.
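For reference, Guava's Closer closes registered resources in reverse registration order. A minimal JDK-only stand-in (the `LifoCloser` name is hypothetical, and the real Closer also handles exception suppression and rethrowing) shows why a closeable registered last runs first — e.g. a directory cleanup registered after a file cleanup deletes the directory before the file deletion runs:

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Minimal stand-in for Guava's Closer illustrating its LIFO close order.
public class LifoCloser implements Closeable {
    private final Deque<Closeable> stack = new ArrayDeque<>();

    public <C extends Closeable> C register(C c) {
        stack.push(c);
        return c;
    }

    @Override
    public void close() throws IOException {
        while (!stack.isEmpty()) {
            stack.pop().close(); // last registered closes first
        }
    }

    // Records the order in which the registered closeables are closed.
    public static List<String> demoOrder() throws IOException {
        List<String> order = new ArrayList<>();
        try (LifoCloser closer = new LifoCloser()) {
            closer.register(() -> order.add("delete file"));      // registered first
            closer.register(() -> order.add("delete directory")); // registered last
        }
        return order; // directory cleanup runs before the file cleanup
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demoOrder());
    }
}
```

So if the file-deleting closeable is registered before the directory-deleting one, the directory is gone by the time the file deletion runs, which matches the "Unable to delete file" failure seen on Windows.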

Let me know if the test passes for you on Windows
Chetan Mehrotra


On Thu, Mar 12, 2015 at 10:21 PM, Julian Reschke julian.resc...@gmx.de wrote:
 With this change, I get a reliable test failure on Windows:


 Tests in error:

 overflowToDisk(org.apache.jackrabbit.oak.commons.sort.StringSortTest):
 Unable to delete file: C:\tmp\oak-sorter-1426178913437-0\strings-sorted.txt


 Best regards, Julian


 On 2015-03-12 16:22, chet...@apache.org wrote:

 Author: chetanm
 Date: Thu Mar 12 15:22:46 2015
 New Revision: 1666220

 URL: http://svn.apache.org/r1666220
 Log:
 OAK-2557 - VersionGC uses way too much memory if there is a large pile of
 garbage

 Added:

 jackrabbit/oak/trunk/oak-commons/src/main/java/org/apache/jackrabbit/oak/commons/sort/StringSort.java
 (with props)

 jackrabbit/oak/trunk/oak-commons/src/test/java/org/apache/jackrabbit/oak/commons/sort/StringSortTest.java
 (with props)
 Modified:
  jackrabbit/oak/trunk/oak-commons/pom.xml

 jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreService.java

 jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/VersionGarbageCollector.java

 jackrabbit/oak/trunk/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/document/VersionGCDeletionTest.java

 jackrabbit/oak/trunk/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/document/VersionGCWithSplitTest.java

 Modified: jackrabbit/oak/trunk/oak-commons/pom.xml
 URL:
 http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-commons/pom.xml?rev=1666220r1=1666219r2=1666220view=diff

 ==
 --- jackrabbit/oak/trunk/oak-commons/pom.xml (original)
 +++ jackrabbit/oak/trunk/oak-commons/pom.xml Thu Mar 12 15:22:46 2015
 @@ -93,6 +93,11 @@
 <artifactId>oak-mk-api</artifactId>
 <version>${project.version}</version>
   </dependency>
 +<dependency>
 +  <groupId>commons-io</groupId>
 +  <artifactId>commons-io</artifactId>
 +  <version>2.4</version>
 +</dependency>

   <!-- Test dependencies -->
   <dependency>

 Added:
 jackrabbit/oak/trunk/oak-commons/src/main/java/org/apache/jackrabbit/oak/commons/sort/StringSort.java
 URL:
 http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-commons/src/main/java/org/apache/jackrabbit/oak/commons/sort/StringSort.java?rev=1666220view=auto

 ==
 ---
 jackrabbit/oak/trunk/oak-commons/src/main/java/org/apache/jackrabbit/oak/commons/sort/StringSort.java
 (added)
 +++
 jackrabbit/oak/trunk/oak-commons/src/main/java/org/apache/jackrabbit/oak/commons/sort/StringSort.java
 Thu Mar 12 15:22:46 2015
 @@ -0,0 +1,255 @@
 +/*
 + * Licensed to the Apache Software Foundation (ASF) under one
 + * or more contributor license agreements.  See the NOTICE file
 + * distributed with this work for additional information
 + * regarding copyright ownership.  The ASF licenses this file
 + * to you under the Apache License, Version 2.0 (the
 + * "License"); you may not use this file except in compliance
 + * with the License.  You may obtain a copy of the License at
 + *
 + *   http://www.apache.org/licenses/LICENSE-2.0
 + *
 + * Unless required by applicable law or agreed to in writing,
 + * software distributed under the License is distributed on an
 + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 + * KIND, either express or implied.  See the License for the
 + * specific language governing permissions and limitations
 + * under the License.
 + */
 +
 +package org.apache.jackrabbit.oak.commons.sort;
 +
 +import java.io.BufferedWriter;
 +import java.io.Closeable;
 +import java.io.File;
 +import java.io.FileNotFoundException;
 +import java.io.IOException;
 +import java.io.Reader;
 +import java.nio.charset.Charset;
 +import java.util.Collections;
 +import java.util.Comparator;
 +import java.util.Iterator;
 +import java.util.List;
 +
 +import com.google.common.base.Charsets;
 +import com.google.common.collect.Lists;
 +import com.google.common.io.Closer;
 +import com.google.common.io.Files;
 +import org.apache.commons.io.FileUtils;
 +import org.apache.commons.io.LineIterator;
 +import org.slf4j.Logger;
 +import org.slf4j.LoggerFactory;
 +
 +/**
 + * Utility class to store a list of string and perform sort on that. For
 small size
 + * the list would be maintained in memory. If the size crosses the
 required threshold then
 + * the sorting would be performed externally
 + */
 +public class StringSort implements Closeable {
 +private final Logger log = LoggerFactory.getLogger(getClass());
 +public static final int BATCH_SIZE = 2048;
 +
 +private final int overflowToDiskThreshold;
 +private final Comparator<String> comparator;
 +
 +private final List<String> ids = Lists.newArrayList

Slow running test for oak-lucene and Lucene Suggestor getting created by default

2015-03-12 Thread Chetan Mehrotra
Hi Tommaso,

The last couple of builds on Apache CI are failing in oak-lucene [1] [2].
Running the system locally reveals that quite a bit of time is being
spent in building up the suggester [3]. QueryJcrTest takes some time and
is probably the test which hangs in the CI build

Running org.apache.jackrabbit.oak.jcr.query.QueryJcrTest
Tests run: 218, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 245.208 sec

Further looking at the code [0], it appears that a suggester directory
would always be created/updated irrespective of whether the user has
enabled the suggester for that index or not.

I think the suggester should only be built if the index has that feature
enabled. For example, for a normal lucene-property index, building up the
suggester would not be useful.

Chetan Mehrotra
[0] 
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexEditorContext.java#L167
[1] http://ci.apache.org/builders/oak-trunk/builds/1142/steps/compile/logs/stdio
[2] http://ci.apache.org/builders/oak-trunk/builds/1141/steps/compile/logs/stdio
[3]
Thread-9 prio=10 tid=0x7f1790797000 nid=0x6b6f runnable
[0x7f175ef0b000]
   java.lang.Thread.State: RUNNABLE
at 
org.apache.jackrabbit.oak.plugins.index.lucene.OakDirectory$OakIndexFile.<init>(OakDirectory.java:201)
at 
org.apache.jackrabbit.oak.plugins.index.lucene.OakDirectory$OakIndexFile.<init>(OakDirectory.java:155)
at 
org.apache.jackrabbit.oak.plugins.index.lucene.OakDirectory$OakIndexInput.<init>(OakDirectory.java:340)
at 
org.apache.jackrabbit.oak.plugins.index.lucene.OakDirectory$OakIndexInput.clone(OakDirectory.java:345)
at 
org.apache.jackrabbit.oak.plugins.index.lucene.OakDirectory$OakIndexInput.clone(OakDirectory.java:329)
at org.apache.lucene.store.Directory$SlicedIndexInput.clone(Directory.java:288)
at org.apache.lucene.store.Directory$SlicedIndexInput.clone(Directory.java:269)
at 
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader.<init>(BlockTreeTermsReader.java:481)
at 
org.apache.lucene.codecs.BlockTreeTermsReader.<init>(BlockTreeTermsReader.java:176)
at 
org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat.fieldsProducer(Lucene41PostingsFormat.java:437)
at 
org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:116)
at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:96)
at 
org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:141)
at 
org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:235)
- locked <0xfc700320> (a org.apache.lucene.index.ReadersAndUpdates)
at 
org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:100)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:382)
- locked <0xf19fbee0> (a org.apache.lucene.index.IndexWriter)
- locked <0xf19fc010> (a java.lang.Object)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:111)
at 
org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditorContext.updateSuggester(LuceneIndexEditorContext.java:185)


Re: Active deletion of 'deleted' Lucene index files from DataStore without relying on full scale Blob GC

2015-03-10 Thread Chetan Mehrotra
On Tue, Mar 10, 2015 at 1:50 PM, Michael Marth mma...@adobe.com wrote:
 But I wonder: how do you envision that this new index cleanup would locate 
 indexes in the content-addressed DS

That's a bit tricky. I have a rough idea on how to approach it, but it would
require more thinking. The approach I am thinking of is:

1. Have an index on oak:QueryIndexDefinition
2. Query for all index definition nodes with type=lucene
3. Get the ':data' node and then perform the listing. Each child node
is a representation of a Lucene index file

For Mongo I can easily read the previous revisions of the jcr:blob
property and then extract the blobId, which can then be deleted via
direct invocation of the GarbageCollectableBlobStore API. For Segment I am
not sure how to easily read the previous revisions of a given NodeState.
Chetan Mehrotra


Re: Active deletion of 'deleted' Lucene index files from DataStore without relying on full scale Blob GC

2015-03-10 Thread Chetan Mehrotra
On Tue, Mar 10, 2015 at 3:33 PM, Michael Dürig mdue...@apache.org wrote:
 SegmentMK doesn't even have the concept of a previous revision of a
 NodeState.

Yes, that is to be thought about. I want to read all previous revisions
for the path /oak:index/lucene/:data. For Segment I believe I would need
to start at the root references for all previous revisions and then read
along the required path from those root segments to collect the previous
revisions.

Would that work?

Chetan Mehrotra


Re: Parallelize text extraction from binary fields

2015-03-10 Thread Chetan Mehrotra
 Is Oak already single instance when it comes to the identification and 
 storage of binaries ?

Yes. Oak uses content addressable storage for binaries

 Are the existing TextExtractors also single instance ?

No. If the same binary is referred to at multiple places, then text extraction
would be performed for each such reference of that binary

 By Single instance I mean, 1 copy of the binary and its token stream in the 
 repository regardless of how many times its referenced.

So based on the above, there would be multiple token streams.

What's the approach you are thinking of ... and how would it benefit from
a 'Single instance' based design?
Chetan Mehrotra


On Tue, Mar 10, 2015 at 1:15 PM, Ian Boston i...@tfd.co.uk wrote:
 Hi,
 Is Oak already single instance when it comes to the identification and
 storage of binaries ?
 Are the existing TextExtractors also single instance ?
 By Single instance I mean, 1 copy of the binary and its token stream in the
 repository regardless of how many times its referenced.

 Best Regards
 Ian

 On 10 March 2015 at 07:05, Chetan Mehrotra chetan.mehro...@gmail.com
 wrote:

 LuceneIndexEditor currently extracts the binary contents via Tika in the
 same thread which is used for processing the commit. Such an approach
 does not make good use of a multi-processor system, specifically when the
 index is being built up as part of a migration process.

 Looking at JR2 I see LazyTextExtractor [1], which I think would help
 parallelize text extraction.

 Would it make sense to bring this to Oak? Would that help in improving
 performance?

 Chetan Mehrotra
 [1]
 https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-core/src/main/java/org/apache/jackrabbit/core/query/lucene/LazyTextExtractorField.java


