[jira] [Resolved] (OAK-6534) Compute indexPaths from index definitions json

2017-08-09 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra resolved OAK-6534.
--
   Resolution: Fixed
Fix Version/s: 1.7.6

Done with 1804632

> Compute indexPaths from index definitions json
> --
>
> Key: OAK-6534
> URL: https://issues.apache.org/jira/browse/OAK-6534
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Minor
> Fix For: 1.8, 1.7.6
>
>
> Currently, while adding/updating indexes via {{--index-definitions-file}} 
> (OAK-6471), the index paths are always determined by the {{--index-paths}} 
> option. Any additional index definitions present in the JSON file are ignored.
> To avoid confusion, the following approach should be implemented:
> * If only {{--index-paths}} is specified, use that
> * If not and {{--index-definitions-file}} is provided, compute the index 
> paths from that
> * If both are specified, merge the two, as the user may want to reindex a few 
> indexes and also update a few others (see the sketch below)
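A minimal sketch of the resolution described above, in Java with invented class and parameter names (the actual oak-run implementation may differ):

{noformat}
import java.util.LinkedHashSet;
import java.util.Set;

public class IndexPathResolution {

    // Sketch only: use --index-paths alone, derive the paths from the JSON file alone,
    // or merge the two when both options are given.
    public static Set<String> effectiveIndexPaths(Set<String> cliPaths, Set<String> jsonPaths) {
        Set<String> result = new LinkedHashSet<>();
        if (!cliPaths.isEmpty() && jsonPaths.isEmpty()) {
            result.addAll(cliPaths);      // only --index-paths given
        } else if (cliPaths.isEmpty()) {
            result.addAll(jsonPaths);     // only --index-definitions-file given
        } else {
            result.addAll(cliPaths);      // both given: merge, so a user can reindex some
            result.addAll(jsonPaths);     // indexes and update others in the same run
        }
        return result;
    }
}
{noformat}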



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (OAK-6534) Compute indexPaths from index definitions json

2017-08-09 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-6534:
-
Description: 
Currently, while adding/updating indexes via {{--index-definitions-file}} 
(OAK-6471), the index paths are always determined by the {{--index-paths}} 
option. Any additional index definitions present in the JSON file are ignored.

To avoid confusion, the following approach should be implemented:
* If only {{--index-paths}} is specified, use that
* If not and {{--index-definitions-file}} is provided, compute the index paths 
from that
* If both are specified, merge the two, as the user may want to reindex a few 
indexes and also update a few others

  was:
Currently, while adding/updating indexes via {{--index-definitions-file}} 
(OAK-6471), the index paths are always determined by the {{--index-paths}} 
option. Any additional index definitions present in the JSON file are ignored.

To avoid confusion, the following approach should be implemented:
* If only {{--index-paths}} is specified, use that
* If not and {{--index-definitions-file}} is provided, compute the index paths 
from that
* If both are specified, {{--index-paths}} takes precedence (no merging done)


> Compute indexPaths from index definitions json
> --
>
> Key: OAK-6534
> URL: https://issues.apache.org/jira/browse/OAK-6534
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Minor
> Fix For: 1.8
>
>
> Currently, while adding/updating indexes via {{--index-definitions-file}} 
> (OAK-6471), the index paths are always determined by the {{--index-paths}} 
> option. Any additional index definitions present in the JSON file are ignored.
> To avoid confusion, the following approach should be implemented:
> * If only {{--index-paths}} is specified, use that
> * If not and {{--index-definitions-file}} is provided, compute the index 
> paths from that
> * If both are specified, merge the two, as the user may want to reindex a few 
> indexes and also update a few others



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (OAK-6541) While importing new index property indexes are getting marked for reindex

2017-08-09 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra resolved OAK-6541.
--
   Resolution: Fixed
Fix Version/s: 1.7.6

Fixed with 1804631

> While importing new index property indexes are getting marked for reindex
> -
>
> Key: OAK-6541
> URL: https://issues.apache.org/jira/browse/OAK-6541
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: run
>Affects Versions: 1.7.5
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Minor
> Fix For: 1.8, 1.7.6
>
>
> OAK-6471 added support for adding new indexes. While doing that it is being 
> observed that non-Lucene indexes are getting marked for reindex.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (OAK-6541) While importing new index property indexes are getting marked for reindex

2017-08-09 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-6541:


 Summary: While importing new index property indexes are getting 
marked for reindex
 Key: OAK-6541
 URL: https://issues.apache.org/jira/browse/OAK-6541
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: run
Affects Versions: 1.7.5
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
Priority: Minor
 Fix For: 1.8


OAK-6471 added support for adding new indexes. While doing that it is being 
observed that non-Lucene indexes are getting marked for reindex.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (OAK-6504) Active deletion of blobs needs to indicate information about purged blobs to mark-sweep collector

2017-08-09 Thread Amit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amit Jain resolved OAK-6504.

Resolution: Fixed

Incorporated the review suggestion; done with r1804626, r1804628


> Active deletion of blobs needs to indicate information about purged blobs to 
> mark-sweep collector
> -
>
> Key: OAK-6504
> URL: https://issues.apache.org/jira/browse/OAK-6504
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.7.1
>Reporter: Vikas Saurabh
>Assignee: Amit Jain
>Priority: Minor
> Fix For: 1.8, 1.7.6
>
> Attachments: OAK_6504.patch
>
>
> The mark-sweep blob collector (since 1.6) tracks blobs in the store. Active 
> purge of Lucene index blobs doesn't update these tracked blobs, which leads 
> the mark-sweep collector to attempt to delete those blobs again.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-6497) Support old Segment NodeStore setups for oak-run index tooling

2017-08-09 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121009#comment-16121009
 ] 

Chetan Mehrotra commented on OAK-6497:
--

With r1804624 added support to fall back to the older oak-segment in case of an 
{{InvalidFileStoreVersionException}}. With this the user need not specify the 
{{--segment}} option explicitly; the tooling takes care of that.
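A rough illustration of the fallback idea; {{openSegmentTar}} and {{openOldSegment}} stand in for the real store bootstrapping in oak-run and are not actual methods:

{noformat}
import java.io.File;
import java.io.IOException;

import org.apache.jackrabbit.oak.segment.file.InvalidFileStoreVersionException;
import org.apache.jackrabbit.oak.spi.state.NodeStore;

public abstract class SegmentStoreOpener {

    // Try the current oak-segment-tar format first and fall back to the pre-1.6
    // oak-segment format when the store version does not match, i.e. behave as if
    // --segment had been passed.
    public NodeStore open(File storePath) throws IOException {
        try {
            return openSegmentTar(storePath);
        } catch (InvalidFileStoreVersionException e) {
            return openOldSegment(storePath);
        }
    }

    protected abstract NodeStore openSegmentTar(File storePath)
            throws IOException, InvalidFileStoreVersionException;

    protected abstract NodeStore openOldSegment(File storePath) throws IOException;
}
{noformat}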

> Support old Segment NodeStore setups for oak-run index tooling
> --
>
> Key: OAK-6497
> URL: https://issues.apache.org/jira/browse/OAK-6497
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.8, 1.7.6
>
> Attachments: OAK-6497-v1.patch
>
>
> The oak-run index command has been introduced in trunk and can be used in 
> read-only mode against existing setups. This works fine for all 
> DocumentNodeStore setups. However, it would not work for SegmentNodeStore 
> setups <= Oak 1.4.
> This task is meant to figure out possible approaches for enabling such 
> support for oak-run builds from trunk.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-937) Query engine index selection tweaks: shortcut and hint

2017-08-09 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120994#comment-16120994
 ] 

Chetan Mehrotra commented on OAK-937:
-

bq.  For example, each index can have a multi-valued property "tags". Then a 
query can specify "option(index tag )".

+1. This allows a customer to bind to a specific index or enables the QE to 
select from a set of indexes.

[~catholicon] Regarding the aggregate - there are other cases too, like custom 
synonyms or an analyzer configured for the same nodetype. So it's best to do the 
selection at the index level instead.
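For illustration, a query using the proposed hint through the standard JCR query API. The {{OPTION(INDEX TAG ...)}} clause reflects the proposal in this thread, not a released feature, and the tag name is invented:

{noformat}
import javax.jcr.RepositoryException;
import javax.jcr.Session;
import javax.jcr.query.Query;
import javax.jcr.query.QueryManager;
import javax.jcr.query.QueryResult;

public class TaggedIndexQuery {

    // Runs a full-text query and asks the query engine to consider only indexes
    // carrying the given tag (per the proposal above).
    public static QueryResult searchWithTag(Session session) throws RepositoryException {
        QueryManager qm = session.getWorkspace().getQueryManager();
        Query q = qm.createQuery(
                "SELECT * FROM [nt:base] WHERE CONTAINS(*, 'oak') OPTION(INDEX TAG assetSearch)",
                Query.JCR_SQL2);
        return q.execute();
    }
}
{noformat}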

> Query engine index selection tweaks: shortcut and hint
> --
>
> Key: OAK-937
> URL: https://issues.apache.org/jira/browse/OAK-937
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: query
>Reporter: Alex Deparvu
>Assignee: Thomas Mueller
>Priority: Critical
>  Labels: performance
> Fix For: 1.8
>
>
> This issue covers 2 different changes related to the way the QueryEngine 
> selects a query index:
>  Firstly there could be a way to end the index selection process early via a 
> known constant value: if an index returns a known value token (like -1000) 
> then the query engine would effectively stop iterating through the existing 
> index impls and use that index directly.
>  Secondly it would be nice to be able to specify a desired index (if one is 
> known to perform better) thus skipping the existing selection mechanism (cost 
> calculation and comparison). This could be done via certain query hints [0].
> [0] http://en.wikipedia.org/wiki/Hint_(SQL)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-937) Query engine index selection tweaks: shortcut and hint

2017-08-09 Thread Vikas Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120750#comment-16120750
 ] 

Vikas Saurabh commented on OAK-937:
---

While I like the idea of providing tag-based index hints (a minor improvement 
could be to pick a set of tags - "option(index tag ,)"), for
bq. The main problem I want to address with this issue is: there are multiple 
Lucene index configurations, with different aggregation rules.
I think this particular problem might be solved by doing indirection inside the 
index definition itself, e.g.
{noformat}
+ /aggregates//
   + useCase1/
  - oak:aggregateClassifier = true
  + 
   + useCase2/
  - oak:aggregateClassifier = true
  + 
   + 
{noformat}
... and extend the {{contains()}} clause to potentially choose nothing (all 
aggregates participate) or a subset of classifiers.

The reason I'd want to solve multiple use-cases of aggregation/nodeScopeIndex 
this way is to still hold to the convention that we have one index for a 
particular type - that, imo, makes people think more about index design and 
also provides a clearer view right away from the index definitions (yes, the tag 
approach would also work... but to me humans are worse at indirection than 
computers).

> Query engine index selection tweaks: shortcut and hint
> --
>
> Key: OAK-937
> URL: https://issues.apache.org/jira/browse/OAK-937
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: query
>Reporter: Alex Deparvu
>Assignee: Thomas Mueller
>Priority: Critical
>  Labels: performance
> Fix For: 1.8
>
>
> This issue covers 2 different changes related to the way the QueryEngine 
> selects a query index:
>  Firstly there could be a way to end the index selection process early via a 
> known constant value: if an index returns a known value token (like -1000) 
> then the query engine would effectively stop iterating through the existing 
> index impls and use that index directly.
>  Secondly it would be nice to be able to specify a desired index (if one is 
> known to perform better) thus skipping the existing selection mechanism (cost 
> calculation and comparison). This could be done via certain query hints [0].
> [0] http://en.wikipedia.org/wiki/Hint_(SQL)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-6540) Session.hasAccess(...) should reflect read-only status of mounts

2017-08-09 Thread angela (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120127#comment-16120127
 ] 

angela commented on OAK-6540:
-

[~rombert], IMHO it has nothing to do with the security component as the 
read-only status is not defined by means of security. What I would suggest 
though is to use {{Session.hasCapability}} for that matter... this is exactly 
what you are looking for from a JCR API point of view :-) See 
https://docs.adobe.com/docs/en/spec/jcr/2.0/9_Permissions_and_Capabilities.html
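A small sketch of the suggestion, using the standard JCR {{Session.hasCapability}} API; method and argument names here are only an example:

{noformat}
import javax.jcr.Node;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

public class WriteCapabilityCheck {

    // Asks the repository whether addNode would succeed on the target node. An
    // implementation is free to factor in things like read-only mounts when answering;
    // whether Oak's composite store does so is exactly the open question here.
    public static boolean mayAddChild(Session session, String path, String childName)
            throws RepositoryException {
        Node target = session.getNode(path);
        return session.hasCapability("addNode", target, new Object[] {childName});
    }
}
{noformat}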

> Session.hasAccess(...) should reflect read-only status of mounts
> 
>
> Key: OAK-6540
> URL: https://issues.apache.org/jira/browse/OAK-6540
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: composite, security
>Reporter: Robert Munteanu
> Fix For: 1.8, 1.7.6
>
>
> When a mount is set in read-only mode callers that check 
> {{Session.hasPermission("set_property", ...)}} or 
> {{Session.hasPermission("add_node", ...)}} for mounted paths will believe 
> that they are able to write under those paths. For a composite setup with a 
> read-only mount this should (IMO) reflect that callers are not able to write, 
> taking into account the mount information on top of the ACEs.
> [~anchela], [~stillalex] - WDYT?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-6540) Session.hasAccess(...) should reflect read-only status of mounts

2017-08-09 Thread Robert Munteanu (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120113#comment-16120113
 ] 

Robert Munteanu commented on OAK-6540:
--

[~anchela] - thanks for the quick reply. Do you see a way of surfacing this 
read-only status from the POV of the security component? I'd like to avoid 
binding clients to the {{spi.mount}} package.

> Session.hasAccess(...) should reflect read-only status of mounts
> 
>
> Key: OAK-6540
> URL: https://issues.apache.org/jira/browse/OAK-6540
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: composite, security
>Reporter: Robert Munteanu
> Fix For: 1.8, 1.7.6
>
>
> When a mount is set in read-only mode callers that check 
> {{Session.hasPermission("set_property", ...)}} or 
> {{Session.hasPermission("add_node", ...)}} for mounted paths will believe 
> that they are able to write under those paths. For a composite setup with a 
> read-only mount this should (IMO) reflect that callers are not able to write, 
> taking into account the mount information on top of the ACEs.
> [~anchela], [~stillalex] - WDYT?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-6539) Decrease version export for org.apache.jackrabbit.oak.spi.security.authentication

2017-08-09 Thread Robert Munteanu (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120101#comment-16120101
 ] 

Robert Munteanu commented on OAK-6539:
--

Are there {{@ProviderType}} interfaces exposed by this package? If so, I think 
it's unsafe to change the version back.

The reason is that if a bundle implements a {{@ProviderType}} interface from 
this package, it would import {{[1.3.0,1.4.0)}}. If we move the version back to 
{{1.2.0}} then those imports would no longer resolve.

On the other hand, if this version was not included in a release we can revert 
it.
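To make the concern concrete, a hypothetical provider-type SPI interface (package and names invented); a bundle implementing such an interface gets a narrow generated import range like {{[1.3,1.4)}}, which is why lowering an already released export version can break resolution:

{noformat}
// Hypothetical example only; not an actual Oak interface.
package com.example.auth.spi;

import org.osgi.annotation.versioning.ProviderType;

@ProviderType
public interface LoginTokenProvider {

    // Implementors are bound to this package's exact minor version range.
    String createToken(String userId);
}
{noformat}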

> Decrease version export for 
> org.apache.jackrabbit.oak.spi.security.authentication
> -
>
> Key: OAK-6539
> URL: https://issues.apache.org/jira/browse/OAK-6539
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, security
>Reporter: Alex Deparvu
>Assignee: Alex Deparvu
>Priority: Trivial
>
> There's a warning when building oak-core related to the export version for 
> the org.apache.jackrabbit.oak.spi.security.authentication package:
> {noformat}
> [WARNING] org.apache.jackrabbit.oak.spi.security.authentication: Excessive 
> version increase; detected 1.3.0, suggested 1.2.0
> {noformat}
> I see no reason to not decrease the version. [~anchela], thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (OAK-6540) Session.hasAccess(...) should reflect read-only status of mounts

2017-08-09 Thread angela (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angela resolved OAK-6540.
-
Resolution: Invalid

> Session.hasAccess(...) should reflect read-only status of mounts
> 
>
> Key: OAK-6540
> URL: https://issues.apache.org/jira/browse/OAK-6540
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: composite, security
>Reporter: Robert Munteanu
> Fix For: 1.8, 1.7.6
>
>
> When a mount is set in read-only mode callers that check 
> {{Session.hasPermission("set_property", ...)}} or 
> {{Session.hasPermission("add_node", ...)}} for mounted paths will believe 
> that they are able to write under those paths. For a composite setup with a 
> read-only mount this should (IMO) reflect that callers are not able to write, 
> taking into account the mount information on top of the ACEs.
> [~anchela], [~stillalex] - WDYT?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (OAK-6540) Session.hasAccess(...) should reflect read-only status of mounts

2017-08-09 Thread angela (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angela updated OAK-6540:

Component/s: security

> Session.hasAccess(...) should reflect read-only status of mounts
> 
>
> Key: OAK-6540
> URL: https://issues.apache.org/jira/browse/OAK-6540
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: composite, security
>Reporter: Robert Munteanu
> Fix For: 1.8, 1.7.6
>
>
> When a mount is set in read-only mode callers that check 
> {{Session.hasPermission("set_property", ...)}} or 
> {{Session.hasPermission("add_node", ...)}} for mounted paths will believe 
> that they are able to write under those paths. For a composite setup with a 
> read-only mount this should (IMO) reflect that callers are not able to write, 
> taking into account the mount information on top of the ACEs.
> [~anchela], [~stillalex] - WDYT?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-6540) Session.hasAccess(...) should reflect read-only status of mounts

2017-08-09 Thread angela (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120090#comment-16120090
 ] 

angela commented on OAK-6540:
-

[~rombert], I don't think that this would be correct, as the read-only status 
has nothing to do with permission evaluation. The read-only status of a mount is 
rather like the read-only status of the version storage, which isn't reflected 
in {{Session.hasPermission}} either. 

> Session.hasAccess(...) should reflect read-only status of mounts
> 
>
> Key: OAK-6540
> URL: https://issues.apache.org/jira/browse/OAK-6540
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: composite
>Reporter: Robert Munteanu
> Fix For: 1.8, 1.7.6
>
>
> When a mount is set in read-only mode callers that check 
> {{Session.hasPermission("set_property", ...)}} or 
> {{Session.hasPermission("add_node", ...)}} for mounted paths will believe 
> that they are able to write under those paths. For a composite setup with a 
> read-only mount this should (IMO) reflect that callers are not able to write, 
> taking into account the mount information on top of the ACEs.
> [~anchela], [~stillalex] - WDYT?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-6539) Decrease version export for org.apache.jackrabbit.oak.spi.security.authentication

2017-08-09 Thread angela (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120087#comment-16120087
 ] 

angela commented on OAK-6539:
-

[~stillalex], not that I was aware of... I remember that I once had a major 
version bump and [~rombert] fixed that by adding a provider type annotation... 
but I wasn't aware of that warning. Feel free to fix it, removing a warning is 
always good! Thanks for spotting.

> Decrease version export for 
> org.apache.jackrabbit.oak.spi.security.authentication
> -
>
> Key: OAK-6539
> URL: https://issues.apache.org/jira/browse/OAK-6539
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, security
>Reporter: Alex Deparvu
>Assignee: Alex Deparvu
>Priority: Trivial
>
> There's a warning when building oak-core related to the export version for 
> the org.apache.jackrabbit.oak.spi.security.authentication package:
> {noformat}
> [WARNING] org.apache.jackrabbit.oak.spi.security.authentication: Excessive 
> version increase; detected 1.3.0, suggested 1.2.0
> {noformat}
> I see no reason to not decrease the version. [~anchela], thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-6504) Active deletion of blobs needs to indicate information about purged blobs to mark-sweep collector

2017-08-09 Thread Vikas Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120081#comment-16120081
 ] 

Vikas Saurabh commented on OAK-6504:


[~amitjain], the fix looks good to me. A minor nitpick though - I think the 
temp file to track deleted blobs should be created in the {{rootDirectory}} 
passed to {{ActiveDeletedBlobCollectorFactory}}.

For the test, I have OAK-6334 on my plate. I'd try to refactor those later. For 
now, extracting out and creating the new class looks fine to me.
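To make the nitpick concrete, a minimal sketch of the kind of bookkeeping being discussed: append the ids of actively deleted blobs to a file under the supplied root directory so a blob tracker can be updated later. Class, file name and method are illustrative, not the actual patch:

{noformat}
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class DeletedBlobRecorder {

    private final File trackingFile;

    // rootDirectory would be the directory handed to ActiveDeletedBlobCollectorFactory.
    public DeletedBlobRecorder(File rootDirectory) {
        this.trackingFile = new File(rootDirectory, "deleted-blobs.txt");
    }

    // Record ids of blobs that were actively purged so the mark-sweep tracker can be informed.
    public void record(List<String> deletedBlobIds) throws IOException {
        Files.write(trackingFile.toPath(), deletedBlobIds, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
{noformat}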

> Active deletion of blobs needs to indicate information about purged blobs to 
> mark-sweep collector
> -
>
> Key: OAK-6504
> URL: https://issues.apache.org/jira/browse/OAK-6504
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.7.1
>Reporter: Vikas Saurabh
>Assignee: Amit Jain
>Priority: Minor
> Fix For: 1.8, 1.7.6
>
> Attachments: OAK_6504.patch
>
>
> The mark-sweep blob collector (since 1.6) tracks blobs in the store. Active 
> purge of Lucene index blobs doesn't update these tracked blobs, which leads 
> the mark-sweep collector to attempt to delete those blobs again.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (OAK-6540) Session.hasAccess(...) should reflect read-only status of mounts

2017-08-09 Thread Robert Munteanu (JIRA)
Robert Munteanu created OAK-6540:


 Summary: Session.hasAccess(...) should reflect read-only status of 
mounts
 Key: OAK-6540
 URL: https://issues.apache.org/jira/browse/OAK-6540
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: composite
Reporter: Robert Munteanu
 Fix For: 1.8, 1.7.6


When a mount is set in read-only mode callers that check 
{{Session.hasPermission("set_property", ...)}} or 
{{Session.hasPermission("add_node", ...)}} for mounted paths will believe that 
they are able to write under those paths. For a composite setup with a 
read-only mount this should (IMO) reflect that callers are not able to write, 
taking into account the mount information on top of the ACEs.

[~anchela], [~stillalex] - WDYT?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (OAK-6513) Journal based Async Indexer

2017-08-09 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-6513:
-
Description: 
The current async indexer design is based on NodeState diff. This has served us 
fine so far; however, of late it is not able to perform well if the rate of 
repository writes is high. When changes happen faster than index-update can 
process them, larger and larger diffs will happen. These make index-updates 
slower, which again leads to the next diff being ever larger than the one before 
(assuming a constant ingestion rate). 

In the current diff based flow the indexer performs a complete diff for all 
changes happening between 2 cycles. It may happen that lots of writes happen but 
not much indexable content is written. Doing the diff there is wasted effort.

In the 1.6 release, for NRT Indexing, we implemented journal based indexing for 
external changes (OAK-4808, OAK-5430). That approach can be generalized and used 
for async indexing. 

Before talking about the journal based approach, let's see how IndexEditor works 
currently.

h4. IndexEditor 

Currently any IndexEditor performs 2 tasks:

# Identify which node is to be indexed based on some index definition. The 
Editor gets invoked as part of the content diff, where it determines which 
NodeState is to be indexed
# Update the index based on the node to be indexed

For example, in oak-lucene we have LuceneIndexEditor, which identifies the 
NodeStates to be indexed, and LuceneDocumentMaker, which constructs the Lucene 
Document from the NodeState to be indexed. For the journal based approach we can 
decouple these 2 parts (a rough sketch follows after this description) and thus 
have:

* IndexEditor - identifies which paths need to be indexed for a given index 
definition
* IndexUpdater - updates the index based on a given NodeState and its path

h4. High Level Flow

# Session commit flow
## Each index type would provide an IndexEditor which would be invoked as part 
of the commit (like sync indexes). These IndexEditors would just determine which 
paths need to be indexed. 
## As part of the commit the paths to be indexed would be written to the journal. 
# AsyncIndexUpdate flow
## AsyncIndexUpdate would query this journal to fetch all such indexed paths 
between the 2 checkpoints
## Based on the index path data it would invoke the {{IndexUpdater}} to update 
the index for that path
## Merge the index updates

h4. Benefits

Such a design would have the following impact:

# More work done as part of the write
# Marking of indexable content is distributed, hence less work to be done at 
indexing time
# Indexing can progress in batches 
# The indexers can be called in parallel

h4. Journal Implementation

DocumentNodeStore currently has a built-in journal which is being used for NRT 
Indexing. That feature can be exposed as an API. 

For scaling indexing this design is mostly required for the cluster case. So we 
can possibly have both indexing supports implemented and use the journal based 
support for DocumentNodeStore setups. Or we can look into implementing such a 
journal for SegmentNodeStore setups also.

h4. Open Points

* Journal support in SegmentNodeStore
* Handling deletes

Detailed proposal - 
https://wiki.apache.org/jackrabbit/Journal%20based%20Async%20Indexer
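A rough sketch of the proposed decoupling as two interfaces; names and signatures are illustrative only, not the eventual Oak API:

{noformat}
import java.util.Set;

import org.apache.jackrabbit.oak.spi.state.NodeState;

// Role 1: invoked as part of the commit, only determines which paths need indexing
// for a given index definition.
interface PathCollectingIndexEditor {

    Set<String> collectIndexablePaths(NodeState before, NodeState after);
}

// Role 2: invoked later by the async indexer for each path read back from the journal.
interface IndexUpdater {

    void update(String path, NodeState state);
}
{noformat}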

  was:
The current async indexer design is based on NodeState diff. This has served us 
fine so far; however, of late it is not able to perform well if the rate of 
repository writes is high. When changes happen faster than index-update can 
process them, larger and larger diffs will happen. These make index-updates 
slower, which again leads to the next diff being ever larger than the one before 
(assuming a constant ingestion rate). 

In the current diff based flow the indexer performs a complete diff for all 
changes happening between 2 cycles. It may happen that lots of writes happen but 
not much indexable content is written. Doing the diff there is wasted effort.

In the 1.6 release, for NRT Indexing, we implemented journal based indexing for 
external changes (OAK-4808, OAK-5430). That approach can be generalized and used 
for async indexing. 

Before talking about the journal based approach, let's see how IndexEditor works 
currently.

h4. IndexEditor 

Currently any IndexEditor performs 2 tasks:

# Identify which node is to be indexed based on some index definition. The 
Editor gets invoked as part of the content diff, where it determines which 
NodeState is to be indexed
# Update the index based on the node to be indexed

For example, in oak-lucene we have LuceneIndexEditor, which identifies the 
NodeStates to be indexed, and LuceneDocumentMaker, which constructs the Lucene 
Document from the NodeState to be indexed. For the journal based approach we can 
decouple these 2 parts and thus have:

* IndexEditor - identifies which paths need to be indexed for a given index 
definition
* IndexUpdater - updates the index based on a given NodeState and its path

h4. High Level Flow

# Session commit flow
## Each index type would provide a 

[jira] [Created] (OAK-6539) Decrease version export for org.apache.jackrabbit.oak.spi.security.authentication

2017-08-09 Thread Alex Deparvu (JIRA)
Alex Deparvu created OAK-6539:
-

 Summary: Decrease version export for 
org.apache.jackrabbit.oak.spi.security.authentication
 Key: OAK-6539
 URL: https://issues.apache.org/jira/browse/OAK-6539
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: core, security
Reporter: Alex Deparvu
Assignee: Alex Deparvu
Priority: Trivial


There's a warning when building oak-core related to the export version for the 
org.apache.jackrabbit.oak.spi.security.authentication package:
{noformat}
[WARNING] org.apache.jackrabbit.oak.spi.security.authentication: Excessive 
version increase; detected 1.3.0, suggested 1.2.0
{noformat}

I see no reason to not decrease the version. [~anchela], thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (OAK-5902) Cold standby should allow syncing of blobs bigger than 2.2 GB

2017-08-09 Thread Andrei Dulceanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrei Dulceanu resolved OAK-5902.
--
Resolution: Fixed

Fixed at r1804515.
Created OAK-6538 to investigate cold standby memory consumption.

> Cold standby should allow syncing of blobs bigger than 2.2 GB
> -
>
> Key: OAK-5902
> URL: https://issues.apache.org/jira/browse/OAK-5902
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segment-tar
>Affects Versions: 1.6.1
>Reporter: Andrei Dulceanu
>Assignee: Andrei Dulceanu
>Priority: Minor
> Fix For: 1.8, 1.7.6
>
>
> Currently there is a limitation for the maximum binary size (in bytes) to be 
> synced between primary and standby instances. This matches 
> {{Integer.MAX_VALUE}} (2,147,483,647) bytes and no binaries bigger than this 
> limit can be synced between the instances.
> Per comment at [1], the current protocol needs to be changed to allow sending 
> of binaries in chunks, to surpass this limitation.
> [1] 
> https://github.com/apache/jackrabbit-oak/blob/1.6/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/standby/client/StandbyClient.java#L125



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (OAK-6527) CompositeNodeStore permission evaluation fails for open setups

2017-08-09 Thread Alex Deparvu (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Deparvu resolved OAK-6527.
---
Resolution: Fixed

Fixed with http://svn.apache.org/viewvc?rev=1804509=rev

Following [~anchela]'s feedback I moved the flush method and dropped the 
AbstractPermissionStore.

> CompositeNodeStore permission evaluation fails for open setups
> --
>
> Key: OAK-6527
> URL: https://issues.apache.org/jira/browse/OAK-6527
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: composite, security
>Affects Versions: 1.7.3, 1.7.4, 1.7.5
>Reporter: Alex Deparvu
>Assignee: Alex Deparvu
> Fix For: 1.7.6
>
>
> It seems the current setup of OR-ing the composite nodestore permission 
> setups breaks down when the root node allows all reads. This seems to be a 
> fundamental flaw in the way it works now, so I'm considering going back to 
> the drawing board and working on the solution proposed by [~chetanm] as 
> part of OAK-3777, effectively making OAK-6356 and OAK-6469 obsolete.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (OAK-6538) Investigate cold standby memory consumption

2017-08-09 Thread Andrei Dulceanu (JIRA)
Andrei Dulceanu created OAK-6538:


 Summary: Investigate cold standby memory consumption 
 Key: OAK-6538
 URL: https://issues.apache.org/jira/browse/OAK-6538
 Project: Jackrabbit Oak
  Issue Type: Task
  Components: segment-tar
Affects Versions: 1.6.1
Reporter: Andrei Dulceanu
Assignee: Andrei Dulceanu
Priority: Minor
 Fix For: 1.8, 1.7.6


In an investigation from some time ago, 4GB of heap were needed for 
transferring a 1GB blob and 6GB for a 2GB blob. This was in part due to using 
{{addTestContent}} [0] in the investigation, which allocates a huge {{byte[]}} 
on the heap. 

OAK-5902 introduced chunking for transferring blobs between primary and 
standby. This way, the memory needed for syncing a big blob should be around 
the chunk size used. Once the way test data is created is fixed, it should be 
possible to transfer a big blob (e.g. 2.5 GB) with less memory.

[0] 
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-segment-tar/src/test/java/org/apache/jackrabbit/oak/segment/standby/DataStoreTestBase.java#L96
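A sketch of the kind of fix meant by "the way test data is created": generate the test binary as a stream instead of a heap {{byte[]}}. This is illustrative only, not the actual {{addTestContentOnTheFly}} code:

{noformat}
import java.io.IOException;
import java.io.InputStream;
import java.util.Random;

// Produces pseudo-random bytes on the fly so a multi-GB "blob" never has to be
// materialized on the heap.
public class OnTheFlyBlobStream extends InputStream {

    private final Random random = new Random(42);
    private long remaining;

    public OnTheFlyBlobStream(long size) {
        this.remaining = size;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1;
        }
        remaining--;
        return random.nextInt(256);
    }
}
{noformat}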



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (OAK-6537) Don't encode the checksums in the TAR index tests

2017-08-09 Thread Francesco Mari (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francesco Mari resolved OAK-6537.
-
Resolution: Fixed

Fixed at r1804504.

> Don't encode the checksums in the TAR index tests
> -
>
> Key: OAK-6537
> URL: https://issues.apache.org/jira/browse/OAK-6537
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segment-tar
>Reporter: Francesco Mari
>Assignee: Francesco Mari
> Fix For: 1.8, 1.7.6
>
>
> The tests for the different formats of the TAR indices encode the checksums 
> of the entries. This makes the tests particularly brittle. The checksums 
> should be computed on the fly based on the test data.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (OAK-4638) Mostly async unique index (for UUIDs for example)

2017-08-09 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119733#comment-16119733
 ] 

Chetan Mehrotra edited comment on OAK-4638 at 8/9/17 11:19 AM:
---

Based on the approach proposed here I have also created OAK-6535, which covers 
both. Put up an initial proposal at [https://wiki.apache.org/jackrabbit/Synchronous 
Lucene Property Indexes]


was (Author: chetanm):
Based on the approach proposed here I have also created OAK-6535, which covers 
both. Put up an initial proposal at https://wiki.apache.org/jackrabbit/Synchronous 
Lucene Property Indexes

> Mostly async unique index (for UUIDs for example)
> -
>
> Key: OAK-4638
> URL: https://issues.apache.org/jira/browse/OAK-4638
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: property-index, query
>Reporter: Thomas Mueller
>
> The UUID index takes a lot of space. For the UUID index, we should consider 
> using mainly an async index. This is possible because there are two types of 
> UUIDs: those generated in Oak, which are sure to be unique (no need to 
> check), and those set in the application code, for example by importing 
> packages. For older nodes, an async index is sufficient, and a synchronous 
> index is only (temporarily) needed for imported nodes. For UUIDs, we could 
> also change the generation algorithm if needed.
> It might be possible to use a similar pattern for regular unique indexes as 
> well: only keep the added entries of the last 24 hours (for example) in a 
> property index, and then move entries to an async index which needs less 
> space. That would slow down adding entries, as two indexes need to be checked.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-4638) Mostly async unique index (for UUIDs for example)

2017-08-09 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119733#comment-16119733
 ] 

Chetan Mehrotra commented on OAK-4638:
--

Based on the approach proposed here I have also created OAK-6535, which covers 
both. Put up an initial proposal at https://wiki.apache.org/jackrabbit/Synchronous 
Lucene Property Indexes

> Mostly async unique index (for UUIDs for example)
> -
>
> Key: OAK-4638
> URL: https://issues.apache.org/jira/browse/OAK-4638
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: property-index, query
>Reporter: Thomas Mueller
>
> The UUID index takes a lot of space. For the UUID index, we should consider 
> using mainly an async index. This is possible because there are two types of 
> UUIDs: those generated in Oak, which are sure to be unique (no need to 
> check), and those set in the application code, for example by importing 
> packages. For older nodes, an async index is sufficient, and a synchronous 
> index is only (temporarily) needed for imported nodes. For UUIDs, we could 
> also change the generation algorithm if needed.
> It might be possible to use a similar pattern for regular unique indexes as 
> well: only keep the added entries of the last 24 hours (for example) in a 
> property index, and then move entries to an async index which needs less 
> space. That would slow down adding entries, as two indexes need to be checked.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (OAK-6537) Don't encode the checksums in the TAR index tests

2017-08-09 Thread Francesco Mari (JIRA)
Francesco Mari created OAK-6537:
---

 Summary: Don't encode the checksums in the TAR index tests
 Key: OAK-6537
 URL: https://issues.apache.org/jira/browse/OAK-6537
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: segment-tar
Reporter: Francesco Mari
Assignee: Francesco Mari
 Fix For: 1.8, 1.7.6


The tests for the different formats of the TAR indices encode the checksums of 
the entries. This makes the tests particularly brittle. The checksums should be 
computed on the fly based on the test data.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (OAK-6529) IndexLoaderV1 and IndexLoaderV2 should not rely on Buffer.array()

2017-08-09 Thread Francesco Mari (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francesco Mari resolved OAK-6529.
-
Resolution: Fixed

Fixed at r1804503.

> IndexLoaderV1 and IndexLoaderV2 should not rely on Buffer.array()
> -
>
> Key: OAK-6529
> URL: https://issues.apache.org/jira/browse/OAK-6529
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: segment-tar
>Reporter: Francesco Mari
>Assignee: Francesco Mari
> Fix For: 1.8, 1.7.6
>
>
> The code in {{IndexLoaderV1}} and {{IndexLoaderV2}} calls {{Buffer.array()}} 
> to compute the checksum. This method might fail with an 
> {{UnsupportedOperationException}} if the {{Buffer}} points to a memory mapped 
> region.
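One way to compute the checksum without relying on {{array()}}, sketched against a plain {{java.nio.ByteBuffer}} (Oak's own {{Buffer}} wrapper is not used here); it works for heap and memory-mapped buffers alike:

{noformat}
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class ChecksumWithoutArray {

    // Reads through a duplicate so the original buffer position is untouched and
    // never calls array(), which direct/mapped buffers do not support.
    public static long checksum(ByteBuffer buffer) {
        CRC32 crc = new CRC32();
        ByteBuffer data = buffer.duplicate();
        byte[] chunk = new byte[8192];
        while (data.hasRemaining()) {
            int n = Math.min(chunk.length, data.remaining());
            data.get(chunk, 0, n);
            crc.update(chunk, 0, n);
        }
        return crc.getValue();
    }
}
{noformat}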



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-5902) Cold standby should allow syncing of blobs bigger than 2.2 GB

2017-08-09 Thread Francesco Mari (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119687#comment-16119687
 ] 

Francesco Mari commented on OAK-5902:
-

[~dulceanu] makes sense. Go ahead and commit this, we will tackle the rest 
later.

> Cold standby should allow syncing of blobs bigger than 2.2 GB
> -
>
> Key: OAK-5902
> URL: https://issues.apache.org/jira/browse/OAK-5902
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segment-tar
>Affects Versions: 1.6.1
>Reporter: Andrei Dulceanu
>Assignee: Andrei Dulceanu
>Priority: Minor
> Fix For: 1.8, 1.7.6
>
>
> Currently there is a limitation for the maximum binary size (in bytes) to be 
> synced between primary and standby instances. This matches 
> {{Integer.MAX_VALUE}} (2,147,483,647) bytes and no binaries bigger than this 
> limit can be synced between the instances.
> Per comment at [1], the current protocol needs to be changed to allow sending 
> of binaries in chunks, to surpass this limitation.
> [1] 
> https://github.com/apache/jackrabbit-oak/blob/1.6/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/standby/client/StandbyClient.java#L125



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (OAK-3710) Continuous revision GC

2017-08-09 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger resolved OAK-3710.
---
   Resolution: Fixed
 Assignee: Marcel Reutegger
Fix Version/s: 1.7.6
   1.8

This feature is now implemented but disabled by default. See also documentation 
on OSGi configuration for the DocumentNodeStore (versionGCContinuous): 
https://jackrabbit.apache.org/oak/docs/osgi_config.html#DocumentNodeStore

> Continuous revision GC
> --
>
> Key: OAK-3710
> URL: https://issues.apache.org/jira/browse/OAK-3710
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: documentmk
>Reporter: Marcel Reutegger
>Assignee: Marcel Reutegger
> Fix For: 1.8, 1.7.6
>
>
> Implement continuous revision GC cleaning up documents older than a given 
> threshold (e.g. one day). This issue is related to OAK-3070 where each GC run 
> starts where the last one finished.
> This will avoid peak load on the system as we see it right now, when GC is 
> triggered once a day.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (OAK-6536) Periodic log message from continuous RGC

2017-08-09 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger resolved OAK-6536.
---
   Resolution: Fixed
Fix Version/s: 1.7.6

The continuous revision GC job now logs an info message every hour.

Implemented in trunk: http://svn.apache.org/r1804500

> Periodic log message from continuous RGC
> 
>
> Key: OAK-6536
> URL: https://issues.apache.org/jira/browse/OAK-6536
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, documentmk
>Reporter: Marcel Reutegger
>Assignee: Marcel Reutegger
>Priority: Minor
> Fix For: 1.8, 1.7.6
>
>
> The continuous revision garbage collection should issue periodic info log 
> messages with statistics. The format should be similar to the log message 
> issued by the regular revision garbage collector.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-5902) Cold standby should allow syncing of blobs bigger than 2.2 GB

2017-08-09 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119638#comment-16119638
 ] 

Michael Dürig commented on OAK-5902:


bq. I suggest to create a separate issue for analysing the memory consumption 
and to commit all the changes

+1

> Cold standby should allow syncing of blobs bigger than 2.2 GB
> -
>
> Key: OAK-5902
> URL: https://issues.apache.org/jira/browse/OAK-5902
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segment-tar
>Affects Versions: 1.6.1
>Reporter: Andrei Dulceanu
>Assignee: Andrei Dulceanu
>Priority: Minor
> Fix For: 1.8, 1.7.6
>
>
> Currently there is a limitation for the maximum binary size (in bytes) to be 
> synced between primary and standby instances. This matches 
> {{Integer.MAX_VALUE}} (2,147,483,647) bytes and no binaries bigger than this 
> limit can be synced between the instances.
> Per comment at [1], the current protocol needs to be changed to allow sending 
> of binaries in chunks, to surpass this limitation.
> [1] 
> https://github.com/apache/jackrabbit-oak/blob/1.6/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/standby/client/StandbyClient.java#L125



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (OAK-6504) Active deletion of blobs needs to indicate information about purged blobs to mark-sweep collector

2017-08-09 Thread Amit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amit Jain updated OAK-6504:
---
Attachment: OAK_6504.patch

Attached patch.
[~catholicon], [~chetanm] Please review. I also restructured the existing 
ActiveDeletedBlobCollectionIT to extract an abstract class and added a new test 
class.

> Active deletion of blobs needs to indicate information about purged blobs to 
> mark-sweep collector
> -
>
> Key: OAK-6504
> URL: https://issues.apache.org/jira/browse/OAK-6504
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.7.1
>Reporter: Vikas Saurabh
>Assignee: Amit Jain
>Priority: Minor
> Fix For: 1.8, 1.7.6
>
> Attachments: OAK_6504.patch
>
>
> The mark-sweep blob collector (since 1.6) tracks blobs in the store. Active 
> purge of Lucene index blobs doesn't update these tracked blobs, which leads 
> the mark-sweep collector to attempt to delete those blobs again.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (OAK-6504) Active deletion of blobs needs to indicate information about purged blobs to mark-sweep collector

2017-08-09 Thread Amit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amit Jain updated OAK-6504:
---
Fix Version/s: 1.7.6

> Active deletion of blobs needs to indicate information about purged blobs to 
> mark-sweep collector
> -
>
> Key: OAK-6504
> URL: https://issues.apache.org/jira/browse/OAK-6504
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.7.1
>Reporter: Vikas Saurabh
>Assignee: Amit Jain
>Priority: Minor
> Fix For: 1.8, 1.7.6
>
> Attachments: OAK_6504.patch
>
>
> The mark-sweep blob collector (since 1.6) tracks blobs in the store. Active 
> purge of Lucene index blobs doesn't update these tracked blobs, which leads 
> the mark-sweep collector to attempt to delete those blobs again.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-5902) Cold standby should allow syncing of blobs bigger than 2.2 GB

2017-08-09 Thread Andrei Dulceanu (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119568#comment-16119568
 ] 

Andrei Dulceanu commented on OAK-5902:
--

[~frm], [~mduerig]

bq. Before committing, the problems with the memory consumption in 
{{DataStoreTestBase.testSyncBigBlob}} 
I think the memory consumption is the key here. In an investigation from some 
time ago, 4GB of heap were needed for a 1GB blob and 6GB for a 2GB blob. This 
was in part due to using {{addTestContent}} in the investigation, which 
allocates that huge {{byte[]}} on the heap. With the new approach in 
{{addTestContentOnTheFly}} this problem is solved, and the chunking per se 
improved things a lot. We are now in the position of successfully syncing a 2.5 
GB blob with only 3.5 GB of memory. 

bq. the running time in ExternalPrivateStoreIT should be investigated.
My analysis shows that {{51s}} are spent adding the test content (i.e. a 2.5 
GB blob), {{61s}} are spent syncing between master and standby and another 
{{44s}} are spent checking that the sync was ok (i.e. comparing two streams 
summing up to 2.5 GB). I find nothing unusual here.

bq. Agreed, increasing the heap for the tests is problematic and we shouldn't 
do this. At least we need to understand where the memory requirements come 
from: is it the test or the code?
Agree. I suggest to create a separate issue for analysing the memory 
consumption and to commit all the changes, except:
* heap size increase in {{pom.xml}}
* annotate {{testSyncBigBlob}} with {{@Ignore(OAK-XXX)}}

Since all our ITs for cold standby use chunking now (the default {{1MB}} chunk 
size) and they all pass, I'd say we can safely commit the rest of the changes, 
as explained above.

WDYT?
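For context, a much-simplified illustration of the chunking idea on the sending side; {{ChunkTransport}} and the method names are invented, and the real standby protocol code is more involved:

{noformat}
import java.io.IOException;
import java.io.InputStream;

public class ChunkedBlobSender {

    private static final int CHUNK_SIZE = 1024 * 1024; // 1MB, matching the default chunk size

    // Streams a blob of arbitrary length as fixed-size chunks so memory usage stays
    // around one chunk regardless of blob size.
    public void send(InputStream blob, ChunkTransport transport) throws IOException {
        byte[] buffer = new byte[CHUNK_SIZE];
        int read;
        while ((read = blob.read(buffer)) != -1) {
            transport.sendChunk(buffer, read);
        }
    }

    public interface ChunkTransport {
        void sendChunk(byte[] data, int length) throws IOException;
    }
}
{noformat}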



> Cold standby should allow syncing of blobs bigger than 2.2 GB
> -
>
> Key: OAK-5902
> URL: https://issues.apache.org/jira/browse/OAK-5902
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segment-tar
>Affects Versions: 1.6.1
>Reporter: Andrei Dulceanu
>Assignee: Andrei Dulceanu
>Priority: Minor
> Fix For: 1.8, 1.7.6
>
>
> Currently there is a limitation for the maximum binary size (in bytes) to be 
> synced between primary and standby instances. This matches 
> {{Integer.MAX_VALUE}} (2,147,483,647) bytes and no binaries bigger than this 
> limit can be synced between the instances.
> Per comment at [1], the current protocol needs to be changed to allow sending 
> of binaries in chunks, to surpass this limitation.
> [1] 
> https://github.com/apache/jackrabbit-oak/blob/1.6/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/standby/client/StandbyClient.java#L125



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (OAK-6536) Periodic log message from continuous RGC

2017-08-09 Thread Marcel Reutegger (JIRA)
Marcel Reutegger created OAK-6536:
-

 Summary: Periodic log message from continuous RGC
 Key: OAK-6536
 URL: https://issues.apache.org/jira/browse/OAK-6536
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: core, documentmk
Reporter: Marcel Reutegger
Assignee: Marcel Reutegger
Priority: Minor
 Fix For: 1.8


The continuous revision garbage collection should issue periodic info log 
messages with statistics. The format should be similar to the log message 
issued by the regular revision garbage collector.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-5902) Cold standby should allow syncing of blobs bigger than 2.2 GB

2017-08-09 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119542#comment-16119542
 ] 

Michael Dürig commented on OAK-5902:


bq. memory consumption

Agreed, increasing the heap for the tests is problematic and we shouldn't do 
this. At least we need to understand where the memory requirements come from: 
is it the test or the code?

> Cold standby should allow syncing of blobs bigger than 2.2 GB
> -
>
> Key: OAK-5902
> URL: https://issues.apache.org/jira/browse/OAK-5902
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segment-tar
>Affects Versions: 1.6.1
>Reporter: Andrei Dulceanu
>Assignee: Andrei Dulceanu
>Priority: Minor
> Fix For: 1.8, 1.7.6
>
>
> Currently there is a limitation for the maximum binary size (in bytes) to be 
> synced between primary and standby instances. This matches 
> {{Integer.MAX_VALUE}} (2,147,483,647) bytes and no binaries bigger than this 
> limit can be synced between the instances.
> Per comment at [1], the current protocol needs to be changed to allow sending 
> of binaries in chunks, to surpass this limitation.
> [1] 
> https://github.com/apache/jackrabbit-oak/blob/1.6/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/standby/client/StandbyClient.java#L125



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-5902) Cold standby should allow syncing of blobs bigger than 2.2 GB

2017-08-09 Thread Francesco Mari (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119535#comment-16119535
 ] 

Francesco Mari commented on OAK-5902:
-

[~dulceanu], I had a look at the code. Your solution looks very good. Before 
committing, the problems with the memory consumption in 
`DataStoreTestBase.testSyncBigBlob` and the running time in 
`ExternalPrivateStoreIT` should be investigated. It would be good to at least 
frame the problem, so we can plan for further improvements on this patch.

> Cold standby should allow syncing of blobs bigger than 2.2 GB
> -
>
> Key: OAK-5902
> URL: https://issues.apache.org/jira/browse/OAK-5902
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segment-tar
>Affects Versions: 1.6.1
>Reporter: Andrei Dulceanu
>Assignee: Andrei Dulceanu
>Priority: Minor
> Fix For: 1.8, 1.7.6
>
>
> Currently there is a limitation for the maximum binary size (in bytes) to be 
> synced between primary and standby instances. This matches 
> {{Integer.MAX_VALUE}} (2,147,483,647) bytes and no binaries bigger than this 
> limit can be synced between the instances.
> Per comment at [1], the current protocol needs to be changed to allow sending 
> of binaries in chunks, to surpass this limitation.
> [1] 
> https://github.com/apache/jackrabbit-oak/blob/1.6/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/standby/client/StandbyClient.java#L125



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (OAK-5902) Cold standby should allow syncing of blobs bigger than 2.2 GB

2017-08-09 Thread Francesco Mari (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119535#comment-16119535
 ] 

Francesco Mari edited comment on OAK-5902 at 8/9/17 7:47 AM:
-

[~dulceanu], I had a look at the code. Your solution looks very good. Before 
committing, the problems with the memory consumption in 
{{DataStoreTestBase.testSyncBigBlob}} and the running time in 
{{ExternalPrivateStoreIT}} should be investigated. It would be good to at least 
frame the problem, so we can plan for further improvements on this patch.


was (Author: frm):
[~dulceanu], I had a look at the code. Your solution looks very good. Before 
committing, the problems with the memory consumption in 
`DataStoreTestBase.testSyncBigBlob` and the running time in 
`ExternalPrivateStoreIT` should be investigated. It would be good to at least 
frame the problem, so we can plan for further improvements on this patch.

> Cold standby should allow syncing of blobs bigger than 2.2 GB
> -
>
> Key: OAK-5902
> URL: https://issues.apache.org/jira/browse/OAK-5902
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segment-tar
>Affects Versions: 1.6.1
>Reporter: Andrei Dulceanu
>Assignee: Andrei Dulceanu
>Priority: Minor
> Fix For: 1.8, 1.7.6
>
>
> Currently there is a limitation for the maximum binary size (in bytes) to be 
> synced between primary and standby instances. This matches 
> {{Integer.MAX_VALUE}} (2,147,483,647) bytes and no binaries bigger than this 
> limit can be synced between the instances.
> Per comment at [1], the current protocol needs to be changed to allow sending 
> of binaries in chunks, to surpass this limitation.
> [1] 
> https://github.com/apache/jackrabbit-oak/blob/1.6/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/standby/client/StandbyClient.java#L125



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (OAK-6534) Compute indexPaths from index definitions json

2017-08-09 Thread Paul Chibulcuteanu (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119499#comment-16119499
 ] 

Paul Chibulcuteanu edited comment on OAK-6534 at 8/9/17 6:54 AM:
-

[~chetanm], yes this would be fine. This way, if one wants to reindex 
everything present in the --index-definitions-file then --index-paths should 
not be provided.


was (Author: chibulcu):
[~chetanm], yes this would be fine. This way, if one wants to reindex 
everything present in the _--index-definitions-file_ then _--index-paths_ 
should not be provided.

> Compute indexPaths from index definitions json
> --
>
> Key: OAK-6534
> URL: https://issues.apache.org/jira/browse/OAK-6534
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Minor
> Fix For: 1.8
>
>
> Currently, while adding/updating indexes via {{--index-definitions-file}} 
> (OAK-6471), the index paths are always determined by the {{--index-paths}} 
> option. Any additional index definitions present in the JSON file are ignored.
> To avoid confusion, the following approach should be implemented:
> * If only {{--index-paths}} is specified, use that
> * If not and {{--index-definitions-file}} is provided, compute the index 
> paths from that
> * If both are specified, {{--index-paths}} takes precedence (no merging 
> done)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-6534) Compute indexPaths from index definitions json

2017-08-09 Thread Paul Chibulcuteanu (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119499#comment-16119499
 ] 

Paul Chibulcuteanu commented on OAK-6534:
-

[~chetanm], yes this would be fine. This way, if one wants to reindex 
everything present in the _--index-definitions-file_ then _--index-paths_ 
should not be provided.

> Compute indexPaths from index definitions json
> --
>
> Key: OAK-6534
> URL: https://issues.apache.org/jira/browse/OAK-6534
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Minor
> Fix For: 1.8
>
>
> Currently, while adding/updating indexes via {{--index-definitions-file}} 
> (OAK-6471), the index paths are always determined by the {{--index-paths}} 
> option. If there are additional index definitions present in the json file, 
> those would be ignored.
> To avoid confusion, the following approach should be implemented:
> * If {{--index-paths}} is specified then use that
> * If not and {{--index-definitions-file}} is provided then compute the index 
> paths from that
> * If both are specified then {{--index-paths}} takes precedence (no merging 
> done)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (OAK-6535) Synchronous Lucene Property Indexes

2017-08-09 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-6535:


 Summary: Synchronous Lucene Property Indexes
 Key: OAK-6535
 URL: https://issues.apache.org/jira/browse/OAK-6535
 Project: Jackrabbit Oak
  Issue Type: New Feature
  Components: lucene, property-index
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
 Fix For: 1.8


Oak 1.6 added support for the Lucene hybrid index (OAK-4412), which enables 
near real time (NRT) indexing for Lucene based indexes. It also had limited 
support for sync indexes. This feature aims to take that to the next level and 
enable support for synchronous property indexes.

More details at 
https://wiki.apache.org/jackrabbit/Synchronous%20Lucene%20Property%20Indexes



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (OAK-6269) Support non chunk storage in OakDirectory

2017-08-09 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra reassigned OAK-6269:


Assignee: Vikas Saurabh

> Support non chunk storage in OakDirectory
> -
>
> Key: OAK-6269
> URL: https://issues.apache.org/jira/browse/OAK-6269
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Vikas Saurabh
> Fix For: 1.8
>
>
> Logging this issue based on an offline discussion with [~catholicon].
> Currently OakDirectory stores files in chunks of 1 MB each, so a 1 GB file 
> would be stored in 1000+ chunks.
> This design was chosen to support direct usage of OakDirectory with Lucene, as 
> Lucene makes use of random IO; chunked storage allows it to seek to a random 
> position quickly. If the files were stored as single blobs, they could only be 
> accessed via streaming, which would be slow.
> As most setups now use copy-on-read and copy-on-write support and rely on a 
> local copy of the index, we can have an implementation which stores the file 
> as a single blob.
> *Pros*
> * A significant reduction in the number of small blobs stored in the 
> BlobStore, which should reduce GC time, especially for S3
> * Reduced overhead of storing a single file in the repository: instead of an 
> array of ~1k blob ids we would store a single blob id
> * Potential improvement in IO cost, as the file can be read in one connection 
> and uploaded in one
> *Cons*
> It would not be possible to use OakDirectory directly (or it would be very 
> slow), and we would always need to make a local copy.
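
For illustration only, a sketch contrasting the two layouts described above; 
{{BlobSink}} and both write methods are hypothetical stand-ins, not 
OakDirectory's actual API:

{code:java}
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: chunked storage yields one blob id per 1 MB chunk,
// single-blob storage yields one blob id for the whole file.
public final class IndexFileStorageSketch {

    /** Hypothetical stand-in for the blob store; returns a blob id per write. */
    interface BlobSink {
        String write(InputStream data) throws IOException;
    }

    private static final int CHUNK_SIZE = 1024 * 1024; // 1 MB, as used today

    // Chunked layout: a 1 GB file yields ~1024 blob ids.
    static List<String> writeChunked(byte[] file, BlobSink sink) throws IOException {
        List<String> blobIds = new ArrayList<>();
        for (int offset = 0; offset < file.length; offset += CHUNK_SIZE) {
            int end = Math.min(offset + CHUNK_SIZE, file.length);
            blobIds.add(sink.write(
                    new ByteArrayInputStream(Arrays.copyOfRange(file, offset, end))));
        }
        return blobIds;
    }

    // Proposed layout: the same file yields a single blob id.
    static String writeSingleBlob(byte[] file, BlobSink sink) throws IOException {
        return sink.write(new ByteArrayInputStream(file));
    }
}
{code}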



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (OAK-6513) Journal based Async Indexer

2017-08-09 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-6513:
-
Description: 
The current async indexer design is based on a NodeState diff. This has served 
us fine so far; however, of late it has not been able to keep up when the rate 
of repository writes is high. When changes happen faster than the index update 
can process them, the diffs grow larger and larger. These make index updates 
slower, which again leads to the next diff being even larger than the one 
before (assuming a constant ingestion rate).

In the current diff based flow the indexer performs a complete diff of all 
changes happening between two cycles. It may happen that many writes occur but 
not much indexable content is written, so performing the diff there is wasted 
effort.

In the 1.6 release, for NRT indexing, we implemented journal based indexing for 
external changes (OAK-4808, OAK-5430). That approach can be generalized and 
used for async indexing.

Before talking about the journal based approach, let's see how an IndexEditor 
works currently.

h4. IndexEditor 

Currently any IndexEditor performs two tasks:

# Identify which nodes are to be indexed based on a given index definition. The 
editor gets invoked as part of the content diff, where it determines which 
NodeStates are to be indexed
# Update the index based on the nodes to be indexed

For example, in oak-lucene we have LuceneIndexEditor, which identifies the 
NodeStates to be indexed, and LuceneDocumentMaker, which constructs the Lucene 
Document from the NodeState to be indexed. For the journal based approach we 
can decouple these two parts and thus have (sketched below):

* IndexEditor - identifies which paths need to be indexed for a given index 
definition
* IndexUpdater - updates the index based on a given NodeState and its path
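
A rough sketch of the proposed decoupling; the interface names follow the 
bullets above, but the signatures are illustrative and not an existing Oak API:

{code:java}
import java.util.Set;

import org.apache.jackrabbit.oak.spi.state.NodeState;

/** Illustrative only: the path-collecting half, invoked during the commit diff. */
interface PathCollectingIndexEditor {
    Set<String> indexablePaths(NodeState before, NodeState after);
}

/** Illustrative only: the updating half, invoked later for each recorded path. */
interface IndexUpdater {
    void update(String path, NodeState state);
}
{code}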

h4. High Level Flow

# Session commit flow
## Each index type would provide an IndexEditor which would be invoked as part 
of the commit (like sync indexes). These IndexEditors would just determine 
which paths need to be indexed.
## As part of the commit, the paths to be indexed would be written to the 
journal.
# AsyncIndexUpdate flow (see the sketch after this list)
## AsyncIndexUpdate would query this journal to fetch all such indexed paths 
between the two checkpoints
## Based on the indexed path data it would invoke the {{IndexUpdater}} to 
update the index for each path
## Merge the index updates
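
A highly simplified sketch of the AsyncIndexUpdate half of this flow; apart 
from NodeState, every type here is a hypothetical placeholder rather than a 
DocumentNodeStore or Oak API:

{code:java}
import java.util.List;
import java.util.Map;

import org.apache.jackrabbit.oak.spi.state.NodeState;

// Illustrative sketch: consume the indexed paths recorded in the journal
// between two checkpoints and feed them to the per-index updaters.
class JournalBasedAsyncUpdateSketch {

    /** Hypothetical view of the journal entries written at commit time. */
    interface IndexedPathJournal {
        Map<String, List<String>> indexedPaths(String fromCheckpoint, String toCheckpoint);
    }

    /** Matches the IndexUpdater role sketched in the previous section. */
    interface IndexUpdater {
        void update(String path, NodeState state);
    }

    /** Hypothetical lookup of a node state in the newer checkpoint. */
    interface CheckpointReader {
        NodeState read(String path);
    }

    void run(IndexedPathJournal journal, Map<String, IndexUpdater> updaters,
             CheckpointReader reader, String fromCheckpoint, String toCheckpoint) {
        // paths recorded per index definition between the two checkpoints
        Map<String, List<String>> byIndex = journal.indexedPaths(fromCheckpoint, toCheckpoint);
        for (Map.Entry<String, List<String>> entry : byIndex.entrySet()) {
            IndexUpdater updater = updaters.get(entry.getKey());
            if (updater == null) {
                continue; // no updater registered for this index definition
            }
            for (String path : entry.getValue()) {
                updater.update(path, reader.read(path)); // resulting index updates are merged afterwards
            }
        }
    }
}
{code}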

h4. Benefits

Such a design would have the following impact:

# More work is done as part of the write
# Marking of indexable content is distributed, hence less work needs to be done 
at indexing time
# Indexing can progress in batches
# The indexers can be called in parallel

h4. Journal Implementation

DocumentNodeStore currently has a built-in journal which is being used for NRT 
indexing. That feature can be exposed as an API.

For scaling indexing, this design is mostly required in the cluster case. So we 
can possibly have both indexing approaches implemented and use the journal 
based support for DocumentNodeStore setups, or we can look into implementing 
such a journal for SegmentNodeStore setups as well.

h4. Open Points

* Journal support in SegmentNodeStore
* Handling deletes. 


  was:
Current async indexer design is based on NodeState diff. This has served us 
fine so far however off late it is not able to perform well if rate of 
repository writes is high. When changes happen faster than index-update can 
process them, larger and larger diffs will happen. These make index-updates 
slower, which again lead to the next diff being ever larger than the one before 
(assuming a constant ingestion rate). 

In current diff based flow the indexer performs complete diff for all changes 
happening between 2 cycle. It may happen that lots of writes happens but not 
much indexable content is written. So doing diff there is a wasted effort.

In 1.6 release for NRT Indexing we implemented a journal based indexing for 
external changes(OAK-4808, OAK-5430). That approach can be generalized and used 
for async indexing. 

Before talking about the journal based approach lets see how IndexEditor work 
currently

h4. IndexEditor 

Currently any IndexEditor performs 2 tasks

# Identify which node is to be indexed based on some index definition. The 
Editor gets invoked as part of content diff where it determines which NodeState 
is to be indexed
# Update the index based on node to be indexed

For e.g. in oak-lucene we have LuceneIndexEditor which identifies the 
NodeStates to be indexed and LuceneDocumentMaker which constructs the Lucene 
Document from NodeState to be indexed. For journal based approach we can 
decouple these 2 parts and thus have 

* IndexEditor - Identifies which all paths need to be indexed for given index 
definition
* IndexUpdater - Updates the index based on given NodeState and its path

h4. High Level Flow

# Session Commit Flow
## Each index type would provide a IndexEditor which would be invoked as part 
of commit (like sync indexes). These