Re: Question about Oak search/query.

2013-12-06 Thread Jukka Zitting
Hi,

On Thu, Dec 5, 2013 at 9:36 PM, Ian Boston i...@tfd.co.uk wrote:
 Will the search index contain access control information or will the
 search results be filtered as each result is retrieved ?

The results will be filtered after the index lookup. It would be
possible for a custom search index to do the access checks already
when building/updating the index, but even in that case the query
engine would still double-check the access rights (the benefit would
be to avoid having to retrieve and then discard many inaccessible
hits).
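
As a rough illustration of "filter after the index lookup", here is a minimal sketch. It does not use the Oak API; `filter_readable`, `can_read` and the sample paths are hypothetical names made up for this example.

```python
from typing import Callable, Iterable, Iterator


def filter_readable(hits: Iterable[str],
                    can_read: Callable[[str], bool]) -> Iterator[str]:
    """Yield only the index hits (paths) the current session may read."""
    for path in hits:
        # Every hit is checked, even if the index was already ACL-aware.
        if can_read(path):
            yield path


# Hypothetical index result and access check, for illustration only.
index_hits = ["/content/a", "/secret/b", "/content/c"]
readable = list(filter_readable(index_hits,
                                lambda p: not p.startswith("/secret/")))
```

An ACL-aware index would only shrink `index_hits`; the per-hit check would stay in place either way.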

 If the number of terms in the query exceeds the number of terms
 supported by Solr, does Oak handle that transparently ?

I'm not sure, you'll need to look at the oak-solr indexing code. Or
perhaps Tommaso who wrote the code can chime in here.

BR,

Jukka Zitting


Re: Question about MVCC with MongoMK.

2013-12-06 Thread Jukka Zitting
Hi,

On Thu, Dec 5, 2013 at 9:43 PM, Ian Boston i...@tfd.co.uk wrote:
 Is it possible to branch an Oak repository and maintain a detached
 root node for a period of time that one or more Oak instances attached
 to a MongoDB instance can follow for a short period of time before
 merging the branch back into the main tree ?

The SegmentMK (with the MongoDB backend) can do this using the
hierarchical journal feature. The SegmentMK maintains one or more
journals that each track the evolution of a particular branch of the
repository. These branches would normally be automatically merged back
to the root journal, but a particular deployment could easily
disable automatic merging for a particular branch and use it for a
purpose like the one you described.
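
To make the idea concrete, here is a toy model of a root journal and a branch journal with automatic merging switched off. It is only a sketch under loud assumptions: the `Journal` class, its `auto_merge` flag and the last-writer-wins merge are invented for illustration and are not the SegmentMK data structures or API.

```python
class Journal:
    """Toy journal: tracks the head state of one branch as a plain dict."""

    def __init__(self, name, parent=None, auto_merge=True):
        self.name = name
        self.parent = parent
        self.auto_merge = auto_merge
        # A child journal starts from a copy of its parent's current head.
        self.head = dict(parent.head) if parent is not None else {}

    def commit(self, changes):
        """Apply changes to this branch and merge upward if enabled."""
        self.head.update(changes)
        if self.auto_merge:
            self.merge()

    def merge(self):
        """Push this branch's state into the parent (naive last-writer-wins)."""
        if self.parent is not None:
            self.parent.head.update(self.head)


root = Journal("root")
upgrade = Journal("upgrade", parent=root, auto_merge=False)
upgrade.commit({"/app/version": 2})
detached = "/app/version" not in root.head  # root has not seen the change yet
upgrade.merge()                             # explicit merge back to the root
```

Instances that follow the `upgrade` journal would see the new content, while instances on `root` stay unaffected until the explicit merge.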

BR,

Jukka Zitting


Re: Question about MVCC with MongoMK.

2013-12-06 Thread Bertrand Delacretaz
Hi,

On Fri, Dec 6, 2013 at 11:12 AM, Jukka Zitting jukka.zitt...@gmail.com wrote:
 ...The SegmentMK (with the MongoDB backend) can do this using the
 hierarchical journal feature. The SegmentMK maintains one or more
 journals that each track the evolution of a particular branch of the
 repository. These branches would normally be automatically merged back
 to the root journal, but a particular deployment could easily
 disable automatic merging for a particular branch and use it for a
 purpose like the one you described...

Is there a way to tell the repository to start operating on such a
branch forever for a given client?

IIUC Ian's scenario, an application instance would tell Oak to create a
BEFORE_UPGRADE branch and start working on that from now on so that
the content can be upgraded in the background and tested on other
application instances, before eventually merging the BEFORE_UPGRADE
branch back.

Is that possible today, or reasonably simple to implement? If yes that
would enable such a scenario with minimal application changes, which
sounds extremely useful.

-Bertrand


Re: Question about MVCC with MongoMK.

2013-12-06 Thread Jukka Zitting
Hi,

On Fri, Dec 6, 2013 at 5:26 AM, Bertrand Delacretaz
bdelacre...@apache.org wrote:
 Is that possible today, or reasonably simple to implement? If yes that
 would enable such a scenario with minimal application changes, which
 sounds extremely useful.

It's not available yet, but shouldn't be too difficult to implement.

The main question here is whether we want to go down that path, as the
feature is only available with the SegmentMK (at least for now) and
we've generally wanted to avoid exposing such implementation-specific
features to higher level code.

BR,

Jukka Zitting


Re: Question about Oak search/query.

2013-12-06 Thread Tommaso Teofili
Hi all,

2013/12/6 Jukka Zitting jukka.zitt...@gmail.com

 Hi,

 On Thu, Dec 5, 2013 at 9:36 PM, Ian Boston i...@tfd.co.uk wrote:
  Will the search index contain access control information or will the
  search results be filtered as each result is retrieved ?

 The results will be filtered after the index lookup. It would be
 possible for a custom search index to do the access checks already
 when building/updating the index, but even in that case the query
 engine would still double-check the access rights (the benefit would
 be to avoid having to retrieve and then discard many inaccessible
 hits).


By the way, there is probably room for some optimization. A very simple
idea: exclude the paths at depth 1 (children of the root node) that the
principal is not able to read, if any, which may mean adding them to the
query passed to the Index implementation. You'd still always have to apply
fine-grained ACLs on the result, but excluding some branches from the start
may help.
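
A minimal sketch of that idea follows. All names (`excluded_roots`, `under_any`, `prefiltered_hits`) are hypothetical and nothing here is Oak code. It also assumes that a denied top-level node means nothing beneath it is readable, which may not hold if access is granted again further down the tree.

```python
def excluded_roots(top_level_paths, can_read):
    """Top-level paths (depth 1) the principal cannot read."""
    return [p for p in top_level_paths if not can_read(p)]


def under_any(path, roots):
    """True if path is one of the roots or lies below one of them."""
    return any(path == r or path.startswith(r + "/") for r in roots)


def prefiltered_hits(index_paths, roots):
    """Drop hits under excluded branches before the fine-grained check."""
    return [p for p in index_paths if not under_any(p, roots)]


# Hypothetical data: the principal may not read /secret.
roots = excluded_roots(["/content", "/secret"], lambda p: p != "/secret")
hits = prefiltered_hits(["/content/a", "/secret/b", "/secrets/c"], roots)
```

The fine-grained ACL check would still run on `hits`; the pre-filter only reduces how many candidates reach it.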



  If the number of terms in the query exceeds the number of terms
  supported by Solr, does Oak handle that transparently ?

 I'm not sure, you'll need to look at the oak-solr indexing code. Or
 perhaps Tommaso who wrote the code can chime in here.


sure.
What limitation are you exactly referring to? Is it the BooleanQuery max
clause limit [1]?

Regards,
Tommaso


[1] :
http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/search/BooleanQuery.TooManyClauses.html


 BR,

 Jukka Zitting



Re: Question: In Repository Index file.

2013-12-06 Thread Alex Parvulescu
Hi,

On Fri, Dec 6, 2013 at 11:06 AM, Jukka Zitting jukka.zitt...@gmail.com wrote:

 Hi,

 On Fri, Dec 6, 2013 at 1:25 AM, Ian Boston i...@tfd.co.uk wrote:
  In Oak when an index is stored in the repository, how is it updated
  when the repository is MongoDB backed and there are multiple JVM
  processes connected to the MongoDB ?

 That depends on the index and MK implementations. For example the
 PropertyIndex uses an index structure that can be updated concurrently
 when the updates affect different areas of the content repository.
 When using the MongoMK backend concurrent updates to the same nodes
 will automatically be synchronized, and with the SegmentMK (which also
 can be used with MongoDB) all commits against the same journal are
 synchronized. In both cases concurrent updates will automatically get
 resolved.

  Also if using SolrCloud as a search index, is it possible to fall back
  to an internal repository stored index if the SolrCloud index
  becomes unavailable ?

 Yes. The query engine will automatically pick the best available index
 for each query execution. If a particular index is not available, then
 the second-best match for those queries that would have used it would
 automatically get picked.



There is one minor nitpick with this statement.

So far we've assumed that the Solr index will be used for full-text queries
only. The only fallback you could use if the Solr index becomes unavailable
is the Lucene one, but as far as I know we've said that you would usually
use one _or_ the other.
Areas of concern are: the full-text indexing settings may differ, and the
cost output may need to be tricked into treating the local Lucene index as
a fallback and not a competing full-text index.
But this is definitely doable.
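
The fallback behaviour can be pictured as "pick the cheapest index that is currently usable". The sketch below is not Oak's query engine; `IndexInfo`, `pick_index` and the cost numbers are assumptions chosen only to show how a deliberately higher cost turns the local Lucene index into a fallback rather than a competitor.

```python
import math
from dataclasses import dataclass


@dataclass
class IndexInfo:
    name: str
    available: bool
    cost: float  # lower is better; infinity means "cannot answer this query"


def pick_index(candidates):
    """Return the cheapest usable index, or None if there is none."""
    usable = [c for c in candidates
              if c.available and math.isfinite(c.cost)]
    return min(usable, key=lambda c: c.cost, default=None)


# Hypothetical costs: Lucene reports a higher cost so Solr wins when it is up.
solr = IndexInfo("solr", available=True, cost=1.0)
lucene = IndexInfo("lucene", available=True, cost=2.0)

first_choice = pick_index([solr, lucene]).name
solr.available = False
fallback_choice = pick_index([solr, lucene]).name
```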




 BR,

 Jukka Zitting



Re: Question: In Repository Index file.

2013-12-06 Thread Tommaso Teofili
2013/12/6 Alex Parvulescu alex.parvule...@gmail.com

 Hi,

 On Fri, Dec 6, 2013 at 11:06 AM, Jukka Zitting jukka.zitt...@gmail.com
 wrote:

  Hi,
 
  On Fri, Dec 6, 2013 at 1:25 AM, Ian Boston i...@tfd.co.uk wrote:
   In Oak when an index is stored in the repository, how is it updated
   when the repository is MongoDB backed and there are multiple JVM
   processes connected to the MongoDB ?
 
  That depends on the index and MK implementations. For example the
  PropertyIndex uses an index structure that can be updated concurrently
  when the updates affect different areas of the content repository.
  When using the MongoMK backend concurrent updates to the same nodes
  will automatically be synchronized, and with the SegmentMK (which also
  can be used with MongoDB) all commits against the same journal are
  synchronized. In both cases concurrent updates will automatically get
  resolved.
 
   Also if using SolrCloud as a search index, is it possible to fall back
   to an internal repository stored index if the SolrCloud index
   becomes unavailable ?
 
  Yes. The query engine will automatically pick the best available index
  for each query execution. If a particular index is not available, then
  the second-best match for those queries that would have used it would
  automatically get picked.
 


 There is one minor nitpick with this statement.

 So far we've assumed that the solr index will be used for full-text queries
 only. the only fallback you could use if the solr index becomes unavailable
 is the lucene one, but as far as I know we've said that you would usually
 use one _or_ the other.
 Areas of concern are: the full-text indexing settings may differ, and the
 cost output may need to be tricked into treating the local lucene index as
 a fallback and not a competing full-text index.
 But this is definitely doable.


Good point Alex. We will probably have to write some tests that compare the
costs of different queries with one or more running indexes, so that we can
eventually tune the cost evaluation to work properly in the different
setups.

Tommaso





 
  BR,
 
  Jukka Zitting
 



Re: Question about MVCC with MongoMK.

2013-12-06 Thread Bertrand Delacretaz
Hi,

On Fri, Dec 6, 2013 at 11:36 AM, Jukka Zitting jukka.zitt...@gmail.com wrote:
 ...The main question here is whether we want to go down that path, as the
 feature is only available with the SegmentMK (at least for now) and
 we've generally wanted to avoid exposing such implementation-specific
 features to higher level code...

Agreed, OTOH the scenario that we're discussing here looks extremely
useful in clustered environments, where managing upgrades and
minimizing downtime is hard. I suspect Ian will agree that having this
in Oak would be very valuable, even if that requires using a specific
microkernel.

-Bertrand


jackrabbit-oak build #2913: Fixed

2013-12-06 Thread Travis CI
Build Update for apache/jackrabbit-oak
-

Build: #2913
Status: Fixed

Duration: 2099 seconds
Commit: a941ff82d2992190ce2af3c4de84fe8b69dda8a5 (trunk)
Author: Jukka Zitting
Message: OAK-17: Modularisation and configuration concept

Simplify SegmentNodeStoreService

git-svn-id: https://svn.apache.org/repos/asf/jackrabbit/oak/trunk@1548500 
13f79535-47bb-0310-9956-ffa450edef68

View the changeset: 
https://github.com/apache/jackrabbit-oak/compare/0534e919d87f...a941ff82d299

View the full build log and details: 
https://travis-ci.org/apache/jackrabbit-oak/builds/15035603

--
sent by Jukka's Travis notification gateway


Re: Question about Oak search/query.

2013-12-06 Thread Ian Boston
Hi,

On 6 December 2013 16:12, Tommaso Teofili tommaso.teof...@gmail.com wrote:
 Hi all,

 2013/12/6 Jukka Zitting jukka.zitt...@gmail.com

 Hi,

 On Thu, Dec 5, 2013 at 9:36 PM, Ian Boston i...@tfd.co.uk wrote:
  Will the search index contain access control information or will the
  search results be filtered as each result is retrieved ?

 The results will be filtered after the index lookup. It would be
 possible for a custom search index to do the access checks already
 when building/updating the index, but even in that case the query
 engine would still double-check the access rights (the benefit would
 be to avoid having to retrieve and then discard many inaccessible
 hits).


 by the way, probably there's room for some optimization, e.g. very simple
 idea: exclude paths at depth 1 (children of root node) the principal is not
 able to read (which may mean adding them to the query passed to the Index
 implementation), if any, then you'd always have to apply fine grained ACLs
 on the result but maybe excluding some branches from start may help.


Ok, thank you, it is as I thought. It may be possible to work around
it by adding some properties to make the result dense.




  If the number of terms in the query exceeds the number of terms
  supported by Solr, does Oak handle that transparently ?

 I'm not sure, you'll need to look at the oak-solr indexing code. Or
 perhaps Tommaso who wrote the code can chime in here.


 sure.
 What limitation are you exactly referring to? Is it the BooleanQuery max
 clause limit [1]?

Yes, I believe it's that limit.

Do you know how many that is in the version used by Oak ?


 Regards,
 Tommaso


 [1] :
 http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/search/BooleanQuery.TooManyClauses.html


 BR,

 Jukka Zitting



Re: Question: In Repository Index file.

2013-12-06 Thread Ian Boston
Hi,
Thanks all for the clarification. Good to know there is fallback.

If the Solr index is intended for full text, can it still be used to
build facets on a reasonably well defined set of properties ?

Best Regards
Ian

On 6 December 2013 16:20, Tommaso Teofili tommaso.teof...@gmail.com wrote:
 2013/12/6 Alex Parvulescu alex.parvule...@gmail.com

 Hi,

 On Fri, Dec 6, 2013 at 11:06 AM, Jukka Zitting jukka.zitt...@gmail.com
 wrote:

  Hi,
 
  On Fri, Dec 6, 2013 at 1:25 AM, Ian Boston i...@tfd.co.uk wrote:
   In Oak when an index is stored in the repository, how is it updated
   when the repository is MongoDB backed and there are multiple JVM
   processes connected to the MongoDB ?
 
  That depends on the index and MK implementations. For example the
  PropertyIndex uses an index structure that can be updated concurrently
  when the updates affect different areas of the content repository.
  When using the MongoMK backend concurrent updates to the same nodes
  will automatically be synchronized, and with the SegmentMK (which also
  can be used with MongoDB) all commits against the same journal are
  synchronized. In both cases concurrent updates will automatically get
  resolved.
 
   Also if using SolrCloud as a search index, is it possible to fall back
   to an internal repository stored index if the SolrCloud index
   becomes unavailable ?
 
  Yes. The query engine will automatically pick the best available index
  for each query execution. If a particular index is not available, then
  the second-best match for those queries that would have used it would
  automatically get picked.
 


 There is one minor nitpick with this statement.

 So far we've assumed that the solr index will be used for full-text queries
 only. the only fallback you could use if the solr index becomes unavailable
 is the lucene one, but as far as I know we've said that you would usually
 use one _or_ the other.
 Areas of concern are: the full-text indexing settings may differ, and the
 cost output may need to be tricked into treating the local lucene index as
 a fallback and not a competing full-text index.
 But this is definitely doable.


 good point Alex, and probably we may have to write some tests for the cost
 comparison for different queries with one or more running indexes to
 eventually tune the cost evaluation to work properly in the different
 setups.

 Tommaso





 
  BR,
 
  Jukka Zitting
 



Re: Question about MVCC with MongoMK.

2013-12-06 Thread Ian Boston
On 6 December 2013 16:44, Bertrand Delacretaz bdelacre...@apache.org wrote:
 Hi,

 On Fri, Dec 6, 2013 at 11:36 AM, Jukka Zitting jukka.zitt...@gmail.com 
 wrote:
 ...The main question here is whether we want to go down that path, as the
 feature is only available with the SegmentMK (at least for now) and
 we've generally wanted to avoid exposing such implementation-specific
 features to higher level code...

 Agreed, OTOH the scenario that we're discussing here looks extremely
 useful in clustered environments, where managing upgrades and
 minimizing downtime is hard. I suspect Ian will agree that having this
 in Oak would be very valuable, even if that requires using a specific
 microkernel.

Yes, very valuable indeed, and well worth doing IMHO (I'd be happy to help
if I am capable). I think that, subject to some experimentation, it will
bring upgrades on Oak to a new level, especially in large clusters.

Best Regards
Ian



 -Bertrand


Re: Question about Oak search/query.

2013-12-06 Thread Tommaso Teofili
Hi,

2013/12/6 Ian Boston i...@tfd.co.uk

 Hi,

 On 6 December 2013 16:12, Tommaso Teofili tommaso.teof...@gmail.com
 wrote:
  Hi all,
 
  2013/12/6 Jukka Zitting jukka.zitt...@gmail.com
 
  Hi,
 
  On Thu, Dec 5, 2013 at 9:36 PM, Ian Boston i...@tfd.co.uk wrote:
   Will the search index contain access control information or will the
   search results be filtered as each result is retrieved ?
 
  The results will be filtered after the index lookup. It would be
  possible for a custom search index to do the access checks already
  when building/updating the index, but even in that case the query
  engine would still double-check the access rights (the benefit would
  be to avoid having to retrieve and then discard many inaccessible
  hits).
 
 
  by the way, probably there's room for some optimization, e.g. very simple
  idea: exclude paths at depth 1 (children of root node) the principal is
 not
  able to read (which may mean adding them to the query passed to the Index
  implementation), if any, then you'd always have to apply fine grained
 ACLs
  on the result but maybe excluding some branches from start may help.


 Ok, thank you, it is as I thought. It may be possible to work around
 it by adding some properties to make the result dense.

 
 
 
   If the number of terms in the query exceeds the number of terms
   supported by Solr, does Oak handle that transparently ?
 
  I'm not sure, you'll need to look at the oak-solr indexing code. Or
  perhaps Tommaso who wrote the code can chime in here.
 
 
  sure.
  What limitation are you exactly referring to? Is it the BooleanQuery max
  clause limit [1]?

 Yes, I believe it's that limit.

 Do you know how many that is in the version used by Oak ?


At the moment Oak declares the Solr dependency with scope provided, at
version 4.1.0 (which uses Lucene 4.1.0), so one could use any version from
4.1.x to the latest (4.6.0 right now).
The default limit is 1024 clauses.
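
One way a caller could stay under such a limit is to split a long list of terms into batches and run one query per batch. This is only a sketch of that workaround and says nothing about what the oak-solr code actually does; `chunk_terms` and `MAX_CLAUSES` are names invented for the example, with 1024 taken from the default mentioned above.

```python
MAX_CLAUSES = 1024  # default BooleanQuery clause limit mentioned above


def chunk_terms(terms, max_clauses=MAX_CLAUSES):
    """Split terms into consecutive batches of at most max_clauses items."""
    if max_clauses < 1:
        raise ValueError("max_clauses must be positive")
    return [terms[i:i + max_clauses]
            for i in range(0, len(terms), max_clauses)]


# 2500 hypothetical terms end up in three batches.
terms = [f"term{i}" for i in range(2500)]
batches = chunk_terms(terms)
batch_sizes = [len(b) for b in batches]
```

Each batch would become its own OR query, with the results merged by the caller. That only works for disjunctions; a query that needs all terms at once would need a different approach.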

Regards,
Tommaso



 
  Regards,
  Tommaso
 
 
  [1] :
 
 http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/search/BooleanQuery.TooManyClauses.html
 
 
  BR,
 
  Jukka Zitting
 



Re: Question about Oak search/query.

2013-12-06 Thread Ian Boston
On Friday, December 6, 2013, Tommaso Teofili wrote:

 Hi,

 2013/12/6 Ian Boston i...@tfd.co.uk

  Hi,
 
  On 6 December 2013 16:12, Tommaso Teofili tommaso.teof...@gmail.com
  wrote:
   Hi all,
  
   2013/12/6 Jukka Zitting jukka.zitt...@gmail.com
  
   Hi,
  
   On Thu, Dec 5, 2013 at 9:36 PM, Ian Boston i...@tfd.co.uk wrote:
Will the search index contain access control information or will the
search results be filtered as each result is retrieved ?
  
   The results will be filtered after the index lookup. It would be
   possible for a custom search index to do the access checks already
   when building/updating the index, but even in that case the query
   engine would still double-check the access rights (the benefit would
   be to avoid having to retrieve and then discard many inaccessible
   hits).
  
  
   by the way, probably there's room for some optimization, e.g. very
 simple
   idea: exclude paths at depth 1 (children of root node) the principal is
  not
   able to read (which may mean adding them to the query passed to the
 Index
   implementation), if any, then you'd always have to apply fine grained
  ACLs
   on the result but maybe excluding some branches from start may help.
 
 
  Ok, thank you, it is as I thought. It may be possible to work around
  it by adding some properties to make the result dense.
 
  
  
  
If the number of terms in the query exceeds the number of terms
supported by Solr, does Oak handle that transparently ?
  
   I'm not sure, you'll need to look at the oak-solr indexing code. Or
   perhaps Tommaso who wrote the code can chime in here.
  
  
   sure.
   What limitation are you exactly referring to? Is it the BooleanQuery
 max
   clause limit [1]?
 
  Yes, I believe it's that limit.
 
  Do you know how many that is in the version used by Oak ?
 

 At the moment Oak has the Solr dependency with scope provided, version
 4.1.0 (which uses Lucene 4.1.0) so that one could use from 4.1.x to the
 latest (4.6.0 right now).
 Default is 1024.


Hi,

Thank you. All questions on this thread answered.

Best regards
Ian





 Regards,
 Tommaso


 
  
   Regards,
   Tommaso
  
  
   [1] :
  
 
 http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/search/BooleanQuery.TooManyClauses.html
  
  
   BR,
  
   Jukka Zitting
  
 



Re: Question: In Repository Index file.

2013-12-06 Thread Ian Boston
On Friday, December 6, 2013, Alex Parvulescu wrote:

 if we are being technical :)

 the property index already has all the info you'd need, just list all the
 keys and you get the facets (not so easy for item counts for one facet
 though).
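
A toy picture of "list the keys and you get the facets": the sketch below models a property index as a mapping from property value to the set of paths holding that value. This layout and the names `facet_values` and `facet_counts` are assumptions for illustration, not the real PropertyIndex storage format.

```python
def facet_values(property_index):
    """The facet values are simply the keys of the index."""
    return sorted(property_index)


def facet_counts(property_index):
    """Counting means visiting every entry under each key."""
    return {value: len(paths) for value, paths in property_index.items()}


# Hypothetical index on a "color" property.
color_index = {
    "red": {"/content/a", "/content/b"},
    "blue": {"/content/c"},
}
values = facet_values(color_index)
counts = facet_counts(color_index)
```

In this in-memory model the counts look cheap, but in a stored index they would require walking all entries below each key, which is presumably why item counts are described as "not so easy".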



Do I understand correctly?
The facet support [1] in the Solr API is not exposed except by going
directly to the Solr API?

Best regards
Ian

1. http://wiki.apache.org/solr/SolrFacetingOverview










 On Fri, Dec 6, 2013 at 4:58 PM, Tommaso Teofili
 tommaso.teof...@gmail.com wrote:

  Hi,
 
  2013/12/6 Ian Boston i...@tfd.co.uk
 
   Hi,
   Thanks all for the clarification. Good to know there is fallback.
  
   If the Solr index is intended for full text, can it still be used to
   build facets on a reasonably well defined set of properties ?
  
 
  technically speaking of course, we may also support facets for the Lucene
  index [1].
  What I wonder is if / how we could expose them on the JCR API level.
  Any idea?
 
  Regards,
  Tommaso
 
  [1] :
 
 
 http://lucene.apache.org/core/4_0_0/facet/org/apache/lucene/facet/doc-files/userguide.html
 
 
  
   Best Regards
   Ian
  
   On 6 December 2013 16:20, Tommaso Teofili tommaso.teof...@gmail.com
   wrote:
2013/12/6 Alex Parvulescu alex.parvule...@gmail.com
   
Hi,
   
On Fri, Dec 6, 2013 at 11:06 AM, Jukka Zitting 
  jukka.zitt...@gmail.com
wrote:
   
 Hi,

 On Fri, Dec 6, 2013 at 1:25 AM, Ian Boston i...@tfd.co.uk wrote:
  In Oak when an index is stored in the repository, how is it updated
  when the repository is MongoDB backed and there are multiple JVM
  processes connected to the MongoDB ?

 That depends on the index and MK implementations. For example the
 PropertyIndex uses an index structure that can be updated
  concurrently
 when the updates affect different areas of the content repository.
 When using the MongoMK backend concurrent updates to the same
 nodes
 will automatically be synchronized, and with the SegmentMK (which
  also
 can be used with MongoDB) all commits against the same journal are
 synchronized. In both cases concurrent updates will automatically
  get
 resolved.

  Also if using SolrCloud as a search index, is it possible to fall back
  to an internal repository stored index if the SolrCloud index
  becomes unavailable ?

 Yes. The query engine will automatically pick the best available
  index
 for each query execution. If a particular index is not available,
  then
 the second-best match for those queries that would have used it
  would
 automatically get picked.

   
   
There is one minor nitpick with this statement.
   
So far we've assumed that the solr index will be used for full-text
   queries
only. the only fallback you could use if the solr index becomes
   unavailable
is the lucene one, but as far as I know we've said that you would
   usually
use one _or_ the other.
Areas of concern are: the full-text indexing settings may differ,
 and
   the
cost output may need to be tricked into treating the local lucene
  index
   as
a fallback and not a competing full-text index.
But this is definitely doable.
   
   
good point Alex, and probably we may have to write some tests for the
   cost
comparison for different queries with one or more running indexes to
eventually tune the cost evaluation to work properly in the differ