date:20131206

Re: Question about Oak search/query.

2013-12-06 Thread Jukka Zitting

Hi,

On Thu, Dec 5, 2013 at 9:36 PM, Ian Boston i...@tfd.co.uk wrote:
 Will the search index contain access control information or will the
 search results be filtered as each result is retrieved ?

The results will be filtered after the index lookup. It would be
possible for a custom search index to do the access checks already
when building/updating the index, but even in that case the query
engine would still double-check the access rights (the benefit would
be to avoid having to retrieve and then discard many inaccessible
hits).
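
To illustrate the post-filtering described above, here is a minimal sketch in plain Java. It does not use Oak's actual API: the class name, the `filterReadable` method and the `canRead` predicate are all hypothetical stand-ins for "the index returns candidate paths, and an access check is applied to each hit afterwards".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class PostFilterSketch {

    /**
     * Keeps only the index hits that the current session may read.
     * The index lookup has already happened; the access check runs afterwards.
     */
    public static List<String> filterReadable(List<String> indexHits,
                                              Predicate<String> canRead) {
        List<String> visible = new ArrayList<>();
        for (String path : indexHits) {
            // Hypothetical per-path access check (assumption, not an Oak call).
            if (canRead.test(path)) {
                visible.add(path);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<String> hits = List.of("/content/a", "/secret/b", "/content/c");
        // Toy rule for the example: nothing under /secret is readable.
        Predicate<String> canRead = p -> !p.startsWith("/secret/");
        System.out.println(filterReadable(hits, canRead));
    }
}
```

The cost of this approach is visible in the loop: every inaccessible hit is still retrieved and then discarded, which is the overhead an access-aware index could avoid.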

 If the number of terms in the query exceeds the number of terms
 supported by Solr, does Oak handle that transparently?

I'm not sure, you'll need to look at the oak-solr indexing code. Or
perhaps Tommaso who wrote the code can chime in here.

BR,

Jukka Zitting

Re: Question about MVCC with MongoMK.

2013-12-06 Thread Jukka Zitting

Hi,

On Thu, Dec 5, 2013 at 9:43 PM, Ian Boston i...@tfd.co.uk wrote:
 Is it possible to branch an Oak repository and maintain a detached
 root node for a period of time that one or more Oak instances attached
 to a MongoDB instance can follow for a short period of time before
 merging the branch back into the main tree ?

The SegmentMK (with the MongoDB backend) can do this using the
hierarchical journal feature. The SegmentMK maintains one or more
journals that each track the evolution of a particular branch of the
repository. These branches would normally be automatically merged back
to the root journal, but a particular deployment could easily
disable automatic merging for a particular branch and use it for a
purpose like the one you described.

BR,

Jukka Zitting

Re: Question about MVCC with MongoMK.

2013-12-06 Thread Bertrand Delacretaz

Hi,

On Fri, Dec 6, 2013 at 11:12 AM, Jukka Zitting jukka.zitt...@gmail.com wrote:
 ...The SegmentMK (with the MongoDB backend) can do this using the
 hierarchical journal feature. The SegmentMK maintains one or more
 journals that each track the evolution of a particular branch of the
 repository. These branches would normally be automatically merged back
 to the root journal, but a particular deployment could easily
 disable automatic merging for a particular branch and use it for a
 purpose like the one you described

Is there a way to tell the repository to start operating on such a
branch forever for a given client?

If I understand Ian's scenario correctly, an application instance would
tell Oak to create a BEFORE_UPGRADE branch and start working on that from
now on, so that the content can be upgraded in the background and tested
on other application instances, before eventually merging the
BEFORE_UPGRADE branch back.

Is that possible today, or reasonably simple to implement? If yes that
would enable such a scenario with minimal application changes, which
sounds extremely useful.

-Bertrand

Re: Question about MVCC with MongoMK.

2013-12-06 Thread Jukka Zitting

Hi,

On Fri, Dec 6, 2013 at 5:26 AM, Bertrand Delacretaz
bdelacre...@apache.org wrote:
 Is that possible today, or reasonably simple to implement? If yes that
 would enable such a scenario with minimal application changes, which
 sounds extremely useful.

It's not available yet, but shouldn't be too difficult to implement.

The main question here is whether we want to go down that path, as the
feature is only available with the SegmentMK (at least for now) and
we've generally wanted to avoid exposing such implementation-specific
features to higher level code.

BR,

Jukka Zitting

Re: Question about Oak search/query.

2013-12-06 Thread Tommaso Teofili

Hi all,

2013/12/6 Jukka Zitting jukka.zitt...@gmail.com

 Hi,

 On Thu, Dec 5, 2013 at 9:36 PM, Ian Boston i...@tfd.co.uk wrote:
  Will the search index contain access control information or will the
  search results be filtered as each result is retrieved ?

 The results will be filtered after the index lookup. It would be
 possible for a custom search index to do the access checks already
 when building/updating the index, but even in that case the query
 engine would still double-check the access rights (the benefit would
 be to avoid having to retrieve and then discard many inaccessible
 hits).


By the way, there is probably room for some optimization. A very simple
idea: exclude the paths at depth 1 (children of the root node) that the
principal is not able to read, if any (which may mean adding them to the
query passed to the Index implementation). You'd still always have to apply
fine-grained ACLs on the result, but excluding some branches from the start
may help.
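
A minimal sketch of that idea in plain Java follows. The class and method names are hypothetical and not part of Oak, and the sketch assumes that an unreadable top-level node implies an unreadable subtree. With fine-grained ACLs that assumption need not hold (a child may be readable even if its parent is not), so treat this as an illustration only.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class TopLevelExclusionSketch {

    /** Returns the depth-1 paths (children of the root) the principal cannot read. */
    public static List<String> unreadableTopLevel(List<String> rootChildren,
                                                  Predicate<String> canRead) {
        return rootChildren.stream()
                .map(name -> "/" + name)
                .filter(path -> !canRead.test(path)) // hypothetical access check
                .collect(Collectors.toList());
    }

    /** True if the hit lies at or below one of the excluded top-level paths. */
    public static boolean isExcluded(String hit, List<String> excluded) {
        for (String prefix : excluded) {
            // Match the node itself or a descendant, but not a sibling such as /secrets.
            if (hit.equals(prefix) || hit.startsWith(prefix + "/")) {
                return true;
            }
        }
        return false;
    }
}
```

The excluded list would be computed once per query and could be passed to the index as a path restriction; the per-hit ACL check still runs on whatever the index returns.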



  If the number of terms in the query exceeds the number of terms
  supported by Solr, does the Oak handle that transparently ?

 I'm not sure, you'll need to look at the oak-solr indexing code. Or
 perhaps Tommaso who wrote the code can chime in here.


Sure.
Which limitation exactly are you referring to? Is it the BooleanQuery max
clause limit [1]?

Regards,
Tommaso


[1] :
http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/search/BooleanQuery.TooManyClauses.html


 BR,

 Jukka Zitting

Re: Question: In Repository Index file.

2013-12-06 Thread Alex Parvulescu

Hi,

On Fri, Dec 6, 2013 at 11:06 AM, Jukka Zitting jukka.zitt...@gmail.com wrote:

 Hi,

 On Fri, Dec 6, 2013 at 1:25 AM, Ian Boston i...@tfd.co.uk wrote:
  In Oak when an index is stored in the repository, how is it updated
  when the repository is MongoDB backed and there are multiple JVM
  processes connected to the MongoDB ?

 That depends on the index and MK implementations. For example the
 PropertyIndex uses an index structure that can be updated concurrently
 when the updates affect different areas of the content repository.
 When using the MongoMK backend concurrent updates to the same nodes
 will automatically be synchronized, and with the SegmentMK (which also
 can be used with MongoDB) all commits against the same journal are
 synchronized. In both cases concurrent updates will automatically get
 resolved.

  Also if using SolrCloud as a search index, is it possible to fallback
  to an internal repository stored index if the SolrCloud index
  becomes unavailable ?

 Yes. The query engine will automatically pick the best available index
 for each query execution. If a particular index is not available, then
 the second-best match for those queries that would have used it would
 automatically get picked.



There is one minor nitpick with this statement.

So far we've assumed that the Solr index will be used for full-text queries
only. The only fallback you could use if the Solr index becomes unavailable
is the Lucene one, but as far as I know we've said that you would usually
use one _or_ the other.
Areas of concern are: the full-text indexing settings may differ, and the
cost output may need to be tweaked so that the local Lucene index is treated
as a fallback and not as a competing full-text index.
But this is definitely doable.
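
The cost concern can be illustrated with a minimal sketch in plain Java. `IndexCandidate` and `pick` are hypothetical names, not Oak's index interfaces; the sketch only assumes that each index reports an estimated cost and that an unavailable index is skipped.

```java
import java.util.List;
import java.util.Optional;

public class IndexSelectionSketch {

    /** Hypothetical stand-in for an index; not Oak's actual index interface. */
    public static final class IndexCandidate {
        final String name;
        final double cost;        // estimated cost of answering the query
        final boolean available;  // false if, e.g., the remote server is unreachable

        public IndexCandidate(String name, double cost, boolean available) {
            this.name = name;
            this.cost = cost;
            this.available = available;
        }
    }

    /** Picks the cheapest available index, or empty if none is available. */
    public static Optional<String> pick(List<IndexCandidate> candidates) {
        IndexCandidate best = null;
        for (IndexCandidate c : candidates) {
            if (c.available && (best == null || c.cost < best.cost)) {
                best = c;
            }
        }
        return best == null ? Optional.empty() : Optional.of(best.name);
    }
}
```

In this model the local Lucene index acts as a fallback only if it consistently reports a higher cost than the Solr index; if both report similar costs they compete for the same full-text queries, which is the situation described above.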




 BR,

 Jukka Zitting

Re: Question: In Repository Index file.

2013-12-06 Thread Tommaso Teofili

2013/12/6 Alex Parvulescu alex.parvule...@gmail.com

 Hi,

 On Fri, Dec 6, 2013 at 11:06 AM, Jukka Zitting jukka.zitt...@gmail.com
 wrote:

  Hi,
 
  On Fri, Dec 6, 2013 at 1:25 AM, Ian Boston i...@tfd.co.uk wrote:
   In Oak when an index is stored in the repository, how is it updated
   when the repository is MongoDB backed and there are multiple JVM
   processes connected to the MongoDB ?
 
  That depends on the index and MK implementations. For example the
  PropertyIndex uses an index structure that can be updated concurrently
  when the updates affect different areas of the content repository.
  When using the MongoMK backend concurrent updates to the same nodes
  will automatically be synchronized, and with the SegmentMK (which also
  can be used with MongoDB) all commits against the same journal are
  synchronized. In both cases concurrent updates will automatically get
  resolved.
 
   Also if using SolrCloud as a search index, is it possible to fallback
   to an internal repository stored index if the SolrCloud index
   becomes unavailable ?
 
  Yes. The query engine will automatically pick the best available index
  for each query execution. If a particular index is not available, then
  the second-best match for those queries that would have used it would
  automatically get picked.
 


 There is one minor nitpick with this statement.

 So far we've assumed that the solr index will be used for full-text queries
 only. the only fallback you could use if the solr index becomes unavailable
 is the lucene one, but as far as I know we've said that you would usually
 use one _or_ the other.
 Areas of concern are: the full-text indexing settings may differ, and the
 cost output may need to be tricked into treating the local lucene index as
 a fallback and not a competing full-text index.
 But this is definitely doable.


Good point, Alex. We will probably have to write some tests that compare the
costs for different queries with one or more running indexes, so that we can
eventually tune the cost evaluation to work properly in the different
setups.

Tommaso





 
  BR,
 
  Jukka Zitting

Re: Question about MVCC with MongoMK.

2013-12-06 Thread Bertrand Delacretaz

Hi,

On Fri, Dec 6, 2013 at 11:36 AM, Jukka Zitting jukka.zitt...@gmail.com wrote:
 ...The main question here is whether we want to go down that path, as the
 feature is only available with the SegmentMK (at least for now) and
 we've generally wanted to avoid exposing such implementation-specific
 features to higher level code...

Agreed, OTOH the scenario that we're discussing here looks extremely
useful in clustered environments, where managing upgrades and
minimizing downtime is hard. I suspect Ian will agree that having this
in Oak would be very valuable, even if that requires using a specific
microkernel.

-Bertrand

jackrabbit-oak build #2913: Fixed

2013-12-06 Thread Travis CI

Build Update for apache/jackrabbit-oak
-

Build: #2913
Status: Fixed

Duration: 2099 seconds
Commit: a941ff82d2992190ce2af3c4de84fe8b69dda8a5 (trunk)
Author: Jukka Zitting
Message: OAK-17: Modularisation and configuration concept

Simplify SegmentNodeStoreService

git-svn-id: https://svn.apache.org/repos/asf/jackrabbit/oak/trunk@1548500 
13f79535-47bb-0310-9956-ffa450edef68

View the changeset: 
https://github.com/apache/jackrabbit-oak/compare/0534e919d87f...a941ff82d299

View the full build log and details: 
https://travis-ci.org/apache/jackrabbit-oak/builds/15035603

--
sent by Jukka's Travis notification gateway

Re: Question about Oak search/query.

2013-12-06 Thread Ian Boston

Hi,

On 6 December 2013 16:12, Tommaso Teofili tommaso.teof...@gmail.com wrote:
Hi all,

2013/12/6 Jukka Zitting jukka.zitt...@gmail.com

Hi,

On Thu, Dec 5, 2013 at 9:36 PM, Ian Boston i...@tfd.co.uk wrote:
Will the search index contain access control information or will the
search results be filtered as each result is retrieved ?

The results will be filtered after the index lookup. It would be
possible for a custom search index to do the access checks already
when building/updating the index, but even in that case the query
engine would still double-check the access rights (the benefit would
be to avoid having to retrieve and then discard many inaccessible
hits).

by the way, probably there's room for some optimization, e.g. very simple
idea: exclude paths at depth 1 (children of root node) the principal is not
able to read (which may mean adding them to the query passed to the Index
implementation), if any, then you'd always have to apply fine grained ACLs
on the result but maybe excluding some branches from start may help.

Ok, thank you, it is as I thought. It may be possible to work around
it by adding some properties to make the result dense.

If the number of terms in the query exceeds the number of terms
supported by Solr, does the Oak handle that transparently ?

I'm not sure, you'll need to look at the oak-solr indexing code. Or
perhaps Tommaso who wrote the code can chime in here.

sure.
What limitation are you exactly referring to? Is it the BooleanQuery max
clause limit [1]?

Yes, I believe it's that limit.

Do you know what the limit is in the version used by Oak?

Regards,
Tommaso

[1] :
http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/search/BooleanQuery.TooManyClauses.html

BR,

Jukka Zitting

Re: Question: In Repository Index file.

2013-12-06 Thread Ian Boston

Hi,
Thanks all for the clarification. Good to know there is a fallback.

If the Solr index is intended for full text, can it still be used to
build facets on a reasonably well defined set of properties ?

Best Regards
Ian

On 6 December 2013 16:20, Tommaso Teofili tommaso.teof...@gmail.com wrote:
 2013/12/6 Alex Parvulescu alex.parvule...@gmail.com

 Hi,

 On Fri, Dec 6, 2013 at 11:06 AM, Jukka Zitting jukka.zitt...@gmail.com
 wrote:

  Hi,
 
  On Fri, Dec 6, 2013 at 1:25 AM, Ian Boston i...@tfd.co.uk wrote:
   In Oak when an index is stored in the repository, how is it updated
   when the repository is MongoDB backed and there are multiple JVM
   processes connected to the MongoDB ?
 
  That depends on the index and MK implementations. For example the
  PropertyIndex uses an index structure that can be updated concurrently
  when the updates affect different areas of the content repository.
  When using the MongoMK backend concurrent updates to the same nodes
  will automatically be synchronized, and with the SegmentMK (which also
  can be used with MongoDB) all commits against the same journal are
  synchronized. In both cases concurrent updates will automatically get
  resolved.
 
   Also if using SolrCloud as a search index, is it possible to fallback
   to an internal repository stored index if the SolrCloud index
   becomes unavailable ?
 
  Yes. The query engine will automatically pick the best available index
  for each query execution. If a particular index is not available, then
  the second-best match for those queries that would have used it would
  automatically get picked.
 


 There is one minor nitpick with this statement.

 So far we've assumed that the solr index will be used for full-text queries
 only. the only fallback you could use if the solr index becomes unavailable
 is the lucene one, but as far as I know we've said that you would usually
 use one _or_ the other.
 Areas of concern are: the full-text indexing settings may differ, and the
 cost output may need to be tricked into treating the local lucene index as
 a fallback and not a competing full-text index.
 But this is definitely doable.


 good point Alex, and probably we may have to write some tests for the cost
 comparison for different queries with one or more running indexes to
 eventually tune the cost evaluation to work properly in the different
 setups.

 Tommaso





 
  BR,
 
  Jukka Zitting

Re: Question about MVCC with MongoMK.

2013-12-06 Thread Ian Boston

On 6 December 2013 16:44, Bertrand Delacretaz bdelacre...@apache.org wrote:
 Hi,

 On Fri, Dec 6, 2013 at 11:36 AM, Jukka Zitting jukka.zitt...@gmail.com 
 wrote:
 ...The main question here is whether we want to go down that path, as the
 feature is only available with the SegmentMK (at least for now) and
 we've generally wanted to avoid exposing such implementation-specific
 features to higher level code...

 Agreed, OTOH the scenario that we're discussing here looks extremely
 useful in clustered environments, where managing upgrades and
 minimizing downtime is hard. I suspect Ian will agree that having this
 in Oak would be very valuable, even if that requires using a specific
 microkernel.

Yes, very valuable indeed, and well worth doing (IMHO; I'd be happy to
help if I am capable). I think that, subject to some experimentation, it
will bring upgrades on Oak to a new level, especially in large clusters.

Best Regards
Ian



 -Bertrand

Re: Question about Oak search/query.

2013-12-06 Thread Tommaso Teofili

Hi,

2013/12/6 Ian Boston i...@tfd.co.uk

Hi,

On 6 December 2013 16:12, Tommaso Teofili tommaso.teof...@gmail.com
wrote:
Hi all,

2013/12/6 Jukka Zitting jukka.zitt...@gmail.com

Hi,

On Thu, Dec 5, 2013 at 9:36 PM, Ian Boston i...@tfd.co.uk wrote:
Will the search index contain access control information or will the
search results be filtered as each result is retrieved ?

by the way, probably there's room for some optimization, e.g. very simple
idea: exclude paths at depth 1 (children of root node) the principal is
not
able to read (which may mean adding them to the query passed to the Index
implementation), if any, then you'd always have to apply fine grained
ACLs
on the result but maybe excluding some branches from start may help.

Ok, thank you, it is as I thought. It may be possible to work around
it by adding some properties to make the result dense.

If the number of terms in the query exceeds the number of terms
supported by Solr, does the Oak handle that transparently ?

I'm not sure, you'll need to look at the oak-solr indexing code. Or
perhaps Tommaso who wrote the code can chime in here.

sure.
What limitation are you exactly referring to? Is it the BooleanQuery max
clause limit [1]?

Yes, I believe it's that limit.

Do you know how many that is in the version used by Oak ?

At the moment Oak declares the Solr dependency with scope provided, at
version 4.1.0 (which uses Lucene 4.1.0), so that one could use any version
from 4.1.x to the latest (4.6.0 right now).
The default limit is 1024 clauses.
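
As a rough illustration of what the 1024 default means for a caller, here is a minimal sketch in plain Java that splits a long list of terms into batches that each stay within the clause limit. This is an assumption about how a client could work around the limit, not a description of what oak-solr does; the class and method names are hypothetical. (In Lucene itself the limit can also be raised with `BooleanQuery.setMaxClauseCount`.)

```java
import java.util.ArrayList;
import java.util.List;

public class ClauseBatchSketch {

    /** Default BooleanQuery clause limit mentioned in this thread. */
    public static final int DEFAULT_MAX_CLAUSES = 1024;

    /** Splits terms into consecutive batches of at most maxClauses entries. */
    public static List<List<String>> batches(List<String> terms, int maxClauses) {
        if (maxClauses <= 0) {
            throw new IllegalArgumentException("maxClauses must be positive");
        }
        List<List<String>> result = new ArrayList<>();
        for (int start = 0; start < terms.size(); start += maxClauses) {
            int end = Math.min(start + maxClauses, terms.size());
            // Copy the slice so each batch is independent of the input list.
            result.add(new ArrayList<>(terms.subList(start, end)));
        }
        return result;
    }
}
```

Each batch would then be sent as its own query and the results merged by the caller, at the price of several round trips instead of one.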

Regards,
Tommaso

[1] :

http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/search/BooleanQuery.TooManyClauses.html

BR,

Jukka Zitting

Re: Question about Oak search/query.

2013-12-06 Thread Ian Boston

On Friday, December 6, 2013, Tommaso Teofili wrote:

Hi,

2013/12/6 Ian Boston i...@tfd.co.uk

Hi,

On 6 December 2013 16:12, Tommaso Teofili tommaso.teof...@gmail.com
wrote:
Hi all,

2013/12/6 Jukka Zitting jukka.zitt...@gmail.com

Hi,

On Thu, Dec 5, 2013 at 9:36 PM, Ian Boston i...@tfd.co.uk wrote:
Will the search index contain access control information or will the
search results be filtered as each result is retrieved ?

by the way, probably there's room for some optimization, e.g. very
simple
idea: exclude paths at depth 1 (children of root node) the principal is
not
able to read (which may mean adding them to the query passed to the
Index
implementation), if any, then you'd always have to apply fine grained
ACLs
on the result but maybe excluding some branches from start may help.

Ok, thank you, it is as I thought. It may be possible to work around
it by adding some properties to make the result dense.

If the number of terms in the query exceeds the number of terms
supported by Solr, does the Oak handle that transparently ?

I'm not sure, you'll need to look at the oak-solr indexing code. Or
perhaps Tommaso who wrote the code can chime in here.

sure.
What limitation are you exactly referring to? Is it the BooleanQuery
max
clause limit [1]?

Yes, I believe it's that limit.

Do you know how many that is in the version used by Oak ?

At the moment Oak has the Solr dependency with scope provided, version
4.1.0 (which uses Lucene 4.1.0) so that one could use from 4.1.x to the
latest (4.6.0 right now).
Default is 1024.

Hi,

Thank you. All questions on this thread answered.

Best regards
Ian

Regards,
Tommaso

[1] :

http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/search/BooleanQuery.TooManyClauses.html

BR,

Jukka Zitting

Re: Question: In Repository Index file.

2013-12-06 Thread Ian Boston

On Friday, December 6, 2013, Alex Parvulescu wrote:

If we are being technical :)

The property index already has all the info you'd need: just list all the
keys and you get the facets (item counts for a single facet are not so
easy, though).

Do I understand correctly that the facet support [1] in the Solr API is not
exposed except by going directly to the Solr API?

Best regards
Ian

1. http://wiki.apache.org/solr/SolrFacetingOverview

On Fri, Dec 6, 2013 at 4:58 PM, Tommaso Teofili
tommaso.teof...@gmail.com wrote:

Hi,

2013/12/6 Ian Boston i...@tfd.co.uk

Hi,
Thanks all for the clarification. Good to know there is fallback.

If the Solr index is intended for full text, can it still be used to
build facets on a reasonably well defined set of properties ?

Technically speaking, of course; we may also support facets for the Lucene
index [1].
What I wonder is whether and how we could expose them at the JCR API level.
Any ideas?

Regards,
Tommaso

[1] :

http://lucene.apache.org/core/4_0_0/facet/org/apache/lucene/facet/doc-files/userguide.html

Best Regards
Ian

On 6 December 2013 16:20, Tommaso Teofili tommaso.teof...@gmail.com
wrote:
2013/12/6 Alex Parvulescu alex.parvule...@gmail.com

Hi,

On Fri, Dec 6, 2013 at 11:06 AM, Jukka Zitting
jukka.zitt...@gmail.com
wrote:

Hi,

On Fri, Dec 6, 2013 at 1:25 AM, Ian Boston i...@tfd.co.uk wrote:
In Oak when an index is stored in the repository, how is it
updated
when the repository is MongoDB backed and there are multiple JVM
processes connected to the MongoDB ?

That depends on the index and MK implementations. For example the
PropertyIndex uses an index structure that can be updated
concurrently
when the updates affect different areas of the content repository.
When using the MongoMK backend concurrent updates to the same
nodes
will automatically be synchronized, and with the SegmentMK (which
also
can be used with MongoDB) all commits against the same journal are
synchronized. In both cases concurrent updates will automatically
get
resolved.

Also if using SolrCloud as a search index, is it possible to
fallback
to an internal repository stored index if the SolrCloud
index
becomes unavailable ?

Yes. The query engine will automatically pick the best available
index
for each query execution. If a particular index is not available,
then
the second-best match for those queries that would have used it
would
automatically get picked.

There is one minor nitpick with this statement.

So far we've assumed that the solr index will be used for full-text
queries
only. the only fallback you could use if the solr index becomes
unavailable
is the lucene one, but as far as I know we've said that you would
usually
use one _or_ the other.
Areas of concern are: the full-text indexing settings may differ,
and
the
cost output may need to be tricked into treating the local lucene
index
as
a fallback and not a competing full-text index.
But this is definitely doable.

good point Alex, and probably we may have to write some tests for the
cost
comparison for different queries with one or more running indexes to
eventually tune the cost evaluation to work properly in the differ
