Re: Question about Oak search/query.
Hi, On Thu, Dec 5, 2013 at 9:36 PM, Ian Boston i...@tfd.co.uk wrote: Will the search index contain access control information or will the search results be filtered as each result is retrieved ? The results will be filtered after the index lookup. It would be possible for a custom search index to do the access checks already when building/updating the index, but even in that case the query engine would still double-check the access rights (the benefit would be to avoid having to retrieve and then discard many inaccessible hits). If the number of terms in the query exceeds the number of terms supported by Solr, does the Oak handle that transparently ? I'm not sure, you'll need to look at the oak-solr indexing code. Or perhaps Tommaso who wrote the code can chime in here. BR, Jukka Zitting
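To make the "filter after the index lookup" behaviour above concrete, here is a minimal sketch. It is a toy model only: the class name, the method name and the `canRead` predicate are assumptions for illustration and are not Oak's actual query engine API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Toy model of post-filtering index hits by read access; not Oak's actual API. */
final class PostFilterSketch {

    /** Keeps only the hits the current session may read, preserving index order. */
    static List<String> filterReadable(List<String> indexHits, Predicate<String> canRead) {
        List<String> visible = new ArrayList<>();
        for (String path : indexHits) {
            // The access check runs per hit, after the index has already returned it.
            if (canRead.test(path)) {
                visible.add(path);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("/content/a", "/private/b", "/content/c");
        // Hypothetical rule: this session can read everything except /private.
        Predicate<String> canRead = path -> !path.startsWith("/private/");
        System.out.println(filterReadable(hits, canRead));
    }
}
```

The cost mentioned in the reply shows up here as the loop itself: every inaccessible hit is still retrieved from the index before it is discarded.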
Re: Question about MVCC with MongoMK.
Hi, On Thu, Dec 5, 2013 at 9:43 PM, Ian Boston i...@tfd.co.uk wrote: Is it possible to branch an Oak repository and maintain a detached root node for a period of time that one or more Oak instances attached to a MongoDB instance can follow for a short period of time before merging the branch back into the main tree ? The SegmentMK (with the MongoDB backend) can do this using the hierarchical journal feature. The SegmentMK maintains one or more journals that each track the evolution of a particular branch of the repository. These branches would normally be automatically merged back to the root journal, but a particular deployment could easily disable automatic merging for a particular branch and use it for a purpose like the one you described. BR, Jukka Zitting
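The journal idea described above could be pictured roughly as follows. This is a deliberately simplified toy model, not SegmentMK code: the `Journal` class, the `autoMerge` flag and the `commit`/`merge` methods are all hypothetical names, and a real implementation would track revisions and resolve conflicts rather than copy strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a journal hierarchy; names and behaviour are illustrative only. */
final class JournalSketch {

    static final class Journal {
        final String name;
        final Journal parent;      // null for the root journal
        final boolean autoMerge;   // hypothetical per-journal switch
        final List<String> changes = new ArrayList<>();          // changes visible in this journal
        private final List<String> pending = new ArrayList<>();  // changes not yet merged upwards

        Journal(String name, Journal parent, boolean autoMerge) {
            this.name = name;
            this.parent = parent;
            this.autoMerge = autoMerge;
        }

        /** Records a change here and, if auto-merging, pushes it to the parent at once. */
        void commit(String change) {
            changes.add(change);
            if (parent != null) {
                pending.add(change);
                if (autoMerge) {
                    merge();
                }
            }
        }

        /** Pushes all pending changes to the parent journal. */
        void merge() {
            if (parent == null) {
                return;
            }
            for (String change : pending) {
                parent.commit(change);
            }
            pending.clear();
        }
    }

    public static void main(String[] args) {
        Journal root = new Journal("root", null, true);
        Journal upgrade = new Journal("upgrade", root, false); // automatic merging disabled
        upgrade.commit("migrate /content");
        System.out.println(root.changes);   // the root has not seen the change yet
        upgrade.merge();
        System.out.println(root.changes);   // now it has
    }
}
```

With `autoMerge` set to false, the branch keeps accumulating changes that the root journal does not see until `merge()` is called explicitly, which is the detached-branch behaviour the question asks about.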
Re: Question about MVCC with MongoMK.
Hi, On Fri, Dec 6, 2013 at 11:12 AM, Jukka Zitting jukka.zitt...@gmail.com wrote: ...The SegmentMK (with the MongoDB backend) can do this using the hierarchical journal feature. The SegmentMK maintains one or more journals that each track the evolution of a particular branch of the repository. These branches would normally be automatically merged back to the root journal, but a particular deployment could easily disable automatic merging for a particular branch and use it for a purpose like the one you described. Is there a way to tell the repository to start operating on such a branch indefinitely for a given client? If I understand Ian's scenario correctly, an application instance would tell Oak to create a BEFORE_UPGRADE branch and work on that branch from then on, so that the content can be upgraded in the background and tested on other application instances before the BEFORE_UPGRADE branch is eventually merged back. Is that possible today, or reasonably simple to implement? If yes, that would enable such a scenario with minimal application changes, which sounds extremely useful. -Bertrand
Re: Question about MVCC with MongoMK.
Hi, On Fri, Dec 6, 2013 at 5:26 AM, Bertrand Delacretaz bdelacre...@apache.org wrote: Is that possible today, or reasonably simple to implement? If yes that would enable such a scenario with minimal application changes, which sounds extremely useful. It's not available yet, but shouldn't be too difficult to implement. The main question here is whether we want to go down that path, as the feature is only available with the SegmentMK (at least for now) and we've generally wanted to avoid exposing such implementation-specific features to higher level code. BR, Jukka Zitting
Re: Question about Oak search/query.
Hi all, 2013/12/6 Jukka Zitting jukka.zitt...@gmail.com Hi, On Thu, Dec 5, 2013 at 9:36 PM, Ian Boston i...@tfd.co.uk wrote: Will the search index contain access control information or will the search results be filtered as each result is retrieved? The results will be filtered after the index lookup. It would be possible for a custom search index to do the access checks already when building/updating the index, but even in that case the query engine would still double-check the access rights (the benefit would be to avoid having to retrieve and then discard many inaccessible hits). By the way, there's probably room for some optimization. A very simple idea: exclude any paths at depth 1 (children of the root node) that the principal is not able to read, which may mean adding them to the query passed to the index implementation. You'd still always have to apply fine-grained ACLs to the results, but excluding some branches from the start may help. If the number of terms in the query exceeds the number of terms supported by Solr, does Oak handle that transparently? I'm not sure, you'll need to look at the oak-solr indexing code. Or perhaps Tommaso who wrote the code can chime in here. Sure. Which limitation exactly are you referring to? Is it the BooleanQuery max clause limit [1]? Regards, Tommaso [1] : http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/search/BooleanQuery.TooManyClauses.html BR, Jukka Zitting
Re: Question: In Repository Index file.
Hi, On Fri, Dec 6, 2013 at 11:06 AM, Jukka Zitting jukka.zitt...@gmail.com wrote: Hi, On Fri, Dec 6, 2013 at 1:25 AM, Ian Boston i...@tfd.co.uk wrote: In Oak when an index is stored in the repository, how is it updated when the repository is MongoDB backed and there are multiple JVM processes connected to the MongoDB? That depends on the index and MK implementations. For example the PropertyIndex uses an index structure that can be updated concurrently when the updates affect different areas of the content repository. When using the MongoMK backend concurrent updates to the same nodes will automatically be synchronized, and with the SegmentMK (which also can be used with MongoDB) all commits against the same journal are synchronized. In both cases concurrent updates will automatically get resolved. Also if using SolrCloud as a search index, is it possible to fall back to an internal repository stored index if the SolrCloud index becomes unavailable? Yes. The query engine will automatically pick the best available index for each query execution. If a particular index is not available, then the second-best match for those queries that would have used it would automatically get picked. There is one minor nitpick with this statement. So far we've assumed that the Solr index will be used for full-text queries only. The only fallback you could use if the Solr index becomes unavailable is the Lucene one, but as far as I know we've said that you would usually use one _or_ the other. Areas of concern are: the full-text indexing settings may differ, and the cost output may need to be tricked into treating the local Lucene index as a fallback and not a competing full-text index. But this is definitely doable. BR, Jukka Zitting
Re: Question: In Repository Index file.
2013/12/6 Alex Parvulescu alex.parvule...@gmail.com Hi, On Fri, Dec 6, 2013 at 11:06 AM, Jukka Zitting jukka.zitt...@gmail.com wrote: Hi, On Fri, Dec 6, 2013 at 1:25 AM, Ian Boston i...@tfd.co.uk wrote: In Oak when a index is stored in the repository, how is it updated when the repository is MongoDB backed and there are multiple JVM processes connected to the MongoDB ? That depends on the index and MK implementations. For example the PropertyIndex uses an index structure that can be updated concurrently when the updates affect different areas of the content repository. When using the MongoMK backend concurrent updates to the same nodes will automatically be synchronized, and with the SegmentMK (which also can be used with MongoDB) all commits against the same journal are synchronized. In both cases concurrent updates will automatically get resolved. Also if using SolrCloud as a search index, is it possible to fallback to an internal repository stored index if the the SolrCloud index becomes unavailable ? Yes. The query engine will automatically pick the best available index for each query execution. If a particular index is not available, then the second-best match for those queries that would have used it would automatically get picked. There is one minor nitpick with this statement. So far we've assumed that the solr index will be used for full-text queries only. the only fallback you could use if the solr index becomes unavailable is the lucene one, but as far as I know we've said that you would usually use one _or_ the other. Areas of concern are: the full-text indexing settings may differ, and the cost output may need to be tricked into treating the local lucene index as a fallback and not a competing full-text index. But this is definitely doable. 
Good point, Alex. We will probably have to write some tests comparing the costs for different queries with one or more indexes running, so that we can eventually tune the cost evaluation to work properly in the different setups. Tommaso BR, Jukka Zitting
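The selection and fallback behaviour discussed in this sub-thread can be sketched as "pick the cheapest index that is currently available". The sketch below is an assumption-laden toy: the `Index` class, its fields and the cost numbers are invented for illustration and do not mirror Oak's real index or cost interfaces.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Toy cost-based index selection with fallback; not Oak's actual query engine. */
final class IndexSelectionSketch {

    /** Hypothetical index descriptor: a name, an availability flag and an estimated cost. */
    static final class Index {
        final String name;
        final boolean available;
        final double cost;

        Index(String name, boolean available, double cost) {
            this.name = name;
            this.available = available;
            this.cost = cost;
        }
    }

    /** Returns the cheapest index that is currently available, if any. */
    static Optional<Index> pick(List<Index> candidates) {
        Index best = null;
        for (Index candidate : candidates) {
            if (!candidate.available) {
                continue; // an unavailable index never wins, whatever its cost
            }
            if (best == null || candidate.cost < best.cost) {
                best = candidate;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        // Made-up costs: the local index is given a higher cost so it acts as a fallback only.
        List<Index> solrUp = Arrays.asList(new Index("solr", true, 2.0), new Index("lucene", true, 5.0));
        List<Index> solrDown = Arrays.asList(new Index("solr", false, 2.0), new Index("lucene", true, 5.0));
        System.out.println(pick(solrUp).get().name);
        System.out.println(pick(solrDown).get().name);
    }
}
```

The caveat raised by Alex maps onto the cost values: if both full-text indexes reported similar costs they would compete rather than form a primary and a fallback, which is why the cost evaluation would need tuning and testing.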
Re: Question about MVCC with MongoMK.
Hi, On Fri, Dec 6, 2013 at 11:36 AM, Jukka Zitting jukka.zitt...@gmail.com wrote: ...The main question here is whether we want to go down that path, as the feature is only available with the SegmentMK (at least for now) and we've generally wanted to avoid exposing such implementation-specific features to higher level code... Agreed, OTOH the scenario that we're discussing here looks extremely useful in clustered environments, where managing upgrades and minimizing downtime is hard. I suspect Ian will agree that having this in Oak would be very valuable, even if that requires using a specific microkernel. -Bertrand
jackrabbit-oak build #2913: Fixed
Build Update for apache/jackrabbit-oak - Build: #2913 Status: Fixed Duration: 2099 seconds Commit: a941ff82d2992190ce2af3c4de84fe8b69dda8a5 (trunk) Author: Jukka Zitting Message: OAK-17: Modularisation and configuration concept Simplify SegmentNodeStoreService git-svn-id: https://svn.apache.org/repos/asf/jackrabbit/oak/trunk@1548500 13f79535-47bb-0310-9956-ffa450edef68 View the changeset: https://github.com/apache/jackrabbit-oak/compare/0534e919d87f...a941ff82d299 View the full build log and details: https://travis-ci.org/apache/jackrabbit-oak/builds/15035603 -- sent by Jukka's Travis notification gateway
Re: Question about Oak search/query.
Hi, On 6 December 2013 16:12, Tommaso Teofili tommaso.teof...@gmail.com wrote: Hi all, 2013/12/6 Jukka Zitting jukka.zitt...@gmail.com Hi, On Thu, Dec 5, 2013 at 9:36 PM, Ian Boston i...@tfd.co.uk wrote: Will the search index contain access control information or will the search results be filtered as each result is retrieved? The results will be filtered after the index lookup. It would be possible for a custom search index to do the access checks already when building/updating the index, but even in that case the query engine would still double-check the access rights (the benefit would be to avoid having to retrieve and then discard many inaccessible hits). by the way, probably there's room for some optimization, e.g. very simple idea: exclude paths at depth 1 (children of root node) the principal is not able to read (which may mean adding them to the query passed to the Index implementation), if any, then you'd always have to apply fine grained ACLs on the result but maybe excluding some branches from start may help. OK, thank you, it is as I thought. It may be possible to work around it by adding some properties to make the result dense. If the number of terms in the query exceeds the number of terms supported by Solr, does Oak handle that transparently? I'm not sure, you'll need to look at the oak-solr indexing code. Or perhaps Tommaso who wrote the code can chime in here. sure. What limitation are you exactly referring to? Is it the BooleanQuery max clause limit [1]? Yes, I believe it's that limit. Do you know what the limit is in the version used by Oak? Regards, Tommaso [1] : http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/search/BooleanQuery.TooManyClauses.html BR, Jukka Zitting
Re: Question: In Repository Index file.
Hi, Thanks all for the clarification. Good to know there is fallback. If the Solr index is intended for full text, can it still be used to build facets on a reasonably well defined set of properties ? Best Regards Ian On 6 December 2013 16:20, Tommaso Teofili tommaso.teof...@gmail.com wrote: 2013/12/6 Alex Parvulescu alex.parvule...@gmail.com Hi, On Fri, Dec 6, 2013 at 11:06 AM, Jukka Zitting jukka.zitt...@gmail.com wrote: Hi, On Fri, Dec 6, 2013 at 1:25 AM, Ian Boston i...@tfd.co.uk wrote: In Oak when a index is stored in the repository, how is it updated when the repository is MongoDB backed and there are multiple JVM processes connected to the MongoDB ? That depends on the index and MK implementations. For example the PropertyIndex uses an index structure that can be updated concurrently when the updates affect different areas of the content repository. When using the MongoMK backend concurrent updates to the same nodes will automatically be synchronized, and with the SegmentMK (which also can be used with MongoDB) all commits against the same journal are synchronized. In both cases concurrent updates will automatically get resolved. Also if using SolrCloud as a search index, is it possible to fallback to an internal repository stored index if the the SolrCloud index becomes unavailable ? Yes. The query engine will automatically pick the best available index for each query execution. If a particular index is not available, then the second-best match for those queries that would have used it would automatically get picked. There is one minor nitpick with this statement. So far we've assumed that the solr index will be used for full-text queries only. the only fallback you could use if the solr index becomes unavailable is the lucene one, but as far as I know we've said that you would usually use one _or_ the other. 
Areas of concern are: the full-text indexing settings may differ, and the cost output may need to be tricked into treating the local lucene index as a fallback and not a competing full-text index. But this is definitely doable. good point Alex, and probably we may have to write some tests for the cost comparison for different queries with one or more running indexes to eventually tune the cost evaluation to work properly in the different setups. Tommaso BR, Jukka Zitting
Re: Question about MVCC with MongoMK.
On 6 December 2013 16:44, Bertrand Delacretaz bdelacre...@apache.org wrote: Hi, On Fri, Dec 6, 2013 at 11:36 AM, Jukka Zitting jukka.zitt...@gmail.com wrote: ...The main question here is whether we want to go down that path, as the feature is only available with the SegmentMK (at least for now) and we've generally wanted to avoid exposing such implementation-specific features to higher level code... Agreed, OTOH the scenario that we're discussing here looks extremely useful in clustered environments, where managing upgrades and minimizing downtime is hard. I suspect Ian will agree that having this in Oak would be very valuable, even if that requires using a specific microkernel. Yes, very valuable indeed, and well worth doing (IMHO; I'd be happy to help if I am capable). I think, subject to some experimentation, it will bring upgrades on Oak to a new level, especially in large clusters. Best Regards Ian -Bertrand
Re: Question about Oak search/query.
Hi, 2013/12/6 Ian Boston i...@tfd.co.uk Hi, On 6 December 2013 16:12, Tommaso Teofili tommaso.teof...@gmail.com wrote: Hi all, 2013/12/6 Jukka Zitting jukka.zitt...@gmail.com Hi, On Thu, Dec 5, 2013 at 9:36 PM, Ian Boston i...@tfd.co.uk wrote: Will the search index contain access control information or will the search results be filtered as each result is retrieved ? The results will be filtered after the index lookup. It would be possible for a custom search index to do the access checks already when building/updating the index, but even in that case the query engine would still double-check the access rights (the benefit would be to avoid having to retrieve and then discard many inaccessible hits). by the way, probably there's room for some optimization, e.g. very simple idea: exclude paths at depth 1 (children of root node) the principle is not able to read (which may mean adding them to the query passed to the Index implementation), if any, then you'd always have to apply fine grained ACLs on the result but maybe excluding some branches from start may help. Ok, thank you, it is as I thought. It may be possible to work around it by adding some properties to make the result dense. If the number of terms in the query exceeds the number of terms supported by Solr, does the Oak handle that transparently ? I'm not sure, you'll need to look at the oak-solr indexing code. Or perhaps Tommaso who wrote the code can chime in here. sure. What limitation are you exactly referring to? Is it the BooleanQuery max clause limit [1]? Yes, I believe its that limit. Do you know how many that is in the version used by Oak ? At the moment Oak has the Solr dependency with scope provided, version 4.1.0 (which uses Lucene 4.1.0) so that one could use from 4.1.x to the latest (4.6.0 right now). Default is 1024. Regards, Tommaso Regards, Tommaso [1] : http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/search/BooleanQuery.TooManyClauses.html BR, Jukka Zitting
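Given the default limit of 1024 clauses mentioned above, one possible client-side workaround is to split a long term list into batches that each stay under the limit and to combine the results afterwards. The sketch below only shows the batching step. It is an assumption for illustration, not something oak-solr is known to do, and it only makes sense for OR-style term lists whose results can simply be unioned.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits a term list into batches that each fit under a maximum clause count. */
final class ClauseChunkSketch {

    /** Lucene's default BooleanQuery clause limit, as cited in the thread. */
    static final int DEFAULT_MAX_CLAUSES = 1024;

    static List<List<String>> chunk(List<String> terms, int maxClauses) {
        if (maxClauses <= 0) {
            throw new IllegalArgumentException("maxClauses must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int from = 0; from < terms.size(); from += maxClauses) {
            int to = Math.min(from + maxClauses, terms.size());
            // Copy the slice so each batch is independent of the input list.
            chunks.add(new ArrayList<>(terms.subList(from, to)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            terms.add("term" + i);
        }
        for (List<String> batch : chunk(terms, DEFAULT_MAX_CLAUSES)) {
            System.out.println(batch.size());
        }
    }
}
```

The alternative is to raise the limit on the server side: Lucene 4.x exposes it through the static `BooleanQuery.setMaxClauseCount(int)`, and Solr through the `maxBooleanClauses` setting in `solrconfig.xml`. Whether that is preferable to batching depends on the deployment.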
Re: Question about Oak search/query.
On Friday, December 6, 2013, Tommaso Teofili wrote: Hi, 2013/12/6 Ian Boston i...@tfd.co.uk Hi, On 6 December 2013 16:12, Tommaso Teofili tommaso.teof...@gmail.com wrote: Hi all, 2013/12/6 Jukka Zitting jukka.zitt...@gmail.com Hi, On Thu, Dec 5, 2013 at 9:36 PM, Ian Boston i...@tfd.co.uk wrote: Will the search index contain access control information or will the search results be filtered as each result is retrieved? The results will be filtered after the index lookup. It would be possible for a custom search index to do the access checks already when building/updating the index, but even in that case the query engine would still double-check the access rights (the benefit would be to avoid having to retrieve and then discard many inaccessible hits). by the way, probably there's room for some optimization, e.g. very simple idea: exclude paths at depth 1 (children of root node) the principal is not able to read (which may mean adding them to the query passed to the Index implementation), if any, then you'd always have to apply fine grained ACLs on the result but maybe excluding some branches from start may help. Ok, thank you, it is as I thought. It may be possible to work around it by adding some properties to make the result dense. If the number of terms in the query exceeds the number of terms supported by Solr, does Oak handle that transparently? I'm not sure, you'll need to look at the oak-solr indexing code. Or perhaps Tommaso who wrote the code can chime in here. sure. What limitation are you exactly referring to? Is it the BooleanQuery max clause limit [1]? Yes, I believe it's that limit. Do you know how many that is in the version used by Oak? At the moment Oak has the Solr dependency with scope provided, version 4.1.0 (which uses Lucene 4.1.0) so that one could use from 4.1.x to the latest (4.6.0 right now). Default is 1024. Hi, Thank you. All questions on this thread answered. 
Best regards Ian Regards, Tommaso Regards, Tommaso [1] : http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/search/BooleanQuery.TooManyClauses.html BR, Jukka Zitting
Re: Question: In Repository Index file.
On Friday, December 6, 2013, Alex Parvulescu wrote: if we are being technical :) the property index already has all the info you'd need, just list all the keys and you get the facets (not so easy for item counts for one facet though). Do I understand correctly that the facet support [1] in the Solr API is not exposed except by going directly to the Solr API? Best regards Ian 1. http://wiki.apache.org/solr/SolrFacetingOverview On Fri, Dec 6, 2013 at 4:58 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: Hi, 2013/12/6 Ian Boston i...@tfd.co.uk Hi, Thanks all for the clarification. Good to know there is fallback. If the Solr index is intended for full text, can it still be used to build facets on a reasonably well defined set of properties? technically speaking of course, we may also support facets for the Lucene index [1]. What I wonder is if / how we could expose them on the JCR API level. Any idea? Regards, Tommaso [1] : http://lucene.apache.org/core/4_0_0/facet/org/apache/lucene/facet/doc-files/userguide.html Best Regards Ian On 6 December 2013 16:20, Tommaso Teofili tommaso.teof...@gmail.com wrote: 2013/12/6 Alex Parvulescu alex.parvule...@gmail.com Hi, On Fri, Dec 6, 2013 at 11:06 AM, Jukka Zitting jukka.zitt...@gmail.com wrote: Hi, On Fri, Dec 6, 2013 at 1:25 AM, Ian Boston i...@tfd.co.uk wrote: In Oak when an index is stored in the repository, how is it updated when the repository is MongoDB backed and there are multiple JVM processes connected to the MongoDB? That depends on the index and MK implementations. For example the PropertyIndex uses an index structure that can be updated concurrently when the updates affect different areas of the content repository. When using the MongoMK backend concurrent updates to the same nodes will automatically be synchronized, and with the SegmentMK (which also can be used with MongoDB) all commits against the same journal are synchronized. In both cases concurrent updates will automatically get resolved. 
Also if using SolrCloud as a search index, is it possible to fall back to an internal repository stored index if the SolrCloud index becomes unavailable? Yes. The query engine will automatically pick the best available index for each query execution. If a particular index is not available, then the second-best match for those queries that would have used it would automatically get picked. There is one minor nitpick with this statement. So far we've assumed that the Solr index will be used for full-text queries only. The only fallback you could use if the Solr index becomes unavailable is the Lucene one, but as far as I know we've said that you would usually use one _or_ the other. Areas of concern are: the full-text indexing settings may differ, and the cost output may need to be tricked into treating the local Lucene index as a fallback and not a competing full-text index. But this is definitely doable. good point Alex, and probably we may have to write some tests for the cost comparison for different queries with one or more running indexes to eventually tune the cost evaluation to work properly in the differ