[jira] [Created] (OAK-8010) Lucene index: Index definitions implications of isRegexp

2019-01-29 Thread David Gonzalez (JIRA)
David Gonzalez created OAK-8010:
---

 Summary: Lucene index: Index definitions implications of isRegexp
 Key: OAK-8010
 URL: https://issues.apache.org/jira/browse/OAK-8010
 Project: Jackrabbit Oak
  Issue Type: Documentation
  Components: search
Reporter: David Gonzalez


It would be good to add a sentence to the documentation (in the `isRegexp` 
section of [1]) stating whether there are any adverse implications of using 
`isRegexp` in conjunction with options like `property=true`, `facets=true`, or 
other configs.

 

A common use of `isRegexp` is to index many properties, or a variable number 
of properties. Any implications (or lack thereof) of this "broad-brush" 
approach should be articulated to guide developers to use `isRegexp` in a 
responsible manner.

 

[1] 
https://jackrabbit.apache.org/oak/docs/query/lucene.html#property-definitions
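
To make the "broad-brush" concern concrete, here is a minimal Python sketch (not Oak code) of what a regex-named property definition implies: one rule can match an open-ended number of properties. The property names, the pattern and the helper `properties_matched` are hypothetical, and the whole-name matching is an assumption rather than confirmed Oak behaviour.

```python
import re

def properties_matched(name_pattern, property_names):
    """Return the property names that a single regex-named rule would pick up.

    Assumption: the pattern has to match the whole property name.
    """
    compiled = re.compile(name_pattern)
    return [name for name in property_names if compiled.fullmatch(name)]

# Hypothetical properties found on one node.
node_properties = ["jcr:title", "tag_1", "tag_2", "tag_37", "jcr:created"]

# One rule named with the hypothetical pattern "tag_.*" covers every tag_N
# property, so any option set on that rule applies to all of them at once.
print(properties_matched(r"tag_.*", node_properties))
```

Whatever options sit on such a rule are applied to every matching property, which is why the documentation should say which combinations are safe.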



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (OAK-6370) Improve documentation for text pre-extraction

2017-06-19 Thread David Gonzalez (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054133#comment-16054133
 ] 

David Gonzalez commented on OAK-6370:
-

I think that clarifies the mechanics of the tool, but I think clarifications on 
its expected use cases are warranted.

1) I'm moving many NEW files into Oak; this tool presumes that the files are 
already in Oak, so what is the safe way (or ways) to get them in without 
incurring the cost of text extraction? 
1a) How is 1 handled in the context of a new Oak repo that is not yet in use 
(i.e. you have more leeway configuring/disabling features)?
1b) How is 1 handled if you're migrating large sets into a repo that is in use 
(i.e. you can't turn off text extraction wholesale because someone might be 
uploading something as normal and expect it to be indexed right away)? 

2) Does this extract text from *everything* when you run it? Can you limit it 
to parts of the JCR? Or by new/modified content?

3) Should this be run on the same machine that is running the Oak repo or on a 
different machine w/ the paths mounted?

4) Would this need to be run on every publish instance? Or could you run this 
once (ex. the CSV extraction) and re-purpose that to save time? (Is the time 
saved meaningful? Or is it preferred to run this process once per oak instance?)

> Improve documentation for text pre-extraction
> -
>
> Key: OAK-6370
> URL: https://issues.apache.org/jira/browse/OAK-6370
> Project: Jackrabbit Oak
>  Issue Type: Documentation
>  Components: lucene, run
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.8
>
>
> The docs for pre-extraction do not make things very clear. This should be 
> improved.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-6201) Native Query syntax for rep:native unclear

2017-05-11 Thread David Gonzalez (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006357#comment-16006357
 ] 

David Gonzalez commented on OAK-6201:
-

[~teofili] Agreed; IME not many developers who work on Oak-based platforms have 
an understanding of Lucene's underpinnings, which makes the many nods to Lucene 
functionality confounding. TBH, I'm still unclear on how to derive the field 
name for any bit of data indexed into the Lucene document (which is fine, as 
this should be addressed in the docs, not necessarily in the JIRA issue). 
Looking forward to the eventual doc updates! 

> Native Query syntax for rep:native unclear
> --
>
> Key: OAK-6201
> URL: https://issues.apache.org/jira/browse/OAK-6201
> Project: Jackrabbit Oak
>  Issue Type: Documentation
>Reporter: David Gonzalez
>Priority: Minor
>
> I was trying to perform a few test queries against an Oak 1.6.x repository 
> using XPath with the rep:native(..) function but could not get any to 
> work.
> I suspect my problem is with specifying the field, but I've tried a variety 
> of permutations to no avail. 
> For example, I tried to re-purpose the example XPath query to search over an 
> asset's dc:title [1].
> Generally, I think the documentation [2] is lacking in that it is 1) unclear 
> on the syntax, 2) unclear on whether I need to do anything special to Oak 
> indexes, and 3) unclear on whether only the provided Lucene index will 
> service these requests (these are all from a Lucene POV, rather than Solr, 
> though I assume the same considerations apply).
> [1] Non-working: {code}
> //element(*, app:Asset)[(rep:native('lucene', 
> 'jcr:contnet/metdata/dc:title:(Hello OR World)'))]
> {code}
> [2] 
> http://jackrabbit.apache.org/oak/docs/query/query-engine.html#Native_Queries



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (OAK-6201) Native Query syntax for rep:native unclear

2017-05-09 Thread David Gonzalez (JIRA)
David Gonzalez created OAK-6201:
---

 Summary: Native Query syntax for rep:native unclear
 Key: OAK-6201
 URL: https://issues.apache.org/jira/browse/OAK-6201
 Project: Jackrabbit Oak
  Issue Type: Documentation
Reporter: David Gonzalez
Priority: Minor


I was trying to perform a few test queries against an Oak 1.6.x repository 
using XPath with the rep:native(..) function but could not get any to 
work.

I suspect my problem is with specifying the field, but I've tried a variety of 
permutations to no avail. 

For example, I tried to re-purpose the example XPath query to search over an 
asset's dc:title [1].

Generally, I think the documentation [2] is lacking in that it is 1) unclear on 
the syntax, 2) unclear on whether I need to do anything special to Oak indexes, 
and 3) unclear on whether only the provided Lucene index will service these 
requests (these are all from a Lucene POV, rather than Solr, though I assume 
the same considerations apply).

[1] Non-working: {code}
//element(*, app:Asset)[(rep:native('lucene', 
'jcr:contnet/metdata/dc:title:(Hello OR World)'))]
{code}

[2] http://jackrabbit.apache.org/oak/docs/query/query-engine.html#Native_Queries
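
For reference, the overall shape of such a query can be sketched as a small Python helper that assembles the XPath string. This is only an illustration of the syntax as it appears in [1]. The helper `native_xpath` is hypothetical, quote doubling is the usual XPath string-literal convention (assumed to apply here), and `FIELD` is a deliberate placeholder because the correct Lucene field name is exactly what this issue asks the docs to explain.

```python
def native_xpath(node_type, index_type, native_query):
    """Build an XPath query that hands a native query string to rep:native."""
    # XPath string literals escape a single quote by doubling it.
    escaped = native_query.replace("'", "''")
    return "//element(*, {0})[rep:native('{1}', '{2}')]".format(
        node_type, index_type, escaped)

# FIELD is a placeholder, not a known-good Lucene field name.
print(native_xpath("app:Asset", "lucene", "FIELD:(Hello OR World)"))
```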





[jira] [Updated] (OAK-5762) [Oak Lucene] Several full-text operators do not work (NOT, !, AND)

2017-03-24 Thread David Gonzalez (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-5762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Gonzalez updated OAK-5762:

Description: 
The following fulltext operators do not appear to be evaluated correctly.

AND operator
{noformat}
QUERY: /jcr:root/content/docs/en/aem/_x0036_-3//element(*, 
cq:Page)[(jcr:contains(., 'response AND layout'))]/rep:excerpt(.)
PLAN: [cq:Page] as [a] /* lucene:cqPageLucene(/oak:index/cqPageLucene) 
+(+:fulltext:respons +:fulltext:and +:fulltext:layout) 
+:ancestors:/content/docs/en/aem/6-3 ft:("response" "AND" "layout") where 
(contains([a].[*], 'response AND layout')) and (isdescendantnode([a], 
[/content/docs/en/aem/6-3])) */
{noformat}

NOT operator
{noformat}
QUERY: /jcr:root/content/docs/en/aem/_x0036_-3//element(*, 
cq:Page)[(jcr:contains(., 'response NOT layout'))]/rep:excerpt(.)
PLAN: [cq:Page] as [a] /* lucene:cqPageLucene(/oak:index/cqPageLucene) 
+(+:fulltext:respons +:fulltext:not +:fulltext:layout) 
+:ancestors:/content/docs/en/aem/6-3 ft:("response" "NOT" "layout") where 
(contains([a].[*], 'response NOT layout')) and (isdescendantnode([a], 
[/content/docs/en/aem/6-3])) */
{noformat}

! operator
{noformat}
QUERY: /jcr:root/content/docs/en/aem/_x0036_-3//element(*, 
cq:Page)[(jcr:contains(., 'response !layout'))]/rep:excerpt(.)
PLAN: [cq:Page] as [a] /* lucene:cqPageLucene(/oak:index/cqPageLucene) 
+(+:fulltext:respons +:fulltext:layout) +:ancestors:/content/docs/en/aem/6-3 
ft:("response" "!layout") where (contains([a].[*], 'response !layout')) and 
(isdescendantnode([a], [/content/docs/en/aem/6-3])) */
{noformat}

Note the `-` operator works.
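
A quick way to see the problem in the plans is to pull out the `:fulltext:` clauses. The Python sketch below does that for the three plan fragments quoted above; the helper name `fulltext_terms` is made up for illustration.

```python
import re

def fulltext_terms(plan):
    """Return the terms that appear as ':fulltext:<term>' clauses in a plan string."""
    return re.findall(r":fulltext:(\w+)", plan)

# Plan fragments copied from the three cases above.
and_plan = "+(+:fulltext:respons +:fulltext:and +:fulltext:layout)"
not_plan = "+(+:fulltext:respons +:fulltext:not +:fulltext:layout)"
bang_plan = "+(+:fulltext:respons +:fulltext:layout)"

for label, plan in [("AND", and_plan), ("NOT", not_plan), ("!", bang_plan)]:
    print(label, fulltext_terms(plan))
```

In the AND and NOT cases the operator word itself shows up as a required term, and in the `!` case `layout` is required rather than excluded.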


  was:
The following 3 full-text queries return the same results. I expected the 1st 
to be different from the 2nd and 3rd, but the 2nd and 3rd should be identical 
to one another. All 3 yield the same results.

{noformat}
dog AND cat
{noformat}

{noformat}
dog AND NOT cat
{noformat}

{noformat}
dog AND !cat
{noformat}

The following (which, IIUC is the equivalent to NOT and !) does yield expected 
results.

{noformat}
dog AND -cat
{noformat}


> [Oak Lucene] Several full-text operators do not work (NOT, !, AND)
> --
>
> Key: OAK-5762
> URL: https://issues.apache.org/jira/browse/OAK-5762
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.0
>Reporter: David Gonzalez
>
> The following fulltext operators do not appear to be evaluated correctly.
> AND operator
> {noformat}
> QUERY: /jcr:root/content/docs/en/aem/_x0036_-3//element(*, 
> cq:Page)[(jcr:contains(., 'response AND layout'))]/rep:excerpt(.)
> PLAN: [cq:Page] as [a] /* lucene:cqPageLucene(/oak:index/cqPageLucene) 
> +(+:fulltext:respons +:fulltext:and +:fulltext:layout) 
> +:ancestors:/content/docs/en/aem/6-3 ft:("response" "AND" "layout") where 
> (contains([a].[*], 'response AND layout')) and (isdescendantnode([a], 
> [/content/docs/en/aem/6-3])) */
> {noformat}
> NOT operator
> {noformat}
> QUERY: /jcr:root/content/docs/en/aem/_x0036_-3//element(*, 
> cq:Page)[(jcr:contains(., 'response NOT layout'))]/rep:excerpt(.)
> PLAN: [cq:Page] as [a] /* lucene:cqPageLucene(/oak:index/cqPageLucene) 
> +(+:fulltext:respons +:fulltext:not +:fulltext:layout) 
> +:ancestors:/content/docs/en/aem/6-3 ft:("response" "NOT" "layout") where 
> (contains([a].[*], 'response NOT layout')) and (isdescendantnode([a], 
> [/content/docs/en/aem/6-3])) */
> {noformat}
> ! operator
> {noformat}
> QUERY: /jcr:root/content/docs/en/aem/_x0036_-3//element(*, 
> cq:Page)[(jcr:contains(., 'response !layout'))]/rep:excerpt(.)
> PLAN: [cq:Page] as [a] /* lucene:cqPageLucene(/oak:index/cqPageLucene) 
> +(+:fulltext:respons +:fulltext:layout) +:ancestors:/content/docs/en/aem/6-3 
> ft:("response" "!layout") where (contains([a].[*], 'response !layout')) and 
> (isdescendantnode([a], [/content/docs/en/aem/6-3])) */
> {noformat}
> Note the `-` operator works.





[jira] [Updated] (OAK-5762) [Oak Lucene] Several full-text operators do not work (NOT, !, AND)

2017-03-24 Thread David Gonzalez (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-5762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Gonzalez updated OAK-5762:

Summary: [Oak Lucene] Several full-text operators do not work (NOT, !, AND) 
 (was: [Oak Lucene] NOT and ! full-text operators do not work.)

> [Oak Lucene] Several full-text operators do not work (NOT, !, AND)
> --
>
> Key: OAK-5762
> URL: https://issues.apache.org/jira/browse/OAK-5762
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.0
>Reporter: David Gonzalez
>
> The following 3 full-text queries return the same results. I expected the 1st 
> to be different from the 2nd and 3rd, but the 2nd and 3rd should be identical 
> to one another. All 3 yield the same results.
> {noformat}
> dog AND cat
> {noformat}
> {noformat}
> dog AND NOT cat
> {noformat}
> {noformat}
> dog AND !cat
> {noformat}
> The following (which, IIUC is the equivalent to NOT and !) does yield 
> expected results.
> {noformat}
> dog AND -cat
> {noformat}





[jira] [Updated] (OAK-5762) [Oak Lucene] NOT and ! full-text operators do not work.

2017-02-22 Thread David Gonzalez (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-5762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Gonzalez updated OAK-5762:

Description: 
The following 3 full-text queries return the same results. I expected the 1st 
to be different from the 2nd and 3rd, but the 2nd and 3rd should be identical 
to one another. All 3 yield the same results.

{noformat}
dog AND cat
{noformat}

{noformat}
dog AND NOT cat
{noformat}

{noformat}
dog AND !cat
{noformat}

The following (which, IIUC is the equivalent to NOT and !) does yield expected 
results.

{noformat}
dog AND -cat
{noformat}

  was:
The following 3 full-text queries return the same results.

{noformat}
dog AND cat
{noformat}

{noformat}
dog AND NOT cat
{noformat}

{noformat}
dog AND !cat
{noformat}

The following (which, IIUC is the equivalent to NOT and !) does yield expected 
results.

{noformat}
dog AND -cat
{noformat}


> [Oak Lucene] NOT and ! full-text operators do not work.
> ---
>
> Key: OAK-5762
> URL: https://issues.apache.org/jira/browse/OAK-5762
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.0
>Reporter: David Gonzalez
>
> The following 3 full-text queries return the same results. I expected the 1st 
> to be different from the 2nd and 3rd, but the 2nd and 3rd should be identical 
> to one another. All 3 yield the same results.
> {noformat}
> dog AND cat
> {noformat}
> {noformat}
> dog AND NOT cat
> {noformat}
> {noformat}
> dog AND !cat
> {noformat}
> The following (which, IIUC is the equivalent to NOT and !) does yield 
> expected results.
> {noformat}
> dog AND -cat
> {noformat}





[jira] [Updated] (OAK-5762) [Oak Lucene] NOT and ! full-text operators do not work.

2017-02-22 Thread David Gonzalez (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-5762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Gonzalez updated OAK-5762:

Summary: [Oak Lucene] NOT and ! full-text operators do not work.  (was: 
[Oak Lucene] Not and ! full-text operators do not work.)

> [Oak Lucene] NOT and ! full-text operators do not work.
> ---
>
> Key: OAK-5762
> URL: https://issues.apache.org/jira/browse/OAK-5762
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.0
>Reporter: David Gonzalez
>
> The following 3 full-text queries return the same results.
> {noformat}
> dog AND cat
> {noformat}
> {noformat}
> dog AND NOT cat
> {noformat}
> {noformat}
> dog AND !cat
> {noformat}
> The following (which, IIUC is the equivalent to NOT and !) does yield 
> expected results.
> {noformat}
> dog AND -cat
> {noformat}





[jira] [Created] (OAK-5762) [Oak Lucene] Not and ! full-text operators do not work.

2017-02-22 Thread David Gonzalez (JIRA)
David Gonzalez created OAK-5762:
---

 Summary: [Oak Lucene] Not and ! full-text operators do not work.
 Key: OAK-5762
 URL: https://issues.apache.org/jira/browse/OAK-5762
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: lucene
Affects Versions: 1.6.0
Reporter: David Gonzalez


The following 3 full-text queries return the same results.

{noformat}
dog AND cat
{noformat}

{noformat}
dog AND NOT cat
{noformat}

{noformat}
dog AND !cat
{noformat}

The following (which, IIUC is the equivalent to NOT and !) does yield expected 
results.

{noformat}
dog AND -cat
{noformat}
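
The expected behaviour can be sketched with plain set operations in Python. This assumes the standard Lucene query-parser semantics, where `NOT`, `!` and `-` all exclude the term that follows; the three-document corpus is made up.

```python
# Hypothetical corpus: document id -> set of indexed terms.
corpus = {
    "doc1": {"dog", "cat"},
    "doc2": {"dog"},
    "doc3": {"cat"},
}

def docs_with(term):
    """Return the ids of the documents that contain the term."""
    return {doc_id for doc_id, terms in corpus.items() if term in terms}

# dog AND cat: both terms are required.
both = docs_with("dog") & docs_with("cat")

# dog AND NOT cat, dog AND !cat, dog AND -cat: cat is excluded.
dog_not_cat = docs_with("dog") - docs_with("cat")

print(sorted(both))
print(sorted(dog_not_cat))
```

Under these semantics the first query and the other three can never return the same set unless no document contains "dog" at all.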





[jira] [Created] (OAK-5744) [Oak Lucene] Exact match restrictions in Groups return no results

2017-02-21 Thread David Gonzalez (JIRA)
David Gonzalez created OAK-5744:
---

 Summary: [Oak Lucene] Exact match restrictions in Groups return no 
results
 Key: OAK-5744
 URL: https://issues.apache.org/jira/browse/OAK-5744
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: lucene
Affects Versions: 1.6.0
Reporter: David Gonzalez


I am testing on: org.apache.jackrabbit.oak-lucene1.6.0.R1783091

Performing the query:

{noformat}
(cat OR dog)
{noformat}

Returns results.

{noformat}
"cat" OR "dog"
{noformat}

Also returns results.

{noformat}
("cat" OR "dog")
{noformat}

Returns no results. My expectation is that all 3 Lucene full-text queries 
should return the same result set. Substituting multi-word phrases for 'cat' 
and 'dog' produces similar results.
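
A small harness along these lines could be used to confirm the discrepancy, or a fix, against a real repository. This is only a sketch in Python: `disagreeing_queries` is a hypothetical helper, and `stub_search` is a stand-in that mimics the behaviour reported above with made-up paths so the example can run on its own.

```python
def disagreeing_queries(search, queries):
    """Return the queries whose result set differs from that of the first query."""
    baseline = set(search(queries[0]))
    return [q for q in queries[1:] if set(search(q)) != baseline]

def stub_search(query):
    """Stand-in for a real full-text search call (made-up results)."""
    if query.startswith('("'):
        return []  # mimics the reported empty result for the grouped phrases
    return ["/content/a", "/content/b"]

variants = ['(cat OR dog)', '"cat" OR "dog"', '("cat" OR "dog")']
print(disagreeing_queries(stub_search, variants))
```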





[jira] [Comment Edited] (OAK-5707) [Oak lucene indexes] Clarify aggregates, nodeScopeIndex, propertyIndex, analyzed

2017-02-17 Thread David Gonzalez (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872536#comment-15872536
 ] 

David Gonzalez edited comment on OAK-5707 at 2/17/17 9:16 PM:
--

Including helpful offline conversations w/ Vikas. 

The following notes require review for correctness. They are added here for 
convenience and to help shape the discussion, and should NOT be considered 
correct until the review has been finalized.

* Aggregates instruct Oak to fulltext-index any property found under the 
provided path pattern (ex. */*/*) (setting aside the complication of how it 
recurses through types...)
** By default all String and String[] properties are candidates for 
aggregation; however, other property types can be specified at the aggregation 
level.
* Specific property index definitions defined under indexRules are about "how 
to index a specific property"
** The affected property is defined via a) a relative property path or b) a 
regex property path match, relative to the node type the indexRules entry 
applies to.
* A property that can be reached by an aggregate rule pattern is treated the 
same as that property having 
`indexRules//properties/myProp@nodeScopeIndex=true`
** The advantages of ALSO defining an indexRule for a property covered by an 
aggregate are:
*** The property can also be marked as a propertyIndex, which allows for more 
performant property-based equality matches
*** The property can be marked for special use (ex. useInSuggest, 
useInSpellcheck, boost, etc.), which may (depending on the applied special-use 
properties) affect how it behaves in the aggregated search.

* `analyzed=true` in an indexRules prop def (say for `jcr:content/jcr:title`) 
allows for `...FROM [app:Page] where CONTAINS([jcr:content/jcr:title], 'foo')`
** TBD what special-use props are applicable to this (if any?)
* `propertyIndex=true` is for the condition you were asking about at first: 
`WHERE [jcr:content/jcr:title]='foo'`
** If the property is a candidate for aggregates or nodeScopeIndex (which as 
noted are ~ equivalent), equality property conditions (WHERE 
[jcr:content/jcr:title]='foo') may still appear fast and not report a traversal 
warning, as Oak is able to leverage the internal aggregate index to quickly 
isolate matches. That being said, for property equality checks, it is always as 
fast (if not faster) to define an indexRule for the property with 
`propertyIndex=true`
** TBD clearly describe the considerations of equality matches when only using 
the aggregate index.
* aggregate and nodeScopeIndex are intended to roll content up into the index's 
"nodeType" index, so that content will be a candidate for fulltext searches 
against that node (vs against a specific property), or rather: `WHERE 
CONTAINS(*, 'foo')`
* The `excludeFromAggregation` prop "disables" aggregate indexing of a prop 
that matches a prop def having `excludeFromAggregation`
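
The notes above can be summarised in a small Python sketch. It models an index definition as a nested dict mirroring the node structure and maps each flag to the query condition the notes associate with it. Like the notes themselves, this is unreviewed: the node type, paths and the helper `supported_conditions` are illustrative assumptions, not a statement of how Oak actually plans queries.

```python
# Hypothetical index definition as a nested dict mirroring the node structure.
index_definition = {
    "type": "lucene",
    "aggregates": {
        "app:Page": {
            "include0": {"path": "jcr:content"},
        },
    },
    "indexRules": {
        "app:Page": {
            "properties": {
                "title": {
                    "name": "jcr:content/jcr:title",
                    "propertyIndex": True,
                    "analyzed": True,
                    "nodeScopeIndex": True,
                },
            },
        },
    },
}

def supported_conditions(prop_def):
    """Map a property definition's flags to the conditions listed in the notes."""
    name = prop_def["name"]
    conditions = []
    if prop_def.get("propertyIndex"):
        conditions.append("[{0}] = 'foo'".format(name))
    if prop_def.get("analyzed"):
        conditions.append("CONTAINS([{0}], 'foo')".format(name))
    if prop_def.get("nodeScopeIndex"):
        conditions.append("CONTAINS(*, 'foo')")
    return conditions

title_def = index_definition["indexRules"]["app:Page"]["properties"]["title"]
for condition in supported_conditions(title_def):
    print(condition)
```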



was (Author: empire29):
Including helpful offline conversations w/ Vikas.

The following require review for correctness, and are added here to help shape 
the discussion and for convenience.

* Aggregates instruct Oak to fulltext-index any property found under the 
provided path pattern (ex. */*/*) (setting aside the complication of how it 
recurses through types...)
** By default all String and String[] properties are candidates for 
aggregation; however, other property types can be specified at the aggregation 
level.
* Specific property index definitions defined under indexRules are about "how 
to index a specific property"
** The affected property is defined via a) a relative property path or b) a 
regex property path match, relative to the node type the indexRules entry 
applies to.
* A property that can be reached by an aggregate rule pattern is treated the 
same as that property having 
`indexRules//properties/myProp@nodeScopeIndex=true`
** The advantages of ALSO defining an indexRule for a property covered by an 
aggregate are:
*** The property can also be marked as a propertyIndex, which allows for more 
performant property-based equality matches
*** The property can be marked for special use (ex. useInSuggest, 
useInSpellcheck, boost, etc.), which may (depending on the applied special-use 
properties) affect how it behaves in the aggregated search.

* `analyzed=true` in an indexRules prop def (say for `jcr:content/jcr:title`) 
allows for `...FROM [app:Page] where CONTAINS([jcr:content/jcr:title], 'foo')`
** TBD what special-use props are applicable to this (if any?)
* `propertyIndex=true` is for the condition you were asking about at first: 
`WHERE [jcr:content/jcr:title]='foo'`
** If the property is a candidate for aggregates or nodeScopeIndex (which as 
noted are ~ equivalent), equality property conditions (WHERE 
[jcr:content/jcr:title]='foo') may still appear fast and not report a traversal 
warning, as Oak is able to leverage the internal aggregate index to quickly 
isolate matches. That being said, for property equality checks, it is always as 
[jira] [Created] (OAK-5707) [Oak lucene indexes] Clarify aggregates, nodeScopeIndex, propertyIndex, analyzed

2017-02-17 Thread David Gonzalez (JIRA)
David Gonzalez created OAK-5707:
---

 Summary: [Oak lucene indexes] Clarify aggregates, nodeScopeIndex, 
propertyIndex, analyzed
 Key: OAK-5707
 URL: https://issues.apache.org/jira/browse/OAK-5707
 Project: Jackrabbit Oak
  Issue Type: Documentation
Reporter: David Gonzalez


Oak Lucene documentation would benefit from clarifying the relationships and 
expected behaviors around aggregates, nodeScopeIndex, propertyIndex and 
analyzed.

These features have some overlap in what they do and/or augment one another, 
but to the lay developer it is unclear how they work in concert and/or what the 
implications of using the various features are.

It's worth remembering that many developers are under the mindset (shifting 
from Jackrabbit 2 -> Oak) that Oak indexing requires explicit inclusion of 
content into search results; thus implicit content inclusion into indexes via 
generalized aggregations (vs named properties) is unclear/unexpected to many.





[jira] [Commented] (OAK-5692) Oak Lucene analyzers docs unclear on viable configurations

2017-02-17 Thread David Gonzalez (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871971#comment-15871971
 ] 

David Gonzalez commented on OAK-5692:
-

[~chetanm] thanks - this is a treasure trove of information!!!

Do you have any thoughts on the language-based considerations and whether they 
can be handled today in Oak? The 2 main use cases I see are:
* Switching the default language from English (which I believe it is?) to some 
other language/locale
* Supporting multiple languages in a single index. For example, I have an 
app:Page index, but I have app:Pages in English, French, German and Spanish. 
Can I specify multiple language-specific charFilters/tokenizers/filters? If 
yes, how, and how does the indexing/query know which to invoke?

I will check out the updated docs!

> Oak Lucene analyzers docs unclear on viable configurations
> --
>
> Key: OAK-5692
> URL: https://issues.apache.org/jira/browse/OAK-5692
> Project: Jackrabbit Oak
>  Issue Type: Documentation
>Reporter: David Gonzalez
>Assignee: Chetan Mehrotra
>
> The Oak lucene docs [1] > Analyzers section would benefit from clarification:
> Combining analyzer-based topics into a single ticket
> * If no analyzer is specified, what analyzer setup is used? (At a bare 
> minimum, _some_ tokenizer must be used.)
> * The docs mention the "default" analyzer 
> ([oak:queryIndexDefinition]/analyzers/default). 
> ** Can other analyzers be defined? 
> ** How are they selected for use? 
> ** Is the selection configurable?
> * Is the analyzer applied at both index AND query time (unless specified by 
> the `type=index|query` property)?
> * What is the naming for multiple analyzer nodes? Are all children of 
> analyzers assumed to be analyzers? Ex. if I want one special configuration 
> for index and another for query, could I create:
> {noformat}
> ../myIndex/analyzers/indexAnalyzer@type=index
> .. define the index-time analyzer ...
> ../myIndex/analyzers/queryAnalyzer@type=query
> .. define the query-time analyzer ...
> {noformat}
> * How are languages handled? Ex. language-specific stop words, synonyms, char 
> mapping, and stemming.
> * If 
> [oak:queryIndexDefinition]/analyzers/default@class=org.apache.lucene.analysis.standard.StandardAnalyzer
>  it appears the Standard Tokenizer and Standard Lowercase and Stop Filters 
> are used. The Stop filter can be augmented with the well-named stopwords file.
> ** Can other charFilters/filters be layered on top of this "named" Analyzer? 
> (It seems not.)
> * When the Stop Filter is used, it provides the OOTB language-based stop 
> words. If a custom stopwords file is provided, that list replaces the OOTB 
> language-based list, requiring the developer to provide their own 
> language-based stop words. Is this correct? This should be called out, with a 
> link to the catalog of OOTB stopword txt files for easy inclusion.
> * The Stop filter's words property must be a String, not a String[], and the 
> value is a comma-delimited String. It would be good to call this out.
> * What are all the CharFilters/Filters available? Is there a concise list w/ 
> their params? (Ex. I think the PorterStem might support an ignoreCase param?)
> * Synonym Filter syntax is unclear. It seems like there are 2 formats: 
> directional x -> y and bi-directional (comma delimited); I could only get the 
> latter to work.
> * Are all the options in the link [2] supported? It's unclear if there is a 
> 1:1 mapping between Oak Lucene's and Solr's capabilities or if [2] is a loose 
> example of the "types" of supported analyzers.
> * For something like the PatternReplaceCharFilterFactory [3], how do you 
> define multiple pattern mappings? IIUC the charFilter node MUST be 
> named:
> {noformat}.../charFilters/PatternReplace{noformat} so you can't have multiple 
> "PatternReplace" named nodes, each with its own "@pattern" and "@replace" 
> properties. It seems like there is only support for a single object for each 
> Factory type?
> Generally this seems like the handiest resource: 
> https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers%2C+Tokenizers%2C+and+Filters
> [1]  http://jackrabbit.apache.org/oak/docs/query/lucene.html
> [2] 
> https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Specifying_an_Analyzer_in_the_schema
> [3] https://cwiki.apache.org/confluence/display/solr/CharFilterFactories





[jira] [Updated] (OAK-5692) Oak Lucene analyzers docs unclear on viable configurations

2017-02-17 Thread David Gonzalez (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-5692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Gonzalez updated OAK-5692:

Description: 
The Oak lucene docs [1] > Analyzers section would benefit from clarification:

Combining analyzer-based topics into a single ticket

* If no analyzer is specified, what analyzer setup is used? (At a bare minimum, 
_some_ tokenizer must be used.)
* The docs mention the "default" analyzer 
([oak:queryIndexDefinition]/analyzers/default). 
** Can other analyzers be defined? 
** How are they selected for use? 
** Is the selection configurable?
* Is the analyzer applied at both index AND query time (unless specified by the 
`type=index|query` property)?
* What is the naming for multiple analyzer nodes? Are all children of analyzers 
assumed to be analyzers? Ex. if I want one special configuration for index and 
another for query, could I create:
{noformat}
../myIndex/analyzers/indexAnalyzer@type=index
.. define the index-time analyzer ...
../myIndex/analyzers/queryAnalyzer@type=query
.. define the query-time analyzer ...
{noformat}
* How are languages handled? Ex. language-specific stop words, synonyms, char 
mapping, and stemming.
* If 
[oak:queryIndexDefinition]/analyzers/default@class=org.apache.lucene.analysis.standard.StandardAnalyzer
 it appears the Standard Tokenizer and Standard Lowercase and Stop Filters are 
used. The Stop filter can be augmented with the well-named stopwords file.
** Can other charFilters/filters be layered on top of this "named" Analyzer? 
(It seems not.)
* When the Stop Filter is used, it provides the OOTB language-based stop words. 
If a custom stopwords file is provided, that list replaces the OOTB 
language-based list, requiring the developer to provide their own 
language-based stop words. Is this correct? This should be called out, with a 
link to the catalog of OOTB stopword txt files for easy inclusion.
* The Stop filter's words property must be a String, not a String[], and the 
value is a comma-delimited String. It would be good to call this out.
* What are all the CharFilters/Filters available? Is there a concise list w/ 
their params? (Ex. I think the PorterStem might support an ignoreCase param?)
* Synonym Filter syntax is unclear. It seems like there are 2 formats: 
directional x -> y and bi-directional (comma delimited); I could only get the 
latter to work.
* Are all the options in the link [2] supported? It's unclear if there is a 1:1 
mapping between Oak Lucene's and Solr's capabilities or if [2] is a loose 
example of the "types" of supported analyzers.
* For something like the PatternReplaceCharFilterFactory [3], how do you 
define multiple pattern mappings? IIUC the charFilter node MUST be named:
{noformat}.../charFilters/PatternReplace{noformat} so you can't have multiple 
"PatternReplace" named nodes, each with its own "@pattern" and "@replace" 
properties. It seems like there is only support for a single object for each 
Factory type?
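
To illustrate the kind of example the docs could carry, here is a Python sketch that models a composed analyzer as a nested dict mirroring the node structure, plus a helper showing how the single comma-delimited `words` String breaks down into file names. The layout follows my reading of the docs [1] and is an assumption; the file names and the helper `stop_word_files` are made up.

```python
# Hypothetical analyzer definition as a nested dict mirroring the node
# structure. Dict order stands in for the ordering of the filter nodes.
analyzers = {
    "default": {
        "tokenizer": {"name": "Standard"},
        "filters": {
            "LowerCase": {},
            "Stop": {"words": "stop-en.txt, stop-fr.txt"},
            "PorterStem": {},
        },
    },
}

def stop_word_files(words_value):
    """Split the single comma-delimited 'words' String into file names."""
    if not isinstance(words_value, str):
        raise TypeError("words must be one comma-delimited String, not a String[]")
    return [name.strip() for name in words_value.split(",") if name.strip()]

print(stop_word_files(analyzers["default"]["filters"]["Stop"]["words"]))
```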


Generally this seems like the handiest resource: 
https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers%2C+Tokenizers%2C+and+Filters

[1]  http://jackrabbit.apache.org/oak/docs/query/lucene.html
[2] 
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Specifying_an_Analyzer_in_the_schema
[3] https://cwiki.apache.org/confluence/display/solr/CharFilterFactories
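
On the synonym formats: in the Solr synonyms file format the directional form is written with `=>` rather than `->`, which may be relevant to the directional rules not working. The Python sketch below shows how the two forms are conventionally interpreted under Solr's default expansion. Whether Oak's Synonym filter honours both is not confirmed here, and `parse_synonyms` and the sample rules are hypothetical.

```python
def parse_synonyms(lines):
    """Parse Solr-style synonym rules into a term -> replacement-terms mapping."""
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            # Directional rule: each term on the left is replaced by the right side.
            left, right = line.split("=>", 1)
            sources = [t.strip() for t in left.split(",") if t.strip()]
            targets = [t.strip() for t in right.split(",") if t.strip()]
        else:
            # Bi-directional rule: each listed term expands to the whole group.
            sources = targets = [t.strip() for t in line.split(",") if t.strip()]
        for term in sources:
            replacements = mapping.setdefault(term, [])
            for target in targets:
                if target not in replacements:
                    replacements.append(target)
    return mapping

rules = ["# hypothetical rules", "couch, sofa", "tv => television"]
print(parse_synonyms(rules))
```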

  was:
The Oak lucene docs [1] > Analyzers section would benefit from clarification:

Combining analyzer-based topics into a single ticket

* If no analyzer is specified, what analyzer setup is used? (At the very least, 
some tokenizer must be used.)
* The docs mention the "default" analyzer 
([oak:queryIndexDefinition]/analyzers/default). Can other analyzers be defined? 
How are they selected for use? Is the selection configurable?
* By default, is the analyzer applied at both index AND query time, unless 
specified by the `type=index|query` property?
* What is the naming for multiple analyzer nodes? Are all children of analyzers 
assumed to be analyzers? Ex. if I want one special configuration for index and 
another for query, could I create:
{noformat}
../myIndex/analyzers/indexAnalyzer@type=index
.. define the index-time analyzer ...
../myIndex/analyzers/queryAnalyzer@type=query
.. define the query-time analyzer ...
{noformat}
* How are languages handled? Ex. language-specific stop words, synonyms, char 
mapping, and stemming.
* If 
[oak:queryIndexDefinition]/analyzers/default@class=org.apache.lucene.analysis.standard.StandardAnalyzer
 it appears the Standard Tokenizer and Standard Lowercase and Stop Filters are 
used. The Stop filter can be augmented with the well-named stopwords file.
** Can other charFilters/filters be layered on top of this "named" Analyzer? 
(It seems not.)
* When the Stop Filter is used, it provides the OOTB language-based stop words. 
If a custom stopwords file is provided, that list replaces the OOTB 
language-based list, requiring the developer to provide their own language 

[jira] [Updated] (OAK-5692) Oak Lucene analyzers docs unclear on viable configurations

2017-02-16 Thread David Gonzalez (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-5692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Gonzalez updated OAK-5692:

Description: 
The Oak lucene docs [1] > Analyzers section would benefit from clarification:

Combining analyzer-based topics into a single ticket

* If no analyzer is specified, what analyzer setup is used (at the very least 
some tokenizer must be used)?
* The docs mention the "default" analyzer 
([oak:queryIndexDefinition]/analyzers/default). Can other analyzers be defined? 
How are they selected for use? Is the selection configurable?
* By default, is the analyzer applied at both index AND query time, unless 
specified by the `type=index|query` property?
* What is the naming for multiple analyzer nodes? Are all children of analyzers 
assumed to be an analyzer? Ex. If I want a special configuration for index and 
another for query, could I create:
{noformat}
../myIndex/analyzers/indexAnalyzer@type=index
.. define the index-time analyzer ...
../myIndex/analyzers/queryAnalyzer@type=query
.. define the query-time analyzer ...
{noformat}
* How are languages handled? Ex. language-specific stop words, synonyms, char 
mapping, and stemming.
* If 
[oak:queryIndexDefinition]/analyzers/default@class=org.apache.lucene.analysis.standard.StandardAnalyzer
 it appears the Standard Tokenizer and Standard Lowercase and Stop Filters are 
used. The Stop filter can be augmented with the well-named stopwords file.
** Can other charFilters/filters be layered on top of this "named" Analyzer (it 
seems not)?
* When the Stop Filter is used it provides the OOTB language-based stop words. 
If a custom stopwords file is provided, that list replaces the OOTB 
language-based list, requiring the developer to provide their own 
language-based stop words. Is this correct? This should be called out, with a 
link to the catalog of OOTB stopword txt files for easy inclusion.
* The Stop filter's words property must be a String, not String[], and the 
value is a comma-delimited String. Would be good to call this out.
* What are all the CharFilters/Filters available? Is there a concise list w/ 
their params? (Ex. I think the PorterStem might support an ignoreCase param?)
* Synonym Filter syntax is unclear; it seems like there are 2 formats: 
directional x -> y and bi-directional (comma delimited); I could only get the 
latter to work.
* Are all the options in the link [2] supported? It's unclear if there is a 1:1 
mapping between Oak Lucene and Solr's capabilities or if [2] is a loose example 
of the "types" of supported analyzers.
* For something like the PatternReplaceCharFilterFactory [3], how do you 
define multiple pattern mappings, as IIUC the charFilter node MUST be named:
{noformat}.../charFilters/PatternReplace{noformat} so you can't have multiple 
"PatternReplace" named nodes, each with its own "@pattern" and "@replace" 
properties. It seems like there is only support for a single object for each 
Factory type?


Generally this seems like the handiest resource: 
https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers%2C+Tokenizers%2C+and+Filters

[1]  http://jackrabbit.apache.org/oak/docs/query/lucene.html
[2] 
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Specifying_an_Analyzer_in_the_schema
[3] https://cwiki.apache.org/confluence/display/solr/CharFilterFactories


[jira] [Updated] (OAK-5692) Oak Lucene analyzers docs unclear on viable configurations

2017-02-16 Thread David Gonzalez (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-5692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Gonzalez updated OAK-5692:

Description: 
The Oak lucene docs [1] > Analyzers section would benefit from clarification:

Combining analyzer-based topics into a single ticket

* If no analyzer is specified, what analyzer setup is used (at the very least 
some tokenizer must be used)?
* The docs mention the "default" analyzer 
([oak:queryIndexDefinition]/analyzers/default). Can other analyzers be defined? 
How are they selected for use? Is the selection configurable?
* By default, is the analyzer applied at both index AND query time, unless 
specified by the "type=index|query" property?
* What is the naming for multiple analyzer nodes? Are all children of analyzers 
assumed to be an analyzer? Ex. If I want a special configuration for index and 
another for query, could I create:
{noformat}
../myIndex/analyzers/indexAnalyzer@type=index
.. define the index-time analyzer ...
../myIndex/analyzers/queryAnalyzer@type=query
.. define the query-time analyzer ...
{noformat}
* How are languages handled? Ex. language-specific stop words, synonyms, char 
mapping, and stemming.
* If 
[oak:queryIndexDefinition]/analyzers/default@class=org.apache.lucene.analysis.standard.StandardAnalyzer
 it appears the Standard Tokenizer and Standard Lowercase and Stop Filters are 
used. The Stop filter can be augmented with the well-named stopwords file.
** Can other charFilters/filters be layered on top of this "named" Analyzer (it 
seems not)?
* When the Stop Filter is used it provides the OOTB language-based stop words. 
If a custom stopwords file is provided, that list replaces the OOTB 
language-based list, requiring the developer to provide their own 
language-based stop words. Is this correct? This should be called out, with a 
link to the catalog of OOTB stopword txt files for easy inclusion.
* The Stop filter's words property must be a String, not String[], and the 
value is a comma-delimited String. Would be good to call this out.
* What are all the CharFilters/Filters available? Is there a concise list w/ 
their params? (Ex. I think the PorterStem might support an ignoreCase param?)
* Synonym Filter syntax is unclear; it seems like there are 2 formats: 
directional x -> y and bi-directional (comma delimited); I could only get the 
latter to work.
* Are all the options in the link [2] supported? It's unclear if there is a 1:1 
mapping between Oak Lucene and Solr's capabilities or if [2] is a loose example 
of the "types" of supported analyzers.

[1]  http://jackrabbit.apache.org/oak/docs/query/lucene.html
[2] 
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Specifying_an_Analyzer_in_the_schema



> Oak Lucene analyzers docs unclear on viable configurations
> --
>
> Key: OAK-5692
>

[jira] [Updated] (OAK-5692) Oak Lucene analyzers docs unclear on viable configurations

2017-02-16 Thread David Gonzalez (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-5692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Gonzalez updated OAK-5692:

Description: 
The Oak lucene docs [1] > Analyzers section would benefit from clarification:

Combining analyzer-based topics into a single ticket

* If no analyzer is specified, what analyzer setup is used (at the very least 
some tokenizer must be used)?
* The docs mention the "default" analyzer 
([oak:queryIndexDefinition]/analyzers/default). Can other analyzers be defined? 
How are they selected for use? Is the selection configurable?
* By default, is the analyzer applied at both index AND query time, unless 
specified by the `type=index|query` property?
* What is the naming for multiple analyzer nodes? Are all children of analyzers 
assumed to be an analyzer? Ex. If I want a special configuration for index and 
another for query, could I create:
{noformat}
../myIndex/analyzers/indexAnalyzer@type=index
.. define the index-time analyzer ...
../myIndex/analyzers/queryAnalyzer@type=query
.. define the query-time analyzer ...
{noformat}
* How are languages handled? Ex. language-specific stop words, synonyms, char 
mapping, and stemming.
* If 
[oak:queryIndexDefinition]/analyzers/default@class=org.apache.lucene.analysis.standard.StandardAnalyzer
 it appears the Standard Tokenizer and Standard Lowercase and Stop Filters are 
used. The Stop filter can be augmented with the well-named stopwords file.
** Can other charFilters/filters be layered on top of this "named" Analyzer (it 
seems not)?
* When the Stop Filter is used it provides the OOTB language-based stop words. 
If a custom stopwords file is provided, that list replaces the OOTB 
language-based list, requiring the developer to provide their own 
language-based stop words. Is this correct? This should be called out, with a 
link to the catalog of OOTB stopword txt files for easy inclusion.
* The Stop filter's words property must be a String, not String[], and the 
value is a comma-delimited String. Would be good to call this out.
* What are all the CharFilters/Filters available? Is there a concise list w/ 
their params? (Ex. I think the PorterStem might support an ignoreCase param?)
* Synonym Filter syntax is unclear; it seems like there are 2 formats: 
directional x -> y and bi-directional (comma delimited); I could only get the 
latter to work.
* Are all the options in the link [2] supported? It's unclear if there is a 1:1 
mapping between Oak Lucene and Solr's capabilities or if [2] is a loose example 
of the "types" of supported analyzers.

[1]  http://jackrabbit.apache.org/oak/docs/query/lucene.html
[2] 
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Specifying_an_Analyzer_in_the_schema


[jira] [Created] (OAK-5692) Oak Lucene analyzers docs unclear on viable configurations

2017-02-16 Thread David Gonzalez (JIRA)
David Gonzalez created OAK-5692:
---

 Summary: Oak Lucene analyzers docs unclear on viable configurations
 Key: OAK-5692
 URL: https://issues.apache.org/jira/browse/OAK-5692
 Project: Jackrabbit Oak
  Issue Type: Documentation
Reporter: David Gonzalez


The Oak lucene docs [1] > Analyzers section would benefit from clarification:

Combining analyzer-based topics into a single ticket

* If no analyzer is specified, what analyzer setup is used (at the very least 
some tokenizer must be used)?
* The docs mention the "default" analyzer 
([oak:queryIndexDefinition]/analyzers/default). Can other analyzers be defined? 
How are they selected for use? Is the selection configurable?
* How are languages handled? Ex. language-specific stop words, synonyms, char 
mapping, and stemming.
* If 
[oak:queryIndexDefinition]/analyzers/default@class=org.apache.lucene.analysis.standard.StandardAnalyzer
 it appears the Standard Tokenizer and Standard Lowercase and Stop Filters are 
used. The Stop filter can be augmented with the well-named stopwords file.
** Can other charFilters/filters be layered on top of this "named" Analyzer (it 
seems not)?
* When the Stop Filter is used it provides the OOTB language-based stop words. 
If a custom stopwords file is provided, that list replaces the OOTB 
language-based list, requiring the developer to provide their own 
language-based stop words. Is this correct? This should be called out, with a 
link to the catalog of OOTB stopword txt files for easy inclusion.
* The Stop filter's words property must be a String, not String[], and the 
value is a comma-delimited String. Would be good to call this out.
* What are all the CharFilters/Filters available? Is there a concise list w/ 
their params? (Ex. I think the PorterStem might support an ignoreCase param?)
* Synonym Filter syntax is unclear; it seems like there are 2 formats: 
directional x -> y and bi-directional (comma delimited); I could only get the 
latter to work.
* Are all the options in the link [2] supported? It's unclear if there is a 1:1 
mapping between Oak Lucene and Solr's capabilities or if [2] is a loose example 
of the "types" of supported analyzers.

[1]  http://jackrabbit.apache.org/oak/docs/query/lucene.html
[2] 
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Specifying_an_Analyzer_in_the_schema
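
For reference on the two synonym formats mentioned above: Solr's documented synonym file syntax writes the directional form with "=>" (not "->") and the equivalent form as a comma-separated list. The sketch below is a simplified Python illustration of how such rules could be read; it is not Oak or Lucene code, the function name parse_synonym_rules is hypothetical, and whether Oak accepts both forms is exactly the open question of this ticket.

```python
def parse_synonym_rules(text):
    """Parse Solr-style synonym rules into {term: [replacement terms]}."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            # Directional rule: left-hand terms are replaced by right-hand terms.
            left, right = line.split("=>", 1)
            targets = [t.strip() for t in right.split(",") if t.strip()]
            for source in (s.strip() for s in left.split(",")):
                if source:
                    mapping[source] = targets
        else:
            # Equivalence rule: every term expands to the whole group.
            group = [t.strip() for t in line.split(",") if t.strip()]
            for term in group:
                mapping[term] = group
    return mapping


rules = """
# bi-directional (equivalent) synonyms
plane, airplane, aircraft
# directional mapping
i-pod => ipod
"""
parsed = parse_synonym_rules(rules)
print(parsed["airplane"])  # the whole equivalence group
print(parsed["i-pod"])     # only the right-hand side of the rule
```

Note that the directional rule maps only one way: "i-pod" has an entry, while "ipod" does not.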



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (OAK-1965) Support for constraints like: foo = 'X' OR bar = 'Y'

2014-07-22 Thread David Gonzalez (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071319#comment-14071319
 ] 

David Gonzalez commented on OAK-1965:
-

[~jukkaz] FYI - after installing this, my instance has been restarting for 6+ 
hours; When I enabled DEBUG on org.jackrabbit logs, I see a wall of... 

{noformat}
22.07.2014 23:08:46.051 *DEBUG* [TarMK compaction thread 
[/xxx/repository/segmentstore], active since Tue Jul 22 02:00:00 EDT 2014, 
previous max duration 0ms] org.apache.jackrabbit.oak.plugins.segment.SegmentId 
Loading segment 4b993a00-2233-4c8c-a9fa-e9e4fab8a665
{noformat}

There are 147 (and counting) new segment store tar files since the 
installation. This doesn't seem directly related, but it was an unexpected side 
effect of installing this jar.

 Support for constraints like: foo = 'X' OR bar = 'Y'
 

 Key: OAK-1965
 URL: https://issues.apache.org/jira/browse/OAK-1965
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: core, query
Reporter: Jukka Zitting
Assignee: Jukka Zitting
 Fix For: 1.1

 Attachments: oak-core-1.0.3-OAK-1965-SNAPSHOT.jar


 Consider the following query statement:
 {noformat}
 SELECT * FROM [nt:base] WHERE [foo] = 'X' OR [bar] = 'Y'
 {noformat}
 Such a query could be fairly efficiently executed against a property index 
 that indexes the values of both foo and bar properties. However, the 
 query engine doesn't pass such OR constraints down to the index 
 implementations, so we currently can't leverage such an index for this query.
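
 To illustrate the idea in the description above, here is a minimal Python sketch (not Oak code) of how an index holding the values of both properties could answer an OR of equality constraints as the union of two lookups. The PropertyIndex class and its method names are hypothetical and exist only for this illustration.

```python
from collections import defaultdict


class PropertyIndex:
    """Toy in-memory index: (property name, value) -> set of node paths."""

    def __init__(self):
        self._entries = defaultdict(set)

    def add(self, path, properties):
        # Index every property value of the node under its path.
        for name, value in properties.items():
            self._entries[(name, value)].add(path)

    def lookup(self, name, value):
        # Return a copy so callers cannot mutate the index.
        return set(self._entries.get((name, value), set()))

    def lookup_any(self, constraints):
        # An OR of equality constraints is the union of the single lookups.
        result = set()
        for name, value in constraints:
            result |= self.lookup(name, value)
        return result


index = PropertyIndex()
index.add("/a", {"foo": "X"})
index.add("/b", {"bar": "Y"})
index.add("/c", {"foo": "Z", "bar": "Q"})

# WHERE [foo] = 'X' OR [bar] = 'Y'
matches = index.lookup_any([("foo", "X"), ("bar", "Y")])
print(sorted(matches))
```

 The point of the sketch is only that the index can serve such a query if the OR constraint reaches it; how the query engine would pass the constraint down is not shown here.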



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OAK-1894) PropertyIndex only considers the cost of a single indexed property

2014-06-16 Thread David Gonzalez (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1403#comment-1403
 ] 

David Gonzalez commented on OAK-1894:
-

[~justinedelson] I think listing all index candidate property names would be 
useful to help understand/make immediately clear 1) if you're missing an index 
for a property and 2) if certain operations (like, not) are preventing a 
property from being resolved to an index.


 PropertyIndex only considers the cost of a single indexed property
 --

 Key: OAK-1894
 URL: https://issues.apache.org/jira/browse/OAK-1894
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: query
Reporter: Justin Edelson
 Fix For: 1.1, 1.0.2

 Attachments: OAK-1894-advanced.diff, OAK-1894.patch


 The existing PropertyIndex loops through the PropertyRestriction objects in 
 the Filter and essentially only calculates the cost of the first indexed 
 property. This isn't necessarily the first property in the query, because 
 Filter.propertyRestrictions is a HashMap and its iteration order is unspecified.
 More confusingly, the plan for a query with multiple indexed properties 
 outputs *all* indexed properties, even though only the first one is used.
 For queries with multiple indexed properties, the cheapest property index 
 should be used in all three relevant places: when calculating the cost, when 
 executing the query, and when producing the plan.
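
 The selection rule proposed above can be sketched in a few lines of Python. This is an illustration under assumed inputs, not the Oak implementation: the function name, the dictionary of per-property costs, and the example cost values are all hypothetical.

```python
def cheapest_indexed_property(restrictions, index_costs):
    """Return (name, cost) of the cheapest indexed property, or None."""
    candidates = [
        (name, index_costs[name]) for name in restrictions if name in index_costs
    ]
    if not candidates:
        return None
    # Compare by cost first, then by name, so ties resolve deterministically
    # instead of depending on map iteration order.
    return min(candidates, key=lambda item: (item[1], item[0]))


# Property restrictions from a query; "color" has no index in this example.
restrictions = {"jcr:title": "Hello", "status": "published", "color": "red"}
index_costs = {"status": 5000.0, "jcr:title": 12.0}

choice = cheapest_indexed_property(restrictions, index_costs)
print(choice)
```

 Using the same function for cost calculation, query execution, and plan output would keep the three places consistent with each other.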





[jira] [Commented] (OAK-1829) IllegalStateException while trying retrieve rows information from QueryResult

2014-05-23 Thread David Gonzalez (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007155#comment-14007155
 ] 

David Gonzalez commented on OAK-1829:
-

Ran into this as well..

jcr:like(fn:lower-case(./@multiValueField), '%somevalue%') throws 
IllegalStateException

whereas

jcr:like(./@multiValueField, '%somevalue%')

does not. 



 IllegalStateException while trying retrieve rows information from QueryResult 
 --

 Key: OAK-1829
 URL: https://issues.apache.org/jira/browse/OAK-1829
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: query
Affects Versions: 0.20
Reporter: Vijay Kumar j

 If a query applies LOWER to an array (multi-valued) property, 
 QueryResult.getRows() throws an IllegalStateException.
 Query that causes the issue:
  select [selector_1].* from [nt:unstructured] AS [selector_1] where 
 (([selector_1].[lcc:className] = 
 'com.adobe.icc.dbforms.obj.ConditionalDataModule')) AND 
 (LOWER([selector_1].[dataDictionaryRefs]) = 'employeedd')
 If we remove the LOWER function, the query works:
  select [selector_1].* from [nt:unstructured] AS [selector_1] where 
 (([selector_1].[lcc:className] = 
 'com.adobe.icc.dbforms.obj.ConditionalDataModule')) AND 
 ([selector_1].[dataDictionaryRefs] = 'EmployeeDD')


