Re: Problems with hyphen in JSR-170 XPath query using jcr:contains

Dunstall, Christopher Fri, 27 Aug 2010 05:27:44 -0700

Ard,

In indexing_configuration.xml, where you named the property where the
analyzer is used (e.g. FullName), how do I set it so that it's used on all
properties of a node?  As previously said, I'm using jcr:contains because I
need to search all parts of the node, so the analyzer needs to have effect
on all properties.


Regards,

Chris


On 27/08/10 2:22 AM, "H. Wilson" <[email protected]> wrote:

>   Finally! I have been hacking away at this here and there for months,
> trying all different analyzers or not-using analyzers and modifying my
> queries all to no avail! Since I always like precise examples when I am
> searching forums, I will post my (nearly) exact solution both for others
> and so that Ard might verify that this was indeed what he meant.
> 
> Ard, I was hoping you could embellish a little on why we would duplicate
> the property? (I didn't actually do it to get this working perfectly)
> You lost me a little there, was it for efficiency? Thanks for everything!
> 
> H. Wilson
> 
> repository.xml (modified both SearchIndex tags to include an
> indexingConfiguration):
> 
>     <SearchIndex
>     class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
> 
>         ....
>         <param name="indexingConfiguration"
>         value="${rep.home}/indexing_configuration.xml"/>
> 
>     </SearchIndex>
> 
> 
> indexing_configuration.xml:
> 
>     <configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
>       <analyzers>
>         <analyzer class="org.mycompany.lucene.analysis.LowerCaseKeywordAnalyzer">
>           <property>fullName</property>
>         </analyzer>
>       </analyzers>
>     </configuration>
> 
> 
> LowerCaseKeywordAnalyzer.java:
> 
>     package org.mycompany.lucene.analysis;
> 
>     import java.io.Reader;
>     import org.apache.lucene.analysis.KeywordAnalyzer;
>     import org.apache.lucene.analysis.LowerCaseFilter;
>     import org.apache.lucene.analysis.TokenStream;
> 
>     public class LowerCaseKeywordAnalyzer extends KeywordAnalyzer {
> 
>          // Keep the whole value as a single token, then lower-case it.
>          public TokenStream tokenStream ( String field, final Reader reader ) {
>              TokenStream keywordTokenStream = super.tokenStream (field, reader);
>              return ( new LowerCaseFilter ( keywordTokenStream ) );
>          }
>     }
> 
> 
> Our search class has a method which then does the following:
> 
>     public OurParameter[] getOurParameters (String searchTerm, String
>     srchField ) { //srchField in this case was fullName
> 
>         TransientRepository repository = new TransientRepository (
>         OUR_REPO_CONFIG, OUR_REPO_LOCATION);
>         Session session = repository.login ();
>         List<Class> classes = new ArrayList<Class>();
>         classes.add (OurParameter.class);
>         Mapper mapper = new AnnotationMapperImpl (classes);
>         ObjectContentManager ocm = new ObjectContentManagerImpl
>         (session, mapper);
>         queryManager = ocm.getQueryManager();
>         FilterImpl filter = (FilterImpl)queryManager.createFilter
>         (OurParameter.class);
>         filter.addContains ( srchField,
>             org.apache.jackrabbit.util.Text.escapeIllegalXpathSearchChars(searchTerm)
>                 .replaceAll ("'","''"));
>         // (that last was replace all single ticks with two ticks, I
>         // honestly can't remember why though)
>         Query query = queryManager.createQuery (filter);
>         Collection<OurParameter> resultsCollection =
>         (Collection<OurParameter>)ocm.getObjects(query);
> 
>         //convert to an array, do some other stuff, and return...
> 
>     }
> 
> 
> 
> On 08/26/2010 10:42 AM, Ard Schrijvers wrote:
>> On Thu, Aug 26, 2010 at 3:53 PM, H. Wilson<[email protected]>  wrote:
>>>   Ard,
>>> 
>>> I have this same problem, however my scenario involves underscores rather
>>> than hyphens. Although since Chris seems to be seeing the same exact
>> It is because hyphens, just like underscores, are characters the standard
>> Lucene analyzer splits on. This, combined with the query expansion that
>> happens for wildcard searches in Lucene, causes your issues:
>> 
>>> behavior as I was, I imagine we are both stuck on the same issue. After
>>> scouring the forums for the solution, and not seeing your mentioned
>>> solution, I actually posted my problem as detailed as possible here (
>>> http://markmail.org/message/yh72wqd5b2hbr3j6 ) and received no response.
>>> jcr:like was not an option for me, in this case, as our client wanted the
>>> option for case-insensitive searches. Is there any chance you could please
>>> narrow down where-about the post was which already covered this? Thanks for
>> I can't seem to find my post again. But, I'll give you a quite simple
>> solution:
>> 
>> If you want to have the normal indexing of the property for normal
>> searching, but also want to have the yyy* option, you need to
>> duplicate the property also in another property. If your property,
>> like
>> 
>> .North.South.East.WestLand
>> 
>> is only needed for the one you describe with wildcard searching, you
>> only need it once. Now, suppose, your property is called myProp.
>> 
>> To your indexing configuration file add:
>> 
>> <configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
>>    <analyzers>
>>          <analyzer class="org.mycompany.lucene.analysis.LowerCaseKeywordAnalyzer">
>>              <property>myProp</property>
>>          </analyzer>
>>    </analyzers>
>> </configuration>
>> 
>> Your LowerCaseKeywordAnalyzer is very simple: it extends
>> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/KeywordAnalyzer.html
>> and in the method
>> 
>>   TokenStream tokenStream(String fieldName,Reader reader)
>> 
>> after calling the super, you invoke Lucene's LowerCaseFilter.
>> 
>> That is all (after you do a re-index of your repository). Since now a
>> -, or _ or ~ or whatever is not seen as a token to split on, but you
>> still use lowercase filter, you can do exactly what you want.
>> 
>> Do the words need to be split on spaces, however? No problem, just add
>> a WhitespaceTokenizer from Lucene. It is actually pretty simple.
>> 
>> Hope this helps,
>> 
>> Regards Ard
>> 
>>> your time.
>>> 
>>> *H. Wilson*
>>> 
>>> 
>>> On 08/26/2010 04:59 AM, Ard Schrijvers wrote:
>>>> Hello,
>>>> 
>>>> You can search the archives (mails from me) for the wildcard searching
>>>> issues described below. Someone else had similar problems, and I
>>>> explained the wildcard difficulties there. Take a look at jcr:like for
>>>> your use cases.
>>>> 
>>>> Regards Ard
>>>> 
>>>> On Thu, Aug 26, 2010 at 10:19 AM, Dunstall, Christopher
>>>> <[email protected]>    wrote:
>>>>> Hi all,
>>>>> 
>>>>> I'm having some trouble with an XPath query, where I'm searching for
>>>>> users with hyphens in their name.
>>>>> 
>>>>> I'm using:
>>>>> jcr:contains(*/*/*,'query')
>>>>> 
>>>>> And it returns some odd results.
>>>>> 
>>>>> I have two users, Sophie-Allen and Sophie-Anne. When I search for
>>>>> 'sophie', I get both users back. Ok, fine, but if I search for 'sophie-a'
>>>>> (with the hyphen escaped as 'sophie\-a' as per the JSR-170 Spec) I get
>>>>> zero results returned.  Oddly, if I search for either 'sophie-allen' or
>>>>> 'sophie-anne' I get the respective user details back fine. Shouldn't I get
>>>>> both users back when escaping the hyphen? Have I missed something in the
>>>>> spec?
>>>>> 
>>>>> One other odd thing is the addition of an asterisk (*).  Searching for
>>>>> 'soph' and 'soph*' return the same result (both users), but if I search
>>>>> for 'sophie-allen*', I get zero results, unlike when searching for just
>>>>> 'sophie-allen'. Searching for 'sophie-a*' has the same result as without
>>>>> the asterisk, i.e. nothing.
>>>>> 
>>>>> The JSR-170 spec doesn't say anything (that I can find) but is the
>>>>> asterisk a wildcard in the jcr:contains function or does it serve some
>>>>> other purpose?
>>>>> 
>>>>> Your assistance is greatly appreciated,
>>>>> 
>>>>> Regards,
>>>>> 
>>>>> Chris Dunstall | Service Support - Applications
>>>>> Technology Integration/OLE Virtual Team
>>>>> Division of Information Technology | Charles Sturt University | Bathurst,
>>>>> NSW, Australia
>>>>> 
>>>>> Ph: 02 63384818 | Fax: 02 63384181
>>>>>
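
For anyone following along, the prefix-search symptom from the original
report can be sketched in plain Java without Lucene on the classpath. This
is only a simplified model under stated assumptions: splitOnPunctuation is a
rough stand-in for an analyzer that splits on '-' and '_' (it is not
Lucene's real StandardAnalyzer), lowerCaseKeyword stands in for the
KeywordAnalyzer plus LowerCaseFilter combination suggested in the thread,
and prefixMatches assumes the wildcard term is compared against the indexed
tokens as-is, without being split itself. The class and method names are
hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class AnalyzerSketch {

    // Stand-in for a tokenizing analyzer: lower-case the value, then split
    // on every run of characters that is not a letter or a digit.
    public static List<String> splitOnPunctuation(String value) {
        List<String> tokens = new ArrayList<String>();
        String lower = value.toLowerCase(Locale.ROOT);
        for (String part : lower.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Stand-in for KeywordAnalyzer + LowerCaseFilter: the whole value
    // becomes one lower-cased token, so '-' and '_' are preserved.
    public static List<String> lowerCaseKeyword(String value) {
        return Collections.singletonList(value.toLowerCase(Locale.ROOT));
    }

    // Assumed prefix-query behaviour: the prefix is lower-cased but not
    // tokenized, and matches if any indexed token starts with it.
    public static boolean prefixMatches(List<String> indexedTokens, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        for (String token : indexedTokens) {
            if (token.startsWith(p)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String name = "Sophie-Allen";
        List<String> split = splitOnPunctuation(name);
        List<String> keyword = lowerCaseKeyword(name);

        System.out.println("split tokens:   " + split);
        System.out.println("keyword tokens: " + keyword);
        System.out.println("soph* on split tokens:      " + prefixMatches(split, "soph"));
        System.out.println("sophie-a* on split tokens:  " + prefixMatches(split, "sophie-a"));
        System.out.println("sophie-a* on keyword token: " + prefixMatches(keyword, "sophie-a"));
    }
}
```

In this model no single split token starts with 'sophie-a', which is
consistent with the empty result described above, while the single keyword
token does. The real analyzers have more rules than this, so treat it as an
illustration rather than a description of Jackrabbit's internals.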
