Ard,

In indexing_configuration.xml, where you named the property that the analyzer is applied to (e.g. fullName), how do I set it so that it's used on all properties of a node? As I said previously, I'm using jcr:contains because I need to search all parts of the node, so the analyzer needs to take effect on all properties.
Regards,
Chris

On 27/08/10 2:22 AM, "H. Wilson" <[email protected]> wrote:

> Finally! I have been hacking away at this here and there for months,
> trying all different analyzers or not using analyzers and modifying my
> queries, all to no avail! Since I always like precise examples when I am
> searching forums, I will post my (nearly) exact solution, both for others
> and so that Ard might verify that this was indeed what he meant.
>
> Ard, I was hoping you could elaborate a little on why we would duplicate
> the property? (I didn't actually do it to get this working perfectly.)
> You lost me a little there; was it for efficiency? Thanks for everything!
>
> H. Wilson
>
> repository.xml (modified both SearchIndex tags to include an
> indexingConfiguration):
>
> <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>     ....
>     <param name="indexingConfiguration"
>            value="${rep.home}/indexing_configuration.xml"/>
> </SearchIndex>
>
>
> indexing_configuration.xml:
>
> <configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
>     <analyzers>
>         <analyzer class="org.mycompany.lucene.analysis.LowerCaseKeywordAnalyzer">
>             <property>fullName</property>
>         </analyzer>
>     </analyzers>
> </configuration>
>
>
> LowerCaseKeywordAnalyzer.java:
>
> package org.mycompany.lucene.analysis;
>
> import java.io.Reader;
> import org.apache.lucene.analysis.KeywordAnalyzer;
> import org.apache.lucene.analysis.LowerCaseFilter;
> import org.apache.lucene.analysis.TokenStream;
>
> public class LowerCaseKeywordAnalyzer extends KeywordAnalyzer {
>
>     public TokenStream tokenStream(String field, final Reader reader) {
>         // KeywordAnalyzer emits the whole value as one token; LowerCaseFilter lowercases it
>         TokenStream keywordTokenStream = super.tokenStream(field, reader);
>         return new LowerCaseFilter(keywordTokenStream);
>     }
> }
>
>
> Our search class has a method which then does the following:
>
> public OurParameter[] getOurParameters(String searchTerm, String srchField) {
>     // srchField in this case was fullName
>
>     TransientRepository repository =
>         new TransientRepository(OUR_REPO_CONFIG, OUR_REPO_LOCATION);
>     Session session = repository.login();
>     List<Class> classes = new ArrayList<Class>();
>     classes.add(OurParameter.class);
>     Mapper mapper = new AnnotationMapperImpl(classes);
>     ObjectContentManager ocm = new ObjectContentManagerImpl(session, mapper);
>     queryManager = ocm.getQueryManager();
>     FilterImpl filter = (FilterImpl) queryManager.createFilter(OurParameter.class);
>     filter.addContains(srchField,
>         org.apache.jackrabbit.util.Text.escapeIllegalXpathSearchChars(searchTerm)
>             .replaceAll("'", "''"));
>     // (that last was replace all single ticks with two ticks, I
>     // honestly can't remember why though)
>     Query query = queryManager.createQuery(filter);
>     Collection<OurParameter> resultsCollection =
>         (Collection<OurParameter>) ocm.getObjects(query);
>
>     // convert to an array, do some other stuff, and return...
> }
>
>
>
> On 08/26/2010 10:42 AM, Ard Schrijvers wrote:
>> On Thu, Aug 26, 2010 at 3:53 PM, H. Wilson <[email protected]> wrote:
>>> Ard,
>>>
>>> I have this same problem, however my scenario involves underscores rather
>>> than hyphens. Although since Chris seems to be seeing the same exact
>> It is because hyphens, just like underscores, are characters that Lucene's
>> StandardAnalyzer splits tokens on. This, combined with the query expansion
>> that happens for wildcard searches in Lucene, causes your issues.
>>
>>> behavior as I was, I imagine we are both stuck on the same issue. After
>>> scouring the forums for the solution, and not seeing your mentioned
>>> solution, I actually posted my problem in as much detail as possible here (
>>> http://markmail.org/message/yh72wqd5b2hbr3j6 ) and received no response.
>>> jcr:like was not an option for me in this case, as our client wanted the
>>> option for case-insensitive searches. Is there any chance you could please
>>> narrow down the whereabouts of the post which already covered this? Thanks for
>> I can't seem to find my post again.
>> But I'll give you quite a simple solution:
>>
>> If you want to keep the normal indexing of the property for normal
>> searching, but also want the yyy* option, you need to duplicate the
>> property into another property. If your property, with values like
>>
>> .North.South.East.WestLand
>>
>> is only needed for the wildcard searching you describe, you only need
>> it once. Now, suppose your property is called myProp.
>>
>> To your indexing configuration add:
>>
>> <configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
>>     <analyzers>
>>         <analyzer class="org.mycompany.lucene.analysis.LowerCaseKeywordAnalyzer">
>>             <property>myProp</property>
>>         </analyzer>
>>     </analyzers>
>> </configuration>
>>
>> Your LowerCaseKeywordAnalyzer is very simple: it extends
>> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/KeywordAnalyzer.html
>> and in the method
>>
>> TokenStream tokenStream(String fieldName, Reader reader)
>>
>> after calling super, you wrap the result in Lucene's LowerCaseFilter.
>>
>> That is all (after you re-index your repository). Since a -, _, ~ or
>> any other such character is no longer treated as a token boundary, but
>> you still apply the lowercase filter, you can do exactly what you want.
>>
>> Do the words need to be split on spaces, however? No problem, just add
>> a WhitespaceTokenizer from Lucene. It is actually pretty simple.
>>
>> Hope this helps,
>>
>> Regards Ard
>>
>>> your time.
>>>
>>> *H. Wilson*
>>>
>>>
>>> On 08/26/2010 04:59 AM, Ard Schrijvers wrote:
>>>> Hello,
>>>>
>>>> You can search the archives (mail from me) for threads about wildcard
>>>> searching related to the issue below. Someone there had similar issues,
>>>> and I explained the wildcard difficulties. Take a look at jcr:like for
>>>> your use cases.
>>>>
>>>> Regards Ard
>>>>
>>>> On Thu, Aug 26, 2010 at 10:19 AM, Dunstall, Christopher
>>>> <[email protected]> wrote:
>>>>> Hi all,
>>>>>
>>>>> I'm having some trouble with an XPath query, where I'm searching for
>>>>> users with hyphens in their name.
>>>>>
>>>>> I'm using:
>>>>> jcr:contains(*/*/*,'query')
>>>>>
>>>>> And it returns some odd results.
>>>>>
>>>>> I have two users, Sophie-Allen and Sophie-Anne. When I search for
>>>>> 'sophie', I get both users back. OK, fine, but if I search for 'sophie-a'
>>>>> (with the hyphen escaped as 'sophie\-a' as per the JSR-170 spec) I get
>>>>> zero results. Oddly, if I search for either 'sophie-allen' or
>>>>> 'sophie-anne' I get the respective user details back fine. Shouldn't I get
>>>>> both users back when escaping the hyphen? Have I missed something in the
>>>>> spec?
>>>>>
>>>>> One other odd thing is the asterisk (*). Searching for 'soph' and
>>>>> 'soph*' returns the same result (both users), but if I search for
>>>>> 'sophie-allen*', I get zero results, unlike when searching for just
>>>>> 'sophie-allen'. Searching for 'sophie-a*' gives the same result as
>>>>> without the asterisk, i.e. nothing.
>>>>>
>>>>> The JSR-170 spec doesn't say anything (that I can find), but is the
>>>>> asterisk a wildcard in the jcr:contains function or does it serve some
>>>>> other purpose?
>>>>>
>>>>> Your assistance is greatly appreciated,
>>>>>
>>>>> Regards,
>>>>>
>>>>> Chris Dunstall | Service Support - Applications
>>>>> Technology Integration/OLE Virtual Team
>>>>> Division of Information Technology | Charles Sturt University | Bathurst,
>>>>> NSW, Australia
>>>>>
>>>>> Ph: 02 63384818 | Fax: 02 63384181
>>>>>
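To make Ard's explanation concrete against Chris's two example names, here is a small plain-Java model with no Lucene dependency. It is a deliberate simplification and all names in it are hypothetical: the "splitting" tokenizer just breaks on any non-letter/non-digit character and lowercases (an approximation of StandardAnalyzer, not its real rules), the keyword variant stands in for KeywordAnalyzer plus LowerCaseFilter, the whitespace variant stands in for adding a WhitespaceTokenizer, and the prefix check stands in for Lucene expanding a trailing-* term against the indexed terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/**
 * Simplified, Lucene-free model (hypothetical names) of the tokenization
 * behaviour discussed in this thread. Not Lucene's actual analyzers.
 */
public class WildcardTokenDemo {

    // Rough stand-in for a splitting analyzer such as StandardAnalyzer:
    // '-', '_' and any other non-letter/non-digit character end a token.
    static List<String> splittingTokens(String value) {
        List<String> tokens = new ArrayList<>();
        for (String part : value.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Stand-in for KeywordAnalyzer + LowerCaseFilter: one lowercased token.
    static List<String> keywordLowerCaseTokens(String value) {
        return Collections.singletonList(value.toLowerCase(Locale.ROOT));
    }

    // Stand-in for WhitespaceTokenizer + LowerCaseFilter: split on spaces only.
    static List<String> whitespaceLowerCaseTokens(String value) {
        List<String> tokens = new ArrayList<>();
        for (String part : value.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // In this model a trailing-* term such as 'sophie-a*' is not split up;
    // its prefix is compared against each indexed term on its own.
    static boolean prefixMatches(List<String> indexedTerms, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        for (String term : indexedTerms) {
            if (term.startsWith(p)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (String name : Arrays.asList("Sophie-Allen", "Sophie-Anne")) {
            List<String> split = splittingTokens(name);
            List<String> keyword = keywordLowerCaseTokens(name);
            System.out.println(name
                    + ": split " + split + " sophie-a* -> " + prefixMatches(split, "sophie-a")
                    + "; keyword " + keyword + " sophie-a* -> " + prefixMatches(keyword, "sophie-a"));
        }
        System.out.println(whitespaceLowerCaseTokens("Sophie-Allen Smith"));
    }
}
```

In this model 'soph*' matches the split terms but 'sophie-a*' does not, because no single indexed term ("sophie", "allen") starts with "sophie-a"; with the single lowercased keyword term it matches. Real Jackrabbit/Lucene query parsing has more rules than this sketch captures.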
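On the single-tick doubling in H. Wilson's getOurParameters method: in JCR XPath, as in XPath 2.0, a string literal delimited by single quotes writes an embedded single quote as two single quotes, so an unescaped term such as o'brien would end the literal early and break the statement. That is probably the reason for the replaceAll("'", "''") call. The plain-Java sketch below shows only that quoting step; the helper names are hypothetical, it is not the OCM FilterImpl API, and it assumes the term has already been through Text.escapeIllegalXpathSearchChars as in the original method.

```java
/**
 * Hypothetical helpers illustrating why single quotes are doubled before a
 * search term is placed inside a single-quoted XPath string literal.
 */
public class XPathLiteralDemo {

    // Inside a '...'-delimited XPath literal, an embedded ' is written as ''.
    static String quoteLiteral(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Builds a jcr:contains() predicate for one property (illustrative only;
    // the term is assumed to be full-text escaped already).
    static String containsPredicate(String property, String term) {
        return "jcr:contains(@" + property + ", " + quoteLiteral(term) + ")";
    }

    public static void main(String[] args) {
        System.out.println(containsPredicate("fullName", "o'brien"));
        System.out.println(containsPredicate("fullName", "sophie\\-a"));
    }
}
```

String.replace is used here instead of replaceAll because the quote is replaced literally; neither form treats ' specially, so both produce the same result.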
