On Fri, Aug 27, 2010 at 4:35 PM, H. Wilson <[email protected]> wrote:
> Chris,
>
> I think I can answer this one, (I'm sure Ard will confirm), but back when I
> was trying to get this working, one of the things I saw was on this page:
>
> http://wiki.apache.org/jackrabbit/IndexingConfiguration
>
> ...near the bottom it talks about setting Analyzers for properties in the
> indexing_configuration. I think what it is getting at is, since you need it
> on all properties, you might not need the indexingConfig, and you can just
> add the line:
>
> <param name="analyzer"
>     value="org.apache.lucene.analysis.WhitespaceAnalyzer"/>
>
> to your SearchIndex targets in your repository.xml, modifying the Analyzer
> to the one which suits you.

That is correct. However, I doubt whether you would want to have this
analyser for all your content :-)

Regards Ard

>
> H. Wilson
>
>
> On 08/27/2010 08:27 AM, Dunstall, Christopher wrote:
>>
>> Ard,
>>
>> In indexing_configuration.xml, where you named the property where the
>> analyzer is used (e.g. FullName), how do I set it so that it's used on all
>> properties of a node? As previously said, I'm using jcr:contains because I
>> need to search all parts of the node, so the analyzer needs to have effect
>> on all properties.
>>
>> Regards,
>>
>> Chris
>>
>>
>> On 27/08/10 2:22 AM, "H. Wilson"<[email protected]> wrote:
>>
>>> Finally! I have been hacking away at this here and there for months,
>>> trying all different analyzers or not-using analyzers and modifying my
>>> queries all to no avail! Since I always like precise examples when I am
>>> searching forums, I will post my (nearly) exact solution both for others
>>> and so that Ard might verify that this was indeed what he meant.
>>>
>>> Ard, I was hoping you could embellish a little on why we would duplicate
>>> the property? (I didn't actually do it to get this working perfectly)
>>> You lost me a little there, was it for efficiency? Thanks for everything!
>>>
>>> H. Wilson
>>>
>>> repository.xml (modified both SearchIndex tags to include an
>>> indexingConfiguration):
>>>
>>> <SearchIndex
>>>     class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>>>
>>>     ....
>>>     <param name="indexingConfiguration"
>>>         value="${rep.home}/indexing_configuration.xml"/>
>>>
>>> </SearchIndex>
>>>
>>>
>>> indexing_configuration.xml:
>>>
>>> <configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
>>>     <analyzers>
>>>         <analyzer
>>>             class="org.mycompany.lucene.analysis.LowerCaseKeywordAnalyzer">
>>>             <property>fullName</property>
>>>         </analyzer>
>>>     </analyzers>
>>> </configuration>
>>>
>>>
>>> LowerCaseKeywordAnalyzer.java:
>>>
>>> package org.mycompany.lucene.analysis;
>>> import java.io.Reader;
>>> import org.apache.lucene.analysis.KeywordAnalyzer;
>>> import org.apache.lucene.analysis.LowerCaseFilter;
>>> import org.apache.lucene.analysis.TokenStream;
>>>
>>> public class LowerCaseKeywordAnalyzer extends KeywordAnalyzer {
>>>
>>>     public TokenStream tokenStream ( String field, final Reader reader ) {
>>>         TokenStream keywordTokenStream = super.tokenStream (field, reader);
>>>         return ( new LowerCaseFilter ( keywordTokenStream ) );
>>>     }
>>> }
>>>
>>>
>>> Our search class has a method which then does the following:
>>>
>>> public OurParameter[] getOurParameters (String searchTerm,
>>>         String srchField ) { //srchField in this case was fullName
>>>
>>>     TransientRepository repository = new TransientRepository (
>>>         OUR_REPO_CONFIG, OUR_REPO_LOCATION);
>>>     Session session = repository.login ();
>>>     List<Class> classes = new ArrayList<Class>();
>>>     classes.add (OurParameter.class);
>>>     Mapper mapper = new AnnotationMapperImpl (classes);
>>>     ObjectContentManager ocm = new ObjectContentManagerImpl
>>>         (session, mapper);
>>>     queryManager = ocm.getQueryManager();
>>>     FilterImpl filter = (FilterImpl)queryManager.createFilter
>>>         (OurParameter.class);
>>>     filter.addContains ( srchField,
>>>         org.apache.jackrabbit.util.Text.escapeIllegalXpathSearchChars(searchTerm)
>>>             .replaceAll ("'","''"));
>>>     // (that last was replace all single ticks with two ticks, I
>>>     // honestly can't remember why though)
>>>     Query query =
>>>         queryManager.createQuery (filter);
>>>     Collection<OurParameter> resultsCollection =
>>>         (Collection<OurParameter>)ocm.getObjects(query);
>>>
>>>     //convert to an array, do some other stuff, and return...
>>>
>>> }
>>>
>>>
>>>
>>> On 08/26/2010 10:42 AM, Ard Schrijvers wrote:
>>>>
>>>> On Thu, Aug 26, 2010 at 3:53 PM, H. Wilson<[email protected]> wrote:
>>>>>
>>>>> Ard,
>>>>>
>>>>> I have this same problem, however my scenario involves underscores
>>>>> rather than hyphens. Although since Chris seems to be seeing the same
>>>>> exact
>>>>
>>>> It is because hyphens, just like underscores, are tokens the Standard
>>>> Lucene Analyzer splits on. This, combined with the query expansion that
>>>> happens for wildcard searches in lucene, causes your issues:
>>>>
>>>>> behavior as I was, I imagine we are both stuck on the same issue. After
>>>>> scouring the forums for the solution, and not seeing your mentioned
>>>>> solution, I actually posted my problem as detailed as possible here (
>>>>> http://markmail.org/message/yh72wqd5b2hbr3j6 ) and received no response.
>>>>> jcr:like was not an option for me, in this case, as our client wanted
>>>>> the option for case-insensitive searches. Is there any chance you could
>>>>> please narrow down where-about the post was which already covered this?
>>>>> Thanks for
>>>>
>>>> I can't seem to find my post again. But, I'll give you a quite simple
>>>> solution:
>>>>
>>>> If you want to have the normal indexing of the property for normal
>>>> searching, but also want to have the yyy* option, you need to
>>>> duplicate the property also in another property. If your property,
>>>> like
>>>>
>>>> .North.South.East.WestLand
>>>>
>>>> is only needed for the one you describe with wildcard searching, you
>>>> only need it once. Now, suppose, your property is called myProp.
>>>>
>>>> To your configuration.xml add:
>>>>
>>>> <configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
>>>>     <analyzers>
>>>>         <analyzer
>>>>             class="org.mycompany.lucene.analysis.LowerCaseKeywordAnalyzer">
>>>>             <property>myProp</property>
>>>>         </analyzer>
>>>>     </analyzers>
>>>> </configuration>
>>>>
>>>> Your LowerCaseKeywordAnalyzer is very simple: it extends
>>>>
>>>> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/KeywordAnalyzer.html
>>>>
>>>> and in the method
>>>>
>>>> TokenStream tokenStream(String fieldName,Reader reader)
>>>>
>>>> after calling the super, you invoke Lucene's LowerCaseFilter.
>>>>
>>>> That is all (after you do a re-index of your repository). Since now a
>>>> -, or _ or ~ or whatever is not seen as a token to split on, but you
>>>> still use lowercase filter, you can do exactly what you want.
>>>>
>>>> Do the words need to be split on spaces however? No problem, just add
>>>> a WhitespaceTokenizer from lucene. It is actually pretty simple.
>>>>
>>>> Hope this helps,
>>>>
>>>> Regards Ard
>>>>
>>>>> your time.
>>>>>
>>>>> *H. Wilson*
>>>>>
>>>>>
>>>>> On 08/26/2010 04:59 AM, Ard Schrijvers wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> You can search the archives (mail from me) for wildcard searching
>>>>>> things related below. There was someone having similar issues. I
>>>>>> explained the wildcard difficulties. Take a look at jcr:like for your
>>>>>> use cases
>>>>>>
>>>>>> Regards Ard
>>>>>>
>>>>>> On Thu, Aug 26, 2010 at 10:19 AM, Dunstall, Christopher
>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I'm having some trouble with an XPath query, where I'm searching for
>>>>>>> users with hyphens in their name.
>>>>>>>
>>>>>>> I'm using:
>>>>>>> jcr:contains(*/*/*,'query')
>>>>>>>
>>>>>>> And it returns some odd results.
>>>>>>>
>>>>>>> I have two users, Sophie-Allen and Sophie-Anne. When I search for
>>>>>>> 'sophie', I get both users back.
>>>>>>> Ok, fine, but if I search for 'sophie-a' (with the hyphen escaped as
>>>>>>> 'sophie\-a' as per the JSR-170 Spec) I get zero results returned.
>>>>>>> Oddly, if I search for either 'sophie-allen' or 'sophie-anne' I get
>>>>>>> the respective user details back fine. Shouldn't I get both users
>>>>>>> back when escaping the hyphen? Have I missed something in the spec?
>>>>>>>
>>>>>>> One other odd thing is the addition of an asterisk (*). Searching for
>>>>>>> 'soph' and 'soph*' return the same result (both users), but if I
>>>>>>> search for 'sophie-allen*', I get zero results, unlike when searching
>>>>>>> for just 'sophie-allen'. Searching for 'sophie-a*' has the same result
>>>>>>> as without the asterisk, i.e. nothing.
>>>>>>>
>>>>>>> The JSR-170 spec doesn't say anything (that I can find) but is the
>>>>>>> asterisk a wildcard in the jcr:contains function or does it serve some
>>>>>>> other purpose?
>>>>>>>
>>>>>>> Your assistance is greatly appreciated,
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Chris Dunstall | Service Support - Applications
>>>>>>> Technology Integration/OLE Virtual Team
>>>>>>> Division of Information Technology | Charles Sturt University |
>>>>>>> Bathurst, NSW, Australia
>>>>>>>
>>>>>>> Ph: 02 63384818 | Fax: 02 63384181
>>>>>>>
>>
>
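
For anyone finding this thread later: the tokenisation effect Ard describes can be illustrated without Lucene or Jackrabbit on the classpath. The following is only a minimal plain-Java sketch under stated assumptions, not the actual analyzer code. The class AnalyzerSketch and its methods are hypothetical names made up for this illustration; splittingTokens only approximates an analyzer that splits on hyphens and underscores, keywordLowerCaseTokens approximates the LowerCaseKeywordAnalyzer from the thread, and prefixMatches assumes that the wildcard term is compared against the indexed tokens as typed rather than being split by the analyzer first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: plain Java, no Lucene or Jackrabbit classes.
final class AnalyzerSketch {

    // Approximates an analyzer that lowercases and splits on anything that
    // is not a letter or digit (so '-', '_' and '.' all act as separators).
    static List<String> splittingTokens(String value) {
        List<String> tokens = new ArrayList<>();
        for (String part : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Approximates the keyword-plus-lowercase idea from the thread:
    // the whole property value becomes one lowercased token.
    static List<String> keywordLowerCaseTokens(String value) {
        return Collections.singletonList(value.toLowerCase(Locale.ROOT));
    }

    // Assumption: a wildcard term such as sophie-a* is matched as a plain
    // prefix against each indexed token, without being tokenised itself.
    static boolean prefixMatches(List<String> tokens, String prefix) {
        String lowered = prefix.toLowerCase(Locale.ROOT);
        for (String token : tokens) {
            if (token.startsWith(lowered)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String stored = "Sophie-Allen";
        String prefix = "sophie-a";

        // Split index: no single token starts with "sophie-a".
        System.out.println(splittingTokens(stored));
        System.out.println(prefixMatches(splittingTokens(stored), prefix));

        // Keyword index: the one token "sophie-allen" does.
        System.out.println(keywordLowerCaseTokens(stored));
        System.out.println(prefixMatches(keywordLowerCaseTokens(stored), prefix));
    }
}
```

Under those assumptions the split index holds the tokens sophie and allen, so a prefix of sophie-a finds nothing, while the keyword index holds the single token sophie-allen and the same prefix matches. That is consistent with what Chris reported and with why the keyword analyzer helps, but the real behaviour depends on the Lucene and Jackrabbit versions in use.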
