Re: Sorting fields of text_general fieldType
Give us some pairs of titles that sort the wrong way.

On Thu, Aug 2, 2012 at 10:06 AM, Anupam Bhattacharya <anupam...@gmail.com> wrote:
> The approach used to work perfectly, but recently I realized that it is not working for more than 30 indexed records. I am using Solr 3.5. Is there another approach to sort a title field in proper alphabetical order, irrespective of lower case and upper case?
>
> Regards, Anupam
>
> On Thu, May 17, 2012 at 4:32 PM, Ahmet Arslan <iori...@yahoo.com> wrote:
>> The title sort works in a strange manner because the Solr server treats the title string based on upper-case or lower-case letters. Thus if we sort in ascending order, first the titles with numerics show up, then the titles starting with upper case, and after that the titles starting with lower case. The title field is indexed as the text_general fieldType:
>>
>> <field name="title" type="text_general" indexed="true" stored="true"/>
>>
>> Please see Otis' response: http://search-lucene.com/m/uDxTF1scW0d2
>>
>> Simply create an additional field named title_sortable with the following type:
>>
>> <!-- lowercases the entire field value, keeping it as a single token -->
>> <fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
>>   <analyzer>
>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.TrimFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>> Populate it via a copyField directive:
>>
>> <copyField source="title" dest="title_sortable" maxChars=N/>
>>
>> then sort=title_sortable asc

--
Lance Norskog
goks...@gmail.com
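As a side note, the ordering the lowercase type above produces can be sketched in plain Java. This is a hypothetical illustration of the sort behavior (keep the whole value as one token, lowercase it, trim it), not Solr code; the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrates the ordering produced by sorting on a field analyzed with
// KeywordTokenizer + LowerCaseFilter + TrimFilter: case is ignored,
// so "Apple" no longer sorts before "apricot" merely because of case.
public class CaseInsensitiveSort {
    public static List<String> sortTitles(List<String> titles) {
        titles.sort((a, b) -> a.trim().toLowerCase().compareTo(b.trim().toLowerCase()));
        return titles;
    }

    public static void main(String[] args) {
        List<String> titles = new ArrayList<>(Arrays.asList("zebra", "Apple", "apricot"));
        System.out.println(sortTitles(titles)); // [Apple, apricot, zebra]
    }
}
```

With the raw text_general field, "Apple" and "apricot" would land in different regions of the sort order; the lowercase type collapses them into one case-insensitive alphabet.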
Re: Sorting fields of text_general fieldType
A few titles are as follows:

Embattled JPMorgan boss survives power challenge - Jakarta Globe
Kitten Survives 6500-Mile Trip in China-US Container - Jakarta Globe
Guard survives hail of bullets - Jakarta Post

On Fri, Aug 3, 2012 at 1:09 PM, Lance Norskog <goks...@gmail.com> wrote:
> Give us some pairs of titles that sort the wrong way.

--
Thanks & Regards,
Anupam Bhattacharya
Re: How to update a core using SolrJ
Hi Roy,

the example URL is correct if your core is available under that name (configured in solr.xml) and has started without errors. I think I observed that it makes a difference whether there is a trailing slash or not (but that was a while ago, so maybe that has changed).

If you can reach that URL via a browser but SolrJ with exactly the same URL cannot, then:
- maybe the SolrJ application is running in a different environment?
- there is an authentication setup and you are authenticated via the browser, but SolrJ does not know of it
- ...?

Some log output would definitely be helpful.

Cheers, Chantal

On 02.08.2012 at 22:42, Benjamin, Roy wrote:
> I'm using SolrJ and CommonsHttpSolrServer. Before moving to a multi-core configuration I constructed CommonsHttpSolrServer from http://localhost:8080/solr and this worked fine.
>
> Now I have two cores. I have tried constructing CommonsHttpSolrServer from http://localhost:8080/solr/core0 but this does not work: the resource is not found when I try to add docs.
>
> How do I update Solr using SolrJ in a multi-core configuration? What is the correct form for the CommonsHttpSolrServer URL?
>
> Thanks! Roy
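For reference, a minimal multi-core solr.xml sketch in the Solr 3.x style, with a core name matching the URL in the question. The second core name and both instanceDir values are hypothetical placeholders:

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0"/>
    <core name="core1" instanceDir="core1"/>
  </cores>
</solr>
```

With a core registered as core0 here, http://localhost:8080/solr/core0 should resolve for both browser and SolrJ requests.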
Tuning caching of geofilt queries
Hey all,

Our production system is heavily optimized for caching, and nearly all parts of queries are satisfied by filter caches. The only filter that varies a lot from user to user is the location and distance. Currently we use the default location field type and index lat/long coordinates as we get them from Geonames and GMaps, with varying decimal precision.

My question is: does it make sense to round these coordinates (a) while indexing and/or (b) while querying, to optimize cache hits? Our maximum required resolution for geo queries is 1 km, and we can tolerate minor errors, so I could round to two decimal points for most of our queries. E.g., instead of querying like this:

fq=_query_:{!geofilt sfield=user.location_p pt=48.19815,16.3943 d=50.0}

we would round to:

fq=_query_:{!geofilt sfield=user.location_p pt=48.19,16.39 d=50.0}

Any feedback would be greatly appreciated.

Cheers, Thomas
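A client-side sketch of the rounding idea: normalize coordinates before building the geofilt filter query, so nearby points map to the same filter-cache key. Two decimals keeps roughly 1 km resolution, matching the tolerance described above. The class and method names are made up for illustration, and this uses Math.round rather than truncation, so 48.19815 becomes 48.20 rather than the 48.19 in the example.

```java
import java.util.Locale;

// Normalizes lat/lon to 2 decimals so that repeated queries from nearby
// points produce byte-identical fq strings and therefore share a
// filterCache entry in Solr.
public class GeoRound {
    public static double round2(double coord) {
        return Math.round(coord * 100.0) / 100.0;
    }

    public static String geofiltFq(String sfield, double lat, double lon, double d) {
        return String.format(Locale.ROOT,
            "{!geofilt sfield=%s pt=%.2f,%.2f d=%.1f}",
            sfield, round2(lat), round2(lon), d);
    }

    public static void main(String[] args) {
        System.out.println(geofiltFq("user.location_p", 48.19815, 16.3943, 50.0));
        // {!geofilt sfield=user.location_p pt=48.20,16.39 d=50.0}
    }
}
```

The trade-off is that rounding the center point shifts the circle by up to about 700 m diagonally, so either round only when the distance comfortably exceeds the error, or widen d slightly to compensate.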
Re: split on white space and then EdgeNGramFilterFactory
Yes, this works. Thank you.

Regards, Rajani

On Thu, Aug 2, 2012 at 6:04 PM, Jack Krupansky <j...@basetechnology.com> wrote:
> Only do the ngram filter at index time. So, add a query-time analyzer to that field type, but without the ngram filter.
>
> Also, add debugQuery to your query request to see what Lucene query is generated. And use the Solr admin analyzer to validate both index-time and query-time analysis of your terms.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Rajani Maski
> Sent: Thursday, August 02, 2012 7:26 AM
> To: solr-user@lucene.apache.org
> Subject: split on white space and then EdgeNGramFilterFactory
>
> Hi,
>
> I wanted to split on whitespace and then apply EdgeNGramFilterFactory.
>
> Example: a field in a document has the text content "smart phone, i24 xpress exchange offer, 500 dollars", and it should be indexed as: smart s sm sma smar, phone p ph pho phon, i24 i i2, xpress x xp xpr xpre xpres, and so on. If I search on "xpres" I should get this document record matched.
>
> What field type can support this? I was trying with the one below but was not able to achieve the above requirement:
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> Any suggestions?
>
> Thanks, Rajani
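To make the index-time behavior concrete, here is a plain-Java sketch of what the chain above produces: split on whitespace, lowercase, then emit edge n-grams (prefixes) of each token. This is illustrative code, not the actual Lucene filter implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Mimics WhitespaceTokenizer + EdgeNGramFilter + LowerCaseFilter:
// each whitespace-separated token contributes all of its prefixes
// from minGram up to maxGram characters.
public class EdgeNGrams {
    public static List<String> analyze(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            int max = Math.min(maxGram, token.length());
            for (int n = minGram; n <= max; n++) {
                grams.add(token.substring(0, n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(analyze("xpress i24", 1, 25));
        // [x, xp, xpr, xpre, xpres, xpress, i, i2, i24]
    }
}
```

Since "xpres" is among the indexed grams, a query-time analyzer without the ngram filter matches it as a plain term, which is exactly why Jack's advice works.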
search hit on multivalued fields
I have a multivalued field Text which is indexed, for example:

F1: some value
F2: some value
Text = (content of F1, F2)

When a user searches, I am checking only the Text field, but I would also need to display to users which field (F1 or F2) produced the search hit. Is this possible in Solr?

--
Thanks,
*Nipen Mark*
Highlighting error InvalidTokenOffsetsException: Token oedipus exceeds length of provided text sized 11
I have an autocomplete index that I return highlighting information for, but I am getting an error with certain search strings and fields on Solr 3.5. I've narrowed it down to a specific field matching a specific search string, and I've tried making a few different changes to the schema and rebuilding, but so far I cannot get the error to go away. The field that is failing is an ngram-indexed field for matching on the start of any word. Any help would be appreciated.

The text being searched for is "ant" (without quotes). The field value that is matching and causing the error is "Anti-Œdipus" (again without quotes).

The field schema is (additional fields and field types removed):

<types>
  <fieldType name="autocomplete_ngram" class="solr.TextField">
    <analyzer type="index">
      <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
      <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(\|)" replaceWith="or" replace="all"/>
      <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([&amp;])" replaceWith="and" replace="all"/>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" maxGramSize="20" minGramSize="2"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="([^\w\d\*æøåÆØÅ ])" replacement="" replace="all"/>
    </analyzer>
    <analyzer type="query">
      <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
      <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(\|)" replaceWith="or" replace="all"/>
      <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([&amp;])" replaceWith="and" replace="all"/>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="([^\w\d\*æøåÆØÅ ])" replacement="" replace="all"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
    </analyzer>
  </fieldType>
</types>

<fields>
  <field name="ng" type="autocomplete_ngram" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true"/>
</fields>

Things I've tried changing in the above are: having the PatternReplaceCharFilterFactory charFilters be PatternReplaceFilterFactory filters instead; moving around the order of the filters (particularly moving the PatternReplaceFilterFactory filters to the top or bottom of the filter list); and completely removing the WordDelimiterFilterFactory and the PatternReplaceFilterFactory that has the pattern ([^\w\d\*æøåÆØÅ ]). No matter what I do, I still get errors (sometimes it changes which matched values the error occurs on, but the one included here seems to be the most consistent).

Highlighting is configured as:

<requestHandler name="ac" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="wt">json</str>
    <int name="rows">10</int>
    <bool name="hl">true</bool>
    <str name="hl.fl">ng</str>
    <int name="hl.snippets">4</int>
    <bool name="hl.requireFieldMatch">true</bool>
    <int name="hl.fragsize">2</int>
    <str name="fl">ng score</str>
  </lst>
</requestHandler>

When I do a field analysis using that search term and field value I get:

Index Analyzer

org.apache.solr.analysis.MappingCharFilterFactory {mapping=mapping-ISOLatin1Accent.txt, luceneMatchVersion=LUCENE_35}
  text: Anti-A’dipus

org.apache.solr.analysis.PatternReplaceCharFilterFactory {replace=all, pattern=(\|), replaceWith=or, luceneMatchVersion=LUCENE_35}
  text: Anti-A’dipus

org.apache.solr.analysis.PatternReplaceCharFilterFactory {replace=all, pattern=([&]), replaceWith=and, luceneMatchVersion=LUCENE_35}
  text: Anti-A’dipus

org.apache.solr.analysis.StandardTokenizerFactory {luceneMatchVersion=LUCENE_35}
  position:    1        2
  term text:   Anti     A’dipus
  startOffset: 0        5
  endOffset:   4        12
  type:        ALPHANUM ALPHANUM

org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1, generateNumberParts=1, catenateWords=0, luceneMatchVersion=LUCENE_35, generateWordParts=1, catenateAll=0, catenateNumbers=0}
  position:    1        2        3
  term text:   Anti     A        dipus
  startOffset: 0        5        7
  endOffset:   4        6        12
  type:        ALPHANUM ALPHANUM ALPHANUM

org.apache.solr.analysis.LowerCaseFilterFactory {luceneMatchVersion=LUCENE_35}
  position:    1        2        3
  term text:   anti     a        dipus
  startOffset: 0        5        7
  endOffset:   4        6        12
  type:        ALPHANUM ALPHANUM ALPHANUM

org.apache.solr.analysis.EdgeNGramFilterFactory {maxGramSize=20, minGramSize=2, luceneMatchVersion=LUCENE_35}
  position:    1   2    3     4   5    6     7
  term text:   an  ant  anti  di  dip  dipu  dipus
  startOffset: 0   0    0     7   7    7     7
  endOffset:   2   3    4     9
Re: Highlighting error InvalidTokenOffsetsException: Token oedipus exceeds length of provided text sized 11
On Fri, Aug 3, 2012 at 12:38 AM, Justin Engelman jus...@smalldemons.com wrote: I have an autocomplete index that I return highlighting information for but am getting an error with certain search strings and fields on Solr 3.5. try the 3.6 release: * LUCENE-3642, SOLR-2891, LUCENE-3717: Fixed bugs in CharTokenizer, n-gram tokenizers/filters, compound token filters, thai word filter, icutokenizer, pattern analyzer, wikipediatokenizer, and smart chinese where they would create invalid offsets in some situations, leading to problems in highlighting. -- lucidimagination.com
Re: Special suggestions requirement
I could be crazy, but it sounds to me like you need a trie, not a search index: http://en.wikipedia.org/wiki/Trie

But in any case, what you want to do should be achievable. It seems like you need to do EdgeNGrams and facet on the results, keeping entries whose facet count is > 1 to exclude the actual part numbers, since each of those would be distinct. I'm on the train right now, so I can't test this. :\

Michael Della Bitta
Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn't a Game

On Thu, Aug 2, 2012 at 9:19 PM, Lochschmied, Alexander <alexander.lochschm...@vishay.com> wrote:
> Even with a prefix query, I do not get ABCD02 or any ABCD02... back. BTW: EdgeNGramFilterFactory is used on the field we are getting the suggestions/spellchecks from.
>
> I think the problem is that there are a lot of different part numbers starting with ABCD, and every part number has the same length. I showed only 4 in the example, but there might be thousands. Here are some full part number examples that might be in the index:
>
> ABCD110040
> ABCD00
> ABCD99
> ABCD155500
> ...
>
> I'm looking for a way to make Solr return a distinct list of fixed-length substrings of them, e.g. if ABCD is entered, I would need
>
> ABCD00
> ABCD01
> ABCD02
> ABCD03
> ...
> ABCD99
>
> Then, if the user chose ABCD42 from the suggestions, I would need
>
> ABCD4201
> ABCD4202
> ABCD4203
> ...
> ABCD4299
>
> and so on. I would be able to do some post-processing if needed, or adjust the schema or indexing process. But the key functionality I need from Solr is returning a distinct set of those suggestions where only the last two characters change. All of the available combinations of those last two characters must be considered, though. I need to show alphanumerically sorted suggestions, the smallest value first.
>
> Thanks, Alexander
>
> -----Original Message-----
> From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
> Sent: Thursday, August 2, 2012 15:02
> To: solr-user@lucene.apache.org
> Subject: Re: Special suggestions requirement
>
> In this case, we're storing the overall value length and sorting on that, then alphabetically.
>
> Also, how are your queries fashioned? If you're doing a prefix query, everything that matches it should score the same. If you're only doing a prefix query, you might need to add a term for exact matches as well to get them to show up.
>
> Michael Della Bitta
> Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
> www.appinions.com
> Where Influence Isn't a Game
>
> On Wed, Aug 1, 2012 at 9:58 PM, Lochschmied, Alexander <alexander.lochschm...@vishay.com> wrote:
>> Is there a way to offer distinct, alphabetically sorted, fixed-length options? I am trying to suggest part numbers, and I'm currently trying to do it with the spellchecker component.
>>
>> Let's say ABCD was entered and we have indexed part numbers like
>> ABCD
>> ABCD2000
>> ABCD2100
>> ABCD2200
>> ...
>>
>> I would like to have 2 characters suggested always, so for ABCD it should suggest
>> ABCD00
>> ABCD20
>> ABCD21
>> ABCD22
>> ...
>>
>> No smart sorting is needed, just alphabetical sorting. The problem is that, for example, 00 (or ABCD00) may not be suggested currently, as it doesn't score high enough. But we are really trying to get all distinct values, starting from the smallest (up to a certain number of suggestions).
>>
>> I was already looking at the custom comparator class option. But this would probably not work, as I would need more information to implement it there (like at least the currently entered search term, ABCD in the example).
>>
>> Thanks, Alexander
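Since Alexander mentions being able to do post-processing, one option is to compute the distinct prefix-plus-two-characters completions client-side from the matching part numbers (e.g. terms returned by a prefix query). This is a hypothetical sketch, not part of Solr; the class and method names are made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Given the entered prefix and the full part numbers that start with it,
// return the distinct, alphabetically sorted completions that extend the
// prefix by exactly two characters (TreeSet gives both dedup and order).
public class PartNumberSuggest {
    public static TreeSet<String> suggest(String prefix, List<String> partNumbers) {
        TreeSet<String> out = new TreeSet<>();
        int len = prefix.length() + 2;
        for (String pn : partNumbers) {
            if (pn.startsWith(prefix) && pn.length() >= len) {
                out.add(pn.substring(0, len));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("ABCD110040", "ABCD00", "ABCD99", "ABCD155500");
        System.out.println(suggest("ABCD", parts)); // [ABCD00, ABCD11, ABCD15, ABCD99]
    }
}
```

This sidesteps the spellchecker scoring problem entirely: every available two-character continuation appears exactly once, smallest first, regardless of term frequency.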
Re: search hit on multivalued fields
You can include the fields in your fl list and then check those field values explicitly in the client, or you could add debugQuery=true to your request and check which field the term matched in. The latter requires that you have the analyzed term (or check for the closest matching term).

-- Jack Krupansky

-----Original Message-----
From: Mark, N
Sent: Friday, August 03, 2012 5:51 AM
To: solr-user@lucene.apache.org
Subject: search hit on multivalued fields
Re: Adding new field before import- using post.jar
1. Google for XSLT tools.
2. Write a script that loads the XML, adds the fields, and writes the updated XML.
3. Same as #2, but using Java.
4. If the fields are constants, set default values in the schema, and then the documents will automatically get those values when added. Take the default value attributes out of the schema once you have input documents that actually have the new field values.
5. Hire a consultant.

-- Jack Krupansky

-----Original Message-----
From: Rajani Maski
Sent: Friday, August 03, 2012 5:37 AM
To: solr-user@lucene.apache.org
Subject: Adding new field before import- using post.jar

Hi all,

I have XML files in a folder in the standard Solr XML format. I was simply using SimplePostTool.java to import these XML files to Solr. Now I have to add 3 new fields to each document in the XML before doing a post. What can be an effective way of doing this?

Thanks & Regards,
Rajani
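A minimal sketch of options #2/#3: load a Solr XML update document, add a field element to each <doc>, and serialize it back out. The field name and value here are hypothetical placeholders; real code would read from and write to files rather than strings.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;

// Appends <field name="...">value</field> to every <doc> in a Solr
// XML update message using the JDK's built-in DOM APIs.
public class AddFields {
    public static String addField(String solrXml, String name, String value) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(solrXml.getBytes(StandardCharsets.UTF_8)));
            NodeList docs = doc.getElementsByTagName("doc");
            for (int i = 0; i < docs.getLength(); i++) {
                Element field = doc.createElement("field");
                field.setAttribute("name", name);
                field.setTextContent(value);
                docs.item(i).appendChild(field);
            }
            StringWriter out = new StringWriter();
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String in = "<add><doc><field name=\"id\">1</field></doc></add>";
        System.out.println(addField(in, "source", "import-2012-08"));
    }
}
```

Run this over each file in the folder before handing the result to post.jar.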
Re: Adding new field before import- using post.jar
I hate to also add:

6. Use DataImportHandler. It can index Solr XML, and could add field values either statically or by template glue if you need to combine multiple field values somehow.

And in 4.0 you'll be able to use:

7. The scripting update processor.

Erik

On Aug 3, 2012, at 10:51, Jack Krupansky wrote:
> 1. Google for XSLT tools. ...
Re: synonym file
The Lucene FST guys made a big improvement in synonym filtering in Lucene/Solr 4.0 using FSTs. Or are you already using that?

Or, if you are stuck with pre-4.0, you could do a preprocessor that efficiently generates boolean queries for the synonym expansions. That should give you more decent query times, assuming you develop a decent synonym lookup filter. Maybe you could backport the 4.0 FST code, or at least use the same techniques for your own preprocessor.

-- Jack Krupansky

-----Original Message-----
From: Peyman Faratin
Sent: Friday, August 03, 2012 12:56 AM
To: solr-user@lucene.apache.org
Subject: synonym file

Hi,

I have a (23M) synonym file that takes a long time (3 or so minutes) to load, and once included it seems to adversely affect the QTime of the application by approximately 4 orders of magnitude. Any advice on how to load it faster and lower the QTime would be much appreciated.

best, Peyman
Re: synonym file
Actually, FST (and the SynonymFilter based on it) was backported to 3.x.

Mike McCandless
http://blog.mikemccandless.com

On Fri, Aug 3, 2012 at 11:28 AM, Jack Krupansky <j...@basetechnology.com> wrote:
> The Lucene FST guys made a big improvement in synonym filtering in Lucene/Solr 4.0 using FSTs. Or are you already using that? ...
Re: synonym file
I see that the new FSTSynonymFilterFactory is only delegated to for Lucene 3.4 and later. I vaguely recall that there was also a recent improvement in the loading of files for filters.

-- Jack Krupansky

-----Original Message-----
From: Michael McCandless
Sent: Friday, August 03, 2012 11:32 AM
To: solr-user@lucene.apache.org
Subject: Re: synonym file

Actually, FST (and the SynonymFilter based on it) was backported to 3.x.

Mike McCandless
http://blog.mikemccandless.com
Can't get synonyms working
Hi,

I'm trying to work out why synonyms won't work for my installation of Solr. It's a basic install (just using the example set, and I have tweaked the fields to what I need). The files (schema.xml, synonyms.txt, etc.) are all in:

/var/home/site/solr/example/solr/conf

For the text_ws field I have this set:

<!-- A text field that only splits on whitespace for exact matching of words -->
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- <tokenizer class="solr.StandardTokenizerFactory"/> -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

...and these are the related fields I'm trying to search with the synonym file:

<field name="title" type="text_ws" indexed="true" stored="true" multiValued="false" required="true" WhitespaceTokenizerFactory="true"/>
<field name="description" type="text_ws" indexed="true" stored="true" multiValued="false" required="false" WhitespaceTokenizerFactory="true"/>
<field name="keywords" type="text_ws" indexed="true" stored="true" multiValued="false" required="false" WhitespaceTokenizerFactory="true"/>

In my synonym file I have this example:

car => mascot
mascot => car

...and in Solr I actually have a result with a description of:

<str name="description">bird hawk mascot 002 (u65bsag)</str>

...surely my setup should be showing that result? I've been doing my head in over this, so any advice would be much appreciated :)

TIA!

--
Andy Newby
a...@ultranerds.co.uk
Re: Can't get synonyms working
You used the replacement rule format. You probably simply want the equivalent rule format:

car, mascot

And you probably only want to do this at index time. This would mean that both "car" and "mascot" will be indexed at any position where either term occurs, and at query time you don't need a synonym filter, since both terms are already in the index.

So, split your analyzer into an index analyzer and a query analyzer. The former would have the synonym filter; the latter would not.

-- Jack Krupansky

-----Original Message-----
From: Andy Newby
Sent: Friday, August 03, 2012 12:08 PM
To: solr-user@lucene.apache.org
Subject: Can't get synonyms working
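A minimal sketch of the split Jack describes, adapted from the text_ws type in the question. Treat this as illustrative rather than a drop-in config; whether to keep the stopwords filter on either side is a separate decision:

```xml
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With synonyms.txt containing the equivalent rule "car, mascot", a document with "mascot" in its description is indexed under both terms, so a query for "car" matches without any query-time expansion.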
Using Solr-319 with Solr 3.6.0
Hi,

I am trying to implement a Solr-based search engine for the Japanese language, and I am having trouble adding synonym support for Japanese. I am using text_ja for my indexed text, and I have the following entry in schema.xml for it:

<fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
  <analyzer type="index">
    <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_ja.txt" ignoreCase="true" expand="true" tokenFactory="solr.JapaneseTokenizerFactory" randomAttribute="randomValue"/>
    <filter class="solr.JapaneseBaseFormFilterFactory"/>
    <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="lang/stoptags_ja.txt" enablePositionIncrements="true"/>
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ja.txt" enablePositionIncrements="true"/>
    <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>
    <filter class="solr.JapaneseBaseFormFilterFactory"/>
    <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="lang/stoptags_ja.txt" enablePositionIncrements="true"/>
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ja.txt" enablePositionIncrements="true"/>
    <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Here is my synonyms_ja.txt:

国民経済計算, 国内総生産

I verified the issue through the Solr analysis web page, giving the type text_ja, the text in the index box as 国民経済計算, and the query box as 国内総生産. Ideally, the synonym filter should apply the synonym at index level; however, it does not. The reason for this is that the synonyms are not tokenized, even after specifying the tokenFactory along with the SynonymFilter in schema.xml.

I verified this by changing the synonym file to 国民, 国内. Now, when I specify the text as 国民 and the query as 国内, I get a match, because the tokenizer does not tokenize the text and the synonyms match exactly.

I am using Solr 3.6, and SOLR-319 was resolved in 2008 and should have been part of 3.6.0. Is there any reason why SOLR-319 is not at work in my Solr? Do I have to apply the patch, or is there some setting that I can change?

Thank you so much for your time and cooperation.

Himanshu Jindal
IndexDocValues in Solr
Changing the subject line to make it easier to understand the topic of the message: is there any plan to expose IndexDocValues as part of Solr 4? Any thoughts?

-Saroj

On Thu, Aug 2, 2012 at 5:10 PM, roz dev <rozde...@gmail.com> wrote:
> As we all know, FieldCache can be costly if we have lots of documents and lots of fields to sort on. I see that IndexDocValues are better at sorting and faceting, w.r.t. memory usage.
>
> Is there any plan to use IndexDocValues in Solr for sorting and faceting? Will Solr 4 or 5 have IndexDocValues? Is there an easy way to use IndexDocValues in Solr even though it is not implemented yet?
>
> -Saroj
Re: Using Solr-319 with Solr 3.6.0
On Fri, Aug 3, 2012 at 12:57 PM, Himanshu Jindal <himanshujin...@gmail.com> wrote:
> <filter class="solr.SynonymFilterFactory" synonyms="synonyms_ja.txt" ignoreCase="true" expand="true" tokenFactory="solr.JapaneseTokenizerFactory" randomAttribute="randomValue"/>

I think you have a typo here: it should be tokenizerFactory, not tokenFactory.

--
lucidimagination.com
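For reference, a sketch of the corrected filter entry based on that reply, keeping the other attributes from the original message (the stray randomAttribute removed):

```xml
<filter class="solr.SynonymFilterFactory"
        synonyms="synonyms_ja.txt"
        ignoreCase="true"
        expand="true"
        tokenizerFactory="solr.JapaneseTokenizerFactory"/>
```

With tokenizerFactory spelled correctly, the synonym entries themselves are tokenized by the Japanese tokenizer at load time, so multi-token synonyms like 国民経済計算 can match the tokenized field text.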
Re: search hit on multivalued fields
Mark,

It's not clear what you want to do. Let's say you requested rows=100 and found 1000 docs. What do you need to show in addition to the search result?
- the matched field on every one of the 100 snippets
- or 400 with F1 and 600 with F2
- or what?

On Fri, Aug 3, 2012 at 6:41 PM, Jack Krupansky <j...@basetechnology.com> wrote:
> You can include the fields in your fl list and then check those field values explicitly in the client, or you could add debugQuery=true to your request and check which field the term matched in. ...

--
Sincerely yours,
Mikhail Khludnev
Tech Lead
Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
SolrCloud - load balancing
Does anyone know if a query using CommonsHttpSolrServer on SolrCloud is automatically load balanced? I am trying to load test using solrmeter against one of the nodes, and I am seeing that all the nodes seem to be hit. Any clues?

--
View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-load-balancing-tp3999143.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrCloud - load balancing
You can use CloudSolrServer, the SolrJ client class for communicating with SolrCloud. Instances of this class communicate with ZooKeeper to discover Solr endpoints for SolrCloud collections, and then use LBHttpSolrServer to issue requests. http://lucene.apache.org/solr/api-4_0_0-ALPHA/org/apache/solr/client/solrj/impl/CloudSolrServer.html -- Jack Krupansky

-Original Message- From: sausarkar Sent: Friday, August 03, 2012 7:29 PM To: solr-user@lucene.apache.org Subject: SolrCloud - load balancing

Does anyone know if a query using CommonsHttpSolrServer on SolrCloud is automatically load balanced? I am load testing with solrmeter against one of the nodes, and all the nodes seem to be getting hit. Any clues?

-- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-load-balancing-tp3999143.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrCloud - load balancing
Actually, I am noticing that CommonsHttpSolrServer does seem to load balance by hitting all servers in the cluster; I just wanted to confirm that this is the case.

-- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-load-balancing-tp3999143p3999145.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Configuration for distributed search
Hmmm, the zero results could be because you're searching against the default text field and you don't have conway in that field. The default search field has recently been deprecated, so try specifying a field in your search. The debugQuery=on worked fine for me, so I'm not sure what's happening there... Sorry I can't be more help. Erick

On Wed, Aug 1, 2012 at 1:59 PM, Michelle Talley talley.miche...@gmail.com wrote: I am having a problem using the SOLR distributed search with version 3.6.0 and version 3.6.1. I was able to recreate the distributed search example from this wiki: http://wiki.apache.org/solr/DistributedSearch and it worked great! However, when I attempted to use my own schema in place of the one provided with the example, I ran into problems. I did not change anything about the configuration except the schema.xml. Specifically, for each shard, I deleted the solr/data/index directory and then replaced the example schema.xml with my own schema.xml in the solr/conf directory. Then, I populated each shard with 1 document each. I am able to search each shard independently. However, when I attempt to use the shards parameter to initiate a distributed search, I get an error, which I have included in this email. I am also attaching my schema.xml and solrconfig.xml. The solrconfig.xml matches exactly the solrconfig.xml that comes with the example.

This command to each shard returns one document from each shard:
curl 'http://localhost:8983/solr/select?debugQuery=true&indent=true&q=conway'
curl 'http://localhost:7574/solr/select?debugQuery=true&indent=true&q=conway'

This distributed search command returns 0 documents:
curl 'http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=conway'

The same distributed search command with debugQuery=true returns an error.
curl 'http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&debugQuery=true&indent=true&q=conway'

Error: HTTP ERROR 500. Problem accessing /solr/select. Reason: null

java.lang.NullPointerException
at org.apache.solr.common.util.NamedList.nameValueMapToList(NamedList.java:110)
at org.apache.solr.common.util.NamedList.<init>(NamedList.java:75)
at org.apache.solr.common.util.SimpleOrderedMap.<init>(SimpleOrderedMap.java:58)
at org.apache.solr.handler.component.DebugComponent.finishStage(DebugComponent.java:130)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:289)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Re: SolrCloud - load balancing
Right - though I'm not sure it load balances offhand - I think it might just do the fail over part rather than round robin. I was looking at it the other day and need to investigate. I also think it should be weighted towards sending to leaders - ideally it should load balance across the leaders for indexing and across everyone for search, round robin.

On Fri, Aug 3, 2012 at 7:49 PM, Jack Krupansky j...@basetechnology.com wrote: You can use CloudSolrServer, the SolrJ client class for communicating with SolrCloud. Instances of this class communicate with ZooKeeper to discover Solr endpoints for SolrCloud collections, and then use LBHttpSolrServer to issue requests. http://lucene.apache.org/solr/api-4_0_0-ALPHA/org/apache/solr/client/solrj/impl/CloudSolrServer.html -- Jack Krupansky

-Original Message- From: sausarkar Sent: Friday, August 03, 2012 7:29 PM To: solr-user@lucene.apache.org Subject: SolrCloud - load balancing

Does anyone know if a query using CommonsHttpSolrServer on SolrCloud is automatically load balanced? I am load testing with solrmeter against one of the nodes, and all the nodes seem to be getting hit. Any clues?

-- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-load-balancing-tp3999143.html Sent from the Solr - User mailing list archive at Nabble.com.

-- - Mark http://www.lucidimagination.com
Re: How config multicore using solr cloud feature
Configure all your cores as you would in a single node setup. Then use -Dbootstrap_conf=true rather than the bootstrap option where you point at one directory and give a config set name. That will bootstrap all of your cores with the config they have locally, naming the config sets created after the collection name. The other option is to use the new collections API to create further collections - but I have not gotten around to documenting it on the wiki yet - I will shortly. I'm not positive the collections API is in 4.0-ALPHA without looking, but I think it may be. If not, the bootstrap method is pretty simple as well.

On Sun, Jul 29, 2012 at 11:00 PM, Qun Wang qun.w...@morningstar.com wrote: Hi, I'm a new user, and our program needs to use multiple cores to manage indexes. I found that Solr 4.0 ALPHA has the SolrCloud feature, which I could use for load balancing on queries and synchronization on updates. But the wiki for SolrCloud only tells me how to use a single core for synchronization. For my requirement, multiple cores should be synchronized on update. Could someone tell me how to configure it? Thanks.

-- - Mark http://www.lucidimagination.com
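As a concrete sketch of that bootstrap option (assuming the stock 4.0-ALPHA example layout, core definitions already present in solr.xml, and an external ZooKeeper at localhost:2181 - adjust paths and the zkHost address to your setup):

```shell
cd example
java -Dbootstrap_conf=true -DzkHost=localhost:2181 -jar start.jar
```

Each core's local conf directory is uploaded to ZooKeeper as a config set named after its collection.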
Re: separation of indexes to optimize facet queries without fulltext
Yes, you can have multiple indexes with SolrCloud, same as with stand alone. We call them collections.

On Thu, Jul 26, 2012 at 3:40 PM, Daniel Brügge daniel.brue...@googlemail.com wrote: Hi Chris, thanks for the answer. The plan is that in lots of queries I just need faceted values and don't even do a fulltext search. On the other hand, I need the fulltext search for exactly one task in my application, which is searching documents and returning them. Here no faceting at all is needed, only filtering with fields, which I also use for the other queries. So if 95% of the queries don't use the fulltext, I thought it would make sense to split them. Your suggestion to have one main master index and several slave indexes sounds promising. Is it possible to have this replication in SolrCloud, e.g. with different kinds of schemas etc.? Thanks. Daniel

On Thu, Jul 26, 2012 at 9:05 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : My thought was, that I could separate indexes. So for the facet queries : where I don't need : fulltext search (so also no indexed fulltext field) I can use a completely : new setup of a : sharded Solr which doesn't include the indexed fulltext, so the index is : kept small containing : just the few fields I have. : : And for the fulltext queries I have the current Solr configuration which : includes as mentioned : above all the fields incl. the indexed fulltext field. : : Is this a normal way of handling these requirements? That there are : different kinds of : Solr configurations for the different needs? Because the huge redundancy

It's definitely doable -- one thing I'm not clear on is why, if your faceting queries don't care about the full text, you would need to leave those small fields in your full index ... is your plan to do faceting and drill down using the smaller index, but then display docs resulting from those queries by using the same fq params when querying the full index? If so then it should work; if not, you may not need those fields in that index.

In general there is nothing wrong with having multiple indexes to solve multiple use cases -- an index is usually an inverted denormalization of some structured source data designed for fast queries/retrieval. If there are multiple distinct ways you want to query/retrieve data that don't lend themselves to the same denormalization, there's nothing wrong with multiple denormalizations.

Something else to consider is an approach I've used many times: having a single index, but using special-purpose replicas. You can have a master index that you update at the rate of change, one set of slaves that are used for one type of query pattern (faceting on X, Y, and Z for example) and a different set of slaves that are used for a different query pattern (faceting on A, B, and C), so each set of slaves gets a higher cache hit rate than if the queries were randomized across all machines. -Hoss

-- - Mark http://www.lucidimagination.com
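The master/slave split described above is wired up through the ReplicationHandler in solrconfig.xml. A minimal sketch (the master host name, poll interval, and confFiles list are placeholder examples, not from the thread):

```xml
<!-- On the master: publish the index after each commit -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- On each slave: poll the master for new index versions -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

Pointing one group of slaves at one query workload and another group at a different workload is then purely a matter of how the client routes queries.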
Re: SolrCloud - load balancing
Okay - it does basically cycle through the servers. I forgot, it shuffles the server list every time. So it's okay for searching - but we want to make it smarter for indexing. First perhaps by favoring leaders, and then by hashing itself and sending directly to the right leader. Sent from my iPhone

On Aug 3, 2012, at 8:35 PM, Mark Miller markrmil...@gmail.com wrote: Right - though I'm not sure it load balances offhand - I think it might just do the fail over part rather than round robin. I was looking at it the other day and need to investigate. I also think it should be weighted towards sending to leaders - ideally it should load balance across the leaders for indexing and across everyone for search, round robin.

On Fri, Aug 3, 2012 at 7:49 PM, Jack Krupansky j...@basetechnology.com wrote: You can use CloudSolrServer, the SolrJ client class for communicating with SolrCloud. Instances of this class communicate with ZooKeeper to discover Solr endpoints for SolrCloud collections, and then use LBHttpSolrServer to issue requests. http://lucene.apache.org/solr/api-4_0_0-ALPHA/org/apache/solr/client/solrj/impl/CloudSolrServer.html -- Jack Krupansky

-Original Message- From: sausarkar Sent: Friday, August 03, 2012 7:29 PM To: solr-user@lucene.apache.org Subject: SolrCloud - load balancing

Does anyone know if a query using CommonsHttpSolrServer on SolrCloud is automatically load balanced? I am load testing with solrmeter against one of the nodes, and all the nodes seem to be getting hit. Any clues?

-- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-load-balancing-tp3999143.html Sent from the Solr - User mailing list archive at Nabble.com.

-- - Mark http://www.lucidimagination.com
Re: SolrCloud - load balancing
Hi Mark, You are referring to CloudSolrServer, not CommonsHttpSolrServer, right? Does CommonsHttpSolrServer also round robin? Do you recommend any tool for load testing SolrCloud? Thanks, Sauvik

-- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-load-balancing-tp3999143p3999159.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How do you get the document name from Open Text?
I'm using Solr 4.0 with ManifoldCF .5.1 crawling Open Text v10.5. I have the cats/atts turned on in Open Text and I can see them all in the Solr index. However, the id is just the URL to download the doc from Open Text, and the document name - either from Open Text or the document properties - is nowhere to be found.

Change the name of the uniqueKey field to something that won't overlap cats/atts/metadata. (It is id in the default schema.xml.) For example:

<fields>
  <field name="uniqueKey" type="string" indexed="true" stored="true" required="true"/>
</fields>
<uniqueKey>uniqueKey</uniqueKey>

And set 'Solr id field name=uniqueKey' in the Solr Output Connection's settings.
Re: AW: AW: auto completion search with solr using NGrams in SOLR
Hi thanks, which doing searching i will search either with empname or title only. And also not using any asterics in the query. ex : if i search with mic result should come like michale jackson michale border michale smith want the result just like google search. can us suggest me wht are the configuration need to add/change to get the result like google search ?. for my required result which tokenizers need to use. ? can u tell me how to call a query for this?? thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-tp3998559p3999171.html Sent from the Solr - User mailing list archive at Nabble.com.