Good Open Source Front End for Solr
Hi, what is the best open source front end for Solr? Thanks.
Portable Solr
Hi, in the same way that there are portable embedded web servers, such as Spring Boot, Takes (https://github.com/yegor256/takes), Undertow (http://undertow.io/), or Rapidoid (https://www.rapidoid.org/), is there a portable Solr server? I want to build a web application with Solr that is portable: the user should need only Java, and the rest should be self-contained. Please advise. Thanks.
Re: Indexing word with plus sign
Thank you very much Erick! You're right! The "Char" part in PatternReplaceCharFilterFactory misled me, and I thought it was just for single-character replacements. Once I had gone through the documentation of CharFilters (my fault...) I realized that I could use the very same regex I was using with the PatternReplaceFilterFactory to replace the whole "i+d" expression, and nothing more, and it is working like a charm now. Thanks again!

On 23/05/17 at 19:41, Erick Erickson wrote:
You need to distinguish between PatternReplaceCharFilterFactory and PatternReplaceFilterFactory. The first one is applied to the entire input _before_ tokenization. The second is applied _after_ tokenization to individual tokens, and by that time it's too late. It's an easy thing to miss. And at query time you'll have to be careful to keep the + sign from being interpreted as an operator. Best, Erick
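For readers landing here later: the actual configurations were stripped from this archive, but a minimal sketch of the approach Erick describes might look like the following. The field type name and the exact regex are assumptions, not the poster's real config.

```xml
<fieldType name="text_es" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- A CharFilter runs on the raw input BEFORE tokenization, so the whole
         "i+d" expression can be rewritten while it is still intact -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="(?i)\bi\+d\b"
                replacement="investigacionYdesarrollo"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this chain on both the index and query analyzers, 'i+d' is collapsed to one token before StandardTokenizer can split it; the same rewrite at query time also sidesteps '+' being read as a query operator.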
Re: Indexing word with plus sign
Thanks Walter!! For the sake of curiosity, do you remember which Tokenizer you were using in that case? Thanks!

On 23/05/17 at 20:02, Walter Underwood wrote:
Years ago at Netflix, I had to deal with a DVD from a band named “+/-”. I gave up and translated that to “plusminus” at index and query time. http://plusmin.us/ Luckily, “.hack//Sign” and other related dot-hack anime matched if I just deleted all the punctuation. And everyone searched for “[•REC]²” as “rec2”. The middot is supposed to be red. Movie studios are clueless about searchable strings. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)
Re: Indexing word with plus sign
I have also tried this option, by using a PatternReplaceFilterFactory, like this: but it gets processed AFTER the Tokenizer, so when it executes there is no longer an "i+d" token, but two independent "i" and "d" tokens. Is there a way I could make the filter execute before the Tokenizer? I have tried to place it first in the Analyzer definition, like this: But I had no luck. Are there any other approaches I could be missing? Thanks!

On 22/05/17 at 20:50, Rick Leir wrote:
Fundera, you need a regex which matches a '+' with non-blank chars before and after. It should not replace a '+' preceded by white space; that is important in Solr. This is not a perfect solution, but it might improve matters for you. Cheers -- Rick
Re: Indexing word with plus sign
Thank you Zahid and Erik. I was going to try the CharFilter suggestion, but then I doubted. I see the indexing process, and how the appearance of 'i+d' would be handled, but what happens at query time? If I use the same filter, I could remove '+' chars that the user added to mark compulsory tokens in the search results, couldn't I? However, if I do not use the CharFilter, I would not be able to match the 'i+d' search tokens... Thanks all!

On 22/05/17 at 16:39, Erick Erickson wrote:
You can also use any of the other tokenizers. WhitespaceTokenizer, for instance. There are a couple that use regular expressions. See: https://cwiki.apache.org/confluence/display/solr/Tokenizers Each one has its considerations. WhitespaceTokenizer won't, for instance, separate out punctuation, so you might then have to use a filter to remove it. Regexes can be tricky to get right ;). Best, Erick

On Mon, May 22, 2017 at 5:26 AM, Muhammad Zahid Iqbal <zahid.iq...@northbaysolutions.net> wrote:
Hi, before applying the tokenizer you can replace your special symbols with some phrase to preserve them, and after tokenizing you can replace them back. For example: Thanks, Zahid Iqbal
Indexing word with plus sign
Hi all, I am a bit stuck at a problem that I feel must be easy to solve. In Spanish it is usual to find the term 'i+d'. We are working with Solr 5.5, and StandardTokenizer splits it into 'i' and 'd'. As the index holds documents in both Spanish and Catalan, and 'i' is a frequent word in Catalan, when a user searches for 'i+d' they sometimes get Catalan documents as results. I have tried to use the SynonymFilter, with something like: i+d => investigacionYdesarrollo But it does not seem to change anything. Is there a way I could set an exception to the Tokenizer so that it does not split this word? Thanks in advance!
Get number of results in filtered query
Hi all, we are developing a search engine in which every possible result has one or more countries associated with it. If, apart from writing the query, the user selects a country, we use a filter query to restrict the results to those that match the query and are associated with that country. Nothing spectacular so far :-D However, we would like to show the number of results that would be returned by the unfiltered query, since we already show the number of results associated with each country as we are also faceting on that field. Is it possible to get that number without executing the query twice? Thanks in advance!
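One standard approach is Solr's filter tagging and exclusion local params: tag the country fq and exclude it when faceting, so the facet counts are computed as if that filter were not applied and no second query is needed. A sketch with invented field and tag names:

```text
q=solar panels
fq={!tag=countryTag}country:ES
facet=true
facet.field={!ex=countryTag}country
```

If the country field is single-valued and always populated, summing these facet counts yields the unfiltered hit count; alternatively, a facet.query carrying the same `{!ex=countryTag}` exclusion can return a single number directly.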
Re: Problem building fuzzy suggestions
After an intensive trial-and-error phase I finally got it working with this configuration: mySuggester FuzzyLookupFactory HighFrequencyDictionaryFactory suggest 0.1 spellchk textSpell true true However, to be honest, I still have no idea why it worked this time and the other attempts did not. In order to learn for the next time, any clarification would be highly appreciated. Thanks!
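Reading the flattened values of the working configuration back into XML, it was presumably along these lines. The parameter names are reconstructed assumptions; only the values survive in the message.

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">HighFrequencyDictionaryFactory</str>
    <float name="threshold">0.1</float>
    <str name="field">spellchk</str>
    <str name="suggestAnalyzerFieldType">textSpell</str>
    <str name="buildOnStartup">true</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```

One plausible reason this variant worked where the earlier ones failed: FuzzyLookupFactory is a short, resolvable lookupImpl name for the newer SuggestComponent, whereas the earlier attempts used the old SpellCheckComponent-style suggester with a JaspellLookupFactory class name Solr could not load unqualified.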
StackOverflowError when building suggestions
Hi all, I have taken the example configuration in the reference guide (https://cwiki.apache.org/confluence/display/solr/Suggester) and adapted it to my needs, resulting in this configuration: mySuggester FuzzyLookupFactory DocumentDictionaryFactory spellchk string true true But when I issue a suggestions rebuild (either on solr startup or manually), I get the following exception: 2016-04-03 10:37:19.538 INFO (searcherExecutor-7-thread-1-processing-x:funderatenders) [ x:funderatenders] o.a.s.s.s.SolrSuggester SolrSuggester.build(mySuggester) 2016-04-03 10:37:19.538 INFO (coreLoadExecutor-6-thread-1) [ x:funderatenders] o.a.s.u.UpdateLog Looking up max value of version field to seed version buckets 2016-04-03 10:37:19.538 INFO (coreLoadExecutor-6-thread-1) [ x:funderatenders] o.a.s.u.VersionInfo Refreshing highest value of _version_ for 65536 version buckets from index 2016-04-03 10:37:19.569 INFO (coreLoadExecutor-6-thread-1) [ x:funderatenders] o.a.s.u.VersionInfo Found MAX value 1530583001228902400 from Terms for _version_ in index 2016-04-03 10:37:19.569 INFO (coreLoadExecutor-6-thread-1) [ x:funderatenders] o.a.s.u.UpdateLog Took 27.0ms to seed version buckets with highest version 1530583001228902400 2016-04-03 10:37:19.569 INFO (coreLoadExecutor-6-thread-1) [ x:funderatenders] o.a.s.c.CoreContainer registering core: funderatenders 2016-04-03 10:37:19.640 ERROR (searcherExecutor-7-thread-1-processing-x:funderatenders) [ x:funderatenders] o.a.s.c.SolrCore null:java.lang.StackOverflowError at java.util.BitSet.get(BitSet.java:625) at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1309) at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1311) at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1311) at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1311) at 
org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1311) at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1311) The exception is exactly the same if I change to the AnalyzingLookupFactory. Looking for information, I have found a few more people who faced this problem, but none of them ever got an answer or explained whether they solved it: http://stackoverflow.com/questions/33749956/solr-suggester-throws-stackoverflow-error https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201510.mbox/%3c1730285477.22.1445329649504.javamail.ya...@mail.yahoo.com%3E Does anyone have any ideas on this? Thanks in advance!!
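One workaround reported for this class of error (an assumption here, not confirmed by this thread): the recursive automaton topo-sort in the stack trace can exhaust the default JVM thread stack when the suggestion source field contains very long values, so giving Solr threads a larger stack may avoid the overflow. A sketch for solr.in.sh:

```shell
# Hypothetical snippet for solr.in.sh: enlarge the JVM thread stack so deep
# recursion in Operations.topoSortStatesRecurse has room to complete.
# -Xss is a standard JVM flag; 4m is an arbitrary example value.
SOLR_OPTS="$SOLR_OPTS -Xss4m"
echo "$SOLR_OPTS"
```

Trimming or truncating very long values in the field the DocumentDictionaryFactory reads from is the other commonly suggested mitigation.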
Re: Problem building fuzzy suggestions
After correcting some stupid mistakes, and adding all the parameters I have compiled from every example I found while googling around, I have this configuration: mySuggester <str name="classname">org.apache.solr.spelling.suggest.Suggester</str> <str name="lookupImpl">org.apache.solr.spelling.suggest.jaspell.JaspellLookupFactory</str> spellchk 0.1 0.0 suggest true true However, still no fuzzy suggestions :-( Any ideas? Thanks in advance!!
Re: Problem building fuzzy suggestions
Following on from this, I have been able to make Solr load correctly by replacing this line in the solrconfig.xml: JaspellLookupFactory with the fully qualified class name: org.apache.solr.spelling.suggest.jaspell.JaspellLookupFactory and now I can, as I said, start Solr, build the suggestions, and run them. However, I am still getting nothing "fuzzy". If I query with "managem", I get several suggestions ("management" among them), but "managm" or "manahem" do not produce any suggestions, even if I configure the accuracy parameter as low as 0.5 (and reindex, of course). Am I doing something wrong? Is there a better alternative for getting fuzzy suggestions? Thanks in advance!!
Problem building fuzzy suggestions
Hi all, at present I am providing suggestions in my app with this configuration in my solrconfig.xml: mySuggester WFSTLookupFactory spellchk 0.005 true true true mySuggester 5 suggest However, I would like to provide fuzzy suggestions, so that if the user types "managme", I can suggest "management". Based on the documentation I found on the wiki and by googling, I tried this configuration: mySuggester JaspellLookupFactory spellchk 0.6 true true But if I start my Solr instance with that configuration, I get this error in the log: 2016-04-02 18:24:26.018 ERROR (coreLoadExecutor-6-thread-1) [ x:funderatenders] o.a.s.c.CoreContainer Error creating core [myapp]: Error loading class 'JaspellLookupFactory' org.apache.solr.common.SolrException: Error loading class 'JaspellLookupFactory' at org.apache.solr.core.SolrCore.<init>(SolrCore.java:820) at org.apache.solr.core.SolrCore.<init>(SolrCore.java:658) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:814) at org.apache.solr.core.CoreContainer.access$000(CoreContainer.java:87) at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:467) at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:458) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:231) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) What is the correct way to use and configure the JaspellLookupFactory, and where can I find documentation on it? Thanks in advance!
Re: Solr Autosuggest - Strange issue with leading numbers in query
Hi Erik, thanks a lot for your reply. I expect it to return zero suggestions, since the suggested keyword doesn't actually start with numbers. Expected results: searching for "ga" returns "galaxy"; searching for "gal" returns "galaxy"; searching for "12321312321312ga" should not return any suggestion, since no matching keyword (or combination) exists in the index. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Autosuggest-Strange-issue-with-leading-numbers-in-query-tp4116751p4117846.html Sent from the Solr - User mailing list archive at Nabble.com.
Is it possible to load new elevate.xml on the fly?
Hi, I am trying to figure out a way to switch between multiple elevate.xml files via query parameters on the fly. We have a scenario where we need to elevate documents based on authentication (same core) without creating a new search handler. For authenticated customers: elevate documents based on elevate1.xml. For non-authenticated customers: elevate documents based on elevate2.xml. I am not sure if there is any other method to implement this. Any help in this regard is appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/Is-it-possible-to-load-new-elevate-xml-on-the-fly-tp4117856.html
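If the handler uses the QueryElevationComponent, newer Solr versions expose per-request parameters that may avoid juggling two elevate.xml files at all. A sketch (`elevateIds`/`excludeIds` require Solr 4.7 or later, and the document IDs here are invented):

```text
# Authenticated customers: elevate a hand-picked set for this query
q=phone&elevateIds=SKU1001,SKU2002
# Non-authenticated customers: a different set, or no elevation at all
q=phone&enableElevation=false
```

The application layer decides which parameters to attach based on the authentication state, so a single search handler and a single (or even empty) elevate.xml can serve both cases.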
Escape \\n from getting highlighted - highlighter component
Hi, when searching for text like 'talk n text', the highlighter component also adds the em tags to special characters like \n. Is there a way to avoid highlighting the special characters? \\r\\n Family Messaging comes back as \\r\\<em>n</em> Family Messaging -- View this message in context: http://lucene.472066.n3.nabble.com/Escape-n-from-getting-highlighted-highlighter-component-tp4117895.html
Solr Autosuggest - Strange issue with leading numbers in query
I have a strange issue with Autosuggest. Whenever I query for a keyword with leading numbers, it returns the suggestion corresponding to the alphabetic part (ignoring the numbers). I was under the assumption that it would return an empty result. I am not sure what I am doing wrong. Can someone help?

Query:
/autocomplete?qt=/lucid&req_type=auto_complete&spellcheck.maxCollations=10&q=12342343243242ga&spellcheck.count=10

Result:
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="ga">
        <int name="numFound">1</int>
        <int name="startOffset">15</int>
        <int name="endOffset">17</int>
        <arr name="suggestion">
          <str>galaxy</str>
        </arr>
      </lst>
      <str name="collation">12342343243242galaxy</str>
    </lst>
  </lst>
</response>

My field configuration is as below:
<fieldType class="solr.TextField" name="textSpell_word" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" enablePositionIncrements="true" ignoreCase="true" words="stopwords_autosuggest.txt"/>
  </analyzer>
</fieldType>

SolrConfig.xml:
<searchComponent class="solr.SpellCheckComponent" name="autocomplete">
  <lst name="spellchecker">
    <str name="name">autocomplete</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">autocomplete_word</str>
    <str name="storeDir">autocomplete</str>
    <str name="buildOnCommit">true</str>
    <float name="threshold">.005</float>
  </lst>
</searchComponent>
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/autocomplete">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">autocomplete</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.onlyMorePopular">false</str>
  </lst>
  <arr name="components">
    <str>autocomplete</str>
  </arr>
</requestHandler>

-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Autosuggest-Strange-issue-with-leading-numbers-in-query-tp4116751.html
Is there a way to ignore elevate.xml through SOLR parameters?
I am currently using elevate.xml to elevate a few documents based on some search keywords. We have a requirement to ignore elevate.xml and rank documents based on the original scoring factors when a user is authenticated. Is this something that can be done using SOLR parameters? -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-to-ignore-elevate-xml-through-SOLR-parameters-tp4114261.html
Limit the number of words in Auto complete using RE - not working
Hi, I have a fieldType as below, configured to index the autocomplete phrase. Everything worked fine except that some of the phrases were too long, so we had to limit the maximum number of words in a phrase. Hence I added a regular expression which removes all words except the first three:

<filter class="solr.PatternReplaceFilterFactory" pattern="^((?:\S+\s+){2}\S+).*" replacement="$1"/>

This regular expression works fine when I analyze the field using the SOLR dashboard but doesn't actually limit the words during indexing. I am not sure if I am doing anything wrong. Can someone help me figure out the issue?

*Field Type:*
<fieldType class="solr.TextField" name="textSpell" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^((?:\S+\s+){2}\S+).*" replacement="$1"/>
    <filter class="solr.StopFilterFactory" enablePositionIncrements="true" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

*Field:*
<field indexed="true" multiValued="true" name="autocomplete_phrase" stored="true" type="textSpell"/>

*Copy Field:*
<copyField dest="autocomplete_phrase" source="displayName"/>
<copyField dest="autocomplete_phrase" source="manufacturer"/>

-- View this message in context: http://lucene.472066.n3.nabble.com/Limit-the-number-of-words-in-Auto-complete-using-RE-not-working-tp4113790.html
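As a sanity check outside Solr, the pattern itself does behave as intended; Python's regex syntax is close enough to Java's for this expression, so a quick sketch:

```python
import re

# Keep only the first three whitespace-separated words of a phrase,
# mirroring the PatternReplaceFilterFactory pattern from the schema.
PATTERN = re.compile(r"^((?:\S+\s+){2}\S+).*")

def limit_to_three_words(phrase: str) -> str:
    # Replace the whole match with just the captured first three words;
    # phrases shorter than three words do not match and pass through unchanged.
    return PATTERN.sub(r"\1", phrase)
```

If the analyzer behaves correctly in the admin Analysis screen but the index does not reflect it, re-indexing after the schema change is usually the missing step.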
SOLR suggester component - Get suggestion dump
I am currently using the suggester component for auto-suggest. I need to blacklist a few of the suggestions generated during indexing, so I am trying to get the entire list of suggestions generated during indexing. Is there a way to get that dump? -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-suggester-component-Get-suggestion-dump-tp4110026.html
SOLR Security - Displaying endpoints to public
Hi, We are currently exposing the SOLR endpoints to the public in our application (public users can view the SOLR endpoints (/select) and the query in the debugging console). I am trying to figure out if there is any security threat in displaying the endpoints directly on the internet. We have disabled the update handler in production, so I assume writes / updates are not possible. The below URL mentions: 'Solr does not concern itself with security either at the document level or the communication level. It is strongly recommended that the application server containing Solr be firewalled such that the only clients with access to Solr are your own.' Is the above statement true even if we just expose the read-only endpoints to public users? Can someone please advise? http://wiki.apache.org/solr/SolrSecurity -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-Security-Displaying-endpoints-to-public-tp4109792.html
question regarding dismax query results
Hi, I have a solr schema with fields for Indian legal judgments and want to provide a search engine on top of them. I came across a problem which I thought I would take the group's advice on. For discussion's sake, assume there are only two text fields, assessee and itat_order; the latter has the entire judgment of the court in text form. Now I search against these two fields using a dismax query like:

http://localhost:8983/solr/itat/select?q=additional+depreciation&start=20&rows=30&fl=assessee%2C+itat_order%2C+score&wt=xml&indent=true&defType=dismax&qf=assessee^0.3+itat_order^0.2

For such a dismax query on the words additional depreciation (2 words, without quotes), results where additional and depreciation occur separately score higher than results where additional depreciation occurs as a phrase. Why does this happen? Shouldn't we ideally get exact matches of additional depreciation first, and then matches which have both words but apart from each other? (In general, when I search for A B, shouldn't I get matches with A B adjacent first, and then A and B separated by distance or occurring singly?) Below I have pasted the score and number of occurrences for three results; if you want I can share the text fields in these cases too.
(Also, for what it's worth, the solr index uses only a whitespace tokenizer and LowerCaseFilterFactory for querying and indexing.) thanks Vulcanoid

decision of Heatshrink Technologies: score 0.083743244; "additional depreciation": 0 occurrences; additional: 2 occurrences; depreciation: 27 occurrences
decision of Srinivasa Raju: score 0.08313061; "additional depreciation": 0 occurrences; additional: 5 occurrences; depreciation: 30 occurrences
decision of Nani Agro Foods: score 0.08217349; "additional depreciation": 5 occurrences; additional: 5 occurrences; depreciation: 5 occurrences
Re: simple tokenizer question
Thanks for your email. Great, I will look at the WordDelimiterFilterFactory. Just to make clear, I DON'T want any tokenizing on digits, special chars, punctuation, etc. done other than word delimiting on whitespace. All I want for my first version is NO removal of punctuation/special characters at indexing time and during search time, i.e., input as-is and search as-is (like a simple SQL DB?). I was assuming this would be a trivial case with SOLR and I am not sure what I am missing here. thanks Vulcanoid On Sun, Dec 8, 2013 at 4:33 AM, Upayavira u...@odoko.co.uk wrote: Have you tried a WhitespaceTokenizerFactory followed by the WordDelimiterFilterFactory? The latter is perhaps more configurable at what it does. Alternatively, you could use a RegexFilterFactory to remove extraneous punctuation that wasn't removed by the Whitespace Tokenizer. Upayavira On Sat, Dec 7, 2013, at 06:15 PM, Vulcanoid Developer wrote: Hi, I am new to solr and I guess this is a basic tokenizer question so please bear with me. I am trying to use SOLR to index a few (Indian) legal judgments in text form and search against them. One of the key points with these documents is that the sections/provisions of law usually have punctuation/special characters in them. For example, search queries will TYPICALLY be section 12AA, section 80-IA, section 9(1)(vii), and the text of the judgments themselves will contain this sort of text with section references all over the place. Now, using a default schema setup with StandardTokenizer, which seems to delimit on whitespace AND punctuation, I get really bad results because it looks like 12AA is split, and results having 12 and AA in them turn up. It becomes worse with 9(1)(vii), with results containing 9 and 1 etc. being turned up. What is the best solution here? I really just want to index the document as-is and also to do whitespace tokenizing on the search and nothing more.
So in other words: a) I would like the text document to be indexed as-is with say 12AA and 9(1)(vii) in the document stored as it is mentioned. b) I would like to be able to search for 12AA and for 9(1)(vii) and get proper full matches on them without any splitting up/munging etc. Any suggestions are appreciated. Thank you for your time. Thanks Vulcanoid
simple tokenizer question
Hi, I am new to solr and I guess this is a basic tokenizer question so please bear with me. I am trying to use SOLR to index a few (Indian) legal judgments in text form and search against them. One of the key points with these documents is that the sections/provisions of law usually have punctuation/special characters in them. For example, search queries will TYPICALLY be section 12AA, section 80-IA, section 9(1)(vii), and the text of the judgments themselves will contain this sort of text with section references all over the place. Now, using a default schema setup with StandardTokenizer, which seems to delimit on whitespace AND punctuation, I get really bad results because it looks like 12AA is split, and results having 12 and AA in them turn up. It becomes worse with 9(1)(vii), with results containing 9 and 1 etc. being turned up. What is the best solution here? I really just want to index the document as-is and also to do whitespace tokenizing on the search and nothing more. So in other words: a) I would like the text document to be indexed as-is, with say 12AA and 9(1)(vii) stored in the document as mentioned. b) I would like to be able to search for 12AA and for 9(1)(vii) and get proper full matches on them without any splitting up/munging etc. Any suggestions are appreciated. Thank you for your time. Thanks Vulcanoid
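The difference between the two tokenization styles described above is easy to illustrate outside Solr. Splitting on whitespace keeps legal references intact, while splitting on word characters (a rough stand-in for punctuation-delimiting tokenization, not the exact StandardTokenizer algorithm) shatters them. A small Python sketch:

```python
import re

text = "section 12AA and section 9(1)(vii)"

# Whitespace tokenization: section references survive as single tokens.
whitespace_tokens = text.split()

# Word-character tokenization: splits at punctuation, so 12AA (purely
# alphanumeric) survives, but 9(1)(vii) breaks into separate tokens.
word_tokens = re.findall(r"\w+", text)
```

With whitespace splitting, a query for "9(1)(vii)" can match the intact token; with punctuation splitting it can only match the fragments 9, 1, and vii, which is the bad-results behavior described in the post.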
Re: Suggester - how to return exact match?
Might not be a perfect solution, but you can use an EdgeNGram filter: copy all your field data to that field and use it for suggestions.

<fieldType name="text_autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="250"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

http://localhost:8983/solr/core1/select?q=name:iphone

The above query will return: iphone, iphone5c, iphone4g -- View this message in context: http://lucene.472066.n3.nabble.com/Suggester-how-to-return-exact-match-tp4102203p4102521.html
Re: Suggester - how to return exact match?
Maybe there is a way to do this, but it doesn't make sense to return the same search query as a suggestion (the search query is not a suggestion, as it might or might not be present in the index). AFAIK you can use various lookup algorithms to get the suggestion list, and they look up terms based on the query value (some algorithms implement fuzzy logic too), so searching Foo will return FooBar, Foo2 but not foo. You should fetch the suggestions only if numFound is greater than 0; otherwise you don't have any suggestions. -- View this message in context: http://lucene.472066.n3.nabble.com/Suggester-how-to-return-exact-match-tp4102203p4102259.html
How to index X™ as &#8482; (HTML decimal entity)
I have data coming in to SOLR as below: <field name="displayName">X™ - Black</field> I need to store the HTML decimal entity equivalent (i.e. &#8482;) in SOLR rather than the original value. Is there a way to do this? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-index-X-as-8482-HTML-decimal-entity-tp4102002.html
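One way to do this is outside Solr, in whatever feed or preprocessing step produces the documents (or in a custom update processor). Converting non-ASCII characters to decimal character references is a one-liner in Python, for example:

```python
# Encode non-ASCII characters as HTML/XML decimal entities
# (e.g. the trademark sign U+2122 becomes &#8482;), leaving ASCII untouched.
def to_decimal_entities(text: str) -> str:
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

This relies on Python's built-in "xmlcharrefreplace" encoding error handler, so no extra libraries are needed.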
How to escape special characters from SOLR response header
I am trying to escape special characters in the SOLR response header (to prevent cross-site scripting). I couldn't find any method in SolrQueryResponse to get just the SOLR response header. Can someone let me know if there is a way to modify the SOLR response header? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-escape-special-characters-from-SOLR-response-header-tp4100772.html
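If the underlying goal is to stop echoed query parameters from injecting markup into a page, the usual approach is to escape on the application side, wherever the Solr response is rendered, rather than inside Solr itself. A minimal Python sketch of that escaping step (assuming the header values are available as strings in the rendering layer):

```python
import html

# Escape characters significant in HTML so a value echoed back from
# the Solr response header cannot inject markup into the page.
def escape_for_html(value: str) -> str:
    return html.escape(value, quote=True)
```

The same idea applies in any language; most web frameworks' template engines do this escaping automatically.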
Re: Need idea to standardize keywords - ring tone vs ringtone
I tried using synonyms but it doesn't actually change the stored text, just the indexed value. I need a way to change the raw value stored in SOLR. Maybe I should use a custom update processor to standardize the data. -- View this message in context: http://lucene.472066.n3.nabble.com/Need-idea-to-standardize-keywords-ring-tone-vs-ringtone-tp4097794p4098530.html
Re: Need idea to standardize keywords - ring tone vs ringtone
Thanks for your response, Eric. Sorry for the confusion. I currently display both 'ring tone' and 'ringtone' when the user types in 'r', but I am trying to figure out a way to display just 'ringtone', hence I added 'ring tone' to the stopwords list so that it doesn't get indexed. I have a list of known keywords (more like synonyms) which I am trying to map against the user-entered keywords: ring tone, ringer tone => ringtone -- View this message in context: http://lucene.472066.n3.nabble.com/Need-idea-to-standardize-keywords-ring-tone-vs-ringtone-tp4097794p4098103.html
Need idea to standardize keywords - ring tone vs ringtone
I am currently using a separate core for indexing the autosuggest keywords. Everything works fine except for one issue. In the index I have 2 entries: ring tone, ringtone. When users type in 'r', I display both ring tone and ringtone in the auto-suggest list. I am trying to figure out a way to standardize common keywords (known standardized keywords) automatically. Currently I manually add the non-standard keywords to the stopwords.txt file so that they don't get indexed. Is there a way I can automate this? -- View this message in context: http://lucene.472066.n3.nabble.com/Need-idea-to-standardize-keywords-ring-tone-vs-ringtone-tp4097794.html
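Whether this is done in a custom update processor or in a pre-indexing step in the feed, the core of the automation is just a canonicalization table consulted before a keyword is indexed. A minimal Python sketch, with a hypothetical mapping table standing in for the real synonym list:

```python
# Hypothetical canonicalization table: non-standard variants map to the
# keyword that should actually be indexed for autosuggest.
CANONICAL = {
    "ring tone": "ringtone",
    "ringer tone": "ringtone",
}

def standardize(keyword: str) -> str:
    # Normalize case/whitespace first so variants hit the table,
    # then fall back to the normalized keyword itself.
    key = keyword.strip().lower()
    return CANONICAL.get(key, key)
```

Feeding every candidate suggestion through such a function before indexing removes the need to maintain the non-standard forms in stopwords.txt.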
Is there a way to standardize the stored values (like using synonyms for indexed values)?
I am trying to figure out a way to standardize the stored values using a file similar to synonyms.txt. For example, if I have 3 entries as below: name: apple banana, name: appleBanana, name: applebaNana, with the mapping apple banana, appleBanana, applebaNana => applebanana, I want to have just one entry stored in the index (overwriting the others; I will be using this name as a unique id). Not sure if it's possible to do this currently. Can someone help me out? -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-to-standardize-the-stored-values-like-using-synonyms-for-indexed-values-tp4097846.html
Re: Exact Match Results
You need to provide us with the fieldType information. If you just want to match the phrase entered by the user, you can use KeywordTokenizerFactory. Reference: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters Creates org.apache.lucene.analysis.core.KeywordTokenizer. Treats the entire field as a single token, regardless of its content. Example: "http://example.com/I-am+example?Text=-Hello" ==> "http://example.com/I-am+example?Text=-Hello" -- View this message in context: http://lucene.472066.n3.nabble.com/Exact-Match-Results-tp4096816p4096846.html
Re: Exact Match Results
For an exact phrase match you can wrap the query inside quotes, but this will perform only the exact match and won't match other results. The below query will match only: Okkadu telugu movie stills

http://localhost:8983/solr/core1/select?q=%22okkadu%20telugu%20movie%20stills%22

Since you are using the Edge N-Gram filter, it produces many tokens (as below), so you might not get the desired output. You can try using the shingle factory with the standard analyzer instead of the edge n-gram filter.

o [6f] 0 26 1 1 word
ok [6f 6b] 0 26 1 1 word
okk [6f 6b 6b] 0 26 1 1 word
okka [6f 6b 6b 61] 0 26 1 1 word
okkad [6f 6b 6b 61 64] 0 26 1 1 word
okkadu [6f 6b 6b 61 64 75] 0 26 1 1 word
okkadu [6f 6b 6b 61 64 75 20] 0 26 1 1 word
okkadu t [6f 6b 6b 61 64 75 20 74] 0 26 1 1 word
okkadu te [6f 6b 6b 61 64 75 20 74 65] 0 26 1 1 word
okkadu tel [6f 6b 6b 61 64 75 20 74 65 6c] 0 26 1 1 word

-- View this message in context: http://lucene.472066.n3.nabble.com/Exact-Match-Results-tp4096816p4096906.html
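The token explosion above is easy to reproduce: edge n-grams of a token are simply its prefixes from the minimum gram size upward, which is what the filter emits per token. A quick Python sketch:

```python
# Generate edge n-grams (all prefixes from min_gram characters up),
# the per-token output of an edge n-gram filter with minGramSize=min_gram.
def edge_ngrams(token: str, min_gram: int = 1) -> list:
    return [token[:i] for i in range(min_gram, len(token) + 1)]
```

With minGramSize=1, a single six-letter word already produces six indexed terms, and multi-word fields multiply this further, which is why single-character queries match so broadly.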
Re: Fuzzy Logic Implementation
Use the spellcheck component with collation. Example: http://localhost:8983/solr/spell?q=delll ultrashar&spellcheck=true&spellcheck.extendedResults=true&spellcheck.collate=true http://wiki.apache.org/solr/SpellCheckComponent -- View this message in context: http://lucene.472066.n3.nabble.com/Fuzzy-Logic-Implementation-tp4095644p4095738.html
Re: solrnet sample
You can get most of the faceting information using the below links. SOLRNet faceting info: https://github.com/mausch/SolrNet/blob/master/Documentation/Facets.md SOLR faceting info: http://wiki.apache.org/solr/SolrFacetingOverview -- View this message in context: http://lucene.472066.n3.nabble.com/solrnet-sample-tp4094923p4095468.html
My posts are NOT getting accepted by the mailing list.
For some reason, my posts are not getting accepted by the mailing list even though I have been a subscriber for more than 3 years now. Did anything change in the recent past? Do I need to subscribe to this list again? -- View this message in context: http://lucene.472066.n3.nabble.com/My-posts-are-NOT-getting-accepted-by-the-mailing-list-tp4095470.html
Re: My posts are NOT getting accepted by the mailing list.
Thanks Shawn. Actually, some of my posts are still pending while others were successfully accepted by the mailing list. I never used HTML formatting, but hopefully I am not listed as a spammer. -- View this message in context: http://lucene.472066.n3.nabble.com/Re-My-posts-are-NOT-getting-accepted-by-the-mailing-list-tp4095479p4095507.html
Re: Replace NULL with 0 while Indexing
You can also use SELECT ISNULL(myColumn, 0) FROM myTable. (Note that ISNULL is SQL Server syntax; COALESCE(myColumn, 0) is the standard-SQL equivalent.) Reference: http://www.w3schools.com/sql/sql_isnull.asp -- View this message in context: http://lucene.472066.n3.nabble.com/Replace-NULL-with-0-while-Indexing-tp4095059p4095550.html