Solr Data Routing
Hi All, I want to route data to shards depending on the value of an input column. For example, I am receiving user data and want to store the data of user1 on shard1, the data of user2 on shard2, and so on. Can you please let me know how we can achieve this in Solr? -- Thanks, Ankit Jain
Re: Solr Data Routing
Hi, You can use multi-level compositeId routing in SolrCloud. Read through the following link, it should help: http://searchhub.org/2014/01/06/10590/ Thanks, Himanshu -- Himanshu Mehrotra, Snapdeal, Ext: 529, 246 OKHLA PHASE III, NEW DELHI 110 020, INDIA, http://www.snapdeal.com/
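A minimal sketch of what compositeId routing looks like on the indexing side, assuming the collection uses the default compositeId router and that document ids are built client-side (the class and method names below are hypothetical, not SolrJ API). The route key is prefixed to the id with "!", and all ids sharing a prefix land on the same shard; the three-part form is the multi-level variant from the linked post, which as far as I know allows at most two prefix levels. Note that the shard is picked by hashing the prefix, so this keeps user1's documents together but does not by itself guarantee that user1 ends up on shard1 and user2 on shard2.

```java
final class CompositeIds {
    // Separator the compositeId router uses between route prefix(es) and the document id.
    static final String SEPARATOR = "!";

    // Builds "prefix!docId" or the multi-level "prefix1!prefix2!docId".
    // Hypothetical helper: Solr only sees the resulting string in the id field.
    static String compositeId(String... parts) {
        if (parts.length < 2 || parts.length > 3) {
            throw new IllegalArgumentException(
                    "expected one or two route prefixes followed by a document id");
        }
        for (String part : parts) {
            // Reject empty parts and parts that would introduce extra separators.
            if (part == null || part.isEmpty() || part.contains(SEPARATOR)) {
                throw new IllegalArgumentException("invalid id component: " + part);
            }
        }
        return String.join(SEPARATOR, parts);
    }
}
```

With SolrJ you would then call something like document.addField("id", CompositeIds.compositeId("user1", "42")) and index as usual.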
Re: Help with StopFilterFactory
Jira issue: https://issues.apache.org/jira/browse/SOLR-6468
Date field related query
Hi, I am working with a date field and I want to find all the records that were indexed today. With Regards, Aman Tandon
Re: Date field related query
Hi, I did it using this: fq=datefield:[2014-09-01T23:59:59Z TO 2014-09-02T23:59:59Z]. Correct me if I am wrong. Is there any way to do this using NOW? With Regards, Aman Tandon
Re: Date field related query
How about: datefield:[NOW-1DAY/DAY TO *] François
HTTPS for SolrCloud
Solr 4.8.1, Java 1.7, Tomcat 7.0.50, ZooKeeper 3.4.6. I'm trying to get SolrCloud running with HTTPS only. I found this: https://issues.apache.org/jira/browse/SOLR-3854 I don't have a clusterprops.json file, and running the zkCli command doesn't add one either. The command is along the lines of: ./zkCli.sh -zkhost host:port -cmd put /clusterprops.json '{"urlScheme":"https"}' (run from the zookeeper/bin directory). I've done some googling, but I can't figure out what I'm doing wrong. I'm not getting an error message when running the command. Any ideas? Thanks. -- Chris
Search on specific shard
Hi All, I am using the code below to route data based on the user field. The data of user1 goes to one shard and the data of user2 goes to another shard.

    import java.io.IOException;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;
    import org.apache.solr.common.SolrException;
    import org.apache.solr.common.SolrInputDocument;

    try {
        String zkHostString = "127.0.0.1:2181";
        CloudSolrServer cloudSolrServer = new CloudSolrServer(zkHostString);
        // 2 shards, replication factor 2, max 2 shards per node; "user" is the router field,
        // so documents are hashed on the value of their user field instead of their id
        CollectionAdminRequest.createCollection("collection5", 2, 2, 2, null, null, "user", cloudSolrServer);
        cloudSolrServer.setDefaultCollection("collection5");
        for (int i = 0; i <= 100; i++) {
            SolrInputDocument document = new SolrInputDocument();
            document.addField("id", i);
            document.addField("user", "user" + (i % 2)); // alternates user0 / user1
            cloudSolrServer.add(document);
        }
        cloudSolrServer.commit();
        cloudSolrServer.shutdown();
    } catch (SolrException e) {
        e.printStackTrace();
    } catch (SolrServerException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }

Now I want to use routing at search time. If a user searches the documents for user1, the query should be executed only on shard1 (shard1 contains the data of user1). Please let me know how we can route a query to a specific shard at search time. -- Thanks, Ankit Jain
Solr source code
Hi, What is the process for modifying the Solr source code (the legal part)? In addition, whom should I notify about this bug and fix so that the Solr team will consider using it? Thanks, Shay.
Re: Solr source code
The Lucene/Solr project is licensed under the Apache License, version 2.0. http://www.apache.org/licenses/LICENSE-2.0 http://www.apache.org/foundation/license-faq.html#WhatDoesItMEAN The generally accepted way to contribute a bugfix to the project is to find (or create) the appropriate issue in Jira, then attach your patch to it. Ideally you will check out the trunk branch from SVN and create your patch against that with the svn diff tool, but the stable branch (branch_4x currently) will do just as well. Just be sure there's enough info accompanying the patch for us to identify the exact branch/revision used to build it. http://wiki.apache.org/solr/HowToContribute There are other methods, like the mailing list, a pull request for the GitHub mirror, etc., but Jira and a patch from SVN are the best way. Thanks, Shawn
Re: Solr Data Routing
Here's another link: http://searchhub.org/2013/06/13/solr-cloud-document-routing/ I have to ask: why do you want to do this? If you want to put docs in a particular shard yourself, you have to be very careful that you're not shooting yourself in the foot. Not saying it's a bad idea, but this may be an XY problem. What is the use case you're supporting by doing this? Would separate collections serve as well? Best, Erick
Re: Date field related query
Hmmm, not quite. I think you meant: datefield:[NOW/DAY TO NOW/DAY+1DAY] Rounding with date math is particularly important if you use these in filter query clauses; see: http://searchhub.org/2012/02/23/date-math-now-and-filter-queries/ Best, Erick
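A minimal sketch of building a "today" filter from Java, assuming a field named datefield as in the thread (the class and method names are hypothetical). It shows the date-math form, which Solr resolves on the server with NOW/DAY rounded down to midnight (UTC by default), next to the explicit timestamps that form resolves to for a given instant. Unlike Erick's inclusive "]", it uses an exclusive upper bound "}" so a document stamped exactly at midnight tomorrow is not counted as today; as far as I know, mixed brackets are accepted by the Solr 4.x standard query parser, but check your version.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

final class TodayFilter {
    // Date-math form: Solr evaluates NOW/DAY at request time, and the rounded value
    // stays the same all day, which keeps the filter cacheable.
    static String dateMathFilter(String field) {
        return field + ":[NOW/DAY TO NOW/DAY+1DAY}";
    }

    // Explicit form: the same UTC day boundaries computed on the client for a given instant.
    static String explicitFilter(String field, Instant now) {
        Instant startOfDay = now.truncatedTo(ChronoUnit.DAYS);     // midnight UTC
        Instant startOfNextDay = startOfDay.plus(1, ChronoUnit.DAYS);
        return field + ":[" + startOfDay + " TO " + startOfNextDay + "}";
    }
}
```

With SolrJ, either string can be passed as a filter, e.g. new SolrQuery("*:*").addFilterQuery(TodayFilter.dateMathFilter("datefield")).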
Re: Date field related query
Thanks Erick :) With Regards, Aman Tandon
Re: HTTPS for SolrCloud
Getting closer. I can at least get the file to be there, but I can't figure out what to put into it. I made a clusterprops.json file and have tried each of these as its contents: { "urlScheme": "https" } { \"urlScheme\": \"https\" } { \\"urlScheme\\": \\"https\\" } It gets loaded like this: ./zkCli.sh -zkhost localhost:2181 -cmd put /cluserprops.json `cat ./clusterprops.json` (and I've also tried just pushing those values from within the zkCli app, to no avail). I always get a message like this: Caused by: org.noggit.JSONParser$ParseException: Expected string: char=\,position=1 BEFORE='{\' AFTER='urlScheme\:\https\}' I'm not finding much when searching for clusterprops.json; any advice would be appreciated. -- Chris
Re: HTTPS for SolrCloud
First question: ignoring the original Jira issue (which may be out of date due to later improvements), have you seen the instructions? https://cwiki.apache.org/confluence/display/solr/Enabling+SSL#EnablingSSL-SolrCloud : I always get a message like this: : Caused by: org.noggit.JSONParser$ParseException: Expected string: : char=\,position=1 BEFORE='{\' AFTER='urlScheme\:\https\}' It looks like you have literal backslash characters in your JSON (evidently from your attempts to escape the quote characters). If you're having trouble putting the JSON directly on the command line (your examples looked really contrived - which shell are you using?), you can always use the putfile command (-cmd putfile) to upload the file directly and bypass any concerns about the shell... https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities -Hoss http://www.lucidworks.com/
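Along the lines of Hoss's advice, another way to sidestep shell quoting entirely is to launch Solr's zkcli.sh with an argument list, so no interactive shell ever parses the JSON. A minimal sketch in Java, assuming zkcli.sh forwards its arguments to Java unchanged; the script path, ZooKeeper address, and class name are assumptions. Each list element reaches the script as exactly one argument, so the JSON only needs Java-level escaping.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

final class ZkCliCommand {
    // The exact text we want stored in /clusterprops.json: plain JSON,
    // with no shell single quotes and no literal backslashes.
    static final String CLUSTER_PROPS_JSON = "{\"urlScheme\":\"https\"}";

    // Argument vector for Solr's zkcli.sh "put" command.
    static List<String> putClusterProps(String zkcliPath, String zkHost) {
        return Arrays.asList(zkcliPath, "-zkhost", zkHost,
                "-cmd", "put", "/clusterprops.json", CLUSTER_PROPS_JSON);
    }

    // Runs the command without a shell; inheritIO() forwards the script's output to this process.
    static int run(List<String> command) throws IOException, InterruptedException {
        return new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}
```

For example: int exitCode = ZkCliCommand.run(ZkCliCommand.putClusterProps("./zkcli.sh", "localhost:2181"));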
Re: HTTPS for SolrCloud
Hi Hoss. I did finally stumble onto that document (just after I posted my last message, of course). I'm using the bash shell. I've now tried those steps. Tomcat is stopped. First I run:
./zkcli.sh -zkhost localhost:2181 -cmd put /clusterprops.json '{"urlScheme":"https"}'
I confirm via the ZooKeeper-provided client:
[zk: localhost:2181(CONNECTED) 0] get /clusterprops.json
{"urlScheme":"https"}
cZxid = 0x1053a
ctime = Tue Sep 02 16:11:09 GMT-00:00 2014
mZxid = 0x1053a
mtime = Tue Sep 02 16:11:09 GMT-00:00 2014
pZxid = 0x1053a
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 21
numChildren = 0
[zk: localhost:2181(CONNECTED) 1]
Next I start Tomcat, and I get this:
482 [localhost-startStop-1] ERROR org.apache.solr.core.SolrCore – null:org.noggit.JSONParser$ParseException: JSON Parse Error: char=',position=0 BEFORE=''' AFTER='{urlScheme:https}''
I've tried it with and without the quotes, based on commentary here: http://qnalist.com/questions/4770318/solrcloud-and-https I get the same error when loading the props this way:
./zkcli.sh -zkhost localhost:2181 -cmd put /clusterprops.json {\"urlScheme\":\"https\"}
Error:
533 [localhost-startStop-1] ERROR org.apache.solr.core.SolrCore – null:org.noggit.JSONParser$ParseException: JSON Parse Error: char=',position=0 BEFORE=''' AFTER='{urlScheme:https}''
putfile also nets the same error. I'm not sure where to go from here. Thanks! -- Chris
Re: HTTPS for SolrCloud
Side note -- I've also tried adding the clusterprops.json file via ZooKeeper's shell client, both from the command line and from within that client, all with no luck. -- Chris
WordDelimiter filter, expanding to multiple words, unexpected results
Hello, I'm running into a case where a query is not returning the results I expect, and I'm hoping someone can offer an explanation that might help me fine-tune things or understand what's going on. I am running Solr 4.3. My filter chain includes a WordDelimiterFilter and, later, a filter that downcases everything for case-insensitive searching. It includes many other things too, but I think these are the pertinent facts. For the query dELALAIN, the WordDelimiterFilter splits it into:
text: d, start: 0, position: 1
text: ELALAIN, start: 1, position: 2
text: dELALAIN, start: 0, position: 2
Note the duplication/overlap of the tokens: one version with d and ELALAIN split into two tokens, and another with just one token. Later, all the tokens are lowercased by another filter in the chain (actually an ICU filter which is doing something more complicated than just lowercasing, but I think we can consider it lowercasing for the purposes of this discussion). If I understand correctly what the WordDelimiterFilter is trying to do here, it's probably doing something special because of the lowercase d followed by an uppercase letter. (I don't get this behavior with other mixed-case queries not beginning with 'd'.) And what I think it's trying to do is match text indexed as "d elalain" as well as text indexed as "delalain". The problem is, it's not accomplishing that: it is NOT matching text that was indexed as "delalain" (one token). I don't entirely understand what the position attribute is for, but I wonder if in this case the position of dELALAIN is really supposed to be 1, not 2? Could that be responsible for the bug? Or is position irrelevant in this case? If that's not it, then I'm at a loss as to what may be causing this bug, or even whether it's a bug at all and I'm just not understanding the intended behavior. I expect a query for dELALAIN to match text indexed as "delalain" (because of the forced lowercasing in the filter chain). But it's not doing so.
Are my expectations wrong? Bug? Something else? Thanks for any advice, Jonathan
Re: Search on specific shard
Hi Ankit, The following blog posts should help you understand composite-id routing in SolrCloud better. http://searchhub.org/2013/06/13/solr-cloud-document-routing/ A more complicated use case (multi-level routing): http://searchhub.org/2014/01/06/10590/ -- Anshum Gupta http://www.anshumgupta.net
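A minimal sketch of the search side, assuming Solr 4.5 or later, where the _route_ request parameter tells SolrCloud which shard(s) to send a query to (older 4.x releases used shard.keys). It builds the request parameters with the JDK only; the class and method names are hypothetical. As I understand the compositeId router, with router.field=user a plain route value such as user1 is hashed the same way as the user field was at index time, so the request should only reach the shard that holds user1. The fq is kept on purpose: routing only limits which shards are searched, and other users may hash to the same shard.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class RoutedQuery {
    // Builds a URL query string such as "q=...&fq=user%3Auser1&_route_=user1".
    static String buildQueryString(String q, String user) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", q);
        params.put("fq", "user:" + user); // still filter: a shard may hold several users
        params.put("_route_", user);      // restrict the request to the shard owning this key
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return query.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

In SolrJ the equivalent would be setting the same parameter on the query object, e.g. query.set("_route_", "user1") alongside query.addFilterQuery("user:user1").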
Re: WordDelimiter filter, expanding to multiple words, unexpected results
Hi Jonathan, I'm a little confused by this line: "And, what I think it's trying to do, is match text indexed as 'd elalain' as well as text indexed by 'delalain'." In this case, I don't know how WordDelimiterFilter will help, as you're likely tokenizing on spaces somewhere, and that input text has a space. I could be wrong. It's probably best if you post your field definition from your schema. Also, is this a free-text field, or something that's more like a short string? Thanks, Michael Della Bitta Applications Developer o: +1 646 532 3062 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts w: appinions.com http://www.appinions.com/
Re: WordDelimiter filter, expanding to multiple words, unexpected results
Thanks for the response. I understand the problem a little better after investigating more. Posting my full field definitions is, I think, going to be confusing, as they are long and complicated. I can narrow it down to an isolated case if I need to. My indexed field in question contains relatively short strings. But what it has to do with is the WordDelimiterFilter's default splitOnCaseChange=1 and generateWordParts=1, and their effects. Let's take a less confusing example: the query MacBook, with a WordDelimiterFilter followed by something that downcases everything. I think what the WDF (followed by case folding) is trying to do is make the query MacBook match both indexed text "mac book" and "macbook"; either one should be a match. Is my understanding right of what WordDelimiterFilter with splitOnCaseChange=1 and generateWordParts=1 is intended to do? In my actual index, the query MacBook matches ONLY "mac book", and not "macbook", which is unexpected. I do want it to match both. (I realize I could make it match only "macbook" by setting splitOnCaseChange=0 and/or generateWordParts=0.) It's possible this is happening as a side effect of other parts of my complex field definition, and I really do need to post the whole thing and/or isolate it. But I wonder if there are known general problem cases that cause this kind of failure, or any known bugs in WordDelimiterFilter (in Solr 4.3?) that cause it. And I wonder whether WordDelimiterFilter emitting the token MacBook at position 2 rather than 1 is expected, irrelevant, or possibly a relevant problem. Thanks again, Jonathan
Re: HTTPS for SolrCloud
: ./zkcli.sh -zkhost localhost:2181 -cmd put /clusterprops.json : '{"urlScheme":"https"}' ... : Next I start Tomcat, I get this: : 482 [localhost-startStop-1] ERROR org.apache.solr.core.SolrCore – : null:org.noggit.JSONParser$ParseException: JSON Parse Error: : char=',position=0 BEFORE=''' AFTER='{urlScheme:https}'' I can't reproduce the error you are describing when I follow all the steps on the SSL doc page (using bash, and the outer single quotes, just like you)... https://cwiki.apache.org/confluence/display/solr/Enabling+SSL#EnablingSSL-SolrCloud Are you certain that your Solr nodes are talking to the same ZooKeeper instance? (Because according to that error, there is a stray single quote at the beginning of the clusterprops.json file in the ZK server Solr is talking to, and as you already confirmed, there are no single quotes in the string you read back from the ZK server you are talking to... perhaps there are 2 ZK instances set up somewhere and the one Solr is using still has crufty data from before you got the quoting issue straightened out?) Do you see log messages early on in Solr's startup from ZkContainer that say... 1359 [main] INFO org.apache.solr.core.ZkContainer – Zookeeper client=localhost:2181 ? -Hoss http://www.lucidworks.com/
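Hoss's diagnosis can also be checked mechanically. A minimal sketch with a hypothetical class name that classifies the raw bytes read back from /clusterprops.json against the exact intended payload. That payload, {"urlScheme":"https"}, is 21 bytes in UTF-8, which matches the dataLength = 21 reported by the ZooKeeper client in this thread; the same text still wrapped in the shell's single quotes would be 23 bytes and would start with ' rather than {, which is what Solr's "char=',position=0" parse error points to.

```java
import java.nio.charset.StandardCharsets;

final class ClusterPropsCheck {
    // The exact payload intended for /clusterprops.json (21 bytes in UTF-8).
    static final String EXPECTED = "{\"urlScheme\":\"https\"}";

    // Returns a short diagnosis of raw znode bytes, e.g. as fetched from ZooKeeper.
    // Only an exact match with EXPECTED counts as "ok"; other valid JSON is reported as unexpected.
    static String diagnose(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8).trim();
        if (text.equals(EXPECTED)) {
            return "ok";
        }
        if (text.startsWith("'") || text.endsWith("'")) {
            return "stray single quote";   // shell quotes stored as data
        }
        if (text.indexOf('\\') >= 0) {
            return "literal backslash";    // escaping that reached the file
        }
        if (!text.startsWith("{")) {
            return "not a JSON object";
        }
        return "unexpected content";
    }
}
```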
Re: WordDelimiter filter, expanding to multiple words, unexpected results
If that's your problem, I bet all you have to do is twiddle on one of the catenate options, either catenateWords or catenateAll. Michael Della Bitta Applications Developer o: +1 646 532 3062 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts w: appinions.com http://www.appinions.com/ On Tue, Sep 2, 2014 at 1:07 PM, Jonathan Rochkind rochk...@jhu.edu wrote: Thanks for the response. I understand the problem a little bit better after investigating more. Posting my full field definitions is, I think, going to be confusing, as they are long and complicated. I can narrow it down to an isolation case if I need to. My indexed field in question is relatively short strings. But what it's got to do with is the WordDelimiterFilter's default splitOnCaseChange=1 and generateWordParts=1, and the effects of such. Let's take a less confusing example, query MacBook. With a WordDelimiterFilter followed by something that downcases everything. I think what the WDF (followed by case folding) is trying to do is make query MacBook match both indexed text mac book as well as macbook -- either one should be a match. Is my understanding right of what WordDelimiterfilter with splitOnCaseChange=1 and generateWordParts=1 is intending to do? In my actual index, query MacBook is matching ONLY mac book, and not macbook. Which is unexpected. I indeed want it to match both. (I realize I could make it match only 'macbook' by setting splitOnCaseChange=0 and/or generateWordParts=0). It's possible this is happening as a side effect of other parts of my complex field definition, and I really do need to post hte whole thing and/or isolate it. But I wonder if there are known general problem cases that cause this kind of failure, or any known bugs in WordDelimiterFilter (in Solr 4.3?) 
that cause this kind of failure. And I wonder if WordDelimiter filter spitting out the token MacBook with position 2 rather than 1 is expected, irrelevant, or possibly a relevant problem. Thanks again, Jonathan On 9/2/14 12:59 PM, Michael Della Bitta wrote: Hi Jonathan, Little confused by this line: And, what I think it's trying to do, is match text indexed as d elalain as well as text indexed by delalain. In this case, I don't know how WordDelimiterFilter will help, as you're likely tokenizing on spaces somewhere, and that input text has a space. I could be wrong. It's probably best if you post your field definition from your schema. Also, is this a free-text field, or something that's more like a short string? Thanks, Michael Della Bitta Applications Developer o: +1 646 532 3062 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/ 112002776285509593336/posts w: appinions.com http://www.appinions.com/ On Tue, Sep 2, 2014 at 12:41 PM, Jonathan Rochkind rochk...@jhu.edu wrote: Hello, I'm running into a case where a query is not returning the results I expect, and I'm hoping someone can offer some explanation that might help me fine tune things or understand what's up. I am running Solr 4.3. My filter chain includes a WordDelimiterFilter and, later a filter that downcases everything for case-insensitive searching. It includes many other things too, but I think these are the pertinent facts. For query dELALAIN, the WordDelimiterFilter splits into: text: d start: 0 position: 1 text: ELALAIN start: 1 position: 2 text: dELALAIN start: 0 position: 2 Note the duplication/overlap of the tokens -- one version with d and ELALAIN split into two tokens, and another with just one token. Later, all the tokens are lowercased by another filter in the chain. 
(Actually it's an ICU filter that does something more complicated than just lowercasing, but I think we can treat it as lowercasing for the purposes of this discussion.) If I understand correctly, the WordDelimiterFilter is probably doing something special here because of the lowercase "d" followed by an uppercase letter. (I don't get this behavior with other mixed-case queries that don't begin with "d".) And what I think it's trying to do is match text indexed as "d elalain" as well as text indexed as "delalain". The problem is, it's not accomplishing that -- it is NOT matching text that was indexed as "delalain" (one token). I don't entirely understand what the position attribute is for, but I wonder: in this case, is the position of "dELALAIN" really supposed to be 1, not 2? Could that be responsible for the bug? Or is position irrelevant here? If that's not it, then I'm
Re: HTTPS for SolrCloud
OK -- so I think my previous attempts were causing the problem. Since this is a dev environment (and is still empty), I just went ahead and wiped out the version-2 directories for the ZooKeeper nodes, reloaded my Solr collections, then ran that command (zkcli.sh in the Solr distro). That did work. What is a reliable way to remove a file from ZooKeeper? Now I just get this error when trying to create a collection: org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: https://server:8444 This brings up another problem that I have: if there's an error creating a collection, and I fix the issue and try to re-create the collection, I get something like this: <str name="Operation createcollection caused exception:">org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: collection already exists: testcollection</str> How do I go about cleaning those up? The only reliable thing that I've found is to wipe out the zookeepers and start over. Thanks Hoss! -- Chris On Tue, Sep 2, 2014 at 1:08 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : ./zkcli.sh -zkhost localhost:2181 -cmd put /clusterprops.json : '{"urlScheme":"https"}' ... : Next I start Tomcat, I get this: : 482 [localhost-startStop-1] ERROR org.apache.solr.core.SolrCore – : null:org.noggit.JSONParser$ParseException: JSON Parse Error: : char=',position=0 BEFORE=''' AFTER='{"urlScheme":"https"}'' I can't reproduce the error you are describing when I follow all the steps on the SSL doc page (using bash, and the outer single quotes, just like you)... https://cwiki.apache.org/confluence/display/solr/Enabling+SSL#EnablingSSL-SolrCloud Are you certain that your Solr nodes are talking to the same ZooKeeper instance?
(Because according to that error, there is a stray single quote at the beginning of the clusterprops.json file in the ZK server Solr is talking to, and as you already confirmed there are no single quotes in the string you read back from the ZK server you are talking to ... perhaps there are 2 ZK instances set up somewhere and the one Solr is using still has crufty data from before you got the quoting issue straightened out?) Do you see log messages early on in Solr's startup from ZkContainer that say... 1359 [main] INFO org.apache.solr.core.ZkContainer – Zookeeper client=localhost:2181 ? -Hoss http://www.lucidworks.com/
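Hoss's diagnosis hinges on the exact bytes stored in /clusterprops.json: the parser fails at position 0 on a single quote, which suggests the shell's outer quotes ended up stored along with the JSON. A minimal Python sketch, assuming you have the payload string in hand (the argument you are about to pass to zkcli.sh, or the string you read back from ZooKeeper), shows how a leftover outer quote turns a valid document into a parse error. The function name `check_clusterprops` is hypothetical; it only checks JSON syntax and the urlScheme value discussed in this thread and does not talk to ZooKeeper.

```python
import json


def check_clusterprops(payload: str) -> dict:
    """Parse a clusterprops.json payload and confirm it sets urlScheme.

    Raises json.JSONDecodeError for malformed JSON (for example, a payload that
    still carries the shell's outer single quotes) and ValueError if urlScheme
    is missing or is not http/https.
    """
    props = json.loads(payload)
    if not isinstance(props, dict) or props.get("urlScheme") not in ("http", "https"):
        raise ValueError(f"unexpected clusterprops content: {payload!r}")
    return props


# What bash hands to zkcli.sh once it strips the outer single quotes: valid JSON.
print(check_clusterprops('{"urlScheme":"https"}'))

# A payload that kept its single quotes fails on the very first character,
# mirroring the "char=',position=0" error in the Solr log.
try:
    check_clusterprops("'{\"urlScheme\":\"https\"}'")
except json.JSONDecodeError as err:
    print("rejected:", err.msg, "at position", err.pos)
```

Checking the payload locally only rules out quoting mistakes on your side; it cannot tell you which ZooKeeper ensemble Solr is actually reading, which is what the ZkContainer log line Hoss mentions would confirm.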
Re: WordDelimiter filter, expanding to multiple words, unexpected results
Yes, thanks, I realize I can twiddle those parameters, but that will probably result in "MacBook" no longer matching "mac book" at all, and ONLY matching "macbook". My understanding of the default settings of WordDelimiterFilterFactory is that they are intended to make "MacBook" match both "mac book" AND "macbook". I will try to create an isolated reproduction that demonstrates this, ruling out interference from other filters (or identifying the other filters), to make my question clearer. Jonathan On 9/2/14 1:34 PM, Michael Della Bitta wrote: If that's your problem, I bet all you have to do is twiddle on one of the catenate options, either catenateWords or catenateAll.
Re: WordDelimiter filter, expanding to multiple words, unexpected results
bq: In my actual index, query MacBook is matching ONLY mac book, and not macbook. I suspect the query-side parameters for your WordDelimiterFilterFactory don't have catenateWords set. What do you see when you enter these in both the index and query portions of the admin/analysis page? Best, Erick On Tue, Sep 2, 2014 at 10:34 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: If that's your problem, I bet all you have to do is twiddle on one of the catenate options, either catenateWords or catenateAll.
RE: Solr spellcheck returns more than 1 word for a 1 word spellcheck
This is the WordBreakSolrSpellChecker, which is there to correct spelling errors involving misplaced whitespace (or is it white space?). To disable it, remove this or a similar line from your requestHandler in solrconfig.xml: <str name="spellcheck.dictionary">wordbreak</str> Keep in mind that if you want the best of both worlds, you can keep it there and use the collation feature; it will try to pick the combination of spelling corrections that best fixes your user's query. See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate and the following sections. James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: Thomas Michael Engelke [mailto:thomas.enge...@posteo.de] Sent: Monday, September 01, 2014 6:44 AM To: Solr user Subject: Solr spellcheck returns more than 1 word for a 1 word spellcheck I'm in the process of incorporating Solr spellchecking into our product. For that, I've created a new field: <field name="spell" type="spell" indexed="true" stored="true" required="false" multiValued="false"/> <copyField source="name" dest="spell" maxChars="3" /> And in the fieldType definitions: <fieldType name="spell" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory"/> </analyzer> </fieldType> Then I feed the names of products into the corresponding core. They can have a lot of words (examples): "door lock rear left", "Door brake, door in front + rear fitting". However, the names get pretty long, and in the source data they have been truncated. This sometimes leaves parts of words at the end: "The water pump can evacuate some coo". I have created a spellcheck component, feeding off the `spell` field defined earlier. Now for the problem. Sometimes, when I look up a slightly misspelled word, I get results I do not expect.
Example request: http://solr.url:8983/solr/en/spell?q=coole This is (part of) the response: <str name="word">cooler</str><int name="freq">21</int> <str name="word">coo le</str><int name="freq">2</int> <str name="word">cable</str><int name="freq">334</int> <str name="word">co o le</str><int name="freq">4</int> [...] Now, as you can see, the misspelled `coole` should have been `cooler`, and that's the first suggestion. However, the second and fourth suggestions baffle me. After a bit of research, I found these to be multiple words glued together. As I described above, `coo` was part of a name that was truncated. I found `co` the same way, and the source data contains a small number of standalone `o` characters (product number names). Now, my question is: why is Solr suggesting multiple words pasted together as a spellcheck result for a single word? Is there a way to prevent Solr from pasting together word parts to forge suggestions?
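James's explanation can be illustrated with a toy model. The Python sketch below is not Solr's WordBreakSolrSpellChecker (which also combines adjacent query words, ranks candidates, and applies its own limits); it only enumerates the ways a single query word can be broken into shorter terms that exist in a dictionary. The dictionary is hypothetical, chosen to mirror the truncated fragments Thomas describes (`coo`, `co`, `o`) plus an assumed `le`, to show how suggestions shaped like `coo le` and `co o le` can arise.

```python
def word_breaks(word: str, dictionary: set, max_parts: int = 3) -> list:
    """Return every split of `word` into 2..max_parts terms found in `dictionary`.

    Toy illustration only: the max_parts cap is arbitrary, and no ranking
    by term frequency is done.
    """
    results = []

    def split(rest, parts):
        if not rest:
            if len(parts) > 1:  # a single whole-word match is not a "break"
                results.append(" ".join(parts))
            return
        if len(parts) == max_parts:
            return
        for i in range(1, len(rest) + 1):
            prefix = rest[:i]
            if prefix in dictionary:
                split(rest[i:], parts + [prefix])

    split(word, [])
    return results


# Hypothetical index terms: full words plus fragments left by truncated product names.
terms = {"cooler", "cable", "coo", "co", "o", "le"}
print(word_breaks("coole", terms))  # ['co o le', 'coo le']
```

In this model the odd suggestions come entirely from fragments being present in the dictionary, which is why James's two options (drop the wordbreak dictionary, or keep it and let collation choose among candidates) both address the symptom rather than the truncated source data.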
Re: HTTPS for SolrCloud
Is the solr.ssl.checkPeerName option available in 4.8.1? I have my Tomcat starting up with that as a -D option, but I'm getting an exception on validating the hostname w/ the cert... -- Chris
Solr 4.1.0 Compatibility with zookeeper 3.4.5
Hello, I'm using Solr 4.1.0 with ZooKeeper 3.3.6 and need to upgrade to ZooKeeper 3.4.5. I would like to confirm whether Solr 4.1.0 is compatible with ZooKeeper 3.4.5, or whether there are any precautions I should take before upgrading. -- Best Regards, Shivam Bajpai DevOps Engineer StackExpress
Re: WordDelimiter filter, expanding to multiple words, unexpected results
On 9/2/14 1:51 PM, Erick Erickson wrote: bq: In my actual index, query MacBook is matching ONLY mac book, and not macbook. I suspect the query-side parameters for your WordDelimiterFilterFactory don't have catenateWords set. What do you see when you enter these in both the index and query portions of the admin/analysis page? Thanks Erick! Our WordDelimiterFilterFactory does have catenateWords set, in both the index and query phases (is that right?):
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
It's hard to cut and paste the results of the analysis page into email (or anywhere!), so I'll give you screenshots, sorry -- and I'll give them for our whole real-world, complex field definition. I'll also paste our entire field definition below. But I realize my next step is probably creating a simpler isolation/reproduction case (unless you have a magic answer from this!). Again, the problem is that "MacBook" seems to be only matching indexed "macbook" and not indexed "mac book".
MacBook query analysis: https://www.dropbox.com/s/b8y11usjdlc88un/mixedcasequery.png
MacBook index analysis: https://www.dropbox.com/s/fwae3nz4tdtjhjv/mixedcaseindex.png
mac book index analysis: https://www.dropbox.com/s/mihd58f6zs3rfu8/twowordindex.png
Our entire actual field definition:
  <fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
    <analyzer>
      <!-- the rulefiles thing is to keep ICUTokenizerFactory from stripping punctuation, so our synonym filter involving C++ etc can still work. From: https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201305.mbox/%3c51965e70.6070...@elyograg.org%3E the rbbi file is in our local ./conf, copied from lucene source tree -->
      <tokenizer class="solr.ICUTokenizerFactory" rulefiles="Latn:Latin-break-only-on-whitespace.rbbi"/>
      <filter class="solr.SynonymFilterFactory" synonyms="punctuation-whitelist.txt" ignoreCase="true"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
      <!-- folding needs to be after WordDelimiter, so WordDelimiter can do its thing with full cases and such -->
      <filter class="solr.ICUFoldingFilterFactory" />
      <!-- ICUFolding already includes lowercasing, no need for separate lowercasing step <filter class="solr.LowerCaseFilterFactory"/> -->
      <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>
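To reason about what this chain should produce, here is a deliberately simplified Python model of the pieces the thread focuses on: splitting where a lowercase letter meets an uppercase one (splitOnCaseChange=1), emitting the parts (generateWordParts=1), adding the joined form (catenateWords=1), then lowercasing as a stand-in for ICUFoldingFilterFactory. It is an assumption-laden sketch, not Lucene's WordDelimiterFilter: it ignores numbers, punctuation, offsets, positions, synonyms and stemming, and the function names are hypothetical. Under those assumptions it reproduces the d / ELALAIN / dELALAIN split Jonathan reports.

```python
import re


def toy_word_delimiter(token, generate_word_parts=True, catenate_words=True,
                       split_on_case_change=True):
    """Very rough stand-in for WordDelimiterFilter on one alphabetic token."""
    # Split only at the boundary where a lowercase letter is followed by an uppercase one.
    parts = re.split(r"(?<=[a-z])(?=[A-Z])", token) if split_on_case_change else [token]
    if len(parts) == 1:
        return [token]
    out = list(parts) if generate_word_parts else []
    if catenate_words:
        out.append("".join(parts))  # e.g. "Mac" + "Book" -> "MacBook"
    return out


def toy_analyze(text):
    """Whitespace-tokenize, apply the toy WDF, then lowercase every term."""
    return [t.lower() for word in text.split() for t in toy_word_delimiter(word)]


print(toy_word_delimiter("dELALAIN"))  # ['d', 'ELALAIN', 'dELALAIN']
print(toy_analyze("MacBook"))          # ['mac', 'book', 'macbook']
print(toy_analyze("mac book"))         # ['mac', 'book']
print(toy_analyze("macbook"))          # ['macbook']
```

At the level of individual terms, the query "MacBook" overlaps with both indexed forms in this model, so if the real field still matches only one of them, the difference plausibly lies in how those stacked tokens are turned into a query (positions, phrase generation) rather than in which terms are produced. That is a hypothesis the model cannot confirm.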
Re: WordDelimiter filter, expanding to multiple words, unexpected results
What happens if you append debug=query to your query? IOW, what does the _parsed_ query look like? Also note that the default WDFF settings are _not_ identical for index and query: catenateWords and catenateNumbers are 1 in the index portion and 0 in the query section. Still, this shouldn't be a problem, all other things being equal. Best, Erick
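Erick's suggestion is easy to script. The sketch below only builds the request URL and reads one field out of an already-decoded response; it sends nothing. The base URL, the core name `collection1` and the field name `title` are hypothetical placeholders, and `debug=query` is the parameter Erick names. In the JSON response the parsed form is expected under the `debug` section (for example as `parsedquery`), which shows whether the query-time tokens were combined into a phrase.

```python
from urllib.parse import urlencode


def debug_query_url(base_url, core, field, user_query):
    """Build a /select URL that asks Solr for debug info about query parsing."""
    params = {
        "q": f"{field}:{user_query}",
        "debug": "query",  # only the query-parsing part of the debug output
        "wt": "json",
    }
    return f"{base_url}/{core}/select?{urlencode(params)}"


def parsed_query(response_json):
    """Return debug.parsedquery from a decoded JSON response, or None if absent."""
    return response_json.get("debug", {}).get("parsedquery")


print(debug_query_url("http://localhost:8983/solr", "collection1", "title", "MacBook"))
# http://localhost:8983/solr/collection1/select?q=title%3AMacBook&debug=query&wt=json
```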
Re: WordDelimiter filter, expanding to multiple words, unexpected results
Although not a solution, this may help in trying to find the problem. http://solr.pl/en/2010/08/16/what-is-schema-xml/ says: "It is worth noting that there is an additional attribute for the text field type: autoGeneratePhraseQueries. This attribute is responsible for telling filters how to behave when dividing tokens. Some filters (such as WordDelimiterFilter) can divide tokens into a set of tokens. Setting the attribute to true (default value) will automatically generate phrase queries. This means that WordDelimiterFilter will divide the word “wi-fi” into two tokens “wi” and “fi”. With autoGeneratePhraseQueries set to true, the query sent to Lucene will look like field:"wi fi", while with it set to false the Lucene query will look like field:wi OR field:fi. However, please note that this attribute only behaves well with tokenizers based on white spaces." Since phrases are built by looking at token positions, it is possible that the positions set for the other generated tokens have something to do with it. Have you tried setting autoGeneratePhraseQueries=false to see if it'll match both? (I know that might have other unintended behaviors, but it might give some insight into the problem.) Diego Fernandez - 爱国 Software Engineer US GSS Supportability - Diagnostics
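The article's wi-fi example can be written down as a small Python function. This only models the two query shapes the article describes (a phrase when autoGeneratePhraseQueries is true, OR'ed terms when false) for the tokens produced from one whitespace-delimited word; `toy_query` is a hypothetical name, and the sketch does not model positions or stacked tokens such as the catenated "macbook", which is exactly where Diego suspects the real behavior may differ.

```python
def toy_query(field, tokens, auto_generate_phrase_queries):
    """Render the Lucene-style query the solr.pl article describes for one split word."""
    if len(tokens) == 1:
        return f"{field}:{tokens[0]}"
    if auto_generate_phrase_queries:
        return f'{field}:"{" ".join(tokens)}"'  # all parts must appear, in order
    return " OR ".join(f"{field}:{t}" for t in tokens)  # any single part may match


print(toy_query("field", ["wi", "fi"], True))   # field:"wi fi"
print(toy_query("field", ["wi", "fi"], False))  # field:wi OR field:fi
```

Under this simplified model, a phrase built only from the parts ("mac book") would match indexed "mac book" but not indexed "macbook"; whether Solr also lets the stacked catenated token satisfy the query is the kind of detail the parsed query from debug=query would reveal.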
How can I set shard members?
Hi, I am trying to test SolrCloud with version 4.1.0 (http://wiki.apache.org/solr/SolrCloud#Example_C:_Two_shard_cluster_with_shard_replicas_and_zookeeper_ensemble). Is there any way to set which servers are members of each shard? For example: server1 and server2 for shard1, server3 and server4 for shard2. When I tested the example, shard membership depended on the order in which Solr was started, i.e. if I run server1, server2, server3, server4 in that order, then server1 and server3 are in shard1 and server2 and server4 are in shard2. Of course, from the second start onward there is no dependency on the start order. I also tried -DshardId=shard1, but it is not working. Thanks, Chunki.
Re: How can I set shard members?
Hello, have you tried the createNodeSet option for collection/shard creation and the node option for replica creation in Solr 4.9.0+? As you're just testing, I would strongly recommend moving to the latest version. https://cwiki.apache.org/confluence/display/solr/Collections+API This is useful for providing information about the underlying topology. We use this in customer scenarios to partition the set of servers into at least two groups, so that replica X of every shard in a SolrCloud cluster is located in server group X (usually with two groups). The two server groups then correspond to two separate physical ESX clusters, so if one VM cluster goes down, at least one replica of each shard will still be available. Cheers, --Jürgen
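The two-group layout Jürgen describes, applied to Chunki's four servers, can be sketched as a short sequence of Collections API calls. This Python sketch only builds URLs and sends nothing. Assumptions: a 4.x release recent enough to support ADDREPLICA with the node parameter (Jürgen suggests 4.9.0+), node names in the usual host:port_solr form, and hypothetical host and collection names. As Erick notes in his reply, createNodeSet controls which nodes receive the initial cores, not which shard lands on which node, so the guarantee here is one replica of every shard in each group rather than shard1 being pinned to server1.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, **params):
    """Build (but do not send) a Collections API request URL."""
    return f"{base_url}/admin/collections?{urlencode(params)}"


def two_group_layout(base_url, collection, shards, group_a, group_b):
    """Plan one replica of every shard in group_a and one in group_b.

    Step 1: CREATE with createNodeSet restricted to group_a (one core per shard).
    Step 2: ADDREPLICA with an explicit node from group_b for each shard.
    """
    if len(group_a) < len(shards) or len(group_b) < len(shards):
        raise ValueError("each group needs at least one node per shard")
    urls = [collections_api_url(
        base_url, action="CREATE", name=collection, numShards=len(shards),
        replicationFactor=1, createNodeSet=",".join(group_a))]
    for shard, node in zip(shards, group_b):
        urls.append(collections_api_url(
            base_url, action="ADDREPLICA", collection=collection,
            shard=shard, node=node))
    return urls


# Hypothetical hosts: group A = server1/server3, group B = server2/server4.
for url in two_group_layout(
        "http://server1:8983/solr", "testcollection", ["shard1", "shard2"],
        group_a=["server1:8983_solr", "server3:8983_solr"],
        group_b=["server2:8983_solr", "server4:8983_solr"]):
    print(url)
```

Splitting the work into CREATE plus explicit ADDREPLICA calls keeps the placement decision in your hands instead of depending on which nodes happen to be up first, which is the start-order dependency Chunki observed.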
Re: How can I set shard members?
Take a look here: http://heliosearch.org/solrcloud-assigning-nodes-machines/ If you really, really, really require that shard1 be on server1 and _not_ server3, I'm not quite sure how you'd do it. But if you want your leaders on servers 1 and 3, just use createNodeSet. (Jürgen beat me to it!) Best, Erick