What does replicationFactor really do?
Hi, In 5.1, we are creating a collection using the Collections API with an initial replicationFactor of X. This value is then stored in the state.json file for that collection. If I try to issue ADDREPLICA on this cluster, it throws an error saying that there are no live nodes for additional replicas. If I connect a new solr node to zookeeper and issue an ADDREPLICA call, the replica is created and no errors are thrown, but replicationFactor remains at X in the state.json file. Why? What does replicationFactor really mean? It seems like it's being honored in some cases and ignored in others. Thanks for any help you can provide. Cheers, Jim
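For reference, the ADDREPLICA call in question looks roughly like this (collection, shard, and node names are illustrative):

/admin/collections?action=ADDREPLICA&collection=my_collection&shard=shard1&node=host2:8983_solr

The node parameter is optional; if it is omitted, Solr chooses among the live nodes that can still accept a replica, which appears to be why the call only succeeds once a new node joins.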
CREATE collection bug or feature?
I noticed that when I issue the CREATE collection command to the API, it does not automatically put a replica on every live node connected to zookeeper. So, for example, if I have 3 solr nodes connected to a zookeeper ensemble and create a collection like this: /admin/collections?action=CREATE&name=my_collection&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=my_config it will only create a core on one of the three nodes. I can make it work if I change replicationFactor to 3. When standing up an entire stack using Chef, this all gets a bit clunky. I don't see any option such as ALL that would just create a replica on all nodes regardless of cluster size. I'm guessing this is intentional, but curious about the reasoning. Thanks! Jim
Re: CREATE collection bug or feature?
Thanks as always for the great answers! Jim

On 6/19/15, 11:57 AM, Erick Erickson erickerick...@gmail.com wrote: Jim: This is by design. There's no way to tell Solr to find all the nodes available and put one replica on each. In fact, you're explicitly telling it to create one and only one replica, one and only one shard. That is, your collection will have exactly one low-level core. But you realized that... As to the reasoning: consider heterogeneous collections all hosted on the same Solr cluster. I have big collections, little collections, some with high QPS rates, some not, etc. Having Solr do things like this automatically would make managing this difficult. Probably the real reason is nobody thought it would be useful in the general case, and I probably concur. Adding a new node to an existing cluster would result in unbalanced clusters, etc. I suppose a stop-gap would be to query the live_nodes in the cluster and add that to the URL; I don't know how much of a pain that would be, though. Best, Erick

On Fri, Jun 19, 2015 at 10:15 AM, Jim.Musil jim.mu...@target.com wrote: I noticed that when I issue the CREATE collection command to the API, it does not automatically put a replica on every live node connected to zookeeper. So, for example, if I have 3 solr nodes connected to a zookeeper ensemble and create a collection like this: /admin/collections?action=CREATE&name=my_collection&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=my_config it will only create a core on one of the three nodes. I can make it work if I change replicationFactor to 3. When standing up an entire stack using Chef, this all gets a bit clunky. I don't see any option such as ALL that would just create a replica on all nodes regardless of cluster size. I'm guessing this is intentional, but curious about the reasoning. Thanks! Jim
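A minimal SolrJ sketch of that stop-gap, for the Chef case (hostnames and collection parameters are hypothetical; in 5.x, CloudSolrServer was renamed CloudSolrClient):

import org.apache.solr.client.solrj.impl.CloudSolrClient;

public class CreateOnAllNodes {
    public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181")) {
            client.connect();
            // The client's ZkStateReader tracks /live_nodes in zookeeper.
            int liveNodes = client.getZkStateReader().getClusterState().getLiveNodes().size();
            // Feed the live-node count into the CREATE call as the replicationFactor.
            String url = "http://solr-host:8983/solr/admin/collections?action=CREATE"
                    + "&name=my_collection&numShards=1"
                    + "&replicationFactor=" + liveNodes
                    + "&maxShardsPerNode=1&collection.configName=my_config";
            System.out.println(url); // issue this with any HTTP client, or from the Chef recipe
        }
    }
}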
Collections API and adding new boxes
Hi, Let's say I have a zookeeper ensemble with several Solr nodes connected to it. I've created a collection successfully and all is well. What happens when I want to add another solr node? I've tried spinning one up and connecting it to zookeeper, but the new node doesn't join the collection. What's the expected next step? This is Solr 5.1. Thanks! Jim Musil
Re: Clarification on Collections API for 5.x
bump

On 5/21/15, 9:06 AM, Jim.Musil jim.mu...@target.com wrote: Hi, In the guide for moving from Solr 4.x to 5.x, it states the following: "Solr 5.0 only supports creating and removing SolrCloud collections through the Collections API (https://cwiki.apache.org/confluence/display/solr/Collections+API), unlike previous versions. While not using the collections API may still work in 5.0, it is unsupported, not recommended, and the behavior will change in a 5.x release." Currently, we launch several solr nodes with identical cores defined using the new Core Discovery process. These nodes are also connected to a zookeeper ensemble. Part of the core definition is to set the configSet to use. This configSet is uploaded to zookeeper separately. This effectively creates a Collection. Is this method no longer supported in 5.x? Thanks! Jim Musil
Re: Clarification on Collections API for 5.x
Thanks for the clarification!

On 5/27/15, 12:00 PM, Erick Erickson erickerick...@gmail.com wrote: Are you defining shards and replicas here? Or is this just a single-node collection? In any case, this seems unnecessary. You'd get the same thing by uploading the config set to ZK, then just issuing a Collections CREATE command, specifying the node to use if desired. What you're doing _should_ work, because essentially that's what startup does. It finds cores somewhere below SOLR_HOME and reads the core.properties file. When it finds parameters like collection, shard, coreNodeName, and numShards, it figures things out from all that stuff. But you have to get all this right manually with the process you're using now, so why take the risk? Besides, in the future you'll have to adapt to any back-compat breaks... Best, Erick

On Wed, May 27, 2015 at 8:34 AM, Jim.Musil jim.mu...@target.com wrote: bump

On 5/21/15, 9:06 AM, Jim.Musil jim.mu...@target.com wrote: Hi, In the guide for moving from Solr 4.x to 5.x, it states the following: "Solr 5.0 only supports creating and removing SolrCloud collections through the Collections API (https://cwiki.apache.org/confluence/display/solr/Collections+API), unlike previous versions. While not using the collections API may still work in 5.0, it is unsupported, not recommended, and the behavior will change in a 5.x release." Currently, we launch several solr nodes with identical cores defined using the new Core Discovery process. These nodes are also connected to a zookeeper ensemble. Part of the core definition is to set the configSet to use. This configSet is uploaded to zookeeper separately. This effectively creates a Collection. Is this method no longer supported in 5.x? Thanks! Jim Musil
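For reference, a core.properties of the kind being discussed might look something like this (a sketch only; values are illustrative and the keys should be checked against the Core Discovery wiki page):

name=my_collection_shard1_replica1
collection=my_collection
shard=shard1
coreNodeName=core_node1
numShards=1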
Clarification on Collections API for 5.x
Hi, In the guide for moving from Solr 4.x to 5.x, it states the following: "Solr 5.0 only supports creating and removing SolrCloud collections through the Collections API (https://cwiki.apache.org/confluence/display/solr/Collections+API), unlike previous versions. While not using the collections API may still work in 5.0, it is unsupported, not recommended, and the behavior will change in a 5.x release." Currently, we launch several solr nodes with identical cores defined using the new Core Discovery process. These nodes are also connected to a zookeeper ensemble. Part of the core definition is to set the configSet to use. This configSet is uploaded to zookeeper separately. This effectively creates a Collection. Is this method no longer supported in 5.x? Thanks! Jim Musil
ConfigSets and SolrCloud
Hi, I need a little clarification on configSets in solr 5.x. According to this page: https://cwiki.apache.org/confluence/display/solr/Config+Sets I can create named configSets to be shared by other cores. If I create them using this method AND am operating in SolrCloud mode, will it automatically upload these named config sets to zookeeper? Thanks! Jim Musil
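For context, the named config sets on that page are just directories on the local filesystem, along these lines (paths illustrative; the base directory is configurable via configSetBaseDir and defaults to SOLR_HOME/configsets):

SOLR_HOME/configsets/my_config/conf/solrconfig.xml
SOLR_HOME/configsets/my_config/conf/schema.xml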
Confusion about zkcli.sh and solr.war
I'm trying to use zkcli.sh to upload configurations to zookeeper under Solr 5.1. It's throwing an error because it references webapps/solr.war, which no longer exists. Do I have to build my own solr.war in order to use zkcli.sh? Please forgive me if I'm missing something here. Jim Musil
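For what it's worth, 5.x ships a copy of the script that doesn't reference solr.war under server/scripts/cloud-scripts; a typical upload would look like this (zk host, paths, and config name are illustrative):

server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd upconfig -confdir /path/to/my_config/conf -confname my_config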
Possible to dump clusterstate, system stats into solr log?
Hi, Is it possible to periodically dump the cluster state contents (or system diagnostics) into the main solr log file? We have many security protocols in place that prevent us from running diagnostic requests directly against the Solr boxes, but we do have access to the shipped logs. Thanks! Jim
Re: Where can we set the parameters in Solr Config?
We set them as extra parameters sent to the servlet container (Jetty or Tomcat), e.g.: java -Dsolr.lock.type=native -jar start.jar Jim

On 2/3/15, 11:58 AM, O. Olson olson_...@yahoo.it wrote: I'm sorry if this is a basic question, but I am curious where, or at least how, we can set the parameters in solrconfig.xml. E.g. consider the solrconfig.xml shown here: http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_10/solr/example/example-DIH/solr/db/conf/solrconfig.xml?revision=1638496&view=markup There seems to be a lot of ${ParameterName:Value}, e.g. <lockType>${solr.lock.type:native}</lockType> Where do these parameter values get set? Thank you in anticipation.
Re: SOLR retrieve data using URL
You don't have to use SolrJ. It's just a web request to a URL, so just issue the request in Java and parse the JSON response. http://stackoverflow.com/questions/7467568/parsing-json-from-url SolrJ does make it simpler, however. Jim

On 2/2/15, 12:57 PM, mathewvino vinojmat...@hotmail.com wrote: Hi There, I am using the SolrJ API to make calls to the Solr server with the data that I am looking for. Basically I am using the SolrJ API as below to get the data, and everything is working as expected: HttpSolrServer solr = new HttpSolrServer("http://server:8983/solr/collection1"); SolrQuery query = new SolrQuery("*:*"); query.setFacet(true).addFacetField("PLS_SURVY_SURVY_STATUS_MAP"); Is there any API I can use with the complete URL to get the data, like below? HttpSolrServer solr = new HttpSolrServer("http://server:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true&facet=true&facet.field=PLS_SURVY_SURVY_LANG_CHOICE_MAP"); I would like to pass the complete URL to get the data instead of using the SolrJ query API. Thanks
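A minimal sketch of the no-SolrJ route, using plain JDK classes and the facet field from the question (URL and field name are taken from the example above):

import java.io.InputStream;
import java.net.URL;
import java.util.Scanner;

public class RawSolrQuery {
    public static void main(String[] args) throws Exception {
        String url = "http://server:8983/solr/collection1/select"
                + "?q=*%3A*&wt=json&indent=true"
                + "&facet=true&facet.field=PLS_SURVY_SURVY_LANG_CHOICE_MAP";
        try (InputStream in = new URL(url).openStream();
             Scanner s = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            String json = s.next();   // the entire JSON response body
            System.out.println(json); // hand this to any JSON parser
        }
    }
}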
Re: Solr throwing SocketException: Connection Reset
This is difficult to diagnose, but here are some questions I would ask myself:

Can you reliably recreate the error?
Can you recreate the error faster by writing to all 100 collections at once?
Can you recreate the error faster with fewer nodes?
Is just one Solr node or one Solr collection throwing the error?
Are all the updates coming from one machine?
Is there some other bottleneck in your network (like a load balancer) that is limiting connections?

Good luck, Jim Musil

On 2/2/15, 5:29 AM, nkgupta nitinkumargu...@gmail.com wrote: I have an 8-node Solr Cloud cluster connected to an external zookeeper. Each node: 30 GB, 4 cores. I have created around 100 collections, each collection having approx. 30 shards. (Why I need that many is a different story: business isolation, business requirements, could be anything.) Now, I am ingesting data into the cluster on 30 collections simultaneously. I see that ingestion to a few collections is failing. In the Solr logs, I can see this Connection Reset exception occurring. The overall time for ingestion is on the order of 10 hours. Any suggestions? Even if it is due to resource starvation, how can I prove that the connection reset is caused by a lack of resources?

Exception:

2015-01-30 09:16:14,454 ERROR [updateExecutor-1-thread-8151] ? (:) - error
java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:196) ~[?:1.7.0_55]
    at java.net.SocketInputStream.read(SocketInputStream.java:122) ~[?:1.7.0_55]
    at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160) ~[httpcore-4.3.jar:4.3]
    at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84) ~[httpcore-4.3.jar:4.3]
    at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273) ~[httpcore-4.3.jar:4.3]
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140) ~[httpclient-4.3.1.jar:4.3.1]
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57) ~[httpclient-4.3.1.jar:4.3.1]
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260) ~[httpcore-4.3.jar:4.3]
    at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283) ~[httpcore-4.3.jar:4.3]
    at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251) ~[httpclient-4.3.1.jar:4.3.1]
    at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197) ~[httpclient-4.3.1.jar:4.3.1]
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271) ~[httpcore-4.3.jar:4.3]
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123) ~[httpcore-4.3.jar:4.3]
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:682) ~[httpclient-4.3.1.jar:4.3.1]
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:486) ~[httpclient-4.3.1.jar:4.3.1]
    at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863) ~[httpclient-4.3.1.jar:4.3.1]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) ~[httpclient-4.3.1.jar:4.3.1]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106) ~[httpclient-4.3.1.jar:4.3.1]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57) ~[httpclient-4.3.1.jar:4.3.1]
    at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:233) [solr-solrj-4.10.0.jar:4.10.0 1620776 - rjernst - 2014-08-26 20:49:51]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_55]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_55]
    at java.lang.Thread.run(Thread.java:745) [?:1.7.0_55]
Re: Solr pattern tokenizer
It looks to me like you simply want to split the incoming query on the hyphens, so that it searches for the exact codes "CHQ PAID", "INWARD TRAN", "HDFC LTD". If that's true, I'd either just change the query at the client to do what you want, or look into something like the PatternTokenizer: https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PatternTokenizerFactory Apologies if I'm not understanding your use case. Thanks, Jim

On 2/2/15, 3:56 AM, Nivedita nivedita.pa...@tcs.com wrote: Hi, I want to tokenize a query like CHQ PAID-INWARD TRAN-HDFC LTD in such a way that it gives me result documents containing HDFC LTD and not HDFC MF. How can I do this? I have already applied the tokenizers below:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="25" side="front"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

Please help.
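A sketch of the client-side option mentioned above: split the raw input on hyphens and quote each segment as an exact phrase before sending it as the q parameter (the input string and query handling are illustrative):

public class SplitCodes {
    public static void main(String[] args) {
        String input = "CHQ PAID-INWARD TRAN-HDFC LTD";
        StringBuilder q = new StringBuilder();
        for (String part : input.split("-")) {
            if (q.length() > 0) q.append(' ');
            q.append('"').append(part.trim()).append('"'); // one exact phrase per code
        }
        System.out.println(q); // "CHQ PAID" "INWARD TRAN" "HDFC LTD"
    }
}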
Re: An interesting approach to grouping
Yes, I'm trying to pin down exactly what conditions cause the bug to appear. It seems as though it's only when using the query function. Jim

On 1/27/15, 12:44 PM, Ryan Josal rjo...@gmail.com wrote: This is great, thanks Jim. Your patch worked and the sorting solution meets the goal, although group.limit seems like it could cut various results out of the middle of the result set. I will play around with it and see if it proves helpful. Can you let me know the Jira so I can keep an eye on it? Ryan

On Tuesday, January 27, 2015, Jim.Musil jim.mu...@target.com wrote: Interestingly, you can do something like this:

group=true
group.main=true
group.func=rint(scale(query({!type=edismax v=$q}),0,20)) // puts into buckets
group.limit=20 // gives you 20 from each bucket
group.sort=category asc // sorts by category within each bucket, but this can be a function as well

Jim Musil

On 1/27/15, 10:14 AM, Jim.Musil jim.mu...@target.com wrote: When using group.main=true, the results are not mixed as you expect: "If true, the result of the last field grouping command is used as the main result list in the response, using group.format=simple" https://wiki.apache.org/solr/FieldCollapsing Jim

On 1/27/15, 9:22 AM, Ryan Josal rjo...@gmail.com wrote: Thanks a lot! I'll try this out later this morning. If group.func and group.field don't combine the way I think they might, I'll try to look for a way to put it all in group.func.

On Tuesday, January 27, 2015, Jim.Musil jim.mu...@target.com wrote: I'm not sure the query you provided will do what you want, BUT I did find the bug in the code that is causing the NullPointerException. The variable context is supposed to be global, but when prepare() is called, it is only defined in the scope of that function. Here's the simple patch:

Index: core/src/java/org/apache/solr/search/Grouping.java
===================================================================
--- core/src/java/org/apache/solr/search/Grouping.java (revision 1653358)
+++ core/src/java/org/apache/solr/search/Grouping.java (working copy)
@@ -926,7 +926,7 @@
    */
   @Override
   protected void prepare() throws IOException {
-    Map context = ValueSource.newContext(searcher);
+    context = ValueSource.newContext(searcher);
     groupBy.createWeight(context, searcher);
     actualGroupsToFind = getMax(offset, numGroups, maxDoc);
   }

I'll search for a Jira issue and open one if I can't find one. Jim Musil

On 1/26/15, 6:34 PM, Ryan Josal r...@josal.com wrote: I have an index of products, and these products have a category which we can say for now is a good approximation of its location in the store. I'm investigating altering the ordering of the results so that the categories aren't interlaced as much... so that the results are a little bit more grouped by category, but not *totally* grouped by category. It's interesting because it's an approach that sort of compares results to near-scored/ranked results. One of the hoped outcomes of this would be that there would be somewhat fewer categories represented in the top results for a given query, although it is questionable if this is a good measurement to determine the effectiveness of the implementation. My first attempt was: group=true&group.main=true&group.field=category&group.func=rint(scale(query({!type=edismax v=$q}),0,20)) Or some FunctionQuery like that, so that in order to become a member of a group, the doc would have to have the same category, and be dropped into the same score bucket (20 in this case). This doesn't work out of the gate due to an NPE (solr 4.10.2) (although I'm not sure it would work anyway):

java.lang.NullPointerException
    at org.apache.lucene.queries.function.valuesource.ScaleFloatFunction.getValues(ScaleFloatFunction.java:104)
    at org.apache.solr.search.DoubleParser$Function.getValues(ValueSourceParser.java:)
    at org.apache.lucene.search.grouping.function.FunctionFirstPassGroupingCollector.setNextReader(FunctionFirstPassGroupingCollector.java:82)
    at org.apache.lucene.search.MultiCollector.setNextReader(MultiCollector.java:113)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:612)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
    at org.apache.solr.search.Grouping.searchWithTimeLimiter(Grouping.java:451)
    at org.apache.solr.search.Grouping.execute(Grouping.java:368)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:459)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
    ...

Has anyone tried something like this before, and does anyone have any novel ideas for how to approach it, no matter how different? How about a workaround for the group.func error here? I'm very open-minded about where to go on this one. Thanks, Ryan
Re: An interesting approach to grouping
Here's the issue: https://issues.apache.org/jira/browse/SOLR-7046 Jim

On 1/27/15, 12:44 PM, Ryan Josal rjo...@gmail.com wrote: This is great, thanks Jim. Your patch worked and the sorting solution meets the goal, although group.limit seems like it could cut various results out of the middle of the result set. I will play around with it and see if it proves helpful. Can you let me know the Jira so I can keep an eye on it? Ryan

On Tuesday, January 27, 2015, Jim.Musil jim.mu...@target.com wrote: Interestingly, you can do something like this:

group=true
group.main=true
group.func=rint(scale(query({!type=edismax v=$q}),0,20)) // puts into buckets
group.limit=20 // gives you 20 from each bucket
group.sort=category asc // sorts by category within each bucket, but this can be a function as well

Jim Musil

On 1/27/15, 10:14 AM, Jim.Musil jim.mu...@target.com wrote: When using group.main=true, the results are not mixed as you expect: "If true, the result of the last field grouping command is used as the main result list in the response, using group.format=simple" https://wiki.apache.org/solr/FieldCollapsing Jim

On 1/27/15, 9:22 AM, Ryan Josal rjo...@gmail.com wrote: Thanks a lot! I'll try this out later this morning. If group.func and group.field don't combine the way I think they might, I'll try to look for a way to put it all in group.func.

On Tuesday, January 27, 2015, Jim.Musil jim.mu...@target.com wrote: I'm not sure the query you provided will do what you want, BUT I did find the bug in the code that is causing the NullPointerException. The variable context is supposed to be global, but when prepare() is called, it is only defined in the scope of that function. Here's the simple patch:

Index: core/src/java/org/apache/solr/search/Grouping.java
===================================================================
--- core/src/java/org/apache/solr/search/Grouping.java (revision 1653358)
+++ core/src/java/org/apache/solr/search/Grouping.java (working copy)
@@ -926,7 +926,7 @@
    */
   @Override
   protected void prepare() throws IOException {
-    Map context = ValueSource.newContext(searcher);
+    context = ValueSource.newContext(searcher);
     groupBy.createWeight(context, searcher);
     actualGroupsToFind = getMax(offset, numGroups, maxDoc);
   }

I'll search for a Jira issue and open one if I can't find one. Jim Musil

On 1/26/15, 6:34 PM, Ryan Josal r...@josal.com wrote: I have an index of products, and these products have a category which we can say for now is a good approximation of its location in the store. I'm investigating altering the ordering of the results so that the categories aren't interlaced as much... so that the results are a little bit more grouped by category, but not *totally* grouped by category. It's interesting because it's an approach that sort of compares results to near-scored/ranked results. One of the hoped outcomes of this would be that there would be somewhat fewer categories represented in the top results for a given query, although it is questionable if this is a good measurement to determine the effectiveness of the implementation. My first attempt was: group=true&group.main=true&group.field=category&group.func=rint(scale(query({!type=edismax v=$q}),0,20)) Or some FunctionQuery like that, so that in order to become a member of a group, the doc would have to have the same category, and be dropped into the same score bucket (20 in this case). This doesn't work out of the gate due to an NPE (solr 4.10.2) (although I'm not sure it would work anyway):

java.lang.NullPointerException
    at org.apache.lucene.queries.function.valuesource.ScaleFloatFunction.getValues(ScaleFloatFunction.java:104)
    at org.apache.solr.search.DoubleParser$Function.getValues(ValueSourceParser.java:)
    at org.apache.lucene.search.grouping.function.FunctionFirstPassGroupingCollector.setNextReader(FunctionFirstPassGroupingCollector.java:82)
    at org.apache.lucene.search.MultiCollector.setNextReader(MultiCollector.java:113)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:612)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
    at org.apache.solr.search.Grouping.searchWithTimeLimiter(Grouping.java:451)
    at org.apache.solr.search.Grouping.execute(Grouping.java:368)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:459)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
    ...

Has anyone tried something like this before, and does anyone have any novel ideas for how to approach it, no matter how different? How about a workaround for the group.func error here? I'm very open-minded about where to go on this one. Thanks, Ryan
Re: An interesting approach to grouping
I'm not sure the query you provided will do what you want, BUT I did find the bug in the code that is causing the NullPointerException. The variable context is supposed to be global, but when prepare() is called, it is only defined in the scope of that function. Here's the simple patch:

Index: core/src/java/org/apache/solr/search/Grouping.java
===================================================================
--- core/src/java/org/apache/solr/search/Grouping.java (revision 1653358)
+++ core/src/java/org/apache/solr/search/Grouping.java (working copy)
@@ -926,7 +926,7 @@
    */
   @Override
   protected void prepare() throws IOException {
-    Map context = ValueSource.newContext(searcher);
+    context = ValueSource.newContext(searcher);
     groupBy.createWeight(context, searcher);
     actualGroupsToFind = getMax(offset, numGroups, maxDoc);
   }

I'll search for a Jira issue and open one if I can't find one. Jim Musil

On 1/26/15, 6:34 PM, Ryan Josal r...@josal.com wrote: I have an index of products, and these products have a category which we can say for now is a good approximation of its location in the store. I'm investigating altering the ordering of the results so that the categories aren't interlaced as much... so that the results are a little bit more grouped by category, but not *totally* grouped by category. It's interesting because it's an approach that sort of compares results to near-scored/ranked results. One of the hoped outcomes of this would be that there would be somewhat fewer categories represented in the top results for a given query, although it is questionable if this is a good measurement to determine the effectiveness of the implementation. My first attempt was: group=true&group.main=true&group.field=category&group.func=rint(scale(query({!type=edismax v=$q}),0,20)) Or some FunctionQuery like that, so that in order to become a member of a group, the doc would have to have the same category, and be dropped into the same score bucket (20 in this case). This doesn't work out of the gate due to an NPE (solr 4.10.2) (although I'm not sure it would work anyway):

java.lang.NullPointerException
    at org.apache.lucene.queries.function.valuesource.ScaleFloatFunction.getValues(ScaleFloatFunction.java:104)
    at org.apache.solr.search.DoubleParser$Function.getValues(ValueSourceParser.java:)
    at org.apache.lucene.search.grouping.function.FunctionFirstPassGroupingCollector.setNextReader(FunctionFirstPassGroupingCollector.java:82)
    at org.apache.lucene.search.MultiCollector.setNextReader(MultiCollector.java:113)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:612)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
    at org.apache.solr.search.Grouping.searchWithTimeLimiter(Grouping.java:451)
    at org.apache.solr.search.Grouping.execute(Grouping.java:368)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:459)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
    ...

Has anyone tried something like this before, and does anyone have any novel ideas for how to approach it, no matter how different? How about a workaround for the group.func error here? I'm very open-minded about where to go on this one. Thanks, Ryan
Re: An interesting approach to grouping
When using group.main=true, the results are not mixed as you expect: "If true, the result of the last field grouping command is used as the main result list in the response, using group.format=simple" https://wiki.apache.org/solr/FieldCollapsing Jim

On 1/27/15, 9:22 AM, Ryan Josal rjo...@gmail.com wrote: Thanks a lot! I'll try this out later this morning. If group.func and group.field don't combine the way I think they might, I'll try to look for a way to put it all in group.func.

On Tuesday, January 27, 2015, Jim.Musil jim.mu...@target.com wrote: I'm not sure the query you provided will do what you want, BUT I did find the bug in the code that is causing the NullPointerException. The variable context is supposed to be global, but when prepare() is called, it is only defined in the scope of that function. Here's the simple patch:

Index: core/src/java/org/apache/solr/search/Grouping.java
===================================================================
--- core/src/java/org/apache/solr/search/Grouping.java (revision 1653358)
+++ core/src/java/org/apache/solr/search/Grouping.java (working copy)
@@ -926,7 +926,7 @@
    */
   @Override
   protected void prepare() throws IOException {
-    Map context = ValueSource.newContext(searcher);
+    context = ValueSource.newContext(searcher);
     groupBy.createWeight(context, searcher);
     actualGroupsToFind = getMax(offset, numGroups, maxDoc);
   }

I'll search for a Jira issue and open one if I can't find one. Jim Musil

On 1/26/15, 6:34 PM, Ryan Josal r...@josal.com wrote: I have an index of products, and these products have a category which we can say for now is a good approximation of its location in the store. I'm investigating altering the ordering of the results so that the categories aren't interlaced as much... so that the results are a little bit more grouped by category, but not *totally* grouped by category. It's interesting because it's an approach that sort of compares results to near-scored/ranked results. One of the hoped outcomes of this would be that there would be somewhat fewer categories represented in the top results for a given query, although it is questionable if this is a good measurement to determine the effectiveness of the implementation. My first attempt was: group=true&group.main=true&group.field=category&group.func=rint(scale(query({!type=edismax v=$q}),0,20)) Or some FunctionQuery like that, so that in order to become a member of a group, the doc would have to have the same category, and be dropped into the same score bucket (20 in this case). This doesn't work out of the gate due to an NPE (solr 4.10.2) (although I'm not sure it would work anyway):

java.lang.NullPointerException
    at org.apache.lucene.queries.function.valuesource.ScaleFloatFunction.getValues(ScaleFloatFunction.java:104)
    at org.apache.solr.search.DoubleParser$Function.getValues(ValueSourceParser.java:)
    at org.apache.lucene.search.grouping.function.FunctionFirstPassGroupingCollector.setNextReader(FunctionFirstPassGroupingCollector.java:82)
    at org.apache.lucene.search.MultiCollector.setNextReader(MultiCollector.java:113)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:612)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
    at org.apache.solr.search.Grouping.searchWithTimeLimiter(Grouping.java:451)
    at org.apache.solr.search.Grouping.execute(Grouping.java:368)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:459)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
    ...

Has anyone tried something like this before, and does anyone have any novel ideas for how to approach it, no matter how different? How about a workaround for the group.func error here? I'm very open-minded about where to go on this one. Thanks, Ryan
Re: An interesting approach to grouping
Interestingly, you can do something like this:

group=true
group.main=true
group.func=rint(scale(query({!type=edismax v=$q}),0,20)) // puts into buckets
group.limit=20 // gives you 20 from each bucket
group.sort=category asc // sorts by category within each bucket, but this can be a function as well

Jim Musil

On 1/27/15, 10:14 AM, Jim.Musil jim.mu...@target.com wrote: When using group.main=true, the results are not mixed as you expect: "If true, the result of the last field grouping command is used as the main result list in the response, using group.format=simple" https://wiki.apache.org/solr/FieldCollapsing Jim

On 1/27/15, 9:22 AM, Ryan Josal rjo...@gmail.com wrote: Thanks a lot! I'll try this out later this morning. If group.func and group.field don't combine the way I think they might, I'll try to look for a way to put it all in group.func.

On Tuesday, January 27, 2015, Jim.Musil jim.mu...@target.com wrote: I'm not sure the query you provided will do what you want, BUT I did find the bug in the code that is causing the NullPointerException. The variable context is supposed to be global, but when prepare() is called, it is only defined in the scope of that function. Here's the simple patch:

Index: core/src/java/org/apache/solr/search/Grouping.java
===================================================================
--- core/src/java/org/apache/solr/search/Grouping.java (revision 1653358)
+++ core/src/java/org/apache/solr/search/Grouping.java (working copy)
@@ -926,7 +926,7 @@
    */
   @Override
   protected void prepare() throws IOException {
-    Map context = ValueSource.newContext(searcher);
+    context = ValueSource.newContext(searcher);
     groupBy.createWeight(context, searcher);
     actualGroupsToFind = getMax(offset, numGroups, maxDoc);
   }

I'll search for a Jira issue and open one if I can't find one. Jim Musil

On 1/26/15, 6:34 PM, Ryan Josal r...@josal.com wrote: I have an index of products, and these products have a category which we can say for now is a good approximation of its location in the store. I'm investigating altering the ordering of the results so that the categories aren't interlaced as much... so that the results are a little bit more grouped by category, but not *totally* grouped by category. It's interesting because it's an approach that sort of compares results to near-scored/ranked results. One of the hoped outcomes of this would be that there would be somewhat fewer categories represented in the top results for a given query, although it is questionable if this is a good measurement to determine the effectiveness of the implementation. My first attempt was: group=true&group.main=true&group.field=category&group.func=rint(scale(query({!type=edismax v=$q}),0,20)) Or some FunctionQuery like that, so that in order to become a member of a group, the doc would have to have the same category, and be dropped into the same score bucket (20 in this case). This doesn't work out of the gate due to an NPE (solr 4.10.2) (although I'm not sure it would work anyway):

java.lang.NullPointerException
    at org.apache.lucene.queries.function.valuesource.ScaleFloatFunction.getValues(ScaleFloatFunction.java:104)
    at org.apache.solr.search.DoubleParser$Function.getValues(ValueSourceParser.java:)
    at org.apache.lucene.search.grouping.function.FunctionFirstPassGroupingCollector.setNextReader(FunctionFirstPassGroupingCollector.java:82)
    at org.apache.lucene.search.MultiCollector.setNextReader(MultiCollector.java:113)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:612)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
    at org.apache.solr.search.Grouping.searchWithTimeLimiter(Grouping.java:451)
    at org.apache.solr.search.Grouping.execute(Grouping.java:368)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:459)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
    ...

Has anyone tried something like this before, and does anyone have any novel ideas for how to approach it, no matter how different? How about a workaround for the group.func error here? I'm very open-minded about where to go on this one. Thanks, Ryan
Re: Indexed epoch time in Solr
If you are using the DataImportHandler, you can leverage one of the transformers, such as the DateFormatTransformer: http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer If you are updating documents directly, you can define a regex transformation in your schema.xml: https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PatternReplaceCharFilterFactory If you have control over the input, then I always find it better to just transform it prior to sending it into Solr. Jim

On 1/25/15, 11:35 PM, Ahmed Adel ahmed.a...@badrit.com wrote: Hi All, Is there a way to convert a unix time field that is already indexed to ISO-8601 format in the query response? If this is not possible at the query level, what is the best way to copy this field to a new Solr standard date field? Thanks, Ahmed Adel
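And a minimal sketch of the transform-before-indexing option (Java 7-era APIs to match the thread's vintage; the epoch value is illustrative and assumed to be in seconds):

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class EpochToIso {
    public static String toSolrDate(long epochSeconds) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // Solr date fields are UTC
        return fmt.format(new Date(epochSeconds * 1000L));
    }

    public static void main(String[] args) {
        System.out.println(toSolrDate(1422250500L)); // prints 2015-01-26T05:35:00Z
    }
}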
Does CloudSolrServer hit zookeeper for every request?
I’m curious how CloudSolrServer works in practice. I understand that it gets the active solr nodes from zookeeper, but does it do this for every request? If it does hit zk for every request, that seems to put a lot of pressure on the zk ensemble. If it does NOT hit zk for every request, then how does it detect changes in the number of nodes and the status of the nodes? Thanks! Jim M.
Status of configName in core.properties
Hi, I'm attempting to define a core using the new core discovery method described here: http://wiki.apache.org/solr/Core%20Discovery%20(4.4%20and%20beyond) At the bottom of the page is a parameter named configName that should allow me to specify a configuration name to use for a collection. This does not seem to be working. I have a configuration uploaded to zookeeper with a name. I want to share that configuration between two cores, but each core only links to the configuration with the same exact name. This parameter is marked as "Tentative" for 4.6. What is its status? Thanks! Jim