Re: TermsComponent from deleted document
Which is preferable for autosuggest: TermsComponent or facets?

On Fri, Sep 9, 2011 at 10:33 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

> : http://wiki.apache.org/solr/TermsComponent states that TermsComponent will
> : return frequencies from deleted documents too.
> :
> : Is there any way to omit the deleted documents to get the frequencies?
>
> Not really. Until a deleted document is expunged by segment merging, it is still included in the term stats, which is what the TermsComponent looks at.
>
> If having 100% accurate term counts is really important to you, then you can optimize after doing any updates on your index, but there is obviously a performance tradeoff there.
>
> -Hoss
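The optimize step mentioned above can be sketched as a request to the update handler. This is a minimal sketch, assuming a Solr instance at a hypothetical base URL; the helper name `optimize_url` is made up for illustration, and the function only builds the URL without sending it.

```python
from urllib.parse import urlencode


def optimize_url(base_url):
    """Build an update-handler URL that asks Solr to optimize the index.

    base_url is a hypothetical Solr URL such as http://localhost:8983/solr.
    An optimize merges segments, which is what finally drops deleted
    documents from the term statistics; it can be slow on a large index.
    """
    return base_url.rstrip("/") + "/update?" + urlencode({"optimize": "true"})


print(optimize_url("http://localhost:8983/solr"))
```

Running the optimize after every update trades indexing throughput for accurate counts, which is the tradeoff the reply warns about.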
Re: Sorting groups by numFound group size
Not yet. If you want, you can create an issue for sorting groups by numFound.

On 9 September 2011 18:49, O. Klein kl...@octoweb.nl wrote:

> I am also looking for a way to sort on numFound. Has an issue been created?
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Sorting-groups-by-numFound-group-size-tp3315740p3323420.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
Kind regards,
Martijn van Groningen
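Until server-side support exists, one possible stopgap (an assumption, not something proposed in this thread) is to reorder the groups on the client. The sketch below assumes the parsed JSON of a result-grouping response; the function name and sample data are hypothetical. It can only reorder the groups that were actually returned, so it is not equivalent to a true sort over all groups.

```python
def sort_groups_by_size(grouped_field):
    """Return the groups of one grouped field, largest doclist first.

    grouped_field is the parsed JSON object found under
    response["grouped"][<field name>] in a result-grouping response.
    Only the groups present in this response are reordered.
    """
    return sorted(
        grouped_field["groups"],
        key=lambda g: g["doclist"]["numFound"],
        reverse=True,
    )


# Hypothetical sample data in the shape of a grouped response.
sample = {
    "matches": 6,
    "groups": [
        {"groupValue": "a", "doclist": {"numFound": 1, "docs": []}},
        {"groupValue": "b", "doclist": {"numFound": 3, "docs": []}},
        {"groupValue": "c", "doclist": {"numFound": 2, "docs": []}},
    ],
}
print([g["groupValue"] for g in sort_groups_by_size(sample)])
```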
Re: How to write this query?
Hi,

key:value1^8 key:value2^4 key:value3^2 is correct. Sorry for the badly written query.

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-write-this-query-tp3318577p3325033.html
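A query of this shape can be generated rather than typed by hand. The sketch below is an illustration only: the helper name `boosted_query` is hypothetical, and halving the boost for each later value is an assumption read off the 8, 4, 2 pattern in the message. Values are not escaped, so it suits simple tokens only.

```python
def boosted_query(field, values, top_boost=8):
    """Join field:value clauses, halving the boost for each later value."""
    clauses = []
    boost = top_boost
    for value in values:
        clauses.append(f"{field}:{value}^{boost}")
        # Assumed rule: each later value gets half the boost, never below 1.
        boost = max(boost // 2, 1)
    return " ".join(clauses)


print(boosted_query("key", ["value1", "value2", "value3"]))
```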
Master Slave Question
Is it appropriate to query the master servers while replicating? I ask because there could be a case where we index, say, 50 documents to the master and they have not yet been replicated when a user asks for page 2. The page 2 request could then be sent to a slave and get 0 results.

Is there a way to avoid this? My thought was to not allow querying of the master, but I'm not sure whether this can be configured in Solr.
Re: Master Slave Question
Real-time indexing (Solr 4), or decrease the replication poll interval and the auto-commit time.
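To see why shortening both settings helps, here is a rough worked example. It is a sketch under stated assumptions: a document waits at most one auto-commit period on the master, the slave has just polled when the commit happens, and index transfer time is ignored. The function names and the 15-second commit value are hypothetical; the HH:MM:SS poll interval format is the one used by the replication handler configuration.

```python
def hms_to_seconds(interval):
    """Convert an HH:MM:SS string such as 00:01:00 to seconds."""
    hours, minutes, seconds = (int(part) for part in interval.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def worst_case_lag(auto_commit_ms, poll_interval):
    """Rough upper bound, in seconds, before a slave can see a new document.

    Assumes the document waits a full auto-commit period on the master and
    the slave has just polled; index transfer time is ignored.
    """
    return auto_commit_ms / 1000 + hms_to_seconds(poll_interval)


print(worst_case_lag(auto_commit_ms=15000, poll_interval="00:01:00"))
```

Halving either number shrinks the window in which a slave can return fewer results than the master, at the cost of more frequent commits or polls.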
Re: Solr Cloud - is replication really a feature on the trunk?
First of all, thanks everyone; your expertise and time are much appreciated.

@Jamie: Great suggestion. I just have one small objection to it: I wouldn't want to mix the core's name with the collection's configName. Wouldn't you also want to keep the two separate for clarity? What do you think about that?

@Yury: Overall what you said makes sense and I'll roll with it. But FYI, through experimentation I found out that collection=myconf does not become the value for configName when I inspect ZooKeeper.jsp. Here's an example of what shows up if I set up the solr.xml file but don't say anything in the command-line startup:

    myconf (v=0 children=1)
      configName=configuration1

But perhaps that's exactly what you are trying to warn me about. I'll experiment more and get back.

- Pulkit

On Fri, Sep 9, 2011 at 10:17 PM, Jamie Johnson jej2...@gmail.com wrote:

> As a note, you could change the values in solr.xml to be as follows and pull these values from System Properties:
>
>     <cores adminPath="/admin/cores" defaultCoreName="${collection.configName}">
>       <core name="${collection.configName}" instanceDir="." shard="${shard}"/>
>     </cores>
>
> Unless someone says otherwise, the quick tests I've run seem to work perfectly well with this setup.
>
> 2011/9/9 Yury Kats yuryk...@yahoo.com:
>
>> On 9/9/2011 6:54 PM, Pulkit Singhal wrote:
>>
>>> Thanks again. Another question: my solr.xml has:
>>>
>>>     <cores adminPath="/admin/cores" defaultCoreName="master1">
>>>       <core name="master1" instanceDir="." shard="shard1" collection="myconf"/>
>>>     </cores>
>>>
>>> And I omitted -Dcollection.configName=myconf from the startup command because I felt that specifying collection=myconf should take care of that:
>>>
>>>     cd /trunk/solr/example
>>>     java -Dbootstrap_confdir=./solr/conf -Dslave=disabled -DzkRun -jar start.jar
>>
>> With this you are telling ZK to bootstrap a collection with the content of specific files, but you don't tell it what collection that should be.
>>
>> Hence you want the collection.configName parameter, and you want solr.xml to reference the same name in the 'collection' attribute for the cores, so that SolrCloud knows where to pull the configuration for that core from.
Re: Solr Cloud - is replication really a feature on the trunk?
Yes, now I'm sure that a) collection=blah in solr.xml and b) -Dcollection.configName=myconf at the command line actually fill in values for two very different fields. Here's why I say so:

Example config #1:

    <core name="master1" instanceDir="." shard="shard1" collection="scaleDeep"/>

Results in:

    /collections (v=6 children=1)
      scaleDeep (v=0 children=1)
        configName=myconf

Example config #2:

Results in:

    /collections (v=6 children=1)
      scaleDeep (v=0 children=1)
        configName=scaleDeep

What do you think about that? I may be misinterpreting the results, so please feel free to set me straight on this.

Also, it would be nice if I knew the code well enough to just look at it and give an authoritative answer. Does anyone have that kind of expertise? Reverse-engineering is getting a bit mundane.

Thanks!
- Pulkit
Re: Solr Cloud - is replication really a feature on the trunk?
Sorry, a message got sent before I had finished it; Ctrl+S is not save but send ... sigh!

Yes, now I'm sure that a) collection=blah in solr.xml and b) -Dcollection.configName=myconf at the command line actually fill in values for two very different fields. Here's why I say so:

Example config #1:

    <core name="master1" instanceDir="." shard="shard1" collection="scaleDeep"/>

    java -Dcollection.configName=myconf ... -DzkRun -jar start.jar

Results in:

    /collections (v=6 children=1)
      scaleDeep (v=0 children=1)
        configName=myconf

Example config #2:

    <core name="master1" instanceDir="." shard="shard1" collection="scaleDeep"/>

    java -Dcollection.configName=scaleDeep ... -DzkRun -jar start.jar

Results in:

    /collections (v=6 children=1)
      scaleDeep (v=0 children=1)
        configName=scaleDeep

What do you think about that? I may be misinterpreting the results, so please feel free to set me straight on this.

Also, it would be nice if I knew the code well enough to just look at it and give an authoritative answer. Does anyone have that kind of expertise? Reverse-engineering is getting a bit mundane.

Thanks!
- Pulkit
Re: TermsComponent from deleted document
I'd use the suggester: http://wiki.apache.org/solr/Suggester

The suggester can give a collation; the TermsComponent can't do that. The suggester builds on top of the spellchecking infrastructure, so it should be easy to use if you're familiar with that.

Martijn

On 10 September 2011 08:37, Manish Bafna manish.bafna...@gmail.com wrote:

> Which is preferable for autosuggest: TermsComponent or facets?

--
Kind regards,
Martijn van Groningen
Full-search index for the database
I want to create a full-text search for my database. That is, the search engine should look up a string in all fields of my database.

I have created a Solr configuration for extracting and indexing data from the database. Following the documentation, I created a field for the full-text search index in schema.xml:

    <field name="TEXT" type="..." indexed="true" stored="true" multiValued="true"/>

I also added lines that copy the values of all fields into this full-text field:

    ...
    <copyField source="" dest="TEXT"/>
    ...

As a result, I can search across all fields of my database, but I can't tell which field of a found record contains the requested string. The highlighting functionality just marks the string in the TEXT field, like this:

    <lst name="highlighting">
      <lst name="431046.431344...8473633">
        <arr name="TEXT">
          <str>Any text any text <em>Test</em></str>
        </arr>
      </lst>
      <lst name="431046.431231...8476393">
        <arr name="TEXT">
          <str>Any text any text <em>Test</em></str>
        </arr>
      </lst>
    </lst>

How can I create a full-text search index that still lets me identify the source database field?

Thanks a lot.
Eugeny
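The limitation described above can be made concrete with a small sketch. It assumes the highlighting section has been parsed into nested dictionaries (as the JSON response format would give); the function name and document keys are hypothetical. Because highlighting runs only on the catch-all TEXT field, that is the only field name that can ever be reported.

```python
def highlighted_fields(highlighting):
    """Map each document key to the field names that carry highlights.

    highlighting is the parsed "highlighting" section of a response:
    {doc_key: {field_name: [snippet, ...]}}.
    """
    return {doc: sorted(fields) for doc, fields in highlighting.items()}


# Hypothetical document keys; the snippets mirror the output quoted above.
sample = {
    "doc-1": {"TEXT": ["Any text any text <em>Test</em>"]},
    "doc-2": {"TEXT": ["Any text any text <em>Test</em>"]},
}
print(highlighted_fields(sample))
```

Whether highlighting the individual source fields instead (for example through the `hl.fl` request parameter) fits this schema is not settled by the post, so it is mentioned here only as a direction to test.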
Re: Replication setup with SolrCloud/Zk
Hi Yury,

How do you manage to start the instances without any issues? The way I see it, no matter which instance is started first, the slave will complain about not being able to find its respective master because that instance hasn't been started yet ... no?

Thanks,
- Pulkit

2011/5/17 Yury Kats yuryk...@yahoo.com

> On 5/17/2011 10:17 AM, Stefan Matheis wrote:
>
>> Yury,
>>
>> perhaps Java params (like those used for this sample: http://wiki.apache.org/solr/SolrReplication#enable.2BAC8-disable_master.2BAC8-slave_in_a_node ) can help you?
>
> Ah, thanks! It does seem to work!
>
> Cluster's solrconfig.xml (shared between all Solr instances and cores via SolrCloud/ZK):
>
>     <requestHandler name="/replication" class="solr.ReplicationHandler">
>       <lst name="master">
>         <str name="enable">${enable.master:false}</str>
>         <str name="replicateAfter">commit</str>
>         <str name="replicateAfter">startup</str>
>       </lst>
>       <lst name="slave">
>         <str name="enable">${enable.slave:false}</str>
>         <str name="pollInterval">00:01:00</str>
>         <str name="masterUrl">http://${masterHost:xyz}/solr/master/replication</str>
>       </lst>
>     </requestHandler>
>
> Node 1 solr.xml:
>
>     <cores adminPath="/admin/cores" defaultCoreName="master">
>       <core name="master" instanceDir="core1" shard="shard1" collection="myconf">
>         <property name="enable.master" value="true"/>
>       </core>
>       <core name="slave" instanceDir="core2" shard="shard2" collection="myconf">
>         <property name="enable.slave" value="true"/>
>         <property name="masterHost" value="node2:8983"/>
>       </core>
>     </cores>
>
> Node 2 solr.xml:
>
>     <cores adminPath="/admin/cores" defaultCoreName="master">
>       <core name="master" instanceDir="core1" shard="shard2" collection="myconf">
>         <property name="enable.master" value="true"/>
>       </core>
>       <core name="slave" instanceDir="core2" shard="shard1" collection="myconf">
>         <property name="enable.slave" value="true"/>
>         <property name="masterHost" value="node1:8983"/>
>       </core>
>     </cores>
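The shared solrconfig.xml works for both roles because each `${name:default}` placeholder is filled from the properties of the core that loads it. The sketch below imitates that substitution to show the effect; it is an illustration written for this thread, not Solr's own implementation, and the function name `substitute` is hypothetical.

```python
import re

# Matches ${name} or ${name:default}.
PLACEHOLDER = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")


def substitute(text, properties):
    """Replace ${name:default} placeholders using a dict of properties."""
    def resolve(match):
        name, default = match.group(1), match.group(2)
        if name in properties:
            return properties[name]
        if default is not None:
            return default
        raise KeyError(f"no value for property {name!r}")
    return PLACEHOLDER.sub(resolve, text)


# Per-core properties as declared in Node 1's solr.xml above.
master_core = {"enable.master": "true"}
slave_core = {"enable.slave": "true", "masterHost": "node2:8983"}

print(substitute("${enable.master:false}", master_core))
print(substitute("${enable.slave:false}", master_core))
print(substitute("http://${masterHost:xyz}/solr/master/replication", slave_core))
```

In the master core the slave section falls back to its default of false, so a single handler definition ends up enabled as a master in one core and as a slave in the other.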
Re: Solr Cloud - is replication really a feature on the trunk?
On Sep 9, 2011, at 6:54 PM, Pulkit Singhal wrote:

> Thanks again. Another question: my solr.xml has:
>
>     <cores adminPath="/admin/cores" defaultCoreName="master1">
>       <core name="master1" instanceDir="." shard="shard1" collection="myconf"/>
>     </cores>
>
> And I omitted -Dcollection.configName=myconf from the startup command because I felt that specifying collection=myconf should take care of that:
>
>     cd /trunk/solr/example
>     java -Dbootstrap_confdir=./solr/conf -Dslave=disabled -DzkRun -jar start.jar

I don't think so? The CoreAdmin handler takes a collection= param to name the collection the SolrCore belongs to. This is different from setting the name of the config file set to use. If you don't specify anything, I believe it defaults to configuration1. The conf file set name has nothing to do with the collection name.

> But the zookeeper.jsp page doesn't seem to take any of that into account and shows:
>
>     /collections (v=6 children=1)
>       collection1 (v=0 children=1)
>         configName=configuration1
>         shards (v=0 children=1)
>           shard1 (v=0 children=1)
>             tiklup-mac.local:8983_solr_ (v=0)
>               node_name=tiklup-mac.local:8983_solr
>               url=http://tiklup-mac.local:8983/solr/
>
> Then what is the point of naming the core and the collection?

The SolrCore name determines which URLs to use to work with that core. The collection name determines which collection the SolrCore acts as a shard in. The collection.configName is the name of the config file set you are uploading; if you leave it out, it's called configuration1.

> - Pulkit
>
> 2011/9/9 Yury Kats yuryk...@yahoo.com:
>
>> On 9/9/2011 10:52 AM, Pulkit Singhal wrote:
>>
>>> Thank you, Yury. After looking at your thread, there's something I must clarify: is solr.xml not uploaded and held in ZooKeeper?
>>
>> Not as far as I understand. Cores are loaded/created by the local Solr server based on solr.xml and then registered with ZK, so that ZK knows what cores are out there and how they are organized in shards.
>>
>>> because you have a slightly different config between Node 1 and 2: http://lucene.472066.n3.nabble.com/Replication-setup-with-SolrCloud-Zk-td2952602.html
>>
>> I have two shards, each shard having a master and a slave core. Cores are located so that master and slave are on different nodes. This protects search (but not indexing) from node failure.

- Mark Miller
lucidimagination.com

2011.lucene-eurocon.org | Oct 17-20 | Barcelona
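The three names explained above can be kept apart with a small model. This is a sketch based only on the statements in this thread: the class and its fields are hypothetical, the host is a placeholder, and the default of configuration1 is the value reported here for the trunk build under discussion.

```python
from dataclasses import dataclass

DEFAULT_CONFIG_NAME = "configuration1"  # default reported in the thread


@dataclass
class CoreRegistration:
    core_name: str      # name attribute of <core>; picks the URL path
    collection: str     # collection attribute of <core>
    config_name: str = DEFAULT_CONFIG_NAME  # -Dcollection.configName

    def core_url(self, host="localhost:8983"):
        """URL used to address this core on a (hypothetical) host."""
        return f"http://{host}/solr/{self.core_name}"


# No -Dcollection.configName given: the config set keeps its default name.
reg = CoreRegistration(core_name="master1", collection="scaleDeep")
print(reg.core_url(), reg.collection, reg.config_name)
```

Changing `collection` never changes `config_name` in this model, which matches the observation that the two settings fill in different fields in ZooKeeper.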
Re: Replication setup with SolrCloud/Zk
Sorry, stupid question. Now I see that the core still starts and the polling process simply logs an error:

    SEVERE: Master at: http://localhost:7574/solr/master2/replication is not available. Index fetch failed. Exception: Connection refused

With this thread's help, I was able to write up the setup instructions in detail here: http://pulkitsinghal.blogspot.com/2011/09/multicore-master-slave-replication-in.html

Thanks,
- Pulkit
Nested documents
Hi,

Does Solr support nested documents? If not, is there any plan to add such a feature?

Thanks.
Re: Replication setup with SolrCloud/Zk
On 9/10/2011 3:54 PM, Pulkit Singhal wrote:

> Hi Yury,
>
> How do you manage to start the instances without any issues? The way I see it, no matter which instance is started first, the slave will complain about not being able to find its respective master because that instance hasn't been started yet ... no?

Yes, but it's not a big deal. The slave polls periodically, so the next time around the master will be up.
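The behaviour described above, where a failed poll is logged and simply retried on the next cycle, can be sketched as follows. This is a simulation with hypothetical names, not Solr's replication code: the fake master refuses a fixed number of connections, and the sleep between cycles is left out so the example runs instantly.

```python
def poll_master(fetch_index, attempts):
    """Call fetch_index once per poll cycle until it succeeds.

    fetch_index is a caller-supplied function that raises ConnectionError
    while the master is unreachable. A real poller would sleep for the
    poll interval between cycles; that is left out to keep the sketch fast.
    Returns the 1-based cycle on which the fetch succeeded, or None.
    """
    for cycle in range(1, attempts + 1):
        try:
            fetch_index()
            return cycle
        except ConnectionError as err:
            print(f"cycle {cycle}: master not available ({err})")
    return None


def make_fake_master(down_for):
    """Fake fetch that refuses the first down_for calls, then succeeds."""
    calls = {"count": 0}

    def fetch():
        calls["count"] += 1
        if calls["count"] <= down_for:
            raise ConnectionError("Connection refused")
    return fetch


print(poll_master(make_fake_master(down_for=2), attempts=5))
```

The start order of the nodes therefore does not matter: a slave that starts first only loses a few poll cycles.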
Re: Master Slave Question
Is this feature on trunk currently?

On Sat, Sep 10, 2011 at 12:26 PM, Patrick Sauts patrick.via...@gmail.com wrote:

> Real-time indexing (Solr 4), or decrease the replication poll interval and the auto-commit time.
Re: Solr Cloud - is replication really a feature on the trunk?
Thanks Mark, so perhaps a more appropriate config would be:

    <cores adminPath="/admin/cores" defaultCoreName="${coreName}">
      <core name="${coreName}" instanceDir="." shard="${shard}" collection="${collection}"/>
    </cores>

which would require that collection and coreName be specified as System Properties.
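With a solr.xml like the one proposed above, every placeholder has to arrive as a -D system property at startup. The sketch below assembles such a command line; the helper name and the example values are hypothetical, and the property names coreName, shard and collection are simply the ones used in that proposal.

```python
import shlex


def start_command(core_name, shard, collection, extra_flags=()):
    """Assemble a java command line that supplies the solr.xml properties."""
    args = ["java"]
    for key, value in (("coreName", core_name),
                       ("shard", shard),
                       ("collection", collection)):
        args.append(f"-D{key}={value}")
    args.extend(extra_flags)
    args.extend(["-jar", "start.jar"])
    # Quote each argument so the result is safe to paste into a shell.
    return " ".join(shlex.quote(arg) for arg in args)


print(start_command("master1", "shard1", "myconf", extra_flags=["-DzkRun"]))
```

Keeping coreName and collection as separate properties preserves the distinction drawn earlier in the thread between the core's name and the collection it belongs to.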