RE: SolrCloud never fully recovers after slow disks
The joy was short-lived. Tonight our environment was “down/slow” a bit longer than usual. It looks like two of our nodes never recovered, even though clusterstate says everything is active. All nodes are throwing this in the log (the nodes they have trouble reaching are the ones that are affected) - the error occurs for several cores:

ERROR - 2013-11-11 09:16:42.735; org.apache.solr.common.SolrException; Error while trying to recover. core=products_se_shard1_replica2:org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://solr04.cd-et.com:8080/solr
  at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:431)
  at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
  at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:198)
  at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:342)
  at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:219)
Caused by: java.net.SocketTimeoutException: Read timed out
  at java.net.SocketInputStream.socketRead0(Native Method)
  at java.net.SocketInputStream.read(SocketInputStream.java:150)
  at java.net.SocketInputStream.read(SocketInputStream.java:121)
  at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:166)
  at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:90)
  at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:281)
  at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:92)
  at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:62)
  at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
  at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
  at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
  at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
  at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
  at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
  at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:717)
  at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:522)
  at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
  at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
  at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
  at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365)
  ... 4 more
ERROR - 2013-11-11 09:16:42.736; org.apache.solr.cloud.RecoveryStrategy; Recovery failed - trying again... (30) core=products_se_shard1_replica2

--
Henrik Ossipoff Hansen
Developer, Entertainment Trading

On 10. nov. 2013 at 21.07.32, Henrik Ossipoff Hansen (h...@entertainment-trading.com) wrote:

Solr version is 4.5.0. I have done some tweaking. Doubling my ZooKeeper timeout values in zoo.cfg and the ZooKeeper timeout in solr.xml seemed to somewhat minimize the problem, but it still occurred. I next stopped all larger batch indexing in the period where the issues happened, which also seemed to help somewhat. Now the next thing weirds me out a bit - I switched from using Tomcat 7 to the Jetty that ships with Solr, and that actually seems to have fixed the last issues (together with stopping a few smaller updates - very few).

During the slow period in the night, I get something like this:

03:11:49 ERROR ZkController There was a problem finding the leader in zk:org.apache.solr.common.SolrException: Could not get leader props
03:06:47 ERROR Overseer Could not create Overseer node
03:06:47 WARN LeaderElector
03:06:47 WARN ZkStateReader ZooKeeper watch triggered, but Solr cannot talk to ZK
03:07:41 WARN RecoveryStrategy Stopping recovery for zkNodeName=solr04.cd-et.com:8080_solr_auto_suggest_shard1_replica2core=auto_suggest_shard1_replica2

After this, the cluster state seems to be fine, and I'm not being spammed with errors in the log files. Bottom line is that the issues seem fixed for now, but I still find it weird that Solr was not able to fully recover.

// Henrik Ossipoff

-----Original Message-----
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: 10. november 2013 19:27
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud never fully recovers after
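For reference, the timeout doubling described above touches two separate settings - a hedged sketch with illustrative values (the actual numbers used are not stated in the thread):

```
# zoo.cfg (ZooKeeper side): session timeouts are negotiated by the client,
# capped by maxSessionTimeout (default 20 * tickTime)
tickTime=2000
maxSessionTimeout=80000

# solr.xml (Solr 4.x side): the ZK client session timeout, in ms
#   <solrcloud>
#     <int name="zkClientTimeout">30000</int>
#   </solrcloud>
```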
Re: SolrCloud never fully recovers after slow disks
Hi, I sometimes get this exception too - the recovery gets stuck in a loop, and I can only finish it by restarting the replica that has the stuck core. In my case I have SSDs, but replicas of 40 or 50 GB. If I have 3 replicas in recovery mode and they are replicating from the same node, I get this error. My indexing rate is high too (~500 docs/s).

--
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

On Monday, November 11, 2013 at 8:27 AM, Henrik Ossipoff Hansen wrote:

The joy was short-lived. Tonight our environment was “down/slow” a bit longer than usual. It looks like two of our nodes never recovered, clusterstate says everything is active. [...]
Re: Solr timeout after reboot
Thank you, Peter! Last weekend I was up until 4am trying to understand why Solr was starting so sooo slow, when I had given it enough memory to fit the entire index. And then I remembered your trick used on the m3.xlarge machines, tried it and it worked like a charm! Thank you again!

-
Thanks,
Michael

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-timeout-after-reboot-tp4096408p4100254.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Merging shards and replicating changes in SolrCloud
Thanks for the comments Shalin, I ended up doing just that - reindexing from the ground up.

-
Thanks,
Michael

--
View this message in context: http://lucene.472066.n3.nabble.com/Merging-shards-and-replicating-changes-in-SolrCloud-tp407p4100255.html
Sent from the Solr - User mailing list archive at Nabble.com.
spellcheck solr 4.3.1
Hey

I am running Solr 4.3.1 and working on implementing spellcheck using solr.DirectSolrSpellChecker. Everything seems to be working fine, but I have one issue.

If I search for http://localhost:8765/solr/MainIndex/spell?q=kim%20AND%20larsen the result is some hits, and the spell component returns the following structure:

<lst name="spellcheck">
  <lst name="suggestions">
    <bool name="correctlySpelled">true</bool>
  </lst>
</lst>

I would have liked suggestions to be returned if any were found.

If I do a search for http://localhost:8765/solr/MainIndex/spell?q=kim%20AND%20larsenn with larsen spelled wrong (larsenn), the spell component returns the following:

<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="larsenn">
      <int name="numFound">1</int>
      <int name="startOffset">8</int>
      <int name="endOffset">15</int>
      <int name="origFreq">0</int>
      <arr name="suggestion">
        <lst>
          <str name="word">larsen</str>
          <int name="freq">12</int>
        </lst>
      </arr>
    </lst>
    <bool name="correctlySpelled">false</bool>
    <lst name="collation">
      <str name="collationQuery">kim AND larsen</str>
      <int name="hits">12</int>
      <lst name="misspellingsAndCorrections">
        <str name="kim">kim</str>
        <str name="larsenn">larsen</str>
      </lst>
    </lst>
  </lst>
</lst>

In my view this is correct. But if I do the same search as an OR search, http://localhost:8765/solr/MainIndex/spell?q=kim%20OR%20larsenn, the spell component returns some results and:

<lst name="spellcheck">
  <lst name="suggestions">
    <bool name="correctlySpelled">true</bool>
  </lst>
</lst>

larsenn is now spelled correctly according to Solr; I cannot understand this behavior. Is there a setting to adjust the spell component so it always returns suggestions? Or a way to get suggestions in an OR search with one wrong word working?
Med venlig hilsen / Best regards

Daniel Borup
Tel: (+45) 28 87 69 18
E-mail: d...@alpha-solutions.dk

Alpha Solutions A/S
Sølvgade 10, 1.sal, DK-1307 Copenhagen K
Tel: (+45) 70 20 65 38
Web: www.alpha-solutions.dk

** This message including any attachments may contain confidential and/or privileged information intended only for the person or entity to which it is addressed. If you are not the intended recipient you should delete this message. Any printing, copying, distribution or other use of this message is strictly prohibited. If you have received this message in error, please notify the sender immediately by telephone or e-mail and delete all copies of this message and any attachments from your system. Thank you.
Adding a server to an existing SOLR cloud cluster
Hi We have a SOLRCloud cluster of 3 solr servers (v4.5.0 running under tomcat) with 1 shard. We added a new SOLR server (v4.5.1) by simply starting tomcat and pointing it at the zookeeper ensemble used by the existing cluster. My understanding was that this new server would handshake with zookeeper and add itself as a replica to the existing cluster. What has actually happened is that the server is in zookeeper's live_nodes, but is not in the clusterstate.json file. It also does not have a CORE/collection associated with it. Any ideas? I assume I am missing a step. Do I have to manually create the core on the new server? Cheers Ade -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-a-server-to-an-existing-SOLR-cloud-cluster-tp4100275.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Adding a server to an existing SOLR cloud cluster
Try manually creating shard replicas on the new server. I think the new server is only used automatically when you start your Solr server instance with the correct command line option (e.g. -DnumShards) - I never liked this kind of behaviour. The server is not present in the clusterstate.json file because it contains no replicas - but it is a live node, as you have already stated.

Best regards,
Primoz

From: ade-b adrian.bro...@gmail.com
To: solr-user@lucene.apache.org
Date: 11.11.2013 14:48
Subject: Adding a server to an existing SOLR cloud cluster

Hi. We have a SOLRCloud cluster of 3 solr servers (v4.5.0 running under tomcat) with 1 shard. [...]
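On Solr 4.x, "manually creating" a replica comes down to a CoreAdmin CREATE call that names the target collection and shard. A minimal sketch of building that request, assuming hypothetical host and collection names (newsolr, mycoll):

```python
from urllib.parse import urlencode

def core_create_url(base_url, core_name, collection, shard):
    """Build a CoreAdmin CREATE request (Solr 4.x) that registers a new
    core as a replica of an existing collection/shard."""
    params = {
        "action": "CREATE",
        "name": core_name,
        "collection": collection,
        "shard": shard,
    }
    return base_url + "/admin/cores?" + urlencode(params)

# Hypothetical host and names -- substitute your own, then GET the URL.
url = core_create_url("http://newsolr:8080/solr",
                      "mycoll_shard1_replica2", "mycoll", "shard1")
print(url)
```

Once the core is created this way, it should show up in clusterstate.json as a replica of the named shard.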
Re: Adding a server to an existing SOLR cloud cluster
Thanks. If I understand what you are saying, it should automatically register itself with the existing cluster if we start SOLR with the correct command line options. We tried adding the numShards option to the command line but still get the same outcome. We start the new SOLR server using:

/usr/bin/java \
  -Djava.util.logging.config.file=/mnt/ephemeral/apache-tomcat-7.0.47/conf/logging.properties \
  -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager \
  -server -Xms256m -Xmx1024m -XX:+DisableExplicitGC \
  -Dsolr.solr.home=/mnt/ephemeral/solr \
  -Dport=8080 -DhostContext=solr -DnumShards=1 \
  -DzkClientTimeout=15000 -DzkHost=zk ip address \
  -Djava.endorsed.dirs=/mnt/ephemeral/apache-tomcat-7.0.47/endorsed \
  -classpath /mnt/ephemeral/apache-tomcat-7.0.47/bin/bootstrap.jar:/mnt/ephemeral/apache-tomcat-7.0.47/bin/tomcat-juli.jar \
  -Dcatalina.base=/mnt/ephemeral/apache-tomcat-7.0.47 \
  -Dcatalina.home=/mnt/ephemeral/apache-tomcat-7.0.47 \
  -Djava.io.tmpdir=/mnt/ephemeral/apache-tomcat-7.0.47/temp \
  org.apache.catalina.startup.Bootstrap start

Regards
Ade

--
View this message in context: http://lucene.472066.n3.nabble.com/Adding-a-server-to-an-existing-SOLR-cloud-cluster-tp4100275p4100286.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: spellcheck solr 4.3.1
There are 2 parameters you want to consider.

First is spellcheck.maxResultsForSuggest. Because you have an OR query, you'll get hits if only 1 query term is in the index. This parameter lets you tune it so it suggests when the query returns n or fewer hits. My memory tells me, however, that if you leave this parameter out entirely, it will still return suggestions for OR queries with some misspelled words (false memory on my part?). Possibly you have this set to 1? Omitting it might be a better option. See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.maxResultsForSuggest .

Second is collateParam, which lets you override certain query parameters when the spellchecker is testing collations against the index. For instance, if you have q.op=OR, the spellchecker will return collations that may have only 1 correct term, because it simply checks whether a collation returns any hits. You can override this with spellcheck.collateParam.q.op=AND. The same can be done for mm if using edismax. See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collateParam.XX .

James Dyer
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Daniel Borup [mailto:d...@alpha-solutions.dk]
Sent: Monday, November 11, 2013 7:38 AM
To: solr-user@lucene.apache.org
Subject: spellcheck solr 4.3.1

Hey I am running Solr 4.3.1 and working on implementing spellcheck using solr.DirectSolrSpellChecker. Everything seems to be working fine, but I have one issue. [...]
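The two parameters above can be combined on one request. A hedged sketch of building such a spell request for the OR query from the original mail (handler path taken from the thread; the maxResultsForSuggest value is illustrative):

```python
from urllib.parse import urlencode

params = {
    "q": "kim OR larsenn",
    "spellcheck": "true",
    "spellcheck.collate": "true",
    # suggest whenever the query returns 5 or fewer hits (illustrative value)
    "spellcheck.maxResultsForSuggest": 5,
    # test collations with AND so a collation must match all terms
    "spellcheck.collateParam.q.op": "AND",
}
qs = urlencode(params)
print("http://localhost:8765/solr/MainIndex/spell?" + qs)
```

With collateParam.q.op=AND, a collation like "kim AND larsen" is only returned if every corrected term actually matches, even though the user's query was an OR search.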
Re: SolrCloud never fully recovers after slow disks
The socket read timeouts are actually fairly short for recovery - we should probably bump them up. Can you file a JIRA issue? It may be a symptom rather than a cause, but given a slow env, bumping them up makes sense.

- Mark

On Nov 11, 2013, at 8:27 AM, Henrik Ossipoff Hansen h...@entertainment-trading.com wrote:

The joy was short-lived. Tonight our environment was “down/slow” a bit longer than usual. It looks like two of our nodes never recovered, clusterstate says everything is active. [...]
Re: Adding a server to an existing SOLR cloud cluster
According to the wiki pages it should, but I have not really tried it yet - I like to do the bookkeeping myself :) I am sorry, but someone with more knowledge of Solr will have to answer your question.

Primoz

From: ade-b adrian.bro...@gmail.com
To: solr-user@lucene.apache.org
Date: 11.11.2013 15:44
Subject: Re: Adding a server to an existing SOLR cloud cluster

Thanks. If I understand what you are saying, it should automatically register itself with the existing cluster if we start SOLR with the correct command line options. [...]
Re: Unit of dimension for solr field
Thanks Upayavira

It seems that needs too much work, and I will have several more fields that hold unit values. Is there a quicker way of implementing it? We have the Currency field that comes as a default with Solr - can we use it, creating a conversion rate table for each field? What I am expecting from units is similar to the currency field.

Erol Akarsu

--
View this message in context: http://lucene.472066.n3.nabble.com/Unit-of-dimension-for-solr-field-tp4100209p4100295.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrCloud never fully recovers after slow disks
I will file a JIRA later today. What I don't get though (I haven't looked much into any actual Solr code) is that at this point our systems are running fine, so timeouts shouldn't be an issue. Those two nodes, though, are somehow left in a state where their response time is up to around 120k ms - which is fairly high - while everything else is running normally at this point.

--
Henrik Ossipoff Hansen
Developer, Entertainment Trading

On 11. nov. 2013 at 16.01.58, Mark Miller (markrmil...@gmail.com) wrote:

The socket read timeouts are actually fairly short for recovery - we should probably bump them up. Can you file a JIRA issue? [...]
Re: Unit of dimension for solr field
I think Upayavira's suggestion of writing a filter factory fits what you're asking for. However, the other end of cleverness is to simply use solr.TrieIntField and store everything in MB. So for 1TB you'd write 1048576. A range query for 256MB to 1GB would be field:[256 TO 1024]. Conversion from MB to your displayed unit (2TB, for example) would happen in the application layer. But using trie ints would be simple and efficient. - Ryan

On Mon, Nov 11, 2013 at 7:06 AM, eakarsu eaka...@gmail.com wrote: Thanks Upayavira It seems it needs too much work. I will have several more fields that will have unit values. Do we have a quicker way of implementing it? We have a Currency field coming as default with Solr. Can we use it? Creating a conversion rate table for each field? What I am expecting from units is similar to the currency field Erol Akarsu -- View this message in context: http://lucene.472066.n3.nabble.com/Unit-of-dimension-for-solr-field-tp4100209p4100295.html Sent from the Solr - User mailing list archive at Nabble.com.
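Ryan's normalize-to-MB scheme lives entirely in the application layer; a minimal sketch (class and method names are mine, not from the thread; assumes binary units, i.e. 1 GB = 1024 MB):

```java
import java.util.Map;

public class SizeNormalizer {
    // Hypothetical helper: parses strings like "256MB", "1 GB", "0.5 TB"
    // into a megabyte count before indexing into a TrieIntField.
    private static final Map<String, Long> UNIT_TO_MB = Map.of(
            "MB", 1L, "GB", 1024L, "TB", 1024L * 1024L);

    public static long toMegabytes(String raw) {
        String s = raw.trim().toUpperCase();
        String unit = s.substring(s.length() - 2);                 // last two chars, e.g. "GB"
        double value = Double.parseDouble(s.substring(0, s.length() - 2).trim());
        Long factor = UNIT_TO_MB.get(unit);
        if (factor == null) {
            throw new IllegalArgumentException("Unknown unit in: " + raw);
        }
        return Math.round(value * factor);
    }

    public static void main(String[] args) {
        System.out.println(toMegabytes("256MB"));  // 256
        System.out.println(toMegabytes("1 GB"));   // 1024
        System.out.println(toMegabytes("0.5 TB")); // 524288
    }
}
```

The indexed field then holds the returned long, so range queries like field:[256 TO 1024] work directly.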
Re: Solr 3.5 and 4.4 - ParsedQuery is different for q=*
On 11/10/2013 10:12 PM, subacini Arunkumar wrote: We are upgrading from Solr 3.5 to Solr 4.4. The response from 3.5 and 4.4 are different. I have attached request and response [highlighted the major difference in RED] Can you please let me know how to change parsedQuery from *MatchAllDocsQuery to text.* *Also, in solr 4.4 if fq or q param has * , we are having this issue. Otherwise parsedQuery value is text:searchStr*

Your attachment never made it through, because most attachments cannot be sent to the mailing list. Nothing was colored red -- the list doesn't really do HTML email. I can't tell if you're using * for emphasis or whether all asterisks were literally there. I'm going to assume that you aren't trying to emphasize things. Apologies if I'm wrong.

I can tell you that q=* is not really a valid query for any version of Solr. If you meant all documents with the standard query parser, use q=*:*, which is a special shortcut for all documents. If you meant all documents with the dismax or edismax query parser, then set q.alt to *:* and either pass an empty q value, or don't include q at all.

I'm really confused about what your filter query is supposed to accomplish. Four asterisks, one of which is escaped? I have no idea what that is supposed to do. To go much further, we'll need to know what you are trying to accomplish. We will also need to see the config of your /select handler on both versions, the field name(s) that you are trying to search, as well as info from schema.xml about the field(s) and any related fieldType settings. Thanks, Shawn
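Shawn's dismax/edismax advice, expressed as request parameters (a sketch only; the qf field names are placeholders):

```
defType=edismax
q.alt=*:*
qf=title body
```

With q omitted entirely, q.alt supplies the match-all query.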
Re: Unit of dimension for solr field
A custom token filter may indeed be the right way to go, but an alternative is the combination of an update processor and a query preprocessor. The update processor, which could be a JavaScript script, could normalize the string into a simple integer byte count. You might also want to keep separate fields, one for the raw string and one for the final byte count. A JavaScript script would be a lot easier to develop than a custom token filter. A query preprocessor could do two things: first, the same string-to-byte-count normalization as the update processor, plus generating a range query. So, for example, a query for 0.5 TB could match 512 GB, 500 GB, etc., with [5000 TO 4999]. Technically, you could implement a query preprocessor as a plugin Solr search component, but if that sounds like too much effort, an application-level implementation would probably be easier to master. -- Jack Krupansky

-Original Message- From: Ryan Cutter Sent: Monday, November 11, 2013 10:18 AM To: solr-user@lucene.apache.org Subject: Re: Unit of dimension for solr field

I think Upayavira's suggestion of writing a filter factory fits what you're asking for. However, the other end of cleverness is to simply use solr.TrieIntField and store everything in MB. So for 1TB you'd write 1048576. A range query for 256MB to 1GB would be field:[256 TO 1024]. Conversion from MB to your displayed unit (2TB, for example) would happen in the application layer. But using trie ints would be simple and efficient. - Ryan

On Mon, Nov 11, 2013 at 7:06 AM, eakarsu eaka...@gmail.com wrote: Thanks Upayavira It seems it needs too much work. I will have several more fields that will have unit values. Do we have a quicker way of implementing it? We have a Currency field coming as default with Solr. Can we use it? Creating a conversion rate table for each field? What I am expecting from units is similar to the currency field Erol Akarsu -- View this message in context: http://lucene.472066.n3.nabble.com/Unit-of-dimension-for-solr-field-tp4100209p4100295.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Unit of dimension for solr field
Ryan and Upayavira, Do we have an example skeleton to do this for schema.xml and solrconfig.xml? An example Java class that would help to build a UnitResolvingFilterFactory class? Thanks Erol Akarsu -- View this message in context: http://lucene.472066.n3.nabble.com/Unit-of-dimension-for-solr-field-tp4100209p4100303.html Sent from the Solr - User mailing list archive at Nabble.com.
How to cancel a collection 'optimize'?
We have an internal Solr collection with ~1 billion documents. It's split across 24 shards and uses ~3.2TB of disk space. Unfortunately we've triggered an 'optimize' on the collection (via a restarted browser tab), which has raised the disk usage to 4.6TB, with 130GB left on the disk volume. As I fully expect Solr to use up all of the disk space as the collection is more than 50% of the disk volume, how can I cancel this optimize? And separately, if I were to reissue with maxSegments=(high number, eg 40), should I still expect the same disk usage? (I'm presuming so as doesn't it need to gather the whole index to determine which docs should go into which segments?) Solr 4.4 on RHEL6.4, 160GB RAM, 5GB per shard. (Great conference last week btw - so much to learn!) Gil Hoggarth Web Archiving Technical Services Engineer The British Library, Boston Spa, West Yorkshire, LS23 7BQ Tel: 01937 546163
[Solr 4] Data grouping on weeks
Hi, I'm new to Solr and want to group data by week. Is there any built-in date rounding function, so that I give a date to this function and it returns the week of the year? For example, if I query Solr against the date (01/01/2013), it should return (1st week of 2013). Say I have the following documents in Solr:

Doc1 CreatedDate: 1/1/2013 Data:ABC
Doc2 CreatedDate: 4/1/2013 Data:ABC
Doc3 CreatedDate: 3/2/2013 Data:ABC
Doc4 CreatedDate: 4/2/2013 Data:ABC
Doc5 CreatedDate: 12/2/2013 Data:ABC

The result should be:

2013 Week1: 2 records
2013 Week7: 2 records
2013 Week8: 1 record

Thanks in advance! Jamshaid
Re: How to cancel a collection 'optimize'?
Hi Gil, (we spoke in Dublin, didn't we?) Short of stopping Solr I have a feeling there isn't much you can do hm. or, I wonder if you could somehow get a thread dump, get the PID of the thread (since I believe threads in Linux are run as processes), and then kill that thread... Feels scary and I'm not sure what this might do to the index, but maybe somebody else can jump in and comment on this approach or suggest a better one. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Mon, Nov 11, 2013 at 10:44 AM, Hoggarth, Gil gil.hogga...@bl.uk wrote: We have an internal Solr collection with ~1 billion documents. It's split across 24 shards and uses ~3.2TB of disk space. Unfortunately we've triggered an 'optimize' on the collection (via a restarted browser tab), which has raised the disk usage to 4.6TB, with 130GB left on the disk volume. As I fully expect Solr to use up all of the disk space as the collection is more than 50% of the disk volume, how can I cancel this optimize? And separately, if I were to reissue with maxSegments=(high number, eg 40), should I still expect the same disk usage? (I'm presuming so as doesn't it need to gather the whole index to determine which docs should go into which segments?) Solr 4.4 on RHEL6.4, 160GB RAM, 5GB per shard. (Great conference last week btw - so much to learn!) Gil Hoggarth Web Archiving Technical Services Engineer The British Library, Boston Spa, West Yorkshire, LS23 7BQ Tel: 01937 546163
Re: Adding a server to an existing SOLR cloud cluster
From my understanding, if your already existing cluster satisfies your collection (already live nodes = nr shards * replication factor) there wouldn't be any need for creating additional replicas on the new server, unless you directly ask for them after startup. I usually just add the machine to the cluster and then manually create the replicas I need. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-a-server-to-an-existing-SOLR-cloud-cluster-tp4100275p4100313.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: [Solr 4] Data grouping on weeks
You're probably looking at date math, see: http://lucene.apache.org/solr/4_5_1/solr-core/org/apache/solr/util/DateMathParser.html You're probably going to be faceting to get these counts, see facet ranges here: http://wiki.apache.org/solr/SimpleFacetParameters#Facet_by_Range So the start is something like date/YEAR, then gaps of +7DAYS or some such Best, Erick On Mon, Nov 11, 2013 at 10:51 AM, Jamshaid Ashraf jamshaid...@gmail.comwrote: Hi, I'm new with solr and wanted to group data on weeks, is there any built-in date round function so I give date to this function and it return me the week of the year. For example I query to solr against date (01/01/2013) it should return me (1st week of 2013). Like I have following documents in solr: Doc1 CreatedDate: 1/1/2013 Data:ABC Doc2 CreatedDate: 4/1/2013 Data:ABC Doc3 CreatedDate: 3/2/2013 Data:ABC Doc4 CreatedDate: 4/2/2013 Data:ABC Doc5 CreatedDate: 12/2/2013 Data:ABC Result should be: 2013 Week1 :2 records 2013 Week7 :2 records 2013 Week8 :1 record Thanks in advance! Jamshaid
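Erick's two pointers combine into a single request; a sketch of the parameters (assuming a date field named CreatedDate; note the + in date math must be URL-encoded as %2B when sent in a URL):

```
q=*:*&rows=0
&facet=true
&facet.range=CreatedDate
&facet.range.start=NOW/YEAR
&facet.range.end=NOW/YEAR%2B1YEAR
&facet.range.gap=%2B7DAYS
```

Each returned bucket start (e.g. 2013-01-01T00:00:00Z) can then be mapped to a label like "2013 Week1" in the application layer.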
RE: How to cancel a collection 'optimize'?
Hi Otis, thanks for the response. I could stop the whole Solr service as as yet there's no audience access to it, but might it be left in an incomplete state and thus try to complete optimisation when the service is restarted? [Yes, we did speak in Dublin - you can see we need that monitoring service! Must set up the demo version, asap!] -Original Message- From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] Sent: 11 November 2013 16:02 To: solr-user@lucene.apache.org Subject: Re: How to cancel a collection 'optimize'? Hi Gil, (we spoke in Dublin, didn't we?) Short of stopping Solr I have a feeling there isn't much you can do hm. or, I wonder if you could somehow get a thread dump, get the PID of the thread (since I believe threads in Linux are run as processes), and then kill that thread... Feels scary and I'm not sure what this might do to the index, but maybe somebody else can jump in and comment on this approach or suggest a better one. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Mon, Nov 11, 2013 at 10:44 AM, Hoggarth, Gil gil.hogga...@bl.uk wrote: We have an internal Solr collection with ~1 billion documents. It's split across 24 shards and uses ~3.2TB of disk space. Unfortunately we've triggered an 'optimize' on the collection (via a restarted browser tab), which has raised the disk usage to 4.6TB, with 130GB left on the disk volume. As I fully expect Solr to use up all of the disk space as the collection is more than 50% of the disk volume, how can I cancel this optimize? And separately, if I were to reissue with maxSegments=(high number, eg 40), should I still expect the same disk usage? (I'm presuming so as doesn't it need to gather the whole index to determine which docs should go into which segments?) Solr 4.4 on RHEL6.4, 160GB RAM, 5GB per shard. (Great conference last week btw - so much to learn!) 
Gil Hoggarth Web Archiving Technical Services Engineer The British Library, Boston Spa, West Yorkshire, LS23 7BQ Tel: 01937 546163
Re: Function query matching
I replaced the frange filter with the following filter and got the correct no. of results, and it was 3X faster:

select?qq={!edismax v='news' qf='title^2 body'}&scaledQ=scale(product(query($qq),1),0,1)&q={!func}sum(product(0.75,$scaledQ),product(0.25,field(myfield)))&fq={!edismax v='news' qf='title^2 body'}

Then, I tried to simplify the query with parameter substitution, but 'fq' didn't parse correctly:

select?qq={!edismax v='news' qf='title^2 body'}&scaledQ=scale(product(query($qq),1),0,1)&q={!func}sum(product(0.75,$scaledQ),product(0.25,field(myfield)))&fq=$qq

What is the proper syntax? Thanks, Peter

On Thu, Nov 7, 2013 at 2:16 PM, Peter Keegan peterlkee...@gmail.com wrote: I'm trying to use a normalized score in a query as I described in a recent thread titled Re: How to get similarity score between 0 and 1 not relative score. I'm using this query:

select?qq={!edismax v='news' qf='title^2 body'}&scaledQ=scale(product(query($qq),1),0,1)&q={!func}sum(product(0.75,$scaledQ),product(0.25,field(myfield)))&fq={!frange l=0.001}$q

Is there another way to accomplish this using dismax boosting?

On Thu, Nov 7, 2013 at 12:55 PM, Jason Hellman jhell...@innoventsolutions.com wrote: You can, of course, use a function range query: select?q=text:news&fq={!frange l=0 u=100}sum(x,y) http://lucene.apache.org/solr/4_5_1/solr-core/org/apache/solr/search/FunctionRangeQParserPlugin.html This will give you a bit more flexibility to meet your goal.

On Nov 7, 2013, at 7:26 AM, Erik Hatcher erik.hatc...@gmail.com wrote: Function queries score (all) documents, but don't filter them. All documents effectively match a function query. Erik

On Nov 7, 2013, at 1:48 PM, Peter Keegan peterlkee...@gmail.com wrote: Why does this function query return docs that don't match the embedded query? select?qq=text:news&q={!func}sum(query($qq),0)
Re: Function query matching
On Mon, Nov 11, 2013 at 11:39 AM, Peter Keegan peterlkee...@gmail.com wrote: fq=$qq What is the proper syntax? fq={!query v=$qq} -Yonik http://heliosearch.com -- making solr shine
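For reference, the earlier request with Yonik's fix applied would look something like this (line breaks added for readability only):

```
select?qq={!edismax v='news' qf='title^2 body'}
&scaledQ=scale(product(query($qq),1),0,1)
&q={!func}sum(product(0.75,$scaledQ),product(0.25,field(myfield)))
&fq={!query v=$qq}
```

The {!query} parser re-parses the referenced parameter with its own local params, which is why the bare fq=$qq substitution fails.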
RE: date range tree
Has someone at least got an idea how I could do a year/month date tree? In the Solr wiki it is mentioned that facet.date.gap=+1DAY,+2DAY,+3DAY,+10DAY should create 4 buckets, but it doesn't work.

-Original Message- From: Andreas Owen [mailto:a...@conx.ch] Sent: Thursday, 7 November 2013 18:23 To: solr-user@lucene.apache.org Subject: date range tree

I would like to make a facet on a date field with the following tree:

2013
  4th quarter: December, November, October
  3rd quarter: September, August, July
  2nd quarter: June, May, April
  1st quarter: March, February, January
2012
  (same as above)

So far I have this in solrconfig.xml:

<str name="facet.date">{!ex=last_modified,thema,inhaltstyp,doctype}last_modified</str>
<str name="facet.date.gap">+1MONTH</str>
<str name="facet.date.end">NOW/MONTH</str>
<str name="facet.date.start">NOW/MONTH-36MONTHS</str>
<str name="facet.date.other">after</str>

Can I do this in one query or do I need multiple queries? If yes, how would I do the second and keep all the facet queries in the count?
Re: Function query matching
Thanks On Mon, Nov 11, 2013 at 11:46 AM, Yonik Seeley yo...@heliosearch.comwrote: On Mon, Nov 11, 2013 at 11:39 AM, Peter Keegan peterlkee...@gmail.com wrote: fq=$qq What is the proper syntax? fq={!query v=$qq} -Yonik http://heliosearch.com -- making solr shine
qf match density?
While doing a search like: q=great+gatsby&defType=edismax&qf=title^1.8, records with a title of great gatsby / great gatsby always score higher than ones containing great gatsby just a single time. How do I express that a single match should be just as important as having the query match multiple times in the title field? Thanks, m.
Nutch 1.7 + AJAX Solr returning ALL contents vs. SPECIFIC
Hi: I was encouraged to explore the Solr mailing list, specifically regarding the fl parameter. What is that parameter for, and can it accomplish my original task of crawling/indexing specific HTML components versus parsing the entire page? My original question is listed below (previously on the Nutch mailing list): --- I'm using Nutch 1.7 to crawl/index the pages of my domain to Solr, and the JavaScript library AJAX Solr to capture that index as JSON, which would then print it to the front-end. My question is, is it possible to have specific content returned (i.e. an H2 tag and a p tag) on the search results page versus all contents of that page? --- Thanks again, Mark
Re: Unit of dimension for solr field
Can DelimitedPayloadTokenFilterFactory be used to store unit dimension information? This factory class can store extra information for a field. -- View this message in context: http://lucene.472066.n3.nabble.com/Unit-of-dimension-for-solr-field-tp4100209p4100345.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: dropping noise words and maintaining the relevancy
Hello, On dropping noise words we have a scenario where we have to drop only ending noise words. For example, in 160 Associates LP the noise words are Associates and LP, but we only want to drop LP, which is an ending noise word. If we use stop words, it will drop both words and reduce the search key to just 160. Any suggestion? Thanks in advance.

-Original Message- From: Susheel Kumar [mailto:susheel.ku...@thedigitalgroup.net] Sent: Thursday, October 31, 2013 9:59 PM To: solr-user@lucene.apache.org Subject: RE: dropping noise words and maintaining the relevancy

Thanks, Kranti. Nice suggestion. I'll try it out.

-Original Message- From: Kranti Parisa [mailto:kranti.par...@gmail.com] Sent: Thursday, October 31, 2013 3:18 PM To: solr-user@lucene.apache.org Subject: Re: dropping noise words and maintaining the relevancy

One possible approach is that you can populate the titles in a field (say exactMatch) and point your search query to exactMatch:"160 Associates LP" OR text:"160 Associates LP", assuming that you have all the text populated into the field called text. You can also use field-level boosting with the above query, for example exactMatch:"160 Associates LP"^10 OR text:"160 Associates LP"^5 Thanks, Kranti K. Parisa http://www.linkedin.com/in/krantiparisa

On Thu, Oct 31, 2013 at 4:00 PM, Susheel Kumar susheel.ku...@thedigitalgroup.net wrote: Hello, We have a very particular requirement of dropping noise words (LP, LLP, LLC, Corp, Corporation, Inc, Incorporation, PA, Professional Association, Attorney at law, GP, General Partnership etc.) at the end of the search key but maintaining the relevancy. For example, if a user searches for 160 Associates LP, we want the search to return results in the relevancy order below. Basically, if an exact / similar match is present, it comes first, followed by other results.

160 Associates LP
160 Associates
160 Associates LLC
160 Associates LLLP
160 Hilton Associates

If I handle this through stop words then LP will get dropped from the search key and then all results will come back, but the exact match will be shown somewhere lower or deep. Regards and appreciate your help. Susheel
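Not an answer given in this thread, but one hedged sketch of how trailing-only noise words might be handled: a PatternReplaceCharFilterFactory runs over the whole field value before tokenization, so an end-of-string anchor strips only a trailing suffix. The fieldType name and suffix list here are assumptions:

```xml
<fieldType name="text_company" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Strips one trailing legal suffix; the $ anchor means a noise word
         in the middle of the name is left alone. -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="\s+(?i:LP|LLP|LLLP|LLC|Inc|Corp|GP|PA)\.?\s*$"
                replacement=""/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Applied at both index and query time, "160 Associates LP" and "160 Associates" analyze identically, while "Associates" inside the name survives.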
HTTP 500 error when invoking a REST client in Solr Analyzer
Hi All, I am working on a custom analyzer in Solr to post content to Apache Stanbol for enhancement during indexing. To post content to Stanbol, inside my custom analyzer's incrementToken() method I have written the below code using the Jersey client API sample [1]:

public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
        return false;
    }
    char[] buffer = charTermAttr.buffer();
    String content = new String(buffer);
    Client client = Client.create();
    WebResource webResource = client.resource("http://localhost:8080/enhancer");
    ClientResponse response = webResource.type("text/plain")
            .accept(new MediaType("application", "rdf+xml"))
            .post(ClientResponse.class, content);
    int status = response.getStatus();
    if (status != 200 && status != 201 && status != 202) {
        throw new RuntimeException("Failed : HTTP error code : " + response.getStatus());
    }
    String output = response.getEntity(String.class);
    System.out.println(output);
    charTermAttr.setEmpty();
    char[] newBuffer = output.toCharArray();
    charTermAttr.copyBuffer(newBuffer, 0, newBuffer.length);
    return true;
}

When testing the analyzer I always get an HTTP 500 response from the Stanbol server and I cannot process the enhancement response properly. But I could successfully execute the same Jersey client code above in a Java application (in a main method) and retrieve the desired enhancement response from Stanbol. Any ideas why I always get an HTTP 500 error when invoking a REST endpoint in a Solr analyzer? Could it be a permission problem in my Solr analyzer? Appreciate your help.
Thanks, Dileepa

[1] https://blogs.oracle.com/enterprisetechtips/entry/consuming_restful_web_services_with

[2] 6424 [qtp918598659-11] ERROR org.apache.solr.core.SolrCore – java.lang.RuntimeException: Failed : HTTP error code : 500
  at com.solr.test.analyzer.ContentFilter.incrementToken(ContentFilter.java:70)
  at org.apache.solr.handler.AnalysisRequestHandlerBase.analyzeTokenStream(AnalysisRequestHandlerBase.java:179)
  at org.apache.solr.handler.AnalysisRequestHandlerBase.analyzeValue(AnalysisRequestHandlerBase.java:126)
  at org.apache.solr.handler.FieldAnalysisRequestHandler.analyzeValues(FieldAnalysisRequestHandler.java:221)
  at org.apache.solr.handler.FieldAnalysisRequestHandler.handleAnalysisRequest(FieldAnalysisRequestHandler.java:190)
  at org.apache.solr.handler.FieldAnalysisRequestHandler.doAnalysis(FieldAnalysisRequestHandler.java:101)
  at org.apache.solr.handler.AnalysisRequestHandlerBase.handleRequestBody(AnalysisRequestHandlerBase.java:59)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
  at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
  at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
  at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
  at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
  at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
  at org.eclipse.jetty.server.Server.handle(Server.java:368)
  at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
  at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
  at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
  at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
  at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
  at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
  at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
  at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
  at
Re: Solr grouping performance problem
Thanks Joel, appreciate your help. Is Solr 4.6 due this year ? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-grouping-performance-porblem-tp4098565p4100358.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: HTTP 500 error when invoking a REST client in Solr Analyzer
This seems to be a weird intermittent issue when I use the Analysis UI ( http://localhost:8983/solr/#/collection1/analysis) for testing my Analyzer. It works fine when I hard code the input value in the Analyzer and index. I gave the same input : Tim Bernes Lee is a professor at MIT hard coded in the Analyzer class and from the Solr Analysis UI. The UI response failed intermittently when I adjust the field value. This could be a problem with character encoding of the field value it seems. Thanks, Dileepa On Tue, Nov 12, 2013 at 1:33 AM, Dileepa Jayakody dileepajayak...@gmail.com wrote: Hi All, I am working on a custom analyzer in Solr to post content to Apache Stanbol for enhancement during indexing. To post content to Stanbol, inside my custom analyzer's incrementToken() method I have written below code using Jersey client API sample [1]; public boolean incrementToken() throws IOException { if (!input.incrementToken()) { return false; } char[] buffer = charTermAttr.buffer(); String content = new String(buffer); Client client = Client.create(); WebResource webResource = client.resource(http://localhost:8080/enhancer ); ClientResponse response = webResource.type(text/plain).accept(new MediaType(application, rdf+xml)).post(ClientResponse.class, content); int status = response.getStatus(); if (status != 200 status != 201 status != 202) { throw new RuntimeException(Failed : HTTP error code : + response.getStatus()); } String output = response.getEntity(String.class); System.out.println(output); charTermAttr.setEmpty(); char[] newBuffer = output.toCharArray(); charTermAttr.copyBuffer(newBuffer, 0, newBuffer.length); return true; } When testing the analyzer I always get a HTTP 500 response from Stanbol server and I cannot process the enhancement response properly. But I could successfully execute the same jersey client code above in a Java application (in main method) and retrieve desired enhancement response from Stanbol. 
Any ideas why I always get a HTTP 500 error when invoking a rest endpoint in Solr analyzer? Could it be a permission problem in my Solr analyzer? Appreciate your help. Thanks, Dileepa

[1] https://blogs.oracle.com/enterprisetechtips/entry/consuming_restful_web_services_with

[2] (stack trace snipped; identical to the trace in the original message above)
Re: Unit of dimension for solr field
You seem to be consistently missing the problem that your queries will not work as expected. How would you do a range query without writing some kind of custom code that looked at the payloads to determine the normalized units? The simplest way to do this is probably to have your ingestion side normalize. Put the original (complete with units) in a field that has indexed=false; this will only be used for showing in the results list. _Also_ add the normalized value to another field that you set to indexed=true and stored=false. That will allow range searches, faceting, etc. HTH, Erick

On Mon, Nov 11, 2013 at 2:36 PM, eakarsu eaka...@gmail.com wrote: Can DelimitedPayloadTokenFilterFactory be used to store unit dimension information? This factory class can store extra information for a field. -- View this message in context: http://lucene.472066.n3.nabble.com/Unit-of-dimension-for-solr-field-tp4100209p4100345.html Sent from the Solr - User mailing list archive at Nabble.com.
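Erick's two-field layout as a schema.xml sketch (field names are assumptions; `tint` is the TrieIntField type from the example schema):

```xml
<!-- Raw value as entered ("1.5 TB"): display only, never searched -->
<field name="size_display" type="string" indexed="false" stored="true"/>
<!-- Normalized count (e.g. megabytes): range queries and faceting -->
<field name="size_mb" type="tint" indexed="true" stored="false"/>
```

The ingestion code writes both fields from the same input value, so search and display stay consistent.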
Re: Solr grouping performance problem
In fact, there's some movement towards starting the release process this week, stay tuned! Erick On Mon, Nov 11, 2013 at 4:12 PM, shamik sham...@gmail.com wrote: Thanks Joel, appreciate your help. Is Solr 4.6 due this year ? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-grouping-performance-porblem-tp4098565p4100358.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr grouping performance problem
On 11/11/2013 2:12 PM, shamik wrote: Thanks Joel, appreciate your help. Is Solr 4.6 due this year ?

The job of release manager for 4.6 has already been claimed. There should be a release candidate posted on the dev list sometime on November 12th (tomorrow) in the USA timezones, unless a serious problem is discovered. After the RC gets posted, there is a 72-hour voting period where committers vote whether or not to release that version. If someone finds a problem that warrants a negative vote during that 72-hour period, it will be put on hold until the problem is fixed. A new RC will eventually be made available and the 72-hour voting period will begin again. When the vote finally passes, the release process will begin. It typically takes 2-3 days after that before the official announcement is made.

What this means in real terms is that 4.6 will most likely be out before the end of November. It would take a major series of bugs and problems to keep that from happening. Because of the upcoming holiday madness, I think 4.7 is not likely to happen before next year. Thanks, Shawn
Re: How to cancel a collection 'optimize'?
On Mon, Nov 11, 2013 at 11:28 AM, Hoggarth, Gil gil.hogga...@bl.uk wrote:
I could stop the whole Solr service, as, as yet, there's no audience access to it, but might it be left in an incomplete state and thus try to complete the optimisation when the service is restarted?

Should be fine. Lucene has a write-once architecture: existing segment files are never changed, and are only deleted once a merge (producing a new segment containing the old segments' contents) has completed. So if you stop things in the middle of a commit/optimize, the index should always correctly open on the last completed commit/optimize.

-Yonik
http://heliosearch.com -- making solr shine
Why do people want to deploy to Tomcat?
Hello,

I keep seeing, here and on Stack Overflow, people trying to deploy Solr to Tomcat. We don't usually ask why, we just help where we can. But the question comes up often enough that I am curious: what is the actual business case? Is it because Tomcat is well known? Is it because other apps are running under Tomcat and it is an ops requirement? Is it because Tomcat gives Solr something that Jetty does not?

It would be useful to know, especially since the Solr team is considering making the server part into a black-box component. What use cases would that break?

So, if somebody runs Solr under Tomcat (or needed to and gave up), let's use this thread to collect this knowledge.

Regards,
Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
Indexing a token to a different field in a custom filter
Hi All,

In my custom filter, I need to index the processed token into a different field. The processed token is a Stanbol enhancement response. The only solution I have found so far is to use a Solr client (SolrJ) to add a new document with my processed field into Solr. Below is the sample code segment:

    SolrServer server = new HttpSolrServer("http://localhost:8983/solr/");
    SolrInputDocument doc1 = new SolrInputDocument();
    doc1.addField("id", "id1", 1.0f);
    doc1.addField("stanbolResponse", response);
    try {
        server.add(doc1);
        server.commit();
    } catch (SolrServerException e) {
        e.printStackTrace();
    }

This mechanism requires a new HTTP call to the local Solr server for every token I process for the stanbolResponse field, and I feel it's not very efficient. Is there any alternative way to issue an update request that adds a new field to the indexing document from within the filter (without making an explicit HTTP call using SolrJ)?

Thanks,
Dileepa
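[Editor's sketch] One mitigation for the per-token HTTP round trip, assuming SolrJ stays in the picture, is to buffer documents and send them in batches (SolrJ's server.add() accepts a collection of documents, and commits can be deferred). The class below is a Solr-free sketch of just the buffering logic; the class name, batch size, and the use of a plain Map in place of SolrInputDocument are all made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Buffers per-token field updates and flushes them in batches,
// so one HTTP round trip can cover many documents instead of one each.
public class DocumentBuffer {
    private final int batchSize;
    private final List<Map<String, Object>> pending = new ArrayList<>();
    private int flushCount = 0; // how many batch sends have happened

    public DocumentBuffer(int batchSize) {
        this.batchSize = batchSize;
    }

    // In the real filter this would build a SolrInputDocument;
    // a plain Map stands in for it here.
    public void add(Map<String, Object> doc) {
        pending.add(doc);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    public void flush() {
        if (pending.isEmpty()) {
            return;
        }
        // Real version: server.add(batchOfDocs); then commit once,
        // or rely on Solr's autoCommit settings instead.
        flushCount++;
        pending.clear();
    }

    public int getFlushCount() {
        return flushCount;
    }
}
```

With a batch size of a few hundred, the number of HTTP calls drops by the same factor; the trade-off is that a crash loses the un-flushed tail of the buffer.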
SolrCloud 4.5.1 and Zookeeper SASL
Howdy.

We are testing an upgrade of Solr from 4.3 to 4.5.1. We're using SolrCloud, and our problem is that the core does not appear to be loaded anymore. We've set logging to DEBUG and we've found lots of these:

2013-11-12 06:30:43,339 [pool-2-thread-1-SendThread(our.zookeeper.com:2181)] DEBUG org.apache.zookeeper.client.ZooKeeperSaslClient - Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration

Zookeeper is up and running. Is there any doco on how to disable SASL? Or what changes were made to SolrCloud exactly?

Much appreciated,
Sven
Re: eDisMax, multiple language support and stopwords
Happy to see someone with similar solutions to ours. We have a similar multi-language search feature, and we index different-language content into _fr, _en fields like you've done. In search, though, we require a language code as a parameter to specify the language the client wants to search in, which is normally decided by the website visited, e.g.:

    qf=name description&language=en

and in our search components we find the right fields, name_en and description_en, to search on.

We used to support searching across all languages and removed that later: the site tells the customer which languages are supported, and we also don't think many visitors to our web sites know more than two languages and need to search them at the same time.

On 7 November 2013 23:01, Tom Mortimer tom.m.f...@gmail.com wrote:
Ah, thanks Markus. I think I'll just add the Boolean operators to the stopwords list in that case.
Tom

On 7 November 2013 12:01, Markus Jelsma markus.jel...@openindex.io wrote:
This is an ancient problem. The issue here is your mm parameter: it gets confused because different numbers of tokens are filtered/emitted for the separate fields, so it is never going to work just like this. The easiest option is not to use the stop filter.
http://lucene.472066.n3.nabble.com/Dismax-Minimum-Match-Stopwords-Bug-td493483.html
https://issues.apache.org/jira/browse/SOLR-3085

-Original message-
From: Tom Mortimer tom.m.f...@gmail.com
Sent: Thursday 7th November 2013 12:50
To: solr-user@lucene.apache.org
Subject: eDisMax, multiple language support and stopwords

Hi all,

Thanks for the help and advice I've got here so far! Another question - I want to support stopwords at search time, so that e.g. the query oscar and wilde is equivalent to oscar wilde (this is with lowercaseOperators=false). Fair enough: I have the stopword "and" in the query analyser chain. However, I also need to support French as well as English, so I've got _en and _fr versions of the text fields, with appropriate stemming and stopwords.
I index French content into the _fr fields and English into the _en fields. I'm searching with eDisMax over both versions, e.g.:

    <str name="qf">headline_en headline_fr</str>

However, this means I get no results for oscar and wilde. The parsed query is:

    (+((DisjunctionMaxQuery((headline_fr:osca | headline_en:oscar)) DisjunctionMaxQuery((headline_fr:and)) DisjunctionMaxQuery((headline_fr:wild | headline_en:wild)))~3))/no_coord

If I add and to the French stopwords list, I *do* get results, and the parsed query is:

    (+((DisjunctionMaxQuery((headline_fr:osca | headline_en:oscar)) DisjunctionMaxQuery((headline_fr:wild | headline_en:wild)))~2))/no_coord

This implies that the only solution is to have a minimal, shared stopwords list for all the languages I want to support. Is this correct, or is there a way of supporting this kind of searching with per-language stopword lists?

Thanks for any ideas!
Tom

--
All the best
Liu Bo
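[Editor's sketch] The shared-minimal-stopwords workaround Tom describes could be wired up in schema.xml roughly as below (type names and the stopwords filename are illustrative). Both query analyzers point their StopFilterFactory at the same small file containing only the operator words (and, or, et, ou, ...), so every qf field drops the same tokens and the mm count stays consistent across languages:

```xml
<!-- Query-side analyzers share one minimal stopword file so that the
     same number of tokens is emitted for every qf field. -->
<fieldType name="text_en" class="solr.TextField">
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords_shared.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_fr" class="solr.TextField">
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords_shared.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="French"/>
  </analyzer>
</fieldType>
```

The index-time analyzers (omitted here) can differ, but keeping their stopword lists aligned with the query side avoids matching mismatches on the remaining common words.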
Replicate Solr Cloud
Hi,

I want to create a Solr cloud setup like this: one SolrCloud cluster in location A, and another SolrCloud cluster in location B. How do I make the cluster in location B replicate the cluster in location A, so that if all nodes in cluster A die, cluster B still works, and vice versa?

Does anybody know how to set this up?

Thanks
Re: Multi-core support for indexing multiple servers
Like Erick said, merging data from different data sources can be very difficult. SolrJ is much easier to use, but you may need another application to handle the indexing process if you don't want to extend Solr much.

I eventually ended up with a customized request handler that uses SolrWriter from the DIH package to index data, so that I can fully control the index process. Quite like with SolrJ, you can write code to convert your data into SolrInputDocuments and then post them to SolrWriter; SolrWriter handles the rest.

On 8 November 2013 21:46, Erick Erickson erickerick...@gmail.com wrote:
Yep, you can define multiple data sources for use with DIH. Combining data from those multiple sources into a single index can be a bit tricky with DIH; personally I tend to prefer SolrJ, but that's mostly personal preference, especially if I want to get some parallelism going. But whatever works.
Erick

On Thu, Nov 7, 2013 at 11:17 PM, manju16832003 manju16832...@gmail.com wrote:
Erick,
Just a question :-) - wouldn't it be easy to use DIH to pull data from multiple data sources? I do use DIH to do that comfortably. I have three data sources:
- MySQL
- URLDataSource that returns XML from a .NET application
- URLDataSource that connects to an API and returns XML

Here is the data source part of my data-config:

    <dataSource type="JdbcDataSource" name="solr" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/employeeDB" batchSize="-1" user="root" password="root"/>
    <dataSource name="CRMServer" type="URLDataSource" encoding="UTF-8" connectionTimeout="5000" readTimeout="1"/>
    <dataSource name="ImageServer" type="URLDataSource" encoding="UTF-8" connectionTimeout="5000" readTimeout="1"/>

Of course, the application does the same: to construct my results, I connect to MySQL and those two data sources. Basically we have two points of indexing:
- Using DIH for one-time indexing
- At the application, whenever there is a transaction on the details we store in Solr.
--
View this message in context: http://lucene.472066.n3.nabble.com/Multi-core-support-for-indexing-multiple-servers-tp4099729p4099933.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
All the best
Liu Bo
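[Editor's sketch] Erick's preference for SolrJ when combining sources comes down to doing the join in application code before anything reaches Solr. Below is a Solr-free sketch of that merge step; the source and field names are made up for illustration, and in a real SolrJ indexer each merged map would become a SolrInputDocument passed to server.add():

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class SourceMerger {
    // Merge rows from several sources, keyed by document id.
    // Later sources add fields to (or override fields of) earlier ones,
    // mirroring how DIH sub-entities enrich a root entity.
    @SafeVarargs
    public static Map<String, Map<String, Object>> merge(
            Map<String, Map<String, Object>>... sources) {
        Map<String, Map<String, Object>> docs = new LinkedHashMap<>();
        for (Map<String, Map<String, Object>> source : sources) {
            for (Map.Entry<String, Map<String, Object>> row : source.entrySet()) {
                docs.computeIfAbsent(row.getKey(), k -> new HashMap<>())
                    .putAll(row.getValue());
            }
        }
        return docs;
    }
}
```

Doing the merge here, rather than in DIH's entity tree, also makes it easy to fetch the sources in parallel threads before the merge, which is the parallelism Erick alludes to.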