Re: ConcurrentUpdateSolrServer Missing ContentType error on SOLR 4.2.1
> I apologize for intruding, Shawn, do you know what can cause empty params (i.e. params={})?

I've got no idea what is causing this problem on your system. All of the ideas I've had so far don't seem to apply. Can you run a packet sniffer on your client to see whether the client is sending the right info?

Thanks,
Shawn
Re: update to 4.3
Any tips on what to do with the configuration files? Where do I have to store them and what should they look like? Any examples?

May 07, 2013 6:16:27 AM org.apache.catalina.core.AprLifecycleListener init
INFO: The APR based Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path: /usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
May 07, 2013 6:16:28 AM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler [http-bio-8983]
May 07, 2013 6:16:28 AM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler [ajp-bio-8009]
May 07, 2013 6:16:28 AM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 621 ms
May 07, 2013 6:16:28 AM org.apache.catalina.core.StandardService startInternal
INFO: Starting service Catalina
May 07, 2013 6:16:28 AM org.apache.catalina.core.StandardEngine startInternal
INFO: Starting Servlet Engine: Apache Tomcat/7.0.39
May 07, 2013 6:16:28 AM org.apache.catalina.startup.HostConfig deployWAR
INFO: Deploying web application archive /usr/local/apache-tomcat-7.0.39/webapps/solr.war
log4j:WARN No appenders could be found for logger (org.apache.solr.servlet.SolrDispatchFilter).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
May 07, 2013 6:16:33 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /usr/local/apache-tomcat-7.0.39/webapps/host-manager
May 07, 2013 6:16:33 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /usr/local/apache-tomcat-7.0.39/webapps/docs
May 07, 2013 6:16:33 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /usr/local/apache-tomcat-7.0.39/webapps/manager
May 07, 2013 6:16:34 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /usr/local/apache-tomcat-7.0.39/webapps/ROOT
May 07, 2013 6:16:34 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /usr/local/apache-tomcat-7.0.39/webapps/examples
May 07, 2013 6:16:34 AM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler [http-bio-8983]
May 07, 2013 6:16:34 AM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler [ajp-bio-8009]
May 07, 2013 6:16:34 AM org.apache.catalina.startup.Catalina start
INFO: Server startup in 6000 ms

BR,
Arkadi

On 05/06/2013 10:13 PM, Jan Høydahl wrote:
Hi,

The reason is that from Solr 4.3 you need to provide the SLF4J logger jars of choice when deploying Solr to an external servlet container. Simplest is to copy all jars from example/lib/ext into tomcat/lib:

cd solr-4.3.0/example/lib/ext
cp * /usr/local/apache-tomcat-7.0.39/lib/

Please see CHANGES.txt for more info: http://lucene.apache.org/solr/4_3_0/changes/Changes.html#4.3.0.upgrading_from_solr_4.2.0

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 6 May 2013 at 16:50, Arkadi Colson ark...@smartbit.be wrote:

Hi

After update to 4.3 I got this error:

May 06, 2013 2:30:08 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler [http-bio-8983]
May 06, 2013 2:30:08 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler [ajp-bio-8009]
May 06, 2013 2:30:08 PM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 610 ms
May 06, 2013 2:30:08 PM org.apache.catalina.core.StandardService startInternal
INFO: Starting service Catalina
May 06, 2013 2:30:08 PM org.apache.catalina.core.StandardEngine startInternal
INFO: Starting Servlet Engine: Apache Tomcat/7.0.39
May 06, 2013 2:30:08 PM org.apache.catalina.startup.HostConfig deployWAR
INFO: Deploying web application archive /usr/local/apache-tomcat-7.0.39/webapps/solr.war
May 06, 2013 2:30:45 PM org.apache.catalina.util.SessionIdGenerator createSecureRandom
INFO: Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [36,697] milliseconds.
May 06, 2013 2:30:45 PM org.apache.catalina.core.StandardContext startInternal
SEVERE: Error filterStart
May 06, 2013 2:30:45 PM org.apache.catalina.core.StandardContext startInternal
SEVERE: Context [/solr] startup failed due to previous errors
May 06, 2013 2:30:45 PM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /usr/local/apache-tomcat-7.0.39/webapps/host-manager
May 06, 2013 2:30:45 PM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /usr/local/apache-tomcat-7.0.39/webapps/docs
May 06, 2013 2:30:45 PM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /usr/local/apache-tomcat-7.0.39/webapps/manager
May 06, 2013 2:30:45 PM org.apache.catalina.startup.HostConfig
Re: update to 4.3
Found it on http://wiki.apache.org/solr/SolrLogging! Thx

On 05/07/2013 08:40 AM, Arkadi Colson wrote:
Any tips on what to do with the configuration files? Where do I have to store them and what should they look like? Any examples? [...]
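Spelling out what the SolrLogging wiki describes for a Tomcat deployment (the Tomcat path matches this thread; the log file location below is an assumption):

cd solr-4.3.0/example/lib/ext
cp * /usr/local/apache-tomcat-7.0.39/lib/
cp solr-4.3.0/example/resources/log4j.properties /usr/local/apache-tomcat-7.0.39/lib/

Anything on Tomcat's lib classpath is picked up, so the copied log4j.properties silences the "No appenders could be found" warnings. If you prefer to write the file yourself, a minimal sketch looks something like:

# send Solr logging to a rolling file
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/var/log/solr/solr.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=9
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p (%t) [%c] %m%n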
Re: When a search query comes to a replica what happens?
Hi Otis; I've read somewhere that if you have one replica and a 1000 queries-per-second search rate, and you switch to 5 replicas, each replica may end up serving around 200 qps. What do you think about that, and how does Solr parallelize searching across replicas? By the way, when you say replica, do you mean both replicas and the leader (because the leader is a replica too), or the nodes of a shard except for the leader?

2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com:
No.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Tue, Apr 16, 2013 at 6:23 PM, Furkan KAMACI furkankam...@gmail.com wrote:
All in all, will the replica ask its leader where the remaining data is, or will it ask Zookeeper directly?

2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com:
Hi,

No, I believe redirect from replica to leader would happen only at index time, so a doc first gets indexed to the leader and from there it's replicated to non-leader shards. At query time there is no redirect to leader, I imagine, as that would quickly turn leaders into hotspots.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Tue, Apr 16, 2013 at 6:01 PM, Furkan KAMACI furkankam...@gmail.com wrote:
I want to make it clear in my mind: when a search query comes to a replica, what happens?
- Does it forward the search query to the leader, which collects all the data and prepares the response (this would cause a performance issue because the leader is responsible for indexing at the same time)? Or
- does the replica communicate with the leader to learn where the remaining data is (the leader asks Zookeeper and tells the replica), and then the replica collects all the data and builds the response itself?
RE: Solr Cloud with large synonyms.txt
We have synonym files bigger than 5MB, so even with compression that would probably be failing (not using solr cloud yet).

Roman

On 6 May 2013 23:09, David Parks davidpark...@yahoo.com wrote:
Wouldn't it make more sense to only store a pointer to a synonyms file in zookeeper? Maybe just make the synonyms file accessible via http so other boxes can copy it if needed? Zookeeper was never meant for storing significant amounts of data.

-----Original Message-----
From: Jan Høydahl [mailto:jan@cominvent.com]
Sent: Tuesday, May 07, 2013 4:35 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr Cloud with large synonyms.txt

See discussion here: http://lucene.472066.n3.nabble.com/gt-1MB-file-to-Zookeeper-td3958614.html

One idea was compression. Perhaps if we add gzip support to SynonymFilter it can read synonyms.txt.gz, which would then fit larger raw dicts?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 6 May 2013 at 18:32, Son Nguyen s...@trancorp.com wrote:

Hello,

I'm building a Solr Cloud (version 4.1.0) with 2 shards and a Zookeeper (the Zookeeper is on a different machine, version 3.4.5). I've tried to start with a 1.7MB synonyms.txt, but got a ConnectionLossException:

Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /configs/solr1/synonyms.txt
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1266)
        at org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:270)
        at org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:267)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:65)
        at org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:267)
        at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:436)
        at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:315)
        at org.apache.solr.cloud.ZkController.uploadToZK(ZkController.java:1135)
        at org.apache.solr.cloud.ZkController.uploadConfigDir(ZkController.java:955)
        at org.apache.solr.core.CoreContainer.initZooKeeper(CoreContainer.java:285)
        ... 43 more

I did some research on the internet and found out that it's because the Zookeeper znode size limit is 1MB. I tried to increase the system property jute.maxbuffer but it won't work. Does anyone have experience dealing with it?

Thanks,
Son
Re: Rearranging Search Results of a Search?
Can I use Transformers for my purpose?

2013/5/3 Furkan KAMACI furkankam...@gmail.com:
I think this looks like what I'm searching for: https://issues.apache.org/jira/browse/SOLR-4465
How about a post filter for Lucene, can it help me for my purpose?

2013/5/3 Otis Gospodnetic otis.gospodne...@gmail.com:
Hi,

You should use search more often :)
http://search-lucene.com/?q=scriptable+collector&sort=newestOnTop&fc_project=Solr&fc_type=issue

Coincidentally, what you see there happens to be a good example of a Solr component that does something behind the scenes to deliver those search results even though my original query was bad. Kind of similar to what you are after.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Thu, May 2, 2013 at 4:47 PM, Furkan KAMACI furkankam...@gmail.com wrote:
I know that I can use boosting at query time for a field or a search term, at solrconfig.xml, and the query elevator, so I can arrange the results of a search. However, after I get the top documents, how can I change the order of the results? Does Lucene's post filter stand for that?
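A brief note on the post-filter question: a Solr PostFilter runs after the main query and can only exclude documents from the result set; it does not reorder them, so re-ranking the top N is usually done in a custom SearchComponent instead. For reference, a minimal Solr 4.x post-filter skeleton might look like the sketch below; the class name and the accept() logic are placeholders, not anything from this thread.

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.solr.search.DelegatingCollector;
import org.apache.solr.search.ExtendedQueryBase;
import org.apache.solr.search.PostFilter;

public class ExamplePostFilter extends ExtendedQueryBase implements PostFilter {

    @Override
    public boolean getCache() {
        return false; // post filters must not be cached
    }

    @Override
    public int getCost() {
        return Math.max(super.getCost(), 100); // cost >= 100 marks it as a post filter
    }

    @Override
    public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
        return new DelegatingCollector() {
            @Override
            public void collect(int doc) throws IOException {
                if (accept(doc)) {      // per-document yes/no decision
                    super.collect(doc); // pass the doc down the collector chain
                }
            }

            private boolean accept(int doc) {
                return true; // placeholder: custom per-document test goes here
            }
        };
    }
}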
Re: Solr Cloud with large synonyms.txt
Hi,

SolrCloud is designed with an assumption that you should be able to upload your whole disk-based conf folder into ZK, and that you should be able to add an empty Solr node to a cluster and it would download all config from ZK. So immediately a splitting strategy automatically handled by ZkSolrResourceLoader for large files could be one way forward, i.e. store synonyms.txt as e.g.

__001_synonyms.txt
__002_synonyms.txt

Feel free to open a JIRA issue for this so we can get a proper resolution.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 7 May 2013 at 09:55, Roman Chyla roman.ch...@gmail.com wrote:
We have synonym files bigger than 5MB, so even with compression that would probably be failing (not using solr cloud yet). [...]
Re: Delete from Solr Cloud 4.0 index..
Hi Erick,

Thanks for the tip. Will docValues help with memory usage? It seemed a bit complicated to set up. The index size saving was nice because it means that potentially I could use smaller provisioned IOPS volumes, which cost less.

Thanks.

On 3 May 2013 18:27, Erick Erickson erickerick...@gmail.com wrote:
Anette:

Be a little careful with the index size savings, they really don't mean much for _searching_. The stored field compression significantly reduces the size on disk, but only for the stored data, which is only accessed when returning the top N docs. In terms of how many docs you can fit on your hardware, it's pretty irrelevant.

The *.fdt and *.fdx files in your index directory contain the stored data, so when looking at the effects of various options (including compression), you can pretty much ignore these files.

FWIW,
Erick

On Fri, May 3, 2013 at 2:03 AM, Annette Newton annette.new...@servicetick.com wrote:
Thanks Shawn. I have played around with soft commits before and didn't seem to have any improvement, but with the current load testing I am doing I will give it another go.

I have researched docValues and came across the fact that it would increase the index size. With the upgrade to 4.2.1 the index size has reduced by approx 33%, which is pleasing, and I don't really want to lose that saving.

We do use the facet.enum method, which works really well, but I will verify that we are using it in every instance; we have numerous developers working on the product and maybe one or two have slipped through.

Right from the first I upped the zkClientTimeout to 30 as I wanted to give extra time for any network blips that we experience on AWS. We only seem to drop communication on a full garbage collection though.

I am coming to the conclusion that we need more shards to cope with the writes, so I will play around with adding more shards and see how I go.

I appreciate you having a look over our setup and the advice. Thanks again.

Netty.

On 2 May 2013 23:17, Shawn Heisey s...@elyograg.org wrote:
On 5/2/2013 4:24 AM, Annette Newton wrote:
Hi Shawn,

Thanks so much for your response. We basically are very write intensive and write throughput is pretty essential to our product. Reads are sporadic and actually function really well. We write on average (at the moment) 8-12 batches of 35 documents per minute. But we really will be looking to write more in the future, so need to work out scaling of solr and how to cope with more volume.

Schema (I have changed the names): http://pastebin.com/x1ry7ieW
Config: http://pastebin.com/pqjTCa7L

This is very clean. There's probably more you could remove/comment, but generally speaking I couldn't find any glaring issues. In particular, you have disabled autowarming, which is a major contributor to commit speed problems.

The first thing I think I'd try is increasing zkClientTimeout to 30 or 60 seconds. You can use the startup commandline or solr.xml; I would probably use the latter. Here's a solr.xml fragment that uses a system property or a 15 second default:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores" zkClientTimeout="${zkClientTimeout:15000}"
         hostPort="${jetty.port:}" hostContext="solr">

General thoughts, these changes might not help this particular issue: You've got autoCommit with openSearcher=true. This is a hard commit.
If it were me, I would set that up with openSearcher=false and either do explicit soft commits from my application or set up autoSoftCommit with a shorter timeframe than autoCommit.

This might simply be a scaling issue, where you'll need to spread the load wider than four shards. I know that there are financial considerations with that, and they might not be small, so let's leave that alone for now.

The memory problems might be a symptom/cause of the scaling issue I just mentioned. You said you're using facets, which can be a real memory hog even with only a few of them. Have you tried facet.method=enum to see how it performs? You'd need to switch to it exclusively, never go with the default of fc. You could put that in the defaults or invariants section of your request handler(s).

Another way to reduce memory usage for facets is to use disk-based docValues on version 4.2 or later for the facet fields, but this will increase your index size, and your index is already quite large. Depending on your index contents, the increase may be small or large.

Something to just mention: it looks like your solrconfig.xml has hard-coded absolute paths for dataDir and updateLog. This is fine if you'll only ever have one core/collection on each server, but it'll be a disaster if you have multiples. I could be wrong about how these get interpreted
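For concreteness, the commit setup Shawn describes would look something like the following in solrconfig.xml; the maxTime values are illustrative assumptions, not numbers from this thread:

<!-- hard commit: flush and fsync the index, but don't open a new searcher -->
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- soft commit: controls how quickly new documents become visible -->
<autoSoftCommit>
  <maxTime>15000</maxTime>
</autoSoftCommit>

With this shape, the expensive hard commits happen in the background without the cost of warming a new searcher, while visibility is governed by the cheaper soft commits.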
Re: solr adding unique values
Thanks Erick, for the reply! I know about 'set', but that's not my goal; I had to give a better example. I want this: if I add another list_c,

user a [ id:a liists [ list_a, list_b ] ]

should become:

user a [ id:a liists [ list_a, list_b, list_c ] ]

However, if I again add list_a, it should *not* become:

user a [ id:a liists [ list_a, list_b, list_c, list_a ] ]

I am *not* reindexing the documents.

On Mon, May 6, 2013, Erick Erickson wrote:
Depends on your goal here. I'm guessing you're using atomic updates, in which case you need to use set rather than add, as the former replaces the contents. See: http://wiki.apache.org/solr/UpdateJSON#Solr_4.0_Example

If you're simply re-indexing the documents, just send the entire fresh document to solr and it'll replace the earlier document completely.

Best
Erick

On Mon, May 6, 2013 at 1:44 PM, Nikhil Kumar nikhil.ku...@hashedin.com wrote:
Hey,

I have recently started using solr. I have a list of users which are subscribed to some lists, e.g.

user a [ id:a liists [ list_a ] ]
user b [ id:b liists [ list_a ] ]

I am using {id: a, lists:{add:list_a}} to add a particular list to a user. But what is happening is that if I use the same command again, it adds the same list again, which I want to avoid:

user a [ id:a liists [ list_a, list_a ] ]

I searched the documentation and tutorials, and found:
- overwrite = true | false — default is true, meaning newer documents will replace previously added documents with the same uniqueKey.
- commitWithin = (milliseconds) — if the commitWithin attribute is present, the document will be added within that time. Solr1.4. See CommitWithin: http://wiki.apache.org/solr/CommitWithin
- (deprecated) allowDups = true | false — default is false
- (deprecated) overwritePending = true | false — default is negation of allowDups
- (deprecated) overwriteCommitted = true | false — default is negation of allowDups

But using overwrite and allowDups didn't solve the problem either, seemingly because there is no unique id, just values. So the question is: how to solve this problem? (A sketch of the set-based workaround follows below.)

--
Thank You and Regards,
Nikhil Kumar
+91-9916343619
Technical Analyst
Hashed In Technologies Pvt. Ltd.
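Since the atomic "add" operation in Solr 4.x cannot deduplicate, the usual workaround along the lines Erick suggests is to read the document's current values, compute the union on the client, and write the whole field back with "set". A sketch, assuming a core at localhost:8983 and the field name from this thread:

# 1. fetch the current values
curl 'http://localhost:8983/solr/select?q=id:a&fl=liists&wt=json'

# 2. union in the new value client-side, then replace the whole field with set
curl 'http://localhost:8983/solr/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '[{"id": "a", "liists": {"set": ["list_a", "list_b", "list_c"]}}]'

Because set replaces the field contents entirely, sending the same union twice is harmless and no duplicates can accumulate.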
Lazy load Error on UI analysis area
Hi,

I was exploring the UI interface and in the analysis section I had a lazy load error. The logs say:

INFO - 2013-05-07 11:52:06.412; org.apache.solr.core.SolrCore; [] webapp=/solr path=/admin/luke params={_=1367923926380&show=schema&wt=json} status=0 QTime=23
ERROR - 2013-05-07 11:52:06.499; org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: lazy loading error
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:258)
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:931)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
        at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004)
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
        at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:312)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.common.SolrException: Error loading class 'solr.solr.FieldAnalysisRequestHandler'
        at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:464)
        at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:396)
        at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:518)
        at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:592)
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)
        ... 20 more
Caused by: java.lang.ClassNotFoundException: solr.solr.FieldAnalysisRequestHandler
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
        at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:789)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:266)
        at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:448)
        ... 24 more

Best regards
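Note what the ClassNotFoundException actually names: solr.solr.FieldAnalysisRequestHandler, i.e. the "solr." shorthand prefix appears twice. That strongly suggests a typo in the handler declaration in solrconfig.xml rather than a missing jar. A corrected declaration would presumably look like the stock one from the example config:

<requestHandler name="/analysis/field"
                startup="lazy"
                class="solr.FieldAnalysisRequestHandler" />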
Re: Search performance: shards or replications?
Hi,

It depends(TM) on what kind of search performance problems you are seeing. If you simply have so high a query load that the server starts to kneel, it will definitely not help to shard, since ALL the shards will still be hit with ALL the queries, and you add some extra overhead with sharding as well. But if your QPS is moderate and you have tons of documents, you may gain better performance both for indexing latency and search latency by sharding.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 7 May 2013 at 13:09, Stanislav Sandalnikov s.sandalni...@gmail.com wrote:

Hi,

We are moving to a SolrCloud architecture, and I have a question about search performance and its correlation with shards and replicas. What will be more efficient: to split the whole index into several shards, or to create several replicas of the index? Does parallel search work with both shards and replicas?

Please share your experience regarding this matter. Thanks in advance.

Regards,
Stanislav
Re: Search performance: shards or replications?
Hi Yan,

Thanks for the quick reply. Thus, replication seems to be the preferable solution. Does QTime decrease proportionally to the number of replicas, or are there other drawbacks? Just to clarify, what amount of documents counts as "tons of documents" in your opinion? :)

2013/5/7 Jan Høydahl jan@cominvent.com:
Hi,

It depends(TM) on what kind of search performance problems you are seeing. [...]
Re: Search performance: shards or replications?
P.S. Sorry for misspelling your name, Jan

2013/5/7 Stanislav Sandalnikov s.sandalni...@gmail.com:
Hi Yan,

Thanks for the quick reply. [...]
How to get Term Vector Information on Distributed Search
Hi,

I am using a distributed query to fetch records. The Distributed Search document on the wiki says this component supports distributed queries, but I'm getting an error while querying. Not sure if I am doing anything wrong. Below is my query to fetch term vectors with distributed search:

http://localhost:8080/solr/core1/tvrh?q=id:3426545&tv.all=true&f.text.tv.tf_idf=false&f.text.tv.df=false&tv.fl=text&shards=localhost:8080/solr/core1,localhost:8080/solr/core2,localhost:8080/solr/core3&shards.qt=select&debugQuery=on

Below is the error:

java.lang.NullPointerException
        at org.apache.solr.handler.component.TermVectorComponent.finishStage(TermVectorComponent.java:437)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:317)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:242)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:280)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:248)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:275)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161)
        at org.jboss.as.web.security.SecurityContextAssociationValve.invoke(SecurityContextAssociationValve.java:153)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:155)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:368)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:877)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:671)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:930)
        at java.lang.Thread.run(Unknown Source)

Please help me on this.

Thanks
Meghana
RE: How to get Term Vector Information on Distributed Search
hi - this is a known issue: https://issues.apache.org/jira/browse/SOLR-4479

-----Original message-----
From: meghana meghana.rav...@amultek.com
Sent: Tue 07-May-2013 14:28
To: solr-user@lucene.apache.org
Subject: How to get Term Vector Information on Distributed Search

[...]
Solr 1.4 - Proximity Search - Where is configuration for storing positions?
I have an index built using Solr 1.4 with one field. I was able to run a proximity search (Ex: word1 within5 word2), but nowhere in the configuration do I see any information about storing/indexing the positions or offsets of the terms. My understanding is that we need to store/index term vector positions/offsets for proximity search to work. Can someone please tell me if positions are indexed by default in Solr 1.4?

FYI, here is the configuration of the field in schema.xml (to keep it simple I am only including the fieldType and field definition):

<fieldtype class="solr.TextField" name="string" omitNorms="true" sortMissingLast="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.StopFilterFactory" enablePositionIncrements="true" ignoreCase="true" words="stop-words.txt" />
  </analyzer>
</fieldtype>

<field indexed="true" multiValued="false" name="contents" stored="true" type="string" />

Thanks
-kRider
RE: Solr 1.4 - Proximity Search - Where is configuration for storing positions?
Hi - they are indexed by default but can be omitted since 3.4: http://wiki.apache.org/solr/SchemaXml#Common_field_options

-----Original message-----
From: KnightRider ksu.wildc...@gmail.com
Sent: Tue 07-May-2013 14:41
To: solr-user@lucene.apache.org
Subject: Solr 1.4 - Proximity Search - Where is configuration for storing positions?

[...]
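To make the "can be omitted since 3.4" pointer concrete: term frequencies and positions are indexed by default, and on Solr 3.4+ you can switch them off per field when you never need phrase or proximity queries against it. A hypothetical declaration reusing the field from this thread:

<field name="contents" type="string" indexed="true" stored="true"
       omitTermFreqAndPositions="true" />

With omitTermFreqAndPositions="true" the index shrinks, but proximity and phrase searches against that field will no longer work, so it is the wrong setting for the use case in this thread.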
Re: Search performance: shards or replications?
Some clarifications:

1) *lots of docs, few queries*: If you have a high number of documents (dozens of millions or more) and a lowish number of queries per second (say less than 10), replicas will not help to reduce the QTime. For this kind of task it is better to shard the index, as each query will effectively be processed in parallel by N shards, thus reducing QTime.

2) *few docs, lots of queries*: with less than 10M docs and 30+ qps, on the contrary, you want more replicas to handle more traffic and avoid overloaded servers (which would increase the QTime).

3) *lots of docs, lots of queries*: do both sharding and replicas.

Actual numbers depend on the hardware, the type of docs and queries, etc. The best is to benchmark your setup varying the load, so that you can trace a hockey stick graph of QTime versus qps. Feel free to ask for details if needed.

André

On 05/07/2013 01:56 PM, Stanislav Sandalnikov wrote:
Hi Yan,

Thanks for the quick reply. [...]

--
André Bois-Crettez
Search technology, Kelkoo
http://www.kelkoo.com/
custom facet.sort
I have a string field containing values such as 1khz, 1ghz, 1mhz, etc. I use this field to show a facet; currently I'm showing results in facet.sort=count order. Now I'm asked to reorder the facet according to the unit of measure (khz/mhz/ghz). I also have 3-4 other custom sort orders to implement. Is it possible to plug in a custom Java class to provide custom facet.sort modes?

Thank you
Giovanni
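As far as the stock code goes, facet.sort in Solr 4.x only supports count and index order, with no plugin point for custom orderings, so the usual workaround is to re-sort the returned facet values on the client. A minimal sketch, assuming labels of the form "<number><unit>" as in this question:

import java.util.*;

public class FacetUnitSort {
    // multipliers in Hz for the units mentioned in the question
    private static final Map<String, Long> UNIT = new HashMap<String, Long>();
    static {
        UNIT.put("khz", 1000L);
        UNIT.put("mhz", 1000000L);
        UNIT.put("ghz", 1000000000L);
    }

    // parse a label like "1mhz" into a comparable value in Hz; unknown labels sort last
    static long toHz(String label) {
        String s = label.trim().toLowerCase(Locale.ROOT);
        for (Map.Entry<String, Long> e : UNIT.entrySet()) {
            if (s.endsWith(e.getKey())) {
                String number = s.substring(0, s.length() - e.getKey().length()).trim();
                try {
                    return Long.parseLong(number) * e.getValue();
                } catch (NumberFormatException nfe) {
                    return Long.MAX_VALUE;
                }
            }
        }
        return Long.MAX_VALUE;
    }

    // re-sort the facet value labels returned by Solr (e.g. from FacetField.getValues())
    public static void sortByFrequency(List<String> facetValues) {
        Collections.sort(facetValues, new Comparator<String>() {
            public int compare(String a, String b) {
                long ha = toHz(a), hb = toHz(b);
                return ha < hb ? -1 : (ha > hb ? 1 : 0);
            }
        });
    }
}

Each additional custom sort mode then becomes one more comparator on the client; the counts themselves come back from Solr unchanged.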
RE: Solr 1.4 - Proximity Search - Where is configuration for storing positions?
Thanks Markus.

-
Thanks
-K'Rider
SOLR query performance
Dear All,

I am using Apache SOLR 3.6.2 as the search engine for a job site. I am observing that a solr query takes around 15 seconds to complete. I am sure there is something wrong in my approach or I am doing the indexing wrongly. I need assistance/pointers to resolve this issue. I am providing a detailed background of the work I have done; kindly give me some pointers on how to resolve this.

I am using the Drupal 7.15 framework for the job site, with Apache solr 3.6.2 as my search engine. When a user registers his profile, I create a node (page) and attach the document to that node. Every hour I run the cron task and index new or updated nodes. When an employer searches for keywords, say java, mysql, php etc., I use the APIs provided by Drupal to interact with SOLR and get the documents that contain keywords such as java, mysql, drupal etc.

There is a rows filter. If I specify rows as 100 or 200, the query returns quickly (around half a second). If I specify rows as 3000, it takes around 15 seconds to return.

Now, my question is: is there any mechanism by which I can tell solr that my start row is X and rows is Y, so that it will return search results from the Xth row with Y number of rows? (Please note that this is similar to the LIMIT feature provided by mysql.)

Kindly let me know. This will help us to a great extent.

Best Regards
Kamal
Re: Solr Cloud with large synonyms.txt
On May 6, 2013, at 12:32 PM, Son Nguyen s...@trancorp.com wrote: I did some researches on internet and found out that because Zookeeper znode size limit is 1MB. I tried to increase the system property jute.maxbuffer but it won't work. Does anyone have experience of dealing with it? Perhaps hit up the ZK list? They doc it as simply raising jute.maxbuffer, though you have to do it for each ZK instance. - Mark
Re: SOLR query performance
Yes, that's what the 'start' and 'rows' parameters do in the query string. I would check the queries Solr sees when you do that long request. There is usually a delay in retrieving items further down the sorted list, but 15 seconds does feel excessive. http://wiki.apache.org/solr/CommonQueryParameters#start Regards, Alex. On Tue, May 7, 2013 at 10:10 AM, Kamal Palei palei.ka...@gmail.com wrote: Now, my question is, Is there any mechanism, I can tell to solr that, my start row is X, rows is Y, then it will return search result from Xth row with Y number of rows (Please note that this is similar with LIMIT stuff provided by mysql). Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
Re: Solr Cloud with large synonyms.txt
On May 7, 2013, at 10:24 AM, Mark Miller markrmil...@gmail.com wrote: On May 6, 2013, at 12:32 PM, Son Nguyen s...@trancorp.com wrote: I did some researches on internet and found out that because Zookeeper znode size limit is 1MB. I tried to increase the system property jute.maxbuffer but it won't work. Does anyone have experience of dealing with it? Perhaps hit up the ZK list? They doc it as simply raising jute.maxbuffer, though you have to do it for each ZK instance. - Mark the system property must be set on all servers and clients otherwise problems will arise. Make sure you try passing it both to ZK *and* to Solr. - Mark
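To spell out the "pass it to both" advice, the flag might be set along these lines; the 4MB value is an arbitrary example, not a recommendation from this thread:

# ZooKeeper side - zkServer.sh honors JVMFLAGS (e.g. set it in conf/java.env or zkEnv.sh):
export JVMFLAGS="-Djute.maxbuffer=4194304"
bin/zkServer.sh restart

# Solr under Tomcat - e.g. via CATALINA_OPTS:
export CATALINA_OPTS="$CATALINA_OPTS -Djute.maxbuffer=4194304"

# Solr with the bundled Jetty:
java -Djute.maxbuffer=4194304 -jar start.jar

Every ZK server and every Solr node must agree on the value; otherwise reads and writes of large znodes will fail inconsistently depending on which process handles them.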
Get Suggester to return same phrase as query
Hi,

I'm using the Suggester component in Solr, and if I search for "iPhone 5" the suggestions never give me that same phrase, "iPhone 5", back. Is there any way to alter this behaviour so that "iPhone 5" is returned as well? A backup option could be to always display what the user has entered in the UI, but I want it to be displayed *only* if there are results for it in Solr, which is only possible if Solr returns the term.

Rounak
Re: SOLR query performance
Thanks a lot Alex. I will go and try to make use of the start filter and update.

Meantime, I need to know how many total matching records there are. Example: let's say I am searching for the keyword java. There might be 1000 documents containing the java keyword. I need to show only 100 records at a time. When I query, as the query result I need to know the total number of records, plus the data for only 100 records. At the bottom of the web page, I am showing something like

Prev 1 2 3 4 5 6 7 8 9 10 Next

When the user clicks 4, I will set the start filter as 300 and the rows filter as 100 and do the query. As the query result, I am expecting a row count of 1000, and the data for 100 records (row numbers 301 to 400). Is this possible? Alex, kindly guide me.

Thanks
Kamal

On Tue, May 7, 2013 at 7:55 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:
Yes, that's what the 'start' and 'rows' parameters do in the query string. [...]
Re: SOLR query performance
On 5/7/2013 8:45 AM, Kamal Palei wrote: When user clicks, 4, I will set start filter as 300, rows filter as 100 and do the query. As query result, I am expecting row count as 1000, and 100 records data (row number 301 to 400). This is what using the start and rows parameter with Solr will do. A nitpick: It will be row number 300 to 399 - the first page is accessed with start=0. Requesting 3000 rows (or even a start value of 3000) should not take 15 seconds. You should review this wiki page that I wrote for possible problems with your install: http://wiki.apache.org/solr/SolrPerformanceProblems One thing that is not on the wiki page, I will need to add it: Solr performs best when it is the only thing running on a server. Other applications (like a web server running Drupal) compete for resources. Performance of both Solr and the other applications will suffer. For low-volume installations on really good hardware this may not be a problem, but if your volume is high and/or your server is undersized, then sharing is not a good idea. Thanks, Shawn
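As a concrete illustration of the start/rows paging Alex and Shawn describe (the host and handler path are assumptions):

http://localhost:8983/solr/select?q=java&start=300&rows=100

This returns page 4 at 100 results per page, i.e. rows 300-399 of the result set. The total hit count comes back with every response regardless of rows; in the XML response writer it appears as:

<result name="response" numFound="1000" start="300">

so a single query gives you both the page of documents and the numFound needed to draw the pager.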
Storing positions and offsets vs FieldType IndexOptions DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
I see that Lucene 4.x has FieldInfo.IndexOptions, which can be used to tell lucene whether to index documents/frequencies/positions/offsets. We are in the process of upgrading from Lucene 2.9 to Lucene 4.x, and I was wondering if there was a way to tell lucene whether to index docs/freqs/pos/offsets in the older versions (2.9), or did it always index positions and offsets by default?

Also, I see that Lucene 4.x has FieldType.setStoreTermVectorPositions and FieldType.setStoreTermVectorOffsets. Can someone please tell me a use case for storing positions and offsets in the index? Is it necessary to store term vector positions and offsets when using IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS?

Thanks
-kRider
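For reference, in Lucene 4.x the postings-level options and the term-vector options are set on the FieldType independently, so storing term vector positions/offsets is not required just because the postings index them. A sketch of a field that does both (the field name and helper class are placeholders):

import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.FieldInfo;

public class FieldTypeExample {
    public static Field makeField(String name, String text) {
        FieldType ft = new FieldType();
        ft.setIndexed(true);
        ft.setTokenized(true);
        ft.setStored(true);
        // postings-level positions/offsets (used e.g. by the postings highlighter)
        ft.setIndexOptions(FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
        // term-vector-level positions/offsets (used e.g. by the term vector highlighter)
        ft.setStoreTermVectors(true);
        ft.setStoreTermVectorPositions(true);
        ft.setStoreTermVectorOffsets(true);
        ft.freeze(); // make the FieldType immutable before use
        return new Field(name, text, ft);
    }
}

On the 2.9 question: if memory serves, older Lucene always indexed positions for indexed, tokenized fields unless you called setOmitTermFreqAndPositions(true) on the field, and offsets only existed inside term vectors; offsets in the postings themselves were introduced in 4.0.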
Re: Search performance: shards or replications?
Thank you, everything seems clear.

On 07.05.2013 20:17, Andre Bois-Crettez andre.b...@kelkoo.com wrote:
Some clarifications:

1) *lots of docs, few queries*: [...]
Re: FieldCache insanity with field used as facet and group
: I am using the Lucene FieldCache with SolrCloud and I have insane instances
: with messages like:

FWIW: I'm the one that named the result of these sanity checks FieldCacheInsanity, and I have regretted it ever since -- a better label would have been "inconsistency".

: VALUEMISMATCH: Multiple distinct value objects for
: SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)+merchantid
: 'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'='merchantid',class
: org.apache.lucene.index.SortedDocValues,0.5=org.apache.lucene.search.FieldCacheImpl$SortedDocValuesImpl#557711353
: 'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'='merchantid',int,null=org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713
: 'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'='merchantid',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713
:
: All insane instances are for a field merchantid of type int used as facet
: and group field.

Interesting: it appears that the grouping code and the facet code are not being consistent in how they are building the field cache, so you are getting two objects in the cache for each segment.

I haven't checked if this happens much with the example configs, but if you could: please file a bug with the details of which Solr version you are using, along with the schema fieldType and field declarations for your merchantid field, along with the mbean stats output showing the field cache insanity after executing two queries like...

/select?q=*:*&facet=true&facet.field=merchantid
/select?q=*:*&group=true&group.field=merchantid

(that way we can rule out your custom SearchComponent as having a bug in it)

: This insanity can have performance impact?
: How can I fix it?

The impact is just that more RAM is being used than is probably strictly necessary. Unless there is something unusual in your fieldType declaration, I don't think there is an easy fix you can apply -- we need to fix the underlying code.

-Hoss
RE: Solr Cloud with large synonyms.txt
Mark, I tried to set that property on both ZK (I have only one ZK instance) and Solr, but it still didn't work. But I read somewhere that ZK is not really designed for keeping large data files, so this solution - increasing jute.maxbuffer (if I can implement it) - should be just temporary. Son -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Tuesday, May 07, 2013 9:35 PM To: solr-user@lucene.apache.org Subject: Re: Solr Cloud with large synonyms.txt On May 7, 2013, at 10:24 AM, Mark Miller markrmil...@gmail.com wrote: On May 6, 2013, at 12:32 PM, Son Nguyen s...@trancorp.com wrote: I did some research on the internet and found out that it's because the Zookeeper znode size limit is 1MB. I tried to increase the system property jute.maxbuffer but it won't work. Does anyone have experience of dealing with it? Perhaps hit up the ZK list? They doc it as simply raising jute.maxbuffer, though you have to do it for each ZK instance. - Mark The system property must be set on all servers and clients, otherwise problems will arise. Make sure you try passing it both to ZK *and* to Solr. - Mark
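[Editor's note: jute.maxbuffer is a plain JVM system property, so it has to be passed at startup of every ZooKeeper server *and* every client JVM (i.e. every Solr node). One common way, with an arbitrary 4 MB value:

    # ZooKeeper side (e.g. exported before running zkServer.sh)
    SERVER_JVMFLAGS="-Djute.maxbuffer=4194304"

    # Solr side (example Jetty start)
    java -Djute.maxbuffer=4194304 -jar start.jar

If any one participant is missed, servers and clients disagree on the limit and, as Mark notes, problems will arise.]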
RE: Solr Cloud with large synonyms.txt
Jan, Thank you for your answer. I've opened a JIRA issue with your suggestion. https://issues.apache.org/jira/browse/SOLR-4793 Son -Original Message- From: Jan Høydahl [mailto:jan@cominvent.com] Sent: Tuesday, May 07, 2013 4:16 PM To: solr-user@lucene.apache.org Subject: Re: Solr Cloud with large synonyms.txt Hi, SolrCloud is designed with an assumption that you should be able to upload your whole disk-based conf folder into ZK, and that you should be able to add an empty Solr node to a cluster and have it download all config from ZK. So a splitting strategy automatically handled by ZkSolrResourceLoader for large files could be one way forward, i.e. store synonyms.txt as e.g. __001_synonyms.txt __002_synonyms.txt Feel free to open a JIRA issue for this so we can get a proper resolution. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 7 May 2013 at 09:55, Roman Chyla roman.ch...@gmail.com wrote: We have synonym files bigger than 5MB, so even with compression that would probably be failing (not using Solr Cloud yet). Roman On 6 May 2013 23:09, David Parks davidpark...@yahoo.com wrote: Wouldn't it make more sense to only store a pointer to a synonyms file in Zookeeper? Maybe just make the synonyms file accessible via HTTP so other boxes can copy it if needed? Zookeeper was never meant for storing significant amounts of data. -Original Message- From: Jan Høydahl [mailto:jan@cominvent.com] Sent: Tuesday, May 07, 2013 4:35 AM To: solr-user@lucene.apache.org Subject: Re: Solr Cloud with large synonyms.txt See discussion here: http://lucene.472066.n3.nabble.com/gt-1MB-file-to-Zookeeper-td3958614.html One idea was compression. Perhaps if we add gzip support to SynonymFilter it can read synonyms.txt.gz, which would then fit larger raw dictionaries? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 6 May 2013 at 18:32, Son Nguyen s...@trancorp.com wrote: Hello, I'm building a Solr Cloud (version 4.1.0) with 2 shards and a Zookeeper (the Zookeeper is on a different machine, version 3.4.5). I've tried to start with a 1.7MB synonyms.txt, but got a ConnectionLossException:
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /configs/solr1/synonyms.txt
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
  at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1266)
  at org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:270)
  at org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:267)
  at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:65)
  at org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:267)
  at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:436)
  at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:315)
  at org.apache.solr.cloud.ZkController.uploadToZK(ZkController.java:1135)
  at org.apache.solr.cloud.ZkController.uploadConfigDir(ZkController.java:955)
  at org.apache.solr.core.CoreContainer.initZooKeeper(CoreContainer.java:285)
  ... 43 more
I did some research on the internet and found out that it's because the Zookeeper znode size limit is 1MB. I tried to increase the system property jute.maxbuffer but it won't work. Does anyone have experience of dealing with it? Thanks, Son
stats cache
Hi, I am computing lots of stats as part of a query… looks like the Solr caching is not helping here… Does Solr cache the stats of a query? ./zahoor
facet.pivot limit
Hi, Is there a limit for facet.pivot like we have in facet.limit? ./zahoor
Re: Solr Cloud with large synonyms.txt
I'm not so worried about the large file in ZK issue myself. The concern is that you start storing and accessing lots of large files in ZK. This is not what it was made for, and everything stays in RAM, so they guard against this type of usage. We are talking about a config file that is loaded on core load, though. It's uploaded and read very rarely. On modern hardware and networks, making that file 5MB rather than 1MB is not going to ruin your day. It just won't. Solr does not use ZooKeeper heavily - in a steady-state cluster, it doesn't read or write from ZooKeeper at all to any degree that registers. I'm going to have to see problems loading these larger config files from ZooKeeper before I'm worried that it's a problem. - Mark On May 7, 2013, at 12:21 PM, Son Nguyen s...@trancorp.com wrote: Mark, I tried to set that property on both ZK (I have only one ZK instance) and Solr, but it still didn't work. But I read somewhere that ZK is not really designed for keeping large data files, so this solution - increasing jute.maxbuffer (if I can implement it) - should be just temporary. Son
Re: stats cache
Hi, Yes, in the query cache. You should see it in your monitoring tool or your Solr Stats Admin page. It doesn't help if queries don't repeat or if cache settings are poor. Otis -- Search Analytics - http://sematext.com/search-analytics/index.html SOLR Performance Monitoring - http://sematext.com/spm/index.html On Tue, May 7, 2013 at 12:48 PM, J Mohamed Zahoor zah...@indix.com wrote: Hi, I am computing lots of stats as part of a query… looks like the Solr caching is not helping here… Does Solr cache the stats of a query? ./zahoor
Use case for storing positions and offsets in index?
Can someone please tell me the use case for storing term positions and offsets in the index? I am trying to understand the difference between storing positions/offsets vs indexing positions/offsets. Thanks, K'Rider
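[Editor's note: for anyone finding this later, both variants can be expressed in a 4.x schema. A sketch, with a hypothetical field named body - the termVectors family stores positions/offsets per document in term vectors (used e.g. by highlighting), while storeOffsetsWithPositions (Solr 4.1+) records offsets in the postings list itself:

    <!-- positions/offsets stored in term vectors -->
    <field name="body" type="text_general" indexed="true" stored="true"
           termVectors="true" termPositions="true" termOffsets="true"/>

    <!-- offsets indexed into the postings themselves -->
    <field name="body" type="text_general" indexed="true" stored="true"
           storeOffsetsWithPositions="true"/>]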
Re: Delete from Solr Cloud 4.0 index..
bq: Will docValues help with memory usage? I'm still a bit fuzzy on all the ramifications of DocValues, but I somewhat doubt they'll result in index size savings; they _really_ help with loading the values for a field, but the end result is still the values in memory. People who know what they're talking about, _please_ correct this if I'm off base. Sure, stored field compression will help with disk space, no question. I was mostly cautioning against extrapolating from disk size to memory requirements without taking this into account. Best Erick On Tue, May 7, 2013 at 6:46 AM, Annette Newton annette.new...@servicetick.com wrote: Hi Erick, Thanks for the tip. Will docValues help with memory usage? It seemed a bit complicated to set up. The index size saving was nice because it means that potentially I could use smaller provisioned IOPS volumes, which cost less. Thanks. On 3 May 2013 18:27, Erick Erickson erickerick...@gmail.com wrote: Annette: Be a little careful with the index size savings; they really don't mean much for _searching_. The stored field compression significantly reduces the size on disk, but only for the stored data, which is only accessed when returning the top N docs. In terms of how many docs you can fit on your hardware, it's pretty irrelevant. The *.fdt and *.fdx files in your index directory contain the stored data, so when looking at the effects of various options (including compression), you can pretty much ignore these files. FWIW, Erick On Fri, May 3, 2013 at 2:03 AM, Annette Newton annette.new...@servicetick.com wrote: Thanks Shawn. I have played around with soft commits before and didn't seem to get any improvement, but with the current load testing I am doing I will give it another go. I have researched docValues and came across the fact that it would increase the index size. With the upgrade to 4.2.1 the index size has reduced by approx 33%, which is pleasing, and I don't really want to lose that saving. We do use the facet.enum method - which works really well - but I will verify that we are using it in every instance; we have numerous developers working on the product and maybe one or two have slipped through. Right from the first I upped the zkClientTimeout to 30 as I wanted to give extra time for any network blips that we experience on AWS. We only seem to drop communication on a full garbage collection though. I am coming to the conclusion that we need more shards to cope with the writes, so I will play around with adding more shards and see how I go. I appreciate you looking over our setup and the advice. Thanks again. Netty. On 2 May 2013 23:17, Shawn Heisey s...@elyograg.org wrote: On 5/2/2013 4:24 AM, Annette Newton wrote: Hi Shawn, Thanks so much for your response. We are basically very write intensive and write throughput is pretty essential to our product. Reads are sporadic and actually functioning really well. We write on average (at the moment) 8-12 batches of 35 documents per minute. But we really will be looking to write more in the future, so we need to work out how to scale Solr to cope with more volume. Schema (I have changed the names): http://pastebin.com/x1ry7ieW Config: http://pastebin.com/pqjTCa7L This is very clean. There's probably more you could remove/comment, but generally speaking I couldn't find any glaring issues. In particular, you have disabled autowarming, which is a major contributor to commit speed problems. The first thing I think I'd try is increasing zkClientTimeout to 30 or 60 seconds.
You can use the startup command line or solr.xml; I would probably use the latter. Here's a solr.xml fragment that uses a system property or a 15 second default:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores" zkClientTimeout="${zkClientTimeout:15000}" hostPort="${jetty.port:}" hostContext="solr">

General thoughts, these changes might not help this particular issue: You've got autoCommit with openSearcher=true. This is a hard commit. If it were me, I would set that up with openSearcher=false and either do explicit soft commits from my application or set up autoSoftCommit with a shorter timeframe than autoCommit. This might simply be a scaling issue, where you'll need to spread the load wider than four shards. I know that there are financial considerations with that, and they might not be small, so let's leave that alone for now. The memory problems might be a symptom/cause of the scaling issue I just mentioned. You said you're using facets, which can be a real memory hog even with only a few of them. Have you tried facet.method=enum to see how it performs? You'd need to switch to it exclusively, never go with the default of fc. You
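[Editor's note: the commit arrangement Shawn suggests would look roughly like this in the updateHandler section of solrconfig.xml; both intervals are placeholders to tune against your write load:

    <autoCommit>
      <maxTime>60000</maxTime>
      <openSearcher>false</openSearcher>  <!-- hard commit: flush to disk, no new searcher -->
    </autoCommit>
    <autoSoftCommit>
      <maxTime>5000</maxTime>  <!-- soft commit: makes new documents visible cheaply -->
    </autoSoftCommit>]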
Re: Rearranging Search Results of a Search?
No, DocTransformers work on a single document at a time, which is pretty clear if you look at the methods you must implement. Really, you'd do yourself a favor by doing a little more research before asking questions; you might review: http://wiki.apache.org/solr/UsingMailingLists and consider that most of us are volunteers with limited time. So a little evidence that you're putting forth some effort before pinging the list would be well received. Best Erick On Tue, May 7, 2013 at 4:04 AM, Furkan KAMACI furkankam...@gmail.com wrote: Can I use Transformers for my purpose? 2013/5/3 Furkan KAMACI furkankam...@gmail.com I think this looks like what I am searching for: https://issues.apache.org/jira/browse/SOLR-4465 How about a post filter for Lucene, can it help me for my purpose? 2013/5/3 Otis Gospodnetic otis.gospodne...@gmail.com Hi, You should use search more often :) http://search-lucene.com/?q=scriptable+collector&sort=newestOnTop&fc_project=Solr&fc_type=issue Coincidentally, what you see there happens to be a good example of a Solr component that does something behind the scenes to deliver those search results even though my original query was bad. Kind of similar to what you are after. Otis -- Solr ElasticSearch Support http://sematext.com/ On Thu, May 2, 2013 at 4:47 PM, Furkan KAMACI furkankam...@gmail.com wrote: I know that I can use boosting at query time for a field, for a search term, at solrconfig.xml, and query elevation, so I can arrange the results of a search. However, after I get the top documents, how can I change the order of the results? Does Lucene's postfilter stand for that?
Re: Lazy load Error on UI analysis area
It looks like you have old jars in the classpath somewhere; class not found just shouldn't be happening. If this can be reproduced on a fresh install (and even better on a machine that's never had Solr installed) it would be something we'd need to pursue... Best Erick On Tue, May 7, 2013 at 6:56 AM, yriveiro yago.rive...@gmail.com wrote: Hi, I was exploring the UI interface and in the analysis section I had a lazy load error. The logs say:
INFO - 2013-05-07 11:52:06.412; org.apache.solr.core.SolrCore; [] webapp=/solr path=/admin/luke params={_=1367923926380&show=schema&wt=json} status=0 QTime=23
ERROR - 2013-05-07 11:52:06.499; org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: lazy loading error
  at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:258)
  at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
  at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:931)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
  at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004)
  at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
  at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:312)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.common.SolrException: Error loading class 'solr.solr.FieldAnalysisRequestHandler'
  at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:464)
  at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:396)
  at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:518)
  at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:592)
  at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)
  ... 20 more
Caused by: java.lang.ClassNotFoundException: solr.solr.FieldAnalysisRequestHandler
  at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
  at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
  at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:789)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:266)
  at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:448)
  ... 24 more
- Best regards
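[Editor's note: the doubled prefix in 'solr.solr.FieldAnalysisRequestHandler' suggests the handler declaration in this solrconfig.xml may have the shorthand written twice. For comparison, the stock 4.x declaration is:

    <requestHandler name="/analysis/field" startup="lazy" class="solr.FieldAnalysisRequestHandler" />

where the single solr. prefix is expanded by SolrResourceLoader to the org.apache.solr packages; a doubled prefix would produce exactly this ClassNotFoundException at lazy-load time.]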
RE: Unsubscribing from JIRA
For someone like me, who wants to follow dev discussions but not JIRA, having a separate mailing list subscription for each would be ideal. The incoming mail traffic would be cut drastically (for me, I get far more irrelevant emails from JIRA than from dev). -- MJ -Original Message- From: Raymond Wiker [mailto:rwi...@gmail.com] Sent: Wednesday, May 01, 2013 2:01 PM To: solr-user@lucene.apache.org Subject: Re: Unsubscribing from JIRA On May 1, 2013, at 19:07, johnmunir@aol.com wrote: Are you saying that because I'm subscribed to dev, which I am, that is why I'm getting JIRA mails too, and the only way I can stop JIRA mails is to unsubscribe from dev? I don't think so. I'm subscribed to other projects, both dev and user, and yet I do not receive JIRA mails. I'm pretty sure that's the case... I subscribed to dev, and got the JIRA mails. I unsubscribed from dev, and the JIRA mails stopped.
Re: Get Suggester to return same phrase as query
Hmmm, R. Muir did some work here: https://issues.apache.org/jira/browse/SOLR-3143, note that it's 4.0 or later. I haven't implemented this, but this is a common problem so if you do dig into it and get it to work (warning, I haven't a clue) it'd be a great contribution to the Wiki. Best Erick On Tue, May 7, 2013 at 10:41 AM, Rounak Jain rouna...@gmail.com wrote: Hi, I'm using the Suggester component in Solr, and if I search for iPhone 5 the suggestions never give me the same phrase, that is iPhone 5. Is there any way to alter this behaviour to return iPhone 5 as well? A backup option could be to always display what the user has entered in the UI, but I want it to be displayed *only *if there are results for it in Solr, which is only possible if Solr returns the term. Rounak
Re: Unsubscribing from JIRA
Email filters? I mean, you may have a point, but the cost of change at this moment is probably too high. Personal email filters, on the other hand, seem like an easy solution. Regards, Alex. On Tue, May 7, 2013 at 2:01 PM, johnmu...@aol.com wrote: For someone like me, who wants to follow dev discussions but not JIRA, having a separate mailing list subscription for each would be ideal. The incoming mail traffic would be cut drastically (for me, I get far more irrelevant emails from JIRA than from dev). Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
Search identifier fields containing blanks
Hello, I am about to index identifier fields containing blanks (shelfmarks), e.g. G 23/60 12. The field type is set to solr.string. To get the exact matching hit (the doc with the shelfmark mentioned above) the user must quote the search term. Is there a way to omit the quotes? Best, Silvio
Re: Storing positions and offsets vs FieldType IndexOptions DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
On 5/7/2013 9:50 AM, KnightRider wrote: I see that Lucene 4.x has FieldInfo.IndexOptions that can be used to tell lucene whether to Index Documents/Frequencies/Positions/Offsets. I really don't like giving unhelpful responses like this, but I don't think there's any other way to go. This is the solr-user mailing list. Most of the end-users here (and a few of the regulars, including myself) have very little experience with Lucene, even though Solr is a Lucene application and the source code is part of Lucene. There are a number of lucene-specific discussion places available: http://lucene.apache.org/core/discussion.html Thanks, Shawn
dataimport handler
In the data import handler I have multiple entities. Each one generates a date in the dataimport.properties i.e. entityname.last_index_time. How do I reference the specific entity time in my delta queries? Thanks Eric
Re: solr.LatLonType type vs solr.SpatialRecursivePrefixTreeFieldType
Hi Barani, This identical question was posed at the same time on StackOverflow, and I answered it there already: http://stackoverflow.com/questions/16407110/solr-4-2-solr-latlontype-type-vs-solr-spatialrecursiveprefixtreefieldtype/16409327#16409327 ~ David On 5/6/13 12:28 PM, bbarani bbar...@gmail.com wrote: Hi, I am currently using SOLR 4.2 to index geospatial data. I have configured my geospatial field as below:

<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
<field name="latlong" type="location" indexed="true" stored="false" multiValued="true"/>

I just want to make sure that I am using the correct SOLR class for performing geospatial search, since I am not sure which of the 2 classes (LatLonType vs SpatialRecursivePrefixTreeFieldType) will be supported by future versions of SOLR. I assume latlong is an upgraded version of SpatialRecursivePrefixTreeFieldType; can someone please confirm if I am right? Thanks, Barani
Re: ConcurrentUpdateSolrServer Missing ContentType error on SOLR 4.2.1
This is resolved. I switched in the 4.2.1 jars and also corrected a mismatch between the compile and runtime JDKs; for some reason the system was overriding my JAVA_HOME setting (6.1) and running the client with a 5.0 JVM. I did not have to use setParser. I did try running the 'new' 4.2.1 SolrJ client against SOLR 3.6 and got this error in the server log: 2013-05-07 16:14:34,835 WARN [org.apache.solr.handler.XmlUpdateRequestHandler] (http-0.0.0.0-18841-Processor15) Unknown attribute doc/field/@update so I've settled for separate 3.6 and 4.2.1 versions. Your info helped a lot, thanks Shawn. DK
Storing and retrieving Objects using ByteField
Hi, I need to store and retrieve some custom Java objects using Solr, and I have used ByteField and Java serialization for this. Using the embedded Jetty server I can see the byte data, but when I use the SolrJ API to retrieve the data it is not available. Details are below. My schema:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="value" type="byte" indexed="false" stored="true" multiValued="true" omitNorms="true"/>
<fieldtype name="byte" class="solr.ByteField"/>

My query using the Jetty embedded Solr server:

http://localhost:8983/solr/collection1/select?q=id:1843921115&wt=xml&indent=true

And I can see the following results in the browser:

<result name="response" numFound="1" start="0">
  <doc>
    <str name="id">1843921115</str>
    <arr name="value">
      <str>rO0ABXNyABNqYXZhLnV0aWwuQXJyYX..blahblahblah</str>
    </arr>
    <long name="_version_">1434407268842995712</long>
  </doc>
</result>

So it looks like the data is created properly. However, when I use SolrJ to retrieve this record like this:

ModifiableSolrParams params = new ModifiableSolrParams();
params.set("q", "id:1843921115");
QueryResponse response = server.query(params);
SolrDocument doc = response.getResults().get(0);
if (doc.getFieldValues("value") == null) System.out.println("data unavailable");

I can see that doc only has two fields, id and _version_, and the field value is never available. Any suggestions on what I have done wrong? Many thanks!
Index compatibility between Solr releases.
We have fairly large (on the order of 10s of TB) indices built using Solr 3.5. We are considering migrating to Solr 4.3 and were wondering what the policy is on maintaining backward compatibility of the indices. Will 4.3 work with my 3.5 indexes? Because of the large data size, I would ideally like to move new data to 4.3 and gradually re-index all the 3.5 indices. Thanks, - Skand.
Re: Index compatibility between Solr releases.
On 5/7/2013 3:11 PM, Skand Gupta wrote: We have fairly large (on the order of 10s of TB) indices built using Solr 3.5. We are considering migrating to Solr 4.3 and were wondering what the policy is on maintaining backward compatibility of the indices. Will 4.3 work with my 3.5 indexes? Because of the large data size, I would ideally like to move new data to 4.3 and gradually re-index all the 3.5 indices. Solr 4.x will read 3.x indexes with no problem. When Solr 5.x comes out, it will read 4.x indexes, but it will not read 3.x indexes. If the 4.x server does any updates on a 3.x index, it will write new segments in the new format, and if existing segments get merged, they will be in the new format. If you do an optimize in that situation, which would take forever with terabytes of data, Solr would convert the index format. Reindexing is MUCH better, but you've already stated that as a goal, so I won't mention any more about that. Due to advances and bugfixes, you might see some unusual behavior until you reindex. This happens due to changes in the way analyzers and query parsers work as compared to the way things worked on 3.5 when you built the index. The more complicated your analyzer chains are in your schema, the more likely you are to run into this. One thing that might be of immediate concern - in 4.0 and later, the forward slash is a special query character and must be escaped with a backslash. It is safe to send this escaped character to 3.5 as well. The utility method in SolrJ for escaping queries (ClientUtils#escapeQueryChars) has been updated to include the forward slash in newer SolrJ versions. Thanks, Shawn
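[Editor's note: a small SolrJ sketch of the escaping Shawn describes; the field name and value here are invented:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.util.ClientUtils;

// escapeQueryChars escapes +, -, :, whitespace, and (in newer SolrJ) the
// forward slash, so "docs/2013" becomes "docs\/2013"
String safe = ClientUtils.escapeQueryChars("docs/2013/report.pdf");
SolrQuery q = new SolrQuery("path:" + safe);]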
Re: dataimport handler
Using ${dih.entity_name.last_index_time} should work. Make sure you put it in quotes in your query. On Tue, May 7, 2013 at 12:07 PM, Eric Myers emy...@nabancard.com wrote: In the data import handler I have multiple entities. Each one generates a date in the dataimport.properties i.e. entityname.last_index_time. How do I reference the specific entity time in my delta queries? Thanks Eric -- Regards, Shalin Shekhar Mangar.
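[Editor's note: in context, a hypothetical delta setup for an entity named item would look roughly like this in data-config.xml; the table and column names are invented:

    <entity name="item" pk="id"
            query="SELECT * FROM item"
            deltaQuery="SELECT id FROM item WHERE last_modified &gt; '${dih.item.last_index_time}'"
            deltaImportQuery="SELECT * FROM item WHERE id = '${dih.delta.id}'">
    </entity>

${dih.item.last_index_time} resolves to that entity's own timestamp from dataimport.properties.]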
Re: stats cache
On Tue, May 7, 2013 at 12:48 PM, J Mohamed Zahoor zah...@indix.com wrote: Hi, I am computing lots of stats as part of a query… looks like the Solr caching is not helping here… Does Solr cache the stats of a query? No. Neither facet counts nor the stats part of a request are cached. The query cache only caches the top N docs (plus scores, if applicable) for a given query + filters. If the whole request is identical, then you can use an HTTP caching mechanism though. -Yonik http://lucidworks.com
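[Editor's note: Solr can emit the HTTP cache validators itself. A sketch of the relevant block, which lives inside <requestDispatcher> in solrconfig.xml (the max-age value is arbitrary):

    <httpCaching lastModifiedFrom="openTime" etagSeed="Solr">
      <cacheControl>max-age=30, public</cacheControl>
    </httpCaching>

With this, repeated identical requests can be answered with 304 Not Modified or served by any HTTP cache sitting in front of Solr.]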
Re: Index compatibility between Solr releases.
Thank you Shawn. This was detailed and very helpful. Skand.
Index corrupted detection from http get command.
Hello, I'm looking for a way to detect Solr index corruption using an HTTP GET command. I've looked at the /admin/ping and /admin/luke request handlers, but am not sure whether their status provides guarantees that everything is all right. The idea is to be able to tell a load balancer to take a given Solr instance out of rotation if its index is corrupted. Thanks Michel
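[Editor's note: for the load-balancer half of this, the 4.x ping handler can at least be toggled per node; a sketch of the solrconfig.xml declaration:

    <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
      <str name="healthcheckFile">server-enabled.txt</str>
    </requestHandler>

With healthcheckFile set, /admin/ping returns an error unless the file exists (and /admin/ping?action=enable or action=disable creates/removes it), which a load balancer can key off. Detecting actual segment corruption, though, is usually done offline with Lucene's CheckIndex tool rather than over HTTP.]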
Re: Storing positions and offsets vs FieldType IndexOptions DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
Thanks Shawn. I'll reach out to the Lucene discussion group. -K'Rider
Re: Questions about the performance of Solr
Thank you. However, fq is already in use. My guess is that it is slow because the data for 70 million reviews is contained in a single core. Do you have examples where performance starts to degrade beyond a certain number of documents?
Re: Search identifier fields containing blanks
: I am about to index identifier fields containing blanks (shelfmarks), e.g. G : 23/60 12 : The field type is set to solr.string. To get the exact matching hit (the doc : with the shelfmark mentioned above) the user must quote the search term. Is there : a way to omit the quotes? Whitespace has to be quoted when using the lucene QParser because it's a semantically significant character that means "end of boolean query clause". If you want to search for a literal string w/o needing any escaping, use the term QParser... {!term f=yourFieldName}G 23/60 12 Of course, if you are putting this in a URL (i.e. testing in a browser) it still needs to be URL escaped... /select?q={!term+f=yourFieldName}G+23/60+12 -Hoss
Re: Unsubscribing from JIRA
: Email filters? I mean, you may have a point, but the cost of change at : this moment is probably too high. Personal email filters, on the other : hand, seem like an easy solution. The reason for having JIRA notifications go to the dev list is that all of the comments and discussion in JIRA are the bulk of the discussion about developing Solr/Lucene. The goal is to make it easy to subscribe to one list and then be notified about everything related to the development efforts. As mentioned in a previous comment, the appropriate place to suggest policy changes to the dev list would be on the dev list -- but Alex's comment is probably what you are going to hear from most people. -Hoss
Re: Search identifier fields containing blanks
On Wed, May 8, 2013, at 02:07 AM, Chris Hostetter wrote: : I am about to index identifier fields containing blanks (shelfmarks), e.g. G : 23/60 12 : The field type is set to solr.string. To get the exact matching hit (the doc : with the shelfmark mentioned above) the user must quote the search term. Is there : a way to omit the quotes? Whitespace has to be quoted when using the lucene QParser because it's a semantically significant character that means "end of boolean query clause". If you want to search for a literal string w/o needing any escaping, use the term QParser... {!term f=yourFieldName}G 23/60 12 Of course, if you are putting this in a URL (i.e. testing in a browser) it still needs to be URL escaped... /select?q={!term+f=yourFieldName}G+23/60+12 I'm surprised you didn't offer the improvement (a technique I learned from you.. :-) ): /select?q={!term f=yourFieldName v=$productCode}&productCode=G 23/60 12 which allows you to present the code as a separate request parameter. Upayavira
Re: Scores dilemma after providing boosting with bq as same weigtage for 2 condition
ab_1eb83ef9bc0896:
0.17063755 = (MATCH) sum of:
  3.085E-4 = (MATCH) MatchAllDocsQuery, product of:
    3.085E-4 = queryNorm
  0.009742409 = (MATCH) product of:
    0.019484818 = (MATCH) sum of:
      0.016588148 = (MATCH) sum of:
        0.0034696688 = (MATCH) weight(articleTopic:Food^1.2 in 2441) [DefaultSimilarity], result of:
          0.0034696688 = score(doc=2441,freq=1.0 = termFreq=1.0), product of:
            0.0012905049 = queryWeight, product of:
              1.2 = boost
              2.6886134 = idf(docFreq=52556, maxDocs=284437)
              3.085E-4 = queryNorm
            2.6886134 = fieldWeight in 2441, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              2.6886134 = idf(docFreq=52556, maxDocs=284437)
              1.0 = fieldNorm(doc=2441)
        0.013118479 = (MATCH) weight(articleTopic:Office^1.2 in 2441) [DefaultSimilarity], result of:
          0.013118479 = score(doc=2441,freq=1.0 = termFreq=1.0), product of:
            0.0025093278 = queryWeight, product of:
              1.2 = boost
              5.2278857 = idf(docFreq=4147, maxDocs=284437)
              3.085E-4 = queryNorm
            5.2278857 = fieldWeight in 2441, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              5.2278857 = idf(docFreq=4147, maxDocs=284437)
              1.0 = fieldNorm(doc=2441)
      7.967604E-4 = (MATCH) product of:
        0.0051789423 = (MATCH) sum of:
          0.0017619515 = (MATCH) weight(subTopic:Protein in 2441) [DefaultSimilarity], result of:
            0.0017619515 = score(doc=2441,freq=1.0 = termFreq=1.0), product of:
              4.5981447E-4 = queryWeight, product of:
                3.8318748 = idf(docFreq=16753, maxDocs=284437)
                1.1999726E-4 = queryNorm
              3.8318748 = fieldWeight in 2441, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.8318748 = idf(docFreq=16753, maxDocs=284437)
                1.0 = fieldNorm(doc=2441)
          0.0034169909 = (MATCH) weight(subTopic:Printers in 2441) [DefaultSimilarity], result of:
            0.0034169909 = score(doc=2441,freq=1.0 = termFreq=1.0), product of:
              5.019797E-4 = queryWeight, product of:
                0.3 = boost
                4.18326 = idf(docFreq=11789, maxDocs=284437)
                3.085E-4 = queryNorm
              4.18326 = fieldWeight in 2441, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.18326 = idf(docFreq=11789, maxDocs=284437)
                1.0 = fieldNorm(doc=2441)
    0.5 = coord(3/6)
  0.16049515 = (MATCH) FunctionQuery(0.08/(3.16E-11*float(ms(const(136779840),date(pubDate)))+0.5)), product of:
    0.16049883 = 0.08/(3.16E-11*float(ms(const(136779840),date(pubDate)=1367847578000))+0.5)
    2500.0 = boost
    3.085E-4 = queryNorm
    ...
This is the explain description for the score coming up, but it is not in an easily understandable format... any pointer would be helpful; meanwhile I am looking into it to understand more.