Re: stress testing Solr 4.x
Hi Mark,

Usually I was stopping them with ctrl-c, but several times one of the servers was hung and had to be stopped with kill -9.

Thanks,
Alain

On Mon, Dec 10, 2012 at 5:09 AM, Mark Miller markrmil...@gmail.com wrote:

Hmmm... EOF on the segments file is odd... How were you killing the nodes? Just stopping them, or kill -9, or what?

- Mark

On Sun, Dec 9, 2012 at 1:37 PM, Alain Rogister alain.rogis...@gmail.com wrote:

Hi,

I re-ran my tests today after I updated Solr 4.1 to apply the patch.

First, the good news: it works. If I stop all three Solr servers and then restart one, it will try to find the other two for a while (about 3 minutes, I think), then give up, become the leader and start processing requests.

Now, the not-so-good news: I encountered several exceptions that seem to indicate two other issues. Here are the relevant bits.

1) The ZK session expiry problem: not sure what caused it, but I did a few Solr or ZK node restarts while the system was under load.

SEVERE: There was a problem finding the leader in zk:
org.apache.solr.common.SolrException: Could not get leader props
        at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:732)
        at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:696)
        at org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1095)
        at org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:265)
        at org.apache.solr.cloud.ZkController.access$100(ZkController.java:84)
        at org.apache.solr.cloud.ZkController$1.command(ZkController.java:184)
        at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:116)
        at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
        at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:90)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /collections/adressage/leaders/shard1
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
        at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:244)
        at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:241)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
        at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:241)
        at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:710)
        ... 10 more

SEVERE: :org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /overseer/queue/qn-
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
        at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:210)
        at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:207)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
        at org.apache.solr.common.cloud.SolrZkClient.create(SolrZkClient.java:207)
        at org.apache.solr.cloud.DistributedQueue.offer(DistributedQueue.java:229)
        at org.apache.solr.cloud.ZkController.publish(ZkController.java:824)
        at org.apache.solr.cloud.ZkController.publish(ZkController.java:797)
        at org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:258)
        at org.apache.solr.cloud.ZkController.access$100(ZkController.java:84)
        at org.apache.solr.cloud.ZkController$1.command(ZkController.java:184)
        at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:116)
        at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
        at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:90)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)

2) Data corruption of one core on 2 out of 3 Solr servers. This core failed to start due to the exceptions below, and both servers went into a seemingly endless loop of exponential retries. The fix was to stop both faulty servers, remove the data directory of this core and restart: replication then took place correctly. As above, I am not sure what exactly caused this to happen; no updates were taking place, only searches.

On server 1:

INFO: Closing directory:/Users/arogister/Dev/apache-solr-4.1-branch/solr
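Since the thread connects kill -9 to the EOF on the segments file, an escalating shutdown helper may help avoid the hard kill in the common case. This is a minimal sketch, not from the thread; the function name and default timeout are assumptions. The idea is to give Solr/Jetty a chance to close the index cleanly on SIGTERM, and only fall back to SIGKILL if the process is truly hung:

```shell
# Try a graceful SIGTERM first; escalate to kill -9 only if the process
# ignores it. kill -9 gives Lucene no chance to finish writing the
# segments file, which can leave a corrupt index behind.
stop_gracefully() {
  pid="$1"
  timeout="${2:-30}"                         # seconds to wait before escalating
  kill -TERM "$pid" 2>/dev/null || return 0  # process already gone
  i=0
  while [ "$i" -lt "$timeout" ]; do
    kill -0 "$pid" 2>/dev/null || return 0   # exited cleanly
    sleep 1
    i=$((i + 1))
  done
  kill -9 "$pid" 2>/dev/null                 # last resort; may corrupt the index
}
```

A hung JVM may still need the kill -9 path, but an index corrupted that way is then expected, which matches the recovery-by-replication behavior described above.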
Re: stress testing Solr 4.x
Great, thanks Mark! I'll test the fix and post my results.

Alain

On Saturday, December 8, 2012, Mark Miller wrote:

After some more playing around on 5x I have duplicated the issue. I'll file a JIRA issue for you and fix it shortly.

- Mark

On Dec 8, 2012, at 8:43 AM, Mark Miller markrmil...@gmail.com wrote:

Hmm… I've tried to replicate what looked like a bug from your report (3 Solr servers stop/start), but on 5x it works with no problem for me. It shouldn't be any different on 4x, but I'll try that next.

In terms of starting up Solr without a working ZooKeeper ensemble: it won't work currently. Cores won't be able to register with ZooKeeper and will fail to load. It would probably be nicer to come up in search-only mode and keep trying to reconnect to ZooKeeper - file a JIRA issue if you are interested.

On the zk data dir, see http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html#Ongoing+Data+Directory+Cleanup

- Mark

On Dec 7, 2012, at 10:22 PM, Mark Miller markrmil...@gmail.com wrote:

Hey, I'll try and answer this tomorrow. There is definitely an unreported bug in there that needs to be fixed for the restart-all-nodes case.

Also, a 404 generally happens when Jetty is starting or stopping - there are points where 404s can be returned. I'm not sure why else you'd see one. Generally we do retries when that happens.

- Mark

On Dec 7, 2012, at 1:07 PM, Alain Rogister alain.rogis...@gmail.com wrote:

I am reporting the results of my stress tests against Solr 4.x. As I was getting many error conditions with 4.0, I switched to the 4.1 trunk in the hope that some of the issues would be fixed already.

Here is my setup:

- Everything running on a single box (2 x 4-core CPUs, 8 GB RAM). I realize this is not representative of a production environment, but it's a fine way to find out what happens under resource-constrained conditions.
- 3 Solr servers, 3 cores (2 of which are very small; the third one has 410 MB of data)
- single shard
- 3 ZooKeeper instances
- HAProxy load balancing requests across the Solr servers
- JMeter or ApacheBench running the tests: 5 thread pools of 20 threads each, sending search requests continuously (no updates)

In nominal conditions, it all works fine, i.e. it can process a million requests, maxing out the CPUs the whole time, without experiencing nasty failures. There are errors in the logs about replication failures, though; they should be benign in this case, as no updates are taking place, but it's hard to tell what is going on exactly. Example:

Dec 07, 2012 7:50:37 PM org.apache.solr.update.PeerSync handleResponse
WARNING: PeerSync: core=adressage url=http://192.168.0.101:8983/solr exception talking to http://192.168.0.101:8985/solr/adressage/, failed
org.apache.solr.common.SolrException: Server at http://192.168.0.101:8985/solr/adressage returned non ok status:404, message:Not Found
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
        at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:166)
        at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:133)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.
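For reference, the load-balancing layer in a setup like the one described could look roughly like this minimal haproxy.cfg sketch. This is an assumption, not the poster's actual configuration: the frontend port and the middle backend port (8984) are guesses, while 8983 and 8985 appear in the logs in this thread.

```
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend solr_front
    bind *:8080
    default_backend solr_back

backend solr_back
    balance roundrobin
    # mark a Solr instance down while Jetty is starting/stopping,
    # instead of sending it traffic and getting 404s back
    option httpchk GET /solr/admin/ping
    server solr1 192.168.0.101:8983 check
    server solr2 192.168.0.101:8984 check
    server solr3 192.168.0.101:8985 check
```

A health check along these lines would also reduce the window in which the balancer forwards requests to a node that is mid-restart, one plausible source of the 404s Mark mentions.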
Re: Loading data to SOLR first time ( taking too long)
Are you loading data from multiple tables? How many levels deep?

After some experimenting, I gave up on the DIH because I found it generated very chatty (one row at a time) SQL against my schema, I experienced concurrency bugs unless multithreading was set to false, and I wasn't too confident in the incremental mode against a complex schema.

Here is what worked for us (with Oracle):

- Create materialized views; make sure that you include a 'lastUpdateTime' field in the main table. This step may be unnecessary if your source data does not need any pre-processing / cleaning / reorganizing.
- Write a stored procedure that exports the data in Solr's XML format; parameterize it with a range of primary keys of your main table so that you can partition the export into manageable subsets. The XML format is very simple; there is no need for complex in-the-database XML functions to generate it.
- Use the database scheduler to run that procedure as a set of jobs; run a few of them in parallel.
- Use curl or wget or similar to feed the XML files into the index as soon as they are available.
- Compress and archive the XML files; they will come in handy when you need to provision another index instance and will save you a lot of exporting time.
- Make sure your stored procedure can work in incremental mode: e.g. export all records updated after a certain timestamp, then just push the resulting XML into Solr.

Alain

On Tue, Oct 25, 2011 at 9:56 PM, Awasthi, Shishir shishir.awas...@baml.com wrote:

Hi,

I recently started working on SOLR and loaded approximately 4 million records into Solr using the DataImportHandler. It took 5 days to complete this process. Can you please suggest how this can be improved? I would like this to be done in less than 6 hrs.

Thanks,
Shishir

--
This message w/attachments (message) is intended solely for the use of the intended recipient(s) and may contain information that is privileged, confidential or proprietary.
If you are not an intended recipient, please notify the sender, and then please delete and destroy all copies and attachments, and be advised that any review or dissemination of, or the taking of any action in reliance on, the information contained in or attached to this message is prohibited. Unless specifically indicated, this message is not an offer to sell or a solicitation of any investment products or other financial product or service, an official confirmation of any transaction, or an official statement of Sender. Subject to applicable law, Sender may intercept, monitor, review and retain e-communications (EC) traveling through its networks/systems and may produce any such EC to regulators, law enforcement, in litigation and as required by law. The laws of the country of each sender/recipient may impact the handling of EC, and EC may be archived, supervised and produced in countries other than the country in which you are located. This message cannot be guaranteed to be secure or free of errors or viruses. References to Sender are references to any subsidiary of Bank of America Corporation. Securities and Insurance Products: * Are Not FDIC Insured * Are Not Bank Guaranteed * May Lose Value * Are Not a Bank Deposit * Are Not a Condition to Any Banking Service or Activity * Are Not Insured by Any Federal Government Agency. Attachments that are part of this EC may have additional important disclosures and disclaimers, which you should read. This message is subject to terms available at the following link: http://www.bankofamerica.com/emaildisclaimer. By messaging with Sender you consent to the foregoing.
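The feed-with-curl step of the workflow above can be sketched as a small script. This is a sketch under assumptions: SOLR_URL, the directory layout, and the gzip archiving are mine, not from the original post; the XML files are expected to already be in Solr's <add><doc>...</doc></add> update format.

```shell
# Push each exported XML file into Solr, archive it, then commit once.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/update}"

feed_xml_files() {
  dir="$1"
  for f in "$dir"/*.xml; do
    [ -e "$f" ] || continue
    # --data-binary sends the file unmodified; Solr needs the content type.
    curl -s -o /dev/null -H 'Content-Type: text/xml' \
         --data-binary @"$f" "$SOLR_URL" || { echo "failed: $f" >&2; return 1; }
    gzip -9 "$f"    # compress-and-archive step from the workflow above
  done
  # a single commit at the end is much cheaper than committing per file
  curl -s -o /dev/null -H 'Content-Type: text/xml' \
       --data-binary '<commit/>' "$SOLR_URL"
}
```

Committing once at the end, rather than per file, is one of the choices that keeps this kind of bulk load fast.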
Re: Loading data to SOLR first time ( taking too long)
Shishir,

I believe our main table has about half a million rows, which isn't a lot, but it has multiple dependent tables, several levels deep. The resulting XML files were about 1 GB in total, split into around 15 files. We could feed these files one at a time into Solr in as little as a few seconds per file (tens of seconds on a slow machine) - much less than the database export actually took.

In your case, it may be the join that is slowing things down in the DIH. Depending on your schema, you *may* be able to write the DIH query differently, or you could create a [materialized] view and use it in the DIH query.

Alain

On Tue, Oct 25, 2011 at 10:50 PM, Awasthi, Shishir shishir.awas...@baml.com wrote:

Alain,

How many rows did you export in this fashion, and what was the performance? We do have Oracle as the underlying database, with data obtained from multiple tables. The data is only 1 level deep, except for one table where we need to traverse a hierarchy to get information. How many XML files did you feed into SOLR one at a time?

Shishir

-----Original Message-----
From: Alain Rogister [mailto:alain.rogis...@gmail.com]
Sent: Tuesday, October 25, 2011 4:28 PM
To: solr-user@lucene.apache.org
Subject: Re: Loading data to SOLR first time ( taking too long)
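The incremental mode described in the workflow can be driven by a small bookmark script: remember the timestamp of the last successful run and export only rows whose lastUpdateTime is newer. This is a sketch; `export_since` is a hypothetical wrapper around the stored procedure / SQL client, and the file names are assumptions, none of it from the original posts.

```shell
# Export only rows updated since the last successful run.
BOOKMARK="${BOOKMARK:-last_run.txt}"

incremental_export() {
  last=$(cat "$BOOKMARK" 2>/dev/null || echo '1970-01-01 00:00:00')
  now=$(date '+%Y-%m-%d %H:%M:%S')
  # export_since "<timestamp>" is assumed to write the delta rows as
  # Solr XML on stdout (rows with lastUpdateTime > timestamp)
  export_since "$last" > delta.xml || return 1
  # advance the bookmark only after a successful export, so a failed
  # run is retried from the same point next time
  echo "$now" > "$BOOKMARK"
}
```

Taking `now` before the export starts, rather than after, means rows updated mid-export are re-exported next time rather than silently skipped.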
Re: inconsistent results when faceting on multivalued field
Pravesh,

Not exactly. Here is the search I do, in more detail (different field name, but same issue). I want to get a count for a specific value of the sou_codeMetier field, which is multivalued. I expressed this by including an fq clause:

/select/?q=*:*&facet=true&facet.field=sou_codeMetier&fq=sou_codeMetier:1213206&rows=0

The response (excerpt only):

<lst name="facet_fields">
  <lst name="sou_codeMetier">
    <int name="1213206">1281</int>
    <int name="1212104">476</int>
    <int name="121320603">285</int>
    <int name="1213101">260</int>
    <int name="121320602">208</int>
    <int name="121320605">171</int>
    <int name="1212201">152</int>
    ...

As you can see, I get back both the expected results and extra results I would expect to be filtered out by the fq clause. I can eliminate the extra results with an 'f.sou_codeMetier.facet.prefix=1213206' clause. But I wonder whether Solr's behavior is correct, and how the fq filtering works exactly.

If I replace the facet.field clause with a facet.query clause, like this:

/select/?q=*:*&facet=true&facet.query=sou_codeMetier:[1213206 TO 1213206]&rows=0

the results contain a single item:

<lst name="facet_queries">
  <int name="sou_codeMetier:[1213206 TO 1213206]">1281</int>
</lst>

The 'fq=sou_codeMetier:1213206' clause isn't necessary here and does not affect the results.

Thanks,
Alain

On Fri, Oct 21, 2011 at 9:18 AM, pravesh suyalprav...@yahoo.com wrote:

Could you clarify the below:

"When I make a search on facet.qua_code=1234567"

Are you trying to say that when you fire a fresh search for a facet item, like q=qua_code:1234567, it fetches documents where the qua_code field contains either the term 1234567 alone or that term along with others (1234567, 9384738, and so on)? That would be because it's a multivalued field, and hence the facet list shows counts for all the terms present in the matching documents.

"If I reword the query as 'facet.query=qua_code:[1234567 TO 1234567]', I only get the expected counts"

You will get facets only for documents which have the term 1234567 (facet.query applies to the facets themselves, deciding which facet gets picked/shown).

Regds
Pravesh

--
View this message in context: http://lucene.472066.n3.nabble.com/inconsistent-results-when-faceting-on-multivalued-field-tp3438991p3440128.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Painfully slow indexing
As an alternative, I can suggest the approach that worked great for me:

- generate the ready-for-indexing XML documents on a file system
- use curl to feed them into Solr

I am not dealing with huge volumes, but I was surprised at how *fast* Solr indexed my documents using this simple approach. Also, the workflow is easy to manage, and the XML contents can easily be provisioned to multiple systems, e.g. for setting up test environments.

Regards,
Alain

On Fri, Oct 21, 2011 at 9:46 AM, pravesh suyalprav...@yahoo.com wrote:

Are you posting through HTTP/SOLRJ? Does your script time 'T' include the time between sending the POST request and fetching a successful response? Try sending in small batches, like 10-20. BTW, how many documents are you indexing?

Regds
Pravesh

--
View this message in context: http://lucene.472066.n3.nabble.com/Painfully-slow-indexing-tp3434399p3440175.html
Sent from the Solr - User mailing list archive at Nabble.com.
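The small-batch suggestion above can be sketched as follows: wrap several per-document <doc> snippets in a single <add> envelope and send one POST per batch instead of one per document. The file layout, function name, and URL here are assumptions for illustration.

```shell
# Combine several <doc>...</doc> files into one <add> envelope and POST it.
batch_post() {
  url="$1"; shift          # remaining args: files, one <doc> snippet each
  {
    echo '<add>'
    cat "$@"
    echo '</add>'
  } > batch.xml
  curl -s -o /dev/null -H 'Content-Type: text/xml' \
       --data-binary @batch.xml "$url"
}
```

Usage would be something like `batch_post http://localhost:8983/solr/update doc0001.xml ... doc0020.xml` (hypothetical file names), cutting the per-request HTTP overhead by the batch size.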
inconsistent results when faceting on multivalued field
I am surprised by the results I am getting from a search in a Solr 3.4 index. My schema has a multivalued field of type 'string':

<field name="qua_code" type="string" multiValued="true" indexed="true" stored="true"/>

The field values are 7-digit or 9-digit integer numbers; this corresponds to a hierarchy. I could have used a numeric type instead of string, but no numerical operations are performed against the values. Each document contains 0-N values for this field, such as:

8625774
1234567
123456701
123456702
123456703
9384738

When I make a search on facet.qua_code=1234567, I am getting the counts I expect (seemingly correct) plus a large number of counts for *other* field values (e.g. 9384738). If I reword the query as 'facet.query=qua_code:[1234567 TO 1234567]', I only get the expected counts. I can also filter out the extraneous results with a facet.prefix clause.

Should I file an issue, or am I misunderstanding something about faceting on multivalued fields?

Thanks.