Re: Dynamic starting or stopping of zookeepers in a cluster
This is solid information. *How about the application, which uses SOLR/Zookeeper?* Do we have to follow this guidance to make the application ZK-config aware: https://zookeeper.apache.org/doc/r3.5.5/zookeeperReconfig.html#ch_reconfig_rebalancing Or could we leave it as is, as long as the ZK ensemble keeps the same IPs? Thanks! Joe -- Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Error Adding a Replica to SOLR Cloud 8.2.0
We finally got this fixed by temporarily disabling any updates to the SOLR index.
Re: systemd definition for solr?
Close, but not quite there yet. The rules say to use systemctl start (or stop, or status) solr.service. That ".service" part ought to be there; I suspect that if we omit it then we may be scolded on-screen and lose some grade points.

On your error report below: best to ensure that Solr is started by either /etc/init.d or systemd, but not both. To check on the /etc/init.d part, go to /etc/init.d and give the command chkconfig -l solr. If the result shows "on" for any run level then /etc/init.d is supposed to be in charge rather than systemd, and in that case your systemd control page ought to indicate that solr is an "LSB" process. On the other hand, if systemd is to be the controlling agent, then ensure that the /etc/init.d part does not interfere by issuing the command chkconfig -d solr, which unlinks solr from its to-do list. Then say systemctl enable solr to let systemd take charge. Thus some busy work to check on things, and then a choice of which flavour will be in charge. Thanks, Joe D.

On 15/10/2020 21:03, Ryan W wrote:
I didn't realize that to start a systemd service, I need to do...
  systemctl start solr
...and not...
  service solr start
Now the output from the status command looks a bit better, though still with some problems...

[root@faspbsy0002 system]# systemctl status solr.service
● solr.service - LSB: A very fast and reliable search engine.
  Loaded: loaded (/etc/rc.d/init.d/solr; bad; vendor preset: disabled)
  Active: active (exited) since Thu 2020-10-15 15:58:23 EDT; 19s ago
  Docs: man:systemd-sysv-generator(8)
  Process: 34100 ExecStop=/etc/rc.d/init.d/solr stop (code=exited, status=1/FAILURE)
  Process: 98871 ExecStart=/etc/rc.d/init.d/solr start (code=exited, status=0/SUCCESS)

On Thu, Oct 15, 2020 at 3:24 PM Ryan W wrote:
Does anyone have a simple systemd definition for a solr service? The things I am finding on the internet don't work. I am not sure if this is the kind of thing where there might be some boilerplate that (usually) works?
Or do situations vary so much that no boilerplate is possible? Here is what I see when I try to use one of the definitions I found on the internet:

[root@faspbsy0002 system]# systemctl status solr.service
● solr.service - LSB: A very fast and reliable search engine.
  Loaded: loaded (/etc/rc.d/init.d/solr; bad; vendor preset: disabled)
  Active: failed (Result: exit-code) since Thu 2020-10-15 09:32:02 EDT; 5h 50min ago
  Docs: man:systemd-sysv-generator(8)
  Process: 34100 ExecStop=/etc/rc.d/init.d/solr stop (code=exited, status=1/FAILURE)
  Process: 1337 ExecStart=/etc/rc.d/init.d/solr start (code=exited, status=0/SUCCESS)

Oct 15 09:32:01 faspbsy0002 systemd[1]: Stopping LSB: A very fast and reliab
Oct 15 09:32:01 faspbsy0002 su[34102]: (to solr) root on none
Oct 15 09:32:02 faspbsy0002 solr[34100]: No process found for Solr node runn...3
Oct 15 09:32:02 faspbsy0002 systemd[1]: solr.service: control process exited...1
Oct 15 09:32:02 faspbsy0002 systemd[1]: Stopped LSB: A very fast and reliabl
Oct 15 09:32:02 faspbsy0002 systemd[1]: Unit solr.service entered failed state.
Oct 15 09:32:02 faspbsy0002 systemd[1]: solr.service failed.

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
Hint: Some lines were ellipsized, use -l to show in full.
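[Editor's sketch] For reference, a minimal native systemd unit for Solr might look like the following. The paths, the "solr" user, the port, and the SOLR_INCLUDE location are assumptions based on a typical bin/install_solr_service.sh layout, not a definitive unit file; adjust them to your installation.

```ini
# /etc/systemd/system/solr.service -- a minimal sketch, not an official unit.
# User, paths, and port below are assumptions; match your own install.
[Unit]
Description=Apache Solr
After=network.target

[Service]
Type=forking
User=solr
# bin/solr reads extra settings from the file named by SOLR_INCLUDE
Environment=SOLR_INCLUDE=/etc/default/solr.in.sh
ExecStart=/opt/solr/bin/solr start
ExecStop=/opt/solr/bin/solr stop
PIDFile=/opt/solr/bin/solr-8983.pid
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

After placing the file, run systemctl daemon-reload, then systemctl enable solr.service and systemctl start solr.service. Per Joe's advice above, unhook the /etc/init.d script first so only one mechanism is in charge.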
Re: solr-8983.pid: Permission denied
Ah so, systemd style. A suggestion which might help: look at the /etc/init.d style script in the Solr distribution and use its commands as a reference when you review the systemd equivalents. In addition, a prerequisite is choosing the user+group for Solr's files, applying ownership of those files, and ensuring the systemd script uses that user information (which the init.d script does via that RUNAS item). Solr's offered /etc/init.d file is in installation directory ~/solr/bin/init.d as file solr. For systemd I control apps via commands of the form systemctl start solr.service. Thanks, Joe D.

On 15/10/2020 16:01, Ryan W wrote:
I have been starting solr like so...
  service solr start

On Thu, Oct 15, 2020 at 10:31 AM Joe Doupnik wrote:
Alex has it right. In my environment I created user "solr" in group "users". Then I ensured that "solr:users" owns all of Solr's files. In addition, I do Solr start/stop with an /etc/init.d script (the Solr distribution has a basic one which we can embellish) in which there is the control line RUNAS="solr". The RUNAS variable is used to properly start Solr. Thanks, Joe D.

On 15/10/2020 15:02, Alexandre Rafalovitch wrote:
It sounds like maybe you have started Solr in a different way than you are restarting it. E.g. maybe you started it manually (bin/solr start, probably as root) but are trying to restart it via the service script. Who owned the .pid file? I am guessing 'root', while the service script probably runs as a different (lower-permission) user. The practical effect of that assumption is that your environment variables were set differently and various things (e.g. logs) may not be where you expect. The solution is to be consistent in using the service to start/restart/stop your Solr. Regards, Alex.

On Thu, 15 Oct 2020 at 09:51, Ryan W wrote:
What is my permissions problem here:
  [root@faspbsy0002 bin]# service solr restart
  Sending stop command to Solr running on port 8983 ... waiting up to 180 seconds to allow Jetty process 38947 to stop gracefully.
  /opt/solr/bin/solr: line 2125: /opt/solr/bin/solr-8983.pid: Permission denied
What is the practical effect if Solr can't write this solr-8983.pid file? What user should own the contents of /opt/solr/bin ? Thanks
Re: solr-8983.pid: Permission denied
Alex has it right. In my environment I created user "solr" in group "users". Then I ensured that "solr:users" owns all of Solr's files. In addition, I do Solr start/stop with an /etc/init.d script (the Solr distribution has a basic one which we can embellish) in which there is the control line RUNAS="solr". The RUNAS variable is used to properly start Solr. Thanks, Joe D.

On 15/10/2020 15:02, Alexandre Rafalovitch wrote:
It sounds like maybe you have started Solr in a different way than you are restarting it. E.g. maybe you started it manually (bin/solr start, probably as root) but are trying to restart it via the service script. Who owned the .pid file? I am guessing 'root', while the service script probably runs as a different (lower-permission) user. The practical effect of that assumption is that your environment variables were set differently and various things (e.g. logs) may not be where you expect. The solution is to be consistent in using the service to start/restart/stop your Solr. Regards, Alex.

On Thu, 15 Oct 2020 at 09:51, Ryan W wrote:
What is my permissions problem here:
  [root@faspbsy0002 bin]# service solr restart
  Sending stop command to Solr running on port 8983 ... waiting up to 180 seconds to allow Jetty process 38947 to stop gracefully.
  /opt/solr/bin/solr: line 2125: /opt/solr/bin/solr-8983.pid: Permission denied
What is the practical effect if Solr can't write this solr-8983.pid file? What user should own the contents of /opt/solr/bin ? Thanks
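[Editor's sketch] On the "practical effect" question: bin/solr writes solr-<port>.pid on start and reads it back on stop/status to locate the Jetty process, so if the run-as user cannot write that file, a later stop has nothing to go on. The actual script is bash; the Python sketch below only illustrates the write-on-start / read-on-stop pattern with hypothetical helper names.

```python
import os
import tempfile

def write_pid_file(pid_dir: str, port: int, pid: int) -> str:
    """What 'start' does: record the server's PID for later lookup.

    Raises PermissionError if the run-as user cannot write pid_dir --
    the Solr-side symptom is the 'solr-8983.pid: Permission denied' line.
    """
    path = os.path.join(pid_dir, f"solr-{port}.pid")
    with open(path, "w") as f:
        f.write(str(pid))
    return path

def read_pid_file(path: str) -> int:
    """What 'stop'/'status' do: recover the PID to signal or inspect."""
    with open(path) as f:
        return int(f.read().strip())

# Usage: round-trip through a writable directory.
with tempfile.TemporaryDirectory() as d:
    p = write_pid_file(d, 8983, 12345)
    assert read_pid_file(p) == 12345
```

This is why Alex's advice (one consistent start/stop mechanism, one consistent user) resolves the problem: the same user both writes and reads the pid file.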
Solr Cloud 8.5.1 - HDFS and Erasure Coding
Anyone use Solr with Erasure Coding on HDFS? Is that supported? Thank you -Joe
Re: BasicAuth help
There is an effective alternative to placing authentication within Solr: use the web server (say, Apache) as a smart proxy to Solr and, in so doing, also apply access restrictions of various kinds. Thus Solr remains intact, no addition is needed for authentication, and authentication can be accomplished with a known robust tool. Sketching the Apache part, to clarify matters. This example requires both an IP range and LDAP authentication, and it supports https as well.

  Require ip 11.22.33.44/24 5.6.7.8/28
  AuthType Basic
  AuthBasicProvider ldap
  AuthName "Solr"
  AuthLDAPUrl ldap://example.com/o=GCHQ?uid?one?(objectClass=user)
  Require ldap-user admin james moneypenny
  ProxyPass "http://localhost:8983/solr" keepalive=on
  ProxyPassReverse "http://localhost:8983/solr"

Above, localhost can be replaced with the DNS name of another machine, the one where Solr itself resides. The URI name /solr is clearly something which we can choose to suit ourselves. This example may be enhanced for local requirements; the Apache manual has full details, naturally. It is important to use proven robust tools when we deal with the bad guys. Thanks, Joe D.

On 04/09/2020 08:43, Aroop Ganguly wrote:
Try looking at a simple ldap authentication suggested here: https://github.com/itzmestar/ldap_solr You can combine this for authentication and couple it with rule-based authorization.

On Aug 28, 2020, at 12:26 PM, Vanalli, Ali A - DOT <ali.vana...@dot.wi.gov> wrote:
Hello, Solr is running on a windows machine and I am wondering if it is possible to set up BasicAuth with LDAP? Also, I tried the example of Basic Authentication that is published here: https://lucene.apache.org/solr/guide/8_6/rule-based-authorization-plugin.html#rule-based-authorization-plugin but this did not work either. Thanks...Ali
Re: Understanding Solr heap %
That's good. I think I need to mention one other point about this matter: feeding files into Tika (in my case) is paced to avoid overloads. That is done in my crawler by having a small adjustable pause (~100 ms) after each file submission, and then longer ones (1-3 sec) after every 100 and 1000 submissions. Also the crawler is set to run at a lower priority than Solr, thus giving preference to Solr. In the end we ought to run experiments to find and verify working values. Thanks, Joe D.

On 02/09/2020 03:40, yaswanth kumar wrote:
I got some understanding now about my actual question.. thanks all for your valuable theories. Sent from my iPhone

On Sep 1, 2020, at 2:01 PM, Joe Doupnik wrote:
As I have not received the follow-on message to mine I will cut it below. My comments on that are: the numbers are the numbers. More importantly, I have run large imports (~0.5M docs) and I have watched as they progress. My crawler paces material into Solr. Memory usage (Linux "top") shows cyclic small rises and falls, peaking at about 2GB, as the crawler introduces 1 and 3 second pauses after every hundred and thousand submissions. The test shown in my original message is sufficient to show the nature of Solr versions and the choice of garbage collector, and other folks can do similar experiments on their gear. The quoted tests are indeed representative of large and small amounts of various kinds of documents, and I say that based on much experience observing the details. Quibble about GC names if you wish, but please do see those experimental results. Also note the difference in our SOLR_HEAP values: 2GB in my work, 8GB in yours. I have found 2GB to work well for importing small and very large collections (of many file varieties). Thanks, Joe D.

This is misleading and not particularly good advice. Solr 8 does NOT contain G1. G1GC is a feature of the JVM. We've been using it with Java 8 and Solr 6.6.2 for a few years. A test with eighty documents doesn't test anything.
Try a million documents to get Solr memory usage warmed up. GC_TUNE has been in the solr.in.sh file for a long time. Here are the settings we use with Java 8. We have about 120 hosts running Solr in six prod clusters.

SOLR_HEAP=8g
# Use G1 GC -- wunder 2017-01-23
# Settings from https://wiki.apache.org/solr/ShawnHeisey
GC_TUNE=" \
 -XX:+UseG1GC \
 -XX:+ParallelRefProcEnabled \
 -XX:G1HeapRegionSize=8m \
 -XX:MaxGCPauseMillis=200 \
 -XX:+UseLargePages \
 -XX:+AggressiveOpts \
"

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

On 01/09/2020 16:39, Joe Doupnik wrote:
Erick states this correctly. To give some numbers from my experiences, here are two slides from my presentation about installing Solr (https://netlab1.net/, locate item "Solr/Lucene Search Service") [slide images not reproduced in this text digest]. Thus we see a) experiments are the key, just as Erick says, and b) the choice of garbage collection algorithm plays a major role. In my setup I assigned SOLR_HEAP to be 2048m, SOLR_OPTS has -Xss1024k, plus stock GC_TUNE values. Your "memorage" may vary. Thanks, Joe D.

On 01/09/2020 15:33, Erick Erickson wrote:
You want to run with the smallest heap you can due to Lucene's use of MMapDirectory; see the excellent: https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html There's also little reason to have different Xms and Xmx values; that just means you'll eventually move a bunch of memory around as the heap expands, so I usually set them both to the same value. How to determine what "the smallest heap you can" is? Unfortunately there's no good way outside of stress-testing your application with less and less memory until you have problems, then adding some extra... Best, Erick

On Sep 1, 2020, at 10:27 AM, Dominique Bejean wrote:
Hi, As in all Java applications, the heap memory is regularly cleaned by the garbage collector (some young items moved to the old generation heap zone, and unused old items removed from the old generation heap zone).
This causes heap usage to continuously rise and fall. Regards, Dominique

On Tue, 1 Sept 2020 at 13:50, yaswanth kumar wrote:
Can someone help me understand how the % value in the Heap column is calculated? I created a new Solr cloud with 3 solr nodes and one zookeeper. It is not yet live, in terms of either indexing or searching, but I do see some spikes in the HEAP column against nodes when I refresh the page multiple times. It's like almost going to 95% (sometimes) and then coming down to 50%.
Solr version: 8.2
Zookeeper: 3.4
JVM size configured in solr.in.sh is min of 1GB to max of 10GB (actual RAM size on the node is 16GB)
Basically I need to understand whether I should worry about this heap %, which was quite variable, before making it live. Or is that quite normal? This new UI on solr cloud is kind of new to us as we used to have sol
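[Editor's sketch] The pacing Joe describes earlier in this thread (a short pause per document, with longer pauses at every 100th and 1000th submission) is easy to reproduce in any feeder. His crawler is PHP; this is only a minimal Python equivalent, and the intervals are his quoted starting points, to be tuned by experiment.

```python
import time

# Starting values quoted in the thread; tune experimentally for your gear.
PER_DOC_PAUSE = 0.1     # ~100 ms after every submission
PER_100_PAUSE = 1.0     # longer pause after every 100th document
PER_1000_PAUSE = 3.0    # longest pause after every 1000th document

def pause_after(n: int) -> float:
    """Seconds to sleep after submitting document number n (1-based)."""
    if n % 1000 == 0:
        return PER_1000_PAUSE
    if n % 100 == 0:
        return PER_100_PAUSE
    return PER_DOC_PAUSE

def feed(docs, submit):
    """Submit each doc, pacing so the indexer (and Tika) are not overloaded."""
    for n, doc in enumerate(docs, start=1):
        submit(doc)
        time.sleep(pause_after(n))
```

The effect is the cyclic rise-and-fall of heap usage Joe reports: the pauses give the JVM room to collect between bursts.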
Re: Exclude a folder/directory from indexing
Some time ago I faced a roughly similar challenge. After many trials and tests I ended up creating my own programs to accomplish the tasks of fetching files, selecting which are allowed to be indexed, and feeding them into Solr (POST style). This work is open source, found on https://netlab1.net/, web page section titled "Presentations of long term utility", item "Solr/Lucene Search Service". This is a set of docs, three small PHP programs, and a Solr schema etc. bundle, all within one downloadable zip file. On filtering found files, my solution uses a list of regular expressions which are simple to state and to process. The docs discuss the rules. Luckily, the code dealing with the rules per se and doing the filtering is very short and simple; see crawler.php for convertfilter() and filterbyname(). Thus you may wish to consider them, or equivalents, for inclusion in your system, whatever that may be. Thanks, Joe D.

On 27/08/2020 20:32, Alexandre Rafalovitch wrote:
If you are indexing from Drupal into Solr, that's a question for Drupal's solr module. If you are doing it some other way, which way are you doing it? The bin/post command? Most likely this is not a Solr question, but one for whatever you have feeding data into Solr. Regards, Alex.

On Thu, 27 Aug 2020 at 15:21, Staley, Phil R - DCF wrote:
Can you, and how do you, exclude a specific folder/directory from indexing in SOLR version 7.x or 8.x? Also, our CMS is Drupal 8. Thanks, Phil Staley DCF Webmaster 608 422-6569 phil.sta...@wisconsin.gov
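[Editor's sketch] Joe's filter-by-regular-expression approach can be sketched briefly. This is not his crawler.php code (convertfilter()/filterbyname() there are PHP); it is a hypothetical Python equivalent showing the shape: a list of exclusion patterns, and a file is indexed only if no pattern matches its path.

```python
import re

# Hypothetical exclusion rules -- one regex per entry, as in a config file.
EXCLUDE_PATTERNS = [
    r"/private/",         # skip anything under a directory named "private"
    r"\.tmp$",            # skip temporary files
    r"^/var/www/drafts",  # skip a specific tree
]

COMPILED = [re.compile(p) for p in EXCLUDE_PATTERNS]

def allowed_to_index(path: str) -> bool:
    """True when no exclusion pattern matches the file's path."""
    return not any(rx.search(path) for rx in COMPILED)

# Usage: only the first path survives the filter.
paths = ["/var/www/docs/a.pdf", "/var/www/private/b.pdf", "/var/www/c.tmp"]
indexable = [p for p in paths if allowed_to_index(p)]
```

The design choice is the same as Joe describes: rules live in data (a pattern list) rather than code, so adding an exclusion means editing one line.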
Re: PDF extraction using Tika
More properly, it would be best to fix Tika and thus not push extra complexity upon many, many users. Error handling is one thing; crashes, though, ought to be designed out. Thanks, Joe D.

On 25/08/2020 10:54, Charlie Hull wrote:
On 25/08/2020 06:04, Srinivas Kashyap wrote:
Hi Alexandre, Yes, these are the same PDF files running in windows and linux. There are around 30 pdf files and I tried indexing a single file, but faced the same error. Is it related to how the PDFs are stored in linux?

Did you try running Tika (the same version as you're using in Solr) standalone on the file, as Alexandre suggested?

And with regard to DIH and Tika going away, can you share any program which extracts from PDF and pushes into solr?

https://lucidworks.com/post/indexing-with-solrj/ is one example. You should run Tika separately as it's entirely possible for it to fail to parse a PDF and crash - and if you're running it in DIH & Solr, it then brings down everything. Separate your PDF processing from your Solr indexing. Cheers, Charlie

Thanks, Srinivas Kashyap

-----Original Message-----
From: Alexandre Rafalovitch
Sent: 24 August 2020 20:54
To: solr-user
Subject: Re: PDF extraction using Tika

The issue seems to be with a specific file, and at a level way below Solr's or possibly even Tika's:
Caused by: java.io.IOException: expected='>' actual=' ' at offset 2383 at org.apache.pdfbox.pdfparser.BaseParser.readExpectedChar(BaseParser.java:1045)
Are you indexing the same files on Windows and Linux? I am guessing not. I would try to narrow down which of the files it is. One way could be to get a standalone Tika (make sure to match the version Solr embeds) and run it over the documents by itself. It will probably complain with the same error. Regards, Alex.
P.s. Additionally, both DIH and embedded Tika are not recommended for production, and both will be going away in future Solr versions.
You may have a much less brittle pipeline if you save the structured outputs from those Tika standalone runs and then index them into Solr, possibly pre-processed. On Mon, 24 Aug 2020 at 11:09, Srinivas Kashyap wrote: Hello, We are using TikaEntityProcessor to extract the content out of PDF and make the content searchable. When jetty is run on windows based machine, we are able to successfully load documents using full import DIH(tika entity). Here PDF's is maintained in windows file system. But when jetty solr is run on linux machine, and try to run DIH, we are getting below exception: (Here PDF's are maintained in linux filesystem) Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 1 at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:271) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483) at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 1 at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:417) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233) ... 
4 more Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 1 at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:69) at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:171) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415) ... 6 more Caused by: org.apache.tika.exception.TikaException: Unable to extract PDF content at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:139) at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:172) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) at org.apache.tika.parser.CompositeParse
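[Editor's sketch] Alexandre's suggestion of running the same Tika build standalone over the PDFs is the quickest way to isolate the bad file. A sketch of such a sweep in Python; the tika-app jar name/version here is an assumption (match the version your Solr embeds), while the `-t` (plain text output) option is the standard tika-app CLI flag.

```python
import subprocess
from pathlib import Path

TIKA_JAR = "tika-app-1.19.1.jar"   # assumption: match Solr's embedded Tika

def tika_command(pdf: Path) -> list:
    """Build the standalone Tika invocation for one file ('-t' = plain text)."""
    return ["java", "-jar", TIKA_JAR, "-t", str(pdf)]

def find_bad_pdfs(directory: str) -> list:
    """Return the names of files the standalone parser fails on."""
    bad = []
    for pdf in sorted(Path(directory).glob("*.pdf")):
        result = subprocess.run(tika_command(pdf), capture_output=True)
        if result.returncode != 0:      # Tika exits non-zero on parse failure
            bad.append(pdf.name)
    return bad
```

Running extraction outside Solr like this also implements Charlie's broader point: a crashing parse then kills only the sweep, not the search service.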
Re: Solr with HDFS configuration example running in production/dev
Are you running with solr.lock.type=hdfs? Have you defined your DirectoryFactory - something like:

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
  <bool name="solr.hdfs.blockcache.global">true</bool>
  <int name="solr.hdfs.blockcache.slab.count">43</int>
  <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
  <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
  <bool name="solr.hdfs.blockcache.read.enabled">true</bool>
  <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
  <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">128</int>
  <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">1024</int>
  <str name="solr.hdfs.home">hdfs://nameservice1:8020/solr8.2.0</str>
  <str name="solr.hdfs.confdir">/etc/hadoop/conf.cloudera.hdfs1</str>
</directoryFactory>

-Joe

On 8/20/2020 2:30 AM, Prashant Jyoti wrote:
Hi Joe, These are the errors I am running into:
org.apache.solr.common.SolrException: Error CREATEing SolrCore 'newcollsolr2_shard1_replica_n1': Unable to create core [newcollsolr2_shard1_replica_n1]
Caused by: Illegal char <:> at index 4: hdfs://hn1-pjhado.tvbhpqtgh3judk1e5ihrx2k21d.tx.internal.cloudapp.net:8020/user/solr-data/newcollsolr2/core_node3/data\
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1256) at org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$0(CoreAdminOperation.java:93) at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:362) at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:397) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:181) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211) at org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:842) at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:808) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:559) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:420) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:352) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1607) at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1297) at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1577) at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1212) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221) at org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) at org.eclipse.jetty.server.Server.handle(Server.java:500) at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383) at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:270) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) at 
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103) at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129) at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:388) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806) at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.solr.common.SolrExcepti
Re: Solr doesn't run after editing solr.in.sh
On 22/08/2020 22:08, maciejpreg...@tutanota.com.INVALID wrote:
Good morning. When I uncomment any of the commands in solr.in.sh, Solr doesn't run. What do I have to do to fix the problem? Best regards, Maciej Pregiel

My approach has been to add local configuration options to the end of the file and leave the original text intact. Here is the end of my file, which has no changes above this material:

#SOLR_SECURITY_MANAGER_ENABLED=false
## JRD values
SOLR_ULIMIT_CHECKS=false
GC_TUNE=" \
 -XX:SurvivorRatio=4 \
 -XX:TargetSurvivorRatio=90 \
 -XX:MaxTenuringThreshold=8 \
 -XX:+UseConcMarkSweepGC \
 -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 \
 -XX:+CMSScavengeBeforeRemark \
 -XX:PretenureSizeThreshold=64m \
 -XX:+UseCMSInitiatingOccupancyOnly \
 -XX:CMSInitiatingOccupancyFraction=50 \
 -XX:CMSMaxAbortablePrecleanTime=6000 \
 -XX:+CMSParallelRemarkEnabled \
 -XX:+ParallelRefProcEnabled \
 -XX:-OmitStackTraceInFastThrow"
#JRD give more memory
##SOLR_HEAP="4096m"
SOLR_HEAP="2048m"
##JRD enlarge this
#SOLR_OPTS="$SOLR_OPTS -Xss512k"
SOLR_OPTS="$SOLR_OPTS -Xss1024k"
SOLR_STOP_WAIT=30
SOLR_JAVA_HOME="/usr/java/latest/"
SOLR_PID_DIR="/home/search/solr"
SOLR_HOME="/home/search/solr/data"
SOLR_LOGS_DIR="/home/search/solr/logs"
SOLR_PORT="8983"
SOLR_OPTS="$SOLR_OPTS -Dsolr.autoSoftCommit.maxTime=3000"
SOLR_OPTS="$SOLR_OPTS -Dsolr.autoCommit.maxTime=6"
SOLR_OPTS="$SOLR_OPTS -Djava.io.tmpdir=/home/search/tmp"

Thanks, Joe D.
Re: Solr with HDFS configuration example running in production/dev
Your exception didn't come across - can you paste it in? -Joe On 8/19/2020 10:50 AM, Prashant Jyoti wrote: You're right Andrew. Even I read about that. But there's a use case for which we want to configure the said case. Are you also aware of what feature we are moving towards instead of HDFS? Will you be able to help me with the error that I'm running into? Thanks in advance! On Wed, 19 Aug, 2020, 5:24 pm Andrew MacKay, wrote: I believe HDFS support is being deprecated in Solr. Not sure you want to continue configuration if support will disappear. On Wed, Aug 19, 2020 at 7:52 AM Prashant Jyoti wrote: Hi all, Hope you are healthy and safe. Need some help with HDFS configuration. Could anybody of you share an example of the configuration with which you are running Solr with HDFS in any of your production/dev environments? I am interested in the parts of SolrConfig.xml / Solr.in.cmd/sh which you may have modified. Obviously with the security parts obfuscated. I am stuck at an error and unable to move ahead. Attaching the exception log if anyone is interested to look at the error. Thanks! -- Regards, Prashant. -- CONFIDENTIALITY NOTICE: The information contained in this email is privileged and confidential and intended only for the use of the individual or entity to whom it is addressed. If you receive this message in error, please notify the sender immediately at 613-729-1100 and destroy the original message and all copies. Thank you.
Re: Creating 100000 dynamic fields in solr
Could you use a multi-valued field for user in each of your products? So productA would have a field user that is a list of all the users that have productA. Then you could do a search like:
  user:User1 AND Product_A_cost:[5 TO 10]
  user:(User1 User5...) AND Product_B_cost:[0 TO 40]
-Joe

On 5/11/2020 5:35 AM, Vignan Malyala wrote:
I have around 1M products used by my clients. Clients need a filter of these 1M products by their cost. Just like:
User1 has 5 products (A,B,C,D,E)
User2 has 3 products (D,E,F)
User3 has 10 products (A,B,C,H,I,J,K,L,M,N,O)
...every customer has different sets. Now they want to search users by a filter on product costs:
Product_A_cost : 50 TO 100
Product_D_cost : 0 TO 40
It should return all the users who use products in this filter range. As I have 1M products, do I need to create dynamic fields for all users, with field names like Product_A_cost and Product_B_cost, etc., to make a search by them? If I should, then I have to create 1M dynamic fields. Or is there any other way? Hope I'm clear here!

On Mon, May 11, 2020 at 1:47 PM Jan Høydahl wrote:
Sounds like an anti-pattern. Can you explain what search problem you are trying to solve with this many unique fields? Jan Høydahl

On 11 May 2020 at 07:51, Vignan Malyala wrote:
Hi, Is it a good idea to create 100000 dynamic fields of type pint in solr? I have that many fields to search on, actually, which come up based on users. And I'm using Solr Cloud in real-time. Thanks in advance! Regards, Sai Vignan M
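[Editor's sketch] Joe's suggestion replaces per-user dynamic fields with one multi-valued user field per product document. A hypothetical sketch of the document shape and the query construction; the field names (user, Product_B_cost) are the illustrative ones from the thread, not a fixed schema.

```python
# Hypothetical document shape, following Joe's suggestion: one document per
# product, with a multi-valued "user" field listing everyone who has it.
product_a = {
    "id": "A",
    "user": ["User1", "User3"],   # multi-valued field
    "Product_A_cost": 75,
}

def users_and_cost_query(users, cost_field, lo, hi):
    """Build a Solr query string: these users AND a cost range filter."""
    user_clause = "user:(%s)" % " ".join(users)
    return "%s AND %s:[%d TO %d]" % (user_clause, cost_field, lo, hi)

q = users_and_cost_query(["User1", "User5"], "Product_B_cost", 0, 40)
# q == "user:(User1 User5) AND Product_B_cost:[0 TO 40]"
```

The design win is that the schema stays fixed no matter how many users exist; only the multi-valued field's contents grow.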
Re: Delete on 8.5.1
Hi All - while I'm still getting the error, it does appear to work (still gives the error - but a search of the data then shows fewer results - so the delete is working). In some cases, it may be necessary to run the query several times. -Joe On 4/29/2020 9:03 AM, Joe Obernberger wrote: Hi - I also tried deleting from solrj (8.5.1) using CloudSolrClient.deleteByQuery. This results in: Error: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://paradigm8:9100/solr/PROCESSOR_LOGS: Async exception during distributed update: Error from server at http://paradigm7:9100/solr/PROCESSOR_LOGS_shard6_replica_n20/: null request: http://paradigm7:9100/solr/PROCESSOR_LOGS_shard6_replica_n20/ Remote error message: Task queue processing has stalled for 20203 ms with 0 remaining elements to process. org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://paradigm8:9100/solr/PROCESSOR_LOGS: Async exception during distributed update: Error from server at http://paradigm7:9100/solr/PROCESSOR_LOGS_shard6_replica_n20/: null request: http://paradigm7:9100/solr/PROCESSOR_LOGS_shard6_replica_n20/ Remote error message: Task queue processing has stalled for 20203 ms with 0 remaining elements to process.
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:665) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:265) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248) at org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:368) at org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:296) at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1143) at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:906) at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.request(BaseCloudSolrClient.java:838) at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211) at org.apache.solr.client.solrj.SolrClient.deleteByQuery(SolrClient.java:940) at org.apache.solr.client.solrj.SolrClient.deleteByQuery(SolrClient.java:903) at com.ngc.bigdata.solrsearcher.SearcherThread.doSearch(SearcherThread.java:401) at com.ngc.bigdata.solrsearcher.SearcherThread.run(SearcherThread.java:125) at com.ngc.bigdata.solrsearcher.Worker.doSearchTest(Worker.java:145) at com.ngc.bigdata.solrsearcher.SolrSearcher.main(SolrSearcher.java:60) On 4/28/2020 11:50 AM, Joe Obernberger wrote: Hi all - I'm running this query on solr cloud 8.5.1 with the index on HDFS: curl http://enceladus:9100/solr/PROCESSOR_LOGS/update?commit=true -H "Connect-Type: text/xml" --data-binary 'StartTime:[2020-01-01T01:02:43Z TO 2020-04-25T00:00:00Z]' getting this response: 1 500 54091 name="error-class">org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException name="root-error-class">org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException 2 Async exceptions during distributed update: Error from server at http://paradigm8:9100/solr/PROCESSOR_LOGS_shard2_replica_n4/: null request: 
http://paradigm8:9100/solr/PROCESSOR_LOGS_shard2_replica_n4/ Remote error message: Task queue processing has stalled for 20193 ms with 0 remaining elements to process. Error from server at http://belinda:9100/solr/PROCESSOR_LOGS_shard10_replica_n38/: null request: http://belinda:9100/solr/PROCESSOR_LOGS_shard10_replica_n38/ Remote error message: Task queue processing has stalled for 20021 ms with 0 remaining elements to process. name="trace">org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException: 2 Async exceptions during distributed update: Error from server at http://paradigm8:9100/solr/PROCESSOR_LOGS_shard2_replica_n4/: null request: http://paradigm8:9100/solr/PROCESSOR_LOGS_shard2_replica_n4/ Remote error message: Task queue processing has stalled for 20193 ms with 0 remaining elements to process. Error from server at http://belinda:9100/solr/PROCESSOR_LOGS_shard10_replica_n38/: null request: http://belinda:9100/solr/PROCESSOR_LOGS_shard10_replica_n38/ Remote error message: Task queue processing has stalled for 20021 ms with 0 remaining elements to process. at org.apache.solr.update.processor.DistributedZkUpdateProcessor.doDistribFinish(DistributedZkUpdateProcessor.java:1189) at org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1096) at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdatePr
Re: Delete on 8.5.1
Hi - I also tried deleting from solrj (8.5.1) using CloudSolrClient.deleteByQuery. This results in: Error: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://paradigm8:9100/solr/PROCESSOR_LOGS: Async exception during distributed update: Error from server at http://paradigm7:9100/solr/PROCESSOR_LOGS_shard6_replica_n20/: null request: http://paradigm7:9100/solr/PROCESSOR_LOGS_shard6_replica_n20/ Remote error message: Task queue processing has stalled for 20203 ms with 0 remaining elements to process. org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://paradigm8:9100/solr/PROCESSOR_LOGS: Async exception during distributed update: Error from server at http://paradigm7:9100/solr/PROCESSOR_LOGS_shard6_replica_n20/: null request: http://paradigm7:9100/solr/PROCESSOR_LOGS_shard6_replica_n20/ Remote error message: Task queue processing has stalled for 20203 ms with 0 remaining elements to process. at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:665) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:265) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248) at org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:368) at org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:296) at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1143) at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:906) at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.request(BaseCloudSolrClient.java:838) at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211) at org.apache.solr.client.solrj.SolrClient.deleteByQuery(SolrClient.java:940) at org.apache.solr.client.solrj.SolrClient.deleteByQuery(SolrClient.java:903) at 
com.ngc.bigdata.solrsearcher.SearcherThread.doSearch(SearcherThread.java:401) at com.ngc.bigdata.solrsearcher.SearcherThread.run(SearcherThread.java:125) at com.ngc.bigdata.solrsearcher.Worker.doSearchTest(Worker.java:145) at com.ngc.bigdata.solrsearcher.SolrSearcher.main(SolrSearcher.java:60) On 4/28/2020 11:50 AM, Joe Obernberger wrote: Hi all - I'm running this query on solr cloud 8.5.1 with the index on HDFS: curl http://enceladus:9100/solr/PROCESSOR_LOGS/update?commit=true -H "Connect-Type: text/xml" --data-binary 'StartTime:[2020-01-01T01:02:43Z TO 2020-04-25T00:00:00Z]' getting this response: 1 500 54091 name="error-class">org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException name="root-error-class">org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException 2 Async exceptions during distributed update: Error from server at http://paradigm8:9100/solr/PROCESSOR_LOGS_shard2_replica_n4/: null request: http://paradigm8:9100/solr/PROCESSOR_LOGS_shard2_replica_n4/ Remote error message: Task queue processing has stalled for 20193 ms with 0 remaining elements to process. Error from server at http://belinda:9100/solr/PROCESSOR_LOGS_shard10_replica_n38/: null request: http://belinda:9100/solr/PROCESSOR_LOGS_shard10_replica_n38/ Remote error message: Task queue processing has stalled for 20021 ms with 0 remaining elements to process. name="trace">org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException: 2 Async exceptions during distributed update: Error from server at http://paradigm8:9100/solr/PROCESSOR_LOGS_shard2_replica_n4/: null request: http://paradigm8:9100/solr/PROCESSOR_LOGS_shard2_replica_n4/ Remote error message: Task queue processing has stalled for 20193 ms with 0 remaining elements to process. 
Error from server at http://belinda:9100/solr/PROCESSOR_LOGS_shard10_replica_n38/: null request: http://belinda:9100/solr/PROCESSOR_LOGS_shard10_replica_n38/ Remote error message: Task queue processing has stalled for 20021 ms with 0 remaining elements to process. at org.apache.solr.update.processor.DistributedZkUpdateProcessor.doDistribFinish(DistributedZkUpdateProcessor.java:1189) at org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1096) at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:182) at org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80) at org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:80) at org.apache.solr.update.processor.UpdateReque
Delete on 8.5.1
r.ContextHandler.doScope(ContextHandler.java:1212) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221) at org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) at org.eclipse.jetty.server.Server.handle(Server.java:500) at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383) at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:270) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103) at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129) at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:388) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806) at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938) at java.lang.Thread.run(Thread.java:748) 500 Is there a way to process this asynchronously? -Joe
Query confusion - solr cloud 8.2.0
I'm running the following query: id:COLLECT2601697594_T496 AND (person:[80 TO 100]) That returns 1 hit. The following query also returns the same hit: id:COLLECT2601697594_T496 AND ((POP16_Rez1:blue_Sky AND POP16_Sc1:[80 TO 100]) OR (POP16_Rez2:blue_Sky AND POP16_Sc2:[80 TO 100]) OR (POP16_Rez3:blue_Sky AND POP16_Sc3:[80 TO 100]) OR (POP19_Rez1:blue_Sky AND POP19_Sc1:[80 TO 100]) OR (POP19_Rez2:blue_Sky AND POP19_Sc2:[80 TO 100]) OR (POP19_Rez3:blue_Sky AND POP19_Sc3:[80 TO 100]) OR (ResN_Rez1:blue_Sky AND ResN_Sc1:[80 TO 100]) OR (ResN_Rez2:blue_Sky AND ResN_Sc2:[80 TO 100]) OR (ResN_Rez3:blue_Sky AND ResN_Sc3:[80 TO 100])) but AND'ing the two together returns 0 hits. What am I missing? id:COLLECT2601697594_T496 AND ((POP16_Rez1:blue_Sky AND POP16_Sc1:[80 TO 100]) OR (POP16_Rez2:blue_Sky AND POP16_Sc2:[80 TO 100]) OR (POP16_Rez3:blue_Sky AND POP16_Sc3:[80 TO 100]) OR (POP19_Rez1:blue_Sky AND POP19_Sc1:[80 TO 100]) OR (POP19_Rez2:blue_Sky AND POP19_Sc2:[80 TO 100]) OR (POP19_Rez3:blue_Sky AND POP19_Sc3:[80 TO 100]) OR (ResN_Rez1:blue_Sky AND ResN_Sc1:[80 TO 100]) OR (ResN_Rez2:blue_Sky AND ResN_Sc2:[80 TO 100]) OR (ResN_Rez3:blue_Sky AND ResN_Sc3:[80 TO 100])) AND (person:[80 TO 100]) Thank you! -Joe
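A common first step for a case like the one above (two queries that each return a hit, but whose conjunction returns nothing) is to ask Solr how it actually parsed each clause. This is a hedged sketch, not the poster's setup: the host and collection name are placeholders, and the `echo` stands in for the real request so the command can be inspected before running it.

```shell
# Placeholders -- substitute a real host/collection before running.
SOLR="http://localhost:8983/solr/mycollection"   # assumption
Q='id:COLLECT2601697594_T496 AND (person:[80 TO 100])'

# debug=query returns the parsed query tree, which usually reveals
# why AND'ing two individually-matching clauses drops the hit
# (e.g. default-operator surprises or clause grouping).
echo "${SOLR}/select?q=${Q}&debug=query"

# Uncomment to run against a live cluster:
# curl -G "${SOLR}/select" --data-urlencode "q=${Q}" --data-urlencode "debug=query"
```

Comparing the `parsedquery` output of the two working queries against the combined one should show where the clause grouping diverges.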
Re: SolrCloud 8.2.0 - adding a field
Nevermind - I see that I need to specify an existing collection not a schema. There is no collection called UNCLASS - only a schema. -Joe On 4/1/2020 4:52 PM, Joe Obernberger wrote: Hi All - I'm trying this: curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field":{"name":"Para450","type":"text_general","stored":"false","indexed":"true","docValues":"false","multiValued":"false"}}' http://ursula.querymasters.com:9100/api/cores/UNCLASS/schema This results in: { "error":{ "metadata":[ "error-class","org.apache.solr.common.SolrException", "root-error-class","org.apache.solr.common.SolrException"], "msg":"no core retrieved for UNCLASS", "code":404}} I've also tried going to api/c: curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field":{"name":"Para450","type":"text_general","stored":"false","indexed":"true","docValues":"false","multiValued":"false"}}' http://ursula.querymasters.com:9100/api/c/UNCLASS/schema results in: { "error":{ "metadata":[ "error-class","org.apache.solr.common.SolrException", "root-error-class","org.apache.solr.common.SolrException"], "msg":"no such collection or alias", "code":400}} What am I doing wrong? The schema UNCLASS does exist in Zookeeper. Thanks! -Joe
SolrCloud 8.2.0 - adding a field
Hi All - I'm trying this: curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field":{"name":"Para450","type":"text_general","stored":"false","indexed":"true","docValues":"false","multiValued":"false"}}' http://ursula.querymasters.com:9100/api/cores/UNCLASS/schema This results in: { "error":{ "metadata":[ "error-class","org.apache.solr.common.SolrException", "root-error-class","org.apache.solr.common.SolrException"], "msg":"no core retrieved for UNCLASS", "code":404}} I've also tried going to api/c: curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field":{"name":"Para450","type":"text_general","stored":"false","indexed":"true","docValues":"false","multiValued":"false"}}' http://ursula.querymasters.com:9100/api/c/UNCLASS/schema results in: { "error":{ "metadata":[ "error-class","org.apache.solr.common.SolrException", "root-error-class","org.apache.solr.common.SolrException"], "msg":"no such collection or alias", "code":400}} What am I doing wrong? The schema UNCLASS does exist in Zookeeper. Thanks! -Joe
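As the follow-up in this thread notes, both failing calls above target a configset/schema name ("UNCLASS") rather than a collection. A minimal sketch of the corrected call, assuming a hypothetical collection `my_unclass_collection` that was created from the UNCLASS configset (the `echo` stands in for the request; uncomment the curl to run it):

```shell
# The Schema API is addressed per collection (or core), not per
# configset stored in ZooKeeper. COLLECTION is a placeholder.
COLLECTION="my_unclass_collection"   # assumption
PAYLOAD='{"add-field":{"name":"Para450","type":"text_general","stored":false,"indexed":true,"docValues":false,"multiValued":false}}'
URL="http://ursula.querymasters.com:9100/solr/${COLLECTION}/schema"
echo "$URL"
# curl -X POST -H 'Content-type:application/json' --data-binary "$PAYLOAD" "$URL"
```

Because collections created from the same configset share the schema, adding the field through any one such collection updates the shared schema in ZooKeeper.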
Re: Solr 8.2.0 - Schema issue
Erick - tried this, had to run it async, but it's been running for over 24 hours on one collection with: { "responseHeader":{ "status":0, "QTime":18326}, "status":{ "state":"submitted", "msg":"found [5] in submitted tasks"}} I don't see anything in the logs. -Joe On 3/6/2020 1:43 PM, Joe Obernberger wrote: Thank you Erick - I have no record of that, but will absolutely give the API RELOAD a shot! Thank you! -Joe On 3/6/2020 10:26 AM, Erick Erickson wrote: Didn’t we talk about reloading the collections that share the schema after the schema change via the collections API RELOAD command? Best, Erick On Mar 6, 2020, at 05:34, Joe Obernberger wrote: Hi All - any ideas on this? Anything I can try? Thank you! -Joe On 2/26/2020 9:01 AM, Joe Obernberger wrote: Hi All - I have several solr collections all with the same schema. If I add a field to the schema and index it into the collection on which I added the field, it works fine. However, if I try to add a document to a different solr collection that contains the new field (and is using the same schema), I get an error that the field doesn't exist. If I restart the cluster, this problem goes away and I can add a document with the new field to any solr collection that has the schema. Any work-arounds that don't involve a restart? Thank you! -Joe Obernberger
Re: Solr 8.2.0 - Schema issue
Thank you Erick - I have no record of that, but will absolutely give the API RELOAD a shot! Thank you! -Joe On 3/6/2020 10:26 AM, Erick Erickson wrote: Didn’t we talk about reloading the collections that share the schema after the schema change via the collections API RELOAD command? Best, Erick On Mar 6, 2020, at 05:34, Joe Obernberger wrote: Hi All - any ideas on this? Anything I can try? Thank you! -Joe On 2/26/2020 9:01 AM, Joe Obernberger wrote: Hi All - I have several solr collections all with the same schema. If I add a field to the schema and index it into the collection on which I added the field, it works fine. However, if I try to add a document to a different solr collection that contains the new field (and is using the same schema), I get an error that the field doesn't exist. If I restart the cluster, this problem goes away and I can add a document with the new field to any solr collection that has the schema. Any work-arounds that don't involve a restart? Thank you! -Joe Obernberger
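A minimal sketch of the Collections API RELOAD that Erick suggests, issued once per collection sharing the changed schema. Host and collection names here are placeholders, and the `echo` stands in for the request:

```shell
# Placeholders -- substitute the real host and the collections that
# share the modified schema.
SOLR="http://localhost:8983/solr"          # assumption
for COLL in COLLECTION_A COLLECTION_B; do  # assumptions
  # RELOAD makes each collection re-read the shared schema from
  # ZooKeeper without restarting the cluster.
  echo "${SOLR}/admin/collections?action=RELOAD&name=${COLL}"
  # curl "${SOLR}/admin/collections?action=RELOAD&name=${COLL}"
done
```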
Re: Solr 8.2.0 - Schema issue
Hi All - any ideas on this? Anything I can try? Thank you! -Joe On 2/26/2020 9:01 AM, Joe Obernberger wrote: Hi All - I have several solr collections all with the same schema. If I add a field to the schema and index it into the collection on which I added the field, it works fine. However, if I try to add a document to a different solr collection that contains the new field (and is using the same schema), I get an error that the field doesn't exist. If I restart the cluster, this problem goes away and I can add a document with the new field to any solr collection that has the schema. Any work-arounds that don't involve a restart? Thank you! -Joe Obernberger
Solr 8.2.0 - Schema issue
Hi All - I have several solr collections all with the same schema. If I add a field to the schema and index it into the collection on which I added the field, it works fine. However, if I try to add a document to a different solr collection that contains the new field (and is using the same schema), I get an error that the field doesn't exist. If I restart the cluster, this problem goes away and I can add a document with the new field to any solr collection that has the schema. Any work-arounds that don't involve a restart? Thank you! -Joe Obernberger
Split Shard - HDFS Index - Solr 7.6.0
Hi All - Getting this error when trying to split a shard. HDFS has space available, but it looks like it is using the local disk storage value instead of available HDFS disk space. Is there a workaround? Thanks! { "responseHeader": { "status": 0, "QTime": 6 }, "Operation splitshard caused exception:": "org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: not enough free disk space to perform index split on node sys-hadoop-1:9100_solr, required: 306.76734546013176, available: 16.772361755371094", "exception": { "msg": "not enough free disk space to perform index split on node sys-hadoop-1:9100_solr, required: 306.76734546013176, available: 16.772361755371094", "rspCode": 500 }, "status": { "state": "failed", "msg": "found [] in failed tasks" } } -Joe Obernberger
NoClassDefFoundError - Faceting on 8.2.0
) at org.apache.solr.search.facet.FacetRequest.process(FacetRequest.java:392) at org.apache.solr.handler.component.SpatialHeatmapFacets.getHeatmapForField(SpatialHeatmapFacets.java:50) at org.apache.solr.request.SimpleFacets.getHeatmapCounts(SimpleFacets.java:1204) at org.apache.solr.handler.component.FacetComponent.getFacetCounts(FacetComponent.java:334) at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:274) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:305) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199) at org.apache.solr.core.SolrCore.execute(SolrCore.java:2578) at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:780) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:566) ... 42 more But that class is clearly present in the classpath in solr-core-8.2.0.jar. Any idea how this is happening? Thank you! -Joe Obernberger
Re: native Thread - solr 8.2.0
Thank you Erick - it was a mistake for this collection to be running in schemaless mode; I will fix that, but right now the 'PROCESSOR_LOGS' schema only has 10 fields. Another managed schema in the system has over 1,000. Shawn - I did see a post about setting vm.max_map_count higher (it was 65,530) and I increased it to 262144. For the solr user, we're using 102,400 for open files and for max user processes, we use 65,000. -Joe On 12/10/2019 7:46 AM, Erick Erickson wrote: One other red flag is you’re apparently running in “schemaless” mode, based on: org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$AddSchemaFieldsUpdateProcessor.processAdd(AddSchemaFieldsUpdateProcessorFactory.java:475) When running in schemaless mode, if Solr encounters a field in a doc that it hasn’t seen before it will add a new field to the schema. Which will update the schema and reload the collection. If this is happening in the middle of “heavy indexing”, it’s going to clog up the works. Please turn this off. See the message when you create a collection or look at the ref guide for how. The expense is one reason, but the second reason is that you have no control at all about how many fields you have in your index. Solr will merrily create these for _any_ new field. If you’re _lucky_, Solr will guess right. If you’re not lucky, Solr will start refusing to index documents due to field incompatibilities. Say the first value for a field is “1”. Solr guesses it’s an int. The next doc has “1.0”. solr will fail the doc. Next up. When Solr has thousands of fields it starts to bog down due to housekeeping complexity. Do you have any idea how many fields have actually been realized in your index? 5? 50? 100K? The admin UI>>core>>schema will give you an idea. Of course if your input docs are very tightly controlled, this really won’t be a problem, but in that case you don’t need schemaless anyway. Why am I belaboring this? Because this may be the root of your thread issue. 
As you keep throwing docs at Solr, it has to queue them up if it’s making schema changes until the schema is updated and re-distributed to all replicas…. Best, Erick On Dec 10, 2019, at 2:25 AM, Walter Underwood wrote: We’ve run into this fatal problem with 6.6 in prod. It gets overloaded, make 4000 threads, runs out of memory, and dies. Not an acceptable design. Excess load MUST be rejected, otherwise the system goes into a stable congested state. I was working with John Nagle when he figured this out in the late 1980s. https://www.researchgate.net/publication/224734039_On_Packet_Switches_with_Infinite_Storage wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Dec 9, 2019, at 11:14 PM, Mikhail Khludnev wrote: My experience with "OutOfMemoryError: unable to create new native thread" as follows: it occurs on envs where devs refuse to use threadpools in favor of old good new Thread(). Then, it turns rather interesting: If there are plenty of heap, GC doesn't sweep Thread instances. Since they are native in Java, every of them hold some ram for native stack. That exceeds stack space at some point of time. So, check how many thread JVM hold after this particular OOME occurs by jstack; you can even force GC to release that native stack space. Then, rewrite the app, or reduce heap to enforce GC. On Tue, Dec 10, 2019 at 9:44 AM Shawn Heisey wrote: On 12/9/2019 2:23 PM, Joe Obernberger wrote: Getting this error on some of the nodes in a solr cloud during heavy indexing: Caused by: java.lang.OutOfMemoryError: unable to create new native thread Java was not able to start a new thread. Most likely this is caused by the operating system imposing limits on the number of processes or threads that a user is allowed to start. On Linux, the default limit is usually 1024 processes. It doesn't take much for a Solr install to need more threads than that. How to increase the limit will depend on what OS you're running on. 
Typically on Linux, this is controlled by /etc/security/limits.conf. If you're not on Linux, then you'll need to research how to increase the process limit. As long as you're fiddling with limits, you'll probably also want to increase the open file limit. Thanks, Shawn -- Sincerely yours Mikhail Khludnev
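For concreteness, a limits.conf fragment of the kind Shawn describes, using the values Joe reports earlier in this thread (65,000 max user processes, 102,400 open files). These are example values, not recommendations; tune them per workload, and note that changes take effect on the next login session of the `solr` user.

```
# /etc/security/limits.conf -- example values only.
# nproc governs processes/threads (the "unable to create new native
# thread" limit); nofile governs open file descriptors.
solr  soft  nproc   65000
solr  hard  nproc   65000
solr  soft  nofile  102400
solr  hard  nofile  102400
```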
native Thread - solr 8.2.0
(DFSClient.java:1093) at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:463) at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:460) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:474) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1150) at org.apache.solr.store.hdfs.HdfsFileWriter.getOutputStream(HdfsFileWriter.java:51) at org.apache.solr.store.hdfs.HdfsFileWriter.(HdfsFileWriter.java:40) at org.apache.solr.store.hdfs.HdfsDirectory.createOutput(HdfsDirectory.java:114) at org.apache.lucene.store.FilterDirectory.createOutput(FilterDirectory.java:74) at org.apache.solr.store.blockcache.BlockDirectory.createOutput(BlockDirectory.java:351) at org.apache.lucene.store.NRTCachingDirectory.unCache(NRTCachingDirectory.java:301) at org.apache.lucene.store.NRTCachingDirectory.sync(NRTCachingDirectory.java:156) at org.apache.lucene.store.LockValidatingDirectoryWrapper.sync(LockValidatingDirectoryWrapper.java:68) at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4804) at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3276) at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3444) at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3409) at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:671) at org.apache.solr.update.CommitTracker.run(CommitTracker.java:270) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ... 1 more Server is running with these parameters: -DDSTOP.KEY=solrrocks -DSTOP.PORT=8100 -Dhost=puck -Djava.library.path=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native -Djetty.home=/opt/solr8/server -Djetty.port=9100 -Dsolr.autoCommit.maxTime=180 -Dsolr.autoSoftCommit.maxTime=12 -Dsolr.clustering.enabled=true -Dsolr.data.home= -Dsolr.default.confdir=/opt/solr8/server/solr/configsets/_default/conf -Dsolr.install.dir=/opt/solr8 -Dsolr.jetty.https.port=9100 -Dsolr.lock.type=hdfs -Dsolr.log.dir=/opt/solr8/server/logs -Dsolr.log.muteconsole -Dsolr.solr.home=/etc/solr8 -Dsolr.solr.home=/opt/solr8/server/solr -Duser.timezone=UTC -DzkClientTimeout=30 -DzkHost=frodo.querymasters.com:2181,bilbo.querymasters.com:2181,gandalf.querymasters.com:2181,cordelia.querymasters.com:2181,cressida.querymasters.com:2181/solr8.2.0 -XX:+AggressiveOpts -XX:+ParallelRefProcEnabled -XX:+PerfDisableSharedMem -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:+UseGCLogFileRotation -XX:+UseLargePages -XX:-ResizePLAB -XX:G1HeapRegionSize=16m -XX:GCLogFileSize=20M -XX:InitiatingHeapOccupancyPercent=75 -XX:MaxDirectMemorySize=8g -XX:MaxGCPauseMillis=500 -XX:NumberOfGCLogFiles=9 -XX:OnOutOfMemoryError=/opt/solr8/bin/oom_solr.sh 9100 /opt/solr8/server/logs -XX:ParallelGCThreads=8 -Xloggc:/opt/solr8/server/logs/solr_gc.log -Xms20g -Xmx25g -Xss256k -verbose:gc Any ideas? Thanks. -Joe
Re: Solr 8.2.0 - Unable to write response
Thank you Shawn. What I'm trying to get for my application is the commitTimeMSec. I use that value to build up an alias of solr collections. Is there a better way? -Joe On 11/1/2019 10:17 AM, Shawn Heisey wrote: On 11/1/2019 7:20 AM, Joe Obernberger wrote: Hi All - getting this error from only one server in a 45 node cluster when calling COLSTATUS. Any ideas? 2019-11-01 13:17:32.556 INFO (qtp694316372-44709) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/collections params={name=UNCLASS_2019_1_18_36=COLSTATUS=javabin=2} status=0 QTime=94734 2019-11-01 13:17:32.567 INFO (qtp694316372-44688) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/collections params={name=UNCLASS_2021_2_17_36=COLSTATUS=javabin=2} status=0 QTime=815338 2019-11-01 13:17:32.570 INFO (qtp694316372-44688) [ ] o.a.s.s.HttpSolrCall Unable to write response, client closed connection or we are shutting down => org.eclipse.jetty.io.EofException: Closed at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:491) org.eclipse.jetty.io.EofException: Closed Jetty's EofException almost always means one specific thing. The client closed the connection before Solr could respond, so when Solr finally finished processing and tried to have Jetty send the response, there was nowhere to send it -- the connection was gone. The first two lines of the log snippet indicate that there was one COLSTATUS call that took nearly 95 seconds, and one that took 815 seconds, which is close to 15 minutes. Apparently those two calls completed at the same time, even though they did not start at the same time. Which suggests that for many minutes, the Solr server has been under severe stress that prevented it from responding quickly to the COLSTATUS request. The client gave up and closed its connection ... probably on the one that took almost 15 minutes. 
I found that the COLSTATUS action is mentioned in the 8.1 reference guide on the "Collections API" page, but it is not there on the same page in the 8.2 guide. That page in 8.2 appears to be significantly smaller and missing most of the documentation that 8.1 has. So I think we had a problem with documentation generation on 8.2. Based on what I can see in the response on the 8.1 guide, I'm betting that gathering COLSTATUS takes a fair amount of processing, and any performance issues will make it very slow. Thanks, Shawn
Solr 8.2.0 - Unable to write response
610] at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480) ~[jetty-servlet-9.4.19.v20190610.jar:9.4.19.v20190610] at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1678) ~[jetty-server-9.4.19.v20190610.jar:9.4.19.v20190610] at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201) ~[jetty-server-9.4.19.v20190610.jar:9.4.19.v20190610] at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1249) ~[jetty-server-9.4.19.v20190610.jar:9.4.19.v20190610] -Joe
SolrCloud streaming innerJoin unexplained results
I don't believe I am getting expected results when using a streaming expression that simply uses innerJoin. Here's the example: innerJoin( search(illuminate, q=(mrn:123) (*:*), fl="key,mrn", sort="mrn asc"), search(illuminate, q=(foo*), fl="key,mrn,*", sort="mrn asc"), on="mrn" ) All documents in my scenario are sharded on the mrn field, so only one shard has documents matching the left query, while all shards match the right. The problem I'm seeing is that I get no results back for this query, even though one of the shards has matches on both the left and the right. When I add mrn:123 to the right side I do get documents back, presumably because it then behaves like Scenario 1 below. Here's what I'm noticing (#Left/#Right are match counts per shard):

Scenario 1: no matches for either side on the first shard.
Shard: #Left, #Right
1: 0, 0
2: 1, 100
Scenario 1 doesn't match anything in the first shard but does match in the second; the result is what I would get if I only queried the second shard. This is great!

Scenario 2: the first shard matches something on the right but not the left.
Shard: #Left, #Right
1: 0, 100
2: 1, 100
In Scenario 2 I get back no results.
Solr 8.2 - Added Field - can't facet using alias
Hi All, I've added a field with: curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field":{"name":"FaceCluster","type":"plongs","stored":false,"multiValued":true,"indexed":true}}' http://miranda:9100/solr/UNCLASS_2019_8_5_36/schema It returned success. In the UI, when I examine the schema, the new field shows up, but it only lists 'Properties' for FaceCluster and does not list 'Schema' with the check-boxes for Indexed/DocValues etc. Other plong fields that were added a while back show both Properties and Schema. When I try to facet on this field through an alias, I get 'Error from server at null: undefined field: FaceCluster'. If I search an individual solr collection, I can facet on it. Any ideas? -Joe
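One plausible cause of the "undefined field" error through an alias (hedged; not confirmed in the thread) is that the alias spans several collections and only one of them knows the new field. A sketch that pushes the add-field to each member and then reloads it; the second collection name is invented for illustration, and the `echo` stands in for the real requests:

```shell
# Host from the post above; the alias member list is an assumption --
# replace with the collections your alias actually covers.
SOLR="http://miranda:9100/solr"
FIELD='{"add-field":{"name":"FaceCluster","type":"plongs","stored":false,"multiValued":true,"indexed":true}}'
for COLL in UNCLASS_2019_8_5_36 UNCLASS_2019_8_6_36; do  # example members
  echo "${SOLR}/${COLL}/schema <- ${FIELD}"
  # curl -X POST -H 'Content-type:application/json' --data-binary "$FIELD" "${SOLR}/${COLL}/schema"
  # curl "${SOLR}/admin/collections?action=RELOAD&name=${COLL}"
done
```

If the collections share one configset, a single add-field followed by a RELOAD of each member should have the same effect.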
8.2.0 - REPLACENODE
Hi All - I just ran the REPLACENODE command on a cluster with 5 nodes in it. I ran the command async, and it failed with: { "responseHeader":{ "status":0, "QTime":11}, "Operation replacenode caused exception:":"java.util.concurrent.RejectedExecutionException:java.util.concurrent.RejectedExecutionException: Task org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$52/1374786673@18107b06 rejected from org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor@2c993425[Running, pool size = 10, active threads = 10, queued tasks = 0, completed tasks = 0]", "exception":{ "msg":"Task org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$52/1374786673@18107b06 rejected from org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor@2c993425[Running, pool size = 10, active threads = 10, queued tasks = 0, completed tasks = 0]", "rspCode":-1}, "status":{ "state":"failed", "msg":"found [1234] in failed tasks"}} Prior to running the command, each shard had two replicas. Now some shards have 4, and some 3. In addition the auto scaling policy of: cluster-policy":[{ "replica":"<2", "shard":"#EACH", "node":"#ANY"}], seems to be ignored as many collections have the same node hosting multiple replicas. Is this related to JIRA: https://issues.apache.org/jira/browse/SOLR-13586 ? Thank you! -Joe
Re: auto scaling question - solr 8.2.0
Just as another data point. I just tried again, and this time, I got an error from one of the remaining 3 nodes: Error while trying to recover. core=UNCLASS_2019_6_8_36_shard2_replica_n21:java.util.concurrent.ExecutionException: org.apache.solr.client.solrj.SolrServerException: IOException occurred when talking to server at: http://telesto:9100/solr at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:902) at org.apache.solr.cloud.RecoveryStrategy.doSyncOrReplicateRecovery(RecoveryStrategy.java:603) at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:336) at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:317) at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occurred when talking to server at: http://telesto:9100/solr at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:670) at org.apache.solr.client.solrj.impl.HttpSolrClient.lambda$httpUriRequest$0(HttpSolrClient.java:306) ... 
5 more Caused by: java.net.SocketException: Socket closed at java.net.SocketInputStream.read(SocketInputStream.java:204) at java.net.SocketInputStream.read(SocketInputStream.java:141) at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137) at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153) at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282) at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138) at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56) at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259) at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163) at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165) at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273) at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125) at org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:120) at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:555) ... 6 more At this point, no nodes are hosting one of the collections. 
-Joe On 9/26/2019 1:32 PM, Joe Obernberger wrote: Hi all - I have a 4 node cluster for test, and created several solr collections with 2 shards and 2 replicas each. I'd like the global policy to be to not place more than one replica of the same shard on the same node. I did this with this curl command: curl -X POST -H 'Content-type:application/json' --data-binary '{"set-cluster-policy":[{"replica": "<2", "shard": "#EACH", "node": "#ANY"}]}' http://localhost:9100/solr/admin/autoscaling Creating the collections works great - they are distributed across the nodes nicely. When I turn a node off, however, (going from 4 nodes to 3), the same node was assigned to not only be both replicas of a shard, but one node is now hosting all of the replicas of a collection ie: collection->shard1>replica1,replica2 collection->shard2->replica1,replica2
auto scaling question - solr 8.2.0
Hi all - I have a 4 node cluster for test, and created several solr collections with 2 shards and 2 replicas each. I'd like the global policy to be to not place more than one replica of the same shard on the same node. I did this with this curl command: curl -X POST -H 'Content-type:application/json' --data-binary '{"set-cluster-policy":[{"replica": "<2", "shard": "#EACH", "node": "#ANY"}]}' http://localhost:9100/solr/admin/autoscaling Creating the collections works great - they are distributed across the nodes nicely. When I turn a node off, however, (going from 4 nodes to 3), the same node was assigned to not only be both replicas of a shard, but one node is now hosting all of the replicas of a collection ie: collection->shard1>replica1,replica2 collection->shard2->replica1,replica2 all of those replicas above are hosted by the same node. What am I doing wrong here? Thank you! -Joe
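Restating the curl command from the message above as a sketch, with the JSON payload held in a variable so it can be inspected before POSTing (host and port are placeholders):

```shell
# The cluster policy payload: at most one replica of any given shard per node.
SOLR="http://localhost:9100/solr"
POLICY='{"set-cluster-policy":[{"replica":"<2","shard":"#EACH","node":"#ANY"}]}'
# Apply it:
#   curl -X POST -H 'Content-type:application/json' --data-binary "$POLICY" "$SOLR/admin/autoscaling"
# Read the active policy back to confirm it was stored:
#   curl "$SOLR/admin/autoscaling"
echo "$POLICY"
```

Reading the policy back with a GET on the same endpoint is a quick way to verify the POST actually took effect before testing node-failure behavior.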
HDFS Shard Split
Hi All - I added a couple more Solr nodes to an existing SolrCloud cluster where the index is in HDFS. When I try to split a shard, I get an error saying there is not enough disk space. It looks like it is looking on the local file system, and not in HDFS. "Operation splitshard caused exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: not enough free disk space to perform index split on node -Joe
Re: Clustering error - Solr 8.2
Mystery solved. I added 'features' to the schema, next error was name, then manu, sku, and cat. These are defined in solrconfig.xml under browse: text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4 title^10.0 description^5.0 keywords^5.0 author^2.0 resourcename^1.0 text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4 title^10.0 description^5.0 keywords^5.0 author^2.0 resourcename^1.0 name="mlt.fl">text,features,name,sku,id,manu,cat,title,description,keywords,author,resourcename and under the clustering request handler: text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4 What's odd is that this doesn't cause an issue with 7.x, but does with 8.2. Removed the fields that my schema doesn't have and clustering works on the fields I have defined for carrot2. -Joe On 8/29/2019 10:39 AM, Jörn Franke wrote: Maybe there are more details in the logfiles? It could be also that a parameter is configured with a different default? Try also to change the Solr version in solrconfig.xml to a higher one, e.g. 8.0.0 Am 29.08.2019 um 16:12 schrieb Joe Obernberger : Thank you Erick. I'm upgrading from 7.6.0 and as far as I can tell the schema and configuration (solrconfig.xml) isn't different (apart from the version). Right now, I'm at a loss. I still have the 7.6.0 cluster running and the query works OK there. Sure seems like I'm missing a field called 'features', but it's not defined in the prior schema either. Thanks again! -Joe On 8/28/2019 6:19 PM, Erick Erickson wrote: What it says ;) My guess is that your configuration mentions the field “features” in, perhaps carrot.snippet or carrot.title. But it’s a guess. 
Best, Erick On Aug 28, 2019, at 5:18 PM, Joe Obernberger wrote: Hi All - trying to use clustering with SolrCloud 8.2, but getting this error: "msg":"Error from server at null: org.apache.solr.search.SyntaxError: Query Field 'features' is not a valid field name", The URL I'm using is: http://solrServer:9100/solr/DOCS/select?q=*%3A*&qt=/clustering&clustering=true&clustering.collection=true Thanks for any ideas! Complete response: { "responseHeader":{ "zkConnected":true, "status":400, "QTime":38, "params":{ "q":"*:*", "qt":"/clustering", "clustering":"true", "clustering.collection":"true"}}, "error":{ "metadata":[ "error-class","org.apache.solr.common.SolrException", "root-error-class","org.apache.solr.common.SolrException", "error-class","org.apache.solr.client.solrj.impl.BaseHttpSolrClient$RemoteSolrException", "root-error-class","org.apache.solr.client.solrj.impl.BaseHttpSolrClient$RemoteSolrException"], "msg":"Error from server at null: org.apache.solr.search.SyntaxError: Query Field 'features' is not a valid field name", "code":400}} -Joe --- This email has been checked for viruses by AVG. https://www.avg.com
Re: Clustering error - Solr 8.2
Thank you Erick. I'm upgrading from 7.6.0 and as far as I can tell the schema and configuration (solrconfig.xml) isn't different (apart from the version). Right now, I'm at a loss. I still have the 7.6.0 cluster running and the query works OK there. Sure seems like I'm missing a field called 'features', but it's not defined in the prior schema either. Thanks again! -Joe On 8/28/2019 6:19 PM, Erick Erickson wrote: What it says ;) My guess is that your configuration mentions the field “features” in, perhaps carrot.snippet or carrot.title. But it’s a guess. Best, Erick On Aug 28, 2019, at 5:18 PM, Joe Obernberger wrote: Hi All - trying to use clustering with SolrCloud 8.2, but getting this error: "msg":"Error from server at null: org.apache.solr.search.SyntaxError: Query Field 'features' is not a valid field name", The URL I'm using is: http://solrServer:9100/solr/DOCS/select?q=*%3A*&qt=/clustering&clustering=true&clustering.collection=true Thanks for any ideas! Complete response: { "responseHeader":{ "zkConnected":true, "status":400, "QTime":38, "params":{ "q":"*:*", "qt":"/clustering", "clustering":"true", "clustering.collection":"true"}}, "error":{ "metadata":[ "error-class","org.apache.solr.common.SolrException", "root-error-class","org.apache.solr.common.SolrException", "error-class","org.apache.solr.client.solrj.impl.BaseHttpSolrClient$RemoteSolrException", "root-error-class","org.apache.solr.client.solrj.impl.BaseHttpSolrClient$RemoteSolrException"], "msg":"Error from server at null: org.apache.solr.search.SyntaxError: Query Field 'features' is not a valid field name", "code":400}} -Joe
Clustering error - Solr 8.2
Hi All - trying to use clustering with SolrCloud 8.2, but getting this error: "msg":"Error from server at null: org.apache.solr.search.SyntaxError: Query Field 'features' is not a valid field name", The URL I'm using is: http://solrServer:9100/solr/DOCS/select?q=*%3A*&qt=/clustering&clustering=true&clustering.collection=true Thanks for any ideas! Complete response: { "responseHeader":{ "zkConnected":true, "status":400, "QTime":38, "params":{ "q":"*:*", "qt":"/clustering", "clustering":"true", "clustering.collection":"true"}}, "error":{ "metadata":[ "error-class","org.apache.solr.common.SolrException", "root-error-class","org.apache.solr.common.SolrException", "error-class","org.apache.solr.client.solrj.impl.BaseHttpSolrClient$RemoteSolrException", "root-error-class","org.apache.solr.client.solrj.impl.BaseHttpSolrClient$RemoteSolrException"], "msg":"Error from server at null: org.apache.solr.search.SyntaxError: Query Field 'features' is not a valid field name", "code":400}} -Joe
Re: Solr on HDFS
Hi Kyle - Thank you. Our current index is split across 3 solr collections; our largest collection is 26.8TBytes (80.5TBytes when 3x replicated in HDFS) across 100 shards. There are 40 machines hosting this cluster. We've found that with large collections, having no replicas (but lots of shards) ends up being more reliable, since recovery time is much shorter. We keep another 30 day index (1.4TBytes) that does have replicas (40 shards, 3 replicas each), and if a node goes down, we manually delete lock files and then bring it back up and yes - lots of network IO, but it usually recovers OK. Having a large collection like this with no replicas seems like a recipe for disaster. So, we've been experimenting with the latest version (8.2) and changing our index process to split up the data into many solr collections that do have replicas, and then building the list of collections to search at query time. Our searches are date based, so we can define what collections we want to query at query time. As a test, we ran just two machines, HDFS, and 500 collections. One server ran out of memory and crashed. We had over 1,600 lock files to delete. If you think about it, having a shard with 3 replicas on top of a file system that does 3x replication seems a little excessive! I'd love to see Solr take more advantage of a shared FS. Perhaps an idea is to use HDFS with an NFS gateway, though that seems like it may be slow. Architecturally, I love only having one large file system to manage instead of lots of individual file systems across many machines. HDFS makes this easy. -Joe On 8/2/2019 9:10 AM, lstusr 5u93n4 wrote: Hi Joe, We fought with Solr on HDFS for quite some time, and faced similar issues as you're seeing. 
(See this thread, for example: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201812.mbox/%3cCABd9LjTeacXpy3FFjFBkzMq6vhgu7Ptyh96+w-KC2p=-rqk...@mail.gmail.com%3e ) The Solr lock files on HDFS get deleted if the Solr server gets shut down gracefully, but we couldn't always guarantee that in our environment so we ended up writing a custom startup script to search for lock files on HDFS and delete them before solr startup. However, the issue that you mention of the Solr server rebuilding its whole index from replicas on startup was enough of a show-stopper for us that we switched away from HDFS to local disk. It literally made the difference between 24+ hours of recovery time after an unexpected outage to less than a minute... If you do end up finding a solution to this issue, please post it to this mailing list, because there are others out there (like us!) who would most definitely make use of it. Thanks Kyle On Fri, 2 Aug 2019 at 08:58, Joe Obernberger wrote: Thank you. No, while the cluster is using Cloudera for HDFS, we do not use Cloudera to manage the solr cluster. If it is a configuration/architecture issue, what can I do to fix it? I'd like a system where servers can come and go, but the indexes stay available and recover automatically. Is that possible with HDFS? While adding an alias to other collections would be an option, if that collection is the only collection, or one that is currently needed, in a live system, we can't bring it down, re-create it, and re-index when that process may take weeks to do. Any ideas? -Joe On 8/1/2019 6:15 PM, Angie Rabelero wrote: I don’t think you’re using Cloudera or Ambari, but Ambari has an option to delete the locks. This seems more a configuration/architecture issue than a reliability issue. You may want to spin up an alias while you bring down, clear locks and directories, recreate and index the affected collection, while you work your other issues. 
On Aug 1, 2019, at 16:40, Joe Obernberger wrote: Been using Solr on HDFS for a while now, and I'm seeing an issue with redundancy/reliability. If a server goes down, when it comes back up, it will never recover because of the lock files in HDFS. That solr node needs to be brought down manually, the lock files deleted, and then brought back up. At that point, it appears to copy all the data for its replicas. If the index is large, and new data is being indexed, in some cases it will never recover. The replication retries over and over. How can we make a reliable Solr Cloud cluster when using HDFS that can handle servers coming and going? Thank you! -Joe
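A rough sketch of the kind of pre-start cleanup script Kyle describes, shown here as a dry run that only prints the commands. The HDFS root is an assumption (wherever solr.hdfs.home points in your setup), and the file name follows Lucene's write.lock convention; only remove a lock when you are certain no live Solr core still owns it:

```shell
# Build the HDFS listing and removal commands for leftover write.lock files.
# This sketch echoes them instead of executing anything.
HDFS_INDEX_ROOT="/solr"   # assumption: the solr.hdfs.home path
LIST_CMD="hdfs dfs -ls -R $HDFS_INDEX_ROOT"
RM_CMD="hdfs dfs -rm"
# Find the stale locks:
echo "$LIST_CMD | grep write.lock"
# Then, for each path found, remove it before starting Solr:
echo "$RM_CMD <each write.lock path found above>"
```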
Re: Solr on HDFS
Thank you. No, while the cluster is using Cloudera for HDFS, we do not use Cloudera to manage the solr cluster. If it is a configuration/architecture issue, what can I do to fix it? I'd like a system where servers can come and go, but the indexes stay available and recover automatically. Is that possible with HDFS? While adding an alias to other collections would be an option, if that collection is the only collection, or one that is currently needed, in a live system, we can't bring it down, re-create it, and re-index when that process may take weeks to do. Any ideas? -Joe On 8/1/2019 6:15 PM, Angie Rabelero wrote: I don’t think you’re using Cloudera or Ambari, but Ambari has an option to delete the locks. This seems more a configuration/architecture issue than a reliability issue. You may want to spin up an alias while you bring down, clear locks and directories, recreate and index the affected collection, while you work your other issues. On Aug 1, 2019, at 16:40, Joe Obernberger wrote: Been using Solr on HDFS for a while now, and I'm seeing an issue with redundancy/reliability. If a server goes down, when it comes back up, it will never recover because of the lock files in HDFS. That solr node needs to be brought down manually, the lock files deleted, and then brought back up. At that point, it appears to copy all the data for its replicas. If the index is large, and new data is being indexed, in some cases it will never recover. The replication retries over and over. How can we make a reliable Solr Cloud cluster when using HDFS that can handle servers coming and going? Thank you! -Joe
Solr on HDFS
Been using Solr on HDFS for a while now, and I'm seeing an issue with redundancy/reliability. If a server goes down, when it comes back up, it will never recover because of the lock files in HDFS. That solr node needs to be brought down manually, the lock files deleted, and then brought back up. At that point, it appears to copy all the data for its replicas. If the index is large, and new data is being indexed, in some cases it will never recover. The replication retries over and over. How can we make a reliable Solr Cloud cluster when using HDFS that can handle servers coming and going? Thank you! -Joe
Solrj + Aliases
Hi All - I've created an alias, but when I try to index to the alias using CloudSolrClient, I get 'Collection not Found: TestAlias'. Can you not use an alias name to index to with CloudSolrClient? This is with SolrCloud 8.1. Thanks! -Joe
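As a point of comparison, the alias can be created and verified through the Collections API directly; a sketch with placeholder names (whether CloudSolrClient resolves aliases for updates in 8.1 is exactly the open question here, so the LISTALIASES check at least confirms the alias exists server-side):

```shell
# Construct the CREATEALIAS and LISTALIASES URLs; dry run only prints them.
SOLR="http://localhost:9100/solr"
CREATE_URL="$SOLR/admin/collections?action=CREATEALIAS&name=TestAlias&collections=MyCollection"
LIST_URL="$SOLR/admin/collections?action=LISTALIASES"
#   curl "$CREATE_URL"
#   curl "$LIST_URL"
echo "$CREATE_URL"
echo "$LIST_URL"
```

If the client still rejects the alias, one workaround is to look up the alias target via LISTALIASES and index to the underlying collection name instead.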
ShardSplit with HDFS
Hi All - I'm running an index in HDFS and trying to do a SHARDSPLIT. It is returning that there is "not enough free disk space to perform index split". It looks like it is using the local disk to determine free disk space instead of HDFS. Is there a way around this? I'm running SolrCloud version 7.6.0 on 4 nodes. Thank you! -Joe Obernberger
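For completeness, a sketch of the SPLITSHARD call being discussed (collection and shard names are placeholders). Running it async means the failure details can be pulled back afterwards with REQUESTSTATUS rather than lost in the synchronous response:

```shell
# Construct the async SPLITSHARD URL and its status poll; dry run only prints them.
SOLR="http://localhost:9100/solr"
SPLIT_URL="$SOLR/admin/collections?action=SPLITSHARD&collection=MyCollection&shard=shard1&async=split-1"
STATUS_URL="$SOLR/admin/collections?action=REQUESTSTATUS&requestid=split-1"
#   curl "$SPLIT_URL"
#   curl "$STATUS_URL"
echo "$SPLIT_URL"
```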
Re: Solr Migration to The AWS Cloud
Ooohh...interesting. Then, presumably there is some way to have what was the cross-data-center replica become the new "primary"? It's getting too easy! Joe
Solr Migration to The AWS Cloud
Hi, Our application is migrating from on-premise to AWS. We are currently on Solr Cloud 7.3.0. We are interested in exploring ways to do this with minimal down-time, as in, maybe one hour. One strategy would be to set up a new empty Solr Cloud instance in AWS, and reindex the world. But reindexing takes us around ~14 hours, so, that is not a viable approach. I think one very attractive option would be to set up a new live node/replica in AWS, and, once it replicates, we're essentially done--literally zero down time (for search anyway). But I don't think we're going to be able to do that from a networking/security perspective. From what I've seen, the other option is to copy the Solr index files to AWS, and somehow use them to set up a new pre-indexed instance. Do I need to shut down my application and Solr on prem before I copy the files, or can I copy while things are active? If I can do the copy while the application is running, I can probably: 1. Copy files to AWS Friday at noon 2. Keep a record of what got re-indexed after Friday at noon (or, heck, 11:45am) 3. Start up the new Solr in AWS against the copied files 4. Reindex the stuff that got re-indexed after Friday at noon Is there a cleaner/simpler/more official way of moving an index from one place to another? Export/import, or something like that? Thanks for any help! Joe
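On the "more official way" question: the Collections API has BACKUP and RESTORE actions that snapshot a collection's index and config to a location visible to every node. A hedged sketch with placeholder names and paths (verify the exact parameters against the 7.3 reference guide before relying on this):

```shell
# Construct BACKUP/RESTORE URLs for moving a collection; dry run only prints them.
SOLR="http://localhost:8983/solr"
BACKUP_URL="$SOLR/admin/collections?action=BACKUP&name=mybackup&collection=MyCollection&location=/mnt/backups"
RESTORE_URL="$SOLR/admin/collections?action=RESTORE&name=mybackup&collection=MyCollection&location=/mnt/backups"
#   curl "$BACKUP_URL"     # on the old cluster
#   ... transfer /mnt/backups/mybackup to storage reachable by the AWS cluster ...
#   curl "$RESTORE_URL"    # on the new cluster
echo "$BACKUP_URL"
echo "$RESTORE_URL"
```

The delta-reindex step (items 2 and 4 above) would still be needed for anything indexed after the backup was taken.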
Re: Newbie permissions problem running solr
One day I will learn to type. In the meanwhile the command, as root, is chown -R solr:users solr. That means creating that username if it is not present. Thanks, Joe D. On 30/05/2019 20:12, Joe Doupnik wrote: On 30/05/2019 20:04, Bernard T. Higonnet wrote: Hello, I have installed solr from ports under FreeBSD 12.0 and I am trying to run solr as described in the Solr Quick Start tutorial. I keep getting permission errors: /usr/local/solr/example/cloud/node2/solr/../logs could not be created. Exiting Apart from the fact that I find it bizarre that it doesn't put its logs in some 'standard' writable place, the ".." perturbs me. Does it mean there's stuff there which I don't know what it is (but it doesn't want to tell me?). He knows how to write long messages so what's the problem? I have tried making various places writable, but clearly I don't know what the ".." means... Any help appreciated. TIA Bernard Higonnet --- In my own work, now and then I encounter exactly that problem. I then recall that the Solr material expects to be owned by user solr, and group users on Linux. Thus a chmod -R solr:users solr command would take care of the problem. Thanks, Joe D.
Re: Newbie permissions problem running solr
On 30/05/2019 20:04, Bernard T. Higonnet wrote: Hello, I have installed solr from ports under FreeBSD 12.0 and I am trying to run solr as described in the Solr Quick Start tutorial. I keep getting permission errors: /usr/local/solr/example/cloud/node2/solr/../logs could not be created. Exiting Apart from the fact that I find it bizarre that it doesn't put its logs in some 'standard' writable place, the ".." perturbs me. Does it mean there's stuff there which I don't know what it is (but it doesn't want to tell me?). He knows how to write long messages so what's the problem? I have tried making various places writable, but clearly I don't know what the ".." means... Any help appreciated. TIA Bernard Higonnet --- In my own work, now and then I encounter exactly that problem. I then recall that the Solr material expects to be owned by user solr, and group users on Linux. Thus a chmod -R solr:users solr command would take care of the problem. Thanks, Joe D.
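Putting the thread together (note the follow-up correction: the command is chown, not chmod), a minimal sketch of the fix. The install path and the group name are assumptions from the messages above; the commented commands would be run as root:

```shell
# Build the ownership-fix command; dry run only prints it.
SOLR_DIR="/usr/local/solr"   # assumption: the FreeBSD ports install location
FIX_CMD="chown -R solr:users $SOLR_DIR"
#   id solr >/dev/null 2>&1 || pw useradd solr -g users   # create the user if missing
#   chown -R solr:users "$SOLR_DIR"
echo "$FIX_CMD"
```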
Re: Solr 7.6.0 - won't elect leader
Thank you Walter. I ended up dropping the collection. We have two primary collections - one is all the data (100 shards, no replicas), and one is 30 days of data (40 shards, 3 replicas each). We hardly ever have any issues with the collection with no replicas. I tried bringing down the nodes several times. I then updated the zookeeper node and put the necessary information into it with a leader selected. Then I restarted the nodes again - no luck. -Joe On 5/30/2019 10:42 AM, Walter Underwood wrote: We had a 6.6.2 prod cluster get into a state like this. It did not have an overseer, so any command just sat in the overseer queue. After I figured that out, I could see a bunch of queued stuff in the tree view under /overseer. That included an ADDROLE command to set an overseer. Sigh. Fixed it by shutting down all the nodes, then bringing up one. That one realized there was no overseer and assumed the role. Then we brought up the rest of the nodes. I do not know how it got into that situation. We had some messed up networking conditions where I could HTTP from node A to port 8983 on node B, but it would hang when I tried that from B to A. This is all in AWS. Yours might be different. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On May 30, 2019, at 5:47 AM, Joe Obernberger wrote: More info - looks like a zookeeper node got deleted somehow. 
NoNode for /collections/UNCLASS_30DAYS/leaders/shard31/leader I then made that node using solr zk mkroot, and now I get the error: :org.apache.solr.common.SolrException: Error getting leader from zk for shard shard31 at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:1299) at org.apache.solr.cloud.ZkController.register(ZkController.java:1150) at org.apache.solr.cloud.ZkController.register(ZkController.java:1081) at org.apache.solr.core.ZkContainer.lambda$registerInZk$0(ZkContainer.java:187) at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.solr.common.SolrException: Could not get leader props at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1346) at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1310) at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:1266) ... 7 more Caused by: java.lang.NullPointerException at org.apache.solr.common.util.Utils.fromJSON(Utils.java:239) at org.apache.solr.common.cloud.ZkNodeProps.load(ZkNodeProps.java:92) at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1328) ... 9 more Can I manually enter information for the leader? How would I get that? -Joe On 5/30/2019 8:39 AM, Joe Obernberger wrote: Hi All - I have a 40 node cluster that has been running great for a long while, but it all came down due to OOM. I adjusted the parameters and restarted, but one shard with 3 replicas (all NRT) will not elect a leader. 
I see messages like: 2019-05-30 12:35:30.597 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.c.SyncStrategy Sync replicas to http://elara:9100/solr/UNCLASS_30DAYS_shard31_replica_n182/ 2019-05-30 12:35:30.597 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.u.PeerSync PeerSync: core=UNCLASS_30DAYS_shard31_replica_n182 url=http://elara:9100/solr START replicas=[http://enceladus:9100/solr/UNCLASS_30DAYS_shard31_replica_n180/, http://rosalind:9100/solr/UNCLASS_30DAYS_shard31_replica_n184/] nUpdates=100 2019-05-30 12:35:30.651 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.u.PeerSync PeerSync: core=UNCLASS_30DAYS_shard31_replica_n182 url=http://elara:9100/solr Received 100 versions from http://enceladus:9100/solr/UNCLASS_30DAYS_shard31_replica_n180/ fingerprint:null 2019-05-30 12:35:30.652 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.u.PeerSync PeerSync: core=UNCLASS_30DAYS_shard31_replica_n182 url=http://elara:9100/solr Our versions are too old. ourHighThreshold=1634891841359839232 otherLowThreshold=1634892098551414784 ourHighest=1634892003501146112 otherHighest=1634892708023631872 2019-05-30 12:35:30.652 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.u.PeerSync PeerSync: core=UNCLASS_30DAYS_shard31_replica_n182 url=http://elara:9100/solr DONE. sync failed 2019-05-30 12:35:30.652 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS
Re: Solr 7.6.0 - won't elect leader
More info - looks like a zookeeper node got deleted somehow. NoNode for /collections/UNCLASS_30DAYS/leaders/shard31/leader I then made that node using solr zk mkroot, and now I get the error: :org.apache.solr.common.SolrException: Error getting leader from zk for shard shard31 at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:1299) at org.apache.solr.cloud.ZkController.register(ZkController.java:1150) at org.apache.solr.cloud.ZkController.register(ZkController.java:1081) at org.apache.solr.core.ZkContainer.lambda$registerInZk$0(ZkContainer.java:187) at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.solr.common.SolrException: Could not get leader props at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1346) at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1310) at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:1266) ... 7 more Caused by: java.lang.NullPointerException at org.apache.solr.common.util.Utils.fromJSON(Utils.java:239) at org.apache.solr.common.cloud.ZkNodeProps.load(ZkNodeProps.java:92) at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1328) ... 9 more Can I manually enter information for the leader? How would I get that? -Joe On 5/30/2019 8:39 AM, Joe Obernberger wrote: Hi All - I have a 40 node cluster that has been running great for a long while, but it all came down due to OOM. I adjusted the parameters and restarted, but one shard with 3 replicas (all NRT) will not elect a leader. 
I see messages like: 2019-05-30 12:35:30.597 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.c.SyncStrategy Sync replicas to http://elara:9100/solr/UNCLASS_30DAYS_shard31_replica_n182/ 2019-05-30 12:35:30.597 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.u.PeerSync PeerSync: core=UNCLASS_30DAYS_shard31_replica_n182 url=http://elara:9100/solr START replicas=[http://enceladus:9100/solr/UNCLASS_30DAYS_shard31_replica_n180/, http://rosalind:9100/solr/UNCLASS_30DAYS_shard31_replica_n184/] nUpdates=100 2019-05-30 12:35:30.651 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.u.PeerSync PeerSync: core=UNCLASS_30DAYS_shard31_replica_n182 url=http://elara:9100/solr Received 100 versions from http://enceladus:9100/solr/UNCLASS_30DAYS_shard31_replica_n180/ fingerprint:null 2019-05-30 12:35:30.652 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.u.PeerSync PeerSync: core=UNCLASS_30DAYS_shard31_replica_n182 url=http://elara:9100/solr Our versions are too old. ourHighThreshold=1634891841359839232 otherLowThreshold=1634892098551414784 ourHighest=1634892003501146112 otherHighest=1634892708023631872 2019-05-30 12:35:30.652 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.u.PeerSync PeerSync: core=UNCLASS_30DAYS_shard31_replica_n182 url=http://elara:9100/solr DONE. 
sync failed 2019-05-30 12:35:30.652 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.c.SyncStrategy Leader's attempt to sync with shard failed, moving to the next candidate 2019-05-30 12:35:30.683 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.c.ShardLeaderElectionContext There may be a better leader candidate than us - going back into recovery 2019-05-30 12:35:30.693 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.c.ShardLeaderElectionContextBase No version found for ephemeral leader parent node, won't remove previous leader registration. 2019-05-30 12:35:30.694 WARN (updateExecutor-3-thread-4-processing-n:elara:9100_solr x:UNCLASS_30DAYS_shard31_replica_n182 c:UNCLASS_30DAYS s:shard31 r:core_node185) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.c.RecoveryStrategy Stopping recovery for core=[UNCLASS_30DAYS_shard31_replica_n182] coreNodeName=[core_node185] and 2019-05-30 12:25:39.522 INFO (zkCallback-7-thread-1) [c:UNCLASS_30DAYS s:shard31 r:core_node187 x:UNCLASS_30DAYS_shard31_replica_n184] o.a.s.c.ActionThrottle Throttling leader attempts - waiting for 136ms 2019-05-30 12:25:39.672 INFO (zkCallback-7-thread-1) [c:UNCLASS_30DAYS s:shard31 r:core_node187
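Before hand-editing anything under /collections, it may help to compare the broken shard's znodes against a healthy shard's using Solr's bundled zk tool, since the leader znode content should be copied from a working example rather than written from scratch. A sketch (ZK host is a placeholder; the collection paths come from the error above, and the exact znode layout varies by version, so treat this as exploratory only):

```shell
# Paths of a healthy and the broken leader znode; dry run only prints them.
ZK="zkhost:2181"
HEALTHY="/collections/UNCLASS_30DAYS/leaders/shard1/leader"
BROKEN="/collections/UNCLASS_30DAYS/leaders/shard31/leader"
# List the leader tree and pull down a healthy leader's JSON for comparison:
#   bin/solr zk ls -r /collections/UNCLASS_30DAYS/leaders -z "$ZK"
#   bin/solr zk cp zk:$HEALTHY /tmp/healthy-leader.json -z "$ZK"
echo "$BROKEN"
```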
Solr 7.6.0 - won't elect leader
Hi All - I have a 40 node cluster that has been running great for a long while, but it all came down due to OOM. I adjusted the parameters and restarted, but one shard with 3 replicas (all NRT) will not elect a leader. I see messages like: 2019-05-30 12:35:30.597 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.c.SyncStrategy Sync replicas to http://elara:9100/solr/UNCLASS_30DAYS_shard31_replica_n182/ 2019-05-30 12:35:30.597 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.u.PeerSync PeerSync: core=UNCLASS_30DAYS_shard31_replica_n182 url=http://elara:9100/solr START replicas=[http://enceladus:9100/solr/UNCLASS_30DAYS_shard31_replica_n180/, http://rosalind:9100/solr/UNCLASS_30DAYS_shard31_replica_n184/] nUpdates=100 2019-05-30 12:35:30.651 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.u.PeerSync PeerSync: core=UNCLASS_30DAYS_shard31_replica_n182 url=http://elara:9100/solr Received 100 versions from http://enceladus:9100/solr/UNCLASS_30DAYS_shard31_replica_n180/ fingerprint:null 2019-05-30 12:35:30.652 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.u.PeerSync PeerSync: core=UNCLASS_30DAYS_shard31_replica_n182 url=http://elara:9100/solr Our versions are too old. ourHighThreshold=1634891841359839232 otherLowThreshold=1634892098551414784 ourHighest=1634892003501146112 otherHighest=1634892708023631872 2019-05-30 12:35:30.652 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.u.PeerSync PeerSync: core=UNCLASS_30DAYS_shard31_replica_n182 url=http://elara:9100/solr DONE. 
sync failed

2019-05-30 12:35:30.652 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.c.SyncStrategy Leader's attempt to sync with shard failed, moving to the next candidate
2019-05-30 12:35:30.683 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.c.ShardLeaderElectionContext There may be a better leader candidate than us - going back into recovery
2019-05-30 12:35:30.693 INFO (zkCallback-7-thread-3) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.c.ShardLeaderElectionContextBase No version found for ephemeral leader parent node, won't remove previous leader registration.
2019-05-30 12:35:30.694 WARN (updateExecutor-3-thread-4-processing-n:elara:9100_solr x:UNCLASS_30DAYS_shard31_replica_n182 c:UNCLASS_30DAYS s:shard31 r:core_node185) [c:UNCLASS_30DAYS s:shard31 r:core_node185 x:UNCLASS_30DAYS_shard31_replica_n182] o.a.s.c.RecoveryStrategy Stopping recovery for core=[UNCLASS_30DAYS_shard31_replica_n182] coreNodeName=[core_node185]

and

2019-05-30 12:25:39.522 INFO (zkCallback-7-thread-1) [c:UNCLASS_30DAYS s:shard31 r:core_node187 x:UNCLASS_30DAYS_shard31_replica_n184] o.a.s.c.ActionThrottle Throttling leader attempts - waiting for 136ms
2019-05-30 12:25:39.672 INFO (zkCallback-7-thread-1) [c:UNCLASS_30DAYS s:shard31 r:core_node187 x:UNCLASS_30DAYS_shard31_replica_n184] o.a.s.c.ShardLeaderElectionContext Can't become leader, other replicas with higher term participated in leader election
2019-05-30 12:25:39.672 INFO (zkCallback-7-thread-1) [c:UNCLASS_30DAYS s:shard31 r:core_node187 x:UNCLASS_30DAYS_shard31_replica_n184] o.a.s.c.ShardLeaderElectionContext There may be a better leader candidate than us - going back into recovery
2019-05-30 12:25:39.677 INFO (zkCallback-7-thread-1) [c:UNCLASS_30DAYS s:shard31 r:core_node187 x:UNCLASS_30DAYS_shard31_replica_n184] o.a.s.c.ShardLeaderElectionContextBase No version found for ephemeral leader parent node, won't remove previous leader registration.

and

2019-05-30 12:26:39.820 INFO (zkCallback-7-thread-5) [c:UNCLASS_30DAYS s:shard31 r:core_node183 x:UNCLASS_30DAYS_shard31_replica_n180] o.a.s.c.ShardLeaderElectionContext Can't become leader, other replicas with higher term participated in leader election
2019-05-30 12:26:39.820 INFO (zkCallback-7-thread-5) [c:UNCLASS_30DAYS s:shard31 r:core_node183 x:UNCLASS_30DAYS_shard31_replica_n180] o.a.s.c.ShardLeaderElectionContext There may be a better leader candidate than us - going back into recovery
2019-05-30 12:26:39.826 INFO (zkCallback-7-thread-5) [c:UNCLASS_30DAYS s:shard31 r:core_node183 x:UNCLASS_30DAYS_shard31_replica_n180] o.a.s.c.ShardLeaderElectionContextBase No version found for ephemeral leader parent node, won't remove previous leader registration.

I've tried FORCELEADER, but it had no effect. I also tried adding a shard, but that one didn't come up either. The index is on HDFS. Help! -Joe
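For anyone hitting the same stuck-shard state: FORCELEADER is a Collections API action, invoked over HTTP. A hedged sketch of the calls involved (host, port, collection, and shard names below are taken from the log excerpts above; adjust to your cluster):

```shell
# Inspect the shard's replica states first (CLUSTERSTATUS is a standard
# Collections API action).
curl "http://elara:9100/solr/admin/collections?action=CLUSTERSTATUS&collection=UNCLASS_30DAYS&shard=shard31"

# Attempt to force a leader for the stuck shard. Note this generally only
# helps once the shard's replicas are live and not cycling through recovery;
# if they are, the call can appear to have no effect, as reported above.
curl "http://elara:9100/solr/admin/collections?action=FORCELEADER&collection=UNCLASS_30DAYS&shard=shard31"
```

If FORCELEADER keeps failing, a common next step (an assumption here, not something from this thread) is to stop the replicas that are looping in recovery and restart only the one believed to have the freshest index, letting it win the election alone.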
Re: Solr-8.1.0 uses much more memory
An interesting supplement to this discussion. The experiment this time was to use Solr v8.1, omit the GC_TUNE items, and instead adjust SOLR_HEAP. I had set the heap to 4GB, based on good intentions, and as we have seen Solr v8.1 gobbles it up and does not return a farthing. Thus I tried indexing a large (2600 docs) collection of .pdf, .ppt, etc. files, but with the heap size gradually reduced from 4GB to 1GB. That worked smoothly, and while indexing Solr consumes about 1.5/1.6GB and works hard.

So, if a little is good then less must be better, yes? 512MB is too little: Solr barely starts and then shuts down. 1GB seems to be a safe value for the heap, with no GC_TUNE settings. This is true on my machines for both Oracle JDK 1.8 and OpenJDK 10.

In passing, recommendations on the net suggest watching the action via jconsole (in the Oracle JDK bundle and in the OpenJDK material). Well, it has pretty pictures and many numbers which are far, far away from the basic values we see with top and ps aux | grep solr. Not useful, and even less believable if one asks my simple consumption question.

So then, this leaves us with the usual question of just how much heap space a Java app requires. The answer seems to be that no one really knows; only experiments will reveal practical values. Thus we choose a heap value tested to be safe and observe the persisting use of that value until Solr is restarted, after which it consumes a smaller amount sufficient for answering queries rather than indexing files. If the OpenJDK folks get their reduction work (below) into our hands then idle memory may shrink further.

In closing, Solr v8.1 has one very nice advantage over its predecessors: indexing speed, about double that of v8.0. Thanks, Joe D.

On 27/05/2019 18:38, Joe Doupnik wrote: An interesting note on the memory returning issue for the G1 collector.
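For reference, the heap limit in this experiment is set in Solr's environment include script. A minimal sketch, assuming the stock bin/solr.in.sh layout (the file's location varies between a tarball install and a service install):

```shell
# bin/solr.in.sh (or /etc/default/solr.in.sh on many service installs).
# 1GB proved a safe floor in the tests described above; 512MB was too
# little, with Solr barely starting before shutting down.
SOLR_HEAP="1g"

# Leave GC_TUNE unset/commented out to take the shipped GC defaults.
```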
https://openjdk.java.net/jeps/346 Entitled "JEP 346: Promptly Return Unused Committed Memory from G1" with a summary saying "Enhance the G1 garbage collector to automatically return Java heap memory to the operating system when idle." It goes on to say the following, and more: "Motivation Currently the G1 garbage collector may not return committed Java heap memory to the operating system in a timely manner. G1 only returns memory from the Java heap at either a full GC or during a concurrent cycle. Since G1 tries hard to completely avoid full GCs, and only triggers a concurrent cycle based on Java heap occupancy and allocation activity, it will not return Java heap memory in many cases unless forced to do so externally. This behavior is particularly disadvantageous in container environments where resources are paid by use. Even during phases where the VM only uses a fraction of its assigned memory resources due to inactivity, G1 will retain all of the Java heap. This results in customers paying for all resources all the time, and cloud providers not being able to fully utilize their hardware. If the VM were able to detect phases of Java heap under-utilization ("idle" phases), and automatically reduce its heap usage during that time, both would benefit. Shenandoah and OpenJ9's GenCon collector already provide similar functionality. Tests with a prototype in Bruno et al., section 5.5, shows that based on the real-world utilization of a Tomcat server that serves HTTP requests during the day, and is mostly idle during the night, this solution can reduce the amount of memory committed by the Java VM by 85%." Please read the full web page to have a rounded view of that discussion. Thanks, Joe D. On 27/05/2019 18:17, Joe Doupnik wrote: My comments are inserted in-line this time. Thanks for the amplifications Shawn. On 27/05/2019 17:39, Shawn Heisey wrote: On 5/27/2019 9:49 AM, Joe Doupnik wrote: A few more numbers to contemplate. 
An experiment here, adding 80 PDF and PPTX files into an empty index.

Solr v8.0, regular settings: 1.7GB quiescent memory consumption, 1.9GB while indexing, 2.92 minutes to do the job.
Solr v8.0, using GC_TUNE from v8.1 solr.in.sh: 1.1GB quiescent, 1.3GB while indexing, 2.97 minutes.
Solr v8.1, regular settings: 4.3GB quiescent, 4.4GB while indexing, 1.67 minutes.
Solr v8.1, using GC_TUNE from v8.1 solr.in.sh: 1.0GB quiescent, 1.3GB while indexing, 1.53 minutes.

It is clear that the GC_TUNE settings from v8.1 are beneficial to v8.0, saving about 600MB of memory. That's not small change. Well, the numbers observed here tell a slightly different story: TUNEing can help Solr v8.0. Confirmatory values from other folks would be good to have. The memory concerned is what is taken from the system as real memory, and the rest of the system is directly affected by that. Java can subdivide its part as it wishes. Yes, the TUNE values were from Solr v8.1. To me that says those values are late
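For anyone wanting to repeat the comparison on v8.0: the GC_TUNE block referred to in this thread is the commented-out legacy CMS set in Solr 8.1's bin/solr.in.sh. The flags below are an approximation of that set, reproduced here as a sketch; verify against your own copy of solr.in.sh before relying on them:

```shell
# Approximation of the legacy CMS GC_TUNE block shipped (commented out)
# in Solr 8.1's bin/solr.in.sh -- check your copy for the exact flags.
GC_TUNE="-XX:NewRatio=3 \
  -XX:SurvivorRatio=4 \
  -XX:TargetSurvivorRatio=90 \
  -XX:MaxTenuringThreshold=8 \
  -XX:+UseConcMarkSweepGC \
  -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 \
  -XX:+CMSScavengeBeforeRemark \
  -XX:PretenureSizeThreshold=64m \
  -XX:+UseCMSInitiatingOccupancyOnly \
  -XX:CMSInitiatingOccupancyFraction=50 \
  -XX:CMSMaxAbortablePrecleanTime=6000 \
  -XX:+CMSParallelRemarkEnabled \
  -XX:+ParallelRefProcEnabled"
```

Note that CMS is deprecated and later removed from the JDK, so this is a stopgap for the 8.x timeframe discussed here, not a long-term recommendation.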
Re: Solr-8.1.0 uses much more memory
An interesting note on the memory returning issue for the G1 collector. https://openjdk.java.net/jeps/346 Entitled "JEP 346: Promptly Return Unused Committed Memory from G1" with a summary saying "Enhance the G1 garbage collector to automatically return Java heap memory to the operating system when idle." It goes on to say the following, and more: "Motivation Currently the G1 garbage collector may not return committed Java heap memory to the operating system in a timely manner. G1 only returns memory from the Java heap at either a full GC or during a concurrent cycle. Since G1 tries hard to completely avoid full GCs, and only triggers a concurrent cycle based on Java heap occupancy and allocation activity, it will not return Java heap memory in many cases unless forced to do so externally. This behavior is particularly disadvantageous in container environments where resources are paid by use. Even during phases where the VM only uses a fraction of its assigned memory resources due to inactivity, G1 will retain all of the Java heap. This results in customers paying for all resources all the time, and cloud providers not being able to fully utilize their hardware. If the VM were able to detect phases of Java heap under-utilization ("idle" phases), and automatically reduce its heap usage during that time, both would benefit. Shenandoah and OpenJ9's GenCon collector already provide similar functionality. Tests with a prototype in Bruno et al., section 5.5, shows that based on the real-world utilization of a Tomcat server that serves HTTP requests during the day, and is mostly idle during the night, this solution can reduce the amount of memory committed by the Java VM by 85%." Please read the full web page to have a rounded view of that discussion. Thanks, Joe D. On 27/05/2019 18:17, Joe Doupnik wrote: My comments are inserted in-line this time. Thanks for the amplifications Shawn. 
On 27/05/2019 17:39, Shawn Heisey wrote: On 5/27/2019 9:49 AM, Joe Doupnik wrote: A few more numbers to contemplate. An experiment here, adding 80 PDF and PPTX files into an empty index. Solr v8.0 regular settings, 1.7GB quiescent memory consumption, 1.9GB while indexing, 2.92 minutes to do the job. Solr v8.0, using GC_TUNE from v8.1 solr.in.sh, 1.1GB quiescent, 1.3GB while indexing, 2.97 minutes. Solr v8.1, regular settings, 4.3GB quiescent, 4.4GB while indexing, 1.67 minutes. Solr v8.1, using GC_TUNE from v8.1 solr.in.sh, 1.0GB quiescent, 1.3GB while indexing, 1.53 minutes. It is clear that the GC_TUNE settings from v8.1 are beneficial to v8.0, saving about 600MB of memory. That's not small change. Well, the numbers observed here tell a slightly different story: TUNEing can help Solr v8.0. Confirmatory values from other folks would be good to have. The memory concerned is what is taken from the system as real memory, and the rest of the system is directly affected by that. Java can subdivide its part as it wishes. Yes, the TUNE values were from Solr v8.1. To me that says those values are late arriving for v8.0 and prior, but we have them now and can use them to save system resources. Also, it means that Solr v8.1's G1 needs more baking time; the new GC is not quite ready for normal production work (to put it mildly). GC tuning will not change the amount of memory the program needs. It *can't* change it. All it can do is affect how the garbage collector works. Different collectors can result in differences in how much memory an outside observer will see allocated, because one may be more aggressive about early collection than the other, but the amount of heap actually required by the program will not change. The commented out GC_TUNE settings in the 8.1 "bin/solr.in.sh" file are the old CMS settings that earlier versions of Solr used. When you tell a Java program that it is allowed to use 4GB of memory, it's going to use that memory. Eventually.
Maybe not in three minutes, but eventually. Even the settings that you are seeing use less memory WILL eventually use all of it that they have been allowed. That is the nature of Java. Data here says there is a quiescent consumption value, a higher one during intensive indexing, and a smaller one during routine query handling. The point is the consumption peaks go away, memory is returned to the system. That's what garbage collection is all about. Also clear is that Solr v8.1 is slightly faster than v8.0 when both use those TUNE values. A hidden benefit. Without GC_TUNE settings Solr v8.1 shows its appetite for much memory, several GBs more than v8.0. The CMS collector will be removed from Java at some point in the future. We can't use it any more. Meanwhile we in the field can improve our current systems with the TUNE settings. Solr v8.1 isn't ready yet for that workload, in my op
Re: Solr-8.1.0 uses much more memory
My comments are inserted in-line this time. Thanks for the amplifications Shawn. On 27/05/2019 17:39, Shawn Heisey wrote: On 5/27/2019 9:49 AM, Joe Doupnik wrote: A few more numbers to contemplate. An experiment here, adding 80 PDF and PPTX files into an empty index. Solr v8.0 regular settings, 1.7GB quiescent memory consumption, 1.9GB while indexing, 2.92 minutes to do the job. Solr v8.0, using GC_TUNE from v8.1 solr.in.sh, 1.1GB quiescent, 1.3GB while indexing, 2.97 minutes. Solr v8.1, regular settings, 4.3GB quiescent, 4.4GB while indexing, 1.67 minutes. Solr v8.1, using GC_TUNE from v8.1 solr.in.sh, 1.0GB quiescent, 1.3GB while indexing, 1.53 minutes. It is clear that the GC_TUNE settings from v8.1 are beneficial to v8.0, saving about 600MB of memory. That's not small change. Well, the numbers observed here tell a slightly different story: TUNEing can help Solr v8.0. Confirmatory values from other folks would be good to have. The memory concerned is what is taken from the system as real memory, and the rest of the system is directly affected by that. Java can subdivide its part as it wishes. Yes, the TUNE values were from Solr v8.1. To me that says those values are late arriving for v8.0 and prior, but we have them now and can use them to save system resources. Also, it means that Solr v8.1's G1 needs more baking time; the new GC is not quite ready for normal production work (to put it mildly). GC tuning will not change the amount of memory the program needs. It *can't* change it. All it can do is affect how the garbage collector works. Different collectors can result in differences in how much memory an outside observer will see allocated, because one may be more aggressive about early collection than the other, but the amount of heap actually required by the program will not change. The commented out GC_TUNE settings in the 8.1 "bin/solr.in.sh" file are the old CMS settings that earlier versions of Solr used.
When you tell a Java program that it is allowed to use 4GB of memory, it's going to use that memory. Eventually. Maybe not in three minutes, but eventually. Even the settings that you are seeing use less memory WILL eventually use all of it that they have been allowed. That is the nature of Java. Data here says there is a quiescent consumption value, a higher one during intensive indexing, and a smaller one during routine query handling. The point is the consumption peaks go away, memory is returned to the system. That's what garbage collection is all about. Also clear is that Solr v8.1 is slightly faster than v8.0 when both use those TUNE values. A hidden benefit. Without GC_TUNE settings Solr v8.1 shows its appetite for much memory, several GBs more than v8.0. The CMS collector will be removed from Java at some point in the future. We can't use it any more. Meanwhile we in the field can improve our current systems with the TUNE settings. Solr v8.1 isn't ready yet for that workload, in my opinion. The latency discussion below is in need of hard experimental evidence. That does not mean your analysis is incorrect, but rather we simply don't know and ought not make decisions based on such assumptions. I look forward to seeing decent test results. Thanks, Joe D. When you note that, for a given sequential process, certain settings accomplish that process faster, that's a measure of throughput -- how much data is pushed through in a given timeframe. We really don't care about that metric for Solr. We care about latency. Let's say that setting 1 produces a typical processing time per request of 90 milliseconds, and setting 2 produces a typical processing time per request of 100 milliseconds. You might think setting 1 is better. But what if 1 percent of the requests with setting 1 take ten seconds, and EVERY request with setting 2 takes 120 milliseconds or less? As a project, we are going to prefer setting 2.
That's not a theoretical situation -- it's how things really work out with different garbage collectors, and it's why Solr has the default settings that it does. Shawn
Re: Solr-8.1.0 uses much more memory
A few more numbers to contemplate. An experiment here, adding 80 PDF and PPTX files into an empty index.

Solr v8.0, regular settings: 1.7GB quiescent memory consumption, 1.9GB while indexing, 2.92 minutes to do the job.
Solr v8.0, using GC_TUNE from v8.1 solr.in.sh: 1.1GB quiescent, 1.3GB while indexing, 2.97 minutes.
Solr v8.1, regular settings: 4.3GB quiescent, 4.4GB while indexing, 1.67 minutes.
Solr v8.1, using GC_TUNE from v8.1 solr.in.sh: 1.0GB quiescent, 1.3GB while indexing, 1.53 minutes.

It is clear that the GC_TUNE settings from v8.1 are beneficial to v8.0, saving about 600MB of memory. That's not small change. Also clear is that Solr v8.1 is slightly faster than v8.0 when both use those TUNE values. A hidden benefit. Without GC_TUNE settings Solr v8.1 shows its appetite for much memory, several GBs more than v8.0. Because those TUNE settings can make an improvement to Solr v8.0 it would be beneficial to have the documentation discuss that usage. Meanwhile, the memory consumption problem remains as discussed.

On the overfeeding part of things. The classical approach is to pipeline the work and between each stage have a go/stop sign to throttle traffic (a road-crossing lollipop lady, if you like). Such signs could be set when a regional thread consumption is reached, or a similar resource limit encountered. This permits one stage to stop listening while the work continues within it and many other stages, and then the sign changes to go and the regional flow resumes. We see this in common road/people traffic situations every day. It's nicely asynchronous and does not need a complicated (nor any) master controller. The key is to have limits based on sound engineering criteria, and yes, that might mean having a few sets of them for different operating situations, with the customer choosing appropriately. Thanks, Joe D.

On 27/05/2019 11:05, Joe Doupnik wrote: You are certainly correct about using external load balancers when appropriate.
However, a basic problem with servers, that of accepting more incoming items than can be handled gracefully, is as we know an age-old one, solved by back-pressure methods (particularly hard limits). My experience with Solr suggests that parts (say Tika) are being too nice to incoming material, letting too many items enter the application, consume resources, and so forth, which then become awkward to handle (see the locks discussion cited earlier). Entry ought to be blocked until the processing structure declares that resources are available to accept new entries (a full but not overfull pipeline). Those internal issues, locks, memory and similar, are resolvable when limits are imposed. Also, with limits, your mentioned load balancers stand a chance of sensing when a particular server is currently not accepting new requests. Establishing limits does take some creative thinking about how the system as a whole is constructed. I brought up the overload case because it pertains to this main memory management thread. Thanks, Joe D. On 27/05/2019 10:21, Bernd Fehling wrote: I think it is not fair blaming Solr for not also having a load balancer. It is up to you and your needs to set up the required infrastructure, including load balancing. There are many products available on the market. If your current system can't handle all requests then install more replicas. Regards Bernd On 27.05.19 at 10:33, Joe Doupnik wrote: While on the topic of resource consumption and locks etc, there is one other aspect to which Solr has been vulnerable. It is failing to fend off too many requests at one time. The standard approach is, of course, named back pressure, such as not replying to a query until resources permit and thus keeping competition outside of the application. That limits resource consumption, including locks, memory and sundry, while permitting normal work within to progress smoothly.
Let the crowds coming to a hit show queue in the rain outside the theatre until empty seats become available. On 27/05/2019 08:52, Joe Doupnik wrote: Generalizations tend to fail when confronted with conflicting evidence. The simple evidence is asking how much real memory the Solr-owned process has been allocated (top, or ps aux or similar) and that yields two very different values (the ~1.6GB of Solr v8.0 and 4.5+GB of Solr v8.1). I have no knowledge of how Java chooses to name its usage (heap or otherwise). Prior to v8.1 Solr memory consumption varied with activity, thus memory management was occurring; memory was borrowed from and returned to the system. What might be happening in Solr v8.1 is the new memory management code is failing to do a proper job, for reasons which are not visible to us in the field, and that failure is important to us. In regard to the referenced lock discussion, it would be a good idea to not let the tail wag the dog, tend the common cases and live
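The go/stop pipeline sketched in this message (bounded stages with a throttling sign between them) can be illustrated with blocking bounded queues: a stage's put() blocks whenever the downstream queue is full, so overload propagates backwards until the entry point itself stops accepting. A minimal illustrative sketch, not Solr code; all names here are invented:

```python
import queue
import threading

def stage(inbox, outbox, work):
    """Pull items from a bounded inbox, process them, push downstream.

    A None item is a shutdown sentinel that is forwarded downstream."""
    while True:
        item = inbox.get()
        if item is None:
            if outbox is not None:
                outbox.put(None)
            return
        result = work(item)
        if outbox is not None:
            # Blocks when the next stage is saturated: the "stop" sign.
            outbox.put(result)

def run_pipeline(items, capacities=(4, 4)):
    q1 = queue.Queue(maxsize=capacities[0])   # entry  -> "parse" stage
    q2 = queue.Queue(maxsize=capacities[1])   # parse  -> "index" stage
    results = []
    t1 = threading.Thread(target=stage, args=(q1, q2, lambda d: d.upper()))
    t2 = threading.Thread(target=stage, args=(q2, None, results.append))
    t1.start()
    t2.start()
    for doc in items:
        q1.put(doc)   # the entry-point "lollipop lady": blocks when q1 is full
    q1.put(None)      # shut the pipeline down
    t1.join()
    t2.join()
    return results
```

No master controller is needed: each bounded queue is a local limit, and back pressure composes across stages exactly as described above.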
Re: Solr-8.1.0 uses much more memory
You are certainly correct about using external load balancers when appropriate. However, a basic problem with servers, that of accepting more incoming items than can be handled gracefully, is as we know an age-old one, solved by back-pressure methods (particularly hard limits). My experience with Solr suggests that parts (say Tika) are being too nice to incoming material, letting too many items enter the application, consume resources, and so forth, which then become awkward to handle (see the locks discussion cited earlier). Entry ought to be blocked until the processing structure declares that resources are available to accept new entries (a full but not overfull pipeline). Those internal issues, locks, memory and similar, are resolvable when limits are imposed. Also, with limits, your mentioned load balancers stand a chance of sensing when a particular server is currently not accepting new requests. Establishing limits does take some creative thinking about how the system as a whole is constructed. I brought up the overload case because it pertains to this main memory management thread. Thanks, Joe D. On 27/05/2019 10:21, Bernd Fehling wrote: I think it is not fair blaming Solr for not also having a load balancer. It is up to you and your needs to set up the required infrastructure, including load balancing. There are many products available on the market. If your current system can't handle all requests then install more replicas. Regards Bernd On 27.05.19 at 10:33, Joe Doupnik wrote: While on the topic of resource consumption and locks etc, there is one other aspect to which Solr has been vulnerable. It is failing to fend off too many requests at one time. The standard approach is, of course, named back pressure, such as not replying to a query until resources permit and thus keeping competition outside of the application. That limits resource consumption, including locks, memory and sundry, while permitting normal work within to progress smoothly.
Let the crowds coming to a hit show queue in the rain outside the theatre until empty seats become available. On 27/05/2019 08:52, Joe Doupnik wrote: Generalizations tend to fail when confronted with conflicting evidence. The simple evidence is asking how much real memory the Solr-owned process has been allocated (top, or ps aux or similar) and that yields two very different values (the ~1.6GB of Solr v8.0 and 4.5+GB of Solr v8.1). I have no knowledge of how Java chooses to name its usage (heap or otherwise). Prior to v8.1 Solr memory consumption varied with activity, thus memory management was occurring; memory was borrowed from and returned to the system. What might be happening in Solr v8.1 is the new memory management code is failing to do a proper job, for reasons which are not visible to us in the field, and that failure is important to us. In regard to the referenced lock discussion, it would be a good idea to not let the tail wag the dog, tend the common cases and live with a few corner case difficulties because perfection is not possible. Thanks, Joe D. On 26/05/2019 20:30, Shawn Heisey wrote: On 5/26/2019 12:52 PM, Joe Doupnik wrote: I do queries while indexing, have done so for a long time, without difficulty nor memory usage spikes from dual use. The system has been designed to support that. Again, one may look at the numbers using "top" or similar. Try Solr v8.0 and 8.1 to see the difference which I experience here. For reference, the only memory adjustables set in my configuration are in the Solr startup script solr.in.sh: adding "-Xss1024k" to the SOLR_OPTS list and setting SOLR_HEAP="4024m". There is one significant difference between 8.0 and 8.1 in the realm of memory management -- we have switched from the CMS garbage collector to the G1 collector. So the way that Java manages the heap has changed. This was done because the CMS collector is slated for removal from Java.
https://issues.apache.org/jira/browse/SOLR-13394 Java is unlike other programs in one respect -- once it allocates heap from the OS, it never gives it back. This behavior has given Java an undeserved reputation as a memory hog ... but in fact Java's overall memory usage can be very easily limited ... an option that many other programs do NOT have. In your configuration, you set the max heap to a little less than 4GB. You have to expect that it *WILL* use that memory. By using the SOLR_HEAP variable, you have instructed Solr's startup script to use the same setting for the minimum heap as well as the maximum heap. This is the design intent. If you want to know how much heap is being used, you can't ask the operating system, which means tools like top. You have to ask Java. And you will have to look at a long-term graph, finding the low points. An instantaneous look at Java's heap usage could show you th
Re: Solr-8.1.0 uses much more memory
While on the topic of resource consumption and locks etc, there is one other aspect to which Solr has been vulnerable. It is failing to fend off too many requests at one time. The standard approach is, of course, named back pressure, such as not replying to a query until resources permit and thus keeping competition outside of the application. That limits resource consumption, including locks, memory and sundry, while permitting normal work within to progress smoothly. Let the crowds coming to a hit show queue in the rain outside the theatre until empty seats become available. On 27/05/2019 08:52, Joe Doupnik wrote: Generalizations tend to fail when confronted with conflicting evidence. The simple evidence is asking how much real memory the Solr-owned process has been allocated (top, or ps aux or similar) and that yields two very different values (the ~1.6GB of Solr v8.0 and 4.5+GB of Solr v8.1). I have no knowledge of how Java chooses to name its usage (heap or otherwise). Prior to v8.1 Solr memory consumption varied with activity, thus memory management was occurring; memory was borrowed from and returned to the system. What might be happening in Solr v8.1 is the new memory management code is failing to do a proper job, for reasons which are not visible to us in the field, and that failure is important to us. In regard to the referenced lock discussion, it would be a good idea to not let the tail wag the dog, tend the common cases and live with a few corner case difficulties because perfection is not possible. Thanks, Joe D. On 26/05/2019 20:30, Shawn Heisey wrote: On 5/26/2019 12:52 PM, Joe Doupnik wrote: I do queries while indexing, have done so for a long time, without difficulty nor memory usage spikes from dual use. The system has been designed to support that. Again, one may look at the numbers using "top" or similar. Try Solr v8.0 and 8.1 to see the difference which I experience here.
For reference, the only memory adjustables set in my configuration are in the Solr startup script solr.in.sh: adding "-Xss1024k" to the SOLR_OPTS list and setting SOLR_HEAP="4024m". There is one significant difference between 8.0 and 8.1 in the realm of memory management -- we have switched from the CMS garbage collector to the G1 collector. So the way that Java manages the heap has changed. This was done because the CMS collector is slated for removal from Java. https://issues.apache.org/jira/browse/SOLR-13394 Java is unlike other programs in one respect -- once it allocates heap from the OS, it never gives it back. This behavior has given Java an undeserved reputation as a memory hog ... but in fact Java's overall memory usage can be very easily limited ... an option that many other programs do NOT have. In your configuration, you set the max heap to a little less than 4GB. You have to expect that it *WILL* use that memory. By using the SOLR_HEAP variable, you have instructed Solr's startup script to use the same setting for the minimum heap as well as the maximum heap. This is the design intent. If you want to know how much heap is being used, you can't ask the operating system, which means tools like top. You have to ask Java. And you will have to look at a long-term graph, finding the low points. An instantaneous look at Java's heap usage could show you that the whole heap is allocated ... but a significant part of that allocation could be garbage, which becomes available once the garbage is collected. Thanks, Shawn
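To "ask Java" as suggested above, the JDK's own command-line tools work as well as jconsole. A sketch (it assumes a full JDK on the PATH, and the pid-lookup pattern is an assumption about how the Solr process appears in the process list on your machine):

```shell
# Find the Solr JVM's pid (the process-name pattern varies by version/install).
SOLR_PID=$(pgrep -f solr | head -n 1)

# Heap utilisation and GC counts, sampled every 5 seconds. The *U columns
# (S0U, S1U, EU, OU) are the used survivor/eden/old-gen sizes -- these, not
# the RSS shown by top, indicate how much heap is actually live between
# collections; watch the low points over time.
jstat -gc "$SOLR_PID" 5s

# One-off heap summary on JDK 8 (capacity vs. used per generation);
# on JDK 9+ the equivalent is "jhsdb jmap --heap --pid $SOLR_PID".
jmap -heap "$SOLR_PID"
```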
Re: Solr-8.1.0 uses much more memory
Generalizations tend to fail when confronted with conflicting evidence. The simple evidence is asking how much real memory the Solr-owned process has been allocated (top, or ps aux or similar) and that yields two very different values (the ~1.6GB of Solr v8.0 and 4.5+GB of Solr v8.1). I have no knowledge of how Java chooses to name its usage (heap or otherwise). Prior to v8.1 Solr memory consumption varied with activity, thus memory management was occurring; memory was borrowed from and returned to the system. What might be happening in Solr v8.1 is the new memory management code is failing to do a proper job, for reasons which are not visible to us in the field, and that failure is important to us. In regard to the referenced lock discussion, it would be a good idea to not let the tail wag the dog, tend the common cases and live with a few corner case difficulties because perfection is not possible. Thanks, Joe D. On 26/05/2019 20:30, Shawn Heisey wrote: On 5/26/2019 12:52 PM, Joe Doupnik wrote: I do queries while indexing, have done so for a long time, without difficulty nor memory usage spikes from dual use. The system has been designed to support that. Again, one may look at the numbers using "top" or similar. Try Solr v8.0 and 8.1 to see the difference which I experience here. For reference, the only memory adjustables set in my configuration are in the Solr startup script solr.in.sh: adding "-Xss1024k" to the SOLR_OPTS list and setting SOLR_HEAP="4024m". There is one significant difference between 8.0 and 8.1 in the realm of memory management -- we have switched from the CMS garbage collector to the G1 collector. So the way that Java manages the heap has changed. This was done because the CMS collector is slated for removal from Java. https://issues.apache.org/jira/browse/SOLR-13394 Java is unlike other programs in one respect -- once it allocates heap from the OS, it never gives it back.
This behavior has given Java an undeserved reputation as a memory hog ... but in fact Java's overall memory usage can be very easily limited ... an option that many other programs do NOT have. In your configuration, you set the max heap to a little less than 4GB. You have to expect that it *WILL* use that memory. By using the SOLR_HEAP variable, you have instructed Solr's startup script to use the same setting for the minimum heap as well as the maximum heap. This is the design intent. If you want to know how much heap is being used, you can't ask the operating system, which means tools like top. You have to ask Java. And you will have to look at a long-term graph, finding the low points. An instantaneous look at Java's heap usage could show you that the whole heap is allocated ... but a significant part of that allocation could be garbage, which becomes available once the garbage is collected. Thanks, Shawn
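Shawn's two points -- that SOLR_HEAP sets minimum and maximum heap to the same value, and that heap usage should be asked of Java rather than the OS -- can be summarized in a short solr.in.sh sketch. The jstat command in the comment is illustrative and the pid is a placeholder:

```shell
# solr.in.sh fragment (sketch): SOLR_HEAP sets both -Xms and -Xmx to the
# same value, so the startup script asks the JVM to claim the full heap
# from the OS up front -- this is the design intent described above.
SOLR_HEAP="4024m"
SOLR_OPTS="$SOLR_OPTS -Xss1024k"

# To see actual heap occupancy, ask Java rather than top, for example:
#   jstat -gcutil <solr-pid> 5s
# and watch the low points over time; an instantaneous sample may count
# not-yet-collected garbage as "used".
```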
Re: Solr-8.1.0 uses much more memory
I do queries while indexing, have done so for a long time, without difficulty or memory usage spikes from dual use. The system has been designed to support that. Again, one may look at the numbers using "top" or similar. Try Solr v8.0 and 8.1 to see the difference which I experience here. For reference, the only memory adjustables set in my configuration are in the Solr startup script solr.in.sh: adding "-Xss1024k" to the SOLR_OPTS list and setting SOLR_HEAP="4024m". Thanks, Joe D. On 26/05/2019 19:43, Jörn Franke wrote: I think this is also a very risky memory strategy. What happens if you index and query at the same time? It may be more worthwhile to provide as much memory as concurrent operations need. This includes JVM memory but also the disk caches. On 26/05/2019 20:38, Joe Doupnik wrote: On 26/05/2019 19:15, Joe Doupnik wrote: On 26/05/2019 19:08, Shawn Heisey wrote: On 5/25/2019 9:40 AM, Joe Doupnik wrote: Comparing memory consumption (real, not virtual) of quiescent Solr v8.0 and prior with Solr v8.1.0 reveals the older versions use about 1.6GB on my systems but v8.1.0 uses 4.5 to 5+GB. Systems used are SUSE Linux, with Oracle JDK v1.8 and openjdk v10. This is a major memory consumption issue. I have seen no mention of it in the docs or forums. If Solr is using 4 to 5 GB of memory on your system, it is only doing that because you told it that it was allowed to. If you run a Java program with a minimum heap that's smaller than the max heap, which Solr does not do by default, then what you will find is that Java *might* stay lower than the maximum for a while. But eventually it WILL allocate the entire maximum heap from the OS, plus some extra for Java itself to work with. Solr 8.0 and Solr 8.1 are not different from each other in this regard. Thanks, Shawn Not to be argumentative, but prior to Solr v8.1 quiescent resident memory remained at about the 1.6GB level, and during active indexing it could exceed 3.5GB.
With the same configuration settings, Solr v8.1 changes that to use _a lot_ more memory. Thus something significant has changed with Solr v8.1 when compared to its predecessors. The question is what, and what can we do about it. I am not about to enter a guessing game with Solr and Java and its heap usage. That is far too complex to hope to win. Thus, something changed, for the worse here in the field, and I do not know what. Thanks, Joe D. --- If I were forced to guess about this situation, it would be to flag an item mentioned vaguely in passing: the garbage collector. How to return it to status quo ante is not known here. Presumably such a step would be covered in the yet-to-appear documentation for Solr v8.1. To add a little more to the story: memory remained at the 1.6GB level except when doing heavy indexing. To "adjust" Solr so that it always consumes too much, as at present, is not acceptable, nor is it acceptable to risk trouble by setting the upper limit down to, say, 1.6GB and thereby cause indexing to fail. We see the dilemma. Expert assistance is needed to resolve this. Thanks, Joe D.
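If one wanted to test the garbage-collector guess directly, Solr's start script lets GC_TUNE in solr.in.sh override the new G1 defaults. A hedged sketch for comparison testing only, using CMS flags of the kind visible in the pre-8.1 process listings posted elsewhere on this list (the exact values here are illustrative, not a recommendation):

```shell
# solr.in.sh fragment (sketch): run Solr 8.1 with CMS instead of its new
# G1 defaults, purely to compare resident-memory behaviour against 8.0.
# Note: CMS is deprecated and slated for removal from Java.
GC_TUNE="-XX:NewRatio=3 \
  -XX:SurvivorRatio=4 \
  -XX:TargetSurvivorRatio=90 \
  -XX:MaxTenuringThreshold=8 \
  -XX:+UseConcMarkSweepGC \
  -XX:ConcGCThreads=4"
```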
Re: Solr-8.1.0 uses much more memory
On 26/05/2019 19:38, Jörn Franke wrote: Different garbage collector configuration? It does not mean that Solr uses more memory if it is occupied - it could also mean that the JVM just kept it reserved for future memory needs. On 25/05/2019 17:40, Joe Doupnik wrote: Comparing memory consumption (real, not virtual) of quiescent Solr v8.0 and prior with Solr v8.1.0 reveals the older versions use about 1.6GB on my systems but v8.1.0 uses 4.5 to 5+GB. Systems used are SUSE Linux, with Oracle JDK v1.8 and openjdk v10. This is a major memory consumption issue. I have seen no mention of it in the docs or forums. Thanks, Joe D. --- The garbage collector was on my mind as well (in a msg sent just before yours). These numbers are easy to verify, just by using "top". They say allocated, meaning Java owns it, no matter what Java does with it. Java does not own the machine; there are other useful activities to tend as well. Let's find the problem and cure it. Thanks, Joe D.
Re: Solr-8.1.0 uses much more memory
On 26/05/2019 19:15, Joe Doupnik wrote: On 26/05/2019 19:08, Shawn Heisey wrote: On 5/25/2019 9:40 AM, Joe Doupnik wrote: Comparing memory consumption (real, not virtual) of quiescent Solr v8.0 and prior with Solr v8.1.0 reveals the older versions use about 1.6GB on my systems but v8.1.0 uses 4.5 to 5+GB. Systems used are SUSE Linux, with Oracle JDK v1.8 and openjdk v10. This is a major memory consumption issue. I have seen no mention of it in the docs or forums. If Solr is using 4 to 5 GB of memory on your system, it is only doing that because you told it that it was allowed to. If you run a Java program with a minimum heap that's smaller than the max heap, which Solr does not do by default, then what you will find is that Java *might* stay lower than the maximum for a while. But eventually it WILL allocate the entire maximum heap from the OS, plus some extra for Java itself to work with. Solr 8.0 and Solr 8.1 are not different from each other in this regard. Thanks, Shawn Not to be argumentative, but prior to Solr v8.1 quiescent resident memory remained at about the 1.6GB level, and during active indexing it could exceed 3.5GB. With the same configuration settings, Solr v8.1 changes that to use _a lot_ more memory. Thus something significant has changed with Solr v8.1 when compared to its predecessors. The question is what, and what can we do about it. I am not about to enter a guessing game with Solr and Java and its heap usage. That is far too complex to hope to win. Thus, something changed, for the worse here in the field, and I do not know what. Thanks, Joe D. --- If I were forced to guess about this situation, it would be to flag an item mentioned vaguely in passing: the garbage collector. How to return it to status quo ante is not known here. Presumably such a step would be covered in the yet-to-appear documentation for Solr v8.1. To add a little more to the story: memory remained at the 1.6GB level except when doing heavy indexing.
To "adjust" Solr so that it always consumes too much, as at present, is not acceptable, nor is it acceptable to risk trouble by setting the upper limit down to, say, 1.6GB and thereby cause indexing to fail. We see the dilemma. Expert assistance is needed to resolve this. Thanks, Joe D.
Re: Solr-8.1.0 uses much more memory
On 26/05/2019 19:08, Shawn Heisey wrote: On 5/25/2019 9:40 AM, Joe Doupnik wrote: Comparing memory consumption (real, not virtual) of quiescent Solr v8.0 and prior with Solr v8.1.0 reveals the older versions use about 1.6GB on my systems but v8.1.0 uses 4.5 to 5+GB. Systems used are SUSE Linux, with Oracle JDK v1.8 and openjdk v10. This is a major memory consumption issue. I have seen no mention of it in the docs or forums. If Solr is using 4 to 5 GB of memory on your system, it is only doing that because you told it that it was allowed to. If you run a Java program with a minimum heap that's smaller than the max heap, which Solr does not do by default, then what you will find is that Java *might* stay lower than the maximum for a while. But eventually it WILL allocate the entire maximum heap from the OS, plus some extra for Java itself to work with. Solr 8.0 and Solr 8.1 are not different from each other in this regard. Thanks, Shawn Not to be argumentative, but prior to Solr v8.1 quiescent resident memory remained at about the 1.6GB level, and during active indexing it could exceed 3.5GB. With the same configuration settings, Solr v8.1 changes that to use _a lot_ more memory. Thus something significant has changed with Solr v8.1 when compared to its predecessors. The question is what, and what can we do about it. I am not about to enter a guessing game with Solr and Java and its heap usage. That is far too complex to hope to win. Thus, something changed, for the worse here in the field, and I do not know what. Thanks, Joe D.
Solr-8.1.0 uses much more memory
Comparing memory consumption (real, not virtual) of quiescent Solr v8.0 and prior with Solr v8.1.0 reveals the older versions use about 1.6GB on my systems but v8.1.0 uses 4.5 to 5+GB. Systems used are SUSE Linux, with Oracle JDK v1.8 and openjdk v10. This is a major memory consumption issue. I have seen no mention of it in the docs or forums. Thanks, Joe D.
Schema API Version 2 - 7.6.0
Hi - according to the documentation here: https://lucene.apache.org/solr/guide/7_6/schema-api.html The V2 API is located at api/cores/collection/schema However, the documentation here: https://lucene.apache.org/solr/guide/7_6/v2-api.html has it at api/c/collection/schema I believe the latter is correct - true? Thank you! -Joe Obernberger
Re: High CPU usage with Solr 7.7.0
Just to add to this. We upgraded to 7.7.0 and saw very large CPU usage on multi-core boxes - sustained in the 1200% range. We then switched to 7.6.0 (no other configuration changes) and the problem went away. We have a 40-node cluster and all 40 nodes had high CPU usage with 3 indexes stored on HDFS. -Joe On 2/27/2019 5:04 AM, Lukas Weiss wrote: Hello, we recently updated our Solr server from 6.6.5 to 7.7.0. Since then, we have problems with the server's CPU usage. We have two Solr cores configured, but even if we clear all indexes and do not start the index process, we see 100% CPU usage for both cores. Here's what our top says: root@solr:~ # top top - 09:25:24 up 17:40, 1 user, load average: 2,28, 2,56, 2,68 Threads: 74 total, 3 running, 71 sleeping, 0 stopped, 0 zombie %Cpu0 :100,0 us, 0,0 sy, 0,0 ni, 0,0 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st %Cpu1 :100,0 us, 0,0 sy, 0,0 ni, 0,0 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st %Cpu2 : 11,3 us, 1,0 sy, 0,0 ni, 86,7 id, 0,7 wa, 0,0 hi, 0,3 si, 0,0 st %Cpu3 : 3,0 us, 3,0 sy, 0,0 ni, 93,7 id, 0,3 wa, 0,0 hi, 0,0 si, 0,0 st KiB Mem : 8388608 total, 7859168 free, 496744 used, 32696 buff/cache KiB Swap: 2097152 total, 2097152 free, 0 used. 7859168 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND P 10209 solr 20 0 6138468 452520 25740 R 99,9 5,4 29:43.45 java -server -Xms1024m -Xmx1024m -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:ConcGCThreads=4 + 24 10214 solr 20 0 6138468 452520 25740 R 99,9 5,4 28:42.91 java -server -Xms1024m -Xmx1024m -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:ConcGCThreads=4 + 25 The solr server is installed on a Debian Stretch 9.8 (64bit) Linux LXC dedicated container.
Some more server info: root@solr:~ # java -version openjdk version "1.8.0_181" OpenJDK Runtime Environment (build 1.8.0_181-8u181-b13-2~deb9u1-b13) OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode) root@solr:~ # free -m totalusedfree shared buff/cache available Mem: 8192 4847675 701 31 7675 Swap: 2048 02048 We also found something strange if we do an strace of the main process, we get lots of ongoing connection timeouts: root@solr:~ # strace -F -p 4136 strace: Process 4136 attached with 48 threads strace: [ Process PID=11089 runs in x32 mode. ] [pid 4937] epoll_wait(139, [pid 4936] restart_syscall(<... resuming interrupted futex ...> [pid 4909] restart_syscall(<... resuming interrupted futex ...> [pid 4618] epoll_wait(136, [pid 4576] futex(0x7ff61ce66474, FUTEX_WAIT_PRIVATE, 1, NULL [pid 4279] futex(0x7ff61ce62b34, FUTEX_WAIT_PRIVATE, 2203, NULL [pid 4244] restart_syscall(<... resuming interrupted futex ...> [pid 4227] futex(0x7ff56c71ae14, FUTEX_WAIT_PRIVATE, 2237, NULL [pid 4243] restart_syscall(<... resuming interrupted futex ...> [pid 4228] futex(0x7ff5608331a4, FUTEX_WAIT_PRIVATE, 2237, NULL [pid 4208] futex(0x7ff61ce63e54, FUTEX_WAIT_PRIVATE, 5, NULL [pid 4205] restart_syscall(<... resuming interrupted futex ...> [pid 4204] restart_syscall(<... resuming interrupted futex ...> [pid 4196] restart_syscall(<... resuming interrupted futex ...> [pid 4195] restart_syscall(<... resuming interrupted futex ...> [pid 4194] restart_syscall(<... resuming interrupted futex ...> [pid 4193] restart_syscall(<... resuming interrupted futex ...> [pid 4187] restart_syscall(<... resuming interrupted restart_syscall ...> [pid 4180] restart_syscall(<... resuming interrupted futex ...> [pid 4179] restart_syscall(<... resuming interrupted futex ...> [pid 4177] restart_syscall(<... resuming interrupted futex ...> [pid 4174] accept(133, [pid 4173] restart_syscall(<... resuming interrupted futex ...> [pid 4172] restart_syscall(<... 
resuming interrupted futex ...> [pid 4171] restart_syscall(<... resuming interrupted restart_syscall ...> [pid 4165] restart_syscall(<... resuming interrupted futex ...> [pid 4164] futex(0x7ff61c1f5054, FUTEX_WAIT_PRIVATE, 3, NULL [pid 4163] restart_syscall(<... resuming interrupted futex ...> [pid 4162] restart_syscall(<... resuming interrupted futex ...> [pid 4161] restart_syscall(<... resuming interrupted futex ...> [pid 4160] futex(0x7ff623d52c20, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, 0x [pid 4159] futex(0x7ff61c1e9d54, FUTEX_WAIT_PRIVATE, 7, NULL [pid 4158] futex(0x7ff61c1b7f54, FUTEX_WAIT_PRIVATE, 15, NULL [pid 4157] futex(0x7ff61c1b5554, FUTEX_WAIT_PRIVATE, 19, NULL [pid 4156] restart_syscall(<... resuming interrupted futex ...> [pid
Re: Solr 7.7.0 - Garbage Collection issue
Reverted back to 7.6.0 - same settings, but now I do not encounter the large CPU usage. -Joe On 2/12/2019 12:37 PM, Joe Obernberger wrote: Thank you Shawn. Yes, I used the settings off of your site. I've restarted the cluster and the CPU usage is back up again. Looking at it now, it doesn't appear to be GC related. Full log from one of the nodes that is pegging 13 CPU cores: http://lovehorsepower.com/solr_gc.log.0.current Thank you for the gceasy.io site - that is very slick! I'll use that in the future. I can try using the standard settings, but again - at this point it doesn't look GC related to me? -Joe On 2/12/2019 11:35 AM, Shawn Heisey wrote: On 2/12/2019 7:35 AM, Joe Obernberger wrote: Yesterday, we upgraded our 40-node cluster from solr 7.6.0 to solr 7.7.0. This morning, all the nodes are using 1200+% of CPU. It looks like it's in garbage collection. We did reduce our HDFS cache size from 11G to 6G, but other than that, no other parameters were changed. Your message included a small excerpt from the GC log. That is not helpful. We will need the entire GC log, possibly more than one log. The log or logs should fully cover the timeframe where the problem occurs. Full disclosure: Once obtained, I would use this website to analyze GC log data: http://gceasy.io Parameters are: GC_TUNE="-XX:+UseG1GC \ -XX:MaxDirectMemorySize=6g \ -XX:+PerfDisableSharedMem \ -XX:+ParallelRefProcEnabled \ -XX:G1HeapRegionSize=16m \ -XX:MaxGCPauseMillis=300 \ -XX:InitiatingHeapOccupancyPercent=75 \ -XX:+UseLargePages \ -XX:ParallelGCThreads=16 \ -XX:-ResizePLAB \ -XX:+AggressiveOpts" Looks like you've chosen to use G1 settings very similar to what I put on my wiki page: https://wiki.apache.org/solr/ShawnHeisey#Current_experiments Those settings are not intended to be a canonical resource that everyone can use. Your heap size is different than what I was using when I worked on that, so you may need different settings.
Have you considered not using your own GC tuning, letting Solr's start script handle that? With the limited information available, my initial guess is that you need a larger heap, that Java is spending all its time freeing up enough memory to keep the program running. Thanks, Shawn --- This email has been checked for viruses by AVG. https://www.avg.com
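To capture the complete GC log that this kind of analysis needs, Java 8 rotation flags can be set via GC_LOG_OPTS in solr.in.sh. A sketch; the log path is a placeholder, and the flags are the standard HotSpot 8 ones:

```shell
# solr.in.sh fragment (sketch): rotated, timestamped GC logs covering the
# whole problem window, suitable for upload to gceasy.io.
GC_LOG_OPTS="-Xloggc:/var/solr/logs/solr_gc.log \
  -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps \
  -XX:+UseGCLogFileRotation \
  -XX:NumberOfGCLogFiles=9 \
  -XX:GCLogFileSize=20M"
```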
Re: Solr 7.7.0 - Garbage Collection issue
Thank you Shawn. Yes, I used the settings off of your site. I've restarted the cluster and the CPU usage is back up again. Looking at it now, it doesn't appear to be GC related. Full log from one of the nodes that is pegging 13 CPU cores: http://lovehorsepower.com/solr_gc.log.0.current Thank you for the gceasy.io site - that is very slick! I'll use that in the future. I can try using the standard settings, but again - at this point it doesn't look GC related to me? -Joe On 2/12/2019 11:35 AM, Shawn Heisey wrote: On 2/12/2019 7:35 AM, Joe Obernberger wrote: Yesterday, we upgraded our 40-node cluster from solr 7.6.0 to solr 7.7.0. This morning, all the nodes are using 1200+% of CPU. It looks like it's in garbage collection. We did reduce our HDFS cache size from 11G to 6G, but other than that, no other parameters were changed. Your message included a small excerpt from the GC log. That is not helpful. We will need the entire GC log, possibly more than one log. The log or logs should fully cover the timeframe where the problem occurs. Full disclosure: Once obtained, I would use this website to analyze GC log data: http://gceasy.io Parameters are: GC_TUNE="-XX:+UseG1GC \ -XX:MaxDirectMemorySize=6g \ -XX:+PerfDisableSharedMem \ -XX:+ParallelRefProcEnabled \ -XX:G1HeapRegionSize=16m \ -XX:MaxGCPauseMillis=300 \ -XX:InitiatingHeapOccupancyPercent=75 \ -XX:+UseLargePages \ -XX:ParallelGCThreads=16 \ -XX:-ResizePLAB \ -XX:+AggressiveOpts" Looks like you've chosen to use G1 settings very similar to what I put on my wiki page: https://wiki.apache.org/solr/ShawnHeisey#Current_experiments Those settings are not intended to be a canonical resource that everyone can use. Your heap size is different than what I was using when I worked on that, so you may need different settings. Have you considered not using your own GC tuning, letting Solr's start script handle that?
With the limited information available, my initial guess is that you need a larger heap, that Java is spending all its time freeing up enough memory to keep the program running. Thanks, Shawn
Solr 7.7.0 - Garbage Collection issue
bytes, 283456928 total - age 12: 2599280 bytes, 286056208 total - age 13: 9197304 bytes, 295253512 total - age 14: 2616704 bytes, 297870216 total - age 15: 8565352 bytes, 306435568 total , 0.0540664 secs] [Parallel Time: 47.3 ms, GC Workers: 16] [GC Worker Start (ms): Min: 62350482.7, Avg: 62350482.8, Max: 62350483.0, Diff: 0.3] [Ext Root Scanning (ms): Min: 0.6, Avg: 0.9, Max: 2.6, Diff: 2.0, Sum: 14.7] [Update RS (ms): Min: 1.3, Avg: 2.8, Max: 3.1, Diff: 1.8, Sum: 44.2] [Processed Buffers: Min: 2, Avg: 9.8, Max: 26, Diff: 24, Sum: 157] [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.1, Sum: 1.1] [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Object Copy (ms): Min: 42.9, Avg: 43.0, Max: 43.1, Diff: 0.2, Sum: 688.3] [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [Termination Attempts: Min: 1, Avg: 1.1, Max: 2, Diff: 1, Sum: 17] [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.1, Sum: 1.2] [GC Worker Total (ms): Min: 46.6, Avg: 46.9, Max: 47.0, Diff: 0.4, Sum: 749.6] [GC Worker End (ms): Min: 62350529.6, Avg: 62350529.7, Max: 62350529.8, Diff: 0.1] [Code Root Fixup: 0.0 ms] [Code Root Purge: 0.0 ms] [Clear CT: 0.7 ms] [Other: 6.0 ms] [Choose CSet: 0.0 ms] [Ref Proc: 3.8 ms] [Ref Enq: 0.4 ms] [Redirty Cards: 0.3 ms] [Humongous Register: 0.2 ms] [Humongous Reclaim: 0.1 ms] [Free CSet: 0.7 ms] [Eden: 5408.0M(5408.0M)->0.0B(5376.0M) Survivors: 304.0M->320.0M Heap: 16.5G(19.0G)->11.3G(19.0G)] Heap after GC invocations=792 (full 0): garbage-first heap total 19922944K, used 11821182K [0x00024000, 0x000241002600, 0x0007c000) region size 16384K, 20 young (327680K), 20 survivors (327680K) Metaspace used 90791K, capacity 93476K, committed 93696K, reserved 1132544K class space used 10929K, capacity 11487K, committed 11520K, reserved 1048576K } [Times: user=0.77 sys=0.01, real=0.06 secs] Parameters are: GC_TUNE="-XX:+UseG1GC \ -XX:MaxDirectMemorySize=6g \ -XX:+PerfDisableSharedMem \ 
-XX:+ParallelRefProcEnabled \ -XX:G1HeapRegionSize=16m \ -XX:MaxGCPauseMillis=300 \ -XX:InitiatingHeapOccupancyPercent=75 \ -XX:+UseLargePages \ -XX:ParallelGCThreads=16 \ -XX:-ResizePLAB \ -XX:+AggressiveOpts" Anything I can try / change? Thank you! -Joe
Re: Cannot Figure out Reason for Persistent Zookeeper Warning
Our application runs on Tomcat. We found that when we deploy to Tomcat using Jenkins or Ansible--a "hot" deployment--the ZK log problem starts. The only solution we've been able to find was to bounce Tomcat. -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Warnings in Zookeeper Server Logs
After a long slog, I am now able to answer my own question, just in case anybody is listening. We determined that when we deploy our application to Tomcat using the Tomcat deploy service, which happens when we deploy with Jenkins and Ansible, these errors start. Conversely, if we re-start Tomcat from scratch, the errors go away. Nothing else we tried (and we tried a lot) worked. Our guess is that the Zookeeper libraries we build into our application do something that does not go away, even when the application is re-deployed. This isn't a great answer from us, as we use Ansible to deploy our application to production, and we use Jenkins to continuously deploy in development. But, it is what it is, and at least our logs are readable now. Joe -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Warnings in Zookeeper Server Logs
Hi (yes again): We have a simple architecture: 2 SOLR Cloud servers (on servers #1 and #2), and 3 zookeeper instances (on servers #1, #2, and #3). Things appear to work fine, and I have confirmed that our basic configuration is correct. But we are seeing TONS of the following warnings in all of our zookeeper server logs: 2019-01-04 14:48:04,266 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /XXX.YY.ZZZ.46:51516 2019-01-04 14:48:04,266 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught end of stream exception EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:239) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203) at java.lang.Thread.run(Thread.java:748) 2019-01-04 14:48:04,266 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1044] - Closed socket connection for client /XXX.YY.ZZZ.46:51516 (no session established for client) These messages seem to correspond to similar message we are seeing in the application client-side logs. (I don’t see any messages that would indicate Too many connections.) Reading the log content, it seems to be saying that a connection is accepted, but then there is an "end of stream" exception. But our users are not experiencing any problems--they are searching SOLR like crazy. Any suggestions? Thanks! Joe -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Continuous Zookeeper Client Warnings
(ClientCnxn.java:1063) [MYAPP-WEB] 2019-01-03 14:19:49,912 WARN [org.apache.zookeeper.ClientCnxn] - java.lang.NoClassDefFoundError: org/apache/zookeeper/Login at org.apache.zookeeper.client.ZooKeeperSaslClient.createSaslClient(ZooKeeperSaslClient.java:216) at org.apache.zookeeper.client.ZooKeeperSaslClient.(ZooKeeperSaslClient.java:119) at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1011) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1063) [MYAPP-WEB] 2019-01-03 14:19:50,977 WARN [org.apache.zookeeper.ClientCnxn] - java.lang.NoClassDefFoundError: org/apache/zookeeper/Login at org.apache.zookeeper.client.ZooKeeperSaslClient.createSaslClient(ZooKeeperSaslClient.java:216) at org.apache.zookeeper.client.ZooKeeperSaslClient.(ZooKeeperSaslClient.java:119) at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1011) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1063) [MYAPP-WEB] 2019-01-03 14:19:51,233 WARN [org.apache.zookeeper.ClientCnxn] - java.lang.NoClassDefFoundError: org/apache/zookeeper/Login at org.apache.zookeeper.client.ZooKeeperSaslClient.createSaslClient(ZooKeeperSaslClient.java:216) at org.apache.zookeeper.client.ZooKeeperSaslClient.(ZooKeeperSaslClient.java:119) at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1011) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1063) These are making our application logs impossible to read, and I assume indicate that something is wrong. Thanks for any help! Joe Lerner -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: So Many Zookeeper Warnings--There Must Be a Problem
wrt, "You'll probably have to delete the contents of the zk data directory and rebuild your collections." Rebuild my *SOLR* collections? That's easy enough for us. If this is how we're incorrectly configured now: server #1 = myid#1 server #2 = myid#2 server #3 = myid#2 My plan would be to do the following, while users are still online (it's a big [bad] deal if we need to take search offline): 1. Take zk #3 down. 2. Fix zk #3 by deleting the contents of the zk data directory and assign it myid#3 3. Bring zk#3 back up 4. Do a full re-build of all collections Thanks! Joe -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
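For step 2, the fix itself is small: each ZooKeeper server needs a unique myid file whose number matches one server.N line in zoo.cfg. A sketch of the intended correspondence, reusing the server lines quoted elsewhere in this thread (the myid location depends on each server's dataDir setting):

```
# zoo.cfg (identical on all three servers):
server.1=host1:2190:2195
server.2=host2:2191:2196
server.3=host3:2192:2197

# <dataDir>/myid (different on each server):
#   on host1, a file containing only: 1
#   on host2, a file containing only: 2
#   on host3, a file containing only: 3
```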
Re: So Many Zookeeper Warnings--There Must Be a Problem
Hi Scott, First, we are definitely misconfigured for the myid thing. Basically two of them were identifying as ID #2, and they are the two ZKs claiming to be the leader. Definitely something to straighten out! Our 3 lines in zoo.cfg look correct. Except they look like this: clientPort:2181 server.1=host1:2190:2195 server.2=host2:2191:2196 server.3=host3:2192:2197 Notice the port range, and overlap... Is that... copacetic? Thanks! Joe -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
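For comparison, the layout the ZooKeeper docs usually show reuses the same two ports on every server, which works because each server runs on a different host. Distinct per-server ports like the above are also legal, as long as every server's zoo.cfg agrees on the full list; a sketch of the conventional form:

```
# zoo.cfg sketch: 2888 = quorum port, 3888 = leader-election port.
clientPort=2181
server.1=host1:2888:3888
server.2=host2:2888:3888
server.3=host3:2888:3888
```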
So Many Zookeeper Warnings--There Must Be a Problem
Hi, We have a simple architecture: 2 SOLR Cloud servers (on servers #1 and #2), and 3 zookeeper instances (on servers #1, #2, and #3). Things work fine (although we had a couple of brief unexplained outages), but: One worrisome thing is that when I status zookeeper on #1 and #2, I get Mode=Leader on both--#3 shows follower. This seems to be a pretty permanent condition, at least right now as I look at it. And there isn't any big maintenance or anything going on. Also, we are getting *TONS* of continuous log warnings from our client applications. From one server it shows this: And from another server we get this: These are making our logs impossible to read, but worse, I assume indicate that something is wrong. Thanks for any help! Joe Lerner -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Solr 7.1 nodes shutting down
$ReadCallback.succeeded(AbstractConnection.java:279) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102) at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:247) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:140) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131) at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:382) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:708) at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:626) at java.lang.Thread.run(Thread.java:748) and: 2018-08-10 19:14:10.401 ERROR (qtp1908316405-209211) [c:UNCLASS s:shard23 r:core_node47 x:UNCLASS_shard23_replica_n44] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: No registered leader was found after waiting for 4000ms , collection: UNCLASS slice: shard23 saw state=DocCollection(UNCLASS//collections/UNCLASS/state.json/3828)={ any ideas on what to try? I've been trying to figure this out for a couple days now, but it's very intermittent. Thank you! -Joe
Re: Schema Change for Solr 7.4
OK--yes, I can see how that would work. But it would require some quick infrastructure flexibility that, at least to this point, we don't really have. Joe -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Schema Change for Solr 7.4
We recently set up Solr 7.4 in Production. There are 2 Solr nodes, with 3 zookeepers. We need to make a schema change. What I want to do is simply push the updated schema to Solr, and then re-index all the content to pick up the change. But I am being told that I need to: 1. Delete the collection that depends on this config-set. 2. Reload the config-set 3. Recreate the dependent collection It seems to me that between steps #1 and #3, users will not be able to search, which is not cool. Can I avoid the outage to my search capability? Thanks! Joe -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: CloudSolrClient URL Too Long
Shawn - thank you! That works great. Stupid huge searches here I come! -Joe On 7/12/2018 4:46 PM, Shawn Heisey wrote: On 7/12/2018 12:48 PM, Joe Obernberger wrote: Hi - I'm using SolrCloud 7.3.1 and calling a search from Java using: org.apache.solr.client.solrj.response.QueryResponse response = CloudSolrClient.query(ModifiableSolrParams) If the ModifiableSolrParams are long, I get an error: Bad Message 414 reason: URI Too Long I have the maximum number of terms set to 1024 (default), and I'm using about 500 terms. Is there a way around this? The total query length is 10,131 bytes. Add a parameter to the query call. Change this: client.query(query) to this: client.query(query, METHOD.POST) The import you'll need for that is org.apache.solr.client.solrj.SolrRequest.METHOD. What you're running into is the length limit on the HTTP request of 8192 characters. This is the limit that virtually all webservers, including the Jetty that Solr includes, have configured. Changing the request method to POST puts all those parameters into the request body. Solr's default limit on the size of the request body is 2 megabytes. Thanks, Shawn
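Pulling Shawn's fix together, a minimal SolrJ 7.x sketch; the ZooKeeper address, collection name, and query terms are placeholders, and this assumes a running SolrCloud cluster:

```java
import java.util.Collections;
import java.util.Optional;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class LongQueryExample {
    public static void main(String[] args) throws Exception {
        CloudSolrClient client = new CloudSolrClient.Builder(
                Collections.singletonList("zkhost:2181"), Optional.empty()).build();
        client.setDefaultCollection("mycollection");

        // Placeholder for the ~500-term query from this thread.
        SolrQuery query = new SolrQuery("field:(term1 OR term2 OR term3)");

        // The default GET puts every parameter into the URL and trips
        // Jetty's 8192-character limit ("Bad Message 414"); POST moves
        // them into the request body (default limit: 2 MB).
        QueryResponse response = client.query(query, SolrRequest.METHOD.POST);
        System.out.println(response.getResults().getNumFound());

        client.close();
    }
}
```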
CloudSolrClient URL Too Long
Hi - I'm using SolrCloud 7.3.1 and calling a search from Java using: org.apache.solr.client.solrj.response.QueryResponse response = CloudSolrClient.query(ModifiableSolrParams) If the ModifiableSolrParams are long, I get an error: Bad Message 414 reason: URI Too Long I have the maximum number of terms set to 1024 (default), and I'm using about 500 terms. Is there a way around this? The total query length is 10,131 bytes. Thank you! -Joe
Re: Can't recover - HDFS
Thank you Shawn - I think the root issue is related to some weirdness with HDFS. Log file is here: http://lovehorsepower.com/solr.log.4 Config is here: http://lovehorsepower.com/solrconfig.xml I don't see anything set to 20 seconds. I believe the root exception is: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /solr7.1.0/UNCLASS_30DAYS/core_node-1684300827/data/tlog/tlog.0008930 could only be replicated to 0 nodes instead of minReplication (=1). There are 41 datanode(s) running and no node(s) are excluded in this operation. at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1724) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3449) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:692) at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:217) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:506) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2281) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2277) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2275) at org.apache.hadoop.ipc.Client.call(Client.java:1504) at org.apache.hadoop.ipc.Client.call(Client.java:1441) at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230) at com.sun.proxy.$Proxy11.addBlock(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:423) at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) at com.sun.proxy.$Proxy12.addBlock(Unknown Source) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1860) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1656) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:790) 2018-07-02 14:50:24.949 ERROR (indexFetcher-41-thread-1) [c:UNCLASS_30DAYS s:shard37 r:core_node-1684300827 x:UNCLASS_30DAYS_shard37_replica_t-1246382645] o.a.s.h.ReplicationHandler Exception in fetching index org.apache.solr.common.SolrException: Error logging add at org.apache.solr.update.TransactionLog.write(TransactionLog.java:420) at org.apache.solr.update.UpdateLog.add(UpdateLog.java:535) at org.apache.solr.update.UpdateLog.add(UpdateLog.java:519) at org.apache.solr.update.UpdateLog.copyOverOldUpdates(UpdateLog.java:1213) at org.apache.solr.update.UpdateLog.copyAndSwitchToNewTlog(UpdateLog.java:1168) at org.apache.solr.update.UpdateLog.copyOverOldUpdates(UpdateLog.java:1155) at org.apache.solr.cloud.ReplicateFromLeader.lambda$startReplication$0(ReplicateFromLeader.java:100) at org.apache.solr.handler.ReplicationHandler.lambda$setupPolling$12(ReplicationHandler.java:1160) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at 
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Thank you very much for the help! -Joe On 7/2/2018 8:32 PM, Shawn Heisey wrote: On 7/2/2018 1:40 PM, Joe Obernberger wrote: Hi All - having this same
Re: Solr 7.1.0 - NoNode for /collections
Just to add to this - looks like the only valid replica that is remaining is a TLOG type, and I suspect that is why it no longer has a leader. Poop. -Joe On 7/2/2018 7:54 PM, Joe Obernberger wrote: Hi - On startup, I'm getting the following error. The shard had 3 replicas, but none are selected as the leader. I deleted one, and adding a new one back, but that had no effect, and at times the calls would timeout. I was having the same issue with another shard on the same collection and deleting/re-adding a replica worked; the shard now has a leader. This one, I can't seem to get to come up. Any ideas? org.apache.solr.common.SolrException: Error getting leader from zk for shard shard6 at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:1223) at org.apache.solr.cloud.ZkController.register(ZkController.java:1090) at org.apache.solr.cloud.ZkController.register(ZkController.java:1018) at org.apache.solr.core.ZkContainer.lambda$registerInZk$0(ZkContainer.java:187) at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:188) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.solr.common.SolrException: Could not get leader props at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1270) at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1234) at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:1190) ... 
7 more Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /collections/UNCLASS_30DAYS/leaders/shard6/leader at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151) at org.apache.solr.common.cloud.SolrZkClient.lambda$getData$5(SolrZkClient.java:340) at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60) at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:340) at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1248) ... 9 more -Joe
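When a shard has live replicas but none gets elected leader, one thing worth trying before deleting and re-adding replicas is the Collections API FORCELEADER action (available since Solr 5.4; it bypasses the normal election, so treat it as a last resort and watch the logs afterwards). A sketch using the collection and shard names from the error above, with placeholder host/port and the command printed rather than executed:

```shell
COLLECTION=UNCLASS_30DAYS
SHARD=shard6
# FORCELEADER asks the cluster to force a leader for the shard; the URL
# can point at any live node
CMD="curl -s 'http://localhost:8983/solr/admin/collections?action=FORCELEADER&collection=$COLLECTION&shard=$SHARD'"
echo "$CMD"
```

Whether this helps when the only healthy replica is a TLOG type, as suspected above, is uncertain; it is a cheap thing to try before more invasive surgery.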
Solr 7.1.0 - NoNode for /collections
Hi - On startup, I'm getting the following error. The shard had 3 replicas, but none are selected as the leader. I deleted one, and adding a new one back, but that had no effect, and at times the calls would timeout. I was having the same issue with another shard on the same collection and deleting/re-adding a replica worked; the shard now has a leader. This one, I can't seem to get to come up. Any ideas? org.apache.solr.common.SolrException: Error getting leader from zk for shard shard6 at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:1223) at org.apache.solr.cloud.ZkController.register(ZkController.java:1090) at org.apache.solr.cloud.ZkController.register(ZkController.java:1018) at org.apache.solr.core.ZkContainer.lambda$registerInZk$0(ZkContainer.java:187) at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:188) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.solr.common.SolrException: Could not get leader props at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1270) at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1234) at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:1190) ... 
7 more Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /collections/UNCLASS_30DAYS/leaders/shard6/leader at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151) at org.apache.solr.common.cloud.SolrZkClient.lambda$getData$5(SolrZkClient.java:340) at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60) at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:340) at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1248) ... 9 more -Joe
Can't recover - HDFS
Hi All - having this same problem again with a large index in HDFS. A replica needs to recover, and it just spins retrying over and over again. Any ideas? Is there an adjustable timeout? Screenshot: http://lovehorsepower.com/images/SolrShot1.jpg Thank you! -Joe Obernberger
Exception writing to index; possible analysis error - 7.3.1 - HDFS
(Throwable.java:1043) at java.io.FilterOutputStream.close(FilterOutputStream.java:159) at org.apache.lucene.store.OutputStreamIndexOutput.close(OutputStreamIndexOutput.java:70) at org.apache.lucene.store.RateLimitedIndexOutput.close(RateLimitedIndexOutput.java:49) at org.apache.lucene.util.IOUtils.closeWhileHandlingException(IOUtils.java:123) at org.apache.lucene.util.IOUtils.closeWhileHandlingException(IOUtils.java:112) at org.apache.lucene.codecs.lucene50.Lucene50PostingsWriter.close(Lucene50PostingsWriter.java:482) at org.apache.lucene.util.IOUtils.close(IOUtils.java:89) at org.apache.lucene.util.IOUtils.close(IOUtils.java:76) at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.close(BlockTreeTermsWriter.java:1026) at org.apache.lucene.util.IOUtils.closeWhileHandlingException(IOUtils.java:123) at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:170) at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:230) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:115) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4443) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4083) at org.apache.solr.update.SolrIndexWriter.merge(SolrIndexWriter.java:190) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:661) [CIRCULAR REFERENCE:org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /solr7.1.0/UNCLASS/core_node39/data/index/_4i92_Lucene50_0.pos could only be replicated to 0 nodes instead of minReplication (=1). There are 41 datanode(s) running and no node(s) are excluded in this operation. 
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1724) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3449) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:692) at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:217) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:506) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2281) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2277) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2275) -Joe
Re: Solr 7 + HDFS issue
"_4jlm_9.liv", "_4jm6.cfe", "_4jm6.cfs", "_4jm6.si", "_4jm6_c.liv", "_4jmr.cfe", "_4jmr.cfs", "_4jmr.si", "_4jmr_3.liv", "_4jna.cfe", "_4jna.cfs", "_4jna.si", "_4jna_5.liv", "_4joy.cfe", "_4joy.cfs", "_4joy.si", "_4joy_6.liv", "_4jpi.cfe", "_4jpi.cfs", "_4jpi.si", "_4jpi_4.liv", "_4jq2.cfe", "_4jq2.cfs", "_4jq2.si", "_4jq2_4.liv", "_4jqm.cfe", "_4jqm.cfs", "_4jqm.si", "_4jqm_1.liv", "_4jqn.cfe", "_4jqn.cfs", "_4jqn.si", "_4jqn_2.liv", "_4jqr.cfe", "_4jqr.cfs", "_4jqr.si", "_4jqu.cfe", "_4jqu.cfs", "_4jqu.si", "_4jqv.cfe", "_4jqv.cfs", "_4jqv.si", "_4jqv_1.liv", "_4jqw.cfe", "_4jqw.cfs", "_4jqw.si", "_4jqw_1.liv", "_4jqy.cfe", "_4jqy.cfs", "_4jqy.si", "_4jqy_1.liv", "_4jqz.cfe", "_4jqz.cfs", "_4jqz.si", "_4jqz_1.liv", "_4jr0.cfe", "_4jr0.cfs", "_4jr0.si", "_4jr0_1.liv", "_4jr3.cfe", "_4jr3.cfs", "_4jr3.si", "_4jr3_1.liv", "_4jr6.cfe", "_4jr6.cfs", "_4jr6.si", "_4jr6_2.liv", "_4jr8.cfe", "_4jr8.cfs", "_4jr8.si", "_4jr9.cfe", "_4jr9.cfs", "_4jr9.si", "_4jr9_1.liv", "_4jra.cfe", "_4jra.cfs", "_4jra.si", "_4jra_1.liv", "_4jrb.cfe", "_4jrb.cfs", "_4jrb.si", "_4jrb_1.liv", "_4jrd.cfe", "_4jrd.cfs", "_4jrd.si", "_4jre.cfe", "_4jre.cfs", "_4jre.si", "_4jrh.cfe", "_4jrh.cfs", "_4jrh.si", "_4jro.cfe", "_4jro.cfs", "_4jro.si", "_4jrp.cfe", "_4jrp.cfs", "_4jrp.si", "_4jrq.cfe", "_4jrq.cfs", "_4jrq.si", "_4jrr.cfe", "_4jrr.cfs", "_4jrr.si", "_4jrr_1.liv", "_4jrs.cfe", "_4jrs.cfs", "_4jrs.si", "_4jrt.cfe", "_4jrt.cfs", "_4jrt.si", "_4jru.cfe", "_4jru.cfs", "_4jru.si", "_4jrv.cfe", "_4jrv.cfs", "_4jrv.si", "_4jrw.cfe", "_4jrw.cfs", "_4jrw.si", "_4jrx.cfe", "_4jrx.cfs", "_4jrx.si", "_4jry.cfe", "_4jry.cfs", "_4jry.si", "_4jrz.cfe", "_4jrz.cfs", "_4jrz.si", "_4js0.cfe", "_4js0.cfs", "_4js0.si", "_4js1.cfe", "_4js1.cfs", "_4js1.si", "_4js2.cfe", "_4js2.cfs", "_4js2.si", "_4js3.cfe", "_4js3.cfs", "_4js3.si", "_itc.cfe", "_itc.cfs", "_itc.si", "_itc_2s.liv", "segments_6bh"]]], "isMaster":"true", "isSlave":"false", "indexVersion":1528861822922, "generation":8189, "master":{ 
"replicateAfter":["commit"], "replicationEnabled":"true", "replicableVersion":1528861822922, "replicableGeneration":8189}}} - -Joe On 6/12/2018 11:48 AM, Shawn Heisey wrote: On 6/11/2018 9:46 AM, Joe Obernberger wrote: We are seeing an issue on our Solr Cloud 7.3.1 cluster where replication starts and pegs network interfaces so aggressively that other tasks cannot talk. We will see it peg a bonded 2Gb interface. In some cases the replication fails over and over until it finally succeeds and the replica comes back up. Usually the error is a timeout. Has anyone seen this? We've tried adjusting the /replication requestHandler, setting maxWriteMBPerSec to 75. Here's something I'd like you to try. Open a browser and visit the URL for the handler with some specific parameters, so we can see if that config is actually being applied. Substitute the correct host, port, and collection name: http://host:port/solr/collection/replication?command=details&echoParams=all&wt=json&indent=true And provide the full raw JSON response. On a solr 7.3.0 example, I added your replication handler definition, and this is the result of visiting a similar URL: { "responseHeader":{ "status":0, "QTime":5, "params":{ "echoParams":"all", "indent":"true", "wt":"json", "command":"details", "maxWriteMBPerSec":"75"}}, "details":{ "indexSize":"6.27 KB", "indexPath":"C:\\Users\\sheisey\\Downloads\\solr-7.3.0\\server\\solr\\foo\\data\\index/", "commits":[[ "indexVersion",1528213960436, "generation",4, "filelist",["_0.fdt", "_0.fdx", "_0.fnm", "_0.si", "_0_Lucene50_0.doc", "_0_Lucene50_0.tim", "_0_Lucene50_0.tip", "_0_Lucene70_0.dvd", "_0_Lucene70_0.dvm", "_1.fdt", "_1.fdx", "_1.fnm", "_1.nvd", "_1.nvm", "_1.si", "_1_Lucene50_0.doc", "_1_Lucene50_0.pos", "_1_Lucene50_0.tim", "_1_Lucene50_0.tip", "_1_Lucene70_0.dvd", "_1_Lucene70_0.dvm", "_2.fdt", "_2.fdx", "_2.fnm", "_2.nvd", "_2.nvm", "_2.si", "_2_Lucene50_0.doc", "_2_Lucene50_0.pos", "_2_Lucene50_0.tim", "_2_Lucene50_0.tip", "_2_Lucene70_0.dvd", "_2_Lucene70_0.dvm", "segments_4"]]],
"isMaster":"true", "isSlave":"false", "indexVersion":1528213960436, "generation":4, "master":{ "replicateAfter":["commit"], "replicationEnabled":"true"}}} The maxWriteMBPerSec parameter can be seen in the response header, so on this system, it looks like it's working. Thanks, Shawn
Solr 7 + HDFS issue
We are seeing an issue on our Solr Cloud 7.3.1 cluster where replication starts and pegs network interfaces so aggressively that other tasks cannot talk. We will see it peg a bonded 2Gb interface. In some cases the replication fails over and over until it finally succeeds and the replica comes back up. Usually the error is a timeout. Has anyone seen this? We've tried adjusting the /replication requestHandler, setting maxWriteMBPerSec to 75, but it appears to have no effect. Any ideas? Thank you! -Joe
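For reference, a sketch of where that 75 lives in solrconfig.xml, assuming the stock ReplicationHandler (the mail archive stripped the surrounding XML from the message above, leaving only the bare number):

```xml
<!-- Throttle replication writes; the value is in MB per second -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="defaults">
    <str name="maxWriteMBPerSec">75</str>
  </lst>
</requestHandler>
```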
Re: query bag of word with negation
On 22/04/2018 19:26, Joe Doupnik wrote: On 22/04/2018 19:04, Nicolas Paris wrote: Hello I wonder if there is a plain text query syntax to say: give me all document that match: wonderful pizza NOT peperoni all those in a 5 distance word bag then pizza are wonderful -> would match I made a wonderful pasta and pizza -> would match Peperoni pizza are so wonderful -> would not match I tested: "wonderful pizza - peperoni"~5 without success Thanks --- A partial answer to your question is contained in this Help screen text from my Solr query program: Some hints about using this facility: 1. Query terms containing other than just letters or digits may be placed within double quotes so that those other characters do not separate a term into many terms. A dot (period) and white space are neither letter nor digit. Examples: "Now is the time for all good men" (spaces, quotes impose ordering too), "goods.doc" (a dot). 2. Mode button "or" (the default) means match one or more terms, perhaps scattered about. Mode button "and" means must match all terms, scattered or not. 3. A one word query term may be prefixed by title: or url: to search on those fields. A space must follow the colon, and the search term is case sensitive. Examples: url: .ppt or title: Goodies. Many docs do not have a formal internal title field, thus prefix title: may not work. 4. Compound queries can be built by joining terms with and or - and group items with ( ). Not is expressed as a minus sign prefixing a term. A bare space means use the Mode (or, and). Example: Nancy and Mary and -Jane and -(Robert Daniel) which means both the first two and not Jane and neither of the two guys. 5. A query of asterisk/star (*) means match everything. Examples: * for everything (zero or more characters). Fussy, show all without term .pdf * and -".pdf" For normal queries the program uses the edismax interface. A few, such as url: foobar, reference the Lucene interface. 
This is specified by the qagent= parameter, of edismax or empty respectively, in a search request. Thus regular facilities can do most of this work. What this example does not address is your distance-5 criterion. However, the NOT facility may do the trick for you, though a minus sign is taken as a literal minus sign or word separator if located within a quoted string. Thanks, Joe D.
Re: query bag of word with negation
On 22/04/2018 19:04, Nicolas Paris wrote: Hello I wonder if there is a plain text query syntax to say: give me all document that match: wonderful pizza NOT peperoni all those in a 5 distance word bag then pizza are wonderful -> would match I made a wonderful pasta and pizza -> would match Peperoni pizza are so wonderful -> would not match I tested: "wonderful pizza - peperoni"~5 without success Thanks --- A partial answer to your question is contained in this Help screen text from my Solr query program: Some hints about using this facility: 1. Query terms containing other than just letters or digits may be placed within double quotes so that those other characters do not separate a term into many terms. A dot (period) and white space are neither letter nor digit. Examples: "Now is the time for all good men" (spaces, quotes impose ordering too), "goods.doc" (a dot). 2. Mode button "or" (the default) means match one or more terms, perhaps scattered about. Mode button "and" means must match all terms, scattered or not. 3. A one word query term may be prefixed by title: or url: to search on those fields. A space must follow the colon, and the search term is case sensitive. Examples: url: .ppt or title: Goodies. Many docs do not have a formal internal title field, thus prefix title: may not work. 4. Compound queries can be built by joining terms with and or - and group items with ( ). Not is expressed as a minus sign prefixing a term. A bare space means use the Mode (or, and). Example: Nancy and Mary and -Jane and -(Robert Daniel) which means both the first two and not Jane and neither of the two guys. 5. A query of asterisk/star (*) means match everything. Examples: * for everything (zero or more characters). Fussy, show all without term .pdf * and -".pdf" For normal queries the program uses the edismax interface. A few, such as url: foobar, reference the Lucene interface. 
This is specified by the qagent= parameter, of edismax or empty respectively, in a search request. Thus regular facilities can do most of this work. What this example does not address is your distance-5 criterion. However, the NOT facility may do the trick for you, though a minus sign is taken as a literal minus sign or word separator if located within a quoted string. Thanks, Joe D.
Re: Loss of "Optimise" button
Yet, that is a task which the main application, Solr, could and should undertake, rather than ask us human slaves to add sundry programs to tend it from afar. Similarly, it would be useful for there to be feedback from Solr when adding material so that we don't overwhelm parts of the pipeline. That's a classical problem with known solutions. Thanks, Joe D. On 21/04/2018 19:16, Erick Erickson wrote: Yeah, trying to have something that satisfies all use cases is a bear. I know of one installation where the indexing rate was so huge that they couldn't afford to have any merging (80B docs/day) so in that situation any heuristics built into Solr would be wrong. Here's an alternate approach to having buttons where you have to attend to it each day: http://localhost:8983/solr/admin/cores?action=STATUS returns each core and the number of docs, maxdocs, and deleted docs. One could set up a cron job that runs every night at 3:00 am that then sends the optimize command to any core with greater than X% deleted docs, where X is your locally-determined threshold. That would be less work actually than having to attend to it every day. FWIW On Sat, Apr 21, 2018 at 10:55 AM, Joe Doupnik <j...@netlab1.net> wrote: A good find Erick, and one which brings into focus the real problem at hand. That overload case would happen if there were an Optimise button or if the curl equivalent command were issued, and is not a reason to avoid either/both. So, what could be done to avoid such awkward difficulties? Well, an obvious suggestion, without knowing the details, is might the system be able to estimate internal conditions sufficiently to issue a warning and decline an Optimise. Certainly average system managers are not about to decode and monitor Java VM nuances. Discussion about automating removals based on sizes of this and that seem, from this distance, to be musings yet to face the real world. In the meanwhile we need to control matters, hence the button request.
The resource consumption issue is inherent in such systems, and we in the field have very little information to help make choices. I know, it's not a simple affair, and too many buzz words fly about. Thus the engineers close to the code might have a ponder about the above predictive capability and about the overall resource consumption process which might permit the system to adapt to progressively larger loads over time. In my own situation I feed material into Solr a file at a time, give a small pause, repeat, get to 100 entries and wait a bit longer, and so on every file, hundred files, thousand files. This works well to reduce resource peaks and uncompleted operations, and it lets the system run in the background all day if necessary without disturbing main activities. My longest run was over a full day, 660+K documents which worked just fine and did not upset other activities in the machine. Thanks, Joe D. On 21/04/2018 17:54, Erick Erickson wrote: Joe: Serendipity strikes, The thread titled "JVM Heap Memory Increase (SOLR CLOUD)" is a perfect example of why the optimize button is so "fraught". Best, Erick On Sat, Apr 21, 2018 at 9:43 AM, Erick Erickson <erickerick...@gmail.com> wrote: Joe: Thanks for moving the conversation over here that we were having on the blog post. I think the wider audience will benefit from this going forward. bq: ...apparent inability to remove piles of deleted docs do note that deleted docs are removed during normal indexing when segments are merged, they're not permanently retained in the index. Part of the thinking behind SOLR-7733 is exactly that once you press the very tempting optimize button, you can get into a situation where your one huge segment does _not_ have the deleted docs removed until the "live" document space is < 2.5G. Thus if you have a 100G segment after optimize, it'll look like deleted docs are never removed until at least 97.5% of the docs are deleted. 
The default max segment size is 5G, and the current algorithm doesn't consider segments eligible for merging until 50% of that maximum number consists of "live" docs. The optimize functionality in the admin UI was removed as part of SOLR-7733 from the screen that comes up when you select a core, but the "core admin" screen still has the optimize button that comes and goes depending on whether there are any deleted documents or not. This page is only visible in standalone mode. Unfortunately SOLR-7733 removed the functionality that actually sent the optimize command from the javascript, so pressing the optimize button does nothing. This is indeed a bug, see: SOLR-12253 which will remove the button from the core admin screen in stand-alone mode. Optimize (aka forceMerge) is pretty actively discouraged because it is: 1> very expensive 2> has significant "gotchas" (we chatted in comments in the blog post
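Erick's cron idea above can be sketched as a small script: read per-core document counts from the CoreAdmin STATUS call and emit an optimize command for any core whose deleted-doc ratio exceeds a threshold. The here-doc stands in for the real `curl -s "http://localhost:8983/solr/admin/cores?action=STATUS&wt=json"` call so the sketch runs anywhere; the core names, the 20% threshold, and printing the commands instead of executing them are all assumptions.

```shell
# status_json stubs out the CoreAdmin STATUS response for the sketch
status_json() {
cat <<'EOF'
{"status":{"coreA":{"index":{"numDocs":70,"maxDoc":100,"deletedDocs":30}},
           "coreB":{"index":{"numDocs":99,"maxDoc":100,"deletedDocs":1}}}}
EOF
}
# Emit an optimize command for every core above the deleted-doc threshold
CMDS="$(status_json | python3 -c '
import json, sys
for core, info in json.load(sys.stdin)["status"].items():
    idx = info["index"]
    if idx["maxDoc"] and 100.0 * idx["deletedDocs"] / idx["maxDoc"] > 20:
        print("curl -s http://localhost:8983/solr/%s/update?optimize=true" % core)
')"
echo "$CMDS"
```

Run nightly from cron, with the echo replaced by actual execution, this is less work than pressing a button every day, which is exactly Erick's point.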
Re: Loss of "Optimise" button
A good find Erick, and one which brings into focus the real problem at hand. That overload case would happen if there were an Optimise button or if the curl equivalent command were issued, and is not a reason to avoid either/both. So, what could be done to avoid such awkward difficulties? Well, an obvious suggestion, without knowing the details, is might the system be able to estimate internal conditions sufficiently to issue a warning and decline an Optimise. Certainly average system managers are not about to decode and monitor Java VM nuances. Discussion about automating removals based on sizes of this and that seem, from this distance, to be musings yet to face the real world. In the meanwhile we need to control matters, hence the button request. The resource consumption issue is inherent in such systems, and we in the field have very little information to help make choices. I know, it's not a simple affair, and too many buzz words fly about. Thus the engineers close to the code might have a ponder about the above predictive capability and about the overall resource consumption process which might permit the system to adapt to progressively larger loads over time. In my own situation I feed material into Solr a file at a time, give a small pause, repeat, get to 100 entries and wait a bit longer, and so on every file, hundred files, thousand files. This works well to reduce resource peaks and uncompleted operations, and it lets the system run in the background all day if necessary without disturbing main activities. My longest run was over a full day, 660+K documents which worked just fine and did not upset other activities in the machine. Thanks, Joe D. On 21/04/2018 17:54, Erick Erickson wrote: Joe: Serendipity strikes, The thread titled "JVM Heap Memory Increase (SOLR CLOUD)" is a perfect example of why the optimize button is so "fraught". 
Best, Erick On Sat, Apr 21, 2018 at 9:43 AM, Erick Erickson <erickerick...@gmail.com> wrote: Joe: Thanks for moving the conversation over here that we were having on the blog post. I think the wider audience will benefit from this going forward. bq: ...apparent inability to remove piles of deleted docs do note that deleted docs are removed during normal indexing when segments are merged, they're not permanently retained in the index. Part of the thinking behind SOLR-7733 is exactly that once you press the very tempting optimize button, you can get into a situation where your one huge segment does _not_ have the deleted docs removed until the "live" document space is < 2.5G. Thus if you have a 100G segment after optimize, it'll look like deleted docs are never removed until at least 97.5% of the docs are deleted. The default max segment size is 5G, and the current algorithm doesn't consider segments eligible for merging until 50% of that maximum number consists of "live" docs. The optimize functionality in the admin UI was removed as part of SOLR-7733 from the screen that comes up when you select a core, but the "core admin" screen still has the optimize button that comes and goes depending on whether there are any deleted documents or not. This page is only visible in standalone mode. Unfortunately SOLR-7733 removed the functionality that actually sent the optimize command from the javascript, so pressing the optimize button does nothing. This is indeed a bug, see: SOLR-12253 which will remove the button from the core admin screen in stand-alone mode. Optimize (aka forceMerge) is pretty actively discouraged because it is: 1> very expensive 2> has significant "gotchas" (we chatted in comments in the blog post about the gotchas). So we made a decision to make it more of an 'expert' option, requiring users to issue a curl/Browser URL command like "solr/core_or_collection/update?optimize=true" if this functionality is really desirable in their situation. 
Docs will be updated too, they're lagging a bit. Coming probably in Solr 7.4 is a new parameter (tentatively) for TieredMergePolicy (TMP) that puts a soft ceiling on the percentage of deleted docs in an index. The current version of this patch (LUCENE-7976) sets this threshold at 20% at the expense of about 10% more I/O in my tests from the current TMP implementation. Under discussion is how low to allow this to be, we're thinking 10% as a floor, and what the default should be. The current TMP caps the percentage deleted docs at close to 50%. The thinking behind not allowing the percent deleted documents to be too low is that that would trigger its own massive I/O issues, rewriting "live" documents over and over and over. For NRT indexes, that's almost certainly a horrible tradeoff. For more static indexes, the "expert" API command is still available. Best, Erick On Sat, Apr 21, 2018 at 5:08 AM, Joe Doupnik <j...@netlab1.net> wrote:
Re: Loss of "Optimise" button
On 21/04/2018 17:25, Doug Turnbull wrote: I haven’t tracked this change, but can you still optimize through the API? Here’s an example using update XML https://stackoverflow.com/questions/6954358/how-to-optimize-solr-index There are so many cases hitting “optimize” causes a huge segment merge that brings down a Solr cluster that I think I agree with the decision to remove such an inviting button. Doug On Sat, Apr 21, 2018 at 8:08 AM Joe Doupnik <j...@netlab1.net> wrote: - Doug, Thanks for that feedback. Here are my thoughts on the matter. Removing deleted docs is often an irregular occurrence, such as say when there is overlapping of incoming material with older input. We don't want to optimise often, least often in fact, but when the circumstances do exist we want to do it expeditiously and wisely. That says a button which we may use when needed, human judgement is employed, and thus avoid automation which totally lacks that judgement and which leads to often unnecessary system peak loads every day or so. I don't want to search high and low for just the right curl command. The capability needs to be an intrinsic part of the management facility design (say the GUI). Clearly documenting the curl approach would be helpful for many sites. In my cases, I run my programs typically at night to crawl this & that, and do not wish to have to tend them daily. The situations of overlap and thus deleted docs typically occur when I am rebuilding the background data sources, and that is when I am working on things and can tend the overlaps. I do not want unnecessary peak loadings from automation. Thus reinstatement of the Optimize button would mean humans could again invoke it if and only when they thought appropriate. The admin GUI would be the normal place to perform that, as it was previously. Lacking that management facility would be a design shortcoming. Thanks, Joe D.
Loss of "Optimise" button
In Solr v7.3.0 the ability to remove "deleted" docs from a core by use of what until then was the Optimise button on the admin GUI has been changed in an ungood way. That is, in the v7.3.0 Changes list, item SOLR-7733 ("Remove optimize from the UI"). The result of that is an apparent inability to remove piles of deleted docs, which amongst other things means wasting disk space. That is a marked step backward and is unhelpful for use of Solr in the field. As other comments in the now-closed SOLR-7733 ticket explain, this is a user item which has impact on their site, and it ought to be an inherent feature of Solr. Consider a file system where complete deletes are forbidden, or your kitchen where taking out the rubbish is denied. Hand waving about obscure auto-sizing notions will not suffice. Thus may I urge that the Optimise button and operation be returned to use, as it was until Solr v7.3.0. Thanks, Joe D.
Re: Solr OOM Crashes / JVM tuning advice
Just as a side note, when Solr goes OOM and kills itself, and if you're running HDFS, you are guaranteed to have write.lock files left over. If you're running lots of shards/replicas, you may have many files that you need to go into HDFS and delete before restarting. -Joe On 4/11/2018 10:46 AM, Shawn Heisey wrote: On 4/11/2018 4:01 AM, Adam Harrison-Fuller wrote: I was wondering if I could get some JVM/GC tuning advice to resolve an issue that we are experiencing. Full disclaimer, I am in no way a JVM/Solr expert so any advice you can render would be greatly appreciated. Our Solr cloud nodes are having issues throwing OOM exceptions under load. This issue has only started manifesting itself over the last few months during which time the only change I can discern is an increase in index size. They are running Solr 5.5.2 on OpenJDK version "1.8.0_101". The index is currently 58G and the server has 46G of physical RAM and runs nothing other than the Solr node. The advice I see about tuning your garbage collection won't help you. GC tuning can do absolutely nothing about OutOfMemoryError problems. Better tuning might *delay* the OOM, but it can't prevent it. You need to figure out exactly what resource is running out. Hopefully one of the solr logfiles will have the actual OutOfMemoryError exception information. It might not be the heap. Once you know what resource is running out and causing the OOM, then we can look deeper. A side note: The OOM is not *technically* causing a crash, even though that might be the visible behavior. When Solr is started on a non-windows system with the included scripts, it runs with a parameter that calls a script on OOM. That script *very intentionally* kills Solr. This is done because program operation when OOM hits is unpredictable, and there's a decent chance that if it keeps running, your index will get corrupted. That could happen anyway, but with quick action to kill the program, it's less likely. 
The JVM is invoked with the following JVM options: -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark -XX:ConcGCThreads=4 -XX:InitialHeapSize=12884901888 -XX:+ManagementServer -XX:MaxHeapSize=12884901888 -XX:MaxTenuringThreshold=8 -XX:NewRatio=3 -XX:OldPLABSize=16 -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 3 /data/gnpd/solr/logs -XX:ParallelGCThreads=4 -XX:+ParallelRefProcEnabled -XX:PretenureSizeThreshold=67108864 -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC Solr 5.5.2 includes GC tuning options in its default configuration. Unless you'd like to switch to G1, you might want to let Solr's start script handle that for you instead of overriding the options. The defaults are substantially similar to what you have defined. I have imported the GC logs into GCViewer and attached a link to a screenshot showing the lead-up to an OOM crash. Interestingly the young generation space is almost empty before the repeated GC's and subsequent crash. https://imgur.com/a/Wtlez Can you share the actual GC logfile? You'll need to use a file sharing site to do that, attachments almost never work on the mailing list. The info in the summary to the right of the graph seems to support your contention that there is plenty of heap, so the OutOfMemoryError is probably not related to heap memory. You're going to have to look at your logfiles to see what the root cause is. Thanks, Shawn
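Joe's earlier note about leftover write.lock files after an OOM kill lends itself to a small cleanup scan. The sketch below walks a local directory tree; on HDFS you would do the equivalent with `hdfs dfs -ls -R` / `hdfs dfs -rm` or an HDFS client library, and the index path layout is an assumption.

```python
import os

def find_write_locks(index_root):
    """Collect leftover write.lock files under an index directory tree.

    Illustrative sketch: index_root is a placeholder for wherever your
    Solr cores keep their data. Only report the files; deleting them is
    safe solely when Solr is confirmed to be stopped.
    """
    locks = []
    for dirpath, _dirnames, filenames in os.walk(index_root):
        if "write.lock" in filenames:
            locks.append(os.path.join(dirpath, "write.lock"))
    return locks
```

Run it against the data root before restarting the cluster, and you have the list of files that would otherwise block the cores from loading.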
Solr7.1.0 - deleting collections when using HDFS
Hi All - I've noticed that if I delete a collection that is stored in HDFS, the files/directory in HDFS remain. If I then try to recreate the collection with the same name, I get an error about unable to open searcher. If I then remove the directory from HDFS, the error remains due to files stored in /etc/solr. Once those are also removed on all the nodes, then I can re-create the collection. -Joe
Re: Solr 7.1.0 - concurrent.ExecutionException building model
leHttpClient.execute(CloseableHttpClient.java:56) at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:525) ... 12 more The cluster was basically idle during this run, but that certainly doesn't mean there was not a legitimate read timeout. Thanks for looking! -Joe On 4/5/2018 8:27 PM, Joel Bernstein wrote: Hi Joe, Currently you will eventually run into memory problems if the training set gets too large. Under the covers on each node it is creating a matrix with a row for each document and a column for each feature. This can get large quite quickly. By choosing fewer features you can make this matrix much smaller. It's fairly easy to make the train function work on a random sample of the training set on each iteration rather than the entire training set, but currently this is not how it's implemented. Feel free to create a ticket requesting the sampling approach. Joel Bernstein http://joelsolr.blogspot.com/ On Thu, Apr 5, 2018 at 5:32 PM, Joe Obernberger <joseph.obernber...@gmail.com> wrote: I tried to build a large model based on about 1.2 million documents. One of the nodes ran out of memory and killed itself. Is this much data not reasonable to use? The nodes have 16g of heap. Happy to increase it, but not sure if this is possible? Thank you! -Joe On 4/5/2018 10:24 AM, Joe Obernberger wrote: Thank you Shawn - sorry so long to respond, been playing around with this a good bit. It is an amazing capability. It looks like it could be related to certain nodes in the cluster not responding quickly enough. In one case, I got the concurrent.ExecutionException, but it looks like the root cause was a SocketTimeoutException. I'm using HDFS for the index and it gets hit pretty hard by other processes running, and I'm wondering if that's causing this. 
java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: params expr=update(models,+batchSize%3D"50",train(MODEL1033_1522883727011,features(MODEL1033_1522883727011,q%3D"*:*",featureSet%3D"FSet_MODEL1033_1522883727011",field%3D"Text",outcome%3D"out_i",positiveLabel%3D1,numTerms%3D1000),q%3D"*:*",name%3D"MODEL1033",field%3D"Text",outcome%3D"out_i",maxIterations%3D"1000"))=/stream=true=*:*=id=id+asc=false at org.apache.solr.client.solrj.io.stream.CloudSolrStream.openStreams(CloudSolrStream.java:405) at org.apache.solr.client.solrj.io.stream.CloudSolrStream.open(CloudSolrStream.java:275) at com.ngc.bigdata.ie_solrmodelbuilder.SolrModelBuilderProcessor.doWork(SolrModelBuilderProcessor.java:114) at com.ngc.intelenterprise.intelentutil.utils.Processor.run(Processor.java:140) at com.ngc.intelenterprise.intelentutil.jms.IntelEntQueueProc.process(IntelEntQueueProc.java:208) at org.apache.camel.processor.DelegateSyncProcessor.process(DelegateSyncProcessor.java:63) at org.apache.camel.management.InstrumentationProcessor.process(InstrumentationProcessor.java:77) at org.apache.camel.processor.RedeliveryErrorHandler.process(RedeliveryErrorHandler.java:460) at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:190) at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:190) at org.apache.camel.component.direct.DirectProducer.process(DirectProducer.java:62) at org.apache.camel.processor.SendProcessor.process(SendProcessor.java:141) at org.apache.camel.management.InstrumentationProcessor.process(InstrumentationProcessor.java:77) at org.apache.camel.processor.RedeliveryErrorHandler.process(RedeliveryErrorHandler.java:460) at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:190) at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:190) at org.apache.camel.component.jms.EndpointMessageListener.onMessage(EndpointMessageListener.java:114) at org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:699) at org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:637) at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:605) at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:308) at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:246) at org.springframework.jms.listener.DefaultMessageListenerContainer$As
Re: Solr 7.1.0 - concurrent.ExecutionException building model
I tried to build a large model based on about 1.2 million documents. One of the nodes ran out of memory and killed itself. Is this much data not reasonable to use? The nodes have 16g of heap. Happy to increase it, but not sure if this is possible? Thank you! -Joe On 4/5/2018 10:24 AM, Joe Obernberger wrote: Thank you Shawn - sorry so long to respond, been playing around with this a good bit. It is an amazing capability. It looks like it could be related to certain nodes in the cluster not responding quickly enough. In one case, I got the concurrent.ExecutionException, but it looks like the root cause was a SocketTimeoutException. I'm using HDFS for the index and it gets hit pretty hard by other processes running, and I'm wondering if that's causing this. java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: params expr=update(models,+batchSize%3D"50",train(MODEL1033_1522883727011,features(MODEL1033_1522883727011,q%3D"*:*",featureSet%3D"FSet_MODEL1033_1522883727011",field%3D"Text",outcome%3D"out_i",positiveLabel%3D1,numTerms%3D1000),q%3D"*:*",name%3D"MODEL1033",field%3D"Text",outcome%3D"out_i",maxIterations%3D"1000"))=/stream=true=*:*=id=id+asc=false at org.apache.solr.client.solrj.io.stream.CloudSolrStream.openStreams(CloudSolrStream.java:405) at org.apache.solr.client.solrj.io.stream.CloudSolrStream.open(CloudSolrStream.java:275) at com.ngc.bigdata.ie_solrmodelbuilder.SolrModelBuilderProcessor.doWork(SolrModelBuilderProcessor.java:114) at com.ngc.intelenterprise.intelentutil.utils.Processor.run(Processor.java:140) at com.ngc.intelenterprise.intelentutil.jms.IntelEntQueueProc.process(IntelEntQueueProc.java:208) at org.apache.camel.processor.DelegateSyncProcessor.process(DelegateSyncProcessor.java:63) at org.apache.camel.management.InstrumentationProcessor.process(InstrumentationProcessor.java:77) at org.apache.camel.processor.RedeliveryErrorHandler.process(RedeliveryErrorHandler.java:460) at 
org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:190) at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:190) at org.apache.camel.component.direct.DirectProducer.process(DirectProducer.java:62) at org.apache.camel.processor.SendProcessor.process(SendProcessor.java:141) at org.apache.camel.management.InstrumentationProcessor.process(InstrumentationProcessor.java:77) at org.apache.camel.processor.RedeliveryErrorHandler.process(RedeliveryErrorHandler.java:460) at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:190) at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:190) at org.apache.camel.component.jms.EndpointMessageListener.onMessage(EndpointMessageListener.java:114) at org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:699) at org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:637) at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:605) at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:308) at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:246) at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1144) at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1136) at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1033) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.util.concurrent.ExecutionException: java.io.IOException: params expr=update(models,+batchSize%3D"50",train(MODEL1033_1522883727011,features(MODEL1033_1522883727011,q%3D"*:*",featureSet%3D"FSet_MODEL1033_1522883727011",field%3D"Text",outcome%3D"out_i",positiveLabel%3D1,numTerms%3D1000),q%3D"*:*",name%3D"MODEL1033",field%3D"Text",outcome%3D"out_i",maxIte
Re: Largest number of indexed documents used by Solr
50 billion per day? Wow! How large are these documents? We have a cluster with one large collection that contains 2.4 billion documents spread across 40 machines using HDFS for the index. We store our data inside of HBase, and in order to re-index data we pull from HBase and index with solr cloud. Most we can do is around 57 million per day; usually limited by pulling data out of HBase not Solr. -Joe On 4/4/2018 10:57 PM, 苗海泉 wrote: When we have 49 shards per collection, there are more than 600 collections. Solr will have serious performance problems. I don't know how to deal with them. My advice to you is to minimize the number of collections. Our environment is 49 solr server nodes, each with 32cpu/128g, and the data volume is about 50 billion per day. 2018-04-04 9:23 GMT+08:00 Yago Riveiro <yago.rive...@gmail.com>: Hi, In my company we are running a 12 node cluster with 10 (american) Billion documents 12 shards / 2 replicas. We do mainly faceting queries with a very reasonable performance. 36 million documents it's not an issue, you can handle that volume of documents with 2 nodes with SSDs and 32G of ram Regards. -- Yago Riveiro On 4 Apr 2018 02:15 +0100, Abhi Basu <9000r...@gmail.com>, wrote: We have tested Solr 4.10 with 200 million docs with avg doc size of 250 KB. No issues with performance when using 3 shards / 2 replicas. On Tue, Apr 3, 2018 at 8:12 PM, Steven White <swhite4...@gmail.com> wrote: Hi everyone, I'm about to start a project that requires indexing 36 million records using Solr 7.2.1. Each record ranges from 500 KB to 0.25 MB where the average is 0.1 MB. Has anyone indexed this number of records? What are the things I should worry about? And out of curiosity, what is the largest number of records that Solr has indexed which is published out there? Thanks Steven -- Abhi Basu
Re: Solr 7.1.0 - concurrent.ExecutionException building model
ption: params expr=update(models,+batchSize%3D"50",train(MODEL1033_1522883727011,features(MODEL1033_1522883727011,q%3D"*:*",featureSet%3D"FSet_MODEL1033_1522883727011",field%3D"Text",outcome%3D"out_i",positiveLabel%3D1,numTerms%3D1000),q%3D"*:*",name%3D"MODEL1033",field%3D"Text",outcome%3D"out_i",maxIterations%3D"1000"))=/stream=true=*:*=id=id+asc=false at org.apache.solr.client.solrj.io.stream.SolrStream.open(SolrStream.java:115) at org.apache.solr.client.solrj.io.stream.CloudSolrStream$StreamOpener.call(CloudSolrStream.java:510) at org.apache.solr.client.solrj.io.stream.CloudSolrStream$StreamOpener.call(CloudSolrStream.java:499) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:188) ... 3 more Caused by: org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://leda:9100/solr/MODEL1033_1522883727011_shard20_replica_n74 at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:637) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:253) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:242) at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219) at org.apache.solr.client.solrj.io.stream.SolrStream.constructParser(SolrStream.java:269) at org.apache.solr.client.solrj.io.stream.SolrStream.open(SolrStream.java:113) ... 
7 more Caused by: java.net.SocketTimeoutException: Read timed out at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) at java.net.SocketInputStream.read(SocketInputStream.java:171) at java.net.SocketInputStream.read(SocketInputStream.java:141) at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:139) at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:155) at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:284) at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138) at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56) at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261) at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:165) at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165) at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272) at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124) at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111) at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:525) -Joe On 4/2/2018 7:09 PM, Shawn Heisey wrote: On 4/2/2018 1:55 PM, 
Joe Obernberger wrote: The training data was split across 20 shards - specifically created with: http://icarus.querymasters.com:9100/solr/admin/collections?action=CREATE=MODEL1024_1522696624083=20=2=5=TRAINING Any ideas? The complete error is: HTTP ERROR 404 Problem accessing /solr/MODEL1024_1522696624083_shard20_replica_n75/select. Reason: Not Found I'll warn you in advance that I know nothing at all about the learning to rank functionality. I'm replying about the underlying error you're getting, independent of what your query is trying to accomplish. It's a 404 error, trying to access the URL mentioned above. The error doesn't indicate exactly WHAT wasn't found. It could either be the core named "MODEL1024_1522696624083_shard20_replica_n75" or the "/select" handler on that core. That's something you need to figure out. It could be that the core *does* exist, but for some reason, Solr on that machine was unable to start it. The solr.log file on the Solr instance that returned the error (which seems to be on the
Re: Solr 7.1.0 - concurrent.ExecutionException building model
Hi Joel - thank you for your reply. Yes, the machine (Vesta) is up, and I can access it. I don't see anything specific in the log, apart from the same error, but this time to a different server. We have constant indexing happening on this cluster, so if one went down, the indexing would stop, and I've not seen that happen. Interestingly, despite the error, the model is still built at least up to some number of iterations. In other words, many iterations complete OK. -Joe On 4/2/2018 6:54 PM, Joel Bernstein wrote: It looks like it is accessing a replica that's down. Are the logs from http://vesta:9100/solr/MODEL1024_1522696624083_shard20_replica_n75 reporting any issues? When you go to that url is it back up and running? Joel Bernstein http://joelsolr.blogspot.com/ On Mon, Apr 2, 2018 at 3:55 PM, Joe Obernberger <joseph.obernber...@gmail.com> wrote: Hi All - when building machine learning models using information gain, I sometimes get this error when the number of iterations is high. I'm using about 20k news articles in my training set (about 10k positive, and 10k negative), and (for this particular run) am using 500 terms and 25,000 iterations. I have gotten the error with a much lower number of iterations (1,000) as well. The specific stream command was: update(models, batchSize="50",train(MODEL1024_1522696624083,features(MODEL1024_1522696624083,q="*:*",featureSet="FSet_MODEL1024_1522696624083",field="Text",outcome="out_i",positiveLabel=1,numTerms=500),q="*:*",name="MODEL1024",field="Text",outcome="out_i",maxIterations="25000")) The training data was split across 20 shards - specifically created with: http://icarus.querymasters.com:9100/solr/admin/collections?action=CREATE=MODEL1024_1522696624083=20 licationFactor=2=5=TRAINING Any ideas? 
The complete error is: java.io.IOException: java.util.concurrent.ExecutionException: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://vesta:9100/solr/MODEL1024_1522696624083_shard20_replica_n75: Expected mime type application/octet-stream but got text/html. Error 404 Not Found HTTP ERROR 404 Problem accessing /solr/MODEL1024_1522696624083_shard20_replica_n75/select. Reason: Not Found at org.apache.solr.client.solrj.io.stream.TextLogitStream.read(TextLogitStream.java:498) at org.apache.solr.client.solrj.io.stream.PushBackStream.read(PushBackStream.java:87) at org.apache.solr.client.solrj.io.stream.UpdateStream.read(UpdateStream.java:109) at org.apache.solr.client.solrj.io.stream.ExceptionStream.read(ExceptionStream.java:68) at org.apache.solr.handler.StreamHandler$TimerStream.read(StreamHandler.java:627) at org.apache.solr.client.solrj.io.stream.TupleStream.lambda$writeMap$0(TupleStream.java:87) at org.apache.solr.response.JSONWriter.writeIterator(JSONResponseWriter.java:523) at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:180) at org.apache.solr.response.JSONWriter$2.put(JSONResponseWriter.java:559) at org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(TupleStream.java:84) at org.apache.solr.response.JSONWriter.writeMap(JSONResponseWriter.java:547) at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:198) at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:209) at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:325) at org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:120) at org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:71) at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65) at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:806) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:535) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionH
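The collection CREATE request discussed in this thread can also be built programmatically rather than hand-assembling the URL. The parameter names below are the documented Collections API ones; the host, collection name, and values are placeholders matching the shape of the request in the post.

```python
from urllib.parse import urlencode

def create_collection_url(base_url, name, num_shards, replication_factor,
                          max_shards_per_node, config_name):
    # Build a Collections API CREATE request (GET form). All keys are
    # the documented Collections API parameter names.
    params = urlencode({
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "maxShardsPerNode": max_shards_per_node,
        "collection.configName": config_name,
    })
    return f"{base_url}/solr/admin/collections?{params}"
```

Generating the URL this way avoids the easy-to-miss encoding mistakes that creep in when a long query string is pasted into a browser or mail message.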