Shard CPU usage?
Hi guys, I was wondering: does the introduction of shards actually increase CPU usage? I have a 30GB index split into two shards (15GB each), and by analyzing the logs I figured out that ~80% of the queries carry the "shard.url=http://10.3.4.12:8080/solr/mycore/|http://10.3.4.14:8080/solr/mycore/" parameter. I basically don't need sharding, and I'm now starting to wonder whether shards are actually increasing the CPU usage of my nodes, given the huge percentage of queries with the "shard.url=" parameter. I'm fighting high CPU usage, and if turning sharding off and just keeping the replicas in my collection would lower the CPU usage by more than 10%, I would choose that path... Any insights? Thanks.
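One way to sanity-check that percentage is to count how many logged requests carry the shard.url parameter, which marks the internal per-shard sub-requests of a distributed query. This is only a sketch: the log file and its line format below are made up for illustration, so adjust the path and pattern to your own catalina.out.

```shell
# Hypothetical sketch: estimate what fraction of logged queries are
# distributed sub-requests (they carry the shard.url= parameter).
log=/tmp/sample_solr_requests.log

# Build a small sample log for illustration; in practice point $log
# at your real request log instead.
cat > "$log" <<'EOF'
webapp=/solr path=/select params={q=foo&shard.url=http://10.3.4.12:8080/solr/mycore/}
webapp=/solr path=/select params={q=bar&shard.url=http://10.3.4.14:8080/solr/mycore/}
webapp=/solr path=/select params={q=baz}
webapp=/solr path=/select params={q=qux&shard.url=http://10.3.4.12:8080/solr/mycore/}
EOF

total=$(wc -l < "$log")
sharded=$(grep -c 'shard\.url=' "$log")
pct=$(( 100 * sharded / total ))
echo "distributed sub-queries: $sharded/$total (${pct}%)"
# prints: distributed sub-queries: 3/4 (75%)
```

Note that with 2 shards, every top-level query fans out into per-shard sub-requests that get logged as well, so a high shard.url percentage is expected on a sharded collection; that fan-out is exactly the overhead you avoid by going back to a single shard with replicas.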
Admin UI doesn't show logs?
Hi, I'm running 4.10.3 under Tomcat 7, and I have an issue with the Admin UI. When I click on Logging, I don't see actual entries, only "No Events available" and a round icon spinning non-stop. When I click on Level, I see the same icon and the message "Loading". Is there a hint or something you could point me to so I can fix this?
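In Solr 4.x under Tomcat, the Logging tab is fed by a LogWatcher hooked into log4j, so it typically stays empty when the SLF4J/log4j jars (shipped in example/lib/ext) and a log4j.properties are missing from Tomcat's classpath. A minimal log4j.properties sketch for checking this; the log path is an assumption:

```
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/var/log/solr/solr.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=9
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{2} - %m%n
```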
Solr 4.10.x on Oracle Java 1.8.x ?
Hi guys, at the end of April Java 1.7 will reach the end of public updates, and Oracle will stop shipping fixes for it. Is it safe to run Tomcat 7 / Solr 4.10 on Java 1.8? Has anyone tried it already?
Adding new core to solr cloud?
Hi guys, I need to add a new core to an existing SolrCloud of 4 nodes (2 replicas x 2 shards). This is the procedure I have in mind:
1) stop node01
2) change solr.xml to include the new core (it is already included in the Tomcat configuration)
3) add -Dbootstrap_conf=true to JAVA_OPTS
4) start Tomcat on node01
Now, I know this pushes the configuration even for the existing cores, but I don't mind, because the configuration hasn't changed in quite a while. After this, I plan to remove -Dbootstrap_conf=true from node01's JAVA_OPTS and restart it again, and after the cloud stabilizes, do steps 1), 2) and 4) on the remaining nodes. What do you think, am I missing something?
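For step 2, with the old-style solr.xml the change amounts to one extra <core> element inside <cores>; the new core's name and instanceDir below are hypothetical placeholders:

```xml
<cores adminPath="/admin/cores" defaultCoreName="mycore1">
  <core name="mycore1" instanceDir="mycore1" numShards="2"/>
  <core name="mycore2" instanceDir="mycore2" numShards="2"/>
  <!-- the new core (hypothetical name) -->
  <core name="mycore3" instanceDir="mycore3" numShards="2"/>
</cores>
```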
Re: Solr on Tomcat
On 02/10/2015 07:55 PM, Dan Davis wrote: As an application developer, I have to agree with this direction. I ran ManifoldCF and Solr together in the same Tomcat, and the slf4j configurations of the two conflicted, with strange results. From a systems administrator/operations perspective, a separate install allows better packaging, e.g. Debian and RPM packages are then possible, although they may not be preferred, as many enterprises will want to use Oracle Java rather than OpenJDK. -- And what exactly stops you from running two separate Tomcat services, one for each respective app?
Re: Migrating cloud to another set of machines
On 10/30/2014 04:47 AM, Otis Gospodnetic wrote: Hi/Bok Jakov, 2) sounds good to me. It means no down-time. 1) means stoppage. If stoppage is not OK, but falling behind with indexing new content is OK, you could:
* add a new cluster
* start reading from the old index and indexing into the new index
* stop the old cluster when done
* index new content to the new cluster (or maybe you can do this all along, if indexing old + new at the same time is OK for you)
-- Thank you for the suggestions, Otis. Everything is acceptable currently, but in the future, as the data grows, we will certainly hit those edge cases where neither stopping indexing nor stopping queries is acceptable. What makes things a little more problematic is that the ZooKeeper ensemble is also migrating to new machines.
Migrating cloud to another set of machines
Hi guys, I was wondering: is there some smart way to migrate a SolrCloud from one set of machines to another? Specifically, I have 2 cores, each of them with 2 replicas and 2 shards, spread across 4 machines. We bought new hardware and are in the process of moving to 4 new machines. What are my options?
1) - create a new cluster on the new set of machines
   - stop write operations
   - copy the data directories from the old machines to the new machines
   - start Solr on the new machines
2) - expand the number of replicas from 2 to 4
   - add the new Solr nodes to the cloud
   - wait for resync
   - stop the old Solr nodes
   - shrink the number of replicas from 4 back to 2
Is there any other path to achieve this? I'm leaning towards no. 1, because I don't feel too comfortable doing all those changes explained in no. 2... Ideas?
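A sketch of how option 2 could be driven in Solr 4.x with the CoreAdmin API: creating a core on a new node that names an existing collection and shard attaches it as a replica, and UNLOAD drops the old copy once the new one is active. All hostnames and core names here are hypothetical, and this is an outline rather than a tested procedure:

```
# on a new node, add a replica of shard1 of collection "mycore"
curl 'http://newnode1:8080/solr/admin/cores?action=CREATE&name=mycore_shard1_replica3&collection=mycore&shard=shard1'

# after the new replica shows as active in the cloud view, drop the old one
curl 'http://oldnode1:8080/solr/admin/cores?action=UNLOAD&core=mycore_shard1_replica1'
```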
Re: New cloud - replica in recovering state?
On 09/08/2014 02:55 AM, Erick Erickson wrote: I really recommend you use the new-style core discovery, if for no other reason than this style is deprecated in 5.0. See: https://wiki.apache.org/solr/Solr.xml%204.4%20and%20beyond -- Oh, I didn't know that. Anyway, the problem I experienced was the result of a wrong hostPort and/or hostContext set in the cores tag. After I fixed those, it works now, but I will take a look at the new way of setting up cores anyway. Ty!
New cloud - replica in recovering state?
Hi guys, I'm trying to set up a new SolrCloud with two cores, each with two shards and two replicas. This is my solr.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true" zkHost="10.200.1.104:2181,10.200.1.105:2181,10.200.1.106:2181">
  <cores adminPath="/admin/cores" defaultCoreName="mycore1" host="${host:}" hostPort="${jetty.port:}" hostContext="${hostContext:}" zkClientTimeout="${zkClientTimeout:15000}">
    <core name="mycore1" instanceDir="mycore1" numShards="2"/>
    <core name="mycore2" instanceDir="mycore2" numShards="2"/>
  </cores>
</solr>

But when I start everything, I can see in solr01:8080/solr/#/~cloud that 4 cores (one per shard) are green, but the replicas are yellow, in RECOVERING state. How can I get them to go from Recovering to Active?
Re: solr cloud going down repeatedly
On 08/19/2014 04:58 PM, Shawn Heisey wrote: On 8/19/2014 3:12 AM, Jakov Sosic wrote: Thank you for your comment. How did you test these settings? I mean, that's a lot of tuning and I would like to set up some test environment to be certain this is what I want... -- I included a section on tools when I wrote this page: http://wiki.apache.org/solr/SolrPerformanceProblems#GC_pause_problems -- Thanks, we ended up using cron to restart the Tomcats every 7 days, one Solr node per day... that way we avoid the GC pauses. Until we figure things out in our dev environment and test GC optimizations, we will keep it this way.
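The staggered weekly restart described above could be expressed in cron roughly like this; the schedule and service name are assumptions, with each node getting a different day-of-week value:

```
# node1's crontab: restart Tomcat every Monday at 04:00
0 4 * * 1  service tomcat7 restart
# node2 would use 2 in the day-of-week field, node3 would use 3,
# and so on, so only one Solr node restarts on any given day
```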
Re: solr cloud going down repeatedly
On 08/18/2014 08:38 PM, Shawn Heisey wrote: With an 8GB heap and UseConcMarkSweepGC as your only GC tuning, I can pretty much guarantee that you'll see occasional GC pauses of 10-15 seconds, because I saw exactly that happening with my own setup. This is what I use now: http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning I can't claim that my problem is 100% solved, but collections that go over one second are *very* rare now, and I'm pretty sure they are all under two seconds. Thank you for your comment. How did you test these settings? I mean, that's a lot of tuning and I would like to set up some test environment to be certain this is what I want...
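One low-risk way to test settings like these is to enable GC logging on a single node and compare pause times before and after each change, rather than guessing. These are standard HotSpot 7 flags; the log path is an assumption:

```
-verbose:gc -Xloggc:/var/log/tomcat7/gc.log \
-XX:+PrintGCDetails -XX:+PrintGCDateStamps \
-XX:+PrintGCApplicationStoppedTime
```

PrintGCApplicationStoppedTime in particular records total stop-the-world time, which is what matters for ZooKeeper session timeouts.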
solr cloud going down repeatedly
Hi guys. I have a SolrCloud consisting of 3 ZooKeeper VMs running 3.4.5, backported from Ubuntu 14.04 LTS to 12.04 LTS. They orchestrate 4 Solr nodes, which host 2 cores. Each core is sharded, so 1 shard sits on each of the Solr nodes. Solr runs under Tomcat 7 and Ubuntu's latest OpenJDK 7. The Solr version is 4.2.1. Each of the nodes holds around 7GB of data, and the JVM is set to run an 8GB heap. All Solr nodes have 16GB RAM.

A few weeks back we started having issues with this installation. Tomcat was filling up catalina.out with the following messages:

SEVERE: org.apache.solr.common.SolrException: no servers hosting shard:

The only solution was to restart all 4 Tomcats on the 4 Solr nodes. After that, the issue would rectify itself, but it would occur again approximately a week after a restart. This happened last yesterday, and I succeeded in recording some of what was happening on the boxes via Zabbix and atop.

Basically, at 15:35 the load on the machine went berserk, jumping from around 0.5 to 30+. Zabbix and atop didn't notice any heavy IO, and all the other processes were practically idle; only the JVM (Tomcat) exploded, with CPU usage increasing from the standard ~80% to around ~750%.

These are parts of the atop recordings on one of the nodes. Note that they are 10 minutes apart:

(15:28:42) CPL | avg1 0.12 | avg5 0.36 | avg15 0.38 |
(15:38:42) CPL | avg1 8.54 | avg5 3.62 | avg15 1.61 |
(15:48:42) CPL | avg1 30.14 | avg5 27.09 | avg15 14.73 |

This is the status of the Tomcat process at the last point (15:48:42):

28891 tomcat7 tomcat7 411 8.68s 70m14s 209.9M 204K 0K 5804K -- - S 5704% java

I noticed similar things happening on the other Solr nodes. At 17:41 the on-call person decided to hard-reset all the Solr nodes, and the cloud came back up running normally after that.
These are the logs I found on the first node:

Aug 17, 2014 3:44:58 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: no servers hosting shard:
Aug 17, 2014 3:46:12 PM org.apache.solr.cloud.OverseerCollectionProcessor run
WARNING: Overseer cannot talk to ZK
Aug 17, 2014 3:46:12 PM org.apache.solr.cloud.Overseer$ClusterStateUpdater amILeader
WARNING: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /overseer_elect/leader

Then a bunch of:

Aug 17, 2014 3:46:42 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: no servers hosting shard:

until the server was rebooted. On the other nodes I can see:

node2:
Aug 17, 2014 3:44:58 PM org.apache.solr.cloud.RecoveryStrategy close
WARNING: Stopping recovery for zkNodeName=10.100.254.103:8080_solr_myappcore=myapp
Aug 17, 2014 3:44:58 PM org.apache.solr.cloud.RecoveryStrategy close
WARNING: Stopping recovery for zkNodeName=10.100.254.103:8080_solr_myapp2core=myapp2
Aug 17, 2014 3:46:24 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://node1:8080/solr/myapp

node4:
Aug 17, 2014 3:44:06 PM org.apache.solr.cloud.RecoveryStrategy close
WARNING: Stopping recovery for zkNodeName=10.100.254.105:8080_solr_myapp2core=myapp2
Aug 17, 2014 3:44:09 PM org.apache.solr.cloud.RecoveryStrategy close
WARNING: Stopping recovery for zkNodeName=10.100.254.105:8080_solr_myappcore=myapp
Aug 17, 2014 3:45:37 PM org.apache.solr.common.SolrException log
SEVERE: There was a problem finding the leader in zk: org.apache.solr.common.SolrException: Could not get leader props

My impression is that the garbage collector is at fault here.
This is the cmdline of Tomcat:

/usr/lib/jvm/java-7-openjdk-amd64/bin/java -Djava.util.logging.config.file=/var/lib/tomcat7/conf/logging.properties -Djava.awt.headless=true -Xmx8192m -XX:+UseConcMarkSweepGC -DnumShards=2 -Djetty.port=8080 -DzkHost=10.215.1.96:2181,10.215.1.97:2181,10.215.1.98:2181 -javaagent:/opt/newrelic/newrelic.jar -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9010 -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.endorsed.dirs=/usr/share/tomcat7/endorsed -classpath /usr/share/tomcat7/bin/bootstrap.jar:/usr/share/tomcat7/bin/tomcat-juli.jar -Dcatalina.base=/var/lib/tomcat7 -Dcatalina.home=/usr/share/tomcat7 -Djava.io.tmpdir=/tmp/tomcat7-tomcat7-tmp org.apache.catalina.startup.Bootstrap start

So, I am using UseConcMarkSweepGC. Do you have any suggestions on how I can debug this further and potentially eliminate the issue causing the downtimes?
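With an 8GB heap and -XX:+UseConcMarkSweepGC as the only GC tuning, long stop-the-world collections that expire the ZooKeeper session would fit the symptoms above. A sketch of additional CMS tuning along the lines discussed in the follow-up replies; all flags are standard HotSpot 7 options, but the specific values are assumptions to be validated against your own GC logs, not a recommendation:

```
JAVA_OPTS="$JAVA_OPTS \
  -Xms8192m -Xmx8192m \
  -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
  -XX:+UseCMSInitiatingOccupancyOnly \
  -XX:CMSInitiatingOccupancyFraction=70 \
  -XX:+CMSParallelRemarkEnabled \
  -XX:+ParallelRefProcEnabled \
  -XX:MaxTenuringThreshold=8 -XX:SurvivorRatio=4"
```

Pinning -Xms to -Xmx avoids heap-resize pauses, and starting the CMS cycle earlier (occupancy fraction 70) trades some background CPU for fewer concurrent-mode failures, which are the collections that stop the world for many seconds.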