Shard CPU usage?

2017-04-26 Thread Jakov Sosic

Hi guys,

I was wondering: does the introduction of shards actually increase CPU usage?

I have a 30GB index split into two shards (15GB each), and by analyzing 
the logs I figured out that ~80% of the queries carry a 
"shard.url=http://10.3.4.12:8080/solr/mycore/|http://10.3.4.14:8080/solr/mycore/" parameter.


I basically don't need sharding, and given the huge percentage of queries 
with the "shard.url=" parameter, I'm now starting to wonder whether the 
shards are actually increasing the CPU usage of my nodes.


I'm fighting high CPU usage, and if turning sharding off and just 
keeping the replicas in my collection would lower the CPU usage by more 
than 10%, I would choose that path..
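One way I could measure the overhead myself (a sketch; the host and core name are from my setup, the query is a placeholder): compare a normal request, which the receiving node fans out to both shards and merges, against the same request with distrib=false, which stays on the local core only.

```shell
# Compare a distributed query against a single-shard query (sketch; the
# host and core name are from this setup, the query is a placeholder).
BASE="http://10.3.4.12:8080/solr/mycore/select"
Q="q=*:*&rows=10"

# Normal request: the receiving node fans it out to both shards and merges.
DISTRIB_URL="${BASE}?${Q}"

# distrib=false keeps the request on the local core only, so timing the
# two shows how much the fan-out/merge step itself costs.
LOCAL_URL="${BASE}?${Q}&distrib=false"

echo "$DISTRIB_URL"
echo "$LOCAL_URL"
# time curl -s "$DISTRIB_URL" > /dev/null
# time curl -s "$LOCAL_URL" > /dev/null
```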



Any insights?

Thanks.



Admin UI doesn't show logs?

2015-03-05 Thread Jakov Sosic

Hi,

I'm running 4.10.3 under tomcat 7, and I have an issue with Admin UI.

When I click on Logging, I don't see actual entries, only:


   No Events available


and a round icon spinning non-stop.

When I click on Level, I see the same icon and a "Loading" message.



Is there a hint or something you could point me to, so I could fix it?
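One guess on my side (an assumption, not something I've verified): in Solr 4.x the Logging screen is fed by a log watcher that needs the SLF4J/log4j jars on the classpath, and under Tomcat those jars from example/lib/ext have to be installed manually along with a log4j.properties. A minimal fragment might look like this (the log path is a placeholder):

```
# log4j.properties -- minimal sketch; the file path is a placeholder
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/var/log/solr/solr.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d %p %c: %m%n
```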


Solr 4.10.x on Oracle Java 1.8.x ?

2015-02-10 Thread Jakov Sosic

Hi guys,

at the end of April, Java 1.7 reaches its end of public updates, and 
Oracle will stop releasing updates for it.


Is it safe to run Tomcat 7 / Solr 4.10 on Java 1.8? Has anyone tried it 
already?


Adding new core to solr cloud?

2015-02-10 Thread Jakov Sosic

Hi guys

I need to add a new core to an existing Solr cloud of 4 nodes (2 replicas 
and 2 shards). This is the procedure I have in mind:


1) stop node01
2) change solr.xml to include new core (included in tomcat configuration)
3) add -Dbootstrap_conf=true to JAVA_OPTS
4) start tomcat on node01

Now, I know this will push the configuration for the existing cores as 
well, but I don't mind, since the configuration hasn't changed in quite a while.


After this, I plan to remove -Dbootstrap_conf=true from node01's 
JAVA_OPTS and restart it again, and after the cloud stabilizes, do steps 
1), 2), and 4) on the remaining nodes.



What do you think, am I missing something?
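An alternative I'm aware of but haven't tried would avoid the restarts entirely: upload the config with the zkcli.sh script that ships in Solr 4.x and create the core through the CoreAdmin API. A sketch, where the ZooKeeper hosts, core name, and paths are all placeholders:

```shell
ZKHOST="zk1:2181,zk2:2181,zk3:2181"   # placeholder
CORE="newcore"                        # placeholder

# 1) Upload the new core's config to ZooKeeper (zkcli.sh ships in
#    example/cloud-scripts in Solr 4.x):
# ./zkcli.sh -zkhost "$ZKHOST" -cmd upconfig \
#     -confdir /path/to/conf -confname "$CORE"

# 2) Create the core on each node via the CoreAdmin API, no restart needed:
CREATE_URL="http://node01:8080/solr/admin/cores?action=CREATE&name=$CORE&collection=$CORE&numShards=2&collection.configName=$CORE"
echo "$CREATE_URL"
# curl "$CREATE_URL"
```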


Re: Solr on Tomcat

2015-02-10 Thread Jakov Sosic

On 02/10/2015 07:55 PM, Dan Davis wrote:

As an application developer, I have to agree with this direction. I ran
ManifoldCF and Solr together in the same Tomcat, and the slf4j
configurations of the two conflicted with strange results. From a systems
administrator/operations perspective, a separate install allows better
packaging, e.g. Debian and RPM packages are then possible, although they may
not be preferred, as many enterprises will want to use Oracle Java rather
than OpenJDK.


And what exactly stops you from running two separate Tomcat services, 
one for each app?




Re: Migrating cloud to another set of machines

2014-10-30 Thread Jakov Sosic

On 10/30/2014 04:47 AM, Otis Gospodnetic wrote:

Hi/Bok Jakov,

2) sounds good to me.  It means no down-time.  1) means stoppage.  If
stoppage is not OK, but falling behind with indexing new content is OK, you
could:
* add a new cluster
* start reading from old index and indexing into the new index
* stop old cluster when done
* index new content to new cluster (or maybe you can be doing this all
along if indexing old + new at the same time is OK for you)
--


Thank you for the suggestions, Otis.

Everything is acceptable currently, but in the future, as the data grows, 
we will certainly hit those edge cases where neither stopping indexing 
nor stopping queries will be acceptable.


What makes things a little more problematic is that the ZooKeepers are 
also migrating to new machines.





Migrating cloud to another set of machines

2014-10-29 Thread Jakov Sosic

Hi guys


I was wondering: is there some smart way to migrate a Solr cloud from one 
set of machines to another?


Specifically, I have 2 cores, each with 2 replicas and 2 shards, 
spread across 4 machines.


We bought new hardware and are in the process of moving to 4 new machines.


What are my options?


1) - Create new cluster on new set of machines.
   - stop write operations
   - copy data directories from old machines to new machines
   - start solrs on new machines


2) - expand number of replicas from 2 to 4
   - add new solr nodes to cloud
   - wait for resync
   - stop old solr nodes
   - shrink number of replicas from 4 back to 2
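The copy step in option 1 could be sketched as a small script like this (hostnames and the data path are placeholders, and it only echoes the commands rather than running them):

```shell
# Option 1, copy step: mirror each old node's data dir to its new
# counterpart. Hostnames and the path are placeholders; echo = dry run.
OLD_NODES="old1 old2 old3 old4"
NEW_NODES="new1 new2 new3 new4"
DATA_DIR="/var/lib/solr/data"   # placeholder

set -- $NEW_NODES
for old in $OLD_NODES; do
  new=$1; shift
  echo rsync -a --delete "$old:$DATA_DIR/" "$new:$DATA_DIR/"
done
```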


Is there any other path to achieve this?

I'm leaning towards option 1, because I don't feel too comfortable doing 
all those changes described in option 2 ...


Ideas?


Re: New cloud - replica in recovering state?

2014-09-08 Thread Jakov Sosic

On 09/08/2014 02:55 AM, Erick Erickson wrote:

I really recommend you use the new-style core discovery, if for no
other reason than this style is deprecated in 5.0. See:
https://wiki.apache.org/solr/Solr.xml%204.4%20and%20beyond


Oh, I didn't know that.

Anyway, the problem I experienced was the result of a wrong hostPort 
and/or hostContext set in the cores tag.


After I fixed those it works now, but I will take a look at the 
new way of setting up cores anyway. Ty!




New cloud - replica in recovering state?

2014-09-07 Thread Jakov Sosic

Hi guys,


I'm trying to set up a new Solr cloud, with two cores, each with two 
shards and two replicas.


This is my solr.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true"
      zkHost="10.200.1.104:2181,10.200.1.105:2181,10.200.1.106:2181">
  <cores adminPath="/admin/cores"
         defaultCoreName="mycore1"
         host="${host:}" hostPort="${jetty.port:}"
         hostContext="${hostContext:}"
         zkClientTimeout="${zkClientTimeout:15000}">
    <core name="mycore1" instanceDir="mycore1" numShards="2"/>
    <core name="mycore2" instanceDir="mycore2" numShards="2"/>
  </cores>
</solr>

But when I start everything, I can see that the 4 cores (one per shard) are 
green in solr01:8080/solr/#/~cloud, while the replicas are yellow, in 
RECOVERING state.


How can I fix them to go from Recovering to Active?


Re: solr cloud going down repeatedly

2014-08-25 Thread Jakov Sosic

On 08/19/2014 04:58 PM, Shawn Heisey wrote:

On 8/19/2014 3:12 AM, Jakov Sosic wrote:

Thank you for your comment.

How did you test these settings? I mean, that's a lot of tuning and I
would like to set up some test environment to be certain this is what
I want...


I included a section on tools when I wrote this page:

http://wiki.apache.org/solr/SolrPerformanceProblems#GC_pause_problems


Thanks,


we ended up using cron to restart the Tomcats every 7 days, one Solr node 
per day... that way we avoid the GC pauses.


Until we figure things out in our dev environment and test GC 
optimizations, we will keep it this way.
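The staggered schedule could look roughly like this in cron (a sketch; the times, day-per-node assignment, and service name are assumptions):

```
# /etc/crontab on each node -- restart Tomcat at 04:00 on a different
# weekday per node (sketch; the service name is an assumption)
# node1:
0 4 * * 1  root  service tomcat7 restart
# node2:
0 4 * * 2  root  service tomcat7 restart
```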




Re: solr cloud going down repeatedly

2014-08-19 Thread Jakov Sosic

On 08/18/2014 08:38 PM, Shawn Heisey wrote:


With an 8GB heap and UseConcMarkSweepGC as your only GC tuning, I can
pretty much guarantee that you'll see occasional GC pauses of 10-15
seconds, because I saw exactly that happening with my own setup.

This is what I use now:

http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

I can't claim that my problem is 100% solved, but collections that go
over one second are *very* rare now, and I'm pretty sure they are all
under two seconds.


Thank you for your comment.

How did you test these settings? I mean, that's a lot of tuning and I 
would like to set up some test environment to be certain this is what I 
want...
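For context, the tunings that wiki page describes combine several standard HotSpot CMS flags; an illustrative JAVA_OPTS fragment might look like the following (the flag names are real HotSpot options, but the values are mine, not necessarily the wiki's):

```shell
# Illustrative CMS tuning fragment -- standard HotSpot flags,
# values are examples rather than the wiki's exact settings
JAVA_OPTS="$JAVA_OPTS -Xms4096m -Xmx4096m \
  -XX:NewRatio=3 \
  -XX:+UseConcMarkSweepGC \
  -XX:+UseParNewGC \
  -XX:CMSInitiatingOccupancyFraction=70 \
  -XX:+UseCMSInitiatingOccupancyOnly \
  -XX:+CMSParallelRemarkEnabled"
```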




solr cloud going down repeatedly

2014-08-18 Thread Jakov Sosic

Hi guys.

I have a Solr cloud consisting of 3 ZooKeeper VMs running 3.4.5, 
backported from Ubuntu 14.04 LTS to 12.04 LTS.


They orchestrate 4 Solr nodes, which have 2 cores. Each core is 
sharded, so 1 shard is on each of the Solr nodes.


Solr runs under Tomcat 7 and Ubuntu's latest OpenJDK 7.

Version of solr is 4.2.1.

Each of the nodes has around 7GB of data, and the JVM is set to run an 8GB 
heap. All Solr nodes have 16GB RAM.



A few weeks back we started having issues with this installation. Tomcat 
was filling up catalina.out with the following messages:


SEVERE: org.apache.solr.common.SolrException: no servers hosting shard:


The only solution was to restart all 4 Tomcats on the 4 Solr nodes. After 
that, the issue would rectify itself, but it would occur again, 
approximately a week after a restart.


This happened last yesterday, and I succeeded in recording some of 
the stuff happening on the boxes via Zabbix and atop.



Basically, at 15:35 the load on the machine went berserk, jumping from 
around 0.5 to 30+.


Zabbix and atop didn't notice any heavy IO; all the other processes were 
practically idle. Only the JVM (Tomcat) exploded, with CPU usage 
increasing from the usual ~80% to around ~750%.


These are parts of the atop recordings on one of the nodes. Note that 
they are 10 minutes apart:


(15:28:42)
CPL | avg1    0.12 | avg5    0.36 | avg15   0.38 |

(15:38:42)
CPL | avg1    8.54 | avg5    3.62 | avg15   1.61 |

(15:48:42)
CPL | avg1   30.14 | avg5   27.09 | avg15  14.73 |



This is the status of the Tomcat process at the last point (15:48:42):

28891  tomcat7  tomcat7  411  8.68s  70m14s  209.9M  204K  0K  5804K  --  -  S  5  704%  java



I noticed similar stuff happening on the other Solr nodes. At 17:41 the 
on-call person decided to hard reset all the Solr nodes, and the cloud 
came back up running normally after that.


These are the logs that I found on first node:

Aug 17, 2014 3:44:58 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: no servers hosting shard:

Aug 17, 2014 3:46:12 PM 
org.apache.solr.cloud.OverseerCollectionProcessor run

WARNING: Overseer cannot talk to ZK
Aug 17, 2014 3:46:12 PM 
org.apache.solr.cloud.Overseer$ClusterStateUpdater amILeader

WARNING:
org.apache.zookeeper.KeeperException$SessionExpiredException: 
KeeperErrorCode = Session expired for /overseer_elect/leader


Then a bunch of :

Aug 17, 2014 3:46:42 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: no servers hosting shard:

until the server was rebooted.


On other nodes I can see:
node2:

Aug 17, 2014 3:44:58 PM org.apache.solr.cloud.RecoveryStrategy close
WARNING: Stopping recovery for 
zkNodeName=10.100.254.103:8080_solr_myapp core=myapp

Aug 17, 2014 3:44:58 PM org.apache.solr.cloud.RecoveryStrategy close
WARNING: Stopping recovery for 
zkNodeName=10.100.254.103:8080_solr_myapp2 core=myapp2

Aug 17, 2014 3:46:24 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: 
org.apache.solr.client.solrj.SolrServerException: IOException occured 
when talking to server at: http://node1:8080/solr/myapp


node4:

Aug 17, 2014 3:44:06 PM org.apache.solr.cloud.RecoveryStrategy close
WARNING: Stopping recovery for 
zkNodeName=10.100.254.105:8080_solr_myapp2 core=myapp2

Aug 17, 2014 3:44:09 PM org.apache.solr.cloud.RecoveryStrategy close
WARNING: Stopping recovery for 
zkNodeName=10.100.254.105:8080_solr_myapp core=myapp

Aug 17, 2014 3:45:37 PM org.apache.solr.common.SolrException log
SEVERE: There was a problem finding the leader in 
zk:org.apache.solr.common.SolrException: Could not get leader props





My impression is that the garbage collector is at fault here.

This is the cmdline of tomcat:

/usr/lib/jvm/java-7-openjdk-amd64/bin/java 
-Djava.util.logging.config.file=/var/lib/tomcat7/conf/logging.properties 
-Djava.awt.headless=true -Xmx8192m -XX:+UseConcMarkSweepGC -DnumShards=2 
-Djetty.port=8080 
-DzkHost=10.215.1.96:2181,10.215.1.97:2181,10.215.1.98:2181 
-javaagent:/opt/newrelic/newrelic.jar -Dcom.sun.management.jmxremote 
-Dcom.sun.management.jmxremote.port=9010 
-Dcom.sun.management.jmxremote.local.only=false 
-Dcom.sun.management.jmxremote.authenticate=false 
-Dcom.sun.management.jmxremote.ssl=false 
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager 
-Djava.endorsed.dirs=/usr/share/tomcat7/endorsed -classpath 
/usr/share/tomcat7/bin/bootstrap.jar:/usr/share/tomcat7/bin/tomcat-juli.jar 
-Dcatalina.base=/var/lib/tomcat7 -Dcatalina.home=/usr/share/tomcat7 
-Djava.io.tmpdir=/tmp/tomcat7-tomcat7-tmp 
org.apache.catalina.startup.Bootstrap start



So, I am using ConcMarkSweepGC with no further GC tuning.

Do you have any suggestions for how I can debug this further and 
potentially eliminate the issue causing the downtimes?
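One thing I plan to try first is turning on GC logging, so the pauses can be correlated with the load spikes. These are standard HotSpot flags on Java 7 (the log path is a placeholder):

```shell
# JAVA_OPTS additions to log GC activity -- standard HotSpot 7 flags;
# the log path is a placeholder
JAVA_OPTS="$JAVA_OPTS \
  -verbose:gc \
  -Xloggc:/var/log/tomcat7/gc.log \
  -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps \
  -XX:+PrintGCApplicationStoppedTime"
```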