The load shifted to the other server after I restarted the bulk insert process.
The size of the index on each server is about 500 GB. There are about 8
warnings on each server for "Not found segment file", like this one:

Error getting file length for [segments_2s4]
java.nio.file.NoSuchFileException: /media/ssd_losedata/solr-home/data/documents_online_shard16_replica_n1/data/index/segments_2s4
    at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
    at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
    at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
    at java.base/sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
    at java.base/sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:145)
    at java.base/sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
    at java.base/java.nio.file.Files.readAttributes(Files.java:1755)
    at java.base/java.nio.file.Files.size(Files.java:2369)
    at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:243)
    at org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:128)
    at org.apache.solr.handler.admin.LukeRequestHandler.getFileLength(LukeRequestHandler.java:611)
    at org.apache.solr.handler.admin.LukeRequestHandler.getIndexInfo(LukeRequestHandler.java:584)
    at org.apache.solr.handler.admin.LukeRequestHandler.handleRequestBody(LukeRequestHandler.java:136)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2474)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:720)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:526)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:378)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:322)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
    at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
    at org.eclipse.jetty.server.Server.handle(Server.java:534)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
    at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
    at java.base/java.lang.Thread.run(Thread.java:844)
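
Judging from the trace, the warning is raised by the Luke request handler
(/admin/luke), which reads the on-disk length of every index file; if a
commit or merge deletes an old segments_N file between the directory
listing and the length call, you get exactly this NoSuchFileException. A
minimal SolrJ sketch for polling the same handler to confirm the core
itself is healthy - the base URL and core name here are assumptions taken
from the path in the log:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.LukeRequest;
import org.apache.solr.client.solrj.response.LukeResponse;

public class IndexInfoCheck {
    public static void main(String[] args) throws Exception {
        // /admin/luke is a core-level handler, so target one core directly
        // (host and core name are assumptions based on the log path).
        try (SolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/documents_online_shard16_replica_n1").build()) {
            LukeRequest luke = new LukeRequest();   // defaults to /admin/luke
            luke.setNumTerms(0);                    // skip per-field term stats to keep the call cheap
            LukeResponse info = luke.process(client);
            System.out.println("numDocs=" + info.getNumDocs()
                    + " maxDoc=" + info.getMaxDoc());
        }
    }
}

If this request succeeds and numDocs looks right, the warnings are most
likely a benign race between the handler and ongoing merges rather than
actual index corruption.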

On Mon, Oct 16, 2017 at 1:08 PM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> I did not look at the graph details - now I see that it is over a 3h
> time span. It seems that there was load on the other server before this
> one, and it ended with a 14GB read spike and a 10GB write spike just
> before the load started on this server. Do you see any errors or
> suspicious log lines?
> How big is your index?
>
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
> On 16 Oct 2017, at 12:39, Mahmoud Almokadem <prog.mahm...@gmail.com>
> wrote:
>
> > Yes, it has been constant since I started this bulk indexing process.
> > As you can see, the write operations on the loaded server are 3x those
> > of the normal server, although disk writes are not 3x.
> >
> > Mahmoud
> >
> >
> > On Mon, Oct 16, 2017 at 12:32 PM, Emir Arnautović <
> > emir.arnauto...@sematext.com> wrote:
> >
> >> Hi Mahmoud,
> >> Is this something that you see constantly? The network charts suggest
> >> that your servers are loaded equally, which is expected since, as you
> >> said, you are not using routing. Disk read/write and CPU are not
> >> equal, and that is expected during heavy indexing, since indexing also
> >> triggers segment merges, which require those resources. Even if two
> >> servers host the same documents (e.g. leader and replica), merges are
> >> not likely to happen at the same time, so you can expect to see such
> >> cases.
> >>
> >> Thanks,
> >> Emir
> >> --
> >> Monitoring - Log Management - Alerting - Anomaly Detection
> >> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >>
> >>
> >>> On 16 Oct 2017, at 11:58, Mahmoud Almokadem <prog.mahm...@gmail.com>
> >>> wrote:
> >>>
> >>> Here are the screenshots of the metrics for the two servers on Amazon:
> >>>
> >>> https://ibb.co/kxBQam
> >>> https://ibb.co/fn0Jvm
> >>> https://ibb.co/kUpYT6
> >>>
> >>>
> >>> On Mon, Oct 16, 2017 at 11:37 AM, Mahmoud Almokadem <
> >>> prog.mahm...@gmail.com> wrote:
> >>>
> >>>> Hi Emir,
> >>>>
> >>>> We don't use routing.
> >>>>
> >>>> The servers are already balanced, and the number of documents on
> >>>> each shard is approximately the same.
> >>>>
> >>>> Nothing is running on the servers except Solr and ZooKeeper.
> >>>>
> >>>> I initialized the client as:
> >>>>
> >>>> String zkHost = "192.168.1.89:2181,192.168.1.99:2181";
> >>>>
> >>>> CloudSolrClient solrCloud = new CloudSolrClient.Builder()
> >>>>     .withZkHost(zkHost)
> >>>>     .build();
> >>>>
> >>>> solrCloud.setIdField("document_id");
> >>>> solrCloud.setDefaultCollection(collection);
> >>>> solrCloud.setRequestWriter(new BinaryRequestWriter());
> >>>>
> >>>> The documents are approximately the same size.
> >>>>
> >>>> I used 10 threads with 10 SolrClients to send data to Solr, and
> >>>> every thread sends a batch of 1000 documents each time.
> >>>>
> >>>> Thanks,
> >>>> Mahmoud
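
For reference, a self-contained sketch of the indexing pattern described
in the quoted message above - 10 threads, one CloudSolrClient each,
batches of 1000 documents. The ZooKeeper hosts and id field come from the
quoted snippet; the collection name and document contents are assumed
placeholders:

import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(10);
        for (int t = 0; t < 10; t++) {
            pool.submit(() -> {
                // One client per thread, as described in the message above.
                try (CloudSolrClient client = new CloudSolrClient.Builder()
                        .withZkHost("192.168.1.89:2181,192.168.1.99:2181")
                        .build()) {
                    client.setDefaultCollection("documents_online"); // assumed name
                    List<SolrInputDocument> batch = new ArrayList<>(1000);
                    for (int i = 0; i < 1000; i++) {
                        SolrInputDocument doc = new SolrInputDocument();
                        doc.addField("document_id", UUID.randomUUID().toString());
                        batch.add(doc);
                    }
                    client.add(batch); // one 1000-document batch per request
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}

With no routing, CloudSolrClient hashes each document_id onto one of the
16 shards, so the batches should spread evenly across both nodes; uneven
load would then point at merges or commits rather than at document
distribution.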
> >>>> On Mon, Oct 16, 2017 at 11:01 AM, Emir Arnautović <
> >>>> emir.arnauto...@sematext.com> wrote:
> >>>>
> >>>>> Hi Mahmoud,
> >>>>> Do you use routing? Are your servers equally balanced - do you end
> >>>>> up having approximately the same number of documents hosted on both
> >>>>> servers (counting all shards)?
> >>>>> Do you have anything else running on those servers?
> >>>>> How do you initialise your SolrJ client?
> >>>>> Are the documents of similar size?
> >>>>>
> >>>>> Thanks,
> >>>>> Emir
> >>>>> --
> >>>>> Monitoring - Log Management - Alerting - Anomaly Detection
> >>>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >>>>>
> >>>>>
> >>>>>> On 16 Oct 2017, at 10:46, Mahmoud Almokadem <prog.mahm...@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>> We've installed SolrCloud 7.0.1 with two nodes and 8 shards per
> >>>>>> node.
> >>>>>>
> >>>>>> The configurations and the specs of the two servers are identical.
> >>>>>>
> >>>>>> When running bulk indexing using SolrJ, we see that one of the
> >>>>>> servers is fully loaded, as you can see in the images, while the
> >>>>>> other is normal.
> >>>>>>
> >>>>>> Image URLs:
> >>>>>>
> >>>>>> https://ibb.co/jkE6gR
> >>>>>> https://ibb.co/hyzvam
> >>>>>> https://ibb.co/mUpvam
> >>>>>> https://ibb.co/e4bxo6
> >>>>>>
> >>>>>> How can I figure out this issue?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Mahmoud
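
Since shard balance comes up twice in the quoted thread, a minimal sketch
for checking per-shard document counts directly, by querying each logical
shard with rows=0. The shard names follow the shardN convention seen in
the log; the collection name is an assumption:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ShardBalanceCheck {
    public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient.Builder()
                .withZkHost("192.168.1.89:2181,192.168.1.99:2181")
                .build()) {
            client.setDefaultCollection("documents_online"); // assumed name
            for (int i = 1; i <= 16; i++) {                  // 2 nodes x 8 shards
                SolrQuery q = new SolrQuery("*:*");
                q.setRows(0);                                // counts only, no documents
                q.set("shards", "shard" + i);                // restrict to one logical shard
                QueryResponse rsp = client.query(q);
                System.out.println("shard" + i + ": "
                        + rsp.getResults().getNumFound() + " docs");
            }
        }
    }
}

Roughly equal counts across all 16 shards would confirm that the
imbalance comes from background work (merges, commits) on one node rather
than from how documents are routed.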