Re: High disk write usage
Thanks, Shawn! I will try changing the values of those parameters.

2017-07-10 14:57 GMT+02:00 Shawn Heisey:
> [...]
Re: High disk write usage
On 7/10/2017 2:57 AM, Antonio De Miguel wrote:
> I continue digging into this problem... high write rates continue.
>
> Searching the logs I see this:
>
> 2017-07-10 08:46:18.888 INFO (commitScheduler-11-thread-1) [c:ads s:shard2
> r:core_node47 x:ads_shard2_replica3] o.a.s.u.LoggingInfoStream
> [DWPT][commitScheduler-11-thread-1]: flushed: segment=_mb7 ramUsed=7.531 MB
> newFlushedSize=2.472 MB docs/MB=334.132
> 2017-07-10 08:46:29.336 INFO (commitScheduler-11-thread-1) [c:ads s:shard2
> r:core_node47 x:ads_shard2_replica3] o.a.s.u.LoggingInfoStream
> [DWPT][commitScheduler-11-thread-1]: flushed: segment=_mba ramUsed=8.079 MB
> newFlushedSize=1.784 MB docs/MB=244.978
>
> A flush happens every 10 seconds (my autoSoftCommit time is 10 seconds and
> hard commit is 5 minutes). Is this the expected behaviour?

If you are indexing continuously, then the auto soft commit time of 10
seconds means that this will be happening every ten seconds.

> I thought soft commits did not write to disk...

If you are using the correct DirectoryFactory type, a soft commit has the
*possibility* of not writing to disk, but the amount of memory reserved is
fairly small.

Looking into the source code for NRTCachingDirectoryFactory, I see that
maxMergeSizeMB defaults to 4, and maxCachedMB defaults to 48. This is a
little different from what the javadoc states for NRTCachingDirectory
(5 and 60):

http://lucene.apache.org/core/6_6_0/core/org/apache/lucene/store/NRTCachingDirectory.html

The way I read this, assuming the amount of segment data created is small,
only the first few soft commits will be handled entirely in memory. After
that, older segments must be flushed to disk to make room for new ones.

If the indexing rate is high, there's not really much difference between
soft commits and hard commits. This also assumes that you have left the
directory at the default of NRTCachingDirectoryFactory. If this has been
changed, then there is no caching in RAM, and a soft commit probably
behaves *exactly* the same as a hard commit.

Thanks,
Shawn
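For reference, the defaults Shawn mentions can be made explicit in solrconfig.xml. A minimal sketch, assuming the stock NRTCachingDirectoryFactory; the parameter names match the factory's init arguments, and the values shown are the 4/48 code defaults he cites:

```xml
<!-- solrconfig.xml: explicit NRT caching settings (4 MB / 48 MB are the
     code defaults; raising them lets more flushed segments stay in RAM) -->
<directoryFactory name="DirectoryFactory"
                  class="solr.NRTCachingDirectoryFactory">
  <double name="maxMergeSizeMB">4.0</double>
  <double name="maxCachedMB">48.0</double>
</directoryFactory>
```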
Re: High disk write usage
Hi! I continue digging into this problem... high write rates continue.

Searching the logs I see this:

2017-07-10 08:46:18.888 INFO (commitScheduler-11-thread-1) [c:ads s:shard2
r:core_node47 x:ads_shard2_replica3] o.a.s.u.LoggingInfoStream
[DWPT][commitScheduler-11-thread-1]: flushed: segment=_mb7 ramUsed=7.531 MB
newFlushedSize=2.472 MB docs/MB=334.132
2017-07-10 08:46:29.336 INFO (commitScheduler-11-thread-1) [c:ads s:shard2
r:core_node47 x:ads_shard2_replica3] o.a.s.u.LoggingInfoStream
[DWPT][commitScheduler-11-thread-1]: flushed: segment=_mba ramUsed=8.079 MB
newFlushedSize=1.784 MB docs/MB=244.978

A flush happens every 10 seconds (my autoSoftCommit time is 10 seconds and
hard commit is 5 minutes). Is this the expected behaviour? I thought soft
commits did not write to disk...

2017-07-06 0:02 GMT+02:00 Antonio De Miguel <deveto...@gmail.com>:
> [...]
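The intervals Antonio describes come from the update handler settings in solrconfig.xml; a sketch matching the 10-second soft commit and 5-minute hard commit he states (maxTime values are in milliseconds):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit: flushes segments and truncates tlogs, every 5 minutes -->
  <autoCommit>
    <maxTime>300000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft commit: opens a new searcher for visibility, every 10 seconds -->
  <autoSoftCommit>
    <maxTime>10000</maxTime>
  </autoSoftCommit>
</updateHandler>
```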
Re: High disk write usage
Hi Erick.

What I meant is that we have enough memory to store the shards and, on top
of that, the JVM heaps.

The machine has 400 GB of RAM. I think we have enough.

We have 10 JVMs running on the machine, each one using 16 GB.

Shard size is about 8 GB.

When we have query or indexing peaks our problems are CPU usage and disk
I/O, but we have a lot of unused memory.

El 5/7/2017 19:04, "Erick Erickson" <erickerick...@gmail.com> wrote:
> [...]
Re: High disk write usage
bq: We have enough physical RAM to store full collection and 16Gb for each JVM.

That's not quite what I was asking for. Lucene uses MMapDirectory to map
part of the index into the OS memory space. If you've over-allocated the
JVM space relative to your physical memory, that space can start swapping.
Frankly, I'd expect your query performance to die if that was happening, so
this is a sanity check.

How much physical memory does the machine have, and how much memory is
allocated to _all_ of the JVMs running on that machine?

see: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Best,
Erick

On Wed, Jul 5, 2017 at 9:41 AM, Antonio De Miguel <deveto...@gmail.com> wrote:
> [...]
Re: High disk write usage
Hi Erick! Thanks for your response!

Our soft commit is 5 seconds. Why does a soft commit generate I/O? That's
news to me.

We have enough physical RAM to store the full collection, and 16 GB for
each JVM. The collection is relatively small.

I've tried (for testing purposes) disabling the transaction log (commenting
it out)... but the cluster does not come up. I'll try writing to a separate
drive, nice idea...

2017-07-05 18:04 GMT+02:00 Erick Erickson <erickerick...@gmail.com>:
> [...]
Re: High disk write usage
What is your soft commit interval? That'll cause I/O as well.

How much physical RAM, and how much is dedicated to _all_ the JVMs on a
machine? One cause here is that Lucene uses MMapDirectory, which can be
starved for OS memory if you use too much JVM heap; my rule of thumb is
that _at least_ half of the physical memory should be reserved for the OS.

Your transaction logs should fluctuate but even out. By that I mean they
should increase in size, but every hard commit should truncate some of
them, so I wouldn't expect them to grow indefinitely.

One strategy is to put your tlogs on a separate drive exactly to reduce
contention. You could disable them too, at the cost of risking your data.
That might be a quick experiment you could run though: disable tlogs and
see what that changes. Of course I'd do this on my test system ;).

But yeah, Solr will use a lot of I/O in the scenario you are outlining,
I'm afraid.

Best,
Erick

On Wed, Jul 5, 2017 at 8:08 AM, Antonio De Miguel <deveto...@gmail.com> wrote:
> [...]
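Erick's tlog-on-a-separate-drive suggestion is a one-line change to the update log configuration in solrconfig.xml; a sketch, where the mount point is a made-up example path:

```xml
<updateLog>
  <!-- hypothetical mount point for a dedicated tlog drive -->
  <str name="dir">/mnt/tlogs/solr</str>
</updateLog>
```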
Re: High disk write usage
Thanks Markus!

We already have SSDs.

About changing the topology: we tried 10 shards yesterday, but the system
became more inconsistent than with the current topology (5x10). I don't
know why... too much traffic perhaps?

About merge factor: we ran the default configuration for some days, but
when a merge occurs the system overloads. We tried a mergeFactor of 4 to
improve query times and to have smaller merges.

2017-07-05 16:51 GMT+02:00 Markus Jelsma <markus.jel...@openindex.io>:
> [...]
RE: High disk write usage
Try a mergeFactor of 10 (the default), which should be fine in most cases.
If you have an extreme case, create more shards and consider better
hardware (SSDs).

-----Original message-----
> From: Antonio De Miguel <deveto...@gmail.com>
> Sent: Wednesday 5th July 2017 16:48
> To: solr-user@lucene.apache.org
> Subject: Re: High disk write usage
>
> [...]
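In Solr 6.x the legacy mergeFactor setting maps onto TieredMergePolicy parameters; a sketch of the equivalent of Markus's suggested factor of 10 (the defaults), assuming it goes in solrconfig.xml's indexConfig block:

```xml
<indexConfig>
  <!-- roughly equivalent to the old mergeFactor=10 default -->
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
  </mergePolicyFactory>
</indexConfig>
```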
Re: High disk write usage
Thanks a lot, Alessandro!

Yes, we have very big physical dedicated machines, with a topology of 5
shards and 10 replicas per shard.

1. Transaction log files are increasing, but not at this rate.

2. We've tried values between 300 and 2000 MB... without any visible
results.

3. We don't use those features.

4. No.

5. I've tried both low and high merge factors, and I think that is the
point.

With a low merge factor (around 4) we have a high disk write rate, as I
said previously.

With a merge factor of 20 the disk write rate decreases, but now, at high
QPS rates (over 1000 qps), the system is overloaded.

I think that's the expected behaviour :(

2017-07-05 15:49 GMT+02:00 alessandro.benedetti:
> [...]
Re: High disk write usage
Point 2 was the RAM buffer size:

*ramBufferSizeMB* sets the amount of RAM that may be used by Lucene
indexing for buffering added documents and deletions before they are
flushed to the Directory. maxBufferedDocs sets a limit on the number of
documents buffered before flushing. If both ramBufferSizeMB and
maxBufferedDocs are set, then Lucene will flush based on whichever limit
is hit first.

<ramBufferSizeMB>100</ramBufferSizeMB>
<maxBufferedDocs>1000</maxBufferedDocs>

-----
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
View this message in context: http://lucene.472066.n3.nabble.com/High-disk-write-usage-tp4344356p4344386.html
Sent from the Solr - User mailing list archive at Nabble.com.
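The two limits Alessandro quotes live in the indexConfig section of solrconfig.xml; a sketch using his example values:

```xml
<indexConfig>
  <!-- Lucene flushes when either limit is hit first -->
  <ramBufferSizeMB>100</ramBufferSizeMB>
  <maxBufferedDocs>1000</maxBufferedDocs>
</indexConfig>
```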
Re: High disk write usage
Is the physical machine dedicated? Or is it a dedicated VM on shared metal?
Apart from these operational checks, I will assume the machine is
dedicated.

In Solr a write to the disk does not happen only on commit. I can think of
other scenarios:

1) *Transaction log* [1]
2) *RAM buffer flushes* (ramBufferSizeMB)
3) Spellcheck and SuggestComponent building (this depends on the config,
in case you use them)
4) Memory swapping?
5) Merges (they are potentially triggered by a segment write or an
explicit optimize call, and they can last a while)

Maybe other edge cases, but I would check this list first!

[1] https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

-----
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
View this message in context: http://lucene.472066.n3.nabble.com/High-disk-write-usage-tp4344356p4344383.html
Sent from the Solr - User mailing list archive at Nabble.com.
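For point 3, a suggester configured with buildOnCommit=true rebuilds (and writes) its dictionary on every commit. A sketch, with the suggester name and field chosen only for illustration:

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <!-- true would rebuild the suggester index on every commit -->
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>
```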