Re: High disk write usage

2017-07-11 Thread Antonio De Miguel
Thanks Shawn!


I will try changing the values of those parameters.


2017-07-10 14:57 GMT+02:00 Shawn Heisey :

> On 7/10/2017 2:57 AM, Antonio De Miguel wrote:
> > I keep digging into this problem...  the high write rates continue.
> >
> > Searching the logs I see this:
> >
> > 2017-07-10 08:46:18.888 INFO  (commitScheduler-11-thread-1) [c:ads
> s:shard2
> > r:core_node47 x:ads_shard2_replica3] o.a.s.u.LoggingInfoStream
> > [DWPT][commitScheduler-11-thread-1]: flushed: segment=_mb7
> ramUsed=7.531 MB
> > newFlushedSize=2.472 MB docs/MB=334.132
> > 2017-07-10 08:46:29.336 INFO  (commitScheduler-11-thread-1) [c:ads
> s:shard2
> > r:core_node47 x:ads_shard2_replica3] o.a.s.u.LoggingInfoStream
> > [DWPT][commitScheduler-11-thread-1]: flushed: segment=_mba
> ramUsed=8.079 MB
> > newFlushedSize=1.784 MB docs/MB=244.978
> >
> >
> > A flush happens every 10 seconds (my autoSoftCommit time is 10 secs and
> > hardCommit 5 minutes).  Is this the expected behaviour?
>
> If you are indexing continuously, then the auto soft commit time of 10
> seconds means that this will be happening every ten seconds.
>
> > I thought soft commits did not write to disk...
>
> If you are using the correct DirectoryFactory type, a soft commit has
> the *possibility* of not writing to disk, but the amount of memory
> reserved is fairly small.
>
> Looking into the source code for NRTCachingDirectoryFactory, I see that
> maxMergeSizeMB defaults to 4, and maxCachedMB defaults to 48.  This is a
> little bit different than what the javadoc states for
> NRTCachingDirectory (5 and 60):
>
> http://lucene.apache.org/core/6_6_0/core/org/apache/lucene/
> store/NRTCachingDirectory.html
>
> The way I read this, assuming the amount of segment data created is
> small, only the first few soft commits will be entirely handled in
> memory.  After that, older segments must be flushed to disk to make room
> for new ones.
>
> If the indexing rate is high, there's not really much difference between
> soft commits and hard commits.  This also assumes that you have left the
> directory at the default of NRTCachingDirectoryFactory.  If this has
> been changed, then there is no caching in RAM, and soft commit probably
> behaves *exactly* the same as hard commit.
>
> Thanks,
> Shawn
>
>


Re: High disk write usage

2017-07-10 Thread Shawn Heisey
On 7/10/2017 2:57 AM, Antonio De Miguel wrote:
> I keep digging into this problem...  the high write rates continue.
>
> Searching the logs I see this:
>
> 2017-07-10 08:46:18.888 INFO  (commitScheduler-11-thread-1) [c:ads s:shard2
> r:core_node47 x:ads_shard2_replica3] o.a.s.u.LoggingInfoStream
> [DWPT][commitScheduler-11-thread-1]: flushed: segment=_mb7 ramUsed=7.531 MB
> newFlushedSize=2.472 MB docs/MB=334.132
> 2017-07-10 08:46:29.336 INFO  (commitScheduler-11-thread-1) [c:ads s:shard2
> r:core_node47 x:ads_shard2_replica3] o.a.s.u.LoggingInfoStream
> [DWPT][commitScheduler-11-thread-1]: flushed: segment=_mba ramUsed=8.079 MB
> newFlushedSize=1.784 MB docs/MB=244.978
>
>
> A flush happens every 10 seconds (my autoSoftCommit time is 10 secs and
> hardCommit 5 minutes).  Is this the expected behaviour?

If you are indexing continuously, then the auto soft commit time of 10
seconds means that this will be happening every ten seconds.
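
As a point of reference, those intervals live in the updateHandler section
of solrconfig.xml.  A minimal sketch, with the intervals taken from your
description (openSearcher=false is the usual recommendation, not
necessarily what you have):

  <autoCommit>
    <maxTime>300000</maxTime>        <!-- hard commit every 5 minutes -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>10000</maxTime>         <!-- soft commit every 10 seconds -->
  </autoSoftCommit>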

> I thought soft commits did not write to disk...

If you are using the correct DirectoryFactory type, a soft commit has
the *possibility* of not writing to disk, but the amount of memory
reserved is fairly small.

Looking into the source code for NRTCachingDirectoryFactory, I see that
maxMergeSizeMB defaults to 4, and maxCachedMB defaults to 48.  This is a
little bit different than what the javadoc states for
NRTCachingDirectory (5 and 60):

http://lucene.apache.org/core/6_6_0/core/org/apache/lucene/store/NRTCachingDirectory.html

The way I read this, assuming the amount of segment data created is
small, only the first few soft commits will be entirely handled in
memory.  After that, older segments must be flushed to disk to make room
for new ones.
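
If you want to experiment with a larger cache, both values can be passed to
the factory.  A sketch, assuming the usual solrconfig.xml init-arg syntax:

  <directoryFactory name="DirectoryFactory" class="solr.NRTCachingDirectoryFactory">
    <double name="maxMergeSizeMB">4.0</double>
    <double name="maxCachedMB">48.0</double>
  </directoryFactory>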

If the indexing rate is high, there's not really much difference between
soft commits and hard commits.  This also assumes that you have left the
directory at the default of NRTCachingDirectoryFactory.  If this has
been changed, then there is no caching in RAM, and soft commit probably
behaves *exactly* the same as hard commit.

Thanks,
Shawn



Re: High disk write usage

2017-07-10 Thread Antonio De Miguel
Hi!

I keep digging into this problem...  the high write rates continue.

Searching the logs I see this:

2017-07-10 08:46:18.888 INFO  (commitScheduler-11-thread-1) [c:ads s:shard2
r:core_node47 x:ads_shard2_replica3] o.a.s.u.LoggingInfoStream
[DWPT][commitScheduler-11-thread-1]: flushed: segment=_mb7 ramUsed=7.531 MB
newFlushedSize=2.472 MB docs/MB=334.132
2017-07-10 08:46:29.336 INFO  (commitScheduler-11-thread-1) [c:ads s:shard2
r:core_node47 x:ads_shard2_replica3] o.a.s.u.LoggingInfoStream
[DWPT][commitScheduler-11-thread-1]: flushed: segment=_mba ramUsed=8.079 MB
newFlushedSize=1.784 MB docs/MB=244.978


A flush happens every 10 seconds (my autoSoftCommit time is 10 secs and
hardCommit 5 minutes).  Is this the expected behaviour?

I thought soft commits did not write to disk...
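
(For context: the [DWPT] lines above come from Lucene's infoStream, which
we turned on through the indexConfig section of solrconfig.xml, roughly:

  <indexConfig>
    <infoStream>true</infoStream>
  </indexConfig>
)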


2017-07-06 0:02 GMT+02:00 Antonio De Miguel <deveto...@gmail.com>:

> Hi Erik.
>
> What I meant is that we have enough memory to store the shards and,
> furthermore, the JVM heap spaces.
>
> The machine has 400 GB of RAM. I think we have enough.
>
> We have 10 JVMs running on the machine, each one using 16 GB.
>
> Shard size is about 8 GB.
>
> When we have query or indexing peaks our problems are CPU usage and
> disk I/O, but we have a lot of unused memory.
>
>
>
>
>
>
>
>
>
> On 5/7/2017 19:04, "Erick Erickson" <erickerick...@gmail.com> wrote:
>
>> bq: We have enough physical RAM to store the full collection and 16 GB
>> for each JVM.
>>
>> That's not quite what I was asking for. Lucene uses MMapDirectory to
>> map part of the index into the OS memory space. If you've
>> over-allocated the JVM space relative to your physical memory that
>> space can start swapping. Frankly I'd expect your query performance to
>> die if that was happening so this is a sanity check.
>>
>> How much physical memory does the machine have and how much memory is
>> allocated to _all_ of the JVMs running on that machine?
>>
>> see: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on
>> -64bit.html
>>
>> Best,
>> Erick
>>
>>
>> On Wed, Jul 5, 2017 at 9:41 AM, Antonio De Miguel <deveto...@gmail.com>
>> wrote:
>> > Hi Erik! thanks for your response!
>> >
>> > Our soft commit is 5 seconds. Why does a soft commit generate I/O? First
>> > I've heard of it.
>> >
>> >
>> > We have enough physical RAM to store the full collection and 16 GB for
>> > each JVM.  The collection is relatively small.
>> >
>> > I've tried (for testing purposes) disabling the transaction log (commenting
>> > out <updateLog>)... but the cluster does not come up. I'll try writing into
>> > a separate drive, nice idea...
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > 2017-07-05 18:04 GMT+02:00 Erick Erickson <erickerick...@gmail.com>:
>> >
>> >> What is your soft commit interval? That'll cause I/O as well.
>> >>
>> >> How much physical RAM and how much is dedicated to _all_ the JVMs on a
>> >> machine? One cause here is that Lucene uses MMapDirectory which can be
>> >> starved for OS memory if you use too much JVM, my rule of thumb is
>> >> that _at least_ half of the physical memory should be reserved for the
>> >> OS.
>> >>
>> >> Your transaction logs should fluctuate but even out. By that I mean
>> >> they should increase in size but every hard commit should truncate
>> >> some of them so I wouldn't expect them to grow indefinitely.
>> >>
>> >> One strategy is to put your tlogs on a separate drive exactly to
>> >> reduce contention. You could disable them too at a cost of risking
>> >> your data. That might be a quick experiment you could run though,
>> >> disable tlogs and see what that changes. Of course I'd do this on my
>> >> test system ;).
>> >>
>> >> But yeah, Solr will use a lot of I/O in the scenario you are outlining
>> >> I'm afraid.
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Wed, Jul 5, 2017 at 8:08 AM, Antonio De Miguel <deveto...@gmail.com
>> >
>> >> wrote:
>> >> > thanks Markus!
>> >> >
>> >> > We already have SSD.
>> >> >
>> >> > About changing topology: we tried 10 shards yesterday, but the system
>> >> > was more inconsistent than with the current topology (5x10). I don't
>> >> > know why... too much traffic perhaps?

Re: High disk write usage

2017-07-05 Thread Antonio De Miguel
Hi Erik.

What I meant is that we have enough memory to store the shards and,
furthermore, the JVM heap spaces.

The machine has 400 GB of RAM. I think we have enough.

We have 10 JVMs running on the machine, each one using 16 GB.

Shard size is about 8 GB.
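
Doing the arithmetic Erick asked about: 10 JVMs x 16 GB = 160 GB of heap,
which leaves roughly 240 GB of the 400 GB for the OS page cache, far more
than the index data hosted on the machine.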

When we have query or indexing peaks our problems are CPU usage and
disk I/O, but we have a lot of unused memory.









On 5/7/2017 19:04, "Erick Erickson" <erickerick...@gmail.com> wrote:

> bq: We have enough physical RAM to store the full collection and 16 GB
> for each JVM.
>
> That's not quite what I was asking for. Lucene uses MMapDirectory to
> map part of the index into the OS memory space. If you've
> over-allocated the JVM space relative to your physical memory that
> space can start swapping. Frankly I'd expect your query performance to
> die if that was happening so this is a sanity check.
>
> How much physical memory does the machine have and how much memory is
> allocated to _all_ of the JVMs running on that machine?
>
> see: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-
> on-64bit.html
>
> Best,
> Erick
>
>
> On Wed, Jul 5, 2017 at 9:41 AM, Antonio De Miguel <deveto...@gmail.com>
> wrote:
> > Hi Erik! thanks for your response!
> >
> > Our soft commit is 5 seconds. Why does a soft commit generate I/O? First
> > I've heard of it.
> >
> >
> > We have enough physical RAM to store the full collection and 16 GB for
> > each JVM.  The collection is relatively small.
> >
> > I've tried (for testing purposes) disabling the transaction log (commenting
> > out <updateLog>)... but the cluster does not come up. I'll try writing into
> > a separate drive, nice idea...
> >
> >
> >
> >
> >
> >
> >
> >
> > 2017-07-05 18:04 GMT+02:00 Erick Erickson <erickerick...@gmail.com>:
> >
> >> What is your soft commit interval? That'll cause I/O as well.
> >>
> >> How much physical RAM and how much is dedicated to _all_ the JVMs on a
> >> machine? One cause here is that Lucene uses MMapDirectory which can be
> >> starved for OS memory if you use too much JVM, my rule of thumb is
> >> that _at least_ half of the physical memory should be reserved for the
> >> OS.
> >>
> >> Your transaction logs should fluctuate but even out. By that I mean
> >> they should increase in size but every hard commit should truncate
> >> some of them so I wouldn't expect them to grow indefinitely.
> >>
> >> One strategy is to put your tlogs on a separate drive exactly to
> >> reduce contention. You could disable them too at a cost of risking
> >> your data. That might be a quick experiment you could run though,
> >> disable tlogs and see what that changes. Of course I'd do this on my
> >> test system ;).
> >>
> >> But yeah, Solr will use a lot of I/O in the scenario you are outlining
> >> I'm afraid.
> >>
> >> Best,
> >> Erick
> >>
> >> On Wed, Jul 5, 2017 at 8:08 AM, Antonio De Miguel <deveto...@gmail.com>
> >> wrote:
> >> > thanks Markus!
> >> >
> >> > We already have SSD.
> >> >
> >> > About changing topology: we tried 10 shards yesterday, but the system
> >> > was more inconsistent than with the current topology (5x10). I don't
> >> > know why... too much traffic perhaps?
> >> >
> >> > About merge factor: we ran the default configuration for a few days...
> >> > but when a merge occurs the system overloads. We tried a mergeFactor of
> >> > 4 to improve query times and to get smaller merges.
> >> >
> >> > 2017-07-05 16:51 GMT+02:00 Markus Jelsma <markus.jel...@openindex.io
> >:
> >> >
> >> >> Try a mergeFactor of 10 (the default), which should be fine in most
> >> >> cases. If you've got an extreme case, create more shards and consider
> >> >> better hardware (SSDs).
> >> >>
> >> >> -Original message-
> >> >> > From:Antonio De Miguel <deveto...@gmail.com>
> >> >> > Sent: Wednesday 5th July 2017 16:48
> >> >> > To: solr-user@lucene.apache.org
> >> >> > Subject: Re: High disk write usage
> >> >> >
> >> >> > Thanks a lot Alessandro!
> >> >> >
> >> >> > Yes, we have very big physical dedicated machines, with a topology of
> >> >> > 5 shards and 10 replicas per shard.

Re: High disk write usage

2017-07-05 Thread Erick Erickson
bq: We have enough physical RAM to store the full collection and 16 GB for each JVM.

That's not quite what I was asking for. Lucene uses MMapDirectory to
map part of the index into the OS memory space. If you've
over-allocated the JVM space relative to your physical memory that
space can start swapping. Frankly I'd expect your query performance to
die if that was happening so this is a sanity check.

How much physical memory does the machine have and how much memory is
allocated to _all_ of the JVMs running on that machine?

see: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Best,
Erick


On Wed, Jul 5, 2017 at 9:41 AM, Antonio De Miguel <deveto...@gmail.com> wrote:
> Hi Erik! thanks for your response!
>
> Our soft commit is 5 seconds. Why does a soft commit generate I/O? First I've heard of it.
>
>
> We have enough physical RAM to store the full collection and 16 GB for
> each JVM.  The collection is relatively small.
>
> I've tried (for testing purposes) disabling the transaction log (commenting
> out <updateLog>)... but the cluster does not come up. I'll try writing into
> a separate drive, nice idea...
>
>
>
>
>
>
>
>
> 2017-07-05 18:04 GMT+02:00 Erick Erickson <erickerick...@gmail.com>:
>
>> What is your soft commit interval? That'll cause I/O as well.
>>
>> How much physical RAM and how much is dedicated to _all_ the JVMs on a
>> machine? One cause here is that Lucene uses MMapDirectory which can be
>> starved for OS memory if you use too much JVM, my rule of thumb is
>> that _at least_ half of the physical memory should be reserved for the
>> OS.
>>
>> Your transaction logs should fluctuate but even out. By that I mean
>> they should increase in size but every hard commit should truncate
>> some of them so I wouldn't expect them to grow indefinitely.
>>
>> One strategy is to put your tlogs on a separate drive exactly to
>> reduce contention. You could disable them too at a cost of risking
>> your data. That might be a quick experiment you could run though,
>> disable tlogs and see what that changes. Of course I'd do this on my
>> test system ;).
>>
>> But yeah, Solr will use a lot of I/O in the scenario you are outlining
>> I'm afraid.
>>
>> Best,
>> Erick
>>
>> On Wed, Jul 5, 2017 at 8:08 AM, Antonio De Miguel <deveto...@gmail.com>
>> wrote:
>> > thanks Markus!
>> >
>> > We already have SSD.
>> >
>> > About changing topology: we tried 10 shards yesterday, but the system
>> > was more inconsistent than with the current topology (5x10). I don't know
>> > why... too much traffic perhaps?
>> >
>> > About merge factor: we ran the default configuration for a few days...
>> > but when a merge occurs the system overloads. We tried a mergeFactor of 4
>> > to improve query times and to get smaller merges.
>> >
>> > 2017-07-05 16:51 GMT+02:00 Markus Jelsma <markus.jel...@openindex.io>:
>> >
>> >> Try a mergeFactor of 10 (the default), which should be fine in most
>> >> cases. If you've got an extreme case, create more shards and consider
>> >> better hardware (SSDs).
>> >>
>> >> -Original message-
>> >> > From:Antonio De Miguel <deveto...@gmail.com>
>> >> > Sent: Wednesday 5th July 2017 16:48
>> >> > To: solr-user@lucene.apache.org
>> >> > Subject: Re: High disk write usage
>> >> >
>> >> > Thanks a lot Alessandro!
>> >> >
>> >> > Yes, we have very big physical dedicated machines, with a topology of
>> >> > 5 shards and 10 replicas per shard.
>> >> >
>> >> >
>> >> > 1. Transaction log files are growing, but not at this rate.
>> >> >
>> >> > 2. We've tried values between 300 and 2000 MB... without any
>> >> > visible results.
>> >> >
>> >> > 3. We don't use those features.
>> >> >
>> >> > 4. No.
>> >> >
>> >> > 5. I've tried low and high merge factors and I think that is the point.
>> >> >
>> >> > With a low merge factor (around 4) we have a high disk write rate, as I
>> >> > said previously.
>> >> >
>> >> > With a merge factor of 20 the disk write rate decreases, but now, with
>> >> > high qps rates (over 1000 qps) the system is overloaded.
>> >> >
>> >> > I think that's the expected behaviour :(

Re: High disk write usage

2017-07-05 Thread Antonio De Miguel
Hi Erik! thanks for your response!

Our soft commit is 5 seconds. Why does a soft commit generate I/O? First I've heard of it.


We have enough physical RAM to store the full collection and 16 GB for
each JVM.  The collection is relatively small.

I've tried (for testing purposes) disabling the transaction log (commenting
out <updateLog>)... but the cluster does not come up. I'll try writing into
a separate drive, nice idea...








2017-07-05 18:04 GMT+02:00 Erick Erickson <erickerick...@gmail.com>:

> What is your soft commit interval? That'll cause I/O as well.
>
> How much physical RAM and how much is dedicated to _all_ the JVMs on a
> machine? One cause here is that Lucene uses MMapDirectory which can be
> starved for OS memory if you use too much JVM, my rule of thumb is
> that _at least_ half of the physical memory should be reserved for the
> OS.
>
> Your transaction logs should fluctuate but even out. By that I mean
> they should increase in size but every hard commit should truncate
> some of them so I wouldn't expect them to grow indefinitely.
>
> One strategy is to put your tlogs on a separate drive exactly to
> reduce contention. You could disable them too at a cost of risking
> your data. That might be a quick experiment you could run though,
> disable tlogs and see what that changes. Of course I'd do this on my
> test system ;).
>
> But yeah, Solr will use a lot of I/O in the scenario you are outlining
> I'm afraid.
>
> Best,
> Erick
>
> On Wed, Jul 5, 2017 at 8:08 AM, Antonio De Miguel <deveto...@gmail.com>
> wrote:
> > thanks Markus!
> >
> > We already have SSD.
> >
> > About changing topology: we tried 10 shards yesterday, but the system
> > was more inconsistent than with the current topology (5x10). I don't know
> > why... too much traffic perhaps?
> >
> > About merge factor: we ran the default configuration for a few days...
> > but when a merge occurs the system overloads. We tried a mergeFactor of 4
> > to improve query times and to get smaller merges.
> >
> > 2017-07-05 16:51 GMT+02:00 Markus Jelsma <markus.jel...@openindex.io>:
> >
> >> Try a mergeFactor of 10 (the default), which should be fine in most
> >> cases. If you've got an extreme case, create more shards and consider
> >> better hardware (SSDs).
> >>
> >> -Original message-
> >> > From:Antonio De Miguel <deveto...@gmail.com>
> >> > Sent: Wednesday 5th July 2017 16:48
> >> > To: solr-user@lucene.apache.org
> >> > Subject: Re: High disk write usage
> >> >
> >> > Thanks a lot Alessandro!
> >> >
> >> > Yes, we have very big physical dedicated machines, with a topology of
> >> > 5 shards and 10 replicas per shard.
> >> >
> >> >
> >> > 1. Transaction log files are growing, but not at this rate.
> >> >
> >> > 2. We've tried values between 300 and 2000 MB... without any
> >> > visible results.
> >> >
> >> > 3. We don't use those features.
> >> >
> >> > 4. No.
> >> >
> >> > 5. I've tried low and high merge factors and I think that is the point.
> >> >
> >> > With a low merge factor (around 4) we have a high disk write rate, as I
> >> > said previously.
> >> >
> >> > With a merge factor of 20 the disk write rate decreases, but now, with
> >> > high qps rates (over 1000 qps) the system is overloaded.
> >> >
> >> > I think that's the expected behaviour :(
> >> >
> >> >
> >> >
> >> >
> >> > 2017-07-05 15:49 GMT+02:00 alessandro.benedetti <a.benede...@sease.io
> >:
> >> >
> >> > > Point 2 was the ram buffer size:
> >> > >
> >> > > *ramBufferSizeMB* sets the amount of RAM that may be used by Lucene
> >> > > indexing for buffering added documents and deletions before they are
> >> > > flushed to the Directory. maxBufferedDocs sets a limit on the number
> >> > > of documents buffered before flushing. If both ramBufferSizeMB and
> >> > > maxBufferedDocs is set, then Lucene will flush based on whichever
> >> > > limit is hit first.
> >> > >
> >> > > <ramBufferSizeMB>100</ramBufferSizeMB>
> >> > > <maxBufferedDocs>1000</maxBufferedDocs>
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > -
> >> > > ---
> >> > > Alessandro Benedetti
> >> > > Search Consultant, R&D Software Engineer, Director
> >> > > Sease Ltd. - www.sease.io
> >> > > --
> >> > > View this message in context: http://lucene.472066.n3.
> >> > > nabble.com/High-disk-write-usage-tp4344356p4344386.html
> >> > > Sent from the Solr - User mailing list archive at Nabble.com.
> >> > >
> >> >
> >>
>


Re: High disk write usage

2017-07-05 Thread Erick Erickson
What is your soft commit interval? That'll cause I/O as well.

How much physical RAM and how much is dedicated to _all_ the JVMs on a
machine? One cause here is that Lucene uses MMapDirectory which can be
starved for OS memory if you use too much JVM, my rule of thumb is
that _at least_ half of the physical memory should be reserved for the
OS.

Your transaction logs should fluctuate but even out. By that I mean
they should increase in size but every hard commit should truncate
some of them so I wouldn't expect them to grow indefinitely.

One strategy is to put your tlogs on a separate drive exactly to
reduce contention. You could disable them too at a cost of risking
your data. That might be a quick experiment you could run though,
disable tlogs and see what that changes. Of course I'd do this on my
test system ;).
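
If it helps with that experiment: the tlog location can be moved without
moving the index, since the updateLog element takes a dir parameter.  A
sketch (the mount point is hypothetical):

  <updateLog>
    <str name="dir">/mnt/tlogs/solr</str>
  </updateLog>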

But yeah, Solr will use a lot of I/O in the scenario you are outlining
I'm afraid.

Best,
Erick

On Wed, Jul 5, 2017 at 8:08 AM, Antonio De Miguel <deveto...@gmail.com> wrote:
> thanks Markus!
>
> We already have SSD.
>
> About changing topology: we tried 10 shards yesterday, but the system
> was more inconsistent than with the current topology (5x10). I don't know
> why... too much traffic perhaps?
>
> About merge factor: we ran the default configuration for a few days... but
> when a merge occurs the system overloads. We tried a mergeFactor of 4 to
> improve query times and to get smaller merges.
>
> 2017-07-05 16:51 GMT+02:00 Markus Jelsma <markus.jel...@openindex.io>:
>
>> Try a mergeFactor of 10 (the default), which should be fine in most cases.
>> If you've got an extreme case, create more shards and consider better
>> hardware (SSDs).
>>
>> -Original message-
>> > From:Antonio De Miguel <deveto...@gmail.com>
>> > Sent: Wednesday 5th July 2017 16:48
>> > To: solr-user@lucene.apache.org
>> > Subject: Re: High disk write usage
>> >
>> > Thanks a lot Alessandro!
>> >
>> > Yes, we have very big physical dedicated machines, with a topology of 5
>> > shards and 10 replicas per shard.
>> >
>> >
>> > 1. Transaction log files are growing, but not at this rate.
>> >
>> > 2. We've tried values between 300 and 2000 MB... without any
>> > visible results.
>> >
>> > 3. We don't use those features.
>> >
>> > 4. No.
>> >
>> > 5. I've tried low and high merge factors and I think that is the point.
>> >
>> > With a low merge factor (around 4) we have a high disk write rate, as I
>> > said previously.
>> >
>> > With a merge factor of 20 the disk write rate decreases, but now, with
>> > high qps rates (over 1000 qps) the system is overloaded.
>> >
>> > I think that's the expected behaviour :(
>> >
>> >
>> >
>> >
>> > 2017-07-05 15:49 GMT+02:00 alessandro.benedetti <a.benede...@sease.io>:
>> >
>> > > Point 2 was the ram buffer size:
>> > >
>> > > *ramBufferSizeMB* sets the amount of RAM that may be used by Lucene
>> > > indexing for buffering added documents and deletions before they are
>> > > flushed to the Directory. maxBufferedDocs sets a limit on the number
>> > > of documents buffered before flushing. If both ramBufferSizeMB and
>> > > maxBufferedDocs is set, then Lucene will flush based on whichever
>> > > limit is hit first.
>> > >
>> > > <ramBufferSizeMB>100</ramBufferSizeMB>
>> > > <maxBufferedDocs>1000</maxBufferedDocs>
>> > >
>> > >
>> > >
>> > >
>> > > -
>> > > ---
>> > > Alessandro Benedetti
>> > > Search Consultant, R&D Software Engineer, Director
>> > > Sease Ltd. - www.sease.io
>> > > --
>> > > View this message in context: http://lucene.472066.n3.
>> > > nabble.com/High-disk-write-usage-tp4344356p4344386.html
>> > > Sent from the Solr - User mailing list archive at Nabble.com.
>> > >
>> >
>>


Re: High disk write usage

2017-07-05 Thread Antonio De Miguel
thanks Markus!

We already have SSD.

About changing topology: we tried 10 shards yesterday, but the system
was more inconsistent than with the current topology (5x10). I don't know
why... too much traffic perhaps?

About merge factor: we ran the default configuration for a few days... but
when a merge occurs the system overloads. We tried a mergeFactor of 4 to
improve query times and to get smaller merges.

2017-07-05 16:51 GMT+02:00 Markus Jelsma <markus.jel...@openindex.io>:

> Try a mergeFactor of 10 (the default), which should be fine in most cases.
> If you've got an extreme case, create more shards and consider better
> hardware (SSDs).
>
> -Original message-
> > From:Antonio De Miguel <deveto...@gmail.com>
> > Sent: Wednesday 5th July 2017 16:48
> > To: solr-user@lucene.apache.org
> > Subject: Re: High disk write usage
> >
> > Thanks a lot Alessandro!
> >
> > Yes, we have very big physical dedicated machines, with a topology of 5
> > shards and 10 replicas per shard.
> >
> >
> > 1. Transaction log files are growing, but not at this rate.
> >
> > 2. We've tried values between 300 and 2000 MB... without any
> > visible results.
> >
> > 3. We don't use those features.
> >
> > 4. No.
> >
> > 5. I've tried low and high merge factors and I think that is the point.
> >
> > With a low merge factor (around 4) we have a high disk write rate, as I
> > said previously.
> >
> > With a merge factor of 20 the disk write rate decreases, but now, with
> > high qps rates (over 1000 qps) the system is overloaded.
> >
> > I think that's the expected behaviour :(
> >
> >
> >
> >
> > 2017-07-05 15:49 GMT+02:00 alessandro.benedetti <a.benede...@sease.io>:
> >
> > > Point 2 was the ram buffer size:
> > >
> > > *ramBufferSizeMB* sets the amount of RAM that may be used by Lucene
> > > indexing for buffering added documents and deletions before they are
> > > flushed to the Directory. maxBufferedDocs sets a limit on the number
> > > of documents buffered before flushing. If both ramBufferSizeMB and
> > > maxBufferedDocs is set, then Lucene will flush based on whichever
> > > limit is hit first.
> > >
> > > <ramBufferSizeMB>100</ramBufferSizeMB>
> > > <maxBufferedDocs>1000</maxBufferedDocs>
> > >
> > >
> > >
> > >
> > > -
> > > ---
> > > Alessandro Benedetti
> > > Search Consultant, R&D Software Engineer, Director
> > > Sease Ltd. - www.sease.io
> > > --
> > > View this message in context: http://lucene.472066.n3.
> > > nabble.com/High-disk-write-usage-tp4344356p4344386.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> > >
> >
>


RE: High disk write usage

2017-07-05 Thread Markus Jelsma
Try a mergeFactor of 10 (the default), which should be fine in most cases. If
you've got an extreme case, create more shards and consider better hardware (SSDs).
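
In solrconfig.xml terms that is roughly the following (a sketch; on recent
Solr 6.x configs the legacy <mergeFactor> element is expressed through the
merge policy factory instead):

  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
  </mergePolicyFactory>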
 
-Original message-
> From:Antonio De Miguel <deveto...@gmail.com>
> Sent: Wednesday 5th July 2017 16:48
> To: solr-user@lucene.apache.org
> Subject: Re: High disk write usage
> 
> Thanks a lot Alessandro!
> 
> Yes, we have very big physical dedicated machines, with a topology of 5
> shards and 10 replicas per shard.
> 
> 
> 1. Transaction log files are growing, but not at this rate.
>
> 2. We've tried values between 300 and 2000 MB... without any
> visible results.
>
> 3. We don't use those features.
>
> 4. No.
>
> 5. I've tried low and high merge factors and I think that is the point.
>
> With a low merge factor (around 4) we have a high disk write rate, as I
> said previously.
>
> With a merge factor of 20 the disk write rate decreases, but now, with
> high qps rates (over 1000 qps) the system is overloaded.
>
> I think that's the expected behaviour :(
> 
> 
> 
> 
> 2017-07-05 15:49 GMT+02:00 alessandro.benedetti <a.benede...@sease.io>:
> 
> > Point 2 was the ram buffer size:
> >
> > *ramBufferSizeMB* sets the amount of RAM that may be used by Lucene
> > indexing for buffering added documents and deletions before they are
> > flushed to the Directory. maxBufferedDocs sets a limit on the number
> > of documents buffered before flushing. If both ramBufferSizeMB and
> > maxBufferedDocs is set, then Lucene will flush based on whichever
> > limit is hit first.
> >
> > <ramBufferSizeMB>100</ramBufferSizeMB>
> > <maxBufferedDocs>1000</maxBufferedDocs>
> >
> >
> >
> >
> > -
> > ---
> > Alessandro Benedetti
> > Search Consultant, R&D Software Engineer, Director
> > Sease Ltd. - www.sease.io
> > --
> > View this message in context: http://lucene.472066.n3.
> > nabble.com/High-disk-write-usage-tp4344356p4344386.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
> 


Re: High disk write usage

2017-07-05 Thread Antonio De Miguel
Thanks a lot Alessandro!

Yes, we have very big physical dedicated machines, with a topology of 5
shards and 10 replicas per shard.


1. Transaction log files are growing, but not at this rate.

2. We've tried values between 300 and 2000 MB... without any
visible results.

3. We don't use those features.

4. No.

5. I've tried low and high merge factors and I think that is the point.

With a low merge factor (around 4) we have a high disk write rate, as I
said previously.

With a merge factor of 20 the disk write rate decreases, but now, with
high qps rates (over 1000 qps) the system is overloaded.

I think that's the expected behaviour :(




2017-07-05 15:49 GMT+02:00 alessandro.benedetti :

> Point 2 was the ram buffer size:
>
> *ramBufferSizeMB* sets the amount of RAM that may be used by Lucene
> indexing for buffering added documents and deletions before they are
> flushed to the Directory. maxBufferedDocs sets a limit on the number
> of documents buffered before flushing. If both ramBufferSizeMB and
> maxBufferedDocs is set, then Lucene will flush based on whichever
> limit is hit first.
>
> <ramBufferSizeMB>100</ramBufferSizeMB>
> <maxBufferedDocs>1000</maxBufferedDocs>
>
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/High-disk-write-usage-tp4344356p4344386.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: High disk write usage

2017-07-05 Thread alessandro.benedetti
Point 2 was the ram buffer size:

*ramBufferSizeMB* sets the amount of RAM that may be used by Lucene
indexing for buffering added documents and deletions before they are
flushed to the Directory. maxBufferedDocs sets a limit on the number of
documents buffered before flushing. If both ramBufferSizeMB and
maxBufferedDocs is set, then Lucene will flush based on whichever limit
is hit first.

<ramBufferSizeMB>100</ramBufferSizeMB>
<maxBufferedDocs>1000</maxBufferedDocs>




-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
View this message in context: 
http://lucene.472066.n3.nabble.com/High-disk-write-usage-tp4344356p4344386.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: High disk write usage

2017-07-05 Thread alessandro.benedetti
Is the physical machine dedicated? Is it a dedicated VM on shared metal?
Apart from these operational checks, I will assume the machine is dedicated.

In Solr a write to the disk does not happen only on commit; I can think of
other scenarios:

1) *Transaction log* [1]

2) *ramBufferSizeMB* / maxBufferedDocs flushes (the ram buffer size; see my
follow-up above)

3) Spellcheck and SuggestComponent building (this depends on the config, in
case you use them)

4) Memory swapping?

5) Merges (they are potentially triggered by a segment write or an explicit
optimize call, and they can last a while)

Maybe other edge cases, but I would first check this list!

[1]
https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/




-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
View this message in context: 
http://lucene.472066.n3.nabble.com/High-disk-write-usage-tp4344356p4344383.html
Sent from the Solr - User mailing list archive at Nabble.com.