When you request 5000 rows, you are requesting 5000 disk reads times the number
of fields requested. It is normal for that to be a lot of disk access.
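To see the scale of that multiplication, here is a back-of-envelope sketch. The numbers are assumptions for illustration (one random read per document per requested field, and a 16 KB read unit), not measurements from this cluster:

```python
# Rough worst-case estimate of stored-field disk access for one query.
# Assumed: one random read per document per requested field,
# and a 16 KB read unit (hypothetical block size, not from the thread).
rows = 5000
fields = 1
read_unit_kb = 16

reads = rows * fields
data_kb = reads * read_unit_kb
print(reads, "random reads, ~", data_kb // 1024, "MB touched")
# → 5000 random reads, ~ 78 MB touched
```

On spinning disks, 5000 random reads alone can take seconds; caching and merge state can push the real number of reads far higher.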

Use SSD disks.

wunder
Walter Underwood
[email protected]
http://observer.wunderwood.org/  (my blog)

> On Dec 23, 2021, at 5:34 AM, Ufuk YILMAZ <[email protected]> wrote:
> 
> My id field definition on the schema is indexed=true docValues=true 
> stored=false useDocValuesAsStored=true
> 
> Can it cause this kind of behavior? id field not being stored?
> 
> On 2021-12-23 15:57, Jeff Courtade wrote:
>> Rules of thumb to reduce disk reads with Solr:
>> Tune Linux.
>> Have fast disks/SSDs for the indexes.
>> Use separate disks for OS/programs, indexes, and Solr logs.
>> Have enough RAM on the system to load the entire index into
>> memory, with room left over for applications and the OS.
>> Linux caches files it has accessed in RAM... this is the
>> "buffers" figure you see when looking at memory allocation.
>> Heap allocation is best left to Java 11 or higher.
>> If I were you, I would rearchitect to get your indexes down to a more
>> manageable size and run more Solr nodes with enough RAM and fast disks.
>> If you can get 400+?40? gigs of RAM on your system, it will help you with IO
>> issues.
>> On Thu, Dec 23, 2021, 7:19 AM Ufuk YILMAZ <[email protected]>
>> wrote:
>>> I have a problem with my SolrCloud cluster: when I request a few
>>> stored fields, the disk read rate caps at its maximum for a long
>>> period of time, but when I request no fields, the response time is a
>>> consistent few seconds.
>>> My cluster has 4 nodes. Total index size is 400GB per node. Each node
>>> has 96GB ram, 24GB of it is allocated to Solr heap. All data is
>>> persisted on SSD's.
>>> These tests are done when no other reads are being made (the iotop
>>> command shows 0 reads), but indexing is ongoing, with 200 kilobytes
>>> per second being written to the disk.
>>> When I send a query with rows=0, the response time is consistently
>>> around 0.5 - 2 seconds.
>>> But when I request rows=5000 and a single stored field (field type is
>>> text_general with stored=true), response time jumps to 3 - 10 minutes,
>>> during which disk read is pegged at 1000M/s (the maximum my disks can
>>> do) and stays there until the request finishes. Document size is around
>>> 1-4KB and a typical result set is 50-1000 docs. If I send a few requests
>>> at the same time, it gets even worse and I start to get errors.
>>> Why does Solr need to read hundreds of gigabytes of data to return a few
>>> hundred kilobytes of stored fields?
>>> I have been reading up on how index and stored fields are organized to
>>> find out whether this is expected. If queries with rows=0 were slow too,
>>> I'd simply say the index is too big for my machines.
>>> Do you have any pointers for this issue?
>>> --uyilmaz
> 
