When you request 5000 rows, you are requesting 5000 disk reads times the number of fields requested. It is normal for that to be a lot of disk access.
Use SSD disks.

wunder
Walter Underwood
[email protected]
http://observer.wunderwood.org/  (my blog)

> On Dec 23, 2021, at 5:34 AM, Ufuk YILMAZ <[email protected]> wrote:
>
> My id field definition on the schema is indexed=true docValues=true
> stored=false useDocValuesAsStored=true
>
> Can it cause this kind of behavior? The id field not being stored?
>
> On 2021-12-23 15:57, Jeff Courtade wrote:
>> Rule of thumb to reduce disk reads with solr is to ...
>> Tune linux.
>> Have fast disks/ssd for indexes.
>> Separate disks for os/programs, indexes, and solr logs.
>> Enough ram on the system to allow the system to load the entire index into
>> ram, with room left over for applications and the os.
>> Linux will store files it has accessed in a ram buffer... these are the
>> buffers you see when looking at memory allocation.
>> Heap allocation is best left to Java 11 or higher.
>> If I were you I would rearchitect to get your indexes down to a more
>> manageable level and run more solr nodes with enough ram and fast disks.
>> If you can get 400+ gigs of ram on your system it will help you with io
>> issues.
>>
>> On Thu, Dec 23, 2021, 7:19 AM Ufuk YILMAZ <[email protected]>
>> wrote:
>>> I have a problem with my SolrCloud cluster: when I request a few
>>> stored fields, the disk read rate caps at its maximum for a long period
>>> of time, but when I request no fields the response time is consistently
>>> a few seconds.
>>>
>>> My cluster has 4 nodes. Total index size is 400GB per node. Each node
>>> has 96GB ram, 24GB of which is allocated to the Solr heap. All data is
>>> persisted on SSDs.
>>>
>>> These tests are done when no other reads are being made (the iotop
>>> command shows 0 reads), but indexing is going on, with 200 kilobytes per
>>> second being written to disk.
>>>
>>> When I send a query with rows=0, response time is consistently around
>>> 0.5 - 2 seconds.
>>>
>>> But when I request rows=5000 and a single stored field (field type is
>>> text_general with stored=true), response time jumps to 3 - 10 minutes,
>>> during which disk read tops out at 1000MB/s (the maximum my disks can
>>> do) and stays there until the request finishes. Document size is around
>>> 1-4KB and a typical result set is 50-1000 docs. If I send a few requests
>>> at the same time, it gets even worse and I start to get errors.
>>>
>>> Why does Solr need to read hundreds of gigabytes of data to return a
>>> few hundred kilobytes of stored fields?
>>>
>>> I have been reading up on how the index and stored fields are organized,
>>> to find out whether this is expected. If queries with rows=0 were slow
>>> too, I'd simply say the index is too big for my machines.
>>>
>>> Do you have any pointers for this issue?
>>>
>>> --uyilmaz
>
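For reference, the two query shapes contrasted in the thread (rows=0 vs. rows=5000 with a single stored field) can be sketched as below. This is a minimal illustration only; the host, collection name ("mycollection"), and stored field name ("title") are assumptions, not taken from the original messages.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; host and collection name are assumptions.
base = "http://localhost:8983/solr/mycollection/select"

# Fast case from the thread: rows=0 returns only the hit count, so no
# stored-field data needs to be fetched from disk.
fast = base + "?" + urlencode({"q": "*:*", "rows": 0})

# Slow case from the thread: 5000 rows with one stored field ("title" is
# a placeholder), which makes Solr read each matching document's
# stored-field data from disk.
slow = base + "?" + urlencode({"q": "*:*", "rows": 5000, "fl": "title"})

print(fast)
print(slow)
```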
