Re: [Beowulf] Large Dell, odd IO delays

2018-02-15 Thread Gus Correa
On 02/15/2018 02:04 AM, John Hearns via Beowulf wrote: Hmmm... I will also chip in with my favourite tip: look at the sysctl for min_free_kbytes. It is often set very low. Increase this substantially. It will do no harm to your system (unless you set it to an absurd value!) You should be loo

Re: [Beowulf] Large Dell, odd IO delays

2018-02-15 Thread Michael Di Domenico
On Wed, Feb 14, 2018 at 6:44 PM, Kilian Cavalotti wrote: > On Wed, Feb 14, 2018 at 2:26 PM, David Mathog wrote: >> Checked the hugepage settings and found a difference there. The two systems >> that don't do this have /sys/kernel/mm/redhat_transparent_hugepage/defrag >> >> always madvise [never

Re: [Beowulf] Large Dell, odd IO delays

2018-02-14 Thread John Hearns via Beowulf
Hmmm... I will also chip in with my favourite tip: look at the sysctl for min_free_kbytes. It is often set very low. Increase this substantially. It will do no harm to your system (unless you set it to an absurd value!) You should be looking at the vm dirty ratios etc. also. On 15 February 2018
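
To put the min_free_kbytes tip in context, here is a minimal Python sketch that reports the current value as a fraction of installed RAM. The files /proc/sys/vm/min_free_kbytes and /proc/meminfo are the standard Linux locations. The helper names and the choice to express the value as a fraction are assumptions made for illustration; they are not from the thread, and no particular target value is implied.

```python
from pathlib import Path


def parse_mem_total_kb(meminfo_text: str) -> int:
    """Return MemTotal in kB from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # The line has the form "MemTotal:   <number> kB".
            return int(line.split()[1])
    raise ValueError("MemTotal not found in meminfo text")


def min_free_fraction(min_free_kb: int, mem_total_kb: int) -> float:
    """Fraction of total RAM covered by vm.min_free_kbytes."""
    return min_free_kb / mem_total_kb


def report() -> str:
    """Read the live values on a Linux host (hypothetical helper)."""
    min_free_kb = int(Path("/proc/sys/vm/min_free_kbytes").read_text())
    mem_total_kb = parse_mem_total_kb(Path("/proc/meminfo").read_text())
    frac = min_free_fraction(min_free_kb, mem_total_kb)
    return f"min_free_kbytes={min_free_kb} kB ({frac:.4%} of {mem_total_kb} kB)"
```

On a Linux host, `print(report())` shows how small the reserve is relative to memory, which matters on a large-memory machine like the one described in this thread.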

Re: [Beowulf] Large Dell, odd IO delays

2018-02-14 Thread Kilian Cavalotti
On Wed, Feb 14, 2018 at 2:26 PM, David Mathog wrote: > Checked the hugepage settings and found a difference there. The two systems > that don't do this have /sys/kernel/mm/redhat_transparent_hugepage/defrag > > always madvise [never] > > whereas the system with the issue has: > > [always] madvis
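
The hugepage settings quoted above list every option on one line, with the active one in square brackets. As a minimal sketch for comparing that setting across machines, the Python below extracts the bracketed choice. The default path is the one quoted in the message; the function names are hypothetical and not from the thread.

```python
import re
from pathlib import Path

# Path as quoted in the thread (RHEL/CentOS 6 naming).
DEFRAG_PATH = "/sys/kernel/mm/redhat_transparent_hugepage/defrag"


def active_option(setting_text: str) -> str:
    """Return the bracketed option, e.g. 'never' for 'always madvise [never]'."""
    match = re.search(r"\[([^\]]+)\]", setting_text)
    if match is None:
        raise ValueError(f"no bracketed option in {setting_text!r}")
    return match.group(1)


def read_active_option(path: str = DEFRAG_PATH) -> str:
    """Read a sysfs setting file and return its active option."""
    return active_option(Path(path).read_text())
```

Running `read_active_option()` on each host gives a one-word answer that is easy to compare between the systems that stall and those that do not.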

Re: [Beowulf] Large Dell, odd IO delays

2018-02-14 Thread Christopher Samuel
On 15/02/18 09:26, David Mathog wrote: Sometimes for no reason that I can discern an IO operation on this machine will stall. Things that should take seconds will run for minutes, or at least until I get tired of waiting and kill them. Here is today's example: gunzip -c largeFile.gz > largeF

[Beowulf] Large Dell, odd IO delays

2018-02-14 Thread David Mathog
Dell PowerEdge T630, PERC H730P, single 11 TB RAID5 array. Xeon E5-2650 CPUs with 40 total threads. 512 GB RAM. CentOS 6.9. Kernel 2.6.32-696.20.1.el6.x86_64. (This machine is basically a small beowulf in a box.) Sometimes, for no reason that I can discern, an IO operation on this machine