Re: [go-nuts] Realizing SSD random read IOPS

2017-08-07 Thread Manish Rai Jain
Hey folks,

Just wanted to update the status of this.

During GopherCon, I happened to meet Russ Cox and asked him the same
question: if File.Read blocks goroutines, and blocked goroutines cause new
OS threads to be spawned, then in a long-running job there should already
be plenty of OS threads, so random read throughput should increase over
time and stabilize at the maximum possible value. But that's not what I see
in my benchmarks.

And his explanation was that GOMAXPROCS in a way acts like a multiplexer.
From the docs, "the GOMAXPROCS variable limits the number of operating
system threads that can execute user-level Go code simultaneously." Which
basically means that every read must first be issued from one of the
GOMAXPROCS threads executing Go code before it is handed off to some OS
thread (not really a handoff, but conceptually speaking). This introduces a
throughput bottleneck.

I re-ran my benchmarks with a much higher GOMAXPROCS and was then able to
achieve the maximum throughput. The numbers are here:
https://github.com/dgraph-io/badger-bench/blob/master/randread/maxprocs.txt
To summarize these benchmarks, Linux fio achieves 118K IOPS, and with
GOMAXPROCS=64/128, I'm able to achieve 105K IOPS, which is close enough.
Win!
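
For reference, GOMAXPROCS can be raised either from the environment
(GOMAXPROCS=128 ./randread ...) or at runtime. A minimal sketch; 128 is
just the value that worked well in my benchmarks:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Equivalent to launching the process with GOMAXPROCS=128 in the
	// environment; the call returns the previous setting.
	prev := runtime.GOMAXPROCS(128)
	fmt.Printf("GOMAXPROCS raised from %d to 128\n", prev)

	// ... spawn goroutines issuing random reads, as in the benchmark ...
}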

Regarding the point about using io_submit etc. instead of goroutines: I
managed to find a library which does that, but it performed worse than just
using goroutines.
https://github.com/traetox/goaio/issues/3
From what I gather (talking to Russ and Ian), whatever work is going on in
user space, the same work has to happen in kernel space, so there's not
much benefit here.

Overall, with GOMAXPROCS set to a higher value (as I've done in Dgraph),
one can get the advertised SSD throughput using goroutines.

Thanks, Ian, Russ, and the Go community for helping solve this problem!

On Sat, May 20, 2017 at 5:31 AM, Ian Lance Taylor  wrote:

> On Fri, May 19, 2017 at 3:26 AM, Manish Rai Jain 
> wrote:
> >
> >> It's not obvious to me that io_submit would be a win for normal
> > programs, but if anybody wants to try it out and see that would be
> > great.
> >
> > Yeah, my hunch is that the cost of threads context switching is going to
> be
> > a hindrance to achieving the true throughput of SSDs. So, I'd like to
> try it
> > out. A few guiding pointers would be useful:
> >
> > - This can be done directly via Syscall and Syscall6, is that right? Or
> > should I use Cgo?
>
> You should be able to use syscall.Syscall.
>
> > - I see SYS_IO_SUBMIT in syscall package. But, no aio_context_t, or
> iocbpp
> > structs in the package.
> > - Similarly, other structs for io_getevents etc.
> > - What's the best way to generate them, so syscall.Syscall would accept
> > these?
>
> The simplest way is to get them via cgo.  The better way is to add
> them to the x/sys/unix package as described at
> https://github.com/golang/sys/blob/master/unix/README.md .
>
> Ian
>



Re: [go-nuts] Realizing SSD random read IOPS

2017-05-19 Thread Ian Lance Taylor
On Fri, May 19, 2017 at 3:26 AM, Manish Rai Jain  wrote:
>
>> It's not obvious to me that io_submit would be a win for normal
> programs, but if anybody wants to try it out and see that would be
> great.
>
> Yeah, my hunch is that the cost of threads context switching is going to be
> a hindrance to achieving the true throughput of SSDs. So, I'd like to try it
> out. A few guiding pointers would be useful:
>
> - This can be done directly via Syscall and Syscall6, is that right? Or
> should I use Cgo?

You should be able to use syscall.Syscall.

> - I see SYS_IO_SUBMIT in syscall package. But, no aio_context_t, or iocbpp
> structs in the package.
> - Similarly, other structs for io_getevents etc.
> - What's the best way to generate them, so syscall.Syscall would accept
> these?

The simplest way is to get them via cgo.  The better way is to add
them to the x/sys/unix package as described at
https://github.com/golang/sys/blob/master/unix/README.md .

Ian
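
(For anyone who wants to try this route: below is a rough, untested sketch
of the raw-syscall approach on linux/amd64. The struct layouts are
hand-copied from <linux/aio_abi.h> and the file name is hypothetical;
verify against your kernel headers before relying on this. Note also that
without O_DIRECT, Linux kernel AIO may perform buffered reads synchronously
at submit time.)

package main

import (
	"fmt"
	"os"
	"runtime"
	"syscall"
	"unsafe"
)

// iocb mirrors struct iocb from <linux/aio_abi.h> on little-endian amd64.
type iocb struct {
	data      uint64 // aio_data: opaque, echoed back in io_event.data
	key       uint32 // aio_key: used internally by the kernel
	rwFlags   uint32 // aio_rw_flags
	lioOpcode uint16 // aio_lio_opcode: IOCB_CMD_PREAD == 0
	reqPrio   int16  // aio_reqprio
	fildes    uint32 // aio_fildes: file descriptor to read from
	buf       uint64 // aio_buf: address of the destination buffer
	nbytes    uint64 // aio_nbytes: number of bytes to read
	offset    int64  // aio_offset: file offset to read at
	reserved2 uint64
	flags     uint32 // aio_flags (e.g. IOCB_FLAG_RESFD)
	resfd     uint32 // aio_resfd: eventfd to notify on completion
}

// ioEvent mirrors struct io_event.
type ioEvent struct {
	data uint64 // iocb.data of the completed request
	obj  uint64 // address of the completed iocb
	res  int64  // bytes transferred, or a negative errno
	res2 int64
}

func main() {
	f, err := os.Open("testfile") // hypothetical test file
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// io_setup(2): create a context with room for 128 in-flight requests.
	var ctx uintptr // aio_context_t; must be zero before the call
	if _, _, errno := syscall.Syscall(syscall.SYS_IO_SETUP, 128,
		uintptr(unsafe.Pointer(&ctx)), 0); errno != 0 {
		panic(errno)
	}
	defer syscall.Syscall(syscall.SYS_IO_DESTROY, ctx, 0, 0)

	buf := make([]byte, 4096)
	cb := iocb{
		fildes: uint32(f.Fd()),
		buf:    uint64(uintptr(unsafe.Pointer(&buf[0]))),
		nbytes: uint64(len(buf)),
		offset: 0, // read the first 4 KB
	}
	cbs := []*iocb{&cb} // io_submit takes an array of iocb pointers (iocbpp)

	if n, _, errno := syscall.Syscall(syscall.SYS_IO_SUBMIT, ctx,
		uintptr(len(cbs)), uintptr(unsafe.Pointer(&cbs[0]))); errno != 0 {
		panic(errno)
	} else if int(n) != len(cbs) {
		panic("short submit")
	}

	// io_getevents(2): block until the completion arrives.
	var ev ioEvent
	if _, _, errno := syscall.Syscall6(syscall.SYS_IO_GETEVENTS, ctx, 1, 1,
		uintptr(unsafe.Pointer(&ev)), 0, 0); errno != 0 {
		panic(errno)
	}
	runtime.KeepAlive(buf) // the kernel wrote into buf via a raw pointer
	fmt.Printf("read %d bytes via io_submit\n", ev.res)
}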



Re: [go-nuts] Realizing SSD random read IOPS

2017-05-19 Thread Manish Rai Jain
Sorry for the delay in replying. Got busy with a presentation at a Go meetup.

> I agree with Dave that looking at the execution tracer is likely to help.

I tried to run it, but nothing renders in my Chrome (running on Arch
Linux). The typical about:tracing works, but this doesn't, and there isn't
much documentation for troubleshooting.

> It's not obvious to me that io_submit would be a win for normal
programs, but if anybody wants to try it out and see that would be
great.

Yeah, my hunch is that the cost of thread context switching is going to be
a hindrance to achieving the true throughput of SSDs. So, I'd like to try
it out. A few guiding pointers would be useful:

- This can be done directly via Syscall and Syscall6, is that right? Or
should I use cgo?
- I see SYS_IO_SUBMIT in the syscall package, but no aio_context_t or
iocbpp structs.
- Similarly, other structs for io_getevents etc. are missing.
- What's the best way to generate them so that syscall.Syscall will accept
them?


On Thu, May 18, 2017 at 12:36 AM, Ian Lance Taylor  wrote:

> On Wed, May 17, 2017 at 12:29 AM, Manish Rai Jain 
> wrote:
> >
> >> libaio sounds good on paper, but at least on GNU/Linux it's all in user
> >> space.
> >
> > I see. That makes sense. Reading a bit more, Linux native I/O sounds
> like it
> > does exactly what we expect, i.e. save OS threads, and push this to
> kernel:
> > http://man7.org/linux/man-pages/man2/io_submit.2.html
> > But, I suppose this can't be part of Go, because it's not portable. Is my
> > understanding correct?
>
> We could use io_submit and friends on GNU/Linux.  We want to provide a
> consistent API to Go code, but the internal code can be different on
> different operating systems.  For example the implementations on
> Windows and Unix systems are of course quite different.
>
> It's not obvious to me that io_submit would be a win for normal
> programs, but if anybody wants to try it out and see that would be
> great.
>
>
> > Also, any explanations about why GOMAXPROCS causes throughput to
> increase,
> > if new OS threads are being spawned by blocked goroutines anyway? I
> thought
> > I understood it before but now I don't.
>
> My guess is that it's the timing.  The current runtime doesn't spawn a
> new OS thread until an existing thread has been blocked in a syscall
> for 20us or more.  Having more threads ready to go avoids that delay.
>
> I agree with Dave that looking at the execution tracer is likely to help.
>
> Ian
>



Re: [go-nuts] Realizing SSD random read IOPS

2017-05-17 Thread Ian Lance Taylor
On Wed, May 17, 2017 at 12:29 AM, Manish Rai Jain  wrote:
>
>> libaio sounds good on paper, but at least on GNU/Linux it's all in user
>> space.
>
> I see. That makes sense. Reading a bit more, Linux native I/O sounds like it
> does exactly what we expect, i.e. save OS threads, and push this to kernel:
> http://man7.org/linux/man-pages/man2/io_submit.2.html
> But, I suppose this can't be part of Go, because it's not portable. Is my
> understanding correct?

We could use io_submit and friends on GNU/Linux.  We want to provide a
consistent API to Go code, but the internal code can be different on
different operating systems.  For example, the implementations on
Windows and Unix systems are of course quite different.

It's not obvious to me that io_submit would be a win for normal
programs, but if anybody wants to try it out and see that would be
great.


> Also, any explanations about why GOMAXPROCS causes throughput to increase,
> if new OS threads are being spawned by blocked goroutines anyway? I thought
> I understood it before but now I don't.

My guess is that it's the timing.  The current runtime doesn't spawn a
new OS thread until an existing thread has been blocked in a syscall
for 20us or more.  Having more threads ready to go avoids that delay.
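
One way to watch that thread growth directly is the scheduler trace; the
threads= field in its once-per-second output shows how many OS threads the
runtime has created. For example, with the benchmark invocation from
earlier in this thread:

$ GODEBUG=schedtrace=1000 ./randread --dir ~/diskfio --jobs 16 --num 200 --mode 1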

I agree with Dave that looking at the execution tracer is likely to help.

Ian



Re: [go-nuts] Realizing SSD random read IOPS

2017-05-17 Thread 'David Klempner' via golang-nuts
On May 16, 2017 22:03, "Ian Lance Taylor"  wrote:

> On Tue, May 16, 2017 at 9:26 PM, Manish Rai Jain 
> wrote:
>> The runtime will spawn a new thread to replace the one that is blocked.
>
> Realized that after writing my last mail. And that actually explains some
> of
> the other crashes we saw, about "too many threads", if we run tens of
> thousands of goroutines to do these reads, one goroutine per read.
>
> It is obviously lot more expensive to spawn a new OS thread. It seems like
> this exact same problem was already solved for network via netpoller
> (https://morsmachine.dk/netpoller). Blocking OS threads for disk reads
> made
> sense for HDDs, which could only do 200 IOPS; for SSDs we'd need a
> solution
> based on async I/O.

> Note that in the upcoming Go 1.9 release we now use the netpoller for
> the os package as well.  However, it's not as effective as one would
> hope, because on GNU/Linux you can't use epoll for disk files.


There's a not very well documented API to make AIO completions kick an
eventfd.
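
Concretely, something like this (a rough, untested fragment building on the
iocb layout sketched earlier in this thread; IOCB_FLAG_RESFD and aio_resfd
are from <linux/aio_abi.h>):

const iocbFlagResfd = 1 << 0 // IOCB_FLAG_RESFD

// cb is an iocb as in the io_submit sketch earlier in this thread.
efd, _, errno := syscall.Syscall(syscall.SYS_EVENTFD2, 0, 0, 0)
if errno != 0 {
	panic(errno)
}
cb.flags |= iocbFlagResfd
cb.resfd = uint32(efd)
// After io_submit, each completion increments efd, so a netpoller-style
// loop can epoll-wait on efd alongside sockets and then drain results
// with io_getevents using a zero timeout.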

> It mainly helps with pipes.
>
> Ian






Re: [go-nuts] Realizing SSD random read IOPS

2017-05-17 Thread Dave Cheney
Can you post the svg versions of those profiles?

Also, I recommend the execution trace profiler for this job; it'll show you
a lot of detail about how the runtime is interacting with your program.
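
Collecting one is only a few lines (a sketch; put the benchmark body where
indicated), and the result can be viewed with: go tool trace trace.out

package main

import (
	"os"
	"runtime/trace"
)

func main() {
	f, err := os.Create("trace.out")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	if err := trace.Start(f); err != nil {
		panic(err)
	}
	defer trace.Stop()

	// ... run the random-read benchmark here ...
}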

On Wed, 17 May 2017, 17:29 Manish Rai Jain  wrote:

> > libaio sounds good on paper, but at least on GNU/Linux it's all in user
> space.
>
> I see. That makes sense. Reading a bit more, Linux native I/O sounds like
> it does exactly what we expect, i.e. save OS threads, and push this to
> kernel: http://man7.org/linux/man-pages/man2/io_submit.2.html
> But, I suppose this can't be part of Go, because it's not portable. Is my
> understanding correct?
>
> Also, any explanations about why GOMAXPROCS causes throughput to increase,
> if new OS threads are being spawned by blocked goroutines anyway? I thought
> I understood it before but now I don't.
>
> Dave, profiler doesn't show any issues with the code itself. It's just
> blocked waiting on syscalls.
>
> $ go tool pprof randread /tmp/profile398062565/cpu.pprof
>  ~/go/src/github.com/dgraph-io/badger-bench/randread
> Entering interactive mode (type "help" for commands)
> (pprof) top
> 19.48s of 19.76s total (98.58%)
> Dropped 27 nodes (cum <= 0.10s)
>   flat  flat%   sum%cum   cum%
> 19.34s 97.87% 97.87% 19.52s 98.79%  syscall.Syscall6
>  0.07s  0.35% 98.23%  0.11s  0.56%  runtime.exitsyscall
>  0.03s  0.15% 98.38% 19.56s 98.99%  os.(*File).ReadAt
>  0.02s   0.1% 98.48%  0.10s  0.51%  math/rand.(*Rand).Intn
>  0.01s 0.051% 98.53% 19.70s 99.70%  main.Conc2.func1
>  0.01s 0.051% 98.58% 19.53s 98.84%  syscall.Pread
>  0 0% 98.58%  0.13s  0.66%  main.getIndices
>  0 0% 98.58% 19.53s 98.84%  os.(*File).pread
>  0 0% 98.58% 19.70s 99.70%  runtime.goexit
> (pprof)
>
>
> $ go tool pprof randread /tmp/profile192709852/block.pprof
>  ~/go/src/github.com/dgraph-io/badger-bench/randread
> Entering interactive mode (type "help" for commands)
> (pprof) top
> 58.48s of 58.48s total (  100%)
> Dropped 8 nodes (cum <= 0.29s)
>   flat  flat%   sum%cum   cum%
> 58.48s   100%   100% 58.48s   100%  sync.(*WaitGroup).Wait
>  0 0%   100% 58.48s   100%  main.Conc2
>  0 0%   100% 58.48s   100%  main.main
>  0 0%   100% 58.48s   100%  runtime.goexit
>  0 0%   100% 58.48s   100%  runtime.main
> (pprof)
>
>
> On Wed, May 17, 2017 at 3:25 PM, Dave Cheney  wrote:
>
>> Rather than guessing what is going on, I think it's time to break out the
>> profiling tools Manish.
>>
>> On Wed, 17 May 2017, 15:23 David Klempner  wrote:
>>
>>>
>>> On May 16, 2017 22:03, "Ian Lance Taylor"  wrote:
>>>
>>> On Tue, May 16, 2017 at 9:26 PM, Manish Rai Jain 
>>> wrote:
>>> >> The runtime will spawn a new thread to replace the one that is
>>> blocked.
>>> >
>>> > Realized that after writing my last mail. And that actually explains
>>> some of
>>> > the other crashes we saw, about "too many threads", if we run tens of
>>> > thousands of goroutines to do these reads, one goroutine per read.
>>> >
>>> > It is obviously lot more expensive to spawn a new OS thread. It seems
>>> like
>>> > this exact same problem was already solved for network via netpoller
>>> > (https://morsmachine.dk/netpoller). Blocking OS threads for disk
>>> reads made
>>> > sense for HDDs, which could only do 200 IOPS; for SSDs we'd need a
>>> solution
>>> > based on async I/O.
>>>
>>> Note that in the upcoming Go 1.9 release we now use the netpoller for
>>> the os package as well.  However, it's not as effective as one would
>>> hope, because on GNU/Linux you can't use epoll for disk files.
>>>
>>>
>>> There's a not very well documented API to make AIO completions kick an
>>> eventfd.
>>>
>>> It mainly helps with pipes.
>>>
>>>
>>> Ian
>>>
>



Re: [go-nuts] Realizing SSD random read IOPS

2017-05-17 Thread Manish Rai Jain
> libaio sounds good on paper, but at least on GNU/Linux it's all in user
> space.

I see. That makes sense. Reading a bit more, Linux native AIO sounds like
it does exactly what we expect, i.e. save OS threads and push this work to
the kernel: http://man7.org/linux/man-pages/man2/io_submit.2.html
But, I suppose this can't be part of Go, because it's not portable. Is my
understanding correct?

Also, is there any explanation for why GOMAXPROCS causes throughput to
increase, if new OS threads are being spawned for blocked goroutines
anyway? I thought I understood it before, but now I don't.

Dave, the profiler doesn't show any issues with the code itself; it's just
blocked waiting on syscalls.

$ go tool pprof randread /tmp/profile398062565/cpu.pprof
 ~/go/src/github.com/dgraph-io/badger-bench/randread
Entering interactive mode (type "help" for commands)
(pprof) top
19.48s of 19.76s total (98.58%)
Dropped 27 nodes (cum <= 0.10s)
  flat  flat%   sum%cum   cum%
19.34s 97.87% 97.87% 19.52s 98.79%  syscall.Syscall6
 0.07s  0.35% 98.23%  0.11s  0.56%  runtime.exitsyscall
 0.03s  0.15% 98.38% 19.56s 98.99%  os.(*File).ReadAt
 0.02s   0.1% 98.48%  0.10s  0.51%  math/rand.(*Rand).Intn
 0.01s 0.051% 98.53% 19.70s 99.70%  main.Conc2.func1
 0.01s 0.051% 98.58% 19.53s 98.84%  syscall.Pread
 0 0% 98.58%  0.13s  0.66%  main.getIndices
 0 0% 98.58% 19.53s 98.84%  os.(*File).pread
 0 0% 98.58% 19.70s 99.70%  runtime.goexit
(pprof)


$ go tool pprof randread /tmp/profile192709852/block.pprof
 ~/go/src/github.com/dgraph-io/badger-bench/randread
Entering interactive mode (type "help" for commands)
(pprof) top
58.48s of 58.48s total (  100%)
Dropped 8 nodes (cum <= 0.29s)
  flat  flat%   sum%cum   cum%
58.48s   100%   100% 58.48s   100%  sync.(*WaitGroup).Wait
 0 0%   100% 58.48s   100%  main.Conc2
 0 0%   100% 58.48s   100%  main.main
 0 0%   100% 58.48s   100%  runtime.goexit
 0 0%   100% 58.48s   100%  runtime.main
(pprof)
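
(The /tmp/profileNNN/... paths above are the kind written by the
github.com/pkg/profile package; for reference, a minimal sketch of how
such profiles are collected:)

package main

import "github.com/pkg/profile"

func main() {
	// Writes cpu.pprof into a fresh /tmp/profileNNN directory on exit;
	// use profile.Start(profile.BlockProfile) for the block.pprof run.
	defer profile.Start(profile.CPUProfile).Stop()

	// ... run the random-read benchmark here ...
}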


On Wed, May 17, 2017 at 3:25 PM, Dave Cheney  wrote:

> Rather than guessing what is going on, I think it's time to break out the
> profiling tools Manish.
>
> On Wed, 17 May 2017, 15:23 David Klempner  wrote:
>
>>
>> On May 16, 2017 22:03, "Ian Lance Taylor"  wrote:
>>
>> On Tue, May 16, 2017 at 9:26 PM, Manish Rai Jain 
>> wrote:
>> >> The runtime will spawn a new thread to replace the one that is blocked.
>> >
>> > Realized that after writing my last mail. And that actually explains
>> some of
>> > the other crashes we saw, about "too many threads", if we run tens of
>> > thousands of goroutines to do these reads, one goroutine per read.
>> >
>> > It is obviously lot more expensive to spawn a new OS thread. It seems
>> like
>> > this exact same problem was already solved for network via netpoller
>> > (https://morsmachine.dk/netpoller). Blocking OS threads for disk reads
>> made
>> > sense for HDDs, which could only do 200 IOPS; for SSDs we'd need a
>> solution
>> > based on async I/O.
>>
>> Note that in the upcoming Go 1.9 release we now use the netpoller for
>> the os package as well.  However, it's not as effective as one would
>> hope, because on GNU/Linux you can't use epoll for disk files.
>>
>>
>> There's a not very well documented API to make AIO completions kick an
>> eventfd.
>>
>> It mainly helps with pipes.
>>
>>
>> Ian
>>



Re: [go-nuts] Realizing SSD random read IOPS

2017-05-16 Thread Dave Cheney
Rather than guessing what is going on, I think it's time to break out the
profiling tools, Manish.

On Wed, 17 May 2017, 15:23 David Klempner  wrote:

>
> On May 16, 2017 22:03, "Ian Lance Taylor"  wrote:
>
> On Tue, May 16, 2017 at 9:26 PM, Manish Rai Jain 
> wrote:
> >> The runtime will spawn a new thread to replace the one that is blocked.
> >
> > Realized that after writing my last mail. And that actually explains
> some of
> > the other crashes we saw, about "too many threads", if we run tens of
> > thousands of goroutines to do these reads, one goroutine per read.
> >
> > It is obviously lot more expensive to spawn a new OS thread. It seems
> like
> > this exact same problem was already solved for network via netpoller
> > (https://morsmachine.dk/netpoller). Blocking OS threads for disk reads
> made
> > sense for HDDs, which could only do 200 IOPS; for SSDs we'd need a
> solution
> > based on async I/O.
>
> Note that in the upcoming Go 1.9 release we now use the netpoller for
> the os package as well.  However, it's not as effective as one would
> hope, because on GNU/Linux you can't use epoll for disk files.
>
>
> There's a not very well documented API to make AIO completions kick an
> eventfd.
>
> It mainly helps with pipes.
>
>
> Ian
>



Re: [go-nuts] Realizing SSD random read IOPS

2017-05-16 Thread Ian Lance Taylor
On Tue, May 16, 2017 at 9:26 PM, Manish Rai Jain  wrote:
>> The runtime will spawn a new thread to replace the one that is blocked.
>
> Realized that after writing my last mail. And that actually explains some of
> the other crashes we saw, about "too many threads", if we run tens of
> thousands of goroutines to do these reads, one goroutine per read.
>
> It is obviously lot more expensive to spawn a new OS thread. It seems like
> this exact same problem was already solved for network via netpoller
> (https://morsmachine.dk/netpoller). Blocking OS threads for disk reads made
> sense for HDDs, which could only do 200 IOPS; for SSDs we'd need a solution
> based on async I/O.

Note that in the upcoming Go 1.9 release we now use the netpoller for
the os package as well.  However, it's not as effective as one would
hope, because on GNU/Linux you can't use epoll for disk files.  It
mainly helps with pipes.

Ian
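
(The epoll limitation is easy to demonstrate: registering a regular file
with epoll fails with EPERM. A sketch using x/sys/unix; the file name is
hypothetical:)

package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

func main() {
	f, err := os.Open("testfile") // any regular file
	if err != nil {
		panic(err)
	}
	defer f.Close()

	epfd, err := unix.EpollCreate1(0)
	if err != nil {
		panic(err)
	}
	defer unix.Close(epfd)

	ev := unix.EpollEvent{Events: unix.EPOLLIN, Fd: int32(f.Fd())}
	err = unix.EpollCtl(epfd, unix.EPOLL_CTL_ADD, int(f.Fd()), &ev)
	fmt.Println(err) // "operation not permitted": regular files are always ready
}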



Re: [go-nuts] Realizing SSD random read IOPS

2017-05-16 Thread Ian Lance Taylor
On Tue, May 16, 2017 at 8:04 PM, Manish Rai Jain  wrote:
>
> Ideally, the disk reads could be happening via libaio, causing the OS
> threads to not block, so all goroutines can make progress, increasing the
> number of read requests that can be made concurrently. This would then also
> ensure that one doesn't need to set GOMAXPROCS to a value greater than
> number of cores to achieve higher throughput.

libaio sounds good on paper, but at least on GNU/Linux it's all in
user space.  In effect it does exactly what the Go runtime does
already: it hands file I/O operations off to separate threads.  The Go
runtime would gain nothing at all by switching to using libaio.

Ian



Re: [go-nuts] Realizing SSD random read IOPS

2017-05-16 Thread Manish Rai Jain
> The runtime will spawn a new thread to replace the one that is blocked.

Realized that after writing my last mail. And that actually explains some
of the other crashes we saw ("too many threads") when we run tens of
thousands of goroutines to do these reads, one goroutine per read.

It is obviously a lot more expensive to spawn a new OS thread. It seems
like this exact same problem was already solved for the network via the
netpoller (https://morsmachine.dk/netpoller). Blocking OS threads for disk
reads made sense for HDDs, which could only do 200 IOPS; for SSDs we'd
need a solution based on async I/O.

On Wed, May 17, 2017 at 2:01 PM, Dave Cheney  wrote:

> > So, if an OS thread is blocked, no goroutines can be scheduled on this
> thread, therefore even pure CPU operations can't be run.
>
> The runtime will spawn a new thread to replace the one that is blocked.
>
> On Wednesday, 17 May 2017 13:05:49 UTC+10, Manish Rai Jain wrote:
>>
>> On further thought about GOMAXPROCS, and its impact on throughput:
>>
>> A file::pread would block the OS thread. Go runs one OS thread per core.
>> So, if an OS thread is blocked, no goroutines can be scheduled on this
>> thread, therefore even pure CPU operations can't be run. This would lead to
>> core wastage.
>>
>> This is probably the reason why increasing GOMAXPROCS improves
>> throughput, and running any number of goroutines >= GOMAXPROCS has little
>> impact on anything. The underlying OS threads are already blocked, so
>> goroutines can't do much.
>>
>> If this logic is valid, then in a complex system, which is doing many
>> random reads, while also performing calculations (like Dgraph) would
>> suffer; even if we set GOMAXPROCS to a factor more than number of cores.
>>
>> Ideally, the disk reads could be happening via libaio, causing the OS
>> threads to not block, so all goroutines can make progress, increasing the
>> number of read requests that can be made concurrently. This would then also
>> ensure that one doesn't need to set GOMAXPROCS to a value greater than
>> number of cores to achieve higher throughput.
>>
>>
>> On Wed, May 17, 2017 at 10:38 AM, Manish Rai Jain 
>> wrote:
>>
>>> So, I fixed the rand and removed the atomics usage (link in my original
>>> post).
>>>
>>> Setting GOMAXPROCS definitely helped a lot. And now it seems to make
>>> sense, because (the following command in) fio spawns 16 threads; and
>>> GOMAXPROCS would do the same thing. However, the numbers are still quite a
>>> bit off.
>>>
>>> I realized fio seems to overestimate, and my Go program seems to
>>> underestimate, so we used sar to determine the IOPS.
>>>
>>> $ fio --name=randread --ioengine=psync --iodepth=32 --rw=randread
>>> --bs=4k --direct=0 --size=2G --numjobs=16 --runtime=120 --group_reporting
>>> Gives around 62K, tested via sar -d 1 -p, while
>>>
>>> $ go build . && GOMAXPROCS=16 ./randread --dir ~/diskfio --jobs 16 --num
>>> 200 --mode 1
>>> Gives around 44K, via sar. Number of cores on my machine are 4.
>>>
>>> Note that this is way better than the earlier 20K with GOMAXPROCS =
>>> number of cores, but still leaves much to be desired.
>>>
>>> On Tue, May 16, 2017 at 11:36 PM, Ian Lance Taylor 
>>> wrote:
>>>
 On Tue, May 16, 2017 at 4:59 AM, Manish Rai Jain 
 wrote:
 >
 > 3 is slower than 2 (of course). But, 2 is never able to achieve the
 IOPS
 > that Fio can achieve. I've tried other things, to no luck. What I
 notice is
 > that Go and Fio are close to each other as long as number of
 Goroutines is
 > <= number of cores. Once you exceed cores, Go stays put, while Fio
 IOPS
 > keeps on improving, until it reaches SSD thresholds.

 One thing I notice about your program is that each goroutine is
 calling rand.Intn and rand.Int63n.  Those functions acquire and
 release a lock, so that single lock is being contested by every
 goroutine.  That's an unfortunate and unnecessary slowdown.  Give each
 goroutine its own source of pseudo-random numbers by using rand.New.

 You also have a point of contention on the local variable i, which you
 are manipulating using atomic functions.  It would be cheaper to give
 each goroutine a number of operations to do rather than to compute
 that dynamically using a contended address.

 I'll also note that if a program that should be I/O bound shows a
 behavior change when the number of parallel goroutines exceeds the
 number of CPUs, then it might be interesting to try setting GOMAXPROCS
 to be higher.  I don't know what effect that would have here, but it's
 worth checking.

 Ian


Re: [go-nuts] Realizing SSD random read IOPS

2017-05-16 Thread Dave Cheney
> So, if an OS thread is blocked, no goroutines can be scheduled on this 
> thread, therefore even pure CPU operations can't be run.

The runtime will spawn a new thread to replace the one that is blocked.

On Wednesday, 17 May 2017 13:05:49 UTC+10, Manish Rai Jain wrote:
>
> On further thought about GOMAXPROCS, and its impact on throughput:
>
> A file::pread would block the OS thread. Go runs one OS thread per core. 
> So, if an OS thread is blocked, no goroutines can be scheduled on this 
> thread, therefore even pure CPU operations can't be run. This would lead to 
> core wastage.
>
> This is probably the reason why increasing GOMAXPROCS improves throughput, 
> and running any number of goroutines >= GOMAXPROCS has little impact on 
> anything. The underlying OS threads are already blocked, so goroutines 
> can't do much.
>
> If this logic is valid, then in a complex system, which is doing many 
> random reads, while also performing calculations (like Dgraph) would 
> suffer; even if we set GOMAXPROCS to a factor more than number of cores.
>
> Ideally, the disk reads could be happening via libaio, causing the OS 
> threads to not block, so all goroutines can make progress, increasing the 
> number of read requests that can be made concurrently. This would then also 
> ensure that one doesn't need to set GOMAXPROCS to a value greater than 
> number of cores to achieve higher throughput.
>
>
> On Wed, May 17, 2017 at 10:38 AM, Manish Rai Jain  > wrote:
>
>> So, I fixed the rand and removed the atomics usage (link in my original 
>> post).
>>
>> Setting GOMAXPROCS definitely helped a lot. And now it seems to make 
>> sense, because (the following command in) fio spawns 16 threads; and 
>> GOMAXPROCS would do the same thing. However, the numbers are still quite a 
>> bit off.
>>
>> I realized fio seems to overestimate, and my Go program seems to 
>> underestimate, so we used sar to determine the IOPS.
>>
>> $ fio --name=randread --ioengine=psync --iodepth=32 --rw=randread --bs=4k 
>> --direct=0 --size=2G --numjobs=16 --runtime=120 --group_reporting
>> Gives around 62K, tested via sar -d 1 -p, while
>>
>> $ go build . && GOMAXPROCS=16 ./randread --dir ~/diskfio --jobs 16 --num 
>> 200 --mode 1
>> Gives around 44K, via sar. Number of cores on my machine are 4.
>>
>> Note that this is way better than the earlier 20K with GOMAXPROCS = 
>> number of cores, but still leaves much to be desired.
>>
>> On Tue, May 16, 2017 at 11:36 PM, Ian Lance Taylor > > wrote:
>>
>>> On Tue, May 16, 2017 at 4:59 AM, Manish Rai Jain >> > wrote:
>>> >
>>> > 3 is slower than 2 (of course). But, 2 is never able to achieve the 
>>> IOPS
>>> > that Fio can achieve. I've tried other things, to no luck. What I 
>>> notice is
>>> > that Go and Fio are close to each other as long as number of 
>>> Goroutines is
>>> > <= number of cores. Once you exceed cores, Go stays put, while Fio IOPS
>>> > keeps on improving, until it reaches SSD thresholds.
>>>
>>> One thing I notice about your program is that each goroutine is
>>> calling rand.Intn and rand.Int63n.  Those functions acquire and
>>> release a lock, so that single lock is being contested by every
>>> goroutine.  That's an unfortunate and unnecessary slowdown.  Give each
>>> goroutine its own source of pseudo-random numbers by using rand.New.
>>>
>>> You also have a point of contention on the local variable i, which you
>>> are manipulating using atomic functions.  It would be cheaper to give
>>> each goroutine a number of operations to do rather than to compute
>>> that dynamically using a contended address.
>>>
>>> I'll also note that if a program that should be I/O bound shows a
>>> behavior change when the number of parallel goroutines exceeds the
>>> number of CPUs, then it might be interesting to try setting GOMAXPROCS
>>> to be higher.  I don't know what effect that would have here, but it's
>>> worth checking.
>>>
>>> Ian
>>>
>>
>>
>



Re: [go-nuts] Realizing SSD random read IOPS

2017-05-16 Thread Manish Rai Jain
So, I fixed the rand and removed the atomics usage (link in my original
post).

Setting GOMAXPROCS definitely helped a lot. And now it seems to make
sense: fio (with the following command) spawns 16 threads, and GOMAXPROCS
would do the same thing. However, the numbers are still quite a bit off.

I realized fio seems to overestimate, and my Go program seems to
underestimate, so we used sar to determine the IOPS.

$ fio --name=randread --ioengine=psync --iodepth=32 --rw=randread --bs=4k
--direct=0 --size=2G --numjobs=16 --runtime=120 --group_reporting
Gives around 62K, tested via sar -d 1 -p, while

$ go build . && GOMAXPROCS=16 ./randread --dir ~/diskfio --jobs 16 --num
200 --mode 1
Gives around 44K, via sar. The number of cores on my machine is 4.

Note that this is way better than the earlier 20K with GOMAXPROCS = number
of cores, but still leaves much to be desired.

On Tue, May 16, 2017 at 11:36 PM, Ian Lance Taylor  wrote:

> On Tue, May 16, 2017 at 4:59 AM, Manish Rai Jain 
> wrote:
> >
> > 3 is slower than 2 (of course). But, 2 is never able to achieve the IOPS
> > that Fio can achieve. I've tried other things, to no luck. What I notice
> is
> > that Go and Fio are close to each other as long as number of Goroutines
> is
> > <= number of cores. Once you exceed cores, Go stays put, while Fio IOPS
> > keeps on improving, until it reaches SSD thresholds.
>
> One thing I notice about your program is that each goroutine is
> calling rand.Intn and rand.Int63n.  Those functions acquire and
> release a lock, so that single lock is being contested by every
> goroutine.  That's an unfortunate and unnecessary slowdown.  Give each
> goroutine its own source of pseudo-random numbers by using rand.New.
>
> You also have a point of contention on the local variable i, which you
> are manipulating using atomic functions.  It would be cheaper to give
> each goroutine a number of operations to do rather than to compute
> that dynamically using a contended address.
>
> I'll also note that if a program that should be I/O bound shows a
> behavior change when the number of parallel goroutines exceeds the
> number of CPUs, then it might be interesting to try setting GOMAXPROCS
> to be higher.  I don't know what effect that would have here, but it's
> worth checking.
>
> Ian
>



Re: [go-nuts] Realizing SSD random read IOPS

2017-05-16 Thread Ian Lance Taylor
On Tue, May 16, 2017 at 4:59 AM, Manish Rai Jain  wrote:
>
> 3 is slower than 2 (of course). But, 2 is never able to achieve the IOPS
> that Fio can achieve. I've tried other things, to no luck. What I notice is
> that Go and Fio are close to each other as long as number of Goroutines is
> <= number of cores. Once you exceed cores, Go stays put, while Fio IOPS
> keeps on improving, until it reaches SSD thresholds.

One thing I notice about your program is that each goroutine is
calling rand.Intn and rand.Int63n.  Those functions acquire and
release a lock, so that single lock is being contested by every
goroutine.  That's an unfortunate and unnecessary slowdown.  Give each
goroutine its own source of pseudo-random numbers by using rand.New.

You also have a point of contention on the local variable i, which you
are manipulating using atomic functions.  It would be cheaper to give
each goroutine a number of operations to do rather than to compute
that dynamically using a contended address.
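
Concretely, both fixes look something like this (a rough sketch: a private
rand.Source per goroutine, and a fixed per-goroutine quota instead of a
shared atomic counter; the file name and counts are hypothetical):

package main

import (
	"fmt"
	"math/rand"
	"os"
	"sync"
	"time"
)

func main() {
	f, err := os.Open("testfile")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	fi, err := f.Stat()
	if err != nil {
		panic(err)
	}
	maxOff := fi.Size() - 4096

	const jobs, readsPerJob = 16, 10000 // fixed quota: no shared counter
	var wg sync.WaitGroup
	start := time.Now()
	for g := 0; g < jobs; g++ {
		wg.Add(1)
		go func(seed int64) {
			defer wg.Done()
			// Private source: avoids the global lock inside rand.Intn.
			r := rand.New(rand.NewSource(seed))
			buf := make([]byte, 4096)
			for n := 0; n < readsPerJob; n++ {
				if _, err := f.ReadAt(buf, r.Int63n(maxOff)); err != nil {
					panic(err)
				}
			}
		}(int64(g) + 1)
	}
	wg.Wait()
	fmt.Printf("%.0f IOPS\n", jobs*readsPerJob/time.Since(start).Seconds())
}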

I'll also note that if a program that should be I/O bound shows a
behavior change when the number of parallel goroutines exceeds the
number of CPUs, then it might be interesting to try setting GOMAXPROCS
to be higher.  I don't know what effect that would have here, but it's
worth checking.

Ian
