[go-nuts] Re: perfs of a Go binary VS C++
1.35s 10.38% 32.13% 12.99s 99.85% go-hep.org/x/hep/rootio.(*Scanner).Scan /home/binet/dev/go/gocode/src/go-hep.org/x/hep/rootio/scanner.go

Looks like you spent your time in a scanner routine. I have only ever used scanners to scan text tokens, but I guess there is a big performance penalty depending on the structure of your data. The most important problem with scanners is when they start copying the data they scan. If, for example, your data had an STX [LENGTH] [DATA BLOCK] layout, the scanner would be a big gun shooting at nothing. You could instead generate []byte blobs pointing into your (hopefully mmapped) data file, communicate them through channels, and put concurrent parallel workers on classifying your data blocks. I guess that's exactly how you should sequentially streamline your data for HEP analysis. Actually, that approach is something that Go is very good at.

On Monday, October 16, 2017 at 17:53:56 UTC+2, Sebastien Binet wrote: > > hi there, > > I have this Go program that reads a binary file format coming from CERN. > I am trying to get it running almost as fast as its C++ counter-part. 
> [original post quoted in full; snipped]
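The fan-out idea above can be sketched as follows. This is a minimal illustration, not go-hep code: the [4-byte length][payload] framing and the byte-summing "analysis" are invented for the example, and in practice the input slice would come from an mmapped file rather than a literal.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// frames splits data laid out as repeated [4-byte big-endian length][payload]
// records into zero-copy sub-slices, as one could do over an mmapped file.
func frames(data []byte) [][]byte {
	var blobs [][]byte
	for len(data) >= 4 {
		n := int(binary.BigEndian.Uint32(data))
		data = data[4:]
		if n > len(data) {
			break // truncated record: stop scanning
		}
		blobs = append(blobs, data[:n:n])
		data = data[n:]
	}
	return blobs
}

// sumBlobs fans the blobs out over a channel to one worker per CPU;
// each worker "classifies" its blobs (here: it just sums their bytes).
func sumBlobs(blobs [][]byte) int64 {
	ch := make(chan []byte)
	var wg sync.WaitGroup
	var total int64
	for i := 0; i < runtime.NumCPU(); i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			var local int64
			for b := range ch {
				for _, v := range b {
					local += int64(v)
				}
			}
			atomic.AddInt64(&total, local)
		}()
	}
	for _, b := range blobs {
		ch <- b
	}
	close(ch)
	wg.Wait()
	return total
}

func main() {
	// Two records: [1 2 3] and [4].
	data := []byte{0, 0, 0, 3, 1, 2, 3, 0, 0, 0, 1, 4}
	blobs := frames(data)
	fmt.Println(len(blobs), sumBlobs(blobs)) // prints "2 10"
}
```

Because the sub-slices alias the original buffer, no payload bytes are copied; only the small slice headers travel through the channel.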
Re: [go-nuts] Re: perfs of a Go binary VS C++
Hi,

Can you publish the C++ test program as well? I have ROOT installed and would like to explore the issue further.

-mura

On Wednesday, October 18, 2017 at 6:21:17 PM UTC+8, mura wrote:
> [earlier messages quoted in full; snipped]
Re: [go-nuts] Re: perfs of a Go binary VS C++
On Thu, Oct 19, 2017 at 8:32 AM, Sokolov Yura wrote:
> Why is the I/O not buffered?
> https://github.com/go-hep/hep/blob/master/rootio/file.go#L95
>
> Use bufio

that was also my first knee-jerk reaction. but I can't do that: the reader needs to implement io.Reader, io.ReaderAt and io.Seeker. bufio.Reader doesn't implement the last two.

that's why I've implemented (in another branch) a little reader that just calls ioutil.ReadFile, works off the raw []byte, and implements the above 3 interfaces. but I didn't notice any overall improvement.

-s

-- 
You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: [go-nuts] Re: perfs of a Go binary VS C++
Why is the I/O not buffered?
https://github.com/go-hep/hep/blob/master/rootio/file.go#L95

Use bufio
Re: [go-nuts] Re: perfs of a Go binary VS C++
Could you use `perf record ./test f64s.root` and then `perf report --stdio` for both the Go and C++ programs? (You may need to run `echo 0 | sudo tee /proc/sys/kernel/kptr_restrict` before `perf record`, supposing that you can test it on Linux.)

I am just interested in the performance comparison between C++ and Go too. :D

-mura

On Tuesday, October 17, 2017 at 6:48:43 PM UTC+8, Sebastien Binet wrote:
> [earlier messages quoted in full; snipped]
Re: [go-nuts] Re: perfs of a Go binary VS C++
yes

On Tue, Oct 17, 2017 at 3:47 AM, Sebastien Binet wrote:
> but, if I were to ioutil.ReadFile("foo.root") and then use that []byte
> slice as input (implementing all the needed io.Reader, io.ReaderAt, ...
> interfaces) I should be able to at least recoup the caching effect, right?
> (that's in effect pretty much the same thing as mmap-ing the whole file
> into memory, isn't it?)
>
> [rest of quote snipped]
Re: [go-nuts] Re: perfs of a Go binary VS C++
hi,

(thanks for having a look at this)

On Tue, Oct 17, 2017 at 6:01 AM, mura wrote:
> Hi,
>
> A sequential read with `dd if=f64s.root of=/dev/null bs=32` takes around
> 8.5 secs (10.4 secs using read-data.go) on my machine. It would even take
> up to 55 secs when bs = 1.
> Also I observe that the MaxRSS of the C++ program is much bigger than the
> Go version, so I suspect the C++ program probably does some kind of caching
> or preloading.

most of that extra RSS is coming from the C++ libraries that are linked against.
that said, it's true the C++ version does both caching and prefetching.
I am trying to devise a test case where both are disabled.

but, if I were to ioutil.ReadFile("foo.root") and then use that []byte slice as input (implementing all the needed io.Reader, io.ReaderAt, ... interfaces) I should be able to at least recoup the caching effect, right?
(that's in effect pretty much the same thing as mmap-ing the whole file into memory, isn't it?)

-s

On Monday, October 16, 2017 at 11:53:56 PM UTC+8, Sebastien Binet wrote:
> [original post quoted in full; snipped]
[go-nuts] Re: perfs of a Go binary VS C++
Hi, A sequential read with `dd if=f64s.root of=/dev/null bs=32` takes around 8.5 secs (10.4 secs using read-data.go) on my machine. It would even take up to 55 secs when bs = 1. Also I observe that the MaxRSS of the C++ program is much bigger than the Go version, so I suspect the C++ program probably does some kind of caching or preloading. -mura On Monday, October 16, 2017 at 11:53:56 PM UTC+8, Sebastien Binet wrote: > > hi there, > > I have this Go program that reads a binary file format coming from CERN. > I am trying to get it running almost as fast as its C++ counter-part. > > the profile and code are here: > - https://cern.ch/binet/go-hep/rootio.cpu.pprof > - https://cern.ch/binet/go-hep/read-data.go > > on my machine, I get the following timings for the C++ (g++-7.2.0) and the > Go-1.9.1 programs: > > === C++/ROOT === > real=6.84 user=6.29 sys=0.52 CPU=99% MaxRSS=258384 I/O=0/0 > real=6.99 user=6.42 sys=0.55 CPU=99% MaxRSS=258312 I/O=0/0 > real=6.68 user=6.12 sys=0.55 CPU=99% MaxRSS=257956 I/O=0/0 > real=6.88 user=6.33 sys=0.54 CPU=99% MaxRSS=258964 I/O=0/0 > real=6.87 user=6.30 sys=0.56 CPU=99% MaxRSS=258024 I/O=0/0 > real=6.89 user=6.32 sys=0.56 CPU=99% MaxRSS=260452 I/O=0/0 > real=6.70 user=6.14 sys=0.56 CPU=99% MaxRSS=258368 I/O=0/0 > real=7.02 user=6.47 sys=0.54 CPU=99% MaxRSS=258128 I/O=0/0 > real=7.46 user=6.44 sys=0.56 CPU=93% MaxRSS=257972 I/O=1840/0 > real=7.11 user=6.47 sys=0.55 CPU=98% MaxRSS=258148 I/O=3984/0 > > === go-hep/rootio === > real=13.06 user=12.43 sys=0.61 CPU=99% MaxRSS=40864 I/O=0/0 > real=13.05 user=12.46 sys=0.58 CPU=100% MaxRSS=40892 I/O=0/0 > real=13.05 user=12.46 sys=0.59 CPU=100% MaxRSS=40888 I/O=0/0 > real=13.07 user=12.48 sys=0.58 CPU=99% MaxRSS=40940 I/O=0/0 > real=13.08 user=12.48 sys=0.60 CPU=100% MaxRSS=40924 I/O=0/0 > real=13.09 user=12.46 sys=0.60 CPU=99% MaxRSS=40976 I/O=0/0 > real=13.01 user=12.41 sys=0.60 CPU=100% MaxRSS=40876 I/O=0/0 > real=13.09 user=12.52 sys=0.58 CPU=100% MaxRSS=40880 I/O=0/0 > real=13.01 
user=12.41 sys=0.60 CPU=100% MaxRSS=40880 I/O=0/0 > real=12.99 user=12.41 sys=0.58 CPU=100% MaxRSS=40872 I/O=0/0 > > any idea on how to speed that up? > > I have also this rather interesting strace-log for the Go program: > - https://cern.ch/binet/go-hep/strace-read-go.txt > > ie: there is a clear pattern of "epoll_wait/pselect6" on one thread > followed by a bunch of "pread64" on another thread. > I suspect this is the runtime I/O layer (?) dispatching data to the > "worker" thread/goroutine ? > (I have tried with GOGC=off and it was still there, as well as the same > timings for my Go binary.) > > I have put a ~764Mb data file there: > - https://cern.ch/binet/go-hep/f64s.root > > if anybody wants to have a look. > > I initially thought it was a (lack of) bufio.Reader issue (as the file > format forces you to do a bunch of seeks or readat, using bufio.Reader > wasn't completely straightforward.) > but, mmap-ing the whole file in memory didn't help. > > here is the top50 from the CPU profile: > > (pprof) top50 > Showing nodes accounting for 12.90s, 99.15% of 13.01s total > Dropped 4 nodes (cum <= 0.07s) > flat flat% sum%cum cum% > 2.83s 21.75% 21.75% 3.21s 24.67% > go-hep.org/x/hep/rootio.(*LeafD).readBasket /home/binet/dev/go/gocode/src/ > go-hep.org/x/hep/rootio/leaf_gen.go > 1.35s 10.38% 32.13% 12.99s 99.85% > go-hep.org/x/hep/rootio.(*Scanner).Scan /home/binet/dev/go/gocode/src/ > go-hep.org/x/hep/rootio/scanner.go > 1.26s 9.68% 41.81% 1.26s 9.68% > go-hep.org/x/hep/rootio.(*tbranch).loadEntry > /home/binet/dev/go/gocode/src/go-hep.org/x/hep/rootio/basket.go > 1.17s 8.99% 50.81% 2.01s 15.45% > go-hep.org/x/hep/rootio.(*tbranch).loadBasket > /home/binet/dev/go/gocode/src/go-hep.org/x/hep/rootio/branch.go > 1.15s 8.84% 59.65% 7.15s 54.96% > go-hep.org/x/hep/rootio.(*tbranch).loadEntry > /home/binet/dev/go/gocode/src/go-hep.org/x/hep/rootio/branch.go > 0.82s 6.30% 65.95% 3.06s 23.52% > go-hep.org/x/hep/rootio.(*tbranch).scan /home/binet/dev/go/gocode/src/ > 
go-hep.org/x/hep/rootio/branch.go > 0.76s 5.84% 71.79% 3.90s 29.98% > go-hep.org/x/hep/rootio.(*Basket).readLeaf /home/binet/dev/go/gocode/src/ > go-hep.org/x/hep/rootio/basket.go > 0.62s 4.77% 76.56% 1.55s 11.91% > go-hep.org/x/hep/rootio.(*LeafD).scan /home/binet/dev/go/gocode/src/ > go-hep.org/x/hep/rootio/leaf_gen.go > 0.56s 4.30% 80.86% 0.56s 4.30% syscall.Syscall6 > /usr/lib/go/src/syscall/asm_linux_amd64.s > 0.44s 3.38% 84.24% 0.93s 7.15% reflect.Indirect > /usr/lib/go/src/reflect/value.go > 0.38s 2.92% 87.16% 0.38s 2.92% reflect.Value.Elem > /usr/lib/go/src/reflect/value.go > 0.34s 2.61% 89.78% 0.34s 2.61% reflect.ValueOf > /usr/lib/go/src/reflect/value.go > 0.28s 2.15% 91.93% 0.28s 2.15% > go-hep.org/x/hep/rootio.(*tbranch).findBasketIndex >