[go-nuts] Re: perfs of a Go binary VS C++

2019-09-06 Thread Ingo Krabbe
 1.35s 10.38% 32.13% 12.99s 99.85%  
go-hep.org/x/hep/rootio.(*Scanner).Scan 
/home/binet/dev/go/gocode/src/go-hep.org/x/hep/rootio/scanner.go

It looks like you spend your time in a scanner routine.
I have never used scanners to scan text tokens, but I guess there can be a big
performance penalty depending on the structure of your data.
The most important problem with scanners arises when they start copying the
data they scan.

If your data had, for example, an STX [LENGTH] [DATA BLOCK] framing, the
scanner would be a big gun shooting at nothing. You could instead generate
[]byte blobs pointing into your (hopefully mmapped) data file, pass them
through channels, and let concurrent parallel workers classify your data
blocks. I guess that's exactly how you should sequentially stream your data
for HEP analysis. That approach is something that Go is very good at.
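A minimal sketch of that pattern (assuming a hypothetical 4-byte big-endian
[LENGTH][DATA] framing, which is not ROOT's actual format):

package main

import (
	"encoding/binary"
	"fmt"
	"sync"
)

// scanBlocks walks a [4-byte length][payload] framing and sends zero-copy
// sub-slices of data (e.g. an mmapped file) down the out channel.
func scanBlocks(data []byte, out chan<- []byte) {
	for len(data) >= 4 {
		n := binary.BigEndian.Uint32(data[:4])
		if int(n) > len(data)-4 {
			break // truncated frame
		}
		out <- data[4 : 4+n] // no copy: still points into data
		data = data[4+n:]
	}
	close(out)
}

func main() {
	// Stand-in for an mmapped file: two frames, "hello" and "go".
	data := []byte{0, 0, 0, 5, 'h', 'e', 'l', 'l', 'o', 0, 0, 0, 2, 'g', 'o'}

	blocks := make(chan []byte)
	var wg sync.WaitGroup
	for w := 0; w < 4; w++ { // parallel workers classifying blocks
		wg.Add(1)
		go func() {
			defer wg.Done()
			for b := range blocks {
				fmt.Printf("block: %q\n", b) // classification goes here
			}
		}()
	}
	scanBlocks(data, blocks)
	wg.Wait()
}

The sub-slices share the mmapped backing array, so no payload bytes are
copied; only the classification work is fanned out.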

Am Montag, 16. Oktober 2017 17:53:56 UTC+2 schrieb Sebastien Binet:
>
> hi there,
>
> I have this Go program that reads a binary file format coming from CERN.
> I am trying to get it running almost as fast as its C++ counterpart.
>
> the profile and code are here:
> - https://cern.ch/binet/go-hep/rootio.cpu.pprof
> - https://cern.ch/binet/go-hep/read-data.go
> [...]

Re: [go-nuts] Re: perfs of a Go binary VS C++

2017-10-19 Thread mura
Hi,
Can you publish the C++ test program as well? I have ROOT installed and
would like to explore the issue further.

-mura

On Wednesday, October 18, 2017 at 6:21:17 PM UTC+8, mura wrote:
>
> Could you use `perf record ./test f64s.root` and then `perf report
> --stdio` for both the Go and C++ programs? (You may need to run `echo 0 |
> sudo tee /proc/sys/kernel/kptr_restrict` before `perf record`, assuming
> you can test it on Linux.)
>
> I am just interested in the performance comparison between C++ and Go too. 
> :D
>
> -mura

Re: [go-nuts] Re: perfs of a Go binary VS C++

2017-10-19 Thread Sebastien Binet
On Thu, Oct 19, 2017 at 8:32 AM, Sokolov Yura wrote:

> Why is the I/O not buffered?
> https://github.com/go-hep/hep/blob/master/rootio/file.go#L95
>
> Use bufio
>

that was also my first knee-jerk reaction,
but I can't do that:
the reader needs to implement io.Reader, io.ReaderAt and io.Seeker, and
bufio.Reader doesn't implement the last two.

that's why I've implemented (in another branch) a little reader that just
calls ioutil.ReadFile, works off the raw []byte, and implements the above
three interfaces.
but I didn't notice any overall improvement.
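
for reference, the standard library's bytes.Reader already implements all
three interfaces, so a minimal version of that reader could look like this
(a sketch, not the actual branch code; "foo.root" is a placeholder):

package main

import (
	"bytes"
	"fmt"
	"io"
	"io/ioutil"
	"log"
)

func main() {
	buf, err := ioutil.ReadFile("foo.root") // whole file in memory
	if err != nil {
		log.Fatal(err)
	}

	// *bytes.Reader implements io.Reader, io.ReaderAt and io.Seeker,
	// so it can stand in for an *os.File without any extra wrapper.
	var r interface {
		io.Reader
		io.ReaderAt
		io.Seeker
	} = bytes.NewReader(buf)

	hdr := make([]byte, 4)
	if _, err := r.ReadAt(hdr, 0); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("header: %q\n", hdr) // a ROOT file starts with "root"
}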

-s



Re: [go-nuts] Re: perfs of a Go binary VS C++

2017-10-19 Thread Sokolov Yura
Why is the I/O not buffered?
https://github.com/go-hep/hep/blob/master/rootio/file.go#L95

Use bufio



Re: [go-nuts] Re: perfs of a Go binary VS C++

2017-10-18 Thread mura
Could you use `perf record ./test f64s.root` and then `perf report --stdio`
for both the Go and C++ programs? (You may need to run `echo 0 | sudo tee
/proc/sys/kernel/kptr_restrict` before `perf record`, assuming you can test
it on Linux.)

I am just interested in the performance comparison between C++ and Go too. 
:D

-mura

On Tuesday, October 17, 2017 at 6:48:43 PM UTC+8, Sebastien Binet wrote:
> [...]

Re: [go-nuts] Re: perfs of a Go binary VS C++

2017-10-17 Thread Michael Jones
yes

On Tue, Oct 17, 2017 at 3:47 AM, Sebastien Binet wrote:

> [...]
>
> but, if I were to ioutil.ReadFile("foo.root") and then use that []byte
> slice as input (implementing all the needed io.Reader, io.ReaderAt, ...
> interfaces), I should be able to at least recoup the caching effect, right?
> (that's in effect pretty much the same thing as mmap-ing the whole file in
> memory, isn't it?)
>
> -s

Re: [go-nuts] Re: perfs of a Go binary VS C++

2017-10-17 Thread Sebastien Binet
hi,

(thanks for having a look at this)

On Tue, Oct 17, 2017 at 6:01 AM, mura wrote:

> Hi,
>
> A sequential read with `dd if=f64s.root of=/dev/null bs=32` takes around
> 8.5 secs (10.4 secs using read-data.go) on my machine. It even takes up to
> 55 secs when bs=1.
> Also, I observe that the MaxRSS of the C++ program is much bigger than that
> of the Go version, so I suspect the C++ program does some kind of caching
> or preloading.
>

most of that extra RSS comes from the C++ libraries that are linked in.
that said, it's true the C++ version does both caching and prefetching.
I am trying to devise a test case where both are disabled.

but, if I were to ioutil.ReadFile("foo.root") and then use that []byte
slice as input (implementing all the needed io.Reader, io.ReaderAt, ...
interfaces), I should be able to at least recoup the caching effect, right?
(that's in effect pretty much the same thing as mmap-ing the whole file in
memory, isn't it?)
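
for the record, a minimal Linux-only sketch of that mmap variant, using
syscall.Mmap directly on the f64s.root file from this thread:

package main

import (
	"fmt"
	"log"
	"os"
	"syscall"
)

func main() {
	f, err := os.Open("f64s.root")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	st, err := f.Stat()
	if err != nil {
		log.Fatal(err)
	}

	// Map the whole file read-only; the kernel pages it in on demand and
	// keeps it in the page cache, which gives the caching effect above
	// without the upfront copy that ioutil.ReadFile does.
	data, err := syscall.Mmap(int(f.Fd()), 0, int(st.Size()),
		syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		log.Fatal(err)
	}
	defer syscall.Munmap(data)

	fmt.Printf("mapped %d bytes, header=%q\n", len(data), data[:4])
}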

-s



[go-nuts] Re: perfs of a Go binary VS C++

2017-10-16 Thread mura
Hi,

A sequential read with `dd if=f64s.root of=/dev/null bs=32` takes around
8.5 secs (10.4 secs using read-data.go) on my machine. It even takes up to
55 secs when bs=1.
Also, I observe that the MaxRSS of the C++ program is much bigger than that
of the Go version, so I suspect the C++ program does some kind of caching
or preloading.
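
That bs=1 blow-up is pure syscall overhead: every unbuffered Read of a small
buffer is its own read(2). A small Go sketch of the same effect, reading the
file in 32-byte chunks with and without bufio (assuming f64s.root is in the
current directory):

package main

import (
	"bufio"
	"fmt"
	"io"
	"log"
	"os"
	"time"
)

// drain reads r to EOF in 32-byte chunks and reports the elapsed time.
func drain(r io.Reader) time.Duration {
	start := time.Now()
	buf := make([]byte, 32)
	for {
		_, err := r.Read(buf)
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
	}
	return time.Since(start)
}

func main() {
	for _, buffered := range []bool{false, true} {
		f, err := os.Open("f64s.root")
		if err != nil {
			log.Fatal(err)
		}
		var r io.Reader = f
		if buffered {
			r = bufio.NewReader(f) // batches the 32-byte reads into 4 KiB syscalls
		}
		fmt.Printf("buffered=%v: %v\n", buffered, drain(r))
		f.Close()
	}
}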

-mura

On Monday, October 16, 2017 at 11:53:56 PM UTC+8, Sebastien Binet wrote:
>
> hi there,
>
> I have this Go program that reads a binary file format coming from CERN.
> I am trying to get it running almost as fast as its C++ counterpart.
>
> the profile and code are here:
> - https://cern.ch/binet/go-hep/rootio.cpu.pprof
> - https://cern.ch/binet/go-hep/read-data.go
>
> on my machine, I get the following timings for the C++ (g++-7.2.0) and the 
> Go-1.9.1 programs:
>
> === C++/ROOT ===
> real=6.84 user=6.29 sys=0.52 CPU=99% MaxRSS=258384 I/O=0/0
> real=6.99 user=6.42 sys=0.55 CPU=99% MaxRSS=258312 I/O=0/0
> real=6.68 user=6.12 sys=0.55 CPU=99% MaxRSS=257956 I/O=0/0
> real=6.88 user=6.33 sys=0.54 CPU=99% MaxRSS=258964 I/O=0/0
> real=6.87 user=6.30 sys=0.56 CPU=99% MaxRSS=258024 I/O=0/0
> real=6.89 user=6.32 sys=0.56 CPU=99% MaxRSS=260452 I/O=0/0
> real=6.70 user=6.14 sys=0.56 CPU=99% MaxRSS=258368 I/O=0/0
> real=7.02 user=6.47 sys=0.54 CPU=99% MaxRSS=258128 I/O=0/0
> real=7.46 user=6.44 sys=0.56 CPU=93% MaxRSS=257972 I/O=1840/0
> real=7.11 user=6.47 sys=0.55 CPU=98% MaxRSS=258148 I/O=3984/0
>
> === go-hep/rootio ===
> real=13.06 user=12.43 sys=0.61 CPU=99%  MaxRSS=40864 I/O=0/0
> real=13.05 user=12.46 sys=0.58 CPU=100% MaxRSS=40892 I/O=0/0
> real=13.05 user=12.46 sys=0.59 CPU=100% MaxRSS=40888 I/O=0/0
> real=13.07 user=12.48 sys=0.58 CPU=99%  MaxRSS=40940 I/O=0/0
> real=13.08 user=12.48 sys=0.60 CPU=100% MaxRSS=40924 I/O=0/0
> real=13.09 user=12.46 sys=0.60 CPU=99%  MaxRSS=40976 I/O=0/0
> real=13.01 user=12.41 sys=0.60 CPU=100% MaxRSS=40876 I/O=0/0
> real=13.09 user=12.52 sys=0.58 CPU=100% MaxRSS=40880 I/O=0/0
> real=13.01 user=12.41 sys=0.60 CPU=100% MaxRSS=40880 I/O=0/0
> real=12.99 user=12.41 sys=0.58 CPU=100% MaxRSS=40872 I/O=0/0
>
> any idea on how to speed that up?
>
> I have also this rather interesting strace-log for the Go program:
> - https://cern.ch/binet/go-hep/strace-read-go.txt
>
> i.e., there is a clear pattern of "epoll_wait/pselect6" on one thread
> followed by a bunch of "pread64" calls on another thread.
> I suspect this is the runtime I/O layer (?) dispatching data to the
> "worker" thread/goroutine?
> (I tried with GOGC=off: the pattern was still there, and the timings of my
> Go binary were unchanged.)
>
> I have put a ~764 MB data file there:
> - https://cern.ch/binet/go-hep/f64s.root
>
> if anybody wants to have a look.
>
> I initially thought it was a (lack of) bufio.Reader issue (as the file
> format forces you to do a bunch of seeks or read-ats, using bufio.Reader
> wasn't completely straightforward),
> but mmap-ing the whole file into memory didn't help.
>
> here is the top50 from the CPU profile:
>
> (pprof) top50   
> Showing nodes accounting for 12.90s, 99.15% of 13.01s total
> Dropped 4 nodes (cum <= 0.07s)
>   flat  flat%   sum%cum   cum%
>  2.83s 21.75% 21.75%  3.21s 24.67%  
> go-hep.org/x/hep/rootio.(*LeafD).readBasket /home/binet/dev/go/gocode/src/
> go-hep.org/x/hep/rootio/leaf_gen.go
>  1.35s 10.38% 32.13% 12.99s 99.85%  
> go-hep.org/x/hep/rootio.(*Scanner).Scan /home/binet/dev/go/gocode/src/
> go-hep.org/x/hep/rootio/scanner.go
>  1.26s  9.68% 41.81%  1.26s  9.68%  
> go-hep.org/x/hep/rootio.(*tbranch).loadEntry 
> /home/binet/dev/go/gocode/src/go-hep.org/x/hep/rootio/basket.go
>  1.17s  8.99% 50.81%  2.01s 15.45%  
> go-hep.org/x/hep/rootio.(*tbranch).loadBasket 
> /home/binet/dev/go/gocode/src/go-hep.org/x/hep/rootio/branch.go
>  1.15s  8.84% 59.65%  7.15s 54.96%  
> go-hep.org/x/hep/rootio.(*tbranch).loadEntry 
> /home/binet/dev/go/gocode/src/go-hep.org/x/hep/rootio/branch.go
>  0.82s  6.30% 65.95%  3.06s 23.52%  
> go-hep.org/x/hep/rootio.(*tbranch).scan /home/binet/dev/go/gocode/src/
> go-hep.org/x/hep/rootio/branch.go
>  0.76s  5.84% 71.79%  3.90s 29.98%  
> go-hep.org/x/hep/rootio.(*Basket).readLeaf /home/binet/dev/go/gocode/src/
> go-hep.org/x/hep/rootio/basket.go
>  0.62s  4.77% 76.56%  1.55s 11.91%  
> go-hep.org/x/hep/rootio.(*LeafD).scan /home/binet/dev/go/gocode/src/
> go-hep.org/x/hep/rootio/leaf_gen.go
>  0.56s  4.30% 80.86%  0.56s  4.30%  syscall.Syscall6 
> /usr/lib/go/src/syscall/asm_linux_amd64.s
>  0.44s  3.38% 84.24%  0.93s  7.15%  reflect.Indirect 
> /usr/lib/go/src/reflect/value.go
>  0.38s  2.92% 87.16%  0.38s  2.92%  reflect.Value.Elem 
> /usr/lib/go/src/reflect/value.go
>  0.34s  2.61% 89.78%  0.34s  2.61%  reflect.ValueOf 
> /usr/lib/go/src/reflect/value.go
>  0.28s  2.15% 91.93%  0.28s  2.15%  
> go-hep.org/x/hep/rootio.(*tbranch).findBasketIndex 
>