Re: [petsc-users] Understanding matmult memory performance

2017-09-29  Lawrence Mitchell
> On 29 Sep 2017, at 15:24, Lawrence Mitchell wrote:
>
>> according to
>> https://ark.intel.com/products/75283/Intel-Xeon-Processor-E5-2697-v2-30M-Cache-2_70-GHz
>> you get 59.7 GB/sec of peak memory bandwidth per CPU, so you should get
>> about 240 GB/sec …

Re: [petsc-users] Understanding matmult memory performance

2017-09-29  Lawrence Mitchell
> On 29 Sep 2017, at 15:05, Tobin Isaac wrote:
>
> On Fri, Sep 29, 2017 at 09:04:47AM -0400, Tobin Isaac wrote:
>> On Fri, Sep 29, 2017 at 12:19:54PM +0100, Lawrence Mitchell wrote:
>>> Dear all,
>>>
>>> I'm attempting to understand some results I'm getting for matmult …

Re: [petsc-users] Understanding matmult memory performance

2017-09-29  Karl Rupp
Hi Lawrence, according to https://ark.intel.com/products/75283/Intel-Xeon-Processor-E5-2697-v2-30M-Cache-2_70-GHz you get 59.7 GB/sec of peak memory bandwidth per CPU, so you should get about 240 GB/sec for your two-node system. If you use PETSc's `make streams`, then processor placement may …
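For reference, the arithmetic behind that ~240 GB/sec figure (assuming dual-socket nodes, as in Lawrence's setup) is simply:

    2 nodes x 2 sockets/node x 59.7 GB/sec per socket ≈ 238.8 GB/sec aggregate peak

and the bandwidth actually achieved by a STREAM-style benchmark is typically noticeably below this theoretical number.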

Re: [petsc-users] Understanding matmult memory performance

2017-09-29  Tobin Isaac
On Fri, Sep 29, 2017 at 09:04:47AM -0400, Tobin Isaac wrote:
> On Fri, Sep 29, 2017 at 12:19:54PM +0100, Lawrence Mitchell wrote:
> > Dear all,
> >
> > I'm attempting to understand some results I'm getting for matmult
> > performance. In particular, it looks like I'm obtaining timings that
> > …

Re: [petsc-users] Understanding matmult memory performance

2017-09-29  Tobin Isaac
On Fri, Sep 29, 2017 at 12:19:54PM +0100, Lawrence Mitchell wrote:
> Dear all,
>
> I'm attempting to understand some results I'm getting for matmult
> performance. In particular, it looks like I'm obtaining timings that suggest
> that I'm getting more main memory bandwidth than I think is …

[petsc-users] Understanding matmult memory performance

2017-09-29  Lawrence Mitchell
Dear all,

I'm attempting to understand some results I'm getting for matmult performance. In particular, it looks like I'm obtaining timings that suggest I'm getting more main memory bandwidth than I think is possible. The run setup uses two 24-core (dual-socket) Ivy Bridge nodes (Xeon …
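Since the puzzle is how a measured MatMult time converts into an apparent main-memory bandwidth, a rough traffic model for the AIJ (CSR) format is the usual starting point. The C sketch below is only a back-of-envelope estimator under stated assumptions: the row/nonzero counts and timing are placeholders rather than figures from Lawrence's run, it assumes 4-byte column indices (PetscInt is 8 bytes in a 64-bit-indices build), and it counts a single streaming read of the vector x.

```c
/* Back-of-envelope estimate of the memory traffic of one sparse
 * matrix-vector product in CSR/AIJ storage, used to turn a measured
 * MatMult time into an effective bandwidth.  The traffic model (one
 * read of each value and column index, row offsets, one read of x and
 * one write of y) is a lower bound; real traffic can be higher if x
 * is pulled from memory more than once. */
#include <stdio.h>

int main(void)
{
  /* Placeholder figures only, not Lawrence's actual problem. */
  double nrows = 8.0e6;   /* global rows                  */
  double nnz   = 2.0e8;   /* global stored nonzeros       */
  double t     = 0.05;    /* measured seconds per MatMult */

  /* Bytes moved: 8-byte values + 4-byte column indices per nonzero,
   * plus per-row a 4-byte offset, an 8-byte read of x, and an 8-byte
   * write of y. */
  double bytes = nnz * (8.0 + 4.0) + nrows * (4.0 + 8.0 + 8.0);

  printf("estimated traffic  : %.2f GB\n", bytes / 1e9);
  printf("effective bandwidth: %.2f GB/s\n", bytes / t / 1e9);
  return 0;
}
```

If the bandwidth obtained this way comes out above the roughly 239 GB/sec aggregate peak discussed elsewhere in the thread, the traffic model, the timer, or the process placement is usually the first thing to re-check.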