> On Sep 22, 2019, at 9:56 AM, Jed Brown <j...@jedbrown.org> wrote:
>
> Run two resource sets on one side versus separate nodes.
I don't know what this is supposed to mean. Is it a toy situation where you
show the problem is measurable, or a real application run properly at scale
where you show the problem has an effect? Facilities care about real
applications at scale losing performance, but toys don't mean much unless
you can convince them that the problem actually affects the real application
at scale as well.
This discussion is probably not important, so we should drop it.
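[For concreteness, here is one way Jed's suggested comparison could be set up on an AC922 system. This is a sketch under assumptions: the thread does not specify the launcher or benchmark, so jsrun resource sets and the OSU latency benchmark are chosen here for illustration.]

```shell
# Sketch: compare MPI latency for two resource sets placed on one node
# versus spread across two nodes. jsrun flags are standard IBM Spectrum
# LSF options; osu_latency is an assumed ping-pong benchmark binary.

# Intra-node: two resource sets on the same host.
jsrun --nrs 2 --rs_per_host 2 --tasks_per_rs 1 --cpu_per_rs 1 ./osu_latency

# Inter-node: the same two resource sets, one per host.
jsrun --nrs 2 --rs_per_host 1 --tasks_per_rs 1 --cpu_per_rs 1 ./osu_latency
```

If the intra-node run reports higher latency than the inter-node run, that would reproduce the inversion discussed below.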
>
> On Sep 22, 2019 08:46, "Smith, Barry F." <bsm...@mcs.anl.gov> wrote:
>
> I'm guessing it would be very difficult to connect this particular
> performance bug with a decrease in performance for an actual full application
> since models don't catch this level of detail well (and since you cannot run
> the application without the bug to see the better performance)? IBM/Nvidia
> are not going to care about it if it is just an abstract oddity as opposed to
> clearly demonstrating a problem for the use of the machine, especially if the
> machine is an orphan.
>
> > On Sep 22, 2019, at 8:35 AM, Jed Brown via petsc-dev
> > <petsc-dev@mcs.anl.gov> wrote:
> >
> > Karl Rupp <r...@iue.tuwien.ac.at> writes:
> >
> >>> I wonder if the single-node latency bugs on AC922 are related to these
> >>> weird performance results.
> >>>
> >>> https://docs.google.com/spreadsheets/d/1amFJIbpvs9oJcUc-WntsFHO_C0LE7xFJeor-oElt0LY/edit#gid=0
> >>>
> >>>
> >>
> >> Thanks for these numbers!
> >> Intra-Node > Inter-Node is indeed weird. I haven't observed such an
> >> inversion before.
> >
> > As far as I know, it's been there since the machines were deployed
> > despite obviously being a bug. I know people at LLNL regard it as a
> > bug, but it has not been their top priority (presumably at least in part
> > because applications have not clearly expressed the impact of latency
> > regressions on their science).