Re: [OMPI users] 4.1 mpi-io test failures on lustre

2021-01-19 Thread Gabriel, Edgar via users
ok, so what I get from this conversation is the following todo list:

1. check out the tests in src/mpi/romio/test
2. revisit the atomicity issue. You are right that there are scenarios where it 
might be required; the fact that we were not able to hit the issues in our 
tests is not evidence that they cannot occur.
3. work on an update of the FAQ section.



-Original Message-
From: users  On Behalf Of Dave Love via users
Sent: Monday, January 18, 2021 11:14 AM
To: Gabriel, Edgar via users 
Cc: Dave Love 
Subject: Re: [OMPI users] 4.1 mpi-io test failures on lustre

"Gabriel, Edgar via users"  writes:

>> How should we know that's expected to fail?  It at least shouldn't fail like 
>> that; set_atomicity doesn't return an error (which the test is prepared for 
>> on a filesystem like pvfs2).  
>> I assume doing nothing, but appearing to, can lead to corrupt data, and I'm 
>> surprised that isn't being seen already.
>> HDF5 requires atomicity -- at least to pass its tests -- so presumably 
>> anyone like us who needs it should use something mpich-based with recent or 
>> old romio, and that sounds like most general HPC systems.  
>> Am I missing something?
>> With the current romio everything I tried worked, but we don't get that 
>> option with openmpi.
>
> First of all, it is mentioned on the FAQ pages of Open MPI, although 
> admittedly they are not entirely up to date (they still list external32 
> support as missing, which has however been available since 4.1).

Yes, the FAQ was full of confusing obsolete material when I last looked.
Anyway, users can't be expected to check whether any particular operation is 
expected to fail silently.  I should have said that
MPI_File_set_atomicity(3) explicitly says the default is true for multiple 
nodes, and doesn't say the call is a no-op with the default implementation.  I 
don't know whether the MPI spec allows not implementing it, but I at least 
expect an error return if it doesn't.
As far as I remember, that's what romio does on a filesystem like pvfs2 (or 
lustre when people know better than implementers and insist on noflock); I 
mis-remembered from before, thinking that ompio would be changed to do the 
same.  From that thread, I did think atomicity was on its way.

Presumably an application requests atomicity for good reason, and can take 
appropriate action if the status indicates it's not available on that 
filesystem.
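
For illustration, here is a minimal sketch of such a check (the path is just an 
example, not anything taken from the tests).  The MPI default error handler for 
files is MPI_ERRORS_RETURN, so an honest refusal should show up either as a 
non-success return code or as a cleared flag from MPI_File_get_atomicity:

  /* atomic_check.c - illustrative sketch only */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      MPI_File fh;
      int rc, flag = 0;

      MPI_Init(&argc, &argv);
      /* the lustre path here is an assumption */
      rc = MPI_File_open(MPI_COMM_WORLD, "/lustre/scratch/atomic.tmp",
                         MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);
      if (rc == MPI_SUCCESS) {
          rc = MPI_File_set_atomicity(fh, 1);   /* request atomic mode */
          MPI_File_get_atomicity(fh, &flag);    /* see what we actually got */
          if (rc != MPI_SUCCESS || !flag)
              fprintf(stderr, "atomic mode not available on this filesystem\n");
          MPI_File_close(&fh);
      }
      MPI_Finalize();
      return 0;
  }

Of course, if set_atomicity reports success without enforcing anything, a check 
like this cannot catch it either, which is exactly the problem described above.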

> You don't need atomicity for the HDF5 tests; we are passing all of them to 
> the best of my knowledge, and this is one of the testsuites that we do run 
> regularly as part of our standard testing process.

I guess we're just better at breaking things.

> I am aware that they have an atomicity test - which we pass, for whatever 
> reason. This also highlights, by the way, the issue(s) that I am having with 
> the atomicity option in MPI I/O. 

I don't know what the application of atomicity is in HDF5.  Maybe it isn't 
required for typical operations, but I assume it's not used blithely.  However, 
I'd have thought HDF5 should be prepared for something like pvfs2, and should at 
least not abort the test at that stage.
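
(For reference, HDF5 exposes this to applications through 
H5Fset_mpi_atomicity()/H5Fget_mpi_atomicity(), which sit more or less directly 
on top of the MPI calls.  A rough sketch, assuming <hdf5.h> is included and fid 
is a file id from H5Fopen/H5Fcreate with an MPI-IO file access property list:

  herr_t  status;
  hbool_t atomic = 0;

  status = H5Fset_mpi_atomicity(fid, 1);   /* ask for atomic access */
  H5Fget_mpi_atomicity(fid, &atomic);      /* check what was actually granted */
  if (status < 0 || !atomic)
      fprintf(stderr, "MPI atomicity not available for this file\n");

I don't know whether the phdf5 atomicity test goes through exactly this path, 
but that is the user-visible knob.)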

I've learned to be wary of declaring concurrent systems working after a few 
tests.  In fact, the phdf5 test failed for me like this when I tried across 
four lustre client nodes with 4.1's defaults.  (I'm confused about the striping 
involved, because I thought I set it to four, and now it shows as one on that 
directory.)

  ...
  Testing  -- dataset atomic updates (atomicity)
  Proc 9: *** Parallel ERRProc 54: *** Parallel ERROR ***
  VRFY (H5Sset_hyperslab succeeded) failed at line 4293 in t_dset.c
  aborting MPI proceProc 53: *** Parallel ERROR ***

Unfortunately I hadn't turned on backtracing, and I wouldn't get another job 
through for a while.
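
(As for the striping question above: the default stripe count can be set and 
inspected per directory with lfs, e.g.

  lfs setstripe -c 4 /path/to/test/dir    # default stripe count for new files
  lfs getstripe -d /path/to/test/dir      # show the directory default layout

where the path is obviously illustrative; note that setstripe on a directory 
only affects files created afterwards.)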

> The entire infrastructure to enforce atomicity is actually in place in ompio, 
> and I can give you the option to enforce strict atomic behavior for all files 
> in ompio (just not on a per-file basis); just be aware that the performance 
> will nose-dive. This is not just the case with ompio but also with romio; you 
> can read up on that on various discussion boards, look at NFS-related posts 
> (where you need atomicity for correctness in basically all scenarios).

I'm fairly sure I accidentally ran tests successfully on NFS4, at least 
single-node.  I never found a good discussion of the topic, and what I have 
seen about "NFS" was probably specific to NFS3 and non-POSIX compliance, though 
I don't actually care about parallel i/o on NFS.  The information we got about 
lustre was direct from Rob Latham, as nothing showed up online.

I don't like fast-but-wrong, so I think there should be the option of 
correctness, especially as it's the documented default.

> Just as another data point, in the 8+ years that ompio has been available, 
> there was not one issue reported related to correctness due to the missing 
> atomicity option.

Re: [OMPI users] 4.1 mpi-io test failures on lustre

2021-01-18 Thread Gilles Gouaillardet via users

Dave,


On 1/19/2021 2:13 AM, Dave Love via users wrote:


Generally it's not surprising if there's a shortage
of effort when outside contributions seem unwelcome.  I've tried to
contribute several times.  The final attempt wasted two or three days,
after being encouraged to get the port of current romio into a decent
state when it was being done separately "behind the scenes", but that
hasn't been released.


External contributions are not only welcome, they are encouraged.

All Pull Requests will be considered for inclusion upstream (as long as the 
commits are properly signed off).

You could not be more wrong on that part, and since you chose to bring this to 
the public mailing list, let me recap the facts:



ROMIO is refreshed when needed (and when time allows it). All code changes 
come from public Pull Requests. For example:

 - ROMIO 3.3.2 refresh (https://github.com/open-mpi/ompi/pull/8249 - issued 
   on November 24th 2020)
 - ROMIO 3.4b1 refresh (https://github.com/open-mpi/ompi/pull/8279 - issued 
   December 10th 2020)
 - ROMIO 3.4 refresh (https://github.com/open-mpi/ompi/pull/8343 - issued 
   January 6th 2021)



On the other hand, this is what you did:

On December 2nd you wrote to the ML: "In the meantime I've hacked in romio 
from mpich-4.3b1 without really understanding what I'm doing;" and you finally 
posted a link to your code on December 11th (and detailed a shortcut you took), 
before deleting your repository (!) around December 16th.

Unless I missed it, you never issued a Pull Request.





It took some time to figure out that upstream ROMIO 3.3.2 did not pass the 
HDF5 test on Lustre, and that a newer ROMIO (3.4b1 at that time, 3.4 now) had 
to be used in order to fix the issue in the long term.

All the heavy lifting was already done in #8249, very likely before you even 
started hacking, and moving to 3.4b1 and then 3.4 was very straightforward.

The ROMIO 3.4 refresh will be merged into the master branch once properly 
tested and reviewed, and the goal is to have it available in Open MPI 5. ROMIO 
fixes will be applied to the release branches (they are available at 
https://github.com/open-mpi/ompi/pull/8371) once tested and reviewed.



Bottom line: your "hack" is the only one that was actually done behind the 
scenes, and it has returned there since.

All pull requests are welcome, with - as far as I am concerned - the following 
caveat (besides signed-off commits): Open MPI is a meritocracy. If you had 
issued a proper PR (you did not, but chose to post a - now broken - link to 
your code instead), it would likely have been rejected based on its (lack of) 
merits.


There are many ways to contribute to Open MPI, and in this case, 
testing/discussing the Pull Requests/Issues on GitHub would have been (and 
will be) very helpful to the Open MPI community.

By contrast, ranting and bragging on a public ML are - in my not so humble 
opinion - counterproductive, but I have a pretty high threshold for this kind 
of BS. However, I have a much lower threshold for your gross 
mischaracterization of the Open MPI community, its values, and how the work 
gets done.




Cheers,


Gilles



Re: [OMPI users] 4.1 mpi-io test failures on lustre

2021-01-18 Thread Dave Love via users
"Gabriel, Edgar via users"  writes:

>> How should we know that's expected to fail?  It at least shouldn't fail like 
>> that; set_atomicity doesn't return an error (which the test is prepared for 
>> on a filesystem like pvfs2).  
>> I assume doing nothing, but appearing to, can lead to corrupt data, and I'm 
>> surprised that isn't being seen already.
>> HDF5 requires atomicity -- at least to pass its tests -- so presumably 
>> anyone like us who needs it should use something mpich-based with recent or 
>> old romio, and that sounds like most general HPC systems.  
>> Am I missing something?
>> With the current romio everything I tried worked, but we don't get that 
>> option with openmpi.
>
> First of all, it is mentioned on the FAQ pages of Open MPI, although
> admittedly they are not entirely up to date (they still list external32
> support as missing, which has however been available since 4.1).

Yes, the FAQ was full of confusing obsolete material when I last looked.
Anyway, users can't be expected to check whether any particular
operation is expected to fail silently.  I should have said that
MPI_File_set_atomicity(3) explicitly says the default is true for
multiple nodes, and doesn't say the call is a no-op with the default
implementation.  I don't know whether the MPI spec allows not
implementing it, but I at least expect an error return if it doesn't.
As far as I remember, that's what romio does on a filesystem like pvfs2
(or lustre when people know better than implementers and insist on
noflock); I mis-remembered from before, thinking that ompio would be
changed to do the same.  From that thread, I did think atomicity was on
its way.

Presumably an application requests atomicity for good reason, and can
take appropriate action if the status indicates it's not available on
that filesystem.

> You don't need atomicity for the HDF5 tests; we are passing all of them to 
> the best of my knowledge, and this is one of the testsuites that we do run 
> regularly as part of our standard testing process.

I guess we're just better at breaking things.

> I am aware that they have an atomicity test - which we pass, for whatever 
> reason. This also highlights, by the way, the issue(s) that I am having with 
> the atomicity option in MPI I/O. 

I don't know what the application of atomicity is in HDF5.  Maybe it
isn't required for typical operations, but I assume it's not used
blithely.  However, I'd have thought HDF5 should be prepared for
something like pvfs2, and should at least not abort the test at that stage.

I've learned to be wary of declaring concurrent systems working after a
few tests.  In fact, the phdf5 test failed for me like this when I tried
across four lustre client nodes with 4.1's defaults.  (I'm confused
about the striping involved, because I thought I set it to four, and now
it shows as one on that directory.)

  ...
  Testing  -- dataset atomic updates (atomicity) 
  Proc 9: *** Parallel ERRProc 54: *** Parallel ERROR ***
  VRFY (H5Sset_hyperslab succeeded) failed at line 4293 in t_dset.c
  aborting MPI proceProc 53: *** Parallel ERROR ***

Unfortunately I hadn't turned on backtracing, and I wouldn't get another
job through for a while.

> The entire infrastructure to enforce atomicity is actually in place in ompio, 
> and I can give you the option to enforce strict atomic behavior for all files 
> in ompio (just not on a per-file basis); just be aware that the performance 
> will nose-dive. This is not just the case with ompio but also with romio; you 
> can read up on that on various discussion boards, look at NFS-related posts 
> (where you need atomicity for correctness in basically all scenarios).

I'm fairly sure I accidentally ran tests successfully on NFS4, at least
single-node.  I never found a good discussion of the topic, and what I
have seen about "NFS" was probably specific to NFS3 and non-POSIX
compliance, though I don't actually care about parallel i/o on NFS.  The
information we got about lustre was direct from Rob Latham, as nothing
showed up online.

I don't like fast-but-wrong, so I think there should be the option of
correctness, especially as it's the documented default.

> Just as another data point, in the 8+ years that ompio has been available, 
> there was not one issue reported related to correctness due to missing the 
> atomicity option.

Yes, I forget some history over the years, like that one on a local
filesystem:
.

> That being said, if you feel more comfortable using romio, it is completely 
> up to you. Open MPI offers this option, and it is incredibly easy to set the 
> default parameters on a  platform for all users such that romio is being used.

Unfortunately that option fails the tests.

> We are doing the best we can with our limited resources, and while ompio is 
> by no means perfect, we try to be responsive to issues reported by users and 
> value constructive feedback and discussion.

Re: [OMPI users] 4.1 mpi-io test failures on lustre

2021-01-15 Thread Gabriel, Edgar via users
I would like to correct one of my statements:

-Original Message-
From: users  On Behalf Of Gabriel, Edgar via 
users
Sent: Friday, January 15, 2021 7:58 AM
To: Open MPI Users 
Cc: Gabriel, Edgar 
Subject: Re: [OMPI users] 4.1 mpi-io test failures on lustre

> The entire infrastructure to enforce atomicity is actually in place in ompio, 
> and I can give you the option to enforce strict atomic behavior for all files 
> in ompio (just not on a per-file basis); just be aware that the performance 
> will nose-dive. 

I realized that this statement is not entirely true; we are missing one aspect 
needed to be able to provide full atomicity.

Thanks
Edgar



Re: [OMPI users] 4.1 mpi-io test failures on lustre

2021-01-15 Thread Gabriel, Edgar via users
-Original Message-
From: users  On Behalf Of Dave Love via users
Sent: Friday, January 15, 2021 4:48 AM
To: Gabriel, Edgar via users 
Cc: Dave Love 
Subject: Re: [OMPI users] 4.1 mpi-io test failures on lustre

> How should we know that's expected to fail?  It at least shouldn't fail like 
> that; set_atomicity doesn't return an error (which the test is prepared for 
> on a filesystem like pvfs2).  
> I assume doing nothing, but appearing to, can lead to corrupt data, and I'm 
> surprised that isn't being seen already.
> HDF5 requires atomicity -- at least to pass its tests -- so presumably anyone 
> like us who needs it should use something mpich-based with recent or old 
> romio, and that sounds like most general HPC systems.  
> Am I missing something?
> With the current romio everything I tried worked, but we don't get that 
> option with openmpi.

First of all, it is mentioned on the FAQ pages of Open MPI, although admittedly 
they are not entirely up to date (they still list external32 support as missing, 
which has however been available since 4.1). You don't need atomicity for the 
HDF5 tests; we are passing all of them to the best of my knowledge, and this is 
one of the testsuites that we do run regularly as part of our standard testing 
process. I am aware that they have an atomicity test - which we pass, for 
whatever reason. This also highlights, by the way, the issue(s) that I am having 
with the atomicity option in MPI I/O. 

The entire infrastructure to enforce atomicity is actually in place in ompio, 
and I can give you the option to enforce strict atomic behavior for all files in 
ompio (just not on a per-file basis); just be aware that the performance will 
nose-dive. This is not just the case with ompio but also with romio; you can 
read up on that on various discussion boards, look at NFS-related posts (where 
you need atomicity for correctness in basically all scenarios).

Just as another data point, in the 8+ years that ompio has been available, 
there was not one issue reported related to correctness due to missing the 
atomicity option.

That being said, if you feel more comfortable using romio, it is completely up 
to you. Open MPI offers this option, and it is incredibly easy to set the 
default parameters on a platform for all users such that romio is being used.
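For example, something along these lines (the ROMIO component is named romio321 
in the 4.1 series - check ompi_info on your installation - and the config file 
location depends on the install prefix):

  # per-job selection
  mpirun --mca io romio321 ...

  # or as a site-wide default in <prefix>/etc/openmpi-mca-params.conf
  io = romio321
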
We are doing the best we can with our limited resources, and while ompio is by 
no means perfect, we try to be responsive to issues reported by users and value 
constructive feedback and discussion.

Thanks
Edgar 



Re: [OMPI users] 4.1 mpi-io test failures on lustre

2021-01-15 Thread Dave Love via users
"Gabriel, Edgar via users"  writes:

> I will have a look at those tests. The recent fixes were not
> correctness fixes, but performance fixes.
> Nevertheless, we used to pass the mpich tests, but I admit that it is
> not a testsuite that we run regularly, so I will have a look at them. The
> atomicity tests are expected to fail, since this is the one chapter of
> MPI I/O that is not implemented in ompio.

How should we know that's expected to fail?  It at least shouldn't fail
like that; set_atomicity doesn't return an error (which the test is
prepared for on a filesystem like pvfs2).  I assume doing nothing, but
appearing to, can lead to corrupt data, and I'm surprised that isn't
being seen already.

HDF5 requires atomicity -- at least to pass its tests -- so presumably
anyone like us who needs it should use something mpich-based with recent
or old romio, and that sounds like most general HPC systems.  Am I
missing something?

With the current romio everything I tried worked, but we don't get that
option with openmpi.


Re: [OMPI users] 4.1 mpi-io test failures on lustre

2021-01-14 Thread Gabriel, Edgar via users
I will have a look at those tests. The recent fixes were not correctness fixes, 
but performance fixes.
Nevertheless, we used to pass the mpich tests, but I admit that it is not a 
testsuite that we run regularly, so I will have a look at them. The atomicity 
tests are expected to fail, since this is the one chapter of MPI I/O that is 
not implemented in ompio.

Thanks
Edgar

-Original Message-
From: users  On Behalf Of Dave Love via users
Sent: Thursday, January 14, 2021 5:46 AM
To: users@lists.open-mpi.org
Cc: Dave Love 
Subject: [OMPI users] 4.1 mpi-io test failures on lustre

I tried mpi-io tests from mpich 4.3 with openmpi 4.1 on the ac922 system that I 
understand was used to fix ompio problems on lustre.  I'm puzzled that I still 
see failures.

I don't know why there are disjoint sets in mpich's test/mpi/io and 
src/mpi/romio/test, but I ran all the non-Fortran ones with MCA io defaults 
across two nodes.  In src/mpi/romio/test, atomicity failed (ignoring error and 
syshints); in test/mpi/io, the failures were setviewcur, tst_fileview, 
external32_derived_dtype, i_bigtype, and i_setviewcur.  tst_fileview was 
probably killed by the 100s timeout.

It may be that some are only appropriate for romio, but no-one said so before 
and they presumably shouldn't segv or report libc errors.

I built against ucx 1.9 with cuda support.  I realize that has problems on 
ppc64le, with no action on the issue, but there's a limit to what I can do.  
cuda looks relevant since one test crashes while apparently trying to register 
cuda memory; that's presumably not ompio's fault, but we need cuda.


[OMPI users] 4.1 mpi-io test failures on lustre

2021-01-14 Thread Dave Love via users
I tried mpi-io tests from mpich 4.3 with openmpi 4.1 on the ac922 system
that I understand was used to fix ompio problems on lustre.  I'm puzzled
that I still see failures.

I don't know why there are disjoint sets in mpich's test/mpi/io and
src/mpi/romio/test, but I ran all the non-Fortran ones with MCA io
defaults across two nodes.  In src/mpi/romio/test, atomicity failed
(ignoring error and syshints); in test/mpi/io, the failures were
setviewcur, tst_fileview, external32_derived_dtype, i_bigtype, and
i_setviewcur.  tst_fileview was probably killed by the 100s timeout.
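
(For anyone reproducing this, a typical invocation would be something like the 
following; the hostnames and the lustre path are illustrative, and most of the 
romio tests take -fname for the test file:

  mpicc -o atomicity src/mpi/romio/test/atomicity.c
  mpirun -np 8 --host node1:4,node2:4 \
         ./atomicity -fname /lustre/scratch/atomicity.tmp
)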

It may be that some are only appropriate for romio, but no-one said so
before and they presumably shouldn't segv or report libc errors.

I built against ucx 1.9 with cuda support.  I realize that has problems
on ppc64le, with no action on the issue, but there's a limit to what I
can do.  cuda looks relevant since one test crashes while apparently
trying to register cuda memory; that's presumably not ompio's fault, but
we need cuda.