What file system are you running your code on? And is the same directory
shared across all nodes? I have seen this error when users try to use a
non-shared directory for MPI I/O operations (e.g. /tmp, which is a different
drive/folder on each node).
Thanks
Edgar
-Original Message-
From
collective I/O for example).
-Original Message-
From: users On Behalf Of Gabriel, Edgar via users
Sent: Thursday, September 23, 2021 5:31 PM
To: Eric Chamberland ; Open MPI Users
Cc: Gabriel, Edgar ; Louis Poirel ; Vivien Clauzon
Subject: Re: [OMPI users] Status of pNFS, CephFS and MPI I/O
data in order to improve performance of
their code, since most I/O patterns do not require this super-strict locking
behavior. This is the fs_ufs_lock_algorithm parameter.
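For reference, the parameter can be inspected and set like any other MCA parameter. A sketch follows; the exact value semantics vary between Open MPI releases, so verify them against your installation's ompi_info output before relying on them:

```shell
# Show the fs/ufs parameters and their documented values on this build.
ompi_info --param fs ufs --level 9

# Example run overriding the default locking policy. The value "1"
# (skip locking) is an assumption -- check the meaning of each value
# in the ompi_info output for your release first.
mpirun --mca fs_ufs_lock_algorithm 1 -np 8 ./my_app
```

Here `./my_app` is a placeholder for your own MPI executable.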
Thanks
Edgar
Thanks,
Eric
On 2021-09-23 1:57 p.m., Gabriel, Edgar wrote:
Eric,
generally speaking, ompio should be able to operate correctly on all file
systems that have support for POSIX functions. The generic ufs component is,
for example, being used on BeeGFS parallel file systems without problems; we
are using it on a daily basis. For GPFS, the only reason we
work on an update of the FAQ section.
-Original Message-
From: users On Behalf Of Dave Love via users
Sent: Monday, January 18, 2021 11:14 AM
To: Gabriel, Edgar via users
Cc: Dave Love
Subject: Re: [OMPI users] 4.1 mpi-io test failures on lustre
"Gabriel, Edgar via users"
I would like to correct one of my statements:
-Original Message-
From: users On Behalf Of Gabriel, Edgar via users
Sent: Friday, January 15, 2021 7:58 AM
To: Open MPI Users
Cc: Gabriel, Edgar
Subject: Re: [OMPI users] 4.1 mpi-io test failures on lustre
> The entire infrastructure
-Original Message-
From: users On Behalf Of Dave Love via users
Sent: Friday, January 15, 2021 4:48 AM
To: Gabriel, Edgar via users
Cc: Dave Love
Subject: Re: [OMPI users] 4.1 mpi-io test failures on lustre
> How should we know that's expected to fail? It at least shouldn
I will have a look at those tests. The recent fixes were not correctness fixes
but performance fixes.
Nevertheless, we used to pass the mpich tests, but I admit that it is not a
testsuite that we run regularly; I will have a look at them. The atomicity
tests are expected to fail, since this is the one c
the reasons for potential performance issues on NFS are very different from
Lustre. Basically, depending on your use case and the NFS configuration, you
have to enforce a different locking policy to ensure correct output files. The
default value chosen for ompio is the most conservative setting
I will have a look at the t_bigio tests on Lustre with ompio. We had some
reports from collaborators about performance problems similar to the one
that you mentioned here (which was the reason we were hesitant to make ompio
the default on Lustre), but part of the problem is that we were not
the --with-lustre option twice, once inside of the
"--with-io-romio-flags=" (along with the option that you provided), and once
outside (for ompio).
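Putting that together, the configure invocation would look something like the sketch below. The /opt/lustre prefix is a placeholder, and the romio flags string stands in for whatever options you already pass there:

```shell
# --with-lustre appears twice: once inside --with-io-romio-flags
# (picked up by ROMIO's own configure) and once at the top level
# (picked up by ompio). /opt/lustre is a placeholder prefix.
./configure --with-lustre=/opt/lustre \
    --with-io-romio-flags="--with-lustre=/opt/lustre"
```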
Thanks
Edgar
-Original Message-
From: Mark Dixon
Sent: Monday, November 16, 2020 8:19 AM
To: Gabriel, Edgar via users
Cc: Gabr
this is in theory still correct: the default MPI I/O library used by Open MPI
on Lustre file systems is ROMIO in all release versions. That being said, ompio
has supported Lustre as well since the 2.1 series, so you can
use that instead. The main reason that we did not switch to
the ompio software infrastructure has multiple frameworks.
fs framework: abstracts out file-system-level operations (open, close, etc.)
fbtl framework: provides the abstractions and implementations of *individual*
file I/O operations (seek, read, write, iread, iwrite)
fcoll framework: provides the
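These frameworks can be selected at run time through MCA parameters. A sketch follows; the component name "dynamic" is an assumption, so list what your build actually provides first:

```shell
# List the fcoll components available in this build.
ompi_info --param fcoll all

# Force ompio and a particular collective-I/O component for one run.
mpirun --mca io ompio --mca fcoll dynamic -np 16 ./my_io_app
```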
Your code looks correct, and based on your output I would actually suspect that
the I/O part finished correctly, the error message that you see is not an IO
error, but from the btl (which is communication related).
What version of Open MPI are you using, and on what file system?
Thanks
Edgar
-
achieved in this
scenario, as long as the number of processes is moderate.
Thanks
Edgar
From: Dong-In Kang
Sent: Monday, April 6, 2020 9:34 AM
To: Collin Strassburger
Cc: Open MPI Users ; Gabriel, Edgar
Subject: Re: [OMPI users] Slow collective MPI File IO
Hi Collin,
It is written in C.
So, I
Hi,
A couple of comments. First, if you use MPI_File_write_at, this is usually not
considered collective I/O, even if executed by multiple processes;
MPI_File_write_at_all would be collective I/O.
Second, MPI I/O cannot do ‘magic’; it is bound by the hardware that you are
providing. If already a
ompio only recently added support for gpfs, and it is only available in master
(so far). If you are using any of the released versions of Open MPI (2.x, 3.x,
4.x) you will not find this feature in ompio yet. Thus, the issue is only how
to disable gpfs in romio. I could not find right away an optio
How is the performance if you leave a few cores for the OS, e.g. running with
60 processes instead of 64? The reasoning is that the file read operation is
really executed by the OS, and could potentially be quite resource intensive.
Thanks
Edgar
From: users On Behalf Of Ali Cherry via users
Sen
I am not an expert on the one-sided code in Open MPI, but I wanted to comment
briefly on the potential MPI I/O related item. As far as I can see, the error
message
“Read -1, expected 48, errno = 1”
does not stem from MPI I/O, at least not from the ompio library. What file
system did you use for t
Orion,
It might be a good idea. This bug is triggered from the fcoll/two_phase
component (and having spent just two minutes looking at it, I have a
suspicion what triggers it, namely an int vs. long conversion issue), so it is
probably unrelated to the other one.
I need to add running the ne
Never mind, I see it in the backtrace :-)
Will look into it, but I am currently traveling. Until then, Gilles' suggestion is
probably the right approach.
Thanks
Edgar
> -Original Message-
> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Gabriel,
> Edgar via use
Orion,
I will look into this problem, is there a specific code or testcase that
triggers this problem?
Thanks
Edgar
> -Original Message-
> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Orion
> Poplawski via users
> Sent: Thursday, October 24, 2019 11:56 PM
> To: Open
> Sent: Thursday, February 21, 2019 1:59 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI
> 3.1.3
>
>
> > > On Feb 21, 2019, at 2:52 PM, Gabriel, Edgar wrote:
> >
> >> -Original Message-
> >
> >> On Feb 18, 2019, at
pi.org] On Behalf Of
> Gabriel, Edgar
> Sent: Sunday, February 17, 2019 10:34 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI
> 3.1.3
>
> I will also run our testsuite and the HDF5 testsuite on GPFS, I have access
>
, 2019 at 3:08 AM Ryan Novosielski
> > wrote:
> >>
> >> I verified that it makes it through to a bash prompt, but I’m a
> >> little less confident that something make test does doesn’t clear it.
> >> Any recommendation for a way to verify?
> >>
>
What file system are you running on?
I will look into this, but it might be later next week. I just wanted to
emphasize that we are regularly running the parallel hdf5 tests with ompio, and
I am not aware of any outstanding items that do not work (and are supposed to
work). That being said, I r
Gilles submitted a patch for that, and I approved it a couple of days ago; I
*think* it has not been merged yet, however. This was a bug in the Open MPI Lustre
configure logic, and it should be fixed once this one goes in.
https://github.com/open-mpi/ompi/pull/6080
Thanks
Edgar
> -Original Message--
Dave,
Thank you for your detailed report and testing, that is indeed very helpful. We
will definitely have to do something.
Here is what I think would be potentially doable.
a) if we detect a Lustre file system without flock support, we can print out an
error message. Completely disabling MPI I/O
> -Original Message-
> From: Dave Love [mailto:dave.l...@manchester.ac.uk]
> Sent: Wednesday, October 10, 2018 3:46 AM
> To: Gabriel, Edgar
> Cc: Open MPI Users
> Subject: Re: [OMPI users] ompio on Lustre
>
> "Gabriel, Edgar" writes:
>
> > Ok,
in place, but we
didn't have the manpower (and, to be honest, the requests) to finish that work.
Thanks
Edgar
> -Original Message-
> From: Dave Love [mailto:dave.l...@manchester.ac.uk]
> Sent: Tuesday, October 9, 2018 7:05 AM
> To: Gabriel, Edgar
> Cc: Open MPI Users
Hm, thanks for the report, I will look into this. I did not run the romio
tests, but the hdf5 tests are run regularly and with 3.1.2 you should not have
any problems on a regular unix fs. How many processes did you use, and which
tests did you run specifically? The main tests that I execute from
It was originally for performance reasons, but this should be fixed at this
point. I am not aware of correctness problems.
However, let me try to clarify your question: what precisely do you mean
by "MPI I/O on Lustre mounts without flock"? Was the Lustre file system mounted
without flock?
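For reference, a sketch of what an flock-enabled Lustre client mount looks like (the MGS node name and paths are placeholders):

```shell
# The key part is the "flock" option, which enables consistent
# fcntl()-style advisory locking across client nodes -- the kind of
# locking MPI I/O implementations rely on for correctness.
mount -t lustre -o flock mgs@tcp0:/lustrefs /mnt/lustre

# Check whether an existing mount already has the option:
mount | grep lustre
```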