>>> We have found that when trying to use PVFS with ROMIO under Open MPI,
>>> we get errors whenever the task count is larger than 128 with 1MB
>>> messages.  Smaller message sizes and larger task counts also trigger
>>> the same error, just not as consistently or quickly.
>>> Errors that we see look like:
>>>
>>> [E 15:05:50.012128] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 34.
>>> [E 15:05:50.012380] msgpair failed, will retry: Operation cancelled (possibly due to timeout)

>>Just want to understand your workload a bit:
>>You are doing a collective write with 128 processes each writing 1MB, right?

>The code is not using collective writes.
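>To illustrate the distinction (a minimal sketch, not the actual test
>code; the file name, sizes, and offsets here are hypothetical): each
>rank issues its own independent MPI_File_write_at rather than a
>collective MPI_File_write_at_all, roughly like this:
>
>    #include <mpi.h>
>    #include <stdlib.h>
>
>    int main(int argc, char **argv)
>    {
>        const int blocksize = 1048576;   /* 1 MB per rank */
>        int rank;
>        char *buf;
>        MPI_File fh;
>
>        MPI_Init(&argc, &argv);
>        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>        buf = calloc(blocksize, 1);
>
>        /* hypothetical path; the "pvfs2:" prefix just selects the ROMIO driver */
>        MPI_File_open(MPI_COMM_WORLD, "pvfs2:/mnt/pvfs2/testfile",
>                      MPI_MODE_CREATE | MPI_MODE_WRONLY,
>                      MPI_INFO_NULL, &fh);
>
>        /* independent write: one 1 MB block per rank at a rank-based offset */
>        MPI_File_write_at(fh, (MPI_Offset)rank * blocksize, buf, blocksize,
>                          MPI_BYTE, MPI_STATUS_IGNORE);
>
>        MPI_File_close(&fh);
>        free(buf);
>        MPI_Finalize();
>        return 0;
>    }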

>>> Writing to an NFS-mounted file system instead of PVFS works fine even
>>> with 256 tasks.
>>> Our version of PVFS is 2.6.2.  Both Open MPI 1.1.x and 1.2 produce the
>>> same errors.  Any known limitations with ROMIO and PVFS?
>>> We can supply you with a test code if you are interested in reproducing
>>> the problem.  The code should build with MPICH as well as Open MPI.

>>Go ahead and send the test code, but it really looks like you are
>>pushing the servers hard and hitting a timeout.  How many servers do
>>you have for this many clients?  PVFS should be smarter about such a
>>situation, but could you check something for us?  In your fs.conf,
>>what is the value of ServerJobBMITimeoutSecs?

>>http://www.pvfs.org/pvfs2-options.html#ServerJobBMITimeoutSecs

>>If you increase that value to, say, 3600, we can ensure the timeouts
>>won't get triggered.
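>>For reference, the change would just be that one line in fs.conf on
>>each server (exact placement depends on your existing config layout;
>>this is only a sketch), followed by a restart of the pvfs2-server
>>processes to pick it up:
>>
>>    ServerJobBMITimeoutSecs 3600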

>>I have a few other ideas, but let's try this one first.

>>==rob


>For this PVFS file system, we are using 8 I/O servers and one metadata
>server.  I have adjusted the value of ServerJobBMITimeoutSecs on all
>the servers involved; they had the default value of 30.  I will try
>to schedule a service interruption later today to restart the
>pvfs2-server processes, and I will let you know how the next test goes.

We finally got the PVFS server processes restarted today with the new
timeout values (ServerJobBMITimeoutSecs set to 3600).  It did not seem
to make any difference:

[E 10:02:38.455116] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 156.
[E 10:02:39.102364] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 389.
[E 10:02:39.186147] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 392.
[E 10:02:39.238168] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 384.
[E 10:02:39.275503] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 450.
[E 10:02:39.407072] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 384.

This is when trying to use ROMIO from a 468-way job.

Jan

>Attached is the test code.  The tarball contains two subdirectories,
>utilities and mpi_io_test.  You need to cd into mpi_io_test/src.  There
>you'll find a README file, which describes the problems we see on our
>cluster, specifics about our software environment, how to build the
>code, and how to run the code.

Jan Lindheim