On 02/07/2013 05:29 PM, Randall Martin wrote:
Thanks for the patch.  I'll merge it in with my patch.

By the way, Randy, when do you expect to have the patches ready?
Is it a matter of days or of months? Just to have a rough idea.

Thanks in advance,

yves


-Randy

On 2/7/13 11:25 AM, "Michael Robbert" <[email protected]> wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Randy,
As long as you're working on IB patches, I just remembered that I had
to apply a patch before I could get 2.8.7 to build on my CentOS 5
machines running their stock IB stack.

--- src/io/bmi/bmi_ib/openib.c.orig     2013-01-10 15:47:52.000000000 -0700
+++ src/io/bmi/bmi_ib/openib.c  2013-01-10 15:37:59.000000000 -0700
@@ -745,7 +745,9 @@
 #ifdef HAVE_IBV_EVENT_CLIENT_REREGISTER
        CASE(IBV_EVENT_CLIENT_REREGISTER);
 #endif
+#ifdef HAVE_IBV_EVENT_GID_CHANGE
        CASE(IBV_EVENT_GID_CHANGE);
+#endif
     }
     return s;
 }

The issue was brought up in a thread on this list last summer, but I
never saw a final resolution, and if there was one it apparently didn't
make it into 2.8.7.
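
To illustrate what the guard in the patch does, here is a minimal,
self-contained sketch. Everything prefixed with demo_ or DEMO_ is
hypothetical and only stands in for the verbs event enum, and the CASE
macro is an assumed definition, not copied from openib.c. With an older
IB stack the real header lacks the newer constant, so an unguarded CASE
line fails to compile; with the guard, that event simply falls through
to the default name.

```c
#include <stdio.h>

/* Hypothetical stand-in for the IB event enum. In a real build these
 * constants come from the IB stack's headers, and an older stack may
 * not declare the newer one at all. */
enum demo_event {
    DEMO_EVENT_CLIENT_REREGISTER,
    DEMO_EVENT_GID_CHANGE
};

/* Pretend the configure step found only the older event. Defining
 * HAVE_DEMO_EVENT_GID_CHANGE as well would simulate a newer stack. */
#define HAVE_DEMO_EVENT_CLIENT_REREGISTER 1

/* Assumed helper: map an enum constant to its own spelling. */
#define CASE(e) case e: s = #e; break

/* Return a printable name for an event, or a fallback string when the
 * event was not compiled in. */
static const char *demo_event_name(enum demo_event ev)
{
    const char *s = "(unknown event)";

    switch (ev) {
#ifdef HAVE_DEMO_EVENT_CLIENT_REREGISTER
        CASE(DEMO_EVENT_CLIENT_REREGISTER);
#endif
#ifdef HAVE_DEMO_EVENT_GID_CHANGE
        CASE(DEMO_EVENT_GID_CHANGE);
#endif
    default:
        break;
    }
    return s;
}
```

Calling demo_event_name(DEMO_EVENT_GID_CHANGE) in this configuration
returns the fallback string, because the guarded case was left out.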

Thanks,
Mike Robbert
Colorado School of Mines

On 2/7/13 8:20 AM, Randall Martin wrote:
I'm working on a set of patches for the IB support.  There are
several issues I'm working through on the patches before I commit
them.  I'll send you a copy when I have them ready for release so
you can test them.


-Randy


On 2/7/13 8:54 AM, "Yves Revaz" <[email protected]> wrote:

On 10/18/2012 11:41 PM, Kyle Schochenmaier wrote:
Hi Yves -

How frequently do you see these warnings?  Does it cause any
servers/clients to hang?
Hi Kyle and the list,

In a previous mail, I mentioned the following errors:

[E 02/07/2013 14:39:24] Warning: encourage_recv_incoming: mop_id d0e680 in RTS_DONE message not found.
[E 02/07/2013 14:39:54] job_time_mgr_expire: job time out: cancelling flow operation, job_id: 17549115350.
[E 02/07/2013 14:39:54] fp_multiqueue_cancel: flow proto cancel called on 0x1bce5e0
[E 02/07/2013 14:39:54] fp_multiqueue_cancel: I/O error occurred
[E 02/07/2013 14:39:54] handle_io_error: flow proto error cleanup started on 0x1bce5e0: Operation cancelled (possibly due to timeout)
[E 02/07/2013 14:39:54] handle_io_error: flow proto 0x1bce5e0 canceled 1 operations, will clean up.
[E 02/07/2013 14:39:54] bmi_recv_callback_fn: I/O error occurred
[E 02/07/2013 14:39:54] handle_io_error: flow proto 0x1bce5e0 error cleanup finished: Operation cancelled (possibly due to timeout)

In fact, I'm trying to move 10 TB of data in our pvfs using rsync.
When a lot of data is transferred, those errors occur very
frequently, about every 5 minutes, which is very annoying.

I've checked our IB network, which is perfectly sane. I'm
currently using orangefs-2.8.6. Should I move to 2.8.7? Looking
at the changelog of the 2.8.7 release, I don't think any IB-related
problems have been fixed.

Thanks,

yves

If this is not common or destructive, there may simply have been an
error on the InfiniBand fabric that caused the operation to time out
in pvfs. That can be readily ignored, as the operation would be
retransmitted eventually.

If you see this a lot, it may be one of a few issues that we've
fixed in recent releases. Which version of orangefs/pvfs are
you using?
~Kyle

Kyle Schochenmaier


On Thu, Oct 18, 2012 at 4:31 PM, Becky
Ligon<[email protected]>  wrote:
Yves:

The timeouts that you listed below are in the configuration
file.

ClientJobBMITimeoutSecs 300 - The client's job scheduler
limits each "job" sent across the network to this timeout.
If the job exceeds this limit, the job is cancelled.
Depending on the request, the job may be retried. Keep in
mind that one PVFS request can be made up of many jobs.

ClientJobFlowTimeoutSecs - This value limits the time spent
on a particular job called a flow.  A flow is used to
transfer data across the network to a server or to transfer
data from a server to the client.    Again, if the flow
exceeds this timeout, then the flow is cancelled.

The server counterparts for these settings are rarely used,
since the server doesn't normally initiate reads or writes.
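
For reference, a fragment of the configuration file with the two client
timeouts raised from 300 to 600 might look like the following. This is
only a sketch: the enclosing <Defaults> section is an assumption based
on the usual generated config layout, other options in that section are
omitted, and the value 600 is the one mentioned in this thread rather
than a recommendation.

```
<Defaults>
    ServerJobBMITimeoutSecs 30
    ServerJobFlowTimeoutSecs 30
    ClientJobBMITimeoutSecs 600
    ClientJobFlowTimeoutSecs 600
</Defaults>
```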

I think your real problem has something to do with IB, but I
am not an expert in that area.  I have cc'd Kyle
Schochenmaier to see if he can help.

Becky



On Thu, Oct 18, 2012 at 4:07 PM, Yves
Revaz<[email protected]>  wrote:
Dear list,

I sometimes see the following errors occurring in my pvfs
server log.

[E 10/18/2012 20:59:50] Warning: encourage_recv_incoming: mop_id 150c320 in RTS_DONE message not found.
[E 10/18/2012 21:00:50] job_time_mgr_expire: job time out: cancelling flow operation, job_id: 33307291.
[E 10/18/2012 21:00:50] fp_multiqueue_cancel: flow proto cancel called on 0xf18c80
[E 10/18/2012 21:00:50] fp_multiqueue_cancel: I/O error occurred
[E 10/18/2012 21:00:50] handle_io_error: flow proto error cleanup started on 0xf18c80: Operation cancelled (possibly due to timeout)
[E 10/18/2012 21:00:50] handle_io_error: flow proto 0xf18c80 canceled 1 operations, will clean up.
[E 10/18/2012 21:00:50] bmi_recv_callback_fn: I/O error occurred
[E 10/18/2012 21:00:50] handle_io_error: flow proto 0xf18c80 error cleanup finished: Operation cancelled (possibly due to time


Looking at the mailing list, I've found a suggestion to increase
these default values

ServerJobBMITimeoutSecs 30
ServerJobFlowTimeoutSecs 30
ClientJobBMITimeoutSecs 300
ClientJobFlowTimeoutSecs 300

to 600.

What is the origin of these timeouts?

Thanks,


yves





--
                                                 (o o)
--------------------------------------------oOO--(_)--OOo-------
  Dr. Yves Revaz
  Laboratory of Astrophysics EPFL
  Observatoire de Sauverny     Tel : ++ 41 22 379 24 28
  51. Ch. des Maillettes       Fax : ++ 41 22 379 22 05
  1290 Sauverny             e-mail : [email protected]
  SWITZERLAND                  Web : http://www.lunix.ch/revaz/
----------------------------------------------------------------



_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users



--
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina



--

----------------------------------------------------------------
Dr. Yves Revaz
Laboratory of Astrophysics
Ecole Polytechnique Fédérale de Lausanne (EPFL)
Observatoire de Sauverny     Tel : ++ 41 22 379 24 28
51. Ch. des Maillettes       Fax : ++ 41 22 379 22 05
1290 Sauverny             e-mail : [email protected]
SWITZERLAND                  Web : http://www.lunix.ch/revaz/
----------------------------------------------------------------


-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.19 (Darwin)
Comment: GPGTools - http://gpgtools.org

iQEcBAEBAgAGBQJRE9VcAAoJEFmgPOBxQDtBEYMIAJtgo1LMWxVtyWPa2PNvWr2c
NMUw30GNJ2llhwJVdefpmNqPLdou0Sqr7moAPseA2qYBguER1jqSH0rnXg7yE5TX
CNERJwaL4+99y+tRsvKukrEvegrS/CQ5tUPsiuFaqqcTlQRGYeGPtqJV3JuAsEa2
bu49sN7yWFtM2fY0ZaFa2ouya6PR2mFAdH0ZnpcWr4OTY1Uf4py8njWvvWrMCB/2
I3//H5RoOxhCBIe85RCdXbMh4LMQbwBeTYFePlutE7YplbrQwDLg/K4/ctswRl3T
oKpRy5GJ83LJQomhwWWjAAnWWXe6zNlbiGe/B5APrlgZfV960shxFPeWwej3EEk=
=iXn7
-----END PGP SIGNATURE-----


--
                                                 (o o)
--------------------------------------------oOO--(_)--OOo-------
  Dr. Yves Revaz
  Laboratory of Astrophysics EPFL
  Observatoire de Sauverny     Tel : ++ 41 22 379 24 28
  51. Ch. des Maillettes       Fax : ++ 41 22 379 22 05
  1290 Sauverny             e-mail : [email protected]
  SWITZERLAND                  Web : http://www.lunix.ch/revaz/
----------------------------------------------------------------
