Thanks for the patch. I'll merge it in with my patch. -Randy
On 2/7/13 11:25 AM, "Michael Robbert" <[email protected]> wrote:
>Randy,
>As long as you're working on IB patches: I just remembered that I had
>to apply a patch before I could get 2.8.7 to build on my CentOS 5
>machines running their stock IB stack.
>
>--- src/io/bmi/bmi_ib/openib.c.orig	2013-01-10 15:47:52.000000000 -0700
>+++ src/io/bmi/bmi_ib/openib.c	2013-01-10 15:37:59.000000000 -0700
>@@ -745,7 +745,9 @@
> #ifdef HAVE_IBV_EVENT_CLIENT_REREGISTER
> CASE(IBV_EVENT_CLIENT_REREGISTER);
> #endif
>+#ifdef HAVE_IBV_EVENT_GID_CHANGE
> CASE(IBV_EVENT_GID_CHANGE);
>+#endif
> }
> return s;
> }
>
>The issue was brought up in a thread on this list last summer, but I
>never saw a final resolution, and if there was one it apparently didn't
>make it into 2.8.7.
>
>Thanks,
>Mike Robbert
>Colorado School of Mines
>
>On 2/7/13 8:20 AM, Randall Martin wrote:
>> I'm working on a set of patches for the IB support. There are
>> several issues I'm working through on the patches before I commit
>> them. I'll send you a copy when I have them ready for release so
>> you can test them.
>>
>> -Randy
>>
>> On 2/7/13 8:54 AM, "Yves Revaz" <[email protected]> wrote:
>>
>>> On 10/18/2012 11:41 PM, Kyle Schochenmaier wrote:
>>>> Hi Yves -
>>>>
>>>> How frequently do you see these warnings? Do they cause any
>>>> servers/clients to hang?
>>>
>>> Hi Kyle and the list,
>>>
>>> In a previous mail, I mentioned the following errors:
>>>
>>> [E 02/07/2013 14:39:24] Warning: encourage_recv_incoming: mop_id d0e680 in RTS_DONE message not found.
>>> [E 02/07/2013 14:39:54] job_time_mgr_expire: job time out: cancelling flow operation, job_id: 17549115350.
>>> [E 02/07/2013 14:39:54] fp_multiqueue_cancel: flow proto cancel called on 0x1bce5e0
>>> [E 02/07/2013 14:39:54] fp_multiqueue_cancel: I/O error occurred
>>> [E 02/07/2013 14:39:54] handle_io_error: flow proto error cleanup started on 0x1bce5e0: Operation cancelled (possibly due to timeout)
>>> [E 02/07/2013 14:39:54] handle_io_error: flow proto 0x1bce5e0 canceled 1 operations, will clean up.
>>> [E 02/07/2013 14:39:54] bmi_recv_callback_fn: I/O error occurred
>>> [E 02/07/2013 14:39:54] handle_io_error: flow proto 0x1bce5e0 error cleanup finished: Operation cancelled (possibly due to timeout)
>>>
>>> In fact, I'm trying to move 10 TB of data into our PVFS using rsync.
>>> When a lot of data is transferred, these errors occur very
>>> frequently, about every 5 minutes, which is very annoying.
>>>
>>> I've checked our IB network, which is perfectly sane. I'm currently
>>> using orangefs-2.8.6. Should I move to 2.8.7? Looking at the
>>> changelog of the 2.8.7 release, I don't think any IB-related
>>> problems have been fixed.
>>>
>>> Thanks,
>>>
>>> yves
>>>
>>>> If it's not common/destructive, this could just be a simple error
>>>> case on the InfiniBand fabric where the operation timed out in
>>>> PVFS; that can be readily ignored, as the operation would be
>>>> retransmitted eventually.
>>>>
>>>> If you see this a lot, it may be one of a few issues that we've
>>>> fixed in recent releases. Which version of orangefs/pvfs are
>>>> you using?
>>>> ~Kyle
>>>>
>>>> Kyle Schochenmaier
>>>>
>>>> On Thu, Oct 18, 2012 at 4:31 PM, Becky Ligon <[email protected]> wrote:
>>>>> Yves:
>>>>>
>>>>> The timeouts that you listed below are in the configuration file.
>>>>>
>>>>> ClientJobBMITimeoutSecs 300 - The client's job scheduler
>>>>> limits each "job" sent across the network to this timeout.
>>>>> If the job exceeds this limit, the job is cancelled.
>>>>> Depending on the request, the job may be retried. Keep in
>>>>> mind that one PVFS request can be made up of many jobs.
>>>>>
>>>>> ClientJobFlowTimeoutSecs - This value limits the time spent
>>>>> on a particular job called a flow. A flow is used to
>>>>> transfer data across the network to a server or to transfer
>>>>> data from a server to the client. Again, if the flow
>>>>> exceeds this timeout, then the flow is cancelled.
>>>>>
>>>>> The server counterparts for these settings are rarely used,
>>>>> since the server doesn't normally initiate reads or writes.
>>>>>
>>>>> I think your real problem has something to do with IB, but I
>>>>> am not an expert in that area. I have cc'd Kyle
>>>>> Schochenmaier to see if he can help.
>>>>>
>>>>> Becky
>>>>>
>>>>> On Thu, Oct 18, 2012 at 4:07 PM, Yves Revaz <[email protected]> wrote:
>>>>>>
>>>>>> Dear list,
>>>>>>
>>>>>> I sometimes have the following error occurring in my PVFS
>>>>>> server log.
>>>>>>
>>>>>> [E 10/18/2012 20:59:50] Warning: encourage_recv_incoming: mop_id 150c320 in RTS_DONE message not found.
>>>>>> [E 10/18/2012 21:00:50] job_time_mgr_expire: job time out: cancelling flow operation, job_id: 33307291.
>>>>>> [E 10/18/2012 21:00:50] fp_multiqueue_cancel: flow proto cancel called on 0xf18c80
>>>>>> [E 10/18/2012 21:00:50] fp_multiqueue_cancel: I/O error occurred
>>>>>> [E 10/18/2012 21:00:50] handle_io_error: flow proto error cleanup started on 0xf18c80: Operation cancelled (possibly due to timeout)
>>>>>> [E 10/18/2012 21:00:50] handle_io_error: flow proto 0xf18c80 canceled 1 operations, will clean up.
>>>>>> [E 10/18/2012 21:00:50] bmi_recv_callback_fn: I/O error occurred
>>>>>> [E 10/18/2012 21:00:50] handle_io_error: flow proto 0xf18c80 error cleanup finished: Operation cancelled (possibly due to timeout)
>>>>>>
>>>>>> Looking at the mailing list, I've found suggestions to increase
>>>>>> these default values
>>>>>>
>>>>>> ServerJobBMITimeoutSecs 30
>>>>>> ServerJobFlowTimeoutSecs 30
>>>>>> ClientJobBMITimeoutSecs 300
>>>>>> ClientJobFlowTimeoutSecs 300
>>>>>>
>>>>>> to 600.
>>>>>>
>>>>>> What is at the origin of these timeouts?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> yves
>>>>>>
>>>>>> --
>>>>>> ----------------------------------------------------------------
>>>>>> Dr. Yves Revaz
>>>>>> Laboratory of Astrophysics EPFL
>>>>>> Observatoire de Sauverny    Tel : ++ 41 22 379 24 28
>>>>>> 51, Ch. des Maillettes      Fax : ++ 41 22 379 22 05
>>>>>> 1290 Sauverny               e-mail : [email protected]
>>>>>> SWITZERLAND                 Web : http://www.lunix.ch/revaz/
>>>>>> ----------------------------------------------------------------
>>>>>>
>>>>>> _______________________________________________
>>>>>> Pvfs2-users mailing list
>>>>>> [email protected]
>>>>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>>>>>
>>>>> --
>>>>> Becky Ligon
>>>>> OrangeFS Support and Development
>>>>> Omnibond Systems
>>>>> Anderson, South Carolina
>>>
>>> --
>>> ----------------------------------------------------------------
>>> Dr. Yves Revaz
>>> Laboratory of Astrophysics
>>> Ecole Polytechnique Fédérale de Lausanne (EPFL)
>>> Observatoire de Sauverny    Tel : ++ 41 22 379 24 28
>>> 51, Ch. des Maillettes      Fax : ++ 41 22 379 22 05
>>> 1290 Sauverny               e-mail : [email protected]
>>> SWITZERLAND                 Web : http://www.lunix.ch/revaz/
>>> ----------------------------------------------------------------
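[Editorial note: to summarize the tuning discussed in this thread, the four timeout settings Becky describes live in the PVFS2/OrangeFS server configuration file. The fragment below is a sketch of raising them to 600 seconds as Yves proposes; the option names are taken from the thread itself, but the exact section placement is an assumption and may vary by release.]

```
# PVFS2/OrangeFS server configuration fragment (values in seconds).
# Section placement is an assumption; consult your release's config docs.
ServerJobBMITimeoutSecs    600
ServerJobFlowTimeoutSecs   600
ClientJobBMITimeoutSecs    600
ClientJobFlowTimeoutSecs   600
```

Note that, per the thread, raising these only masks slow or failing transfers; the underlying cause here was in the IB (BMI) transport, not the timeout values themselves.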
