Re: [openib-general] getting LOC_QP_OP_ERR with IPoIB
Quoting r. Or Gerlitz <[EMAIL PROTECTED]>:
> Subject: getting LOC_QP_OP_ERR with IPoIB
>
> Hi,
>
> While doing some work to have the Linux bonding driver work on top
> of IPoIB, I have run into LOC_QP_OP_ERR with vendor (Mellanox PCIX HCA) error
> 62.
>
> ib0: failed send event (status=2, wrid=52 vend_err 62)
>
> What does this vendor error mean? It's the same system on which I saw the
> qp modify error.

vend_err 0x62 is a WQE-fetch failure: either the WQE region does not exist or the PD is mismatched.

-- MST
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
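The decoding discussed in this message can be sketched as a small log parser. This is only an illustration, not part of any OpenIB tool: the function name `decode` and both lookup tables are hypothetical, the status numbering is assumed to follow the Linux kernel's `enum ib_wc_status` ordering, and `vend_err` is assumed to be printed in hexadecimal without a `0x` prefix (which matches the 62 vs. 0x62 usage above). Only the one vendor syndrome explained in this thread is listed.

```python
import re

# Assumption: numbering follows the kernel's enum ib_wc_status.
# Only the first few values are listed; status=2 is the one seen in the thread.
WC_STATUS = {0: "SUCCESS", 1: "LOC_LEN_ERR", 2: "LOC_QP_OP_ERR"}

# Vendor syndromes: only the value explained in this thread.
VENDOR_ERR = {
    0x62: "WQE-fetch failure (WQE region does not exist or PD mismatch)",
}

LINE_RE = re.compile(
    r"(?P<dev>\S+): failed send event \(status=(?P<status>\d+), "
    r"wrid=(?P<wrid>\d+) vend_err (?P<vend>[0-9a-fA-F]+)\)"
)


def decode(line):
    """Return a dict describing an IPoIB 'failed send event' line, or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    status = int(m.group("status"))
    vend = int(m.group("vend"), 16)  # assumption: hex without the 0x prefix
    return {
        "dev": m.group("dev"),
        "status": WC_STATUS.get(status, "unknown(%d)" % status),
        "wrid": int(m.group("wrid")),
        "vend_err": vend,
        "vend_text": VENDOR_ERR.get(vend, "unknown"),
    }
```

Feeding it the line quoted above, `decode("ib0: failed send event (status=2, wrid=52 vend_err 62)")`, yields the device name, the symbolic status, and the vendor syndrome text in one record.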
[openib-general] OpenSM - guid2lid cache file questions
Hi list,

I have a question regarding the guid2lid cache file.

The file is read by OpenSM at startup.
OpenSM may reassign LIDs according to the LIDs saved in this file.
This isn't always acceptable.

Is this the right policy? Am I missing anything here?
Is there a way to disable reading the file at startup?

Regards,
Leonid
Re: [openib-general] getting LOC_QP_OP_ERR with IPoIB
Michael S. Tsirkin wrote:
> Quoting r. Or Gerlitz <[EMAIL PROTECTED]>:
>> While doing some work to have the Linux bonding driver work on top
>> of IPoIB, I have run into LOC_QP_OP_ERR with vendor (Mellanox PCIX HCA) error
>> 62.
>> ib0: failed send event (status=2, wrid=52 vend_err 62)
>> What does this vendor error mean? It's the same system on which I saw the
>> qp modify error.

> vend_err 0x62 is a WQE-fetch failure: either the WQE region does not exist
> or the PD is mismatched.

Thanks.

So what's your thinking, am I running into some bogus IPoIB scenario?

Or.
Re: [openib-general] getting LOC_QP_OP_ERR with IPoIB
Quoting r. Or Gerlitz <[EMAIL PROTECTED]>:
> Subject: Re: getting LOC_QP_OP_ERR with IPoIB
>
> Michael S. Tsirkin wrote:
> > Quoting r. Or Gerlitz <[EMAIL PROTECTED]>:
>
> >> While doing some work to have the Linux bonding driver work on top
> >> of IPoIB, I have run into LOC_QP_OP_ERR with vendor (Mellanox PCIX HCA)
> >> error 62.
> >> ib0: failed send event (status=2, wrid=52 vend_err 62)
> >> What does this vendor error mean? It's the same system on which I saw
> >> the qp modify error.
> >
> > vend_err 0x62 is a WQE-fetch failure: either the WQE region does not exist
> > or the PD is mismatched.
>
> Thanks.
>
> So what's your thinking, am I running into some bogus IPoIB scenario?
>
> Or.

Dunno, it looks really weird. Could you try firmware 3.5.0 please?

-- MST
Re: [openib-general] OpenSM - guid2lid cache file questions
Hi Leonid,

On Tue, 2006-09-05 at 03:30, Leonid Arsh wrote:
> Hi list,
>
> I have a question regarding the guid2lid cache file.
>
> The file is read by OpenSM at startup.
> OpenSM may reassign LIDs according to the LIDs saved in this file.
> This isn't always acceptable.
>
> Is this the right policy? Am I missing anything here?
> Is there a way to disable reading the file at startup?

There is the -r (--reassign_lids) option for this, but it is not the default behavior of OpenSM.

-- Hal

> Regards,
> Leonid
Re: [openib-general] MPI Brodcast doubt
John,

On Mon, 2006-09-04 at 08:56, john t wrote:
> Hi,
>
> I have 3 nodes connected via IB as shown below:
>
> node1 ---> switch1 ---> node2
>                    |--> node3
>
> If node1 sends a broadcast message to node2 and node3, I want to know
> if the message is delivered to the switch twice (first time for node2
> and second time for node3) or just once (where the switch will know by
> looking at some headers or so that it's a broadcast message and will
> send it on all the outgoing ports)?

Assuming nodes 1, 2, and 3 are part of the same multicast group, the multicast send is sent once from node 1. When received at the switch, it is replicated to all ports which have members in the same group (in this case, nodes 2 and 3). The switch knows by the header (specifically the LRH:DLID, which is a multicast LID) and uses the MulticastForwardingTable to determine on which ports to forward it.

However, IB multicast is unreliable, so to create reliable multicast it is sometimes "emulated": the sender tracks the group members and may use serial unicast sends, or augment a multicast send with unicast sends to the receivers, and track their acknowledgements of receipt.

-- Hal

> Regards,
> John T.
Re: [openib-general] OpenSM - guid2lid cache file questions
Hi Hal,

Thank you for your reply.

Probably I wasn't clear.

I have a problem when OpenSM, on startup, reads an out-of-date guid2lid file.
OpenSM changes the LIDs in this case.
I don't want the LIDs to be changed.

As I understand it, the '-r' option, on the contrary, causes the SM to reassign all the LIDs.

I could just remove the file to handle the problem.
I'd like to know if there is a way to do it without touching the file.

Thanks,
Leonid
Re: [openib-general] MPI Brodcast doubt
Hal Rosenstock wrote:
> John,
>
> On Mon, 2006-09-04 at 08:56, john t wrote:
>
>> Hi,
>>
>> I have 3 nodes connected via IB as shown below:
>>
>> node1 ---> switch1 ---> node2
>>                    |--> node3
>>
>> If node1 sends a broadcast message to node2 and node3, I want to know
>> if the message is delivered to the switch twice (first time for node2
>> and second time for node3) or just once (where the switch will know by
>> looking at some headers or so that it's a broadcast message and will
>> send it on all the outgoing ports)?
>>
>
> Assuming nodes 1, 2, and 3 are part of the same multicast group, the
> multicast send is sent once from node 1. When received at the switch, it
> is replicated to all ports which have members in the same group (in this
> case, nodes 2 and 3). The switch knows by the header (specifically the
> LRH:DLID, which is a multicast LID) and uses the MulticastForwardingTable
> to determine on which ports to forward it. However, IB multicast is
> unreliable, so to create reliable multicast it is sometimes "emulated"
> in that the sender tracks the group members and may use serial unicast
> sends or augment a multicast send with unicast sends to the receivers
> and track their acknowledgements of receipt.
>
> -- Hal
>

All of the above is true for IB multicast (there isn't any broadcast in IB).

If the question was "what happens when one sends a message using MPI_broadcast?", then the answer is: it depends on the MPI implementation.

I know that in MVAPICH the MPI library handles the duplication itself by default (so the switch will get two messages and not one). There is an option in that MPI to use IB multicast, but it is disabled by default.

Dotan
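The difference between the two behaviours described in this thread can be illustrated with a toy model. This is a simplified sketch, not switch firmware or MPI code: the forwarding table is a plain dict, the port numbers and the MLID value 0xC001 are made up for the example, and the rule that a switch does not send a multicast packet back out of its ingress port is stated here as an assumption of the model.

```python
def switch_forward(mft, dlid, ingress_port):
    """Egress ports for a multicast packet, per a toy MulticastForwardingTable.

    mft maps a multicast LID to the set of switch ports that have group
    members behind them. Assumption: the ingress port is excluded.
    """
    return sorted(p for p in mft.get(dlid, ()) if p != ingress_port)


def multicast_send(mft, dlid, ingress_port):
    """IB multicast: one packet on the sender's link, replicated by the switch."""
    packets_on_sender_link = 1
    return packets_on_sender_link, switch_forward(mft, dlid, ingress_port)


def unicast_emulated_sends(receivers):
    """Broadcast emulated by the MPI library: one unicast send per receiver."""
    return len(receivers)


# Hypothetical topology: node1 on port 1, node2 on port 2, node3 on port 3.
example_mft = {0xC001: {1, 2, 3}}
```

With this table, `multicast_send(example_mft, 0xC001, 1)` reports a single packet entering the switch and ports 2 and 3 as the replication targets, while `unicast_emulated_sends(["node2", "node3"])` counts two packets crossing node1's link, which is the MVAPICH default Dotan describes.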
Re: [openib-general] OpenSM - guid2lid cache file questions
Hi Leonid,

On Tue, 2006-09-05 at 08:11, Leonid Arsh wrote:
> Hi Hal,
>
> Thank you for your reply.
>
> Probably I wasn't clear.
>
> I have a problem when OpenSM, being started, reads an out-of-date guid2lid
> file.
> OpenSM changes LIDs in this case.

How do you know the file is "out of date"?

> I don't want the LIDs to be changed.

Oh, it's the other way you were asking about.

> As I understand it, the '-r' option, on the contrary, causes the SM to
> reassign all the LIDs.
>
> I could just remove the file to handle the problem.

or move it aside.

> I'd like to know if there is a way to do it without touching the file.

Not currently. There is the -x (--honor_guid2lid) option, which will do this (ignore the guid2lid file) when OpenSM is coming out of STANDBY, though.

-- Hal

> Thanks,
> Leonid
Re: [openib-general] getting LOC_QP_OP_ERR with IPoIB - mstflint question
Michael S. Tsirkin wrote:
> Dunno, it looks really weird. Could you try firmware 3.5.0 please?

I just noticed that you cannot work with mstflint if the mthca driver is not loaded. I think this was not the case with the gen1 tools, am I correct?

Is this connected to the following print, which I see once the mthca driver is unloaded?

ACPI: PCI interrupt for device :02:00.0 disabled

Or.

> dill:/tmp # modprobe -r ib_mthca
> dill:/tmp # ./mstflint -d 00:02:00.0 q
> *** ERROR *** Read a corrupted device id (0x). Probably HW/PCI access problem
> *** ERROR *** Device type 65535 not supported.
> *** ERROR *** Can not get flash type using device 00:02:00.0
> dill:/tmp # modprobe ib_mthca
> dill:/tmp # ./mstflint -d 00:02:00.0 q
> Image type: Failsafe
> I.S. Version: 1
> Chip Revision: A1
> GUID Des: Node Port1 Port2 Sys image
> GUIDs: 0008f104039651dc 0008f104039651dd 0008f104039651de 0008f104039651df
> Board ID: (VLT0010010001)
> VSD:
> PSID: VLT0010010001
> dill:/tmp # dmesg
> ACPI: PCI interrupt for device :02:00.0 disabled
> ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006)
> ib_mthca: Initializing :02:00.0
> PCI: Enabling device :02:00.0 (0110 -> 0112)
> ACPI: PCI Interrupt :02:00.0[A] -> GSI 29 (level, low) -> IRQ 193
Re: [openib-general] problems to regiser memory as a reglar
Dhabaleswar Panda wrote:
> Christian - Thanks for sending instructions for running mvapich2-0.9.5
> to Tziporet.
>
> Tziporet - Thanks for looking into this problem on SLES9 environment.
>
> Please note that a detailed user guide for running and tuning MVAPICH2
> 0.9.5 is available from the following URL:
>
> http://nowlab.cse.ohio-state.edu/projects/mpi-iba/download-mvapich2/mvapich2_user_guide.html
>
> DK
>

Thanks to all,

We found the bug; it was in the memory registration flow, on SLES9 only.
A fix will be available in OFED 1.1 RC4.

Tziporet
Re: [openib-general] OpenSM - guid2lid cache file questions
Thanks,

On 05 Sep 2006 08:46:22 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> > I have a problem when OpenSM, being started, reads an out-of-date guid2lid
> > file.
> > OpenSM changes LIDs in this case.
>
> How do you know the file is "out of date"?
>

Actually, the LIDs were assigned by another SM.
When I start my new OpenSM, the old SM is already dead.
Before starting the new OpenSM, the ibnetdiscover utility shows LIDs different from the ones in the file.
When I start OpenSM, the LIDs are reassigned on the fabric.
[openib-general] [Bug 131] working with huge pages may crash the kernel on Suse10
http://openib.org/bugzilla/show_bug.cgi?id=131

[EMAIL PROTECTED] changed:

      What    |Removed |Added
        Status|NEW     |RESOLVED
    Resolution|        |FIXED

--- Comment #1 from [EMAIL PROTECTED] 2006-09-05 06:16 ---
was fixed in 1.1-rc3
[openib-general] [Bug 145] IB Core unable to communicate IPoIB on Fedora Core 4
http://openib.org/bugzilla/show_bug.cgi?id=145

[EMAIL PROTECTED] changed:

      What    |Removed |Added
        Status|NEW     |RESOLVED
    Resolution|        |WONTFIX

--- Comment #2 from [EMAIL PROTECTED] 2006-09-05 06:18 ---
this is not a bug in OFED
Re: [openib-general] OpenSM - guid2lid cache file questions
Leonid,

On Tue, 2006-09-05 at 09:13, Leonid Arsh wrote:
> Thanks,
>
> On 05 Sep 2006 08:46:22 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> > > I have a problem when OpenSM, being started, reads an out-of-date
> > > guid2lid file.
> > > OpenSM changes LIDs in this case.
> >
> > How do you know the file is "out of date"?
> >
> Actually, the LIDs were assigned by another SM.

Different (vendor) SMs have different LID assignment and pathing (routing) policies. It is inadvisable to fail over across vendor SMs for this and other reasons.

-- Hal

> When I start my new OpenSM, the old SM is already dead.
> Before starting the new OpenSM, the ibnetdiscover utility shows LIDs
> different from the ones in the file.
> When I start OpenSM, the LIDs are reassigned on the fabric.
Re: [openib-general] [PATCH] opensm: osm_log_init_v2() - new osm_log initializer
On Mon, 2006-09-04 at 13:20, Sasha Khapyorsky wrote:
> There is new osm_log initializer osm_log_init_v2(), this is wrapped
> by osm_log_init() in order to preserve existing API.
>
> Signed-off-by: Sasha Khapyorsky <[EMAIL PROTECTED]>

Thanks. Applied (to trunk and 1.1).

-- Hal
Re: [openib-general] OpenSM - guid2lid cache file questions
Hi Leonid,

The best approach when switching from another vendor SM to OpenSM is to delete the /var/cache/osm/guid2lid file.
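Hal's earlier suggestion to move the cache file aside, rather than delete it, can be sketched as a small helper to run before starting OpenSM. This is a minimal illustration under stated assumptions: the function name `move_aside` and the timestamp suffix are hypothetical, and the path is the one quoted in this message; check where your OpenSM build actually keeps its cache.

```python
import os
import time

# Path as quoted in this thread; other builds may use a different cache dir.
GUID2LID_PATH = "/var/cache/osm/guid2lid"


def move_aside(path=GUID2LID_PATH, suffix=None):
    """Rename the guid2lid cache so OpenSM does not find it at startup.

    Returns the backup path, or None if there was no file to move.
    The backup keeps the old GUID-to-LID mapping available for inspection.
    """
    if not os.path.exists(path):
        return None
    if suffix is None:
        suffix = time.strftime("%Y%m%d-%H%M%S")
    backup = "%s.%s" % (path, suffix)
    os.rename(path, backup)
    return backup
```

Calling `move_aside()` with no arguments (with sufficient privileges, and while OpenSM is stopped) leaves a timestamped copy next to the original location, which can be renamed back if the old mapping turns out to be wanted.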
Re: [openib-general] getting LOC_QP_OP_ERR with IPoIB - mstflint question
Quoting r. Or Gerlitz <[EMAIL PROTECTED]>:
> Subject: Re: getting LOC_QP_OP_ERR with IPoIB - mstflint question
>
> Michael S. Tsirkin wrote:
> > Dunno, it looks really weird. Could you try firmware 3.5.0 please?
>
> I just noticed that you cannot work with mstflint if the mthca driver is
> not loaded. I think this was not the case with the gen1 tools, am I correct?

Yes, recent kernels disable device access once the driver is unloaded:

mstflint -d 08:00.0 q
*** ERROR *** Read a corrupted device id (0x). Probably HW/PCI access problem
*** ERROR *** Device type 65535 not supported.
*** ERROR *** Can not get flash type using device 08:00.0

mstflint should work without the driver using /proc:

mstflint -d /proc/bus/pci/08/00.0 q
Image type: Failsafe
I.S. Version: 1
Chip Revision: A0

In gen1, flint had a separate driver which you had to load.
I am not sure whether this would work on 2.6.18.

> Is this connected to this print
>
> ACPI: PCI interrupt for device :02:00.0 disabled
>
> I see once the mthca driver is unloaded?
>
> Or.

Probably not.

-- MST
[openib-general] libibcm can't connect/talk to libicm on other machine.
I'm still in the process of migrating my gen1 application to gen2.

Actually I CAN connect a gen2 application to a gen2 listener application on the same machine but NOT to a gen2 listener on another machine.

Any hints where to look?

Is there anything in the architecture that might prevent a libibcm connection to another machine?

I'm using an old Voltaire switch to connect the machines. Can this be the reason?

The switch didn't cause problems using gen1 clients.

Thanks

Thomas Bub
Re: [openib-general] libibcm can't connect/talk to libicm on other machine.
Hi Bub.

Bub Thomas wrote:
>
> I'm still in the process of migrating my gen1 application to gen2.
>
> Actually I CAN connect a gen2 application to a gen2 listener
> application on the same machine but NOT to a gen2 listener on another
> machine.
>
> Any hints where to look?
>
> Is there anything in the architecture that might prevent a libibcm
> connection to another machine?
>
> I'm using an old Voltaire switch to connect the machines. Can this be
> the reason?
>
> The switch didn't cause problems using gen1 clients.
>

What is the problem that you see?

There are some examples that come with libibcm that show how to use the library.

There can be several reasons for your problem:
1) side A sends a REQ when side B is not ready, and there is a timeout failure
2) the ib_ucm kernel module is enabled only on side A
3) the SM is not working (well)
4) host A cannot reach host B over IB
5) endianness issues?

I tried libibcm and I don't have any problems (but I don't have a Voltaire switch, so I can't check your scenario).

Dotan
Re: [openib-general] libibcm can't connect/talk to libicm on other machine.
Hi Bub,

On Tue, 2006-09-05 at 10:22, Bub Thomas wrote:
> I'm still in the process of migrating my gen1 application to gen2.
>
> Actually I CAN connect a gen2 application to a gen2 listener
> application on the same machine but NOT to a gen2 listener on another
> machine.
>
> Any hints where to look?

What are you using for the SM? OpenSM or a vendor SM?

> Is there anything in the architecture that might prevent a libibcm
> connection to another machine?

I don't think this is an architectural issue.

-- Hal

> I'm using an old Voltaire switch to connect the machines. Can this be
> the reason?
>
> The switch didn't cause problems using gen1 clients.
>
> Thanks
>
> Thomas Bub
Re: [openib-general] libibcm can't connect/talk to libicm on other machine.
Dotan,

The ibv_rc_pingpong example works for me, so I can exclude the architecture.
I never got the libibcm example compiled.
Which example did you use, and which architecture (x86 vs. x86_64) did you compile it for?
Can you share your libibcm example code (if it is not the standard one that I can't get compiled)?

Thomas
[openib-general] New development tool for boot-time drivers (FCode, IEE-1275, IBM/Sun)
If anyone is interested in developing boot-time device drivers for plug-in devices, conformant to the IEEE-1275 (Open Firmware) specification, using FCode (tokenized Forth source), which is compatible with both IBM and Sun platforms (and is platform-independent, so that a driver written once is compatible with all Open Firmware platforms ... but you already know all this if you're using Open Firmware), then you will need a Tokenizer to translate from your Forth source to FCode tokens, which are the "medium of exchange" between the device and the platform.

I am writing to announce that a new FCode Tokenizer, capable of running on IBM equipment (and that can be compiled on any other host that supports the GnuCC compiler, and others as well), is freely available at the web-site of the OpenBIOS project, www.openbios.org (just follow the links about the New FCODE suite).

If you have any questions, please direct them to the OpenBIOS Mailing List.

Thank you.

- David L. Paktor
System Firmware Developer
System and Technology Group
Global Firmware Division
[EMAIL PROTECTED]
David L Paktor/Almaden/[EMAIL PROTECTED]
18880 Homestead Rd. Building 9945
Cupertino CA 95014
Room 1026
408-342-6110 T/L 560-6110
"The Bug Stops Here"
Re: [openib-general] libibcm can't connect/talk to libicm on other machine.
Bub Thomas wrote:
> Dotan,
> the ibv_rc_pingpong example works for me so I can exclude the
> architecture.
> I never got the libibcm example compiled.
> Which is your example and which architecture x86 vs. x86_64 did you
> compile it for?
> Can you share your libibcm the example code? (if it is not the standard
> that I can't get compiled)
> Thomas

Did you try applying the following patch?

http://openib.org/pipermail/openib-general/2006-August/025005.html

I should also mention that I have a version of cmpost that works with the new libibsa, but I am waiting for the review of the kernel sa_query changes before committing.

- Sean
Re: [openib-general] libibcm can't connect/talk to libicm on other machine.
I know this sounds simple, but have you checked the routing tables?

JW
Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
Robert,

Here is a slightly modified patch for your attributes issue. Can you give it a try?

Signed-off by: Arlin Davis [EMAIL PROTECTED]

Index: dapl/openib/dapl_ib_util.c
===
--- dapl/openib/dapl_ib_util.c (revision 9106)
+++ dapl/openib/dapl_ib_util.c (working copy)
@@ -446,6 +446,7 @@
 		return(dapl_convert_errno(errno,"ib_query_hca"));
 
 	if (ia_attr != NULL) {
+		(void) dapl_os_memzero(ia_attr, sizeof(*ia_attr));
 		ia_attr->adapter_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
 		ia_attr->vendor_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
 		ia_attr->ia_address_ptr =
@@ -470,7 +471,12 @@
 		/* ia_attr->hardware_version_minor = dev_attr.fw_ver; */
 		ia_attr->max_eps = dev_attr.max_qp;
 		ia_attr->max_dto_per_ep = dev_attr.max_qp_wr;
-		ia_attr->max_rdma_read_per_ep = dev_attr.max_qp_rd_atom;
+		ia_attr->max_rdma_read_in = dev_attr.max_qp_rd_atom;
+		ia_attr->max_rdma_read_out = dev_attr.max_qp_rd_atom;
+		ia_attr->max_rdma_read_per_ep_in = dev_attr.max_qp_rd_atom;
+		ia_attr->max_rdma_read_per_ep_out = dev_attr.max_qp_rd_atom;
+		ia_attr->max_rdma_read_per_ep_in_guaranteed = DAT_TRUE;
+		ia_attr->max_rdma_read_per_ep_out_guaranteed = DAT_TRUE;
 		ia_attr->max_evds = dev_attr.max_cq;
 		ia_attr->max_evd_qlen = dev_attr.max_cqe;
 		ia_attr->max_iov_segments_per_dto = dev_attr.max_sge;
@@ -501,6 +507,7 @@
 	}
 
 	if (ep_attr != NULL) {
+		(void) dapl_os_memzero(ep_attr, sizeof(*ep_attr));
 		ep_attr->max_mtu_size = port_attr.max_msg_sz;
 		ep_attr->max_rdma_size = port_attr.max_msg_sz;
 		ep_attr->max_recv_dtos = dev_attr.max_qp_wr;
Index: dapl/openib_cma/dapl_ib_util.c
===
--- dapl/openib_cma/dapl_ib_util.c (revision 9106)
+++ dapl/openib_cma/dapl_ib_util.c (working copy)
@@ -424,6 +424,7 @@
 		return(dapl_convert_errno(errno,"ib_query_hca"));
 
 	if (ia_attr != NULL) {
+		(void) dapl_os_memzero(ia_attr, sizeof(*ia_attr));
 		ia_attr->adapter_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
 		ia_attr->vendor_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
 		ia_attr->ia_address_ptr =
@@ -446,6 +447,8 @@
 		ia_attr->hardware_version_major = dev_attr.hw_ver;
 		ia_attr->max_eps = dev_attr.max_qp;
 		ia_attr->max_dto_per_ep = dev_attr.max_qp_wr;
+		ia_attr->max_rdma_read_in = dev_attr.max_qp_rd_atom;
+		ia_attr->max_rdma_read_out = dev_attr.max_qp_rd_atom;
 		ia_attr->max_rdma_read_per_ep_in = dev_attr.max_qp_rd_atom;
 		ia_attr->max_rdma_read_per_ep_out = dev_attr.max_qp_rd_atom;
 		ia_attr->max_rdma_read_per_ep_in_guaranteed = DAT_TRUE;
@@ -481,6 +484,7 @@
 	}
 
 	if (ep_attr != NULL) {
+		(void) dapl_os_memzero(ep_attr, sizeof(*ep_attr));
 		ep_attr->max_mtu_size = port_attr.max_msg_sz;
 		ep_attr->max_rdma_size = port_attr.max_msg_sz;
 		ep_attr->max_recv_dtos = dev_attr.max_qp_wr;
Index: dapl/openib_scm/dapl_ib_util.c
===
--- dapl/openib_scm/dapl_ib_util.c (revision 9106)
+++ dapl/openib_scm/dapl_ib_util.c (working copy)
@@ -373,6 +373,7 @@
 		return(dapl_convert_errno(errno,"ib_query_hca"));
 
 	if (ia_attr != NULL) {
+		(void) dapl_os_memzero(ia_attr, sizeof(*ia_attr));
 		ia_attr->adapter_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
 		ia_attr->vendor_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
 		ia_attr->ia_address_ptr = (DAT_IA_ADDRESS_PTR)&hca_ptr->hca_address;
@@ -390,7 +391,12 @@
 		/* ia_attr->hardware_version_minor = dev_attr.fw_ver; */
 		ia_attr->max_eps = dev_attr.max_qp;
 		ia_attr->max_dto_per_ep = dev_attr.max_qp_wr;
-		ia_attr->max_rdma_read_per_ep = dev_attr.max_qp_rd_atom;
+		ia_attr->max_rdma_read_in = dev_attr.max_qp_rd_atom;
+		ia_attr->max_rdma_read_out = dev_attr.max_qp_rd_atom;
+		ia_attr->max_rdma_read_per_ep_in = dev_attr.max_qp_rd_atom;
+		ia_attr->max_rdma_read_per_ep_out = dev_attr.max_qp_rd_atom;
+		ia_attr->max_rdma_read_per_ep_in_guaranteed = DAT_TRUE;
+		ia_attr->max_rdma_read_per_ep_out_guaranteed = DAT_TRUE;
 		ia_attr->max_evds = de
Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
Arlin Davis wrote: > Robert, > > Here is a slightly modified patch for your attributes issue. Can you give it > a try? > I'll give it a spin this afternoon: it looks quite a bit more comprehensive than the small patch I did. Regards, Robert. ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] Question about interrupt generation
Hi All, I tried the following simple experiment and am not able to understand the results. I calculated the number of interrupts generated by the InfiniBand NIC [with little or no traffic to the NIC] over a period of 10 seconds and saw around 10-20 interrupts/sec. I then ran a netperf test and saw 100K+ interrupts/sec. This screwed up my host machine. To reduce the impact of the interrupts, I added a callback that is scheduled to run every few microseconds; it masks the IRQ line used by the NIC and unmasks it a little later. With the callback in place and no traffic, I see anywhere between 30-50K interrupts/sec. With the netperf traffic, I see 120K+ interrupts/sec. I am a newbie to InfiniBand technology and do not understand why so many interrupts are generated when my callback is called periodically. Could it be that the InfiniBand adapter supports MSI? Or is what I am seeing IPIs? Or does InfiniBand generate interrupts based on types of events, and masking/unmasking the interrupt line is one such event? Any information/suggestions would be useful. Thanks in advance, harish
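The measurement described above (interrupts counted over a fixed interval) can be sketched in C as follows. This is only an illustration, assuming the counts come from two samples of a /proc/interrupts line on Linux; the helper names `sum_irq_line` and `irq_rate` and the sample line in the usage note are made up for the sketch and are not taken from the original experiment.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the per-CPU counters on one /proc/interrupts line, e.g.
 *   " 16:   1200   3400   IO-APIC-fasteoi   ib_mthca"   (made-up sample)
 * Returns the total, or -1 if the line has no "label:" prefix.
 */
long long sum_irq_line(const char *line)
{
    const char *p = strchr(line, ':');
    long long total = 0;

    if (p == NULL)
        return -1;
    p++;                                    /* skip past the colon */

    for (;;) {
        char *end;
        unsigned long long n;

        while (*p == ' ' || *p == '\t')
            p++;
        if (!isdigit((unsigned char)*p))
            break;                          /* reached the controller/device text */
        n = strtoull(p, &end, 10);
        if (*end != '\0' && !isspace((unsigned char)*end))
            break;                          /* a token like "8-edge" is not a counter */
        total += (long long)n;
        p = end;
    }
    return total;
}

/* Interrupts per second between two samples taken `seconds` apart. */
double irq_rate(long long before, long long after, double seconds)
{
    return (double)(after - before) / seconds;
}
```

Reading the NIC's line once, sleeping for the interval, reading it again and passing both totals to `irq_rate` gives a per-second figure comparable to the ones quoted in the message. It counts interrupts delivered on that one line only, so it does not by itself distinguish MSI from legacy interrupts or from IPIs, which appear on other lines.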
Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
Robert Walsh wrote, >I'll give it a spin this afternoon: it looks quite a bit more >comprehensive than the small patch I did. I also just tried running the ib_rdma_bw test and it seems to be flaky if you stress it. If you just run the defaults, it seems to work, but if you crank up the iterations and the message size, it sometimes fails with:

[EMAIL PROTECTED] bin]$ ./ib_rdma_bw -n 1 -t 1000 -s 200 rkl-12
4730: | port=18515 | ib_port=1 | size=200 | tx_depth=1000 | iters=1 | duplex=0 | cma=0 |
4730: Local address: LID 0x03, QPN 0x001d, PSN 0x9e070c RKey 0x2302400 VAddr 0x2a95dd3480
4730: Remote address: LID 0x04, QPN 0x001e, PSN 0x2bd6ba, RKey 0x2402500 VAddr 0x2a95c85480
4730:main: Completion with error at client:
4730:main: Failed status 9: wr_id 3
4730:main: scnt=7584, ccnt=6584
[EMAIL PROTECTED] bin]$

woody
Re: [openib-general] [-mm patch] drivers/infiniband/hw/amso1100/: possible cleanups
Thanks, I've rolled this up in the amso1100 patch I have queued up. > - #if 0 the following unused global function: > - c2_mq.c: c2_mq_count() Tom/Steve, any reason to keep c2_mq_count() at all? - R.
Re: [openib-general] [PATCH] for-2.6.19 cma: protect against adding device during destruction
Thanks, queued for 2.6.19.
Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
Robert Walsh wrote: >Arlin Davis wrote: >>Robert, >> >>Here is a slightly modified patch for your attributes issue. Can you give it >>a try? > >I'll give it a spin this afternoon: it looks quite a bit more >comprehensive than the small patch I did. > >Regards, > Robert. Just added all appropriate RDMA in/out fields and some code to zero out the structure to avoid uninitialized data fields. -arlin
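The zero-then-fill pattern mentioned here can be sketched in standalone C. The patch itself calls `dapl_os_memzero`; the sketch below uses the standard `memset` instead, and the structure `demo_ia_attr` and function `demo_query` are hypothetical stand-ins, not the real DAT attribute structure or query routine.

```c
#include <string.h>

/* Hypothetical stand-in for a provider attribute structure. */
struct demo_ia_attr {
    char adapter_name[16];
    int  max_eps;
    int  max_rdma_read_in;
    int  max_rdma_read_out;   /* a field a query routine might forget to set */
};

/* Fill only the fields this (hypothetical) provider knows about. */
void demo_query(struct demo_ia_attr *attr, int max_qp, int max_rd_atom)
{
    /* Clear first, so every field starts from a known value. */
    memset(attr, 0, sizeof(*attr));

    strncpy(attr->adapter_name, "demo0", sizeof(attr->adapter_name) - 1);
    attr->max_eps = max_qp;
    attr->max_rdma_read_in = max_rd_atom;
    /* max_rdma_read_out is deliberately left unset: it reads back as 0,
     * not as whatever happened to be in the caller's memory. */
}
```

Without the initial clear, a caller that passes an uninitialized structure would see arbitrary values in any field the routine does not assign, which is the kind of problem the message says the added zeroing is meant to avoid.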
Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
> Just added all appropriate RDMA in/out fields and some code to zero out > the structure to avoid uninitialized data fields. Yup. By "comprehensive", I meant "better" :-)
Re: [openib-general] [-mm patch] drivers/infiniband/hw/amso1100/: possible cleanups
It's old debug code that isn't used anywhere. It would be nice to keep it around, but if you really don't want it, nuke it... On Tue, 2006-09-05 at 14:57 -0700, Roland Dreier wrote: > Thanks, I've rolled this up in the amso1100 patch I have queued up. > > > - #if 0 the following unused global function: > > - c2_mq.c: c2_mq_count() > > Tom/Steve, any reason to keep c2_mq_count() at all? > > - R.
Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
Arlin Davis wrote: > Robert, > > Here is a slightly modified patch for your attributes issue. Can you give it > a try? Oddly enough, I'm back to the same problem with your new patch as I saw with the unpatched version:

$ mpiexec -n 2 ./a.out
I_MPI: [1] MPIDI_CH3I_RDMA_init(): will use DAPL provider from registry: OpenIB-cma
I_MPI: [0] MPIDI_CH3I_RDMA_init(): will use DAPL provider from registry: OpenIB-cma
I_MPI: [0] MPIDI_CH3_Init(): I_MPI: [1] MPIDI_CH3_Init(): will use rdma configuration
will use rdma configuration
[1:ib-idev-06][rdma_iba_init_d.c:154] error(0x60029): OpenIB-cma: could not create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6)
Hello world: rank 0 of 2 running on ib-idev-05
rank 1 in job 1 ib-idev-05_51891 caused collective abort of all ranks
exit status of rank 1: killed by signal 9

Still tracking this one down. I noticed in the patch you removed a couple of lines, too:

- ia_attr->max_rdma_read_per_ep = dev_attr.max_qp_rd_atom;

Any particular reason why you did this? Regards, Robert.
Re: [openib-general] [-mm patch] drivers/infiniband/hw/amso1100/: possible cleanups
Steve> Its old debug code that isn't used anywhere. It would be Steve> nice to keep it around, but if you really don't want it, Steve> nuke it... No, that's fine, I'll leave it inside the #if 0. - R.
Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
>Oddly enough, I'm back to the same problem with your new patch as I saw
>with the unpatched version:

Hmmm. We ran this with OFED 1.1 RC3 and MPI 3.0b on an EM64T server with your adapter and it worked. Did you ever pick up the Intel MPI 3.0 beta?

>
> $ mpiexec -n 2 ./a.out
> I_MPI: [1] MPIDI_CH3I_RDMA_init(): will use DAPL provider from registry: OpenIB-cma
> I_MPI: [0] MPIDI_CH3I_RDMA_init(): will use DAPL provider from registry: OpenIB-cma
> I_MPI: [0] MPIDI_CH3_Init(): I_MPI: [1] MPIDI_CH3_Init(): will use rdma configuration
> will use rdma configuration
> [1:ib-idev-06][rdma_iba_init_d.c:154] error(0x60029): OpenIB-cma: could not create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6)
> Hello world: rank 0 of 2 running on ib-idev-05
> rank 1 in job 1 ib-idev-05_51891 caused collective abort of all ranks
> exit status of rank 1: killed by signal 9
>
>Still tracking this one down. I noticed in the patch you removed a
>couple of lines, too:
>
> - ia_attr->max_rdma_read_per_ep = dev_attr.max_qp_rd_atom;
>
>Any particular reason why you did this?

max_rdma_read_per_ep is the same as max_rdma_read_per_ep_in. Look at dat.h line #369

/* To support backwards compatibility for DAPL-1.0 */
#define max_rdma_read_per_ep max_rdma_read_per_ep_in
#define DAT_IA_FIELD_IA_MAX_DTO_PER_OP DAT_IA_FIELD_IA_MAX_DTO_PER_EP_IN
/* To support backwards compatibility for DAPL-1.0 & DAPL-1.1 */
#define max_mtu_size max_message_size

-arlin
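A small standalone C sketch of why the removed line is redundant, assuming only the macro quoted from dat.h. The structure `demo_alias_attr` and the function `set_legacy` are hypothetical and exist just to show the preprocessor aliasing; they are not part of DAT.

```c
/* Hypothetical miniature of the attribute struct; only the aliasing matters. */
struct demo_alias_attr {
    int max_rdma_read_per_ep_in;
    int max_rdma_read_per_ep_out;
};

/* Backwards-compatibility alias in the style quoted from dat.h. */
#define max_rdma_read_per_ep max_rdma_read_per_ep_in

/* Code written against the old name still compiles and writes the new field. */
void set_legacy(struct demo_alias_attr *attr, int value)
{
    /* The preprocessor rewrites this to attr->max_rdma_read_per_ep_in. */
    attr->max_rdma_read_per_ep = value;
}
```

Because the old name expands to the `_in` member, an assignment through `max_rdma_read_per_ep` and one through `max_rdma_read_per_ep_in` touch the same storage, so a patch that sets the `_in` field explicitly loses nothing by dropping the old spelling.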
Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
>> Oddly enough, I'm back to the same problem with your new patch as I saw >> with the unpatched version: > > Hmmm. We ran this with OFED 1.1 RC3 and MPI 3.0b on an EM64T server with your > adapter and it worked. Weird - it's not working for me at all. Maybe I'm messing up somewhere. I've got a meeting for the next hour or so - I'll check again when I get back. > Did you ever pick up the Intel MPI 3.0 beta? Yup. > max_rdma_read_per_ep is the same as max_rdma_read_per_ep_in. Ah - fair enough.
[openib-general] [Bug 218] New: Call usage verifier is detecting reinitialization of spinlocks already in use
http://openib.org/bugzilla/show_bug.cgi?id=218

Summary: Call usage verifier is detecting reinitialization of spinlocks already in use
Product: OpenFabrics Windows
Version: unspecified
Platform: X86
OS/Version: Other
Status: NEW
Severity: major
Priority: P2
Component: mthca driver
AssignedTo: [EMAIL PROTECTED]
ReportedBy: [EMAIL PROTECTED]

I built a debug version of revision 467 and turned on call usage verifier (CUV) for the mthca driver. It's detecting many cases of spinlocks being initialized after they have already been used. This is usually bad.

To build with CUV all you have to do is add the following line to the sources file.

VERIFIER_DDK_EXTENSIONS=1

My experience is CUV tends to detect a different set of bugs from driver verifier, and it might be useful to turn on CUV for all the drivers and see what's reported.

CUV Driver Error: Calling KeInitializeSpinLock(...) at File k:\windows-openib\src\winib-467b\hw\mthca\kernel\mt_spinlock.h, Line 57
The Spin lock specified as parameter 1 [0x87840EDC] has been previously initialized and used as a In-Stack Queued Spin lock by this driver.
Break, Ignore, Zap, Remove, Disable all, H for help (bizrdh)? b
b
Breaking in... (press g to return to assert menu)
Break instruction exception - code 8003 (first chance)
nt!DbgBreakPoint:
8075cc00 cc int 3
0: kd> k 50
ChildEBP RetAddr
f7926438 baeab189 nt!DbgBreakPoint
f7926450 baeaa814 mthca!DDKExtPrompt+0x10a [d:\dnsrv\sdktools\ddk\ddk_ext\verifier\messages.cpp @ 709]
f7926468 baea990e mthca!DDKExtVInitializeItem+0x98 [d:\dnsrv\sdktools\ddk\ddk_ext\verifier\validate.cpp @ 195]
f7926490 bae81635 mthca!DDK_KeInitializeSpinLock+0x35 [d:\dnsrv\sdktools\ddk\ddk_ext\verifier\locks.cpp @ 298]
f79264a4 baea42ee mthca!spin_lock_init+0x15 [k:\windows-openib\src\winib-467b\hw\mthca\kernel\mt_spinlock.h @ 58]
f79264b0 baea4057 mthca!mthca_wq_init+0xe [k:\windows-openib\src\winib-467b\hw\mthca\kernel\mthca_qp.c @ 383]
f792653c bae7eaac mthca!mthca_modify_qp+0xe97 [k:\windows-openib\src\winib-467b\hw\mthca\kernel\mthca_qp.c @ 853]
f7926550 bae76eaa mthca!ibv_modify_qp+0x1c [k:\windows-openib\src\winib-467b\hw\mthca\kernel\mt_verbs.c @ 467]
f7926628 ba99e0f3 mthca!mlnx_modify_qp+0x11a [k:\windows-openib\src\winib-467b\hw\mthca\kernel\hca_verbs.c @ 955]
f792673c ba99df12 ibbus!al_modify_qp+0x113 [k:\windows-openib\src\winib-467b\core\al\al_qp.c @ 1346]
f7926760 ba99d7b8 ibbus!modify_qp+0x502 [k:\windows-openib\src\winib-467b\core\al\al_qp.c @ 1313]
f7926778 ba99eef5 ibbus!ib_modify_qp+0x18 [k:\windows-openib\src\winib-467b\core\al\al_qp.c @ 1288]
f7926848 ba99ec9e ibbus!init_dgrm_svc+0x175 [k:\windows-openib\src\winib-467b\core\al\al_qp.c @ 1453]
f7926870 ba96d005 ibbus!ib_init_dgrm_svc+0x73e [k:\windows-openib\src\winib-467b\core\al\al_qp.c @ 1395]
f7926c4c ba969fd8 ibbus!create_spl_qp_svc+0x18a5 [k:\windows-openib\src\winib-467b\core\al\kernel\al_smi.c @ 718]
f7926c78 ba969a45 ibbus!spl_qp_agent_pnp+0x128 [k:\windows-openib\src\winib-467b\core\al\kernel\al_smi.c @ 476]
f7926c8c ba98f071 ibbus!spl_qp0_agent_pnp_cb+0x95 [k:\windows-openib\src\winib-467b\core\al\kernel\al_smi.c @ 429]
f7926cf4 ba98f2e8 ibbus!__pnp_notify_user+0x561 [k:\windows-openib\src\winib-467b\core\al\kernel\al_pnp.c @ 523]
f7926d38 ba990e7c ibbus!__pnp_port_notify+0x118 [k:\windows-openib\src\winib-467b\core\al\kernel\al_pnp.c @ 612]
f7926d70 ba94d8a4 ibbus!__pnp_process_add_ca+0x2dc [k:\windows-openib\src\winib-467b\core\al\kernel\al_pnp.c @ 943]
f7926d8c ba953b94 ibbus!__cl_async_proc_worker+0x94 [k:\windows-openib\src\winib-467b\core\complib\cl_async_proc.c @ 153]
f7926da0 ba955c4c ibbus!__cl_thread_pool_routine+0x54 [k:\windows-openib\src\winib-467b\core\complib\cl_threadpool.c @ 67]
f7926dac 80a07678 ibbus!__thread_callback+0x2c [k:\windows-openib\src\winib-467b\core\complib\kernel\cl_thread.c @ 49]
f7926ddc 80781346 nt!PspSystemThreadStartup+0x2e
nt!KiThreadStartup+0x16
0: kd> g

--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.
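The stack above shows the lock being initialized from a work-queue init routine that is reached through modify_qp, i.e. on a state transition rather than at creation. One common way to avoid that class of warning, sketched here in portable C11 rather than with the Windows kernel spinlock API, is to split one-time setup from reset. The names `demo_wq`, `demo_wq_create` and `demo_wq_reset` are hypothetical, and this is not the fix that was applied to the mthca driver.

```c
#include <stdatomic.h>

/* Hypothetical work-queue object guarded by a minimal spinlock. */
struct demo_wq {
    atomic_flag lock;
    unsigned    next_ind;
    unsigned    last_comp;
};

/* One-time setup: the only place the lock itself is initialized. */
void demo_wq_create(struct demo_wq *wq)
{
    atomic_flag_clear(&wq->lock);
    wq->next_ind = 0;
    wq->last_comp = 0;
}

/* Reset on a state transition: takes the lock, never re-initializes it. */
void demo_wq_reset(struct demo_wq *wq)
{
    while (atomic_flag_test_and_set_explicit(&wq->lock, memory_order_acquire))
        ;                                   /* spin until the holder releases */
    wq->next_ind = 0;
    wq->last_comp = 0;
    atomic_flag_clear_explicit(&wq->lock, memory_order_release);
}
```

With this split, a routine that runs every time the queue changes state only resets the indices under the lock, so a lock that another thread may currently hold is never overwritten.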
Re: [openib-general] Question about interrupt generation
Hi, One more question: what kind of event mask helps mask the interrupts? Thanks, harish On 9/5/06, harish <[EMAIL PROTECTED]> wrote: Hi All, I tried the following simple experiment and am not able to understand the results. I calculated the number of interrupts generated by the InfiniBand NIC [with little or no traffic to the NIC] over a period of 10 seconds and saw around 10-20 interrupts/sec. I then ran a netperf test and saw 100K+ interrupts/sec. This screwed up my host machine. To reduce the impact of the interrupts, I added a callback that is scheduled to run every few microseconds; it masks the IRQ line used by the NIC and unmasks it a little later. With the callback in place and no traffic, I see anywhere between 30-50K interrupts/sec. With the netperf traffic, I see 120K+ interrupts/sec. I am a newbie to InfiniBand technology and do not understand why so many interrupts are generated when my callback is called periodically. Could it be that the InfiniBand adapter supports MSI? Or is what I am seeing IPIs? Or does InfiniBand generate interrupts based on types of events, and masking/unmasking the interrupt line is one such event? Any information/suggestions would be useful. Thanks in advance, harish
Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
Woodruff, Robert J wrote: > Robert Walsh wrote, >> I'll give it a spin this afternoon: it looks quite a bit more >> comprehensive than the small patch I did. > > I also just tried running the ib_rdma_bw test and it seems to > be flaky if you stress it. If you just run the defaults, it seems to > work, but if you crank up the iterations and the message size, > it sometimes fails with. > > [EMAIL PROTECTED] bin]$ ./ib_rdma_bw -n 1 -t 1000 -s 200 rkl-12 > 4730: | port=18515 | ib_port=1 | size=200 | tx_depth=1000 | > iters=1 | duplex=0 | cma=0 | > 4730: Local address: LID 0x03, QPN 0x001d, PSN 0x9e070c RKey 0x2302400 > VAddr 0x2a95dd3480 > 4730: Remote address: LID 0x04, QPN 0x001e, PSN 0x2bd6ba, RKey 0x2402500 > VAddr 0x2a95c85480 > 4730:main: Completion with error at client: > 4730:main: Failed status 9: wr_id 3 > 4730:main: scnt=7584, ccnt=6584 > [EMAIL PROTECTED] bin]$ This looks like a known bug, the fix to which didn't make it into OFED 1.1-RC3. Hopefully we can still get this into 1.1-RC4. Regards, Robert.
Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
> Here is a slightly modified patch for your attributes issue. Can you give it > a try? I rebuilt OFED from scratch with the patch, and ran successfully on Intel MPI 2.0.1 with the refresh patch. I could not get it to run on Intel MPI 3.0b. If you could verify that the fix you mentioned that is in the 2.0.1 refresh patch also made it into 3.0b, I'd appreciate it. If you have a later beta version you could send me, that would be great, too. Regards, Robert.
Re: [openib-general] [openfabrics-ewg] OFED 1.1-rc2 is ready (how do I enable madeye)?
> 5. Added Madeye utility How do I build madeye? I don't see any reference to it in install.sh. Is there any documentation for madeye? Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems
[openib-general] [Bug 218] Call usage verifier is detecting reinitialization of spinlocks already in use
http://openib.org/bugzilla/show_bug.cgi?id=218 [EMAIL PROTECTED] changed: What|Removed |Added AssignedTo|[EMAIL PROTECTED] |[EMAIL PROTECTED] --- You are receiving this mail because: --- You are the assignee for the bug, or are watching the assignee.