On Mon, Apr 11, 2005 at 05:10:51PM -0700, Roland Dreier wrote:
>     ardavis> Redhat EL 4.0, 64-bit
>
> OK, I found a system with that distro installed, although I can't test
> the results of the build.  However, I built libmthca with the same
> CFLAGS that rpm seems to use, namely "-g -O2 -m64 -pipe".  I found
> that mthca_tavor_arm_cq() compiles to the following tiny fragment:
>
>     0000000000001d10 <mthca_tavor_arm_cq>:
>         1d10:   48 8b 07                mov    (%rdi),%rax
>         1d13:   48 8b 90 a8 ef ff ff    mov    0xffffffffffffefa8(%rax),%rdx
>         1d1a:   48 8b 44 24 f8          mov    0xfffffffffffffff8(%rsp),%rax
>         1d1f:   48 89 42 20             mov    %rax,0x20(%rdx)
>         1d23:   31 c0                   xor    %eax,%eax
>         1d25:   c3                      retq
>
> in other words, the compiler seems to be discarding all the
> assignments to doorbell[0] and doorbell[1].

doorbell[] is a local variable and mthca_write64() is static inline.
I don't see a problem with the assignments to doorbell getting optimized
out, since the scope of that variable is completely visible to gcc.
A smart compiler would just use registers and reduce the 32-bit stores.

I do see a problem with the "(notify == IB_CQ_SOLICITED ? ....)" code
getting optimized away. "notify" is passed in as a parameter (not a
constant) and the function is only invoked through an indirect function
call. I don't see how gcc could know what value "notify" will have and
optimize the test away.

Hrm...maybe the bug is that "notify" is somehow overloaded to a constant.
You'd have to look at the intermediate "-E" (preprocessed) output.

> I'm not sure if this is a compiler bug or what -- I need to
> investigate further.
> In any case
> can you try the following patch to libmthca and see if it fixes
> things:
>
> Index: src/cq.c
> ===================================================================
> --- src/cq.c  (revision 2156)
> +++ src/cq.c  (working copy)
> @@ -441,6 +441,8 @@ int mthca_tavor_arm_cq(struct ibv_cq *cq
>                 to_mcq(cq)->cqn);
>         doorbell[1] = 0xffffffff;
>
> +       mb();
> +
>         mthca_write64(doorbell, to_mctx(cq->context), MTHCA_CQ_DOORBELL);

I don't get how this fixes the problem. mthca_write64() uses a spinlock,
and I thought that already has to enforce some sort of memory/instruction
ordering. I'm sketchy on the details and can't look them up right now.

hth,
grant
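
For reference, here is a minimal sketch of the pattern being discussed: a local two-word doorbell[] array is filled in and then written out as one 64-bit value. It is not libmthca code. The names fake_doorbell_reg, write64_sketch() and arm_sketch() are hypothetical, and byte-order conversion is left out. One way a compiler can legitimately drop such stores is if the 64-bit write reads the array through a pointer of a different type. Whether mthca_write64() does that has not been checked here, so treat it as an assumption. Copying with memcpy() avoids the question either way.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 64-bit doorbell register (not libmthca code). */
static volatile uint64_t fake_doorbell_reg;

/*
 * Combine two 32-bit words into one 64-bit store.  memcpy() is used
 * instead of dereferencing a casted pointer, so the compiler has to
 * treat the earlier stores to val[] as live.
 */
static inline void write64_sketch(const uint32_t val[2],
                                  volatile uint64_t *reg)
{
    uint64_t tmp;

    memcpy(&tmp, val, sizeof tmp);
    *reg = tmp;   /* single 64-bit store to the (fake) register */
}

/*
 * Hypothetical arm function.  'cmd' is a runtime parameter, playing the
 * role of the value selected by the notify test in the thread, so the
 * compiler cannot fold it to a constant.
 */
static int arm_sketch(uint32_t cmd, uint32_t cqn)
{
    uint32_t doorbell[2];

    doorbell[0] = cmd | cqn;
    doorbell[1] = 0xffffffff;

    write64_sketch(doorbell, &fake_doorbell_reg);
    return 0;
}
```

Building this with "-O2" and running objdump -d on the result should show both doorbell words feeding the final 64-bit store, which gives something to compare against the fragment quoted above.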
