I was linked against an old copy of ibverbs; I reconfigured and it
compiled just fine.
(I had also messed up the declaration in ib.h.)
Got it to compile and I'll start testing it to see how close we are
getting to nailing this issue.


~~Kyle

On Mon, Mar 10, 2008 at 1:51 PM, Kyle Schochenmaier <[EMAIL PROTECTED]> wrote:
> Pete -
>  I am trying to hack together a test case to implement what we had
>  talked about in the previous emails with a wr_credit...
>  I'm trying to keep track of it in the openib_device (od) structure
>  inside openib.c and would like to keep the necessary changes inside
>  openib.c if at all possible.  The problem I'm running into is that
>  I'm going to need to call check_cq() from inside the send_rdma writes
>  function, which lies in openib.c, not ib.c.  openib.c has a
>  function for this, but it's really intended to work *with* ib.c's
>  check_cq() functionality...
>  In order to get around this I needed to make ib_check_cq() visible to
>  openib.c (got rid of the static and added a declaration to ib.h),
>  but I'm getting an undefined reference when linking (error below).
>
>  Any ideas how to get around this?
>
>  lib/libpvfs2-server.a(bmi-server.o):(.rodata+0x780): undefined
>  reference to `bmi_ib_ops'
>  collect2: ld returned 1 exit status
>  make: *** [src/server/pvfs2-server] Error 1
>
>  (I've attached a very rudimentary patch that sort of gets at what I'm
>  trying to do; not sure if it's correct yet, still trying to compile)
>
>  On Fri, Mar 7, 2008 at 11:26 AM, Troy Benjegerdes <[EMAIL PROTECTED]> wrote:
>  > For further information, the diff to libmthca we used to figure this
>  >  out, and some extensive logfiles of the problem occurring on the servers
>  >  with full PVFS2_DEBUGMASK=network are at:
>  >
>  >  http://scl.ameslab.gov/~troy/pvfs/ibv_post_send/
>  >
>  >  (This is the error showing ibv_post_send failing with the -1001 error
>  >  code I added)
>  >
>  >  [D 12:33:00.861561] PVFS2 Server version 2.7.1pre1-2008-03-05-215140
>  >  starting.
>  >  [E 13:51:39.436430] openib_post_sr_rdmaw: ibv_post_send failed ret:
>  >  -1001 errno: 0
>  >  [E 13:51:39.445031]  wr_id: 0x0 next: (nil) sg_list 0x65bb30 num_sge 1
>  >  [E 13:51:39.445073]  opcode: 0x0 send_flags: 0x0 imm_data: 0x0
>  >  [E 13:51:39.445091]  sr.wr.rdma.remote_addr: 0xf509c000 rkey 0x300055
>  >  [E 13:51:39.445195] openib_post_sr_rdmaw: QP_request sge: 1
>  >  [E 13:51:39.445249] Error: openib_post_sr_rdmaw: QP_sge: 28
>  >  : Unknown error 18446744073709550615.
>  >
>  >  Included in the logfiles is an attempt where I just tried to repost the
>  >  send after 100 us for 10 retries, but that didn't seem to help. I am
>  >  wondering if something like calling the ib_poll_cq function needs
>  >  to be done to make some progress, or if there's some other way to back
>  >  off out of openib_post_wr_rdma when the queues are full.
>  >
>  >
>  >
>  >  Kyle Schochenmaier wrote:
>  >  > Pete -
>  >  >
>  >  > We're still trying to track down this "bug" with how we use our ib
>  >  > nics with pvfs2.  This is a continuation of previous emails regarding
>  >  > failures inside the openib_post_wr_rdma() functions for openib.c where
>  >  > we get into a situation with running out of wq entries on the server
>  >  > nics with multiple client processes hammering the filesystem.  We only
>  >  > see the servers going out here, the client ends up with a timeout
>  >  > eventually.
>  >  >
>  >  > Troy and I have tracked the specific resources that were being
>  >  > reported down to the driver (unfortunately they share the same
>  >  > -errno codes).
>  >  > So as far as we can tell, every time we get into this situation, we
>  >  > get a wq_overflow() from the driver, and then of course the post_send
>  >  > fails, leading to problems that are unrecoverable.  We are pretty sure
>  >  > that we're just running into the hw constraints of our nics, and that
>  >  > the best way to deal with this type of thing would be to create some
>  >  > sort of ib_flush_outgoing_requests() functionality for pvfs2 that
>  >  > would either implement a backoff mechanism for the send requests to
>  >  > wait for things at the nic level to be processed, or would just
>  >  > 'flush' everything out.. We're not sure exactly how to go about this,
>  >  > where or if this would be appropriate, or if we're missing something
>  >  > obvious..
>  >  > Can we recover from this elegantly?
>  >  >
>  >  > What does everyone think?
>  >  >
>  >  > ~~Kyle
>  >  >
>  >  >
>  >  > Included is the path from pvfs2 to what we are seeing in the driver:
>  >  >
>  >  > pvfs2/src/io/bmi/bmi_ib/openib.c
>  >  >
>  >  > static void openib_post_sr_rdmaw(struct ib_work *sq, msg_header_cts_t *mh_cts,
>  >  >                                  void *mh_cts_buf)
>  >  > {
>  >  > <snip>
>  >  >
>  >  >         ret = ibv_post_send(oc->qp, &sr, &bad_wr);
>  >  >
>  >  > <snip>
>  >  > }
>  >  >
>  >  > -------------------------
>  >  > ibv_post_send()  points to this function for memfull mellanox cards in
>  >  > libmthca-*/src/qp.c
>  >  > -------------------------
>  >  > int mthca_tavor_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
>  >  >                           struct ibv_send_wr **bad_wr)
>  >  >
>  >  > {
>  >  >         struct mthca_qp *qp = to_mqp(ibqp);
>  >  >         void *wqe, *prev_wqe;
>  >  >         int ind;
>  >  >         int nreq;
>  >  >         int ret = 0;
>  >  >         int size;
>  >  >         int size0 = 0;
>  >  >         int i;
>  >  >         /*
>  >  >          * f0 and op0 cannot be used unless nreq > 0, which means this
>  >  >          * function makes it through the loop at least once.  So the
>  >  >          * code inside the if (!size0) will be executed, and f0 and
>  >  >          * op0 will be initialized.  So any gcc warning about "may be
>  >  >          * used uninitialized" is bogus.
>  >  >          */
>  >  >         uint32_t f0;
>  >  >         uint32_t op0;
>  >  >
>  >  >         pthread_spin_lock(&qp->sq.lock);
>  >  >
>  >  >         ind = qp->sq.next_ind;
>  >  >
>  >  >         for (nreq = 0; wr; ++nreq, wr = wr->next) {
>  >  > ******                if (wq_overflow(&qp->sq, nreq, to_mcq(qp->ibv_qp.send_cq))) {
>  >  >                         ret = -1;
>  >  >                         *bad_wr = wr;
>  >  >                         goto out;
>  >  >                 }
>  >  >
>  >  > <snip>
>  >  >
>  >  >
>  >  >
>  >
>  >
>
>
>
>  --
>  Kyle Schochenmaier
>
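
For reference, here is a minimal sketch of the wr_credit idea discussed in
the quoted messages: keep a count of free send-queue slots, and when it hits
zero, poll the completion queue to reclaim slots before posting again. This
is only a model under stated assumptions, not PVFS2 or libibverbs code. Every
name in it (sim_device, sim_post_send, sim_poll_cq, sender_post, ...) is
hypothetical, and the "device" is a pair of counters standing in for a real
QP and CQ. It assumes a slot is only freed once its completion has been
polled, and that all polling goes through the sender so the credit count
stays accurate.

```c
#include <assert.h>

/* Hypothetical stand-in for one QP's send queue plus its CQ. */
struct sim_device {
    int sq_depth;   /* capacity of the send work queue */
    int in_flight;  /* posted, not yet finished by the "hardware" */
    int completed;  /* finished, completion not yet polled */
};

/* Model of posting one send: returns -1 when the queue is full,
 * mirroring the overflow path in the quoted driver code. */
static int sim_post_send(struct sim_device *d)
{
    if (d->in_flight + d->completed >= d->sq_depth)
        return -1;
    d->in_flight++;
    return 0;
}

/* Model of the hardware finishing up to n in-flight requests. */
static void sim_hw_progress(struct sim_device *d, int n)
{
    if (n > d->in_flight)
        n = d->in_flight;
    d->in_flight -= n;
    d->completed += n;
}

/* Model of polling the CQ: reaps all pending completions,
 * freeing their slots, and returns how many were reaped. */
static int sim_poll_cq(struct sim_device *d)
{
    int n = d->completed;
    d->completed = 0;
    return n;
}

/* Sender-side bookkeeping: wr_credit is the number of slots
 * we believe are free on the device. */
struct sender {
    struct sim_device *dev;
    int wr_credit;
};

static void sender_init(struct sender *s, struct sim_device *d)
{
    s->dev = d;
    s->wr_credit = d->sq_depth;
}

/* Returns 0 if the request was posted, 1 if the caller should
 * back off and retry later (no credit even after polling). */
static int sender_post(struct sender *s)
{
    int ret;

    if (s->wr_credit == 0)
        s->wr_credit += sim_poll_cq(s->dev);  /* try to reclaim slots */
    if (s->wr_credit == 0)
        return 1;                             /* still full: back off */

    ret = sim_post_send(s->dev);
    assert(ret == 0);  /* a held credit implies a free slot in this model */
    (void)ret;
    s->wr_credit--;
    return 0;
}
```

With a queue depth of 2, two calls to sender_post() succeed, a third returns
1 until sim_hw_progress() has completed at least one request, and the next
call then reclaims that slot and posts. The point of the design is that the
sender never reaches the failing post at all; it finds out it is out of
credits first and can queue the request or return to the progress loop
instead of retrying on a timer.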



-- 
Kyle Schochenmaier
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
