Hi,

Could someone have a look at these two different error messages? I'd like to
know why they were displayed and what they actually mean.

Thanks,
Eloi

On Monday 19 July 2010 16:38:57 Eloi Gaudry wrote:
> Hi,
> 
> I've been working on a random segmentation fault that seems to occur during
> a collective communication when using the openib btl (see [OMPI users]
> [openib] segfault when using openib btl).
> 
> During my tests, I've come across different issues reported by
> OpenMPI-1.4.2:
> 
> 1/
> [[12770,1],0][btl_openib_component.c:3227:handle_wc] from bn0103 to: bn0122
> error polling LP CQ with status LOCAL LENGTH ERROR status number 1 for
> wr_id 560618664 opcode 1  vendor error 105 qp_idx 3
> 
> 2/
> [[992,1],6][btl_openib_component.c:3227:handle_wc] from pbn04 to: pbn05
> error polling LP CQ with status REMOTE ACCESS ERROR status number 10 for
> wr_id 162858496 opcode 1  vendor error 136 qp_idx 0
> [[992,1],5][btl_openib_component.c:3227:handle_wc] from pbn05 to: pbn04
> error polling HP CQ with status WORK REQUEST FLUSHED ERROR status number 5
> for wr_id 485900928 opcode 0  vendor error 249 qp_idx 0
> 
> --------------------------------------------------------------------------
> The OpenFabrics stack has reported a network error event.  Open MPI will
> try to continue, but your job may end up failing.
> 
>   Local host:        p'"
>   MPI process PID:   20743
>   Error number:      3 (IBV_EVENT_QP_ACCESS_ERR)
> 
> This error may indicate connectivity problems within the fabric; please
> contact your system administrator.
> --------------------------------------------------------------------------
> 
> I'd like to know what these two errors mean and where they come from.
> 
> Thanks for your help,
> Eloi

-- 


Eloi Gaudry

Free Field Technologies
Company Website: http://www.fft.be
Company Phone:   +32 10 487 959
