Nathan, I found these parameters under /sys/module/mlx4_core/parameters. How do you apply a changed value? What needs to be restarted or rebuilt?
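For reference, a minimal sketch of how the values currently in use could be listed from that sysfs directory. The function name `read_module_params` and its `sysfs_root` argument are illustrative only and not part of any driver tooling; the path layout is the one mentioned above.

```python
from pathlib import Path


def read_module_params(module="mlx4_core", sysfs_root="/sys/module"):
    """Return {parameter_name: value} for a loaded kernel module.

    Returns an empty dict if the module's parameters directory is absent
    (module not loaded, or not a Linux host).
    """
    param_dir = Path(sysfs_root) / module / "parameters"
    params = {}
    if not param_dir.is_dir():
        return params
    for entry in sorted(param_dir.iterdir()):
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            # Some parameters may not be readable by an unprivileged user.
            params[entry.name] = None
    return params


if __name__ == "__main__":
    for name, value in read_module_params().items():
        print(f"{name} = {value}")
```

This only inspects what the loaded module reports. Module parameters are normally supplied when the module is loaded (for example through an `options` line in a file under /etc/modprobe.d), so a changed value generally takes effect only after the module is reloaded; the exact procedure for this cluster's driver stack is not covered in the thread.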
-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Nathan Hjelm
Sent: Monday, September 12, 2011 11:00 AM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] qp memory allocation problem

I also recommend checking the log_mtts_per_seg parameter to the mlx4 module. This parameter controls how much memory can be registered for use by the mlx4 driver and it should be in the range 1-5 (or 0-7 depending on the version of the mlx4 driver). I recommend the parameter be set such that you can register all the memory on the node.

Assuming you are not using huge pages here is a list of possible values (and how much memory can be registered).

0 - 2 GB (new mlx4 default-- bad setting)
1 - 4 GB
2 - 8 GB
3 - 16 GB (old mlx4 default)
4 - 32 GB
5 - 64 GB
6 - 128 GB
7 - 256 GB

-Nathan Hjelm
Los Alamos National Laboratory

On Mon, 12 Sep 2011, Samuel K. Gutierrez wrote:

> Hi,
>
> This problem can be caused by a variety of things, but I suspect our
> default queue pair parameters (QP) aren't helping the situation :-).
>
> What happens when you add the following to your mpirun command?
>
> -mca btl_openib_receive_queues S,4096,128:S,12288,128:S,65536,12
>
> OMPI Developers:
>
> Maybe we should consider disabling the use of per-peer queue pairs by
> default. Do they buy us anything? For what it is worth, we have stopped
> using them on all of our large systems here at LANL.
>
> Thanks,
>
> Samuel K. Gutierrez
> Los Alamos National Laboratory
>
> On Sep 12, 2011, at 9:23 AM, Blosch, Edwin L wrote:
>
> I am getting this error message below and I don't know what it means or
> how to fix it. It only happens when I run on a large number of processes,
> e.g. 960. Things work fine on 480, and I don't think the application has
> a bug. Any help is appreciated...
>
> [c1n01][[30697,1],3][connect/btl_openib_connect_oob.c:464:qp_create_one] error creating qp errno says Cannot allocate memory
> [c1n01][[30697,1],4][connect/btl_openib_connect_oob.c:464:qp_create_one] error creating qp errno says Cannot allocate memory
>
> Here's the mpirun command I used:
> mpirun --prefix /usr/mpi/intel/openmpi-1.4.3 --machinefile <host file> -np 960 --mca btl ^tcp --mca mpool_base_use_mem_hooks 1 --mca mpi_leave_pinned 1 -x LD_LIBRARY_PATH -x MPI_ENVIRONMENT=1 <application name + arguments>
>
> Here's the applicable hardware from the /usr/mpi/intel/openmpi-1.4.3/share/openmpi/mca-btl-openib-device-params.ini file:
> # A.k.a. ConnectX
> [Mellanox Hermon]
> vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba,0x15b3,0x119f
> vendor_part_id = 25408,25418,25428,26418,26428,25448,26438,26448,26468,26478,26488
> use_eager_rdma = 1
> mtu = 2048
> max_inline_data = 128
>
> And this is the output of ompi_info -param btl openib:
> MCA btl: parameter "btl_base_verbose" (current value: "0", data source: default value)
>     Verbosity level of the BTL framework
> MCA btl: parameter "btl" (current value: <none>, data source: default value)
>     Default selection set of components for the btl framework (<none> means use all components that can be found)
> MCA btl: parameter "btl_openib_verbose" (current value: "0", data source: default value)
>     Output some verbose OpenIB BTL information (0 = no output, nonzero = output)
> MCA btl: parameter "btl_openib_warn_no_device_params_found" (current value: "1", data source: default value, synonyms: btl_openib_warn_no_hca_params_found)
>     Warn when no device-specific parameters are found in the INI file specified by the btl_openib_device_param_files MCA parameter (0 = do not warn; any other value = warn)
> MCA btl: parameter "btl_openib_warn_no_hca_params_found" (current value: "1", data source: default value, deprecated, synonym of: btl_openib_warn_no_device_params_found)
>     Warn when no device-specific parameters are found in the INI file specified by the btl_openib_device_param_files MCA parameter (0 = do not warn; any other value = warn)
> MCA btl: parameter "btl_openib_warn_default_gid_prefix" (current value: "1", data source: default value)
>     Warn when there is more than one active ports and at least one of them connected to the network with only default GID prefix configured (0 = do not warn; any other value = warn)
> MCA btl: parameter "btl_openib_warn_nonexistent_if" (current value: "1", data source: default value)
>     Warn if non-existent devices and/or ports are specified in the btl_openib_if_[in|ex]clude MCA parameters (0 = do not warn; any other value = warn)
> MCA btl: parameter "btl_openib_want_fork_support" (current value: "-1", data source: default value)
>     Whether fork support is desired or not (negative = try to enable fork support, but continue even if it is not available, 0 = do not enable fork support, positive = try to enable fork support and fail if it is not available)
> MCA btl: parameter "btl_openib_device_param_files" (current value: "/usr/mpi/intel/openmpi-1.4.3/share/openmpi/mca-btl-openib-device-params.ini", data source: default value, synonyms: btl_openib_hca_param_files)
>     Colon-delimited list of INI-style files that contain device vendor/part-specific parameters
> MCA btl: parameter "btl_openib_hca_param_files" (current value: "/usr/mpi/intel/openmpi-1.4.3/share/openmpi/mca-btl-openib-device-params.ini", data source: default value, deprecated, synonym of: btl_openib_device_param_files)
>     Colon-delimited list of INI-style files that contain device vendor/part-specific parameters
> MCA btl: parameter "btl_openib_device_type" (current value: "all", data source: default value)
>     Specify to only use IB or iWARP network adapters (infiniband = only use InfiniBand HCAs; iwarp = only use iWARP NICs; all = use any available adapters)
> MCA btl: parameter "btl_openib_max_btls" (current value: "-1", data source: default value)
>     Maximum number of device ports to use (-1 = use all available, otherwise must be >= 1)
> MCA btl: parameter "btl_openib_free_list_num" (current value: "8", data source: default value)
>     Intial size of free lists (must be >= 1)
> MCA btl: parameter "btl_openib_free_list_max" (current value: "-1", data source: default value)
>     Maximum size of free lists (-1 = infinite, otherwise must be >= 0)
> MCA btl: parameter "btl_openib_free_list_inc" (current value: "32", data source: default value)
>     Increment size of free lists (must be >= 1)
> MCA btl: parameter "btl_openib_mpool" (current value: "rdma", data source: default value)
>     Name of the memory pool to be used (it is unlikely that you will ever want to change this
> MCA btl: parameter "btl_openib_reg_mru_len" (current value: "16", data source: default value)
>     Length of the registration cache most recently used list (must be >= 1)
> MCA btl: parameter "btl_openib_cq_size" (current value: "1000", data source: default value, synonyms: btl_openib_ib_cq_size)
>     Size of the OpenFabrics completion queue (will automatically be set to a minimum of (2 * number_of_peers * btl_openib_rd_num))
> MCA btl: parameter "btl_openib_ib_cq_size" (current value: "1000", data source: default value, deprecated, synonym of: btl_openib_cq_size)
>     Size of the OpenFabrics completion queue (will automatically be set to a minimum of (2 * number_of_peers * btl_openib_rd_num))
> MCA btl: parameter "btl_openib_max_inline_data" (current value: "-1", data source: default value, synonyms: btl_openib_ib_max_inline_data)
>     Maximum size of inline data segment (-1 = run-time probe to discover max value, otherwise must be >= 0). If not explicitly set, use max_inline_data from the INI file containing device-specific parameters
> MCA btl: parameter "btl_openib_ib_max_inline_data" (current value: "-1", data source: default value, deprecated, synonym of: btl_openib_max_inline_data)
>     Maximum size of inline data segment (-1 = run-time probe to discover max value, otherwise must be >= 0). If not explicitly set, use max_inline_data from the INI file containing device-specific parameters
> MCA btl: parameter "btl_openib_pkey" (current value: "0", data source: default value, synonyms: btl_openib_ib_pkey_val)
>     OpenFabrics partition key (pkey) value. Unsigned integer decimal or hex values are allowed (e.g., "3" or "0x3f") and will be masked against the maximum allowable IB paritition key value (0x7fff)
> MCA btl: parameter "btl_openib_ib_pkey_val" (current value: "0", data source: default value, deprecated, synonym of: btl_openib_pkey)
>     OpenFabrics partition key (pkey) value. Unsigned integer decimal or hex values are allowed (e.g., "3" or "0x3f") and will be masked against the maximum allowable IB paritition key value (0x7fff)
> MCA btl: parameter "btl_openib_psn" (current value: "0", data source: default value, synonyms: btl_openib_ib_psn)
>     OpenFabrics packet sequence starting number (must be >= 0)
> MCA btl: parameter "btl_openib_ib_psn" (current value: "0", data source: default value, deprecated, synonym of: btl_openib_psn)
>     OpenFabrics packet sequence starting number (must be >= 0)
> MCA btl: parameter "btl_openib_ib_qp_ous_rd_atom" (current value: "4", data source: default value)
>     InfiniBand outstanding atomic reads (must be >= 0)
> MCA btl: parameter "btl_openib_mtu" (current value: "3", data source: default value, synonyms: btl_openib_ib_mtu)
>     OpenFabrics MTU, in bytes (if not specified in INI files). Valid values are: 1=256 bytes, 2=512 bytes, 3=1024 bytes, 4=2048 bytes, 5=4096 bytes
> MCA btl: parameter "btl_openib_ib_mtu" (current value: "3", data source: default value, deprecated, synonym of: btl_openib_mtu)
>     OpenFabrics MTU, in bytes (if not specified in INI files). Valid values are: 1=256 bytes, 2=512 bytes, 3=1024 bytes, 4=2048 bytes, 5=4096 bytes
> MCA btl: parameter "btl_openib_ib_min_rnr_timer" (current value: "25", data source: default value)
>     InfiniBand minimum "receiver not ready" timer, in seconds (must be >= 0 and <= 31)
> MCA btl: parameter "btl_openib_ib_timeout" (current value: "20", data source: default value)
>     InfiniBand transmit timeout, plugged into formula: 4.096 microseconds * (2^btl_openib_ib_timeout)(must be >= 0 and <= 31)
> MCA btl: parameter "btl_openib_ib_retry_count" (current value: "7", data source: default value)
>     InfiniBand transmit retry count (must be >= 0 and <= 7)
> MCA btl: parameter "btl_openib_ib_rnr_retry" (current value: "7", data source: default value)
>     InfiniBand "receiver not ready" retry count; applies *only* to SRQ/XRC queues. PP queues use RNR retry values of 0 because Open MPI performs software flow control to guarantee that RNRs never occur (must be >= 0 and <= 7; 7 = "infinite")
> MCA btl: parameter "btl_openib_ib_max_rdma_dst_ops" (current value: "4", data source: default value)
>     InfiniBand maximum pending RDMA destination operations (must be >= 0)
> MCA btl: parameter "btl_openib_ib_service_level" (current value: "0", data source: default value)
>     InfiniBand service level (must be >= 0 and <= 15)
> MCA btl: parameter "btl_openib_use_eager_rdma" (current value: "-1", data source: default value)
>     Use RDMA for eager messages (-1 = use device default, 0 = do not use eager RDMA, 1 = use eager RDMA)
> MCA btl: parameter "btl_openib_eager_rdma_threshold" (current value: "16", data source: default value)
>     Use RDMA for short messages after this number of messages are received from a given peer (must be >= 1)
> MCA btl: parameter "btl_openib_max_eager_rdma" (current value: "16", data source: default value)
>     Maximum number of peers allowed to use RDMA for short messages (RDMA is used for all long messages, except if explicitly disabled, such as with the "dr" pml) (must be >= 0)
> MCA btl: parameter "btl_openib_eager_rdma_num" (current value: "16", data source: default value)
>     Number of RDMA buffers to allocate for small messages(must be >= 1)
> MCA btl: parameter "btl_openib_btls_per_lid" (current value: "1", data source: default value)
>     Number of BTLs to create for each InfiniBand LID (must be >= 1)
> MCA btl: parameter "btl_openib_max_lmc" (current value: "0", data source: default value)
>     Maximum number of LIDs to use for each device port (must be >= 0, where 0 = use all available)
> MCA btl: parameter "btl_openib_enable_apm_over_lmc" (current value: "0", data source: default value)
>     Maximum number of alterative paths for each device port (must be >= -1, where 0 = disable apm, -1 = all availible alternative paths )
> MCA btl: parameter "btl_openib_enable_apm_over_ports" (current value: "0", data source: default value)
>     Enable alterative path migration (APM) over different ports of the same device (must be >= 0, where 0 = disable APM over ports , 1 = enable APM over ports of the same device)
> MCA btl: parameter "btl_openib_use_async_event_thread" (current value: "1", data source: default value)
>     If nonzero, use the thread that will handle InfiniBand asyncihronous events
> MCA btl: parameter "btl_openib_buffer_alignment" (current value: "64", data source: default value)
>     Prefered communication buffer alignment, in bytes (must be > 0 and power of two)
> MCA btl: parameter "btl_openib_use_message_coalescing" (current value: "1", data source: default value)
>     Use message coalescing
> MCA btl: parameter "btl_openib_cq_poll_ratio" (current value: "100", data source: default value)
>     how often poll high priority CQ versus low priority CQ
> MCA btl: parameter "btl_openib_eager_rdma_poll_ratio" (current value: "100", data source: default value)
>     how often poll eager RDMA channel versus CQ
> MCA btl: parameter "btl_openib_hp_cq_poll_per_progress" (current value: "10", data source: default value)
>     max number of completion events to process for each call of BTL progress engine
> MCA btl: information "btl_openib_have_fork_support" (value: "1", data source: default value)
>     Whether the OpenFabrics stack supports applications that invoke the "fork()" system call or not (0 = no, 1 = yes). Note that this value does NOT indicate whether the system being run on supports "fork()" with OpenFabrics applications or not.
> MCA btl: parameter "btl_openib_exclusivity" (current value: "1024", data source: default value)
>     BTL exclusivity (must be >= 0)
> MCA btl: parameter "btl_openib_flags" (current value: "310", data source: default value)
>     BTL bit flags (general flags: SEND=1, PUT=2, GET=4, SEND_INPLACE=8, RDMA_MATCHED=64, HETEROGENEOUS_RDMA=256; flags only used by the "dr" PML (ignored by others): ACK=16, CHECKSUM=32, RDMA_COMPLETION=128)
> MCA btl: parameter "btl_openib_rndv_eager_limit" (current value: "12288", data source: default value)
>     Size (in bytes) of "phase 1" fragment sent for all large messages (must be >= 0 and <= eager_limit)
> MCA btl: parameter "btl_openib_eager_limit" (current value: "12288", data source: default value)
>     Maximum size (in bytes) of "short" messages (must be >= 1).
> MCA btl: parameter "btl_openib_max_send_size" (current value: "65536", data source: default value)
>     Maximum size (in bytes) of a single "phase 2" fragment of a long message when using the pipeline protocol (must be >= 1)
> MCA btl: parameter "btl_openib_rdma_pipeline_send_length" (current value: "1048576", data source: default value)
>     Length of the "phase 2" portion of a large message (in bytes) when using the pipeline protocol. This part of the message will be split into fragments of size max_send_size and sent using send/receive semantics (must be >= 0; only relevant when the PUT flag is set)
> MCA btl: parameter "btl_openib_rdma_pipeline_frag_size" (current value: "1048576", data source: default value)
>     Maximum size (in bytes) of a single "phase 3" fragment from a long message when using the pipeline protocol. These fragments will be sent using RDMA semantics (must be >= 1; only relevant when the PUT flag is set)
> MCA btl: parameter "btl_openib_min_rdma_pipeline_size" (current value: "262144", data source: default value)
>     Messages smaller than this size (in bytes) will not use the RDMA pipeline protocol. Instead, they will be split into fragments of max_send_size and sent using send/receive semantics (must be >=0, and is automatically adjusted up to at least (eager_limit+btl_rdma_pipeline_send_length); only relevant when the PUT flag is set)
> MCA btl: parameter "btl_openib_bandwidth" (current value: "800", data source: default value)
>     Approximate maximum bandwidth of interconnect(must be >= 1)
> MCA btl: parameter "btl_openib_latency" (current value: "10", data source: default value)
>     Approximate latency of interconnect (must be >= 0)
> MCA btl: parameter "btl_openib_receive_queues" (current value: "P,128,256,192,128:S,2048,256,128,32:S,12288,256,128,32:S,65536,256,128,32", data source: default value)
>     Colon-delimited, comma delimited list of receive queues: P,4096,8,6,4:P,32768,8,6,4
> MCA btl: parameter "btl_openib_if_include" (current value: <none>, data source: default value)
>     Comma-delimited list of devices/ports to be used (e.g. "mthca0,mthca1:2"; empty value means to use all ports found). Mutually exclusive with btl_openib_if_exclude.
> MCA btl: parameter "btl_openib_if_exclude" (current value: <none>, data source: default value)
>     Comma-delimited list of device/ports to be excluded (empty value means to not exclude any ports). Mutually exclusive with btl_openib_if_include.
> MCA btl: parameter "btl_openib_ipaddr_include" (current value: <none>, data source: default value)
>     Comma-delimited list of IP Addresses to be used (e.g. "192.168.1.0/24"). Mutually exclusive with btl_openib_ipaddr_exclude.
> MCA btl: parameter "btl_openib_ipaddr_exclude" (current value: <none>, data source: default value)
>     Comma-delimited list of IP Addresses to be excluded (e.g. "192.168.1.0/24"). Mutually exclusive with btl_openib_ipaddr_include.
> MCA btl: parameter "btl_openib_cpc_include" (current value: <none>, data source: default value)
>     Method used to select OpenFabrics connections (valid values: oob,xoob,rdmacm)
> MCA btl: parameter "btl_openib_cpc_exclude" (current value: <none>, data source: default value)
>     Method used to exclude OpenFabrics connections (valid values: oob,xoob,rdmacm)
> MCA btl: parameter "btl_openib_connect_oob_priority" (current value: "50", data source: default value)
>     The selection method priority for oob
> MCA btl: parameter "btl_openib_connect_xoob_priority" (current value: "60", data source: default value)
>     The selection method priority for xoob
> MCA btl: parameter "btl_openib_connect_rdmacm_priority" (current value: "30", data source: default value)
>     The selection method priority for rdma_cm
> MCA btl: parameter "btl_openib_connect_rdmacm_port" (current value: "0", data source: default value)
>     The selection method port for rdma_cm
> MCA btl: parameter "btl_openib_connect_rdmacm_resolve_timeout" (current value: "1000", data source: default value)
>     The timeout (in miliseconds) for address and route resolution
> MCA btl: parameter "btl_openib_connect_rdmacm_retry_count" (current value: "20", data source: default value)
>     Maximum number of times rdmacm will retry route resolution
> MCA btl: parameter "btl_openib_connect_rdmacm_reject_causes_connect_error" (current value: "0", data source: default value)
>     The drivers for some devices are buggy such that an RDMA REJECT action may result in a CONNECT_ERROR event instead of a REJECTED event. Setting this MCA parameter to true tells Open MPI to treat CONNECT_ERROR events on connections where a REJECT is expected as a REJECT (default: false)
> MCA btl: parameter "btl_openib_priority" (current value: "0", data source: default value)
> MCA btl: parameter "btl_base_warn_component_unused" (current value: "1", data source: default value)
>     This parameter is used to turn on warning messages when certain NICs are not used
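
The list of values in Nathan's message (for the mlx4 parameter spelled log_mtts_per_seg in the mlx4_core driver) can be read as a lookup from setting to registrable memory. The sketch below picks the smallest listed setting that covers a node's memory. It assumes no huge pages, as stated in that message; the names `REGISTRABLE_GB` and `smallest_setting_for` are made up for illustration, and the `allowed` argument stands in for the driver-dependent valid range (1-5 or 0-7).

```python
# Values copied from the list in Nathan's message (no huge pages assumed):
# setting -> memory that can be registered, in GB.
REGISTRABLE_GB = {0: 2, 1: 4, 2: 8, 3: 16, 4: 32, 5: 64, 6: 128, 7: 256}


def smallest_setting_for(node_memory_gb, allowed=range(0, 8)):
    """Return the smallest allowed setting whose registrable memory
    is at least node_memory_gb, or None if no allowed setting is enough."""
    for value in sorted(allowed):
        if REGISTRABLE_GB.get(value, 0) >= node_memory_gb:
            return value
    return None


if __name__ == "__main__":
    for mem_gb in (12, 24, 48):
        print(mem_gb, "GB ->", smallest_setting_for(mem_gb))
```

For a driver version that only accepts 1-5, passing `allowed=range(1, 6)` restricts the search to that range.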