Things that read like they should be unsigned look suspicious to me:
nbElems -909934592
count -1819869184
Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov
> On Nov 1, 2018, at 10:34 PM, Ben Menadue wrote:
>
> Hi,
>
> I haven’t heard back from the user yet, but I just put this e…
Hi,

I haven’t heard back from the user yet, but I just put this example together which works on 1, 2, and 3 ranks but fails for 4. Unfortunately it needs a fair amount of memory, about 14.3 GB per process, so I was running it with -map-by ppr:1:node.

It doesn’t fail with the segfault as the user’s cod…
Hi Gilles,
> On 2 Nov 2018, at 11:03 am, Gilles Gouaillardet wrote:
> I noted the stack trace refers to opal_cuda_memcpy(). Is this issue
> specific to CUDA environments?
No, this is just on normal CPU-only nodes. But memcpy always goes through
opal_cuda_memcpy when CUDA support is enabled, eve…
Hi Ben,
I noted the stack trace refers to opal_cuda_memcpy(). Is this issue
specific to CUDA environments?
The coll/tuned default collective module is known not to work when tasks
use matching but different signatures.
For example, one task sends one vector of N elements, and the other tas…
Hi,
One of our users is reporting an issue using MPI_Allgatherv with a large
derived datatype: it segfaults inside Open MPI. Using a debug build of
Open MPI 3.1.2 produces a ton of messages like this before the segfault:
[r3816:50921] ../../../../../opal/datatype/opal_datatype_pack.h:53