Hi,

There is some news on this question. The new code is below; I changed from __sync_fetch_and_add to the pthread_mutex_xxx functions:

    pthread_mutex_lock(&g_mutex);
    int curr = m_nCurrent;
    m_nCurrent += m_nStep;
    pthread_mutex_unlock(&g_mutex);

With this change, no test case runs too long under valgrind and then fails. However, pthread_mutex_lock is not as efficient as __sync_fetch_and_add, so the mutex is only a temporary measure for testing.
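For reference, here is a minimal sketch of the two variants side by side. It is not the real class: ChunkScheduler, Publish, GetProcLoopAtomic and GetProcLoopMutex are hypothetical names, and only m_nCurrent, m_nStep and m_nEnd come from the posted code. The mutex is a member here rather than the global g_mutex. Clamping the last chunk to m_nEnd + 1 is an assumption about what the unused 'limit' variable was meant for; the posted code does not do this.

```cpp
#include <algorithm>
#include <cassert>
#include <pthread.h>

// Hypothetical stand-in for CDynamicScheduling (illustration only).
class ChunkScheduler {
public:
    ChunkScheduler(int begin, int end, int step)
        : m_nCurrent(begin), m_nStep(step), m_nEnd(end) {
        assert(step > 0);
        pthread_mutex_init(&m_mutex, nullptr);
    }
    ~ChunkScheduler() { pthread_mutex_destroy(&m_mutex); }

    // Variant 1: claim the next chunk with the GCC/Clang atomic builtin.
    bool GetProcLoopAtomic(int& nBegin, int& nEndPlusOne) {
        int curr = __sync_fetch_and_add(&m_nCurrent, m_nStep);
        return Publish(curr, nBegin, nEndPlusOne);
    }

    // Variant 2: claim the next chunk under a mutex.
    // Do not mix the two variants on the same object.
    bool GetProcLoopMutex(int& nBegin, int& nEndPlusOne) {
        pthread_mutex_lock(&m_mutex);
        int curr = m_nCurrent;
        m_nCurrent += m_nStep;
        pthread_mutex_unlock(&m_mutex);
        return Publish(curr, nBegin, nEndPlusOne);
    }

private:
    // Turn the claimed start index into a half-open range [nBegin, nEndPlusOne).
    bool Publish(int curr, int& nBegin, int& nEndPlusOne) const {
        if (curr > m_nEnd) {
            return false;  // no rows left
        }
        nBegin = curr;
        // Assumption: clamp so the last chunk does not run past m_nEnd.
        nEndPlusOne = std::min(curr + m_nStep, m_nEnd + 1);
        return true;
    }

    int m_nCurrent;
    const int m_nStep;
    const int m_nEnd;
    pthread_mutex_t m_mutex;
};
```

With begin = 0, end = 9 and step = 4, either variant hands out [0,4), [4,8), [8,10) and then returns false, so both should give the same work split; only the synchronization primitive differs.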
I think this is related to valgrind's scheduling module. Why does the run last so long?

BR
Owen

-----Original Message-----
From: John Reiser [mailto:jrei...@bitwagon.com]
Sent: January 26, 2018 12:44
To: valgrind-users@lists.sourceforge.net
Subject: Re: [Valgrind-users] Re: Re: Re: [Help] Valgrind sometime run the program very slowly sometimes , it last at least one hour. can you show me why or some way to analyze it?

On 01/25/2018 15:37 UTC, Wuweijia wrote:
> Function1:
> bool CDynamicScheduling::GetProcLoop(
>     int& nBegin,
>     int& nEndPlusOne)
> {
>     int curr = __sync_fetch_and_add(&m_nCurrent, m_nStep);

How large is 'm_nStep'? [Are you sure?] The overhead expense of switching threads in valgrind would be reduced by making m_nStep as large as possible. It looks like the code in Function2 would produce the same values regardless.

>     if (curr > m_nEnd)
>     {
>         return false;
>     }
>
>     nBegin = curr;
>     int limit = m_nEnd + 1;

Local variable 'limit' is unused. By itself this is unimportant, but it might be a clue to something that is not shown here.

>     nEndPlusOne = curr + m_nStep;
>     return true;
> }
>
>
> Function2:
> ....
>     int beginY, endY;
>     while (pDS->GetProcLoop(beginY, endY)){
>         for (y = beginY; y < endY; y++){
>             for(x = 0; x < dstWDiv2-7; x+=8){
>                 vtmp0 = vld2q_u16(&pSrc[(y<<1)*srcStride+(x<<1)]);
>                 vtmp1 = vld2q_u16(&pSrc[((y<<1)+1)*srcStride+(x<<1)]);

I hope the actual source contains a comment such as:
    Compute pDst[] as the rounded average of non-overlapping 2x2 blocks
    of pixels in pSrc[].

>                 vst1q_u16(&pDst[y*dstStride+x], (vtmp0.val[0] + vtmp0.val[1] +
>                     vtmp1.val[0] + vtmp1.val[1] + vdupq_n_u16(2)) >> vdupq_n_u16(2));
>             }
>             for(; x < dstWDiv2; x++){
>                 pDst[y*dstStride+x] = (pSrc[(y<<1)*srcStride+(x<<1)] +
>                     pSrc[(y<<1)*srcStride+(x<<1)+1] + pSrc[((y<<1)+1)*srcStride+(x<<1)] +
>                     pSrc[((y<<1)+1)*srcStride+((x<<1)+1)] + 2) >> 2;
>             }
>         }
>     }
>
>     return;
> }

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users