[Valgrind-users] Re: Re: Re: Re: [Help] Valgrind sometime run the program very slowly sometimes , it last at least one hour. can you show me why or some way to analyze it?

Wuweijia Thu, 25 Jan 2018 23:00:42 -0800

Hi:

> How large is 'm_nStep'?  [Are you sure?]

The source is below; all of the members are integers. Do you need the actual value?
class CDynamicScheduling
{
public:
        static const int m_nDefaultStepUnit;
        static const int m_nDefaultStepFactor;

private:
        int m_nBegin;
        int m_nEnd;
        int m_nStep;
#if defined(_MSC_VER)
        std::atomic<int> m_nCurrent;
#else
        int m_nCurrent;
#endif


> I hope the actual source contains a comment such as:
>      Compute pDst[] as the rounded average of non-overlapping 2x2 blocks of pixels in pSrc[].

Yes, you are right. It just computes the average of 2x2 blocks.

I showed you only the aarch64 NEON code.
Here is the same function as implemented for x86:

        UINT16 *pDstL;
        UINT16 *pSrcL;
        INT32 dstWDiv2 = srcW >> 1;
//      INT32 dstHDiv2 = srcH >> 1;
        INT32 x, y;
        INT32 posDst,posSrc;

        pSrcL = pSrc;
        pDstL = pDst;

        int beginY, endY;
        while (pDS->GetProcLoop(beginY, endY))
        {
//              for (y = 0; y < dstHDiv2; y++)
                for (y = beginY; y < endY; y++)
                {
                        for (x = 0; x < dstWDiv2; x++)
                        {
                                posDst = y*dstStride + x;
                                posSrc = (y<<1)*srcStride + (x<<1);
                                pDstL[posDst] = (pSrcL[posSrc] + pSrcL[posSrc + 1]
                                               + pSrcL[posSrc + srcStride]
                                               + pSrcL[posSrc + srcStride + 1] + 2) >> 2;
                        }
                }
        }
     
pSrc is the image buffer, about 11M. Width: 3968, Height: 2976, srcStride: 3968.

Four threads compute the averages of the 2x2 blocks. pSrc is divided into many
small pieces, and each thread computes the averages for the pieces it claims.
The distribution of pieces is not fixed by design; it depends on the run-time
state of the threads. A thread that holds the CPU longer may compute more
pieces, and a thread that does not get the CPU computes fewer.
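
The piece-by-piece distribution described above can be sketched roughly as follows. This is only an illustration: `ChunkScheduler` and `claim_rows` are hypothetical names, not the real CDynamicScheduling, and the sketch assumes the row count is a multiple of the step so that no piece runs past the end.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for CDynamicScheduling (not the real class).
// Hands out row ranges [begin, endPlusOne) of `step` rows each.
class ChunkScheduler {
public:
    ChunkScheduler(int begin, int end, int step)
        : m_end(end), m_step(step), m_current(begin) {
        assert(step > 0);
    }

    bool next(int& begin, int& endPlusOne) {
        // Each call atomically claims the next m_step rows.
        int curr = m_current.fetch_add(m_step);
        if (curr > m_end) return false;
        begin = curr;
        endPlusOne = curr + m_step;
        return true;
    }

private:
    int m_end;   // last valid row (inclusive)
    int m_step;  // rows claimed per call
    std::atomic<int> m_current;
};

// Runs nThreads workers and returns how many times each row was claimed.
// Assumption: rows is a multiple of step.
std::vector<int> claim_rows(int rows, int step, int nThreads) {
    assert(rows % step == 0);
    ChunkScheduler sched(0, rows - 1, step);
    std::vector<int> hits(rows, 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < nThreads; ++t) {
        workers.emplace_back([&] {
            int b, e;
            while (sched.next(b, e))
                for (int y = b; y < e; ++y)
                    ++hits[y];  // claimed ranges are disjoint across threads
        });
    }
    for (auto& w : workers) w.join();
    return hits;
}
```

Whichever thread happens to be running claims the next piece, so the number of pieces per thread varies from run to run, but every row is still processed exactly once.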
     
       
BR
Owen

-----Original Message-----
From: John Reiser [mailto:jrei...@bitwagon.com]
Sent: January 26, 2018 12:44
To: valgrind-users@lists.sourceforge.net
Subject: Re: [Valgrind-users] Re: Re: Re: [Help] Valgrind sometime run the program very slowly sometimes , it last at least one hour. can you show me why or some way to analyze it?

On 01/25/2018 15:37 UTC, Wuweijia wrote:

>       Function1:
> bool CDynamicScheduling::GetProcLoop(
>          int& nBegin,
>          int& nEndPlusOne)
> {
>          int curr = __sync_fetch_and_add(&m_nCurrent, m_nStep);

How large is 'm_nStep'?  [Are you sure?] The overhead expense of switching 
threads in valgrind would be reduced by making m_nStep as large as possible.  
It looks like the code in Function2 would produce the same values regardless.
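
To make the effect of the step size concrete, here is a small sketch that counts how many successful GetProcLoop-style calls are needed for a given range. The helper `chunk_count` is hypothetical, and the figures in the note below assume m_nBegin = 0 and m_nEnd = 1487 (2976 / 2 output rows, minus one), which the thread does not confirm.

```cpp
#include <cassert>

// Hypothetical helper: number of successful GetProcLoop-style calls needed
// to cover rows nBegin..nEnd (inclusive) when each call claims nStep rows.
int chunk_count(int nBegin, int nEnd, int nStep) {
    assert(nStep > 0 && nEnd >= nBegin);
    int rows = nEnd - nBegin + 1;
    return (rows + nStep - 1) / nStep;  // ceiling division
}
```

Under those assumptions a step of 1 needs 1488 atomic fetch-and-add calls, while a step of 372 needs only 4, one per thread.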


>          if (curr > m_nEnd)
>          {
>                  return false;
>          }
> 
>          nBegin = curr;

>          int limit = m_nEnd + 1;

Local variable 'limit' is unused.  By itself this is unimportant, but it might 
be a clue to something that is not shown here.

>          nEndPlusOne = curr + m_nStep;
>          return true;
> }
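
One possible reading of the unused 'limit' is that the end of the last piece was meant to be clamped. The sketch below shows that variant as a free function. It is a guess for illustration, not the poster's actual code: `get_proc_loop_clamped` is a hypothetical name, and it assumes `end` is the last valid index (inclusive), as the `curr > m_nEnd` test suggests.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical variant of the quoted GetProcLoop with the end clamped.
// Assumption: `end` is the last valid index (inclusive).
bool get_proc_loop_clamped(std::atomic<int>& current, int end, int step,
                           int& nBegin, int& nEndPlusOne) {
    assert(step > 0);
    int curr = current.fetch_add(step);  // atomically claim `step` indices
    if (curr > end) {
        return false;                    // nothing left to hand out
    }
    int limit = end + 1;                 // one past the last valid index
    nBegin = curr;
    // Clamp so the final piece never extends past the range.
    nEndPlusOne = (curr + step < limit) ? curr + step : limit;
    return true;
}
```

With end = 9 and step = 4 this yields [0,4), [4,8), [8,10). Without the clamp the last piece would be [8,12), which runs past the range whenever the row count is not a multiple of the step.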
>       
>       
>       Function2:
>       ....
>       int beginY, endY;
>    while (pDS->GetProcLoop(beginY, endY)){
>      for (y = beginY; y < endY; y++){
>        for(x = 0; x < dstWDiv2-7; x+=8){
>          vtmp0 = vld2q_u16(&pSrc[(y<<1)*srcStride+(x<<1)]);
>          vtmp1 = vld2q_u16(&pSrc[((y<<1)+1)*srcStride+(x<<1)]);

I hope the actual source contains a comment such as:
     Compute pDst[] as the rounded average of non-overlapping 2x2 blocks of 
pixels in pSrc[].

>          vst1q_u16(&pDst[y*dstStride+x],
>                    (vtmp0.val[0] + vtmp0.val[1] + vtmp1.val[0] + vtmp1.val[1]
>                     + vdupq_n_u16(2)) >> vdupq_n_u16(2));
>        }
>        for(; x < dstWDiv2; x++){
>          pDst[y*dstStride+x] = (pSrc[(y<<1)*srcStride+(x<<1)]
>                               + pSrc[(y<<1)*srcStride+(x<<1)+1]
>                               + pSrc[((y<<1)+1)*srcStride+(x<<1)]
>                               + pSrc[((y<<1)+1)*srcStride+((x<<1)+1)] + 2) >> 2;
>        }
>      }
>    }
> 
>    return;
> }     
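
For checking either implementation on a small input, a plain single-threaded reference for the rounded 2x2 average can be handy. This is a minimal sketch: `downsample2x2` is a hypothetical name, and it assumes the strides are counted in 16-bit elements rather than bytes, as the indexing in the posted code suggests.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical reference: rounded average of non-overlapping 2x2 blocks.
// Assumption: srcStride and dstStride are in elements, not bytes.
void downsample2x2(const uint16_t* src, int srcW, int srcH, int srcStride,
                   uint16_t* dst, int dstStride) {
    assert(srcW >= 0 && srcH >= 0);
    for (int y = 0; y < srcH / 2; ++y) {
        const uint16_t* r0 = src + (2 * y) * srcStride;  // upper source row
        const uint16_t* r1 = r0 + srcStride;             // lower source row
        for (int x = 0; x < srcW / 2; ++x) {
            // Sum in 32 bits so four 16-bit samples cannot overflow.
            uint32_t sum = uint32_t(r0[2 * x]) + r0[2 * x + 1]
                         + r1[2 * x] + r1[2 * x + 1];
            dst[y * dstStride + x] = uint16_t((sum + 2) >> 2);  // +2 rounds
        }
    }
}
```

For a 4x2 source holding 1,2,3,4 in the first row and 5,6,7,8 in the second, the two output pixels are (1+2+5+6+2)>>2 = 4 and (3+4+7+8+2)>>2 = 6.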

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most engaging tech 
sites, Slashdot.org! http://sdm.link/slashdot 
_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users