[gem5-users] Re: Load dependency in gem5

2022-10-07 Thread Eliot Moss

On 10/7/2022 1:30 PM, Aritra Bagchi wrote:

> Hi Eliot,
>
> Thanks for the response. The unrolled loop has the same dependency
> across "j", yet it can send multiple loads simultaneously, so the
> limitation is probably not due to that dependency across "j" between
> iterations. The non-unrolled loop, however, has a control dependency,
> which goes away when the loop is unrolled. But even without any
> speculation, gem5 could have scheduled the loads as follows:
>
> first, schedule load A[ k ]
> then, compare i with N
> then, schedule load A[ k + 1 ] if i < N
>
> What actually happens is that load A[ k + 1 ] is scheduled only after
> load A[ k ] has completed. Is that completion necessary? It does not
> seem to be, and the memory system is left underutilised.
>
> Thanks and regards,
> Aritra


I understand your thinking, but it would be helpful to include
the assembly code listings for the two cases so that we can
reason more carefully about what may be happening.

Best - EM
___
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org


[gem5-users] Re: Load dependency in gem5

2022-10-07 Thread Aritra Bagchi
Hi Eliot,

Thanks for the response. The unrolled loop has the same dependency
across "j", yet it can send multiple loads simultaneously, so the
limitation is probably not due to that dependency across "j" between
iterations. The non-unrolled loop, however, has a control dependency,
which goes away when the loop is unrolled. But even without any
speculation, gem5 could have scheduled the loads as follows:

first, schedule load A[ k ]
then, compare i with N
then, schedule load A[ k + 1 ] if i < N

What actually happens is that load A[ k + 1 ] is scheduled only after
load A[ k ] has completed. Is that completion necessary? It does not
seem to be, and the memory system is left underutilised.
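
To put a rough number on that underutilisation, here is a toy cycle
count in C. It is only a back-of-the-envelope sketch and not a model of
gem5's O3 pipeline: the function names, the fixed load latency, and the
issue rate of one load per cycle are all assumptions made for
illustration.

```c
#include <assert.h>

/* Toy model (assumption): load k+1 is issued only after load k has
 * completed, so the latencies simply add up. */
long serialized_cycles(long n_loads, long latency) {
    assert(n_loads >= 0 && latency > 0);
    return n_loads * latency;
}

/* Toy model (assumption): the loads are independent and one load is
 * issued per cycle. The last load issues at cycle n_loads - 1 and
 * completes `latency` cycles later. */
long overlapped_cycles(long n_loads, long latency) {
    assert(n_loads >= 0 && latency > 0);
    if (n_loads == 0) {
        return 0;
    }
    return (n_loads - 1) + latency;
}
```

With 1000 loads and a hypothetical latency of 100 cycles,
serialized_cycles(1000, 100) gives 100000 cycles, while
overlapped_cycles(1000, 100) gives 1099.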

Thanks and regards,
Aritra




[gem5-users] Re: Load dependency in gem5

2022-10-07 Thread Eliot Moss

On 10/7/2022 1:03 PM, Aritra Bagchi wrote:

> Hi all,
>
> Any suggestions on this are most helpful.
>
> Thanks and regards,
> Aritra


My guess is that it is because the non-unrolled loop
has a test of i against 1000 before each access to A[i].
That test guards the load, so it must complete before
the load can proceed.  It could also be because of the
way j is used - the next update cannot proceed until
the last one finishes.  It might be helpful to look
at the actual instructions involved, but the control
dependency could be an issue.

The unrolled loop avoids both of these possible
dependencies.
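
For concreteness, the two variants under discussion might look like the
following in C. This is only a sketch: the unroll factor of 4, the
function names, and the element type are assumptions, since the thread
shows neither the unrolled source nor the generated assembly.

```c
#include <assert.h>
#include <stddef.h>

/* Rolled loop, as in the original post: every load of A[i] is
 * preceded by one loop-exit test on i. */
long sum_rolled(const long *A, size_t n) {
    assert(A != NULL || n == 0);
    long j = 0;
    for (size_t i = 0; i < n; i++) {
        j = j + A[i];
    }
    return j;
}

/* Unrolled by 4 (hypothetical factor): one loop-exit test per four
 * loads. The adds into j still form a single chain, but each load
 * address depends only on i, never on j. */
long sum_unrolled4(const long *A, size_t n) {
    assert(A != NULL || n == 0);
    long j = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        j = j + A[i];
        j = j + A[i + 1];
        j = j + A[i + 2];
        j = j + A[i + 3];
    }
    for (; i < n; i++) {  /* leftover iterations when n is not a multiple of 4 */
        j = j + A[i];
    }
    return j;
}
```

Both functions return the same sum. Whether the extra branches in the
rolled version explain the behaviour seen in gem5 would still need to
be checked against the instructions the compiler actually emits.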

I further observe that if we are talking about an
Intel processor, those processors handle loads in the
order the program presents them.  Not sure if that
has any impact here.  Also unsure whether CPU
speculative execution plays a role (which would actually
improve matters).

Best - Eliot Moss



[gem5-users] Re: Load dependency in gem5

2022-10-07 Thread Aritra Bagchi
Hi all,

Any suggestions on this are most helpful.

Thanks and regards,
Aritra


On Thu, Oct 6, 2022 at 6:01 PM Aritra Bagchi wrote:

> Hi all,
>
> for (i = 0; i < 1000; i++) {
>     j = j + A[ i ];
> }
>
> Suppose such a loop is executed on gem5 (single-core execution, with
> the O3 CPU model). In that case, the memory hierarchy sees only one
> access at a time: the access to A[ k + 1 ] is sent to the memory
> hierarchy only after the access to A[ k ] has completed. If the loop is
> unrolled (on i), however, multiple memory accesses are seen
> simultaneously. Why is that so? The loads could be serviced
> independently (even without unrolling the loop), so why is gem5 taking
> such a conservative approach?
>
> Any form of help/suggestion is highly appreciated.
>
> Thanks and regards,
> Aritra Bagchi
> Research Scholar,
> Department of Computer Science and Engineering,
> Indian Institute of Technology Delhi
>
>
>