Re: [gem5-users] x86 floating point instruction

2018-05-25 Thread Tariq Azmy
Hi Gabe, Jason,

Do those x86 SSE SIMD arithmetic instructions take only one cycle of
latency? I looked into FuncUnitConfig.py, and it seems that the op latencies
for the SIMD functional units are not defined, so I assumed they take the
default value of 1.
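
The following is a toy Python model of the default Tariq describes. Each op class that a functional unit can execute carries a latency, and an entry listed without one falls back to 1 cycle. OpDescSketch, FUDescSketch and op_latency are hypothetical names, loosely patterned on the OpDesc/FUDesc entries in FuncUnitConfig.py. The scalar FP latencies below are placeholders, not values taken from gem5.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OpDescSketch:
    """Hypothetical stand-in for one op-class entry of a functional unit."""
    op_class: str
    op_lat: int = 1          # latency used when none is given explicitly
    pipelined: bool = True


@dataclass
class FUDescSketch:
    """Hypothetical stand-in for a functional-unit description."""
    name: str
    count: int
    op_list: List[OpDescSketch] = field(default_factory=list)


def op_latency(units: List[FUDescSketch], op_class: str) -> int:
    """Return the latency of the first unit that can execute op_class."""
    for unit in units:
        for op in unit.op_list:
            if op.op_class == op_class:
                return op.op_lat
    raise KeyError(f"no functional unit handles {op_class}")


# SIMD entries listed without a latency, as described in the question.
simd_unit = FUDescSketch("SIMD_Unit", 4, [
    OpDescSketch("SimdFloatAdd"),
    OpDescSketch("SimdFloatMult"),
    OpDescSketch("SimdFloatDiv"),
])

# Scalar FP entries with explicit latencies (placeholder values).
fp_mult_div = FUDescSketch("FP_MultDiv", 2, [
    OpDescSketch("FloatMult", op_lat=4),
    OpDescSketch("FloatDiv", op_lat=12, pipelined=False),
])

units = [simd_unit, fp_mult_div]

if __name__ == "__main__":
    for cls in ("SimdFloatMult", "SimdFloatDiv", "FloatMult", "FloatDiv"):
        print(cls, op_latency(units, cls))
```

Under this assumption, making SIMD timing more realistic would mean giving those entries explicit latencies. Check the op-class names in your own gem5 version before changing a real config.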

I am not really familiar with the x86 SIMD extensions, so maybe this question
is more related to the x86 ISA in general.

Thanks.

On Thu, May 24, 2018 at 9:52 AM, Jason Lowe-Power 
wrote:

> Hi Tariq,
>
> It would be great if you could review Gabe's patch on Gerrit. Since it
> works for you, giving it a +1 or a +2 would be appropriate.
>
> Cheers,
> Jason
>
> On Wed, May 23, 2018 at 5:56 PM Tariq Azmy 
> wrote:
>
>> Thanks Gabe. Yeah, it does not affect the program; it's just that the
>> statistics are incorrect.
>>
>> By the way, I applied the patch, and the stats now show the correct
>> micro-op entries.
>>
>> I appreciate your help. Thanks again.
>>
>> On Wed, May 23, 2018 at 6:51 PM, Gabe Black  wrote:
>>
>>> Yep, those microops aren't given an operand class, so the ISA parser
>>> guesses and makes them FloatAddOp. I haven't really tested this beyond
>>> making sure it compiles, but here's a patch that might get this working for
>>> you.
>>>
>>> https://gem5-review.googlesource.com/c/public/gem5/+/10541
>>>
>>> Gabe
>>>
>>> On Wed, May 23, 2018 at 4:13 PM, Gabe Black 
>>> wrote:
>>>
 I'm confident they aren't implemented with floating point add. It's
 likely either that the microops are misclassified, or they're unimplemented
 and printing a warning, but the fact that they don't actually do any math
 isn't impacting your program for whatever reason. I'll take a quick look.

 Gabe

 On Wed, May 23, 2018 at 2:07 PM, Tariq Azmy 
 wrote:

> Hi,
>
> I wrote a simple program that does floating-point multiplication and
> division, and in the assembly I can see MULSS and DIVSS instructions.
> But after running the simulation in gem5 and looking at stats.txt, I
> only see entries in system.cpu.iq.FU_type_0::FloatAdd, whereas the
> entries for FloatMult and FloatDiv remain 0.
>
> If I understand correctly, these stats refer to the micro-ops. Does
> that mean the MULSS and DIVSS instructions are broken down and executed
> as floating-point adds?
>
> Thanks
>
>
> ___
> gem5-users mailing list
> gem5-users@gem5.org
> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
>
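
Gabe's diagnosis is that microops with no operand class get FloatAddOp from the ISA parser. This explains why MULSS and DIVSS show up only under FloatAdd. The Python sketch below models that kind of fallback and the resulting FU_type-style histogram, both before and after the microops are tagged explicitly. MicroopSketch, guess_op_class, the microop names and the order of the flag checks are assumptions for illustration, not the parser's actual code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class MicroopSketch:
    """Hypothetical microop: a name, some flags and an optional op class."""
    name: str
    flags: Set[str] = field(default_factory=set)
    op_class: Optional[str] = None


def guess_op_class(uop: MicroopSketch) -> str:
    """Use the explicit op class if given, otherwise guess from the flags."""
    if uop.op_class is not None:
        return uop.op_class
    if "IsStore" in uop.flags:
        return "MemWriteOp"
    if "IsLoad" in uop.flags:
        return "MemReadOp"
    if "IsFloating" in uop.flags:
        return "FloatAddOp"      # every unclassified FP microop ends up here
    return "IntAluOp"


def fu_type_histogram(uops: List[MicroopSketch]) -> Dict[str, int]:
    """Count microops per op class, in the spirit of the FU_type_0 stats."""
    counts: Dict[str, int] = {}
    for uop in uops:
        cls = guess_op_class(uop)
        counts[cls] = counts.get(cls, 0) + 1
    return counts


# Hypothetical microops behind MULSS/DIVSS, without and with explicit classes.
before = [MicroopSketch("fp_mul_uop", {"IsFloating"}),
          MicroopSketch("fp_div_uop", {"IsFloating"})]
after = [MicroopSketch("fp_mul_uop", {"IsFloating"}, "FloatMultOp"),
         MicroopSketch("fp_div_uop", {"IsFloating"}, "FloatDivOp")]

if __name__ == "__main__":
    print(fu_type_histogram(before))   # {'FloatAddOp': 2}
    print(fu_type_histogram(after))    # {'FloatMultOp': 1, 'FloatDivOp': 1}
```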



Re: [gem5-users] Response for WritebackDirty packets (learning.gem5)

2018-05-25 Thread Muhammad Ali Akhtar
Dear Jason,

Thanks for the response. Just another quick question.

What if memory is busy when you call sendTimingReq() for the WritebackDirty
packet? In the insert() function, when you call memPort.sendTimingReq() for
WritebackDirty blocks, you don't save them in blockedPacket in case memory
is blocked and calls sendRetryReq() later.
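
One common way to handle this is to hold any request that the memory side refuses (writebacks included) and resend it when the retry callback arrives. It is sketched here as a toy Python model, not gem5's C++ port API. ToyMemory, ToyMemSidePort, send_packet and recv_req_retry are hypothetical names. In gem5 the corresponding pieces are the bool returned by sendTimingReq() and the requesting port's recvReqRetry() callback.

```python
from collections import deque
from typing import Optional


class ToyMemory:
    """Hypothetical memory that accepts at most `capacity` outstanding requests."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.accepted = []
        self.port = None          # requesting port, attached by ToyMemSidePort
        self.need_retry = False   # True once a request has been refused

    def recv_timing_req(self, pkt: str) -> bool:
        if len(self.accepted) >= self.capacity:
            self.need_retry = True
            return False          # refused: the sender must keep the packet
        self.accepted.append(pkt)
        return True

    def drain_one(self) -> None:
        """Complete the oldest request and, if needed, ask the sender to retry."""
        self.accepted.pop(0)
        if self.need_retry:
            self.need_retry = False
            self.port.recv_req_retry()


class ToyMemSidePort:
    """Hypothetical cache-side port that never drops a refused packet."""

    def __init__(self, mem: ToyMemory):
        self.mem = mem
        mem.port = self
        self.blocked_packet: Optional[str] = None
        self.pending = deque()    # packets queued behind the blocked one

    def send_packet(self, pkt: str) -> None:
        if self.blocked_packet is not None:
            self.pending.append(pkt)
            return
        if not self.mem.recv_timing_req(pkt):
            self.blocked_packet = pkt   # writebacks are kept just like reads

    def recv_req_retry(self) -> None:
        pkt, self.blocked_packet = self.blocked_packet, None
        if pkt is None:
            return                      # spurious retry: nothing to resend
        self.send_packet(pkt)
        while self.blocked_packet is None and self.pending:
            self.send_packet(self.pending.popleft())


if __name__ == "__main__":
    mem = ToyMemory(capacity=1)
    port = ToyMemSidePort(mem)
    port.send_packet("ReadReq A")
    port.send_packet("WritebackDirty B")   # memory busy: held, not lost
    mem.drain_one()                        # triggers recv_req_retry()
    print(mem.accepted)                    # ['WritebackDirty B']
```

The pending queue is a design choice of this sketch. A simpler design stops accepting new work upstream while a packet is blocked.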



Muhammad Ali Akhtar
Principal Design Engineer
http://www.linkedin.com/in/muhammadakhtar

On Tue, May 22, 2018 at 3:40 AM, Jason Lowe-Power 
wrote:

> Hello,
>
> No. You should not get a response for WritebackDirty. In fact, many
> writes (e.g., writebacks) do not have responses. See src/mem/packet.cc
> (https://gem5.googlesource.com/public/gem5/+/master/src/mem/packet.cc#80).
> Some commands have the "NeedsResponse" flag set. If so, the request will
> be turned into a response by whichever memory object fulfills it (by
> calling pkt->makeResponse()).
>
> I hope this answers your question.
>
> Jason
>
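
The point that only commands flagged NeedsResponse get a reply can be sketched with a small Python table. The command names (ReadReq, WriteReq, WritebackDirty, WritebackClean and their responses) exist in gem5's packet.cc. The flag sets below are a simplified assumption, not a copy of gem5's command table. CMD_FLAGS, ToyPacket and memory_handle are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Optional

# Simplified, assumed flag sets for a few commands (not gem5's full table).
CMD_FLAGS = {
    "ReadReq":        {"IsRead", "IsRequest", "NeedsResponse"},
    "WriteReq":       {"IsWrite", "IsRequest", "NeedsResponse", "HasData"},
    "WritebackDirty": {"IsWrite", "IsRequest", "HasData"},
    "WritebackClean": {"IsWrite", "IsRequest", "HasData"},
}

# Response command produced for each request that needs one.
RESPONSE_CMD = {"ReadReq": "ReadResp", "WriteReq": "WriteResp"}


@dataclass
class ToyPacket:
    cmd: str
    addr: int

    def needs_response(self) -> bool:
        return "NeedsResponse" in CMD_FLAGS[self.cmd]

    def make_response(self) -> None:
        """Turn this request into its response in place."""
        if not self.needs_response():
            raise ValueError(f"{self.cmd} does not expect a response")
        self.cmd = RESPONSE_CMD[self.cmd]


def memory_handle(pkt: ToyPacket) -> Optional[ToyPacket]:
    """Toy memory: reply only when the request asks for a response."""
    if pkt.needs_response():
        pkt.make_response()
        return pkt
    return None   # e.g. a writeback is absorbed without any reply


if __name__ == "__main__":
    for cmd in ("ReadReq", "WriteReq", "WritebackDirty"):
        resp = memory_handle(ToyPacket(cmd, 0x40))
        print(cmd, "->", resp.cmd if resp else "no response")
```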
> On Sat, May 19, 2018 at 11:38 PM Muhammad Ali Akhtar <
> muhammadali...@gmail.com> wrote:
>
>> Hello All,
>>
>> Following Jason's website, I created my own cache.
>>
>> On a cache miss, I send a timing request to memory and get the response,
>> which I handle in handleResponse().
>>
>> During handleResponse(), if the insertion causes an eviction (the cache
>> was full), the insert function issues another memPort.sendTimingReq(),
>> this time with a WritebackDirty packet. However, for this timing request
>> to memory (WritebackDirty), we don't seem to get any response.
>>
>> My question is:
>>
>> Do we ever get a response from memory for packets of type
>> "WritebackDirty"? When I examine the simulator output, it seems that it
>> moves on to the next instructions without waiting for a response from
>> memory for this particular request.
>>
>>
>> Muhammad Ali Akhtar
>> Principal Design Engineer
>> http://www.linkedin.com/in/muhammadakhtar

Re: [gem5-users] RISCV ISA : "C" (compressed) extension supported?

2018-05-25 Thread Marcelo Brandalero
Hi Jason, Alec,

Just to provide some feedback on this issue: it seems that the processor is
mistakenly identifying compressed adds (add reg, reg, reg) as branch
instructions.
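
In the Fetch trace quoted in this message, each flagged "branch" goes from a PC to exactly PC + 2 (for example 0x101a8=>0x101aa). That is simply the fall-through address of a 2-byte compressed instruction. One possible explanation, offered only as an assumption and not confirmed in this thread, is a "is this a branch?" test that compares the next PC against PC + 4, the size of a full-length instruction. The toy Python functions below are hypothetical, not gem5 code. They show how such a check would flag every compressed instruction, and how a size-aware check would not.

```python
FULL_INST_BYTES = 4
COMPRESSED_INST_BYTES = 2


def is_compressed(halfword: int) -> bool:
    """RISC-V: low two bits != 0b11 means a 16-bit (C extension) instruction."""
    return (halfword & 0b11) != 0b11


def branching_fixed_size(pc: int, npc: int) -> bool:
    """Assumed faulty check: treats every instruction as 4 bytes long."""
    return npc != pc + FULL_INST_BYTES


def branching_size_aware(pc: int, npc: int, first_halfword: int) -> bool:
    """Compare against the actual length of the instruction at pc."""
    if is_compressed(first_halfword):
        size = COMPRESSED_INST_BYTES
    else:
        size = FULL_INST_BYTES
    return npc != pc + size


# (pc, npc, encoding) taken from the trace and objdump listing in this message.
trace = [(0x101a8, 0x101aa, 0x952a),   # add a0,a0,a0 (compressed)
         (0x101aa, 0x101ac, 0x9632),   # add a2,a2,a2
         (0x101ac, 0x101ae, 0x96b6),   # add a3,a3,a3
         (0x101ae, 0x101b0, 0x973a)]   # add a4,a4,a4

if __name__ == "__main__":
    for pc, npc, enc in trace:
        print(hex(pc), branching_fixed_size(pc, npc),
              branching_size_aware(pc, npc, enc))
```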

I'm running a kernel that looks like this (output of
riscv64-unknown-elf-objdump -D):

0001019a :
   1019a:   06400793    li      a5,100
   1019e:   4701        li      a4,0
   101a0:   4681        li      a3,0
   101a2:   4601        li      a2,0
   101a4:   0c800513    li      a0,200
   101a8:   952a        add     a0,a0,a0
   101aa:   9632        add     a2,a2,a2
   101ac:   96b6        add     a3,a3,a3
   101ae:   973a        add     a4,a4,a4
   101b0:   952a        add     a0,a0,a0
   101b2:   9632        add     a2,a2,a2
   101b4:   96b6        add     a3,a3,a3
   101b6:   973a        add     a4,a4,a4
   (the four instructions above repeat until this point:)
   104b8:   952a        add     a0,a0,a0
   104ba:   9632        add     a2,a2,a2
   104bc:   96b6        add     a3,a3,a3
   104be:   973a        add     a4,a4,a4
   104c0:   952a        add     a0,a0,a0
   104c2:   2501        sext.w  a0,a0
   104c4:   9632        add     a2,a2,a2
   104c6:   2601        sext.w  a2,a2
   104c8:   96b6        add     a3,a3,a3
   104ca:   2681        sext.w  a3,a3
   104cc:   973a        add     a4,a4,a4
   104ce:   2701        sext.w  a4,a4
   104d0:   37fd        addiw   a5,a5,-1
   104d2:   cc079be3    bnez    a5,101a8
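
For reference, the compressed adds in this listing use the RISC-V CR format (funct4 | rd/rs1 | rs2 | op). The encoding space funct4 = 0b1001, op = 0b10 is shared with c.jalr and c.ebreak. A decoder therefore has to look at rs2 and rd to tell a plain add from a jump: rs2 = 0 means control flow, and rs2 != 0 means an add. The small Python decoder below covers only this group and was written from the RVC encoding rules. decode_cr_1001 is a hypothetical helper, not gem5's decoder.

```python
ABI_NAMES = ["zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2",
             "s0", "s1", "a0", "a1", "a2", "a3", "a4", "a5",
             "a6", "a7", "s2", "s3", "s4", "s5", "s6", "s7",
             "s8", "s9", "s10", "s11", "t3", "t4", "t5", "t6"]


def decode_cr_1001(inst: int) -> str:
    """Decode a 16-bit RVC instruction with funct4 = 0b1001 and op = 0b10."""
    op = inst & 0b11
    rs2 = (inst >> 2) & 0x1F
    rd = (inst >> 7) & 0x1F
    funct4 = (inst >> 12) & 0xF
    if op != 0b10 or funct4 != 0b1001:
        raise ValueError(f"{inst:#06x} is not in the funct4=1001, op=10 group")
    if rs2 == 0:
        # rs2 == 0 selects a control-flow instruction, not an add
        return "c.ebreak" if rd == 0 else f"c.jalr {ABI_NAMES[rd]}"
    if rd == 0:
        return "hint"            # c.add with rd = x0 is a HINT encoding
    r = ABI_NAMES[rd]
    return f"c.add {r}, {r}, {ABI_NAMES[rs2]}"


if __name__ == "__main__":
    for enc in (0x952a, 0x9632, 0x96b6, 0x973a, 0x9082, 0x9002):
        print(f"{enc:#06x}", decode_cr_1001(enc))
```

All four encodings in the loop body (952a, 9632, 96b6, 973a) have rs2 != 0 and rd != 0, so under these rules they decode as plain adds, not as jumps.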

And this is what the Fetch stage trace looks like when fetching this code block:

4048968: system.cpu.fetch: [tid:0] Waking up from cache miss.
4048968: system.cpu.fetch: Running stage.
4048968: system.cpu.fetch: Attempting to fetch from [tid:0]
4048968: system.cpu.fetch: [tid:0]: Icache miss is complete.
4048968: system.cpu.fetch: [tid:0]: Adding instructions to queue to decode.
4048968: system.cpu.fetch: [tid:0]: Instruction PC 0x101a8 (0) created [sn:8124].
4048968: system.cpu.fetch: [tid:0]: Instruction is: c_add a0, a0, a0
4048968: system.cpu.fetch: [tid:0]: Fetch queue entry created (1/256).
*4048968: system.cpu.fetch: Branch detected with PC = (0x101a8=>0x101aa).(0=>1)*
4048968: system.cpu.fetch: [tid:0]: Done fetching, predicted branch instruction encountered.
4048968: system.cpu.fetch: [tid:0][sn:8124]: Sending instruction to decode from fetch queue. Fetch queue size: 1.
4049281: system.cpu.fetch: Running stage.
4049281: system.cpu.fetch: Attempting to fetch from [tid:0]
4049281: system.cpu.fetch: [tid:0]: Adding instructions to queue to decode.
4049281: system.cpu.fetch: [tid:0]: Instruction PC 0x101aa (0) created [sn:8125].
4049281: system.cpu.fetch: [tid:0]: Instruction is: c_add a2, a2, a2
4049281: system.cpu.fetch: [tid:0]: Fetch queue entry created (1/256).
*4049281: system.cpu.fetch: Branch detected with PC = (0x101aa=>0x101ac).(0=>1)*
4049281: system.cpu.fetch: [tid:0]: Done fetching, predicted branch instruction encountered.
4049281: system.cpu.fetch: [tid:0][sn:8125]: Sending instruction to decode from fetch queue. Fetch queue size: 1.
4049594: system.cpu.fetch: Running stage.
4049594: system.cpu.fetch: Attempting to fetch from [tid:0]
4049594: system.cpu.fetch: [tid:0]: Adding instructions to queue to decode.
4049594: system.cpu.fetch: [tid:0]: Instruction PC 0x101ac (0) created [sn:8126].
4049594: system.cpu.fetch: [tid:0]: Instruction is: c_add a3, a3, a3
4049594: system.cpu.fetch: [tid:0]: Fetch queue entry created (1/256).
*4049594: system.cpu.fetch: Branch detected with PC = (0x101ac=>0x101ae).(0=>1)*
4049594: system.cpu.fetch: [tid:0]: Done fetching, predicted branch instruction encountered.
4049594: system.cpu.fetch: [tid:0][sn:8126]: Sending instruction to decode from fetch queue. Fetch queue size: 1.
4049907: system.cpu.fetch: Running stage.
4049907: system.cpu.fetch: Attempting to fetch from [tid:0]
4049907: system.cpu.fetch: [tid:0]: Adding instructions to queue to decode.
4049907: system.cpu.fetch: [tid:0]: Instruction PC 0x101ae (0) created [sn:8127].
4049907: system.cpu.fetch: [tid:0]: Instruction is: c_add a4, a4, a4
4049907: system.cpu.fetch: [tid:0]: Fetch queue entry created (1/256).
*4049907: system.cpu.fetch: Branch detected with PC = (0x101ae=>0x101b0).(0=>1)*
4049907: system.cpu.fetch: [tid:0]: Done fetching, predicted branch instruction encountered.
4049907: system.cpu.fetch: [tid:0][sn:8127]: Sending instruction to decode from fetch queue. Fetch queue size: 1.
4050220: system.cpu.fetch: Running stage.
4050220: system.cpu.fetch: Attempting to fetch from [tid:0]
4050220: system.cpu.fetch: [tid:0]: Adding instructions to queue to decode.
4050220: