Re: [gem5-users] x86 floating point instruction
Hi Gabe, Jason,

Do those x86 SIMD SSE arithmetic instructions take only one cycle of latency? I looked into FuncUnitConfig.py and it seems the op latencies for the SIMD functional units are not defined, so I assumed they take the default value of 1. I am not really familiar with the x86 SIMD extensions, so maybe this question is more about the x86 ISA in general.

Thanks.

On Thu, May 24, 2018 at 9:52 AM, Jason Lowe-Power wrote:
> Hi Tariq,
>
> It would be great if you could review Gabe's patch on gerrit. Since it
> works for you, giving it a +1 or a +2 would be appropriate.
>
> Cheers,
> Jason
>
> On Wed, May 23, 2018 at 5:56 PM Tariq Azmy wrote:
>> Thanks Gabe. Yeah, it does not impact the program; it's just that the
>> statistic is incorrect.
>>
>> By the way, I applied the patch and the stats now show the correct
>> micro-op entries.
>>
>> Appreciate your help. Thanks again.
>>
>> On Wed, May 23, 2018 at 6:51 PM, Gabe Black wrote:
>>> Yep, those microops aren't given an operand class, so the isa parser
>>> is guessing and making them FloatAddOp. I haven't really tested this
>>> beyond making sure it compiles, but here's a patch that might get this
>>> working for you.
>>>
>>> https://gem5-review.googlesource.com/c/public/gem5/+/10541
>>>
>>> Gabe
>>>
>>> On Wed, May 23, 2018 at 4:13 PM, Gabe Black wrote:
>>>> I'm confident they aren't implemented with floating point add. It's
>>>> likely either that the microops are misclassified, or that they're
>>>> unimplemented and printing a warning, but the fact that they don't
>>>> actually do any math isn't impacting your program for whatever reason.
>>>> I'll take a quick look.
>>>>
>>>> Gabe
>>>>
>>>> On Wed, May 23, 2018 at 2:07 PM, Tariq Azmy wrote:
>>>>> Hi,
>>>>>
>>>>> I wrote simple code that does simple floating point multiplication
>>>>> and division, and in the assembly I can see MULSS and DIVSS
>>>>> instructions. But after I ran the simulation on gem5 and looked at
>>>>> stats.txt, I can only see entries in
>>>>> system.cpu.iq.FU_type_0::FloatAdd, whereas the entries in FloatMul
>>>>> and FloatDiv remain 0.
>>>>>
>>>>> If I understand correctly, these stats refer to the micro-ops. Does
>>>>> that mean the MULSS and DIVSS instructions are broken down and
>>>>> executed with floating point add?
>>>>>
>>>>> Thanks

___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
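The symptom discussed in this thread (every micro-op counted under FloatAdd, none under FloatMul or FloatDiv) can be checked quickly by pulling the FU_type_0 entries out of stats.txt. The following is only a minimal sketch: the column layout of the sample lines is an assumption about what stats.txt typically looks like, the counts are invented for illustration, and the helper names `fu_issue_counts` and `never_issued` are hypothetical.

```python
import re

# Hypothetical excerpt. The column layout is an assumption and the
# counts are made up purely for illustration.
SAMPLE_STATS = """\
system.cpu.iq.FU_type_0::FloatAdd   120   1.20%   1.20% # Type of FU issued
system.cpu.iq.FU_type_0::FloatMul   0   0.00%   1.20% # Type of FU issued
system.cpu.iq.FU_type_0::FloatDiv   0   0.00%   1.20% # Type of FU issued
"""

# "<stat prefix>::<op class>   <count> ..."
FU_STAT = re.compile(r"^(?P<prefix>\S+)::(?P<op_class>\w+)\s+(?P<count>\d+)")


def fu_issue_counts(stats_text, prefix="system.cpu.iq.FU_type_0"):
    """Return {op class: issue count} for stat lines under the given prefix."""
    counts = {}
    for line in stats_text.splitlines():
        match = FU_STAT.match(line)
        if match and match.group("prefix") == prefix:
            counts[match.group("op_class")] = int(match.group("count"))
    return counts


def never_issued(counts, expected):
    """List the op classes we expected to see that have a zero (or no) count."""
    return [op_class for op_class in expected if counts.get(op_class, 0) == 0]
```

For a program known to contain MULSS and DIVSS, `never_issued(fu_issue_counts(text), ["FloatMul", "FloatDiv"])` returning both names would point to the misclassification Gabe describes rather than to the program itself.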
Re: [gem5-users] Response for WritebackDirty packets (learning.gem5)
Dear Jason,

Thanks for the response. Just another quick question: what if memory is busy when you call sendTimingReq for a WritebackDirty packet? In the insert() function, when you call memPort.sendTimingReq for WritebackDirty blocks, you don't save them in blockedPacket in case memory is blocked and calls sendReqRetry() later.

Muhammad Ali Akhtar
Principal Design Engineer
http://www.linkedin.com/in/muhammadakhtar

On Tue, May 22, 2018 at 3:40 AM, Jason Lowe-Power wrote:
> Hello,
>
> No. You should not have a response for WritebackDirty. In fact, most
> (all?) writes do not have responses. See src/mem/packet.cc
> (https://gem5.googlesource.com/public/gem5/+/master/src/mem/packet.cc#80).
> Some commands have the "NeedsResponse" flag set. If so, the request will
> be turned into a response by whatever memory object fulfills the request
> (by calling pkt.makeResponse()).
>
> I hope this answers your question.
>
> Jason
>
> On Sat, May 19, 2018 at 11:38 PM Muhammad Ali Akhtar
> <muhammadali...@gmail.com> wrote:
>> Hello All,
>>
>> Following Jason's website, I created my own cache.
>>
>> On a cache miss, I send a timing request to memory and get the
>> response, which I handle in handleResponse.
>>
>> During handleResponse, if the insertion causes an eviction (the cache
>> was full), the insert function issues another memPort.sendTimingReq().
>> This time, the packet is a WritebackDirty. However, for this timing
>> request to memory (WritebackDirty), we don't get any response from
>> memory.
>>
>> My question is:
>>
>> Do we ever get a response from memory for packets of type
>> WritebackDirty? When I examine the simulator output, it seems that it
>> moves on to the next instructions without waiting for a response from
>> memory for this particular request.
>>
>> Muhammad Ali Akhtar
>> Principal Design Engineer
>> http://www.linkedin.com/in/muhammadakhtar
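The concern raised at the top of this thread is that a writeback sent while memory is busy is rejected and, if the sender keeps no copy, lost. The sketch below is a toy Python model of that handshake, not gem5 code: `ToyMemory`, `ToyMemPort`, and their methods are hypothetical stand-ins that only loosely mirror the idea of a timing request that can be refused and retried later.

```python
class ToyMemory:
    """Stand-in for a memory-side receiver that can refuse requests."""

    def __init__(self):
        self.busy = False
        self.received = []

    def recv_timing_req(self, pkt):
        # Returning False means "not accepted, try again after a retry".
        if self.busy:
            return False
        self.received.append(pkt)
        return True


class ToyMemPort:
    """Stand-in for the cache's memory-side port."""

    def __init__(self, memory):
        self.memory = memory
        self.blocked_packet = None  # packet waiting for a retry, if any

    def send_packet(self, pkt):
        # Only one outstanding blocked packet is supported in this sketch.
        assert self.blocked_packet is None, "already holding a blocked packet"
        if not self.memory.recv_timing_req(pkt):
            # Keep the packet so it is not lost while memory is busy.
            self.blocked_packet = pkt

    def recv_req_retry(self):
        # Memory signalled that it can accept requests again.
        assert self.blocked_packet is not None, "retry without a blocked packet"
        pkt, self.blocked_packet = self.blocked_packet, None
        self.send_packet(pkt)
```

Routing the writeback through a helper like `send_packet` instead of calling the raw send directly means a refused WritebackDirty is held and resent on retry, the same way a refused miss request would be. No response is awaited in either case, which matches Jason's answer above.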
Re: [gem5-users] RISCV ISA : "C" (compressed) extension supported?
Hi Jason, Alec,

Just to provide some feedback on this issue, it seems that the processor is mistakenly identifying (add reg, reg, reg) in compressed format as a branch instruction. I'm running a kernel that looks like this (result from *riscv64-unknown-elf-objdump -D*)

0001019a :
  1019a:  06400793  li a5,100
  1019e:  4701      li a4,0
  101a0:  4681      li a3,0
  101a2:  4601      li a2,0
  101a4:  0c800513  li a0,200
  101a8:  952a      add a0,a0,a0
  101aa:  9632      add a2,a2,a2
  101ac:  96b6      add a3,a3,a3
  101ae:  973a      add a4,a4,a4
* 101b0:  952a      add a0,a0,a0
  101b2:  9632      add a2,a2,a2
  101b4:  96b6      add a3,a3,a3
  101b6:  973a      add a4,a4,a4*
(repeat the four instructions above until this:)
  104b8:  952a      add a0,a0,a0
  104ba:  9632      add a2,a2,a2
  104bc:  96b6      add a3,a3,a3
  104be:  973a      add a4,a4,a4
  104c0:  952a      add a0,a0,a0
  104c2:  2501      sext.w a0,a0
  104c4:  9632      add a2,a2,a2
  104c6:  2601      sext.w a2,a2
  104c8:  96b6      add a3,a3,a3
  104ca:  2681      sext.w a3,a3
  104cc:  973a      add a4,a4,a4
  104ce:  2701      sext.w a4,a4
  104d0:  37fd      addiw a5,a5,-1
  104d2:  cc079be3  bnez a5,101a8

And what the Fetch stage looks like when fetching this code block is this:

4048968: system.cpu.fetch: [tid:0] Waking up from cache miss.
4048968: system.cpu.fetch: Running stage.
4048968: system.cpu.fetch: Attempting to fetch from [tid:0]
4048968: system.cpu.fetch: [tid:0]: Icache miss is complete.
4048968: system.cpu.fetch: [tid:0]: Adding instructions to queue to decode.
4048968: system.cpu.fetch: [tid:0]: Instruction PC 0x101a8 (0) created [sn:8124].
4048968: system.cpu.fetch: [tid:0]: Instruction is: c_add a0, a0, a0
4048968: system.cpu.fetch: [tid:0]: Fetch queue entry created (1/256).
*4048968: system.cpu.fetch: Branch detected with PC = (0x101a8=>0x101aa).(0=>1)*
4048968: system.cpu.fetch: [tid:0]: Done fetching, predicted branch instruction encountered.
4048968: system.cpu.fetch: [tid:0][sn:8124]: Sending instruction to decode from fetch queue. Fetch queue size: 1.
4049281: system.cpu.fetch: Running stage.
4049281: system.cpu.fetch: Attempting to fetch from [tid:0]
4049281: system.cpu.fetch: [tid:0]: Adding instructions to queue to decode.
4049281: system.cpu.fetch: [tid:0]: Instruction PC 0x101aa (0) created [sn:8125].
4049281: system.cpu.fetch: [tid:0]: Instruction is: c_add a2, a2, a2
4049281: system.cpu.fetch: [tid:0]: Fetch queue entry created (1/256).
*4049281: system.cpu.fetch: Branch detected with PC = (0x101aa=>0x101ac).(0=>1)*
4049281: system.cpu.fetch: [tid:0]: Done fetching, predicted branch instruction encountered.
4049281: system.cpu.fetch: [tid:0][sn:8125]: Sending instruction to decode from fetch queue. Fetch queue size: 1.
4049594: system.cpu.fetch: Running stage.
4049594: system.cpu.fetch: Attempting to fetch from [tid:0]
4049594: system.cpu.fetch: [tid:0]: Adding instructions to queue to decode.
4049594: system.cpu.fetch: [tid:0]: Instruction PC 0x101ac (0) created [sn:8126].
4049594: system.cpu.fetch: [tid:0]: Instruction is: c_add a3, a3, a3
4049594: system.cpu.fetch: [tid:0]: Fetch queue entry created (1/256).
*4049594: system.cpu.fetch: Branch detected with PC = (0x101ac=>0x101ae).(0=>1)*
4049594: system.cpu.fetch: [tid:0]: Done fetching, predicted branch instruction encountered.
4049594: system.cpu.fetch: [tid:0][sn:8126]: Sending instruction to decode from fetch queue. Fetch queue size: 1.
4049907: system.cpu.fetch: Running stage.
4049907: system.cpu.fetch: Attempting to fetch from [tid:0]
4049907: system.cpu.fetch: [tid:0]: Adding instructions to queue to decode.
4049907: system.cpu.fetch: [tid:0]: Instruction PC 0x101ae (0) created [sn:8127].
4049907: system.cpu.fetch: [tid:0]: Instruction is: c_add a4, a4, a4
4049907: system.cpu.fetch: [tid:0]: Fetch queue entry created (1/256).
*4049907: system.cpu.fetch: Branch detected with PC = (0x101ae=>0x101b0).(0=>1)*
4049907: system.cpu.fetch: [tid:0]: Done fetching, predicted branch instruction encountered.
4049907: system.cpu.fetch: [tid:0][sn:8127]: Sending instruction to decode from fetch queue. Fetch queue size: 1.
4050220: system.cpu.fetch: Running stage.
4050220: system.cpu.fetch: Attempting to fetch from [tid:0]
4050220: system.cpu.fetch: [tid:0]: Adding instructions to queue to decode.
4050220: system.cpu.fetch: [tid:0]: I