Re: [gem5-users] Micro-op Data Dependency

2016-08-02 Thread Steve Reinhardt
I don't know that off the top of my head---the ISAs I'm familiar with are
either not microcoded, or use a micro-op assembler to generate all the
micro-ops (i.e., x86).  Have you looked at how ARM micro-ops are
constructed?  That's the one ISA that I believe is mostly not microcoded
but still has some microcode in it.

Though come to think of it, it may be as easy as just using a constant
where the other operands specify the machine code bitfield, if there's
syntax that allows that.

Steve
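
[Editorial sketch, not part of the original message.] Operand entries in a gem5 ISA description are Python tuples, so the suggestion above would put a constant in the slot where the other operands name an instruction bitfield. The block below only mirrors that layout in plain Python. The names NUM_ENCODABLE_REGS and TEMP_REG_INDEX, the sort priorities for Rd/Rs1/Rs2, and the idea that a bare numeric string is accepted in the index slot are all assumptions; whether the ISA parser allows that syntax is exactly what the message leaves open.

```python
# Hypothetical operand table in the style of a gem5 ISA description.
# Each entry: (operand class, default type, register index expression,
#              flags, sort priority).
NUM_ENCODABLE_REGS = 32               # assumed: r0..r31 reachable from instruction bits
TEMP_REG_INDEX = NUM_ENCODABLE_REGS   # assumed: r32 is the microcode-only temporary

operands = {
    # These three take their index from an instruction bitfield.
    'Rd':  ('IntReg', 'ud', 'RD',  'IsInteger', 1),
    'Rs1': ('IntReg', 'ud', 'RS1', 'IsInteger', 2),
    'Rs2': ('IntReg', 'ud', 'RS2', 'IsInteger', 3),
    # Constant index instead of a bitfield name (assumed syntax).
    'Rt':  ('IntReg', 'ud', str(TEMP_REG_INDEX), 'IsInteger', 4),
}


def uses_constant_index(entry):
    """Return True if the index slot holds a literal number, not a bitfield name."""
    return entry[2].isdigit()


def is_encodable(entry):
    """Return True if a constant index could also be named by instruction bits."""
    return uses_constant_index(entry) and int(entry[2]) < NUM_ENCODABLE_REGS
```

With these assumed values, 'Rt' has a constant index that no machine instruction can encode, which is the property the temporary register needs.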


On Tue, Aug 2, 2016 at 1:54 PM Alec Roelke  wrote:

> Okay, thanks.  How do I tell the ISA parser that the 'Rt' operand I've
> created refers to the extra architectural register?  Or is there some
> function I can call inside the instruction's code that writes directly to
> an architectural register?  All I can see from the code GEM5 generates is
> "setIntRegOperand," which takes indices into _destRegIdx rather than
> register indices.
>
> On Mon, Aug 1, 2016 at 10:58 AM, Steve Reinhardt  wrote:
>
>> You don't need to worry about the size of the bitfield in the instruction
>> encoding, because the temporary register(s) will never be directly
>> addressed by any machine instruction.  You should define a new
>> architectural register using an index that doesn't appear in any
>> instruction (e.g., if the ISA includes r0 to r31, then the temp reg can be
>> r32). This register will get renamed in the O3 model.
>>
>> Steve
>>
>>
>> On Sun, Jul 31, 2016 at 7:21 AM Alec Roelke  wrote:
>>
>>> That makes sense.  Would it be enough for me to just create a new IntReg
>>> operand, like this:
>>>
>>> 'Rt': ('IntReg', 'ud', None, 'IsInteger', 4)
>>>
>>> and then increase the number of integer registers?  The other integer
>>> operands have a bit field from the instruction bits, but since the ISA
>>> doesn't specify that these RMW instructions should be microcoded, there's
>>> no way to decode a temporary register from the instruction bits.  Will GEM5
>>> understand that and pick any integer register that's available?
>>>
>>> The memory address is taken from Rs1 before the load micro-op, and then
>>> stored in a C++ variable for the remainder of the instruction.  That was
>>> done to ensure that other intervening instructions that might get executed
>>> in the O3 model don't change Rs1 between the load and modify-write
>>> micro-ops, but if I can get the temp register to work then that might fix
>>> itself.
>>>
>>> I was only setting _srcRegIdx and _destRegIdx for disassembly reasons;
>>> since the macro-op and first micro-op don't make use of Rs2, the
>>> instruction wasn't setting _srcRegIdx[1] and the disassembly would show
>>> something like 4294967295.  Then it presented a potential solution to the
>>> minor CPU model problem I described before.
>>>
>>> No, most of the ISA is not microcoded.  In fact, as I said, these RMW
>>> instructions are not specified to be microcoded by the ISA, but since they
>>> each have two memory transactions they didn't appear to work unless I split
>>> them into two micro-ops.
>>>
>>> On Sat, Jul 30, 2016 at 2:14 PM, Steve Reinhardt 
>>> wrote:
>>>
>>>> You shouldn't be passing values between micro-ops using C++ variables,
>>>> you should pass the data in a register.  (If necessary, create
>>>> microcode-only temporary registers for this purpose, like x86 does.) This
>>>> is microarchitectural state so you can't hide it from the CPU model.  The
>>>> main problem here is that, since this "hidden" data dependency isn't
>>>> visible to the CPU model, it doesn't know that the micro-ops must be
>>>> executed in order.  If you pass that data in a register, the pipeline model
>>>> will enforce the dependency.
>>>>
>>>> Also, where do you set the address for the memory accesses?  Again,
>>>> both micro-ops should read that out of a register, it should not be passed
>>>> implicitly via hidden variables.
>>>>
>>>> You shouldn't have to explicitly set the internal fields like
>>>> _srcRegIdx and _destRegIdx, the ISA parser should do that for you.
>>>>
>>>> Unfortunately the ISA description system wasn't originally designed to
>>>> support microcode, and that support was kind of shoehorned in after the
>>>> fact, so it is a little messy.  Is your whole ISA microcoded, or just a few
>>>> specific instructions?
>>>>
>>>> Steve
>>>>
>>>>
>>>> On Fri, Jul 29, 2016 at 7:37 PM Alec Roelke  wrote:
>>>>
>>>>> Sure, I can show some code snippets.  First, here is the code for the
>>>>> read micro-op for an atomic read-add-write:
>>>>>
>>>>> temp = Mem_sd;
>>>>>
>>>>> And the modify-write micro-op:
>>>>>
>>>>> Rd_sd = temp;
>>>>> Mem_sd = Rs2_sd + temp;
>>>>>
>>>>> The memory address comes from Rs1.  The variable "temp" is a temporary
>>>>> location shared between the read and modify-write micro-ops (the address
>>>>> from Rs1 is shared similarly to ensure it's the same when the instructions
>>>>> are issued).
>>>>>
>>>>> In the

Re: [gem5-users] Micro-op Data Dependency

2016-08-02 Thread Alec Roelke
Okay, thanks.  How do I tell the ISA parser that the 'Rt' operand I've
created refers to the extra architectural register?  Or is there some
function I can call inside the instruction's code that writes directly to
an architectural register?  All I can see from the code GEM5 generates is
"setIntRegOperand," which takes indices into _destRegIdx rather than
register indices.
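
[Editorial sketch, not part of the original message.] Whichever syntax ends up binding 'Rt', the reason for routing the value through a register is the one given in the reply quoted further down: a CPU model can only order micro-ops by the registers they declare. The toy model below is a deliberate simplification that checks read-after-write dependencies only; MicroOp and must_wait_for are hypothetical names, not gem5 classes or functions.

```python
from collections import namedtuple

# A micro-op as a CPU model sees it: only declared registers are visible.
MicroOp = namedtuple('MicroOp', ['name', 'srcs', 'dests'])


def must_wait_for(later, earlier):
    """Return True if `later` reads a register that `earlier` writes (RAW)."""
    return bool(set(earlier.dests) & set(later.srcs))


# Value passed through a hidden C++ variable: no register links the two,
# so nothing tells the model that the load has to execute first.
load_hidden = MicroOp('load', srcs=('Rs1',), dests=())
store_hidden = MicroOp('modify_write', srcs=('Rs2',), dests=('Rd',))

# Value passed through the temporary register Rt: the load writes Rt and
# the modify-write reads it, so the dependency is visible.
load_rt = MicroOp('load', srcs=('Rs1',), dests=('Rt',))
store_rt = MicroOp('modify_write', srcs=('Rs1', 'Rs2', 'Rt'), dests=('Rd',))

print(must_wait_for(store_hidden, load_hidden))  # False
print(must_wait_for(store_rt, load_rt))          # True
```

In the second pair the modify-write micro-op also reads Rs1 itself, so the address no longer needs to be carried in a shared variable either.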

On Mon, Aug 1, 2016 at 10:58 AM, Steve Reinhardt  wrote:

> You don't need to worry about the size of the bitfield in the instruction
> encoding, because the temporary register(s) will never be directly
> addressed by any machine instruction.  You should define a new
> architectural register using an index that doesn't appear in any
> instruction (e.g., if the ISA includes r0 to r31, then the temp reg can be
> r32). This register will get renamed in the O3 model.
>
> Steve
>
>
> On Sun, Jul 31, 2016 at 7:21 AM Alec Roelke  wrote:
>
>> That makes sense.  Would it be enough for me to just create a new IntReg
>> operand, like this:
>>
>> 'Rt': ('IntReg', 'ud', None, 'IsInteger', 4)
>>
>> and then increase the number of integer registers?  The other integer
>> operands have a bit field from the instruction bits, but since the ISA
>> doesn't specify that these RMW instructions should be microcoded, there's
>> no way to decode a temporary register from the instruction bits.  Will GEM5
>> understand that and pick any integer register that's available?
>>
>> The memory address is taken from Rs1 before the load micro-op, and then
>> stored in a C++ variable for the remainder of the instruction.  That was
>> done to ensure that other intervening instructions that might get executed
>> in the O3 model don't change Rs1 between the load and modify-write
>> micro-ops, but if I can get the temp register to work then that might fix
>> itself.
>>
>> I was only setting _srcRegIdx and _destRegIdx for disassembly reasons;
>> since the macro-op and first micro-op don't make use of Rs2, the
>> instruction wasn't setting _srcRegIdx[1] and the disassembly would show
>> something like 4294967295.  Then it presented a potential solution to the
>> minor CPU model problem I described before.
>>
>> No, most of the ISA is not microcoded.  In fact, as I said, these RMW
>> instructions are not specified to be microcoded by the ISA, but since they
>> each have two memory transactions they didn't appear to work unless I split
>> them into two micro-ops.
>>
>> On Sat, Jul 30, 2016 at 2:14 PM, Steve Reinhardt 
>> wrote:
>>
>>> You shouldn't be passing values between micro-ops using C++ variables,
>>> you should pass the data in a register.  (If necessary, create
>>> microcode-only temporary registers for this purpose, like x86 does.) This
>>> is microarchitectural state so you can't hide it from the CPU model.  The
>>> main problem here is that, since this "hidden" data dependency isn't
>>> visible to the CPU model, it doesn't know that the micro-ops must be
>>> executed in order.  If you pass that data in a register, the pipeline model
>>> will enforce the dependency.
>>>
>>> Also, where do you set the address for the memory accesses?  Again, both
>>> micro-ops should read that out of a register, it should not be passed
>>> implicitly via hidden variables.
>>>
>>> You shouldn't have to explicitly set the internal fields like _srcRegIdx
>>> and _destRegIdx, the ISA parser should do that for you.
>>>
>>> Unfortunately the ISA description system wasn't originally designed to
>>> support microcode, and that support was kind of shoehorned in after the
>>> fact, so it is a little messy.  Is your whole ISA microcoded, or just a few
>>> specific instructions?
>>>
>>> Steve
>>>
>>>
>>> On Fri, Jul 29, 2016 at 7:37 PM Alec Roelke  wrote:
>>>
>>>> Sure, I can show some code snippets.  First, here is the code for the
>>>> read micro-op for an atomic read-add-write:
>>>>
>>>> temp = Mem_sd;
>>>>
>>>> And the modify-write micro-op:
>>>>
>>>> Rd_sd = temp;
>>>> Mem_sd = Rs2_sd + temp;
>>>>
>>>> The memory address comes from Rs1.  The variable "temp" is a temporary
>>>> location shared between the read and modify-write micro-ops (the address
>>>> from Rs1 is shared similarly to ensure it's the same when the instructions
>>>> are issued).
>>>>
>>>> In the constructor for the macro-op, I've included some code that
>>>> explicitly sets the src and dest register indices so that they are
>>>> displayed properly for execution traces:
>>>>
>>>> _numSrcRegs = 2;
>>>> _srcRegIdx[0] = RS1;
>>>> _srcRegIdx[1] = RS2;
>>>> _numDestRegs = 1;
>>>> _destRegIdx[0] = RD;
>>>>
>>>> So far, this works for the O3 model.  But, in the minor model, it tries
>>>> to execute the modify-write micro-op before the read micro-op is executed.
>>>> The address is never loaded from Rs1, and so a segmentation fault often
>>>> occurs.  To try to fix it, I added this code to the constructors of each of
>>>> the two micro-ops:
>>>>
>>>>

Re: [gem5-users] Physmem in SE Mode (Jason Lowe-Power)

2016-08-02 Thread Zaman, Monir
Hello Jason,
Thanks for your reply. I have a few follow-up questions:
- I looked at the Status Matrix (http://www.m5sim.org/Status_Matrix), and it
says the memory system for SPARC does not work with the InOrder (or MinorCPU)
CPU. In that case, how can I run anything using the MinorCPU for SPARC? I
tried compiling SPARC with the MinorCPU option, but it didn’t work when I
tried to run the hello world program; it reported that the option was
unavailable.
- Next, I looked at the “To Do List” for the InOrder CPU, which says SPARC is
partially implemented. What does that mean?
- For threading in SPARC, is it the same as running multiple workloads on
multiple CPUs in SE mode? I have not yet seen any SPARC implementation in
GEM5 using the InOrder detailed CPU. Apart from the timing of the pipeline
stages, would the results be functionally correct if I ran SPARC with the
“timing” model of the “SimpleCPU” and calculated the power using McPAT?
- To assign multiple workloads in SE mode, I tried simply increasing “np” to
2 and running 2 “hello_world” binaries. In the stats file, I see data only
for “cpu0”; “cpu1” seems to be idle (all 0).
- Where should I look for a “Detailed MinorCPU” implementation for SPARC?

/Monir

Re: [gem5-users] Making a C application work on multiple cores with gem5

2016-08-02 Thread anoir nechi
Hi Jason,

No, I don't mean using the cores simultaneously. I read about what gem5
supports, so I will try, for example, 4 x86 cores and then 4 ARM cores.

By the way, do you have any information about ARM SIMD instructions?

Thank you

On Tue, Aug 2, 2016 at 3:28 PM, Jason Lowe-Power 
wrote:

> Hello,
>
> If you're using full system mode (FS mode), you can use pthreads or any
> other threading library just like on a real machine. If you're using
> syscall emulation (SE) mode, then you can use the m5threads library which
> is a pthreads-like library (http://repo.gem5.org/m5threads/).
>
> If I've misunderstood your question and you want to try to use x86 and ARM
> cores simultaneously... that currently isn't supported by gem5.
>
> Jason
>
> On Tue, Aug 2, 2016 at 4:18 AM anoir nechi  wrote:
>
>> hello
>>
>> I am new with gem5 simulator. I have a C application that i want to make
>> it run faster. So the first thing I've done is to optimize it using several
>> techniques like loop unrolling and SIMD. And the next step, i intend to
>> make it work on *multiple cores* (*X86* and *ARM*) for that i must use
>> the gem5 simulator.
>>
>> The application is for Radix4 computing. For now I've succeeded to make
>> it work on one core systems for *X86* and *ARM* but, now i want to make
>> it work on 4, 16, ... cores X86 or ARM.
>>
>> could someone give me some hints or show me the right way to do this?
>> Thank you
>>
>> ___
>> gem5-users mailing list
>> gem5-users@gem5.org
>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
>
>



-- 
*Anouar NECHI*


*IT Engineer: Industrial Systems*
*Higher Institute of Computer Science*
*Tunis - El Manar University*
*Phone :* *(+216) 50 311 536*
*E-mail :* *anoirne...@gmail.com *

Re: [gem5-users] Physmem in SE Mode

2016-08-02 Thread Jason Lowe-Power
Hi Monir,

The AbstractMemory class (along with the System class) implements the
physical memory of the system. When configuring gem5, if you instantiate a
memory object (e.g., a DRAMCtrl such as DDR3_1600_x64), this object will
register the physical memory with the System object. Then, with the
DRAMCtrl, you can configure both the size and the location in the address
space of the physical memory.
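
[Editorial sketch, not part of the original message.] A minimal configuration fragment along these lines, assuming the object and port names used by the gem5 tutorial scripts of this period (later releases renamed some of them). It is a fragment only: a CPU, clock domain, workload and Root object are still needed before anything can be simulated, and it has to be run by a gem5 binary rather than a plain Python interpreter.

```python
# Configuration fragment: run it through a gem5 binary, not standalone Python.
from m5.objects import *

system = System()
system.membus = SystemXBar()

# Size of physical memory. A one-argument AddrRange starts at address 0,
# so this places 512MB at the bottom of the address space.
system.mem_ranges = [AddrRange('512MB')]

# The DRAM controller that backs this range. Its `range` parameter is what
# fixes both the size and the location of the memory it models.
system.mem_ctrl = DDR3_1600_x64()
system.mem_ctrl.range = system.mem_ranges[0]
system.mem_ctrl.port = system.membus.master
```

Statistics for the memory are then reported under the name the controller was given in the script (here `mem_ctrl`), which may be why no entry called "physmem" shows up.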

For configuring a system like a SPARC T1... There isn't anything out of the
box that will "just work". You'll have to dig into the CPU options
(probably the MinorCPU since the T1 was in-order) to see if you can enable
threading and configure it like the T1.

Jason

On Mon, Aug 1, 2016 at 10:31 AM Zaman, Monir 
wrote:

> Hello all,
>
> I was running the example/se.py script for my test cases, and I don’t see
> the “physmem” stats which mimics the DRAM. I do see a 512MB value for
> memory, but how do I setup the Physical Memory (Main Memory) in the setup?
>
> Also, how do I set up the Hardware threading to mimic the SPARC T1
> processor?
>
>
>
> Thanks
>
> Monir

Re: [gem5-users] Making a C application work on multiple cores with gem5

2016-08-02 Thread Jason Lowe-Power
Hello,

If you're using full system mode (FS mode), you can use pthreads or any
other threading library just like on a real machine. If you're using
syscall emulation (SE) mode, then you can use the m5threads library which
is a pthreads-like library (http://repo.gem5.org/m5threads/).

If I've misunderstood your question and you want to try to use x86 and ARM
cores simultaneously... that currently isn't supported by gem5.

Jason

On Tue, Aug 2, 2016 at 4:18 AM anoir nechi  wrote:

> hello
>
> I am new with gem5 simulator. I have a C application that i want to make
> it run faster. So the first thing I've done is to optimize it using several
> techniques like loop unrolling and SIMD. And the next step, i intend to
> make it work on *multiple cores* (*X86* and *ARM*) for that i must use
> the gem5 simulator.
>
> The application is for Radix4 computing. For now I've succeeded to make it
> work on one core systems for *X86* and *ARM* but, now i want to make it
> work on 4, 16, ... cores X86 or ARM.
>
> could someone give me some hints or show me the right way to do this?
> Thank you
>

[gem5-users] Virtual to Physical Address in ARMv8 FS Classic Memory

2016-08-02 Thread Vanchinathan Venkataramani
Hi all

I am currently running an application on 64 core ARMv8 FS with Classic
Memory with individual L1 D and I Cache and unified L2 cache.

On looking at the cache memory trace, two virtual addresses, one from
Kernel space (e.g. 0xffc071a63400) and one from application space (e.g.
0x915400) are mapped to the same physical address (e.g. 0xf1a63400)

The *kernel memory access* occurs first and ends as a *cache miss*.
However, the first access to the application memory address ends up as a
*cache hit*. I double checked with the cache trace and statistics to confirm
this.

One explanation is that these belong to two different threads and hence can
have the same physical address due to context switching. However, if that
is the case, access to the application address should end up as a miss
(which is not the case).

Any explanation is greatly appreciated. Thanks a lot in advance.
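
[Editorial sketch, not part of the original message.] One possible reading, offered as an assumption rather than a confirmed answer: if the cache is looked up by physical address, two virtual addresses that translate to the same physical address share a cache line, so the second access hits no matter which address space issued it. The toy model below uses the addresses from the message; LINE_BYTES, PhysicallyTaggedCache and the dictionary standing in for the page tables are hypothetical.

```python
LINE_BYTES = 64  # assumed cache line size


class PhysicallyTaggedCache:
    """Toy cache with unlimited capacity, indexed by physical line address."""

    def __init__(self):
        self.lines = set()

    def access(self, paddr):
        """Return True on a hit; on a miss, fill the line and return False."""
        line = paddr // LINE_BYTES
        hit = line in self.lines
        self.lines.add(line)
        return hit


# Stand-in for address translation: both virtual addresses from the trace
# map to the same physical address.
translate = {
    0xffc071a63400: 0xf1a63400,  # kernel-space virtual address
    0x915400:       0xf1a63400,  # application-space virtual address
}

cache = PhysicallyTaggedCache()
kernel_hit = cache.access(translate[0xffc071a63400])
app_hit = cache.access(translate[0x915400])

print(kernel_hit)  # False: first touch of the physical line
print(app_hit)     # True: same physical line, already filled
```

Under that assumption the observed miss-then-hit sequence needs no context switch to explain it; the kernel simply touched the physical page before the application did.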

[gem5-users] Making a C application work on multiple cores with gem5

2016-08-02 Thread anoir nechi
Hello,

I am new to the gem5 simulator. I have a C application that I want to make
run faster. The first thing I did was optimize it using several techniques
such as loop unrolling and SIMD. As the next step, I intend to make it work
on *multiple cores* (*X86* and *ARM*), and for that I must use the gem5
simulator.

The application is for Radix4 computing. So far I have succeeded in making
it work on single-core systems for *X86* and *ARM*, but now I want to make
it work on 4, 16, ... X86 or ARM cores.

Could someone give me some hints or show me the right way to do this? Thank
you.