[gem5-users] Avoiding L1 cache/ forcing L1 misses in my protocol

2017-07-24 Thread Hariharasudhan Venkataraman
Hi everyone,

I am using gem5 to simulate an architecture that takes some information from a
core and passes it into the Network Interface. I use a Mesh topology and the
two-level MESI protocol, with Ruby and Garnet for my memory and interconnect
modelling.

My need is to bypass the L1 cache for data accesses; all the data should live
in the shared L2 cache and the memory nodes. I want to use the L1 only for
instructions (I-cache).
Is there a way to do this without complicating the SLICC protocol much?
Please let me know if there is a method to do this.
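
A low-effort approximation, assuming gem5's stock example configs rather than
anything confirmed in this thread, is to shrink the L1 data cache to a single
line so that nearly every data access misses into the shared L2; a true bypass
would still need SLICC changes. A sketch of the relevant flags, collected in
Python:

# Hypothetical sketch for configs/example/se.py with --ruby and the
# MESI_Two_Level protocol; flag names from gem5's configs/common/Options.py.
opts = [
    "--ruby",
    "--l1d_size=64B",   # a single 64-byte line: L1D misses on ~every access
    "--l1d_assoc=1",
    "--l1i_size=32kB",  # keep a realistically sized instruction cache
    "--l2_size=4MB",
]
print(" ".join(opts))   # paste onto the se.py command line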

Thank you,

Hari
___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Re: [gem5-users] Garnet 2.0: Torus is being Deadlocked for 256 nodes with injection rate = 0.14

2017-07-24 Thread Krishna, Tushar
The lectures on NoC deadlocks on my website might help understand the problem:
http://tusharkrishna.ece.gatech.edu/wp-content/uploads/sites/175/2016/10/L05-Deadlocks-I.pdf
http://tusharkrishna.ece.gatech.edu/wp-content/uploads/sites/175/2016/10/L06-Deadlocks-II.pdf
http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/

By default any VC can be selected, as you rightly pointed out.
This means a cyclic dependence can form, leading to a deadlock.
To avoid it, one technique is to partition the VCs into two halves and require
all flits crossing a specific link (a "dateline") to switch from the first
half to the second half. Flits can cross from VC 0 to VC 1, but not from VC 1
to VC 0, thereby ensuring no cyclic dependence.
To implement this, you need to hack into the VC selection code.

[The same holds true in Garnet 1.0 as well - it will also deadlock with a
Torus.]

If you want to use a Torus topology, this is something that needs to be
implemented; it is not supported out of the box in garnet (yet).
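
A minimal sketch of that partitioning rule, written in Python purely to
illustrate the logic (the real change would go into garnet's C++ VC-selection
code; all names below are invented for the example):

# Dateline-style VC partitioning: VCs [0, k/2) form class 0 and
# VCs [k/2, k) form class 1. A packet starts in class 0 and must switch
# to class 1 when it crosses the designated link in its ring, so the
# VC dependence graph in each ring stays acyclic.
def allowed_vcs(num_vcs, crossed_dateline):
    half = num_vcs // 2
    if crossed_dateline:
        return list(range(half, num_vcs))  # class 1 only, after crossing
    return list(range(half))               # class 0 only, before crossing

# e.g. with 4 VCs per vnet: VCs [0, 1] before the dateline, [2, 3] after
assert allowed_vcs(4, False) == [0, 1]
assert allowed_vcs(4, True)  == [2, 3]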

Cheers,
Tushar



___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Re: [gem5-users] coherence state machine - 1 event per clock

2017-07-24 Thread Hagai David
Thanks Jason!

I see that this option is called ports and it does work.
Is there any other impact of this option on the implementation, or does it
only affect transitions_per_cycle?

About the example: in my case it should be split across at least 2 different
clocks, but I understand the cases where 2 transitions might occur in the same
cycle. I'll look into the resource-tracking example; this might be what I'm
looking for.

Thanks,

   Hagai

From: gem5-users [mailto:gem5-users-boun...@gem5.org] On Behalf Of Jason 
Lowe-Power
Sent: Monday, July 24, 2017 5:21 PM
To: gem5 users mailing list 
Subject: Re: [gem5-users] coherence state machine - 1 event per clock

Hi Hagai,

You can limit the transitions per cycle with the option "transitions_per_cycle"
on each controller.

Note that some of the transitions are "logical" and not what a real
implementation would do. This is why the default transitions per cycle is
higher. For instance, in your example, the 4 transitions are really just one
"real" transition, but use more logical transitions for simplicity of
implementation.

There is some support in Ruby for tracking resources by tagging each 
transition. See MOESI_AMD_Base-CorePair.sm for an example of how to do this.

Jason

___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

[gem5-users] Problem about PerfectSwitch

2017-07-24 Thread Jimmy Chao
I want to ask about PerfectSwitch. Is PerfectSwitch a router or an arbiter?

http://www.m5sim.org/Simple

And why does PerfectSwitch need to have a routing table?

I know PerfectSwitch uses round-robin scheduling to choose a virtual network
to serve, and that the switch looks at each message's destination.

But why does it need a priority number, and why does it reverse the priority
number? Is it to avoid starvation? But would round robin ever have a
starvation condition?

Why does it need a switch to control this? (Is it in order to pick a better,
shortest path to the destination?)
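
On the starvation point, a small Python illustration of rotating-priority
round robin (the general pattern, not PerfectSwitch's actual code): with a
fixed scan order, input 0 could starve the others; rotating the scan origin
each cycle guarantees every input periodically holds top priority.

# Round-robin arbitration with a rotating scan origin.
def arbitrate(requests, start):
    n = len(requests)
    for offset in range(n):
        idx = (start + offset) % n  # scan begins at `start` and wraps
        if requests[idx]:
            return idx              # grant the first requester found
    return None                     # nobody requested this cycle

# cycle 0 starts the scan at input 0, cycle 1 at input 1, and so on:
for cycle in range(3):
    print(arbitrate([True, True, True], start=cycle % 3))  # 0, 1, 2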

Best regards,

Jimmy
___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Re: [gem5-users] Garnet 2.0: Torus is being Deadlocked for 256 nodes with injection rate = 0.14

2017-07-24 Thread F. A. Faisal
Plus...

The default weight-based routing selects a free VC.
If so, then why do I need to do the VC partitioning you mentioned for the
Torus network?

*VC Selection (VS)*: The winner of SA selects a free VC (if HEAD/HEAD_TAIL
flit) from its output port.

I think this is a very important issue for all users of garnet 2.0.

I would like to solve this.

Thanks again.

Faisal



___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Re: [gem5-users] Garnet 2.0: Torus is being Deadlocked for 256 nodes with injection rate = 0.14

2017-07-24 Thread F. A. Faisal
Thanks a lot for the reply.

This is rather bad news for me.

However, as far as I know, garnet1.0 doesn't have the deadlock issue with
Torus.
Please let me know how I can implement a VC partitioning scheme. Is it
possible?

I can configure the routing algorithm with a particular channel selection,
but I have no idea how to do VC partitioning in gem5.

Please help me.

Thanks again.

Faisal


___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Re: [gem5-users] Garnet 2.0: Torus is being Deadlocked for 256 nodes with injection rate = 0.14

2017-07-24 Thread Krishna, Tushar
Hi Faisal,
The Torus topology deadlocks as it has rings in each dimension unless one 
implements a VC partitioning scheme or bubble flow control. That's why I 
removed torus from the default topologies provided by garnet2.0. If you 
implement torus, you will have to implement deadlock freedom.

Cheers,
Tushar


___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

[gem5-users] Garnet 2.0: Torus is being Deadlocked for 256 nodes with injection rate = 0.14

2017-07-24 Thread F. A. Faisal
Dear All,

I would like to run a synthetic traffic analysis for a 256-node Torus with
uniform traffic.
However, the network shows latency degradation beyond a 0.14 injection rate
(flit latency = 33.044985 for 0.14 and flit latency = 38.244770 for 0.13),
which could be a case of network deadlock.
I configured garnet 2.0 with all the default settings (4 VCs + bandwidth
factor 16), and the Mesh network performs properly. As the number of VCs is 4,
the Torus should not be in a deadlock.

I would also like to share the network file as an attachment.
Please consider the simulation command below:

./build/Garnet_standalone/gem5.debug configs/example/garnet_synth_traffic.py
--num-cpus=256 --num-dirs=256 --network=garnet2.0 --topology=Torus_XY
--mesh-rows=16 --sim-cycles=2 --synthetic=uniform_random
--injectionrate=0.14 --routing-algorithm=0 --vcs-per-vnet=4


Please let me know how to resolve this issue for Garnet 2.0.


Thanks and best regards,


F.A. Faisal
# Copyright (c) 2010 Advanced Micro Devices, Inc.
#   2016 Georgia Institute of Technology
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# Authors: Brad Beckmann
#  Tushar Krishna

from m5.params import *
from m5.objects import *

from BaseTopology import SimpleTopology

# Creates a generic Mesh assuming an equal number of cache
# and directory controllers.
# XY routing is enforced (using link weights)
# to guarantee deadlock freedom.

class Torus_XY(SimpleTopology):
    description='Torus_XY'

    def __init__(self, controllers):
        self.nodes = controllers

    # Makes a generic mesh
    # assuming an equal number of cache and directory cntrls

    def makeTopology(self, options, network, IntLink, ExtLink, Router):
        nodes = self.nodes

        num_routers = options.num_cpus
        num_rows = options.mesh_rows

        # default values for link latency and router latency.
        # Can be over-ridden on a per link/router basis
        link_latency = options.link_latency # used by simple and garnet
        router_latency = options.router_latency # only used by garnet

        # There must be an evenly divisible number of cntrls to routers
        # Also, obviously the number of rows must be <= the number of routers
        cntrls_per_router, remainder = divmod(len(nodes), num_routers)
        assert(num_rows > 0 and num_rows <= num_routers)
        num_columns = int(num_routers / num_rows)
        assert(num_columns * num_rows == num_routers)

        # Create the routers in the torus
        routers = [Router(router_id=i, latency = router_latency) \
            for i in range(num_routers)]
        network.routers = routers

        # link counter to set unique link ids
        link_count = 0

        # Add all but the remainder nodes to the list of nodes to be uniformly
        # distributed across the network.
        network_nodes = []
        remainder_nodes = []
        for node_index in xrange(len(nodes)):
            if node_index < (len(nodes) - remainder):
                network_nodes.append(nodes[node_index])
            else:
                remainder_nodes.append(nodes[node_index])

        # Connect each node to the appropriate router
        ext_links = []
        for (i, n) in enumerate(network_nodes):
            cntrl_level, router_id = divmod(i, num_routers)
            assert(cntrl_level < cntrls_per_router)
            ext_links.append(ExtLink(link_id=link_count, ext_node=n,
                                     int_node=routers[router_id],
[gem5-users] coherence state machine - 1 event per clock

2017-07-24 Thread Hagai David
Hello,

I'm a new gem5 user, simulating the MESI coherence protocol with two levels
of caches.
I see in the ProtocolTrace log file that several transitions for the same
address can occur in the same cycle.
See below:

7263   0   L2Cache   Unblock        MT_SB>SS   [0x400, line 0x400]
7263   0   L2Cache   L1_GETS        SS>SS      [0x400, line 0x400]
7263   0   L2Cache   L1_GETS        SS>SS      [0x400, line 0x400]
7263   0   L2Cache   L1_GET_INSTR   SS>SS      [0x400, line 0x400]

All four of these messages are accepted and influence the state machine (even
though the state appears to stay the same (SS>SS), the requestor is added to
the sharer list).
I was expecting only one event per cycle to impact the state machine.

What/where is the best way to define this restriction (one message per clock)
on all cache levels?

Thanks,

   Hagai
___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users