[gem5-users] gem5 takes a lot of time to load global data from program binary

2019-11-06 Thread Nitish Srivastava
Hi,

I am trying to run some benchmarks in gem5. These benchmarks have really
big matrices stored as global arrays. While simulating, gem5 seems to take
a lot of time just to load these global matrices, i.e. the gap between the
time when I start the gem5 simulation and the time when the "main" function
in the binary actually starts executing is huge. Is there a way I can speed
this up?

Thanks,
Nitish
___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Re: [gem5-users] Getting single cache line from multiple channels of HBM

2019-09-29 Thread Nitish Srivastava
Hi Chenfeng,

Sorry for the late response. If you look into src/mem/XBar.py, the width of
SystemXBar is set to 16 bytes. This means that if you are running the network
at 1 GHz, even though your HBM can support 128 GBps (with 8 channels), the
bandwidth through the crossbar is 16 bytes * 1 GHz = 16 GBps. There might be
multiple ways to solve this issue. One way is to just increase the SystemXBar
width from 16 bytes to 128 bytes. Then you should be able to achieve 128 GBps.
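
The arithmetic above can be checked with a short sketch. The helper name
peak_bandwidth_gbps and the variable names are hypothetical and are not gem5
APIs; the numbers are the ones quoted in this thread (8 channels at 16 GBps
each, a 1 GHz crossbar clock).

```python
def peak_bandwidth_gbps(width_bytes, clock_ghz):
    """Peak bandwidth in GB/s of a link that moves width_bytes per cycle."""
    return width_bytes * clock_ghz


# Figures taken from the thread: 8 HBM channels at 16 GBps each.
hbm_total = 8 * 16

# Crossbar at 1 GHz with the default 16-byte width and with a 128-byte width.
xbar_default = peak_bandwidth_gbps(16, 1.0)
xbar_wide = peak_bandwidth_gbps(128, 1.0)

# The achievable rate is bounded by the slower of memory and crossbar.
achievable_default = min(hbm_total, xbar_default)
achievable_wide = min(hbm_total, xbar_wide)

print("default width:", achievable_default, "GBps")
print("128-byte width:", achievable_wide, "GBps")
```

This only models the peak rate of each component; other limits in the
simulated system may keep the measured bandwidth lower.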

Let me know if this helps.

Thanks,
Nitish

On Tue, Sep 17, 2019 at 5:15 PM Zhao, Chenfeng wrote:

> Hi Nitish,
>
> I am also trying to use 8 channels of HBM to take advantage of the 128 GBps
> bandwidth. However, I can only get 16 GBps as well. I was wondering whether
> you have figured out how to solve this problem.
>
> Thanks a lot!
>
> Chenfeng Zhao

[gem5-users] Unable to compile and run splatt on gem5 x86

2019-06-11 Thread Nitish Srivastava
Hi,

I am trying to compile and run splatt (
https://github.com/ShadenSmith/splatt ) on gem5 for the x86 ISA. I link
splatt statically. I am able to run the generated binary natively; however,
I am not able to run it on gem5 (syscall emulation mode). Initially, gem5
complained about some vector instructions such as "pcmpeqq_Vdq_Wdq", which
are not implemented in gem5, so I disabled vectorization. Then gem5 started
to complain "malloc(): smallbin double linked list corrupted" at the places
where splatt uses "posix_memalign", so I replaced "posix_memalign" with
"malloc" to avoid this error. But now gem5 complains:

panic: Tried to read unmapped address 0x4942cc0.
PC: 0x15f20, Instr:   MOVSD_XMM_M : ldfp   %xmm0_low, DS:[8*rdx + rax]

I have set mem-size to 4GB in gem5, and this address definitely lies within
that range.

Is compiling code for gem5-x86 really that hard, or am I doing something
wrong? Any suggestions on how to get splatt compiled and running on gem5
would be helpful.

Thanks,
Nitish

[gem5-users] Getting single cache line from multiple channels of HBM

2019-04-02 Thread Nitish Srivastava
Hi,

I have a streaming application which reads two arrays A and B sequentially
from memory. I am trying to use the HBM memory model (HBM_1000_4H_1x128) in
gem5 with 8 channels (without Ruby). Since there are 8 channels and each
can provide 16 GBps, I want to achieve 8 x 16 = 128 GBps while reading a
single cache line from memory. I guess this is only possible when the lower
address bits are mapped to different channels? Of the three address
mappings “RoRaBaChCo”, “RoRaBaCoCh” and “RoCoRaBaCh” supported in gem5, only
“RoRaBaCoCh” and “RoCoRaBaCh” allow the channel bits to be at the LSB
positions. However, in dram_ctrl.cc, DRAMCtrl::init() has a check which
ensures that an entire cache line is mapped to a single channel, and this
prevents me from mapping a single cache line to multiple channels.

if (system()->cacheLineSize() > range.granularity()) {
    fatal("Channel interleaving of %s must be at least as large "
          "as the cache line size\n", name());
}
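
To illustrate what the check above implies, here is a minimal sketch of
channel interleaving. It assumes a simplified model in which consecutive
granularity-sized blocks rotate across channels; the function names are
hypothetical, and gem5's real address decoding may differ (for example, it
may hash in higher address bits).

```python
def channel_of(addr, granularity, num_channels):
    """Channel index when granularity-sized blocks rotate across channels."""
    return (addr // granularity) % num_channels


def channels_touched(line_addr, line_size, granularity, num_channels):
    """Set of channels that hold some byte of one cache line."""
    return {
        channel_of(a, granularity, num_channels)
        for a in range(line_addr, line_addr + line_size)
    }


# Granularity equal to the 64-byte line size: each line lives in one channel.
one_line = channels_touched(0, 64, 64, 8)

# Eight consecutive lines of a sequential stream still land on all 8 channels.
stream = {channel_of(i * 64, 64, 8) for i in range(8)}

# An 8-byte granularity (rejected by the check) would split one line 8 ways.
split_line = channels_touched(0, 64, 8, 8)

print(sorted(one_line), sorted(stream), sorted(split_line))
```

Under this model a sequential stream can still keep every channel busy with
line-sized interleaving, as long as enough requests are in flight at once.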

Is there a way I can achieve 128 GBps for this streaming application? Let
me know if you have suggestions. I tried running for different settings and
memory configurations but the achieved bandwidth is much lower than
128GBps. I also integrated the changes as mentioned in
https://www.mail-archive.com/gem5-users@gem5.org/msg16420.html but it
didn’t help either.

Thanks,
Nitish

[gem5-users] How to use multiple channels in gem5

2019-03-13 Thread Nitish Srivastava
Hi,

I am trying to use the HBM_1000_4H_1x64 DRAM model with 8 channels. I do this
by passing the "--mem-type=HBM_1000_4H_1x64 --mem-channels=8" parameters when
running the gem5 simulation. However, I am not getting any performance
improvement by increasing the number of channels. Do I need to do something
else to make use of these channels, such as some kind of address mapping
scheme for the stored data?

Please let me know if you have suggestions. Any help will be appreciated.

Thanks,
Nitish

[gem5-users] "srcRegRelativeLatency" for multiply unit

2018-02-09 Thread Nitish Srivastava
Hi everyone,

It seems like the default value of srcRegsRelativeLats is 0 for the
multiplier while the operation latency is 3 (see
https://github.com/gem5/gem5/blob/master/src/cpu/minor/MinorCPU.py ). Does
this mean that an instruction which depends on the destination register of
a multiply instruction can be issued in the very next clock cycle? Is this
a bug? How does this work?

class MinorDefaultIntMulFU(MinorFU):
    opClasses = minorMakeOpClassSet(['IntMult'])
    timings = [MinorFUTiming(description='Mul', srcRegsRelativeLats=[0])]
    opLat = 3

Thanks,
Nitish

[gem5-users] gem5 segfault after running whole benchmark

2018-02-01 Thread Nitish Srivastava
Hi everyone,

I am running gem5's Minor CPU (RISC-V port) with some of my own changes. The
program appears to run to completion on the Minor CPU, but then gem5
suddenly segfaults. The stack trace from gdb doesn't seem to be useful
either. Any suggestions on how to debug this segmentation fault?

C0:  0x0 |  ecall f56, f4, 0|  addi  |  addi  |
C0:  0x0 |  ecall f56, f4, 0|  ecall |  addi  |
C0:  0x0 |  ecall f56, f4, 0|  ecall |  addi  |
Active Contexts1
C0:  0x0 |  ecall f56, f4, 0|  ecall |  ecall |
Exiting @ tick 33538000 because exiting with last active thread context0

Program received signal SIGSEGV, Segmentation fault.
0x015f76ab in std::equal_to::operator() (this=0x29c0cd0, __x=@0x7fffd7f8: 0x3393b10, __y=)
    at /opt/rh/devtoolset-6/root/usr/include/c++/6.3.1/bits/stl_function.h:356
356       { return __x == __y; }
Missing separate debuginfos, use: debuginfo-install
keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-8.el7.x86_64
libcom_err-1.42.9-10.el7.x86_64 libgcc-4.8.5-16.el7_4.1.x86_64
libselinux-2.5-11.el7.x86_64 libstdc++-4.8.5-16.el7_4.1.x86_64
openssl-libs-1.0.2k-8.el7.x86_64 pcre-8.32-17.el7.x86_64
zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0  0x015f76ab in std::equal_to::operator() (this=0x29c0cd0, __x=@0x7fffd7f8: 0x3393b10, __y=)
    at /opt/rh/devtoolset-6/root/usr/include/c++/6.3.1/bits/stl_function.h:356
#1  0x015f1c98 in std::__detail::_Equal_helper, std::__detail::_Select1st, std::equal_to, unsigned long, false>::_S_equals (__eq=..., __extract=..., __k=@0x7fffd7f8: 0x3393b10, __n=0xfd56aee3)
    at /opt/rh/devtoolset-6/root/usr/include/c++/6.3.1/bits/hashtable_policy.h:1331
#2  0x015edb54 in std::__detail::_Hashtable_base, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Hashtable_traits >::_M_equals (this=0x29c0cd0, __k=@0x7fffd7f8: 0x3393b10, __c=54082320, __n=0xfd56aee3)
    at /opt/rh/devtoolset-6/root/usr/include/c++/6.3.1/bits/hashtable_policy.h:1702
#3  0x015f1b4d in std::_Hashtable, std::allocator, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits >::_M_find_before_node (this=0x29c0cd0, __n=1424, __k=@0x7fffd7f8: 0x3393b10, __code=54082320)
    at /opt/rh/devtoolset-6/root/usr/include/c++/6.3.1/bits/hashtable.h:1420
#4  0x015edaa4 in std::_Hashtable, std::allocator, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits >::_M_find_node (this=0x29c0cd0, __bkt=1424, __key=@0x7fffd7f8: 0x3393b10, __c=54082320)
    at /opt/rh/devtoolset-6/root/usr/include/c++/6.3.1/bits/hashtable.h:634
#5  0x015e9708 in std::_Hashtable, std::allocator, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits >::equal_range (this=0x29c0cd0, __k=@0x7fffd7f8: 0x3393b10)
    at /opt/rh/devtoolset-6/root/usr/include/c++/6.3.1/bits/hashtable.h:1358
#6  0x015e6d4d in std::unordered_multimap, std::equal_to, std::allocator >::equal_range (this=0x29c0cd0, __x=@0x7fffd7f8: 0x3393b10)
    at /opt/rh/devtoolset-6/root/usr/include/c++/6.3.1/bits/unordered_map.h:1611
#7  0x015e112f in pybind11::detail::pybind11_object_dealloc (self=0x7fffeaececf0)
    at ext/pybind11/include/pybind11/class_support.h:217
#8  0x01df2e09 in list_dealloc (op=0x7fffeb110b00) at ../Objects/listobject.c:309
#9  0x01e099d7 in insertdict_by_entry (value=0x28e1310 <_Py_NoneStruct>, ep=0x2bdfde0, hash=-4965403010368847346, key=0x7fffefe455a0, mp=0x7fffee8737f8)
    at ../Objects/dictobject.c:519
#10 insertdict (mp=0x7fffee8737f8, key=0x7fffefe455a0, hash=-4965403010368847346, value=0x28e1310 <_Py_NoneStruct>)
    at ../Objects/dictobject.c:556
#11 0x01e0ae24 in dict_set_item_by_hash_or_entry (ep=0x0, value=0x28e1310 <_Py_NoneStruct>, hash=, key=, op=0x7fffee8737f8)
    at ../Objects/dictobject.c:765
#12 PyDict_SetItem (op=op@entry=0x7fffee8737f8, key=, value=0x28e1310 <_Py_NoneStruct>)
    at ../Objects/dictobject.c:818
#13 0x01e0da0c in _PyModule_Clear (m=) at ../Objects/moduleobject.c:139
#14 0x01e769de in PyImport_Cleanup () at ../Python/import.c:530
#15 0x01e8b68e in Py_Finalize () at ../Python/pythonrun.c:458
#16 0x01e8b288 in Py_Exit (sts=sts@entry=1) at ../Python/pythonrun.c:1777
#17 0x013d6722 in handle_system_exit () at ../Python/pythonrun.c:1151
#18 0x01e8c87c in handle_system_exit () at ../Python/pythonrun.c:1192
#19

[gem5-users] RISCV port for gem5 [code works on spike not on gem5]

2017-09-25 Thread Nitish Srivastava
Hi everyone,

I was recently trying to compile a few benchmarks from
https://github.com/cornell-brg/xloops-bmarks using the RISC-V gcc compiler
and run them on gem5. For a few of the benchmarks gem5 showed unusual
behaviour, reporting a page fault when there should be none. I ran the
RISC-V binary on spike and it worked fine. Then I compiled the benchmark
natively on an x86 machine, ran it under valgrind, and didn't find any
issue. Is this a bug in the RISC-V port of gem5? I am compiling the
following benchmark.

//
// viterbi
//
// This application performs viterbi decoding on a frame of encoded data.
#include
#include
#include
#include

//
// viterbi dataset
//
// constraint length (number of memory registers)
const int K = 7;
// number of symbol bits per input bit
const int rate = 2;

// Dataset parameters
// size of data packet
const int framebits = 2048;
// generator polynomials
int polys[rate] = {121, 91};
// size of bitpacked data arrays
const int data_sz = framebits/8;
unsigned char data[((framebits + (K-1)) / 8) + 1];
unsigned char syms[rate*(framebits+(K-1))];
unsigned char ref[(framebits+(K-1))/8+1];

namespace viterbi {
namespace details {
// constraint length (number of memory registers)
// implementation requires K <= 8
const int K = 7;
// number of symbol bits per input bit
const int rate = 2;
// number of possible decoder states
const int STATES (1 << (K-1));
}}
namespace viterbi {
  using namespace details;

  // Quickly generate a parity bit
  // see http://graphics.stanford.edu/~seander/bithacks.html#ParityParallel

  //__attribute__ ((always_inline))
  int generate_symbol_bit_scalar(int x) {
    x ^= (x >> 16);
    x ^= (x >> 8);
    x ^= (x >> 4);
    x &= 0xf;
    return (0x6996 >> x) & 1;
  }

  // Viterbi Decoder: take encoded symbols and return the decoded msg
  void viterbi_scalar(unsigned char symbols[], unsigned char msg[],
  int poly[], int framebits) {

    // Branch Table stores the state transitions, we only need to build
    // half thanks to trellis symmetry
    unsigned int branch_table[STATES/2 * rate];

    unsigned int* branch_table_ptr = &branch_table[0];
    //
    // Build Branch Lookup Table
    //

    for (int state = 0; state < STATES/2; state++) {
      for (int i = 0; i < rate; i ++) {
        int bit = generate_symbol_bit_scalar(2*state & poly[i]);
        branch_table[i*STATES/2 + state] = bit ? 255 : 0;
      }
    }

    // Two buffers to store the accumulated error for each state/path
    unsigned int error1[STATES];
    unsigned int error2[STATES];
    // Pointers to track which accumulated error buffer is most current,
    // pointer targets are swapped after each frame bit
    unsigned int * old_error = error1;
    unsigned int * new_error = error2;
    // Record the minimum error path entering each state for all framebits
    int traces[framebits+(K-1)][STATES];

    // Bias the accumulated error buffer towards the known start state
    error1[0] = 0;
for(int i=1;i m1;
        decision1 = m2 > m3;

        // Save error for minimum transition
        new_error[2*i] =   decision0 ? m1 : m0;
        new_error[2*i+1] = decision1 ? m3 : m2;

        // Save transmission bit for minimum transition
        traces[s][2*i]   = decision0;
        traces[s][2*i+1] = decision1;
      }

      // Swap targets of old and new error buffers
      unsigned int * temp = old_error;
      old_error = new_error;
      new_error = temp;
    }

//
   

[gem5-users] RISCV tool-chain commit point used for Gem5+RISCV

2017-07-08 Thread Nitish Srivastava
Hi Gem5 users,

Can someone tell me which commit point and branch of the riscv-tools repo (
https://github.com/riscv/riscv-tools ) was used for the development of the
RISC-V port in gem5? I was trying to run a simple hello world program
compiled with the latest version of the compiler (commit point 7b1680 in
the riscv-tools repo):

#include <iostream>

int main( int argc, char* argv[] )
{
  std::cout << "Hello World, RISC-V!" << std::endl;
  return 0;
}

Compiling it and running it on spike works, but running it on gem5 fails:

% riscv64-unknown-elf-g++ hello.c -o hello
% spike pk hello
Hello World, RISC-V!
% cd gem5
% ./build/RISCV/gem5.opt ./configs/example/se.py -c hello

gem5 gives the following error:

warn: DRAM device capacity (8192 Mbytes) does not match the address
range assigned (512 Mbytes)
warn: Unknown operating system; assuming Linux.
0: system.remote_gdb.listener: listening for remote gdb #0 on port 7000
**** REAL SIMULATION ****
info: Entering event queue @ 0.  Starting simulation...
panic: Unknown instruction 0x41975985 at pc 0x0001036e
Memory Usage: 757624 KBytes
Program aborted at tick 0

The object dump for hello at pc 1036e shows:

0001036e <_start>:
   1036e:   000a4197    auipc   gp,0xa4
   10372:   73a18193    addi    gp,gp,1850 # b4aa8 <__global_pointer$>
   10376:   000a4517    auipc   a0,0xa4
   1037a:   19250513    addi    a0,a0,402 # b4508 <_edata>
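
One possible reading of the panic, offered as an assumption rather than a
confirmed diagnosis: the word in the panic message looks like a 4-byte
little-endian fetch that starts two bytes before _start, which would happen
if the fetch address were rounded down to a 4-byte boundary while the binary
uses 2-byte-aligned (compressed) instructions. The sketch below only checks
that arithmetic; the helper names are hypothetical.

```python
def halfwords(word):
    """Split a 32-bit little-endian word into (low, high) 16-bit halves."""
    return word & 0xFFFF, (word >> 16) & 0xFFFF


def is_compressed(halfword):
    """RISC-V encodes 16-bit instructions with low two bits other than 0b11."""
    return (halfword & 0b11) != 0b11


reported = 0x41975985   # word from the panic message
auipc = 0x000A4197      # encoding at 0x1036e from the object dump
pc = 0x1036E

low, high = halfwords(reported)
auipc_low, _ = halfwords(auipc)

print(hex(low), hex(high), hex(auipc_low))
print("pc is 4-byte aligned:", pc % 4 == 0)
print("high half matches start of auipc:", high == auipc_low)
print("low half is a 16-bit encoding:", is_compressed(low))
```

If this reading is right, the upper half of the reported word is the first
half of the auipc at _start and the lower half is a 16-bit instruction from
the preceding two bytes, so rebuilding the toolchain without the compressed
extension might be worth trying.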

Thanks,

Nitish