Re: [gem5-users] RISCV port for gem5 [code works on spike not on gem5]

2017-09-25 Thread Alec Roelke
Hi Nitish,

Yes, this is a bug with gem5, but I haven't been able to track down the
problem myself.  It could be due to some inaccuracy in setting up the
program's memory (in src/arch/riscv/process.cc), which is based on my
interpretation of the proxy kernel's code.
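
As a starting point for narrowing this down (an assumption on my part, not a confirmed diagnosis), it may help to know how much stack the benchmark's local arrays need, since that is memory the simulator has to set up for the process. The sketch below copies the constants from the quoted source. The helper names traces_elems and viterbi_stack_bytes are made up for illustration and are not part of the benchmark or of gem5.

```cpp
#include <cstddef>

// Constants copied from the benchmark source quoted in this thread.
const int K = 7;                  // constraint length
const int rate = 2;               // symbol bits per input bit
const int framebits = 2048;       // size of data packet
const int STATES = 1 << (K - 1);  // number of decoder states

// Hypothetical helper: number of int elements in
// traces[framebits+(K-1)][STATES] inside viterbi_scalar.
std::size_t traces_elems() {
  return static_cast<std::size_t>(framebits + (K - 1)) * STATES;
}

// Hypothetical helper: bytes taken by the local arrays of viterbi_scalar
// (branch_table, error1, error2, traces). Scalars and padding are ignored.
std::size_t viterbi_stack_bytes() {
  std::size_t branch_table = sizeof(unsigned int) * (STATES / 2 * rate);
  std::size_t errors = 2 * sizeof(unsigned int) * STATES;
  std::size_t traces = sizeof(int) * traces_elems();
  return branch_table + errors + traces;
}
```

Calling both helpers from a small main() and printing the results gives a figure to compare against whatever stack region the simulator maps for the process. With a 4-byte int, the traces array alone is 131,456 elements, roughly half a megabyte.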

-Alec Roelke

On Mon, Sep 25, 2017 at 4:58 PM, Nitish Srivastava wrote:

> Hi everyone,
>
> I was recently trying to compile a few benchmarks from
> https://github.com/cornell-brg/xloops-bmarks using the RISC-V gcc compiler
> and run them on gem5. For a few of the benchmarks, gem5 showed unusual
> behaviour, reporting a page fault when there should be none. I ran the
> RISC-V binary on spike and it worked fine. I then compiled the benchmark
> natively on an x86 machine, ran it under valgrind, and didn't find any
> issue. Is this a bug in the RISC-V port of gem5? I am compiling the
> following benchmark.
>
> //
> // viterbi
> //
> // This application performs viterbi decoding on a frame of encoded data.
> #include #include #include #include
> //
> // viterbi dataset
> //
> // constraint length (number of memory registers)
> const int K = 7;
> // number of symbol bits per input bit
> const int rate = 2;
> // Dataset parameters
> // size of data packet
> const int framebits = 2048;
> // generator polynomials
> int polys[rate] = {121, 91};
> // size of bitpacked data arrays
> const int data_sz = framebits/8;
> unsigned char data[((framebits + (K-1)) / 8) + 1];
> unsigned char syms[rate*(framebits+(K-1))];
> unsigned char ref[(framebits+(K-1))/8+1];
> namespace viterbi {
> namespace details {
> // constraint length (number of memory registers)
> // implementation requires K <= 8
> const int K = 7;
> // number of symbol bits per input bit
> const int rate = 2;
> // number of possible decoder states
> const int STATES (1 << (K-1));
> }}
> namespace viterbi {
>   using namespace details;
>
>   // Quickly generate a parity bit
>   // see http://graphics.stanford.edu/~seander/bithacks.html#ParityParallel
>
>   //__attribute__ ((always_inline))
>   int generate_symbol_bit_scalar(int x) {
> x ^= (x >> 16);
> x ^= (x >> 8);
> x ^= (x >> 4);
> x &= 0xf;
> return (0x6996 >> x) & 1;
>   }
>
>   // Viterbi Decoder: take encoded symbols and return the decoded msg
>   void viterbi_scalar(unsigned char symbols[], unsigned char msg[],
>   int poly[], int framebits) {
>
> // Branch Table stores the state transitions, we only need to build
> // half thanks to trellis symmetry
> unsigned int branch_table[STATES/2 * rate];
>
> unsigned int* branch_table_ptr = &branch_table[0];
> //
> // Build Branch Lookup Table
> //
>
> for (int state = 0; state < STATES/2; state++) {
>   for (int i = 0; i < rate; i ++) {
> int bit = generate_symbol_bit_scalar(2*state & poly[i]);
> branch_table[i*STATES/2 + state] = bit ? 255 : 0;
>   }
> }
>
> // Two buffers to store the accumulated error for each state/path
> unsigned int error1[STATES];
> unsigned int error2[STATES];
> // Pointers to track which accumulated error buffer is most current,
> // pointer targets are swapped after each frame bit
> unsigned int * old_error = error1;
> unsigned int * new_error = error2;
> // Record the minimum error path entering each state for all framebits
> int traces[framebits+(K-1)][STATES];
>
> // Bias the accumulated error buffer towards the known start state
> error1[0] = 0;
> for(int i=1;i<STATES;i++)
>   error1[i] = 63;
>
> //
> // Calculate Forward Paths
> //
> // For each frame bit, accumulate errors and determine path entering
> // states i & i+(STATES/2) (evaluate simultaneously using symmetry)
>
> for (int s = 0; s < framebits + (K-1); s++) {
>   for (int i = 0; i < STATES/2; i++) {
>
> int decision0, decision1;
> unsigned int metric, m0, m1, m2, m3;
> metric = 0;
>
> // Compute the error metric for this state
> for (int j = 0; j < rate; j++)
>   metric += branch_table_ptr[i+j*STATES/2] ^ symbols[s*rate+j];
>
> const unsigned int max = rate*(256 - 1);
>
> m0 = old_error[i] + metric;
> m1 = old_error[i+STATES/2] + (max - metric);
> m2 = old_error[i] + (max - metric);
> m3 = old_error[i+STATES/2] + metric;
>
> // Determine which transition is minimum
> decision0 = m0 > m1;
>

Re: [gem5-users] ARM Architecture with Ruby Memory model

2017-09-25 Thread Nikos Nikoleris
Hi Amit,


How did you run the two simulations? To simulate different ruby protocols, 
normally you would have to recompile gem5 with a different PROTOCOL variable.


Nikos


From: gem5-users on behalf of Amit Joshi

Sent: 24 September 2017 12:31:28
To: gem5 users mailing list
Subject: Re: [gem5-users] ARM Architecture with Ruby Memory model

Hi Nikos,
I tried SE and FS modes with the MOESI_CMP_directory and MESI protocols on the 
ARM architecture, but they do not appear to work correctly: the stats and 
config files are identical for both protocols. Could you give me more detailed 
guidance on how to run this properly?

--
Regards,

AMIT D. JOSHI



On Fri, Aug 18, 2017 at 3:00 PM, Nikos Nikoleris <nikos.nikole...@arm.com> wrote:

Hi Amit,


We've recently made some changes and you should be able to run full system 
simulations using arm and ruby as the memory model. I believe you should be 
able to boot Linux using the MOESI_CMP_directory ruby protocol and possibly 
other protocols as well.


Nikos



From: gem5-users <gem5-users-boun...@gem5.org> on behalf of 
Amit Joshi <amitjoshi...@gmail.com>
Sent: 17 August 2017 08:23
To: gem5-users@gem5.org
Subject: [gem5-users] ARM Architecture with Ruby Memory model

Can we use the ARM architecture with the Ruby memory model in full-system simulation?


AMIT

IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.

___
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users


[gem5-users] RISCV port for gem5 [code works on spike not on gem5]

2017-09-25 Thread Nitish Srivastava
Hi everyone,

I was recently trying to compile a few benchmarks from
https://github.com/cornell-brg/xloops-bmarks using the RISC-V gcc compiler
and run them on gem5. For a few of the benchmarks, gem5 showed unusual
behaviour, reporting a page fault when there should be none. I ran the
RISC-V binary on spike and it worked fine. I then compiled the benchmark
natively on an x86 machine, ran it under valgrind, and didn't find any
issue. Is this a bug in the RISC-V port of gem5? I am compiling the
following benchmark.

//
// viterbi
//
// This application performs viterbi decoding on a frame of encoded data.
#include #include #include #include
//
// viterbi dataset
//
// constraint length (number of memory registers)
const int K = 7;
// number of symbol bits per input bit
const int rate = 2;
// Dataset parameters
// size of data packet
const int framebits = 2048;
// generator polynomials
int polys[rate] = {121, 91};
// size of bitpacked data arrays
const int data_sz = framebits/8;
unsigned char data[((framebits + (K-1)) / 8) + 1];
unsigned char syms[rate*(framebits+(K-1))];
unsigned char ref[(framebits+(K-1))/8+1];
namespace viterbi {
namespace details {
// constraint length (number of memory registers)
// implementation requires K <= 8
const int K = 7;
// number of symbol bits per input bit
const int rate = 2;
// number of possible decoder states
const int STATES (1 << (K-1));
}}
namespace viterbi {
  using namespace details;

  // Quickly generate a parity bit
  // see http://graphics.stanford.edu/~seander/bithacks.html#ParityParallel

  //__attribute__ ((always_inline))
  int generate_symbol_bit_scalar(int x) {
    x ^= (x >> 16);
    x ^= (x >> 8);
    x ^= (x >> 4);
    x &= 0xf;
    return (0x6996 >> x) & 1;
  }

  // Viterbi Decoder: take encoded symbols and return the decoded msg
  void viterbi_scalar(unsigned char symbols[], unsigned char msg[],
                      int poly[], int framebits) {

    // Branch Table stores the state transitions, we only need to build
    // half thanks to trellis symmetry
    unsigned int branch_table[STATES/2 * rate];

    unsigned int* branch_table_ptr = &branch_table[0];
    //
    // Build Branch Lookup Table
    //

    for (int state = 0; state < STATES/2; state++) {
      for (int i = 0; i < rate; i ++) {
        int bit = generate_symbol_bit_scalar(2*state & poly[i]);
        branch_table[i*STATES/2 + state] = bit ? 255 : 0;
      }
    }

    // Two buffers to store the accumulated error for each state/path
    unsigned int error1[STATES];
    unsigned int error2[STATES];
    // Pointers to track which accumulated error buffer is most current,
    // pointer targets are swapped after each frame bit
    unsigned int * old_error = error1;
    unsigned int * new_error = error2;
    // Record the minimum error path entering each state for all framebits
    int traces[framebits+(K-1)][STATES];

    // Bias the accumulated error buffer towards the known start state
    error1[0] = 0;
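    for(int i=1;i<STATES;i++)
      error1[i] = 63;

    //
    // Calculate Forward Paths
    //
    // For each frame bit, accumulate errors and determine path entering
    // states i & i+(STATES/2) (evaluate simultaneously using symmetry)

    for (int s = 0; s < framebits + (K-1); s++) {
      for (int i = 0; i < STATES/2; i++) {

        int decision0, decision1;
        unsigned int metric, m0, m1, m2, m3;
        metric = 0;

        // Compute the error metric for this state
        for (int j = 0; j < rate; j++)
          metric += branch_table_ptr[i+j*STATES/2] ^ symbols[s*rate+j];

        const unsigned int max = rate*(256 - 1);

        m0 = old_error[i] + metric;
        m1 = old_error[i+STATES/2] + (max - metric);
        m2 = old_error[i] + (max - metric);
        m3 = old_error[i+STATES/2] + metric;

        // Determine which transition is minimum
        decision0 = m0 > m1;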
        decision1 = m2 > m3;

        // Save error for minimum transition
        new_error[2*i] =   decision0 ? m1 : m0;
        new_error[2*i+1] = decision1 ? m3 : m2;

        // Save transmission bit for minimum transition
        traces[s][2*i]   = decision0;
        traces[s][2*i+1] = decision1;
      }

      // Swap targets of old and new error buffers
      unsigned int * temp = old_error;
      old_error = new_error;
      new_error = temp;
    }

    //
    // Path Traceback
    //
    // Traceback through the path with the lowest error and generate
    // the decoded message based on the observed state transitions

    // Assume final state is 0
    unsigned int endstate = 0;
    unsigned int nbits = framebits;

    // Offset to only access the last framebits bits (ignore flush bits)
    int (* trace_ptr)[STATES] = traces+(K-1);

    // Traceback loop, densely pack the message bits
    while (nbits-- !=0) {
      int k = trace_ptr[nbits][endstate];
      endstate = (endstate >> 1) | (k << (K-2));
      msg[nbits>>3] >>= 1;
      msg[nbits>>3] |= (k << 7);
    }
  }

}
int main( int argc, char* argv[] )
{
  for ( int i = 0; i < ((framebits + (K-1)) / 8) + 1; i++ )
    data[i] = i;

  for ( int i = 0; i < rate*(framebits+(K-1)); i++ )
    syms[i] = i;

  for ( int i = 0; i < (framebits+(K-1))/8+1; i++ )
    ref[i] = i;

  unsigned char msg[(framebits + (K-1))/8 + 1] = {0};