Re: [gem5-users] Failed to boot dist-gem5 with aarch32-linux.img

2018-07-05 Thread Mohammad Alian
Happy that you figured it out. If you don't synchronize the nodes before
starting communication between them, you may get a panic message
indicating that the simulated cluster is out of sync, like this one from
your log.switch:

"panic: panic condition recv_tick <= curTick() occurred: Simulators out of
sync - missed packet receive by 1807723498999 ticks"

If you don't get the panic message in some simulations, it means you got
lucky and the simulated cluster happened to be in sync when the
communication took place. This is a nondeterministic process, though.
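
For reference, here is a minimal sketch of how the rank-0 command line from the original post might look with the sync option added. The option and value (--dist-sync-start=10t) are the ones reported in this thread; all paths and image names are copied from the original post and will differ on other setups. The sketch only prints the command instead of launching gem5.

```shell
#!/bin/bash
# Rank-0 arguments as posted in this thread (paths are the poster's own).
GEM5_ARGS=(
    -d m5out.0
    --debug-flags=DistEthernet
    configs/example/fs.py
    --cpu-type=AtomicSimpleCPU --num-cpus=1 --machine-type=VExpress_EMM
    --disk-image=aarch32-ubuntu-natty-headless.img
    --kernel=vmlinux.aarch32.ll_20131205.0-gem5
    --script=boot.easy.ckpt.rcS
    --checkpoint-dir=m5out.0
    --dist --dist-rank=0 --dist-size=2 --dist-server-name=127.0.0.1
    --dist-server-port=2200
)
# Option reported in this thread to fix the failed checkpoint; the value 10t
# is the one the poster used, not a general recommendation.
GEM5_ARGS+=(--dist-sync-start=10t)

# Print the command instead of running it, so the sketch is safe to paste.
echo build/ARM/gem5.opt "${GEM5_ARGS[@]}"
```

The same option would presumably need to be passed to every rank so that all nodes agree on when synchronization starts.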

Best,
Mohammad

On Tue, Jul 3, 2018 at 11:23 PM, Boyang Xu  wrote:

> Hi,
>
> I solved the problem by adding "--dist-sync-start=10t" to the command line.
> However, with aarch64-linux.img I can take a checkpoint in dist-gem5
> without this option. I don't know why yet. If anyone knows, please tell
> me. Thanks a lot!
>
> On 2018-07-02 09:16 PM, Boyang Xu wrote:
>
> Hi Ciro,
> Thanks for your suggestion. I should have given more details.
>
> The .rcS script I use to take the checkpoint is as follows. I downloaded it
> from the official dist-gem5 website and did not modify it.
>
>
>> #!/bin/bash
>> # Authors: Mohammad Alian 
>> # boot gem5 and take a checkpoint
>> #
>> # The idea of this script is the same as
>> # "configs/boot/hack_back_ckpt.rcS" by Joel Hestness
>> # Please look into that for more info
>> #
>> source /root/.bashrc
>> # Retrieve dist-gem5 rank and size parameters using the 'm5' utility
>> MY_RANK=$(/sbin/m5 initparam dist-rank)
>> [ $? = 0 ] || { echo "m5 initparam failed"; exit -1; }
>> MY_SIZE=$(/sbin/m5 initparam dist-size)
>> [ $? = 0 ] || { echo "m5 initparam failed"; exit -1; }
>> echo "* Start boot script! *"
>> if [ "${RUNSCRIPT_VAR+set}" != set ]
>> then
>> # Signal our future self that it's safe to continue
>> echo "RUNSCRIPT_VAR not set! Setting it ..."
>> export RUNSCRIPT_VAR=1
>> else
>> echo "RUNSCRIPT_VAR is set!"
>> # We've already executed once, so we should exit
>> echo "calling m5 exit ..."
>> /sbin/m5 exit 1
>> fi
>> /bin/hostname node${MY_RANK}
>> # Keep MAC address assignment simple for now ...
>> (($MY_RANK > 97)) && { echo "(E) Rank must be less than 98"; /sbin/m5 abort; }
>> ((MY_ADDR = MY_RANK + 2))
>> if (($MY_ADDR < 10))
>> then
>> MY_ADDR_PADDED=0${MY_ADDR}
>> else
>> MY_ADDR_PADDED=${MY_ADDR}
>> fi
>> /sbin/ifconfig eth0 hw ether 00:90:00:00:00:${MY_ADDR_PADDED}
>> /sbin/ifconfig eth0 192.168.0.${MY_ADDR} netmask 255.255.255.0 up
>> /sbin/ifconfig -a
>> # take a checkpoint
>> if [ "$MY_RANK" == "0" ]
>> then
>> /sbin/m5 checkpoint 1
>> else
>> sleep 0.01
>> fi
>> #THIS IS WHERE EXECUTION BEGINS FROM AFTER RESTORING FROM CKPT
>> if [ "$RUNSCRIPT_VAR" -eq 1 ]
>> then
>> # Signal our future self not to recurse infinitely
>> export RUNSCRIPT_VAR=2
>> # Read the script for the checkpoint restored execution
>> echo "Loading new script..."
>> /sbin/m5 readfile > /tmp/runscript1.sh
>> # Execute the new runscript
>> if [ -s /tmp/runscript1.sh ]
>> then
>> /bin/bash /tmp/runscript1.sh
>> else
>> echo "Script not specified"
>> fi
>> fi
>> echo "Fell through script. Exiting ..."
>> /sbin/m5 exit 1
>
>
> When I took a checkpoint with "aarch64-ubuntu-trusty-headless.img" in
> dist-gem5, it worked. The important part of the output file log.0 is as follows.
>
>
>> warn: Device specific PCI config space not implemented for
>> testsys.realview.ethernet!
>> 26258473387000: global: DistIface::readyToCkpt() called, delay:1 period:0
>> info: m5 checkpoint called with non-zero delay => triggering immediate
>> checkpoint (at the next sync)
>> *2625848000: global: DistIFace::drain() called
>> 2625848500: global: DistIFace::drain() called
>> info: Entering event queue @ 2625848000.  Starting simulation...
>> Writing checkpoint
>> 2625848500: global: DistIFace::drainResume() called
>> info: Entering event queue @ 2625848500.  Starting simulation...*
>> 26267427931500: global: DistIface::readyToExit() called, delay:1
>> info: m5 exit called with non-zero delay => triggering immediate exit (at
>> the next sync)
>> Exiting @ tick 2626743000 because exit request from gem5 peers
>
>
> However, when I took a checkpoint with "aarch32-ubuntu-natty-headless.img",
> it did not work. The same part of the output file log.0 is as follows:
>
>> warn:  instruction 'mcr bpiall' unimplemented
>> 336281844: global: DistIface::readyToCkpt() called, delay:1 period:0
>> info: m5 checkpoint called with non-zero delay => triggering immediate
>> checkpoint (at the next sync)
>> 3385307368500: global: DistIface::readyToExit() called, delay:1
>> info: m5 exit called with non-zero delay => triggering immediate exit (at
>> the next sync)
>> info: recv(): Connection closed
>> Exiting @ tick 3389177332000 because connection to gem5 peer got closed
>
>
>
> You can see 

Re: [gem5-users] Failed to boot dist-gem5 with aarch32-linux.img

2018-07-03 Thread Boyang Xu
Hi, 

I solved the problem by adding "--dist-sync-start=10t" to the command
line. However, with aarch64-linux.img I can take a checkpoint in
dist-gem5 without this option. I don't know why yet. If anyone knows,
please tell me. Thanks a lot!

On 2018-07-02 09:16 PM, Boyang Xu wrote:

> Hi Ciro,
> 
> Thanks for your suggestion. I should have given more details. 
> 
> The .rcS script I use to take the checkpoint is as follows. I downloaded it 
> from the official dist-gem5 website and did not modify it. 
> 
>> #!/bin/bash
>> # Authors: Mohammad Alian 
>> # boot gem5 and take a checkpoint
>> #
>> # The idea of this script is the same as
>> # "configs/boot/hack_back_ckpt.rcS" by Joel Hestness
>> # Please look into that for more info
>> #
>> source /root/.bashrc
>> # Retrieve dist-gem5 rank and size parameters using the 'm5' utility
>> MY_RANK=$(/sbin/m5 initparam dist-rank)
>> [ $? = 0 ] || { echo "m5 initparam failed"; exit -1; }
>> MY_SIZE=$(/sbin/m5 initparam dist-size)
>> [ $? = 0 ] || { echo "m5 initparam failed"; exit -1; }
>> echo "* Start boot script! *"
>> if [ "${RUNSCRIPT_VAR+set}" != set ]
>> then
>> # Signal our future self that it's safe to continue
>> echo "RUNSCRIPT_VAR not set! Setting it ..."
>> export RUNSCRIPT_VAR=1
>> else
>> echo "RUNSCRIPT_VAR is set!"
>> # We've already executed once, so we should exit
>> echo "calling m5 exit ..."
>> /sbin/m5 exit 1
>> fi
>> /bin/hostname node${MY_RANK}
>> # Keep MAC address assignment simple for now ...
>> (($MY_RANK > 97)) && { echo "(E) Rank must be less than 98"; /sbin/m5 abort; }
>> ((MY_ADDR = MY_RANK + 2))
>> if (($MY_ADDR < 10))
>> then
>> MY_ADDR_PADDED=0${MY_ADDR}
>> else
>> MY_ADDR_PADDED=${MY_ADDR}
>> fi
>> /sbin/ifconfig eth0 hw ether 00:90:00:00:00:${MY_ADDR_PADDED}
>> /sbin/ifconfig eth0 192.168.0.${MY_ADDR} netmask 255.255.255.0 up
>> /sbin/ifconfig -a
>> # take a checkpoint
>> if [ "$MY_RANK" == "0" ]
>> then
>> /sbin/m5 checkpoint 1
>> else
>> sleep 0.01
>> fi
>> #THIS IS WHERE EXECUTION BEGINS FROM AFTER RESTORING FROM CKPT
>> if [ "$RUNSCRIPT_VAR" -eq 1 ]
>> then
>> # Signal our future self not to recurse infinitely
>> export RUNSCRIPT_VAR=2
>> # Read the script for the checkpoint restored execution
>> echo "Loading new script..."
>> /sbin/m5 readfile > /tmp/runscript1.sh
>> # Execute the new runscript
>> if [ -s /tmp/runscript1.sh ]
>> then
>> /bin/bash /tmp/runscript1.sh
>> else
>> echo "Script not specified"
>> fi
>> fi
>> echo "Fell through script. Exiting ..."
>> /sbin/m5 exit 1
> 
> When I took a checkpoint with "aarch64-ubuntu-trusty-headless.img" in 
> dist-gem5, it worked. The important part of the output file log.0 is as follows. 
> 
>> warn: Device specific PCI config space not implemented for 
>> testsys.realview.ethernet!
>> 26258473387000: global: DistIface::readyToCkpt() called, delay:1 period:0
>> info: m5 checkpoint called with non-zero delay => triggering immediate 
>> checkpoint (at the next sync)
>> 2625848000: global: DistIFace::drain() called
>> 2625848500: global: DistIFace::drain() called
>> info: Entering event queue @ 2625848000.  Starting simulation...
>> Writing checkpoint
>> 2625848500: global: DistIFace::drainResume() called
>> info: Entering event queue @ 2625848500.  Starting simulation...
>> 26267427931500: global: DistIface::readyToExit() called, delay:1
>> info: m5 exit called with non-zero delay => triggering immediate exit (at 
>> the next sync)
>> Exiting @ tick 2626743000 because exit request from gem5 peers
> 
> However, when I took a checkpoint with "aarch32-ubuntu-natty-headless.img", 
> it did not work. The same part of the output file log.0 is as follows: 
> 
>> warn:  instruction 'mcr bpiall' unimplemented
>> 336281844: global: DistIface::readyToCkpt() called, delay:1 period:0
>> info: m5 checkpoint called with non-zero delay => triggering immediate 
>> checkpoint (at the next sync)
>> 3385307368500: global: DistIface::readyToExit() called, delay:1
>> info: m5 exit called with non-zero delay => triggering immediate exit (at 
>> the next sync)
>> info: recv(): Connection closed
>> Exiting @ tick 3389177332000 because connection to gem5 peer got closed
> 
> You can see that the highlighted part of the first log.0 (the drain and 
> "Writing checkpoint" lines) is missing from the second log.0. 
> 
> On 2018-07-01 02:13 AM, Ciro Santilli wrote: 
> Just saw the attachments now. 
> 
> I would recommend in-lining them as much as possible in the email, and 
> selecting the most interesting part if they are huge. 
> 
> This will make it more likely that people will look at them, and allow search 
> engines to index them. 
> 
> On Sun, Jul 1, 2018 at 9:56 AM, Ciro Santilli  wrote:
> 
> How did you try to take the checkpoint? Manually or with some init script? 
> 
> How did you try to restore it, and how did it fail? 
> 
> Did the init script actually run? Add prints or set -x to it. 
> 
> On Sun, Jul 1, 2018 at 7:45 AM, Boyang Xu <6172...@gmail.com> wrote: 
> 
> Hi everyone, 
> 
> I 

Re: [gem5-users] Failed to boot dist-gem5 with aarch32-linux.img

2018-07-02 Thread Boyang Xu
Hi Ciro,

Thanks for your suggestion. I should have given more details. 

The .rcS script I use to take the checkpoint is as follows. I downloaded
it from the official dist-gem5 website and did not modify it. 

> #!/bin/bash
> # Authors: Mohammad Alian 
> # boot gem5 and take a checkpoint
> #
> # The idea of this script is the same as
> # "configs/boot/hack_back_ckpt.rcS" by Joel Hestness
> # Please look into that for more info
> #
> source /root/.bashrc
> # Retrieve dist-gem5 rank and size parameters using the 'm5' utility
> MY_RANK=$(/sbin/m5 initparam dist-rank)
> [ $? = 0 ] || { echo "m5 initparam failed"; exit -1; }
> MY_SIZE=$(/sbin/m5 initparam dist-size)
> [ $? = 0 ] || { echo "m5 initparam failed"; exit -1; }
> echo "* Start boot script! *"
> if [ "${RUNSCRIPT_VAR+set}" != set ]
> then
> # Signal our future self that it's safe to continue
> echo "RUNSCRIPT_VAR not set! Setting it ..."
> export RUNSCRIPT_VAR=1
> else
> echo "RUNSCRIPT_VAR is set!"
> # We've already executed once, so we should exit
> echo "calling m5 exit ..."
> /sbin/m5 exit 1
> fi
> /bin/hostname node${MY_RANK}
> # Keep MAC address assignment simple for now ...
> (($MY_RANK > 97)) && { echo "(E) Rank must be less than 98"; /sbin/m5 abort; }
> ((MY_ADDR = MY_RANK + 2))
> if (($MY_ADDR < 10))
> then
> MY_ADDR_PADDED=0${MY_ADDR}
> else
> MY_ADDR_PADDED=${MY_ADDR}
> fi
> /sbin/ifconfig eth0 hw ether 00:90:00:00:00:${MY_ADDR_PADDED}
> /sbin/ifconfig eth0 192.168.0.${MY_ADDR} netmask 255.255.255.0 up
> /sbin/ifconfig -a
> # take a checkpoint
> if [ "$MY_RANK" == "0" ]
> then
> /sbin/m5 checkpoint 1
> else
> sleep 0.01
> fi
> #THIS IS WHERE EXECUTION BEGINS FROM AFTER RESTORING FROM CKPT
> if [ "$RUNSCRIPT_VAR" -eq 1 ]
> then
> # Signal our future self not to recurse infinitely
> export RUNSCRIPT_VAR=2
> # Read the script for the checkpoint restored execution
> echo "Loading new script..."
> /sbin/m5 readfile > /tmp/runscript1.sh
> # Execute the new runscript
> if [ -s /tmp/runscript1.sh ]
> then
> /bin/bash /tmp/runscript1.sh
> else
> echo "Script not specified"
> fi
> fi
> echo "Fell through script. Exiting ..."
> /sbin/m5 exit 1

When I took a checkpoint with "aarch64-ubuntu-trusty-headless.img" in
dist-gem5, it worked. The important part of the output file log.0 is as
follows. 

> warn: Device specific PCI config space not implemented for 
> testsys.realview.ethernet!
> 26258473387000: global: DistIface::readyToCkpt() called, delay:1 period:0
> info: m5 checkpoint called with non-zero delay => triggering immediate 
> checkpoint (at the next sync)
> 2625848000: global: DistIFace::drain() called
> 2625848500: global: DistIFace::drain() called
> info: Entering event queue @ 2625848000.  Starting simulation...
> Writing checkpoint
> 2625848500: global: DistIFace::drainResume() called
> info: Entering event queue @ 2625848500.  Starting simulation...
> 26267427931500: global: DistIface::readyToExit() called, delay:1
> info: m5 exit called with non-zero delay => triggering immediate exit (at the 
> next sync)
> Exiting @ tick 2626743000 because exit request from gem5 peers

However, when I took a checkpoint with
"aarch32-ubuntu-natty-headless.img", it did not work. The same part of
the output file log.0 is as follows: 

> warn:  instruction 'mcr bpiall' unimplemented
> 336281844: global: DistIface::readyToCkpt() called, delay:1 period:0
> info: m5 checkpoint called with non-zero delay => triggering immediate 
> checkpoint (at the next sync)
> 3385307368500: global: DistIface::readyToExit() called, delay:1
> info: m5 exit called with non-zero delay => triggering immediate exit (at the 
> next sync)
> info: recv(): Connection closed
> Exiting @ tick 3389177332000 because connection to gem5 peer got closed

You can see that the highlighted part of the first log.0 (the drain and
"Writing checkpoint" lines) is missing from the second log.0. 
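
The difference between the two logs can be checked mechanically. The following is a minimal sketch, assuming the per-node logs are plain-text files such as log.0 and log.1; the helper name ckpt_written is made up for illustration. It looks for the "Writing checkpoint" line that appears in the working aarch64 log and is absent from the failing aarch32 one.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the given gem5 log contains the
# "Writing checkpoint" line seen in the working aarch64 log.0.
ckpt_written() {
    grep -q "Writing checkpoint" "$1"
}

# Report the result for every log file passed on the command line.
for log in "$@"; do
    if ckpt_written "$log"; then
        echo "$log: checkpoint written"
    else
        echo "$log: no checkpoint written"
    fi
done
```

Saved under a name of your choice, it could be run as, for example, "bash check_ckpt.sh log.0 log.1" (the script name is arbitrary).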

On 2018-07-01 02:13 AM, Ciro Santilli wrote:

> Just saw the attachments now. 
> 
> I would recommend in-lining them as much as possible in the email, and 
> selecting the most interesting part if they are huge. 
> 
> This will make it more likely that people will look at them, and allow search 
> engines to index them. 
> 
> On Sun, Jul 1, 2018 at 9:56 AM, Ciro Santilli  wrote:
> 
> How did you try to take the checkpoint? Manually or with some init script? 
> 
> How did you try to restore it, and how did it fail? 
> 
> Did the init script actually run? Add prints or set -x to it. 
> 
> On Sun, Jul 1, 2018 at 7:45 AM, Boyang Xu <6172...@gmail.com> wrote: 
> 
> Hi everyone, 
> 
> I failed to take a checkpoint with aarch32-ubuntu-natty-headless.img in 
> dist-gem5, but succeeded with aarch64-ubuntu-trusty-headless.img. 
> The input and output files are attached. 
> 
> My command line is as follows:
> build/ARM/gem5.opt
> -d m5out.0
> --debug-flags=DistEthernet
> configs/example/fs.py
> --cpu-type=AtomicSimpleCPU --num-cpus=1 --machine-type=VExpress_EMM
> --disk-image=aarch32-ubuntu-natty-headless.img
> 

Re: [gem5-users] Failed to boot dist-gem5 with aarch32-linux.img

2018-07-01 Thread Ciro Santilli
Just saw the attachments now.

I would recommend in-lining them as much as possible in the email, and
selecting the most interesting part if they are huge.

This will make it more likely that people will look at them, and allow
search engines to index them.

On Sun, Jul 1, 2018 at 9:56 AM, Ciro Santilli 
wrote:

> How did you try to take the checkpoint? Manually or with some init script?
>
> How did you try to restore it, and how did it fail?
>
> Did the init script actually run? Add prints or set -x to it.
>
> On Sun, Jul 1, 2018 at 7:45 AM, Boyang Xu <6172...@gmail.com> wrote:
>
>> Hi everyone,
>>
>>
>> I failed to take a checkpoint with aarch32-ubuntu-natty-headless.img in
>> dist-gem5, but succeeded with aarch64-ubuntu-trusty-headless.img.
>> The input and output files are attached.
>>
>>
>> My command line is as follows:
>>
>>> build/ARM/gem5.opt
>>> -d m5out.0
>>> --debug-flags=DistEthernet
>>> configs/example/fs.py
>>> --cpu-type=AtomicSimpleCPU --num-cpus=1 --machine-type=VExpress_EMM
>>> --disk-image=aarch32-ubuntu-natty-headless.img
>>> --kernel=vmlinux.aarch32.ll_20131205.0-gem5
>>> --script=boot.easy.ckpt.rcS
>>> --checkpoint-dir=m5out.0
>>> --dist --dist-rank=0 --dist-size=2 --dist-server-name=127.0.0.1
>>> --dist-server-port=2200
>>
>>
>>
>> Any suggestions or help on taking a checkpoint with linux_32bit.img in
>> dist-gem5 are welcome. Thanks a lot!
>>
>>
>> Best Regards,
>> Boyang Xu
>>
>> A graduate student in UVIC
>>
>> ___
>> gem5-users mailing list
>> gem5-users@gem5.org
>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
>>
>
>

Re: [gem5-users] Failed to boot dist-gem5 with aarch32-linux.img

2018-07-01 Thread Ciro Santilli
How did you try to take the checkpoint? Manually or with some init script?

How did you try to restore it, and how did it fail?

Did the init script actually run? Add prints or set -x to it.
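
As a rough illustration of that suggestion, here is a minimal sketch of tracing a boot script. The helper progress_marker and its message format are made up for illustration; in the real .rcS script the m5 call would not be commented out.

```shell
#!/bin/bash
# Hypothetical helper: print a marker so progress shows up in the
# simulated terminal output (e.g. m5out.0.testsys.terminal).
progress_marker() {
    echo "[rcS] reached: $1"
}

set -x    # trace: bash prints each command to stderr before running it
progress_marker "before m5 checkpoint"
# /sbin/m5 checkpoint 1    # the real call from the boot script, disabled in this sketch
progress_marker "after m5 checkpoint"
set +x    # stop tracing
```

If neither marker shows up in the terminal output, the script most likely never ran.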

On Sun, Jul 1, 2018 at 7:45 AM, Boyang Xu <6172...@gmail.com> wrote:

> Hi everyone,
>
>
> I failed to take a checkpoint with aarch32-ubuntu-natty-headless.img in
> dist-gem5, but succeeded with aarch64-ubuntu-trusty-headless.img.
> The input and output files are attached.
>
>
> My command line is as follows:
>
>> build/ARM/gem5.opt
>> -d m5out.0
>> --debug-flags=DistEthernet
>> configs/example/fs.py
>> --cpu-type=AtomicSimpleCPU --num-cpus=1 --machine-type=VExpress_EMM
>> --disk-image=aarch32-ubuntu-natty-headless.img
>> --kernel=vmlinux.aarch32.ll_20131205.0-gem5
>> --script=boot.easy.ckpt.rcS
>> --checkpoint-dir=m5out.0
>> --dist --dist-rank=0 --dist-size=2 --dist-server-name=127.0.0.1
>> --dist-server-port=2200
>
>
>
> Any suggestions or help on taking a checkpoint with linux_32bit.img in
> dist-gem5 are welcome. Thanks a lot!
>
>
> Best Regards,
> Boyang Xu
>
> A graduate student in UVIC
>
>

[gem5-users] Failed to boot dist-gem5 with aarch32-linux.img

2018-07-01 Thread Boyang Xu
Hi everyone,


I failed to take a checkpoint with aarch32-ubuntu-natty-headless.img in
dist-gem5, but succeeded with aarch64-ubuntu-trusty-headless.img.
The input and output files are attached.


My command line is as follows:

> build/ARM/gem5.opt
> -d m5out.0
> --debug-flags=DistEthernet
> configs/example/fs.py
> --cpu-type=AtomicSimpleCPU --num-cpus=1 --machine-type=VExpress_EMM
> --disk-image=aarch32-ubuntu-natty-headless.img
> --kernel=vmlinux.aarch32.ll_20131205.0-gem5
> --script=boot.easy.ckpt.rcS
> --checkpoint-dir=m5out.0
> --dist --dist-rank=0 --dist-size=2 --dist-server-name=127.0.0.1
> --dist-server-port=2200



Any suggestions or help on taking a checkpoint with linux_32bit.img in
dist-gem5 are welcome. Thanks a lot!


Best Regards,
Boyang Xu

A graduate student in UVIC


Attachments (binary data): log.0, log.1, log.switch,
m5out.0.testsys.terminal, m5out.1.testsys.terminal