Nanotrusting the Nanotime and amortization.

2018-04-24 Thread John Hening
I'm reading the great article from 
https://shipilev.net/blog/2014/nanotrusting-nanotime/ (thanks Aleksey! :)) 
and I am not sure whether I understand correctly that. 

Firstly, it is compared performance of plain and volatile writes:


Benchmark Mode Samples Mean Mean error Units
o.s.VolatileWriteSucks.incrPlain avgt 250 3.589 0.025 ns/op
o.s.VolatileWriteSucks.incrVolatile avgt 250 15.219 0.114 ns/op


and then it is written that: 

"In real code, the heavy-weight operations are mixed with relatively low-weight 
ops, which amortize the costs."

And my question is: What does it mean to amortize costs exactly? I explain it 
myself that amortization is caused by out of order execution of CPU, yes? 
So even if volatile write takes much more time than plain write, it isn't so 
painful because CPU executes other instruction out of order (if it can). 

What do you think?

-- 
You received this message because you are subscribed to the Google Groups 
"mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to mechanical-sympathy+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Analysis of perf sched for simple server.

2018-04-24 Thread Avi Kivity



On 2018-04-23 22:33, John Hening wrote:

|
Hello,

1. I have a simple one-threaded tcp server written in Java. I try to 
measure its receive-and-process-response latency. More preceisely, the 
client sends (by loopback) the 128-bytes message with a timestamp in 
the header. The server receives a message, reads a content byte by 
byte and compute a difference between `now` and the timestamp from the 
header. The difference is more or less 6μs.


Now, I am trying to make it faster.

But, firstly I would like to examine a scheduling issue. So, I've 
collected results with:


perf sched record -p  -- sleep 10

and then:

perf sched timehist -V

Excerpt from the result is presented below: (I've filtered it for my 
server thread)


|
|

    wait time  sch delay   
run time

|time| |cpu| |task name|[tid/pid](msec)(msec)(msec)
---

1849.531842[0002]   s      java[7340/7338]0..00056.003
21849.587844[0002]   s      java[7340/7338]0..00056.001
21849.607838[0002]   s      java[7340/7338]0..00019.994
21849.615836[0002]   s      java[7340/7338]0..0007.998
...
21849.691834[0002]   s      java[7340/7338]0..0004.000
21849.703837[0001]  s     java[7340/7338]0..00038.330
21849.711838[0005]      s     java[7340/7338]0...000
21849.719834[0005]      s     java[7340/7338]0..0007.996
|

My question is:
How is it possible that wait_time is always zero? After all, it is 
impossible. My CPU is multicore but there is a lot of processes 
(threads) that needs CPU time. How to interpret that?
From the other hand I am gonna to reduce CPU migration. Perhaps it 
will help :-)


Maybe your setup is broken in some way. But perhaps if you unfilter the 
results it will make more sense.





2. The second issue is:

The excerpt from perf sched script
|

            java 7340[002]21848.012360:sched:sched_wakeup:comm=java 
pid=7511prio=120target_cpu=005
            java 7340[002]21848.012375:sched:sched_wakeup:comm=java 
pid=7511prio=120target_cpu=005
            java 7340[002]21848.012391:sched:sched_wakeup:comm=java 
pid=7511prio=120target_cpu=005
            java 7340[002]21848.012406:sched:sched_wakeup:comm=java 
pid=7511prio=120target_cpu=005

...
         swapper 0[007]21848.012554:sched:sched_wakeup:comm=java 
pid=7377prio=120target_cpu=007
         swapper 0[007]21848.012554:sched:sched_wakeup:comm=java 
pid=7377prio=120target_cpu=007

...
            java 7340[002]21848.012555:sched:sched_wakeup:comm=java 
pid=7511prio=120target_cpu=005
            java 
7377[007]21848.012582:sched:sched_stat_runtime:comm=java 
pid=7377runtime=37420[ns]vruntime=1628300433237[ns]
            java 7340[002]21848.012585:sched:sched_wakeup:comm=java 
pid=7511prio=120target_cpu=005
            java 
7377[007]21848.012587:sched:sched_switch:prev_comm=java 
prev_pid=7377prev_prio=120prev_state=S 
==>next_comm=swapper/7next_pid=0next_prio=120

|

Why my server receives sched_wakup that looks like spurious wakeups? 
What is a swapper?






When a cpu goes idle, it switches to the swapper (an idiosyncrasy). An 
interrupt then wakes up the cpu, runs the TCP stack, which decides to 
wake up your process. Those wakeups will be seen from any process which 
happens to run; swapper is simply the task that runs when nothing needs 
the cpu.


--
You received this message because you are subscribed to the Google Groups 
"mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to mechanical-sympathy+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.