Re: [Qemu-devel] [PATCH v2] rtc: placing RTC memory region outside BQL

2018-02-09 Thread Gonglei (Arei)
>
> > >
> > > $ cat strace_c.sh
> > > strace -tt -p $1 -c -o result_$1.log &
> > > sleep $2
> > > pid=$(pidof strace)
> > > kill $pid
> > > cat result_$1.log
> > >
> > > Before applying this change:
> > > $ ./strace_c.sh 10528 30
> > > % time     seconds  usecs/call     calls    errors syscall
> > > ------ ----------- ----------- --------- --------- ----------------
> > >  93.87    0.119070          30      4000           ppoll
> > >   3.27    0.004148           2      2038           ioctl
> > >   2.66    0.003370           2      2014           futex
> > >   0.09    0.000113           1       106           read
> > >   0.09    0.000109           1       104           io_getevents
> > >   0.02    0.000029           1        30           poll
> > >   0.00    0.000000           0         1           write
> > > ------ ----------- ----------- --------- --------- ----------------
> > > 100.00    0.126839                  8293           total
> > >
> > > After applying the change:
> > > $ ./strace_c.sh 23829 30
> > > % time     seconds  usecs/call     calls    errors syscall
> > > ------ ----------- ----------- --------- --------- ----------------
> > >  92.86    0.067441          16      4094           ppoll
> > >   4.85    0.003522           2      2136           ioctl
> > >   1.17    0.000850           4       189           futex
> > >   0.54    0.000395           2       202           read
> > >   0.52    0.000379           2       202           io_getevents
> > >   0.05    0.000037           1        30           poll
> > > ------ ----------- ----------- --------- --------- ----------------
> > > 100.00    0.072624                  6853           total
> > >
> > > The futex call number decreases ~90.6% on an idle windows 7 guest.
> >
> > These are the same figures as from v1 -- it would be interesting
> > to check whether the additional locking that v2 adds has affected
> > the results.
> >
> Oh, yes. The futex count with v2 doesn't decline nearly as much as with v1,
> because v2 now takes the BQL before raising the outbound IRQ line.
> 
> Before applying v2:
> # ./strace_c.sh 8776 30
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  78.01    0.164188          26      6436           ppoll
>   8.39    0.017650           5      3700        39 futex
>   7.68    0.016157           6      2758           ioctl
>   5.48    0.011530           3      4586      1113 read
>   0.30    0.000640          20        32           io_submit
>   0.15    0.000317           4        89           write
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.210482                 17601      1152 total
> 
> After applying v2:
> # ./strace_c.sh 15968 30
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  78.28    0.171117          27      6272           ppoll
>   8.50    0.018571           5      3663        21 futex
>   7.76    0.016973           6      2732           ioctl
>   4.85    0.010597           3      4115       853 read
>   0.31    0.000672          11        63           io_submit
>   0.30    0.000659           4       180           write
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.218589                 17025       874 total
> 
> > Does the patch improve performance in a more interesting use
> > case than "the guest is just idle" ?
> >
> I think so; after all, the scope of the locking is reduced.
> Besides this, can we optimize the RTC timer, by using separate threads,
> so that it avoids holding the BQL?
> 
Hi Peter, Paolo

I ran PCMark 8 (https://www.futuremark.com/benchmarks/pcmark)
in a Windows 7 guest and got the results below:

Guest: 2U2G

Before applying v2:

Your Work 2.0 score:                       2000
Web Browsing - JunglePin                   0.334s
Web Browsing - Amazonia                    0.132s
Writing                                    3.59s
Spreadsheet                                70.13s
Video Chat v2/Video Chat playback 1 v2     22.8 fps
Video Chat v2/Video Chat encoding v2       307.0 ms
Benchmark duration                         1h 35min 46s

After applying v2:

Your Work 2.0 score:                       2040
Web Browsing - JunglePin                   0.345s
Web Browsing - Amazonia                    0.132s
Writing                                    3.56s
Spreadsheet                                67.83s
Video Chat v2/Video Chat playback 1 v2     28.7 fps
Video Chat v2/Video Chat encoding v2       324.7 ms
Benchmark duration                         1h 32min 5s

The test results show that the optimization is also effective under load,
not only on an idle guest.
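
To put rough numbers on the PCMark 8 run above, here is a small C sketch
that computes the relative change in the overall score and in the total
benchmark duration. The helper names (to_seconds, percent_change,
print_pcmark_summary) are made up for this illustration; only the input
figures come from the results listed above.

```c
#include <assert.h>
#include <stdio.h>

/* Convert an "Xh Ymin Zs" duration to seconds. */
unsigned to_seconds(unsigned hours, unsigned minutes, unsigned seconds)
{
    return hours * 3600u + minutes * 60u + seconds;
}

/* Relative change from `before` to `after`, in percent (negative = lower). */
double percent_change(double before, double after)
{
    assert(before != 0.0);
    return 100.0 * (after - before) / before;
}

/* Print the score and duration deltas for the two runs quoted above. */
void print_pcmark_summary(void)
{
    unsigned before = to_seconds(1, 35, 46);
    unsigned after = to_seconds(1, 32, 5);

    printf("score:    %+.1f%%\n", percent_change(2000.0, 2040.0));
    printf("duration: %+.1f%% (%u s shorter)\n",
           percent_change((double)before, (double)after), before - after);
}
```

Calling print_pcmark_summary() from a main() prints both deltas. This only
summarizes the overall score and duration; the individual sub-tests would
need the same treatment one by one.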

Thanks,
-Gonglei



Re: [Qemu-devel] [PATCH v2] rtc: placing RTC memory region outside BQL

2018-02-07 Thread Gonglei (Arei)
> -----Original Message-----
> From: Peter Maydell [mailto:peter.mayd...@linaro.org]
> Sent: Tuesday, February 06, 2018 10:36 PM
> To: Gonglei (Arei)
> Cc: QEMU Developers; Paolo Bonzini; Huangweidong (C)
> Subject: Re: [PATCH v2] rtc: placing RTC memory region outside BQL
> 
> On 6 February 2018 at 14:07, Gonglei  wrote:
> > Windows guests use the RTC as their clock source device
> > and access it frequently. Let's move the RTC memory
> > region outside the BQL to decrease overhead for Windows guests.
> > Meanwhile, add a new lock to prevent different vCPUs
> > from accessing the RTC concurrently.
> >
> > $ cat strace_c.sh
> > strace -tt -p $1 -c -o result_$1.log &
> > sleep $2
> > pid=$(pidof strace)
> > kill $pid
> > cat result_$1.log
> >
> > Before applying this change:
> > $ ./strace_c.sh 10528 30
> > % time     seconds  usecs/call     calls    errors syscall
> > ------ ----------- ----------- --------- --------- ----------------
> >  93.87    0.119070          30      4000           ppoll
> >   3.27    0.004148           2      2038           ioctl
> >   2.66    0.003370           2      2014           futex
> >   0.09    0.000113           1       106           read
> >   0.09    0.000109           1       104           io_getevents
> >   0.02    0.000029           1        30           poll
> >   0.00    0.000000           0         1           write
> > ------ ----------- ----------- --------- --------- ----------------
> > 100.00    0.126839                  8293           total
> >
> > After applying the change:
> > $ ./strace_c.sh 23829 30
> > % time     seconds  usecs/call     calls    errors syscall
> > ------ ----------- ----------- --------- --------- ----------------
> >  92.86    0.067441          16      4094           ppoll
> >   4.85    0.003522           2      2136           ioctl
> >   1.17    0.000850           4       189           futex
> >   0.54    0.000395           2       202           read
> >   0.52    0.000379           2       202           io_getevents
> >   0.05    0.000037           1        30           poll
> > ------ ----------- ----------- --------- --------- ----------------
> > 100.00    0.072624                  6853           total
> >
> > The futex call number decreases ~90.6% on an idle windows 7 guest.
> 
> These are the same figures as from v1 -- it would be interesting
> to check whether the additional locking that v2 adds has affected
> the results.
> 
Oh, yes. The futex count with v2 doesn't decline nearly as much as with v1,
because v2 now takes the BQL before raising the outbound IRQ line.

Before applying v2:
# ./strace_c.sh 8776 30
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 78.01    0.164188          26      6436           ppoll
  8.39    0.017650           5      3700        39 futex
  7.68    0.016157           6      2758           ioctl
  5.48    0.011530           3      4586      1113 read
  0.30    0.000640          20        32           io_submit
  0.15    0.000317           4        89           write
------ ----------- ----------- --------- --------- ----------------
100.00    0.210482                 17601      1152 total

After applying v2:
# ./strace_c.sh 15968 30
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 78.28    0.171117          27      6272           ppoll
  8.50    0.018571           5      3663        21 futex
  7.76    0.016973           6      2732           ioctl
  4.85    0.010597           3      4115       853 read
  0.31    0.000672          11        63           io_submit
  0.30    0.000659           4       180           write
------ ----------- ----------- --------- --------- ----------------
100.00    0.218589                 17025       874 total
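
For comparison, the change in futex calls can be worked out from the two
pairs of strace runs in this thread. The C sketch below does that
arithmetic; the function names are invented for this note. It assumes the
futex rows of the v2 runs read as 3700 calls with 39 errors and 3663 calls
with 21 errors, which is the only split consistent with the "total" rows
(17601 / 1152 and 17025 / 874).

```c
#include <assert.h>
#include <stdio.h>

/* Relative decrease from `before` to `after`, in percent. */
double percent_decrease(unsigned before, unsigned after)
{
    assert(before > 0);
    return 100.0 * ((double)before - (double)after) / (double)before;
}

/* Print the futex comparison for the v1 and v2 measurements. */
void print_futex_summary(void)
{
    /* v1 figures: 2014 futex calls before the change, 189 after. */
    printf("v1: %.1f%% fewer futex calls\n", percent_decrease(2014, 189));
    /* v2 figures: 3700 before, 3663 after (assumed column split, see above). */
    printf("v2: %.1f%% fewer futex calls\n", percent_decrease(3700, 3663));
}
```

With these inputs the v1 pair gives the ~90.6% quoted in the commit
message, while the v2 pair comes out at about 1%. Note that the runs were
taken on different QEMU processes, so the two pairs are not directly
comparable workloads.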

> Does the patch improve performance in a more interesting use
> case than "the guest is just idle" ?
> 
I think so; after all, the scope of the locking is reduced.
Besides this, can we optimize the RTC timer, by using separate threads,
so that it avoids holding the BQL?
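
To make "the scope of the locking is reduced" concrete, here is a minimal,
self-contained C sketch of the general pattern discussed in this thread:
register accesses are serialized by a device-local lock, and the global
lock is taken only around raising the IRQ line. This is not the patch and
does not use QEMU's APIs; toy_bql, ToyRTC, TOY_REG_TRIGGER and all
toy_rtc_* functions are hypothetical names, with plain pthread mutexes
standing in for both locks.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the BQL: a plain mutex, NOT QEMU's real global lock API. */
static pthread_mutex_t toy_bql = PTHREAD_MUTEX_INITIALIZER;

#define TOY_REG_COUNT   128
#define TOY_REG_TRIGGER 0x0c  /* hypothetical: a nonzero write here raises the IRQ */

typedef struct ToyRTC {
    pthread_mutex_t lock;          /* device-local lock guarding regs and index */
    uint8_t regs[TOY_REG_COUNT];
    uint8_t index;                 /* register selected through the index port */
    int irq_level;                 /* stand-in for the outbound IRQ line */
} ToyRTC;

void toy_rtc_init(ToyRTC *s)
{
    assert(s != NULL);
    memset(s, 0, sizeof(*s));
    pthread_mutex_init(&s->lock, NULL);
}

/* The IRQ line is treated as state that is still protected by the global lock. */
void toy_rtc_raise_irq(ToyRTC *s)
{
    pthread_mutex_lock(&toy_bql);
    s->irq_level = 1;
    pthread_mutex_unlock(&toy_bql);
}

/* Even addresses select a register; odd addresses write the selected one. */
void toy_rtc_write(ToyRTC *s, unsigned addr, uint8_t val)
{
    int need_irq = 0;

    pthread_mutex_lock(&s->lock);
    if ((addr & 1) == 0) {
        s->index = val % TOY_REG_COUNT;
    } else {
        s->regs[s->index] = val;
        need_irq = (s->index == TOY_REG_TRIGGER && val != 0);
    }
    pthread_mutex_unlock(&s->lock);

    /* Sketch-only choice: drop the device lock before taking the global one. */
    if (need_irq) {
        toy_rtc_raise_irq(s);
    }
}

/* Reads take only the device-local lock, never the global one. */
uint8_t toy_rtc_read(ToyRTC *s, unsigned addr)
{
    uint8_t val;

    pthread_mutex_lock(&s->lock);
    val = ((addr & 1) == 0) ? s->index : s->regs[s->index];
    pthread_mutex_unlock(&s->lock);
    return val;
}
```

In this sketch the common path (register reads and ordinary writes) never
touches the global mutex; only the rarer IRQ path does, which is why some
contention on the global lock remains. Releasing the device lock before
taking the global one is a choice made here to avoid nesting the two locks;
the actual patch may order them differently.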

> > +static void rtc_rasie_irq(RTCState *s)
> 
> Typo: should be "raise".
> 
Good catch. :)

Thanks,
-Gonglei


Re: [Qemu-devel] [PATCH v2] rtc: placing RTC memory region outside BQL

2018-02-06 Thread Peter Maydell
On 6 February 2018 at 14:07, Gonglei  wrote:
> Windows guests use the RTC as their clock source device
> and access it frequently. Let's move the RTC memory
> region outside the BQL to decrease overhead for Windows guests.
> Meanwhile, add a new lock to prevent different vCPUs
> from accessing the RTC concurrently.
>
> $ cat strace_c.sh
> strace -tt -p $1 -c -o result_$1.log &
> sleep $2
> pid=$(pidof strace)
> kill $pid
> cat result_$1.log
>
> Before applying this change:
> $ ./strace_c.sh 10528 30
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  93.87    0.119070          30      4000           ppoll
>   3.27    0.004148           2      2038           ioctl
>   2.66    0.003370           2      2014           futex
>   0.09    0.000113           1       106           read
>   0.09    0.000109           1       104           io_getevents
>   0.02    0.000029           1        30           poll
>   0.00    0.000000           0         1           write
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.126839                  8293           total
>
> After applying the change:
> $ ./strace_c.sh 23829 30
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  92.86    0.067441          16      4094           ppoll
>   4.85    0.003522           2      2136           ioctl
>   1.17    0.000850           4       189           futex
>   0.54    0.000395           2       202           read
>   0.52    0.000379           2       202           io_getevents
>   0.05    0.000037           1        30           poll
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.072624                  6853           total
>
> The futex call number decreases ~90.6% on an idle windows 7 guest.

These are the same figures as from v1 -- it would be interesting
to check whether the additional locking that v2 adds has affected
the results.

Does the patch improve performance in a more interesting use
case than "the guest is just idle" ?

> +static void rtc_rasie_irq(RTCState *s)

Typo: should be "raise".

thanks
-- PMM