Re: Netra T5220

2018-11-18 Thread Don NetBSD

On 11/18/2018 2:09 PM, Brett Lymn wrote:

On Sat, Nov 17, 2018 at 06:53:51PM -0700, Don NetBSD wrote:

On 11/17/2018 1:52 PM, Brett Lymn wrote:

On Fri, Nov 16, 2018 at 02:11:26PM -0700, Don NetBSD wrote:


Yes, but what's the prompt BEFORE that (u-boot>)?  And, where do I
find the capabilities, there, documented?


As someone else mentioned, that is the Service Processor boot; it is a
cut-down Linux image, IIRC running on PowerPC.  I doubt you will find
much publicly available information on the guts of the SP... even if
you have access to Oracle support, it is not something that Oracle
customers are meant to mess with.


No.  The prompt BEFORE the service processor starts Linux.


Yes, I know what you are talking about.  I deal with Sun/Oracle
equipment at $WORK.  I have seen that prompt.  I have never seen any
documentation as to what you can do there.  I would be surprised if
there is anything available at all outside Oracle - I think the attitude
is that customers don't need to mess with the SP at all and that it should
be treated just as a firmware blob (which is, in fact, how the updates
are provided - a blob for the Linux image plus OFW update).


With it, you can:
- reconfigure the serial port parameters
- adjust the delay before Linux boots (give you more time to interrupt
  that process)
- reset the password (that Linux will request at its "login:" prompt)
- upgrade the ILOM firmware (without ILOM *or* OS being functional!)
- reset the default ILOM parameters (e.g., for network settings)
- configure the ILOM network parameters
- test the ILOM's network connection (e.g., ping other hosts)
- indicate whether or not physical presence is required to break autoboot
- connect to the "system's" serial port (bypassing ILOM)
- add/delete "users"
- examine SP settings
- enable/disable the front panel power button
- power down the host
- reset the SP and/or host
- run diagnostics on the SP

and, of course:

- boot the ILOM

[Of course, I suspect I'll uncover additional uses as I tinker more with it!]

Think of it as having a similar function wrt the ILOM as the OBP has to
the OS in older Sun boxen.

Of course, at $WORK, you're not trying to get INTO a box that someone
has locked up -- as YOU are the party who likely locked it up in the first
place!

OTOH, when a system falls into your lap, you don't always have that sort
of access.  So, you need to rely on mechanisms that the designers put in
place to make this sort of thing possible!


Re: Testing memory performance

2018-11-18 Thread Eric Hawicz

On 11/18/2018 7:00 AM, Sad Clouds wrote:

I'm developing a small tool that tests memory performance/throughput
across different environments. I'm noticing performance issues on
NetBSD-8, below are the details:

...

NetBSD and Linux have different versions of GCC, but I was hoping the
following flags would keep optimization differences to a minimum:


If you want to rule that out, you could always build the same version of 
gcc on both.  Or even run the linux binary (and libs) on NetBSD.




NetBSD: 16 threads x 1 GiB, using 1 KiB memcpy size, no mlock:
Thread 2 preflt=13504.86 msec, memcpy=2874.69 MiB/sec
...
Total transfer rate: 5817.56 MiB/sec


What?  I think your measurements are a bit off here.  There may be a 
problem with the speed, but if you're measuring the per-thread rate 
properly then the sum of those should equal your total transfer rate.  
Are the periods during which each thread calculates its rate very 
different from the period of the overall test?
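
To illustrate the question above, here is a minimal sketch of the two ways a rate can be computed. It is not taken from the tool under discussion; the struct and function names are hypothetical. If every thread is timed over the same interval as the whole test, the two figures agree; if the threads' timed intervals do not overlap, the sum of per-thread rates comes out higher than the aggregate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread record: bytes copied and the thread's own
 * timed interval, in seconds since the start of the test. */
struct thread_result {
    double bytes;
    double start;
    double end;
};

/* Sum of per-thread rates, each computed over that thread's own interval. */
static double sum_of_rates(const struct thread_result *r, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(r[i].end > r[i].start);
        sum += r[i].bytes / (r[i].end - r[i].start);
    }
    return sum;
}

/* Aggregate rate: all bytes divided by the wall time from the earliest
 * start to the latest end. */
static double aggregate_rate(const struct thread_result *r, size_t n)
{
    assert(n > 0);
    double total = 0.0, first = r[0].start, last = r[0].end;
    for (size_t i = 0; i < n; i++) {
        total += r[i].bytes;
        if (r[i].start < first) first = r[i].start;
        if (r[i].end > last) last = r[i].end;
    }
    assert(last > first);
    return total / (last - first);
}
```

With two threads that each copy 1024 bytes in one second, running them back to back (0..1 s and 1..2 s) gives a sum of rates of 2048 bytes/sec but an aggregate of 1024 bytes/sec; running them concurrently (both 0..1 s) gives 2048 bytes/sec for both.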



Also, your subsequent email about memcpy disassembly does not list the 
full code for the Linux version (the jumps at the start refer to 
instruction addresses that you don't include), so you can't really 
compare them.  I expect that both implementations have a variety of code 
blocks to handle different alignments, different supported instructions, 
etc.



Eric



Re: Testing memory performance

2018-11-18 Thread Rhialto
On Sun 18 Nov 2018 at 19:04:02 +, Sad Clouds wrote:
> Linux (gcc 6.3.0):

It looks to me like this fragment is not the whole function:

> Dump of assembler code for function memcpy:
> => 0x778a0e90 <+0>:   mov    %rdi,%rax
>    0x778a0e93 <+3>:   cmp    $0x10,%rdx
>    0x778a0e97 <+7>:   jb     0x778a0f77

0x778a0f77 isn't in the disassembly

>    0x778a0e9d <+13>:  cmp    $0x20,%rdx
>    0x778a0ea1 <+17>:  ja     0x778a0fc6

0x778a0fc6 neither.

>    0x778a0ea7 <+23>:  movups (%rsi),%xmm0
>    0x778a0eaa <+26>:  movups -0x10(%rsi,%rdx,1),%xmm1
>    0x778a0eaf <+31>:  movups %xmm0,(%rdi)
>    0x778a0eb2 <+34>:  movups %xmm1,-0x10(%rdi,%rdx,1)
>    0x778a0eb7 <+39>:  retq
> End of assembler dump.

It looks like both functions check for some initial conditions to see
which optimized loop they can use, but they use very different
optimizations.
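
For reference, the part of the Linux fragment that is visible handles sizes from 16 to 32 bytes (anything below 0x10 or above 0x20 jumps elsewhere) with two 16-byte moves that may overlap. A rough C rendering of just that path, assuming nothing about the rest of the function, might look like the sketch below; the function name is hypothetical and the 16-byte temporaries stand in for %xmm0 and %xmm1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy n bytes, 16 <= n <= 32, as two 16-byte chunks: the first 16 bytes
 * and the last 16 bytes. For n < 32 the chunks overlap, which is harmless
 * because both loads are done before either store.
 */
static void *copy_16_to_32(void *dst, const void *src, size_t n)
{
    unsigned char head[16], tail[16];

    assert(n >= 16 && n <= 32);
    memcpy(head, src, 16);                                  /* movups (%rsi),%xmm0 */
    memcpy(tail, (const unsigned char *)src + n - 16, 16);  /* movups -0x10(%rsi,%rdx,1),%xmm1 */
    memcpy(dst, head, 16);                                  /* movups %xmm0,(%rdi) */
    memcpy((unsigned char *)dst + n - 16, tail, 16);        /* movups %xmm1,-0x10(%rdi,%rdx,1) */
    return dst;                                             /* mov %rdi,%rax ... retq */
}
```

Since the test in this thread uses 1 KiB copies, this path would not be the one exercised; the code behind the two jump targets is what would need comparing.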

-Olaf.
-- 
___ Olaf 'Rhialto' Seibert  -- "What good is a Ring of Power
\X/ rhialto/at/falu.nl  -- if you're unable...to Speak." - Agent Elrond




Re: Netra T5220

2018-11-18 Thread Brett Lymn
On Sat, Nov 17, 2018 at 06:53:51PM -0700, Don NetBSD wrote:
> On 11/17/2018 1:52 PM, Brett Lymn wrote:
> >On Fri, Nov 16, 2018 at 02:11:26PM -0700, Don NetBSD wrote:
> >>
> >>Yes, but what's the prompt BEFORE that (u-boot>)?  And, where do I
> >>find the capabilities, there, documented?
> >
> >As someone else mentioned, that is the Service Processor boot; it is a
> >cut-down Linux image, IIRC running on PowerPC.  I doubt you will find
> >much publicly available information on the guts of the SP... even if
> >you have access to Oracle support, it is not something that Oracle
> >customers are meant to mess with.
> 
> No.  The prompt BEFORE the service processor starts Linux.
> 

Yes, I know what you are talking about.  I deal with Sun/Oracle
equipment at $WORK.  I have seen that prompt.  I have never seen any
documentation as to what you can do there.  I would be surprised if
there is anything available at all outside Oracle - I think the attitude
is that customers don't need to mess with the SP at all and that it should
be treated just as a firmware blob (which is, in fact, how the updates
are provided - a blob for the Linux image plus OFW update).

-- 
Brett Lymn
Let go, or be dragged - Zen proverb.


Testing memory performance

2018-11-18 Thread Sad Clouds
I'm developing a small tool that tests memory performance/throughput
across different environments. I'm noticing performance issues on
NetBSD-8, below are the details:

The tool creates a number of concurrent threads; each thread allocates
a 1 GiB memory segment and a 1 KiB transfer block. It pre-faults every
page by writing a single byte at every 4 KiB offset. It then calls
memcpy() in a loop, copying the 1 KiB block until the 1 GiB memory
segment is filled.
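
The per-thread work described above can be sketched roughly as follows. This is a simplified, single-threaded illustration and not the actual tool: the helper names, the fixed 4 KiB page size constant, and the use of clock_gettime() for timing are all assumptions.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() under -std=c11 */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define ASSUMED_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */

/* Monotonic wall-clock time in seconds. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Write one byte per page so page faults are taken before the timed copy. */
static void prefault(unsigned char *seg, size_t seg_size)
{
    for (size_t off = 0; off < seg_size; off += ASSUMED_PAGE_SIZE)
        seg[off] = 1;
}

/*
 * Fill seg by copying blk into it repeatedly with memcpy().
 * Returns the number of bytes copied; the elapsed time in seconds is
 * stored in *elapsed so the caller can compute a rate.
 */
static size_t fill_segment(unsigned char *seg, size_t seg_size,
                           const unsigned char *blk, size_t blk_size,
                           double *elapsed)
{
    assert(blk_size > 0 && seg_size % blk_size == 0);
    size_t copied = 0;
    double t0 = now_sec();
    for (size_t off = 0; off < seg_size; off += blk_size) {
        memcpy(seg + off, blk, blk_size);
        copied += blk_size;
    }
    *elapsed = now_sec() - t0;
    return copied;
}
```

In the real test each thread would call prefault() and then fill_segment() on its own 1 GiB segment with a 1 KiB block, and report the bytes copied divided by the elapsed time.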

NetBSD and Linux have different versions of GCC, but I was hoping the
following flags would keep optimization differences to a minimum:

gcc -O1 -fno-builtin -march=westmere -Wall -pedantic -std=c11 \
-D_FILE_OFFSET_BITS=64 -D_XOPEN_SOURCE=700 -D_DEFAULT_SOURCE

Hardware has 48 GiB of RAM. For this test I'm using 16 threads x 1 GiB =
16 GiB total.

I'm seeing several issues on NetBSD:

1. When each thread calls mlock() to lock pages, sometimes when 
unlocking those pages, munlock() fails with ENOMEM. It doesn't happen 
every time, but frequently enough and I don't know why specifically 
munlock() fails. Same code works correctly on Linux.

2. Performance with 16 concurrent threads is rather bad. Most threads
are idle 60% of the time (on Linux they are 100% busy), which suggests
some sort of contention somewhere. On NetBSD average throughput with 16
threads is around 5.8 GiB/sec, on Linux it is around 15.3 GiB/sec.

3. This issue affects both NetBSD and Linux. When using mlock() to
lock memory pages before issuing memcpy(), overall throughput drops
significantly. Threads seem to be serialized: while a few threads are
running, others are blocked for some reason. I don't know why mlock()
has this effect.
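
For anyone wanting to reproduce issue 1, a minimal sketch of the lock/unlock sequence with the error captured at the failing call is shown below. It is not the tool's code; the function name and the failed_call out-parameter are hypothetical, and it only records which call failed and with what errno, without explaining why.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Lock a buffer, then unlock it. Returns 0 on success, or the errno
 * value of the first call that failed. *failed_call is set to "mlock",
 * "munlock", or NULL so the caller can tell the two cases apart.
 */
static int lock_unlock(void *buf, size_t len, const char **failed_call)
{
    *failed_call = NULL;

    if (mlock(buf, len) != 0) {
        int saved = errno;          /* save before any other call */
        *failed_call = "mlock";
        return saved;
    }

    /* The timed memcpy() loop would run here in the real test. */

    if (munlock(buf, len) != 0) {
        int saved = errno;
        *failed_call = "munlock";
        return saved;
    }
    return 0;
}
```

Passing a page-aligned buffer keeps the sketch portable, since POSIX allows an implementation to require that the address be a multiple of the page size.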

If anyone has any thoughts on this, please let me know. 

Below are details of SMP architecture and test results

# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 44
Model name:            Intel(R) Xeon(R) CPU   E5620  @ 2.40GHz
Stepping:              2
CPU MHz:               1596.000
CPU max MHz:           2395.
CPU min MHz:           1596.
BogoMIPS:              4787.71
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              12288K
NUMA node0 CPU(s):     0-3,8-11
NUMA node1 CPU(s):     4-7,12-15



NetBSD: 16 threads x 1 GiB, using 1 KiB memcpy size, no mlock:
Thread 2  preflt=13504.86 msec, memcpy=2874.69 MiB/sec
Thread 7  preflt=14277.53 msec, memcpy=2891.39 MiB/sec
Thread 3  preflt=14765.99 msec, memcpy=2553.72 MiB/sec
Thread 14 preflt=15036.90 msec, memcpy=2288.19 MiB/sec
Thread 1  preflt=15126.01 msec, memcpy=2315.53 MiB/sec
Thread 12 preflt=15333.82 msec, memcpy=2071.52 MiB/sec
Thread 5  preflt=15603.25 msec, memcpy=1880.64 MiB/sec
Thread 6  preflt=15704.05 msec, memcpy=1662.66 MiB/sec
Thread 10 preflt=15693.48 msec, memcpy=1642.44 MiB/sec
Thread 4  preflt=15571.64 msec, memcpy=1557.73 MiB/sec
Thread 15 preflt=15574.60 msec, memcpy=1571.76 MiB/sec
Thread 9  preflt=15750.08 msec, memcpy=2170.44 MiB/sec
Thread 13 preflt=15588.69 msec, memcpy=1900.24 MiB/sec
Thread 8  preflt=15587.50 msec, memcpy=2043.66 MiB/sec
Thread 16 preflt=15265.48 msec, memcpy=1884.74 MiB/sec
Thread 11 preflt=15294.87 msec, memcpy=2272.75 MiB/sec
Total transfer rate: 5817.56 MiB/sec


NetBSD: 16 threads x 1 GiB, using 1 KiB memcpy size, with mlock:
Thread 2  preflt=5.27 msec, memcpy=2595.67 MiB/sec
Thread 3  preflt=5.37 msec, memcpy=2550.90 MiB/sec
Thread 16 preflt=5.02 msec, memcpy=2770.11 MiB/sec
Thread 4  preflt=4.12 msec, memcpy=3209.06 MiB/sec
Thread 15 preflt=5.31 msec, memcpy=2496.82 MiB/sec
Thread 13 preflt=7.46 msec, memcpy=3083.72 MiB/sec
Thread 5  preflt=5.49 msec, memcpy=2766.81 MiB/sec
Thread 14 preflt=6.94 msec, memcpy=2574.98 MiB/sec
Thread 8  preflt=6.53 msec, memcpy=2201.47 MiB/sec
Thread 12 preflt=4.90 msec, memcpy=2814.79 MiB/sec
Thread 10 preflt=4.41 msec, memcpy=2615.27 MiB/sec
Thread 6  preflt=6.18 msec, memcpy=2844.57 MiB/sec
Thread 9  preflt=5.38 msec, memcpy=2976.05 MiB/sec
Thread 7  preflt=4.81 msec, memcpy=2828.54 MiB/sec
Thread 11 preflt=5.10 msec, memcpy=2778.69 MiB/sec
Thread 1  preflt=3.84 msec, memcpy=3229.88 MiB/sec
Total transfer rate: 3789.33 MiB/sec




Linux: 16 threads x 1 GiB, using 1 KiB memcpy size, no mlock:
Thread 5  preflt=1122.06 msec, memcpy=990.24 MiB/sec
Thread 2  preflt=1137.94 msec, memcpy=990.41 MiB/sec
Thread 15 preflt=1125.65 msec, memcpy=982.23 MiB/sec
Thread 4  preflt=1130.02 msec, memcpy=981.37 MiB/sec
Thread 9  preflt=1130.47 msec, memcpy=982.23 MiB/sec
Thread 13 preflt=1127.70 msec, memcpy=982.00 MiB/sec
Thread 3 

Re: Netra T5220

2018-11-18 Thread Don NetBSD

On 11/18/2018 2:01 AM, Sad Clouds wrote:

On Sat, 17 Nov 2018 18:53:51 -0700
Don NetBSD  wrote:


The earlier "u-boot" prompt is significant as it lets you tinker with
the "pre-Linux" environment.  Among other things, it lets you erase
the Linux password so you CAN log into the SP (if you'd lost that
information)

I want to know what else it is useful for (besides exploring "help" at
that prompt)


U-Boot is just a bootloader, like GRUB; the only useful thing it does
is boot embedded Linux. Not sure why you'd want to tinker with that,
because if you misconfigure/damage it, you may find your hardware no
longer boots.


Actually, it does a fair bit more than "just boot Linux" -- hence the
reason to tinker with it!  :>  (why would it have a command interpreter if
the only thing it could do was "boot"?)

[Hint: ask yourself what you'd do if you didn't have the password to the ILOM;
or, if the Linux image had been corrupted/wouldn't boot; or if you wanted to
reflash that image (e.g., to support 11.4); or, if the serial port wasn't
"connected" to the ILOM]

For folks like me who acquire these devices without being able to
speak to the previous owner ("what's root's password?"), it's an
essential tool to getting into a box that may typically have been
locked up to prevent casual access (esp as you can't "pull the SP's
disk" to alter its contents off-line)

The hooks have been placed there, for a reason.  Silly NOT to understand
them and use them!


But anyway, good luck with your investigations