Re: Netra T5220
On 11/18/2018 2:09 PM, Brett Lymn wrote:
> On Sat, Nov 17, 2018 at 06:53:51PM -0700, Don NetBSD wrote:
>> On 11/17/2018 1:52 PM, Brett Lymn wrote:
>>> On Fri, Nov 16, 2018 at 02:11:26PM -0700, Don NetBSD wrote:
>>>> Yes, but what's the prompt BEFORE that (u-boot>)?  And, where do I
>>>> find the capabilities, there, documented?
>>>
>>> As someone else mentioned that is the Service Processor boot, it is a
>>> cut down linux image, IIRC running on powerpc.  I doubt if you will find
>>> much publically available information on the guts of the SP... even if
>>> you have access to Oracle support, it is not something that Oracle
>>> customers are meant to mess with.
>>
>> No.  The prompt BEFORE the service processor starts Linux.
>
> Yes, I know what you are talking about.  I deal with Sun/Oracle
> equipment at $WORK.  I have seen that prompt.  I have never seen any
> documentation as to what you can do there.  I would be surprised if
> there is anything available at all outside Oracle - I think the
> attitude is that the customers don't need to mess with the SP at all
> and should be treated just as a firmware blob (which is, in fact, how
> the updates are provided - a blob for the linux image plus OFW update)

With it, you can:
- reconfigure the serial port parameters
- adjust the delay before Linux boots (give you more time to interrupt
  that process)
- reset the password (that Linux will request at its "login:" prompt)
- upgrade the ILOM firmware (without ILOM *or* OS being functional!)
- reset the default ILOM parameters (e.g., for network settings)
- configure the ILOM network parameters
- test the ILOM's network connection (e.g., ping other hosts)
- indicate whether or not physical presence is required to break autoboot
- connect to the "system's" serial port (bypassing ILOM)
- add/delete "users"
- examine SP settings
- enable/disable the front panel power button
- power down the host
- reset the SP and/or host
- run diagnostics on the SP
and, of course:
- boot the ILOM

[Of course, I suspect I'll uncover additional uses as I tinker more
with it!]

Think of it as having a similar function wrt the ILOM as the OBP has
to the OS in older Sun boxen.

Of course, at $WORK, you're not trying to get INTO a box that someone
has locked up -- as YOU are the party who likely locked it up in the
first place!  OTOH, when a system falls into your lap, you don't always
have that sort of access.  So, you need to rely on mechanisms that the
designers put in place to make this sort of thing possible!
Re: Testing memory performance
On 11/18/2018 7:00 AM, Sad Clouds wrote:
> I'm developing a small tool that tests memory performance/throughput
> across different environments. I'm noticing performance issues on
> NetBSD-8, below are the details:
> ...
> NetBSD and Linux have different versions of GCC, but I was hoping the
> following flags would keep optimization differences to a minimum:

If you want to rule that out, you could always build the same version
of gcc on both.  Or even run the linux binary (and libs) on NetBSD.

> NetBSD: 16 threads x 1 GiB, using 1 KiB memcpy size, no mlock:
> Thread 2 preflt=13504.86 msec, memcpy=2874.69 MiB/sec
> ...
> Total transfer rate: 5817.56 MiB/sec

What?  I think your measurements are a bit off here.  There may be a
problem with the speed, but if you're measuring the per-thread rate
properly then the sum of those should equal your total transfer rate.
Are the periods during which each thread calculates its rate very
different from the period of the overall test?

Also, your subsequent email about memcpy disassembly does not list the
full code for the linux version (the jumps at the start refer to
instruction addresses that you don't include), so you can't really
compare them.  I expect that both implementations have a variety of
code blocks to handle different alignments, different supported
instructions, etc.

Eric
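The point about per-thread rates versus the total can be made concrete with a little arithmetic. The following is only a sketch under assumptions: the function names (mib_per_sec, sum_of_thread_rates, aggregate_rate) are hypothetical and not taken from the tool being discussed, and it assumes each thread's rate is computed over that thread's own busy time while the total is computed over the wall-clock time of the whole test.

```c
#include <assert.h>
#include <stddef.h>

/* Throughput in MiB/sec for `bytes` moved in `seconds`. */
double mib_per_sec(double bytes, double seconds)
{
    assert(seconds > 0.0);
    return bytes / (1024.0 * 1024.0) / seconds;
}

/* Sum of per-thread rates, each measured over that thread's own busy time.
 * (Hypothetical helper; the real tool may measure differently.) */
double sum_of_thread_rates(const double *bytes, const double *busy_sec,
                           size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += mib_per_sec(bytes[i], busy_sec[i]);
    return sum;
}

/* Aggregate rate: all bytes divided by the wall-clock time of the test. */
double aggregate_rate(const double *bytes, size_t n, double wall_sec)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += bytes[i];
    return mib_per_sec(total, wall_sec);
}
```

If every thread is busy for the whole wall-clock period, the two figures agree. If the threads run one after another, or sit idle for part of the test, the wall-clock period is longer than each thread's busy time and the aggregate comes out lower than the sum of the per-thread rates.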
Re: Testing memory performance
On Sun 18 Nov 2018 at 19:04:02 +, Sad Clouds wrote:
> Linux (gcc 6.3.0):

It looks to me like this fragment is not the whole function:

> Dump of assembler code for function memcpy:
> => 0x778a0e90 <+0>:   mov    %rdi,%rax
>    0x778a0e93 <+3>:   cmp    $0x10,%rdx
>    0x778a0e97 <+7>:   jb     0x778a0f77

0x778a0f77 isn't in the disassembly

>    0x778a0e9d <+13>:  cmp    $0x20,%rdx
>    0x778a0ea1 <+17>:  ja     0x778a0fc6

0x778a0fc6 neither.

>    0x778a0ea7 <+23>:  movups (%rsi),%xmm0
>    0x778a0eaa <+26>:  movups -0x10(%rsi,%rdx,1),%xmm1
>    0x778a0eaf <+31>:  movups %xmm0,(%rdi)
>    0x778a0eb2 <+34>:  movups %xmm1,-0x10(%rdi,%rdx,1)
>    0x778a0eb7 <+39>:  retq
> End of assembler dump.

It looks like both functions check for some initial conditions to see
which optimized loop they can use, but they use very different
optimizations.

-Olaf.
-- 
___ Olaf 'Rhialto' Seibert  -- "What good is a Ring of Power
\X/ rhialto/at/falu.nl      -- if you're unable...to Speak." - Agent Elrond
Re: Netra T5220
On Sat, Nov 17, 2018 at 06:53:51PM -0700, Don NetBSD wrote:
> On 11/17/2018 1:52 PM, Brett Lymn wrote:
> >On Fri, Nov 16, 2018 at 02:11:26PM -0700, Don NetBSD wrote:
> >>
> >>Yes, but what's the prompt BEFORE that (u-boot>)?  And, where do I
> >>find the capabilities, there, documented?
> >
> >As someone else mentioned that is the Service Processor boot, it is a
> >cut down linux image, IIRC running on powerpc.  I doubt if you will find
> >much publically available information on the guts of the SP... even if
> >you have access to Oracle support, it is not something that Oracle
> >customers are meant to mess with.
>
> No.  The prompt BEFORE the service processor starts Linux.
>

Yes, I know what you are talking about.  I deal with Sun/Oracle
equipment at $WORK.  I have seen that prompt.  I have never seen any
documentation as to what you can do there.  I would be surprised if
there is anything available at all outside Oracle - I think the
attitude is that the customers don't need to mess with the SP at all
and that it should be treated just as a firmware blob (which is, in
fact, how the updates are provided - a blob for the linux image plus
OFW update)

-- 
Brett Lymn
Let go, or be dragged - Zen proverb.
Testing memory performance
I'm developing a small tool that tests memory performance/throughput
across different environments. I'm noticing performance issues on
NetBSD-8, below are the details:

The tool creates a number of concurrent threads, each thread allocates
a 1 GiB memory segment and a 1 KiB transfer block. It pre-faults every
page by writing a single byte at every 4 KiB offset. It then calls
memcpy() in a loop, copying the 1 KiB block until the 1 GiB memory
segment is filled.

NetBSD and Linux have different versions of GCC, but I was hoping the
following flags would keep optimization differences to a minimum:

gcc -O1 -fno-builtin -march=westmere -Wall -pedantic -std=c11 \
    -D_FILE_OFFSET_BITS=64 -D_XOPEN_SOURCE=700 -D_DEFAULT_SOURCE

Hardware has 48 GiB of RAM. For this test I'm using 16 threads x 1 GiB
= 16 GiB total.

I'm seeing several issues on NetBSD:

1. When each thread calls mlock() to lock pages, sometimes when
   unlocking those pages, munlock() fails with ENOMEM. It doesn't
   happen every time, but frequently enough, and I don't know why
   specifically munlock() fails. Same code works correctly on Linux.

2. Performance with 16 concurrent threads is rather bad. Most threads
   are idle 60% of the time (on Linux they are 100% busy), which
   suggests some sort of contention somewhere. On NetBSD average
   throughput with 16 threads is around 5.8 GiB/sec, on Linux it is
   around 15.3 GiB/sec.

3. This issue affects both NetBSD and Linux. When using mlock() to
   lock memory pages before issuing memcpy(), overall throughput drops
   significantly. Threads seem to be serialized: while a few threads
   are running, others are blocked for some reason. I don't know why
   mlock() has this effect.

If anyone has any thoughts on this, please let me know.
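For readers following along, the per-thread work described above (pre-fault one byte per 4 KiB, then fill the segment with 1 KiB memcpy() calls) might look roughly like the sketch below. This is not the tool's actual source: the names prefault, fill_segment and PAGE_SIZE_BYTES are hypothetical, the 4 KiB page size is assumed from the description, and threading, timing and mlock() are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed page size; the description says one byte is written at every
 * 4 KiB offset. */
#define PAGE_SIZE_BYTES 4096u

/* Touch one byte per page so each page is faulted in before timing. */
void prefault(unsigned char *seg, size_t seg_size)
{
    for (size_t off = 0; off < seg_size; off += PAGE_SIZE_BYTES)
        seg[off] = 1;
}

/* Fill the segment by copying the same block repeatedly.
 * Returns the number of bytes copied. */
size_t fill_segment(unsigned char *seg, size_t seg_size,
                    const unsigned char *block, size_t block_size)
{
    /* The sketch only handles segments that are a whole number of blocks. */
    assert(block_size > 0 && seg_size % block_size == 0);

    size_t copied = 0;
    for (size_t off = 0; off < seg_size; off += block_size) {
        memcpy(seg + off, block, block_size);
        copied += block_size;
    }
    return copied;
}
```

In a setup like the one described, each thread would presumably call prefault() once on its own 1 GiB segment and then time a call to fill_segment() with a 1 KiB block.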
Below are details of SMP architecture and test results

# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 44
Model name:            Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
Stepping:              2
CPU MHz:               1596.000
CPU max MHz:           2395.
CPU min MHz:           1596.
BogoMIPS:              4787.71
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              12288K
NUMA node0 CPU(s):     0-3,8-11
NUMA node1 CPU(s):     4-7,12-15

NetBSD: 16 threads x 1 GiB, using 1 KiB memcpy size, no mlock:
Thread 2   preflt=13504.86 msec, memcpy=2874.69 MiB/sec
Thread 7   preflt=14277.53 msec, memcpy=2891.39 MiB/sec
Thread 3   preflt=14765.99 msec, memcpy=2553.72 MiB/sec
Thread 14  preflt=15036.90 msec, memcpy=2288.19 MiB/sec
Thread 1   preflt=15126.01 msec, memcpy=2315.53 MiB/sec
Thread 12  preflt=15333.82 msec, memcpy=2071.52 MiB/sec
Thread 5   preflt=15603.25 msec, memcpy=1880.64 MiB/sec
Thread 6   preflt=15704.05 msec, memcpy=1662.66 MiB/sec
Thread 10  preflt=15693.48 msec, memcpy=1642.44 MiB/sec
Thread 4   preflt=15571.64 msec, memcpy=1557.73 MiB/sec
Thread 15  preflt=15574.60 msec, memcpy=1571.76 MiB/sec
Thread 9   preflt=15750.08 msec, memcpy=2170.44 MiB/sec
Thread 13  preflt=15588.69 msec, memcpy=1900.24 MiB/sec
Thread 8   preflt=15587.50 msec, memcpy=2043.66 MiB/sec
Thread 16  preflt=15265.48 msec, memcpy=1884.74 MiB/sec
Thread 11  preflt=15294.87 msec, memcpy=2272.75 MiB/sec
Total transfer rate: 5817.56 MiB/sec

NetBSD: 16 threads x 1 GiB, using 1 KiB memcpy size, with mlock:
Thread 2   preflt=5.27 msec, memcpy=2595.67 MiB/sec
Thread 3   preflt=5.37 msec, memcpy=2550.90 MiB/sec
Thread 16  preflt=5.02 msec, memcpy=2770.11 MiB/sec
Thread 4   preflt=4.12 msec, memcpy=3209.06 MiB/sec
Thread 15  preflt=5.31 msec, memcpy=2496.82 MiB/sec
Thread 13  preflt=7.46 msec, memcpy=3083.72 MiB/sec
Thread 5   preflt=5.49 msec, memcpy=2766.81 MiB/sec
Thread 14  preflt=6.94 msec, memcpy=2574.98 MiB/sec
Thread 8   preflt=6.53 msec, memcpy=2201.47 MiB/sec
Thread 12  preflt=4.90 msec, memcpy=2814.79 MiB/sec
Thread 10  preflt=4.41 msec, memcpy=2615.27 MiB/sec
Thread 6   preflt=6.18 msec, memcpy=2844.57 MiB/sec
Thread 9   preflt=5.38 msec, memcpy=2976.05 MiB/sec
Thread 7   preflt=4.81 msec, memcpy=2828.54 MiB/sec
Thread 11  preflt=5.10 msec, memcpy=2778.69 MiB/sec
Thread 1   preflt=3.84 msec, memcpy=3229.88 MiB/sec
Total transfer rate: 3789.33 MiB/sec

Linux: 16 threads x 1 GiB, using 1 KiB memcpy size, no mlock:
Thread 5   preflt=1122.06 msec, memcpy=990.24 MiB/sec
Thread 2   preflt=1137.94 msec, memcpy=990.41 MiB/sec
Thread 15  preflt=1125.65 msec, memcpy=982.23 MiB/sec
Thread 4   preflt=1130.02 msec, memcpy=981.37 MiB/sec
Thread 9   preflt=1130.47 msec, memcpy=982.23 MiB/sec
Thread 13  preflt=1127.70 msec, memcpy=982.00 MiB/sec
Thread 3
Re: Netra T5220
On 11/18/2018 2:01 AM, Sad Clouds wrote:
> On Sat, 17 Nov 2018 18:53:51 -0700 Don NetBSD wrote:
>> The earlier "u-boot" prompt is significant as it lets you tinker
>> with the "pre-Linux" environment.  Among other things, it lets you
>> erase the Linux password so you CAN log into the SP (if you'd lost
>> that information)
>>
>> I want to know what else it is useful for (besides exploring
>> "help" at that prompt)
>
> U-Boot is just a bootloader, like Grub, the only useful thing it does,
> is booting embedded Linux. Not sure why you'd want to tinker with that,
> because if you misconfigure/damage it, you may find your hardware no
> longer boots.

Actually, it does a fair bit more than "just boot Linux" -- hence the
reason to tinker with it!  :>  (why would it have a command interpreter
if the only thing it could do was "boot"?)

[Hint: ask yourself what you'd do if you didn't have the password to
the ILOM; or, if the Linux image had been corrupted/wouldn't boot; or
if you wanted to reflash that image (e.g., to support 11.4); or, if the
serial port wasn't "connected" to the ILOM]

For folks like me who acquire these devices without being able to speak
to the previous owner ("what's root's password?"), it's an essential
tool to getting into a box that may typically have been locked up to
prevent casual access (esp as you can't "pull the SP's disk" to alter
its contents off-line)

The hooks have been placed there, for a reason.  Silly NOT to
understand them and use them!

> But anyway, good luck with your investigations