Re: multicore processors gain
Hi,

On Sat, 08 Jan 2011 18:48:00 +0700, Landry Breuil landry.bre...@gmail.com wrote:
> On Fri, Jan 7, 2011 at 7:54 PM, Ted Unangst ted.unan...@gmail.com wrote:
>> On Fri, Jan 7, 2011 at 1:18 PM, Christian Weisgerber na...@mips.inka.de wrote:
>>> I guess Landry doesn't read this list, or he could tell you how his experiment with parallel ports building on a 64-way sparc64 T2 went. With 32 build jobs it looked like this:
>>> landry_p22 0.8%Int 48.9%Sys 6.0%Usr 0.0%Nic 44.3%Idle
>>> landry_p22 around that all the time
>> My understanding is that the T2 is closer to an 8-way machine. If we could recognize the real cores and balance appropriately, 8 build jobs shouldn't be too bad. At least with a 4-core 8-thread i7 processor, make -j 8 scales reasonably well.
> In that particular case, dpb jobs are a bit different from just running 'make -j'. It's more like 'oh, let's build XX ports at the same time', which is a perfect stress test for SMP. 32 build jobs made the machine totally unusable (load was constant around 40/45 iirc), so far i've settled for 12 jobs, which spawns approx. 50/60 make processes in parallel (a single port build spawns 4/5 makes), more or less the same number of shells, and something like 20 ssh processes, as it's the dpb master node. Load is constant around 20, and the machine is still 'responsive'.

I have an SMP i386 -current box that runs make build with the -j switch while still forwarding 1Mpps of packets and running systat -i and bgpd. ssh and everything else work normally. It's a Xeon 3110 machine.

> 227 processes: 210 idle, 17 on processor
> All CPUs: 5.8% user, 0.0% nice, 16.9% system, 0.8% interrupt, 76.5% idle
> Landry

Thanks,
Insan Praja
Re: multicore processors gain
On Fri, Jan 7, 2011 at 7:54 PM, Ted Unangst ted.unan...@gmail.com wrote:
> On Fri, Jan 7, 2011 at 1:18 PM, Christian Weisgerber na...@mips.inka.de wrote:
>> I guess Landry doesn't read this list, or he could tell you how his experiment with parallel ports building on a 64-way sparc64 T2 went. With 32 build jobs it looked like this:
>> landry_p22 0.8%Int 48.9%Sys 6.0%Usr 0.0%Nic 44.3%Idle
>> landry_p22 around that all the time
> My understanding is that the T2 is closer to an 8-way machine. If we could recognize the real cores and balance appropriately, 8 build jobs shouldn't be too bad. At least with a 4-core 8-thread i7 processor, make -j 8 scales reasonably well.

In that particular case, dpb jobs are a bit different from just running 'make -j'. It's more like 'oh, let's build XX ports at the same time', which is a perfect stress test for SMP. 32 build jobs made the machine totally unusable (load was constant around 40/45 iirc), so far i've settled for 12 jobs, which spawns approx. 50/60 make processes in parallel (a single port build spawns 4/5 makes), more or less the same number of shells, and something like 20 ssh processes, as it's the dpb master node. Load is constant around 20, and the machine is still 'responsive'.

227 processes: 210 idle, 17 on processor
All CPUs: 5.8% user, 0.0% nice, 16.9% system, 0.8% interrupt, 76.5% idle

Landry
Re: multicore processors gain
Well, thank you for the on-topic answers. I've seen the -pthread parameter when compiling some ports, but I thought it was an alias for process. I will read about them.

Damn, am I the only one who gets mad when receiving a link to Wikipedia? It looks like a syndrome on the internet.

I'm confused about the multicore gain, but what the hell, let the 'who has the bigger dick' [computer, server, %, etc.] game continue.
Re: multicore processors gain
* Chris Cappuccio ch...@nmedia.net [2011-01-06 22:06]:
> But, yeah, if you want to maximize your 48 core AMD box in a data center and you don't see make -j48 as a practical application, OpenBSD may not be there yet for you. I don't have anything with more than 4 cores, so it was never really a concern for me :)

you're wrong. my OpenBSD SMP boxes (no, no 48 cores) do very well. as long as the load is userland-driven we scale fine.

--
Henning Brauer, h...@bsws.de, henn...@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting
Re: multicore processors gain
Hi folks,

I will reformulate the question. Sorry for this, but it slipped off topic.

So, I'm interested in the Intel Core 2 Duo family and the i3, i5, i7 families. I don't know what SMP is about. I remember UNIX has no threads, just processes spawned by fork(). Having this in mind, will a processor from the above categories help me, and how: by using all its cores, or just through some extra L2 or L3 cache? Are there any differences between the way OpenBSD runs on a processor from the above categories and on some blade server with many standalone processors?

Many thanks.
Re: multicore processors gain
On Fri, Jan 7, 2011 at 9:53 AM, Mihai Popescu B.S. mihai...@gmail.com wrote:
> I remember UNIX has no threads, just processes spawned by fork().

A lot has changed since 1995.
Re: multicore processors gain
> A lot has changed since 1995.

pthreads -- https://computing.llnl.gov/tutorials/pthreads/
rthreads -- http://www.informatik.uni-augsburg.de/~ungerer/rthreads/RThreads.html

etc.
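For readers unsure about the process/thread distinction raised above, here is a minimal sketch. It assumes a POSIX system and uses Python's os.fork() and threading module purely as stand-ins; the names counter, bump, bump_in_thread and bump_in_child_process are made up for this illustration, and nothing here shows how OpenBSD's pthreads or rthreads are implemented or scheduled across cores.

```python
import os
import threading

# Shared state used to show what each kind of worker can change.
counter = [0]


def bump():
    """Increment the counter in whatever address space runs this."""
    counter[0] += 1


def bump_in_thread():
    """Run bump() in a second thread of this process; return our counter."""
    worker = threading.Thread(target=bump)
    worker.start()
    worker.join()
    return counter[0]


def bump_in_child_process():
    """Run bump() in a fork()ed child; return the parent's counter."""
    pid = os.fork()
    if pid == 0:
        # Child: it increments its own copy of the parent's memory.
        bump()
        os._exit(0)
    os.waitpid(pid, 0)
    return counter[0]


if __name__ == "__main__":
    print("after thread:", bump_in_thread())
    print("after fork:  ", bump_in_child_process())
```

Starting from zero, both lines report 1: the thread's increment is visible because threads share one address space, while the forked child increments a private copy that the parent never sees.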
Re: multicore processors gain
Yes, it will use all your cores. I don't understand your question about blade servers, but they are just a different form factor of essentially the same hardware. If the hardware is supported, SMP should work just fine.

PS: SMP is what lets you use all your cores: http://en.wikipedia.org/wiki/Symmetric_multiprocessing

--
Jeremy Chase
http://twitter.com/jeremychase

On Fri, Jan 7, 2011 at 9:53 AM, Mihai Popescu B.S. mihai...@gmail.com wrote:
> Hi folks,
>
> I will reformulate the question. Sorry for this, but it slipped off topic.
>
> So, I'm interested in the Intel Core 2 Duo family and the i3, i5, i7 families. I don't know what SMP is about. I remember UNIX has no threads, just processes spawned by fork(). Having this in mind, will a processor from the above categories help me, and how: by using all its cores, or just through some extra L2 or L3 cache? Are there any differences between the way OpenBSD runs on a processor from the above categories and on some blade server with many standalone processors?
>
> Many thanks.
Re: multicore processors gain
On Fri, Jan 7, 2011 at 10:49 AM, Adam M. Dutko dutko.a...@gmail.com wrote:
> rthreads -- http://www.informatik.uni-augsburg.de/~ungerer/rthreads/RThreads.html

The above paper has nothing to do with what's called rthreads in OpenBSD. A more appropriate paper from 1995 would be this one, except OpenBSD uses a system call named rfork instead of sfork.

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.25.8012
Re: multicore processors gain
Henning Brauer lists-open...@bsws.de wrote:
> you're wrong. my OpenBSD SMP boxes (no, no 48 cores) do very well. as long as the load is userland-driven we scale fine.

I guess Landry doesn't read this list, or he could tell you how his experiment with parallel ports building on a 64-way sparc64 T2 went. With 32 build jobs it looked like this:

landry_p22 0.8%Int 48.9%Sys 6.0%Usr 0.0%Nic 44.3%Idle
landry_p22 around that all the time

--
Christian naddy Weisgerber na...@mips.inka.de
Re: multicore processors gain
On Fri, Jan 7, 2011 at 1:18 PM, Christian Weisgerber na...@mips.inka.de wrote:
> I guess Landry doesn't read this list, or he could tell you how his experiment with parallel ports building on a 64-way sparc64 T2 went. With 32 build jobs it looked like this:
> landry_p22 0.8%Int 48.9%Sys 6.0%Usr 0.0%Nic 44.3%Idle
> landry_p22 around that all the time

My understanding is that the T2 is closer to an 8-way machine. If we could recognize the real cores and balance appropriately, 8 build jobs shouldn't be too bad. At least with a 4-core 8-thread i7 processor, make -j 8 scales reasonably well.
Re: multicore processors gain
On 2011-01-07 19.54, Ted Unangst wrote:
>> experiment with parallel ports building on a 64-way sparc64 T2 went. With 32 build jobs it looked like this:
>> landry_p22 0.8%Int 48.9%Sys 6.0%Usr 0.0%Nic 44.3%Idle
>> landry_p22 around that all the time
> My understanding is that the T2 is closer to an 8-way machine. If we could recognize the real cores and balance appropriately, 8 build jobs shouldn't be too bad. At least with a 4-core 8-thread i7 processor, make -j 8 scales reasonably well.

Just to illustrate, a quick test on my 8-core (2 cpu x 4 core) Supermicro AMD box (compile a GENERIC.MP kernel):

# make clean make depend
# time make
...
3m26.78s real 2m43.73s user 0m35.08s system

# make clean make depend
# time make -j8
...
0m47.40s real 2m52.75s user 3m1.70s system

At first glance it doesn't scale all that well: real time is about 4.4 times quicker when running eight compiler tasks simultaneously compared to a single one. But the server isn't idle to begin with (it is run in quite heavy production), and this sort of test is of course not processor-only. Also, both tests were run with the MP kernel, so even the single-task test would probably utilize several kernels at times.

Regards,
/Benny

dmesg below (cut):

OpenBSD 4.7 (GENERIC.MP) #130: Wed Mar 17 20:48:50 MDT 2010
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 3756720128 (3582MB)
avail mem = 3650265088 (3481MB)
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xfb980 (77 entries)
bios0: vendor American Megatrends Inc. version 080014 date 10/13/2008
bios0: Supermicro H8DMT
acpi0 at bios0: rev 0
acpi0: tables DSDT FACP APIC MCFG OEMB SRAT EINJ BERT ERST HEST SSDT
acpi0: wakeup devices PS2K(S4) PS2M(S4) NSMB(S4) USB0(S4) USB2(S1) NMAC(S5) NMAD(S5) P0P1(S4) HDAC(S4) BR10(S4) BR15(S4) SLPB(S4) PWRB(S4)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Quad-Core AMD Opteron(tm) Processor 2376, 2312.18 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu0: ITLB 32 4KB entries fully associative, 16 4MB entries fully associative
cpu0: DTLB 48 4KB entries fully associative, 48 4MB entries fully associative
cpu0: apic clock running at 201MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Quad-Core AMD Opteron(tm) Processor 2376, 2311.86 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu1: ITLB 32 4KB entries fully associative, 16 4MB entries fully associative
cpu1: DTLB 48 4KB entries fully associative, 48 4MB entries fully associative
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Quad-Core AMD Opteron(tm) Processor 2376, 2311.86 MHz
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu2: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu2: ITLB 32 4KB entries fully associative, 16 4MB entries fully associative
cpu2: DTLB 48 4KB entries fully associative, 48 4MB entries fully associative
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Quad-Core AMD Opteron(tm) Processor 2376, 2311.86 MHz
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu3: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu3: ITLB 32 4KB entries fully associative, 16 4MB entries fully associative
cpu3: DTLB 48 4KB entries fully associative, 48 4MB entries fully associative
cpu4 at mainbus0: apid 4 (application processor)
cpu4: Quad-Core AMD Opteron(tm) Processor 2376, 2311.86 MHz
cpu4: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu4: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu4: ITLB 32 4KB entries fully associative, 16 4MB entries fully associative
cpu4: DTLB 48 4KB entries fully associative, 48 4MB entries fully associative
cpu5 at mainbus0: apid 5 (application processor)
cpu5: Quad-Core AMD Opteron(tm) Processor 2376, 2311.86 MHz
cpu5: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu5: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB
Re: multicore processors gain
On 2011-01-07 20.45, Benny Löfgren wrote:
> Also, both tests were run with the MP kernel, so even the single-task test would probably utilize several kernels at times.

*duh* Meant to say '...utilize several cores...', not kernels.

/B

--
internetlabbet.se / work: +46 8 551 124 80 / Words must
Benny Löfgren / mobile: +46 70 718 11 90 / be weighed,
/ fax: +46 8 551 124 89 / not counted.
/ email: benny -at- internetlabbet.se
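The arithmetic behind the 'about 4.4 times' figure can be checked with a short sketch. The timings are the ones Benny posted for his two kernel builds; the helper names (to_seconds, scaling_summary) and the derived ratios (efficiency per job, growth in system time) are assumptions of this sketch, not something the thread computes, and they ignore the fact that the box was busy with production load.

```python
import re

# time(1) output from the two kernel builds quoted in the thread.
SERIAL = {"real": "3m26.78s", "user": "2m43.73s", "system": "0m35.08s"}
PARALLEL = {"real": "0m47.40s", "user": "2m52.75s", "system": "3m1.70s"}


def to_seconds(stamp):
    """Convert a value such as '3m26.78s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def scaling_summary(serial, parallel, jobs):
    """Return wall-clock speedup, per-job efficiency and system-time growth."""
    speedup = to_seconds(serial["real"]) / to_seconds(parallel["real"])
    return {
        "speedup": speedup,
        "efficiency": speedup / jobs,
        "system_growth": to_seconds(parallel["system"])
        / to_seconds(serial["system"]),
    }


if __name__ == "__main__":
    for name, value in scaling_summary(SERIAL, PARALLEL, jobs=8).items():
        print(f"{name}: {value:.2f}")
```

With these inputs the wall-clock speedup comes out at roughly 4.36 (a little over half of the ideal 8), while system time grows about fivefold, from 35.08s to 181.70s, which fits the thread's point that kernel-side work is where the parallel build pays.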
Re: multicore processors gain
* Benny Löfgren bl-li...@lofgren.biz [2011-01-07 20:45]:
> On 2011-01-07 19.54, Ted Unangst wrote:
>>> experiment with parallel ports building on a 64-way sparc64 T2 went. With 32 build jobs it looked like this:
>>> landry_p22 0.8%Int 48.9%Sys 6.0%Usr 0.0%Nic 44.3%Idle
>>> landry_p22 around that all the time
>> My understanding is that the T2 is closer to an 8-way machine. If we could recognize the real cores and balance appropriately, 8 build jobs shouldn't be too bad. At least with a 4-core 8-thread i7 processor, make -j 8 scales reasonably well.
> Just to illustrate, a quick test on my 8-core (2 cpu x 4 core) Supermicro AMD box (compile a GENERIC.MP kernel):
>
> # make clean make depend
> # time make
> ...
> 3m26.78s real 2m43.73s user 0m35.08s system
>
> # make clean make depend
> # time make -j8
> ...
> 0m47.40s real 2m52.75s user 3m1.70s system
>
> At first glance it doesn't scale all that well: real time is about 4.4 times quicker when running eight compiler tasks simultaneously compared to a single one. But the server isn't idle to begin with (it is run in quite heavy production), and this sort of test is of course not processor-only. Also, both tests were run with the MP kernel, so even the single-task test would probably utilize several kernels at times.

indeed - your test has some flaws. but still, the scaling it shows isn't all that bad - and keep in mind that cores typically share a bit more than separate CPUs. this can have advantages or disadvantages.

the box i have in mind does two things that matter for this discussion:
-takes backups for/from many servers
-does dns webalizer on webserver logfiles (many many, from many webservers)

the backup sounds I/O-heavy - and of course kinda is. but the biggest load is gzip. the backup stuff i wrote myself over many years, it has a nifty scheduler that parallelizes nicely.

the webserver logfile processing suffers from dns latency (local cache of course, but still). massive massive massive parallel processing (i wrote that stuff, too) drives it to a point where all CPUs are almost 100% busy (well, see below).

the backup runs for about 3 hours with all CPUs busy. the webserver logfile thing usually like 2 hours, but only one hour with everything busy, afterwards only the big logs are still being processed and the latency is the limiting factor.

the box used to be a dual xeon 2.2 (the older, p4-based heating plate), with hyperthreading, so 4 logical CPUs with ami RAID 5. the backup scales almost perfectly, more than 3.5x faster with the 4 logical CPUs vs just one. webserver log processing gives the same picture.

since wednesday it is an intel E7500, 2.93GHz, 2 cores, a sata disk to boot from and two big sata disks, softraid raid 1. it is slightly faster than the previous one.

pls note that i can only give estimates, since backup and webserver log processing performance are influenced by external factors. and since somebody is going to ask - the separate boot disk (that holds OS and everything, just not the raw data) is there to make it easy to replace the data disks by bigger ones.

so for these tasks, we scale perfectly fine.

throwing more than one cpu (core) at a database server running just one mysqld instance is not going to help right now. that's likely to change with rthreads so.

throwing more than one core at a firewall (without much proxy stuff in userland) hurts more than it helps right now.

guess my point is clear. we scale fine for many (I'd even say the most) tasks. we scale miserably for some others. yes, our SMP can be improved, but it isn't bad. heck, what cannot be improved?

--
Henning Brauer, h...@bsws.de, henn...@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting
Re: multicore processors gain
2011/1/7 Mihai Popescu B.S. mihai...@gmail.com:
> families. I don't know what SMP is about.

There's been a great site since the beginning of the millennium: http://en.wikipedia.org/wiki/SMP

And you should read and follow http://www.catb.org/~esr/faqs/smart-questions.html

HTH. HAND
Martin
Re: multicore processors gain
On Thu, 6 Jan 2011 13:45:05 +0200 Mihai Popescu B.S. mihai...@gmail.com wrote:
> I got the idea from FAQ that OpenBSD is not using more than one core from multicore processors.

http://www.openbsd.org/faq/faq8.html#SMP

As soon as you run more than just the kernel on your system (...), the other CPUs/cores will be used as well.

regards,
Robert
Re: multicore processors gain
This is my not-so-technical understanding. OpenBSD's current SMP status:

- The kernel uses a single lock for shared data. My understanding is that this means that the kernel itself doesn't benefit from SMP as much as it could otherwise, but it does use multiple cores. (I believe, but would like confirmation from someone who knows)
- Userland processes can run on as many cores as are supported. So if you have multiple processes that are using a lot of CPU time, they will be split across all cores.
- However, all threads in a multi-threaded process will run on one core. For example, MySQL will only use a single core, even though it is multi-threaded.

Bottom line, SMP is very well supported. People blow the BKL thing out of proportion.

--
Jeremy Chase
http://twitter.com/jeremychase

On Thu, Jan 6, 2011 at 6:45 AM, Mihai Popescu B.S. mihai...@gmail.com wrote:
> Hello,
>
> I got the idea from FAQ that OpenBSD is not using more than one core from multicore processors. Pretending I got it right, what's the benefit of buying an Intel Core 2 Duo? Just the bigger cache and some extra instructions? Is there a difference in how OpenBSD handles, let's say, a multicore processor or an architecture with blade processors?
>
> Thanks.
Re: multicore processors gain
On 01/06/11 06:44, Mihai Popescu B.S. wrote:
> Hello, I got the idea from FAQ that OpenBSD is not using more than one core from multicore processors.

please indicate where you got that from... I can't do much about crap you ...read on the 'net..., but if there is something in the FAQ that implies that, I can correct or clarify...

Multi-core is basically just cheap multiprocessor. It works. May not be the fastest system in the world, but probably does more than what you need...

Nick.
Re: multicore processors gain
Jeremy Chase [jeremych...@gmail.com] wrote:
> This is my not-so-technical understanding. OpenBSD's current SMP status:
> - The kernel uses a single lock for shared data. My understanding is that this means that the kernel itself doesn't benefit from SMP as much as it could otherwise, but it does use multiple cores. (I believe, but would like confirmation from someone who knows)

Which isn't symmetric at all. Having said that, I suspect most people don't get much benefit today from SMP outside of heavy server applications.

> - Userland processes can run on as many cores as are supported. So if you have multiple processes that are using a lot of CPU time, they will be split across all cores.
> - However all threads in a multi-threaded process will run on one core. For example Mysql will only use a single core, even though it is multi-threaded.

The threaded issue is actively being worked on with the development of the rthreads library and related kernel changes to accommodate rthreads. It turned out to be a deep hole, but it is likely to be working long before the kernel itself can use multiple processors.

> Bottom line, SMP is very well supported. People blow the BKL thing out of proportion.

I think people have looked at using multiple cores for offloading crypto, pf, various parts of the kernel, but make no mistake, the kernel is totally limited to one core.

But, yeah, if you want to maximize your 48 core AMD box in a data center and you don't see make -j48 as a practical application, OpenBSD may not be there yet for you. I don't have anything with more than 4 cores, so it was never really a concern for me :)
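One way to picture why a kernel limited to one core matters more for some workloads than others is Amdahl's law. Nobody in the thread invokes it; it is a standard textbook model swapped in here as a rough sketch. The function name amdahl_speedup and the serial fractions in the loop are hypothetical, and treating all kernel time as strictly serialized is a simplifying assumption, not a measurement of OpenBSD.

```python
def amdahl_speedup(serial_fraction, cores):
    """Upper bound on speedup when serial_fraction of the work cannot overlap.

    serial_fraction: share of single-core run time that stays serialized
                     (for example, time spent under one big kernel lock).
    cores:           number of cores available to the remaining work.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


if __name__ == "__main__":
    # Hypothetical serial fractions, chosen only to show the trend.
    for fraction in (0.05, 0.20, 0.50):
        row = ", ".join(
            f"{cores} cores: {amdahl_speedup(fraction, cores):.2f}x"
            for cores in (2, 4, 8, 48)
        )
        print(f"serial {fraction:.0%} -> {row}")
```

Under this model a mostly userland load keeps gaining from extra cores, while a load that spends half its time serialized can never reach even 2x, which matches the split in the thread between userland-driven work and kernel-heavy work such as a firewall.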