Re: multicore processors gain

2011-01-10 Thread Insan Praja SW

Hi,
On Sat, 08 Jan 2011 18:48:00 +0700, Landry Breuil landry.bre...@gmail.com wrote:
 On Fri, Jan 7, 2011 at 7:54 PM, Ted Unangst ted.unan...@gmail.com wrote:
  On Fri, Jan 7, 2011 at 1:18 PM, Christian Weisgerber na...@mips.inka.de wrote:
   I guess Landry doesn't read this list, or he could tell you how his
   experiment with parallel ports building on a 64-way sparc64 T2 went.
   With 32 build jobs it looked like this:

   landry_p22 0.8%Int  48.9%Sys   6.0%Usr   0.0%Nic  44.3%Idle
   landry_p22 around that all the time

  My understanding is that the T2 is closer to an 8-way machine.  If we
  could recognize the real cores and balance appropriately, 8 build jobs
  shouldn't be too bad.

  At least with a 4-core 8-thread i7 processor, make -j 8 scales reasonably
  well.

 In that particular case, dpb jobs are a bit different than just
 running 'make -j'.
 It's more like oh let's build XX ports at the same time, which is a perfect
 stresstest for smp.

 32 Build jobs made the machine totally unusable (load was constant around
 40/45 iirc), so far i've settled for 12 jobs, which spawns approx ~50/60 make
 processes in parallel (a single port build spawns 4/5 makes), more or less
 the same amount of shells, and smth like ~20 ssh process as it's the dpb
 master node.
 Load is constant around 20, and the machine is still 'responsive'.



I have an i386 -current SMP machine that runs make build with the -j
switch while still forwarding 1 Mpps of traffic and running systat -i
and bgpd. ssh and everything else work normally. It's a Xeon 3110 machine.



 227 processes: 210 idle, 17 on processor
 All CPUs:  5.8% user,  0.0% nice, 16.9% system,  0.8% interrupt, 76.5% idle

 Landry



Thanks,


Insan Praja



Re: multicore processors gain

2011-01-08 Thread Landry Breuil
On Fri, Jan 7, 2011 at 7:54 PM, Ted Unangst ted.unan...@gmail.com wrote:
 On Fri, Jan 7, 2011 at 1:18 PM, Christian Weisgerber na...@mips.inka.de
 wrote:
 I guess Landry doesn't read this list, or he could tell you how his
 experiment with parallel ports building on a 64-way sparc64 T2 went.
 With 32 build jobs it looked like this:

 landry_p22 0.8%Int  48.9%Sys   6.0%Usr   0.0%Nic  44.3%Idle
 landry_p22 around that all the time

 My understanding is that the T2 is closer to an 8-way machine.  If we
 could recognize the real cores and balance appropriately, 8 build jobs
 shouldn't be too bad.

 At least with a 4-core 8-thread i7 processor, make -j 8 scales reasonably
 well.

In that particular case, dpb jobs are a bit different from just
running 'make -j'.
It's more like "let's build XX ports at the same time", which is a perfect
stress test for SMP.

32 build jobs made the machine totally unusable (load was constantly around
40-45, IIRC), so for now I've settled on 12 jobs, which spawn roughly 50-60
make processes in parallel (a single port build spawns 4-5 makes), more or
less the same number of shells, and something like 20 ssh processes, as it's
the dpb master node.
Load is constantly around 20, and the machine is still 'responsive'.

227 processes: 210 idle, 17 on processor
All CPUs:  5.8% user,  0.0% nice, 16.9% system,  0.8% interrupt, 76.5% idle

Landry



Re: multicore processors gain

2011-01-08 Thread Mihai Popescu B.S.
Well,

Thank you for the on-topic answers. I've seen the -pthread parameter when
some ports compile, but I thought it was an alias for process. I will
read about them.
Damn, am I the only one who gets mad when receiving a link to
Wikipedia? It looks like a syndrome on the internet.
I'm confused about multicore gain, but what the hell, let the "who has
the bigger dick [computer, server, %, etc.]" game continue.



Re: multicore processors gain

2011-01-07 Thread Henning Brauer
* Chris Cappuccio ch...@nmedia.net [2011-01-06 22:06]:
 But, yeah, if you want to maximize your 48 core AMD box in a data center and 
 you don't see make -j48 as a practical application, OpenBSD may not be 
 there yet for you.  I don't have anything with more than 4 cores, so it was 
 never really a concern for me :)

you're wrong. my OpenBSD SMP boxes (no, no 48 cores) do very well.
as long as the load is userland-driven we scale fine.

-- 
Henning Brauer, h...@bsws.de, henn...@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting



Re: multicore processors gain

2011-01-07 Thread Mihai Popescu B.S.
Hi folks,

I will reformulate the question. Sorry for this, but it slipped off topic.

So, I'm interested in the Intel Core 2 Duo family and the i3, i5, i7
families. I don't know what SMP is about.
I remember UNIX has no threads, just processes spawned by fork().

With this in mind, will a processor from the upper categories help me,
and how: by using all its cores, or just through some extra L2 or L3 cache?
Are there differences between the way OpenBSD runs on a processor from
the upper categories and on a blade server with many standalone
processors?

Many thanks.



Re: multicore processors gain

2011-01-07 Thread Ted Unangst
On Fri, Jan 7, 2011 at 9:53 AM, Mihai Popescu B.S. mihai...@gmail.com wrote:
 I remember UNIX has no threads, just processes spawn by fork().

A lot has changed since 1995.



Re: multicore processors gain

2011-01-07 Thread Adam M. Dutko
 A lot has changed since 1995.

pthreads -- https://computing.llnl.gov/tutorials/pthreads/

rthreads -- 
http://www.informatik.uni-augsburg.de/~ungerer/rthreads/RThreads.html

etc.



Re: multicore processors gain

2011-01-07 Thread Jeremy Chase
Yes, it will use all your cores.

I don't understand your question about blade servers, but they are
just a different form factor of essentially the same hardware. If
the hardware is supported, SMP should work just fine.

PS: SMP is what lets you use all your cores:
http://en.wikipedia.org/wiki/Symmetric_multiprocessing

--
Jeremy Chase
http://twitter.com/jeremychase




On Fri, Jan 7, 2011 at 9:53 AM, Mihai Popescu B.S. mihai...@gmail.com wrote:
 Hi folks,

 I will reformulate the question. Sorry for this, but it sleeps off topic.

 So, I'm interested about Intel Core 2 Duo family and i3, i5, i7
 families. I don't know what SMP is about.
 I remember UNIX has no threads, just processes spawn by fork().

 Having this in mind, will a processor from upper categories help me
 and how - by using all its cores or just some extra L2 or L3 cache.
 Are there some differences in the way OpenBSD runs on a processor from
 upper categories and some blade server with many stand alone
 processors.

 Many thanks.



Re: multicore processors gain

2011-01-07 Thread Ted Unangst
On Fri, Jan 7, 2011 at 10:49 AM, Adam M. Dutko dutko.a...@gmail.com wrote:
 rthreads -- 
 http://www.informatik.uni-augsburg.de/~ungerer/rthreads/RThreads.html

The above paper has nothing to do with what's called rthreads in OpenBSD.

A more appropriate paper from 1995 would be this one, except OpenBSD
uses a system call named rfork instead of sfork.

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.25.8012



Re: multicore processors gain

2011-01-07 Thread Christian Weisgerber
Henning Brauer lists-open...@bsws.de wrote:

 you're wrong. my OpenBSD SMP boxes (no, no 48 cores) do very well.
 as long as the load is userland-driven we scale fine.

I guess Landry doesn't read this list, or he could tell you how his
experiment with parallel ports building on a 64-way sparc64 T2 went.
With 32 build jobs it looked like this:

landry_p22 0.8%Int  48.9%Sys   6.0%Usr   0.0%Nic  44.3%Idle
landry_p22 around that all the time
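
To put a number on that status line: the sketch below (Python, added for illustration and not part of the original message) parses the percentages and works out how much of the non-idle time went to the kernel. The helper name `parse_cpu_states` is made up here, and the field layout is assumed to be exactly as quoted above.

```python
import re

# CPU state line as quoted in the message (systat-style percentages).
LINE = "0.8%Int  48.9%Sys   6.0%Usr   0.0%Nic  44.3%Idle"


def parse_cpu_states(line):
    """Return a dict such as {'Sys': 48.9, ...} from 'NN.N%Name' fields.

    Hypothetical helper for this example only.
    """
    return {name: float(pct) for pct, name in re.findall(r"([\d.]+)%(\w+)", line)}


states = parse_cpu_states(LINE)
busy = 100.0 - states["Idle"]        # share of time the CPUs were not idle
sys_share = states["Sys"] / busy     # fraction of busy time spent in the kernel
usr_share = states["Usr"] / busy     # fraction of busy time spent in userland

print(f"busy: {busy:.1f}%")
print(f"system share of busy time: {sys_share:.1%}")
print(f"user share of busy time:   {usr_share:.1%}")
```

On these figures close to 88% of the busy time is system time and about 11% is user time, which is why this sample is offered as a counterexample to purely userland-driven load.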

-- 
Christian "naddy" Weisgerber                          na...@mips.inka.de



Re: multicore processors gain

2011-01-07 Thread Ted Unangst
On Fri, Jan 7, 2011 at 1:18 PM, Christian Weisgerber na...@mips.inka.de
wrote:
 I guess Landry doesn't read this list, or he could tell you how his
 experiment with parallel ports building on a 64-way sparc64 T2 went.
 With 32 build jobs it looked like this:

 landry_p22 0.8%Int  48.9%Sys   6.0%Usr   0.0%Nic  44.3%Idle
 landry_p22 around that all the time

My understanding is that the T2 is closer to an 8-way machine.  If we
could recognize the real cores and balance appropriately, 8 build jobs
shouldn't be too bad.

At least with a 4-core 8-thread i7 processor, make -j 8 scales reasonably
well.



Re: multicore processors gain

2011-01-07 Thread Benny Löfgren

On 2011-01-07 19.54, Ted Unangst wrote:

 experiment with parallel ports building on a 64-way sparc64 T2 went.
 With 32 build jobs it looked like this:
 landry_p22  0.8%Int  48.9%Sys   6.0%Usr   0.0%Nic  44.3%Idle
 landry_p22  around that all the time

 My understanding is that the T2 is closer to an 8-way machine.  If we
 could recognize the real cores and balance appropriately, 8 build jobs
 shouldn't be too bad.
 At least with a 4-core 8-thread i7 processor, make -j 8 scales reasonably
 well.


Just to illustrate, a quick test on my 8-core (2 cpu x 4 core) 
Supermicro AMD box (compile a GENERIC.MP kernel):


# make clean && make depend
# time make
...
3m26.78s real 2m43.73s user 0m35.08s system

# make clean && make depend
# time make -j8
...
0m47.40s real 2m52.75s user 3m1.70s system

At first glance it doesn't scale all that well: about 4.4 times
faster in real time when running eight compiler tasks simultaneously
compared to a single one.


But the server isn't idle to begin with (it is run in quite heavy 
production), and this sort of test is of course not processor-only. 
Also, both tests were run with the MP kernel, so even the single-task 
test would probably utilize several kernels at times.
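
For anyone who wants to redo the arithmetic, here is a small Python sketch (added for illustration, not part of the original message) that converts the two `time` results above into a speedup, a parallel efficiency, and the growth in system time. The helper name `to_seconds` is made up for this example; the figures are copied from the two runs.

```python
def to_seconds(stamp):
    """Convert a time(1)-style value such as '3m26.78s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# Figures copied from the two runs: (real, user, system).
serial = tuple(to_seconds(t) for t in ("3m26.78s", "2m43.73s", "0m35.08s"))
parallel = tuple(to_seconds(t) for t in ("0m47.40s", "2m52.75s", "3m1.70s"))

JOBS = 8
speedup = serial[0] / parallel[0]       # wall-clock speedup of -j8 over plain make
efficiency = speedup / JOBS             # fraction of the ideal 8x scaling
sys_growth = parallel[2] / serial[2]    # how much total system time grew

print(f"speedup:     {speedup:.2f}x")
print(f"efficiency:  {efficiency:.0%}")
print(f"system time: {sys_growth:.1f}x the single-job run")
```

With these figures the wall-clock speedup comes out near 4.4x, a bit over half of the ideal 8x, while total system time grows roughly fivefold. The caveats in the message (a busy production box, a workload that is not CPU-only) still apply.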



Regards,
/Benny


dmesg below:

8<8<8<8<8<8< (cut)
OpenBSD 4.7 (GENERIC.MP) #130: Wed Mar 17 20:48:50 MDT 2010
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 3756720128 (3582MB)
avail mem = 3650265088 (3481MB)
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xfb980 (77 entries)
bios0: vendor American Megatrends Inc. version 080014 date 10/13/2008
bios0: Supermicro H8DMT
acpi0 at bios0: rev 0
acpi0: tables DSDT FACP APIC MCFG OEMB SRAT EINJ BERT ERST HEST SSDT
acpi0: wakeup devices PS2K(S4) PS2M(S4) NSMB(S4) USB0(S4) USB2(S1) NMAC(S5) NMAD(S5) P0P1(S4) HDAC(S4) BR10(S4) BR15(S4) SLPB(S4) PWRB(S4)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Quad-Core AMD Opteron(tm) Processor 2376, 2312.18 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu0: ITLB 32 4KB entries fully associative, 16 4MB entries fully associative
cpu0: DTLB 48 4KB entries fully associative, 48 4MB entries fully associative
cpu0: apic clock running at 201MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Quad-Core AMD Opteron(tm) Processor 2376, 2311.86 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu1: ITLB 32 4KB entries fully associative, 16 4MB entries fully associative
cpu1: DTLB 48 4KB entries fully associative, 48 4MB entries fully associative
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Quad-Core AMD Opteron(tm) Processor 2376, 2311.86 MHz
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu2: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu2: ITLB 32 4KB entries fully associative, 16 4MB entries fully associative
cpu2: DTLB 48 4KB entries fully associative, 48 4MB entries fully associative
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Quad-Core AMD Opteron(tm) Processor 2376, 2311.86 MHz
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu3: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu3: ITLB 32 4KB entries fully associative, 16 4MB entries fully associative
cpu3: DTLB 48 4KB entries fully associative, 48 4MB entries fully associative
cpu4 at mainbus0: apid 4 (application processor)
cpu4: Quad-Core AMD Opteron(tm) Processor 2376, 2311.86 MHz
cpu4: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu4: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu4: ITLB 32 4KB entries fully associative, 16 4MB entries fully associative
cpu4: DTLB 48 4KB entries fully associative, 48 4MB entries fully associative
cpu5 at mainbus0: apid 5 (application processor)
cpu5: Quad-Core AMD Opteron(tm) Processor 2376, 2311.86 MHz
cpu5: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu5: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 

Re: multicore processors gain

2011-01-07 Thread Benny Löfgren

On 2011-01-07 20.45, Benny Löfgren wrote:

Also, both tests were run with the MP kernel, so even the single-task
test would probably utilize several kernels at times.


*duh* Meant to say "...utilize several cores...", not "kernels".

/B

--
internetlabbet.se / work:   +46 8 551 124 80  / Words must
Benny Löfgren    / mobile: +46 70 718 11 90  /   be weighed,
                /  fax:    +46 8 551 124 89 /    not counted.
               / email:  benny -at- internetlabbet.se



Re: multicore processors gain

2011-01-07 Thread Henning Brauer
* Benny Löfgren bl-li...@lofgren.biz [2011-01-07 20:45]:
 On 2011-01-07 19.54, Ted Unangst wrote:
 experiment with parallel ports building on a 64-way sparc64 T2 went.
 With 32 build jobs it looked like this:
 landry_p22  0.8%Int  48.9%Sys   6.0%Usr   0.0%Nic  44.3%Idle
 landry_p22  around that all the time
 My understanding is that the T2 is closer to an 8-way machine.  If we
 could recognize the real cores and balance appropriately, 8 build jobs
 shouldn't be too bad.
 At least with a 4-core 8-thread i7 processor, make -j 8 scales reasonably
 well.
 
 Just to illustrate, a quick test on my 8-core (2 cpu x 4 core)
 Supermicro AMD box (compile a GENERIC.MP kernel):
 
 # make clean && make depend
 # time make
 ...
 3m26.78s real 2m43.73s user 0m35.08s system
 
 # make clean && make depend
 # time make -j8
 ...
 0m47.40s real 2m52.75s user 3m1.70s system
 
 On a first glance it doesn't scale all that well, about 4,4 times
 quicker real time when running eight compiler tasks simultaneously
 compared to the single one.
 
 But the server isn't idle to begin with (it is run in quite heavy
 production), and this sort of test is of course not processor-only.
 Also, both tests were run with the MP kernel, so even the
 single-task test would probably utilize several kernels at times.

indeed - your test has some flaws. but still, the scaling it shows
isn't all that bad - and keep in mind that cores typically share a bit
more than separate CPUs. this can have advantages or disadvantages.

the box i have in mind does two things that matter for this
discussion:
-takes backups for/from many servers
-does dns & webalizer on webserver logfiles (many many, from many
 webservers)

the backup sounds I/O-heavy - and of course kinda is. but the biggest
load is gzip. the backup stuff i wrote myself over many years, it has
a nifty scheduler that parallelizes nicely.

the webserver logfile processing suffers from dns latency (local cache
of course, but still). massive massive massive parallel processing (i
wrote that stuff, too) drives it to a point where all CPUs are almost
100% busy (well, see below).

the backup runs for about 3 hours with all CPUs busy. the webserver
logfile thing usually like 2 hours, but only one hour with everything
busy, afterwards only the big logs are still being processed and the
latency is the limiting factor.

the box used to be a dual xeon 2.2 (the older, p4-based heating plate),
with hyperthreading, so 4 logical CPUs with ami RAID 5. the backup
scales almost perfectly, more than 3.5x faster with the 4 logical CPUs vs
just one. webserver log processing gives the same picture.
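
One way to read a figure like "3.5x on 4 logical CPUs" is through Amdahl's law. The Python sketch below is added for illustration only: the model is a simplification that the author does not invoke, hyperthreaded logical CPUs are not full cores, and 3.5x is quoted as a lower bound. The function names are made up for this example.

```python
def parallel_fraction(speedup, n_cpus):
    """Solve Amdahl's law S = 1 / ((1 - p) + p / n) for the parallel fraction p."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n_cpus)


def amdahl_speedup(p, n_cpus):
    """Speedup predicted by Amdahl's law for parallel fraction p on n_cpus."""
    return 1.0 / ((1.0 - p) + p / n_cpus)


# Figure from the message: at least 3.5x with 4 logical CPUs versus one.
p = parallel_fraction(3.5, 4)
print(f"implied parallel fraction: {p:.3f}")

# What the same fraction would predict for a hypothetical 8-CPU box.
print(f"predicted speedup on 8 CPUs: {amdahl_speedup(p, 8):.1f}x")
```

Under these assumptions at least about 95% of the backup work runs in parallel. The 8-CPU prediction is only what the model extrapolates, not a measurement from the thread.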

since wednesday it is an intel E7500, 2.93GHz, 2 cores, a sata disk to
boot from and two big sata disks, softraid raid 1. it is slightly
faster than the previous one. pls note that i can only give estimates,
since backup and webserver log processing performance are influenced
by external factors.

and since somebody is going to ask - the separate boot disk (that
holds OS and everything, just not the raw data) is there to make it
easy to replace the data disks by bigger ones.

so for these tasks, we scale perfectly fine.

throwing more than one cpu (core) at a database server running just
one mysqld instance is not going to help right now. that's likely to
change with rthreads, though.

throwing more than one core at a firewall (without much proxy stuff in
userland) hurts more than it helps right now.

guess my point is clear. we scale fine for many (I'd even say most)
tasks. we scale miserably for some others. yes, our SMP can be
improved, but it isn't bad. heck, what cannot be improved?

-- 
Henning Brauer, h...@bsws.de, henn...@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting



Re: multicore processors gain

2011-01-07 Thread Martin Schröder
2011/1/7 Mihai Popescu B.S. mihai...@gmail.com:
 families. I don't know what SMP is about.

There's been a great site since the beginning of the millennium:
http://en.wikipedia.org/wiki/SMP

And you should read and follow
http://www.catb.org/~esr/faqs/smart-questions.html

HTH. HAND
   Martin



Re: multicore processors gain

2011-01-06 Thread Robert
On Thu, 6 Jan 2011 13:45:05 +0200
Mihai Popescu B.S. mihai...@gmail.com wrote:
 I got the idea from FAQ that OpenBSD is not using more than one core
 from multicore processors.

http://www.openbsd.org/faq/faq8.html#SMP

As soon as you run more than just the kernel on your system (...), the
other CPUs/cores will be used as well.

regards,
Robert



Re: multicore processors gain

2011-01-06 Thread Jeremy Chase
This is my not-so-technical understanding.

OpenBSD's current SMP status:
- The kernel uses a single lock for shared data. My understanding is
that this means that the kernel itself doesn't benefit from SMP as
much as it could otherwise, but it does use multiple cores. (I
believe, but would like confirmation from someone who knows)
- Userland processes can run on as many cores as are supported. So if
you have multiple processes that are using a lot of CPU time, they
will be split across all cores.
- However, all threads in a multi-threaded process will run on one
core. For example, MySQL will only use a single core, even though it is
multi-threaded.

Bottom line, SMP is very well supported. People blow the BKL thing out
of proportion.
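
The process-versus-thread distinction above can be illustrated with a small Python sketch (added for illustration, not part of the original message). It splits a CPU-bound toy workload across separate fork()ed processes, which under the description above is the kind of load the scheduler could spread over all cores. The names `burn` and `run_in_processes` are made up for this example, and it assumes a Unix-like system where `os.fork` is available.

```python
import os


def burn(n):
    """CPU-bound toy workload: sum of squares below n."""
    return sum(i * i for i in range(n))


def run_in_processes(inputs):
    """Run burn() on each input in its own fork()ed child and collect results."""
    children = []
    for n in inputs:
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:                      # child: compute, report, exit
            os.close(read_fd)
            try:
                os.write(write_fd, str(burn(n)).encode())
            finally:
                os._exit(0)               # never fall back into the parent's code
        os.close(write_fd)                # parent keeps only the read end
        children.append((pid, read_fd))

    results = []
    for pid, read_fd in children:
        with os.fdopen(read_fd) as pipe:  # read until the child closes its end
            results.append(int(pipe.read()))
        os.waitpid(pid, 0)                # reap the child
    return results


results = run_in_processes([200_000] * 4)
print(results)
```

Each child is an ordinary process, so nothing here depends on thread support; whether the four children actually land on four cores is up to the kernel's scheduler.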

--
Jeremy Chase
http://twitter.com/jeremychase



On Thu, Jan 6, 2011 at 6:45 AM, Mihai Popescu B.S. mihai...@gmail.com wrote:

 Hello,

 I got the idea from FAQ that OpenBSD is not using more than one core
 from multicore processors.
 Pretending I got it right, what's the benefit to buy an Intel Core 2
 Duo ? Just the bigger cache and some extra instructions?

 Is there a difference in how OpenBSD handles let's say a multicore
 processor or an architecture with blade processors?

 Thanks.



Re: multicore processors gain

2011-01-06 Thread Nick Holland
On 01/06/11 06:44, Mihai Popescu B.S. wrote:
 Hello,
 
 I got the idea from FAQ that OpenBSD is not using more than one core
 from multicore processors.

please indicate where you got that from...
I can't do much about crap you "...read on the 'net...", but if there is
something in the FAQ that implies that, I can correct or clarify...

Multi-core is basically just cheap multiprocessor.  It works.  May not
be the fastest system in the world, but probably does more than what you
need...

Nick.



Re: multicore processors gain

2011-01-06 Thread Chris Cappuccio
Jeremy Chase [jeremych...@gmail.com] wrote:
 This is my not-so-technical understanding.
 
 OpenBSD's current SMP status:
 - The kernel uses a single lock for shared data. My understanding is
 that this means that the kernel itself doesn't benefit from SMP as
 much as it could otherwise, but it does use multiple cores. (I
 believe, but would like confirmation from someone who knows)

Which isn't symmetric at all.  Having said that, I suspect most people don't
get much benefit today from SMP outside of heavy server applications.

 - Userland processes can run on as many cores as are supported. So if
 you have multiple processes that are using a lot of CPU time, they
 will be split across all cores.
 - However all threads in a multi-threaded process will run on one
 core. For example Mysql will only use a single core, even though it is
 multi-threaded.
 

The threading issue is actively being worked on with the development of the
rthreads library and related kernel changes to accommodate rthreads.  It turned
out to be a deep hole, but it is likely to be working long before the kernel
itself can use multiple processors.

 Bottom line, SMP is very well supported. People blow the BKL thing out
 of proportion.

I think people have looked at using multiple cores for offloading crypto, pf, 
various parts of the kernel, but make no mistake, the kernel is totally limited 
to one core.

But, yeah, if you want to maximize your 48 core AMD box in a data center and 
you don't see make -j48 as a practical application, OpenBSD may not be there 
yet for you.  I don't have anything with more than 4 cores, so it was never 
really a concern for me :)