Re: [9fans] threads vs forks
On Fri, 06 Mar 2009 12:38:57 PST David Leimbach leim...@gmail.com wrote: Things like Clojure, or Scala become a bit more interesting when the VM is extended to allow tail recursion to happen in a nice way. A lack of TCO is not something that will prevent you from writing many interesting programs (except things like a state machine as a set of mutually calling functions!). There is nothing in Clojure, or C for that matter, that will disallow tail call optimization should an implementation provide it. It is just that, unlike Scheme, most programming languages do not *mandate* that tail calls be optimized.
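Without TCO, a state machine written as mutually calling functions will eventually overflow the stack; one standard workaround is a trampoline. A minimal sketch in Python, which famously lacks TCO (the state functions here are invented for illustration):

```python
# Two mutually recursive "states". Instead of calling each other
# directly (which would grow the stack without TCO), each state
# returns a thunk for the trampoline to invoke.
def even_state(n):
    if n == 0:
        return True
    return lambda: odd_state(n - 1)

def odd_state(n):
    if n == 0:
        return False
    return lambda: even_state(n - 1)

def trampoline(thunk):
    # Keep invoking thunks until a non-callable result appears;
    # the stack never grows deeper than one frame per state call.
    while callable(thunk):
        thunk = thunk()
    return thunk

# Far deeper than CPython's default recursion limit (~1000):
print(trampoline(lambda: even_state(100000)))  # True
```

The thunks trade direct calls for an explicit loop, which is essentially what a TCO-capable implementation would do for you.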
Re: [9fans] threads vs forks
On Sat Mar 7 01:02:31 EST 2009, j...@eecs.harvard.edu wrote: On Fri, Mar 06, 2009 at 10:31:59PM -0500, erik quanstrom wrote: it's interesting to note that the quoted mtbf numbers for ssds are within a factor of 2 of enterprise hard drives. if one considers that one needs ~4 ssds to cover the capacity of 1 hard drive, the quoted mtbf/byte is worse for ssd. That's only if you think of flash as a direct replacement for disk. i think that's why they put them in a 2.5 form factor with a standard SATA interface. what are you thinking of? SSDs are expensive on a $/MB basis compared to disks. The good ones not as much as you think. a top-drawer 15k sas drive is on the order of 300GB and $350+. the intel ssd is only twice as much. if you compare the drives supported by the big-iron vendors, intel ssd already has cost parity. For short-lived data you only need go over the I/O bus twice vs. three times for most NVRAMs based on battery-backed DRAM. i'm missing something here. what are your assumptions on how things are connected? also, isn't there an assumption that you don't want to be writing short-lived data to flash if possible? - erik
Re: [9fans] threads vs forks
On Sat Mar 7 09:39:38 EST 2009, j...@eecs.harvard.edu wrote: On Sat, Mar 07, 2009 at 08:58:42AM -0500, erik quanstrom wrote: i think that's why they put them in a 2.5 form factor with a standard SATA interface. what are you thinking of? No, the reason they do that is for backwards compatibility. it's kind of funny to call sata backwards compatibility. if things go as you suggest — pcie connected, i think we'll all long for the day when we could write one driver per hba rather than one driver per storage device. new boss, same as the old boss. SSDs are expensive on a $/MB basis compared to disks. The good ones not as much as you think. a top-drawer 15k sas drive is on the order of 300GB and $350+. the intel ssd is only twice as much. if you compare the drives supported by the big-iron vendors, intel ssd already has cost parity. The Intel SSD is cheap and slow :-) pick a lane! first you argued that they are expensive. ☺ Take a gander at the NetApp NAS filers or DataDomain restorers. so you're saying that these machines don't differentiate between primary cache and their write log (or whatever they call it)? My point isn't that it is a bad idea, just that it isn't likely to provide enough business to keep manufacturers interested. Moreover, for capacity disks will keep on winning for a long time. They just start to look more and more like tape. no. i agree. worm storage in general is not a popular topic, but the few companies that do use it pay the big bucks for it. it's always great when the backup media is less reliable than the primary media. - erik
Re: [9fans] threads vs forks
Clojure is definitely something that I would like to play with extensively. Looks very promising from the outset, so the only question that I have is how does it feel when used for substantial things. Thanks, Roman. P.S. My belief in it was actually reaffirmed by a raving endorsement it got from an old LISP community. Those guys are a bit like 9fans, if you know what I mean ;-) On Tue, 2009-03-03 at 10:38 -0800, Bakul Shah wrote: On Tue, 03 Mar 2009 10:11:10 PST Roman V. Shaposhnik r...@sun.com wrote: On Tue, 2009-03-03 at 07:19 -0800, David Leimbach wrote: My knowledge on this subject is about 8 or 9 years old, so check with your local Python guru The last I'd heard about Python's threading is that it was cooperative only, and that you couldn't get real parallelism out of it. It serves as a means to organize your program in a concurrent manner. In other words no two threads run at the same time in Python, even if you're on a multi-core system, due to something they call a Global Interpreter Lock. I believe the GIL is as present in Python nowadays as ever. On a related note: does anybody know any sane interpreted languages with a decent threading model to go along? Stackless Python is the only thing that I'm familiar with in that department. Depends on what you mean by sane interpreted language with a decent threading model and what you want to do with it but check out www.clojure.org. Then there is Erlang. Its wikipedia entry has this to say: Although Erlang was designed to fill a niche and has remained an obscure language for most of its existence, it is experiencing a rapid increase in popularity due to increased demand for concurrent services, inferior models of concurrency in most mainstream programming languages, and its substantial libraries and documentation.[7][8] Well-known applications include Amazon SimpleDB,[9] Yahoo! Delicious,[10] and the Facebook Chat system.[11]
Re: [9fans] threads vs forks
Things like Clojure, or Scala become a bit more interesting when the VM is extended to allow tail recursion to happen in a nice way. On Fri, Mar 6, 2009 at 10:47 AM, Roman V Shaposhnik r...@sun.com wrote: Clojure is definitely something that I would like to play with extensively. Looks very promising from the outset, so the only question that I have is how does it feel when used for substantial things. Thanks, Roman. P.S. My belief in it was actually reaffirmed by a raving endorsement it got from an old LISP community. Those guys are a bit like 9fans, if you know what I mean ;-) On Tue, 2009-03-03 at 10:38 -0800, Bakul Shah wrote: On Tue, 03 Mar 2009 10:11:10 PST Roman V. Shaposhnik r...@sun.com wrote: On Tue, 2009-03-03 at 07:19 -0800, David Leimbach wrote: My knowledge on this subject is about 8 or 9 years old, so check with your local Python guru The last I'd heard about Python's threading is that it was cooperative only, and that you couldn't get real parallelism out of it. It serves as a means to organize your program in a concurrent manner. In other words no two threads run at the same time in Python, even if you're on a multi-core system, due to something they call a Global Interpreter Lock. I believe the GIL is as present in Python nowadays as ever. On a related note: does anybody know any sane interpreted languages with a decent threading model to go along? Stackless Python is the only thing that I'm familiar with in that department. Depends on what you mean by sane interpreted language with a decent threading model and what you want to do with it but check out www.clojure.org. Then there is Erlang. 
Its wikipedia entry has this to say: Although Erlang was designed to fill a niche and has remained an obscure language for most of its existence, it is experiencing a rapid increase in popularity due to increased demand for concurrent services, inferior models of concurrency in most mainstream programming languages, and its substantial libraries and documentation.[7][8] Well-known applications include Amazon SimpleDB,[9] Yahoo! Delicious,[10] and the Facebook Chat system.[11]
Re: [9fans] threads vs forks
On Fri, 06 Mar 2009 10:47:20 PST Roman V Shaposhnik r...@sun.com wrote: Clojure is definitely something that I would like to play with extensively. Looks very promising from the outset, so the only question that I have is how does it feel when used for substantial things. You can browse various Clojure related google groups but there is only one way to find out if it is for you! P.S. My belief in it was actually reaffirmed by a raving endorsement it got from an old LISP community. Those guys are a bit like 9fans, if you know what I mean ;-) No comment :-)
Re: [9fans] threads vs forks
P.S. My belief in it was actually reaffirmed by a raving endorsement it got from an old LISP community. Those guys are a bit like 9fans, if you know what I mean ;-) You mean intelligent people who appreciate elegance? :) Sorry. Couldn't resist. BLS
Re: [9fans] threads vs forks
To be less flippant, what makes high performance flash difficult is the slow erasure time and large erasure blocks relative to the size of individual flash pages. Being full hurts since the flash is typically managed by a log-structured storage system with a garbage collector. Small random writes require updating the logical-physical mapping efficiently and crash-recoverably. You also need to do copy-on-write, which leads to what is commonly called write amplification, which reduces the usable number of writes. Small writes tend to exacerbate a lot of these problems. Where does all this fancy stuff belong? In the storage medium, in the HBA, in the device driver, in the file system, or in the application? it's interesting to note that the quoted mtbf numbers for ssds are within a factor of 2 of enterprise hard drives. if one considers that one needs ~4 ssds to cover the capacity of 1 hard drive, the quoted mtbf/byte is worse for ssd. the obvious conclusion is that if you think you need raid for hard drives, then you also need raid for ssds. at least if you believe the mtbf numbers. i think that it's a real good question where the fancy flash tricks belong. the naive guess would be that for backwards compatibility reasons, the media will get much of the smarts. - erik
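As a rough illustration of why being full hurts: under a crude first-order model of a log-structured FTL with greedy garbage collection (a toy model of my own, not vendor data), reclaiming a victim block whose pages are a fraction u live means rewriting those live pages, so each page of host data costs roughly 1/(1-u) physical page writes:

```python
def write_amplification(live_fraction):
    """Crude greedy-GC estimate: erasing a block whose pages are
    `live_fraction` live requires copying the live pages elsewhere,
    yielding only (1 - live_fraction) of a block of free space."""
    return 1.0 / (1.0 - live_fraction)

# The fuller the device, the more each host write is amplified,
# and the faster the usable erase budget is burned:
for u in (0.5, 0.8, 0.9):
    print("%.0f%% live -> ~%.0fx amplification" % (u * 100, write_amplification(u)))
```

Real devices reserve spare capacity precisely to keep the effective live fraction, and hence the amplification, down.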
Re: [9fans] threads vs forks
Where does all this fancy stuff belong? In the storage medium, in the HBA, in the device driver, in the file system, or in the application? In a very intelligent cache? Or did you mention that above and in my ignorance I missed it? OK, let's try this: . Storage medium: only the hardware developers have access to that and they have never seemed interested in matching anyone else's requirements or suggestions. . The HBA (?). If that's the device adapter, the same applies as above. . The device driver should not be very complex, and the block handling should hopefully be shared by more than one device driver, which with the effective demise of Streams is not a very easy thing to implement without resorting to jumping through flaming hoops. . The application? That's being facetious, surely? . A cache? As quanstro pointed out, flash makes a wonderful WORM. Now we need only get Fossil to work as originally intended, or a more suitable design and implementation to take its place in this role, and we have a winner. ++L
Re: [9fans] threads vs forks
Much of the intelligence actually resides in the device driver. It is that secret sauce that gets you good performance. In theory it could be pushed down, but it takes CPU, memory, and memory bandwidth that may not be cost effective there. That would entail a really intelligent controller, which brings us back to a cache, does it not, this time hidden inside a black box. I have been thinking that the obsession with SMP has a negative impact on diverse engineering where intelligent peripherals take over operations that are too slow or too demanding on the generic CPU. Smacks of AoE to me, with a lot more packed into the A. But I'm just an old software developer with a hobbyist interest in electronic engineering and my opinions are not backed by much research. ++L
Re: [9fans] threads vs forks
Sadly, if a WORM is your only application, then no one cares. At least not enough to pony up for real performance. The folks at places like Sandia are interested in running HPC applications and there are a lot of people in other industries such as big oil and finance that are willing to pay for performance for running HPC applications, VMs which tend to have high I/O requirements when an OS patch comes out, etc. ask not what a technology can do for the world, ask what a technology can do for you! - erik
Re: [9fans] threads vs forks
That's a fact. If you have access to ACM Queue, check out p16-cantrill-concurrency.pdf (Cantrill and Bonwick on concurrency). Or you can rely on one of the hackish attempts at email attachment management or whatever conceptual error led to this: https://agora.cs.illinois.edu/download/attachments/18744240/p16-cantrill.pdf?version=1 courtesy of a google datacentre near you
Re: [9fans] threads vs forks
John Barham wrote: On Tue, Mar 3, 2009 at 3:52 AM, hugo rivera uai...@gmail.com wrote: I have to launch many tasks running in parallel (~5000) in a cluster running linux. Each of the tasks performs some astronomical calculations and I am not quite sure if using fork is the best answer here. First of all, all the programming is done in python and c... Take a look at the multiprocessing package (http://docs.python.org/library/multiprocessing.html), newly introduced with Python 2.6 and 3.0: multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. It should be a quick and easy way to set up a cluster-wide job processing system (provided all your jobs are driven by Python). Better: use parallelpython (www.parallelpython.org). Afaik multiprocessing is geared towards multi-core systems (one machine), while pp is also suitable for real clusters with more PCs. No special cluster software needed. It will start (here's your fork) a (some) python interpreters on each node, and then you can submit jobs to those 'workers'. The interpreters are kept alive between jobs, so the startup penalty becomes negligible when the number of jobs is large enough. Using it here to process massive amounts of satellite data, works like a charm. Vincent. It also looks like it's been (partially?) back-ported to Python 2.4 and 2.5: http://pypi.python.org/pypi/processing. John
Re: [9fans] threads vs forks
Thanks for the advice. Nevertheless, I am in no position to decide what pieces of software the cluster will run, I just have to deal with what I have, but anyway I can suggest other possibilities. 2009/3/4, Vincent Schut sc...@sarvision.nl: John Barham wrote: On Tue, Mar 3, 2009 at 3:52 AM, hugo rivera uai...@gmail.com wrote: I have to launch many tasks running in parallel (~5000) in a cluster running linux. Each of the tasks performs some astronomical calculations and I am not quite sure if using fork is the best answer here. First of all, all the programming is done in python and c... Take a look at the multiprocessing package (http://docs.python.org/library/multiprocessing.html), newly introduced with Python 2.6 and 3.0: multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. It should be a quick and easy way to set up a cluster-wide job processing system (provided all your jobs are driven by Python). Better: use parallelpython (www.parallelpython.org). Afaik multiprocessing is geared towards multi-core systems (one machine), while pp is also suitable for real clusters with more PCs. No special cluster software needed. It will start (here's your fork) a (some) python interpreters on each node, and then you can submit jobs to those 'workers'. The interpreters are kept alive between jobs, so the startup penalty becomes negligible when the number of jobs is large enough. Using it here to process massive amounts of satellite data, works like a charm. Vincent. It also looks like it's been (partially?) back-ported to Python 2.4 and 2.5: http://pypi.python.org/pypi/processing. John -- Hugo
Re: [9fans] threads vs forks
hugo rivera wrote: Thanks for the advice. Nevertheless I am in no position to decide what pieces of software the cluster will run, I just have to deal with what I have, but anyway I can suggest other possibilities. Well, depends on how you define 'software the cluster will run'. Do you mean cluster management software, or really any program or script or python module that needs to be installed on each node? Because for pp, you won't need any cluster software. pp is just some python module and helper scripts. You *do* need to install this (pure python) module on each node, yes, but that's it, nothing else needed. Btw, you said 'it's a small cluster, about 6 machines'. Now I'm not an expert, but I don't think you can do threading/forking from one machine to another (on linux). So I suppose there already is some cluster management software involved? And while you appear to be in no position to decide what pieces of software the cluster will run, you might want to enlighten us on what this cluster /will/ run? Your best solution might depend on that... Cheers, Vincent.
Re: [9fans] threads vs forks
hugo rivera wrote: The cluster has torque installed as the resource manager. I think it runs on top of pbs (an older project). As far as I know now I just have to call a qsub command to submit my jobs on a queue, then the resource manager allocates a processor in the cluster for my process to run till it is finished. Well, I don't know either torque or pbs, but I'm guessing that when you submit a job, this job will be some program or script that is run on the allocated processor? If so, your initial question of forking vs threading is bogus. Your cluster manager will run (exec) your job, which if it is a python script will start a python interpreter for each job. I guess that's the overhead you get when running a flexible cluster system, flexible meaning that it can run any type of job (shell script, binary executable, python script, perl, etc.). However, your overhead of starting new python processes each time may seem significant when viewed in absolute terms, but if each job processes lots of data and takes, as you said, 5 min to run on a decent processor, don't you think the startup time for the python process would become non-significant? For example, on a decent machine here, the first time python takes 0.224 secs to start and shut down immediately, and consecutive starts take only about 0.009 secs because everything is still in memory. Let's take the 0.224 secs for a worst case scenario. That would be approx 0.075 percent of your job execution time. Now let's say you have 6 machines with 8 cores each and perfect scaling, all your jobs would take 6000 / (6*8) *5min = 625 minutes (10 hours 25 mins) without python starting each time, and 625 minutes and 28 seconds with python starting anew each job. Don't you think you could just live with these 28 seconds more? Just reading this message might already have taken you more than those 28 seconds... Vincent. And I am not really sure if I have access to all the nodes, so I can install pp on each one of them. 
2009/3/4, Vincent Schut sc...@sarvision.nl: hugo rivera wrote: Thanks for the advice. Nevertheless I am in no position to decide what pieces of software the cluster will run, I just have to deal with what I have, but anyway I can suggest other possibilities. Well, depends on how you define 'software the cluster will run'. Do you mean cluster management software, or really any program or script or python module that needs to be installed on each node? Because for pp, you won't need any cluster software. pp is just some python module and helper scripts. You *do* need to install this (pure python) module on each node, yes, but that's it, nothing else needed. Btw, you said 'it's a small cluster, about 6 machines'. Now I'm not an expert, but I don't think you can do threading/forking from one machine to another (on linux). So I suppose there already is some cluster management software involved? And while you appear to be in no position to decide what pieces of software the cluster will run, you might want to enlighten us on what this cluster /will/ run? Your best solution might depend on that... Cheers, Vincent.
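Vincent's back-of-the-envelope numbers above check out; spelling the arithmetic out (his measured 0.224 s startup, taken as worst case throughout):

```python
jobs = 6000
cores = 6 * 8            # 6 machines x 8 cores, assuming perfect scaling
job_min = 5.0            # minutes per job
startup = 0.224          # seconds per interpreter start, worst case

total = jobs / cores * job_min
print(total)             # 625.0 minutes, i.e. 10 h 25 min

# Startups also run 48-way parallel, so the added wall-clock time is:
extra = jobs * startup / cores
print(round(extra))      # 28 (seconds)

# Per job, startup is a tiny fraction of the 5-minute run:
print(round(startup / (job_min * 60) * 100, 3))   # 0.075 (percent)
```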
Re: [9fans] threads vs forks
you are right. I was totally confused at the beginning. Thanks a lot. 2009/3/4, Vincent Schut sc...@sarvision.nl: hugo rivera wrote: The cluster has torque installed as the resource manager. I think it runs on top of pbs (an older project). As far as I know now I just have to call a qsub command to submit my jobs on a queue, then the resource manager allocates a processor in the cluster for my process to run till it is finished. Well, I don't know either torque or pbs, but I'm guessing that when you submit a job, this job will be some program or script that is run on the allocated processor? If so, your initial question of forking vs threading is bogus. Your cluster manager will run (exec) your job, which if it is a python script will start a python interpreter for each job. I guess that's the overhead you get when running a flexible cluster system, flexible meaning that it can run any type of job (shell script, binary executable, python script, perl, etc.). However, your overhead of starting new python processes each time may seem significant when viewed in absolute terms, but if each job processes lots of data and takes, as you said, 5 min to run on a decent processor, don't you think the startup time for the python process would become non-significant? For example, on a decent machine here, the first time python takes 0.224 secs to start and shut down immediately, and consecutive starts take only about 0.009 secs because everything is still in memory. Let's take the 0.224 secs for a worst case scenario. That would be approx 0.075 percent of your job execution time. Now let's say you have 6 machines with 8 cores each and perfect scaling, all your jobs would take 6000 / (6*8) *5min = 625 minutes (10 hours 25 mins) without python starting each time, and 625 minutes and 28 seconds with python starting anew each job. Don't you think you could just live with these 28 seconds more? Just reading this message might already have taken you more than those 28 seconds... Vincent. 
And I am not really sure if I have access to all the nodes, so I can install pp on each one of them. 2009/3/4, Vincent Schut sc...@sarvision.nl: hugo rivera wrote: Thanks for the advice. Nevertheless I am in no position to decide what pieces of software the cluster will run, I just have to deal with what I have, but anyway I can suggest other possibilities. Well, depends on how you define 'software the cluster will run'. Do you mean cluster management software, or really any program or script or python module that needs to be installed on each node? Because for pp, you won't need any cluster software. pp is just some python module and helper scripts. You *do* need to install this (pure python) module on each node, yes, but that's it, nothing else needed. Btw, you said 'it's a small cluster, about 6 machines'. Now I'm not an expert, but I don't think you can do threading/forking from one machine to another (on linux). So I suppose there already is some cluster management software involved? And while you appear to be in no position to decide what pieces of software the cluster will run, you might want to enlighten us on what this cluster /will/ run? Your best solution might depend on that... Cheers, Vincent. -- Hugo
Re: [9fans] threads vs forks
What about xcpu? On Wed, Mar 4, 2009 at 12:33 PM, hugo rivera uai...@gmail.com wrote: you are right. I was totally confused at the beginning. Thanks a lot. 2009/3/4, Vincent Schut sc...@sarvision.nl: hugo rivera wrote: The cluster has torque installed as the resource manager. I think it runs on top of pbs (an older project). As far as I know now I just have to call a qsub command to submit my jobs on a queue, then the resource manager allocates a processor in the cluster for my process to run till it is finished. Well, I don't know either torque or pbs, but I'm guessing that when you submit a job, this job will be some program or script that is run on the allocated processor? If so, your initial question of forking vs threading is bogus. Your cluster manager will run (exec) your job, which if it is a python script will start a python interpreter for each job. I guess that's the overhead you get when running a flexible cluster system, flexible meaning that it can run any type of job (shell script, binary executable, python script, perl, etc.). However, your overhead of starting new python processes each time may seem significant when viewed in absolute terms, but if each job processes lots of data and takes, as you said, 5 min to run on a decent processor, don't you think the startup time for the python process would become non-significant? For example, on a decent machine here, the first time python takes 0.224 secs to start and shut down immediately, and consecutive starts take only about 0.009 secs because everything is still in memory. Let's take the 0.224 secs for a worst case scenario. That would be approx 0.075 percent of your job execution time. Now let's say you have 6 machines with 8 cores each and perfect scaling, all your jobs would take 6000 / (6*8) *5min = 625 minutes (10 hours 25 mins) without python starting each time, and 625 minutes and 28 seconds with python starting anew each job. Don't you think you could just live with these 28 seconds more? 
Just reading this message might already have taken you more than those 28 seconds... Vincent. And I am not really sure if I have access to all the nodes, so I can install pp on each one of them. 2009/3/4, Vincent Schut sc...@sarvision.nl: hugo rivera wrote: Thanks for the advice. Nevertheless I am in no position to decide what pieces of software the cluster will run, I just have to deal with what I have, but anyway I can suggest other possibilities. Well, depends on how you define 'software the cluster will run'. Do you mean cluster management software, or really any program or script or python module that needs to be installed on each node? Because for pp, you won't need any cluster software. pp is just some python module and helper scripts. You *do* need to install this (pure python) module on each node, yes, but that's it, nothing else needed. Btw, you said 'it's a small cluster, about 6 machines'. Now I'm not an expert, but I don't think you can do threading/forking from one machine to another (on linux). So I suppose there already is some cluster management software involved? And while you appear to be in no position to decide what pieces of software the cluster will run, you might want to enlighten us on what this cluster /will/ run? Your best solution might depend on that... Cheers, Vincent. -- Hugo
Re: [9fans] threads vs forks
On Wed, Mar 4, 2009 at 2:30 AM, Vincent Schut sc...@sarvision.nl wrote: hugo rivera wrote: Now I'm not an expert, but I don't think you can do threading/forking from one machine to another (on linux). You can with bproc, but it's not supported past 2.6.21 or so. ron
Re: [9fans] threads vs forks
On Tue, 2009-03-03 at 23:24 -0600, blstu...@bellsouth.net wrote: it's interesting that parallel wasn't cool when chips were getting noticeably faster rapidly. perhaps the focus on parallelization is a sign there aren't any other ideas. Gotta do something with all the extra transistors. After all, Moore's law hasn't been repealed. And pipelines and traditional caches are pretty good examples of diminishing returns. So multiple cores seem a pretty straightforward approach. Our running joke circa '05 was that the industry was suffering from the transistor overproduction crisis. One only needs to look at other overproduction crises (especially in the food industry) to appreciate the similarities. Now there is another use that would at least be intellectually interesting and possibly useful in practice. Use the transistors for a really big memory running at cache speed. But instead of it being a hardware cache, manage it explicitly. In effect, we have a very high speed main memory, and the traditional main memory is backing store. It'd give a use for all those paging algorithms that aren't particularly justified at the main memory-disk boundary any more. And you can fit a lot of Plan 9 executable images in a 64MB on-chip memory space. Obviously, it wouldn't be a good fit for severely memory-hungry apps, and it might be a dead end overall, but it'd at least be something different... One could argue that the transactional memory model is supposed to be exactly that. Thanks, Roman.
Re: [9fans] threads vs forks
On Wed, Mar 4, 2009 at 12:50 AM, erik quanstrom quans...@quanstro.net wrote: Both AMD and Intel are looking at I/O because it is and will be a limiting factor when scaling to higher core counts. i/o starts sucking wind with one core. that's why we differentiate i/o from everything else we do. And soon hard disk latencies are really going to start hurting (they already are hurting some, I'm sure), and I'm not convinced of the viability of SSDs. i'll assume you mean throughput. hard drive latency has been a big deal for a long time. tanenbaum integrated knowledge of track layout into his minix elevator algorithm. Yes, sorry. i think the gap between cpu performance and hd performance is narrowing, not getting wider. i don't have accurate measurements on how much real-world performance difference there is between a core i7 and an intel 5000. it's generally not spectacular, clock-for-clock. on the other hand, when the intel 5000-series was released, the rule of thumb for a sata hd was 50mb/s. it's not too hard to find regular sata hard drives that do 110mb/s today. the ssd drives we've (coraid) tested have been spectacular --- reading at 200mb/s. if you want to talk latency, ssds can deliver 1/100th the latency of spinning media. there's no way that the core i7 is 100x faster than the intel 5000. For the costs (in terms of power and durability) hard drives are really a pain, not just for some of the companies I've talked to that are burning out terabyte drives in a matter of weeks, but for mere mortals as well. And I'm sorry but the performance of hard drives is *not* very good, despite it improving. Every time I do something on a large directory tree, my drive (which is a model from last year) grinds and moans and takes, IMO, too long to do things. Putting 4GB of RAM in my computer helped, but the buffering algorithms aren't psychic, so I still pay a penalty the first time I use certain directories. 
Now I haven't tested an SSD for performance, but I know they are better. If I got one, this problem would likely subside, but I'm not convinced that SSDs are durable enough, despite what the manufacturers say. I haven't seen many torture tests on them, but the fact that erasing a block destroys it a little bit is scary. I do a lot of sustained writes with my typical desktop workload over the same files, and I'd rather not trust them to something that is delicate enough to need filesystem algorithms to be optimized for so they don't wear out. I guess, in essence, I just want my flying car today. - erik
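To put a number on the latency point: a spinning disk pays seek plus rotational delay on every random read, flash pays neither. Using round figures of my own (illustrative, not benchmarks):

```python
# Illustrative round numbers, not measurements:
disk_latency = 5e-3     # ~5 ms per random read on a fast spinning disk
ssd_latency = 50e-6     # ~50 us per random read on flash

print(disk_latency / ssd_latency)   # ~100: the "1/100th the latency" figure
print(round(1 / disk_latency))      # ~200 random reads/s from the disk
print(round(1 / ssd_latency))       # ~20000 random reads/s from the SSD
```

Sequential throughput differs by maybe 2x; random-access rate differs by two orders of magnitude, which is why directory-tree grinding is exactly the workload where flash shines.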
Re: [9fans] threads vs forks
On Wed, Mar 4, 2009 at 8:52 AM, J.R. Mauro jrm8...@gmail.com wrote: Now I haven't tested an SSD for performance, but I know they are better. Well that I don't understand at all. Is this faith-based performance measurement? :-) I have a friend who is doing lots of SSD testing and they're not always better. For some cases, you pay a whole lot more for 2x greater throughput. it's not as simple as know they are better. If I got one, this problem would likely subside, but I'm not convinced that SSDs are durable enough, despite what the manufacturers say. I haven't seen many torture tests on them, but the fact that erasing a block destroys it a little bit is scary. I do a lot of sustained writes with my typical desktop workload over the same files, and I'd rather not trust them to something that is delicate enough to need filesystem algorithms to be optimized for so they don't wear out. in most cases wear leveling is not in the file system. It's in the hardware or in a powerpc that is in the SSD controller. It's worth your doing some reading here. That said, I sure would like to have a fusion IO card for venti. From what my friend is telling me the fusion card would be ideal for venti -- as long as we keep only the arenas on it. ron
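To make "wear leveling in the controller" concrete, here is a toy sketch (entirely invented, and far simpler than any real FTL) of the idea: the controller remaps logical blocks so that repeated writes to one logical address still spread erases across physical blocks, invisibly to the file system above.

```python
class ToyWearLeveler:
    """Toy flash translation layer: always place new data on the
    least-worn free physical block, remapping the logical address."""
    def __init__(self, nblocks):
        self.erases = [0] * nblocks   # per-physical-block erase count
        self.mapping = {}             # logical block -> physical block
        self.free = set(range(nblocks))

    def write(self, logical):
        old = self.mapping.get(logical)
        if old is not None:
            # Invalidate and erase the old copy (a real FTL defers
            # this to a garbage collector).
            self.erases[old] += 1
            self.free.add(old)
        # Pick the least-worn free block.
        phys = min(self.free, key=lambda b: self.erases[b])
        self.free.remove(phys)
        self.mapping[logical] = phys
        return phys

ftl = ToyWearLeveler(4)
# Hammering one logical block still cycles through physical blocks:
placements = [ftl.write(0) for _ in range(8)]
print(placements)
```

The file system sees one stable block address the whole time; the wear is evened out underneath it, which is ron's point about where the smarts live.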
Re: [9fans] threads vs forks
That said, I sure would like to have a fusion IO card for venti. From what my friend is telling me the fusion card would be ideal for venti -- as long as we keep only the arenas on it. even better for ken's fs. i would imagine the performance difference between the fusion i/o card and mass storage is similar to that between wrens and the jukebox. - erik
Re: [9fans] threads vs forks
On Wed, Mar 4, 2009 at 12:14 PM, ron minnich rminn...@gmail.com wrote: On Wed, Mar 4, 2009 at 8:52 AM, J.R. Mauro jrm8...@gmail.com wrote: Now I haven't tested an SSD for performance, but I know they are better. Well that I don't understand at all. Is this faith-based performance measurement? :-) No, I have seen several benchmarks. The benchmarks I haven't seen are ones for "how long does it take to actually break these drives?" from anyone other than the manufacturer. I have a friend who is doing lots of SSD testing and they're not always better. For some cases, you pay a whole lot more for 2x greater throughput. it's not as simple as know they are better. What types of things degrade their performance? I'm interested in seeing other data than the handful of benchmarks I've seen. I imagine writes would be the culprit since you have to erase a whole block first? If I got one, this problem would likely subside, but I'm not convinced that SSDs are durable enough, despite what the manufacturers say. I haven't seen many torture tests on them, but the fact that erasing a block destroys it a little bit is scary. I do a lot of sustained writes with my typical desktop workload over the same files, and I'd rather not trust them to something that is delicate enough to need filesystem algorithms to be optimized for so they don't wear out. in most cases wear leveling is not in the file system. It's in the hardware or in a powerpc that is in the SSD controller. It's worth your doing some reading here. I've seen a lot about optimizing the next-generation filesystems for flash. Despite the claims that the hardware-based solutions will be satisfactory, there are a lot of people interested in making existing filesystems smarter about SSDs, both for wear and for optimizing read/write. Beyond that, though, I feel very shaky just hearing the term "wear leveling."
I've had more flash-based devices fail on me than hard drives, but maybe I'm just crazy and the technology has gotten decent enough in the past couple years to allay my worrying. It would just be nice to see a bit stronger alternative being pushed as hard as SSDs. That said, I sure would like to have a fusion IO card for venti. From what my friend is telling me the fusion card would be ideal for venti -- as long as we keep only the arenas on it. ron
Re: [9fans] threads vs forks
On Wed, Mar 04, 2009 at 10:32:55PM -0500, J.R. Mauro wrote: What types of things degrade their performance? I'm interested in seeing other data than the handful of benchmarks I've seen. I imagine writes would be the culprit since you have to erase a whole block first? Being full. Small random writes, too, although much more so for run-of-the-mill SSDs than for FusionIO. [citation needed] - erik
[9fans] threads vs forks
Hi, this is not really a plan 9 question, but since you are the wisest guys I know I am hoping that you can help me. You see, I have to launch many tasks running in parallel (~5000) in a cluster running linux. Each of the tasks performs some astronomical calculations and I am not quite sure if using fork is the best answer here. First of all, all the programming is done in python and c, and since we are using the os.fork() python facility I think that it is somehow related to the underlying c fork (well, I really do not know much about forks in linux; the few things I do know about forks and threads I got from Francisco Ballesteros' Introduction to operating system abstractions). The point here is: should I use forks or threads to deal with the job at hand? I have heard that there are some problems if you fork too many processes (I am not sure how many are too many), so I am thinking of using threads. I know some basic differences between threads and forks, but I am not aware of the details of the implementation (probably I will never be). Finally, if this is a question that does not belong on the plan 9 mailing list, please let me know and I'll shut up. Saludos -- Hugo
Re: [9fans] threads vs forks
On Tue, Mar 3, 2009 at 3:52 AM, hugo rivera uai...@gmail.com wrote: Hi, this is not really a plan 9 question, but since you are the wisest guys I know I am hoping that you can help me. You see, I have to launch many tasks running in parallel (~5000) in a cluster running linux. Each of the task performs some astronomical calculations and I am not pretty sure if using fork is the best answer here. First of all, all the programming is done in python and c, and since we are using os.fork() python facility I think that it is somehow related to the underlying c fork (well, I really do not know much of forks in linux, the few things I do know about forks and threads I got them from Francisco Ballesteros' Introduction to operating system abstractions). My knowledge on this subject is about 8 or 9 years old, so check with your local Python guru. The last I'd heard about Python's threading is that it was cooperative only, and that you couldn't get real parallelism out of it. It serves as a means to organize your program in a concurrent manner. In other words, no two threads run at the same time in Python, even if you're on a multi-core system, due to something they call a Global Interpreter Lock. The point here is if I should use forks or threads to deal with the job at hand? I heard that there are some problems if you fork too many processes (I am not sure how many are too many) so I am thinking to use threads. I know some basic differences between threads and forks, but I am not aware of the details of the implementation (probably I will never be). Finally, if this is a question that does not belong to the plan 9 mailing list, please let me know and I'll shut up. Saludos I think you need to understand the system limits, which is something you can look up for yourself. Also you should understand what kind of runtime model threads in the language you're using actually implements. Those rules basically apply to any system. -- Hugo
Re: [9fans] threads vs forks
thanks a lot guys. I think I should study this issue in greater detail. It is not as easy as I thought it would be. 2009/3/3, David Leimbach leim...@gmail.com: On Tue, Mar 3, 2009 at 3:52 AM, hugo rivera uai...@gmail.com wrote: Hi, this is not really a plan 9 question, but since you are the wisest guys I know I am hoping that you can help me. You see, I have to launch many tasks running in parallel (~5000) in a cluster running linux. Each of the task performs some astronomical calculations and I am not pretty sure if using fork is the best answer here. First of all, all the programming is done in python and c, and since we are using os.fork() python facility I think that it is somehow related to the underlying c fork (well, I really do not know much of forks in linux, the few things I do know about forks and threads I got them from Francisco Ballesteros' Introduction to operating system abstractions). My knowledge on this subject is about 8 or 9 years old, so check with your local Python guru The last I'd heard about Python's threading is that it was cooperative only, and that you couldn't get real parallelism out of it. It serves as a means to organize your program in a concurrent manner. In other words no two threads run at the same time in Python, even if you're on a multi-core system, due to something they call a Global Interpreter Lock. The point here is if I should use forks or threads to deal with the job at hand? I heard that there are some problems if you fork too many processes (I am not sure how many are too many) so I am thinking to use threads. I know some basic differences between threads and forks, but I am not aware of the details of the implementation (probably I will never be). Finally, if this is a question that does not belong to the plan 9 mailing list, please let me know and I'll shut up. Saludos I think you need to understand the system limits, which is something you can look up for yourself.
Also you should understand what kind of runtime model threads in the language you're using actually implements. Those rules basically apply to any system. -- Hugo -- Hugo
Re: [9fans] threads vs forks
Python 'threads' are the same pthreads turds all other lunix junk uses. The only difference is that the interpreter itself is not threadsafe, so they have a global lock which means threads suck even more than usual. Forking a python interpreter is a *bad* idea, because python's start up takes billions of years. This has nothing to do with the merits of fork, and all with how much python sucks. There is Stackless Python, which has proper CSP threads/procs and channels, very similar to limbo. http://www.stackless.com/ But that is too sane for the mainline python folks obviously, so they stick to the pthreads turds, ... My advice: unless you can use Stackless, stay as far away as you can from any concurrent python stuff. (And don't get me started on twisted and their event based hacks). Oh, and as I mentioned in another thread, in my experience if you are going to fork, make sure you compile statically, dynamic linking is almost as evil as pthreads. But this is lunix, so what do you expect? uriel On Tue, Mar 3, 2009 at 4:19 PM, David Leimbach leim...@gmail.com wrote: On Tue, Mar 3, 2009 at 3:52 AM, hugo rivera uai...@gmail.com wrote: Hi, this is not really a plan 9 question, but since you are the wisest guys I know I am hoping that you can help me. You see, I have to launch many tasks running in parallel (~5000) in a cluster running linux. Each of the task performs some astronomical calculations and I am not pretty sure if using fork is the best answer here. First of all, all the programming is done in python and c, and since we are using os.fork() python facility I think that it is somehow related to the underlying c fork (well, I really do not know much of forks in linux, the few things I do know about forks and threads I got them from Francisco Ballesteros' Introduction to operating system abstractions).
My knowledge on this subject is about 8 or 9 years old, so check with your local Python guru The last I'd heard about Python's threading is that it was cooperative only, and that you couldn't get real parallelism out of it. It serves as a means to organize your program in a concurrent manner. In other words no two threads run at the same time in Python, even if you're on a multi-core system, due to something they call a Global Interpreter Lock. The point here is if I should use forks or threads to deal with the job at hand? I heard that there are some problems if you fork too many processes (I am not sure how many are too many) so I am thinking to use threads. I know some basic differences between threads and forks, but I am not aware of the details of the implementation (probably I will never be). Finally, if this is a question that does not belong to the plan 9 mailing list, please let me know and I'll shut up. Saludos I think you need to understand the system limits, which is something you can look up for yourself. Also you should understand what kind of runtime model threads in the language you're using actually implements. Those rules basically apply to any system. -- Hugo
Re: [9fans] threads vs forks
On Tue, Mar 3, 2009 at 3:52 AM, hugo rivera uai...@gmail.com wrote: You see, I have to launch many tasks running in parallel (~5000) in a cluster running linux. Each of the task performs some astronomical calculations and I am not pretty sure if using fork is the best answer here. lots of questions first: how many cluster nodes? how long do the jobs run? input files or args? output files? how big? You can't say much with the information you gave. ron
Re: [9fans] threads vs forks
2009/3/3, Uriel urie...@gmail.com: Oh, and as I mentioned in another thread, in my experience if you are going to fork, make sure you compile statically, dynamic linking is almost as evil as pthreads. But this is lunix, so what do you expect? not much. Wish I could get it done with plan 9. -- Hugo
Re: [9fans] threads vs forks
2009/3/3, ron minnich rminn...@gmail.com: lots of questions first . how many cluster nodes. how long do the jobs run. input files or args? output files? how big? You can't say much with the information you gave. It is a small cluster of 6 machines. I think each job runs for a few minutes (~5), takes some input files and generates a couple of files (I am not really sure how many output files each process generates). The size of the output files is ~1MB. -- Hugo
Re: [9fans] threads vs forks
On Tue, Mar 3, 2009 at 3:52 AM, hugo rivera uai...@gmail.com wrote: I have to launch many tasks running in parallel (~5000) in a cluster running linux. Each of the task performs some astronomical calculations and I am not pretty sure if using fork is the best answer here. First of all, all the programming is done in python and c... Take a look at the multiprocessing package (http://docs.python.org/library/multiprocessing.html), newly introduced with Python 2.6 and 3.0: multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. It should be a quick and easy way to set up a cluster-wide job processing system (provided all your jobs are driven by Python). It also looks like it's been (partially?) back-ported to Python 2.4 and 2.5: http://pypi.python.org/pypi/processing. John
Re: [9fans] threads vs forks
On Tue, Mar 3, 2009 at 8:28 AM, hugo rivera uai...@gmail.com wrote: It is a small cluster, of 6 machines. I think each job runs for a few minutes (~5), take some input files and generate a couple of files (I am not really sure about how many output files each proccess generates). The size of the output files is ~1Mb. for that size cluster, and jobs running a few minutes, fork ought to be fine. ron
Re: [9fans] threads vs forks
On Tue, 2009-03-03 at 07:19 -0800, David Leimbach wrote: My knowledge on this subject is about 8 or 9 years old, so check with your local Python guru The last I'd heard about Python's threading is that it was cooperative only, and that you couldn't get real parallelism out of it. It serves as a means to organize your program in a concurrent manner. In other words no two threads run at the same time in Python, even if you're on a multi-core system, due to something they call a Global Interpreter Lock. I believe GIL is as present in Python nowadays as ever. On a related note: does anybody know any sane interpreted languages with a decent threading model to go along? Stackless python is the only thing that I'm familiar with in that department. Thanks, Roman.
Re: [9fans] threads vs forks
On Tue, 03 Mar 2009 10:11:10 PST Roman V. Shaposhnik r...@sun.com wrote: On Tue, 2009-03-03 at 07:19 -0800, David Leimbach wrote: My knowledge on this subject is about 8 or 9 years old, so check with your local Python guru The last I'd heard about Python's threading is that it was cooperative only, and that you couldn't get real parallelism out of it. It serves as a means to organize your program in a concurrent manner. In other words no two threads run at the same time in Python, even if you're on a multi-core system, due to something they call a Global Interpreter Lock. I believe GIL is as present in Python nowadays as ever. On a related note: does anybody know any sane interpreted languages with a decent threading model to go along? Stackless python is the only thing that I'm familiar with in that department. Depend on what you mean by sane interpreted language with a decent threading model and what you want to do with it but check out www.clojure.org. Then there is Erlang. Its wikipedia entry has this to say: Although Erlang was designed to fill a niche and has remained an obscure language for most of its existence, it is experiencing a rapid increase in popularity due to increased demand for concurrent services, inferior models of concurrency in most mainstream programming languages, and its substantial libraries and documentation.[7][8] Well-known applications include Amazon SimpleDB,[9] Yahoo! Delicious,[10] and the Facebook Chat system.[11]
Re: [9fans] threads vs forks
On Tue, Mar 3, 2009 at 1:11 PM, Roman V. Shaposhnik r...@sun.com wrote: On Tue, 2009-03-03 at 07:19 -0800, David Leimbach wrote: My knowledge on this subject is about 8 or 9 years old, so check with your local Python guru The last I'd heard about Python's threading is that it was cooperative only, and that you couldn't get real parallelism out of it. It serves as a means to organize your program in a concurrent manner. In other words no two threads run at the same time in Python, even if you're on a multi-core system, due to something they call a Global Interpreter Lock. I believe GIL is as present in Python nowadays as ever. On a related note: does anybody know any sane interpreted languages with a decent threading model to go along? Stackless python is the only thing that I'm familiar with in that department. I thought part of the reason for the big break with Python 3000 was to get rid of the GIL and clean that threading mess up. Or am I way off? Thanks, Roman.
Re: [9fans] threads vs forks
You are off. It is doubtful that the GIL will ever be removed. But that really isn't the issue, the issue is the lack of a decent concurrency model, like the one provided by Stackless. But apparently one of the things stackless allows is evil recursive programming, which Guido considers 'confusing' and won't allow in mainline python (I think another reason is that porting it to jython and .not would be hard, but I'm not familiar with the details). uriel On Wed, Mar 4, 2009 at 12:08 AM, J.R. Mauro jrm8...@gmail.com wrote: On Tue, Mar 3, 2009 at 1:11 PM, Roman V. Shaposhnik r...@sun.com wrote: On Tue, 2009-03-03 at 07:19 -0800, David Leimbach wrote: My knowledge on this subject is about 8 or 9 years old, so check with your local Python guru The last I'd heard about Python's threading is that it was cooperative only, and that you couldn't get real parallelism out of it. It serves as a means to organize your program in a concurrent manner. In other words no two threads run at the same time in Python, even if you're on a multi-core system, due to something they call a Global Interpreter Lock. I believe GIL is as present in Python nowadays as ever. On a related note: does anybody know any sane interpreted languages with a decent threading model to go along? Stackless python is the only thing that I'm familiar with in that department. I thought part of the reason for the big break with Python 3000 was to get rid of the GIL and clean that threading mess up. Or am I way off? Thanks, Roman.
Re: [9fans] threads vs forks
On Tue, Mar 3, 2009 at 6:15 PM, Uriel urie...@gmail.com wrote: You are off. It is doubtful that the GIL will ever be removed. That's too bad. Things like that just reinforce my view that Python is a hack :( Oh well, back to C... But that really isn't the issue, the issue is the lack of a decent concurrency model, like the one provided by Stackless. But apparently one of the things stackless allows is evil recursive programming, which Guido considers 'confusing' and wont allow in mainline python (I think another reason is that porting it to jython and .not would be hard, but I'm not familiar with the details). Concurrency seems to be one of those things that's too hard for everyone, and I don't buy it. There's no reason it needs to be as hard as it is. And nevermind the fact that it's not really usable for every (or even most) jobs out there. But Intel is pushing it, so that's where we have to go, I suppose. uriel On Wed, Mar 4, 2009 at 12:08 AM, J.R. Mauro jrm8...@gmail.com wrote: On Tue, Mar 3, 2009 at 1:11 PM, Roman V. Shaposhnik r...@sun.com wrote: On Tue, 2009-03-03 at 07:19 -0800, David Leimbach wrote: My knowledge on this subject is about 8 or 9 years old, so check with your local Python guru The last I'd heard about Python's threading is that it was cooperative only, and that you couldn't get real parallelism out of it. It serves as a means to organize your program in a concurrent manner. In other words no two threads run at the same time in Python, even if you're on a multi-core system, due to something they call a Global Interpreter Lock. I believe GIL is as present in Python nowadays as ever. On a related note: does anybody know any sane interpreted languages with a decent threading model to go along? Stackless python is the only thing that I'm familiar with in that department. I thought part of the reason for the big break with Python 3000 was to get rid of the GIL and clean that threading mess up. Or am I way off? Thanks, Roman.
Re: [9fans] threads vs forks
2009/3/3 J.R. Mauro jrm8...@gmail.com: Concurrency seems to be one of those things that's too hard for everyone, and I don't buy it. There's no reason it needs to be as hard as it is. That's a fact. If you have access to The ACM Queue, check out p16-cantrill-concurrency.pdf (Cantrill and Bonwick on concurrency). And nevermind the fact that it's not really usable for every (or even most) jobs out there. But Intel is pushing it, so that's where we have to go, I suppose. That's simply not true. In my world (server software and networking), most tasks can be improved by utilizing concurrent programming paradigms. Even in user interfaces, these are useful. For mathematics, there's simply no question that making use of concurrent algorithms is a win. In fact, I can't think of a single case in which doing two lines of work at once isn't better than doing one at a time, assuming that accuracy is maintained in the result. --dho
Re: [9fans] threads vs forks
On Tue, Mar 3, 2009 at 6:54 PM, Devon H. O'Dell devon.od...@gmail.com wrote: 2009/3/3 J.R. Mauro jrm8...@gmail.com: Concurrency seems to be one of those things that's too hard for everyone, and I don't buy it. There's no reason it needs to be as hard as it is. That's a fact. If you have access to The ACM Queue, check out p16-cantrill-concurrency.pdf (Cantrill and Bonwich on concurrency). Things like TBB and other libraries to automagically scale up repeated operations into parallelized ones help alleviate the problems with getting parallelization to work. They're ugly, they only address narrow problem sets, but they're attempts at solutions. And if you look at languages like LISP and Erlang, you're definitely left with a feeling that parallelization is being treated as harder than it is. I'm not saying it isn't hard, just that there are a lot of people who seem to be throwing up their hands over it. I suppose I should stop reading their material. And nevermind the fact that it's not really usable for every (or even most) jobs out there. But Intel is pushing it, so that's where we have to go, I suppose. That's simply not true. In my world (server software and networking), most tasks can be improved by utilizing concurrent programming paradigms. Even in user interfaces, these are useful. For mathematics, there's simply no question that making use of concurrent algorithms is a win. In fact, I can't think of a single case in which doing two lines of work at once isn't better than doing one at a time, assuming that accuracy is maintained in the result. I should have qualified. I mean *massive* parallelization when applied to average use cases. I don't think it's totally unusable (I complain about synchronous I/O on my phone every day), but it's being pushed as a panacea, and that is what I think is wrong. Don Knuth holds this opinion, but I think he's mostly alone on that, unfortunately. 
Of course for mathematically intensive and large-scale operations, the more parallel you can make things the better. --dho
Re: [9fans] threads vs forks
I should have qualified. I mean *massive* parallelization when applied to average use cases. I don't think it's totally unusable (I complain about synchronous I/O on my phone every day), but it's being pushed as a panacea, and that is what I think is wrong. Don Knuth holds this opinion, but I think he's mostly alone on that, unfortunately. it's interesting that parallel wasn't cool when chips were getting noticeably faster rapidly. perhaps the focus on parallelization is a sign there aren't any other ideas. - erik
Re: [9fans] threads vs forks
On Tue, Mar 3, 2009 at 7:54 PM, erik quanstrom quans...@quanstro.net wrote: I should have qualified. I mean *massive* parallelization when applied to average use cases. I don't think it's totally unusable (I complain about synchronous I/O on my phone every day), but it's being pushed as a panacea, and that is what I think is wrong. Don Knuth holds this opinion, but I think he's mostly alone on that, unfortunately. it's interesting that parallel wasn't cool when chips were getting noticably faster rapidly. perhaps the focus on parallelization is a sign there aren't any other ideas. Indeed, I think it is. The big manufacturers seem to have hit a wall with clock speed, done a full reverse, and are now just trying to pack more transistors and cores on the chip. Not that this is evil, but I think this is just as bad as the obsession with upping the clock speeds in that they're too focused on one path instead of incorporating other cool ideas (i.e., things Transmeta was working on with virtualization and hosting foreign ISAs) - erik
Re: [9fans] threads vs forks
On Tue, Mar 3, 2009 at 4:54 PM, erik quanstrom quans...@quanstro.net wrote: I should have qualified. I mean *massive* parallelization when applied to average use cases. I don't think it's totally unusable (I complain about synchronous I/O on my phone every day), but it's being pushed as a panacea, and that is what I think is wrong. Don Knuth holds this opinion, but I think he's mostly alone on that, unfortunately. it's interesting that parallel wasn't cool when chips were getting noticably faster rapidly. perhaps the focus on parallelization is a sign there aren't any other ideas. That seems to be what Knuth thinks. Excerpt from a 2008 interview w/ InformIT: InformIT: Vendors of multicore processors have expressed frustration at the difficulty of moving developers to this model. As a former professor, what thoughts do you have on this transition and how to make it happen? Is it a question of proper tools, such as better native support for concurrency in languages, or of execution frameworks? Or are there other solutions? Knuth: I don’t want to duck your question entirely. I might as well flame a bit about my personal unhappiness with the current trend toward multicore architecture. To me, it looks more or less like the hardware designers have run out of ideas, and that they’re trying to pass the blame for the future demise of Moore’s Law to the software writers by giving us machines that work faster only on a few key benchmarks! I won’t be surprised at all if the whole multithreading idea turns out to be a flop, worse than the Itanium approach that was supposed to be so terrific—until it turned out that the wished-for compilers were basically impossible to write. Full interview is at http://www.informit.com/articles/article.aspx?p=1193856.
Re: [9fans] threads vs forks
J.R. Mauro wrote: On Tue, Mar 3, 2009 at 7:54 PM, erik quanstrom quans...@quanstro.net wrote: I should have qualified. I mean *massive* parallelization when applied to average use cases. I don't think it's totally unusable (I complain about synchronous I/O on my phone every day), but it's being pushed as a panacea, and that is what I think is wrong. Don Knuth holds this opinion, but I think he's mostly alone on that, unfortunately. it's interesting that parallel wasn't cool when chips were getting noticably faster rapidly. perhaps the focus on parallelization is a sign there aren't any other ideas. Indeed, I think it is. The big manufacturers seem to have hit a wall with clock speed, done a full reverse, and are now just trying to pack more transistors and cores on the chip. Not that this is evil, but I think this is just as bad as the obsession with upping the clock speeds in that they're too focused on one path instead of incorporating other cool ideas (i.e., things Transmeta was working on with virtualization and hosting foreign ISAs) Die size has been the main focus for the foundries, reduced transistor switch time is just a benefit from that. Digital components work well here, but Analog suffers and creating a stable clock at high frequency is done in the Analog domain. It is much easier to double the transistor count than it is to double the clock frequency. Also have to consider the power/heat/noise costs from increasing the clock. I think the reason why you didn't see parallelism come out earlier in the PC market was because they needed to create new mechanisms for I/O. AMD did this with Hypertransport, and I've seen 32-core (8-socket) systems with this. Now Intel has their own I/O rethink out there. I've been trying to get my industry to look at parallel computing for many years, and it's only now that they are starting to sell parallel circuit simulators and still they are not that efficient. 
A traditionally week-long sim is now taking a single day when run on 12-cores. I'll take that 7x over 1x anytime though. /james
Re: [9fans] threads vs forks
I think the reason why you didn't see parallelism come out earlier in the PC market was because they needed to create new mechanisms for I/O. AMD did this with Hypertransport, and I've seen 32-core (8-socket) systems with this. Now Intel has their own I/O rethink out there. i think what you're saying is equivalent to saying (in terms i understand) that memory bandwidth was so bad that a second processor couldn't do much work. but i haven't found this to be the case. even the highly constrained pentium 4 gets some mileage out of hyperthreading for the tests i've run. the intel 5000-series still use an fsb. and they seem to scale well from 1 to 4 cores. are there benchmarks that show otherwise similar hypertransport systems trouncing intel in multithreaded performance? i don't recall seeing anything more than a moderate (15-20%) advantage. - erik
Re: [9fans] threads vs forks
erik quanstrom wrote: I think the reason why you didn't see parallelism come out earlier in the PC market was because they needed to create new mechanisms for I/O. AMD did this with Hypertransport, and I've seen 32-core (8-socket) systems with this. Now Intel has their own I/O rethink out there. i think what you're saying is equivalent to saying (in terms i understand) that memory bandwidth was so bad that a second processor couldn't do much work. Yes, bandwidth and latency. but i haven't found this to be the case. even the highly constrained pentium 4 gets some mileage out of hyperthreading for the tests i've run. the intel 5000-series still use an fsb. and they seem to scale well from 1 to 4 cores. Many of the circuit simulators I use fall flat on their face after 4 cores, say. However, I blame this on their algorithm, not the hardware. I wasn't making an AMD vs Intel comment, just that AMD had created HTX along with their K8 platform to address scalability concerns with I/O. are there benchmarks that show otherwise similar hypertransport systems trouncing intel in multithreaded performance? i don't recall seeing anything more than a moderate (15-20%) advantage. I don't have a 16-core Intel system to compare with, but: http://en.wikipedia.org/wiki/List_of_device_bandwidths#Computer_buses I think the reason why Intel developed their Common Systems Interconnect (now called QuickPath Interconnect) was to address its shortcomings. Both AMD and Intel are looking at I/O because it is and will be a limiting factor when scaling to higher core counts. - erik
Re: [9fans] threads vs forks
On Tue, Mar 3, 2009 at 11:44 PM, James Tomaschke ja...@orcasystems.com wrote: erik quanstrom wrote: I think the reason why you didn't see parallelism come out earlier in the PC market was because they needed to create new mechanisms for I/O. AMD did this with Hypertransport, and I've seen 32-core (8-socket) systems with this. Now Intel has their own I/O rethink out there. i think what you're saying is equivalent to saying (in terms i understand) that memory bandwidth was so bad that a second processor couldn't do much work. Yes, bandwidth and latency. but i haven't found this to be the case. even the highly constrained pentium 4 gets some mileage out of hyperthreading for the tests i've run. the intel 5000-series still use an fsb. and they seem to scale well from 1 to 4 cores. Many of the circuit simulators I use fall flat on their face after 4 cores, say. However, I blame this on their algorithms, not the hardware. I wasn't making an AMD vs Intel comment, just that AMD had created HTX along with their K8 platform to address scalability concerns with I/O. are there benchmarks that show otherwise similar hypertransport systems trouncing intel in multithreaded performance? i don't recall seeing anything more than a moderate (15-20%) advantage. I don't have a 16-core Intel system to compare with, but: http://en.wikipedia.org/wiki/List_of_device_bandwidths#Computer_buses I think the reason why Intel developed their Common Systems Interconnect (now called QuickPath Interconnect) was to address its shortcomings. Both AMD and Intel are looking at I/O because it is and will be a limiting factor when scaling to higher core counts. And soon hard disk latencies are really going to start hurting (they already are hurting some, I'm sure), and I'm not convinced of the viability of SSDs. There was an interesting article I came across that compared the latencies of accessing a register, a CPU cache, main memory, and disk, which put them in human terms.
As much as we like to say we understand the difference between a millisecond and a nanosecond, seeing cache access expressed in terms of moments and a disk access in terms of years was rather illuminating, if only to me. Same article also put a google search at only slightly slower latency than hard disk access. The internet really is becoming the computer, I suppose. - erik
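The rescaling the article did is easy to reproduce. The sketch below uses rough, order-of-magnitude latency figures (the exact numbers vary by hardware and year; these are illustrative, not from the article itself) and stretches them so that one CPU cycle takes one "human second":

```python
# Approximate latency figures in nanoseconds (order of magnitude only;
# exact values depend on the hardware -- these are illustrative).
latencies_ns = {
    "one CPU cycle (register)": 0.5,
    "L1 cache hit": 1.0,
    "main memory reference": 100.0,
    "disk seek": 10_000_000.0,
}

# Rescale so one CPU cycle becomes one "human second".
scale = 1.0 / latencies_ns["one CPU cycle (register)"]

def human(seconds):
    # Render a duration in whichever unit keeps the number small.
    if seconds < 60:
        return f"{seconds:.1f} seconds"
    if seconds < 3600:
        return f"{seconds / 60:.1f} minutes"
    if seconds < 86400:
        return f"{seconds / 3600:.1f} hours"
    return f"{seconds / 86400:.0f} days"

for name, ns in latencies_ns.items():
    print(f"{name:28s} -> {human(ns * scale)}")
```

With these figures a memory reference becomes a few human minutes and a disk seek stretches to the better part of a year, which is the contrast the article was driving at.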
Re: [9fans] threads vs forks
On Tue, Mar 3, 2009 at 10:11 AM, Roman V. Shaposhnik r...@sun.com wrote: On Tue, 2009-03-03 at 07:19 -0800, David Leimbach wrote: My knowledge on this subject is about 8 or 9 years old, so check with your local Python guru. The last I'd heard about Python's threading is that it was cooperative only, and that you couldn't get real parallelism out of it. It serves as a means to organize your program in a concurrent manner. In other words, no two threads run at the same time in Python, even if you're on a multi-core system, due to something they call a Global Interpreter Lock. I believe GIL is as present in Python nowadays as ever. On a related note: does anybody know any sane interpreted languages with a decent threading model to go along? Stackless python is the only thing that I'm familiar with in that department. I'm a fan of Erlang. Though I guess it's technically a compiled virtual machine of sorts, even when it's escript. But I've had an absolutely awesome experience over the last year using it, and so far only wishing it came with the type safety of Haskell :-). I love Haskell's threading model actually, in either the data parallelism or the forkIO interface, it's pretty sane. Typed data channels even between forkIO'd threads. Thanks, Roman.
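The GIL effect is easy to demonstrate in a few lines of CPython. This is a minimal sketch, not a proper benchmark; the absolute timings are machine-dependent, but the point is that two CPU-bound threads take roughly as long as running the work serially, because only one thread executes bytecode at a time:

```python
import threading
import time

def count(n):
    # Pure-Python CPU-bound loop; under the GIL, only one thread at a
    # time can be executing this bytecode.
    while n > 0:
        n -= 1

N = 2_000_000

# Run the work twice in sequence.
start = time.perf_counter()
count(N)
count(N)
serial = time.perf_counter() - start

# Run the same work in two threads.
start = time.perf_counter()
threads = [threading.Thread(target=count, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

# On CPython the threaded run is no faster (often a bit slower, from
# lock contention); the multiprocessing module sidesteps the GIL by
# using separate interpreter processes instead.
print(f"serial: {serial:.2f}s  threaded: {threaded:.2f}s")
```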
Re: [9fans] threads vs forks
On Tue, Mar 3, 2009 at 5:54 PM, J.R. Mauro jrm8...@gmail.com wrote: On Tue, Mar 3, 2009 at 7:54 PM, erik quanstrom quans...@quanstro.net wrote: I should have qualified. I mean *massive* parallelization when applied to average use cases. I don't think it's totally unusable (I complain about synchronous I/O on my phone every day), but it's being pushed as a panacea, and that is what I think is wrong. Don Knuth holds this opinion, but I think he's mostly alone on that, unfortunately. it's interesting that parallel wasn't cool when chips were getting noticeably faster rapidly. perhaps the focus on parallelization is a sign there aren't any other ideas. Indeed, I think it is. The big manufacturers seem to have hit a wall with clock speed, done a full reverse, and are now just trying to pack more transistors and cores on the chip. Not that this is evil, but I think this is just as bad as the obsession with upping the clock speeds in that they're too focused on one path instead of incorporating other cool ideas (e.g., things Transmeta was working on with virtualization and hosting foreign ISAs). Can we bring back the Burroughs? :-) - erik
Re: [9fans] threads vs forks
I believe GIL is as present in Python nowadays as ever. On a related note: does anybody know any sane interpreted languages with a decent threading model to go along? Stackless python is the only thing that I'm familiar with in that department. Check out Lua's coroutines: http://www.lua.org/manual/5.1/manual.html#2.11 Here's an implementation of the sieve of Eratosthenes using Lua coroutines similar to the Limbo one: http://www.lua.org/cgi-bin/demo?sieve
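For comparison with the Lua and Limbo versions linked above, here is a rough Python-generator analogue of the same coroutine-pipeline sieve (this is not the code from the link; it's a sketch showing the same structure, with each prime adding one filter stage to the pipeline):

```python
def naturals():
    # Endless stream of candidates, starting at the first prime.
    n = 2
    while True:
        yield n
        n += 1

def filter_multiples(prime, stream):
    # One pipeline stage: pass along only numbers the prime doesn't divide.
    for n in stream:
        if n % prime:
            yield n

def sieve(limit):
    # Each prime found extends the chain of generators, mimicking the
    # chain of communicating coroutines/processes in the Lua and Limbo
    # versions of the sieve.
    primes = []
    stream = naturals()
    while len(primes) < limit:
        p = next(stream)
        primes.append(p)
        stream = filter_multiples(p, stream)
    return primes

print(sieve(10))  # → [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Python generators are cooperatively scheduled like Lua coroutines, so the analogy is fairly close; the Limbo version differs in that its stages are real processes communicating over channels.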
Re: [9fans] threads vs forks
Now there is another use that would at least be intellectually interesting and possibly useful in practice. Use the transistors for a really big memory running at cache speed. But instead of it being a hardware cache, manage it explicitly. In effect, we have a very high speed main memory, and the traditional main memory is backing store. It'd give a use for all those paging algorithms that aren't particularly justified at the main memory-disk boundary any more. And you can fit a lot of Plan 9 executable images in a 64MB on-chip memory space. Obviously, it wouldn't be a good fit for severely memory-hungry apps, and it might be a dead end overall, but it'd at least be something different... ken's fs already has the machinery to handle this. one could imagine a cachefs that knew how to manage this for venti. (though venti seems like a poor fit.) there are lots of interesting uses of explicitly managed, hierarchical caches. yet so far hardware has done its level best to hide this. - erik
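The explicitly managed cache described above boils down to a small fast store in front of a large slow one, with the eviction policy in software where you can see and tune it. Here is a minimal sketch of that shape (the class name, sizes, and LRU policy are all illustrative choices, not anything from Plan 9 or ken's fs):

```python
from collections import OrderedDict

class ManagedCache:
    """A software-managed cache: a small fast store in front of a large,
    slow backing store, with an explicit LRU eviction policy -- the sort
    of policy the post suggests moving out of hardware and into software.
    Names and sizes here are illustrative."""

    def __init__(self, capacity, backing):
        self.capacity = capacity
        self.backing = backing          # the "main memory as backing store"
        self.fast = OrderedDict()       # the on-chip memory, managed by us

    def read(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)  # hit: mark most-recently-used
            return self.fast[key]
        value = self.backing[key]       # miss: fetch from the slow store
        self.fast[key] = value
        if len(self.fast) > self.capacity:
            self.fast.popitem(last=False)  # evict least-recently-used
        return value

# Toy backing store: squares of the first hundred integers.
backing = {i: i * i for i in range(100)}
cache = ManagedCache(capacity=4, backing=backing)
for k in [1, 2, 3, 1, 4, 5]:
    cache.read(k)
print(list(cache.fast))  # → [3, 1, 4, 5], LRU order, oldest first
```

The interesting part is exactly what erik points at: once the policy lives in software, you can swap LRU for any of the paging algorithms that no longer earn their keep at the memory-disk boundary.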
Re: [9fans] threads vs forks
Both AMD and Intel are looking at I/O because it is and will be a limiting factor when scaling to higher core counts. i/o starts sucking wind with one core. that's why we differentiate i/o from everything else we do. And soon hard disk latencies are really going to start hurting (they already are hurting some, I'm sure), and I'm not convinced of the viability of SSDs. i'll assume you mean throughput. hard drive latency has been a big deal for a long time. tanenbaum integrated knowledge of track layout into his minix elevator algorithm. i think the gap between cpu performance and hd performance is narrowing, not getting wider. i don't have accurate measurements on how much real-world performance difference there is between a core i7 and an intel 5000. it's generally not spectacular, clock-for-clock. on the other hand, when the intel 5000-series was released, the rule of thumb for a sata hd was 50mb/s. it's not too hard to find regular sata hard drives that do 110mb/s today. the ssd drives we've (coraid) tested have been spectacular --- reading at 200mb/s. if you want to talk latency, ssds can deliver 1/100th the latency of spinning media. there's no way that the core i7 is 100x faster than the intel 5000. - erik
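The arithmetic behind the figures erik quotes is worth spelling out, since it carries the argument. The numbers below are the ones from the post (circa-2009 rules of thumb, not measurements of any particular drive):

```python
# Figures quoted in the post above (approximate, circa 2009).
hd_2006_mb_s = 50      # rule-of-thumb SATA throughput when the Intel 5000 shipped
hd_2009_mb_s = 110     # a fast commodity SATA drive in 2009
ssd_mb_s = 200         # the SSDs tested at Coraid

print(f"HD throughput growth: {hd_2009_mb_s / hd_2006_mb_s:.1f}x")
print(f"SSD vs. 2009 HD:      {ssd_mb_s / hd_2009_mb_s:.1f}x")

# A rotating disk's seek-plus-rotational latency is on the order of 10 ms;
# the post puts SSD latency at roughly 1/100th of that.
hd_latency_ms = 10.0
ssd_latency_ms = hd_latency_ms / 100
print(f"SSD latency:          ~{ssd_latency_ms} ms")
```

So disk throughput roughly doubled over the period while CPU single-thread performance improved only modestly, which is the sense in which the gap is narrowing; the 100x latency win is where SSDs change the picture outright.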
Re: [9fans] threads vs forks
the ssd drives we've (coraid) tested have been spectacular --- reading at 200mb/s. you know, i've read all the reviews and seen all the windows benchmarks. but this info, coming from somebody on this list, is much more reassuring than all the slashdot articles. the tests didn't involve plan9 by any chance, did they? ;)