Re: Threads Design. A Win32 perspective.
On Sat, 03 Jan 2004 01:48:07 -0500, [EMAIL PROTECTED] (Uri Guttman) wrote: ding! ding! ding! you just brought in a cpu specific instruction which is not guaranteed to be on any other arch. in fact many have such a beast but again, it is not accessible from c. you can't bring x86 centrism into this. the fact that redmond/intel threads can make use of this instruction to do 'critical sections' is a major point why it is bad for a portable system with many architectures to be supported. i disagree. it is covered for the intel case only and that is not good enough. again, intel specific and it has to be tossed out. but atomic operations are not available on all platforms. not in a unix environment. real multi-threaded processes already have this problem. that is a kernel issue and most signalling systems don't have thread specific arguments AFAIK. virtual ram is what counts on unix. you can't request some large amount without using real swap space. again, redmond specific. what win32 does is not portable and not useful at a VM level. ok, i can see where you got the test/set and yield stuff from now. finally, it is redmond specific again and very unportable. i have never heard of this fibre concept on any unix flavor. and that is not possible on other platforms. that is a big point and one that i don't see as possible. redmond can do what they want with their kernel and user procs. parrot can only use kernel concepts that are reasonably portable across most platforms. kernel threads are reasonably portable (FSDO of reasonable) but anything beyond that such as fibres, test/set in user space, etc is not. locks have to be in kernel space since we can't do a fibre yield in user space on any other platform. so this rules out user space test/set as well since that would need a thread to spin instead of blocking. your ideas make sense but only on redmond/intel which is not the target space for parrot. That's pretty much the crux.
I don't know what is available (in detail) on other platforms. Hence I needed to express the ideas in terms I understand and explain them sufficiently that they could be viewed, interpreted and either related to similar concepts on other platforms, or shot down. I accept your overall judgement, though not necessarily all the specifics. Maybe it would be possible (for me + others) to write the core of a win32 specific, threaded VM interpreter that would take parrot byte code and run it. Thereby, utilising all the good stuff that precedes the VM interpreter, plus probably large chunks of the parrot VM, but providing it with a more natively compatible target. That's obviously not a simple project and is beyond the scope of this list. Thanks for taking the time to review this. Nigel Sandever.
Re: Threads Design. A Win32 perspective.
Nigel Sandever writes: Maybe it would be possible (for me + others) to write the core of a win32 specific, threaded VM interpreter that would take parrot byte code and run it. Thereby, utilising all the good stuff that precedes the VM interpreter, plus probably large chunks of the parrot VM, but providing it with a more natively compatible target. You want to write a parrot? Um, good luck. One thing Uri mentioned was writing platform specific macro code, and he dismissed it as also being a pain. While I agree that it's a pain, and it's about as maintenance-friendly as JIT, I don't think it has to be ruled out. Parrot is platform-independent, but that doesn't mean we can't take advantage of platform-specific instructions to make it faster on certain machines. Indeed, this is precisely what JIT is. But a lock on every PMC is still pretty heavy for those non-x86 platforms out there, and we should avoid it if we can. Luke
Re: Threads Design. A Win32 perspective.
Nigel Sandever [EMAIL PROTECTED] wrote: On Sat, 03 Jan 2004 01:48:07 -0500, [EMAIL PROTECTED] (Uri Guttman) wrote: your ideas make sense but only on redmond/intel which is not the target space for parrot. s/not the/by far not the only/ Maybe it would be possible (for me + others) to write the core of a win32 specific, threaded VM interpreter that would take parrot byte code and run it. Thereby, utilising all the good stuff that precedes the VM interpreter, plus probably large chunks of the parrot VM, but providing it with a more natively compatible target. I'd be glad if someone filled in the gaps in platform code. There is no need to rewrite the interpreter or such: defining the necessary macros in an efficient way should be enough to start with. leo
Re: Threads Design. A Win32 perspective.
Nigel Sandever [EMAIL PROTECTED] wrote: [ Line length adjusted for readability ] VIRTUAL MACHINE INTERPRETER At any given point in the running of the interpreter, the VM register set, program counter and stack must represent the entire state for that thread. That's exactly what a ParrotInterpreter is: the entire state for a thread. I am completely against the idea of VM threads having a 1:1 relationship with interpreters. While these can be separated, it's not efficient. Please note that Parrot is a register-based VM. Switching state is *not* switching a stack-pointer and a PC thingy like in stack-based VMs. ... The runtime costs of duplicating all shared data between interpreters, tying all shared variables, added to the cost of locking, throw away almost all of the benefit of using threads. No, shared resources are not duplicated, nor tied. The locking of course remains, but that's it. The distinction is that a critical section does not disable time slicing/preemption. It prevents any thread attempting to enter an owned critical section from receiving a timeslice until the current owner relinquishes it. These are platform specific details. We will use whatever the platform/OS provides. In the source code it's a LOCK() UNLOCK() pair. The LOCK() can be any atomic operation and doesn't need to call the kernel, if the lock is acquired. [ Lot of Wintel stuff snipped ] leo
Re: cvs commit: parrot/config/gen/platform ansi.c darwin.c generic.c openbsd.c platform_interface.h win32.c
Peter Gibbs [EMAIL PROTECTED] wrote: Log: Prevent attempts to reallocate mmap'd executable memory until somebody works out how to do it. For linux/fedora, I'd go with this scheme:
- mem_alloc_executable uses valloc(), that is memalign(getpagesize(), size), with size rounded up to the next page size.
- mem_realloc_executable tries to realloc the memory region, and if a different (unaligned) pointer is returned, then valloc(), memcpy and free the old one.
- mem_prepare_executable (called after the code is prepared and before execution starts) does mprotect(p, n, PROT_EXEC|PROT_READ) on fedora, or flushes the memory region on ARM and PPC.
- mem_free_executable is just free - the current one also has a missing (wrong) length argument.
And finally, generic platforms that don't have this executable protection just map these functions to the plain allocation functions. The current mmap() approach is probably not suited for allocating a lot of different small code pieces; there is for sure some OS limit. This scheme is probably only usable as the base for a custom allocator. I'll check in a test if malloced memory is executable RSN. leo
Re: cvs commit: parrot/config/gen/platform ansi.c darwin.c generic.c openbsd.c platform_interface.h win32.c
Leopold Toetsch [EMAIL PROTECTED] wrote: I'll check in a test if malloced memory is executable RSN. The check is in. On fedora, the Configure line of JIT should have (has_exec_protect yes). If that's ok, some define in a header should be set. We currently seem to be missing a general solution to just define some item; the feature.h approach is almost overkill for such simple defines. I'd go with something like what's currently being done with the headers: if $PConfig{h_*} is defined, a PARROT_HAS_HEADER_* is set. So for example: if $PConfig{d_*} is defined, PARROT_HAS_* is defined to 1, else to 0, in a new generated include file, e.g. defines.h, which is included from config.h in addition to features.h. leo
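For concreteness, the generated defines.h that this scheme describes might look like the following. The d_* keys and the derived PARROT_HAS_* names below are hypothetical examples following the naming rule in the message, not real config entries:

```c
/* defines.h -- sketch of the header that would be generated from
   %PConfig under the scheme above.  The d_* keys named in the
   comments are illustrative assumptions, not actual config entries. */
#ifndef PARROT_DEFINES_H
#define PARROT_DEFINES_H

/* $PConfig{d_memalign} was defined */
#define PARROT_HAS_MEMALIGN 1

/* $PConfig{d_exec_protect} was not defined */
#define PARROT_HAS_EXEC_PROTECT 0

#endif /* PARROT_DEFINES_H */
```

Defining the missing entries to 0 rather than leaving them undefined lets code use plain `#if PARROT_HAS_...` tests without a separate `#ifdef` guard.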
[perl #24796] [PATCH] build errors on solaris
# New Ticket Created by Lars Balker Rasmussen
# Please include the string: [perl #24796]
# in the subject line of all future correspondence about this issue.
# URL: http://rt.perl.org/rt3/Ticket/Display.html?id=24796

Adding a hints file for solaris solves two thread-related link issues.
--
Lars Balker Rasmussen
Consult::Perl

Index: MANIFEST
===
RCS file: /cvs/public/parrot/MANIFEST,v
retrieving revision 1.525
diff -u -a -r1.525 MANIFEST
--- MANIFEST	3 Jan 2004 14:01:43 -	1.525
+++ MANIFEST	3 Jan 2004 15:04:07 -
@@ -175,6 +175,7 @@
 config/init/hints/mswin32.pl []
 config/init/hints/openbsd.pl []
 config/init/hints/os2.pl []
+config/init/hints/solaris.pl []
 config/init/hints/vms.pl []
 config/init/manifest.pl []
 config/init/miniparrot.pl []
Index: config/init/hints/solaris.pl
===
diff -u /dev/null config/init/hints/solaris.pl
--- /dev/null	Sat Jan 3 15:58:17 2004
+++ config/init/hints/solaris.pl	Sat Jan 3 15:46:47 2004
@@ -1,0 +1,10 @@
+my $libs = Configure::Data->get('libs');
+if ( $libs !~ /-lpthread/ ) {
+    $libs .= ' -lpthread';
+}
+if ( $libs !~ /-lrt\b/ ) {
+    $libs .= ' -lrt';
+}
+Configure::Data->set(
+    libs => $libs,
+);
[perl #24797] [PATCH] libnci build problem on FreeBSD
# New Ticket Created by Lars Balker Rasmussen
# Please include the string: [perl #24797]
# in the subject line of all future correspondence about this issue.
# URL: http://rt.perl.org/rt3/Ticket/Display.html?id=24797

nci_test.c included malloc.h, which is deprecated on FreeBSD (and elsewhere, I assume), without actually using any malloc-stuff as far as I could tell. According to #parrot, libnci.so is built automatically on some platforms, but not on my FreeBSD. I couldn't find any logic to do this - are you supposed to build libnci.so by hand if you want to test nci?
--
Lars Balker Rasmussen
Consult::Perl

Index: src/nci_test.c
===
RCS file: /cvs/public/parrot/src/nci_test.c,v
retrieving revision 1.13
diff -u -a -r1.13 nci_test.c
--- src/nci_test.c	25 Dec 2003 12:02:25 -	1.13
+++ src/nci_test.c	3 Jan 2004 15:16:12 -
@@ -1,5 +1,4 @@
 #include <stdio.h>
-#include <malloc.h>
 /*
  * cc -shared -fpic nci_test.c -o libnci.so -g
  * export LD_LIBRARY_PATH=.
[perl #24799] [PATCH] bug in find_chartype's chartype_create_from_mapping()
# New Ticket Created by Lars Balker Rasmussen
# Please include the string: [perl #24799]
# in the subject line of all future correspondence about this issue.
# URL: http://rt.perl.org/rt3/Ticket/Display.html?id=24799

chartype_create_from_mapping() (called from the op find_chartype) fails to notice that it has run out of file, so happily parses the last line twice. This leads to writing to array[-1]...
--
Lars Balker Rasmussen
Consult::Perl

Index: src/chartype.c
===
RCS file: /cvs/public/parrot/src/chartype.c,v
retrieving revision 1.21
diff -u -a -r1.21 chartype.c
--- src/chartype.c	14 Nov 2003 20:27:02 -	1.21
+++ src/chartype.c	3 Jan 2004 16:10:34 -
@@ -183,7 +183,7 @@
 while (!feof(f)) {
     char *p = fgets(line, 80, f);
-    if (line[0] != '#') {
+    if (p && *p != '#') {
         int n = sscanf(line, "%li\t%li", &typecode, &unicode);
         if (n == 2 && typecode >= 0) {
             if (typecode < 256 && typecode == one2one
Re: Threads Design. A Win32 perspective.
LT == Leopold Toetsch [EMAIL PROTECTED] writes: LT These are platform specific details. We will use whatever the LT platform/OS provides. In the source code it's a LOCK() UNLOCK() pair. LT The LOCK() can be any atomic operation and doesn't need to call the LT kernel, if the lock is acquired. if it doesn't call the kernel, how can a thread be blocked? you can't have user level locks without spinning. at some point (even with fibres) you need to make a kernel call so other threads can run. the macro layer will make the mainline source look better but you still need kernel calls in their definition. uri -- Uri Guttman -- [EMAIL PROTECTED] http://www.stemsystems.com --Perl Consulting, Stem Development, Systems Architecture, Design and Coding- Search or Offer Perl Jobs http://jobs.perl.org
[PATCH?] Error in t/src/list.t on FreeBSD
I was seeing an error in the second test in t/src/list.t on FreeBSD: set_integer_keyed() not implemented in class 'PerlInt' I tracked it down to two consecutive calls to pmc_new() returning the same pointer, which is generally not what you want. Copying the following line from imcc/main.c to just before the first pmc_new() in the second test in t/src/list.t fixed the problem. interpreter->DOD_block_level++; But I'm unsure if this is the right way to go about it, or rather, if the line above belongs in Parrot_init() or elsewhere. Cheers, -- Lars Balker Rasmussen Consult::Perl
Re: Threads Design. A Win32 perspective.
On Sat, 3 Jan 2004 11:35:37 +0100, [EMAIL PROTECTED] (Leopold Toetsch) wrote: Nigel Sandever [EMAIL PROTECTED] wrote: VIRTUAL MACHINE INTERPRETER At any given point in the running of the interpreter, the VM register set, program counter and stack must represent the entire state for that thread. That's exactly what a ParrotInterpreter is: the entire state for a thread. This is only true if a thread == interpreter. If a single interpreter can run 2 threads then that single interpreter cannot represent the state of both threads safely. I am completely against the idea of VM threads having a 1:1 relationship with interpreters. With 5005threads, multiple threads exist in a single interpreter. All VHLL level data is shared without duplication, but locking has to be performed on each entity. This model is more efficient than ithreads. However, it was extremely difficult to prevent unwanted interaction between the threads corrupting the internal state of the interpreter given the internal architecture of P5, and so it was abandoned. With ithreads, each thread is also a separate interpreter. This insulates the internal state of one interpreter from the other, but also insulates *all* perl level program data in one interpreter from the perl level data in the other. Spawning a new thread becomes a process of duplicating everything: the interpreter, the perl program, and all its existing data. Sharing data between the threads/interpreters is implemented by tieing the two copies of the variables to be shared, and each time a STORE is performed in one thread, the same STORE has to be performed on the copy of that var held in every other thread's dataspace. If 2 threads need to share a scalar, but the program has 10 other threads, then each write to the shared scalar requires the update of all 12 copies of that scalar. There is no way to indicate that you only need to share it between threads x and y.
With ithreads, there can be no shared references, so no shared objects and no shared compound data structures. While these can be separated it's not efficient. Please note that Parrot is a register-based VM. Switching state is *not* switching a stack-pointer and a PC thingy like in stack-based VMs. ... The runtime costs of duplicating all shared data between interpreters, tying all shared variables, added to the cost of locking, throw away almost all of the benefit of using threads. No, shared resources are not duplicated, nor tied. The locking of course remains, but that's it. The issues above are what make p5 ithreads almost unusable. If Parrot has found a way of avoiding these costs and limitations, then everything I offered is a waste of time, because these are the issues I was attempting to address. However, I have seen no indication here, in the sources or anywhere else that this is the case. I assume that the reason Dan opened the discussion up in the first place is because the perception by those looking on was that the p5 ithreads model was being suggested as the way Parrot was going to go. And those who have tried to make use of ithreads under p5 are all too aware that replicating them for Parrot would be . [phrase deleted as too emotionally charged:)] Nigel.
Re: Threads Design. A Win32 perspective.
At 18:20 + 1/3/04, Nigel Sandever wrote: Sharing data between the threads/interpreters is implemented by tieing the two copies of the variables to be shared and each time a STORE is performed in one thread, the same STORE has to be performed on the copy of that var held in every other thread's dataspace. Hmmm... is that true? My impression was (and that's the way I've implemented it in Thread::Tie) that each STORE actually stores the value in a hidden background thread, and each FETCH obtains the current value from the background thread. I don't think each STORE is actually cascading through all of the threads. Not until they try to fetch the shared value, anyway. If 2 threads need to share a scalar, but the program has 10 other threads, then each write to the shared scalar requires the update of all 12 copies of that scalar. There is no way to indicate that you only need to share it between threads x and y. That's true. But there is some implicit grouping. You can only create newly shared variables for the current thread and any thread that is started from the current thread. This is actually a pity, as that precludes you from starting a bare (minimal number of Perl modules loaded, or with exactly the modules that _you_ want) thread at compile time, from which you could start other threads. With ithreads, there can be no shared references, so no shared objects and no shared compound data structures. Actually, you can bless a reference to a shared variable, but you can't share a blessed object (the sharing will let you lose the content of the object). I think shared compound data structures _are_ possible, but very tricky to get right (because the CLONE sub is called as a package method, rather than as an object method: see Thread::Bless for an attempt at getting around this). While these can be separated, it's not efficient. Please note that Parrot is a register-based VM. Switching state is *not* switching a stack-pointer and a PC thingy like in stack-based VMs. ...
The runtime costs of duplicating all shared data between interpreters, tying all shared variables, added to the cost of locking, throw away almost all of the benefit of using threads. No, shared resources are not duplicated, nor tied. The locking of course remains, but that's it. The issues above are what make p5 ithreads almost unusable. For more about this, see my article on Perl Monks: http://www.perlmonks.org/index.pl?node_id=288022 And those who have tried to make use of ithreads under p5 are all too aware that replicating them for Parrot would be . [phrase deleted as too emotionally charged:)] What can I say? ;-) Liz
Re: [perl #24799] [PATCH] bug in find_chartype's chartype_create_from_mapping()
Lars Balker Rasmussen wrote (via RT): chartype_create_from_mapping() (called from the op find_chartype) fails to notice that it has run out of file, so happily parses the last line twice. This leads to writing to array[-1]... -if (line[0] != '#') { +if (p && *p != '#') { Patch applied. Many thanks. Peter Gibbs EmKel Systems
Re: Threads Design. A Win32 perspective.
I'm trying to be constructive here. Some passages may appear to be blunt. Read at your own risk ;-) At 01:48 -0500 1/3/04, Uri Guttman wrote: NS == Nigel Sandever [EMAIL PROTECTED] writes: NS All that is required to protect an object from corruption through NS concurrent access and state change is to prevent two (or more) VMs NS trying to access the same object simultaneously. In order for the VM to NS address the PMC and load it into a VM register, it must know where NS it is. Ie. It must have a pointer. It can use this pointer to access NS the PMC with a Bit Test and Set operation. (x86 BTS) NS [http://www.itis.mn.it/linux/quarta/x86/bts.htm] This is a CPU atomic NS operation. This occurs before the VM enters the VM operation's critical NS section. ding! ding! ding! you just brought in a cpu specific instruction which is not guaranteed to be on any other arch. in fact many have such a beast but again, it is not accessible from c. I just _can't_ believe I'm hearing this. So what if it's not accessible from C? Could we just not build a little C-program that would create a small in whatever loadable library? Or have a post-processing run through the binary image inserting the right machine instructions in the right places? Not being from a *nix background, but more from a MS-DOS background, I've been used to inserting architecture specific machine codes from higher level languages into executable streams since 1983! Don't tell me that's not done anymore? ;-) .. and still some archs don't have it so it has to be emulated by dijkstra's algorithm. and that requires two counters and a little chunk of code. Well, I guess those architectures will have to live with that. If that's what it takes? you can't bring x86 centrism into this. the fact that redmond/intel threads can make use of this instruction to do 'critical sections' is a major point why it is bad for a portable system with many architectures to be supported. I disagree.
I'm not a redmond fan, so I agree with a lot of your _sentiment_, but you should also realize that a _lot_ of intel hardware is running Linux. Heck, even some Solarises run on it. I'm not saying that the intel CPU's are superior to others. They're probably not. But it's just as with cars: most of the cars get people from A to B. They're not all Maserati's. Or Rolls Royce's. Most of them are Volkswagen, Fiat and whatever compacts you guys drive in the States. ;-) I don't think we're making Parrot to run well on Maserati's and Rolls Royce's only. We want to reach the Volkswagens. And not even today's Volkswagens: by the time Perl 6 comes around, CPU's will have doubled in power yet _again_! The portability is in Parrot itself: not by using the lowest common denominator of C runtime systems out there _today_! It will take a lot of trouble to create a system that will run everywhere, but that's just what makes it worthwhile. Not that it offers the same limited capabilities on all systems! i am well aware about test/set and atomic instructions. my point in my previous reply was that it can't be assumed to be on a particular arch. so it has to be assumed to be on none and it needs a pure software solution. sure a platform specific macro/assembler lib could be done but that is a pain as well. Indeed. A pain, probably. Maybe not so much. I can't believe that Sparc processors are so badly designed that they don't offer something similar to what Nigel suggested for Intel platforms. again, intel specific and it has to be tossed out. Again, I think you're thinking too much inside the non-Wintel box. You stated yourself just now ...in fact many have such a beast..., so maybe it should first be investigated which platforms do support this and which don't, and then decide whether it is a good idea or an idea to be tossed? virtual ram is what counts on unix. you can't request some large amount without using real swap space.
it may not allocate real ram until later (page faulting on demand) but it is swap space that counts. it is used up as soon as you allocate it. So, maybe a wrapper is needed, either for *nix, or for Win32, or maybe both. what win32 does is not portable and not useful at a VM level. kernels and their threads can work well together. portable VMs and their threads are another beast that can't rely on a particular architecture instruction or kernel feature. This sounds too much like dogma to me. Why? Isn't Parrot about borgifying all good things from all OS's and VM's, now and in the future? ;-) hardware is tossed out with portable VM design. we have a user space process written in c with a VM and VM threads that are to be based on kernel threads. the locking issues for shared objects is the toughest nut to crack right now. there is no simple or fast solution to this given the known constraints. intel/redmond specific solutions are not applicable (though we can learn from them). Then
Re: Threads Design. A Win32 perspective.
On Jan 3, 2004, at 11:19 AM, Elizabeth Mattijsen wrote: At 01:48 -0500 1/3/04, Uri Guttman wrote: NS == Nigel Sandever [EMAIL PROTECTED] writes: NS All that is required to protect an object from corruption through NS concurrent access and state change is to prevent two (or more) VMs NS trying to access the same object simultaneously. In order for the VM to NS address the PMC and load it into a VM register, it must know where NS it is. Ie. It must have a pointer. It can use this pointer to access NS the PMC with a Bit Test and Set operation. (x86 BTS) NS [http://www.itis.mn.it/linux/quarta/x86/bts.htm] This is a CPU atomic NS operation. This occurs before the VM enters the VM operation's critical NS section. ding! ding! ding! you just brought in a cpu specific instruction which is not guaranteed to be on any other arch. in fact many have such a beast but again, it is not accessible from c. I just _can't_ believe I'm hearing this. So what if it's not accessible from C? Could we just not build a little C-program that would create a small in whatever loadable library? Or have a post-processing run through the binary image inserting the right machine instructions in the right places? Not being from a *nix background, but more from a MS-DOS background, I've been used to inserting architecture specific machine codes from higher level languages into executable streams since 1983! Don't tell me that's not done anymore? ;-) Yes, you are correct--we are already using bits of assembly in parrot. C compilers tend to allow you to insert bits of inline assembly, and we are taking advantage of that--for instance, look for __asm__ in the following files: jit/arm/jit_emit.h jit/ppc/jit_emit.h src/list.c src/malloc.c Also, JIT is all about generating platform-specific machine instructions at runtime. So it's certainly do-able, and right along the lines of what we are already doing. JEff
Re: [perl #24797] [PATCH] libnci build problem on FreeBSD
Lars Balker Rasmussen [EMAIL PROTECTED] wrote: According to #parrot, libnci.so is built automatically on some platforms, but not on my FreeBSD. No, it's currently not built automatically. - are you supposed to build libnci.so by hand if you want to test nci? Yep. AFAIK we are lacking a config hint that dynamic loading is ok. leo
Re: Threads Design. A Win32 perspective.
Uri Guttman [EMAIL PROTECTED] wrote: LT == Leopold Toetsch [EMAIL PROTECTED] writes: LT These are platform specific details. We will use whatever the LT platform/OS provides. In the source code it's a LOCK() UNLOCK() pair. LT The LOCK() can be any atomic operation and doesn't need to call the LT kernel, if the lock is acquired. if it doesn't call the kernel, how can a thread be blocked? I wrote, *if* the lock is acquired. That's AFAIK the fast path of a futex or of the described Win32 behavior. The slow path is always a kernel call (or some rounds of spinning before ...). But anyway, we don't reinvent these locking primitives. leo
Re: Threads Design. A Win32 perspective.
On Jan 3, 2004, at 10:08 AM, Elizabeth Mattijsen wrote: At 12:15 -0500 1/3/04, Uri Guttman wrote: LT == Leopold Toetsch [EMAIL PROTECTED] writes: LT These are platform specific details. We will use whatever the LT platform/OS provides. In the source code it's a LOCK() UNLOCK() pair. LT The LOCK() can be any atomic operation and doesn't need to call the LT kernel, if the lock is acquired. if it doesn't call the kernel, how can a thread be blocked? Think out of the box! Threads can be blocked in many ways. My forks.pm module uses sockets to block threads ;-). IO operations which block like that end up calling into the kernel. But I believe it is usually possible to acquire an uncontested lock without calling into the kernel. When you do need to block (when trying to acquire a lock which is already held by another thread) you may need to enter the kernel. But I think that Leo's point was that in the common case of a successful lock operation, it may not be necessary. JEff
Re: Threads Design. A Win32 perspective.
Nigel Sandever [EMAIL PROTECTED] wrote: On Sat, 3 Jan 2004 11:35:37 +0100, [EMAIL PROTECTED] (Leopold Toetsch) wrote: That's exactly what a ParrotInterpreter is: the entire state for a thread. This is only true if a thread == interpreter. If a single interpreter can run 2 threads then that single interpreter cannot represent the state of both threads safely. Yep. So if a single interpreter (which is almost a thread state) should run two threads, you have to allocate and swap all. What should the advantage of such a solution be? With 5005threads, multiple threads exist in a single interpreter. These are obsolete. With ithreads, each thread is also a separate interpreter. Spawning a new thread becomes a process of duplicating everything. The interpreter, the perl program, and all its existing data. Partly yes. A new interpreter is created; the program, i.e. the opcode stream, is *not* duplicated, but JIT or prederef information has to be rebuilt (on demand, if that run-core is running), and existing non-shared data items are cloned. Sharing data between the threads/interpreters is implemented by tieing Parrot != perl5.ithreads If Parrot has found a way of avoiding these costs and limitations then everything I offered is a waste of time, because these are the issues I was attempting to address. I posted a very premature benchmark result, where an unoptimized Parrot build is 8 times faster than the equivalent perl5 code. And those who have tried to make use of ithreads under p5 are all too aware that replicating them for Parrot would be . [phrase deleted as too emotionally charged:)] I don't know how ithreads are working internally WRT the relevant issues like object allocation and such. But threads at the OS level provide shared code and data segments. So at the VM level you have to unshare non-shared resources at thread creation. You can copy objects lazily and make 2 distinct items when writing, or you copy them in the first place.
But you have these costs at thread start - and not later. leo
Re: Threads Design. A Win32 perspective.
EM == Elizabeth Mattijsen [EMAIL PROTECTED] writes: ding! ding! ding! you just brought in a cpu specific instruction which is not guaranteed to be on any other arch. in fact many have such a beast but again, it is not accessible from c. EM I just _can't_ believe I'm hearing this. So what if it's not EM accessible from C? Could we just not build a little C-program that EM would create a small in whatever loadable library? Or have a EM post-processing run through the binary image inserting the right EM machine instructions in the right places? Not being from a *nix EM background, but more from a MS-DOS background, I've been used to EM inserting architecture specific machine codes from higher level EM languages into executable streams since 1983! Don't tell me that's EM not done anymore? ;-) it is not that it isn't done anymore but the effect has to be the same on machines without test/set. and on top of that, it still needs to be a kernel level operation so a thread can block on the lock. that is the more important issue that makes using a test/set in user space a moot problem. EM I disagree. I'm not a redmond fan, so I agree with a lot of your EM _sentiment_, but you should also realize that a _lot_ of intel EM hardware is running Linux. Heck, even some Solarises run on it. we are talking maybe 10-20 architectures out there that we would want parrot to run on. maybe more. how many does p5 run on now? EM The portability is in Parrot itself: not by using the lowest common EM denominator of C runtime systems out there _today_! It will take a EM lot of trouble to create a system that will run everywhere, but that's EM just it what makes it worthwhile. Not that it offers the same limited EM capabilities on all systems! but we need a common denominator of OS features more than one for cpu features. the fibre/thread stuff is redmond only. and they still require system calls. so as i said the test/set is not a stopping point (dijkstra) but the OS support is. 
how and where and when we lock is the only critical factor and that hasn't been decided yet. we don't want to lock at global thread levels and we are not sure we can lock at PMC or object levels (GC and alloc can break that). we should be focusing on that issue. think about how DBs did it. sybase used to do page locking (coarse grained) since it was faster (this was 15 years ago) and they had the fastest engine. but when multithreading and multicpu designs came in, finer grained row locking was faster (oracle). sybase fell behind and has not caught up. we have the same choices to make so we need to study locking algorithms and techniques from that perspective and not how to do a single lock (test/set vs kernel). but i will keep reiterating that it has to be a kernel lock since we must block threads and GC and such without spinning or manual scheduling (fibres). virtual ram is what counts on unix. you can't request some large amount without using real swap space. it may not allocate real ram until later (page faulting on demand) but it is swap space that counts. it is used up as soon as you allocate it. EM So, maybe a wrapper is needed, either for *nix, or for Win32, or maybe both. this is very different behavior IMO and not something that can be wrapped easily. i could be wrong but separating virtual allocation from real allocation can't be emulated without kernel support. and we need the same behavior on all platforms. this again brings up how we lock so that GC/alloc will work properly with threads. do we lock a thread pool but not the thread when we access a shared thingy? that is a medium grain lock. can the GC/alloc break the lock if it is called inside that operation? or could only the pool inside the active thread do that? what about a shared object alloced from thread A's pool that triggers an alloc when being accessed in thread B? these are the questions that need to be asked and answered.
i was just trying to point out to nigel that the intel/redmond solutions are not portable as they require OS support and that all locks need to be kernel level. given that requirement, we need to decide how to do the locks so those questions can be answered with reasonable efficiency. of course a single global lock would work but that stinks and we all know it. so what is the lock granularity? how do we handle GC/alloc across shared objects? EM This sounds too much like dogma to me. Why? Isn't Parrot about EM borgifying all good things from all OS's and VM's, now and in the EM future? ;-) but parrot can only use a common set of features across OS's. we can't use a redmond feature that can't be emulated on other platforms. and my karma ran over my dogma :( :-) hardware is tossed out with portable VM design. we have a user space process written in c with a VM and VM threads that are to be based on kernel threads. the locking issues for shared objects are the toughest
Re: Threads Design. A Win32 perspective.
LT == Leopold Toetsch [EMAIL PROTECTED] writes: LT Uri Guttman [EMAIL PROTECTED] wrote: LT == Leopold Toetsch [EMAIL PROTECTED] writes: LT These are platform specific details. We will use whatever the LT platform/OS provides. In the source code it's a LOCK() UNLOCK() pair. LT The LOCK() can be any atomic operation and doesn't need to call the LT kernel, if the lock is acquired. if it doesn't call the kernel, how can a thread be blocked? LT I wrote, *if* the lock is acquired. That's AFAIK the fast path of a futex LT or of the described Win32 behavior. The slow path is always a kernel LT call (or some rounds of spinning before ...) LT But anyway, we don't reinvent these locking primitives. ok, i missed the 'if' there. :) that could be workable and might be faster. it does mean that locks are two step as well, user space test/set and fallback to a kernel lock. we can do what nigel said and wrap the test/set in macros and use assembler to get at it on platforms that have it or fall back to dijkstra on those that don't. uri -- Uri Guttman -- [EMAIL PROTECTED] http://www.stemsystems.com --Perl Consulting, Stem Development, Systems Architecture, Design and Coding- Search or Offer Perl Jobs http://jobs.perl.org
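The two-step lock being described here (a user-space test/set fast path, falling back to a kernel-level block only under contention) can be sketched in portable C11. Everything below is illustrative: the names (`tslock_t`, `tslock_acquire`) are invented, and a real implementation would use the platform's futex or Win32 CRITICAL_SECTION rather than a pthread mutex/condvar pair for the slow path.

```c
#include <stdatomic.h>
#include <pthread.h>

/* Sketch of a two-step lock: atomic fast path, kernel slow path.
 * state: 0 = free, 1 = held, 2 = held with possible waiters. */
typedef struct {
    atomic_int      state;
    pthread_mutex_t kernel;  /* slow path: block instead of spinning */
    pthread_cond_t  wakeup;
} tslock_t;

void tslock_init(tslock_t *l) {
    atomic_init(&l->state, 0);
    pthread_mutex_init(&l->kernel, NULL);
    pthread_cond_init(&l->wakeup, NULL);
}

void tslock_acquire(tslock_t *l) {
    int expected = 0;
    /* Fast path: one compare-and-swap in user space, no kernel call. */
    if (atomic_compare_exchange_strong(&l->state, &expected, 1))
        return;
    /* Slow path: mark contention and block in the kernel. */
    pthread_mutex_lock(&l->kernel);
    while (atomic_exchange(&l->state, 2) != 0)
        pthread_cond_wait(&l->wakeup, &l->kernel);
    pthread_mutex_unlock(&l->kernel);
}

void tslock_release(tslock_t *l) {
    /* Only if there may be waiters (state == 2) do we touch the kernel. */
    if (atomic_exchange(&l->state, 0) == 2) {
        pthread_mutex_lock(&l->kernel);
        pthread_cond_broadcast(&l->wakeup);
        pthread_mutex_unlock(&l->kernel);
    }
}
```

In the uncontested case (which, as noted later in the thread, is the common one) both acquire and release are a single atomic operation.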
Re: Threads Design. A Win32 perspective.
EM == Elizabeth Mattijsen [EMAIL PROTECTED] writes: EM At 12:15 -0500 1/3/04, Uri Guttman wrote: LT == Leopold Toetsch [EMAIL PROTECTED] writes: LT These are platform specific details. We will use whatever the LT platform/OS provides. In the source code it's a LOCK() UNLOCK() pair. LT The LOCK() can be any atomic operation and doesn't need to call the LT kernel, if the lock is acquired. if it doesn't call the kernel, how can a thread be blocked? EM Think out of the box! EM Threads can be blocked in many ways. My forks.pm module uses sockets EM to block threads ;-). i used that design as well. a farm of worker threads blocked on a pipe (socketpair) to the same process. the main event loop handled the other side. worked very well. EM It sucks performance wise, but it beats the current perl ithreads EM implementation on many platforms in many situations. i can believe that. EM Therefore my motto: whatever works, works. but i discussed that solution with dan and he shot it down for speed reasons IIRC. i still think it is an interesting solution. it could also be used for the main event queue and/or loop as i mention above. we are assuming some form of sockets on all platforms IIRC, so we can use socketpair for that. i even use socketpair on win32 to test a (pseudo)fork thing for file::slurp. ...you can't have user level locks without spinning. at some point (even with fibres) you need to make a kernel call so other threads can run. EM Possibly. I don't know enough of the specifics to say whether this is EM true or not. i looked at the docs for fibres and they say you do a manual reschedule by selecting the fibre to run next or i think a yield. but something has to go to the kernel since even fibres are kernel thingys. uri -- Uri Guttman -- [EMAIL PROTECTED] http://www.stemsystems.com --Perl Consulting, Stem Development, Systems Architecture, Design and Coding- Search or Offer Perl Jobs http://jobs.perl.org
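The worker-farm-over-socketpair pattern described above can be sketched roughly as follows. This is a POSIX-only illustration with invented names, not forks.pm's actual implementation: the worker thread blocks in the kernel on one end of the socketpair, and the main loop feeds it jobs on the other end.

```c
#include <sys/socket.h>
#include <pthread.h>
#include <unistd.h>

static int job_fd[2];   /* [0] = main-loop side, [1] = worker side */

/* Worker: block on read() until a job byte arrives, write back a result. */
static void *worker(void *arg) {
    unsigned char job;
    (void)arg;
    while (read(job_fd[1], &job, 1) == 1) {
        unsigned char result = job * 2;     /* stand-in for real work */
        if (write(job_fd[1], &result, 1) != 1)
            break;
    }
    return NULL;
}

/* Main-loop side: submit one job and wait for its result. */
int submit_job(unsigned char job, unsigned char *result) {
    if (write(job_fd[0], &job, 1) != 1) return -1;
    if (read(job_fd[0], result, 1) != 1) return -1;
    return 0;
}

int start_worker(pthread_t *tid) {
    /* AF_UNIX socketpair is full duplex, so jobs and results can
     * travel over the same pair of descriptors. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, job_fd) != 0) return -1;
    return pthread_create(tid, NULL, worker, NULL);
}
```

The blocking happens entirely in the kernel's socket layer, which is why no user-level lock or spin is needed; the trade-off, as noted, is the per-job syscall overhead.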
Re: Threads Design. A Win32 perspective.
On Sat, Jan 03, 2004 at 08:05:13PM +0100, Elizabeth Mattijsen wrote: At 18:20 + 1/3/04, Nigel Sandever wrote: Sharing data between the threads/interpreters is implemented by tieing the two copies of the variables to be shared and each time a STORE is performed in one thread, the same STORE has to be performed on the copy of that var held in every other thread's dataspace. Hmmm is that true? My impression (and that's the way I've implemented it in Thread::Tie) is that each STORE actually stores the value in a hidden background thread, and each FETCH obtains the current value from the background thread. I don't think each STORE is actually cascading through all of the threads. Not until they try to fetch the shared value, anyway. Sharing consists of the real SV living in a shared interpreter, with each individual thread having a lightweight proxy SV that causes the appropriate real SV to be accessed/updated by a mixture of magic and/or tied-ish access. A particular access by one thread does not involve any of the other threads or their proxies. With ithreads, there can be no shared references, so no shared objects and no shared compound data structures Actually, you can bless a reference to a shared variable, but you can't share a blessed object (the sharing will let you lose the content of the object). I think shared compound data structures _are_ possible, but very tricky to get right (because the CLONE sub is called as a package method, rather than as an object method: see Thread::Bless for an attempt at getting around this). Nested shared structures work just fine, and there's no need to mess with CLONE for plain data. And the reaction from those who have tried to make use of ithreads under p5 are all too aware that replicating them for Parrot would be . [phrase deleted as too emotionally charged:)] It's the implementation of ithreads in p5 that sucks, not the concept itself.
The use of COW makes new thread creation cheap, and the use of lock PMCs interposed on the real PMCs makes sharing easy. Dave. -- O Unicef Clearasil! Gibberish and Drivel! - Bored of the Rings
Re: Threads Design. A Win32 perspective.
On Jan 3, 2004, at 12:26 PM, Uri Guttman wrote: LT These are platform specific details. We will use whatever the LT platform/OS provides. In the source code it's a LOCK() UNLOCK() pair. LT The LOCK() can be any atomic operation and doesn't need to call the LT kernel, if the lock is acquired. if it doesn't call the kernel, how can a thread be blocked? LT I wrote, *if* the lock is acquired. That's AFAIK the fast path of a futex LT or of the described Win32 behavior. The slow path is always a kernel LT call (or some rounds of spinning before ...) LT But anyway, we don't reinvent these locking primitives. ok, i missed the 'if' there. :) that could be workable and might be faster. it does mean that locks are two step as well, user space test/set and fallback to kernel lock. we can do what nigel said and wrap the test/set in macros and use assembler to get at it on platforms that have it This is probably already done inside of pthread (and Win32) locking implementations--a check in userspace before a kernel call. It's also important to remember that it's not a two-step process most of the time, since most of the locking we are talking about is likely to be uncontested (ie, the lock acquisition will succeed most of the time, and no blocking will be needed). If we _do_ have major lock contention in our internal locking, those will be areas calling for a redesign. JEff
Re: Threads Design. A Win32 perspective.
On Sat, 3 Jan 2004 21:00:31 +0100, [EMAIL PROTECTED] (Leopold Toetsch) wrote: That's exactly what a ParrotInterpreter is: the entire state for a thread. This is only true if a thread == interpreter. If a single interpreter can run 2 threads then that single interpreter cannot represent the state of both threads safely. Yep. So if a single interpreter (which is almost a thread state) should run two threads, you have to allocate and swap all. When a kernel level thread is spawned, no duplication of application memory is required, only a set of registers, program counter and stack. These represent the entire state of that thread. If a VM thread mirrors this, by duplicating the VM program counter, VM registers and VM stack, then this VM thread context can also avoid the need to replicate the rest of the program data (interpreter). What should the advantage of such a solution be? The avoidance of duplication. Transparent interlocking of VHLL fat structures performed automatically by the VM itself. No need for :shared or lock(). With 5005threads, multiple threads exist in a single interpreter. These are obsolete. ONLY because they couldn't be made to work properly. The reasons that was true are entirely due to the architecture of P5. Dan Sugalski suggested in this list back in 2001, that he would prefer pthreads to ithreads. I've used both in p5, and pthreads are vastly more efficient, but flaky and difficult to use well. These limitations are due to the architecture upon which they were built. My interest is in seeing the Parrot architecture not exclude them. With ithreads, each thread is also a separate interpreter. Spawning a new thread becomes a process of duplicating everything. The interpreter, the perl program, and all its existing data. Partly yes. A new interpreter is created, the program, i.e.
the opcode stream is *not* duplicated, but JIT or prederef information has to be rebuilt (on demand, if that run-core is running), and existing non-shared data items are cloned. Only duplicating shared data on demand (COW) may work well on systems that support COW in the kernel. But on systems that don't, this has to be emulated in user space, with all the inherent overhead that implies. My desire was that the VM_Spawn_Thread, VM_Share_PMC and VM_Lock_PMC opcodes could be coded such that, on those platforms where the presence of kernel level COW and other native features mean that the ithreads-style model of VMthread == kernel thread + interpreter is the best way to go, that would be the underlying implementation. On those platforms where VMthread == kernel thread + VMthread context is the best way, that would be the underlying implementation. In order for this to be possible, it implies a certain level of support for both to be engrained in the design of the interpreter. My (long) original post, with all the subjects covered and details given, was my attempt to describe the support required in the design for the latter. It would be necessary to consider all the elements, and the way they interact, and take these into consideration when implementing Parrot's threading in order that this would be achievable. Each element, the separation of the VMstate from the interpreter state, the atomisation of VM operations, the automated detection and locking of concurrent access attempts and the serialisation of the VM threads when it is detected all need support at the highest level before they may be implemented at the lowest (platform specific) levels. It simply isn't possible to implement them on one platform at the lowest levels unless the upper levels of the design are constructed with the possibilities in mind.
Sharing data between the threads/interpreters is implemented by tieing Parrot != perl5.ithreads If Parrot has found a way of avoiding these costs and limitations then everything I offered is a waste of time, because these are the issues I was attempting to address. I posted a very premature benchmark result, where an unoptimized Parrot build is 8 times faster than the equivalent perl5 code. And the reaction from those who have tried to make use of ithreads under p5 are all too aware that replicating them for Parrot would be . [phrase deleted as too emotionally charged:)] I don't know how ithreads are working internally WRT the relevant issues like object allocation and such. But threads at the OS level provide shared code and data segments. So at the VM level you have to unshare non-shared resources at thread creation. You only need to copy them, if the two threads can attempt to modify the contents of the objects concurrently. By precluding this possibility, by atomising VMthread level operations, by preventing a new VM thread from being scheduled until any other VM thread completes its current operation, and by ensuring that each VMthread's state is in a complete and coherent state before another VM
Re: Threads Design. A Win32 perspective.
JC == Jeff Clites [EMAIL PROTECTED] writes: JC On Jan 3, 2004, at 12:26 PM, Uri Guttman wrote: that could be workable and might be faster. it does mean that locks are two step as well, user space test/set and fallback to kernel lock. we can do what nigel said and wrap the test/set in macros and use assembler to get at it on platforms that have it JC This is probably already done inside of pthread (and Win32) JC locking implementations--a check in userspace before a kernel JC call. It's also important to remember that it's not a two-step JC process most of the time, since most of the locking we are taking JC about is likely to be uncontested (ie, the lock acquisition will JC succeed most of the time, and no blocking will be needed). If we JC _do_ have major lock contention in our internal locking, those JC will be areas calling for a redesign. i meant in the coding and not necessarily at runtime. we still need to address when and where locking happens. uri -- Uri Guttman -- [EMAIL PROTECTED] http://www.stemsystems.com --Perl Consulting, Stem Development, Systems Architecture, Design and Coding- Search or Offer Perl Jobs http://jobs.perl.org
Re: Threads Design. A Win32 perspective.
At 21:11 + 1/3/04, Dave Mitchell wrote: On Sat, Jan 03, 2004 at 08:05:13PM +0100, Elizabeth Mattijsen wrote: Actually, you can bless a reference to a shared variable, but you can't share a blessed object (the sharing will let you lose the content of the object). I think shared compound data structures _are_ possible, but very tricky to get right (because the CLONE sub is called as a package method, rather than as an object method: see Thread::Bless for an attempt at getting around this). Nested shared structures work just fine, and there's no need to mess with CLONE for plain data. Indeed. But as soon as there is something special such as a datastructure external to Perl between threads (which ends up shared automatically, because Perl doesn't know about the datastructure, so the cloned objects point to the same memory address), then you're in trouble. Simply because you now have multiple DESTROYs called on the same external data-structure. If the function of the DESTROY is to free the memory of the external data-structure, you're in trouble as soon as the first thread is done. ;-( And the reaction from those who have tried to make use of ithreads under p5 are all too aware that replicating them for Parrot would be . [phrase deleted as too emotionally charged:)] It's the implementation of ithreads in p5 that sucks, not the concept itself. The use of COW makes new thread creation cheap, and the use of lock PMCs interposed on the real PMCs makes sharing easy. I agree that Perl ithreads as *a* concept are ok. The same could be said about what are now referred to as 5.005 threads. Ok as *a* concept. And closer to what many non-Perl people consider to be threads. Pardon my French, but both suck in the implementation. And it is not for lack of effort by the people who developed it. It is for lack of a good foundation to build on.
And we're talking foundation here now, we all want to make sure it is the best, earthquake-proofed, rocking foundation we can get! Liz
Thread notes
First, I'm not paying much attention. Maybe next week. However, as messages that Eudora tags with multiple chiles tend to get my attention, be aware that the following are non-negotiable: 1) We are relying on OS services for all threading constructs We are not going to count on 'atomic' operations, low-level assembly processor guarantees, and we are definitely *not* rolling our own threading constructs of any sort. They break far too often in the face of SMP, new processors having different ordering of writes, and odd hardware issues. Yes, I know there's a But... for each of these. Please don't, I'll just have to get cranky. We shall use the system thread primitives and functions. 2) The only thread constructs we are going to count on are: *) Abstract, non-recursive, simple locks *) Rendezvous points (Things threads go to sleep on until another thread pings the condition) *) Semaphores (in the "I do a V and P operation, with a count" sense) We are *not* counting on being able to kill or freeze a thread from any other thread. Nor are we counting on recursive locks, read/write locks, nor any other things. Unfortunately. 3) I'm still not paying much attention. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
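Of the three constructs listed, the "rendezvous point" maps naturally onto a POSIX condition variable paired with a mutex. A minimal sketch, with invented names (this is an illustration of the construct, not Parrot source):

```c
#include <pthread.h>

/* A rendezvous point: threads sleep on a condition until another
 * thread pings it. */
typedef struct {
    pthread_mutex_t mtx;
    pthread_cond_t  cond;
    int             ready;
} rendezvous_t;

void rv_init(rendezvous_t *rv) {
    pthread_mutex_init(&rv->mtx, NULL);
    pthread_cond_init(&rv->cond, NULL);
    rv->ready = 0;
}

/* Sleep until some thread pings the condition. */
void rv_wait(rendezvous_t *rv) {
    pthread_mutex_lock(&rv->mtx);
    while (!rv->ready)                 /* guard against spurious wakeups */
        pthread_cond_wait(&rv->cond, &rv->mtx);
    rv->ready = 0;                     /* consume the ping */
    pthread_mutex_unlock(&rv->mtx);
}

/* Ping: wake one sleeping thread (or let the next waiter pass). */
void rv_ping(rendezvous_t *rv) {
    pthread_mutex_lock(&rv->mtx);
    rv->ready = 1;
    pthread_cond_signal(&rv->cond);
    pthread_mutex_unlock(&rv->mtx);
}
```

Win32's event objects provide the equivalent construct, which is why this (plus plain locks and counted semaphores) is a safe lowest common denominator.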
[perl #24802] [PATCH] Minor file reading bug in debug.c
# New Ticket Created by Lars Balker Rasmussen
# Please include the string: [perl #24802]
# in the subject line of all future correspondence about this issue.
# URL: http://rt.perl.org/rt3/Ticket/Display.html?id=24802
Following the file reading bug in chartype.c, I checked the rest of parrot for fopen's, to see if there were similar errors. A minor similar one was found in src/debug.c. -- Lars Balker Rasmussen Consult::Perl
Index: src/debug.c
===
RCS file: /cvs/public/parrot/src/debug.c,v
retrieving revision 1.116
diff -u -a -r1.116 debug.c
--- src/debug.c 13 Dec 2003 15:01:17 - 1.116
+++ src/debug.c 3 Jan 2004 22:06:11 -
@@ -1533,8 +1533,8 @@
 PDB_load_source(struct Parrot_Interp *interpreter, const char *command)
 {
     FILE *file;
-    char f[255], c;
-    int i;
+    char f[255];
+    int i, c;
     unsigned long size = 0;
     PDB_t *pdb = interpreter->pdb;
     PDB_file_t *pfile;
@@ -1566,15 +1566,14 @@
     pfile->line = pline;
     pline->number = 1;
-    while (!feof(file)) {
-        c = (char)fgetc(file);
+    while ((c = fgetc(file)) != EOF) {
         /* Grow it */
         if (++size == 1024) {
             pfile->source = mem_sys_realloc(pfile->source,
                     (size_t)pfile->size + 1024);
             size = 0;
         }
-        pfile->source[pfile->size] = c;
+        pfile->source[pfile->size] = (char)c;
         pfile->size++;
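The key change in the patch is the type of c: fgetc() returns an int precisely so that EOF (typically -1) is distinguishable from all 256 possible byte values. With char c, a 0xFF byte in the source file compares equal to EOF on signed-char platforms, and the feof()-based loop also stores one bogus byte after end-of-file. A small illustration of the pitfall (the helper names here are hypothetical, not from the Parrot source):

```c
#include <stdio.h>

/* The buggy pattern: truncate fgetc's result to char before testing.
 * On platforms where char is signed, the byte 0xFF becomes -1 and is
 * indistinguishable from EOF. */
int byte_looks_like_eof(unsigned char byte) {
    char c = (char)byte;
    return c == EOF;
}

/* The fixed pattern: keep the int that fgetc() returns, so only a
 * genuine end-of-file result compares equal to EOF. */
int byte_is_eof_correct(int ch) {
    return ch == EOF;
}
```

This is why the patched loop reads `while ((c = fgetc(file)) != EOF)` with `int c`, casting to char only when storing the byte.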
Re: Threads Design. A Win32 perspective.
Uri Guttman [EMAIL PROTECTED] wrote: ok, i missed the 'if' there. :) that could be workable and might be faster. it does mean that locks are two step as well, user space test/set and fallback to kernel lock. Yep, that is what the OS provides. I really don't like to reinvent wheels here - ehem, and nowhere else. uri leo
Re: Threads Design. A Win32 perspective.
Uri Guttman [EMAIL PROTECTED] wrote: ... this again brings up how we lock so that GC/alloc will work properly with threads. do we lock a thread pool but not the thread when we access a shared thingy? This is the major issue, how to continue. Where are shared objects living (alloc) and where and how are these destroyed (DOD/GC)? But first I'd like to hear some requirements defined, e.g.: A spawns B, C threads with shared $a B spawns D, E threads with shared $b - is $b shared in A or C - is $a shared in D or E - are there any (HLL) decisions to decide what is shared - and so on If that isn't laid out we don't need to talk about locking a certain thread pool. ...how do we handle GC/alloc across shared objects? I posted a proposal for that :) uri leo
Re: Threads Design. A Win32 perspective.
Elizabeth Mattijsen [EMAIL PROTECTED] wrote: Indeed. But as soon as there is something special such as a datastructure external to Perl between threads (which ends up shared automatically, because Perl doesn't know about the datastructure, Why is it shared automatically? Do you have an example for that? But anyway an interesting problem, I hadn't considered until now - thanks :) ... so the cloned objects point to the same memory address), then you're in trouble. Simply because you now have multiple DESTROYs called on the same external data-structure. If the function of the DESTROY is to free the memory of the external data-structure, you're in trouble as soon as the first thread is done. ;-( Maybe that DOD/GC can help here. A shared object can and will be destroyed only when the last holder of that object has released it. [ perl5 thread concepts ] Pardon my French, but both suck in the implementation. And it is not for lack of effort by the people who developed it. The problem for sure was to put threads on top of a working interpreter and a commonly used language. Parrot's design is based on having threads, events and async IO in mind. It was surprisingly simple to implement these first steps that are running now. Separating the HLL layer from the engine probably helps a lot for such rather major design changes. now, we all want to make sure it is the best, earth-quake proofed, rocking foundation we can get! Yep. So again, your input is very welcome, Liz leo
Re: Threads Design. A Win32 perspective.
Nigel Sandever [EMAIL PROTECTED] wrote: On Sat, 3 Jan 2004 21:00:31 +0100, [EMAIL PROTECTED] (Leopold Toetsch) wrote: Yep. So if a single interpreter (which is almost a thread state) should run two threads, you have to allocate and swap all. When a kernel level thread is spawned, no duplication of application memory is required, only a set of registers, program counter and stack. These represent the entire state of that thread. Here is the current approach, which I've partly implemented: The state of a thread is basically the interpreter structure - that's it. In terms of Parrot, a thread (a ParrotThread PMC) is derived from an interpreter (a ParrotInterpreter PMC). Please remember, Parrot is a register based VM and has a lot of registers. The whole representation of a VM thread is bigger than and different from that of a kernel thread. While scheduling a kernel thread is only swapping the above items, a VM level thread scheduler would have to swap much more. If a VM thread mirrors this, by duplicating the VM program counter, VM registers and VM stack, then this VM thread context can also avoid the need to replicate the rest of the program data (interpreter). You are again missing here: the interpreter *is* the above VM state - the rest is almost nothing. So the interpreter := thread approach holds. You can't run even a single (the one and only) thread without this necessary data, and that's just called an interpreter in Parrot speak. ... No need for :shared or lock(). That's the - let's say - type 4 of Dan's layout of different threading models. Everything is shared by default. That's similar to the shared PMC type 3 model - except that no objects have to be copied. It for sure depends on the user code whether one or the other model will have better performance, so the user can choose. We will provide both. Only duplicating shared data on demand (COW) may work well on systems that support COW in the kernel.
No, we are dealing with VM objects and structures here - no kernel is involved for COWed copies of e.g. strings. [ snips ] Each element, the separation of the VMstate from the interpreter state, VM = Virtual machine = interpreter These can't be separated as they are the same. the atomisation of VM operations, Different VMs can run on different CPUs. Why should we make atomic instructions out of these? We have a JIT runtime performing at 1 Parrot instruction per CPU instruction for native integers. Why should we slow that down by a factor of many tens? If we have to lock shared data, then you have to pay the penalty, but not for each piece of code. You only need to copy them, if the two threads can attempt to modify the contents of the objects concurrently. I think that you are missing multiprocessor systems totally. leo
Re: Thread notes
Dan Sugalski [EMAIL PROTECTED] wrote: 2) The only thread constructs we are going to count on are: *) Abstract, non-recursive, simple locks *) Rendezvous points (Things threads go to sleep on until another thread pings the condition) *) Semaphores (in the I do a V and P operation, with a count) All d'accord with above but: I'm not sure yet, but /me thinks that we need to have a CLEANUP_PUSH and _POP handler functionality too. But these are basically macros (simple in the absence of pthread_kill or such) and currently already used :) 3) I'm still not paying much attention. May I ask why? -- Yes :) Why? leo
Re: Threads Design. A Win32 perspective.
On Jan 3, 2004, at 2:59 PM, Leopold Toetsch wrote: Nigel Sandever [EMAIL PROTECTED] wrote: Only duplicating shared data on demand (COW) may work well on systems that support COW in the kernel. No, we are dealing with VM objects and structures here - no kernel is involved for COWed copies of e.g. strings. And also, COW at the OS level (that is, of memory pages) doesn't help, because we have complex data structures filled with pointers, so copying them involves more than just duplicating a block of memory. We can use an approach similar to what we do for strings to make a COW copy of, for instance, the globals stash, but overall that will only be a speed improvement if the data structure is rarely modified. (That is, once it's modified, we will have paid the price. Unless we have a clever data structure which can be COWed in sections.) Just adding to what Leo already said. JEff
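The VM-level (as opposed to kernel page-level) COW being described for strings can be sketched as a shared refcount plus copy-before-write. This is an illustrative single-threaded sketch with invented names, not Parrot's actual string implementation; a threaded VM would additionally need the refcount updates to be atomic.

```c
#include <stdlib.h>
#include <string.h>

/* A COW string: "copying" just bumps a shared refcount; the buffer is
 * duplicated only when a writer shows up while the count is > 1. */
typedef struct {
    char  *buf;
    size_t len;
    int   *refs;   /* shared refcount for all aliases of buf */
} cow_str;

static char *dup_str(const char *s) {
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    memcpy(p, s, n);
    return p;
}

cow_str cow_new(const char *s) {
    cow_str c;
    c.len = strlen(s);
    c.buf = dup_str(s);
    c.refs = malloc(sizeof(int));
    *c.refs = 1;
    return c;
}

/* O(1) copy: share the buffer, bump the count. */
cow_str cow_copy(cow_str *src) {
    (*src->refs)++;
    return *src;
}

/* Before mutating, unshare if anyone else still points at the buffer. */
void cow_set_char(cow_str *c, size_t i, char ch) {
    if (*c->refs > 1) {
        (*c->refs)--;
        c->buf = dup_str(c->buf);
        c->refs = malloc(sizeof(int));
        *c->refs = 1;
    }
    c->buf[i] = ch;
}
```

As Jeff notes, this only pays off when copies outnumber modifications; once a writer appears, the full copy cost is paid anyway.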
Re: Thread Question and Suggestion -- Matt
On Sat, 2004-01-03 at 17:24, Matt Fowles wrote: I have a naive question: Why must each thread have its own interpreter? ~handwavy, high-level answer~ For the same reason each thread in C, for example, needs its own stack pointer. Since Parrot's a register machine, each thread needs its own set of registers so it can go off and do its own thing without whomping all over the other threads. Those registers live in each interpreter. -- c
Re: Thread Question and Suggestion -- Matt
On Jan 3, 2004, at 5:24 PM, Matt Fowles wrote: All~ I have a naive question: Why must each thread have its own interpreter? The short answer is that the bulk of the state of the virtual machine (including, and most importantly, its registers and register stacks) needs to be per-thread, since it represents the execution context which is logically thread-local. Stuff like the globals stash may or may not be shared (depending on the thread semantics we want), but as I understand it the potentially shared stuff is actually only a small part of the bits making up the VM. That said, I do think we have a terminology problem, since I initially had the same question you did, and I think my confusion mostly stems from there being no clear terminology to distinguish between 2 interpreters which are completely independent, and 2 interpreters which represent 2 threads of the same program. In the latter case, those 2 interpreters are both part of one something, and we don't have a name for that something. It would be clearer to say that we have two threads in one interpreter, and just note that almost all of our state lives in the thread structure. (That would mean that the thing which is being passed into all of our API would be called the thread, not the interpreter, since it's a thread which represents an execution context.) It's just (or mostly) terminology, but it's causing confusion. Why not have the threads that share everything share interpreters? We can have these threads be within a single interpreter, thus eliminating the need for complicated GC locking and resource sharing complexity. Because all of these threads will be one kernel level thread, they will not actually run concurrently and there will be no need to lock them. We will have to implement a rudimentary scheduler in the interpreter, but I don't think that is actually that hard. There are 2 main problems with trying to emulate threads this way: 1) It would likely kill the performance gains of JIT.
2) Calls into native libraries could block the entire VM. (We can't manually timeslice external native code.) Even things such as regular expressions can take an unbounded amount of time, and the internals of the regex engine will be in C--so we couldn't timeslice without slowing them down. And basically, people are going to want real threads--they'll want access to the full richness and power afforded by an API such as pthreads, and the real threading libraries (and the OS) have already done all of the really hard work. This allows threads to have completely shared state, at the cost of not being quite as efficient on SMP (they might be more efficient on single processors as there are fewer kernel traps necessary). Not likely to be more efficient even on a single processor, since even if a process is single threaded it is being preempted by other processes. (On Mac OS X, I've not been able to create a case where being multithreaded is a slowdown in the absence of locking--even for pure computation on a single processor machine, being multithreaded is actually a slight performance gain.) Programs that want to run faster on an SMP will use threads without sharing that use events to communicate. It's nice to have speed gains on MP machines without having to redesign your application, especially as MP machines are quickly becoming the norm. (which probably provides better performance, as there will be fewer faults to main memory because of cache misses and shared data). Probably not faster actually, since you'll end up with more data copying (and more total data). I understand if this suggestion is dismissed for violating the rules, but I would like an answer to the question simply because I do not know the answer. I hope my answers are useful. I think it's always okay to ask questions. JEff
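The per-thread vs. shared split Jeff describes (registers, PC and stacks thread-local; globals and bytecode potentially shared) can be summarized in a toy struct layout. All field names and sizes here are invented for illustration and do not match Parrot's real interpreter structure:

```c
#include <stdint.h>
#include <stddef.h>

/* State that two threads of one program might share. */
typedef struct {
    void *bytecode;       /* the opcode stream is not duplicated */
    void *globals_stash;  /* shared or not, depending on semantics */
    void *shared_lock;    /* kernel-level lock guarding shared PMCs */
} vm_shared_t;

/* Per-thread state: in a register VM this is the bulk of the
 * "interpreter", which is why thread and interpreter blur together. */
typedef struct {
    intptr_t int_regs[32];
    double   num_regs[32];
    void    *str_regs[32];
    void    *pmc_regs[32];
    size_t   pc;              /* VM program counter */
    void    *ctrl_stack;      /* register/control stacks */
    vm_shared_t *shared;      /* back-pointer to the shared part */
} vm_thread_t;
```

Seen this way, the terminology dispute dissolves: whether you call vm_thread_t a "thread" or an "interpreter" is naming, but it is clearly the larger of the two structures.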
Re: Thread notes
At 1:11 AM +0100 1/4/04, Leopold Toetsch wrote: Dan Sugalski [EMAIL PROTECTED] wrote: 2) The only thread constructs we are going to count on are: *) Abstract, non-recursive, simple locks *) Rendezvous points (Things threads go to sleep on until another thread pings the condition) *) Semaphores (in the "I do a V and P operation, with a count" sense) All d'accord with above but: I'm not sure yet, but /me thinks that we need to have a CLEANUP_PUSH and _POP handler functionality too. But these are basically macros (simple in the absence of pthread_kill or such) and currently already used :) I wasn't listing anything we can build ourselves -- arguably we only need two of the three things I listed, since with semaphores you can do the rendezvous things (POSIX condition variables, but I'm sure Windows has something similar) and vice versa. 3) I'm still not paying much attention. May I ask why? -- Yes :) Why? Got a killer deadline at work. Things ease up after the 9th if I make it, but then I owe someone else at home a few days of time. :) -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Thread notes
At 11:42 PM + 1/3/04, Nigel Sandever wrote: 03/01/04 23:20:17, Dan Sugalski [EMAIL PROTECTED] wrote: [Dan getting cranky snipped] And that was that! Sorry I spoke. I'm not trying to shut anyone down. What I wanted to do was stop folks diving down too low a level. Yes, we could roll our own mutexes, condition variables, and semaphores, but we're not going to; it's far too system--not just architecture or OS specific, but system setup specific. Single-processor systems want to context switch on mutex acquisition failures, SMP systems want to use adaptive spinlocks, atomic test-and-set operations aren't necessarily available on some NUMA systems, and ordering operations are somewhat fuzzy on some of the more advanced processors--and that's all on x86 systems. All this stuff is best left to the OS, which presumably has a better idea of what the right and most efficient thing to do is, and certainly has more resources behind it than we do. Definitely is in a position to be up-to-date, in ways that we aren't. (You can guarantee that the OS on a system is sufficiently up-to-date to run properly, but it's not the same with user executables, which can be years old) I really don't want folks to get distracted by trying to get down to the metal--it'll just get folks all worked up over something we're not going to be doing because it's not prudent. I'd prefer everyone get worked up over the higher-level stuff and just assume we have the simple stuff at hand, and as the simple stuff is all we can safely assume that's just a prudent thing. (This is one of those cases where I'd really prefer to force everyone doing thread work to have to work on 8 processor Alpha boxes (your choice of OS, I don't care), one of the most vicious threading environments ever devised, but alas that's not going to happen. Pity, though) -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Thread notes
DS == Dan Sugalski [EMAIL PROTECTED] writes: DS (This is one of those cases where I'd really prefer to force DS everyone doing thread work to have to work on 8 processor Alpha DS boxes (your choice of OS, I don't care), one of the most vicious DS threading environments ever devised, but alas that's not going to DS happen. Pity, though) single cpu lsi-11's running FG/BG rt-11 doesn't count? :) it was a dec product too! :) uri -- Uri Guttman -- [EMAIL PROTECTED] http://www.stemsystems.com --Perl Consulting, Stem Development, Systems Architecture, Design and Coding- Search or Offer Perl Jobs http://jobs.perl.org
Re: Thread notes
At 11:49 PM -0500 1/3/04, Uri Guttman wrote: DS == Dan Sugalski [EMAIL PROTECTED] writes: DS (This is one of those cases where I'd really prefer to force DS everyone doing thread work to have to work on 8 processor Alpha DS boxes (your choice of OS, I don't care), one of the most vicious DS threading environments ever devised, but alas that's not going to DS happen. Pity, though) single cpu lsi-11's running FG/BG rt-11 doesn't count? :) Given that it's not a SMP, massively out of order NUMA system with delayed writes... no. 'Fraid not. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Thread notes
DS == Dan Sugalski [EMAIL PROTECTED] writes: DS At 11:49 PM -0500 1/3/04, Uri Guttman wrote: DS == Dan Sugalski [EMAIL PROTECTED] writes: DS (This is one of those cases where I'd really prefer to force DS everyone doing thread work to have to work on 8 processor Alpha DS boxes (your choice of OS, I don't care), one of the most vicious DS threading environments ever devised, but alas that's not going to DS happen. Pity, though) single cpu lsi-11's running FG/BG rt-11 doesn't count? :) DS Given that it's not a SMP, massively out of order NUMA system with DS delayed writes... no. 'Fraid not. bah, humbug. then dec lied in their marketing crap. actually i think there were SMP pdp/lsi-11 systems but i never had one. tonight i happened to drive by the apartment where 20 years ago i lived alone with an lsi-11 box that my employer lent me (cost $10k!!). did my thesis on it. times have changed a little. uri -- Uri Guttman -- [EMAIL PROTECTED] http://www.stemsystems.com --Perl Consulting, Stem Development, Systems Architecture, Design and Coding- Search or Offer Perl Jobs http://jobs.perl.org