Re: Threads Design. A Win32 perspective.

2004-01-03 Thread Nigel Sandever
On Sat, 03 Jan 2004 01:48:07 -0500, [EMAIL PROTECTED] (Uri Guttman) wrote:
 ding! ding! ding! you just brought in a cpu specific instruction which
 is not guaranteed to be on any other arch. in fact many have such a
 beast but again, it is not accessible from c.

 you can't bring x86 centrism into this. the fact that redmond/intel
 threads can make use of this instruction to do 'critical sections' is a
 major point why it is bad for a portable system with many architectures
 to be supported.

 i disagree. it is covered for the intel case only and that is not good
 enough.

 again, intel specific and it has to be tossed out.

 but atomic operations are not available on all platforms.

 not in a unix environment. real multi-threaded processes already have
 this problem.

 that is a kernel issue and most signalling systems don't have thread
 specific arguments AFAIK.

 virtual ram is what counts on unix. you can't request some large amount
 without using real swap space. 

 again, redmond specific. 

 what win32 does is not portable and not useful at a VM level.

 ok, i can see where you got the test/set and yield stuff from now
 finally, it is redmond specific again and very unportable. i
 have never heard of this fibre concept on any unix flavor.

 and that is not possible on other platforms.

 that is a big point and one that i don't see as possible. redmond can do
 what they want with their kernel and user procs. parrot can only use
 kernel concepts that are reasonably portable across most platforms.
 kernel threads are reasonably portable (FSDO of reasonable) but anything
 beyond that such as fibres, test/set in user space, etc is not. locks
 have to be in kernel space since we can't do a fibre yield in user space
 on any other platform. so this rules out user space test/set as well
 since that would need a thread to spin instead of blocking.
 
 your ideas make sense but only on redmond/intel which is not the target
 space for parrot.

That's pretty much the crux. I don't know what is available (in detail) on
other platforms. Hence I needed to express the ideas in terms I understand
and explain them sufficiently that they could be viewed, interpreted and
either related to similar concepts on other platforms, or shot down.

I accept your overall judgement, though not necessarily all the specifics.

Maybe it would be possible (for me + others) to write the core of a win32 specific,
threaded VM interpreter that would take parrot byte code and run it, thereby
utilising all the good stuff that precedes the VM interpreter, plus probably large
chunks of the parrot VM, but providing it with a more native compatible target.

That's obviously not a simple project and is beyond the scope of
this list.

Thanks for taking the time to review this.

Nigel Sandever.




Re: Threads Design. A Win32 perspective.

2004-01-03 Thread Luke Palmer
Nigel Sandever writes:
 Maybe it would be possible (for me + others) to write the core of a win32 specific,
 threaded VM interpreter that would take parrot byte code and run it. Thereby,
 utilising all the good stuff that precedes the VM interpreter, plus probably large
 chunks of the parrot VM, but provides it with a more native compatible target. 

You want to write a parrot?  Um, good luck.

One thing Uri mentioned was writing platform specific macro code, and he
dismissed it on the grounds that it's also a pain.  While I agree that it's a
pain, and it's about as maintenance-friendly as JIT, I don't think it
has to be ruled out.

Parrot is platform-independent, but that doesn't mean we can't take
advantage of platform-specific instructions to make it faster on certain
machines.  Indeed, this is precisely what JIT is.  

But a lock on every PMC is still pretty heavy for those non-x86
platforms out there, and we should avoid it if we can.

Luke

 
 
 


Re: Threads Design. A Win32 perspective.

2004-01-03 Thread Leopold Toetsch
Nigel Sandever [EMAIL PROTECTED] wrote:
 On Sat, 03 Jan 2004 01:48:07 -0500, [EMAIL PROTECTED] (Uri Guttman) wrote:

 your ideas make sense but only on redmond/intel which is not the target
 space for parrot.

s/not the/by far not the only/

 Maybe it would be possible (for me + others) to write the core of a
 win32 specific, threaded VM interpreter that would take parrot byte
 code and run it. Thereby, utilising all the good stuff that precedes
 the VM interpreter, plus probably large chunks of the parrot VM, but
 provides it with a more native compatible target.

I'd be glad if someone filled the gaps in the platform code. There is no
need to rewrite the interpreter or the like: defining the necessary macros
in an efficient way should be enough to start with.

 Nigel Sandever.

leo


Re: Threads Design. A Win32 perspective.

2004-01-03 Thread Leopold Toetsch
Nigel Sandever [EMAIL PROTECTED] wrote:

[ Line length adjusted for readability ]

 VIRTUAL MACHINE INTERPRETER

 At any given point in the running of the interpreter, the VM register
 set, program counter and stack must represent the entire state for
 that thread.

That's exactly what a ParrotInterpreter is: the entire state for a
thread.

 I am completely against the idea of VM threads having a 1:1
 relationship with interpreters.

While these can be separated, it's not efficient. Please note that Parrot
is a register-based VM. Switching state is *not* switching a
stack-pointer and a PC thingy like in stack-based VMs.

 ... The runtime costs of duplicating all
 shared data between interpreters, tying all shared variables, added to
 the cost of locking, throw away almost all of the benefit of using
 threads.

No, shared resources are not duplicated, nor tied. The locking of course
remains, but that's it.

 The distinction is that a critical section does not disable time
 slicing/ preemption.  It prevents any thread attempting to enter an
 owned critical section from receiving a timeslice until the current
 owner relinquishes it.

These are platform specific details. We will use whatever the
platform/OS provides. In the source code it's a LOCK()/UNLOCK() pair.
The LOCK() can be any atomic operation and doesn't need to call the
kernel if the lock is acquired.
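
A minimal sketch of what such a LOCK()/UNLOCK() pair could look like on a POSIX platform, assuming pthread mutexes underneath. The names PARROT_MUTEX, LOCK_INIT, LOCK_DESTROY and shared_counter are illustrative assumptions, not taken from the Parrot sources; a Win32 build would presumably map the same macros onto critical sections instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical platform layer: every name here is an assumption made for
 * illustration, not the actual Parrot definition. */
typedef pthread_mutex_t PARROT_MUTEX;

#define LOCK_INIT(m)    pthread_mutex_init(&(m), NULL)
#define LOCK(m)         pthread_mutex_lock(&(m))
#define UNLOCK(m)       pthread_mutex_unlock(&(m))
#define LOCK_DESTROY(m) pthread_mutex_destroy(&(m))

/* A shared object that carries its own lock. */
typedef struct shared_counter {
    PARROT_MUTEX lock;
    long         value;
} shared_counter;

void counter_init(shared_counter *c)
{
    assert(c != NULL);
    LOCK_INIT(c->lock);
    c->value = 0;
}

/* Mainline code only ever sees the macro pair, never the primitive. */
long counter_increment(shared_counter *c)
{
    long v;

    assert(c != NULL);
    LOCK(c->lock);
    v = ++c->value;
    UNLOCK(c->lock);
    return v;
}

void counter_destroy(shared_counter *c)
{
    assert(c != NULL);
    LOCK_DESTROY(c->lock);
}
```

The point of the indirection is that only the four macro definitions change per platform; the code that takes and releases locks stays the same.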

[ Lot of Wintel stuff snipped ]

 Nigel Sandever.

leo


Re: cvs commit: parrot/config/gen/platform ansi.c darwin.c generic.c openbsd.c platform_interface.h win32.c

2004-01-03 Thread Leopold Toetsch
Peter Gibbs [EMAIL PROTECTED] wrote:
   Log:
   Prevent attempts to reallocate mmap'd executable memory until somebody
 works out how to do it.

For linux/fedora, I'd go with this scheme:

- mem_alloc_executable uses valloc(), that is memalign(getpagesize(),
  size), with size rounded up to the next page size.
- mem_realloc_executable tries to realloc the memory region, and if
  a different (unaligned) pointer is returned, it does valloc(), memcpy
  and frees the old region.
- mem_prepare_executable (called after the code is prepared and before
  execution starts) does mprotect(p, n, PROT_EXEC|PROT_READ) on fedora
  or flushes the memory region on ARM and PPC.
- mem_free_executable is just free - the current one also has a missing
  (wrong) length argument.

And finally generic platforms that don't have this executable protection
just map these functions to the plain allocation functions.
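
The scheme above could be sketched roughly as follows for Linux. This is an illustration under assumptions, not the committed platform code: posix_memalign() stands in for valloc(), the signatures (in particular the explicit size arguments) and the helper round_up_to_page() are made up for the example, and calling mprotect() on malloc'ed memory works on Linux but is not promised by POSIX.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round size up to a whole number of pages (hypothetical helper). */
size_t round_up_to_page(size_t size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return (size + page - 1) / page * page;
}

/* Page-aligned allocation; posix_memalign() replaces valloc() here. */
void *mem_alloc_executable(size_t size)
{
    void  *p    = NULL;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    assert(size > 0);
    if (posix_memalign(&p, page, round_up_to_page(size)) != 0)
        return NULL;
    return p;
}

/* Grow a region that is still writable (i.e. before it was prepared).
 * realloc() does not promise alignment, so fall back to
 * allocate + memcpy + free when the result is not page-aligned. */
void *mem_realloc_executable(void *old, size_t old_size, size_t new_size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t keep = old_size < new_size ? old_size : new_size;
    void  *p    = realloc(old, round_up_to_page(new_size));
    void  *q;

    if (p == NULL || (uintptr_t)p % page == 0)
        return p;               /* failure, or still page-aligned */
    q = mem_alloc_executable(new_size);
    if (q != NULL)
        memcpy(q, p, keep);
    free(p);                    /* if q is NULL the contents are lost */
    return q;
}

/* Called once the code is complete: make the pages read + execute. */
int mem_prepare_executable(void *p, size_t size)
{
    return mprotect(p, round_up_to_page(size), PROT_READ | PROT_EXEC);
}

/* The allocator may write into freed memory, so restore write access
 * before handing the region back; this is why a length is needed. */
void mem_free_executable(void *p, size_t size)
{
    mprotect(p, round_up_to_page(size), PROT_READ | PROT_WRITE);
    free(p);
}
```

On a platform without executable protection, all four functions would collapse to plain malloc/realloc/no-op/free.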

The current mmap() approach is probably not suited for allocating a lot
of different small code pieces; there is surely some OS limit. This
scheme is probably only usable as the base for a custom allocator.

I'll check in a test if malloced memory is executable RSN.

leo


Re: cvs commit: parrot/config/gen/platform ansi.c darwin.c generic.c openbsd.c platform_interface.h win32.c

2004-01-03 Thread Leopold Toetsch
Leopold Toetsch [EMAIL PROTECTED] wrote:

 I'll check in a test if malloced memory is executable RSN.

The check is in. On fedora, the Configure line of JIT should have

  (has_exec_protect yes)

If that's ok, some define in a header should be set.

We currently seem to be missing a general solution to just define some
item; the feature.h approach is almost overkill for such simple
defines.

I'd go with something like what's currently being done with the headers:

  if $PConfig{h_*} is defined a PARROT_HAS_HEADER_* is set.

So for example:

  if $PConfig{d_*} is defined PARROT_HAS_* is defined to 1 else to 0

This would go into a newly generated include file, e.g. defines.h, which
is included from config.h in addition to features.h.
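
As a sketch of how C code might consume such a define: because the macro would always be defined, to 1 or to 0, it can be tested with plain #if and also used in ordinary expressions. PARROT_HAS_EXEC_PROTECT is a hypothetical name that merely follows the proposed PARROT_HAS_* pattern, and the fallback definition stands in for the generated defines.h.

```c
#include <assert.h>

/* Stand-in for the generated defines.h; the macro name is an assumption
 * derived from the (has_exec_protect yes) Configure result. */
#ifndef PARROT_HAS_EXEC_PROTECT
#  define PARROT_HAS_EXEC_PROTECT 0
#endif

/* Select an allocation strategy at compile time. */
const char *exec_memory_strategy(void)
{
#if PARROT_HAS_EXEC_PROTECT
    return "page-aligned allocation + mprotect";
#else
    return "plain malloc";
#endif
}

/* The same macro is usable as a value because it is never undefined. */
int has_exec_protect(void)
{
    /* The generator is expected to emit exactly 0 or 1. */
    assert(PARROT_HAS_EXEC_PROTECT == 0 || PARROT_HAS_EXEC_PROTECT == 1);
    return PARROT_HAS_EXEC_PROTECT;
}
```

Defining the macro to 0 rather than leaving it undefined means a misspelled name can be caught with -Wundef instead of silently evaluating to false.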

leo


[perl #24796] [PATCH] build errors on solaris

2004-01-03 Thread via RT
# New Ticket Created by  Lars Balker Rasmussen 
# Please include the string:  [perl #24796]
# in the subject line of all future correspondence about this issue. 
# URL: http://rt.perl.org/rt3/Ticket/Display.html?id=24796 


Adding a hints file for solaris solves two thread-related link issues.
-- 
Lars Balker Rasmussen  Consult::Perl

Index: MANIFEST
===================================================================
RCS file: /cvs/public/parrot/MANIFEST,v
retrieving revision 1.525
diff -u -a -r1.525 MANIFEST
--- MANIFEST	3 Jan 2004 14:01:43 -	1.525
+++ MANIFEST	3 Jan 2004 15:04:07 -
@@ -175,6 +175,7 @@
 config/init/hints/mswin32.pl  []
 config/init/hints/openbsd.pl  []
 config/init/hints/os2.pl  []
+config/init/hints/solaris.pl  []
 config/init/hints/vms.pl  []
 config/init/manifest.pl   []
 config/init/miniparrot.pl []
Index: config/init/hints/solaris.pl
===================================================================
diff -u /dev/null config/init/hints/solaris.pl
--- /dev/null	Sat Jan  3 15:58:17 2004
+++ config/init/hints/solaris.pl	Sat Jan  3 15:46:47 2004
@@ -1,0 +1,10 @@
+my $libs = Configure::Data->get('libs');
+if ( $libs !~ /-lpthread/ ) {
+    $libs .= ' -lpthread';
+}
+if ( $libs !~ /-lrt\b/ ) {
+    $libs .= ' -lrt';
+}
+Configure::Data->set(
+    libs => $libs,
+);


[perl #24797] [PATCH] libnci build problem on FreeBSD

2004-01-03 Thread via RT
# New Ticket Created by  Lars Balker Rasmussen 
# Please include the string:  [perl #24797]
# in the subject line of all future correspondence about this issue. 
# URL: http://rt.perl.org/rt3/Ticket/Display.html?id=24797 


nci_test.c included malloc.h which is deprecated on FreeBSD (and
elsewhere I assume), without actually using any malloc-stuff as far as
I could tell.

According to #parrot, libnci.so is built automatically on some
platforms, but not on my FreeBSD.  I couldn't find any logic to do this
- are you supposed to build libnci.so by hand if you want to test nci?
-- 
Lars Balker Rasmussen  Consult::Perl

Index: src/nci_test.c
===================================================================
RCS file: /cvs/public/parrot/src/nci_test.c,v
retrieving revision 1.13
diff -u -a -r1.13 nci_test.c
--- src/nci_test.c	25 Dec 2003 12:02:25 -	1.13
+++ src/nci_test.c	3 Jan 2004 15:16:12 -
@@ -1,5 +1,4 @@
 #include <stdio.h>
-#include <malloc.h>
 /*
  * cc -shared -fpic nci_test.c -o libnci.so -g
  * export LD_LIBRARY_PATH=.


[perl #24799] [PATCH] bug in find_chartype's chartype_create_from_mapping()

2004-01-03 Thread via RT
# New Ticket Created by  Lars Balker Rasmussen 
# Please include the string:  [perl #24799]
# in the subject line of all future correspondence about this issue. 
# URL: http://rt.perl.org/rt3/Ticket/Display.html?id=24799 


chartype_create_from_mapping() (called from the op find_chartype) fails
to notice that it has run out of file, so happily parses the last line
twice.  This leads to writing to array[-1]...
-- 
Lars Balker Rasmussen  Consult::Perl

Index: src/chartype.c
===================================================================
RCS file: /cvs/public/parrot/src/chartype.c,v
retrieving revision 1.21
diff -u -a -r1.21 chartype.c
--- src/chartype.c	14 Nov 2003 20:27:02 -	1.21
+++ src/chartype.c	3 Jan 2004 16:10:34 -
@@ -183,7 +183,7 @@
 
     while (!feof(f)) {
         char *p = fgets(line, 80, f);
-        if (line[0] != '#') {
+        if (p && *p != '#') {
             int n = sscanf(line, "%li\t%li", &typecode, &unicode);
             if (n == 2 && typecode >= 0) {
                 if (typecode < 256 && typecode == one2one &&
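
For illustration, the underlying pitfall is that feof() only becomes true after a read has already failed, so the final loop iteration sees a stale buffer. A common alternative to checking the fgets() result inside the loop is to loop on fgets() itself. The function below is a hypothetical stand-alone helper, not the Parrot code; it only counts the valid mapping lines.

```c
#include <assert.h>
#include <stdio.h>

/* Count the "typecode<TAB>unicode" pairs in a mapping file, skipping
 * comment lines.  count_mappings is a made-up name for this sketch. */
int count_mappings(FILE *f)
{
    char line[80];
    int  count = 0;

    assert(f != NULL);
    /* fgets() returns NULL at end of file, so the loop ends before the
     * previous line could be parsed a second time. */
    while (fgets(line, sizeof line, f) != NULL) {
        long typecode, unicode;

        if (line[0] == '#')
            continue;
        if (sscanf(line, "%li\t%li", &typecode, &unicode) == 2
                && typecode >= 0)
            count++;
    }
    return count;
}
```

With the original `while (!feof(f))` loop and no check of the fgets() result, a file with two mapping lines would have its last line processed twice.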


Re: Threads Design. A Win32 perspective.

2004-01-03 Thread Uri Guttman
 LT == Leopold Toetsch [EMAIL PROTECTED] writes:

  LT These are platform specific details. We will use whatever the
  LT platform/OS provides. In the source code it's a LOCK()/UNLOCK() pair.
  LT The LOCK() can be any atomic operation and doesn't need to call the
  LT kernel if the lock is acquired.

if it doesn't call the kernel, how can a thread be blocked? you can't
have user level locks without spinning. at some point (even with fibres)
you need to make a kernel call so other threads can run. the macro layer
will make the mainline source look better but you still need kernel
calls in their definition.

uri

-- 
Uri Guttman  --  [EMAIL PROTECTED]   http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs    http://jobs.perl.org


[PATCH?] Error in t/src/list.t on FreeBSD

2004-01-03 Thread Lars Balker Rasmussen
I was seeing an error in the second test in t/src/list.t on FreeBSD:

set_integer_keyed() not implemented in class 'PerlInt'

I tracked it down to two consecutive calls to pmc_new() returning the
same pointer, which is generally not what you want.  Copying the
following line from imcc/main.c to just before the first pmc_new() in
the second test in t/src/list.t fixed the problem.

interpreter->DOD_block_level++;
 
But I'm unsure if this is the right way to go about it, or rather, if
the line above belongs in Parrot_init() or elsewhere.

Cheers,
-- 
Lars Balker Rasmussen  Consult::Perl



Re: Threads Design. A Win32 perspective.

2004-01-03 Thread Nigel Sandever
On Sat, 3 Jan 2004 11:35:37 +0100, [EMAIL PROTECTED] (Leopold Toetsch) wrote:
 Nigel Sandever [EMAIL PROTECTED] wrote:
 
  VIRTUAL MACHINE INTERPRETER
 
  At any given point in the running of the interpreter, the VM register
  set, program counter and stack must represent the entire state for
  that thread.
 
 That's exactly, what a ParrotInterpreter is: the entire state for a
 thread.

This is only true if a thread == interpreter. 
If a single interpreter can run 2 threads then that single interpreter 
cannot represent the state of both threads safely.

 
  I am completely against the idea of VM threads having a 1:1
  relationship with interpreters.
 

With 5005threads, multiple threads exist in a single interpreter.

 All VHLL level data is shared without duplication, but locking has
 to be performed on each entity. This model is more efficient than
 ithreads. 
 However, it was extremely difficult to prevent unwanted interaction
 between the threads corrupting the internal state of the interpreter
 given the internal architecture of P5, and so it was abandoned.

With ithreads, each thread is also a separate interpreter.

 This insulates the internal state of one interpreter from the other
 but also insulates *all* perl level program data in one interpreter
 from the perl level data in the other.

 Spawning a new thread becomes a process of duplicating everything:
 the interpreter, the perl program, and all its existing data.

 Sharing data between the threads/interpreters is implemented by
 tying the two copies of the variables to be shared, and each time
 a STORE is performed in one thread, the same STORE has to be
 performed on the copy of that var held in every other thread's
 dataspace.

 If 2 threads need to share a scalar, but the program has 10 other
 threads, then each write to the shared scalar requires the update
 of all 12 copies of that scalar. There is no way to indicate that
 you only need to share it between threads x & y.

 With ithreads, there can be no shared references, so no shared
 objects and no shared compound data structures.

 While these can be separated, it's not efficient. Please note that Parrot
 is a register-based VM. Switching state is *not* switching a
 stack-pointer and a PC thingy like in stack-based VMs.
  ... The runtime costs of duplicating all
  shared data between interpreters, tying all shared variables, added to
  the cost of locking, throw away almost all of the benefit of using
  threads.
 No, shared resources are not duplicated, nor tied. The locking of course
 remains, but that's it.

The issues above are what make p5 ithreads almost unusable. 

If Parrot has found a way of avoiding these costs and limitations
then everything I offered is a waste of time, because these are
the issues I was attempting to address.

However, I have seen no indication here, in the sources or anywhere 
else that this is the case. I assume that the reason Dan opened the
discussion up in the first place is because the perception by those
looking on was that the p5 ithreads model was being suggested as the
way Parrot was going to go. 

And those who have tried to make use of ithreads
under p5 are all too aware that replicating them for Parrot would
be . [phrase deleted as too emotionally charged:)]

 leo

Nigel.





Re: Threads Design. A Win32 perspective.

2004-01-03 Thread Elizabeth Mattijsen
At 18:20 + 1/3/04, Nigel Sandever wrote:
 Sharing data between the threads/interpreters is implemented by
 tying the two copies of the variables to be shared and each time
 a STORE is performed in one thread, the same STORE has to be
 performed on the copy of that var held on every other thread's
 dataspace.

Hmmm, is that true?  My impression (and that's the way I've
implemented it in Thread::Tie) is that each STORE actually stores the
value in a hidden background thread, and each FETCH obtains the
current value from the background thread.  I don't think each STORE
is actually cascading through all of the threads.  Not until they try
to fetch the shared value, anyway.


 If 2 threads need to share a scalar, but the program has 10 other
 threads, then each write to the shared scalar requires the update
 of all 12 copies of that scalar. There is no way to indicate that
 you only need to share it between threads x & y.

That's true.  But there is some implicit grouping.  You can only
create newly shared variables for the current thread and any thread
that is started from the current thread.  This is actually a pity, as
that precludes you from starting a bare (minimal number of Perl modules
loaded, or with exactly the modules that _you_ want) thread at
compile time, from which you could start other threads.


 With ithreads, there can be no shared references, so no shared
 objects and no shared compound data structures

Actually, you can bless a reference to a shared variable, but you
can't share a blessed object (sharing will make you lose the
content of the object).  I think shared compound data structures
_are_ possible, but very tricky to get right (because the CLONE sub
is called as a package method, rather than as an object method: see
Thread::Bless for an attempt at getting around this).


  While these can be separated, it's not efficient. Please note that Parrot
 is a register-based VM. Switching state is *not* switching a
 stack-pointer and a PC thingy like in stack-based VMs.
  ... The runtime costs of duplicating all
  shared data between interpreters, tying all shared variables, added to
  the cost of locking, throw away almost all of the benefit of using
  threads.
 No, shared resources are not duplicated, nor tied. The locking of course
  remains, but that's it.
The issues above are what make p5 ithreads almost unusable.
For more about this, see my article on Perl Monks:

  http://www.perlmonks.org/index.pl?node_id=288022


And those who have tried to make use of ithreads
under p5 are all too aware that replicating them for Parrot would
be . [phrase deleted as too emotionally charged:)]

What can I say?   ;-)

Liz


Re: [perl #24799] [PATCH] bug in find_chartype's chartype_create_from_mapping()

2004-01-03 Thread Peter Gibbs
Lars Balker Rasmussen wrote (via RT)

 chartype_create_from_mapping() (called from the op find_chartype) fails
 to notice that it has run out of file, so happily parses the last line
 twice.  This leads to writing to array[-1]...

 -if (line[0] != '#') {
 +if (p && *p != '#') {

Patch applied.

Many thanks
Peter Gibbs
EmKel Systems




Re: Threads Design. A Win32 perspective.

2004-01-03 Thread Elizabeth Mattijsen
I'm trying to be constructive here.  Some passages may appear to be 
blunt.  Read at your own risk   ;-)

At 01:48 -0500 1/3/04, Uri Guttman wrote:
  NS == Nigel Sandever [EMAIL PROTECTED] writes:
  NS All that is required to protect an object from corruption through
  NS concurrent access and state change is to prevent two (or more) VMs
  NS trying to access the same object simultaneously. In order for the VM to
  NS address the PMC and must load it into a VM register, it must know where
  NS it is. Ie. It must have a pointer.  It can use this pointer to access
  NS the PMC with a Bit Test and Set operation.  (x86 BTS)
  NS [http://www.itis.mn.it/linux/quarta/x86/bts.htm] This is a CPU atomic
  NS operation. This occurs before the VM enters the VM operations critical
  NS section.
ding! ding! ding! you just brought in a cpu specific instruction which
is not guaranteed to be on any other arch. in fact many have such a
beast but again, it is not accessible from c.
I just _can't_ believe I'm hearing this.  So what if it's not 
accessible from C?  Could we just not build a little C-program that 
would create a small in whatever loadable library?  Or have a 
post-processing run through the binary image inserting the right 
machine instructions in the right places?  Not being from a *nix 
background, but more from a MS-DOS background, I've been used to 
inserting architecture specific machine codes from higher level 
languages into executable streams since 1983!  Don't tell me that's 
not done anymore?  ;-)


.. and still some archs don't
have it so it has to be emulated by dijkstra's algorithm. and that
requires two counters and a little chunk of code.
Well, I guess those architectures will have to live with that.  If 
that's what it takes?


you can't bring x86 centrism into this. the fact that redmond/intel
threads can make use of this instruction to do 'critical sections' is a
major point why it is bad for a portable system with many architectures
to be supported.
I disagree.  I'm not a redmond fan, so I agree with a lot of your 
_sentiment_, but you should also realize that a _lot_ of intel 
hardware is running Linux.  Heck, even some Solarises run on it.

I'm not saying that the intel CPU's are superior to others.  They're 
probably not.  But it's just as with cars: most of the cars get 
people from A to B.  They're not all Maserati's.  Or Rolls Royce's. 
Most of them are Volkswagen, Fiat and whatever compacts you guys 
drive in the States.  ;-)

I don't think we're making Parrot to run well on Maserati's and Rolls 
Royce's only.  We want to reach the Volkswagens.  And not even 
today's Volkswagens: by the time Perl 6 comes around, CPU's will have 
doubled in power yet _again_!

The portability is in Parrot itself: not by using the lowest common
denominator of C runtime systems out there _today_!   It will take a
lot of trouble to create a system that will run everywhere, but
that's just what makes it worthwhile.  Not that it offers the same
limited capabilities on all systems!


i am well aware about test/set and atomic instructions. my point in my
previous reply was that it can't be assumed to be on a particular
arch. so it has to be assumed to be on none and it needs a pure software
solution. sure a platform specific macro/assembler lib could be done but
that is a pain as well.
Indeed.  A pain, probably.  Maybe not so much.  I can't believe that
Sparc processors are so badly designed that they don't offer
something similar to what Nigel suggested for Intel platforms.


again, intel specific and it has to be tossed out.
Again, I think you're thinking too much inside the non-Wintel box.
You stated yourself just now "...in fact many have such a beast...", so
maybe it should first be investigated which platforms do support
this and which don't, and then decide whether it is a good idea or an
idea to be tossed?


virtual ram is what counts on unix. you can't request some large amount
without using real swap space. it may not allocate real ram until later
(page faulting on demand) but it is swap space that counts. it is used
up as soon as you allocate it.
So, maybe a wrapper is needed, either for *nix, or for Win32, or maybe both.


what win32 does is not portable and not useful at a VM level. kernels
and their threads can work well together. portable VMs and their threads
are another beast that can't rely on a particular architecture
instruction or kernel feature.
This sounds too much like dogma to me.  Why?  Isn't Parrot about 
borgifying all good things from all OS's and VM's, now and in the 
future?   ;-)


hardware is tossed out with portable VM design. we have a user space
process written in c with a VM and VM threads that are to be based on
kernel threads. the locking issues for shared objects is the toughest
nut to crack right now. there is no simple or fast solution to this
given the known constraints. intel/redmond specific solutions are not
applicable (though we can learn from them).
Then 

Re: Threads Design. A Win32 perspective.

2004-01-03 Thread Jeff Clites
On Jan 3, 2004, at 11:19 AM, Elizabeth Mattijsen wrote:

At 01:48 -0500 1/3/04, Uri Guttman wrote:
  NS == Nigel Sandever [EMAIL PROTECTED] writes:
  NS All that is required to protect an object from corruption through
  NS concurrent access and state change is to prevent two (or more) VMs
  NS trying to access the same object simultaneously. In order for the VM to
  NS address the PMC and must load it into a VM register, it must know where
  NS it is. Ie. It must have a pointer.  It can use this pointer to access
  NS the PMC with a Bit Test and Set operation.  (x86 BTS)
  NS [http://www.itis.mn.it/linux/quarta/x86/bts.htm] This is a CPU atomic
  NS operation. This occurs before the VM enters the VM operations critical
  NS section.
ding! ding! ding! you just brought in a cpu specific instruction which
is not guaranteed to be on any other arch. in fact many have such a
beast but again, it is not accessible from c.
I just _can't_ believe I'm hearing this.  So what if it's not 
accessible from C?  Could we just not build a little C-program that 
would create a small in whatever loadable library?  Or have a 
post-processing run through the binary image inserting the right 
machine instructions in the right places?  Not being from a *nix 
background, but more from a MS-DOS background, I've been used to 
inserting architecture specific machine codes from higher level 
languages into executable streams since 1983!  Don't tell me that's 
not done anymore?  ;-)
Yes, you are correct--we are already using bits of assembly in parrot. 
C compilers tend to allow you to insert bits of inline assembly, and we 
are taking advantage of that--for instance, look for __asm__ in the 
following files:

jit/arm/jit_emit.h
jit/ppc/jit_emit.h
src/list.c
src/malloc.c
Also, JIT is all about generating platform-specific machine
instructions at runtime. So it's certainly do-able, and right along the
lines of what we are already doing.
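
As a hedged illustration of an atomic bit test-and-set that is reachable from C without hand-written assembly: the sketch below assumes a C11 compiler with <stdatomic.h>, which postdates this thread, and leaves the choice of machine instruction to the compiler. The function names and the idea of keeping a lock bit in a per-object flag word are assumptions for the example, not Parrot code.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/* Atomically set one bit in a flag word and report whether it was
 * already set, i.e. the effect of a locked bit test-and-set. */
int test_and_set_bit(atomic_uint *flags, unsigned bit)
{
    unsigned mask, old;

    assert(bit < sizeof(unsigned) * CHAR_BIT);
    mask = 1u << bit;
    old  = atomic_fetch_or(flags, mask);   /* returns the previous value */
    return (old & mask) != 0;
}

/* Atomically clear the bit again, releasing whatever it guarded. */
void clear_bit(atomic_uint *flags, unsigned bit)
{
    assert(bit < sizeof(unsigned) * CHAR_BIT);
    atomic_fetch_and(flags, ~(1u << bit));
}
```

A caller that gets 0 back from test_and_set_bit() owns the bit; a caller that gets 1 back still has to decide how to wait, which is the kernel-versus-spinning question debated elsewhere in the thread.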

JEff



Re: [perl #24797] [PATCH] libnci build problem on FreeBSD

2004-01-03 Thread Leopold Toetsch
Lars Balker Rasmussen [EMAIL PROTECTED] wrote:

 According to #parrot, libnci.so is built automatically on some
 platforms, but not on my FreeBSD.

No, it's currently not built automatically.

 - are you supposed to build libnci.so by hand if you want to test nci?

Yep. AFAIK we are lacking a config hint that dynamic loading is ok.

leo


Re: Threads Design. A Win32 perspective.

2004-01-03 Thread Leopold Toetsch
Uri Guttman [EMAIL PROTECTED] wrote:
  LT == Leopold Toetsch [EMAIL PROTECTED] writes:

  LT These are platform specific details. We will use whatever the
  LT platform/OS provides. In the source code it's a LOCK()/UNLOCK() pair.
  LT The LOCK() can be any atomic operation and doesn't need to call the
  LT kernel if the lock is acquired.

 if it doesn't call the kernel, how can a thread be blocked?

I wrote, *if* the lock is acquired. That's AFAIK the fast path of a futex
or of the described Win32 behavior. The slow path is always a kernel
call (or some rounds of spinning before ...)
But anyway, we don't reinvent these locking primitives.

 uri

leo


Re: Threads Design. A Win32 perspective.

2004-01-03 Thread Jeff Clites
On Jan 3, 2004, at 10:08 AM, Elizabeth Mattijsen wrote:

At 12:15 -0500 1/3/04, Uri Guttman wrote:
  LT == Leopold Toetsch [EMAIL PROTECTED] writes:
  LT These are platform specific details. We will use whatever the
  LT platform/OS provides. In the source code it's a LOCK()/UNLOCK() pair.
  LT The LOCK() can be any atomic operation and doesn't need to call the
  LT kernel if the lock is acquired.
if it doesn't call the kernel, how can a thread be blocked?
Think out of the box!

Threads can be blocked in many ways.  My forks.pm module uses sockets 
to block threads  ;-).

IO operations which block like that end up calling into the kernel.

But I believe it is usually possible to acquire an uncontested lock 
without calling into the kernel. When you do need to block (when trying 
to acquire a lock which is already held by another thread) you may need 
to enter the kernel. But I think that Leo's point was that in the 
common case of a successful lock operation, it may not be necessary.
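
The fast-path/slow-path split can be sketched as below, purely as an illustration. It assumes C11 atomics (which postdate this thread) and uses sched_yield() as a stand-in for the slow path; a real lock would sleep in the kernel until woken rather than yield in a loop, and the platform's own primitives already do this. All names are hypothetical.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical user-space lock used only to show the two paths. */
typedef struct fast_lock {
    atomic_flag held;
} fast_lock;

void fast_lock_init(fast_lock *l)
{
    assert(l != NULL);
    atomic_flag_clear(&l->held);
}

/* Fast path only: one atomic test-and-set, no kernel involvement.
 * Returns 1 if the lock was taken, 0 if somebody else holds it. */
int fast_lock_try(fast_lock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held,
                                              memory_order_acquire);
}

/* Acquire the lock and report how many times the slow path ran.
 * Uncontested, this returns 0 without entering the kernel. */
int fast_lock_acquire(fast_lock *l)
{
    int yields = 0;

    assert(l != NULL);
    while (!fast_lock_try(l)) {
        sched_yield();          /* slow path: a system call */
        yields++;
    }
    return yields;
}

void fast_lock_release(fast_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

This matches both positions in the exchange: the uncontested acquire costs one atomic instruction, while any waiting at all goes through the kernel.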

JEff



Re: Threads Design. A Win32 perspective.

2004-01-03 Thread Leopold Toetsch
Nigel Sandever [EMAIL PROTECTED] wrote:
 On Sat, 3 Jan 2004 11:35:37 +0100, [EMAIL PROTECTED] (Leopold Toetsch) wrote:
 That's exactly, what a ParrotInterpreter is: the entire state for a
 thread.

 This is only true if a thread == interpreter.
 If a single interpreter can run 2 threads then that single interpreter
 cannot represent the state of both threads safely.

Yep. So if a single interpreter (which is almost a thread state) should
run two threads, you have to allocate and swap everything. What would the
advantage of such a solution be?

 With 5005threads, multiple threads exist in a single interpreter.

These are obsolete.

 With ithreads, each thread is also a separate interpreter.

  Spawning a new thread becomes a process of duplicating everything.
  The interpreter, the perl program, and all its existing data.

Partly yes. A new interpreter is created, the program, i.e. the opcode
stream is *not* duplicated, but JIT or prederef information has to be
rebuilt (on demand, if that run-core is running), and existing
non-shared data items are cloned.

  Sharing data between the threads/interpreters is implemented by
  tieing

Parrot != perl5.ithreads

 If Parrot has found a way of avoiding these costs and limitations
 then everything I offered is a waste of time, because these are
 the issues I was attempting to address.

I posted a very premature benchmark result, where an unoptimized Parrot
build is 8 times faster than the equivalent perl5 code.

 And those who have tried to make use of ithreads
 under p5 are all too aware that replicating them for Parrot would
 be . [phrase deleted as too emotionally charged:)]

I don't know how ithreads work internally WRT the relevant issues
like object allocation and such. But threads at the OS level provide
shared code and data segments. So at the VM level you have to unshare
non-shared resources at thread creation. You can copy objects lazily and
make 2 distinct items when writing, or you copy them in the first
place. But you have these costs at thread start - and not later.
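
The lazy-copy option could look roughly like the copy-on-write sketch below. It is deliberately single-threaded and hypothetical: the types and functions are invented for the example, and in a real threaded VM the reference count itself would need atomic updates or a lock.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Reference-counted storage that several handles may share. */
typedef struct cow_buf {
    int    refs;
    size_t len;
    char  *data;
} cow_buf;

typedef struct cow_handle {
    cow_buf *buf;
} cow_handle;

cow_handle cow_new(const char *s)
{
    cow_handle h;

    h.buf = malloc(sizeof *h.buf);
    assert(h.buf != NULL);
    h.buf->refs = 1;
    h.buf->len  = strlen(s);
    h.buf->data = malloc(h.buf->len + 1);
    assert(h.buf->data != NULL);
    memcpy(h.buf->data, s, h.buf->len + 1);
    return h;
}

/* "Cloning" for a new thread is O(1): both handles share the storage. */
cow_handle cow_clone(cow_handle h)
{
    h.buf->refs++;
    return h;
}

/* The first write through a shared handle makes two distinct items. */
void cow_set_char(cow_handle *h, size_t i, char c)
{
    assert(i < h->buf->len);
    if (h->buf->refs > 1) {
        cow_buf *old = h->buf;

        *h = cow_new(old->data);    /* private copy for the writer */
        old->refs--;
    }
    h->buf->data[i] = c;
}

void cow_release(cow_handle h)
{
    if (--h.buf->refs == 0) {
        free(h.buf->data);
        free(h.buf);
    }
}
```

With this approach the copying cost moves from thread creation to the first write, and items that are only ever read are never duplicated.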

 Nigel.

leo


Re: Threads Design. A Win32 perspective.

2004-01-03 Thread Uri Guttman
 EM == Elizabeth Mattijsen [EMAIL PROTECTED] writes:

   ding! ding! ding! you just brought in a cpu specific instruction which
   is not guaranteed to be on any other arch. in fact many have such a
   beast but again, it is not accessible from c.

  EM I just _can't_ believe I'm hearing this.  So what if it's not
  EM accessible from C?  Could we just not build a little C-program that
  EM would create a small in whatever loadable library?  Or have a
  EM post-processing run through the binary image inserting the right
  EM machine instructions in the right places?  Not being from a *nix
  EM background, but more from a MS-DOS background, I've been used to
  EM inserting architecture specific machine codes from higher level
  EM languages into executable streams since 1983!  Don't tell me that's
  EM not done anymore?  ;-)

it is not that it isn't done anymore but the effect has to be the same
on machines without test/set. and on top of that, it still needs to be a
kernel level operation so a thread can block on the lock. that is the
more important issue that makes using a test/set in user space a moot
problem.
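
For reference, since C11 (which postdates this thread) a test/set is reachable from portable C through atomic_flag. A minimal sketch follows; the names tas_lock_t, tas_try_acquire and tas_release are made up for illustration. Note that it can only report that the lock is taken. It has no way to put the caller to sleep, which is the kernel-level part argued for above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space lock built on a single test-and-set flag. */
typedef struct {
    atomic_flag held;
} tas_lock_t;

#define TAS_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Returns true if the caller now owns the lock, false if it was taken. */
static bool tas_try_acquire(tas_lock_t *lock)
{
    /* test_and_set returns the previous value: false means we got it. */
    return !atomic_flag_test_and_set_explicit(&lock->held,
                                              memory_order_acquire);
}

/* Gives the lock back; only the current owner should call this. */
static void tas_release(tas_lock_t *lock)
{
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}
```

A caller that gets false back must either spin or ask the OS to block it; nothing in the flag itself can do the latter.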

  EM I disagree.  I'm not a redmond fan, so I agree with a lot of your
  EM _sentiment_, but you should also realize that a _lot_ of intel
  EM hardware is running Linux.  Heck, even some Solarises run on it.

we are talking maybe 10-20 architectures out there that we would want
parrot to run on. maybe more. how many does p5 run on now?

  EM The portability is in Parrot itself: not by using the lowest common
  EM denominator of C runtime systems out there _today_!   It will take a
  EM lot of trouble to create a system that will run everywhere, but that's
  EM just it what makes it worthwhile.  Not that it offers the same limited
  EM capabilities on all systems!

but we need a common denominator of OS features more than one for cpu
features. the fibre/thread stuff is redmond only. and they still require
system calls. so as i said the test/set is not a stopping point
(dijkstra) but the OS support is. how and where and when we lock is the
only critical factor and that hasn't been decided yet. we don't want to
lock at global thread levels and we are not sure we can lock at PMC or
object levels (GC and alloc can break that). we should be focusing on
that issue. think about how DBs did it. sybase used to do page locking
(coarse grained) since it was faster (this was 15 years ago) and they
had the fastest engine. but when multithreading and multicpu designs
came in a finer grained row locking was faster (oracle). sybase fell
behind and has not caught up. we have the same choices to make so we
need to study locking algorithms and techniques from that perspective
and not how to do a single lock (test/set vs kernel). but i will keep
reiterating that it has to be a kernel lock since we must block threads
and GC and such without spinning or manual scheduling (fibres).

   virtual ram is what counts on unix. you can't request some large amount
   without using real swap space. it may not allocate real ram until later
   (page faulting on demand) but it is swap space that counts. it is used
   up as soon as you allocate it.

  EM So, maybe a wrapper is, either for *nix, or for Win32, or maybe both.

this is very different behavior IMO and not something that can be
wrapped easily. i could be wrong but separating virtual allocation from
real allocation can't be emulated without kernel support. and we need
the same behavior on all platforms. this again brings up how we lock so
that GC/alloc will work properly with threads. do we lock a thread pool
but not the thread when we access a shared thingy? that is a medium
grain lock. can the GC/alloc break the lock if it is called inside that
operation? or could only the pool inside the active thread do that? what
about a shared object alloced from thread A's pool but it triggers an
alloc when being accessed in thread B. these are the questions that need
to be asked and answered. i was just trying to point out to nigel that
the intel/redmond solutions are not portable as they require OS
support and that all locks need to be kernel level. given that
requirement, we need to decide how to do the locks so those questions
can be answered with reasonable efficiency. of course a single global
lock would work but that stinks and we all know it. so what is the lock
granularity? how do we handle GC/alloc across shared objects?

  EM This sounds too much like dogma to me.  Why?  Isn't Parrot about
  EM borgifying all good things from all OS's and VM's, now and in the
  EM future?   ;-)

but parrot can only use a common set of features across OS's. we can't
use a redmond feature that can't be emulated on other platforms. and my
karma ran over my dogma :( :-)

   hardware is tossed out with portable VM design. we have a user space
   process written in c with a VM and VM threads that are to be based on
   kernel threads. the locking issues for shared objects is the toughest
   

Re: Threads Design. A Win32 perspective.

2004-01-03 Thread Uri Guttman
 LT == Leopold Toetsch [EMAIL PROTECTED] writes:

  LT Uri Guttman [EMAIL PROTECTED] wrote:
LT == Leopold Toetsch [EMAIL PROTECTED] writes:

  LT These are platform specific details. We will use whatever the
  LT platform/OS provides. In the source code it's a LOCK() UNLOCK() pair.
  LT The LOCK() can be any atomic operation and doesn't need to call the
  LT kernel, if the lock is acquired.

   if it doesn't call the kernel, how can a thread be blocked?

  LT I wrote, *if* the lock is acquired. That's AFAIK the fast path of a futex
  LT or of the described Win32 behavior. The slow path is always a kernel
  LT call (or some rounds of spinning before ...)
  LT But anyway, we don't reinvent these locking primitives.

ok, i missed the 'if' there. :)

that could be workable and might be faster. it does mean that locks are
two step as well, user space test/set and fallback to kernel lock. we
can do what nigel said and wrap the test/set in macros and use assembler
to get at it on platforms that have it or fallback to dijkstra on those
that don't.
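
The two-step shape can be sketched with nothing but pthread calls. This is an illustration under assumptions, not Parrot code: two_step_lock_t and its counters are hypothetical, and a real pthread mutex very likely already does the user-space check internally, so the explicit trylock here only serves to make the fast and slow paths visible.

```c
#include <stddef.h>
#include <pthread.h>

/* Hypothetical wrapper that records which path each acquisition took. */
typedef struct {
    pthread_mutex_t os_lock;    /* the OS primitive that can block */
    unsigned long   fast_hits;  /* acquired without waiting */
    unsigned long   slow_hits;  /* had to block in the OS */
} two_step_lock_t;

static int two_step_init(two_step_lock_t *l)
{
    l->fast_hits = 0;
    l->slow_hits = 0;
    return pthread_mutex_init(&l->os_lock, NULL);
}

static void two_step_lock(two_step_lock_t *l)
{
    /* Step 1: uncontended case, no waiting. */
    if (pthread_mutex_trylock(&l->os_lock) == 0) {
        l->fast_hits++;         /* safe: we hold the lock here */
        return;
    }
    /* Step 2: contended case, let the OS put this thread to sleep. */
    pthread_mutex_lock(&l->os_lock);
    l->slow_hits++;
}

static void two_step_unlock(two_step_lock_t *l)
{
    pthread_mutex_unlock(&l->os_lock);
}

/* The source-level pair mentioned in the quoted text. */
#define LOCK(l)   two_step_lock(&(l))
#define UNLOCK(l) two_step_unlock(&(l))
```

On a platform without pthreads the two macros would expand to that platform's primitive instead; the calling code would not change.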

uri

-- 
Uri Guttman  --  [EMAIL PROTECTED]   http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs    http://jobs.perl.org


Re: Threads Design. A Win32 perspective.

2004-01-03 Thread Uri Guttman
 EM == Elizabeth Mattijsen [EMAIL PROTECTED] writes:

  EM At 12:15 -0500 1/3/04, Uri Guttman wrote:
LT == Leopold Toetsch [EMAIL PROTECTED] writes:
  LT These are platform specific details. We will use whatever the
  LT platform/OS provides. In the source code it's a LOCK() UNLOCK() pair.
  LT The LOCK() can be any atomic operation and doesn't need to call the
  LT kernel, if the lock is acquired.
   if it doesn't call the kernel, how can a thread be blocked?

  EM Think out of the box!

  EM Threads can be blocked in many ways.  My forks.pm module uses sockets
  EM to block threads  ;-).

i used that design as well. a farm of worker threads blocked on a pipe
(socketpair) to the same process. the main event loop handled the other
side. worked very well.
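
A rough sketch of that design, assuming a POSIX system with socketpair(). The wake_channel_* names are invented for this note. A worker that calls wake_channel_wait() sleeps inside read() until the dispatcher posts a job id on the other end.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical wake-up channel: fds[0] is the dispatcher end,
 * fds[1] is the end the workers block on. */
typedef struct {
    int fds[2];
} wake_channel_t;

static int wake_channel_open(wake_channel_t *ch)
{
    return socketpair(AF_UNIX, SOCK_STREAM, 0, ch->fds);
}

/* Dispatcher side: hand one job id to whichever worker is waiting. */
static int wake_channel_post(wake_channel_t *ch, unsigned char job)
{
    return write(ch->fds[0], &job, 1) == 1 ? 0 : -1;
}

/* Worker side: blocks in the kernel until a job id arrives.
 * Returns the job id, or -1 on error or end of stream. */
static int wake_channel_wait(wake_channel_t *ch)
{
    unsigned char job;
    ssize_t n = read(ch->fds[1], &job, 1);
    return n == 1 ? (int)job : -1;
}

static void wake_channel_close(wake_channel_t *ch)
{
    close(ch->fds[0]);
    close(ch->fds[1]);
}
```

Every post and every wait is a system call, which is presumably where the speed objection comes from.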

  EM It sucks performance wise, but it beats the current perl ithreads
  EM implementation on many platforms in many situations.

i can believe that.

  EM Therefore my motto: whatever works, works.

but i discussed that solution with dan and he shot it down for speed
reasons IIRC. i still think it is an interesting solution. it could also
be used for the main event queue and/or loop as i mention above. we are
assuming some form of sockets on all platforms IIRC, so we can use
socketpair for that. i even use socketpair on win32 to test a
(pseudo)fork thing for file::slurp.

   ...you can't
   have user level locks without spinning. at some point (even with fibres)
   you need to make a kernel call so other threads can run.

  EM Possibly.  I don't know enough of the specifics whether this is
  EM true or not.

i looked at the docs for fibres and they say you do a manual reschedule
by selecting the fibre to run next or i think a yield. but something has
to go to the kernel since even fibres are kernel thingys.

uri



Re: Threads Design. A Win32 perspective.

2004-01-03 Thread Dave Mitchell
On Sat, Jan 03, 2004 at 08:05:13PM +0100, Elizabeth Mattijsen wrote:
 At 18:20 + 1/3/04, Nigel Sandever wrote:
  Sharing data between the threads/interpreters is implemented by
  tieing the two copies of the variables to be shared and each time
  a STORE is performed in one thread, the same STORE has to be
  performed on the copy of that var held in every other thread's
  dataspace.
 
 Hmmm is that true?  My impression was (and that's the way I've 
 implemented it in Thread::Tie) is that each STORE actually stores the 
 value in a hidden background thread, and each FETCH obtains the 
 current value from the background thread.  I don't think each STORE 
 is actually cascading through all of the threads.  Not until they try 
 to fetch the shared value, anyway.

Sharing consists of the real SV living in a shared interpreter, with each
individual thread having a lightweight proxy SV that causes the
appropriate real SV to be accessed/updated by a mixture of magic and/or
tied-ish access. A particular access by one thread does not involve any of
the other threads or their proxies.
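
The shape of that arrangement, reduced to a toy in C. This is not perl's actual data layout: shared_sv_t, proxy_sv_t and the two accessors are hypothetical, and the single mutex stands in for whatever locking the shared interpreter really uses.

```c
#include <pthread.h>

/* The one real value, living in the hypothetical "shared interpreter". */
typedef struct {
    pthread_mutex_t lock;
    long            value;
} shared_sv_t;

/* Lightweight per-thread stand-in that only knows where the real one is. */
typedef struct {
    shared_sv_t *real;
} proxy_sv_t;

/* STORE through a proxy: touches the real value, no other proxy. */
static void proxy_store(proxy_sv_t *p, long v)
{
    pthread_mutex_lock(&p->real->lock);
    p->real->value = v;
    pthread_mutex_unlock(&p->real->lock);
}

/* FETCH through a proxy: reads the current real value. */
static long proxy_fetch(proxy_sv_t *p)
{
    long v;
    pthread_mutex_lock(&p->real->lock);
    v = p->real->value;
    pthread_mutex_unlock(&p->real->lock);
    return v;
}
```

A store made through one thread's proxy is visible to a fetch through another's because both resolve to the same real value; nothing is pushed to the other threads.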

  With ithreads, there can be no shared references, so no shared
  objects and no shared compound data structures
 
 Actually, you can bless a reference to a shared variable, but you 
 can't share a blessed object (the sharing will let you lose the 
 content of the object).  I think shared compound data structures 
 _are_ possible, but very tricky to get right (because the CLONE sub 
 is called as a package method, rather than as an object method: see 
 Thread::Bless for an attempt at getting around this).

Nested shared structures work just fine, and there's no need to mess with
CLONE for plain data.

 And the reaction from those who have tried to make use of ithreads
 under p5 are all too aware that replicating them for Parrot would
 be . [phrase deleted as too emotionally charged:)]

It's the implementation of ithreads in p5 that sucks, not the concept
itself. The use of COW makes new thread creation cheap, and the use of
lock PMCs interposed on the real PMCs makes sharing easy.

Dave.

-- 
O Unicef Clearasil!
Gibberish and Drivel!
  - Bored of the Rings


Re: Threads Design. A Win32 perspective.

2004-01-03 Thread Jeff Clites
On Jan 3, 2004, at 12:26 PM, Uri Guttman wrote:

  LT These are platform specific details. We will use whatever the
  LT platform/OS provides. In the source code it's a LOCK() UNLOCK() pair.
  LT The LOCK() can be any atomic operation and doesn't need to call the
  LT kernel, if the lock is acquired.

if it doesn't call the kernel, how can a thread be blocked?

  LT I wrote, *if* the lock is acquired. That's AFAIK the fast path of a futex
  LT or of the described Win32 behavior. The slow path is always a kernel
  LT call (or some rounds of spinning before ...)
  LT But anyway, we don't reinvent these locking primitives.

ok, i missed the 'if' there. :)

that could be workable and might be faster. it does mean that locks are
two step as well, user space test/set and fallback to kernel lock. we
can do what nigel said and wrap the test/set in macros and use assembler
to get at it on platforms that have it

This is probably already done inside of pthread (and Win32) locking 
implementations--a check in userspace before a kernel call. It's also 
important to remember that it's not a two-step process most of the 
time, since most of the locking we are talking about is likely to be 
uncontested (ie, the lock acquisition will succeed most of the time, 
and no blocking will be needed). If we _do_ have major lock contention 
in our internal locking, those will be areas calling for a redesign.

JEff



Re: Threads Design. A Win32 perspective.

2004-01-03 Thread Nigel Sandever
On Sat, 3 Jan 2004 21:00:31 +0100, [EMAIL PROTECTED] (Leopold Toetsch) wrote:
  That's exactly, what a ParrotInterpreter is: the entire state for a
  thread.
 
  This is only true if a thread == interpreter.
  If a single interpreter can run 2 threads then that single interpreter
  cannot represent the state of both threads safely.
 
 Yep. So if a single interpreter (which is almost a thread state) should
 run two threads, you have to allocate and swap all. 

When a kernel level thread is spawned, no duplication of application memory 
is required, only a set of registers, program counter and stack. These 
represent the entire state of that thread.

If a VM thread mirrors this, by duplicating the VM program counter, 
VM registers and VM stack, then this VM thread context can also
avoid the need to replicate the rest of the program data (interpreter).

 What should the
 advantage of such a solution be?

The avoidance of duplication. 
Transparent interlocking of VHLL fat structures performed automatically
by the VM itself. No need for :shared or lock().


 
  With 5005threads, multiple threads exist in a single interpreter.
 
 These are obsolete.

ONLY because they couldn't be made to work properly. The reasons 
for that are entirely due to the architecture of P5.

Dan Sugalski suggested in this list back in 2001, that he would prefer
pthreads to ithreads. 

I've used both in p5, and pthreads are vastly more efficient, but flaky and
difficult to use well. These limitations are due to the architecture upon 
which they were built. My interest is in seeing the Parrot architecture
not exclude them.

 
  With ithreads, each thread is also a separate interpreter.
 
   Spawning a new thread becomes a process of duplicating everything.
   The interpreter, the perl program, and all its existing data.
 
 Partly yes. A new interpreter is created, the program, i.e. the opcode
 stream is *not* duplicated, but JIT or prederef information has to be
 rebuilt (on demand, if that run-core is running), and existing
 non-shared data items are cloned.
 

Only duplicating shared data on demand (COW) may work well on systems
that support COW in the kernel. But on systems that don't, this has to be
emulated in user space, with all the inherent overhead that implies.

My desire was that the VM_Spawn_Thread, VM_Share_PMC and 
VM_Lock_PMC opcodes could be coded such that, on those platforms where
kernel level COW and other native features make the ithreads-style 
model of VMthread == kernel thread + interpreter the best way to go, 
that would be the underlying implementation.

On those platforms where VMthread == kernel thread + VMthread context
is the best way, then that would be the underlying implementation.

In order for this to be possible, a certain level of support for
both has to be ingrained in the design of the interpreter.

My (long) original post, with all the subjects covered and details given, 
was my attempt to describe the support required in the design for the 
latter. It would be necessary to consider all the elements, and the way 
they interact, and take these into consideration when implementing
Parrot's threading in order that this would be achievable.

Each element, the separation of the VMstate from the interpreter state,
the atomisation of VM operations, the automated detection and locking of
concurrent access attempts and the serialisation of the VM threads when 
it is detected all need support at the highest level before they may be 
implemented at the lowest (platform specific) levels.

It simply isn't possible to implement them on one platform at the lowest
levels unless the upper levels of the design are constructed with the 
possibilities in mind.

   Sharing data between the threads/interpreters is implemented by
   tieing
 
 Parrot != perl5.ithreads
 
  If Parrot has found a way of avoiding these costs and limitations
  then everything I offered is a waste of time, because these are
  the issues I was attempting to address.
 
 I posted a very premature benchmark result, where an unoptimized Parrot
 build is 8 times faster than the equivalent perl5 code.
 
  And the reaction from those who have tried to make use of ithreads
  under p5 are all too aware that replicating them for Parrot would
  be . [phrase deleted as too emotionally charged:)]
 
 I don't know how ithreads are working internally WRT the relevant issues
 like object allocation and such. But threads at the OS level provide
 shared code and data segments. So at the VM level you have to unshare
 non-shared resources at thread creation. 

You only need to copy them, if the two threads can attempt to modify
the contents of the objects concurrently. By precluding this possibility,
by atomising VMthread level operations by preventing a new VM thread
from being scheduled until any other VM thread completes its current 
operation and ensuring that each VM thread's state is in a complete and
coherent state before another VM 

Re: Threads Design. A Win32 perspective.

2004-01-03 Thread Uri Guttman
 JC == Jeff Clites [EMAIL PROTECTED] writes:

  JC On Jan 3, 2004, at 12:26 PM, Uri Guttman wrote:

   that could be workable and might be faster. it does mean that locks
   are two step as well, user space test/set and fallback to kernel
   lock. we can do what nigel said and wrap the test/set in macros and
   use assembler to get at it on platforms that have it

  JC This is probably already done inside of pthread (and Win32)
  JC locking implementations--a check in userspace before a kernel
  JC call. It's also important to remember that it's not a two-step
  JC process most of the time, since most of the locking we are talking
  JC about is likely to be uncontested (ie, the lock acquisition will
  JC succeed most of the time, and no blocking will be needed). If we
  JC _do_ have major lock contention in our internal locking, those
  JC will be areas calling for a redesign.

i meant in the coding and not necessarily at runtime. we still need to
address when and where locking happens.

uri



Re: Threads Design. A Win32 perspective.

2004-01-03 Thread Elizabeth Mattijsen
At 21:11 + 1/3/04, Dave Mitchell wrote:
On Sat, Jan 03, 2004 at 08:05:13PM +0100, Elizabeth Mattijsen wrote:
  Actually, you can bless a reference to a shared variable, but you
 can't share a blessed object (the sharing will let you lose the
 content of the object).  I think shared compound data structures
 _are_ possible, but very tricky to get right (because the CLONE sub
 is called as a package method, rather than as an object method: see
  Thread::Bless for an attempt at getting around this).
Nested shared structures work just fine, and there's no need to mess with
CLONE for plain data.

Indeed.  But as soon as there is something special such as a 
datastructure external to Perl between threads (which is 
shared automatically, because Perl doesn't know about 
the datastructure, so the cloned objects point to the same memory 
address), then you're in trouble.  Simply because you now have 
multiple DESTROYs called on the same external data-structure.  If the 
function of the DESTROY is to free the memory of the external 
data-structure, you're in trouble as soon as the first thread is 
done.  ;-(


  And the reaction from those who have tried to make use of ithreads
 under p5 are all too aware that replicating them for Parrot would
  be . [phrase deleted as too emotionally charged:)]
It's the implementation of ithreads in p5 that sucks, not the concept
itself. The use of COW makes new thread creation cheap, and the use of
lock PMCs interposed on the real PMCs makes sharing easy.

I agree that Perl ithreads as *a* concept are ok.

The same could be said about what are now referred to as 5.005 
threads.  Ok as *a* concept.  And closer to what many non-Perl people 
consider to be threads.

Pardon my French, but both suck in the implementation.  And it is not 
for lack of effort by the people who developed it.  It is for lack of 
a good foundation to build on.  And we're talking foundation here 
now, we all want to make sure it is the best, earth-quake proofed, 
rocking foundation we can get!

Liz


Thread notes

2004-01-03 Thread Dan Sugalski
First, I'm not paying much attention. Maybe next week. However, as 
messages that Eudora tags with multiple chiles tend to get my 
attention, be aware that the following are non-negotiable:

1) We are relying on OS services for all threading constructs

We are not going to count on 'atomic' operations, low-level assembly 
processor guarantees, and we are definitely *not* rolling our own 
threading constructs of any sort. They break far too often in the 
face of SMP, new processors having different ordering of writes, and 
odd hardware issues.

Yes, I know there's a But... for each of these. Please don't, I'll 
just have to get cranky. We shall use the system thread primitives 
and functions.

2) The only thread constructs we are going to count on are:
  *) Abstract, non-recursive, simple locks
  *) Rendezvous points (Things threads go to sleep on until another 
thread pings the condition)
  *) Semaphores (in the I do a V and P operation, with a count)

We are *not* counting on being able to kill or freeze a thread from 
any other thread. Nor are we counting on recursive locks, read/write 
locks, nor any other things. Unfortunately.
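
To show how little that list asks for, here is a counting semaphore (the P and V of item 2) put together from the other two constructs: one plain lock and one rendezvous point, spelled here as a pthread mutex and condition variable. This is only a sketch with invented sketch_sem_* names. Where the OS has native semaphores, using them directly would be more in the spirit of point 1.

```c
#include <stddef.h>
#include <pthread.h>

/* Hypothetical counting semaphore built from a lock and a rendezvous point. */
typedef struct {
    pthread_mutex_t lock;     /* simple, non-recursive lock */
    pthread_cond_t  nonzero;  /* threads sleep here until pinged */
    unsigned        count;
} sketch_sem_t;

static int sketch_sem_init(sketch_sem_t *s, unsigned initial)
{
    s->count = initial;
    if (pthread_mutex_init(&s->lock, NULL) != 0)
        return -1;
    if (pthread_cond_init(&s->nonzero, NULL) != 0) {
        pthread_mutex_destroy(&s->lock);
        return -1;
    }
    return 0;
}

/* P: sleep until the count is positive, then take one unit. */
static void sketch_sem_p(sketch_sem_t *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->count == 0)
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->count--;
    pthread_mutex_unlock(&s->lock);
}

/* V: return one unit and ping one sleeper, if there is one. */
static void sketch_sem_v(sketch_sem_t *s)
{
    pthread_mutex_lock(&s->lock);
    s->count++;
    pthread_cond_signal(&s->nonzero);
    pthread_mutex_unlock(&s->lock);
}
```

Nothing here needs a recursive lock, a read/write lock, or the ability to stop another thread.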

3) I'm still not paying much attention.
--
Dan
--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


[perl #24802] [PATCH] Minor file reading bug in debug.c

2004-01-03 Thread via RT
# New Ticket Created by  Lars Balker Rasmussen 
# Please include the string:  [perl #24802]
# in the subject line of all future correspondence about this issue. 
# URL: http://rt.perl.org/rt3/Ticket/Display.html?id=24802 


Following the file reading bug in chartype.c, I checked the rest of
parrot for fopen's, to see if there were similar errors.

A minor similar one was found in src/debug.c.
-- 
Lars Balker Rasmussen  Consult::Perl

Index: src/debug.c
===
RCS file: /cvs/public/parrot/src/debug.c,v
retrieving revision 1.116
diff -u -a -r1.116 debug.c
--- src/debug.c	13 Dec 2003 15:01:17 -	1.116
+++ src/debug.c	3 Jan 2004 22:06:11 -
@@ -1533,8 +1533,8 @@
 PDB_load_source(struct Parrot_Interp *interpreter, const char *command)
 {
     FILE *file;
-    char f[255], c;
-    int i;
+    char f[255];
+    int i, c;
     unsigned long size = 0;
     PDB_t *pdb = interpreter->pdb;
     PDB_file_t *pfile;
@@ -1566,15 +1566,14 @@
     pfile->line = pline;
     pline->number = 1;
 
-    while (!feof(file)) {
-        c = (char)fgetc(file);
+    while ((c = fgetc(file)) != EOF) {
         /* Grow it */
         if (++size == 1024) {
             pfile->source = mem_sys_realloc(pfile->source,
                                             (size_t)pfile->size + 1024);
             size = 0;
         }
-        pfile->source[pfile->size] = c;
+        pfile->source[pfile->size] = (char)c;
 
         pfile->size++;
 
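
The idiom the patch switches to, shown on its own. feof() only becomes true after a read has already failed, so the old loop stored one bogus character at end of file. Holding the result in a char also makes a 0xFF data byte indistinguishable from EOF on platforms where char is signed and EOF is -1. The helper name slurp_stream is made up for this note and is not part of debug.c.

```c
#include <stdio.h>

/* Reads up to cap bytes from file into buf; returns how many were stored. */
static size_t slurp_stream(FILE *file, char *buf, size_t cap)
{
    size_t n = 0;
    int c;      /* int, so that EOF stays distinct from every byte value */

    while (n < cap && (c = fgetc(file)) != EOF)
        buf[n++] = (char)c;

    return n;
}
```

The cast back to char happens only after the comparison with EOF, which is the same order the patched loop uses.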


Re: Threads Design. A Win32 perspective.

2004-01-03 Thread Leopold Toetsch
Uri Guttman [EMAIL PROTECTED] wrote:

 ok, i missed the 'if' there. :)

 that could be workable and might be faster. it does mean that locks are
 two step as well, user space test/set and fallback to kernel lock.

Yep, that is, what the OS provides. I really don't like to reinvent
wheels here - ehem, and nowhere else.

 uri

leo


Re: Threads Design. A Win32 perspective.

2004-01-03 Thread Leopold Toetsch
Uri Guttman [EMAIL PROTECTED] wrote:
 ... this again brings up how we lock so
 that GC/alloc will work properly with threads. do we lock a thread pool
 but not the thread when we access a shared thingy?

This is the major issue, how to continue. Where are shared objects
living (alloc) and where and how are these destroyed (DOD/GC).

But first I'd like to hear some requirements defined, e.g.:

A spawns B, C threads with shared $a
B spawns D, E threads with shared $b

- is $b shared in A or C
- is $a shared in D or E
- are there any (HLL) decisions to decide what is shared
- and so on

If that isn't laid out we don't need to talk about locking a certain
thread pool.

 ...how do we handle GC/alloc across shared objects?

I posted a proposal for that :)

 uri

leo


Re: Threads Design. A Win32 perspective.

2004-01-03 Thread Leopold Toetsch
Elizabeth Mattijsen [EMAIL PROTECTED] wrote:

 Indeed.  But as soon as there is something special such as a
 datastructure external to Perl between threads (which is
 shared automatically, because Perl doesn't know about
 the datastructure,

Why is it shared automatically? Do you have an example for that?
But anyway an interesting problem, which I hadn't considered until now - thanks :)

 ... so the cloned objects point to the same memory
 address), then you're in trouble.  Simply because you now have
 multiple DESTROYs called on the same external data-structure.  If the
 function of the DESTROY is to free the memory of the external
 data-structure, you're in trouble as soon as the first thread is
 done.  ;-(

Maybe DOD/GC can help here. A shared object can and will be
destroyed only, when the last holder of that object has released it.
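
One way to picture "destroyed only when the last holder lets go" is an explicit holder count around the external structure. This is an assumption made for illustration: Parrot's DOD/GC would find the last holder by tracing rather than counting, and every name below (shared_handle_t, count_destroy and so on) is hypothetical. It uses C11 atomics, which postdate this thread.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical wrapper around a data structure the VM did not allocate. */
typedef struct {
    atomic_int holders;            /* threads still using the resource */
    void      *external;           /* the external data structure */
    void     (*destroy)(void *);   /* frees the external data structure */
} shared_handle_t;

static shared_handle_t *shared_handle_new(void *external,
                                          void (*destroy)(void *))
{
    shared_handle_t *h = malloc(sizeof *h);
    if (h == NULL)
        return NULL;
    atomic_init(&h->holders, 1);   /* the creator is the first holder */
    h->external = external;
    h->destroy = destroy;
    return h;
}

/* Called when another thread receives its clone of the object. */
static void shared_handle_retain(shared_handle_t *h)
{
    atomic_fetch_add(&h->holders, 1);
}

/* Called from each thread's DESTROY. Returns 1 if this call was the
 * last one and really freed the resource, 0 otherwise. */
static int shared_handle_release(shared_handle_t *h)
{
    if (atomic_fetch_sub(&h->holders, 1) == 1) {
        h->destroy(h->external);
        free(h);
        return 1;
    }
    return 0;
}

/* Demo destructor: just counts how often it was invoked. */
static void count_destroy(void *counter)
{
    ++*(int *)counter;
}
```

With two holders, the first release returns 0 and leaves the external data alone, so the second thread does not end up with a dangling pointer.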

[ perl5 thread concepts ]

 Pardon my French, but both suck in the implementation.  And it is not
 for lack of effort by the people who developed it.

The problem for sure was putting threads on top of a working
interpreter and a commonly used language. Parrot's design was made with
threads, events and async IO in mind. It was surprisingly simple to
implement these first steps that are running now.

Separating the HLL layer from the engine probably helps a lot for such
rather major design changes.

 now, we all want to make sure it is the best, earth-quake proofed,
 rocking foundation we can get!

Yep. So again, your input is very welcome,

 Liz

leo


Re: Threads Design. A Win32 perspective.

2004-01-03 Thread Leopold Toetsch
Nigel Sandever [EMAIL PROTECTED] wrote:
 On Sat, 3 Jan 2004 21:00:31 +0100, [EMAIL PROTECTED] (Leopold Toetsch) wrote:
 Yep. So if a single interpreter (which is almost a thread state) should
 run two threads, you have to allocate and swap all.

 When a kernel level thread is spawned, no duplication of application memory
 is required, only a set of registers, program counter and stack. These
 represent the entire state of that thread.

Here is the current approach, which I've partly implemented: The state of a
thread is basically the interpreter structure - that's it. In terms of
Parrot, a thread (a ParrotThread PMC) is derived from an interpreter (a
ParrotInterpreter PMC).

Please remember, Parrot is a register based VM and has a lot of
registers. The whole representation of a VM thread is larger than and
different from that of a kernel thread.

While scheduling a kernel thread is only swapping the above items, a VM
level thread scheduler would have to swap much more.

 If a VM thread mirrors this, by duplicating the VM program counter,
 VM registers and VM stack, then this VM thread context can also
 avoid the need to replicate the rest of the program data (interpreter).

You are again missing the point here: the interpreter is the above VM
state - the rest is almost nothing. So the interpreter := thread approach
holds. You can't run even a single - the one and only - thread without
this necessary data, and that's just called the interpreter in Parrot
speak.

 ... No need for :shared or lock().

That's the - let's say - type 4 of Dan's layout of different threading
models. Everything is shared by default. That's similar to the shared
PMC type 3 model - except that no objects have to be copied. It for sure
depends on the user code, if one or the other model will have better
performance, so the user can choose. We will provide both.

 Only duplicating shared data on demand (COW) may work well on systems
 that support COW in the kernel.

No, we are dealing with VM objects and structures here - no kernel is
involved for COWed copies of e.g. strings.

[ snips ]

 Each element, the separation of the VMstate from the interpreter state,

VM = Virtual machine = interpreter

These can't be separated as they are the same.

 the atomisation of VM operations,

Different VMs can run on different CPUs. Why should we make atomic
instructions out of these? We have a JIT runtime performing at 1 Parrot
instruction per CPU instruction for native integers. Why should we slow
that down by a factor of many tens?

We have to lock shared data, and there you have to pay the penalty, but
not for each piece of code.

 You only need to copy them, if the two threads can attempt to modify
 the contents of the objects concurrently.

I think that you are overlooking multiprocessor systems entirely.

leo


Re: Thread notes

2004-01-03 Thread Leopold Toetsch
Dan Sugalski [EMAIL PROTECTED] wrote:
 2) The only thread constructs we are going to count on are:
*) Abstract, non-recursive, simple locks
*) Rendezvous points (Things threads go to sleep on until another
 thread pings the condition)
*) Semaphores (in the I do a V and P operation, with a count)

All d'accord with above but: I'm not sure yet, but /me thinks that we
need to have a CLEANUP_PUSH and _POP handler functionality too. But
these are basically macros (simple in the absence of pthread_kill or
such) and currently already used :)

 3) I'm still not paying much attention.

May I ask why?
-- Yes :)
Why?

leo


Re: Threads Design. A Win32 perspective.

2004-01-03 Thread Jeff Clites
On Jan 3, 2004, at 2:59 PM, Leopold Toetsch wrote:

Nigel Sandever [EMAIL PROTECTED] wrote:
Only duplicating shared data on demand (COW) may work well on systems
that support COW in the kernel.

No, we are dealing with VM objects and structures here - no kernel is
involved for COWed copies of e.g. strings.

And also, COW at the OS level (that is, of memory pages) doesn't help, 
because we have complex data structures filled with pointers, so 
copying them involves more than just duplicating a block of memory. We 
can use an approach similar to what we do for strings to make a COW 
copy of, for instance, the globals stash, but overall that will only be 
a speed improvement if the data structure is rarely modified. (That is, 
once it's modified, we will have paid the price. Unless we have clever 
data structure which can be COWed in sections.)
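
A toy version of string-style COW at the VM level, to make the cost model concrete. It is single-threaded on purpose (the plain int count would need a lock or an atomic once threads share buffers), and the cow_* names are invented here rather than taken from Parrot's string code.

```c
#include <stdlib.h>
#include <string.h>

/* Buffer that several string headers may point at. */
typedef struct {
    char  *bytes;
    size_t len;
    int    refs;   /* number of headers sharing this buffer */
} cow_buf_t;

/* The cheap part that gets "copied". */
typedef struct {
    cow_buf_t *buf;
} cow_str_t;

static cow_buf_t *cow_buf_make(const char *bytes, size_t len)
{
    cow_buf_t *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->bytes = malloc(len + 1);
    if (b->bytes == NULL) {
        free(b);
        return NULL;
    }
    memcpy(b->bytes, bytes, len);
    b->bytes[len] = '\0';
    b->len = len;
    b->refs = 1;
    return b;
}

static int cow_str_new(cow_str_t *s, const char *text)
{
    s->buf = cow_buf_make(text, strlen(text));
    return s->buf != NULL ? 0 : -1;
}

/* "Copy": share the buffer, no bytes are duplicated yet. */
static void cow_str_copy(cow_str_t *dst, const cow_str_t *src)
{
    dst->buf = src->buf;
    dst->buf->refs++;
}

/* Write: the full price of the copy is paid here, once, if shared. */
static int cow_str_set_char(cow_str_t *s, size_t i, char c)
{
    if (i >= s->buf->len)
        return -1;
    if (s->buf->refs > 1) {
        cow_buf_t *own = cow_buf_make(s->buf->bytes, s->buf->len);
        if (own == NULL)
            return -1;
        s->buf->refs--;
        s->buf = own;
    }
    s->buf->bytes[i] = c;
    return 0;
}

static void cow_str_free(cow_str_t *s)
{
    if (--s->buf->refs == 0) {
        free(s->buf->bytes);
        free(s->buf);
    }
    s->buf = NULL;
}
```

The first write to a shared buffer duplicates all of it, which is the "once it's modified, we will have paid the price" case; a structure COWed in sections would split the buffer so that only the touched part is duplicated.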

Just adding to what Leo already said.

JEff



Re: Thread Question and Suggestion -- Matt

2004-01-03 Thread chromatic
On Sat, 2004-01-03 at 17:24, Matt Fowles wrote:

 I have a naive question:
 
 Why must each thread have its own interpreter?

~handwavy, high-level answer~

For the same reason each thread in C, for example, needs its own stack
pointer.

Since Parrot's a register machine, each thread needs its own set of
registers so it can go off and do its own thing without whomping all
over the other threads.  Those registers live in each interpreter.
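
In struct form the handwave looks roughly like this. The layout is a guess made for illustration, not Parrot's real interpreter structure; sketch_interp_t, sketch_program_t and the register count are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_NUM_REGS 32   /* illustrative size only */

/* Shared by every thread of one program: the opcode stream. */
typedef struct {
    const unsigned char *opcodes;
    size_t               length;
} sketch_program_t;

/* Per-thread execution context, i.e. what gets called an interpreter. */
typedef struct {
    const sketch_program_t *program;                  /* shared, read-only */
    size_t                  pc;                       /* private */
    long                    int_reg[SKETCH_NUM_REGS]; /* private */
} sketch_interp_t;

static void sketch_interp_init(sketch_interp_t *interp,
                               const sketch_program_t *program)
{
    memset(interp, 0, sizeof *interp);
    interp->program = program;
}

/* New thread: same program pointer, its own copy of pc and registers. */
static void sketch_interp_spawn(sketch_interp_t *child,
                                const sketch_interp_t *parent)
{
    *child = *parent;   /* struct assignment copies the registers by value */
}
```

After the spawn, a write to one context's registers leaves the other's untouched, while both still run the same opcode stream.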

-- c



Re: Thread Question and Suggestion -- Matt

2004-01-03 Thread Jeff Clites
On Jan 3, 2004, at 5:24 PM, Matt Fowles wrote:

All~

I have a naive question:

Why must each thread have its own interpreter?

The short answer is that the bulk of the state of the virtual machine 
(including, and most importantly, its registers and register stacks) 
needs to be per-thread, since it represents the execution context 
which is logically thread-local. Stuff like the globals stash may or 
may not be shared (depending on the thread semantics we want), but as I 
understand it the potentially shared stuff is actually only a small 
part of the bits making up the VM.

That said, I do think we have a terminology problem, since I initially 
had the same question you did, and I think my confusion mostly stems 
from there being no clear terminology to distinguish between 2 
interpreters which are completely independent, and 2 interpreters 
which represent 2 threads of the same program. In the latter case, 
those 2 interpreters are both part of one something, and we don't 
have a name for that something. It would be clearer to say that we 
have two threads in one interpreter, and just note that almost all 
of our state lives in the thread structure. (That would mean that the 
thing which is being passed into all of our API would be called the 
thread, not the interpreter, since it's a thread which represents an 
execution context.) It's just (or mostly) terminology, but it's causing 
confusion.

 Why not have the threads that share everything share interpreters?  We 
 can have these threads be within a single interpreter, thus 
 eliminating the need for complicated GC locking and resource sharing 
 complexity.  Because all of these threads will be one kernel level 
 thread, they will not actually run concurrently and there will be no 
 need to lock them.  We will have to implement a rudimentary scheduler 
 in the interpreter, but I don't think that is actually that hard.

There are 2 main problems with trying to emulate threads this way:

1) It would likely kill the performance gains of JIT.

2) Calls into native libraries could block the entire VM. (We can't 
manually timeslice external native code.) Even things such as regular 
expressions can take an unbounded amount of time, and the internals of 
the regex engine will be in C--so we couldn't timeslice without slowing 
them down.
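
Problem 2 can be illustrated with a toy user-space scheduler. This is only a sketch under stated assumptions: Python generators stand in for the emulated threads, and `native_call` is a hypothetical stand-in for a call into external C code. Tasks can switch only where they `yield`, so a task that does not yield holds up every other task:

```python
from collections import deque

def worker(name, steps, log):
    # A cooperative task: it gives up control only at each `yield`.
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield

def native_call(name, log):
    # Stand-in for a call into external native code. The scheduler cannot
    # interrupt it, so every other task waits until it returns.
    for i in range(3):
        log.append(f"{name}:busy{i}")
    yield

def run_round_robin(tasks):
    # Rudimentary scheduler: run each task until its next yield point,
    # then move it to the back of the queue.
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
            ready.append(task)
        except StopIteration:
            pass  # task finished; drop it

def demo():
    log = []
    run_round_robin([worker("a", 2, log),
                     native_call("n", log),
                     worker("b", 2, log)])
    return log

if __name__ == "__main__":
    print(demo())
```

In the resulting log, the three `n:busy` entries appear back to back: neither `a` nor `b` gets a turn while the "native" call is running, which is the blocking behavior described above.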

And basically, people are going to want real threads--they'll want 
access to the full richness and power afforded by an API such as 
pthreads, and the real threading libraries (and the OS) have already 
done all of the really hard work.

 This allows threads to have completely shared state, at the cost of 
 not being quite as efficient on SMP (they might be more efficient on 
 single processors as there are fewer kernel traps necessary).

Not likely to be more efficient even on a single processor, since even 
if a process is single threaded it is being preempted by other 
processes. (On Mac OS X, I've not been able to create a case where 
being multithreaded is a slowdown in the absence of locking--even for 
pure computation on a single processor machine, being multithreaded is 
actually a slight performance gain.)

 Programs that want to run faster on an SMP will use threads without 
 shared state that use events to communicate.

It's nice to have speed gains on MP machines without having to redesign 
your application, especially as MP machines are quickly becoming the 
norm.

 (which probably provides better performance, as there will be fewer 
 faults to main memory because of cache misses and shared data).

Probably not faster actually, since you'll end up with more data 
copying (and more total data).
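
To make the copying cost concrete, here is a rough sketch of the no-shared-state model, assuming Python's `queue.Queue` as a stand-in for the "events" and hypothetical `producer`/`consumer` helpers. Every message is deep-copied before it crosses the thread boundary so the two sides never alias the same data:

```python
import copy
import queue
import threading

def producer(outbox, items):
    for item in items:
        # Deep-copy so the receiver never aliases the sender's data.
        # This is the extra copying (and extra total data) in question.
        outbox.put(copy.deepcopy(item))
    outbox.put(None)  # sentinel: no more messages

def consumer(inbox, results):
    while True:
        msg = inbox.get()
        if msg is None:
            break
        results.append(sum(msg))

def run_pipeline(items):
    channel = queue.Queue()
    results = []  # written only by the consumer until it is joined
    threads = [threading.Thread(target=producer, args=(channel, items)),
               threading.Thread(target=consumer, args=(channel, results))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    print(run_pipeline([[1, 2], [3, 4, 5]]))
```

No locks appear in the worker code, but each list exists twice while a message is in flight. Whether that trade wins or loses depends on message size and frequency.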

 I understand if this suggestion is dismissed for violating the rules, 
 but I would like an answer to the question simply because I do not 
 know the answer.

I hope my answers are useful. I think it's always okay to ask questions.

JEff



Re: Thread notes

2004-01-03 Thread Dan Sugalski
At 1:11 AM +0100 1/4/04, Leopold Toetsch wrote:
 Dan Sugalski [EMAIL PROTECTED] wrote:
  2) The only thread constructs we are going to count on are:
     *) Abstract, non-recursive, simple locks
     *) Rendezvous points (things threads go to sleep on until another
        thread pings the condition)
     *) Semaphores (in the "I do a V and P operation, with a count" sense)

 All d'accord with above but: I'm not sure yet, but /me thinks that we
 need to have a CLEANUP_PUSH and _POP handler functionality too. But
 these are basically macros (simple in the absence of pthread_kill or
 such) and currently already used :)

I wasn't listing anything we can build ourselves -- arguably we only 
need two of the three things I listed, since with semaphores you can 
do the rendezvous things (POSIX condition variables, but I'm sure 
windows has something similar) and vice versa.
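
One direction of that equivalence can be sketched as follows, using Python's `threading.Condition` as a stand-in for a POSIX condition variable. The `CondSemaphore` class and the `demo` helper are illustrative assumptions, not proposed Parrot code. A counting semaphore needs only a simple lock plus a rendezvous point:

```python
import threading

class CondSemaphore:
    """Counting semaphore built from a lock and a condition variable."""
    def __init__(self, count=0):
        self._count = count
        # A plain (non-recursive) lock guards the count.
        self._cond = threading.Condition(threading.Lock())

    def P(self):
        # Sleep on the rendezvous point until the count is positive.
        with self._cond:
            while self._count == 0:
                self._cond.wait()
            self._count -= 1

    def V(self):
        # Raise the count and ping one sleeping thread.
        with self._cond:
            self._count += 1
            self._cond.notify()

def demo(workers=4, permits=2):
    sem = CondSemaphore(permits)
    guard = threading.Lock()
    state = {"active": 0, "peak": 0}

    def work():
        sem.P()
        with guard:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        with guard:
            state["active"] -= 1
        sem.V()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Returns the peak number of simultaneous holders and the final count.
    return state["peak"], sem._count

if __name__ == "__main__":
    print(demo())
```

The `while` loop around `wait()` matters: a woken thread must recheck the count, because another thread may have taken the permit first.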

  3) I'm still not paying much attention.

May I ask why?
-- Yes :)
Why?
Got a killer deadline at work. Things ease up after the 9th if I make 
it, but then I owe someone else at home a few days of time. :)
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Thread notes

2004-01-03 Thread Dan Sugalski
At 11:42 PM + 1/3/04, Nigel Sandever wrote:
 03/01/04 23:20:17, Dan Sugalski [EMAIL PROTECTED] wrote:

 [Dan getting cranky snipped]

 And that was that! Sorry I spoke.

I'm not trying to shut anyone down. What I wanted to do was stop 
folks diving down too low a level. Yes, we could roll our own 
mutexes, condition variables, and semaphores, but we're not going to; 
it's far too system--not just architecture or OS specific, but system 
setup specific. Single-processor systems want to context switch on 
mutex acquisition failures, SMP systems want to use adaptive 
spinlocks, atomic test-and-set operations aren't necessarily available on some 
NUMA systems, and ordering operations are somewhat fuzzy on some of 
the more advanced processors--and that's all on x86 systems.

All this stuff is best left to the OS, which presumably has a better 
idea of what the right and most efficient thing to do is, and 
certainly has more resources behind it than we do. It is definitely in a 
position to be up to date in ways that we aren't. (You can 
guarantee that the OS on a system is sufficiently up to date to run 
properly, but it's not the same with user executables, which can be 
years old.)

I really don't want folks to get distracted by trying to get down to 
the metal--it'll just get folks all worked up over something we're 
not going to be doing because it's not prudent. I'd prefer everyone 
get worked up over the higher-level stuff and just assume we have the 
simple stuff at hand, and as the simple stuff is all we can safely 
assume, that's just the prudent thing.

(This is one of those cases where I'd really prefer to force 
everyone doing thread work to have to work on 8-processor Alpha boxes 
(your choice of OS, I don't care), one of the most vicious threading 
environments ever devised, but alas that's not going to happen. Pity, 
though.)
--
Dan



Re: Thread notes

2004-01-03 Thread Uri Guttman
 DS == Dan Sugalski [EMAIL PROTECTED] writes:

  DS (This is one of those cases where I'd really prefer to force
  DS everyone doing thread work to have to work on 8 processor Alpha
  DS boxes (your choice of OS, I don't care), one of the most vicious
  DS threading environments ever devised, but alas that's not going to
  DS happen. Pity, though)

single cpu lsi-11's running FG/BG rt-11 doesn't count? :)

it was a dec product too! :)

uri

-- 
Uri Guttman  --  [EMAIL PROTECTED]   http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs    http://jobs.perl.org


Re: Thread notes

2004-01-03 Thread Dan Sugalski
At 11:49 PM -0500 1/3/04, Uri Guttman wrote:
  DS == Dan Sugalski [EMAIL PROTECTED] writes:

  DS (This is one of those cases where I'd really prefer to force
  DS everyone doing thread work to have to work on 8 processor Alpha
  DS boxes (your choice of OS, I don't care), one of the most vicious
  DS threading environments ever devised, but alas that's not going to
  DS happen. Pity, though)
 single cpu lsi-11's running FG/BG rt-11 doesn't count? :)

Given that it's not an SMP, massively out of order NUMA system with 
delayed writes... no. 'Fraid not.
--
Dan



Re: Thread notes

2004-01-03 Thread Uri Guttman
 DS == Dan Sugalski [EMAIL PROTECTED] writes:

  DS At 11:49 PM -0500 1/3/04, Uri Guttman wrote:
DS == Dan Sugalski [EMAIL PROTECTED] writes:
   
  DS (This is one of those cases where I'd really prefer to force
  DS everyone doing thread work to have to work on 8 processor Alpha
  DS boxes (your choice of OS, I don't care), one of the most vicious
  DS threading environments ever devised, but alas that's not going to
  DS happen. Pity, though)
   
   single cpu lsi-11's running FG/BG rt-11 doesn't count? :)

  DS Given that it's not an SMP, massively out of order NUMA system with
  DS delayed writes... no. 'Fraid not.

bah, humbug. then dec lied in their marketing crap.

actually i think there were SMP pdp/lsi-11 systems but i never had one.

tonight i happened to drive by the apartment where 20 years ago i lived
alone with an lsi-11 box that my employer lent me (cost $10k!!). did my
thesis on it. times have changed a little.

uri
