Re: Strings. Finally.

2004-06-16 Thread Damien Neil
On Jun 14, 2004, at 1:54 PM, Dan Sugalski wrote:
Parrot provides code points for all graphemes, even for those
character sets/encodings which don't inherently do so. Most sets that
have variable-length encodings use an escape sequence scheme--the
value of the first byte in a character determines whether the
grapheme is a one or more byte sequence. When parrot turns these into
code points it does it by building up the final value. The first byte
is put in the low 8 bits of the integer. If there's a second byte in
the sequence the current value is shifted left 8 bits and the new byte
is stuffed in the low 8 bits. If there's a third byte in the sequence
everything is shifted left again 8 bits and that third byte is stuffed
in the bottom, and so on.
A grapheme consists of one or more code points.  Is "provides code
points for all graphemes" really what is intended here?  I assume not,
since you can't represent every combination of combining Unicode 
characters (COMBINING GRAVE ACCENT + KATAKANA LETTER KA, say) in a 
single 32-bit code point.

  - Damien


Re: Threads... last call

2004-01-29 Thread Damien Neil
On Wed, Jan 28, 2004 at 12:53:09PM -0500, Melvin Smith wrote:
 At 12:27 PM 1/23/2004 -0800, Damien Neil wrote:
 Java Collections are a standard Java library of common data structures
 such as arrays and hashes.  Collections are not synchronized; access
 involves no locks at all.  Multiple threads accessing the same
 collection at the same time cannot, however, result in the virtual
 machine crashing.  (They can result in data structure corruption,
 but this corruption is limited to surprising results rather than
 VM crash.)
 
 But this accomplishes nothing useful and still means the data structure
 is not re-entrant, nor is it corruption resistant, regardless of how we 
 judge it.

Quite the contrary--it is most useful.

Parrot must, we all agree, under no circumstances crash due to
unsynchronized data access.  For it to do so would be, among other
things, a gross security hole when running untrusted code in a
restricted environment.

There is no need for any further guarantee about unsynchronized data
access, however.  If unsynchronized threads invariably cause an exception,
that's fine.  If they cause the threads involved to halt, that's fine
too.  If they cause what was once an integer variable to turn into a
string containing the definition of "mulching"...well, that too falls
under the heading of undefined results.  Parrot cannot and should
not attempt to correct for bugs in user code, beyond limiting the extent
of the damage to the threads and data structures involved.

Java, when released, took the path that Parrot appears to be about
to take--access to complex data structures (such as Vector) was
always synchronized.  This turned out to be a mistake--sufficiently
so that Java programmers would often implement their own custom,
unsynchronized replacements for the core classes.  As a result,
when the Collections library (which replaces those original data
structures) was released, the classes in it were left unsynchronized.

In Java's case, the problem was at the library level, not the VM
level; as such, it was relatively easy to fix at a later date.
Parrot's VM-level data structure locking will be less easy to change.

 - Damien


Re: Threads... last call

2004-01-23 Thread Damien Neil
On Fri, Jan 23, 2004 at 10:07:25AM -0500, Dan Sugalski wrote:
 A single global lock, like python and ruby use, kill any hope of 
 SMP-ability.

Assume, for the sake of argument, that locking almost every PMC
every time a thread touches it causes Parrot to run four times
slower.  Assume also that all multithreaded applications are
perfectly parallelizable, so overall performance scales linearly
with number of CPUs.  In this case, threaded Parrot will need
to run on a 4-CPU machine to match the speed of a single-lock
design running on a single CPU.  The only people that will benefit
from the multi-lock design are those using machines with more than
4 CPUs--everyone else is worse off.

This is a theoretical case, of course.  We don't know exactly how
much of a performance hit Parrot will incur from a lock-everything
design.  I think that it would be a very good idea to know for
certain what the costs will be, before it becomes too late to change
course.

Perhaps the cost will be minimal--a 20% per-CPU overhead would
almost certainly be worth the ability to take advantage of multiple
CPUs.  Right now, however, there is no empirical data on which to
base a decision.  I think that making a decision without that data
is unwise.
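
To make the question concrete, here is the sort of measurement I have in
mind--an untested sketch that times an uncontended pthread mutex around a
trivial "op" against the bare op; the loop count and timing method are
arbitrary:

  #include <pthread.h>
  #include <stdio.h>
  #include <sys/time.h>

  #define N 10000000L

  static double now(void) {
      struct timeval tv;
      gettimeofday(&tv, NULL);
      return tv.tv_sec + tv.tv_usec / 1e6;
  }

  int main(void) {
      pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
      volatile long counter = 0;
      double t0, t1, t2;
      long i;

      t0 = now();
      for (i = 0; i < N; i++)
          counter++;                       /* the bare "op" */
      t1 = now();
      for (i = 0; i < N; i++) {
          pthread_mutex_lock(&m);          /* what lock-everything adds */
          counter++;
          pthread_mutex_unlock(&m);
      }
      t2 = now();
      printf("bare: %.3fs  locked: %.3fs  ratio: %.2f\n",
             t1 - t0, t2 - t1, (t2 - t1) / (t1 - t0));
      return 0;
  }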

As I said, I've seen a real-world program which was rewritten to
take advantage of multiple CPUs.  The rewrite fulfilled the design
goals: the new version scaled with added CPUs. Unfortunately, lock
overhead made it sufficiently slower that it took 2-4 CPUs to match
the old performance on a single CPU--despite the fact that almost
all lock attempts succeeded without contention.

The current Parrot design proposal looks very much like the locking
model that app used.


 Corruption-resistant data structures without locking just don't exist.

An existence proof:

Java Collections are a standard Java library of common data structures
such as arrays and hashes.  Collections are not synchronized; access
involves no locks at all.  Multiple threads accessing the same
collection at the same time cannot, however, result in the virtual
machine crashing.  (They can result in data structure corruption,
but this corruption is limited to surprising results rather than
VM crash.)

  - Damien


Re: Start of thread proposal

2004-01-21 Thread Damien Neil
On Wed, Jan 21, 2004 at 01:14:46PM -0500, Dan Sugalski wrote:
 ... seems to indicate that even whole ops like add P,P,P are atomic.
 
 Yep. They have to be, because they need to guarantee the integrity of 
 the pmc structures and the data hanging off them (which includes 
 buffer and string stuff)

Personally, I think it would be better to use corruption-resistant
buffer and string structures, and avoid locking during basic data
access.  While there are substantial differences in VM design--PMCs
are much more complicated than any JVM data type--the JVM does provide
a good example that this can be done, and done efficiently.

Failing this, it would be worth investigating what the real-world
performance difference is between acquiring multiple locks per VM
operation (current Parrot proposal) vs. having a single lock
controlling all data access (Python) or jettisoning OS threads
entirely in favor of VM-level threading (Ruby).  This forfeits the
ability to take advantage of multiple CPUs--but Leopold's initial
timing tests of shared PMCs were showing a potential 3-5x slowdown
from excessive locking.
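
For clarity, the two designs I'm contrasting look roughly like this in C
(an untested sketch; the PMC layout and function names are invented, and a
real per-PMC scheme would also need a consistent lock ordering to avoid
deadlock):

  #include <pthread.h>

  typedef struct PMC {
      pthread_mutex_t lock;
      long            value;
  } PMC;

  /* Option 1: one global lock, Python/Ruby style.  Cheap per op, but
     only one thread executes VM code at a time. */
  static pthread_mutex_t interp_lock = PTHREAD_MUTEX_INITIALIZER;

  void add_p_p_p_global(PMC *dest, PMC *a, PMC *b) {
      pthread_mutex_lock(&interp_lock);
      dest->value = a->value + b->value;
      pthread_mutex_unlock(&interp_lock);
  }

  /* Option 2: a lock per PMC.  SMP-friendly, but three lock/unlock
     pairs on every add even when nothing is ever contended.  (Assumes
     dest, a and b are distinct.) */
  void add_p_p_p_perpmc(PMC *dest, PMC *a, PMC *b) {
      pthread_mutex_lock(&a->lock);
      pthread_mutex_lock(&b->lock);
      pthread_mutex_lock(&dest->lock);
      dest->value = a->value + b->value;
      pthread_mutex_unlock(&dest->lock);
      pthread_mutex_unlock(&b->lock);
      pthread_mutex_unlock(&a->lock);
  }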

I've seen software before that was redesigned to take advantage of
multiple CPUs--and then required no less than four CPUs to match
the performance of the older, single-CPU version.  The problem was
largely attributed to excessive locking of mostly-uncontested data
structures.

- Damien


Re: JVM as a threading example (threads proposal)

2004-01-16 Thread Damien Neil
On Thu, Jan 15, 2004 at 11:58:22PM -0800, Jeff Clites wrote:
 On Jan 15, 2004, at 10:55 PM, Leopold Toetsch wrote:
 Yes, that's what I'm saying. I don't see an advantage of JVMs multi-step
 variable access, because it even doesn't provide such atomic access.

You're missing the point of the multi-step access.  It has nothing to
do with threading or atomic access to variables.

The JVM is a stack machine.  JVM opcodes operate on the stack, not on
main memory.  The stack is thread-local.  In order for a thread to operate
on a variable, therefore, it must first copy it from main store to thread-
local store (the stack).

Parrot, so far as I know, operates in exactly the same way, except that
the thread-local store is a set of registers rather than a stack.

Both VMs separate working-set data (the stack and/or registers) from
main store to reduce symbol table lookups.
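
In C terms, the pattern both VMs follow is roughly the sketch below
(untested; lookup_variable() and store_variable() are invented stand-ins
for the symbol-table access):

  /* A toy "main store": one global looked up by name; the lookup
     stands in for a symbol table or lexical-pad search. */
  static long main_store_total = 0;

  static long lookup_variable(const char *name) {
      (void)name;                              /* real code: hash lookup */
      return main_store_total;
  }

  static void store_variable(const char *name, long v) {
      (void)name;
      main_store_total = v;
  }

  void sum_loop(void) {
      long total = lookup_variable("total");   /* copy into the "register" */
      long i;
      for (i = 0; i < 1000; i++)
          total += i;                          /* work on the local copy */
      store_variable("total", total);          /* write back to main store */
  }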


 What I was expecting that the Java model was trying to do (though I 
 didn't find this) was something along these lines: Accessing the main 
 store involves locking, so by copying things to a thread-local store we 
 can perform several operations on an item before we have to move it 
 back to the main store (again, with locking). If we worked directly 
 from the main store, we'd have to lock for each and every use of the 
 variable.

I don't believe accesses to main store require locking in the JVM.

This will all make a lot more sense if you keep in mind that Parrot--
unthreaded as it is right now--*also* copies variables to working store
before operating on them.  This isn't some odd JVM strangeness.  The
JVM threading document is simply describing how the stack interacts
with main memory.

  - Damien


Re: JVM as a threading example (threads proposal)

2004-01-15 Thread Damien Neil
On Thu, Jan 15, 2004 at 09:31:39AM +0100, Leopold Toetsch wrote:
 I don't see any advantage of such a model. The more as it doesn't
 guarantee any atomic access to e.g. long or doubles. The atomic access to
 ints and pointers seems to rely on the architecture but is of course
 reasonable.

You *can't* guarantee atomic access to longs and doubles on some
architectures, unless you wrap every read or write to one with a
lock.  The CPU support isn't there.

(Why the "e.g."?  Longs and doubles are explicitly the only core
data types which the JVM does not guarantee atomic access to.)
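
For illustration, here is a deliberately racy, untested C sketch of the
problem on a 32-bit machine: the 64-bit store can be split into two 32-bit
stores, so an unlocked reader can observe a value that is half old and
half new.

  #include <pthread.h>
  #include <stdio.h>

  static long long shared = 0;                 /* 64 bits, no lock */

  static void *writer(void *arg) {
      (void)arg;
      for (;;) {
          shared = 0;                          /* each of these can be two */
          shared = -1;                         /* 32-bit stores on a 32-bit CPU */
      }
      return NULL;
  }

  int main(void) {
      pthread_t t;
      pthread_create(&t, NULL, writer, NULL);
      for (;;) {
          long long v = shared;                /* unlocked read */
          if (v != 0 && v != -1) {             /* half old, half new */
              printf("torn read: %llx\n", (unsigned long long)v);
              return 1;
          }
      }
  }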

   - Damien


Re: Threads Design. A Win32 perspective.

2004-01-04 Thread Damien Neil
On Sun, Jan 04, 2004 at 12:17:33PM -0800, Jeff Clites wrote:
 What are these standard techniques? The JVM spec does seem to guarantee 
 that even in the absence of proper locking by user code, things won't 
 go completely haywire, but I can't figure out how this is possible 
 without actual locking. (That is, I'm wondering if Java is doing 
 something clever.) For instance, inserting something into a collection 
 will often require updating more than one memory location (especially 
 if the collection is out of space and needs to be grown), and I can't 
 figure out how this could be guaranteed not to completely corrupt 
 internal state in the absence of locking. (And if it _does_ require 
 locking, then it seems that the insertion method would in fact then be 
 synchronized.)

My understanding is that Java Collections are generally implemented
in Java.  Since the underlying Java bytecode does not permit unsafe
operations, Collections are therefore safe.  (Of course, unsynchronized
writes to a Collection will probably result in exceptions--but it
won't crash the JVM.)

For example, insertion into a list might be handled something like
this (apologies for rusty Java skills):
  void append(Object new_entry) {
    if (a.length <= size) {
      Object new_a[] = new Object[size * 2];
      for (int i = 0; i < size; i++) {
        new_a[i] = a[i];
      }
      a = new_a;
    }
    a[size++] = new_entry;
  }

If two threads call this function at the same time, they may well leave
the list object in an inconsistent state--but there is no way that the
above code can cause JVM-level problems.

The key decision in Java threading is to forbid modification of all
bytecode-level types that cannot be atomically modified.  For example,
the size of an array cannot be changed, and strings are constant.
If it WERE possible to resize arrays, the above code would require locks
to avoid potential JVM corruption--every access to 'a' would need a lock
against the possibility that another thread was in the process of resizing
it.
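
The same point in C terms, as an untested (and deliberately unsafe)
sketch: once the buffer itself can move, an unlocked reader can chase a
stale pointer into freed memory, which is process-level corruption rather
than merely a surprising result.

  #include <stdlib.h>

  typedef struct {
      void   **elems;
      size_t   size;
      size_t   capacity;      /* assumed > 0 here */
  } List;

  void append_unsafe(List *l, void *item) {
      if (l->size == l->capacity) {
          /* Any thread still holding the old l->elems pointer is now
             reading freed memory: a crash, not just stale data. */
          l->elems = realloc(l->elems, l->capacity * 2 * sizeof(void *));
          l->capacity *= 2;
      }
      l->elems[l->size++] = item;
  }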

It's my understanding that Parrot has chosen to take the path of using
many mutable data structures at the VM level; unfortunately, this is
pretty much incompatible with a fast or elegant threading model.

- Damien


Re: Events

2003-07-24 Thread Damien Neil
On Tue, Jul 22, 2003 at 11:41:25PM -0400, Dan Sugalski wrote:
 First, to get it out of the way, I don't have to convince you of 
 anything. You have to convince me. For better or worse I'm 
 responsible for the design and it's ultimately my decision. If you 
 don't want async IO, it's time to make a very, very good case. 

I hope that I haven't given the impression that I feel otherwise.
You're the designer, and Parrot is your baby.  I'm just expressing
my opinion; you are of course completely free to disagree with me.


Let me restate my position, since I think it's getting lost in the
general confusion:

I'd be happy if Parrot contained no support at all for interrupts,
in particular the traditional interrupt-based delivery of Unix
signals.  I think that support for interrupts will come at a cost,
and I'd prefer not to have to pay that cost.


I've expounded at length in an earlier message on why I think
interrupts in application-level code is generally a bad idea.  I
won't bother repeating myself here; I don't think I said anything
particularly controversial there.


I'm not arguing against non-blocking IO, event loops, a unified
event queue, or internally using the aio_*() API on Unix.  I think
that all of these things are Nifty(tm) and I highly approve of all
of them.

I /am/ arguing against exposing the aio_*() API (or its equivalent)
to code running atop the Parrot VM, on the grounds that it uses
interrupts as a part of the API.  I'd rather just have non-blocking
IO calls and a good event queue.


On a somewhat related note, I'm dubious about the performance gains
that code using interrupt-driven IO will see as opposed to code using
event-loop driven IO.  I /think/ you're telling me that I'm wrong,
and that interrupt-driven IO does indeed have performance benefits;
it's possible that you're actually telling me that event-loop driven
code with non-blocking IO has performance benefits as compared to
threaded code with blocking IO.  If it's the latter, then we are
in violent agreement. :)

   - Damien


Re: Events

2003-07-22 Thread Damien Neil
On Sun, Jul 20, 2003 at 11:59:00AM -0400, Dan Sugalski wrote:
 We're supporting interrupts at the interpreter level because we must. 
 It doesn't matter much whether we like them, or we think they're a 
 good idea, the languages we target require them to be there. Perl 5, 
 Perl 6, Python, and Ruby all have support for Unix signals in pretty 
 much the way you'd get them if you were writing C code. (That is to 
 say, broken by design, dangerous to use, and of far less utility than 
 they seem on the surface)

Right, which is why I said in my initial message that dropping
interrupts might be politically impossible.

I still think that including something that is broken by design,
dangerous to use, and of questionable utility isn't a good idea,
but I can accept the argument that it may be necessary.


 It would be entirely possible for Parrot (or a Parrot library) to
 use AIO at a low level, without introducing interrupts to the VM layer.
 
 Sure. But what'd be the point? Adding in interrupts allows a number 
 of high-performance idioms that aren't available without them. They 
 certainly won't be required, and most of the IO you'll see done will 
 be entirely synchronous, since that's what most compilers will be 
 spitting out. You don't *have* to use IO callbacks, you just can if 
 you want to.

Could you point me at a reference for these high-performance idioms?
While I've heard of significant gains being realized through AIO,
it was my understanding that this is generally related to disk IO,
where Unix doesn't provide support for non-blocking IO.  The
performance gains come not from a different code flow, but from the
ability to perform disk access in the background.

(I'm not disputing that such idioms exist; if there's a better
way to do things that I don't know of, I want to know more about it!)


 Regarding AIO being faster: Beware premature optimization.
 
 I'm going to start carrying a nerf bat around and smack people who 
 trot this one out.

The fact that it is often said does not make it any less true.

You've asserted that Parrot will be faster (in at least some
situations) with interrupt-driven IO than it will be with
non-interrupt-driven IO.  I'm unconvinced of this claim.  In
particular, I feel that support for interrupts will come at an
overall performance penalty, and I am unconvinced that this penalty
will not outweigh any benefits that interrupt-driven IO would bring.

Now, you can ignore me if you want; you're the designer.  Hitting
me isn't going to convince me of anything, however.


 While it's not inappropriate to apply it to design, we're nowhere 
 near that point. This isn't premature optimization, or optimization 
 of any sort--it's design, and it should be done now. This is what 
 we're *supposed* to be doing. It's certainly reasonable to posit that 
 async IO is a bad design choice (won't get you very far, but you can 
 posit it :) but please don't trot out the premature optimization 
 quote.

This is *exactly* the time when that quote is appropriate to apply.
When a design decision is made because it'll be faster that way,
it is always worth examining the question of whether it WILL be
faster or not.  (I am aware that there is a second reason for
supporting interrupts in Parrot--Unix signals; I was addressing the
argument that support for AIO is sufficient reason to include
interrupts.)

For example: If it turns out that Parrot, sans interrupt-driven IO,
is capable of saturating the system bus when writing to a device,
there is little point in optimizing Parrot's IO system.


 You may suspect, but you'd turn out to be incorrect--using threads to 
 simulate a real async IO system still has performance wins. And we're 
 going to be using native async stuff when we can.

Do you know of a program that does this (simulated AIO via threads)?
(Again, I'm not disputing your claim--it's just that this is
completely contrary to my experience, and I'd like to know more
about it.)

  - Damien


Re: Events

2003-07-19 Thread Damien Neil
On Fri, Jul 18, 2003 at 05:42:10PM -0400, Benjamin Goldberg wrote:
 AIO is unpopular because it's not widely/portably supported, whereas
 non-blocking event-loop IO is, (one generally does have either select or
 poll), as is blocking threaded IO (even if the thread starting stuff may
 need to be different on some platform, it's *mostly* portable).

I disagree; AIO is not widely/portably supported because it is
unpopular.  Threading (on Unix systems, at least) is a much newer
concept than AIO, and yet it is now nigh-ubiquitous; any modern OS
needs to have solid threading support to be taken seriously.  Portable
libraries wrapping system-specific thread models are common.  The only
reason this hasn't happened with AIO is lack of user demand.

The problem with AIO is that it has all the synchronization pain
of threading combined with the code flow complexity of an event-based
IO system.  There are certainly occasions when AIO may prove to be
the best or most elegant solution to a problem, but in most cases
there are other approaches which are substantially simpler for the
programmer.


 If we make it a core part of parrot, it will become more popular, simply
 because of parrot's availability.

I'd be interested in seeing specific examples of problems that will
be solved by adding AIO support to the VM layer.  How will this feature
be used in real-world programs?


  Outside of signals and AIO, what requires async event dispatch?
  User events, as you pointed out above, are better handled through
  an explicit request for the next event.
 
 Inter-thread notification and timers?  True, these *could* be considered
 to be user events, but IMHO, there are times when we want a user
 event to have the same (high) priority as a system signal.

I'd like a specific example (general pseudocode fine) of inter-thread
notification implemented using interrupts that a) doesn't include
any race conditions, and b) can't be written more clearly using
non-interrupt based code.

I think you're vastly underestimating the difficulty of writing
interrupt-based code that doesn't include race conditions.  Consider
that Parrot itself has given up on trying to do this: internally,
interrupts (signals) will simply result in an event being added to
a queue for later processing.
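
The pattern I mean--the only one I'd call reliably safe--looks roughly
like this in C (untested sketch):

  #include <signal.h>
  #include <stdio.h>
  #include <unistd.h>

  static volatile sig_atomic_t got_sigint = 0;

  static void on_sigint(int sig) {
      (void)sig;
      got_sigint = 1;              /* the only thing the handler does */
  }

  int main(void) {
      signal(SIGINT, on_sigint);
      for (;;) {
          /* ... one op, or one pass of the event loop ... */
          sleep(1);
          if (got_sigint) {        /* checked synchronously, at a safe point */
              got_sigint = 0;
              printf("got SIGINT; handling it now\n");
          }
      }
  }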

- Damien


Re: Events

2003-07-19 Thread Damien Neil
On Fri, Jul 18, 2003 at 05:58:41PM -0400, Uri Guttman wrote:
 and event loop i/o doesn't usually support async file i/o. many people
 conflate the two. event i/o handles sockets, pipes and such but none
 support files. the issue is that the async file i/o api is so different
 (or non-existant) on all the platforms. dan wants to make clean async
 file i/o in the core by using a blocking thread on each i/o request and
 synchronizing them with this event queue in the main thread. it has the
 advantage of being easier to code and should work on most platforms
 which have threads. and he wants async file i/o in parrot core since it
 is faster and has to be in the core to be properly supported at higher
 levels.

Right, there are two independent issues here: Support for asynchronous
IO (an OS feature distinct from non-blocking IO), and VM-level support
for interrupts in Parrot.  The latter is what I am questioning.

It would be entirely possible for Parrot (or a Parrot library) to
use AIO at a low level, without introducing interrupts to the VM layer.

The fact that most event loops do not support async file IO on Unix
systems is due to a combination of deficiencies in the Unix APIs
(select() and poll() don't work on files), and a lack of implementation
in the event library.  There is certainly no reason a traditional
event loop (such as Tcl's, which is an excellent example of a
well-done event system) can't use AIO at a low level to support
async file IO on Unix.

(I specifically refer to Unix above because many non-Unix systems
have perfectly good support for monitoring files in their equivalents
of select().  So do some Unixes, for that matter.)
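
To be concrete about what "use AIO at a low level" could mean: an
untested sketch in which the event loop submits a POSIX aio_read() and
simply polls aio_error() for completion between select() passes (a real
implementation would use a proper completion notification rather than
polling):

  #include <aio.h>
  #include <errno.h>
  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/select.h>

  int main(void) {
      static char buf[4096];
      struct aiocb cb;
      int fd = open("/etc/hosts", O_RDONLY);      /* any file will do */
      if (fd < 0)
          return 1;

      memset(&cb, 0, sizeof cb);
      cb.aio_fildes = fd;
      cb.aio_buf    = buf;
      cb.aio_nbytes = sizeof buf;
      aio_read(&cb);                              /* kick off the file read */

      for (;;) {
          struct timeval tv = { 0, 100000 };
          /* select() on sockets, pipes, terminals would go here ... */
          select(0, NULL, NULL, NULL, &tv);

          if (aio_error(&cb) != EINPROGRESS) {    /* file read finished */
              printf("file read complete: %ld bytes\n",
                     (long)aio_return(&cb));
              break;
          }
      }
      return 0;
  }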

Regarding AIO being faster: Beware premature optimization.  And are
you referring to OS-level AIO (which often does have performance
advantages), or application-level AIO using a collection of threads
as you describe above (which I suspect will be slower than
single-threaded non-blocking IO, owing to synchronization costs
between threads)?


 that is a major win since no other system i (or dan) has heard of has
 portable async file i/o. and it will be integrated into the core event
 handling so you will be able to mix and match async socket, terminal (on
 unix at least) and file i/o with timers. this is what i want. :)

Do you want to use interrupt-based IO at the VM level, or do you
want an event system which will function cleanly on sockets,
terminals, and files?

- Damien


Re: Events

2003-07-18 Thread Damien Neil
On Thu, Jul 17, 2003 at 12:58:12PM -0400, Dan Sugalski wrote:
 The first is done in the case of readw or writew, for example. The 
 second for event-driven programs that set up callbacks and park 
 themselves forever in one big ProcessEvent call. (Tk programs come to 
 mind) The third is to be scattered in generated code to make sure 
 that events are occasionally checked for, and will probably drain 
 only high-priority events and at most a single low-priority event. 

While it's possibly politically impossible (many people are very
attached to Unix signals), I'd really rather work with a system
that does async event dispatch exclusively through threads.
Interrupting a thread in the middle of its execution and sending
it haring off to an interrupt handler is not only clumsy and difficult
to implement, it's a recipe for buggy code.

Much nicer would be if events were always dispatched in one of two ways:
  - Synchronously, by calling a GetNextEvent or ProcessEvent function.
  - Asynchronously, by spawning a new thread and executing the signal
handler within it.

Is there any hope for rethinking the desire to expose the ugliness of
Unix signals in the Parrot VM?

- Damien


Re: Events

2003-07-18 Thread Damien Neil
On Fri, Jul 18, 2003 at 11:29:27AM -0400, Dan Sugalski wrote:
 Nope, that won't work. A lot of what's done is really main thread 
 with interrupts and that doesn't map well to doing all signal 
 handling in separate threads. You really do want to pause the main 
 program to handle the events that are coming in, if they're events of 
 sufficient importance. Generally I put them in three classes--hard 
 interrupts (signals), soft interrupts (IO completion stuff), and 
 events (fuzzy user-level stuff). Hard and soft interrupts should get 
 dealt with as soon as possible, events should probably wait until 
 something explicitly decides to process an event.

In my experience, interrupt handlers in Perl code generally fall
into three categories: Ones that set a flag to be checked later,
ones that perform an action and terminate the program, and buggy
ones subject to race conditions.

IO completion events in particular should not be handled by
interrupting the main execution thread.  The appropriate action
required to handle these events will almost invariably require
access to data structures shared between the interrupt handler
and the main thread.  If you place the interrupt handler in the
main thread, you can't use locks to control access to these
structures (as the handler will wait on the main thread's lock,
while the main thread will block on the handler returning).
This leads to Unix-style signal masks, where interrupts are
blocked during critical sections.  While this works, I strongly
feel that a platform with thread support is better off dispatching
interrupts to a separate thread and using the existing interthread
synchronization mechanisms, rather than introducing a separate
interrupt masking system.
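
Concretely, the alternative is the standard POSIX pattern of blocking the
signals in every thread and letting one dedicated thread collect them with
sigwait(), so they become ordinary synchronous events.  An untested
sketch:

  #include <pthread.h>
  #include <signal.h>
  #include <stdio.h>

  static void *signal_thread(void *arg) {
      sigset_t *set = arg;
      int sig;
      for (;;) {
          sigwait(set, &sig);      /* delivered synchronously, here only */
          printf("signal %d: take the usual locks and handle it\n", sig);
      }
      return NULL;
  }

  int main(void) {
      sigset_t set;
      pthread_t tid;

      sigemptyset(&set);
      sigaddset(&set, SIGINT);
      sigaddset(&set, SIGTERM);
      pthread_sigmask(SIG_BLOCK, &set, NULL);     /* inherited by new threads */

      pthread_create(&tid, NULL, signal_thread, &set);

      /* ... the main thread runs on, never interrupted mid-operation ... */
      pthread_join(tid, NULL);
      return 0;
  }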

Also, given that asynchronous IO is a fairly unpopular programming
technique these days (non-blocking event-loop IO and blocking
threaded IO are far more common), I would think long and hard before
placing support for it as a core design goal of the VM.  If there
is a compelling reason to use AIO, the implementation may better
be handled at a lower level than Parrot; even if Parrot itself does
not support AIO at the opcode level, Parrot programs could still
use it by calling down to a C library.


 It's not just signals, there's a lot of stuff that falls into this 
 category. We've got to deal with it, and deal with it properly, since 
 not dealing with it gets you an 80% solution.

Outside of signals and AIO, what requires async event dispatch?
User events, as you pointed out above, are better handled through
an explicit request for the next event.

  - Damien


Re: Streams vs. Descriptors

2002-07-16 Thread Damien Neil

On Mon, Jul 15, 2002 at 08:59:40PM -0400, Melvin Smith wrote:
 True async IO implementations allow other things besides just notifying
 the process when data is available. Things like predictive seeks, or
 bundling up multiple read/writes, etc. aren't doable with select/poll loops.
 And the aioread/aiowrite/listio, etc. are a POSIX standard now, so they
 should be reasonably available on most UNIXen.

I'm not familiar with predictive seeks, and a quick google didn't
turn up anything relevant; can you give a quick explanation?

Bundling reads and writes sounds like a job for a buffered I/O layer.

Are the aio* calls available on Windows?  On the Macintosh?  (My OS X
system doesn't have a manpage for aioread, and man -k aio doesn't
turn up anything obvious.)  How about PalmOS?  While the POSIX standard
is a help, I think async I/O remains far less portable than the more
traditional alternatives.


 You are right, though, I blurred the concepts. Callbacks are good to have
 as well, for calling code blocks when data is available, and this might be
 done as an event loop, or a thread. However, the talks I've had with Dan
 always ended up in us deciding that calling an event loop between every op,
 or even every N ops wasn't what we wanted to do.

Certainly, calling an event loop between every op would be insane.
That's not the normal way of using one, however.  Consider the
(excellent) Tcl event loop as an example: When a condition triggers
the loop, it invokes the appropriate callback which runs to completion
before returning control to the loop.
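
For reference, the shape of such a loop, stripped to its essentials
(untested sketch, names invented):

  #include <sys/select.h>

  typedef void (*callback_t)(int fd, void *data);

  struct watcher { int fd; callback_t cb; void *data; };

  void event_loop(struct watcher *w, int nwatchers) {
      for (;;) {
          fd_set readable;
          int i, maxfd = -1;

          FD_ZERO(&readable);
          for (i = 0; i < nwatchers; i++) {
              FD_SET(w[i].fd, &readable);
              if (w[i].fd > maxfd)
                  maxfd = w[i].fd;
          }
          if (select(maxfd + 1, &readable, NULL, NULL, NULL) <= 0)
              continue;

          for (i = 0; i < nwatchers; i++)
              if (FD_ISSET(w[i].fd, &readable))
                  w[i].cb(w[i].fd, w[i].data);    /* runs to completion */
      }
  }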

This doesn't allow an event to interrupt the current thread of
control, of course.  The most common way of having multiple
concurrent threads is, however, exactly that--threads.  Threads
can be used independently (the Java approach; all I/O is blocking)
or in conjunction with an event loop (the Macintosh OS X event loop
takes this approach).

I really recommend taking a look at the Tcl event loop and I/O system,
if you haven't already.  It's a joy to work with, and one of the best
features of that language.


 For many things, synchronous IO is adequate, and faster, but for people
 that really want the aio interface, I'm not sure it is worth trying to
 fake it.

I'm sure that there are things async I/O is very good at, but I'm
not certain it makes sense to design Parrot's I/O system around them.
Might it not make more sense for async I/O to be available via an
alternate API?

  - Damien



Re: Streams vs. Descriptors

2002-07-16 Thread Damien Neil

On Tue, Jul 16, 2002 at 11:35:10AM -0700, John Porter wrote:
 Damien Neil wrote:
  I'm not familiar with predictive seeks,
  can you give a quick explanation?
 
 It's very much like predictive loading of the instruction cache
 in a cpu.  It makes a heuristic guess that since you just read
 1000 bytes in order, you're probably going to want to read the
 next 1000 bytes in order, so it reads them in even before you
 ask for them.  This can be extended to seeks in general.
 However, prediction is usually too strong a term.
 It's usually just pre-reading of the linear stream[1].
 (The program is a lazy consumer. :-)

Ah, that I'm familiar with.  Surely that isn't specific to async
I/O?  I'm fairly certain that many OSs will do readahead on ordinary
read() calls.


 In the end, there should be nothing of which it can be said,
 "It is easier to do in Tcl than in Perl." [2]

Hear, hear!  :)

I've been missing Tcl's event loop for years.

   - Damien



Re: PARROT QUESTIONS: Use the source, Luke

2002-07-15 Thread Damien Neil

On Mon, Jul 15, 2002 at 12:34:52AM -0400, Melvin Smith wrote:
 The last four are reserved by various C and C++ standards.
 
 I always hear this, but in real life it is never much of a problem.
 Especially with a namespace like [Parrot]

It is a good idea to avoid using the reserved identifier space, not
only because it avoids conflicts with vendor libraries, but for
documentation purposes.  The leading underscore means "system internal,
do not touch"; blurring this meaning doesn't help.

It's also unnecessary.  It isn't like there aren't perfectly good
alternatives--what's wrong with Parrot__?

  - Damien



Re: Streams vs. Descriptors

2002-07-15 Thread Damien Neil

On Mon, Jul 15, 2002 at 12:16:29AM -0400, Melvin Smith wrote:
 1) Async support. The IO system needs to be asynchronous and re-entrant
 at the core, whether by threads or by use of the platform's async support.
 Other things like callbacks assume other features of Parrot to be finished,
 like subs/methods.

Out of curiosity, what's the motivation for supporting true signal-driven
async IO (which is what you seem to be referring to)?  In my experience,
nonblocking IO and a standard event loop is more than sufficient, and
far easier to implement--especially portably.

 - Damien



Re: [CONFIGURE] New make.pl coming soon...

2002-04-24 Thread Damien Neil

On Wednesday, April 24, 2002, at 04:04 PM, Robert Spier wrote:
 One of the keys of the system Jeff has implemented is that it's 100%
 real perl code and real perl objects, not a language parsed with
 perl.  This means you can do nifty things and write perl code to
 modify things in a natural way.

This is true for cons as well.

- Damien




0.0.2 needs what?

2001-09-25 Thread Damien Neil

Are there any issues holding up 0.0.2 that I (or others) could work
on?  Failing that, what areas of Parrot are most in need of immediate
work?

I'm interested in looking at the bytecode loader, if nobody else
has intentions there.  In particular, I'd like to see if we can
get empirical data to justify some of the design decisions that
are being assumed.  Exactly how expensive, for example, would it
be to use a single bytecode format with platform-independent
encodings?

  - Damien



Re: Strings db

2001-09-25 Thread Damien Neil

On Tue, Sep 25, 2001 at 07:29:01PM -0700, Wizard wrote:
 Actually, the thing that I didn't like was using an actual string as the
 message_id. I would have expected something more in the way of:
 
 char *err = get_text_string( THREAD_EXCEPTION_117, \
   "THREAD EXCEPTION: Not enough handles." );

This is a far more error-prone interface in a number of ways:  It
is very easy for the mapping between the number and the string to
be lost.  Adding and removing strings is harder: the string list
will become filled with holes (or must be renumbered), and the
numeric order of the strings will probably not correspond with the
logical order.  Numerically indexes are far more prone to failure
when using out-of-date catalog files, while string-indexed ones
will mostly continue to work.  (With the obvious exception that
messages not contained in the old catalog cannot be displayed from
it.)

All these disadvantages are a significant penalty to pay for a very
minor improvement in efficiency.  (If there is one thing that Perl
has demonstrated, it is that looking up a string in a hash is fast.)

- Damien



Re: 0.0.2 needs what?

2001-09-25 Thread Damien Neil

On Tue, Sep 25, 2001 at 07:36:31PM -0400, Gregor N. Purdy wrote:
 I'm currently working on some assigned tasks for the bytecode stuff
 for 0.0.2. I need to get it to the point where we can stash NVs in
 the const_table. I've already got the interpreter using packfile.[hc]
 for its work (I posted a patch earlier today).

After taking a look at the packfile code, I think the interface
needs to be made more generic.  I don't believe the file format
should be aware of the nature of the contents.  For example, rather
than having functions to access the constant table, the fixup table,
and the bytecode, I would rather see a single set of functions
which take a section ID as a parameter.
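
Roughly the kind of interface I mean--an untested sketch in which every
name is invented, not the current packfile API:

  #include <stddef.h>

  typedef struct {
      int     id;               /* e.g. SECTION_CONSTANTS, SECTION_FIXUPS */
      size_t  size;
      void   *data;
  } PackSection;

  typedef struct {
      int          nsections;
      PackSection *sections;
  } PackFileImage;

  /* One accessor for everything, instead of one per table. */
  PackSection *packfile_find_section(PackFileImage *pf, int id) {
      int i;
      for (i = 0; i < pf->nsections; i++)
          if (pf->sections[i].id == id)
              return &pf->sections[i];
      return NULL;
  }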

I also feel that the prior discussions on using a preexisting file
format were on the right track.  With a good API, however, the file
format can be completely redefined, so this is a less pressing
concern.  (I also still think that IFF fits our needs quite closely,
although its support for data structure nesting may be more than we
want.)

   - Damien



Re: 0.0.2 needs what?

2001-09-25 Thread Damien Neil

On Wed, Sep 26, 2001 at 12:38:28AM +0100, Simon Cozens wrote:
 But then I'm one of those weird critters who doesn't understand what all the
 complaining over XS is about. :) I'd be happy to do the XS coding if it came
 down to it.

I'll take a look at making the assembler and disassembler use the
C packfile routines through XS.  As I just mentioned in a previous
mail, however, I'm not very happy with the current packfile API...
should I go ahead and use the existing one (temporarily, I hope :),
or is this section not covered by the current feature freeze?

   - Damien



Re: Draft switch for DO_OP() :-)

2001-09-22 Thread Damien Neil

On Thu, Sep 20, 2001 at 11:11:42AM -0400, Dan Sugalski wrote:
 Actually the ops=>C conversion was conceived to do exactly what's being 
 done now--to abstract out the body of the opcodes so that they could be 
 turned into a switch, or turned into generated machine code, or TIL'd. If 
 you're finding that this isn't working well it's a sign we need to change 
 things some so they will. (Better now than in six months...)

The problem is that the conversion currently done by process_opcodes.pl
translates the op definitions into functions, and leaves the remainder
of the file untouched.  This is useful, because it allows the opcode
file to include headers, declare file-scope variables, and the like.
Unfortunately, when translating the ops into a switch statement in a
header file, there is no place to put this non-opcode code.

There are a few approaches we can take.  The simplest, I think, is to
ignore the problem when generating inline ops; given that these ops
are going to be compiled at Perl build time (they can never be
dynamically loaded for obvious reasons), we can manually put any
required #includes in interpreter.c.  Files containing dynamically-
loaded ops can be generated in the same way that process_opcodes.pl
does now, preserving the file-scope code.

Another approach would be to include a means of defining information
that must be included by the file implementing the ops.  For example:

  HEADER {
  #include <math.h>
  }

This would then be placed into interp_guts.h.  (Possibly surrounded
by a conditional guard (#ifdef PARROT_OP_IMPLEMENTATION), so no file  
other than interpreter.h will pick up that code.)
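
Roughly what the generated interp_guts.h would then contain (sketch only;
the guard macro name is just the example above):

  #ifdef PARROT_OP_IMPLEMENTATION
  #  include <math.h>            /* emitted from the ops file's HEADER block */
  #endif

  /* ... followed by the generated switch body for the ops ... */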

- Damien



Re: niave question about Parrot::Opcode

2001-09-20 Thread Damien Neil

On Wed, Sep 19, 2001 at 01:40:31PM -0400, Pat Eyler wrote:
 I realize that the $count inside the if block shown masks the $count
 declared outside the while loop, but (to me) this would be easier to
 understand if the inner $count were changed to $numParams -- it is more
 obvious on casual reading that $count and $count are two different
 things.  Am I missing something?

No, you aren't.  That IS confusing.


 2) It also appears that a second (older?) version of read_ops and an
 associated pile of pod is still in the Opcode.pm file; can this be
 trimmed (removing about 80 lines from the file)?

Where on earth did that come from?

Patch attached to rename the second $count, and to remove the
duplicate code.

  - Damien


Index: Parrot/Opcode.pm
===
RCS file: /home/perlcvs/parrot/Parrot/Opcode.pm,v
retrieving revision 1.6
diff -u -r1.6 Opcode.pm
--- Parrot/Opcode.pm2001/09/18 00:32:15 1.6
+++ Parrot/Opcode.pm2001/09/20 07:23:44
@@ -28,9 +28,9 @@
 
my($name, @params) = split /\s+/;
	if (@params && $params[0] =~ /^\d+$/) {
-   my $count = shift @params;
+   my $nparams = shift @params;
	die "$file, line $.: opcode $name parameters don't match count\n"
- if ($count != @params);
+ if ($nparams != @params);
}
 
	warn "$file, line $.: opcode $name redefined\n" if $opcode{$name};
@@ -108,91 +108,5 @@
 
 The fingerprint() function returns the MD5 signature (in hex) of the
 opcode table.
-
-=cut
-package Parrot::Opcode;
-
-use strict;
-use Symbol;
-
-sub read_ops {
-my $file = @_ ? shift : "opcode_table";
-
-my $fh = gensym;
-open $fh, $file or die "$file: $!\n";
-
-my %opcode;
-my $count = 1;
-while (<$fh>) {
-   s/#.*//;
-   s/^\s+//;
-   chomp;
-   next unless $_;
-
-   my($name, @params) = split /\s+/;
-   if (@params && $params[0] =~ /^\d+$/) {
-   my $count = shift @params;
-   die "$file, line $.: opcode $name parameters don't match count\n"
- if ($count != @params);
-   }
-
-   warn "$file, line $.: opcode $name redefined\n" if $opcode{$name};
-
-   $opcode{$name}{ARGS}  = @params;
-   $opcode{$name}{TYPES} = \@params;
-   $opcode{$name}{CODE}  = ($name eq "end") ? 0 : $count++;
-   $opcode{$name}{FUNC}  = "Parrot_op_$name";
-
-   my $num_i = () = grep {/i/} @params;
-   my $num_n = () = grep {/n/} @params;
-   $opcode{$name}{RETURN_OFFSET} = 1 + $num_i + $num_n * 2;
-}
-
-return %opcode;
-}
-
-1;
-
-
-__END__
-
-=head1 NAME
-
-Parrot::Opcode - Read opcode definitions
-
-=head1 SYNOPSIS
-
-  use Parrot::Opcode;
-
-  %opcodes = Parrot::Opcode::read_ops();
-
-=head1 DESCRIPTION
-
-The read_ops() function parses the Parrot opcode_table file, and
-returns the contents as a hash.  The hash key is the opcode name;
-values are hashrefs containing the following fields:
-
-=over
-
-=item CODE
-
-The opcode number.
-
-=item ARGS
-
-The opcode argument count.
-
-=item TYPES
-
-The opcode argument types, as an arrayref.
-
-=item FUNC
-
-The name of the C function implementing this op.
-
-=back
-
-read_ops() takes an optional argument: the file to read the opcode table
-from.
 
 =cut



Re: [PATCH] Changes to interpreter op table and simplified DO_OP

2001-09-20 Thread Damien Neil

Oops; that'll teach me to submit things before a cvs update.  The
generate.pl I just sent is out-of-date with regards to CVS.  Attached
is an updated version.

(I haven't seen my prior mail go through yet; I'm guessing this is
the list being slow, but it might be a problem with my local mail
system.  Just in case, I'm sending this from a different machine.  If
this arrives without my prior message, that means my mail system is
screwy. :)

  - Damien

 generate.pl


Re: question about branching/returning

2001-09-20 Thread Damien Neil

On Wed, Sep 19, 2001 at 10:32:18PM -0700, Dave Storrs wrote:
 Ok, that was pretty much what I thought.  But then what is the 'end'
 opcode for?  It does a 'RETURN 0', which would increment the PC by 0
 opcodes...which either counts as an infinite loop or a no-op, and we've
 already got a no-op op.

RETURN(0) is special-cased by process_opcodes(); it returns a literal 0,
not a relative address.  As other people have noted, this is irrelevant,
as "end" is never called.

  - Damien



Re: Tru64

2001-09-20 Thread Damien Neil

On Thu, Sep 20, 2001 at 09:06:12AM -0500, Gibbs Tanton - tgibbs wrote:
 Damien, is there any way we could get a similar fix for number.t?  That
 would make us at 100% on Tru64.

(Apologies if this shows up twice; something appears to be screwy with
my mail system.)

I'm currently getting segfaults on all tests on Tru64; I'll look into
it if I get a chance, but I may not have time for a few days.  (I'm
flying to Connecticut for a friend's wedding tomorrow morning.)

I didn't think there were any tests in number.t which would be
particularly architecture-dependent...which ones are failing for
you, and what output are they producing?

  - Damien



Re: Name lengths in C code

2001-09-20 Thread Damien Neil

On Thu, Sep 20, 2001 at 05:09:52PM -0400, Dan Sugalski wrote:
 Just a reminder--function names shouldn't exceed 31 characters. The C 
 standard doesn't guarantee anything past that...

You think that's bad?  You aren't guaranteed more than six characters,
case-insensitive for external identifiers.

I've been told that Oracle actually requires conformance to this
in their coding standards.  I'm very happy that I don't have to
write code for Oracle...

 - Damien



_read = read

2001-09-20 Thread Damien Neil

test_main.c still seems to contain a call to _read(), rather than
read().  This breaks compilation under Tru64 for me; the attached
patch removes the _.

- Damien


Index: test_main.c
===
RCS file: /home/perlcvs/parrot/test_main.c,v
retrieving revision 1.11
diff -u -r1.11 test_main.c
--- test_main.c 2001/09/18 21:03:27 1.11
+++ test_main.c 2001/09/20 21:17:44
@@ -94,7 +94,7 @@
 
 #ifndef HAS_HEADER_SYSMMAN
 program_code = (opcode_t*)mem_sys_allocate(program_size);
-_read(fd, (void*)program_code, program_size);
+read(fd, (void*)program_code, program_size);
 #else
 program_code = (opcode_t*)mmap(0, program_size, PROT_READ, MAP_SHARED, fd, 0);
 #endif



Re: Tru64

2001-09-20 Thread Damien Neil

On Thu, Sep 20, 2001 at 09:06:12AM -0500, Gibbs Tanton - tgibbs wrote:
 Failed 1/5 test scripts, 80.00% okay. 7/74 subtests failed, 90.54% okay.
 make: *** [test] Error 2
 
 Damien, is there any way we could get a similar fix for number.t?  That
 would make us at 100% on Tru64.

I'm currently getting segfaults on all tests on Tru64; I'll look into
it if I get a chance, but I may not have time for a few days.  (I'm
flying to Connecticut for a friend's wedding tomorrow morning.)

  - Damien



Some tests

2001-09-18 Thread Damien Neil

The attached file contains tests for all Parrot integer ops.

   - Damien


#! perl -w

use Parrot::Test tests => 26;

output_is(<<CODE, <<OUTPUT, "set_i_ic");
# XXX: Need a test for writing outside the set of available
# registers.  Parrot doesn't check for this at the moment.
set I0, 0x12345678
print   I0
print   \\n
set I31, 0x9abcdef1
print   I31
print   \\n

set I1, 2147483647
print   I1
print   \\n
set I2, -2147483648
print   I2
print   \\n
set I3, 4294967295
print   I3
print   \\n
CODE
305419896
-1698898191
2147483647
-2147483648
-1
OUTPUT

output_is(<<CODE, <<OUTPUT, "set_i");
set I0, 0x77665544
set I1, I0
print   I1
print   \\n
CODE
2003195204
OUTPUT

output_is(<<CODE, <<OUTPUT, "add_i");
set I0, 0x11223344
add I1, I0, I0
print   I1
print   \\n

add I2, I0, I1  
print   I2
print   \\n

add I2, I2, I2
print   I2
print   \\n

set I3, 2147483647
set I4, 1
add I5, I3, I4
print   I5
print   \\n
set I6, -1
add I7, I5, I6
print   I7
print   \\n
CODE
574908040
862362060
1724724120
-2147483648
2147483647
OUTPUT

output_is(<<CODE, <<OUTPUT, "sub_i");
set I0, 0x12345678
set I1, 0x01234567
sub I2, I0, I1
print   I2
print   \\n
CODE
286331153
OUTPUT

output_is(<<CODE, <<OUTPUT, "mul_i");
set I0, 7
set I1, 29
mul I2, I0, I1
print   I2
print   \\n
CODE
203
OUTPUT

output_is(<<CODE, <<OUTPUT, "div_i");
set I0, 0x
set I1, 0x
div I2, I0, I1
print   I2
print   \\n

set I0, 11
set I1, 2
div I2, I0, I1
print   I2
print   \\n

set I0, 9
set I1, -4
div I2, I0, I1
print   I2
print   \\n
CODE
3
5
-2
OUTPUT

output_is(<<CODE, <<OUTPUT, "mod_i");
set I0, 17
set I1, 5
mod I2, I0, I1
print   I2
print   \\n

set I0, -57
set I1, 10
mod I2, I0, I1
print   I2
print   \\n
CODE
2
-7
OUTPUT

output_is(<<CODE, <<OUTPUT, "eq_i_ic");
set I0, 0x12345678
set I1, 0x12345678
set I2, 0x76543210

eq  I0, I1, ONE, ERROR
print   bad\\n

ONE:
print   ok 1\\n
eq  I1, I2, ERROR, TWO
print   bad\\n

TWO:
print   ok 2\\n
end

ERROR:
print   bad\\n
CODE
ok 1
ok 2
OUTPUT

output_is(<<CODE, <<OUTPUT, "eq_ic_ic");
set I0, -42

eq  I0, 42, ERROR, ONE
print   bad\\n

ONE:
print   ok 1\\n
eq  I0, -42, TWO, ERROR
print   bad\\n

TWO:
print   ok 2\\n
end

ERROR:
print   bad\\n
CODE
ok 1
ok 2
OUTPUT

output_is(<<CODE, <<OUTPUT, "ne_i_ic");
set I0, 0xa0b0c0d0
set I1, 0xa0b0c0d0
set I2, 0

ne  I0, I2, ONE, ERROR
print   bad\\n

ONE:
print   ok 1\\n
ne  I0, I1, ERROR, TWO
print   bad\\n

TWO:
print   ok 2\\n
end

ERROR:
print   bad\\n
CODE
ok 1
ok 2
OUTPUT

output_is(<<CODE, <<OUTPUT, "ne_ic_ic");
set I0, 427034409

ne  I0, 427034409, ERROR, ONE
print   bad\\n

ONE:
print   ok 1\\n
ne  I0, 427034408, TWO, ERROR
print   bad\\n

TWO:
print   ok 2\\n
end

ERROR:
print   bad\\n
CODE
ok 1
ok 2
OUTPUT

output_is(<<CODE, <<OUTPUT, "lt_i_ic");
set I0, 2147483647
set I1, -2147483648
set I2, 0
set I3, 0

lt  I1, I0, ONE, ERROR
print   bad\\n

ONE:
print   ok 1\\n
lt  I0, I1, ERROR, TWO
print   bad\\n

TWO:
print   ok 2\\n
lt  I2, I3, ERROR, THREE
print   bad\\n

THREE:
print   ok 3\\n
end

ERROR:
print bad\\n
CODE
ok 1
ok 2
ok 3
OUTPUT

output_is(<<CODE, <<OUTPUT, "lt_ic_ic");
set I0, 2147483647
set I1, -2147483648
set I2, 0

lt  I0, -2147483648, ERROR, ONE
print   bad\\n

ONE:
print   ok 1\\n
lt  I1, 2147483647, TWO, ERROR
print   bad\\n

TWO:
print   ok 2\\n
lt  I0, 0, ERROR, THREE
print   bad\\n

THREE:
print   ok 3\\n
end

ERROR:
print   bad\\n
CODE
ok 1
ok 2
ok 3
OUTPUT

output_is(<<CODE, <<OUTPUT, "le_i_ic");
set I0, 2147483647
set I1, -2147483648
set I2, 0
set I3, 0

le  I1, I0, ONE, ERROR
print   bad\\n

ONE:
print   ok 1\\n
le  

Re: A task for the interested

2001-09-18 Thread Damien Neil

On Tue, Sep 18, 2001 at 03:55:23PM -0400, Dan Sugalski wrote:
 Anyone care to take a shot at it? Having an extra overridable column in 
 the opcode_table file (so we know which opcodes are overridable, and thus 
 can't be in the switch) would be a good thing while you were at it...

I will do this tonight, if nobody else gets to it first.

 - Damien



Bytecode safety

2001-09-18 Thread Damien Neil

Proposed: Parrot should never crash due to malformed bytecode.  When
choosing between execution speed and bytecode safety, safety should
always win.  Careful op design and possibly a validation pass before
execution will hopefully keep the speed penalty to a minimum.

Yes, no?
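
For concreteness, the kind of validation pass I have in mind--an untested
sketch with toy opcode numbers (a real pass would also have to check
branch targets and constant-table references):

  #include <stddef.h>

  typedef long opcode_t;

  #define MAX_OPCODE 4                     /* toy numbers, illustration only */
  static const int op_arg_count[MAX_OPCODE] = {
      0,    /* end   */
      2,    /* set   */
      3,    /* add   */
      1,    /* print */
  };

  int validate_bytecode(const opcode_t *code, size_t len) {
      size_t pc = 0;
      while (pc < len) {
          opcode_t op = code[pc];
          if (op < 0 || op >= MAX_OPCODE)
              return 0;                            /* unknown op: reject */
          if (pc + 1 + op_arg_count[op] > len)
              return 0;                            /* args run off the end */
          pc += 1 + op_arg_count[op];
      }
      return 1;                                    /* safe to hand to DO_OP */
  }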

   - Damien



Re: Bytecode safety

2001-09-18 Thread Damien Neil

On Tue, Sep 18, 2001 at 10:40:30PM +0100, Simon Cozens wrote:
 On Tue, Sep 18, 2001 at 02:37:43PM -0700, Damien Neil wrote:
  Proposed: Parrot should never crash due to malformed bytecode.
 
 Haven't we done this argument? :)

Sort of, while talking about other things.  I wanted to drag it out to
stand on its own. :)

   - Damien



Number tests

2001-09-18 Thread Damien Neil

...and here are tests for the number ops.

  - Damien


#! perl -w

use Parrot::Test tests => 23;

output_is(<<CODE, <<OUTPUT, "set_n_nc");
set N0, 1.0
set N1, 4.0
set N2, 16.0
set N3, 64.0
set N4, 256.0
set N5, 1024.0
set N6, 4096.0
set N7, 16384.0
set N8, 65536.0
set N9, 262144.0
set N10, 1048576.0
set N11, 4194304.0
set N12, 16777216.0
set N13, 67108864.0
set N14, 268435456.0
set N15, 1073741824.0
set N16, 4294967296.0
set N17, 17179869184.0
set N18, 68719476736.0
set N19, 274877906944.0
set N20, 1099511627776.0
set N21, 4398046511104.0
set N22, 17592186044416.0
set N23, 70368744177664.0
set N24, 281474976710656.0
set N25, 1.12589990684262e+15
set N26, 4.5035996273705e+15
set N27, 1.8014398509482e+16
set N28, 7.20575940379279e+16
set N29, 2.88230376151712e+17
set N30, 1.15292150460685e+18
set N31, 4.61168601842739e+18

print   N0
print   \\n
print   N1
print   \\n
print   N2
print   \\n
print   N3
print   \\n
print   N4
print   \\n
print   N5
print   \\n
print   N6
print   \\n
print   N7
print   \\n
print   N8
print   \\n
print   N9
print   \\n
print   N10
print   \\n
print   N11
print   \\n
print   N12
print   \\n
print   N13
print   \\n
print   N14
print   \\n
print   N15
print   \\n
print   N16
print   \\n
print   N17
print   \\n
print   N18
print   \\n
print   N19
print   \\n
print   N20
print   \\n
print   N21
print   \\n
print   N22
print   \\n
print   N23
print   \\n
print   N24
print   \\n
print   N25
print   \\n
print   N26
print   \\n
print   N27
print   \\n
print   N28
print   \\n
print   N29
print   \\n
print   N30
print   \\n
print   N31
print   \\n
CODE
1.00
4.00
16.00
64.00
256.00
1024.00
4096.00
16384.00
65536.00
262144.00
1048576.00
4194304.00
16777216.00
67108864.00
268435456.00
1073741824.00
4294967296.00
17179869184.00
68719476736.00
274877906944.00
1099511627776.00
4398046511104.00
17592186044416.00
70368744177664.00
281474976710656.00
1125899906842620.00
4503599627370500.00
18014398509482000.00
72057594037927904.00
288230376151712000.00
1152921504606850048.00
4611686018427389952.00
OUTPUT

output_is(<<CODE, <<OUTPUT, "add_n");
set N0, 1.0
add N1, N0, N0
print   N1
print   \\n

add N2, N0, N1
print   N2
print   \\n

add N2, N2, N2
print   N2
print   \\n
CODE
2.00
3.00
6.00
OUTPUT

output_is(<<CODE, <<OUTPUT, "sub_i");
set N0, 424242.0
set N1, 4200.0
sub N2, N0, N1
print   N2
print   \\n
CODE
420042.00
OUTPUT

output_is(<<CODE, <<OUTPUT, "mul_i");
set N0, 2.0
mul N1, N0, N0
mul N1, N1, N0
mul N1, N1, N0
mul N1, N1, N0
mul N1, N1, N0
mul N1, N1, N0
mul N1, N1, N0
print   N1
print   \\n
CODE
256.00
OUTPUT

output_is(<<CODE, <<OUTPUT, "div_i");
set N0, 10.0
set N1, 2.0
div N2, N0, N1
print   N2
print   \\n

set N3, 7.0
set N4, 2.0
div N3, N3, N4
print   N3
print   \\n

set N5, 9.0
set N6, -4.0
div N7, N5, N6
print   N7
print   \\n

CODE
5.00
3.50
-2.25
OUTPUT

output_is(<<CODE, <<OUTPUT, "eq_n_ic");
set N0, 5.01
set N1, 5.01
set N2, 5.02

eq  N0, N1, ONE, ERROR
print   bad\\n

ONE:
print   ok 1\\n
eq  N1, N2, ERROR, TWO
print   bad\\n

TWO:
print   ok 2\\n
end

ERROR:
print   bad\\n
CODE
ok 1
ok 2
OUTPUT

output_is(<<CODE, <<OUTPUT, "eq_nc_ic");
set N0, 1.01

eq  N0, 1.00, ERROR, ONE
print   bad\\n

ONE:
print   ok 1\\n
eq  N0, 1.01, TWO, ERROR
print   bad\\n

TWO:
print   ok 2\\n
end

ERROR:
print   bad\\n
CODE
ok 1
ok 2
OUTPUT

output_is(CODE, 

Re: t/op/integer.t is IMHO wrong

2001-09-18 Thread Damien Neil

On Wed, Sep 19, 2001 at 12:51:43AM +0200, Mattia Barbon wrote:
 I think that expecting 4294967295 == -1 because they have the same 
 bit pattern ( on two's complement 32 bit machines ) is wrong

I was wondering how long it would take for someone to notice that. :)

If anyone feels like defining a policy on what Parrot does with
out-of-range numbers, and what happens on integer overflow, I'll
submit patches to the tests to check against it.  I'd rather we
didn't just modify the tests to never trigger overflow conditions,
however; that's just sweeping the issue under the rug.

- Damien



Re: Tests

2001-09-18 Thread Damien Neil

On Tue, Sep 18, 2001 at 06:12:48PM -0500, Gibbs Tanton - tgibbs wrote:
 All the tests are great!  But, could everyone please remember to put an
 "end" at the end of each assembly test...cygwin doesn't like it if you
 don't.  I think I've patched all the ones up to this point.

Oops.  Sorry about that; I thought I had seen a patch go through
to make the ends optional.

   - Damien



Re: naming conventions on opcodes

2001-09-18 Thread Damien Neil

On Tue, Sep 18, 2001 at 07:52:06PM -0400, Dan Sugalski wrote:
 More to the point, it needs typing exactly twice--once in the .ops file 
 that defines the opcode function body, and once in opcode_table. The 
 assembler, of course, uses the smaller name.

Three times: And once to name the test case.  :)

- Damien



Re: Difficulties

2001-09-15 Thread Damien Neil

On Sat, Sep 15, 2001 at 01:15:57AM -0700, Brent Dax wrote:
 As for the 5.6 thing...I think we're supposed to support 5.005 and
 above.  Can you tell what Parrot::Opcode needs it for?  (And if it's for
 'our', I'm going to punch someone... :^) )

Er...I think it IS for "our", actually. :)  I'm so used to using it, I
didn't realize I was introducing a 5.6ism.  The silly thing is, I
deliberately avoided using open(my $fh, $file) to keep from requiring
5.6...

I notice that someone did add a "use 5.6.0" to Parrot::Opcode--here's
a patch which removes it, and the offending "our"s.

   - Damien


Index: Parrot/Opcode.pm
===
RCS file: /home/perlcvs/parrot/Parrot/Opcode.pm,v
retrieving revision 1.3
diff -u -r1.3 Opcode.pm
--- Parrot/Opcode.pm2001/09/15 00:57:42 1.3
+++ Parrot/Opcode.pm2001/09/15 08:33:48
@@ -1,12 +1,11 @@
 package Parrot::Opcode;
 
-use 5.6.0;
 use strict;
 use Symbol;
 use Digest::MD5 qw(md5_hex);
 
-our %opcode;
-our $fingerprint;
+my %opcode;
+my $fingerprint;
 
 sub _load {
 my $file = @_ ? shift : "opcode_table";



Re: Difficulties

2001-09-15 Thread Damien Neil

On Sat, Sep 15, 2001 at 01:52:26AM -0700, Brent Dax wrote:
 use vars qw(%opcode $fingerprint);#or strict will throw a tantrum

Not necessary--the patch changes those variables to lexicals.
There wasn't any strong reason for them to be package vars.

   - Damien



Re: Half-completed parrot/parrot.h conversion?

2001-09-14 Thread Damien Neil

On Fri, Sep 14, 2001 at 11:31:20AM -0500, Gibbs Tanton - tgibbs wrote:
 The patch assumes that your source code directory is named parrot.  This may
 have been an invalid assumption, but it is going to be hard to do this patch
 unless we agree on the name of the source directory.

That may be difficult.  I occasionally like to have multiple copies of the
source directory around for testing--usually parrot.orig and parrot.  I
suspect I'm not the only one.

Having the compile in the parrot.orig directory pick up the includes from
../parrot would be surprising, to say the least.

  - Damien



Re: RFC: Bytecode file format

2001-09-14 Thread Damien Neil

On Sat, Sep 15, 2001 at 01:03:51AM +0300, Jarkko Hietaniemi wrote:
  Re: IFF.  Being an old Amiga user, I find it appealing.  Is the lack
  of a dictionary likely to be a significant problem?
 
 Please elaborate. 

IFF stores a linear series of chunks.  Each chunk has a header containing
the chunk id, and the size of the chunk.  In order to get a listing of
all chunks in an IFF file, you need to do a linear scan of the chunks.

A file format with a dictionary would contain a single section with
a list of all chunks in the file, eliminating the need to do numerous
seeks and reads to pull in the contents.
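
To make the difference concrete, here's a rough sketch of the linear
scan an IFF-style reader has to do just to list the chunks (the
filename is made up; chunk headers are a 4-byte ID plus a big-endian
32-bit length, with data padded to an even size):

    open my $fh, "<", "some.file" or die "some.file: $!";
    binmode $fh;
    while (read($fh, my $header, 8) == 8) {
        my ($id, $len) = unpack("A4 N", $header);
        print "chunk $id, $len bytes\n";
        seek($fh, $len + ($len % 2), 1) or die "seek: $!";  # skip data + pad
    }

A dictionary section up front would replace that whole loop with a
single read.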

- Damien



Re: RFC: Bytecode file format

2001-09-14 Thread Damien Neil

On Sat, Sep 15, 2001 at 12:39:39AM +0300, Jarkko Hietaniemi wrote:
  It will be hard to use one format for both native and portable.
 
 Not one format, but a set of closely related formats with well-defined
 transformations between them.

After thinking about implementing this for a bit, I'm becoming
dubious about the value of allowing any instance of Parrot to read
the native bytecode of every other Parrot out there.  Do we really
want the non-native byteloader to be capable of reading everything
from little-endian 16-bit to 64-bit mixed-endian?  What about 36-bit?
(PDP-6 port, anyone? :)

I propose two encodings per Parrot: portable and native.  Portable
is big-endian 32-bit words.  Native is, of course, whatever makes
sense for the local machine.  If you want to share bytecode between
machines, you pass the "create portable bytecode" switch to the
assembler.
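
In pack() terms (just a sketch of the idea, not a proposed format):

    my @words    = (3, 1, 0xFFFF);            # some opcode stream
    my $portable = pack("N*",  @words);       # big-endian 32-bit, same everywhere
    my $native   = pack("L!*", @words);       # whatever the local C long is

The portable form costs a byte swap on little-endian machines at load
time; the native form is a straight memory image.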

  - Damien



Re: Patch: Common opcode_table parsing

2001-09-13 Thread Damien Neil

On Thu, Sep 13, 2001 at 08:25:46AM +0100, Simon Cozens wrote:
 On Thu, Sep 13, 2001 at 12:29:18AM -0700, Damien Neil wrote:
  CVS changes over the past couple of days mean this patch will no
  longer cleanly apply.  I'd be happy to update it to patch cleanly
  against the current CVS code, but I'd like to know first if the
  approach it takes is on the right track.
 
 I like it, if only because reduction of common code is always good,
 and reduction of common code while everything's in a lot of flux
 is even better.

OK, I'll go through and update it again.  This patch takes out the
parsing of interp_guts.h, which I think is good for a variety of
reasons.  (Summary: it makes things simpler, and I don't think
parsing it will buy us anything at this point in time.)  Is this
OK, or should I put it back in?


 Urgh, urgh, urgh. I don't *like* the idea of munging opcode function
 names, but I equally don't like coredumps. Isn't there a way of 
 telling the linker to use our own symbols?

Actually, the problem is that the linker IS using our symbols. :)
There appears to be an "end" symbol somewhere in libc that is getting
munged by the Parrot symbol.  I think.  I didn't look deeply enough to
see exactly how things were going wrong, once I traced the core to a
symbol clash.

I *really* think we need to munge the names, though.  "end" is just
far too common a symbol for us to be able to pollute it.  Let's
learn the lesson from Perl 5: All symbols exported by the Parrot
code need a prefix.

  - Damien



Re: Patch: Common opcode_table parsing

2001-09-13 Thread Damien Neil

On Thu, Sep 13, 2001 at 08:44:44AM +0100, Simon Cozens wrote:
 Aiiee. Yes, I appreciate what you're saying, but the other lessons
 from Perl 5 is that if you want to do that, you end up with either
 lots of unwieldy code, or a nasty macro renaming. Which is it
 gonna be?

I don't really like the Perl 5 approach of lots of macros.  I'd
rather have a short prefix attached to all symbols.  par_foo()
rather than foo() in all cases.  For very commonly used macros (the
equivalent of PUSHi()), you might relax this rule.

Look at the current source: Is it really going to be any harder to
always type par_string_length() rather than string_length(), or
Par_Allocate_Aligned() rather than Allocate_Aligned()?  The symbol
names are long enough to begin with that an extra four characters
isn't going to make much difference.

Even a single character would do a lot to eliminate symbol clashes:
Pstring_length() and PAllocate_Aligned(), for example.

(Speaking of the above, someone authoritative may want to dictate
whether functions are Upper_Case or lower_case.)

Talking just about the opcode functions, however: Will code be
calling opcode functions directly very often?  Perhaps I'm wrong,
but I'd think that ops being called outside the runops() loop will
be rare.  In fact, if ops can be embedded into a switch statement
in runops() at compile time, there won't even be any assurance that
there ARE any op functions to call.  Unwieldy op function names
shouldn't be a problem.

 - Damien



Re: Using int32_t instead of IV for code

2001-09-13 Thread Damien Neil

On Thu, Sep 13, 2001 at 10:06:51AM +0100, Philip Kendall wrote:
 If we are going to keep on doing fancy stuff with pointer arithmetic (eg
 the Alloc_Aligned/CHUNK_BASE stuff), I think we're also going to need an
 integer type which is guaranteed to be the same width as a pointer, so
 we can freely typecast between the two.

The language lawyer in me insists that I point out that this is
inherently nonportable.  C does not guarantee that it is possible to
convert losslessly between pointers and integers; there have been
systems on which this was impossible (or hugely inefficient) for
hardware reasons.

The correct approach to storing pointers and integers in the same
value is to use a union.  Personally, I would use:

  typedef union {
  int   i;
  void *p;
  } IV;

I realize that I'm probably in a minority of one on this. :)


 Also, if we've got a system with 64 bit IVs, are the arguments to Parrot
 opcodes going to be 32 or 64 bit? If 32 bit, is there going to be any
 way of loading a 64 bit constant?

This reminds me of something I've been meaning to ask: Is Parrot byte
code intended to be network-portable?

- Damien



Re: patch: assembly listings from assembler

2001-09-13 Thread Damien Neil

On Thu, Sep 13, 2001 at 06:41:00PM -0400, Dan Sugalski wrote:
 At 01:42 PM 9/13/2001 -0700, Benjamin Stuhl wrote:
 Could we please get in the habit of adding a -c or a -u to
 our CVS diffs, just as we would with normal patches?
 
 Yes, please!
 
 All diffs posted to the list should be either -c or -u diffs. Both can be 
 fed to patch, and both read far more easily than the plain diff output.

The following lines, placed in ~/.cvsrc, make cvs work much better:

  update -dP
  diff -u

The -d option to update makes cvs check out newly-created directories;
without it, it will silently ignore them.  -P prunes empty directories,
which compensates for the fact that directories can't be deleted.

And the -u to diff (or -c) is just a good idea. :)

   - Damien



Patch: Common opcode_table parsing, take 2

2001-09-13 Thread Damien Neil

Here's an updated version of my original patch, to account for recent
changes in CVS.  As before, this includes opcode-munging to let Parrot
run on FreeBSD.

  - Damien


diff -u --new-file -r parrot.orig/Parrot/Opcode.pm parrot/Parrot/Opcode.pm
--- parrot.orig/Parrot/Opcode.pm    Wed Dec 31 16:00:00 1969
+++ parrot/Parrot/Opcode.pm Mon Sep 10 23:52:35 2001
@@ -0,0 +1,86 @@
+package Parrot::Opcode;
+
+use strict;
+use Symbol;
+
+sub read_ops {
+my $file = @_ ? shift : "opcode_table";
+
+my $fh = gensym;
+open $fh, $file or die "$file: $!\n";
+
+my %opcode;
+my $count = 1;
+while (<$fh>) {
+   s/#.*//;
+   s/^\s+//;
+   chomp;
+   next unless $_;
+
+   my($name, @params) = split /\s+/;
+   if (@params && $params[0] =~ /^\d+$/) {
+   my $count = shift @params;
+   die "$file, line $.: opcode $name parameters don't match count\n"
+ if ($count != @params);
+   }
+
+   warn "$file, line $.: opcode $name redefined\n" if $opcode{$name};
+
+   $opcode{$name}{ARGS}  = @params;
+   $opcode{$name}{TYPES} = \@params;
+   $opcode{$name}{CODE}  = ($name eq "end") ? 0 : $count++;
+   $opcode{$name}{FUNC}  = "Parrot_op_$name";
+
+   my $num_i = () = grep {/i/} @params;
+   my $num_n = () = grep {/n/} @params;
+   $opcode{$name}{RETURN_OFFSET} = 1 + $num_i + $num_n * 2;
+}
+
+return %opcode;
+}
+
+1;
+
+
+__END__
+
+=head1 NAME
+
+Parrot::Opcode - Read opcode definitions
+
+=head1 SYNOPSIS
+
+  use Parrot::Opcode;
+
+  %opcodes = Parrot::Opcode::read_ops();
+
+=head1 DESCRIPTION
+
+The read_ops() function parses the Parrot opcode_table file, and
+returns the contents as a hash.  The hash key is the opcode name;
+values are hashrefs containing the following fields:
+
+=over
+
+=item CODE
+
+The opcode number.
+
+=item ARGS
+
+The opcode argument count.
+
+=item TYPES
+
+The opcode argument types, as an arrayref.
+
+=item FUNC
+
+The name of the C function implementing this op.
+
+=back
+
+read_ops() takes an optional argument: the file to read the opcode table
+from.
+
+=cut
diff -u --new-file -r parrot.orig/assemble.pl parrot/assemble.pl
--- parrot.orig/assemble.pl Thu Sep 13 20:45:05 2001
+++ parrot/assemble.pl  Thu Sep 13 20:33:36 2001
@@ -5,6 +5,7 @@
 # Brian Wheeler ([EMAIL PROTECTED])
 
 use strict;
+use Parrot::Opcode;
 
 my $opt_c;
 if (@ARGV and $ARGV[0] eq -c) {
@@ -25,32 +26,10 @@
 foreach (keys(%real_type)) {
 $sizeof{$_}=length(pack($pack_type{$real_type{$_}},0));
 }
-
 
-# get opcodes from guts.
-open GUTS, "interp_guts.h";
-my %opcodes;
-while (<GUTS>) {
-next unless /\tx\[(\d+)\] = ([a-z_]+);/;
-$opcodes{$2}{CODE} = $1;
-}
-close GUTS;
 
-# get opcodes and their arg lists
-open OPCODES, "opcode_table" or die "Can't get opcode table, $!/$^E";
-while (<OPCODES>) {
-next if /^\s*#/;
-chomp;
-s/^\s+//;
-next unless $_;
-my ($name, $args, @types) = split /\s+/, $_;
-my @rtypes=@types;
-@types=map { $_ = $real_type{$_}} @types;
-$opcodes{$name}{ARGS} = $args;
-$opcodes{$name}{TYPES} = [@types];
-$opcodes{$name}{RTYPES}=[@rtypes];
-}
-close OPCODES;
+# get opcodes
+my %opcodes = Parrot::Opcode::read_ops();
 
 
 # read source and assemble
@@ -134,8 +113,8 @@
 $pc+=4;
 
 foreach (0..$#args) {
-my($rtype)=$opcodes{$opcode}{RTYPES}[$_];
-my($type)=$opcodes{$opcode}{TYPES}[$_];
+my($rtype)=$opcodes{$opcode}{TYPES}[$_];
+my($type)=$real_type{$opcodes{$opcode}{TYPES}[$_]};
 if($rtype eq "I" || $rtype eq "N" || $rtype eq "P" || $rtype eq "S") {
 # its a register argument
 $args[$_]=~s/^[INPS](\d+)$/$1/i;
diff -u --new-file -r parrot.orig/build_interp_starter.pl 
parrot/build_interp_starter.pl
--- parrot.orig/build_interp_starter.pl Thu Sep 13 20:45:05 2001
+++ parrot/build_interp_starter.pl  Thu Sep 13 20:36:14 2001
@@ -1,10 +1,9 @@
 # !/usr/bin/perl -w
 use strict;
+use Parrot::Opcode;
 
 open INTERP, ">interp_guts.h" or die "Can't open interp_guts.h, $!/$^E";
 
-open OPCODES, "opcode_table" or die "Can't open opcode_table, $!/$^E";
-
 print INTERP <<CONST;
 /*
  *
@@ -18,17 +17,9 @@
 #define BUILD_TABLE(x) do { \\
 CONST
 
-my $count = 1;
-while (<OPCODES>) {
-chomp;
-s/#.*$//;
-s/^\s+//;
-next unless $_;
-my($name) = split /\s+/;
-my $num = $count;
-$num = 0 if $name eq 'end';
-print INTERP "\tx[$num] = $name; \\\n";
-$count++ unless $name eq 'end';
+my %opcodes = Parrot::Opcode::read_ops();
+for my $name (sort {$opcodes{$a}{CODE} <=> $opcodes{$b}{CODE}} keys %opcodes) {
+print INTERP "\tx[$opcodes{$name}{CODE}] = $opcodes{$name}{FUNC}; \\\n";
 }
 print INTERP "} while (0);\n";
 
diff -u --new-file -r parrot.orig/disassemble.pl parrot/disassemble.pl
--- parrot.orig/disassemble.pl  Thu Sep 13 20:45:05 2001
+++ parrot/disassemble.pl   Thu Sep 13 20:37:47 2001
@@ -5,6 +5,7 @@
 # Turn a parrot bytecode 

Patch: Common opcode_table parsing

2001-09-11 Thread Damien Neil

The following patch moves all parsing of opcode_table into a
Parrot::Opcode module.  It also removes all parsing of interp_guts.h.
This patch incorporates my earlier patches to prefix all C opcode
functions with Perl_op_.

As best I can tell, everything works the same with the patch as it
did before--the assembler and disassembler both generate identical
output, and test_prog runs as well as before.  (Or better on FreeBSD,
where it stops core dumping. :)

  - Damien


diff -r --new-file -u parrot.orig/Parrot/Opcode.pm parrot/Parrot/Opcode.pm
--- parrot.orig/Parrot/Opcode.pm    Wed Dec 31 16:00:00 1969
+++ parrot/Parrot/Opcode.pm Mon Sep 10 23:52:35 2001
@@ -0,0 +1,86 @@
+package Parrot::Opcode;
+
+use strict;
+use Symbol;
+
+sub read_ops {
+my $file = @_ ? shift : "opcode_table";
+
+my $fh = gensym;
+open $fh, $file or die "$file: $!\n";
+
+my %opcode;
+my $count = 1;
+while (<$fh>) {
+   s/#.*//;
+   s/^\s+//;
+   chomp;
+   next unless $_;
+
+   my($name, @params) = split /\s+/;
+   if (@params && $params[0] =~ /^\d+$/) {
+   my $count = shift @params;
+   die "$file, line $.: opcode $name parameters don't match count\n"
+ if ($count != @params);
+   }
+
+   warn "$file, line $.: opcode $name redefined\n" if $opcode{$name};
+
+   $opcode{$name}{ARGS}  = @params;
+   $opcode{$name}{TYPES} = \@params;
+   $opcode{$name}{CODE}  = ($name eq "end") ? 0 : $count++;
+   $opcode{$name}{FUNC}  = "Parrot_op_$name";
+
+   my $num_i = () = grep {/i/} @params;
+   my $num_n = () = grep {/n/} @params;
+   $opcode{$name}{RETURN_OFFSET} = 1 + $num_i + $num_n * 2;
+}
+
+return %opcode;
+}
+
+1;
+
+
+__END__
+
+=head1 NAME
+
+Parrot::Opcode - Read opcode definitions
+
+=head1 SYNOPSIS
+
+  use Parrot::Opcode;
+
+  %opcodes = Parrot::Opcode::read_ops();
+
+=head1 DESCRIPTION
+
+The read_ops() function parses the Parrot opcode_table file, and
+returns the contents as a hash.  The hash key is the opcode name;
+values are hashrefs containing the following fields:
+
+=over
+
+=item CODE
+
+The opcode number.
+
+=item ARGS
+
+The opcode argument count.
+
+=item TYPES
+
+The opcode argument types, as an arrayref.
+
+=item FUNC
+
+The name of the C function implementing this op.
+
+=back
+
+read_ops() takes an optional argument: the file to read the opcode table
+from.
+
+=cut
diff -r --new-file -u parrot.orig/assemble.pl parrot/assemble.pl
--- parrot.orig/assemble.pl Mon Sep 10 14:26:08 2001
+++ parrot/assemble.pl  Mon Sep 10 23:51:34 2001
@@ -3,6 +3,7 @@
 # assemble.pl - take a parrot assembly file and spit out a bytecode file
 
 use strict;
+use Parrot::Opcode;
 
 my(%opcodes, %labels);
 
@@ -12,23 +13,7 @@
  );
 my $sizeof_packi = length(pack($pack_type{i},1024));
 
-open GUTS, "interp_guts.h";
-my $opcode;
-while (<GUTS>) {
-next unless /\tx\[(\d+)\] = ([a-z_]+);/;
-$opcodes{$2}{CODE} = $1;
-}
-
-open OPCODES, "opcode_table" or die "Can't get opcode table, $!/$^E";
-while (<OPCODES>) {
-next if /^\s*#/;
-chomp;
-s/^\s+//;
-next unless $_;
-my ($name, $args, @types) = split /\s+/, $_;
-$opcodes{$name}{ARGS} = $args;
-$opcodes{$name}{TYPES} = [@types];
-}
+%opcodes = Parrot::Opcode::read_ops();
 
 my $pc = 0;
 my @code;
diff -r --new-file -u parrot.orig/build_interp_starter.pl 
parrot/build_interp_starter.pl
--- parrot.orig/build_interp_starter.pl Mon Sep 10 14:26:09 2001
+++ parrot/build_interp_starter.pl  Mon Sep 10 23:53:26 2001
@@ -1,10 +1,9 @@
 # !/usr/bin/perl -w
 use strict;
+use Parrot::Opcode;
 
 open INTERP, ">interp_guts.h" or die "Can't open interp_guts.h, $!/$^E";
 
-open OPCODES, "opcode_table" or die "Can't open opcode_table, $!/$^E";
-
 print INTERP <<CONST;
 /*
  *
@@ -18,16 +17,8 @@
 #define BUILD_TABLE(x) do { \\
 CONST
 
-my $count = 1;
-while (<OPCODES>) {
-chomp;
-s/#.*$//;
-s/^\s+//;
-next unless $_;
-my($name) = split /\s+/;
-my $num = $count;
-$num = 0 if $name eq 'end';
-print INTERP "\tx[$num] = $name; \\\n";
-$count++ unless $name eq 'end';
+my %opcodes = Parrot::Opcode::read_ops();
+for my $name (sort {$opcodes{$a}{CODE} <=> $opcodes{$b}{CODE}} keys %opcodes) {
+print INTERP "\tx[$opcodes{$name}{CODE}] = $opcodes{$name}{FUNC}; \\\n";
 }
 print INTERP "} while (0);\n";
diff -r --new-file -u parrot.orig/disassemble.pl parrot/disassemble.pl
--- parrot.orig/disassemble.pl  Mon Sep 10 14:45:33 2001
+++ parrot/disassemble.pl   Mon Sep 10 23:57:36 2001
@@ -7,6 +7,7 @@
 use strict;
 
 my(%opcodes, @opcodes);
+use Parrot::Opcode;
 
 my %unpack_type;
 %unpack_type = (i => 'l',
@@ -16,28 +17,10 @@
   n => 8,
   );
 
-open GUTS, "interp_guts.h";
-my $opcode;
-while (<GUTS>) {
-next unless /\tx\[(\d+)\] = ([a-z_]+);/;
-$opcodes{$2}{CODE} = $1;
-}
-
-open OPCODES, "opcode_table" or die "Can't get opcode table, $!/$^E";
-while (<OPCODES>) {
-next if /^\s*#/;
-s/^\s+//;
-

Re: Speaking of namespaces...

2001-09-10 Thread Damien Neil

On Mon, Sep 10, 2001 at 06:58:23PM -0400, Dan Sugalski wrote:
 At 03:52 PM 9/10/2001 -0700, Damien Neil wrote:
 Parrot fails to work in very obscure ways on FreeBSD.  After some
 poking around, I tracked the problem to the end op--this appears
 to conflict with something inside libc.  Renaming the op fixes the
 problem.
 
 Ah, that's what was  killing the build on Nat's machine. Patch, by chance?

The following quick-and-dirty patch appears to work.  This prefixes
all opcode functions with Parrot_op_.  I'd have made the prefix
configurable, but the opcode generation is spread across three
different files.

(Aside: What's the best way to generate a useful patch with cvs?
The following comes from cvs -q diff -u.)

   - Damien

Index: build_interp_starter.pl
===
RCS file: /home/perlcvs/parrot/build_interp_starter.pl,v
retrieving revision 1.2
diff -u -u -r1.2 build_interp_starter.pl
--- build_interp_starter.pl 2001/09/10 21:26:09 1.2
+++ build_interp_starter.pl 2001/09/10 23:07:08
@@ -27,7 +27,7 @@
 my($name) = split /\s+/;
 my $num = $count;
 $num = 0 if $name eq 'end';
-print INTERP "\tx[$num] = $name; \\\n";
+print INTERP "\tx[$num] = Parrot_op_$name; \\\n";
 $count++ unless $name eq 'end';
 }
 print INTERP } while (0);\n;
Index: make_op_header.pl
===
RCS file: /home/perlcvs/parrot/make_op_header.pl,v
retrieving revision 1.3
diff -u -u -r1.3 make_op_header.pl
--- make_op_header.pl   2001/09/10 21:26:09 1.3
+++ make_op_header.pl   2001/09/10 23:07:08
@@ -6,7 +6,7 @@
 next if /^\s*#/ or /^\s*$/;
 chomp;
 ($name, undef) = split /\t/, $_;
-print "IV *$name(IV *, struct Perl_Interp *);\n";
+print "IV *Parrot_op_$name(IV *, struct Perl_Interp *);\n";
 }
 
 BEGIN {
Index: process_opfunc.pl
===
RCS file: /home/perlcvs/parrot/process_opfunc.pl,v
retrieving revision 1.3
diff -u -u -r1.3 process_opfunc.pl
--- process_opfunc.pl   2001/09/10 21:26:09 1.3
+++ process_opfunc.pl   2001/09/10 23:07:08
@@ -105,7 +105,7 @@
 my $line = shift;
 my ($name) = $line =~ /AUTO_OP\s+(\w+)/;
 
-print OUTPUT "IV *$name(IV cur_opcode[], struct Perl_Interp *interpreter) {\n";
+print OUTPUT "IV *Parrot_op_$name(IV cur_opcode[], struct Perl_Interp 
+*interpreter) {\n";
 return($name, "  return cur_opcode + "
 . $opcode{$name}{RETURN_OFFSET}. ";\n}\n");
 }
@@ -114,7 +114,7 @@
 my $line = shift;
 my ($name) = $line =~ /MANUAL_OP\s+(\w+)/;
 
-print OUTPUT "IV *$name(IV cur_opcode[], struct Perl_Interp *interpreter) {\n";
+print OUTPUT "IV *Parrot_op_$name(IV cur_opcode[], struct Perl_Interp 
+*interpreter) {\n";
 print OUTPUT "  IV return_offset = 1;\n";
 return($name, "  return cur_opcode + return_offset;\n}\n");
 }
Index: test.pbc
===
RCS file: /home/perlcvs/parrot/test.pbc,v
retrieving revision 1.2
diff -u -u -r1.2 test.pbc
Binary files /tmp/cvsqe7MSGr3cy and test.pbc differ



Re: Speaking of namespaces...

2001-09-10 Thread Damien Neil

On Mon, Sep 10, 2001 at 04:04:20PM -0700, Damien Neil wrote:
 The following quick-and-dirty patch appears to work.  This prefixes
 all opcode functions with Parrot_op_.  I'd have made the prefix
 configurable, but the opcode generation is spread across three
 different files.

Oops--that breaks the assembler.  This patch fixes the assembler to
work with the prior patch.

- Damien


Index: assemble.pl
===
RCS file: /home/perlcvs/parrot/assemble.pl,v
retrieving revision 1.6
diff -u -u -r1.6 assemble.pl
--- assemble.pl 2001/09/10 21:26:08 1.6
+++ assemble.pl 2001/09/10 23:43:30
@@ -15,7 +15,7 @@
 open GUTS, "interp_guts.h";
 my $opcode;
 while (<GUTS>) {
-next unless /\tx\[(\d+)\] = ([a-z_]+);/;
+next unless /\tx\[(\d+)\] = Parrot_op_([a-z_]+);/;
 $opcodes{$2}{CODE} = $1;
 }
 



Re: Speaking of namespaces...

2001-09-10 Thread Damien Neil

On Mon, Sep 10, 2001 at 08:48:48PM -0400, Dan Sugalski wrote:
 At 04:56 PM 9/10/2001 -0700, Brent Dax wrote:
 This patch seems to work on the FreeBSD box I have access to.  Now to 
 figure out what's causing all those 'use of uninitialized value at 
 assembler.pl line 81' messages...
 
 It's the blank lines in opcode_table. The assembler (and disassembler) at 
 some point didn't grok 'em. Patches have been applied, but you might've 
 checked out before that happened.

No, in this case, it's my fault.  I didn't realize the assembler
reads op name/number mappings out of interp_guts.h, so my patch
broke the assembler.

I'm thinking of writing something to generate a Parrot::Opcode.pm
module, so code doesn't need to parse opcode_table and interp_guts.h.
Sound reasonable?

   - Damien



Re: Speaking of namespaces...

2001-09-10 Thread Damien Neil

On Mon, Sep 10, 2001 at 08:56:52PM -0400, Dan Sugalski wrote:
 I'm thinking of writing something to generate a Parrot::Opcode.pm
 module, so code doesn't need to parse opcode_table and interp_guts.h.
 Sound reasonable?
 
 Yes, please do. I knew we needed one the second time I needed to parse 
 opcode_table, I just haven't stopped long enough to be lazy and still 
 program. (In those cases I came to a full stop...)

OK, I'll do that sometime tonight.  Should it parse opcode_table,
or should it be generated with the contents of opcode_table?

  - Damien



Re: Should the op dispatch loop decode?

2001-06-12 Thread Damien Neil

On Tue, Jun 12, 2001 at 06:12:35PM -0400, Dan Sugalski wrote:
 At the moment I'm leaning towards the functions doing their own decoding, 
 as it seems likely to be faster. (Though we'd be duplicating the decoding 
 logic everywhere, and bigger's reasonably bad) Possibly mandating shadow 
 functions for each opcode function, where the shadow does the decoding and 
 calls the real functions which take real things rather than our registers.
 
 Opinions anyone?

I'd say that choosing the more complicated way because it seems to
be faster is almost always a bad idea.  What was that quote about
premature optimization?

A major advantage to putting the decoding in the main loop to start
with, at least, is that it makes it easier to perform major surgery
on the overall opcode design without needing to touch every op.
I don't know how likely such surgery is.

- Damien



Re: More character matching bits

2001-06-12 Thread Damien Neil

On Tue, Jun 12, 2001 at 06:44:02PM -0400, Dan Sugalski wrote:
 While that's true, KATAKANA LETTER A and HIRAGANA LETTER A are also 
 referring to distinct things. (Though arguably not as distinct as either 
 with LATIN CAPITAL A) If we do one, why not the other? I'm perfectly happy 
 with an answer that starts because..., but we should have an answer.

Because anything which treats KATAKANA LETTER A and LATIN CAPITAL A
as the same thing needs to treat KATAKANA LETTER KA and the sequence
(LATIN CAPITAL K, LATIN CAPITAL A) as the same thing.

Because this makes as much sense as allowing WHITE SMILING FACE to match
(COLON, HYPHEN, RIGHT PARENTHESIS).

Because the logical extension of this is to allow a sequence of Kanji
or other ideographic characters to match their Romanized representation
(or vice versa), which is a reasonable approximation of impossible.


 We probably also ought to answer the question How accommodating to 
 non-latin writing systems are we going to be? It's an uncomfortable 
 question, but one that needs asking. Answering by Larry, probably, but 
 definitely asking. Perl's not really language-neutral now (If you think so, 
 go wave locales at Jarkko and see what happens... :) but all our biases are 
 sort of implicit and un (or under) stated. I'd rather they be explicit, 
 though I know that's got problems in and of itself.

A fair question, and not one I can answer.  I can say that I feel that
providing a mechanism for Hiragana characters to match Katakana and
vice-versa is about as useful for a person doing Japanese text processing
as case-insensitive matching is for a person working with English.

  - Damien



Re: More character matching bits

2001-06-12 Thread Damien Neil

On Wed, Jun 13, 2001 at 01:22:32AM +0100, Simon Cozens wrote:
 I'd say it was about as useful as providing a regexp option to translate
 the search term into French and try that instead.[1] Handy, possibly.
 Essential? No. Something that should be part of the core? I'll leave
 that for you to decide.

I believe that my initial analogy is more accurate than yours.  The
ability to match Hiragana as Katakana and vice-versa is almost
identical conceptually to the ability to perform case insensitive
matches on English text.


 What next, you want to maybe add Japanese and Chinese readings for all
 the kanji and convert between them too? That would be *considerably*
 more useful. :)
 
 [1] katakana signifies "The following text is not in Japanese", except
 when it doesn't.

This is literally accurate, but completely content-free.  The
variety of ways in which Hiragana and Katakana are used in
Japanese is as disparate as the ways in which italic and non-italic
characters are used in English.

Katakana is frequently used to write words with additional emphasis,
to convey the impression of a sentence being spoken with an accent,
to write the on-youmi of a Kanji, to write foreign loan words, and to
write onomatopoeia.  (This is not a complete list.)

   - Damien



Re: More character matching bits

2001-06-12 Thread Damien Neil

On Wed, Jun 13, 2001 at 02:15:16AM +0100, Simon Cozens wrote:
 Or we could keep it out of core. It's up to you, really.

No, it isn't.  It's up to Larry, or to whoever gets the regex
pumpkin.

I'm withdrawing from this discussion: My intent was to clarify
exactly why someone might want to treat Katakana and Hiragana as
equivalent for matching purposes, not to take a stand on what
features Perl should include or how these should be implemented.


  to write the on-youmi of a Kanji,
 
 Hrm, no, not usually; furigana are almost always hiragana, and
 learner's textbooks - bah, they're not real Japanese. :)

I believe you are confused; kun-youmi and on-youmi have nothing
to do with furigana.

  - Damien



Re: Unicode handling

2001-03-27 Thread Damien Neil

On Tue, Mar 27, 2001 at 12:38:23PM -0500, Dan Sugalski wrote:
 I'm afraid this isn't what I'd normally think of--ord to me returns the 
 integer value of the first code point in the string. That does mean that A 
 is different for ASCII and EBCDIC, but that's just One Of Those Things.

My personal take is that ord and chr should be exact inverses of
each other.  chr(ord($c)) should produce the same value as $c.
(Albeit possibly in a different internal encoding.)
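
In other words, I'd expect this to hold for any one-character string,
whatever its encoding (a trivial sketch):

    my $c = "A";
    print "round trip ok\n" if chr(ord($c)) eq $c;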

I just trawled through my installed modules, looking for existing
uses of ord to support my argument.  Unfortunately for me, my
conclusion is that pretty much any code which uses ord currently
will break in a world with multibyte characters.  I do think that
it would be worthwhile to come up with some examples of intended
uses of ord.  In particular, I'd be very interested in seeing any
cases where you would want it to return the value of a code point
in anything other than the current default encoding.

 - Damien



Re: Unicode handling

2001-03-26 Thread Damien Neil

On Mon, Mar 26, 2001 at 11:32:46AM -0500, Dan Sugalski wrote:
 At 05:09 PM 3/23/2001 -0800, Damien Neil wrote:
 So the results of ord are dependent on a global setting for "current
 character set" or some such, not on the encoding of the string that
 is passed to it?
 
 Nope, ord is dependent on the string it gets, as those strings know what 
 their encoding is. chr is the one dependent on the current default encoding.

So $c = chr(ord($c)) could change $c?  That seems odd.

In what other circumstances will the encoding of a string be
visible to the programmer?  Not when printing the string to
a file handle, I would think -- that should be controlled by
the encoding on the handle.  Are there any other cases where
encoding matters?

  - Damien



Re: Unicode handling

2001-03-26 Thread Damien Neil

On Mon, Mar 26, 2001 at 08:37:05PM +, [EMAIL PROTECTED] wrote:
 If ord is dependent on the encoding of the string it gets, as Dan
 was saying, then ord($e) is 0x81, 
 
 It could still be 0x81 (from ebcdic) with the encoding carried 
 along with the _number_ if we thought that worth the trouble.

I'm going to go away and whimper in pain for a bit, now.

"I thought chr(0x61) was 'a'."  "It is, but that's an EBCDIC number."

  - Damien



Re: Unicode handling

2001-03-23 Thread Damien Neil

On Fri, Mar 23, 2001 at 12:38:04PM -0500, Dan Sugalski wrote:
 while (<IN>) {
  $count++ if /bar/;
  print OUT $_;
}

I would find it surprising for this to have different output
than input.  Other people's milage may vary.

In general, however, I think I would prefer to be required to
explicitly normalize my data (via a function, pragma, or option
set on a filehandle) than have data change unexpectedly behind
my back.

 - Damien



Re: Unicode handling

2001-03-23 Thread Damien Neil

On Fri, Mar 23, 2001 at 06:16:58PM -0500, Dan Sugalski wrote:
 At 11:09 PM 3/23/2001 +, Simon Cozens wrote:
 For instance, chr() will produce Unicode codepoints. But you can pretend that
 they're ASCII codepoints, it's only the EBCDIC folk that'll get hurt. I hope
 and suspect there'll be an equivalent of "use bytes" which makes chr(256)
 either blow up or wrap around.
 
 Actually no it won't. If the string you're doing a chr on is tagged as 
 EBCDIC, you'll get the EBCDIC value. Yes, it does mean that this:
 
 chr($foo) == chr($bar);
 
 could evaluate to false if one of the strings is EBCDIC and the other 
 isn't. Odd but I don't see a good reason not to. Otherwise we'd want to 
 force everything to Unicode, and then what do we do if one of the strings 
 is plain binary data?

Are you thinking of ord rather than chr?  I can't seem to make the
above make sense otherwise.  chr takes a number, not a string as its
argument...

Your initial description of character set handling didn't mention
that different strings can be tagged as having different encodings,
and didn't cover the implications of this.  Could you give a list
of the specific occasions when the encoding of a string would be
visible to a programmer?

- Damien



Re: Unicode handling

2001-03-23 Thread Damien Neil

On Fri, Mar 23, 2001 at 06:31:13PM -0500, Dan Sugalski wrote:
 Err, perhaps I'm being dumb here - but surely $foo and $bar aren't
 typed strings, they're just numbers (or strings which match /^\d+$/) ???
 
 D'oh! Too much blood in my caffeine stream. Yeah, I was thinking of ord.
 
 chr will emit a character of the type appropriate to the current default 
 string context. The default context will probably be settable at compile 
 time, or be the platform native type, alterable somehow. Probably "use 
 blah;" but that's a language design issue. :)

Ah, this answers the puzzlement in the message I just sent. :)

So the results of ord are dependent on a global setting for "current
character set" or some such, not on the encoding of the string that
is passed to it?

  - Damien



Re: Please shoot down this GC idea...

2001-02-14 Thread Damien Neil

On Wed, Feb 14, 2001 at 11:26:00AM -0500, Dan Sugalski wrote:
 At 11:03 AM 2/14/2001 -0500, Buddha Buck wrote:
 [Truly profound amount of snippage]
 I'm sure this idea has flaws.  But it's an idea.  Tell me what I'm missing.
 
 You've pretty much summed up the current plan.

I have a strong suspicion that this approach will lead to confusing,
hard-to-find bugs in Perl programs.  (That is, programs written in
Perl, rather than perl-the-program.)  Consider:

  sub do_stuff { ... }

  {
    my $fh = IO::File->new("file");
    do_stuff($fh);
  }

In this code, the compiler can determine that $fh has no active
references at the end of the block, and $fh->DESTROY will be called.
(The compiler can flag do_stuff() as not preserving any references
to its argument.)

Now consider:

  {
    my $fh = IO::File->new("file");
    do_stuff($fh);
  }

  sub do_stuff { ... }

In this case, the compiler hasn't seen do_stuff() when it compiles
the block in which $fh is instantiated.  Unless it performs multiple
passes, it won't be able to determine that do_stuff() does not
preserve a reference to $fh, and it won't be able to deterministically
call $fh->DESTROY at the end of the block.

This is purest action-at-a-distance.  To the programmer, there is
no difference between the two blocks.  This can occur in even more
confusing fashions: consider a pair of recursive subs, or autoloaded
subs, or method calls.  For example:

  sub foo {
    my Dog $spot = shift;
    my $fh = IO::File->new("file");
    $spot->eat_homework($fh);
  }

Even with the object type declared, the compiler can make no
assumptions about whether a reference to $fh will be held or not.
Perhaps the Poodle subclass of Dog will hold a reference, and the
Bulldog subclass will not.

I think that there will be few cases when compile-time analysis will
identify places where deterministic finalization can occur.  Worse,
any programmer who attempts to code for these cases will leave herself
open for action-at-a-distance code breakage.

Maybe I'm missing something.  I hope I am.  I think, however, that
Perl will have to decide between deterministic destruction of non-
circular data structures, and a modern garbage collector.

  - Damien



Re: Garbage collection (was Re: JWZ on s/Java/Perl/)

2001-02-14 Thread Damien Neil

[trimming distribution to -internals only]

On Wed, Feb 14, 2001 at 07:44:53PM +, Simon Cozens wrote:
 package NuclearReactor::CoolingRod;
 
 sub new {
 Reactor->decrease_core_temperature();
 bless {}, shift
 }
 
 sub DESTROY {
 Reactor->increase_core_temperature();
 }

A better design:

package NuclearReactor::CoolingRod;

sub new {
    Reactor->decrease_core_temperature();
    bless { inserted => 1 }, shift;
}

sub insert {
    my $self = shift;
    return if $self->{inserted};
    Reactor->decrease_core_temperature();
    $self->{inserted} = 1;
}

sub remove {
    my $self = shift;
    return unless $self->{inserted};
    Reactor->increase_core_temperature();
    $self->{inserted} = 0;
}

sub DESTROY {
    my $self = shift;
    $self->remove;
}


Using object lifetime to control state is almost never a good idea,
even if you have deterministic finalization.  A much better approach
is to have methods which allow holders of the object to control it,
and a finalizer (DESTROY method) which cleans up only if necessary.

A more real-world case is IO::Handle.  If you want to close a handle
explicitly, you call $handle->close, not $handle->DESTROY.  The
concept of closing a handle is orthogonal to the concept of the object
ceasing to exist.  User code can close a handle.  It can't make the
object go away -- only the garbage collector can do that.
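
In miniature, the pattern looks like this (the filename is made up):

    use IO::File;

    my $fh = IO::File->new("some.log", "w") or die "some.log: $!";
    $fh->print("done\n");
    $fh->close;        # explicit, deterministic cleanup
    # whenever $fh is finally collected, DESTROY has nothing left to do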

I think that the biggest problem with DESTROY is that it is misnamed.
The name makes people think that $obj->DESTROY should destroy an object,
which it doesn't.  It's rather too late to rename it to ATDESTRUCTION
or WHENDESTROYED or FINALIZE, however.

  - Damien



Re: Core data types and lazy evaluation

2000-12-28 Thread Damien Neil

On Wed, Dec 27, 2000 at 09:27:05PM -0500, Dan Sugalski wrote:
 While we can evaluate the list lazily, that doesn't mean that's what the 
 language guarantees. Right now it's perfectly OK to do:
 
$foo = ($bar, $baz, $xyzzy);
 
 and if $bar and $baz are tied, that'll execute their FETCH methods. If 
 that's expected (and it is with functions, though arguably function calls 
 and fetches  of active data are not the same thing) then we can't be lazy. 
 I'd *like* to be, since it means we can optimize away things or defer them 
 until never, but... (Plus it makes the dataflow analysis a darned sight 
 easier, since every load and store from a variable wouldn't potentially be 
 a function call...)

I'd view the function call case as being conceptually equivalent to:

@_ = ($bar, $baz, $xyzzy);
&snrub;

In this case, you are assigning the arguments to an array (@_), so it
makes sense for them all to be evaluated.  (Or not, depending on your
opinions of how list-to-array assignment should work. :)  In the case
of assigning a list to a scalar, or a list of scalars, I still think
that it makes sense for values not assigned to not be evaluated.

Consider this case:

($two, $four) = (@primes, $junk);

@primes is a lazy array containing all the primes.  Here, you would
expect only the first two values of @primes to be evaluated.  If
evaluation stops with the second element of @primes, however, why
would $junk be evaluated?


   Also one side-effect of this, if we allow it, is to have a list masqerade
   (under the hood, at least) as another variable type. We could, say, see 
  this:
  
  @foo = (@bar, @baz);
  
   but actually defer evaluating the list and doing the assignment until
   either @foo, @bar, or @baz is accessed. (Potentially holding off even
   further--things like scalar() on @foo, for example, wouldn't require
   finishing the assignment, and neither would something simple like $foo[12])
 
 I dislike this.  It means that an exception occurring while evaluating
 $bar[0] can be deferred indefinitely.
 
 This is a good argument for tagging tied variables as active data, since 
 that'd require things be evaluated immediately.

If this isn't done for tied variables, I withdraw my objection.  I'm
still a bit dubious about the cost/benefit tradeoff here, though.

Hmm.

# @primes = (2, 3, 5, 7, ...)
@foo = (1, @primes);

I think the above constitutes an argument for something.  I'm not
certain what. :)


 I like lazy evaluation, but I don't think it should come at the
 expense of early detection of errors and comprehensible garbage
 collection.
 
 We're trying really, *really* hard to decouple GC with object destruction. 
 The latter should be reasonably understandable (though not necessarily 
 deterministic) while the former should be considered Dark Magic. :)

My specific concern is that it shouldn't be easy to accidentally
leave very large data structures lying around without obvious
references to them.  GC need not be deterministic, but it should
be possible to avoid leaking memory without resorting to animal
sacrifice. :)

 - Damien



Re: standard representations

2000-12-27 Thread Damien Neil

On Wed, Dec 27, 2000 at 10:46:03AM -0500, Philip Newton wrote:
 So a native int could be 8 bits big? I think that's allowed according to
 ANSI.

ANSI/ISO C states:
  char <= short <= int <= long

  char  >=  8 bits
  short >= 16 bits
  int   >= 16 bits
  long  >= 32 bits

C99 adds "long long", which is = long, and is at least 64 bits large.

I'd be in favor of defining Perl's "native int" type to be at least
32 bits long.  I would recommend against using the compiler's default
int type in all cases, as there are compilers which define int as 16
bits for backwards compatibility reasons.  (As opposed to 16 bits being
the native word size of the architecture.)
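
(For reference, the sizes the local C compiler actually picked are
easy to check from the standard Config module that ships with perl:

    use Config;
    printf "int: %d  long: %d  ptr: %d bytes\n",
        $Config{intsize}, $Config{longsize}, $Config{ptrsize};

which is the sort of thing a configure probe for a "native int" could
key off.)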

  - Damien



Re: standard representations

2000-12-27 Thread Damien Neil

On Wed, Dec 27, 2000 at 02:06:45PM -0500, Hildo Biersma wrote:
 I don't recall the bit sizes to be in ANSI C.  Which paragraph is that
 in?

You need to deduce the bit sizes, as the standard doesn't speak in
terms of bits.  I don't have a copy of C89 available, but section
5.2.4.2.1 defines the sizes of the various integers:

-- minimum value for an object of type short int
   SHRT_MIN  -32767 // -(2 ** 15 - 1)
-- maximum value for an object of type short int
   SHRT_MAX  +32767 // 2 ** 15 - 1

...and so forth.


 Even so, the fact that a standard may declare it, doesn't make it true.
 I would expect embedded targets to differ from this.

I seriously doubt Perl will ever run on an architecture too small
to provide a 32-bit type.  I am certain it will never run on an
architecture with no 16-bit type.

Furthermore, the fact that the standard declares a thing DOES make
it true.  If Perl is to be written in C, it makes sense that it
require a compiler which at least pretends to conform to ANSI/ISO
C.  This is hardly an onerous restriction -- most compilers are
compliant, with the exception of compilers for very-small embedded
systems (ones where the total memory available is measured in bytes)
and antiquated curiosities like the SunOS 4 compiler.

Can you name specific compilers which fail to conform to the standard
in this (or other) regards, which Perl will need to support?


 That's eschewing efficiency to make sensible minimum guarantees.  I'd
 personally rather see the C compiler's native types be used, because
 that's what the platform can do _efficiently_.  Using larger types than
 that harms perl's ability to perform well on small platforms.

I am deeply dubious about Perl's ability to perform well on 80286
(or equivalent capacity) machines under any circumstances.

- Damien



Re: standard representations

2000-12-27 Thread Damien Neil

On Wed, Dec 27, 2000 at 02:51:57PM -0500, Hildo Biersma wrote:
 This seems likely, but we must take care not to take these assumptions
 too far.  For example, (and this is not related to this discussion),
 pointers may well be smaller than integers (MVS defines 32-bit ints and
 31-bit pointers)

This is exactly the reason why standards are important.  An architecture
with 32-bit ints and 31-bit pointers is completely valid, and it is
important to not write code which assumes that ints and pointers are
interchangeable.


 I have far less trust in the standards than you have.  Having said that,
 I can't actually name non-compliant compilers, so you're quite likely to
 be right.

Most compilers will violate the standard in certain small ways; complete
conformance is an ideal rarely (if ever) reached.  Few compilers (and
none, in my experience, of any quality) will commit gross violations
such as getting the guaranteed integer sizes wrong.

 - Damien



Re: Core data types and lazy evaluation

2000-12-27 Thread Damien Neil

On Wed, Dec 27, 2000 at 05:17:33PM -0500, Dan Sugalski wrote:
 The part I'm waffling on (and should ultimately punt to Larry) is what to 
 do with lazy data, and what exactly counts as lazy data anyway. For 
 example, tied variables certainly aren't passive data, but should they be 
 evaluated if they aren't used? If you do this:
 
($foo, $bar) = (@baz, "12", 15, $some_tied_scalar);
 
 should the FETCH method of $some_tied_scalar be called unconditionally, 
 even though we don't use it? (I'd argue yes, but prefer no... :)

I would argue no.  If the list is evaluated lazily, I'd only expect
the scalar FETCH method to be called when needed, not unconditionally.
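
A tiny sketch of what's at stake (the package name is made up):

    package Noisy;
    sub TIESCALAR { my ($class, $val) = @_; bless \$val, $class }
    sub FETCH     { print "FETCH called\n"; ${$_[0]} }

    package main;
    tie my $some_tied_scalar, 'Noisy', 42;
    my ($foo, $bar) = (@baz, "12", 15, $some_tied_scalar);

Under eager evaluation the FETCH fires even though its result is
thrown away; under a lazy scheme I'd expect it never to run.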


 Also one side-effect of this, if we allow it, is to have a list masqerade 
 (under the hood, at least) as another variable type. We could, say, see this:
 
@foo = (@bar, @baz);
 
 but actually defer evaluating the list and doing the assignment until 
 either @foo, @bar, or @baz is accessed. (Potentially holding off even 
 further--things like scalar() on @foo, for example, wouldn't require 
 finishing the assignment, and neither would something simple like $foo[12])

I dislike this.  It means that an exception occurring while evaluating
$bar[0] can be deferred indefinitely.  I'm also thinking that there
could be some really odd interactions with GC...if I write

  {
 my @bar = some_creation_function();
 @foo = (@bar);
  }

I would expect @bar to be GCd (or at least become GC-able) upon exit
from the block.  With a scheme as you describe, @bar would need to
hang around for the lifetime of @foo.

I like lazy evaluation, but I don't think it should come at the
expense of early detection of errors and comprehensible garbage
collection.

Other than that, I like what you describe.

- Damien



Re: The external interface for the parser piece

2000-11-30 Thread Damien Neil

On Mon, Nov 27, 2000 at 05:29:36PM -0500, Dan Sugalski wrote:
int perl6_parse(PerlInterp *interp,
void *source,
int flags,
void *extra_pointer);

Count me in with the people who prefer:

   int perl6_parse(PerlInterp *interp, PerlIO *io);

I understand the desire to reduce the number of API bits the external
user needs to know about, but I think that the non-PerlIO API will
lead to more complexity than it removes.

Assuming the non-PerlIO interface is used, however, I believe there
is a problem with the PERL_GENERATED_SOURCE option.  ANSI/ISO C does
not guarantee that a function pointer may be stored in a void*.

I would suggest that Perl's external APIs, at the very least, should
conform to standard C.

- Damien