Re: Wikipedia example

2006-10-12 Thread Aaron Sherman
Hey there, sorry about not responding. My mailer hid this message from 
me. I was actually about to reply asking what the deal was. ;)


chromatic wrote:

On Tuesday 03 October 2006 13:41, Aaron Sherman wrote:


This contains the Makefile, README, .pg grammar, a -harness.pir that
executes the parser on a sample string and dumps the parse tree and a
-stress.pir that runs 50,000 trial runs to see how fast PGE is (not too
shabby is the answer, as it comes in at about 1/2 the time of a P::RD
version for the simple example, and gets a bigger lead the more complex
the input expression).


I can't get this to work.  If I run 'make' in the target directory, I get a 
PASM file (with the .pir extension).  Then if I run either of the PIR files, 
I get:


$ parrot wptest-harness.pir 
error:imcc:syntax error, unexpected LABEL, expecting $end

in file 'wptest.pir' line 3


That's odd. I double-checked the diffs to make sure I didn't send out an 
old version. This code works fine on my box, which I just updated to 
r14904. For me, line 3 of wptest.pir is main:. Should that be a syntax 
error?


That's the very first line of code output by:

 ../../../parrot -o wptest.pir ../../../compilers/pge/pgc.pir wptest.pg

so if there's a problem with it, I'm not sure that I could actually fix 
it. Any pointers appreciated!


Here's what I get:

$ ../../../parrot wptest-harness.pir
Parsing simple expression: 1+(1+1)
Match results begin:
VAR1 = PMC 'PGE::Match' = 1+(1+1) @ 0 {
expr = PMC 'PGE::Match' = + @ 1 {
type = infix:+
[0] = PMC 'PGE::Match' = 1 @ 0 {
number = PMC 'PGE::Match' = 1 @ 0
type = term:
}
[1] = PMC 'PGE::Match' = (1+1) @ 2 {
expr = PMC 'PGE::Match' = 1+1 @ 3 {
expr = PMC 'PGE::Match' = + @ 4 {
type = infix:+
[0] = PMC 'PGE::Match' = 1 @ 3 {
number = PMC 'PGE::Match' = 1 @ 3
type = term:
}
[1] = PMC 'PGE::Match' = 1 @ 5 {
number = PMC 'PGE::Match' = 1 @ 5
type = term:
}
}
}
type = term:
}
}
}
match complete



Re: Wikipedia example

2006-10-04 Thread Aaron Sherman

Markus Triska wrote:

Aaron Sherman writes:


+Written in 2006 by Aaron Sherman, and distrbuted


Typo: distributed


You are correct, sir.

This was not, in fact, some strange attempt to seize control of the 
Parrot codebase ;)




Re: requirements gathering on mini transformation language

2006-10-04 Thread Aaron Sherman

chromatic wrote:

On Thursday 28 September 2006 14:51, Markus Triska wrote:


Allison Randal writes:

mini transformation language to use in the compiler tools.

For what purpose, roughly? I've some experience with rule-based
peep-hole optimisations. If it's in that area, I volunteer.


That's part of it, but mostly it's for transforming one tree-based 
representation of a program into another.  See for example Pheme's lib/*.tg 
files.


I'm confused. I thought that this is what TGE did. Is TGE going away, or 
are we talking about something that extends TGE in some way?




Wikipedia example

2006-10-03 Thread Aaron Sherman
 article:
+
+http://en.wikipedia.org/wiki/Parser_Grammar_Engine
+
+This code is here so that others can benefit from a simple example,
+and so that anyone who updates PGE can see if it affects the ability
+to handle the example given in that article.
+
+Written in 2006 by Aaron Sherman, and distrbuted under the same terms
+as the rest of the Parrot distribution that this should have come
+with.


Re: LLVM and HLVM

2006-08-23 Thread Aaron Sherman

On 8/23/06, peter baylies [EMAIL PROTECTED] wrote:


On 8/22/06, John Siracusa [EMAIL PROTECTED] wrote:

 Has anyone looked at LLVM lately?



[...]

On the other hand, Parrot built quite nicely on x86-64, although I think I
like the 32-bit build (which also built just fine, albeit without ICU)
better due to the excellent JIT support.



Not sure if the list will let this through, since I'm subscribed under
another account, but here's the problem with that: llvm is a very light
layer, but it's yet another layer. To put it between parrot and hardware
would mean that parrot is JITing to LLVM byte-code, which is JITing to
machine code. Not really ideal.

--
Aaron Sherman
Senior Systems Engineer and Toolsmith
[EMAIL PROTECTED] or [EMAIL PROTECTED]


Re: LLVM and HLVM

2006-08-23 Thread Aaron Sherman

John Siracusa wrote:

On 8/23/06 4:09 PM, Aaron Sherman wrote:
  

here's the problem with that: llvm is a very light layer, but it's yet another
layer. To put it between parrot and hardware would mean that parrot is JITing
to LLVM byte-code, which is JITing to machine code. Not really ideal.



...unless LLVM does a much better job of native code generation than the
existing Parrot code, that is.  Optimization seems to be LLVM's thing.
  


Keep in mind that you're not talking about some HLL generating LLVM 
bytecode. You're talking about Parrot reading in Parrot byte code, 
JITing to LLVM and then going through that dance again. The amount of 
lossage in those layers of translation simply cannot be worth whatever 
the difference is between LLVM optimization and Parrot's JIT, since 
Parrot will already have generated code that makes it MORE difficult to 
optimize.


I'll buy it if I see numbers, but I'm highly skeptical.



Re: End the Hollerith Tyranny? (linelength.t)

2006-08-21 Thread Aaron Sherman
On Mon, 2006-08-21 at 08:45 -0700, Chip Salzenberg wrote:
 On Mon, Aug 21, 2006 at 10:48:59AM -0400, Will Coleda wrote:
  The way you phrase the question, you're not going to get any of these  
  answers.  Who is programming parrot on their *physical* VT100? =-).
  The primary reason for an 80 column limit is developer convenience, I  
  think.
 
 Well, that's fair.  Many of us are old enough to have used such limited
 hardware, but it's all surely been relegated to the trashheap by now.  So:
 Would anyone be inconvenienced by exceeding 80 columns regularly; and, how?

I typically measure my screen real estate in discrete 80-col units. My
layout of terminals, editors (emacs, xemacs, vim and gvim mixed fairly
liberally for different purposes) and other applications is suited to
editing in 80-column units. When I have to re-size a window to larger
than that, it's a pain, but not a terribly hurtful one.

I like the Gnome Style document for reference here. They talk about
8-space tabs, but it's the same issue as 80-column text:

Using 8-space tabs for indentation provides a number of
benefits. It makes the code easier to read, since the
indentation is clearly marked. It also helps you keep your code
honest by forcing you to split functions into more modular and
well-defined chunks - if your indentation goes too far to the
right, then it means your function is designed badly and you
should split it to make it more modular or re-think it.

Just my $0.02.

  P.S.: Seems only fair that if we're sticking with C89, we stick with  
  1989 terminal sizes. =-)
 
 Some selectivity is in order, or we'll have to target 1989 memory sizes,
 disk capacities, and network bandwidth

There are times that I use emacs, not because it's the right tool for
the job that *it* is doing, but because I have to use it over a lossy
line, and NOTHING that I've found squeezes quite so much out of
curses-like rendering over a slow line. Why? Because it was designed to
do multi-window editing over 1200-9600 bps modem lines without making
its users want to kill themselves.

There is something to be said for working well in low-resource
environments, especially when talking about a VM that might need to be
placed into the control circuitry of, say, an elevator control system.




Re: PMC flags

2005-05-04 Thread Aaron Sherman
On Wed, 2005-05-04 at 01:51, Leopold Toetsch wrote:
 Aaron Sherman wrote:
  On Mon, 2005-05-02 at 08:58 +0200, Leopold Toetsch wrote:

  Here is some example P5 source from pp_pow in pp.c:
 
 I presume that Ponie eventually will run Parrot opcodes. pp_pow() is 
 like all these functions part of the perl runloop. Therefore it'll be
 
infix .MMD_POW, P1, P2, P3

Sorry, I wasn't being clear with my example and the fact that it was
just that.

Yes, I understand that the opcode pow will become Parrot... well, at
least I think I understand that, but I'm not 100% sure. There's some
very hairy magic in pp.c and pp_hot.c that Perl code relies on, and in
some places (e.g. pp_add), its behavior is not compatible with Parrot in
some very fundamental ways (e.g. all addition is done as unsigned
integers if/when possible to gain the overflow semantics of C8X uint,
check the novella-length comments in pp.c for details).


ALL THAT ASIDE, however, I did understand that the goal was not
explicitly to leave pp_pow as it is. I was using it as an example of the
kind of code that uses the Perl 5 core. XS, the Perl 5 runtime (e.g. the
regexp engine), parser, etc. all rely on the same sort of constructs
that I used pp_pow to outline.

  If you're writing a compiler from scratch, I can see that being mostly
  true. However, in trying to port Perl 5 guts over to Parrot, there's a
  lot of code that relies on knowing what's going on (e.g. if a value has
  ever been a number, etc.)
 
 Most of the guts are called from the runloop. But there is of course XS 
 code that messes with SV internals.

Ignore code that messes with SV internals. Code in many parts of the
runtime that almost certainly won't be re-written use the SV interface
provided by sv.h correctly, doing tons of flag tests all the time.
Literally every operation requires several flag tests!

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: PMC flags

2005-05-03 Thread Aaron Sherman
On Mon, 2005-05-02 at 08:58 +0200, Leopold Toetsch wrote:
 Nicholas Clark [EMAIL PROTECTED] wrote:

  1 bit for SVf_IOK
  1 bit for SVf_NOK
  1 bit for SVf_POK
  1 bit for SVf_ROK
 
 I'd not mess around with (or introduce) flag bits. The more that this
 would only cover perl5 PMCs. Presuming that the Perl5 PMCs are subtypes
 of Parrot core PMCs, I'd do ...
[... code doing a string isa check on the type ...]

 The vtable functions isa and does, which now take a string, are a
 bit heavyweight and might get an extension in the long run that takes
 an integer flag.

Unless this happens, this would be a HUGE performance hit. After all,
Sv*OK is called all over the place in the Perl 5 code, including many
places where performance is an issue.

[...]
  2 bits to say what we're storing in the union
 
 The vtable is that information:
 
   INTVAL   i = VTABLE_get_integer(interpreter, pmc);
   FLOATVAL n = VTABLE_get_number(interpreter, pmc);

Here is some example P5 source from pp_pow in pp.c:

if (SvIOK(TOPm1s)) {
bool baseuok = SvUOK(TOPm1s);
UV baseuv;

if (baseuok) {
baseuv = SvUVX(TOPm1s);
} else {
IV iv = SvIVX(TOPm1s);

and here that is, run through the C pre-processor:

if ((((*(sp-1)))->sv_flags & 0x0001)) {
char baseuok = ((((*(sp-1)))->sv_flags & (0x0001|0x8000)) 
== (0x0001|0x8000));
UV baseuv;

if (baseuok) {
baseuv = ((XPVUV*) ((*(sp-1)))->sv_any)->xuv_uv;
} else {
IV iv = ((XPVIV*) ((*(sp-1)))->sv_any)->xiv_iv;

Notice that there is exactly no function calling going on there. To
change that to (pseudocode):

if (isa_int_test(TOPm1s)) {
bool baseuok = isa_uint_test(TOPm1s);
UV baseuv;

if (baseuok) {
baseuv = invoke_uint_vtable_get(TOPm1s);
} else {
IV iv = invoke_int_vtable_get(TOPm1s);

Well... even after JIT compilation, function call overhead is function
call overhead, no?
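
A rough sketch of the two styles being contrasted, in Python rather than C. Everything named here is an assumption for illustration: the bit values are made up (Perl's real ones are in sv.h), and Scalar, Integer, PerlInt and isa() are stand-ins, not Parrot or Perl APIs. The point is only the shape of the work: a flag test is one mask operation on a word already in hand, while a string isa check compares names along the type's ancestry.

```python
# Hypothetical flag bits; the real Perl 5 values live in sv.h and differ.
SVF_IOK = 0x1  # scalar holds a valid integer
SVF_NOK = 0x2  # scalar holds a valid float
SVF_POK = 0x4  # scalar holds a valid string
SVF_ROK = 0x8  # scalar holds a reference


class Scalar:
    """Stand-in for an SV: a flags word plus an integer slot."""

    def __init__(self, flags, iv=0):
        self.flags = flags
        self.iv = iv


def sv_iok(sv):
    # Style 1: a single bitwise AND against the flags word.
    return bool(sv.flags & SVF_IOK)


class Integer:
    pass


class PerlInt(Integer):
    pass


def isa(obj, type_name):
    # Style 2: compare a string against every class in the ancestry.
    return any(cls.__name__ == type_name for cls in type(obj).__mro__)


sv = Scalar(SVF_IOK | SVF_POK, iv=42)
pmc = PerlInt()
print(sv_iok(sv), isa(pmc, "Integer"), isa(pmc, "String"))
```

Python cannot show the macro-versus-function-call cost itself, so this only illustrates how much more the second style has to do per test.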

 just do the right thing. Usually there is no need to query the PMC what
 it is.

If you're writing a compiler from scratch, I can see that being mostly
true. However, in trying to port Perl 5 guts over to Parrot, there's a
lot of code that relies on knowing what's going on (e.g. if a value has
ever been a number, etc.)




Common LISP

2005-04-25 Thread Aaron Sherman
I forwarded the Common LISP notice to a friend of mine who works on
CMUCL internals, and he suggested:

[they should think about starting with] CMUCL and retarget
[sic] it for the new VM. That way he gets all the type inference
for free [which would increase performance]

Just thought I'd pass that along, in case it's of interest. I'm not a CL
guy at all, so I wouldn't even know how type inference helps
performance.

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: I wish to understand the JIT machine code generator

2005-04-14 Thread Aaron Sherman
On Thu, 2005-04-14 at 01:50 -0400, [EMAIL PROTECTED] wrote:

  [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 
  I have been trying to examine the i386 code generator to see how
  feasible it would be to create an AMD64 code generator.
[...]
 I'm going to copy the i386 path to an a64 path and have at it.
 I'm hoping it won't be much of a stretch to get 64-bit code generated --
 although REASONABLE 64-bit code is another problem.  But first I want 
 to ask if anybody else is doing this already.

Just one thought on stylistic conventions. GCC uses the following naming
conventions for AMD processors, and I imagine that they have had to do
this because they have discovered over the years that it makes sense:

   k6  AMD K6 CPU with MMX instruction set support.

   k6-2, k6-3
   Improved versions of AMD K6 CPU with MMX and 3dNOW!
   instruction set support.

   athlon, athlon-tbird
   AMD Athlon CPU with MMX, 3dNOW!, enhanced 3dNOW! and SSE
   prefetch instructions support.

   athlon-4, athlon-xp, athlon-mp
   Improved AMD Athlon CPU with MMX, 3dNOW!, enhanced 3dNOW! and
   full SSE instruction set support.

   k8, opteron, athlon64, athlon-fx
   AMD K8 core based CPUs with x86-64 instruction set support.
   (This supersets MMX, SSE, SSE2, 3dNOW!, enhanced 3dNOW! and
   64-bit instruction set extensions.)

Since Parrot's JIT will need to think about the CPU in ways that are
roughly analogous to the way GCC's RTL-to-machine generator thinks about
it, perhaps you'll want to start off with similar categorization.

Also, looking at the m-* files in the GCC source tree could easily give
you some pointers on what it is that you might want to be thinking about
for optimal code-gen under Parrot.

It would be awesome if someone figured out a way to translate GCC's m-*
templates into some sort of starting point for Parrot JIT definitions,
but that might be too science-fictiony to make sense, as GCC is defining
translations for RTL sequences and Parrot is defining translations for
PBC ops, which are very different.





Re: A sketch of the security model

2005-04-14 Thread Aaron Sherman
On Thu, 2005-04-14 at 09:11, Dan Sugalski wrote:
 At 10:03 PM -0400 4/13/05, Michael Walter wrote:

 On 4/13/05, Dan Sugalski [EMAIL PROTECTED] wrote:
   All security is done on a per-interpreter basis. (really on a
   per-thread basis, but since we're one-thread per interpreter it's
   essentially the same thing)

 Just to get me back on track: Does this mean that when you spawn a
 thread, a separate interpreter runs in/manages that thread, or
 something else?
 
 We'd decided that each thread has its own interpreter. Parrot doesn't 
 get any lighter-weight than an interpreter, since trying to have 
 multiple threads of control share an interpreter seems to be a good 
 way to die a horrible death.

So to follow up on Michael's question: does this mean that you spawn a
new thread, instance an interpreter, and then begin executing shared
code? What about data? I assume that all has to be shared, since shared
data is a fundamental piece of any threaded application's assumptions.

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: A sketch of the security model

2005-04-14 Thread Aaron Sherman
On Thu, 2005-04-14 at 13:22 -0400, Dan Sugalski wrote:

 Anyway, a number of people I deeply respect (and who do this sort of 
 thing for a living, at deep levels) have told me flat-out that we're 
 better not having a security system than we are trying to roll our 
 own, and the common response to "We're lifting VMS'" has been "Good. 
 Do that."

Well, if you were lifting VMS's security model, that would be fine, but
you're really not. You're lifting the idea of VMS's security model.

A security model is a many-fold thing (I've only so far discussed the
highest and most user-visible level because they are the bits most
applicable to Parrot). You're talking about cherry-picking certain bits
and re-designing the rest to fit.

I have NO PROBLEM with that, but I want to make sure that you don't
think this is the easy way to go. It's not. You're biting off a HUGE
amount of work, and your first 2 attempts will likely be utterly wrong
(if history is any guide, not because you're not smart and capable).

 I think it would be easier to start from scratch, personally. I
 understand your concerns, but I don't think you run any less risk by
 creating a new VM security model out of an OS security model than you do
 by creating a new one. They both create many opportunities to make a
 mistake.
 
 That's not been the general consensus I've seen from people doing 
 security research and implementation.

They both create many opportunities to make a mistake. Really. Go ask
the folks at Microsoft who lifted VMS for NT's security model, and
then go ask the folks at Sun who rolled their own with Java. Both have
had significant pain.

 If you really want to reduce the chances that you'll make a mistake,
 swipe the security model from JVM or CLR and start with that. At least
 those have been tested in the large, and map closer to what Parrot wants
 to do than VMS.
 
 The problem is twofold with those. First, there's some indications 
 that they're busted, 

They're not busted so much as in many places they have needed
significant work. I think that the general consensus right now is that
JVM is fairly well sorted out in 1.5, and CLR is moving along well.

I would say that at an infrastructure level they're both more than
sufficient models, and that's all you're going to lift anyway (unless
you were considering lifting code from mono, which I'm not sure is
workable license-wise).

 and second (and more importantly) they're both 
 very coarse-grained, and that leads to excessive privs being handed 
 out, which increases your exposure to damage. 

That's fine. Merging down either JVM or CLR's privs into a granularity
that you're happy with should work fine, and again, privs are only a
small part of the security model. If you want a better picture these
sources might be useful:

http://developer.intel.com/technology/itj/2003/volume07issue01/art05_security/vol7iss1_art05.pdf
http://java.sun.com/docs/books/security/
http://www.arctecgroup.net/ISB0705GP.pdf
http://www.arctecgroup.net/ISB0706GP.pdf
http://www.arctecgroup.net/ISB0707GP.pdf

 Don't get me wrong. I loved VMS back in the day. It was a pain in the
 ass at times, but what isn't. It's just that it's not a VM trying to
 execute byte-code... it's an operating system which directly manages
 hardware.
 
 Yeah, but don't forget that for all intents and purposes parrot is an 
 OS trying to execute bytecode, 

VMS security was interesting because it was one of the first systems to
substantially abstract the security of the system from the security of
the hardware. You don't get to touch hardware because you're user-land,
so you have a very different set of concerns. You do, however, have
roughly the same set of concerns as the JVM and CLR. That's why I
suggested them. If you don't like them, that's cool, I was only trying
to save those of you who have enough time to think about something as
large as security infrastructure some time and pain.

I don't have that kind of spare time, so I bow to your superior ability
to manage your schedule.




Re: A sketch of the security model

2005-04-13 Thread Aaron Sherman
 to
execute byte-code... it's an operating system which directly manages
hardware.

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: Pugs Q for the Parrot FAQ?

2005-03-31 Thread Aaron Sherman
On Thu, 2005-03-31 at 12:04, Nicholas Clark wrote:

 Patches welcome, as I'm not sure of the best way to phrase the cross
 language stuff to follow on smoothly.

Also, Parrot provides access to Perl 6 from other languages and to those
other languages from Perl 6 at run-time, a feature which is both complex
and highly beneficial for all concerned.

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: Pugs Q for the Parrot FAQ?

2005-03-30 Thread Aaron Sherman
On Wed, 2005-03-30 at 14:58, Nicholas Clark wrote:
 Based on the wheat on IRC this evening, is this question/answer worth adding
 to the Parrot FAQ on parrotcode.org?
 
 Pugs is going great shakes - why not just toss Parrot and run Perl 6 on Pugs?
 
 Autrijus Tang, the lead on the Pugs project, notes that an *unoptimised*
 Parrot is already 30% faster than Haskell. Add compiler optimisation and a
 few planned optimisations and Parrot will beat Pugs for speed hands down.
 Autrijus thinks that Pugs could be made faster with some Haskell compiler
 tricks, but it's harder work and less effective than the Parrot optimisations
 we already know how to do.

Good answer, and other than adding a bit about cross-language usage I'd
stop there (memory issues are important but complex, and you've already
made your point with this brief answer).

The next question is:

Q: OK, so Parrot is fast... Pugs can back-end to Parrot, right?

A: Yes (though at this time, that's in the early stages). Still, the
ultimate goal is for Perl 6 to be self-hosting (that is, written in
itself) in order to improve introspection, debugger capabilities,
compile-time semantic modulation, etc. For this reason, Pugs will
probably be the compiler that first compiles the ultimate Perl 6
compiler, but thereafter Pugs will no longer be the primary reference
implementation. This is documented by the Pugs team at
http://svn.perl.org/perl6/pugs/trunk/docs/01Overview.html

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: Python, Parrot, and lexical scopes

2004-10-18 Thread Aaron Sherman
On Mon, 2004-10-18 at 07:55, Sam Ruby wrote:
 I've been trying to make sense of Python's scoping in the context of 
 Parrot, and posted a few thoughts on my weblog:
 
 http://www.intertwingly.net/blog/2004/10/18/Python-Parrot-and-Lexical-Scopes

It seems like everything on that page boils down to: all functions are
module-scoped closures.

Your example:

Consider the following scope1.py:

 from scope2 import *
 print f(), foo
 foo = 1
 print f(), foo

and scope2.py:

 foo = 2
 def f(): return foo

The expected output is:

 2 2
 2 1

Is also useful for context, but I don't think you need the Perl
translation to explain it.
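
The "module-scoped closure" reading can be checked in a single file. This is a sketch in Python 3 syntax (the quoted example uses Python 2 print statements); the in-memory module built with types.ModuleType is an assumption standing in for a real scope2.py on disk.

```python
import types

# Stand-in for scope2.py, built in memory so the example is one file.
scope2 = types.ModuleType("scope2")
exec("foo = 2\ndef f(): return foo\n", scope2.__dict__)

# Rough equivalent of `from scope2 import *`: copy the public names.
imported = {k: v for k, v in scope2.__dict__.items() if not k.startswith("_")}
f = imported["f"]
foo = imported["foo"]

first = (f(), foo)
foo = 1  # rebinds the importer's name only; scope2.foo is untouched
second = (f(), foo)

print(*first)
print(*second)

# f still resolves foo through the namespace of the module that defined it.
print(f.__globals__ is scope2.__dict__)
```

The two tuples correspond to the two output lines in the quoted example: f() keeps seeing scope2's foo because a function carries a reference to its defining module's globals.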

-- 
 781-324-3772
 [EMAIL PROTECTED]
 http://www.ajs.com/~ajs



rx_compile and FSAs

2004-10-13 Thread Aaron Sherman
I've done quite a lot of thinking about Parrot's rx_compile op, as I was
thinking about implementing it. However, I've come to the conclusion
that the definition of the op as it stands is too shallow. Please
consider this definition and let me know if implementing it would be
worth it to Parrot as a whole, or if this is a case of being too
generic.

rx_compile out P0, in S0, in I0

Produce an FSA continuation (see fsa_to_continuation) in P0
which matches the regular expression in S0. The syntax of the
regular expression is determined by the type value in I0. This
type must be a valid type as returned by rx_load_type. This op
is simply a combination of rx_to_fsa and fsa_to_continuation

rx_to_fsa out P0, in S0, in I0

Pass the regular expression string parameter to the specified
regular expression parser based on the type parameter (as
returned from rx_load_type). Returns an FSA PMC.

rx_load_type out I0, in S0

Dynamically load an rx compiler by name and set the first
parameter to the identifier for that compiler (for use with
rx_compile). The default, minimalistic regular expression syntax
has identifier 0. The compiler itself must invoke fsa_new at
some point in order to generate its return value.

fsa_new out P0, in I0, in P1, in P2, in I1

Given inputs: max state, alphabet, transition matrix and start
state, produce an FSA output object which implements the
requirements. The alphabet can be one of several values, TBD,
but noted that characters are only one possible type of
alphabet. The transition matrix is an array of arrays, each of
which contains 3 to 5 values: start state, input range(s),
target state and an optional integer set to 1 or 0 to indicate
if this is a final state or not and an optional integer set to
0, 1 or 2 to indicate if this state consumes its input and
records it (0), consumes its input and does not record it (1) or
does not consume the input token (2). Input ranges are going to be
similar values to alphabet. More detail may be added to the
transition matrix as needed. Note that target state may end up
needing to be either an integer state value or a continuation.

fsa_to_continuation out P0, in P1

Take as input a valid FSA and return a continuation which, when
invoked, will simulate the FSA on its input parameter, and
return an FSA status PMC. If the FSA has never been compiled to
Parrot, this op compiles it, otherwise the same continuation is
returned.

For all intents and purposes, assume that this is a black box,
and although in most cases the continuation returned will invoke
newly generated bytecode (representing the FSA's state
transitions), that need not be the case.

fsa_minimize out P0, in P1

Attempt to minimize the FSA and return a new FSA as output.

The goal here is to provide a generic FSA mechanism which can accommodate
the various regular expression syntaxes as input, or can be generated
by hand by a compiler writer who wishes to have control over the
details (or provide FSA semantics over more complex tokens than just
characters). Either way, the result is an FSA which can be executed (aka
simulated) by invoking its continuation.
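
To make the fsa_new and fsa_to_continuation descriptions concrete, here is a minimal sketch in Python. It is not Parrot code, and several things are assumptions: the row layout (start, ranges, target, optional final flag, optional consume mode), reading the "final" flag as marking the target state as accepting, ranges as inclusive (low, high) character pairs, and a plain Python closure standing in for a continuation. The alphabet argument is stored but not otherwise used.

```python
RECORD, SKIP, PEEK = 0, 1, 2  # the three consume modes from the description


def fsa_new(max_state, alphabet, transitions, start_state):
    """Build an FSA from rows of (start, ranges, target[, final[, mode]])."""
    table = {}
    for row in transitions:
        start, ranges, target = row[:3]
        final = row[3] if len(row) > 3 else 0
        mode = row[4] if len(row) > 4 else RECORD
        if not (0 <= start <= max_state and 0 <= target <= max_state):
            raise ValueError("state out of range: %r" % (row,))
        table.setdefault(start, []).append((ranges, target, final, mode))
    return {"alphabet": alphabet, "table": table, "start": start_state,
            "compiled": None}


def fsa_to_continuation(fsa):
    """Return a callable that simulates the FSA; build it only once."""
    if fsa["compiled"] is not None:
        return fsa["compiled"]
    table, start = fsa["table"], fsa["start"]

    def run(text):
        state, accepting, recorded, pos = start, False, [], 0
        while pos < len(text):
            ch = text[pos]
            for ranges, target, final, mode in table.get(state, []):
                if any(lo <= ch <= hi for lo, hi in ranges):
                    break  # first matching row wins
            else:
                return {"matched": False, "recorded": "".join(recorded)}
            if mode == RECORD:
                recorded.append(ch)
            if mode != PEEK:  # a PEEK row must not loop to its own state
                pos += 1
            state, accepting = target, bool(final)
        return {"matched": accepting, "recorded": "".join(recorded)}

    fsa["compiled"] = run
    return run


# Optional sign (consumed but not recorded), then one or more digits.
digits = [("0", "9")]
signed_int = fsa_new(2, "ascii", [
    (0, [("+", "+"), ("-", "-")], 1, 0, SKIP),
    (0, digits, 2, 1),
    (1, digits, 2, 1),
    (2, digits, 2, 1),
], 0)
match = fsa_to_continuation(signed_int)
print(match("+42"), match("4x2"))
```

Calling fsa_to_continuation a second time on the same FSA hands back the same callable, which mirrors the "otherwise the same continuation is returned" wording.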

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: rx_compile and FSAs

2004-10-13 Thread Aaron Sherman
On Wed, 2004-10-13 at 10:29, Leopold Toetsch wrote:
 Aaron Sherman [EMAIL PROTECTED] wrote:
  I've done quite a lot of thinking about Parrot's rx_compile op, as I was
  thinking about implementing it.
 
 Given that rx_compile syntax and semantics aren't really final and
 second that compiling a rx takes substantial time, I'd do something like
 this:
[...]
 You can experiment with needed methods, implement new ...
 You can subclass the Rx_Compiler, implement it in PIR and what not.
 
 Eventually for gaining the last bit of speed, we could make opcodes for
 the methods.

Sounds good. I need to look at the NCI stuff. I was going to skip over
that at first and build the default rx compiler (value 0) inline and
focus on the FSA-to-bytecode implementation, but if you think that
that's going to need up-front engineering, I'll look at it now (well,
now as in when I get home tonight).

-- 
 781-324-3772
 [EMAIL PROTECTED]
 http://www.ajs.com/~ajs



Re: rx_compile and FSAs

2004-10-13 Thread Aaron Sherman
On Wed, 2004-10-13 at 10:44, Matt Fowles wrote:

 I am of the opinion that we should treat regular expression as simply
 another language.  Thus one can register different compilers for
 different regex syntaxes and we do not need to add more opcodes for
 them.

That is essentially what I've proposed, however it is important to
realize 2 things:

  * Just because parrot exports a set of regex syntaxes does not
mean that those are the syntaxes that the user of a language
will see. Language designers might pre-process down to one of
those, or they might simply avoid all of them.
  * I'm shifting the interface from rx_compile (which is still
there, though possibly as a PMC, given Leo's comments) to
new_fsa, and letting language designers write their own rx
compiler. Because states in the FSA can be continuations (which,
yes, means it's not a true FSA, because it's not finite) you
should be able to implement arbitrarily complex regexes, and
even Perl 6 rules (which are not regular expressions or true
FSAs at all because they maintain state as they recurse
infinitely) without stepping outside of this framework.
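
The "register a compiler per syntax" idea can be sketched in a few lines of Python. The registry, register() and compile_pattern() are hypothetical names for illustration, not Parrot's compiler-registration interface; only re and fnmatch.translate are real library calls. Two different surface syntaxes end up behind the same callable interface, which is the property being discussed.

```python
import fnmatch
import re

COMPILERS = {}  # syntax name -> function turning pattern source into a matcher


def register(name, compile_fn):
    COMPILERS[name] = compile_fn


def compile_pattern(name, source):
    """Look up the compiler for a syntax and hand back an opaque matcher."""
    return COMPILERS[name](source)


# Two syntaxes, one interface: each compiler returns a callable on strings.
register("regex", lambda src: re.compile(src).fullmatch)
register("glob", lambda src: re.compile(fnmatch.translate(src)).match)

is_pir = compile_pattern("glob", "*.pir")
is_sum = compile_pattern("regex", r"\d+\+\d+")

print(bool(is_pir("wptest.pir")), bool(is_pir("wptest.pg")), bool(is_sum("1+1")))
```

A language front end could register its own compiler under a new name without the callers of compile_pattern() changing at all.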

I am certain that I will make many mistakes while implementing this, as
I'm learning how to create and manage FSAs, but putting an
implementation in place seems to me to be the right way to start.

   This also has the advantage of placing their internals in a
 black box off to the side.  So, the regex compiler can choose to
 aggressively compile and optimize from the start or do it lazily at its
 whim while hiding behind the interface that the compile opcode already
 presents.

Again, this is all true, though it is also possible for the compiler
writer to take control by directly building and managing the FSA (e.g.
directing when/if minimization takes place, which is an optimization
that MUST take place before Parrot bytecode is generated).
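
For readers who have not met minimization before, here is a small sketch of what such a step might do, written in Python. It assumes a complete deterministic automaton given as a dict of dicts, which is narrower than the FSAs discussed in this thread, and it uses Moore-style partition refinement; the function name and argument layout are made up for the example. Unreachable states are not pruned.

```python
def fsa_minimize(states, alphabet, delta, start, finals):
    """Merge indistinguishable states of a complete DFA."""
    # Start with two blocks: accepting and non-accepting states.
    block = {s: int(s in finals) for s in states}
    while True:
        ids = {}
        refined = {}
        for s in states:
            # Two states stay together only if they sit in the same block
            # and every symbol sends them to the same blocks.
            signature = (block[s],) + tuple(block[delta[s][a]] for a in alphabet)
            refined[s] = ids.setdefault(signature, len(ids))
        stable = len(ids) == len(set(block.values()))
        block = refined
        if stable:
            break
    new_delta = {block[s]: {a: block[delta[s][a]] for a in alphabet}
                 for s in states}
    return new_delta, block[start], {block[s] for s in finals}


# Strings over {a, b} ending in "a"; states 1 and 2 are redundant copies.
delta = {
    0: {"a": 1, "b": 0},
    1: {"a": 2, "b": 0},
    2: {"a": 1, "b": 0},
}
small_delta, small_start, small_finals = fsa_minimize(
    [0, 1, 2], ["a", "b"], delta, 0, {1, 2})
print(len(small_delta), small_start, small_finals)
```

Running this before code generation means fewer states to turn into bytecode, which is presumably why the ordering matters.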

The key here is that languages will be able to use diverse regular
expression syntaxes AND directly generated FSAs (e.g. for matching
high-level objects rather than characters), but use them all in the same
way. This is a fundamentally OO approach to FSA management, which was
inspired for me by the good work done by some of the excellent FSA
toolkits out there (which, woefully, are mostly in C++, and not truly
open source, so we cannot use them directly).

Several things are standing in my way right now, but I'm confident that
I'll find solutions:

 1. The concept of an alphabet is, as yet, vague. Obviously ASCII
characters, ISO-Latin-1 characters and Unicode code points are
all possible input ranges, but so too are any finite range of
integers or enumerated values. More research will be required to
find out how to best special-case the common cases and make them
efficient, using the already discussed low-level parrot opcodes
for string matching.
 2. It's not clear how a continuation state behaves when it
invokes its return continuation... how it directs the FSA to its
next state is probably my trickiest problem, as it must be a
mechanism that is compatible with the concept of an FSA (e.g.
you may have to declare what states any given continuation can
trigger upon return, or simply pass multiple return
continuations, one for each possible next state after the
continuation).
 3. I don't yet know if my extended transition matrix is sufficient
for modern pattern-matching (à la Perl 5). Certainly it's not
sufficient for look-behind assertions, but that particular case
might have to be a continuation state, rather than a regular
node in the FSA. Other than that sort of thing, I do want all of
Perl 5's regular expression syntax to be representable in my
Parrot FSAs (which, in case anyone was wondering, I'm thinking
strongly of translating directly from the NDFA, rather than
translating to the corresponding DFA first... that is a harder
problem, but probably one more suited to this application, which
incidentally opens the door to possible auto-parallelization
later on).
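
One way to picture the second option in item 2 (one return continuation per possible next state) is the Python sketch below. All names are hypothetical and plain callables stand in for continuations: each state declares its possible successors up front, and the driver hands it exactly one callable per declared successor, so a state cannot jump anywhere it did not declare.

```python
def balanced_group(text, pos, exits):
    """A state too complex for a table row: match one balanced (...) group."""
    if pos >= len(text) or text[pos] != "(":
        return exits["fail"](pos)
    depth = 0
    for i in range(pos, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return exits["after_group"](i + 1)
    return exits["fail"](pos)


def after_group(text, pos, exits):
    return exits["accept"](pos)


STATES = {"group": balanced_group, "after_group": after_group}
DECLARED = {"group": ["after_group", "fail"], "after_group": ["accept"]}


def drive(start, text):
    state, pos = start, 0
    while state not in ("accept", "fail"):
        # One return continuation per declared successor; each one simply
        # reports (next_state, new_position) back to the driver.
        exits = {name: (lambda p, name=name: (name, p))
                 for name in DECLARED[state]}
        state, pos = STATES[state](text, pos, exits)
    return state == "accept" and pos == len(text)


print(drive("group", "(1+(1+1))"), drive("group", "(1+1"))
```

Because the successor set is declared, the surrounding machine can still be analysed as a graph even though the inside of the state is opaque.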

I'll be working on these over the weekend at my parents' place (nothing
like the ocean to make thinking easier).

-- 
 781-324-3772
 [EMAIL PROTECTED]
 http://www.ajs.com/~ajs



Re: Getting a char from stdin without waiting for \n

2004-10-13 Thread Aaron Sherman
On Wed, 2004-10-13 at 12:10, Matt Diephouse wrote:
 I'm still working on a new Forth implementation...
 
 Forth has a word `key` that gets one character from stdin. It
 shouldn't wait for a newline to get the character. Is there any way to
 implement this currently in PIR?

You can't do this in a standard, portable way in C, so I doubt that PIR
has such a mechanism. Here's a reference from the comp.lang.c FAQ:

http://www.eskimo.com/~scs/C-faq/q19.1.html
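
For comparison, this is roughly what the non-portable route looks like on a POSIX system, sketched in Python rather than PIR. It relies on the `termios` and `tty` modules, which do not exist on Windows, so it illustrates the portability problem rather than solving it. The function name `read_key` is made up for this sketch.

```python
import os
import sys
import termios
import tty


def read_key(fd=None):
    """Read one byte from fd without waiting for a newline (POSIX only)."""
    if fd is None:
        fd = sys.stdin.fileno()
    if not os.isatty(fd):
        # Pipes and files are not line-buffered by a terminal driver.
        return os.read(fd, 1)
    saved = termios.tcgetattr(fd)
    try:
        tty.setcbreak(fd)  # turn off canonical (line-at-a-time) input
        return os.read(fd, 1)
    finally:
        # Always restore the terminal, even if the read is interrupted.
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)
```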

-- 
 781-324-3772
 [EMAIL PROTECTED]
 http://www.ajs.com/~ajs



Re: --pirate and coroutines

2004-10-11 Thread Aaron Sherman
On Mon, 2004-10-11 at 14:03, Sam Ruby wrote:

 Here's a script that will run in both Python and Perl.  It simply will 
 return different results.
 
print 1 + 2,"\n",;
print "45%s8" % 7,"\n",;
print 45 / 7   ,"\n",;
print ['a','b','c'],"\n",;
print ['a'] + ['b'],"\n",;
print 9 + None ,"\n",;

That last line fails under my python, but ignoring that:

example 1: I don't get the complexity here. Python's add is a special
case function that doesn't have much to do with the Parrot addition
operators except in that, in some cases, it calls them (and in other
cases, it calls ops like join). We have to distinguish the job of the
compiler writer from the job of the Parrot interpreter writer here, and
I don't think this is Parrot's concern.

example 2: This is just abusing the difference between % in Python and
in Perl. No shock there. Don't confuse syntax with semantics.

example 3: Python does integer math on integers, Perl does
floating-point. Again, not an issue.

I'll stop there. I think it's clear that there's two desires here. One I
think is reasonable, and one is (IMHO) not.

To want to be able to pass a PyString to a Perl function is fine.

To want to be able to pass a PyString to a Perl function and have it
mutate any code that uses it into Python semantics just doesn't make
sense to me.

The caller's semantics will drive how such things behave, and those
semantics will be imposed by the compiler in many cases, not Parrot.

If, for example, the caller turns:

a + b

into:

[...]
join result, a, b
[...]

then that's what happens, and if a happens to be a PerlString, too bad
that it doesn't think that + can mean concatenation.
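
A toy illustration of that point, in Python: two made-up "compilers" lower the same source expression `a + b` to different operations, and the stand-in VM simply executes what it is handed. The op names and function names are hypothetical, and the Perl-like rule is simplified (real Perl numifies non-numeric strings to 0 instead of raising an error).

```python
# Sketch: two toy "compilers" lower the same source expression `a + b`
# to different operations. The op names are made up for illustration.

def lower_perlish(a, b):
    """Perl-like rule: + is always numeric, so operands are numified."""
    return ("add", float(a), float(b))


def lower_pythonish(a, b):
    """Python-like rule: + on two strings means concatenation."""
    if isinstance(a, str) and isinstance(b, str):
        return ("concat", a, b)
    return ("add", a, b)


def run(op):
    """A tiny stand-in for the VM: it only executes what it is handed."""
    name, left, right = op
    if name == "add":
        return left + right
    if name == "concat":
        return "".join([left, right])
    raise ValueError("unknown op: " + name)
```

The same pair of string operands produces a number under one lowering and a longer string under the other, and the difference comes entirely from the caller's compiler.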

If your compiler turns everything into an object, then the surprises
that develop from using variables that are alien to your object tree are
not, in fact, surprising.

Your concerns about hash behavior make more sense to me, and I do think
that hashing should be a core property of all PMCs. For those who don't
agree, go take a look at the mess that is C++-STL, and try this:

std::hash_map<std::string> myhash;

I love the STL in concept, but in practice it's full of gotchas like
this because the object tree is too disjoint because the foundation was
never set in such a way that the later pieces fit.

-- 
 781-324-3772
 [EMAIL PROTECTED]
 http://www.ajs.com/~ajs



Re: [pid-mode.el] cannot edit

2004-10-05 Thread Aaron Sherman
On Mon, 04 Oct 2004 21:39:59 +0100, Piers Cawley [EMAIL PROTECTED] wrote:
 Stéphane Payrard [EMAIL PROTECTED] writes:
 
  On Fri, Oct 01, 2004 at 06:09:37PM +0200, Jerome Quelin wrote:
  This function is defined in emacs:
 
line-beginning-position is a built-in function.
(line-beginning-position &optional N)

  switch to emacs. :)
 
 Or patch pir-mode.el, your choice.

That should be something like:

(defun line-beginning-position (&optional n)
  "Return the character position of the first character on the current line.
With argument N not nil or 1, move forward N - 1 lines first.
If scan reaches end of buffer, return that position."
  (save-excursion
    ;; forward-line, not next-line: N means N - 1 lines ahead.
    (if n (forward-line (1- n)))
    (beginning-of-line)
    (point)))

no?

-- 
Aaron Sherman
Senior Systems Engineer and Toolsmith
[EMAIL PROTECTED] or [EMAIL PROTECTED]


Re: [pid-mode.el] cannot edit

2004-10-05 Thread Aaron Sherman
On Fri, 2004-10-01 at 18:22, John Paul Wallington wrote:
 Jerome Quelin [EMAIL PROTECTED] writes:
 
  And the minibuffer tells me:
  Symbol's function definition is void: line-beginning-position
 
  I'm using xemacs 21.4.14
 
 You could put something like:
 (defalias 'line-beginning-position 'point-at-bol)
 in your XEmacs init file.

Sorry, I missed that comment before I wrote my own. Good catch.

-- 
 781-324-3772
 [EMAIL PROTECTED]
 http://www.ajs.com/~ajs



Re: Metaclasses

2004-10-04 Thread Aaron Sherman
On Mon, 4 Oct 2004 13:24:58 -0400, Dan Sugalski [EMAIL PROTECTED] wrote:
 On Mon, 4 Oct 2004 11:45:50 -0400, Dan Sugalski [EMAIL PROTECTED] wrote:
   Okay, color me officially confused. I'm working on the assumption
   that metaclasses are needed, but I don't, as yet, understand them.

 At 12:09 PM -0400 10/4/04, Michael Walter wrote:
 http://members.rogers.com/mcfletch/programming/metaclasses.pdf

 I do have that one. Unfortunately it's the PDF of slides and, while
 it looks like if I was at the talk it'd all make sense, without the
 talk that goes with 'em... not so much sense.

A metaclass is simply an object which represents the class itself and
can perform operations on the class. One (IMHO bad) reason to do this
is for aspect oriented programming, AKA making object oriented
programming even harder to debug. This is where you modify a class
on-the-fly to inject behaviors or conditions before or after events
(usually method invocations, specifically).

Metaclasses can also be used to do things like sub-class on the fly
(e.g. mix-ins). When you do this, you invoke a method on the metaclass
which requests a new class (a sort of copy constructor for classes)
with a new set of behaviors (usually via an inheritance mechanism).

Perl 6, for example, will be able to say:

  my Dog $spot .= new;
  my Dog $greyhound := $spot but Animal::Fast;

or something that looks strikingly like that. In this case, $greyhound
isn't really a Dog, it's an anonymous class type which was generated
by the "but" and instantiated from $spot.

More importantly, you could do the same thing when you say:

  my Dog $spot .= new;
  my Dog $dogbiscut .= new;
  $dogbiscut.race($spot but Animal::Fast);

Here, the Dog.race method takes, we presume, a Dog, but we're passing
it something that has an additional role attached (Animal::Fast).
Everything still works, but the role might change how this particular
dog works in ways that the original Dog designer didn't have in mind.
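
As a rough analogue of the `but` examples, here is a sketch in Python, where the built-in metaclass `type` can build a class at run time. The helper name `but` and the classes `Dog` and `Fast` are stand-ins invented for this sketch, and this is not a claim about how Perl 6 or Parrot would implement it. One difference from the description above: the generated class here is a real subclass of `Dog`.

```python
import copy


def but(obj, role):
    """Return a copy of obj whose class is an anonymous subclass with role mixed in."""
    base = type(obj)
    # Ask the metaclass (type) for a brand-new class: role first, then base.
    mixed = type(base.__name__ + "+" + role.__name__, (role, base), {})
    clone = copy.copy(obj)
    clone.__class__ = mixed
    return clone


class Dog:
    def speed(self):
        return 10

    def race(self, other):
        return "win" if self.speed() > other.speed() else "lose"


class Fast:
    """Role that changes behavior the Dog author did not plan for."""

    def speed(self):
        return 2 * super().speed()
```

Here `Dog.race` is unchanged and still accepts the mixed-in object, but the role alters the outcome of the race.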

IMHO this is the correct reason (though there are other reasons,
mostly dealing with introspection and debugging) to want metaclasses.
Mixins for Python, Ruby and Perl 6 become trivial given this
mechanism. Of course, I'm not SURE you need to put this in Parrot...
it really depends on how valuable it is to be able to do it the same
way in all compilers.

If you want to embrace this in Parrot, you're probably going to want
to use something akin to the Perl 6 model, since it's designed to be
able to emulate the Python, Ruby and Scheme models.

-- 
Aaron Sherman
Senior Systems Engineer and Toolsmith
[EMAIL PROTECTED] or [EMAIL PROTECTED]


Re: rand opcodes are deprecated

2004-09-28 Thread Aaron Sherman
On Tue, 2004-09-28 at 03:53, Leopold Toetsch wrote:
 We already have the Random PMC with vtables to create random numbers. 
 There's really no need to have opcodes too. If there aren't serious 
 arguments for keeping these opcodes, they'll be removed for the release.

Didn't you and I specifically have this discussion when I wrote those
opcodes? I don't recall the justification at the time, but I thought
that we'd reached an understanding then.

The opcodes only act as a front-end to the singleton PMC, so there's no
duplication of the implementation.

-- 
 781-324-3772
 [EMAIL PROTECTED]
 http://www.ajs.com/~ajs



Re: Namespaces again

2004-09-28 Thread Aaron Sherman
On Mon, 2004-09-27 at 13:04, Chip Salzenberg wrote:

 For Perl, I get that.  But for Python, AFAICT, namespaces are
 *supposed* to be in the same, er, namespace, as variables.  No?

Yes, and what's more the suggestion of using :: in Parrot won't work
perfectly either (I'm pretty sure that there are LISP variants, possibly
including Scheme, that use : in identifiers).

Rather than trying to shuffle through the keyboard and find that special
character that can be used, why not have each language do it the way
that language is comfortable with (e.g. place it in the regular namespace
as a variable like Python, or place it in the regular namespace but
append noise like Perl, or hide it in some creative way for other
languages)? For the most part, there's no performance penalty in having
a callback that the language/library/compiler provides because access to
the objects in question will be via a PMC, and only LOOKUP of that PMC
will be via namespace, no?

In that way, you could:

namespace_register  perl5_namespace_callback, "Perl5"
namespace_register  python_namespace_callback, "Python"
[...]
namespace_lookup    P6, "F\0o::Bar", "Perl5"
namespace_lookup    P7, "foo.bar", "Python"

the namespace callback could take a string and return whatever Parrot
needs to look up a namespace (Array PMC?), having encoded it according
to Parrot's rules.
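
A sketch of the callback idea in Python, assuming the namespace tree is just nested dictionaries. The function names mirror the hypothetical ops above and are not a real Parrot API.

```python
# Sketch: each language registers a callback that turns its own spelling
# of a qualified name into a neutral list of path components.

_callbacks = {}


def namespace_register(callback, language):
    """Remember which callback understands names written in `language`."""
    _callbacks[language] = callback


def namespace_lookup(root, name, language):
    """Split `name` with the language's callback, then walk nested dicts."""
    node = root
    for part in _callbacks[language](name):
        node = node[part]
    return node


namespace_register(lambda name: name.split("::"), "Perl5")
namespace_register(lambda name: name.split("."), "Python")
```

Only the lookup goes through the callback; once the caller holds the object it found, later accesses never touch the namespace code again.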

That way, you can solve this however you like (heck, put a  between
them if you want... in fact, I kind of like that).

-- 
 781-324-3772
 [EMAIL PROTECTED]
 http://www.ajs.com/~ajs



Re: Namespaces again

2004-09-28 Thread Aaron Sherman
On Tue, 2004-09-28 at 12:05, Jeff Clites wrote:
 On Sep 28, 2004, at 7:02 AM, Aaron Sherman wrote:
 
  why not have each language do it the way
  that language is comfortable (e.g. place it in the regular namespace as
  a variable like Python or place it in the regular namespace, but
  append noise like Perl or hide it in some creative way for other
  languages).

 That's similar in spirit to what I proposed of allowing PMC-subclassing 
 of the default ParrotNamespace, so that namespaces created from 
 different languages (often implicitly) could have different behaviors. 
 But I'd keep the pulling-apart of F\0o::Bar into [F\0o; Bar] a 
 compile-time task, so that at runtime the work's already been done 
 (since the compiler knows what language it's compiling).

Sounds reasonable, though you can't ALWAYS do it in advance; you
certainly want to do as much as possible up-front.

-- 
 781-324-3772
 [EMAIL PROTECTED]
 http://www.ajs.com/~ajs



Re: Why lexical pads

2004-09-24 Thread Aaron Sherman
On Fri, 2004-09-24 at 10:03, KJ wrote:

 So, my question is, why would one need lexical pads anyway (why are they 
 there)?

They are there so that variables can be found by name in a lexically
scoped way. One example, in Perl 5, of this need is:

my $foo = 1;
return sub { $foo ++ };

Here, you keep this pad around for use by the anon sub (and anyone else
who still has access to that lexical scope) to find and modify the same
$foo every time. In this case it doesn't look like a by-name lookup,
and once optimized, it probably won't be, but remember that you are
allowed to say:

perl -le 'sub x {my $foo = 1; return sub { ${"foo"}++ } }$x=x();print $x->(), $x->(), $x->()'

Which prints 012 because of the ability to find foo by name.

Of course, you can emulate this behavior, but in doing so, you're going
to have to invent the pad :)

Someone else suggested that you need this for string eval, but you don't
really. You need it for by-name lookups, which string evals just happen
to also need. If you can't do by-name lookups, then string eval doesn't
need pads (and thus won't be able to access locals).
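
To show what "inventing the pad" amounts to, here is a minimal sketch in Python: a pad is a name-to-value table with a link to the enclosing scope, and a closure keeps its pad alive. The `Pad` class and `make_counter` are illustrative names, not Parrot's actual data structures.

```python
class Pad:
    """One lexical scope: a name-to-value table plus a link to the outer scope."""

    def __init__(self, outer=None):
        self.names = {}
        self.outer = outer

    def declare(self, name, value):
        self.names[name] = value

    def find(self, name):
        """Walk outward until some pad holds `name`; return that pad."""
        pad = self
        while pad is not None:
            if name in pad.names:
                return pad
            pad = pad.outer
        raise NameError(name)


def make_counter():
    pad = Pad()
    pad.declare("$foo", 1)

    def counter():
        # By-name lookup at call time; the closure keeps `pad` alive.
        home = pad.find("$foo")
        value = home.names["$foo"]
        home.names["$foo"] = value + 1
        return value

    return counter
```

Each call to `make_counter` gets a fresh pad, so two counters never share a `$foo`.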

-- 
 781-324-3772
 [EMAIL PROTECTED]
 http://www.ajs.com/~ajs



Re: Why lexical pads

2004-09-24 Thread Aaron Sherman
On Fri, 2004-09-24 at 12:36, Jeff Clites wrote:

 Ha, your example is actually wrong (but tricked me for a second). 
 Here's a simpler case to demonstrate that you can't look up lexicals by 
 name (in Perl5):

You are, of course, correct. If I'd been ignorant of that in the first
place, this would be much less embarrassing ;-)

However, the point is still sound, and that WILL work in P6, as I
understand it.

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: Synopsis 9 draft 1

2004-09-13 Thread Aaron Sherman
On Mon, 2004-09-13 at 07:19, [EMAIL PROTECTED] wrote:
 On Mon, 6 Sep 2004, Aaron Sherman wrote:
   Sized low-level types are named most generally by appending the number
   of bits to a generic low-level type name:
  
   [...] int1 int2 int4 int8 int16 int32 int64
  
 
  Ok, so Parrot doesn't have those. Parrot has int.
 
 The above generic low-level types are specific instances of a more general
 specification-based type system, with features grouped roughly as:

Martin, I don't think you can reasonably have the integer registers
typed so as to allow for multiple storage representations. For one,
the very fact that they lack such baggage is what makes them useful.

It *may* make sense to provide an unsigned, 64-bit integer somewhere,
though (I hesitate to say as an alternate register type, since that
touches so much of Parrot).

The real question is this: is this just a Perl 6 thing (if so, then it's
fodder for the newly created p6c, and we should drop it), or will/should
other high level languages be defining sized integer types through
Parrot? If so, then I don't think the current idea of having an
Integer PMC is going to be as generic as suggested.

If you think that the limitation of not having a handy 64-bit type on a
32-bit system is no big deal, check out the convolutions one Python user
suggests under Windows just to store the time:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/303344

Yeah, you're gonna want to not do that ;-)

-- 
 781-324-3772
 [EMAIL PROTECTED]
 http://www.ajs.com/~ajs



Re: multiple languages clarification - newbie

2004-09-09 Thread Aaron Sherman
On Wed, 2004-09-08 at 18:02, Richard Jolly wrote:

 Can you really do this:
 
 #!/usr/bin/perl6
 use __Python::sys;# whatever syntax
 sys.stdout.write( 'hi there');# perl6 syntax matches python syntax 

There's some confusion in the responses between syntax merging (not
appropriate for p6i, and not really what you asked) and access to
modules in other languages.

Yes, what you ask is essentially possible. It's not clear that you would
use ".", though given the way Python does module hierarchies, maybe (it is
sort of closer to a method invocation than a sub-namespace).

Parrot represents namespaces, calling conventions and high level data in
generic ways so that they can be moved between languages which might
have different syntax and semantics.

Because of that something like:

foo("bar", 1, new Chunder);

in Perl 5 / Ponie could easily call a foo that was defined in some
other Parrot language, say Python:

def foo (x, y, z):
    print "String was %s, number was %d, other %s"%(
        x, y, z.yawn())

In the case of the string, Python would get a PMC which it could perform
all of the normal string operations on, even though it would be a
Perlish string object rather than a pythonish string object.

Same basic deal in terms of the number.

In the case of the object, Python would get a PMC object which has a
means of invoking a method. Python would invoke yawn on z with no
parameters and the Perl:

package Chunder;
...
sub yawn { my $self = shift; print "Now yawning\n" }

would be invoked as normal.

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: No Autoconf, dammit!

2004-09-09 Thread Aaron Sherman
On Wed, 2004-09-08 at 12:40, Larry Wall wrote:

 have to be careful to separate architectural parameters from policy
 parameters.  An architectural parameter says your integers are 32 bits.
 A policy parameter says you want to install the documentation in the
 /foo/bar/baz directory.  Cross compilation has to nail down the
 architectural parameters while potentially deferring decisions on
 policy to a later installation step.

Actually, I'd say that they're all architectural parameters, and you
want to put them all in the database (e.g. you should define that for
Fedora Linux Core 2, the default documentation area is /usr/share/man,
but for SunOS 3, it's /usr/man). What the person compiling the program
OVERRIDES is their call (and in some contexts, the size of integers is a
POLICY decision, not architectural).
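
One way to picture "everything lives in the database, and the builder overrides what they like" is a layered lookup. This is a sketch in Python using `collections.ChainMap`. The two man-page paths come from the paragraph above; the key names and the `int_bits` entries are made up for illustration.

```python
from collections import ChainMap

# Per-platform defaults; the two man-page paths come from the message above,
# the key names and the int_bits values are made up for this sketch.
PLATFORM_DEFAULTS = {
    "fedora-core-2": {"mandir": "/usr/share/man", "int_bits": 32},
    "sunos-3": {"mandir": "/usr/man", "int_bits": 32},
}


def build_config(platform, overrides=None):
    """Layer the builder's overrides on top of the platform's defaults."""
    return ChainMap(dict(overrides or {}), PLATFORM_DEFAULTS[platform])
```

Nothing in the lookup distinguishes "architectural" keys from "policy" keys; any of them can be overridden, which is the point being argued.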

-- 
 781-324-3772
 [EMAIL PROTECTED]
 http://www.ajs.com/~ajs



Cross compiling (extracting knowlege from autoconf?)

2004-09-09 Thread Aaron Sherman
On Wed, 2004-09-08 at 17:40, Rhys Weatherley wrote:
 On Thursday 09 September 2004 02:40 am, Larry Wall wrote:
 
  An interesting question would be whether we can bootstrap a Parrot
  cross-compile database using autoconf's *data* without buying into the
  shellism of autoconf.  Or give someone the tool to extract the data
  from the autoconf database themselves, so we don't have to ship it.
 
 What autoconf database?  Autoconf uses probing for cross-compilation as well.  

Well, that's one of the big problems with autoconf: it's NOT a database.
For example:

# AC_FUNC_GETMNTENT
# -----------------
AN_FUNCTION([getmntent], [AC_FUNC_GETMNTENT])
AC_DEFUN([AC_FUNC_GETMNTENT],
[# getmntent is in -lsun on Irix 4, -lseq on Dynix/PTX, -lgen on Unixware.
AC_CHECK_LIB(sun, getmntent, LIBS="-lsun $LIBS",
  [AC_CHECK_LIB(seq, getmntent, LIBS="-lseq $LIBS",
    [AC_CHECK_LIB(gen, getmntent, LIBS="-lgen $LIBS")])])
AC_CHECK_FUNCS(getmntent)
])

There's knowledge encoded in that, but it's not abstracted sufficiently.
Some assumptions could be made, and autoconf's knowledge could be
distilled a bit and then extracted into the [cross-]compiling database
that would be needed by Parrot.
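
As a sketch of what the distilled form might look like, the knowledge in that one macro fits in a small lookup table, written here in Python. The three library names restate the comment in the macro; the platform key spellings and the function name `link_flags` are assumptions for this sketch, not anything autoconf provides.

```python
# Which extra library provides a function on which platform. The three
# getmntent entries restate the comment in the macro above; the platform
# key spellings are assumptions made for this sketch.
EXTRA_LIBS = {
    "getmntent": {
        "irix4": "sun",
        "dynix-ptx": "seq",
        "unixware": "gen",
    },
}


def link_flags(function, platform):
    """Return the -l flags needed to link `function` on `platform`."""
    lib = EXTRA_LIBS.get(function, {}).get(platform)
    return ["-l" + lib] if lib else []
```

A table like this can be consulted when cross-compiling, where probing the target by compiling and running test programs is not an option.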

-- 
 781-324-3772
 [EMAIL PROTECTED]
 http://www.ajs.com/~ajs



Re: No-C, no programming project: Some configure investigation

2004-09-07 Thread Aaron Sherman
On Tue, 2004-09-07 at 08:00, Jens Rieks wrote:
 On Tuesday 07 September 2004 07:52, Robert Schwebel wrote:
  Would autoconf/automake be an option for the C part of parrot?
 No, its only available on a few systems.

Ok, this is probably a moot conversation because Metaconfig
(http://www.linux-mag.com/2002-12/compile_03.html) was written by Larry
Wall for rn, and the Perl community has some serious social inertia when
it comes to switching to any other configuration tool.

That said, autoconf is "only available on a few systems", with "a few"
being defined as "everything I've ever heard of".

Seriously, I've never come across any system that lacked autoconf
support AND on which a high level language, like those that would
target Parrot, ran. If you're referring to the number of systems that have
autoconf itself actually INSTALLED by default, that's just as moot as
the fact that almost no systems have Metaconfig installed. You never run
Metaconfig or autoconf as an end-user/installer, you run the resulting
[Cc]onfigure script.

autoconf (+automake, etc.) is an excellent tool, and while Metaconfig
is somewhat more limited, it too is an excellent tool, especially for
handling high level languages. Neither tool is "wrong" for the job, but
I expect that people who install Parrot will not be shocked by the
classic "-des -Dprefix=/usr" type of invocation.

-- 
 781-324-3772
 [EMAIL PROTECTED]
 http://www.ajs.com/~ajs



Re: Namespaces

2004-09-07 Thread Aaron Sherman
On Tue, 2004-09-07 at 09:26, Dan Sugalski wrote:

 *) Namespaces are hierarchical
 
 So we can have [foo; bar; baz] for a namespace. Woo hoo and all 
 that. It'd map to the equivalent perl namespace of foo::bar::baz.
[...]
It's also possible to hoist a sub-space up a few levels, so that the 
 [IO] space and the [__INTERNAL; Perl5, IO] namespace are the 
 same thing

This sounds fine, except for the higher level question of who controls
the root. That is, does a module Python decide to define its bits in the
Python space AND export them to the root?

The other way to go is to say:

#!/usr/bin/perl6
use __Python::os;

Which has the interesting result that no one ever need touch the root.
There's simply a search path that each language uses that would default
to [] and its own [__Internal;$language].
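
A sketch of that search-path idea in Python, with the namespace tree modeled as nested dictionaries. The tree contents and the name `resolve` are invented for illustration, and the real lookup rules may well differ.

```python
def resolve(root, search_path, name):
    """Return the first binding of `name` found along `search_path`.

    `root` is a tree of nested dicts; each entry of `search_path` is a
    list of keys naming one namespace, with [] meaning the root itself.
    """
    for path in search_path:
        node = root
        try:
            for key in path:
                node = node[key]
        except KeyError:
            continue  # this namespace does not exist; try the next one
        if name in node:
            return node[name]
    raise LookupError(name)


ROOT = {
    "IO": "shared IO",
    "__INTERNAL": {
        "perl5": {"split": "perl5 split"},
        "python": {"len": "python len"},
    },
}
PERL5_PATH = [[], ["__INTERNAL", "perl5"]]
```

With this path, Perl 5 code sees the root and its own private space, and nothing from another language leaks in unless it is imported explicitly.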


 Alternate names are fine. I'm seriously tempted to make it [\0\0]

Heehee. I don't think __INTERNAL is so bad.

 *) Each language has its own private second-level namespace. Core 
 library code goes in here.
 
 So perl 5 builtins, for example, would hang off of [__INTERNAL; 
 perl5] unless it wants something lower-down

Ok, so that's where split would go, right? Does that mean that if, in
Python, I wanted to use Perl 5's split, I'd just have to:

import __INTERNAL.perl5
list = split '\\ba(?=b)', the_string, 5

? That's some nifty beans!

Sounds great, Dan.

-- 
 781-324-3772
 [EMAIL PROTECTED]
 http://www.ajs.com/~ajs



Re: No-C, no programming project: Some configure investigation

2004-09-07 Thread Aaron Sherman
On Tue, 2004-09-07 at 11:59, Andrew Dougherty wrote:

 Both autoconf and metaconfig assume a unix-like environment.  Ambitious
 plans for parrot's configure include non-unix environments too, such as
 VMS and all the ports where perl5 uses a manually-generated config.*
 template.

autoconf assumes m4 and shell and some other primitive tools, all of
which have GNU ports to just about everything I've had to touch. VMS is
the bastard child of autoconf right now, but back when VMS was a
platform that folks used, it certainly was supported (as late as 1999 it
worked great). I don't think it's the UNIXishness of the platform as
much as the popularity of the platform that guides autoconf support
(e.g. any platform for which patches are contributed). The guy who tried
to update the autoconf 1.x version this year kind of kicked the autoconf
guys in the shins verbally and got nowhere as a result. He then forked
it and has since apparently dropped support.

That said, if you have a manually generated template as you do for Perl
5, you can do the same for autoconf, no?

I'm not advocating autoconf here, just exploring the lay of the land.

Personally, I think Parrot should do whatever makes it easiest for
maintainers.

-- 
 781-324-3772
 [EMAIL PROTECTED]
 http://www.ajs.com/~ajs



SDL usage broken?

2004-09-07 Thread Aaron Sherman
When I try to run one of the SDL examples (any of them), I get:

SDL::fetch_layout warning: layout 'Pixels' not found!
Segmentation fault

When I edit runtime/parrot/library/SDL.imc and add the call to
_set_Pixels_layout in at line 60, I remove that warning, but still get
the seg-fault:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -150897856 (LWP 15959)]
0x08161133 in ins_writes2 (ins=0x8985c20, t=75) at imcc/instructions.c:138
138 if (ins->opnum == w_special[i + (strchr(types, t) - types)])
(gdb) bt
#0  0x08161133 in ins_writes2 (ins=0x8985c20, t=75) at imcc/instructions.c:138
#1  0x08162fd4 in analyse_life_block (interpreter=0x879d008, bb=0x8aeff98,
r=0x8983398) at imcc/cfg.c:583
#2  0x08162e3f in analyse_life_symbol (interpreter=0x879d008, unit=0x897ed48,
r=0x8983398) at imcc/cfg.c:523
#3  0x08162d86 in life_analysis (interpreter=0x879d008, unit=0x897ed48)
at imcc/cfg.c:499
#4  0x08164137 in imc_reg_alloc (interpreter=0x879d008, unit=0x897ed48)
at imcc/reg_alloc.c:172
#5  0x0815efdc in imc_compile_unit (interp=0x879d008, unit=0x897ed48)
at imcc/imc.c:111
#6  0x0815ef33 in imc_compile_all_units (interp=0x879d008) at imcc/imc.c:68
#7  0x0815edc6 in compile_file (interp=0x879d008, file=0x8b3ff90)
at imcc.l:920
#8  0x0816cb8b in imcc_compile_file (interp=0x879d008,
s=0x897e4d0 "library/SDL/Surface.imc") at imcc/parser_util.c:574
#9  0x081fb259 in pcf_p_It (interpreter=0x879d008, self=0x893b0c8)
at src/nci.c:1862
#10 0x081d16a4 in Parrot_Compiler_invoke (interpreter=0x879d008,
pmc=0x893b0c8, code_ptr=0x8926e50) at classes/compiler.c:56
#11 0x080a38f8 in Parrot_load_bytecode (interpreter=0x879d008,
filename=0x8976de0 "library/SDL/Surface.imc") at src/packfile.c:3103
#12 0x080ede8c in Parrot_load_bytecode_sc (cur_opcode=0x8980b70,

Hope this helps!
-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: Synopsis 9 draft 1

2004-09-06 Thread Aaron Sherman
Taking this to p6i, in order to get Parroty for a few

On Thu, 2004-09-02 at 19:47, Larry Wall wrote:
 =head1 Overview
 
 This synopsis summarizes the non-existent Apocalypse 9, which
 discussed in detail the design of Perl 6 data structures.

[...]

 =head1 Sized types
 
 Sized low-level types are named most generally by appending the number
 of bits to a generic low-level type name:
 
 int1
 int2
 int4
 int8
 int16
 int32 (aka int on 32-bit machines)
 int64 (aka int on 64-bit machines)

Ok, so Parrot doesn't have those. Parrot has int. Presumably this
means that when the high-level language programmer (Perl 6 here, but
that's just an example) tries to get lower level by explicitly using a
sized type, they're going to have to be working in a PMC of some type
like PerlInt16, which (for reasons of overflow behavior and a few other
things) can almost never be optimized down into an integer register.

It seems to me that this causes a dilemma for high-level languages where
providing to the user what appears to be finer grained control over
implementation actually makes them work at a higher level of
abstraction.

How would Parrot expect languages to implement such features? Should
there be a set of (highly JIT-optimizable) PMCs that present sized type
features, should the core register types be sizable somehow or should
languages just be left to roll their own PMCs that do whatever they
want?
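
For a sense of what a "roll your own" sized type has to do on every operation, here is a sketch in Python of a signed integer that wraps at a fixed width. Wrap-around on overflow is an assumption of this sketch (the synopsis may specify something else), and `SizedInt` is a made-up name, not a Parrot PMC.

```python
class SizedInt:
    """A signed integer that wraps around at a fixed bit width."""

    def __init__(self, value, bits):
        self.bits = bits
        mask = (1 << bits) - 1
        value &= mask
        # Reinterpret the top bit as the sign (two's complement).
        if value >> (bits - 1):
            value -= 1 << bits
        self.value = value

    def __add__(self, other):
        # Every addition re-normalizes to the declared width.
        return SizedInt(self.value + int(other), self.bits)

    def __int__(self):
        return self.value
```

The masking and sign fix-up on every operation is the extra work that a plain integer register does not do.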

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: No-C, no programming project: Some configure investigation

2004-09-06 Thread Aaron Sherman
On Mon, 2004-09-06 at 12:42, Dan Sugalski wrote:
 Right now configure.pl pulls a bunch of configuration information 
 straight out of the current perl configuration. We need to stop that, 
 and this is as good a time as any.
 
 If someone could go through and make a list of what info configure.pl 
 pulls from perl, I'll start writing (or snagging :) the probing code 
 to do it ourselves, so we can be perl-free, at least from a 
 configuration standpoint.

I think right now that info is all in config/init/data.pl, and it's
actually fairly well documented. Here's all of the variables that rely
on the %Config data from Perl's Config.pm:

optimize      => $optimize ? $Config{optimize} : '',
# Compiler -- used to turn .c files into object files.
# (Usually cc or cl, or something like that.)
cc            => $Config{cc},
ccflags       => $Config{ccflags},
ccwarn        => exists($Config{ccwarn}) ? $Config{ccwarn} : '',
# Flags used to indicate this object file is to be compiled
# with position-independent code suitable for dynamic loading.
cc_shared     => $Config{cccdlflags}, # e.g. -fpic for GNU cc.
# Linker, used to link object files (plus libraries) into
# an executable.  It is usually $cc on Unix-ish systems.
# VMS and Win32 might use Link.
# Perl5's Configure doesn't distinguish linking from loading, so
# make a reasonable guess at defaults.
link          => $Config{cc},
linkflags     => $Config{ldflags},
# ld:  Tool used to build dynamically loadable libraries.  Often
# $cc on Unix-ish systems, but apparently sometimes it's ld.
ld            => $Config{ld},
ldflags       => $Config{ldflags},
libs          => $Config{libs},
exe           => $Config{_exe},   # executable files extension
ld_shared     => $Config{lddlflags},
ar            => $Config{ar},
ranlib        => $Config{ranlib},
make          => $Config{make},
make_set_make => $Config{make_set_make},


-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: No-C, no programming project: Some configure investigation

2004-09-06 Thread Aaron Sherman
On Mon, 2004-09-06 at 18:29, Aaron Sherman wrote:
 On Mon, 2004-09-06 at 12:42, Dan Sugalski wrote:

  If someone could go through and make a list of what info configure.pl 
  pulls from perl, I'll start writing (or snagging :) the probing code 
  to do it ourselves, so we can be perl-free, at least from a 
  configuration standpoint.

 optimize      => $optimize ? $Config{optimize} : '',
 # Compiler -- used to turn .c files into object files.
 # (Usually cc or cl, or something like that.)
 cc            => $Config{cc},
 ccflags       => $Config{ccflags},
 ccwarn        => exists($Config{ccwarn}) ? $Config{ccwarn} : '',
 # Flags used to indicate this object file is to be compiled
 # with position-independent code suitable for dynamic loading.
 cc_shared     => $Config{cccdlflags}, # e.g. -fpic for GNU cc.
 # Linker, used to link object files (plus libraries) into
 # an executable.  It is usually $cc on Unix-ish systems.
 # VMS and Win32 might use Link.
 # Perl5's Configure doesn't distinguish linking from loading, so
 # make a reasonable guess at defaults.
 link          => $Config{cc},
 linkflags     => $Config{ldflags},
 # ld:  Tool used to build dynamically loadable libraries.  Often
 # $cc on Unix-ish systems, but apparently sometimes it's ld.
 ld            => $Config{ld},
 ldflags       => $Config{ldflags},
 libs          => $Config{libs},
 exe           => $Config{_exe},   # executable files extension
 ld_shared     => $Config{lddlflags},
 ar            => $Config{ar},
 ranlib        => $Config{ranlib},
 make          => $Config{make},
 make_set_make => $Config{make_set_make},

Add to that:

archname (used in several config/auto/*.pl files)
sig_name (used in config/auto/*.pl files, but also t/ and lib/)
./config/auto/pack.pl:39:if (($] >= 5.006) && ($size == $longsize) && ($size == $Config{longsize}) ) {
./config/auto/pack.pl:45:elsif ($size == 8 || $Config{use64bitint} eq 'define') {
parrotbug/ uses a bunch of Config fields too

And then there's everything that uses an i_* field from Configure::Data,
but I only SEE i_malloc getting called there.

And finally, I don't know what uses config_lib.pasm, but it seems to
write out a copy of Perl's Config with some extra stuff added in to
runtime/parrot/include/config.fpmc, so anything that references that
structure too... anyone know what references that?


-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: No-C, no programming project: Some configure investigation

2004-09-06 Thread Aaron Sherman
On Mon, 2004-09-06 at 18:29, Aaron Sherman wrote:

 I think right now that info is all in config/init/data.pl, and it's

Scratch that. I was grepping through the tree for Config{ which turns
out to not catch the way %Config is used in most of the tree... I'll
have a look and get you the details.


-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: Semantics for regexes

2004-09-02 Thread Aaron Sherman
On Wed, 2004-09-01 at 17:00, Larry Wall wrote:

 : Let's get concrete:
 : 
 : rule foo { a $x:=(b*) c }
 : abbabc
 : 
 : So, if I understand Parrot and Perl 6 correctly (heh, fat chance), a
 : slight modification to the calling convention of the closure that
 : represents a rule (possibly even a raw .Closure) could add a pad that
 : the callee is expected to fill in with any hypotheticals defined during
 : execution.
 
 Okay, except that hypotheticality is an attribute of a variable's
 value, not of the pad it's in.

Yes, I think I got that part, and perhaps I was being unclear or am
still missing something. Here's what I was saying, a slightly different
way:

As you enter a rule, you establish a new, free-floating pad. It *is*
stored on the current pad stack (so that its variables are available to
the rule and its closures), but, more importantly, it is part of the
rule's state because it is stored in C<$0>.

When you bind a hypothetical it goes into this pad.

When you unbind a hypothetical (fail/backtrack) it is deleted from this
pad (its value doesn't just get undef).

When you return from the rule (and this is the key), you return C<$0>,
which, along with other state, contains a reference to this pad (and the
pad, of course, contains a circular reference to C<$0>). The caller can
now do one of two things:

  * Push this pad onto its stack. Pro: simple and fast
  * Copy each variable from this pad in a smart way, searching up
the pad stack for a candidate variable to replace, and
defaulting to storing it in the inner-most pad as a new lexical.

I think the second one is the one you are describing (and described in
A5). The first is, IMHO, the cleaner solution, but I'm not suggesting
anything really, just pointing out the options.

My real point is that if you just establish such a free-floating
hypopad (sounds like something Dr. McCoy would use) in the rule, then
you get all of the hypothetical/backtracking behavior that you want,
regardless of how the caller integrates the variables with its scope. It
also keeps rules from having to search up through existing scope levels
themselves, keeping their complexity constrained to what they know best:
matching regular expressions and grammars. Perl's calling conventions
manage all of the extra complexity on return, and that's probably where
stack-walking code should go anyway.
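
To make the two caller-side options concrete, here is a rough sketch in Python rather than Parrot. Everything in it (the HypoPad class, push_pad, copy_smart, lookup, and modelling a pad stack as a list of dicts) is invented for illustration and is not how Parrot or Perl 6 actually implement pads.

```python
class HypoPad:
    """Free-floating pad holding a rule's hypothetical bindings (illustrative)."""

    def __init__(self):
        self.vars = {}

    def bind(self, name, value):
        self.vars[name] = value

    def unbind(self, name):
        # Backtracking deletes the entry outright; it does not just store None.
        self.vars.pop(name, None)


def push_pad(pad_stack, hypopad):
    """Option 1: the caller pushes the returned pad onto its own pad stack."""
    pad_stack.append(hypopad.vars)


def copy_smart(pad_stack, hypopad):
    """Option 2: replace a matching variable found by searching up the stack,
    defaulting to a new lexical in the innermost pad."""
    for name, value in hypopad.vars.items():
        for pad in reversed(pad_stack):  # the innermost pad is the last element
            if name in pad:
                pad[name] = value
                break
        else:
            pad_stack[-1][name] = value


def lookup(pad_stack, name):
    """Resolve a name from the innermost pad outwards."""
    for pad in reversed(pad_stack):
        if name in pad:
            return pad[name]
    raise KeyError(name)
```

In both cases the rule itself only ever calls bind and unbind on its own pad; the stack walking happens in the caller after the rule returns.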

 : Essentially every close-paren triggers binding, and every back-track
 : over a close-paren triggers clearing.
 
 Yes, that's essentially correct.  My quibble was simply that it may be
 hard to keep track of what to clear out in the case of calling a
 failure continuation.

I'm not sure if that's going to be true or not, as thinking in terms of
failure continuations hurts my brain ;-) Still, I'm 99% sure that what I
describe above puts all of the "what to clear" state in the pad that you
return. Nice and easy.

A side point to Dan: In reading P6PE, I don't see an op for deleting an
entry from a pad. At least for this, and I think for some other things
that aren't coming to mind right now, it's probably going to be needed.
If it's already there, but just not in P6PE, cool and thanks! ;-)

-- 
 781-324-3772
 [EMAIL PROTECTED]
 http://www.ajs.com/~ajs



Re: Semantics for regexes

2004-09-02 Thread Aaron Sherman
On Thu, 2004-09-02 at 11:27, Felix Gallo wrote:

 Although the next regex engine has to deal with the horribly
 crufty new perl6 syntax

Keep in mind that Perl 6 regexen are really just Perl 5 regexen with a
call stack and backtracking control. Absolutely everything else that I
see in P6 is either just a different syntax for the same thing (e.g.
character classes) or unrelated to the actual regex engine itself (e.g.
hypotheticals). There's nothing that I see in this that would slow down
a mundane regexp OTHER than Unicode, and in many respects P5 has already
taken that hit.

Now, that's not to say that:

rule perl6_program { perl5_program | perl6_statement(*) }

is supposed to run as fast as a Perl 5 regexp, but that's a WHOLE other
beast, and we don't expect it to be as simple as a regexp.

Under the hood, I would expect that P6 regexen will be broken down into
matching and flow control parts, and handed off to Parrot
differently. While there might or might not be an op for the matching
part, the flow control part is just code (though code with significant
magic, I will admit).

In other words, you might see:

rule { a b c }

get broken down into:

rule { {pasm('regexp P0, P1, a')} b {pasm('regexp P0, P1, c')} }

As Dan points out, regexp might not exist, or (this seems more likely)
might just be a call-back into a tiny regexp compiler that generates
Parrot bytecode for the convenience of languages for which regex is not
a core feature that the compiler would want to get its hands dirty with.

-- 
 781-324-3772
 [EMAIL PROTECTED]
 http://www.ajs.com/~ajs



Re: Semantics for regexes

2004-09-02 Thread Aaron Sherman

Ok, I get it now, thanks Larry.

I do still think that you can do what I suggest, but I realize that it's
not as easy as handing around a single pad, you would actually need to
maintain either a list of pads (outside of the built-in pad stack,
probably inside of C<$0>) or a list of C<$0>s, each with their own pad.
If you do that, there's something really NIFTY that falls out of it:
premature (possibly temporary) exit from a rule results in the
restoration of all hypotheticals to their pre-rule state (because the
pads in which they live are no longer active (and possibly gone)).

Should you resume such a rule (e.g. because you had gone off to handle a
signal or exception), the hypotheticals all pick up their states again
and proceed. This kind of atomic hypothetical-updating / reversion could
be very valuable (and fast!), since exposing the state of hypotheticals
at the moment of a signal or exception doesn't really make a lot of
sense, and could cause some programs to fail in surprising ways.

Thanks for entertaining this brain-detour of mine. I return you to your
regularly scheduled language design without further ado.

 Sorry to inhabit your 1% unsureness, but that's precisely where I am.

Never doubted it ;-)

-- 
 781-324-3772
 [EMAIL PROTECTED]
 http://www.ajs.com/~ajs



Re: perl6 garbage collector?

2004-09-01 Thread Aaron Sherman
On Mon, 2004-08-30 at 14:40, Ozgun Erdogan wrote:
   Currently, we're using perl-5.6.1 and are having problems with memory
   leaks - thanks to reference counting.
  
  You'll have to break reference loops explicitly.
 
 If only I had known where those circular references are. I have a
 circular ref. detector tool, but it still doesn't get them. The thing
 is, you could do an SvREFCNT_inc, and boom you have a memory leak.

Ok, you're no longer talking about Perl (the language) but rather about
Perl 5's internals. Different beast.

This is not the right list for debugging that kind of thing, so I won't
go into it, but suffice to say that if you have trouble managing your
references through XS, incorporating Parrot's GC into Perl 5 would be
near impossible. That's not intended as a slight, believe me, I put
myself in the same category (reference counting in Perl 5 is very
difficult to grok from the docs, as the docs make some assumptions about
how much you know about how Perl constructs scopes).

All that aside, Ponie is your friend. As Ponie matures, it will provide
what you need, and your XS could be transitioned over into Parrot
bytecode.

For now, if I were you I would upgrade to 5.8.x and try to make sure
that every value that you move between your XS and Perl is properly
mortal (see the perlapi, perlguts and perlxs man pages).

-- 
 781-324-3772
 [EMAIL PROTECTED]
 http://www.ajs.com/~ajs



Re: Library loading

2004-09-01 Thread Aaron Sherman
On Sat, 2004-08-28 at 16:17, Dan Sugalski wrote:
 Time to finish this one and ensconce the API into the embedding interface.

That reminds me, I was reading P6PE yesterday, and I came across a
scary bit on loading of shared libraries. The statement was made that
Parrot would search the current directory first.

Perhaps this was an over-simplification, but if not, PLEASE,
re-consider. Security implications aside (and they're huge), Parrot
should probably be searching its installation area (possibly overridden
by an environment variable) followed by whatever system path (e.g.
LD_LIBRARY_PATH, ldconfig or whatever your OS uses) is given to Parrot
externally, so as not to modify the behavior of a program based on the
current directory of the user running it.
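
As a sketch of the search order suggested above, here it is in Python for brevity. The environment variable name PARROT_LIB_PATH and the function library_search_path are made up for illustration; nothing here reflects what Parrot actually does.

```python
import os


def library_search_path(install_dir, environ=None, override_var="PARROT_LIB_PATH"):
    """Build a library search path that never depends on the current directory.

    Order: the installation area (or the override from the hypothetical
    PARROT_LIB_PATH variable), then the externally supplied system path.
    """
    environ = os.environ if environ is None else environ

    override = environ.get(override_var)
    path = override.split(os.pathsep) if override else [install_dir]

    system = environ.get("LD_LIBRARY_PATH", "")
    path.extend(system.split(os.pathsep))

    # Drop empty and relative entries ("", ".", "lib") so that the directory
    # the user happens to be in cannot change which library gets loaded.
    return [entry for entry in path if os.path.isabs(entry)]
```

The filter on the last line is the important part: an empty or relative entry in a search path quietly means "the current directory", which is exactly the behavior being argued against.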

-- 
 781-324-3772
 [EMAIL PROTECTED]
 http://www.ajs.com/~ajs



Re: Proposal for a new PMC layout and more

2004-09-01 Thread Aaron Sherman
On Wed, 2004-09-01 at 11:17, Leopold Toetsch wrote:

 Comments welcome,

Honestly, much of this goes beyond my meager understanding of Parrot
internals, but I've read it, and most of it seems reasonable. Just one
point where you may not have considered a logical alternative:

 =head2 2.6. Morphing Undefs
 
 Currently all binary (and other) opcodes need an existing destination
 PMC. The normal sequence a compiler emits is something like this:
 
   $P0 = new Undef
   $P0 = a + b

Since you've lopped a lot of space off of PMCs, Undefs could be made
large enough to fit a basic buffer PMC (3 words). In that case, they
could always be upgraded in-place to integer PMCs, float PMCs, very
simple objects, references and buffers. Everything else would need to go
through a copy-upgrade.

The trade-off is that all PMCs would be 3 words unless special code was
emitted that avoided this for smaller (integer, float, reference) PMCs.

I'm not saying that this is a BETTER plan, just an idea to think about
and a different set of trade-offs.

-- 
 781-324-3772
 [EMAIL PROTECTED]
 http://www.ajs.com/~ajs



Re: Semantics for regexes

2004-09-01 Thread Aaron Sherman
On Wed, 2004-09-01 at 16:07, Larry Wall wrote:

 I see one other potential gotcha with respect to backtracking and
 closures.  In P6, a closure can declare a hypothetical variable
 that is restored only if the closure exits unsuccessfully.  Within
 a rule, an embedded closure is unsuccessful if it is backtracked over.
 But that implies that you can't know whether you have a successful
 return until the entire regex is matched, all the way down, and all the
 way back out the top, or at least out far enough that you know you
 can't backtrack into this closure.  Abstractly, the closure doesn't
 return until the entire rest of the match is decided.  Internally,
 of course, the closure probably returns as soon as you run into the
 end of it.

Let's get concrete:

rule foo { a $x:=(b*) c }
abbabc

So, if I understand Parrot and Perl 6 correctly (heh, fat chance), a
slight modification to the calling convention of the closure that
represents a rule (possibly even a raw .Closure) could add a pad that
the callee is expected to fill in with any hypotheticals defined during
execution. The following would happen in the example above:

store_lex "bb" into hypopad($x) after "abb"
find "a" and fail the rule, backtracking (clear hypopad($x))
store_lex "b" into hypopad($x) after backtracking over one "b"
find "b" next and fail the rule, backtracking again (clear)
store_lex "b" into hypopad($x) after second "ab"
find "c" and succeed rule foo, return hypopad

Essentially every close-paren triggers binding, and every back-track
over a close-paren triggers clearing.

Because this is all part of the calling convention for a rule, there's
no difference between a rule passing back hypotheticals to its caller
and a sub-rule doing so to the rule which called IT.
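
The bind/clear sequence above can be reproduced with a toy matcher. This is a hand-written Python sketch of just this one rule (using b+, as the trace assumes); match_foo and the log format are invented for illustration and say nothing about how a real regex engine would be structured.

```python
def match_foo(text):
    """Try `a $x:=(b+) c` at each start position; return (hypopad, log)."""
    log = []
    for start in range(len(text)):
        if text[start] != "a":
            continue
        # Greedy b+: find the end of the longest run of "b" after the "a".
        end = start + 1
        while end < len(text) and text[end] == "b":
            end += 1
        # Try the longest capture first, then give back one "b" at a time.
        for stop in range(end, start + 1, -1):
            hypopad = {"$x": text[start + 1:stop]}  # close-paren: bind
            log.append(("bind", hypopad["$x"]))
            if stop < len(text) and text[stop] == "c":
                return hypopad, log                 # success: hand back the pad
            del hypopad["$x"]                       # backtrack: clear the entry
            log.append(("clear", None))
    return None, log
```

Running it on "abbabc" binds "bb", clears, binds "b", clears, then binds "b" again at the second "ab" and succeeds, which is the same sequence as the six steps listed above.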

Is that workable? Does it address your concern, Larry, or did I miss
your point?

-- 
 781-324-3772
 [EMAIL PROTECTED]
 http://www.ajs.com/~ajs



Re: Semantics for regexes

2004-09-01 Thread Aaron Sherman
On Wed, 2004-09-01 at 16:33, Aaron Sherman wrote:

   rule foo { a $x:=(b*) c }

In the rest of my message I acted as if that read:

rule foo { a $x:=(b+) c }

so, we may as well pretend that that's what I meant to say ;-)

-- 
 781-324-3772
 [EMAIL PROTECTED]
 http://www.ajs.com/~ajs



OT: SPF problem with the list?

2004-08-24 Thread Aaron Sherman
Please let me know who is appropriate for this, and whatever you do,
please don't reply to / CC the list. We don't need to bog down the works
with discussion of spam filtering.

I'm noticing that mail from perl6-* is showing up with this header:

Received-SPF: softfail (mail.ajs.com: transitioning domain of perl.org does
not designate 63.251.223.186 as permitted sender) client-ip=63.251.223.186;
[EMAIL PROTECTED];
helo=lists.develooper.com;

That is added by my local SPF-checker. It seems that x6.develooper.com
[63.251.223.186], which is sending out this mail, is not in
perl.org's SPF record (which would be fine if perl.org had no SPF
record, but it does). There's an easy way to say "and all of this other
domain's MXes too" in SPF, which is probably what was intended.

This is causing my spam filtering to slightly bump p6 mail toward spam
(though so far, I don't think I've gotten any false positives).

I take a somewhat proprietary interest in perl.org working well for
various historical/sentimental reasons, so I'd be happy to help with any
debugging / diagnosing of this if that would help.

-- 
 781-324-3772
 [EMAIL PROTECTED]
 http://www.ajs.com/~ajs



Re: NCI and callback functions

2004-08-23 Thread Aaron Sherman
Leopold Toetsch wrote:
Leopold Toetsch wrote:
Stephane Peiry wrote:
  g_return_val_if_fail (G_IS_OBJECT (gobject), 0);  Fails here


gtk shouldn't make assumption on the user_data argument IMHO.

The whole idea behind callbacks is, that there is a userdata argument 
that gets passed through transparently. GTK is taking a gobject only. 
So there is currently no way to use the existing callback scheme.

Can't you wrap what you want to pass in a GObject?


Re: Something to ponder

2004-08-18 Thread Aaron Sherman
On Wed, 2004-08-18 at 15:57, Felix Gallo wrote:
 Dan writes:
  sub foo :come_from('+', int, int) {}
 
 One problem with MMD in general, and return specifically, is
 'what happens if multiple M match the same D requirements?
 i.e., 

That's a question, not a problem. It's easy to answer questions ;-) I
assume we're talking about first-to-match only, but I haven't looked at
the code. You could always go look at the MMD code in the Parrot
source...

However, I'm not sure what Dan meant there. Perhaps he mis-spoke, or
perhaps I don't understand this at all... that's a calling signature,
not a return signature. I would expect:

sub foo :come_from('+', int) {...} # Handle integer returns
sub foo :come_from('+', num) {...} # Handle floating point

That's very different from a come_from that would operate on the calling
signature (which needs return continuations too, but differently).

Or did we just switch to talking about something different while I
wasn't looking?

 If the answer is 'all get executed', this could be useful for
 any languages interested in implementing aspect-oriented programming
 as a first class language feature, e.g.

You can build one from the other trivially, though, and that doesn't
affect, in the slightest, how first class the feature is in a language
that uses Parrot, only how interchangeable it is between languages.

That, of course, dodges the question of how much aspect oriented
programming is an attempt to bring the beauty of Intercal to ugly,
usable programming languages.

 sub debug_log :come_from(:benchmark_me) { 
 my $function_name = shift;
 print STDERR "debug: $function_name at " . time() . "\n";
 }

Ok, this is starting to look like people speaking seriously about using
Intercal's COME FROM (http://c2.com/cgi/wiki?ComeFrom)... can we just
step back and take a deep breath of AIR please? Seriously, this is
starting to creep me out.

-- 
 781-324-3772
 [EMAIL PROTECTED]
 http://www.ajs.com/~ajs



Re: Something to ponder

2004-08-17 Thread Aaron Sherman
On Tue, 2004-08-17 at 16:22, Felix Gallo wrote:
 On Tue, Aug 17, 2004 at 04:08:34PM -0400, Dan Sugalski wrote:
  1) We're going to have MMD for functions soon
  2) Function invocation and return continuation invocation's 
  essentially identical
  3) Therefore returning from a sub/method can do MMD return based on 
  the return values

 $x -\
  \
 @mylist -+--- $obj.mymmdsub;
  /  
 %hash --/   

How very fungible of you ;-)

Still, I think that's a nice APPLICATION, but the concept is more
flexible, if I understand it correctly. It would be something that would
look more like a cross between exception handling and a switch
statement.

I would think it would look more like (again, Perlish example):

$sock.peername()
    does returntype(
        Net::Address::IP -> $ip {
            die "Remote host unresolvable: '$ip'";
        }, Net::Address -> $addr {
            die "Non IP unresolvable address: '$addr'";
        }, Str -> $_ {
            print "Seemingly valid hostname: '$_'\n";
        });

Of course, that's just Perl. Perhaps Python would add something that
would look like:

returnswitch: sock.peername()
    returncase os.net.addr.ip:
        lambda ip: raise OSError, "Unresolvable %s" % str(ip)
    returncase os.net.addr:
        lambda addr: raise OSError, "Unresolvable non-IP %s" % str(addr)
    returncase str:
        lambda name: print "Seemingly valid hostname: '%s'" % name

My python skills are still developing, so pardon me if I've gotten it
wrong, and I'm just inventing os.net.addr.ip for purposes of
pseudo-code.

Is that the kind of thing you had in mind, Dan, or am I misunderstanding
how return continuations work?
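
For comparison, the same idea can be approximated in today's Python without any new syntax. This is only a sketch: Address, IPAddress, dispatch_on_return and describe_peer are all invented names, and "first matching case wins" is an assumption about the dispatch order, not something settled in this thread.

```python
class Address:
    def __init__(self, text):
        self.text = text


class IPAddress(Address):
    pass


def dispatch_on_return(value, cases):
    """Call the handler of the first (type, handler) pair that matches value."""
    for expected_type, handler in cases:
        if isinstance(value, expected_type):
            return handler(value)
    raise TypeError("no case for return type %s" % type(value).__name__)


def describe_peer(result):
    # Most specific type first, because the first match wins.
    return dispatch_on_return(result, [
        (IPAddress, lambda ip: "Remote host unresolvable: '%s'" % ip.text),
        (Address, lambda addr: "Non IP unresolvable address: '%s'" % addr.text),
        (str, lambda name: "Seemingly valid hostname: '%s'" % name),
    ])
```

The ordering matters here in the same way it would for MMD: an IPAddress is also an Address, so listing the general case first would shadow the specific one.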

-- 
 781-324-3772
 [EMAIL PROTECTED]
 http://www.ajs.com/~ajs



Re: What Unicode means to us

2004-08-13 Thread Aaron Sherman
On Mon, 2004-08-09 at 14:14, Dan Sugalski wrote:

 Additionally if we have source text which is 
 Latin-n, EBCDIC, ASCII, or whatever we must be 
 able to convert it with no loss to Unicode. 
 (Which I believe is now doable with Unicode 4.0) 
 Losslessly converting Unicode to 
 ASCII/EBCDIC/whatever is *not* required, which is 
 fine as it's theoretically (and often 
 practically) impossible.

Can I suggest instead:

If we have source text which is comprised of a non-Unicode
character-set we must be able to convert it with minimal loss to
Unicode (minimal being defined as zero for all Unicode-subset
character sets).

Converting Unicode to non-Unicode character sets will be
lossless where possible, and will attempt to encode the name of
the character in ASCII characters into the target character set.
An example would be the conversion of the UTF-8 string (in Perl
5 notation):

foo \x{263a} bar

to the ASCII representation:

foo {SMILING FACE, WHITE} bar

There are 4 possible failure modes, each resulting in a
conversion exception: 1) the ASCII name is not available; 2) the
ASCII name cannot be converted into the target character set
(recursive name-lookups are not allowed, nor would they be very
useful); 3) a VM parameter requesting exceptions on failed
character-set conversions has been set to a true value; 4) the
source is a PMC and that PMC has a property indicating that
exceptions should be generated on failed conversions.

This just seems a bit more useful in the general case to me, while
allowing the language implementation the option of requesting an
exception either globally or per-PMC.
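
A quick sketch of the proposed behavior, written in Python because its standard unicodedata module already exposes character names. The function name to_ascii_with_names and its strict flag are invented here; strict stands in for the "VM parameter" or per-PMC property described above. Note that the Unicode database calls U+263A "WHITE SMILING FACE", so that is the spelling this sketch emits.

```python
import unicodedata


def to_ascii_with_names(text, strict=False):
    """Replace each non-ASCII character with its Unicode name in braces.

    Raises UnicodeEncodeError if strict is set or a character has no name.
    """
    out = []
    for index, ch in enumerate(text):
        if ord(ch) < 128:
            out.append(ch)
            continue
        name = unicodedata.name(ch, None)  # None when the character is unnamed
        if strict or name is None:
            raise UnicodeEncodeError(
                "ascii", text, index, index + 1, "no lossless ASCII form")
        out.append("{" + name + "}")
    return "".join(out)
```

For example, to_ascii_with_names("foo \u263a bar") gives "foo {WHITE SMILING FACE} bar", while the same call with strict=True raises instead.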

Thoughts?

-- 
 781-324-3772
 [EMAIL PROTECTED]
 http://www.ajs.com/~ajs



Re: What Unicode means to us

2004-08-13 Thread Aaron Sherman
I don't want to argue per-se (that doesn't do anyone any good), so if
your mind is made up, that's cool... still, I think there's some value
in exploring the options, so read on if you're so inclined.

On Wed, 2004-08-11 at 04:40, Dan Sugalski wrote:

  Converting Unicode to non-Unicode character sets will be
  lossless where possible, and will attempt to encode the name of
  the character in ASCII characters into the target character set.
 
 Gack. No, I think this'd be a bad idea as the default behavior. 

Well ok, why not make an exception the default behavior then? Just
reverse what I suggested from the default to the option. It's still
mighty handy for a language (any Parrot-based language) to be able to
render a meaningful string in any ASCII-capable encoding from any
Unicode subset.

I think the only problem would be in the realm of directionality of
script, but I assume that all non L-R scripts have some convention for
injecting snippets of L-R, just as en-US injects R-L, easy as   .

 What's right is up in the air -- I'm figuring we'll either throw an 
 exception or substitute in a default character, but the full 
 expansion's definitely way too much.

That's too bad, as:

This was converted from Ｕnicode

becoming

This was converted from {FULLWIDTH LATIN CAPITAL LETTER U}nicode

seems much more reasonable than choosing some poor ASCII character to
act as the fallback.

If someone does something stupid like converting a 5MB document in UTF-8
encoded Cyrillic into ASCII, then they're going to get a huge result,
but that's no less useful than 3MB of text that looks like  ** 
***-**. ***'* *, I would think, and perhaps more useful for certain
purposes (e.g. it could still be deciphered and/or re-assembled).

The other way to go would be some sort of standardized low-level
notation to represent encoding and codepoint such as:

This was converted from {U+FF35}nicode

That's less readable, but arguably more reversible and/or precise.
Certainly more easily automatically detected. For example, the following
Perl 5 code could reverse such transformation:

s{\{(.)\+([a-f\d]+)\}}{
character(target_encoding  => $target_encoding,
  source_encoding  => abbrv_to_encoding($1),
  source_codepoint => hex("0x".$2))
}egi;

assuming, of course, a function character and a function
abbrv_to_encoding which, respectively, attempt to generate a character
in a target encoding based on a character in a source encoding, and
return an encoding ID/name/object/whatever based on a one-character
abbreviation.
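
Here is a round-trip sketch of the codepoint notation in Python. It handles only the U (Unicode) abbreviation; the function names escape_non_ascii and unescape are invented, and the general multi-encoding case with a one-character encoding abbreviation is left out.

```python
import re


def escape_non_ascii(text):
    """Replace every non-ASCII character with a {U+XXXX} marker."""
    return "".join(
        ch if ord(ch) < 128 else "{U+%04X}" % ord(ch)
        for ch in text
    )


_MARKER = re.compile(r"\{U\+([0-9A-Fa-f]+)\}")


def unescape(text):
    """Turn {U+XXXX} markers back into the characters they name."""
    return _MARKER.sub(lambda m: chr(int(m.group(1), 16)), text)
```

One caveat with this sketch: ASCII input that already contains a literal "{U+41}" would be changed by unescape, so a real scheme would also need a way to escape the marker itself.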

It would be ideal if other tactics could be used like the GB 2312
encoding in ASCII described in RFC 1842. Of course, the above could be
permuted that way:

This was converted from {G+~{:Ky2;S{#,NpJ)l6HK!#~}}

But that starts to get deeper into character set and encoding
transformation than my head is capable of coping with at this stage (I'm
really just learning about these topics). I fear I'm walking down a road
that ends in my suggesting that every non-Unicode string has a MIME
header, but rest assured that that's not my goal. I just wanted to
suggest a useful alternative to throwing an exception on incompatible
type conversion, especially for those client languages (e.g. m4) in
which an exception will either have to be ignored or treated as fatal.

-- 
 781-324-3772
 [EMAIL PROTECTED]
 http://www.ajs.com/~ajs



Re: We have spawn, and now we need exec

2004-08-05 Thread Aaron Sherman
On Thu, 2004-08-05 at 13:43, Dan Sugalski wrote:

 Cool. On the Unix platforms we exec off 'sh' and pass in parameters 
 (so we get command parameters split up right, IIRC). I'm presuming we 
 don't do the same for Windows, so I'll make it the plain command and 
 hope it all works out.

Well, that's one way you can do it, but it causes a ton of headaches,
e.g. because

exec "echo user's text goes here"

gets shell interpretation and fails, so by way of example only, Perl 5
allows for both usages depending on what you pass it. Parrot could
easily make the distinction based on being passed a string value or a
PMC array of some sort and end up with roughly the same functionality as
Perl (though Perl itself would not use this as-is, as it decides further
based on the content of the string, and will call raw exec(2) on the
results of splitting the string on whitespace if no shell metacharacters
occur, but I think that's a bit too much Perlishness to put in Parrot).

Either way, Parrot really HAS to provide a raw POSIX exec, as it cannot
be faked from a shell-using variant correctly.
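
The string-versus-array distinction can be illustrated in Python, with its subprocess module standing in for Parrot's exec (so this spawns a child rather than replacing the process). run_command is an invented helper, and the example assumes a POSIX system with sh and echo available.

```python
import subprocess


def run_command(command):
    """Dispatch on argument type: a string goes through the shell,
    a sequence of arguments is executed directly with no shell involved."""
    if isinstance(command, str):
        return subprocess.run(command, shell=True,
                              capture_output=True, text=True)
    return subprocess.run(list(command), capture_output=True, text=True)
```

Passing the list ["echo", "user's text goes here"] prints the text unchanged, while passing the single string "echo user's text goes here" makes the shell choke on the unmatched apostrophe. That is the headache described above, and it is why the list form cannot be emulated safely on top of the string form.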

-- 
 781-324-3772
 [EMAIL PROTECTED]
 http://www.ajs.com/~ajs



Re: We have spawn, and now we need exec

2004-08-05 Thread Aaron Sherman
On Thu, 2004-08-05 at 14:11, Aaron Sherman wrote:

 Parrot could
 easily make the distinction based on being passed a string value or a
 PMC array of some sort and end up with roughly the same functionality as
 Perl (though Perl itself would not use this as-is, as it decides further
 based on the content of the string, and will call raw exec(2) on the
 results of splitting the string on whitespace if no shell metacharacters
 occur, but I think that's a bit too much Perlishness to put in Parrot).

"Run-on sentence from hell" barely begins to describe the horror... I'm
so sorry about that. Please, do me a favor and breathe whenever you feel
like it ;-)

-- 
 781-324-3772
 [EMAIL PROTECTED]
 http://www.ajs.com/~ajs



Re: Python builtin namespace

2004-07-19 Thread Aaron Sherman
On Thu, 2004-07-15 at 22:46, Dan Sugalski wrote:
 And language builtin namespaces in general. We need a standard, and 
 now's as good a time as any, so...
 
 All language-specific builtin functions go into the _core_Language 
 namespace. (So for Python it's _core_Python, Perl 5 is _core_Perl5, 
 and so on)

In the specific case of Perl 5 and 6, aren't builtins in the same
(Parrot) namespace as user-defined functions? In Perl 5, you can access
builtins through the CORE:: (Perl) namespace, which certainly is visible
to all Perl programs.

Would there be some sort of namespace aliasing for this sort of thing
(or the Microsoft naming scheme for C#/.Net Framework for that matter)?

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Perl Toolsmith
http://www.ajs.com/~ajs/resume.html




[Fwd: Re: Layering PMCs]

2004-06-03 Thread Aaron Sherman
I sent this message out a few days ago, but never saw it show up on the
list... Just to recap

a) option #1 seemed best to me
b) this will all happen at the parrot level
c) languages will almost never change an object to read-only
d) there are some reasons that old access to an object should not
become read-only
e) true read-onlyness will probably most often be optimized at the
language level by storing cached values in typed registers anyway

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Perl Toolsmith
http://www.ajs.com/~ajs/resume.html

---BeginMessage---
On Sat, 2004-05-29 at 15:29, Dan Sugalski wrote:

 The problem with the first scheme is that anything that has a handle 
 on the PMC will not get the new layers. Not a good thing.

I like the first scheme. The question that comes up is: when does
something get layered?

That is: if I have code that says:

new_thread_increment_every_minute(foo)
become_read_only(foo)

Then, you have two choices of semantic:

1. foo is read-only retroactively and will throw an exception in one
minute
2. You've given the new thread a read-write version of foo, and the
read-only version created after that now has the property of changing
every minute.

I would see this as being very useful for several types of read-only
access to data that DOES change (an accumulator for a random number
entropy pool, for example).

High level languages on the other hand, should probably not expose this
directly. They will create a variable and tag it as read only at the
same time, and to the programmer there's no difference.

If they do allow for run-time read-only-ification, they can always build
their own high-level abstraction around this core PMC.

The only problem I see with this is that high level languages might want
to cache the value of a read-only variable in a typed register. If read
only really is read only, that's valid, but if it's only an interface
restriction it's not.

There you're going to have some semantic boundaries between languages
that might be unfortunate. How much of a problem that is, I'm not sure.

As for threading, I think the simple layering is easiest, and again, you
create the PMC layered if you want that functionality (e.g. for
locking).

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback

---End Message---


Re: Layering PMCs

2004-06-01 Thread Aaron Sherman
On Sat, 2004-05-29 at 15:29, Dan Sugalski wrote:

 The problem with the first scheme is that anything that has a handle 
 on the PMC will not get the new layers. Not a good thing.

I like the first scheme. The question that comes up is: when does
something get layered?

That is: if I have code that says:

new_thread_increment_every_minute(foo)
become_read_only(foo)

Then, you have two choices of semantic:

1. foo is read-only retroactively and will throw an exception in one
minute
2. You've given the new thread a read-write version of foo, and the
read-only version created after that now has the property of changing
every minute.

I would see this as being very useful for several types of read-only
access to data that DOES change (an accumulator for a random number
entropy pool, for example).

High level languages on the other hand, should probably not expose this
directly. They will create a variable and tag it as read only at the
same time, and to the programmer there's no difference.

If they do allow for run-time read-only-ification, they can always build
their own high-level abstraction around this core PMC.

The only problem I see with this is that high level languages might want
to cache the value of a read-only variable in a typed register. If read
only really is read only, that's valid, but if it's only an interface
restriction it's not.
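
To make the second semantic and the caching concern concrete, here is a small Python sketch. Cell and ReadOnlyView are invented stand-ins for a PMC and a read-only layer over it; real PMC layering would of course look nothing like this.

```python
class Cell:
    """Stand-in for a plain read-write PMC holding one value."""

    def __init__(self, value):
        self.value = value


class ReadOnlyView:
    """Read-only layer over a Cell: reads pass through, writes are refused.

    Holders of the original Cell keep their read-write access.
    """

    def __init__(self, cell):
        self._cell = cell

    @property
    def value(self):
        return self._cell.value


def demo():
    writer = Cell(0)              # what the incrementing thread was handed
    view = ReadOnlyView(writer)   # what everyone else sees afterwards
    cached = view.value           # a compiler caching the "constant" value
    writer.value += 1             # the writer keeps going
    return cached, view.value
```

demo() returns (0, 1): the view is read-only as an interface, but its value still changes underneath, so a cached copy taken earlier is stale. That is the difference between "read only really is read only" and "only an interface restriction".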

There you're going to have some semantic boundaries between languages
that might be unfortunate. How much of a problem that is, I'm not sure.

As for threading, I think the simple layering is easiest, and again, you
create the PMC layered if you want that functionality (e.g. for
locking).

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: Please become ID verified.

2004-05-25 Thread Aaron Sherman
On Mon, May 24, 2004 at 09:48:45PM -0400, Uri Guttman wrote:
 
 is there a paypal PMC in the plans? will it be multi-accounted? will it
 have built in auth support? what about rounding errors?

In case it was not obvious, the Paypal message was a scam to get people's
passwords.

The offending host appears to be 81.196.122.75.

-- 
Aaron Sherman
[EMAIL PROTECTED]   finger [EMAIL PROTECTED] for GPG info. Fingerprint:
www.ajs.com/~ajs    6DC1 F67A B9FB 2FBA D04C  619E FC35 5713 2676 CEAF
   Visit my Mushroom Journals at http://mush.ajs.com/


RE: Events (I think we need a new name)

2004-05-14 Thread Aaron Sherman
On Fri, 2004-05-14 at 06:27, Rachwal Waldemar-AWR001 wrote:
 It seems the name 'event' is not as bad. So, maybe 'Pevent', stands for 'parrot 
 event'?
 One advantage... it'd be easy searchable. I recall a pain whenever I searched for 
 'thread', or 'Icon'.

If you're talking about search engines, then of course "parrot event"
works just fine. If you're talking about searching your code, then
that's another matter.

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: Events (I think we need a new name)

2004-05-12 Thread Aaron Sherman
On Wed, 2004-05-12 at 12:08, Dan Sugalski wrote:

 It does, though, sound like we might want an alternate name for this 
 stuff. While event is the right thing in some places it isn't in 
 others (like the whole attribute/property mess) we may be well-served 
 choosing another name. I'm open to suggestions here...

How about skippy?

Seriously, I would say that "event" is about as abstract as it comes. Even
the proposed "message" is, in some ways, LESS abstract.

What's the specific sort of case events don't seem to cover? The setting
of a property?

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: Patch: Do rand() and srand()

2004-05-12 Thread Aaron Sherman
On Wed, 2004-04-28 at 09:54, Aaron Sherman wrote:
 A simple implementation of rand() and srand() which may not be ideal for
 Perl. Also included is the test file for random ops. If anyone can think
 of a good way to ALWAYS know that a number we got back was random,
 throw that into the test ;-)

Was this going to get sucked up into CVS? I'm just having to keep
patching it back in each time I update for now.

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: Events design question: Handles for repeating events?

2004-05-04 Thread Aaron Sherman
On Tue, 2004-05-04 at 09:25, Dan Sugalski wrote:
 Okay, I'm working up the design for the event and IO system so we can 
 get that underway (who, me, avoid the unpleasantness of strings? 
 Nah... :) and I've come across an interesting question.
 
 The way things are going to work with specific actions a program has 
 asked to be done, such as a disk read or write, is that you get back 
 a handle PMC that represents the request, and you can wait on that 
 handle for the request to be completed. The sequence goes something 
 like:
 
  write Px, Py, Sz # Return handle, file, and data to write
  waitfor Px   # Wait for the request to finish

So, all Parrot IO will be asynchronous? Does that mean that there's no
way to perform an atomic read or write?

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: Events design question: Handles for repeating events?

2004-05-04 Thread Aaron Sherman
On Tue, 2004-05-04 at 11:36, Dan Sugalski wrote:
 At 11:25 AM -0400 5/4/04, Aaron Sherman wrote:

 So, all Parrot IO will be asynchronous? Does that mean that there's no
 way to perform an atomic read or write?
 
 Yes, and there isn't now anywhere anyway so it's not a big deal.

I was speaking in terms of Parrot. Obviously, at the OS level some
writes are guaranteed atomic (e.g. POSIX dictates that writes of
PIPE_BUF or fewer bytes are atomic on a pipe, but that's neither here
nor there) and others are not. What I was asking was more in terms of
what could happen to Parrot while your write is in an unknown state.
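
As an aside on the PIPE_BUF guarantee mentioned above, here is a minimal sketch in Python (chosen purely for illustration; this is not Parrot code). It assumes a POSIX system where `select.PIPE_BUF` is available, and the helper names are hypothetical:

```python
import os
import select


def atomic_pipe_limit() -> int:
    """Largest write size that POSIX guarantees is atomic on a pipe."""
    return select.PIPE_BUF


def write_atomically(fd: int, data: bytes) -> int:
    """Write data in a single call, refusing payloads that could be split."""
    if len(data) > atomic_pipe_limit():
        raise ValueError("payload exceeds PIPE_BUF; the write may interleave")
    return os.write(fd, data)
```

Anything larger than the limit has to be chunked by the caller, and that is exactly where a second writer (or an event handler) can interleave its own data.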

Specifically, I'm concerned that I might want to say:

become immune to any events
perform write
re-sensitize to events

But, if writes are implemented using the event-handling system, won't
that mean that you can't actually do that? Here's one scenario for a
filter that I think demonstrates my concern:

read event handler:
perform synchronous write

This simple example might perform a partial write, then get a read event
and queue up a second write, perform a partial write on that, then queue
up a third write due to another event...

You can feel free to tell me there's some obvious way this is avoided,
as I admit I'm no expert in the domain of asynchronous IO management.

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: Bit ops on strings

2004-05-01 Thread Aaron Sherman
On Sat, 2004-05-01 at 04:57, Jarkko Hietaniemi wrote:
  If Jarkko 
  tells me you can do bitwise operations with unicode text now in Perl 
  5, well... we'll support it there, too, though we shan't like it at 
  all.
 
 We can and I don't like it at all [...]
 None of it anything I want to propagate anywhere.

Please correct me if I'm wrong here, but I'm going to lay out my
understanding as a set of assertions:

  * Parrot will be able to convert any encoding to any other
encoding
  * though, some conversions will result in an exception, that's
still a defined behavior
  * We've agreed that only raw binary 8-bit strings make sense for
bit vector operations

So it seems to me that the obvious way to go is to have all bit-string
operations first convert to raw bytes (possibly throwing an exception)
and then proceed to do their work.

This means that UTF-8 strings will be handled just fine, and (as I
understand it) some subset of Unicode-at-large will be handled as well.
In other words, the burden goes on the conversion functions, not on the
bit ops.

It's not that it's going to be meaningful in the general case, but if
you have code like:

sub foo() { return "\x01" +| "\x02" }

I would expect to get the bit-string "\x03" back even though strings
may default to Unicode in Perl 6.
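
The convert-then-operate model described above can be sketched as follows. This is Python for illustration only, not Parrot's implementation; the function names are hypothetical, Latin-1 encoding stands in for "convert to raw 8-bit bytes", and zero-padding the shorter operand is an assumption borrowed from Perl 5's string `|`:

```python
def to_raw_bytes(text: str) -> bytes:
    """One byte per code point; raises UnicodeEncodeError above 0xFF."""
    return text.encode("latin-1")


def string_bor(left: str, right: str) -> bytes:
    """Bitwise OR of two strings after converting both to raw bytes."""
    a, b = to_raw_bytes(left), to_raw_bytes(right)
    # Assumption: pad the shorter operand with zero bytes on the right.
    width = max(len(a), len(b))
    a, b = a.ljust(width, b"\x00"), b.ljust(width, b"\x00")
    return bytes(x | y for x, y in zip(a, b))
```

The bit op itself never inspects the encoding; a string containing a code point above 0xFF fails in the conversion step, so the burden sits on the conversion function.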

You could put this on the shoulders of the client language (by saying
that the operands must be pre-converted), but that seems to be contrary
to Parrot's usual MO.

Let me know. I'm happy to do it either way, and I'll look at modifying
the other bit-string operators if they don't conform to the decision.

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback





Re: Bit ops on strings

2004-05-01 Thread Aaron Sherman
On Sat, 2004-05-01 at 11:26, Jarkko Hietaniemi wrote:

As for codepoints outside of \x00-\xff, I vote exception. I don't think
there's any other logical choice, but I think it's just an encoding
conversion exception, not a special bit-op exception (that's arm-waving,
I have not looked at Parrot's exception model yet... miles to go...)

  This means that UTF-8 strings will be handled just fine, and (as I
 
 Please don't mix encodings and code points.  That strings might be
 serialized or stored as UTF-8 should have no consequence with bitops.

What I meant was that UTF-8 IS going to be represented in a way that
will guarantee you won't get an exception when trying to do bit-ops. All
bets are off for many other encodings. While you're right that you might
get lucky, that wasn't really the point I was making. Many languages
(Perl included, I think) are going to encode strings as UTF-8 by
default, and this means that in the general case, we should not expect
exceptions to be thrown around any time we do a bit-op and 'A'|'B' will
still be 'C' :-)

 Of course.  But I would expect a horrible flaming death for
 \x{100}|+\x02.

Well, if you consider a string conversion exception to be horrible
flaming death, then I hate to see what you do with a divide-by-zero ;-)

None of your response sounds overly scary to me, so I'll start looking
at what Parrot does NOW for bit-string-ops and see if it needs to mutate
to fit this model. Then I'll add in the rest. Then I get to see what
evil Dan and Leo perform upon my patch ;-)
 
-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback





Re: Bit ops on strings

2004-05-01 Thread Aaron Sherman
On Sat, 2004-05-01 at 14:18, Jeff Clites wrote:
 On May 1, 2004, at 8:26 AM, Jarkko Hietaniemi wrote:

 Just FYI, the way I implemented bitwise-not so far, was to bitwise-not 
 code points 0x{00}-0x{FF} as uint8-sized things, 0x{100}-0x{} as 
 uint16-sized things, and  0x{} as uint32-sized things (but then 
 bit-masking them with 0xF to make sure that they fell into a valid 
 code point range). That's pretty arbitrary, but if you bitwise-not as 
 though everything were 32-bits wide, you'll end up with a string 
 containing no assigned code points at all (they'll all be  0x10F). 
 But from a text point of view, bitwise-not on a string isn't a sensible 
 operation no matter how you slice it (that is, even for 0x{00}-0x{FF}), 
 so one flavor of arbitrary is just about as good as any other. We could 
 also make anything  0x{FF} map to either 0x{00} or 0x{FF}, or mask if 
 with 0xFF to push it into that range. It's all pretty meaningless, as 
 text transformations go, and I can't imagine anyone using it for 
 anything, except maybe weak encryption.

I think Dan and I were both thinking in terms of bit-vector operations
on byte-streams for any purpose that would require such a beast. In
Perl, you have the vec function to make this slightly easier.

This is one of those places where thinking about strings as text is
highly misleading. They're used for an awful lot more.

 Exactly. And also realize that if you bitwise-not (or shift or 
 something similar) the bytes of a UTF-8 serialization of something, the 
 result isn't going to be valid UTF-8, so you'd be hard-pressed to lay 
 text semantics down on top of it.

How are you defining valid UTF-8? Is there a codepoint in UTF-8
between \x00 and \xff that isn't valid? Is there a reason to ever do
bitwise operations on anything other than 8-bit codepoints?

 I'm beginning to wonder if we're going to be square-rooting strings, 
 and taking the array-th root of a hash :)

Strings are not numbers, but there's a heck of a lot of code out there
that treats existing strings as bit-vectors (note: bit vectors are not
numbers either), and that code needs to be supported, no?

Now, shift operations aren't usually part of the package, but I figured
that as long as we were going to have the rest of the bit-manipulators,
finishing off the set would be of value.

More to the point, I said all of this at the beginning of this thread.
You should not, at this point, be confused about the scope of what I
want to do, as it was very narrowly and clearly defined up-front.

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback





Re: Bit ops on strings

2004-05-01 Thread Aaron Sherman
On Sat, 2004-05-01 at 15:09, Jarkko Hietaniemi wrote:
  How are you defining valid UTF-8? Is there a codepoint in UTF-8
  between \x00 and \xff that isn't valid? Is there a reason to ever do
 
 Like, half of them?  \x80 .. \xff are all invalid as UTF-8.

Heh, damn Ken Thompson and his placemat!

I am too new to UCS and UTF-8, and had thought it was always 8-bit. I
stand corrected, having read up on the UTF-8 and Unicode FAQ.
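
The correction is easy to check mechanically. The sketch below (Python, for illustration only; the helper name is hypothetical) tests every single byte for validity as a standalone UTF-8 sequence:

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Single bytes 0x80..0xFF that are valid UTF-8 on their own (expected: none).
valid_lone_high_bytes = [b for b in range(0x80, 0x100) if is_valid_utf8(bytes([b]))]
```

Code points U+0080 through U+00FF are perfectly legal; they simply take two bytes each in UTF-8, which is why a bitwise-not over the serialized bytes rarely yields valid UTF-8.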

Jeff, yeah I have to take back my statement. If Perl defaults to UTF-8,
then it's not a valid assumption that a UTF-8 input string won't throw
an exception. I still think that's ok, and better than
representation-expanding to the larger representation and doing the
bit-op in that, since that means that bit-vectors would have to be
valid in enum_stringrep_one, _two and _four as sort of alternate
datastructures. I don't think we want to go there.

For everything else, as Jeff correctly points out, this has nothing to
do with encoding. Only in the sense that default encoding in a language
like (only one example) Perl 6 dictates what representation you will
have to expect to be the common case.

  bitwise operations on anything other than 8-bit codepoints?
 
 I am very confused.  THIS IS WHAT WE ALL SEEM TO BE SAYING.  BITOPS ONLY
 ON EIGHT-BIT DATA.  AM I WRONG?

No, it's not, and could you please not get emotional about this? It's
what you, Dan and I have been saying, but I was responding to Jeff who
said:

Just FYI, the way I implemented bitwise-not so far, was to
bitwise-not code points 0x{00}-0x{FF} as uint8-sized things,
0x{100}-0x{} as uint16-sized things, and  0x{} as
uint32-sized things (but then bit-masking them with 0xF to
make sure that they fell into a valid code point range).

It was kind of important that I deal with the fact that I was proposing
a very different behavior for bit-shifting than exists currently for
boolean operations, I thought.

The question becomes should I CHANGE the existing bit-ops so that they
don't work on representations in two or four bytes for symmetry?

If this continues to be so contentious, I'm tempted to agree with the
nay-sayers and say that Parrot shouldn't do bit-vectors on strings, and
we should just implement a bit-vector class later on. Perl will just
have to suffer the overhead of translation. This just IS NOT important
enough to waste this many brain cells on.

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback





Re: Bit ops on strings

2004-04-30 Thread Aaron Sherman
On Fri, 2004-04-30 at 10:42, Dan Sugalski wrote:

 Bitstring operations ought only be valid on binary data, though, 
 unless someone can give me a good reason why we ought to allow 
 bitshifting on Unicode. (And then give me a reasoned argument *how*, 
 too)

100% agree. If you want to play games with any other encoding, you may
proceed to write your own damn code ;-)

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




RE: Bit ops on strings

2004-04-30 Thread Aaron Sherman
On Fri, 2004-04-30 at 09:47, Butler, Gerald wrote:
 If I may interject for a moment:

Let me start by saying that I have not drunk the Unicode Kool-Aid. I'm
not at all certain that the overhead required to do all of what Parrot
wants to do is warranted, BUT that's beside the point.

Parrot is doing things the way it's doing them, and the time for debate
was a few months or at latest weeks ago, as far as I can tell.

 I have been following the discussion of strings on this list over the last few
 weeks. It seems that there is somewhat of a disconnect in various definitions
 of what is a string. It seems as though there needs to be a hierarchy to
 this with a little more clear definition. May I humbly propose the following:
 
   1. String - low-level, abstract, base class (or in Perl6 terms role --
 I think) which represents a logically contiguous series of Parrot Int

You say that you think there should be a hierarchy, but you're just
throwing out broad concepts and applying them equally to terminology,
representation and implementation. As such, there is no good way to
respond to what you suggest, nor any way to determine how much work you
are proposing be performed in order to bend existing code to your
suggested paradigm.

A string is what Dan described in his various postings on strings. Nuff
said.

###
Aside from the rest of your message, and bearing no logical impact on
the rest of it, I'd like to call out:

  The information contained in this e-mail message is privileged and/or
  confidential and is intended only for the use of the individual or entity
  named above.  If the reader of this message is not the intended recipient,
  or the employee or agent responsible to deliver it to the intended 
  recipient, you are hereby notified that any dissemination, distribution or 
  copying of this communication is strictly prohibited.  If you have received 
  this communication in error, please immediately notify us by telephone
  (330-668-5000), and destroy the original message.  Thank you.  

Need I point out http://www.goldmark.org/jeff/stupid-disclaimers/ ?

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




RE: Bit ops on strings

2004-04-30 Thread Aaron Sherman
On Fri, 2004-04-30 at 12:18, Butler, Gerald wrote:

 Now, we
 have people talking about doing LSL/LSR on Strings. That is 100%
 inconsistent with that definition of a String.

Not at all, and keep in mind that I didn't propose this out of the blue.
bands, bxors and bors are existing string ops and have been for a
long time. I was just proposing rounding out the bit operator set. Go
check out the ops/bit.ops in CVS. It's even well documented.

I don't think Dan was being at all contradictory or inconsistent in his
string postings, given that those ops were already there. I may have
problems with the extent to which Parrot embraces abstraction, but
inconsistency is not one of them.

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Bit ops on strings

2004-04-29 Thread Aaron Sherman
bit.ops defines some ops on strings, and not others. I was wondering if
anyone thinks the following would be useful (I'm offering to write them,
as it won't be much work):

lsls(inout STR, in INT)
lsrs(inout STR, in INT)

and, of course, their appropriate permutations.

For those who haven't looked at bit.ops, lsl and lsr are logical shift
left and logical shift right. Doing this operation on strings (as bands,
bors and bxors do) would allow the full range of bit-manipulation to be
done quickly on strings-as-bitfields (though, of course, it's already
possible even without these operations).

I don't see shls and shrs being useful (or terribly meaningful), but
correct me if I'm wrong there.

Of course, there's the small matter that shifting left might grow your
string, but this should not be a major concern for Parrot. I don't think
shifting right should shrink the string.
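
A rough sketch of the proposed semantics, in Python for illustration only. It is not the ops implementation: the names `lsl_bytes` and `lsr_bytes` are hypothetical, and treating the string as one big-endian bit field is an assumption the proposal does not spell out:

```python
def lsl_bytes(data: bytes, count: int) -> bytes:
    """Logical shift left; the result grows if bits carry out of the top byte."""
    value = int.from_bytes(data, "big") << count
    length = max(len(data), (value.bit_length() + 7) // 8)
    return value.to_bytes(length, "big")


def lsr_bytes(data: bytes, count: int) -> bytes:
    """Logical shift right; the result keeps the original length."""
    value = int.from_bytes(data, "big") >> count
    return value.to_bytes(len(data), "big")
```

Under these assumptions, shifting b"\x80" left by one yields the two-byte string b"\x01\x00", while shifting right only ever fills with zero bits from the left.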

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: MMD performance (was: keyed vtables and mmd)

2004-04-29 Thread Aaron Sherman
On Thu, 2004-04-29 at 03:33, Leopold Toetsch wrote:

 As Dan already said there is no performance hit (at least if the MMD
 tables don't blow the caches).

Good stuff! One thing leaps to mind when you mention the cache though...
keep in mind that blowing L2 cache (which we might be in no danger of
doing at all, but I'm just bringing it up) might be WORSE than you would
think on P4 and beyond because of hyperthreading.

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Patch: Do rand() and srand()

2004-04-28 Thread Aaron Sherman
A simple implementation of rand() and srand() which may not be ideal for
Perl. Also included is the test file for random ops. If anyone can think
of a good way to ALWAYS know that a number we got back was random,
throw that into the test ;-)

Perl 5 mandates that it calls srand if you call rand without first
calling srand. Since not all Parrot client-languages will want that
behavior it's not in this version of rand, but that leaves Perl having
to maintain separate state.
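
The separate state a client language would need is small. A minimal sketch, in Python for illustration only; the class name is hypothetical and Python's `random.Random` stands in for the underlying generator:

```python
import random
from typing import Optional


class AutoSeedingRand:
    """rand() that seeds itself on first use unless srand() was called."""

    def __init__(self) -> None:
        self._rng = random.Random()
        self.seeded = False

    def srand(self, seed: Optional[int] = None) -> None:
        # A seed of None lets the generator pick its own entropy source.
        self._rng.seed(seed)
        self.seeded = True

    def rand(self, limit: float = 1.0) -> float:
        if not self.seeded:
            self.srand()
        return self._rng.random() * limit
```

The only extra state is the single `seeded` flag, which is the bookkeeping Perl would carry on top of a plain rand op.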

In future, it would be nice to add a special rsrand or the like, which
checks to see if srand has already been called. For now this should be
sufficient for anyone who expects a rand op, and it's not an onerous
amount of state for Perl to store. The real concern is that Perl and
Foolanguage might both srand(), but that's not something I'm gonna think
too hard about just now and probably is a matter for library maintainers
in those languages anyway.

Oh, one more thing: I added op numbers for the sqrt ops since they were
causing me to be given some warnings during build. Feel free to ignore
them if you don't want sqrt to have op numbers.

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback

Index: ops/math.ops
===
RCS file: /cvs/public/parrot/ops/math.ops,v
retrieving revision 1.18
diff -u -r1.18 math.ops
--- ops/math.ops	27 Apr 2004 15:48:20 -	1.18
+++ ops/math.ops	27 Apr 2004 20:10:16 -
@@ -1392,6 +1392,58 @@
 
 =back
 
+=item B<rand>(out NUM, in NUM)
+
+=item B<rand>(out NUM)
+
+=item B<rand>(out INT, in INT)
+
+=item B<rand>(out INT)
+
+=item B<srand>(in NUM)
+
+Generate random numbers based on the Random PMC.
+
+=cut
+
+inline op rand(out NUM, in NUM) {
+  FLOATVAL n = $2;
+  PMC * r = pmc_new_noinit(interpreter, enum_class_Random);
+  $1 = VTABLE_get_number(interpreter,r);
+  $1 *= $2;
+  goto NEXT();
+}
+
+inline op rand(out INT, in INT) {
+  INTVAL n = $2;
+  PMC * r = pmc_new_noinit(interpreter, enum_class_Random);
+  FLOATVAL resultnum;
+  resultnum = VTABLE_get_number(interpreter,r);
+  $1 = (INTVAL)(resultnum * (FLOATVAL)n);
+  goto NEXT();
+}
+
+inline op rand(out NUM) {
+  PMC * r = pmc_new_noinit(interpreter, enum_class_Random);
+  $1 = VTABLE_get_number(interpreter,r);
+  goto NEXT();
+}
+
+inline op rand(out INT) {
+  PMC *r = pmc_new_noinit(interpreter, enum_class_Random);
+  $1 = VTABLE_get_integer(interpreter,r);
+  goto NEXT();
+}
+
+inline op srand(in INT) {
+  INTVAL i = $1;
+  PMC * r = pmc_new_noinit(interpreter, enum_class_Random);
+  VTABLE_set_integer_native(interpreter,r,i);
+  goto NEXT();
+}
+
+=back
+
 =cut
 
 ###
Index: ops/ops.num
===
RCS file: /cvs/public/parrot/ops/ops.num,v
retrieving revision 1.36
diff -u -r1.36 ops.num
--- ops/ops.num	22 Apr 2004 09:17:38 -	1.36
+++ ops/ops.num	27 Apr 2004 20:10:16 -
@@ -1451,3 +1451,15 @@
 fetchmethod_p_p_s   1424
 fetchmethod_p_p_sc  1425
 setref_p_p  1426
+sqrt_n_i            1427
+sqrt_n_ic           1428
+sqrt_n_n            1429
+sqrt_n_nc           1430
+rand_n_n            1431
+rand_n_nc           1432
+rand_i_i            1433
+rand_i_ic           1434
+rand_n              1435
+rand_i              1436
+srand_i             1437
+srand_ic            1438
#! perl -w
# Copyright: 2001-2003 The Perl Foundation.  All Rights Reserved.
# $Id$

=head1 NAME

t/op/random.t - Random numbers

=head1 SYNOPSIS

	% perl t/op/random.t

=head1 DESCRIPTION

Tests random number generation

=cut

use Parrot::Test tests => 5;
use Test::More;
use Parrot::Config;
use Config;

output_is(<<'CODE', <<OUT, "generate random int");
    rand I0
    print "Called random just fine\n"
    end
CODE
Called random just fine
OUT

output_is(<<'CODE', <<OUT, "generate random 10>int>=0");
    rand I0, 10
    ge I0, 10, BROKE
    lt I0, 0, BROKE
    print "Called random just fine\n"
    exit 0
  BROKE:
    print "Failure: random number "
    print I0
    print " is not in range 0..9\n"
    end
CODE
Called random just fine
OUT

output_is(<<'CODE', <<OUT, "generate random num");
    rand N0
    print "Called random just fine\n"
    end
CODE
Called random just fine
OUT

output_is(<<'CODE', <<OUT, "generate random 10>num>=0");
    rand N0, 10.0
    ge N0, 10.0, BROKE
    lt N0, 0, BROKE
    print "Called random just fine\n"
    exit 0
  BROKE:
    print "Failure: random number "
    print N0
    print " is not in range 0.0..10.0\n"
    end
CODE
Called random just fine
OUT

output_is(<<'CODE', <<OUT, "Seed RNG");
    srand 1
    print "Seeded the rng just fine\n"
    end
CODE
Seeded the rng just fine
OUT

1; # HONK


Re: Patch: Do rand() and srand()

2004-04-28 Thread Aaron Sherman
On Wed, 2004-04-28 at 10:01, Jens Rieks wrote:

 That's the reason why we have a Random PMC (classes/random.pmc).
 
 I'm still not sure if we need rand/srand ops for random numbers. As you 
 already mentioned, srand uses a global state and I believe that it will cause 
 trouble sooner or later.

If you check out the patch, you will notice that the Random PMC
(enum_class_Random) is the underlying implementation. This is just a
functional interface. There is no state actually maintained in these
functions.

The reason for the ops is to avoid having 99 of 100 mathish functions in
your native language defined as:

foo: emit_parrot(foo, return, args)

but rand defined as:

rand: emit_parrot(set, return, Random)

It's just a potential point of confusion.

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: keyed vtables and mmd

2004-04-28 Thread Aaron Sherman
On Wed, 2004-04-28 at 11:33, Dan Sugalski wrote:

 We toss the keyed variants for everything but get and set. And... we 
 move *all* the operator functions out of the vtable and into the MMD 
 system.
[...]
 Comments?

Only one question. What's the performance hit likely to be and is there
any way around that performance hit for code that doesn't want to take
it?
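
For readers unfamiliar with the idea, the cost in question is roughly that of replacing a per-type vtable slot with a lookup keyed on both operand types. A toy sketch in Python, for illustration only; the table layout and names are hypothetical and say nothing about how Parrot's MMD tables are actually organized:

```python
from typing import Any, Callable, Dict, Tuple

# (operation name, left type, right type) -> implementation
_mmd_table: Dict[Tuple[str, type, type], Callable[[Any, Any], Any]] = {}


def mmd_register(op: str, left: type, right: type,
                 func: Callable[[Any, Any], Any]) -> None:
    """Install an implementation for one pair of operand types."""
    _mmd_table[(op, left, right)] = func


def mmd_dispatch(op: str, a: Any, b: Any) -> Any:
    """Pick the implementation from the types of both operands."""
    try:
        func = _mmd_table[(op, type(a), type(b))]
    except KeyError:
        raise TypeError(
            f"no {op} for {type(a).__name__}, {type(b).__name__}"
        ) from None
    return func(a, b)
```

In this toy version every operation pays for one table lookup, and the per-call overhead is that lookup plus the indirect call.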

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: File stat info

2004-04-28 Thread Aaron Sherman
On Wed, 2004-04-28 at 11:56, Dan Sugalski wrote:

 stat [PINS]x, Sy, Iz
 stat Px, Sy
[...]
 The returned PMC in the two-arg case could be a hash/array pmc and 
 allow string-keyed access to elements. If we do that, then the names 
 correspond to the constant names that follow.

 NAME  Filename, no extension or path
 EXTENSION File extension

This represents a world-view that is not universal. Rather than making
Parrot into a lens through which system features need to be de-coded,
why not provide a set of modular native-friendly tools with which to
perform such operations?

After all, in UNIX-land you can't know what "the extension" is (just
look at the filenames auto.home, .bash_logout and foo.tar.gz).
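
To see how arbitrary any single rule is, here is what one widely used convention does with those three names. The sketch uses Python's `os.path.splitext` purely as an example of one library's choice; the wrapper name is hypothetical:

```python
import os.path
from typing import Tuple


def split_name(filename: str) -> Tuple[str, str]:
    """Split a filename into (stem, extension) using Python's convention."""
    return os.path.splitext(filename)


for name in ("auto.home", ".bash_logout", "foo.tar.gz"):
    stem, ext = split_name(name)
    print(f"{name!r}: stem={stem!r} extension={ext!r}")
```

This convention reports ".home" as an extension, reports no extension at all for the dotfile, and sees only ".gz" on the tarball. Each answer is defensible and none is obviously what the user meant.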

If you have a POSIX view by default, but provide a set of opcodes that
specialize in Win32, Darwin, VMS, PalmOS, then you can avoid these
points of confusion.

Heck, you might even provide this abstraction as yet another layer, if
it's really helpful. But most languages/system libraries that don't come
out of Microsoft expect a POSIX view of the world, so that's probably a
reasonable default.

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: keyed vtables and mmd

2004-04-28 Thread Aaron Sherman
On Wed, 2004-04-28 at 12:33, Dan Sugalski wrote:
 At 12:21 PM -0400 4/28/04, Aaron Sherman wrote:

 Since we're specifically talking about Perl here (and probably not Perl
 5, since its overloading model is baroque and probably has to be managed
 by the compiler, not Parrot)
 
 Actually perl 5's overloading gets handled this way too. Overloaded 
 operations *can't* be handled by the compiler in dynamic languages, 
 and none of them do so.

Hmmm, I thought we were on the same page here, but I'll back up and
define terms if needed.

When I talk about a runtime construct being handled by the compiler vs
handled by parrot, I mean that the compiler will have to generate code
that knows how to deal with the construct, rather than relying on
Parrot's native constructs. That might be (as is the case with Perl 5
right now) that the construct is built into a runtime library, or it
might be that the compiler generates special code inline.

You seem to be replying to a point I would not make, e.g., that the
compiler would have to somehow determine at compile-time what would
happen. Clearly that's impossible.

 , I was under the impression that for types
 that are non-objecty,
 
 Types that are non-PMC won't check. PMC types will.

Ok, so in Boston you suggested that every variable declared by a high
level language would have to be a PMC and that INT registers for example
were only for the compilers and Parrot libraries to use... would that
not be the case for a Java int or a Perl 6 int and/or has it changed
since then?

I'm not arguing anything here, just trying to wrap my head around the
scope of this change's impact.

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: File stat info

2004-04-28 Thread Aaron Sherman
On Wed, 2004-04-28 at 12:26, Dan Sugalski wrote:

   NAME  Filename, no extension or path
   EXTENSION File extension
 
 This represents a world-view that is not universal. Rather than making
 Parrot into a lens through which system features need to be de-coded,
 why not provide a set of modular native-friendly tools with which to
 perform such operations?
 
 Because you end up with 78 kinds of portability hell if you don't, as 
 everyone rolls their own way to handle this.

Oh, don't get me wrong! I'm not saying an abstraction isn't all keen and
such, I'm just wondering why we're abstracting farther out than POSIX
when the "right way", as you point out, has never been a matter of
consensus, and many client languages will be presenting POSIX semantics
through their standard libraries anyway, which they will have to massage
your representation back into.

 I'm OK with adding a TYPE to the stat array as well, though more for 
 an it's a file/socket/device/directory type thing, rather than an 
 it's an application/x-pdf file! thing.

Well, since no OS I know of except for MacOS/Darwin has a reliable way
to determine the ACTUAL type of a file, that's wise.


##

ALTERNATE RESPONSE

You didn't go far enough. Leave stat alone, back up 12 paces and write a
vfs layer for Parrot that comes in at a level of abstraction WAY above
the core POSIX/Win32/etc ops and provides a generic way to access URIs,
mailboxes, files, shared memory regions, etc, etc. Why abstract within
the arbitrary constraints of a POSIX-type stat model? Why assume that
something has a name rather than a locator? Why not provide an
abstract concept of type that encompasses all of MIME? Why not have
permissions/ACL/security be a totally separate object which can
understand SSL/TLS authentication models, pam, etc.?

The obvious response is that you want to ship Parrot before the Y3k bug
becomes a problem ;-) I understand that, and perhaps that's a reason to
speculate about such a beast, but implement after 1.0, but that doesn't
invalidate the point.

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: File stat info

2004-04-28 Thread Aaron Sherman
On Wed, 2004-04-28 at 13:40, Dan Sugalski wrote:

 ALTERNATE RESPONSE
 
 This is where you go mad, right? :)

Usually ;-)

   Why abstract within
 the arbitrary constraints of a POSIX-type stat model?
 
 I wasn't, actually. There's a good sprinkling of VMSisms in that 
 list, and I'm all for adding more stuff if need be. (I forgot to note 
 the various flavors of symlink, as well as the link count in cases 
 where it can be determined, as well as user and group of the file 
 itself)

Yeah, noticed the VMSism (ACLs, version (mentioned later), a separate
change dir bit), and being an old VMS hacker I approved in spirit, if
not in action. VMS was nice for when it was used. It's too bad it's
being maintained as a legacy now, and not the OS it could have been.

If you scrap the places that you've factored out things that will have
to be un-factored in the common case (filenames were the biggie), it's
fine... just don't expect people to do anything with it except extract
the POSIX semantics... after all, it took 15 years to get to the point
that POSIX could unify file semantics as much as it did.

Keeping a niche open for ACLs is probably smart, esp. in the Windows
world.

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: keyed vtables and mmd

2004-04-28 Thread Aaron Sherman
Ok, nuff said. I think there are slightly too many definitions that
we're not agreeing on (though, I suspect if we ironed those out, we'd be
in violent agreement).

As for the INT/PMC thing, I'm pretty sure all of my concerns come down
to: compilers can really screw each other over, but then we knew that,
and there will have to be conventions to prevent it.

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: File stat info

2004-04-28 Thread Aaron Sherman
On Wed, 2004-04-28 at 14:51, Dan Sugalski wrote:
 At 8:08 PM +0200 4/28/04, Jerome Quelin wrote:
 Dan Sugalski wrote:
 [...]
   CTIME Creation time
 
 Will unixen use this for change time? (also spelled ctime too)
 
 Nope, for that they use mtime. We can expand the names to skip the confusion.

*scratch head*... I'll dig out a man-page 'cause I don't want to sign on
to the POSIX site just this sec:

  time_t    st_atime;    /* time of last access */
  time_t    st_mtime;    /* time of last modification */
  time_t    st_ctime;    /* time of last change */

There's no creation time listed, and mtime and ctime are most certainly
not the same thing. Now, if you want to add a creation time, that's
fine, but I recommend against calling it ctime, as that's a sort of well
defined word in these parts.
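
The difference between the two fields is easy to demonstrate. The following is a sketch in Python, for illustration only; it assumes a Unix-like system (on Windows, `st_ctime` has historically meant creation time), and the function name is hypothetical:

```python
import os
import tempfile
from typing import Tuple


def times_after_backdating(mtime: int = 1_000_000_000) -> Tuple[float, float]:
    """Back-date a file's mtime and return (st_mtime, st_ctime)."""
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        path = handle.name
    try:
        # Sets atime and mtime explicitly; the inode change itself
        # makes the kernel update ctime to the current time.
        os.utime(path, (mtime, mtime))
        info = os.stat(path)
        return info.st_mtime, info.st_ctime
    finally:
        os.remove(path)
```

The file ends up with a modification time in 2001 and a change time of "now", and neither value records when the file was created.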

 Should be OWNER_CD?
 Should be SYSTEM_CD?
 Should be OTHER_CD?
 
 Yep. Cut'n'paste error. :(

I didn't even see that. Being dyslexic, my eye skips over that kind of
error very easily. I just see it as my own mistake ;-)

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: File stat info

2004-04-28 Thread Aaron Sherman
On Wed, 2004-04-28 at 15:42, Dan Sugalski wrote:
 At 10:32 PM +0300 4/28/04, Jarkko Hietaniemi wrote:

   I think you'll find ACL use is increasing, not decreasing. They've
   been tacked on to most recent filesystems, and they're coming into

But AFAIK, Windows is the only place where the use of ACLs is encouraged
in the native API. Everywhere else I look, they seem to be an add-on
that you can use if you want to tie yourself to a particular set of
extensions. This is why, for example, AIX has had ACLs forever, but I
can't name one product that uses them (other than backup and restore
software ;-)

 This is true.  But good luck in trying to map between the ACL schema of
 different systems :-(
 
 Yech, good point. I'm not even sure you can do any sort of sane 
 abstraction there.

Sure you can. It's just at a much higher level of abstraction than stat.
You could very easily say "this is a file permission object" and ask it
"can I do X to this file?" or "can user do X to this file?" where
user might be a process or uid_t or whatever.
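
A minimal sketch of such an object, in Python for illustration only. The class and method names are hypothetical, and it answers only the "can I" form of the question, by delegating to `os.access`, which checks the calling process's real uid and gid; asking on behalf of another user would need a platform-specific backend:

```python
import os


class FilePermission:
    """Answers 'can I do X to this file?' without exposing mode bits or ACLs."""

    _ACTIONS = {"read": os.R_OK, "write": os.W_OK, "execute": os.X_OK}

    def __init__(self, path: str) -> None:
        self.path = path

    def can(self, action: str) -> bool:
        # Whatever the platform uses underneath (mode bits, ACLs, ...)
        # is folded into a single yes/no answer.
        return os.access(self.path, self._ACTIONS[action])
```

A caller written against `can("read")` does not need to know whether the answer came from POSIX mode bits or from an ACL.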

That's perfectly reasonable as a core system abstraction layer; I was
just waving the "keep the native access too" flag, since I've seen too
many systems abstract away the native system to the point that no
reasonable integration can occur between the language and its
surroundings (e.g. Java).

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Patch: don't build docs for .*.ops

2004-04-27 Thread Aaron Sherman
See attached patch which prevents the docs/Makefile from including
invalid targets that just happen to be editor temp files (emacs temp
files have a # character which really boggles make).

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback

Index: config/gen/makefiles.pl
===================================================================
RCS file: /cvs/public/parrot/config/gen/makefiles.pl,v
retrieving revision 1.30
diff -u -r1.30 makefiles.pl
--- config/gen/makefiles.pl	19 Apr 2004 11:31:44 -0000	1.30
+++ config/gen/makefiles.pl	27 Apr 2004 19:52:18 -0000
@@ -84,7 +84,7 @@
   # set up docs/Makefile, partly based on the .ops in the root dir
 
   opendir OPS, "ops" or die "opendir ops: $!";
-  my @ops = sort grep { /\.ops$/ } readdir OPS;
+  my @ops = sort grep { !/^\./ && /\.ops$/ } readdir OPS;
   closedir OPS;
 
   my $pod = join " ", map { my $t = $_; $t =~ s/\.ops$/.pod/; "ops/$t" } @ops;
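
For readers who do not read Perl, the filter in the patch can be sketched in Python as follows. The function name and the sample directory listing are made up for illustration; only the rule itself (skip names starting with a dot, keep names ending in .ops, map them to .pod targets) comes from the patch.

```python
import re


def ops_to_pod_targets(names):
    """Return sorted pod targets for real .ops files, skipping dotfiles."""
    ops = sorted(
        n for n in names if not n.startswith(".") and n.endswith(".ops")
    )
    # Same substitution as the Perl map: foo.ops becomes ops/foo.pod.
    return ["ops/" + re.sub(r"\.ops$", ".pod", n) for n in ops]


# Hypothetical listing, including an editor temp file with a '#' in it.
listing = ["core.ops", ".#math.ops", "io.ops", "README"]
print(ops_to_pod_targets(listing))
```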


Re: hyper op - proof of concept

2004-04-25 Thread Aaron Sherman
On Fri, 2004-04-23 at 15:34, Dan Sugalski wrote:
 At 3:25 PM -0400 4/23/04, Aaron Sherman wrote:
 That I did not know about, but noticed Dan pointing it out too. I'm
 still learning a lot here,
 
 It might be best, for everyone's peace of mind, blood pressure, and 
 general edification, to take a(nother) run through the documentation. 
 The stuff in docs/pdds isn't too out of date (mostly) and all the 
 opcodes have POD, so you can do something like:

Yeah, I've been plowing through it a piece at a time. I'm currently
still mowing down the DOD docs which (given that I've been in
application space for the last 8 years, and the world of GC has changed
radically in that time) is a hard read. There are 14,304 lines of POD in
the docs subdir and its immediate subdirs. That's a fair amount of
reading, especially for something as dense as technical documentation.

 While diving in feet-first does get you going, looking for the rocks 
 and deep water first is never ill-advised... :)

Is that really what I'm doing?

It's also the case that there's a HUGE amount of documentation and
source code, and I doubt that ANYONE coming to this list and asking
questions will understand all of it. I would be so egotistical as to
even suggest that I've read more of the source and docs than most who
will be asking questions in the next few years.

Given that, getting the stupid stuff out of the way now, and putting it
in a highly indexed form (e.g. a mailing list FAQ) that people on the
list can be pointed at, might save EVEN MORE blood pressure.

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback





Re: A12: The dynamic nature of a class

2004-04-23 Thread Aaron Sherman
On Fri, 2004-04-23 at 08:42, Dan Sugalski wrote:

 Since any type potentially has assignment behaviour, it has to be a 
 constructor. For example, if you've got the Joe class set such that 
 assigning to it prints the contents to stderr, this:
 
 my Joe $foo;
 $foo = 12;
 
 should print 12 to stderr. Can't do that if you've not put at least a 
 minimally constructed thing in the slot.

Yes, and to make that statement a bit more generic, I would suggest
that:

my X $y;

is, as far as I can tell:

my X $y = undef;

except that the explicit assignment might have a different signature
(being as you are providing a parameter to the constructor, even if it's
undef). There was a long thread a LONG time ago on what passing undef
meant for signature matching and how that interacted with default
arguments. I'm sorry, but I'm not able to recall the result at this
point. If I have time, I'll chase it down.
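
The distinction between "no argument" and "an explicit undef" can be illustrated with a small Python analogy. This is only a sketch under the assumption that the two forms reach the constructor differently; the class X and the sentinel name are hypothetical, and nothing here claims to describe how Perl 6 resolves signatures.

```python
_MISSING = object()  # sentinel: distinguishes "no argument" from None


class X:
    def __init__(self, value=_MISSING):
        # Remember whether the caller actually passed something.
        self.explicit = value is not _MISSING
        self.value = None if value is _MISSING else value


y1 = X()      # roughly "my X $y;"
y2 = X(None)  # roughly "my X $y = undef;"
print(y1.value, y1.explicit, y2.value, y2.explicit)
```

Both objects end up holding the same value, but the constructor saw two different call shapes, which is the part that matters for signature matching and default arguments.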

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: hyper op - proof of concept

2004-04-23 Thread Aaron Sherman

Note: We've moved past hyper-ops (I hope!), but there are still some
details in this post that deserve a response on tangential topics.

On Wed, 2004-04-21 at 11:52, Leopold Toetsch wrote:
 Aaron Sherman [EMAIL PROTECTED] wrote:

  That's unrealistic.
 
 No. A real test.

Sorry, I was not clear enough. Yes, of course, non-Parrot Perl 5 is
going to be slow at this, but we expect that and your results showed
nothing surprising.

What might be interesting is to compare Parrot to Parrot doing this with
and without a hyper-operator. That's all I was trying to say.

As for the DOD: you have an excellent point, but it extends far beyond
the hyper-operators. I'm starting to think that front-ends like the
Python compiler or the Perl 6 compiler are going to need controls over
the DOD for just the reasons you cite. After all, they know when they
are about to start doing some large looping operation that's all highly
constrained with respect to allocation. It would make sense to gather
the resources they need, lock down DOD, do what they need to do and then
unlock the DOD...

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: hyper op - proof of concept

2004-04-23 Thread Aaron Sherman
On Fri, 2004-04-23 at 14:52, Leopold Toetsch wrote:
 Aaron Sherman [EMAIL PROTECTED] wrote:
 
  What might be interesting is to compare Parrot to Parrot doing this with
  and without a hyper-operator. That's all I was trying to say.
 
 I'd posted that as well. Here again with an O3 build of parrot:

Oops, missed that. Thanks! I'm shocked by the difference in
performance... it makes me wonder how efficient the optimization+JIT is
when the two operations are SO different. I must simply not understand
what's going on at the lowest level here. More investigation needed on
my part, as I'm sure this will be an important point for me to
understand in later topics that I'll run into writing Parrot code.

  As for the DOD: you have an excellent point, but it extends far beyond
  the hyper-operators. I'm starting to think that front-ends like the
  Python compiler or the Perl 6 compiler are going to need controls over
  the DOD for just the reasons you cite. After all, they know when they
  are about to start doing some large looping operation that's all highly
  constrained with respect to allocation. It would make sense to gather
  the resources they need, lock down DOD, do what they need to do and then
  unlock the DOD...
 
 Well, it's unlikely that we can expose all the details the more that
 such details may change. We could have a generalized version of such an
 operation though:
 
   i_need_now_x_pmcs_and_wont_dispose_any_start   10
   # ... deep clone code or loop
   i_need_now_x_pmcs_and_wont_dispose_any_end
 
 er EOPCODETOOLONG :)

Heh, yeah I getcha. It would be interesting, but as you point out it's
ugly and specialized.

   sweep 1
   sweepoff
   ... deep clone or some such
   sweepon

That I did not know about, but noticed Dan pointing it out too. I'm
still learning a lot here, and while I know it's frustrating, I hope to
condense what I learn into some usable forms (perhaps adding to the FAQ
as I suggested to Dan). I don't always agree with the two of you, but
that's not required. I just need to understand enough that I can get the
work done that I want to do, and make it efficient enough that people
actually USE it ;-)
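
As an analogy only, the sweep 1 / sweepoff / sweepon sequence quoted above resembles pausing a garbage collector around an allocation-heavy block. The sketch below uses Python's standard gc module, not Parrot; the helper names are invented for illustration.

```python
import gc
from contextlib import contextmanager


@contextmanager
def collector_paused():
    """Collect once, then keep the collector off for the duration."""
    was_enabled = gc.isenabled()
    gc.collect()   # loosely like "sweep 1": clean up first
    gc.disable()   # loosely like "sweepoff"
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()  # loosely like "sweepon"


def deep_clone(rows):
    """Allocation-heavy work done while collection is paused."""
    with collector_paused():
        return [list(row) for row in rows]


print(deep_clone([[1, 2], [3]]))
```

The try/finally matters: whatever the front-end does in the middle, the collector is switched back on afterwards.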

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




missing math ops?

2004-04-23 Thread Aaron Sherman
I'm trying to write some code, and I'm not finding certain ops. Now,
perhaps this is just that I don't know how to look for them, or perhaps
they have yet to be written, so please pardon my ignorance. These are
things that seem fairly atomic, and which exist in the C library. If
they truly don't exist, perhaps this is a good place for me to jump in
and get to know the code rather than just talking :)

rand/srand
sqrt

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: hyper op - proof of concept

2004-04-22 Thread Aaron Sherman
On Wed, 2004-04-21 at 13:51, Larry Wall wrote:

 In any event, it is absolutely my intent that the builtin array
 types of Perl 6 support PDL directly, both in terms of efficiency
 and flexibility.  You ain't seen Apocalypse 9 yet, but that's what
 it's all about.  Straight from my rfc list file:

Ok, the combination of Dan's (perhaps overzealous) emphasis on the
dynamic nature of Parrot's client languages and my assumption that we
had learned all there was to learn about the storage of aggregates
misled me here.

That said, I now see why hyper goes in Parrot... maybe. It depends on
how dynamic Perl is about lazy arrays (e.g. my int @foo = 1..Inf) and
what happens when I:

my int @foo = 1..3;
$foo[0] = URI::AutoFetch.new("http://numberoftheweek.math.gov/");

If that's polymorphic, we're hosed. If it's an auto-conversion, then
we're good. I like the polymorphic version for a lot of reasons, but
I'll understand if we can't get that.
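
The two outcomes can be illustrated with Python's array module, which stores machine integers compactly. This is an analogy, not a statement about Perl 6 semantics; the AutoFetch class is a hypothetical stand-in that does no network access.

```python
from array import array


class AutoFetch:
    """Hypothetical stand-in for URI::AutoFetch; no network access here."""

    def __init__(self, url):
        self.url = url

    def fetch(self):
        return "42"  # pretend this text came from the URL


def store_polymorphic(arr, index, value):
    """Try to store the object itself; compact int storage refuses."""
    try:
        arr[index] = value
        return True
    except TypeError:
        return False


def store_converted(arr, index, value):
    """Auto-conversion: reduce the object to an int, then store that."""
    arr[index] = int(value.fetch())


foo = array("i", [1, 2, 3])  # roughly "my int @foo = 1..3"
obj = AutoFetch("http://numberoftheweek.math.gov/")
print(store_polymorphic(foo, 0, obj))
store_converted(foo, 0, obj)
print(foo.tolist())
```

Keeping the compact representation forces the conversion route; keeping the object would mean giving up the compact storage for that array.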

Thanks all!

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




A12: The dynamic nature of a class

2004-04-22 Thread Aaron Sherman
Ok, so I got to thinking about Parrot and compilation last night. Then
something occurred to me, and I'm not sure how it works.

When Perl sees:

class Joe { my $.a; method b {...} }
my Joe $j;

Many things happen and some of them will require knowing what the result
of the previous thing is.

More to the point, Perl 6's compiler will have to parse class Joe,
create a new object of type Class, parse and execute the following
block/closure in class MetaClass, assign the result into the new Class
object named Joe and then continue parsing, needing access to the values
that were just created in order to further parse the declaration of $j.

There are several ways this can be accomplished:

 1. Have a feedback loop between Parrot and Perl 6 that allows the
compiler to execute a chunk of bytecode, get the result as a PMC
and store it for future use. This will probably be needed
regardless of which option is chosen, but may not be ideal.
 2. Have a pseudo Perl 6 interpreter in the compiler which can
execute a limited subset of Perl 6 that is allowed inside of
class and module definitions (Larry implied that they were not
limited in this way, but if they were, compilation could be
optimized a bit).
 3. Attempt to build a one-shot, bytecode stream that outputs a
bytecode stream that represents the program. This would be the
fastest in the general case, and would make pre-bytecoded
libraries much easier to implement. However, it would also mean
that class and module definitions could not affect the grammar
of the language, and Larry has said that won't be the case :-(

To me, #2 looks most attractive, but requires some duplication of
effort.

How easy would it be to interact with Parrot in the way that #1
proposes?
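
A toy model of option #1, in Python, may make the feedback loop easier to picture. Everything here is hypothetical: the statement tuples, the compile_program name, and the use of Python's own compile/exec as a stand-in for handing bytecode to Parrot and getting a PMC back.

```python
def compile_program(statements):
    """Toy compiler: runs class definitions as soon as they are parsed."""
    known_classes = {}  # results fed back from execution to the parser
    emitted = []        # stand-in for generated bytecode
    for stmt in statements:
        kind, name, *rest = stmt
        if kind == "class":
            # Execute the class body now and keep the resulting object.
            namespace = {}
            exec(compile(rest[0], "<class %s>" % name, "exec"), namespace)
            members = {
                k: v for k, v in namespace.items() if not k.startswith("__")
            }
            known_classes[name] = type(name, (), members)
        elif kind == "my":
            type_name = rest[0]
            # Later parsing depends on what the earlier execution produced.
            if type_name not in known_classes:
                raise NameError("unknown class " + type_name)
            emitted.append(("new", type_name, name))
    return known_classes, emitted


program = [
    ("class", "Joe", "a = None\ndef b(self):\n    return 'b'"),
    ("my", "j", "Joe"),
]
classes, code = compile_program(program)
print(code)
```

The point of the sketch is the ordering: the declaration of j cannot be handled until the class body has actually been run and its result stored.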

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: A12: The dynamic nature of a class

2004-04-22 Thread Aaron Sherman
On Thu, 2004-04-22 at 11:22, Dan Sugalski wrote:
 At 10:48 AM -0400 4/22/04, Aaron Sherman wrote:

 More to the point, Perl 6's compiler will have to parse class Joe,
 create a new object of type Class, parse and execute the following
 block/closure in class MetaClass, assign the result into the new Class
 object named Joe and then continue parsing, needing access to the values
 that were just created in order to further parse the declaration of $j
 
 Erm... no. Not even close, really. There's really nothing at all 
 special about this--it's a very standard user-defined type issue, 
 dead-common compiler stuff. You could, if you wanted, really 
 complicate it, but there's no reason to and unless someone really 
 messes up we're not going to. Just no need.

That's not at all what A12 said. And, I quote:

One of the big advances in Perl 5 was that a program could be in
charge of its own compilation via use statements and BEGIN
blocks. A Perl program isn't a passive thing that a compiler has
its way with, willy-nilly. It's an active thing that negotiates
with the compiler for a set of semantics. In Perl 6 we're not
shying away from that, but taking it further, and at the same
time hiding it in a more declarative style. So you need to be
aware that, although many of the things we'll be talking about
here look like declarations, they trigger Perl code that runs
during compilation.

This is in direct contradiction to what I'm hearing from you, Dan.
What's the scoop?

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: A12: The dynamic nature of a class

2004-04-22 Thread Aaron Sherman
On Thu, 2004-04-22 at 14:44, Dan Sugalski wrote:
 At 1:05 PM -0400 4/22/04, Aaron Sherman wrote:

 This is in direct contradiction to what I'm hearing from you, Dan.
 What's the scoop?
 
 The scoop is that
 
 my Joe $foo;
 
 emits the code that, at runtime, finds the class ID of whatever Joe's 
 in scope, instantiates a new object of that class, and sticks it into 
 the $foo lexical slot that's in scope at runtime.

Right, ok, good. I gotcha.

But according to A12 as I understand it, the part BEFORE that, which
looked innocently like a definition:

class Joe { my $.a; method b {...} }

would actually get turned into a BEGIN block that executes the body of
the class definition as a closure in class MetaClass and stores the
result into a new object (named Joe) of class Class.
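
Python happens to work in a similar way, which makes for a convenient analogy: a class statement runs its body as code and hands the result to a metaclass, which builds the class object. The sketch below is Python, not Perl 6, and the recording list is an invented detail used only to show when the code runs.

```python
class MetaClass(type):
    """Records every class it builds, at class-definition time."""

    created = []

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        MetaClass.created.append(name)  # runs while the class is defined
        return cls


class Joe(metaclass=MetaClass):
    a = None

    def b(self):
        return "b"


print(MetaClass.created, type(Joe) is MetaClass)
```

No instance of Joe has been created at the point the print runs, yet the metaclass code has already executed.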

Perl 6's compiler does not (by default, at least) know how to run code.
It just knows how to translate that text into bytecode (or IMCC or
something). So it will need SOMETHING to execute, possibly multiple
times with parsing going on before, after or during some of those
executions.

During is the hard one. That means you have to actually call back from
Parrot into the Perl 6 compiler. But, even the simple:

eval "eval 'eval 1'"

causes that problem. How does Ponie deal with that? Does it simply act
as an interpreter for the first pass and then do code-gen ala -MO ? If
so, that's a nice dodge, but putting a full Perl 6 interpreter into Perl
6's compiler seems to me to be a tad heavy-weight.
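
The re-entrancy is easy to demonstrate in Python, used here only as an analogy. The counting_eval helper is hypothetical; it replaces the name eval inside the evaluated code so that each trip back into the compiler can be counted.

```python
calls = []


def counting_eval(source):
    """Evaluate source, recording each trip back into the compiler."""
    calls.append(source)
    code = compile(source, "<nested>", "eval")
    # Inside the evaluated code, "eval" refers to this function.
    return eval(code, {"eval": counting_eval})


source = "eval(\"eval('1')\")"
result = counting_eval(source)
print(result, len(calls))
```

Each nested string is only seen by the compiler while the enclosing code is already running, so the running program has to be able to call the compiler.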

Thoughts? Am I missing a simple way to get around this?

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: hyper op - proof of concept

2004-04-22 Thread Aaron Sherman
On Wed, 2004-04-21 at 15:46, Larry Wall wrote:
 On Wed, Apr 21, 2004 at 03:15:37PM -0400, Dan Sugalski wrote:
 : The math folks tell me it makes sense. I can come up with a 
 : half-dozen non-contrived examples, and will if I have to. :-P
 
 I've said this before, and I'll keep repeating it till it sinks in.
 The math folks are completely, totally, blazingly untrustworthy on
 this subject. [...] They can't have my »« without a fight.

Ah... now you see the true face of the age-old Linguistics-Mathematics
wars! ;-)

But seriously, to summarize what I've learned from this thread:

  * my int @foo will compile down into an efficient representation
  * PDL (and its like) will be able to use this to efficiently
perform high-level operations on arrays, but only built-in
operations
  * If someone (e.g. PDL) wants to implement other operations and
their hyper-equivalent, they can do it in a high level language
like P6 or as run-time loadable parrot opcodes (which PDL will
certainly have to do, since most of their ops are in an ancient
and gigantic Fortran lib).

Sounding like problem solved to me! Thanks Larry.

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: A12: The dynamic nature of a class

2004-04-22 Thread Aaron Sherman
On Thu, 2004-04-22 at 15:37, Luke Palmer wrote:

 But Perl 6 is tightly coupled with Parrot.  Perl 6 will be a Parrot
 program (even if it calls out to C a lot), and can therefore use the
 compreg opcodes.  That means that any code executing in Parrot can call
 back out to the Perl 6 compiler, and obviously the Perl 6 compiler can
 call out to parrot.

Clearly my question was garbled the first time, as this answer is
exactly what I was looking for. Thanks!

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: hyper op - proof of concept

2004-04-21 Thread Aaron Sherman
On Tue, 2004-04-20 at 18:06, Leopold Toetsch wrote:
 Aaron Sherman [EMAIL PROTECTED] wrote:

This horse is getting a bit ripe, so I'm going to skip most of the
detail. I think we all agree on most of the basics, we just disagree on
what to do with them. That's cool.

I do want to pick a couple of small nits though:

 Well, yes. Except for the special case, which is nice though:
 
 $ time parrot ih.imc  #[1]
 real    0m0.370s
 
 $ time perl i.pl  #[2]
 real    0m5.656s

That's unrealistic. In P6, you should be able to take:

@a + @b

and turn it into:

# Trivial example of hyper-operation, untested pseudo-IMCC
# Just take __Perl_Ary_a and add it to __Perl_Ary_b and put
# the result in tmp5
.local int tmp1
tmp1 = 0
.local int tmp2
tmp2 = __Perl_Ary_a
.local int tmp3
tmp3 = __Perl_Ary_b
.local int tmp4
# Not sure what the ? is below... is there a typeof?
.local ? PerlArray tmp5
tmp5 = new .PerlArray
# We auto-extend here... that may not be P6's eventual MO
# but it's enough to get the point across
if tmp2 >= tmp3 goto AutoExtend_HYPER_1
__Perl_Ary_a = tmp3
tmp4 = tmp3
goto PRE_HYPER_1
AutoExtend_HYPER_1:
__Perl_Ary_b = tmp2
tmp4 = tmp2
PRE_HYPER_1:
tmp5 = tmp4
BEGIN_HYPER_1:
if tmp1 >= tmp4 goto END_HYPER_1
tmp5[tmp1] = __Perl_Ary_a[tmp1] + __Perl_Ary_b[tmp1]
CONT_HYPER_1:
# I forget if there's an inc op
tmp1 = tmp1 + 1
goto BEGIN_HYPER_1
END_HYPER_1:

Are we seriously suggesting that after JIT, that's going to be as slow
as raw Perl, or even any slower than:

.local ? PerlArray tmp1
hyper
tmp1 = __Perl_Ary_a + __Perl_Ary_b

?! If so, I'm curious to know why. It seems to me that you're just
moving the work from the Perl 6 compiler all the way down to the JIT,
but the resulting code is the same, no?
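
The claim that both forms do the same work can at least be checked for results in a small Python sketch. It says nothing about Parrot's JIT or relative speed. The function names are invented, and padding the shorter list with zeros is an assumption; the pseudo-IMCC above only resizes the arrays.

```python
from itertools import zip_longest


def hyper_add_expanded(a, b):
    """Compiler-expanded form: pad the shorter list, then loop."""
    a, b = list(a), list(b)
    length = max(len(a), len(b))
    a.extend([0] * (length - len(a)))  # auto-extend (zero fill assumed)
    b.extend([0] * (length - len(b)))
    result = [0] * length
    i = 0
    while i < length:
        result[i] = a[i] + b[i]
        i += 1
    return result


def hyper_add_single_op(a, b):
    """Stand-in for one 'hyper' op doing the same work internally."""
    return [x + y for x, y in zip_longest(a, b, fillvalue=0)]


print(hyper_add_expanded([1, 2, 3], [10, 20]))
print(hyper_add_single_op([1, 2, 3], [10, 20]))
```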

I would agree that a bulk array copy and iterators should go in Parrot.
That much would speed up many things (especially the above code).

Putting Perl 6 features into Parrot without factoring out their modular
essence would seem to me to result in a great deal of duplication, but
now I'm starting to get close to that horse again

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: hyper op - proof of concept

2004-04-21 Thread Aaron Sherman
On Wed, 2004-04-21 at 10:13, Simon Glover wrote:

  Absolutely -- I really, _really_ want to be able to use hyper ops with
  fixed size, floating point arrays, and to have that be as fast as
  possible, as that should make it possible to implement something like
  PDL in the core.

Mistake.

You don't want to have to convert to-and-from arrays of PMCs in order to
do those ops, and regardless of what kind of hyper-nifty-mumbo-jumbo you
put into Parrot, that's exactly what you're going to have to do.

In fact, Parrot Data Language (if there were such a thing) would likely
introduce its own runtime-loadable opcode set to operate on a new PMC
type called a piddle. Then, each client language could define (in a
module/library) its own means of interacting with a piddle. For example
in Perl, you might:

multi method new(Class $class, int @ary) {...}
multi method new(Class $class, float @ary) {...}
multi method new(Class $class, int $value) {...}
multi method new(Class $class, Octets $value: %*_) {...}

and then you would override BUILD in order to emit your special piddle
opcodes.

Then, in user-space:

my PDL::Piddle $foo = [1,2,3,4,5,6];

Does what you expect, and

$foo + $bar

is special.
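
A minimal Python sketch of the same shape, assuming nothing about real PDL or Parrot internals: the Piddle class below is hypothetical, picks compact storage from the element types (standing in for the multi new variants), and overloads + so that addition works on the packed data.

```python
from array import array


class Piddle:
    """Hypothetical compact numeric container, loosely after PDL."""

    def __init__(self, values):
        values = list(values)
        # Choose storage by element type: integers or floats.
        all_ints = all(isinstance(v, int) for v in values)
        self.data = array("l" if all_ints else "d", values)

    def __add__(self, other):
        if not isinstance(other, Piddle):
            return NotImplemented
        if len(self.data) != len(other.data):
            raise ValueError("length mismatch")
        return Piddle(x + y for x, y in zip(self.data, other.data))

    def tolist(self):
        return self.data.tolist()


foo = Piddle([1, 2, 3, 4, 5, 6])
bar = Piddle([10, 20, 30, 40, 50, 60])
print((foo + bar).tolist())
```

The client code never touches individual elements as boxed objects; the container decides how the addition is carried out.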

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback



