Re: iThreads and selective variable copying (was Destructors and iThreads)
Dave Mitchell [EMAIL PROTECTED] writes:

1. It would be very hard to create these options.

2. Any programmer that used an 'only these' option would almost certainly create a program that at best would not work, and at worst would coredump. What happens if the user forgot to copy $/? What does Perl do the next time it tries to read from a file and wants to know the current line delimiter? Then there's stuff like stashes - %main:: is a hash that indirectly references just about every object in the perl interpreter. Does the programmer have to remember to exclude that?

You are suggesting opening up a can of worms which I have no great desire to see opened. Much as I philosophically like Eric's idea, this does indeed look too messy for perl5. Let's see if perl6 can fix it, or has already fixed it.

Dave.
Re: [PROPOSAL] Cstat opcode and interface
Dan Sugalski [EMAIL PROTECTED] writes:
>At 10:11 AM -0800 3/10/04, Brent "Dax" Royal-Gordon wrote:
>>Josh Wilmes wrote:
>>>It's also quite possible that miniparrot is a waste of time. I'm pretty much of the opinion myself that it's an academic exercise at this point, but one which keeps us honest, even if we don't use it.
>>Miniparrot, or something very much like it, is the final build system.
>Yep. We need to make sure it always works. Which, unfortunately, will end up making things a hassle, since there's no platform-independent way to spawn a sub-process, dammit. :(

On that topic specifically - the DOS-style spawn() API is easy to fake with fork/exec, but the converse is NOT true. I.e. if miniparrot assumes:

    pid_t my_spawn(const char *progname, int argc, const char *argv[]);
    int my_wait(pid_t proc);

then Unix-oids can have:

    pid_t my_spawn(const char *progname, int argc, const char *argv[])
    {
        pid_t pid = fork();
        if (pid)
            return pid;                       /* parent (or -1 on error) */
        execv(progname, (char *const *)argv); /* argv must be NULL-terminated */
        _exit(127);                           /* exec failed */
    }

Unidirectional popen() is also reasonably portable.
Re: [PROPOSAL] Cstat opcode and interface
Dan Sugalski [EMAIL PROTECTED] writes:
>At 11:12 AM -0800 3/10/04, Brent "Dax" Royal-Gordon wrote:
>>Dan Sugalski wrote:
>>>Which, unfortunately, will end up making things a hassle, since there's no platform-independent way to spawn a sub-process, dammit. :(
>>Unixen seem to support system(). It's C89 standard.
>D'oh! I'm getting stuck in the 80s with the multitude of exec variants. Yeah, with that issue taken care of it's a lot more doable. Nevermind...

But:

A. system() is blocking.
B. system() takes a single string, so whatever calls system() has to be aware of the system's quoting rules.
Re: Dates and times again
Larry Wall [EMAIL PROTECTED] writes:
>That would seem like good future proofing. Someday every computer will have decentish subsecond timing. I hope to see it in my lifetime...

It isn't having the sub-second time in the computer, it is the API to get at it...

>My guess is that eventually they'll decide to put a moratorium on leap seconds, with the recommendation that the problem be revisited just before 2100, on the assumption that we'll add all of a century's leap seconds at once at the end of each century. That would let civil time drift by at most a minute or two before being hauled back to astronomical time.

Given that most people live more than a minute or two from their civil-time meridian, who will notice? (Says me, about 8 minutes west of GMT.)

>I'd say what's missing are the error bars. I don't mind if the timestamp comes back integral on machines that can't support subsecond timing, but I darn well better *know* that I can't sleep(.25), or strange things are gonna happen.

But you can fake sleep() with select() or whatever.
Re: Using Ruby Objects with Parrot
Mark Sparshatt [EMAIL PROTECTED] writes:
>I'm not 100% certain about the details, but I think this is how it works. In languages like C++, objects and classes are completely separate: classes form an inheritance hierarchy, and objects are instances of a particular class.
>
>However, in some languages (I think Smalltalk was the first) there's the idea that everything is an object, including classes. So while an object is an instance of a class, that class is an instance of another class, which is called the metaclass. I don't think there's anything special about these classes other than the fact that their instances are also classes.
>
>Thinking about it, I think you may have the relationship between ParrotObject and ParrotClass the wrong way around. Since a class is an object but an object isn't a class, it would be better for ParrotClass to inherit from ParrotObject, rather than the other way round.
>
>In Ruby, when you create a class Foo, the Ruby interpreter automatically creates a class Foo' and sets the klass attribute of Foo to point to Foo'. This is important since class methods of Foo are actually instance methods of Foo'. Which means that method dispatch is the same whether you are calling an instance or a class method.

So in perl5-ese, when you call Foo->method you are actually calling sub Foo::method, which is in some sense a method of the %Foo:: stash object. So what you suggest is as if perl5 compiled Foo->method into (\%Foo::)->method and the %Foo:: 'stash' was blessed...

>foo.method() looks at foo's klass attribute, then checks the returned class object (Foo) for method. Foo.method() looks at Foo's klass attribute and again checks the returned class object (Foo') for method.
>
>The Pickaxe book has a better explanation of this (at http://www.rubycentral.com/book/classes.html, though without any diagrams :( )
>
>In Python, when defining a class it's possible to set an attribute in the class that points to the class's metaclass. The metaclass itself is just a normal class that defines methods which override the normal behaviour of the class. IIRC Python has both class methods and metaclass instance methods, which work almost (but not quite) in the same way as each other. Hopefully someone with more experience with Python will be able to explain better.
>
>I'm not sure if this has cleared things up or just made them more confusing.
Testing XS modules on Ponie
Arthur Bergman [EMAIL PROTECTED] writes:
>This is Ponie, development release 2
>
>    And, isn't sanity really just a one-trick ponie anyway? I mean all
>    you get is one trick, rational thinking, but when you're good and
>    crazy, oooh, oooh, oooh, the sky is the limit. -- the tick
>
>Welcome to this second development release of ponie, the mix of perl5 and parrot. Ponie embeds a parrot interpreter inside perl5 and hands off tasks to it; the goal of the project is to hand off all data and bytecode handling to parrot.
>
>With this release, all internal macros that poke at perl data types are converted to real C functions which check whether they are dealing with traditional perl data types or PMC (Parrot data type) data. Perl lvalues, arrays and hashes are also hidden inside PMCs but still access their core data using traditional macros. The goal and purpose of this release is to make sure this approach keeps on working with the XS modules available on CPAN, and to let people test with their own source code. No changes were made to any of the core XS modules.

So ponie-2 compiles and passes all its tests for me. So how do I see if it can handle the XS module from hell - Tk?
Re: [perl #16689] [NIT] trailing commas in enumerator lists bad
Jarkko Hietaniemi [EMAIL PROTECTED] writes:
># New Ticket Created by Jarkko Hietaniemi
># Please include the string: [perl #16689]
># in the subject line of all future correspondence about this issue.
># URL: http://rt.perl.org/rt2/Ticket/Display.html?id=16689
>
>Freshly checked out parrot moans a lot:
>
>    cc: Info: ./include/parrot/string.h, line 56: Trailing comma found
>    in enumerator list. (trailcomma)
>        } TAIL_flags;
>        ^

Trailing commas in enumerator lists are unportable behaviour in C. And in case anyone has not come across the trick before, it is not uncommon to have

    enum foo {
        /* auto-generated stuff */
        foo_MAX
    };

where foo_MAX is a handy number-of-entries value, as well as avoiding the trailing-comma issue.

-- Nick Ing-Simmons http://www.ni-s.u-net.com/
Re: [perl #15006] [PATCH] Major GC Refactoring
># New Ticket Created by Mike Lambert
># Please include the string: [perl #15006]
># in the subject line of all future correspondence about this issue.
># URL: http://bugs6.perl.org/rt2/Ticket/Display.html?id=15006

Tickets from RT don't have an address in the To: line, and so my mail filter is filing them as SPAM.

-- Nick Ing-Simmons http://www.ni-s.u-net.com/
Re: The internal string API
Jarkko Hietaniemi [EMAIL PROTECTED] writes:
>>Taiwanese read traditional chinese characters, but PRC people read simplified chinese. Even if we take the same data, and the same program (code), people just read it differently. As an end user, I want to make the decision. It will drive me crazy if Perl renders/displays the text file using traditional chinese just because it was tagged as Big5.
>
>Perl will (probably, whispers he, crossing his fingers) never translate data that far. Perl (5) does not display chr(0x1234) to me using Unicode fonts, it just pushes the octets to a file descriptor/handle. Unicode is language-neutral.

Perl may not, but I assume someone will be fool enough to give it a GUI. perl5.7.1+/Tk803.???-to-be will now make a stab at rendering Unicode (not a very good one, I am the 1st to admit, which is why it isn't released!). It would be good if Tk-for-perl6 did not have to break the rules or provide its own hooks for meta data, and could use the string API.

-- Nick Ing-Simmons http://www.ni-s.u-net.com/
Re: Should the op dispatch loop decode?
Benjamin Stuhl [EMAIL PROTECTED] writes:
>I don't see where shadow functions are really necessary - after all, no one has ever complained that you can't do
>
>    pp_chomp(sv);  /* or pp_add(sv1, sv2), for that matter */
>
>in Perl 5.

Yes we did. And note the doop.c file, which is in part an answer to the shadows. Given the inner functions we could presumably generate the decode functions (c.f. xsubpp).

-- Nick Ing-Simmons
Re: Stacks, registers, and bytecode. (Oh, my!)
Uri Guttman [EMAIL PROTECTED] writes:
>DS> The one handy thing about push and pop is you don't need to go
>DS> tracking the stack manually--that's taken care of by the push and
>DS> pop opcodes. They can certainly be replaced with manipulations of
>DS> a temp register and indirect register stores or loads, but that's
>DS> more expensive--you do the same thing only with more dispatch
>DS> overhead.
>
>DS> And I'm considering the stack as a place to put registers
>DS> temporarily when the compiler runs out and needs a spot to
>DS> squirrel something away, rather than as a mechanism to pass
>DS> parameters to subs or opcodes. This is a stack in the traditional
>DS> scratch-space sense.
>
>i agree with that. the stack here is mostly a call stack which save/restores registers as we run out. with a large number like 64, we won't run out until we do some deep calls. then the older registers (do we have an LRU mechanism here?) get pushed by the sub call prologue which then uses those registers for its my vars.

I don't like push/pop - they imply a lot of stack-limit checking word-by-word, when it is less overhead for the compiler to analyse the needs of a whole basic block, check-for/make-space-on the stack _once_, and then just address it.

>is the sub call/return stack also the data (scratch) stack? i think separate ones make sense here. the data stack is just PMC pointers, the code call stack has register info, context, etc.

One stack is more natural for translation to C (which has just one). One problem with FORTH was allocating two growable segments for its two stacks - one always ended up 2nd class.

-- Nick Ing-Simmons who is looking for a new job see http://www.ni-s.u-net.com/
Re: Stacks, registers, and bytecode. (Oh, my!)
Uri Guttman [EMAIL PROTECTED] writes:
>think of this as classic CISC code generation with plenty of registers and a scratch stack. this is stable technology. we could even find a code generator guru (i don't know any obvious ones in the perl6 world)

Classic CISC code generation taught us that CISC is a pain to code-gen. (I am not a guru, but I did design the TMS320C80's RISC specifically to match the gcc of that vintage, and dabbled in a p-code for Pascal way back.)

>special registers ($_, @_, events, etc.) are indexed with a starting offset of 64, so general registers are 0-63.
>
>DS> I'd name them specially (S0-Snnn) rather than make them a chunk of
>DS> the normal register set.

All that dividing registers into sub-classes does is cause you to do register-register moves when things are in the wrong sort of register. Its only real benefit is encoding density, as you can imply part of the register number by requiring addresses to be in address registers etc. It is not clear to me that perl special variables map well to that. Mind you, the names are just a human thing - it is the bit-pattern that the compiler cares about.

>oh, they have macro names which are special. something like:
>
>    #define MAX_PLAIN_REG  64   /* 0 - 63 are plain regs */
>    #define REG_ARG        64   /* $_ */
>    #define REG_SUB_ARG    65   /* @_ */
>    #define REG_ARGV       66   /* @ARGV */
>    #define REG_INT1       67   /* integer 1 */
>    #define REG_INT2       68   /* integer 2 */
>
>uri

-- Nick Ing-Simmons who is looking for a new job see http://www.ni-s.u-net.com/
Re: Stacks, registers, and bytecode. (Oh, my!)
Dan Sugalski [EMAIL PROTECTED] writes:
>At 02:08 PM 5/30/2001 +, Nick Ing-Simmons wrote:
>>Classic CISC code generation taught us that CISC is a pain to code-gen. (I am not a guru, but I did design the TMS320C80's RISC specifically to match the gcc of that vintage, and dabbled in a p-code for Pascal way back.)
>
>Right, but in this case we have the advantage of tailoring the instruction set to the language, and given the overhead inherent in op dispatch we also have an incentive to hoist opcodes up to as high a level as we can manage.

That is of course what they/we all say ;-)

The 68K, for example, matched quite well to the low-tech compiler technology of its day, as did UCSD's p-code for UCSD Pascal, and DSPs have their own reasons (inner loops are more important than generic C) for their CISC nature. Even the horrible x86 architecture is quasi-sane if you assume all variables are on the stack, addressed by the Base Pointer. It is interesting, now that people are looking at building chips for the JVM, how much cursing there is about certain features - though I don't have the references to hand.

The overhead of op dispatch is a self-proving issue - if you have complex ops they are expensive to dispatch. In the limit, FORTH-like threaded code

    while (1)
        (*(*op_ptr++))();

is not really very expensive; it is then up to the op to adjust op_ptr for in-line args etc. The down side is size: an op is at least the size of a pointer. With a 16-bit opcode as per Uri that becomes:

    while (1)
        (*(table[*op_ptr++]))();

(Assuming we don't need to check bounds 'cos we won't generate bad code...)

One can then start adding decode to the loop:

    while (1) {
        op_t op = *op_ptr++;
        switch (NUM_ARGS(op)) {
        case 1:
            (*(table[FUNC_NUM(op)]))(*op_ptr++);
            break;
        case 3:
            (*(table[FUNC_NUM(op)]))(op_ptr[0], op_ptr[1], op_ptr[2]);
            op_ptr += 3;
            break;
        ...
        }
    }

Then one can do byte-ordering and misalignment hackery, and index into the register array:

    while (1) {
        op_t op = GET16BITS(*op_ptr);
        switch (NUM_ARGS(op)) {
        case 1:
            (*(table[FUNC_NUM(op)]))(reg_ptr[GET8BITS(*op_ptr)]);
            break;
        ...
        }
    }

-- Nick Ing-Simmons who is looking for a new job see http://www.ni-s.u-net.com/
Re: Stacks, registers, and bytecode. (Oh, my!)
Dave Mitchell [EMAIL PROTECTED] writes:
>There's no reason why you can't have a hybrid scheme. In fact I think it's a big win over a pure register-addressing scheme. Consider...

Which was more or less my own position...

>At the start of a new scope, the stack is extended by N to create a new stack frame (including a one-off check that the stack can be extended). There is then a 'stack pointer' (sp) which is initialised to the base of the new frame, or an initial offset thereof. (So sp is really just a temporary index within the current frame.) Then some opcodes can use explicit addressing, while others use implicit, or a mixture. Explicit opcodes specify one or more 'registers' - ie indexes within the current frame - while implicit opcodes use the current value of sp as an implicit index, and may alter sp as a side effect. So an ADD opcode would use sp[0], sp[-1] to find the 2 operands, and would store a pointer to the result at sp[-1], then sp--. The compiler plants code in such a way that it will never allow sp to go outside the current stack frame. This allows a big win on the size of the bytecode, and in terms of the time required to decode each op.
>
>Consider the following code:
>
>    $a = $x*$y+$z
>
>Suppose we have r5 and r6 available for scratch use, and that for some reason we wish to keep a pointer to $a in r1 at the end (perhaps we use $a again a couple of lines later). This might have the following bytecode with a pure register scheme:
>
>    GETSV('x',r5)   # get pointer to global $x, store in register 5
>    GETSV('y',r6)
>    MULT(r5,r5,r6)  # multiply the things pointed to by r5 and r6;
>                    # store ptr to result in r5
>    GETSV('z',r6)
>    ADD(r5,r5,r6)
>    GETSV('a',r1)
>    SASSIGN(r1,r5)

Globals are a pain. Consider this code:

    sub foo {
        my ($x,$y,$z) = @_;
        return $x*$y+$z;
    }

In the pure register (RISC-oid) scheme the bytecode should be:

    FOO: MULT(arg1,arg2,tmp1)
         ADD(tmp1,arg3,result)
         RETURN

That is, lexicals get allocated registers at compile time, and ops just go and get them. In the pure stack-with-alloc (x86-oid) scheme it should be:

    ENTER +1                 # need a temp
    MULT SP[1],SP[2],SP[4]   # $x*$y
    ADD  SP[4],SP[3],SP[1]   # temp + $z -> result
    RETURN -2                # Lose temp and non-results

And in a pure stack (FORTH, PostScript) style it might be:

    rot 3   # reorder stack to get x y on top
    mpy
    add
    ret

>but might be like this in a hybrid scheme:
>
>    SETSP(5)     # make sp point to r5
>    GETSV('x')   # get pointer to global $x, store at *sp++
>    GETSV('y')
>    MULT
>    GETSV('z')
>    ADD
>    GETSV('a')
>    SASSIGN
>    SAVEREG(r1)  # pop pointer at *sp, and store in register 1

The problem that the hybrid scheme glosses over is the reorder-the-args issue that is handled by register numbers, stack addressing, or FORTH/PostScript stack re-ordering. It avoids it by expensive long-range global fetches - which is indeed what humans do when writing PostScript - use globals - but compilers can keep track of such mess for us.

>Both use the same registers and have the same net result, but the explicit scheme requires an extra 11 numbers in the bytecode, not to mention all the extra cycles required to extract those numbers from the bytecode in the first place.

-- Nick Ing-Simmons who is looking for a new job see http://www.ni-s.u-net.com/
Re: Stacks, registers, and bytecode. (Oh, my!)
Uri Guttman [EMAIL PROTECTED] writes:
>NI == Nick Ing-Simmons [EMAIL PROTECTED] writes:
>
>NI> The overhead of op dispatch is a self-proving issue - if you
>NI> have complex ops they are expensive to dispatch.
>
>but as someone else said, we can design our own ops to be as high level as we want. lowering the number of op calls is the key. that loop will be a bottleneck as it is in perl5 unless we optimize it now.
>
>NI> With a 16-bit opcode as-per-Uri that becomes:
>NI>    while (1) (*(table[*op_ptr++]))();
>NI> (Assuming we don't need to check bounds 'cos we won't generate bad code...)
>
>i dropped the 16 bit idea in favor of an extension byte code that zhong mentioned. it has several wins, no ordering issues, it is pure 'byte' code.
>
>NI> One can then start adding decode to the loop:
>NI>    while (1) {
>NI>        op_t op = *op_ptr++;
>NI>        switch(NUM_ARGS(op))
>
>no switch, a simple lookup table:
>
>    op_cnt = op_counts[ op ];

Myths of 21st Century Computing #1: Memory lookups are cheap.

Most processors have only one memory unit, and it typically has a long pipeline delay. But many have several units that can do compares etc. A lookup table may or may not be faster/denser than a switch. A lookup may take 9 cycles down a memory pipe, while

    ans = (op > 16) ? 2 : (op > 8) ? 1 : 0;

might superscalar-issue in 1 cycle. Code at a high level and let the C compiler decide what is best. C will give you a lookup if that is best. Memory ops need not be expensive if they pipeline well, but making one memory op depend on the result of another is a bad idea. E.g.

    op   = *op_ptr++;
    arg1 = *op_ptr++;
    arg2 = *op_ptr++;

may appear to happen in 3 cycles, as all the loads can be issued in a pipelined manner and the ++s issued in parallel. While

    op  = *op_ptr++;
    ans = table[op];

could seem to take 18 cycles, as we can't start the 2nd load till the 1st one completes. I have been meaning to try and prove my point with a software-pipelined dispatch loop which is fetching one op, decoding the previous one, and executing the one before that.
-- Nick Ing-Simmons who is looking for a new job see http://www.ni-s.u-net.com/
Re: Stacks registers
Uri Guttman [EMAIL PROTECTED] writes:
>NI> No - you keep the window base handy and don't keep re-fetching it,
>NI> same way you keep program counter and stack pointer handy. Getting
>NI>    window[N]
>NI> is same cost as
>NI>    next = *PC++;
>NI> My point is that to avoid keeping too many things handy, window
>NI> base and stack pointer should be the same (real machine) register.
>
>if we can control that.

Maybe not directly, but most compilers will keep common base registers in machine registers if you code things right.

>but i see issues too. i mentioned the idea of having $_ and other special vars and stuff have their own PMC's in this register set.

Why does it have to be _this_ register set - globals can go in another register set - SPARC's register scheme has global registers too. That said, my guess is that $_ is usually saved/restored across sub/block boundaries.

>dan liked the idea. that doesn't map well to a window as those vars may not change when you call subs. i just don't see register windows as useful at the VM level.

Call it what you will - I am arguing for an addressable stack, not for windows as such.

>i am just saying register windows don't seem to be any win for us and cost an extra indirection for each data access. my view is let the compiler keep track of the register usage and just do individual push/pops as needed when registers run out.
>
>NI> That makes sense if (and only if) virtual machine registers are real
>NI> machine registers. If virtual machine registers are in memory then
>NI> accessing them on the stack is just as efficient (perhaps more so)
>NI> than at some other special location. And it avoids need for
>NI> memory-to-memory moves to push/pop them when we do spill.
>
>no, the idea is the VM compiler keeps track of IL register use for the purpose of code-generating N-tuple op codes and their register arguments. this is a pure IL design thing and has nothing to do with machine registers. at this level, register windows don't win IMO.

That quote is a little misleading. My point is that UNLESS machine (real) registers are involved, all IL registers are in memory. Given that they are in memory, they should be grouped with, and addressed via the same base as, the other memory that a sub is accessing. (The sub will be accessing the stack (or its PAD if you like), and the op-stream for sure, and possibly a few hot globals.)

The IL is going to be CISC-ish - so treat it like an x86, where you operate on things where they are (e.g. on the stack):

    add 4,BP[4]

rather than RISC, where you:

    ld  BP[4],X
    add 4,X
    st  X,BP[4]

If registers are really memory, the extra moves of a RISC scheme are expensive. What we _really_ don't want is the worst of both worlds:

    push BP[4]
    push 4
    add
    pop  BP[4]

>i am thinking about writing a short pseudo-code post about the N-tuple op codes and the register set design. the ideas are percolating in my brane.
>
>uri

-- Nick Ing-Simmons who is looking for a new job see http://www.ni-s.u-net.com/
Re: Stacks registers
Uri Guttman [EMAIL PROTECTED] writes:
>NI> i.e.
>NI>    R4 = frame[N]
>NI> is same cost as
>NI>    R4 = per_thread[N]
>NI> and about the same as
>NI>    extern REGISTER GlobalRegs4;
>NI>    R4 = GlobalRegs4;
>
>well, if there is no multithreading then you don't need the per_thread lookup.

Well:

(a) I thought the plan was to design threads in from the beginning this time.

(b) I maintain that the cost is about the same as global variables anyway.

The case for (b) is as follows: on RISC hardware

    R4 = SomeGlobal;

becomes two instructions:

    loadhigh SomeGlobal.high,rp
    ld       rp(SomeGlobal.low),R4

The C compiler will try and factor out the loadhigh instruction, leaving you with an indexed load. In most cases

    ld rp(RegBase.low+4),R4

is just as valid and takes the same number of cycles, and there is normally a form like

    ld rp(rn),R4

which allows indexing by a variable amount. On CISC machines, either there is an invisible RISC (e.g. Pentium) which behaves as above, or you get something akin to the PDP-11, where indirection reads a literal address via the program counter:

    move [pc+n],r4

In such cases

    move [regbase+n],r4

is going to be just as fast - the issue is the need for a (real machine) register to hold 'regbase'.

>and the window base is not accounted for. you would need 2 indirections, the first to get the window base and the second to get the register in that window.

No - you keep the window base handy and don't keep re-fetching it, the same way you keep the program counter and stack pointer handy. Getting

    window[N]

is the same cost as

    next = *PC++;

My point is that to avoid keeping too many things handy, the window base and stack pointer should be the same (real machine) register.

>i am just saying register windows don't seem to be any win for us and cost an extra indirection for each data access. my view is let the compiler keep track of the register usage and just do individual push/pops as needed when registers run out.

That makes sense if (and only if) virtual machine registers are real machine registers. If virtual machine registers are in memory, then accessing them on the stack is just as efficient (perhaps more so) than at some other special location. And it avoids the need for memory-to-memory moves to push/pop them when we do spill.

-- Nick Ing-Simmons who is looking for a new job see http://www.ni-s.u-net.com/
Re: Stacks registers
Uri Guttman [EMAIL PROTECTED] writes:
>NI == Nick Ing-Simmons [EMAIL PROTECTED] writes:
>
>NI> We need to decide where a perl6 sub's local variables are going
>NI> to live (in the recursive case) - if we need a stack anyway it
>NI> may make sense for the VM to have ways of indexing the local frame
>NI> rather than having global registers (set per thread, by the way?)
>
>i made that thread point too in my long reply to dan. but indexing directly into a stack frame is effectively a register window. the problem is that you need to do an indirection through the window base for every access, and that is slow in software (but free in hardware).

It isn't free in hardware either, but the cost may be lower. Modern machines should be able to schedule the indirection fairly efficiently. But I would contend we are going to have at least one index operation anyway - if only from the thread pointer, or global base - so with careful design, so that registers are at the right offset from the base, we can subsume the register-lookup index into that. I.e.

    R4 = frame[N]

is the same cost as

    R4 = per_thread[N]

and about the same as

    extern REGISTER GlobalRegs4;
    R4 = GlobalRegs4;

-- Nick Ing-Simmons who is looking for a new job see http://www.ni-s.u-net.com/
Re: Stacks registers
Alan Burlison [EMAIL PROTECTED] writes:
>>1. When you call deep enough to fall off the end of the large register file, an expensive system call is needed to save some registers at the other end to memory and wrap, and then again when you come back to the now-in-memory registers.
>
>Not a system call but a trap - they aren't the same thing (pedant mode off ;-). The register spill trap handler copies the relevant registers onto the stack - each stack frame has space allocated for this.
>
>Alan Burlison

Pedant mode accepted - and I concur. But the trap handler is still significant overhead compared to just doing the moves (scheduled) inline as part of normal code. So register windows win if you stay in bounds, but lose quite seriously if you have active deep calls.

(My own style is to write small functions rather than #define or inline, for cache reasons - this has tended to make the above show up. I am delighted to say that _modern_ (Sun) SPARCs have deep enough windows even for me - but the SPARCstation 1+ and some of the low-cost CPUs didn't.)

-- Nick Ing-Simmons who is looking for a new job see http://www.ni-s.u-net.com/
Re: PDD: Conventions and Guidelines for Perl Source Code
Alan Burlison [EMAIL PROTECTED] writes:
>I strongly agree. The current macro mayhem in perl is an utter abomination, and drastically reduces the maintainability of the code. I think the performance argument is largely specious, and while abstraction is a laudable aim, in the case of perl it has turned from abstraction into obfuscation.

As I have said more than once before, excessive use of macros can be a performance killer. It is better to have slabs of common stuff in a real function (which is cached) rather than replicated all over the place. That is the style I use in my own (whoops, sorry, TI's) code, and it does not seem to hurt even on x86 CISC machines.

-- Nick Ing-Simmons who is looking for a new job see http://www.ni-s.u-net.com/
Re: Tying Overloading
Larry Wall [EMAIL PROTECTED] writes:
>Nick Ing-Simmons writes:
>: You really have to talk about overloading boolean context in general.
>:
>: Only if you are going to execute the result in the normal perl realm.
>: Consider using the perl parser to build a parse tree - e.g. one to
>: read perl5 and write perl6. This works for all expressions except
>: &&, || and ?:, because perl5 cannot overload those - so
>:
>:    $c = ($a && $b) ? $d : $e;
>:
>: calls the bool-ness of $a, and in the deferred execution mode of a
>: translator it wants to return not true/false but "it depends on what
>: $a is at run-time". It cannot do that, and is not passed $b, so
>: cannot return
>
>I think using overloading to write a parser is going to be a relic of Perl 5's limitations, not Perl 6's.
>
>Larry

I am _NOT_ using overloading to write a parser. Parse::Yapp is just fine for writing parsers. I am trying to re-use a parser that already exists - perl5's parser. I am using overloading to get at the parse tree that the _existing_ parser has produced. So I can get at perly.y's:

    term: ...
        | '!' term
        | term ADDOP term

etc., but NOT

        | term ANDAND term
        | term OROR term
        | term '?' term ':' term
        ;

I can get at the former because overload maps via newBINOP/newUNOP just fine; I cannot get at the latter group because newLOGOP/newCONDOP don't do overloading.

What I _really_ want to do is a dynamically scoped peep-hole optimize (actually a rewrite) of the op tree - written in perl. But I can't do that, so I fake it by having

    sub construct (&) { ... }

and then

    construct {
        # expression(s) here
    };

and have construct() call the ops, with the overload stuff returning a tree. These days I suppose one could use B:: to poke about in the CV.

-- Nick Ing-Simmons [EMAIL PROTECTED] Via, but not speaking for: Texas Instruments Ltd.
Re: Split PMCs
Dan Sugalski [EMAIL PROTECTED] writes:
>At 07:39 PM 4/19/2001 +, [EMAIL PROTECTED] wrote:
>>Depends what they are. The scheme effectively makes the part mandatory, as we will have allocated space whether used or not.
>
>Well, we were talking about all PMCs having an int, float, and pointer part, so it's not like we'd be adding anything. Segregating them out might make things faster for those cases where we don't actually care about the data. OTOH that might be a trivially small percentage of the times the PMC's accessed, so...

What is the plan for arrays these days? If the float parts of the N*100 entries in a perl5-oid AV were collected, you might get packed arrays by the back door. So it depends whether the access pattern means that the part is seldom used, or used in a different way.

>>As you say it works well for GC of PMCs - and also possibly for compile-time or debug parts of ops, but is not obviously useful otherwise.
>
>That's what I was thinking, but my intuition's rather dodgy at this level. The cache win might outweigh other losses. I'm thinking that passing around an arena address and offset and going in as a set of arrays is probably suboptimal in general,

>>You don't, you pass PMC * and have the offset embedded within the PMC; then the arena base is (pmc - pmc->offset) iff you need it.
>
>I was trying to avoid embedding the offset in the PMC itself. Since it was calculable, it seemed a waste of space.

But passing extra args around is fairly expensive when they are seldom going to be used. Passing an extra arg through N levels is going to consume instructions and N * 32 bits of memory or so.

>If we made sure the arenas were on some power-of-two boundary we could just mask the low bits off the pointer for the base arena address. Evil, but potentially worth it at this low a level.

That would work ;-)

-- Nick Ing-Simmons [EMAIL PROTECTED] Via, but not speaking for: Texas Instruments Ltd.
Re: PDD for code comments ????
David L. Nicol [EMAIL PROTECTED] writes: Jarkko Hietaniemi wrote: Some sort of simple markup embedded within the C comments. Hey, let's extend pod! Hey, let's use XML! Hey, let's use SGML! Hey, let's use XHTML! Hey, let's use lout! Hey, ... Either run pod through a pod puller before the C preprocessor gets to the code, or figure out a set of macros that can quote and ignore pod. The second is Yet Another Halting Problem so we go with the first? Which means a little program to depod the source before building it, or a -HASPOD extension to gcc Or just getting in the habit of writing /* =pod and =cut */ Perhaps we could teach pod that /* was alias for =pod and */ an alias for =cut ? -- Nick Ing-Simmons [EMAIL PROTECTED] Via, but not speaking for: Texas Instruments Ltd.
Re: Vtables: what do we know so far?
Edwin Steiner [EMAIL PROTECTED] writes: Filipe Brandenburger wrote: [...] struct sv { vtable_sv * ptr_to_vtable; void * ptr_to_data; void * gc_data; }; [...] I don't think I can get further from here. Note that, in all examples, I didn't write the `this' pointer that every function would receive. This would correspond to the `ptr_to_data' from the struct sv. I think the `this' pointer should be the SV* (== ptr_to_vtable) so virtual functions can themselves call virtual functions on the same object. Definitely. It also allows them to change what ptr_to_data is for example. -Edwin -- Nick Ing-Simmons
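Edwin's point - pass the SV* itself rather than ptr_to_data, so a virtual function can re-dispatch through the same object's vtable - looks like this in a minimal C sketch. All names (vtable_sv members, demo_bool) are hypothetical illustrations, not the actual proposed API.

```c
#include <stddef.h>

typedef struct sv SV;

typedef struct vtable_sv {
    int (*get_int)(SV *self);
    int (*get_bool)(SV *self);
} vtable_sv;

struct sv {
    const vtable_sv *ptr_to_vtable;
    void *ptr_to_data;
};

static int int_get_int(SV *self)
{
    return *(int *)self->ptr_to_data;
}

/* Because it receives the SV*, get_bool can dispatch through the
   vtable again; with only the bare data pointer it could not, and it
   also could not swap ptr_to_data to something else. */
static int int_get_bool(SV *self)
{
    return self->ptr_to_vtable->get_int(self) != 0;
}

static const vtable_sv int_vtable = { int_get_int, int_get_bool };

/* tiny driver: wrap an int in an SV and ask for its boolean value */
static int demo_bool(int value)
{
    SV sv = { &int_vtable, &value };
    return sv.ptr_to_vtable->get_bool(&sv);
}
```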
Modular subsystem design (was Re: Speaking of signals...)
Filipe Brandenburger [EMAIL PROTECTED] writes: But, back to the efficiency issue, I _THINK_ the scenario I described is not inefficient. What it does differently from a monolithic system: it uses callbacks instead of fixed function calls, and it doesn't inline the functions. First, callbacks take at most 1 cycle more than fixed function calls (is this right???), No - a memory fetch can take a long time (10s of cycles). Mostly that can be hidden by a pipeline, but branches (i.e. calls) tend to expose it more. But we are already thinking of "vtables" which are no better. because the processor must fetch the code address from an address of memory, instead of just branching to a fixed memory address. Comparing to all the code Perl uses to handle SVs and such stuff, I think 1 cycle wouldn't kill us at all! Well, inline functions _CAN_ make a difference if there are many calls to one function inside a loop, or something like this. And this _CAN_ be a bottleneck. Inline functions can also cost you - the out-of-line function may be in the cache, and the plethora of inline functions not in cache, or extra code size thrashes cache. Well, I have one idea that keeps our design modular, breaks dependencies between subsystems (like that of using async i/o system without having to link to the whole thing), and achieves efficiency through inline functions. We could develop a tool that works in the source code level and does the inlining of functions for us. I mean a perl program that opens the C/C++ source of the kernel, looks for pre-defined functions that should be inlined, and outputs processed C/C++ in ``spaghetti-style'', very messy, very human-unreadable, and very efficient. And already discussed ;-) -- Nick Ing-Simmons [EMAIL PROTECTED] Via, but not speaking for: Texas Instruments Ltd.
Re: perl IS an event loop (was Re: Speaking of signals...)
Simon Cozens [EMAIL PROTECTED] writes: On Fri, Jan 05, 2001 at 11:42:32PM -0500, Uri Guttman wrote: SC 5x slowdown. not if you just check a flag in the main loop. you only check the event system if you have pending events or signals, etc. the key is not checking all events on each pass thru the loop. Which is exactly what Chip did in his safe-signals patch. 33% slowdown. I don't believe it - can we add a stub test and benchmark it? -- Nick Ing-Simmons [EMAIL PROTECTED] Via, but not speaking for: Texas Instruments Ltd.
Re: perl IS an event loop (was Re: Speaking of signals...)
Dan Sugalski [EMAIL PROTECTED] writes: At 01:02 PM 1/6/01 -0500, Uri Guttman wrote: that is what i would expect form a simple flag test and every N tests doing a full event poll. and even up to 5-10% slowdown i would think is a good tradeoff for the flexibilty and ease of design win we get in the i/o and event guts. but then, i have always traded off speed for flexibility and ease. hey, so has perl! :) Not always. :) The flexibility really does need to balance out the speed hit. (If Nick wasn't in the middle of rewriting the whole IO system, I'd probably be assaulting sv_gets to make up for the speed hit I introduced way back with the record reading code...) Nick has yet to touch sv_gets() - partly 'cos it was too scary to mess with - so you can if you like ;-) -- Nick Ing-Simmons [EMAIL PROTECTED] Via, but not speaking for: Texas Instruments Ltd.
Re: perl IS an event loop (was Re: Speaking of signals...)
Bart Lateur [EMAIL PROTECTED] writes: Apropos safe signals, isn't it possible to let perl6 handle avoiding zombie processes internally? What use does having to do wait() yourself have, anyway? Valid point - perl could have a CHLD handler in C and stash away returned status to pass to wait() when it did get called. -- Nick Ing-Simmons [EMAIL PROTECTED] Via, but not speaking for: Texas Instruments Ltd.
Re: Speaking of signals...
Uri Guttman [EMAIL PROTECTED] writes: but the question remains, what code triggers a signal handler? would you put a test in the very tight loop of the op dispatcher? Not a test. The C level signal handler just fossicks with the variables that very tight loop is using. But if "runops" looked like:

    while (PL_op = PL_next_op) {
        PL_op->perform();    /* assigns PL_next_op */
    }

(Which is essentially FORTH-like) then there is little to get in a mess. The above is simplistic - we need a way to "disable interrupts" too. and where is the event test call made? It isn't. PL_next_op is set by C signal handler. In practice I suspect we need the test:

    while (PL_op = (PL_sig_op) ? PL_sig_op : PL_next_op) {
        PL_op->perform();
    }

or somehow the next op delivered will be the next baseline op or the dispatch check op. that is basically the same as my ideas above, just a different style loop. What I am trying to get to is adding minimal extra tests to the tight loop. We probably need at least ONE test in the loop - let us try and make that usable for all the "abnormal" cases. uri -- Nick Ing-Simmons [EMAIL PROTECTED] Via, but not speaking for: Texas Instruments Ltd.
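The one-test dispatch loop can be tried in miniature: the signal handler only has to swap a pointer, and the loop's single existing test picks it up on the next iteration. This is a hypothetical standalone sketch (trace, run_demo, the op names), not perl's actual runops.

```c
#include <stddef.h>

typedef struct op OP;
struct op {
    OP *(*perform)(OP *self);   /* runs the op, returns the next one */
    int id;
};

static OP *volatile PL_sig_op = NULL;   /* set by a C signal handler */
static OP *PL_next_op = NULL;

static int trace[8];
static int trace_n = 0;

static void runops(void)
{
    OP *op;
    /* the ONE test in the loop: prefer a pending signal op if any */
    while ((op = (PL_sig_op != NULL) ? (OP *)PL_sig_op : PL_next_op) != NULL) {
        PL_sig_op = NULL;               /* consume the pending op */
        PL_next_op = op->perform(op);
    }
}

static OP *do_step(OP *self)   { trace[trace_n++] = self->id; return self + 1; }
static OP *do_finish(OP *self) { trace[trace_n++] = self->id; return NULL; }

/* a "dispatch check" op: do its work, then resume the baseline stream */
static OP *do_signal(OP *self) { trace[trace_n++] = self->id; return PL_next_op; }

/* driver: run a two-op program, optionally with a pending signal op */
static int run_demo(int with_signal)
{
    OP prog[2] = { { do_step, 1 }, { do_finish, 2 } };
    OP sig     = { do_signal, 99 };
    trace_n = 0;
    PL_next_op = prog;
    PL_sig_op = with_signal ? &sig : NULL;
    runops();
    return trace_n;
}
```

With the signal pending, the loop runs the check op first and then falls straight back into the baseline stream - no second test anywhere.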
Re: Anyone want to take a shot at the PerlIO PDD?
Dan Sugalski [EMAIL PROTECTED] writes: Would someone like to take a crack at a PDD for the PerlIO system? It doesn't need to be particularly fancy (nor complete) to start with, but having one will give us a place to work from. (Waiting for me to spec it out may take a while...) I am willing to cast bleadperl5's PerlIO into the form of a _draft_ PDD for perl6 - i.e. "this is what it does now", not "this is what it should do". Then we can discuss it here some more. -- Nick Ing-Simmons
Re: standard representations
Dan Sugalski [EMAIL PROTECTED] writes: That's fine. I was thinking of smaller processors that might be used in embedded apps and such. (I'm also not sure what the most efficient integer representation is on things like the ARM microprocessors) ARM7/ARM9 are both 32-bit MIPS has both 32-bit and 64-bit variants. That's good. Though do either of them have 16-bit data busses? Not at the CPU no - what happens at chip boundary depends on what customer asks for. The 68XXX in Palm-Pilots are the issue there. DSPs are more messy. That's probably a bit too specialized a piece of hardware to worry about. Unless things have changed lately, they're not really general-purpose CPUs. Some of them are. It is micro-controllers that you have to worry about Yeah, I know a lot of the old 8 and 16 bit chips are in use as control devices places. Those are the ones I'm thinking about. (Not that hard, but I don't want to rule them out needlessly) I suspect that any that are up to running anything approximating perl will have 32-bit ops in a library in any case. -- Nick Ing-Simmons
Re: standard representations
Dan Sugalski [EMAIL PROTECTED] writes: Anyone know of a good bigint/bigfloat library whose terms are such that we can just snag the source and use it in perl? There was some traffic on gcc list recently about a GNU one (presumably GPL only). I don't really care to write the code for division, As I recall Knuth has something on it. I know that some hardware FPUs do division (N/M) by Newton-Raphson expansion of 1/M and then do N*(1/M). let alone the transcendental math ops... TI's sources for those cite some book or other. The snag with those and sqrt() etc. is that the published algorithms "know" how many terms of power series are needed to reach (say) IEEE-754 "double". Thus a "big float" still needs to decide how precise it is going to be or atan2(1,1)*4 (aka PI) is going to take a while to compute... -- Nick Ing-Simmons
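The Newton-Raphson division mentioned above (compute 1/M, then N*(1/M)) is easy to sketch for positive doubles. Each iteration roughly squares the error, which is exactly why fixed-precision implementations know the iteration count in advance while a bigfloat must first decide how precise it wants to be. The scaling loops, seed constants, and iteration count here are illustrative, and the code assumes m > 0.

```c
/* Newton-Raphson reciprocal: x' = x * (2 - m*x) converges to 1/m.
   Scaling by powers of two is exact in binary floating point. */
static double nr_recip(double m)
{
    int e = 0;
    double f = m, x, r;
    int i;

    /* scale m into [0.5, 1): m = f * 2^e */
    while (f >= 1.0) { f *= 0.5; e++; }
    while (f < 0.5)  { f *= 2.0; e--; }

    /* classic linear seed for 1/f on [0.5, 1) */
    x = 48.0 / 17.0 - (32.0 / 17.0) * f;

    /* 5 iterations: the error squares each time, ample for a double */
    for (i = 0; i < 5; i++)
        x = x * (2.0 - f * x);

    /* undo the scaling: 1/m = (1/f) * 2^-e */
    r = x;
    while (e > 0) { r *= 0.5; e--; }
    while (e < 0) { r *= 2.0; e++; }
    return r;
}

static double nr_div(double n, double m)
{
    return n * nr_recip(m);
}
```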
Re: standard representations
Dan Sugalski [EMAIL PROTECTED] writes: At 01:05 PM 12/29/00 +, Nick Ing-Simmons wrote: Dan Sugalski [EMAIL PROTECTED] writes: I'm reasonably certain that all platforms that perl will ultimately run on can muster hardware support for 16-bit integers. Hmm, most modern RISCs are very bad at C-like 16-bit arithmetic - they have a tendency to widen to 32-bits. That's fine. I was thinking of smaller processors that might be used in embedded apps and such. (I'm also not sure what's the most efficient integer representation on things like the ARM microprocessors are) ARM7/ARM9 are both 32-bit MIPS has both 32-bit and 64-bit variants. DSPs are more messy. It is micro-controllers that you have to worry about -- Nick Ing-Simmons
Re: standard representations
Dan Sugalski [EMAIL PROTECTED] writes: Strings can be of three types--binary data, platform native, and UTF-32. No, we are not messing around with UTF-8 or 16, nor are we messing with EBCDIC, shift-JIS, or any of that stuff. I don't understand that in the light of supporting "platform native". That could easily be any of those as you note below. So what operations are supported on "platform native" strings? Are we at the mercy of locale's idea of upper/lower case, sort order etc.? Strings can be stored internally that way (and the native form might be one of them) but as far as the interface is concerned we have only three. Yes, this does mean if we mess with strings in UTF-8 format on a non-UTF-8 system they'll need to be fed out in UTF-32. It's bigger, but we can deal. -- Nick Ing-Simmons
Re: standard representations
Dan Sugalski [EMAIL PROTECTED] writes: I'm reasonably certain that all platforms that perl will ultimately run on can muster hardware support for 16-bit integers. Hmm, most modern RISCs are very bad at C-like 16-bit arithmetic - they have a tendency to widen to 32-bits. I also expect that they can all muster at least software support for 32-bit integers. However The issue isn't support, it's efficiency. Since we're not worrying about loss of precision (as we will be upconverting as needed) the next issue is speed, and that's where we want things to be in a platform convenient size. I honestly can't think of any reason why the internal representation of an integer matters to the outside world, but if someone can, do please enlighten me. :) I can't think of anything except the range that is affected by the representation. -- Nick Ing-Simmons
Re: standard representations
Dan Sugalski [EMAIL PROTECTED] writes: BigInt and BigFloat are both pure perl, and as such their speed leaves a *lot* to be desired. Fixing that (at least yanking some of it to XS) has been on my ToDo list for a while, but other stuff keeps getting in the way... :) My own "evolutionary" view of things is that if we did XS versions of BigInt and BigFloat for perl5 we would learn some issues that might affect Perl6. i.e. the vtable entries for "ints" may be influenced by their use as building blocks for "floats". For example the choice of radix in the BigInt case - should it be N*16-bits or should we try and squeeze 32-bits - or to avoid issues with sign should that be 15 or 31? (If we assume we use 2's complement then LS words are treated as unsigned; only the MS word has sign bit(s).) BigFloat could well build on BigInt for its "mantissa" and have another int-of-some-kind as its exponent. We don't need to pack it tightly so we should probably avoid IEEE-like hidden MSB. The size of exponent is one area where "known range of int" is important. -- Nick Ing-Simmons
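The radix trade-off is easiest to see in code: with 31-bit limbs, a limb addition plus carry can never overflow a 32-bit unsigned, so no double-width type is needed. This is a minimal hypothetical sketch of the "radix 2^31" option.

```c
#include <stdint.h>

#define LIMB_BITS 31
#define LIMB_MASK ((uint32_t)0x7fffffffu)

/* r = a + b over n little-endian limbs of 31 bits each; returns the
   carry out.  Max per step: (2^31-1) + (2^31-1) + 1 < 2^32, so the
   sum always fits in uint32_t - the point of not using all 32 bits. */
static uint32_t limb_add(uint32_t *r, const uint32_t *a,
                         const uint32_t *b, int n)
{
    uint32_t carry = 0;
    int i;
    for (i = 0; i < n; i++) {
        uint32_t s = a[i] + b[i] + carry;
        r[i] = s & LIMB_MASK;
        carry = s >> LIMB_BITS;
    }
    return carry;
}
```

With a full 32-bit radix the same loop would need a 64-bit intermediate (or carry-flag tricks), which is exactly the portability question being weighed above.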
Re: mixed numeric and string SVs.
David Mitchell [EMAIL PROTECTED] writes: 2. Each SV has 2 vtable pointers - one for its numeric representation (if any), and one for its string representation (if any). Flexible, but may require an extra 4/8 bytes per SV. It may not be terrible. How big is the average SV already anyway? True, but I've just realised a complication with my suggestion. If there are multiple vtable ptrs per SV, which type 'owns' the SV carcass, Perl owns the carcass. Each vtable would have its own payload portion and be responsible for its destruction and cleanup. This is a classical "multiple inheritance" scheme. and is responsible for destruction, and has permission to put its own stuff in the payload area etc? I think madness might this way lie. So here's a modified suggestion. Rather than having 2 vtable ptrs per scalar, we allow a string type to contain an optional pointer to another subsidiary SV containing its numeric value. (And vice versa). That would work too. Then for example the getint() method for a utf8 string type might look like:

    utf8_getint(SV *sv)
    {
        if (sv->subsidiary_numeric_sv == NULL) {
            sv->subsidiary_numeric_sv = Numeric->new(aton(sv->value));
        }
        return sv->subsidiary_numeric_sv->getint();
    }

(utf8 stringy methods that alter the string value of the SV are then responsible for either destroying the subsidiary numeric SV, or for making sure its value gets updated, or for setting a flag warning that its value needs recalculating.) Similarly, the stringy methods for numeric types are wrappers that optionally create a subsidiary string SV, then pass the call onto that object. Or to avoid the conditional each time, there could be 2 vtables for each type, containing 'with subsidiary' and 'without subsidiary' methods; the role of the latter being to create the subsidiary SV and update the type of the main SV to the 'with subsidiary' type. -- Nick Ing-Simmons [EMAIL PROTECTED] Via, but not speaking for: Texas Instruments Ltd.
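The subsidiary-SV caching idea above can be shown runnable in plain C, with a heap-allocated double standing in for the subsidiary numeric SV. All names (StrSV, str_getnum, str_set) are hypothetical; the point is the lazy creation on first numeric use and the invalidation on string mutation.

```c
#include <stdlib.h>
#include <string.h>

/* hypothetical string SV with a lazily created subsidiary numeric value */
typedef struct {
    const char *string_value;
    double *subsidiary_num;   /* NULL until first numeric use */
} StrSV;

/* getnum: convert once, then reuse the cached conversion */
static double str_getnum(StrSV *sv)
{
    if (sv->subsidiary_num == NULL) {
        sv->subsidiary_num = malloc(sizeof *sv->subsidiary_num);
        *sv->subsidiary_num = strtod(sv->string_value, NULL);
    }
    return *sv->subsidiary_num;
}

/* string mutators must destroy (or flag) the cached numeric value */
static void str_set(StrSV *sv, const char *s)
{
    free(sv->subsidiary_num);
    sv->subsidiary_num = NULL;
    sv->string_value = s;
}
```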
Re: String representation
Nicholas Clark [EMAIL PROTECTED] writes: where it is possible to get "smart" when one arg is a "special case" of the other. And similarly numbers must be convertible to "complex long double" or what ever is the top of the built-in tree ? (NV I guess - complex is over-kill.) It is the how do we do the generic case that worries me. Maybe this is a digression, but it does suggest that there may not be 1 top to the tree (at least for builtin numbers). Which may also hold for strings. Which is why it worries me. If I invent a new number type (say), what vtable entries must it have to allow all the generic things to function? Given a choice between NV/UV/IV possibles on what basis do we choose one branch over the other? We old'ns need people that don't know "it can't be done" to tell us how to do it - but we reserve the right to say "we tried that it didn't work" too. ^ because Nicholas Clark -- Nick Ing-Simmons [EMAIL PROTECTED] Via, but not speaking for: Texas Instruments Ltd.
Re: mixed numeric and string SVs.
David Mitchell [EMAIL PROTECTED] writes: Has anyone given thought to how an SV can contain both a numeric value and string value in Perl6? Given the arbitrary number of numeric and string types that the vtable scheme of Perl6 supports it will be unviable to have special types for all permutations (eg, utf8_nv, unicode32_iv, ascii_bitint, ad nauseam). It seems to me the following options are possible: 1. We no longer save conversions, so $i="3"; $j+=$i for (...); does an aton() or similar each time round the loop Well just the 1st time - then it is a number... 2. Each SV has 2 vtable pointers - one for its numeric representation (if any), and one for its string representation (if any). Flexible, but may require an extra 4/8 bytes per SV. This is my favourite. 3. We decree that all string to numeric conversions should return a particular numeric type (eg NV), and that all numeric to string conversions should similarly convert to a fixed string type (eg utf8). (Although I'm not sure that really helps.) I can't see how that helps. -- Nick Ing-Simmons [EMAIL PROTECTED] Via, but not speaking for: Texas Instruments Ltd.
Re: String representation
…might reduce the mess, or *might* increase it, depending on how it's done. If it "depends" then it isn't strictly "orthogonal". One final thing - I'm fairly new to this game (I thought the start of Perl6 would be a good time to get involved, without having to understand the horrors of perl5 internals in depth), which means I run more of a risk than most of speaking from my derriere. So far I have been reluctant to put forward any really substantial suggestions as to how to handle all this stuff, mainly for fear of irritating people who know what they are talking about, and who have to take time out to explain to me why I'm wrong! On the other hand, I do seem to have ended up talking a lot about this subject on perl6-internals!! So, should I have the courage of my convictions and let rip, or should I just leave this to wiser people? Answers on a postcard, please We old'ns need people that don't know "it can't be done" to tell us how to do it - but we reserve the right to say "we tried that it didn't work" too. -- Nick Ing-Simmons
Re: String representation
David Mitchell [EMAIL PROTECTED] writes: Personally I feel that that string part of the SV API should include most (if not all) string functions, including regex matching and substitution. What are string functions in your view? m// s/// join() substr index lc, lcfirst, ... | ~ ++ vec '.' '.=' It rapidly gets out of hand. Why not eval "$string" as well ? ;-) then in the limit perl can just become eval scalar(ARGV); Seriously - I think we need to consider the original question "What is the representation" based on perl5 hindsight, then think what operations we want to perform on it, then divide those into the ones which make sense to be "methods" (vtable entries) of string, those that are part of string API, and those which are just ops messing with strings. That way there can be multiple regex implementations to handle different cases (eg fast one(s) for fixed width ASCII, UTF-32 etc, and a slow horrible one for variable-length UTF-8, etc). Of course perl itself could provide a default regex engine usable by all string types, but implementors would then be free to add variants for custom string types. I would argue one does that by making the regex API more modular. -- Nick Ing-Simmons [EMAIL PROTECTED] Via, but not speaking for: Texas Instruments Ltd.
Re: String representation
Simon Cozens [EMAIL PROTECTED] writes: So, before we start even thinking about what we need, it's time to look at the vexed question of string representation. How do we do Unicode without getting into the horrendous non-Latin1 cockups we're seeing on p5p right now? Well - my theorist's answer is that everything is Unicode - like Java. As I pointed out on p5p even EBCDIC machines can use that model - but the downside is that ord('A') == 65 which will break backward compatibility with EBCDIC scripts. If perl5.7+ EBCDIC continues down its alternate road and we need to be able to translate perl5 to perl6 I strongly suspect that perl6 cannot use the "java-oid" model either as the programmer's intent will not be obvious enough to auto-translate. I still haven't grasped what the current EBCDIC "model as seen by perl programmer" _is_. Larry suggested aeons ago that everything is an array of numbers, and Perl shouldn't care what those numbers represent. But at some point, it has to, and that means things have to be tagged with their character repertoires and encodings. Tagging a string with a repertoire and encoding is horrible - you are aware of the trickiness of even getting the SvUTF8 bit "right". To have a general representation carried around we need a pointer rather than just a bit and we cannot say if (SvUTF8(sv)) we have to say if (SvENCODING(sv)->some_predicate) e.g.
    if (SvENCODING(sv_a) != SvENCODING(sv_b)) {
        if (SvENCODING(sv_a)->is_superset_of(SvENCODING(sv_b))) {
            sv_upgrade_to(sv_b, SvENCODING(sv_a));
        }
        else if (SvENCODING(sv_b)->is_superset_of(SvENCODING(sv_a))) {
            sv_upgrade_to(sv_a, SvENCODING(sv_b));
        }
        else {
            Encoding *x = find_superset_encoding(SvENCODING(sv_a), SvENCODING(sv_b));
            sv_upgrade_to(sv_a, x);
            sv_upgrade_to(sv_b, x);
        }
    }

Personally I would not use such a beast. The only sane compromise I can imagine is close to what we have at the moment with maybe a few extra special cases in the "flags" bits:

    ASCII only (0..7f)
    Native-single-byte (iso8859-x, IBM1047)
    wchar_t
    UTF-8
    UNICODE

There needs to be a hierarchy of _repertoires_ such that: ASCII is subset of Native is subset of wchar_t is subset of UNICODE. The "Native-single-byte" would have one - global-to-interpreter encoding object - not just iso8859-1 - basically the one that LC_CTYPE gives the "right answers for" - though how the £!$^¬!*% one is supposed to find that out is beyond me - so we would presumably invert that and use the Unicode CTYPE-oid stuff to do isALPHA() etc. -- Nick Ing-Simmons [EMAIL PROTECTED] Via, but not speaking for: Texas Instruments Ltd.
Re: String representation
David Mitchell [EMAIL PROTECTED] writes: Personally I would not use such a beast But with different encodings implemented by different SV types - each with their own vtable - surely most of this will "come out in the wash", by the correct method automatically being called. I thought that was the big selling point of vtables :-) (Or to put it another way - is the debate about handling multiple string encodings really just the same debate as the handling of multiple numeric types (but harder...) ?) It is exactly the same as the enormous_int ** complex_rational problem.

    if ("\N{gamma}".title_case(join($klingon, @welsh)) =~ /$urdu/)

whose operators get called? -- Nick Ing-Simmons
Re: String representation
Nicholas Clark [EMAIL PROTECTED] writes: On Fri, Dec 15, 2000 at 11:18:00AM -0600, Jarkko Hietaniemi wrote: As painful as it may sound (codingwise) I would urge to spare some thought to using (internally) UTF-32 for those encodings for which UTF-8 would be *longer* than the UTF-32 (mainly the Asian scripts). most CPUs can load a 32 bit quantity in 1 machine instruction most CPUs would take 2 or 3 machine instructions to load 2 or 3 bytes of variable length encoding, and I'd guess that on most RISC CPUs those three instructions take three times the space, Okay so far. (and take 3 times the single load instruction) Almost certainly more than the single load, but much less than 3 due to cache effects. And that's ignoring the code to bit shuffle those bytes that make up the character. So it may be more total space efficient to use 32 bits for data. And although it feels like we'll be shifting 32 bits of data round per character instead of 8-40 with an average less than 32, it might still take longer because we're doing it less efficiently. My big worry is that "strings" would fill the data cache much more quickly. Just a passing thought. Extrapolated up from 1 RISC CPU I know quite well. Nicholas Clark -- Nick Ing-Simmons
Re: String representation
David Mitchell [EMAIL PROTECTED] writes: Nick Ing-Simmons [EMAIL PROTECTED] wrote: What are string functions in your view? m// s/// join() substr index lc, lcfirst, ... | ~ ++ vec '.' '.=' It rapidly gets out of hand. Perhaps, but consider that somewhere within the perl internals there have to be functions which implement all these ops anyway. If we provide vtable slots for all these functions and just fill most of the slots with pointers to the 'default' Perl implementation, we haven't really lost anything (except possibly a slight delay due to the extra indirection, which may be compensated for elsewhere). On the other hand, we have gained the ability to replace the default implementation with something more efficient where it suits us. I have just been through exactly that process with the PerlIO stuff. So I hope you will not take offence when I say that your observation above is simplistic. The problem is "what are the (types of) the arguments passed to the functions?" - the existing code will be expecting its args in a particular form. So your wondrous new function must accept exactly those args and types - and convert them as necessary before becoming more efficient. So to get any win the args/types of all the functions has to be designed with pluggable-ness in mind from the outset. At best this means taking an indirection hit for all the args as well as the function (this is what PerlIO does - PerlIO is now essentially a FILE ** rather than a FILE *). At worst we have to write a "worst case" override entry for each op and then work back from what it needs - this is exemplified by PerlIO_getpos(): the "position" arg had to stop being an Fpos_t and become an SV * so that stdio could stuff an Fpos_t in it, but a transcoding layer could put the Fpos_t, and the escape-state and partial characters in as well.
Take the example of substr() - if this is a standalone function, then it has to work without reference to any of the internals of its args, and thus has to rely on extracting a 'standard' representation of the string value from the SV in order to operate upon it. This then implies messiness of coding and inefficiency, with all the unicode hell that infects perl5 re-appearing. If substr() were a per-type op, then the messy details of UTF8 would lie almost completely within the internal implementation of that datatype. True, but the messy details would now occur multiple times, as soon as substr_utf8 exists then _ALL_ the other string ops _must_ be overridden as well because nothing but string_utf8 "class" knows what is going on. In fact, I would argue that in general most if not all the operations currently performed by pp_* should have vtable equivalents, both for numeric and string types (including unary ops, mutators, binops etc etc). Hmm - that is indeed a logical position. Seriously - I think we need to consider the original question "What is the representation" based on perl5 hindsight, then think what operations we want to perform on it, then divide those into the ones which make sense to be "methods" (vtable entries) of string, those that are part of string API, and those which are just ops messing with strings. If an "op messing with strings" might be able to do a faster job given access to the internals of that string type, then I'd argue that that op should be in the vtable too. I can see your position.

    perl6 = Union_of(I32_perl, I64_perl, float_perl, double_perl,
                     long_double_perl, ASCII_perl, UTF8_perl, ShiftJis_perl,
                     Complex_rational_perl, right_to_left_perl, ...)

or

    class perl {
        virtual SV *add(SV *, SV *);
        ...
        virtual SV *y(SV *, SV *);
    }

The snag here is that the volume of code explodes and gets splattered all over the sub-classes.
So to fix a bug in the '+' operator (pp_plus) one has to go visit lots of places - but, presumably, the bug will only be in one of them. If this is to fly (and I am not saying it cannot), then the "multiple despatch" issue needs to have a clean process so that it is clear what happens if someone writes: my $complex_rational = $urdu_string / sqrt(-$big_integer); The string needs to get converted to a number knowing which characters are digits and what the Urdu for 'i' is. The big integer needs to get negated (no sweat) then someone's sqrt() gets called and had better not barf on the -ve value, then complex_rational can do the right thing. In other words - string ops on strings of uniform type, math ops on well understood hierarchies etc. are all easy enough - it is the combinations that get very messy very very quickly. -- Nick Ing-Simmons
Re: String representation
Jarkko Hietaniemi [EMAIL PROTECTED] writes: On Mon, Dec 18, 2000 at 03:21:05PM +, Nick Ing-Simmons wrote: Simon Cozens [EMAIL PROTECTED] writes: So, before we start even thinking about what we need, it's time to look at the vexed question of string representation. How do we do Unicode without getting into the horrendous non-Latin1 cockups we're seeing on p5p right now? Well - my theorist's answer is that everything is Unicode - like Java. That would be nice, yes. As I pointed out on p5p even EBCDIC machines can use that model - but the downside is that ord('A') == 65 which will break backward compatibility with EBCDIC scripts. Maybe we need $ENV{PERL_ENCODING} to control ord() and chr(), too? That was my suggestion last week some time - though not stated as clearly! Tagging a string with a repertoire and encoding is horrible - you are aware Indeed. We have had a very rough ride trying to get just two encodings to play well together, trying to support more simultaneously would be pure combinatorial masochism. I say we should strive for converting everything to/from one agreed-upon internal encoding. Yes, this is somewhat counter to the idea 'no preferred internal encoding'. After pondering about the issue I have come around to "Oh, yes, there should be one preferred internal encoding.", otherwise we banish ourselves to much gnashing of teeth. Off-hand, I think it's only when there would be information loss when the One True Encoding conversion shouldn't be done. What's the OTE, then? Well, UTF-16 or UTF-32, I guess. The redeeming features of UTF-8, that it is 1:1 for ASCII, and also compact for ASCII, frankly are getting rather thin in my eyes. But not in mine (yet) - but then IO is just throwing gobs of bytes about and regexps are introspecting. (And Encode has to handle variable-length multi-byte gunk anyway.) -- Nick Ing-Simmons
Re: Opcodes (was Re: The external interface for the parser piece)
David Mitchell [EMAIL PROTECTED] writes: I think this boils down to 2 important questions, and I'd be interested in hearing people's opinions of them. 1. Does the Perl 6 language require some explicit syntax and/or semantics to handle multiple and user-defined numeric types? Eg "my type $scalar", "$i + integer($r1+$r2)" and so on. That is a Language and not an internals issue - Larry will tell us. But I suspect the answer is that it should "work" without any special stuff for simple perl5-ish types - because you need to be able to translate 98% of 98% of perl5 programs. So we should start from the premise "no" and see where we get ... 2. If the answer to (1) is yes, is it possible to decide what the numeric part of the vtable API should be until the details of (1) has been agreed on? I suspect the answers are yes and no. I suspect the answers are "no" and (2) is eliminated as "dead code" ;-) Dave. -- Nick Ing-Simmons
Re: The external interface for the parser piece
Nicholas Clark [EMAIL PROTECTED] writes: We're trying to make this an easy embedding API. Yes, and we are in danger of "premature optimization" of the _interface_. What we need to start with is a list of "what we need to know" - they may as well be separate parameters at this point - then we can decide how best to group them and provide wrapper(s) that call the zillion parameter version. If there turns out to be only one sensible wrapper then it can become _the_ interface. -- Nick Ing-Simmons [EMAIL PROTECTED] Via, but not speaking for: Texas Instruments Ltd.
Re: The external interface for the parser piece
Tom Hughes [EMAIL PROTECTED] writes: In message [EMAIL PROTECTED] Dan Sugalski [EMAIL PROTECTED] wrote: At 10:42 AM 11/29/00 +, Nick Ing-Simmons wrote: FILE * is not a good idea. PerlIO * is fine. The problem with that is we're potentially getting the filehandle from something that isn't perl. Or so my thinking went at the time. Right now I'm thinking that I need to rethink things. That was my point. The Parser API should stick to PerlIO * - which is an abstract interface. How that interface gets provided is none of the _parser's_ business. There is another side to this - perl itself (particularly on Win32 or other places where stdio is "broken") may not have a FILE * to give you - it may only have PerlIO *. That shouldn't matter so long as there's a simple way to create a PerlIO * from a FILE * or whatever. Bleadperl work on PerlIO is teaching that it is not necessarily "simple" to convert one to the other. One can wrap a FILE * inside a PerlIO simply enough, provided that the provider then promises not to touch it in any way while perl is messing with it, but the FILE *-ness gets exposed. For example there are issues with FILE *'s 'textmode' and PerlIO's crlf layer fighting. Unless we inherit perl5's twin-IO * concept (which I would not recommend at this stage) there are also issues with bi-directional things like sockets. It is (currently) much better to open a PerlIO * from the outset, either from a pathname, or a low level "file descriptor" (what a low-level descriptor is on non-UNIX is work in progress). Now we should be able to clean that up some (even in perl5) but we don't want to expose all the mess to the _parser_ API. If we say FILE * we _may_ have to say - FILE *, open in binmode, not line buffered, not to a socket on Win32, ... and a whole host of other gunk. And then (presumably) inside the parser wrapper do the right thing to turn it into a PerlIO * so we can use UTF8, CRLF, encode/decode etc.
Better (IMHO) to do that _outside_ the parser API under another committee's jurisdiction. That's probably something that needs to be specific to different language bindings - if you're embedding perl in C++ you probably have an iostream, and if you're embedding in Java you'll have a Java stream object. In each case you'll want an easy way to create a PerlIO object from that. Why not export the PerlIO API and have the language call that? But if that is not possible then we can write parse_FILE(FILE *x,...) which takes the FILE *, wraps it in a PerlIO *, calls the generic parser, then unwraps, cleans up and returns. Tom -- Nick Ing-Simmons [EMAIL PROTECTED] Via, but not speaking for: Texas Instruments Ltd.
Re: To get things started...
Bart Lateur [EMAIL PROTECTED] writes: But what if you choose wrong, forgot a really important one, and this instruction gets a multibyte representation? We're stuck with it forever...? I have had some thoughts on "dynamic opcodes", where the meaning of opcode bytes needn't be fixed, but can be dynamically assigned, depending on how often they occur (for example). A bit like how a Huffman compressor may choose shorter representations for the most frequently occurring byte patterns. This is just like HW processor opcodes. x86 has lasted so well because the initial guess at the short/common opcodes was not too bad. But the escape bytes are getting out of hand now... -- Nick Ing-Simmons [EMAIL PROTECTED] Via, but not speaking for: Texas Instruments Ltd.
Re: Perl Implementation Language
Tom Hughes [EMAIL PROTECTED] writes: What I'd like to see us avoid is the current situation where trying to examine the value of an SV in the debugger is all but impossible for anybody other than a minor god. What is so hard about: (gdb) call Perl_sv_dump(sv) ??? Tom -- Nick Ing-Simmons [EMAIL PROTECTED] Via, but not speaking for: Texas Instruments Ltd.
Re: A tentative list of vtable functions
Nathan Torkington [EMAIL PROTECTED] writes: Dan Sugalski writes: It's possible, for example, for a tied/overloaded/really-darned-strange variable to look true but still be false. If you do: $foo = $bar || $baz; and both $bar and $baz are objects, the 'naive' way is to make $foo be $bar. But it's distinctly possible that $bar really should be treated as a false value and $baz be used instead. Why? Dunno. Serious hand-waving here. (And yes, I know that's a danger sign... :) But I don't see any reason to preclude the possibility. You can do that right now in perl5, by using overload.pm and supplying a 'bool' method. In practice both Damian and I have been bitten by the inability to overload ||: you can indeed pick which side is kept but you cannot make it keep both. So "deferred" action is not possible. I can make $a + $b return bless ['+',$a,$b],'OperatorNode' but you cannot get $a || $b to produce bless ['||',$a,$b],'OperatorNode' whatever you do. -- Nick Ing-Simmons
Re: RFCs for thread models
Steven W McDougall [EMAIL PROTECTED] writes: 1. All threads execute the same op tree Consider an op, like fetch(b) If you actually compile a Perl program, like $a = $b and then look at the op tree, you won't find the symbol "$b", or "b" anywhere in it. But it isn't very far away (at least for lexicals) ;-) The fetch() op does not have the name of the variable $b; rather, it holds a pointer to the value for $b. It holds an index into the scratch-pad. Subs have scratch-pads which are cloned as needed during recursion etc. If each thread is to have its own value for $b, then the fetch() op can't hold a pointer to *the* value. Each thread's view of the sub has its own scratch-pad - the value is at the same index in each. -- Nick Ing-Simmons
Re: Event model for Perl...
Grant M. [EMAIL PROTECTED] writes: I am reading various discussions regarding threads, shared objects, transaction rollbacks, etc., and was wondering if anyone here had any thoughts on instituting an event model for Perl6? I can see an event model allowing for some interesting solutions to some of the problems that are currently being discussed. Yes - Uri has started [EMAIL PROTECTED] to discuss that stuff. Grant M. -- Nick Ing-Simmons
Re: A tentative list of vtable functions
Ken Fox [EMAIL PROTECTED] writes: Short circuiting should not be customizable by each type for example. We are already having that argument^Wdiscussion elsewhere ;-) But I agree variable vtables are not the place for that. -- Nick Ing-Simmons
Re: RFC 178 (v2) Lightweight Threads
Alan Burlison [EMAIL PROTECTED] writes: Nick Ing-Simmons wrote: The tricky bit i.e. the _design_ - is to separate the op-ness from the var-ness. I assume that there is something akin to hv_fetch_ent() which takes a flag to say - by the way this is going to be stored ... I'm not entirely clear on what you mean here - is it something like this, where $a is shared and $b is unshared? $a = $a + $b; because there is a potential race condition between the initial fetch of say $a and the assignment to it? My response to this is simple - tough. That is mine too - I was trying to deduce why you thought the op tree had to change. I can make a weak case for $a += $b; expanding to a->vtable[STORE](DONE => 1) = a->vtable[FETCH](LVALUE => 1) + b->vtable[FETCH](LVALUE => 0); but that can still break easily if b turns out to be tied to something that also dorks with a. -- Nick Ing-Simmons
Re: RFC 130 (v4) Transaction-enabled variables for Perl6
Bart Lateur [EMAIL PROTECTED] writes: On Wed, 06 Sep 2000 11:23:37 -0400, Dan Sugalski wrote: Here's some high-level emulation of what it should do. eval { my($_a, $_b, $_c) = ($a, $b, $c); ... ($a, $b, $c) = ($_a, $_b, $_c); } Nope. That doesn't get you consistency. What you need is to make a local alias of $a and friends and use that. My example should have been clearer. I actually intended that $_a would be a variable of the same name as $a. It's a bit hard to write currently valid code that way. Second attempt: eval { ($a, $b, $c) = do { local($a, $b, $c) = ($a, $b, $c); #or my(...) ... # code which may fail ($a, $b, $c); }; }; So the final assignment of the local values to the outer scoped variables will happen, and in one go, only if the whole block has been executed successfully. So what is wrong with (if you mean that) saying: eval { my($_a, $_b, $_c) = ($a, $b, $c); ... lock $abc_guard; ($a, $b, $c) = ($_a, $_b, $_c); } Then no one has to guess what is going on? But what do you do if $b (say) is tied so that assigning to it needs the $abc_guard lock in another thread for the assign to complete? i.e. things get hairy in the "final assignment". I would simply block ALL other threads while the final group assignment is going on. This should finish typically in a few milliseconds. So we "only" stall the other CPUs for a few million instructions each ;-) It also means that if we're including *any* sort of external pieces (even files) in the transaction scheme we need to have some mechanism to roll back changes. If a transaction fails after truncating a 12G file and writing out 3G of data, what do we do? That does not belong in the kernel of a language. All that you may expect is transactions on simple variables; plus maybe some hooks to attach external transaction code (transactions on files etc) to it. A simple "create a new file, and rename to the old filename when done" will usually do. 
I am concerned that this is making "simple things easyish, BUT hard things impossible". i.e. we have a scheme which will be hard to explain, will only cover a few fairly uninteresting cases, and get in the way of doing it "properly". -- Nick Ing-Simmons
Re: RFC 178 (v2) Lightweight Threads
Chaim Frenkel [EMAIL PROTECTED] writes: What tied scalar? All you can contain in an aggregate is a reference to a tied scalar. The bucket in the aggregate is a regular bucket. No? A tied scalar is still a scalar and can be stored in an aggregate. Well if you want to place that restriction on perl6 so be it but in perl5 I can say tie $a[4],'Something'; Indeed that is exactly how tied arrays work - they (automatically) add 'p' magic (internal tie) to their elements. Tk apps do this all the time: $parent->Label(-textvariable => \$somehash{'Foo'}); The reference is just to get the actual element rather than a copy. Tk then ties the actual element so it can see STORE ops and update the label. -- Nick Ing-Simmons
Re: RFC 130 (v4) Transaction-enabled variables for Perl6
Dlux [EMAIL PROTECTED] writes: | I've deemed to be "too complex".) (Also note that I'm not a | database | guru, so please bear with me, and don't ask me to write the code | :-) Implementing threads must be done in a very clever way. It may be put in a shared library (mutex handling code, locking, etc.), but I think there are cleverer guys out there who are more competent in this, and I think it is covered by some other RFCs... If amazingly clever thread handling is a requirement of this RFC then it is probably doomed. Multi-processing needs detailed explicit specifications to be done right - not vague requests. I also don't like the overhead, that's why I made the "simple" mode default (look at the "use transaction" pragma again...). This means NO overhead, Not none, perhaps minimal ;-) - it has at least got to be looking at something the pragma can set. no locking between threads: this can be used in single-thread or multi-process environments. Other modes CAN switch on locking functions, but this is not the default! If you implement that intelligently (a separate .so for the thread handling), then it means minimal overhead (a few more callback calls, and that's all). I would need to understand just where the thread hooks need to go. So far my non-detailed reading suggests that the hooks are pretty fundamental. -- Nick Ing-Simmons [EMAIL PROTECTED] Via, but not speaking for: Texas Instruments Ltd.
Re: RFC 178 (v2) Lightweight Threads
Chaim Frenkel [EMAIL PROTECTED] writes: "JH" == Jarkko Hietaniemi [EMAIL PROTECTED] writes: JH Multithreaded programming is hard and for a given program the only JH person truly knowing how to keep the data consistent and threads not JH strangling each other is the programmer. Perl shouldn't try to be too JH helpful and get in the way. Just give user the bare minimum, the JH basic synchronization primitives, and plenty of advice. The problem I have with this plan, is reconciling the fact that a database update does all of this and more. And how to do it is a known problem, its been developed over and over again. Yes - by the PROGRAMMER that does the database access code - that is far higher level than typical perl code. If all your data lives in a database and you are prepared to lock the database while you get/set it, sure, we can apply that logic to making statements coherent in perl: while (1) { lock PERL_LOCK; do_statement; unlock PERL_LOCK; } So ONLY 1 thread is ever _in_ perl at a time - easy! But now _by constraint_ a threaded perl program can NEVER be a performance win. The reason this isn't a pain for databases is they have other things to do while they wait ... -- Nick Ing-Simmons [EMAIL PROTECTED] Via, but not speaking for: Texas Instruments Ltd.
Re: RFC 178 (v2) Lightweight Threads
Chaim Frenkel [EMAIL PROTECTED] writes: Some series of points (I can't remember what they are called in C) Sequence points. where operations are considered to have completed will have to be defined, between these points operations will have to be atomic. No, quite the reverse - absolutely no promises are made as to the state of anything between sequence points - BUT - the state at the sequence points is _AS IF_ the operations between them had executed in sequence. So it is not that the sub-operations _inside_ these points are atomic, but rather that the whole sequence of operations is atomic. The problem with big "atoms" is that if CPU A is doing a complex atomic operation, then CPU B has to stop working on perl and go find something else to do till it finishes. -- Nick Ing-Simmons [EMAIL PROTECTED] Via, but not speaking for: Texas Instruments Ltd.
Re: Profiling
[EMAIL PROTECTED] writes: Anyone surprised by the top few entries: Nope. It looks close to what I saw when I profiled perl 5.004 and 5.005 running over innlog.pl and cleanfeed. The only difference is the method stuff, since neither of those were OO apps. The current Perl seems to spend most of its time in the op dispatch loop and in dealing with internal data structures. What initially surprised me is why the op-despatch loop spends so long in 'self' code when there is so little of it. My assumption is this is where we see the "cache miss" time. -- Nick Ing-Simmons [EMAIL PROTECTED] Via, but not speaking for: Texas Instruments Ltd.
Re: A tentative list of vtable functions
Dan Sugalski [EMAIL PROTECTED] writes: is_equal (true if this thing is equal to the parameter thing) is_same (True if this thing is the same thing as the parameter thing) is_equal in what sense? (String, Number, ...) and how is is_same different from just comparing addresses of the things? -- Nick Ing-Simmons
Re: RFC 146 (v1) Remove socket functions from core
David L . Nicol [EMAIL PROTECTED] writes: Nick Ing-Simmons wrote: We need to distinguish "module", "overlay", "loadable", ... if we are going to get into this type of discussion. Here is my 2¢: Module - separately distributable Perl and/or C code. (e.g. Tk800.022.tar.gz) Loadable - OS loadable binary e.g. Tk.so or Tk.dll Overlay - Tightly coupled ancillary loadable which is no use without its "base" - e.g. Tk/Canvas.so which can only be used when a particular Tk.so has already been loaded. I know I've got helium Karma around here these days but I don't like "overlay" - it is reminiscent of old IBM machines swapping parts of the program out because there isn't enough core. Which is exactly why I chose it - the places these things make sense are on little machines where memory is at a premium. Linux modules have dependencies on each other and sometimes you have to load the more basic ones first or else get symbol-undefined errors. So why not follow that lead and call Overlays "dependent modules." A. The name is too long. B. That does not have the same "feel" as what we have. If a dependent module knows what it depends on, that module can be loaded on demand for the dependent one. But - like old-style overlays our add-ons are going to be loaded on need by the parent and only depend on the parent. e.g. perl discovers it needs getpwuid() so it loads the thing that has those functions. We are not going to be in the middle of getpwuid() and decide we need perl... -- Nick Ing-Simmons [EMAIL PROTECTED] Via, but not speaking for: Texas Instruments Ltd.
The evils of #define ...
Jarkko Hietaniemi [EMAIL PROTECTED] writes: On Tue, Aug 29, 2000 at 01:46:17AM -, [EMAIL PROTECTED] wrote: This is a build failure report for perl from [EMAIL PROTECTED], generated with the help of perlbug 1.32 running under perl v5.7.0. Now I tracked this one down (change #6891). The hunt mainly consisted of debugging the following charming line :-) SV *perinterp_sv = * Perl_hv_fetch(((PerlInterpreter *)pthread_getspecific((*Perl_Gthr_key_ptr(((void *)0) )) ) ) , (*Perl_Imodglobal_ptr(((PerlInterpreter *)pthread_getspecific((*Perl_Gthr_key_ptr(((void *)0) )) ) ) )) , "Storable(" "0.703" ")" , sizeof("Storable(" "0.703" ")" )-1 , (1) ) ; stcxt_t *cxt = ( stcxt_t * )(perinterp_sv && (( perinterp_sv )->sv_flags & 0x0001 )? ( stcxt_t * )(unsigned long )( ((XPVIV*) ( perinterp_sv )->sv_any )->xiv_iv ) : ((void *)0) ) ; ( cxt = ( stcxt_t *)Perl_safesysmalloc ((size_t )(( 1 )*sizeof( stcxt_t ))) , (__extension__ (__builtin_constant_p ( ( 1 )*sizeof( stcxt_t ) ) && ( ( 1 )*sizeof( stcxt_t ) ) <= 16 ? (( ( 1 )*sizeof( stcxt_t ) ) == 1 ? ({ void *__s = ( (char*)( cxt ) ); *((__uint8_t *) __s) = (__uint8_t)0 ; __s; }) : ({ void *__s = ( (char*)( cxt ) ); union { unsigned int __ui; unsigned short int __usi; unsigned char __uc; } *__u = __s; __uint8_t __c = (__uint8_t) ( 0 ); switch ((unsigned int) ( ( 1 )*sizeof( stcxt_t ) )) { case 15:__u->__ui = __c * 0x01010101; __u = __extension__ (void *)((char *) __u + 4); case 11:__u->__ui = __c * 0x01010101; __u = __extension__ (void *)((char *) __u + 4); case 7: __u->__ui = __c * 0x01010101; __u = __extension__ (void *)((char *) __u + 4); case 3: __u->__usi = (unsigned short int) __c * 0x0101; __u = __extension__ (void *)((char *) __u + 2); __u->__uc = (unsigned char) __c;break; case 14:__u->__ui = __c * 0x01010101; __u = __extension__ (void *)((char *) __u + 4); case 10:__u->__ui = __c * 0x01010101; __u = __extension__ (void *)((char *) __u + 4); case 6: __u->__ui = __c * 0x01010101; __u = __extension__ (void *)((char *) __u + 4); case 2: __u->__usi = (unsigned short int) __c * 0x0101; break; case 13:__u->__ui = __c * 0x01010101; __u = __extension__ (void *)((char *) __u + 4); case 9: __u->__ui = __c * 0x01010101; __u = __extension__ (void *)((char *) __u + 4); case 5: __u->__ui = __c * 0x01010101; __u = __extension__ (void *)((char *) __u + 4); case 1: __u->__uc = (unsigned char) __c;break; case 16:__u->__ui = __c * 0x01010101; __u = __extension__ (void *)((char *) __u + 4); case 12: __u->__ui = __c * 0x01010101; __u = __extension__ (void *)((char *) __u + 4); case 8: __u->__ui = __c * 0x01010101; __u = __extension__ (void *)((char *) __u + 4); case 4: __u->__ui = __c * 0x01010101; case 0: break; } __s; }) ) : (__builtin_constant_p ( 0 ) && ( 0 ) == '\0' ? ({ void *__s = ( (char*)( cxt ) ); __builtin_memset ( __s , '\0', ( 1 )*sizeof( stcxt_t ) ) ; __s; }) : memset ( (char*)( cxt ) , 0 , ( 1 )*sizeof( stcxt_t ) ) ) ) ) ) ; Perl_sv_setiv(((PerlInterpreter *)pthread_getspecific((*Perl_Gthr_key_ptr(((void *)0) )) ) ) , perinterp_sv , ( IV )(unsigned long )( cxt ) ) ; -- Nick Ing-Simmons
Re: RFC 155 - Remove geometric functions from core
David L . Nicol [EMAIL PROTECTED] writes: does sysV shm not support the equivalent security as the file system? mmap() has the file system. Did I not just describe how a .so or a DLL works currently? And behind the scenes that does something akin to: int fd = open("file_of_posn_independent_byte_code",O_RDONLY); struct stat st; fstat(fd,&st); code_t *code = mmap(NULL,st.st_size,PROT_READ,MAP_SHARED,fd,0); close(fd); strace (linux) or truss (solaris) will show you what I mean. And then trusts the OS to honour MAP_SHARED. (mmap() is POSIX.) Win32 has "something similar" but I don't remember the function names off hand. Or you can embed your bytecode in const char script[] = {...}; and link/dlopen() it and then you have classical shared text. -- Nick Ing-Simmons
Re: RFC 155 - Remove geometric functions from core
Sam Tregar [EMAIL PROTECTED] writes: On Tue, 29 Aug 2000, Nick Ing-Simmons wrote: David L . Nicol [EMAIL PROTECTED] writes: does sysV shm not support the equivalent security as the file system? mmap() has the file system. I wasn't aware that mmap() was part of SysV shared memory. It is NOT. It is another (POSIX) way of getting shared memory between processes. Even without MAP_SHARED the OS will share unmodified pages between processes. It happens to be the way modern UNIX implements "shared .text". i.e. the ".text" part of the object file is mmap()'ed into each process. My mistake? It's not on the SysV IPC man pages on my Linux system. The mmap manpage doesn't mention SysV IPC either. SysV IPC is a mess IMHO. My point was that if the "file system" is considered sufficient then mmap()ing file system objects will get you "shared code" or "shared data" without any tedious reinventing of wheels. -- Nick Ing-Simmons
Re: RFC 161 (v2) OO Integration/Migration Path
Nathan Torkington [EMAIL PROTECTED] writes: Dan Sugalski writes: If the vtable stuff goes into the core perl engine (and it probably will, barring performance issues), then what could happen in the I have a lot of questions. Please point me to the appropriate place if they are answered elsewhere. vtables are tables of C functions? I am using them as tables of machine-code functions (compiled from C being the obvious but not the only way to create those). Perl functions? Not directly. But given a "C" API it is normally easy enough to wrap the perl function (e.g. the FETCH/GET tie methods layered under "magic" in perl5). Either? How would you use them to handle overloading of operators? One function in the vtable for every operation? If the table is with the data, yes; if the table is with the code, one function for every type. How does that extend to user-defined operators? Badly. But it makes user-defined implementations of existing operators easy. Nat -- Nick Ing-Simmons
RE: RFC 146 (v1) Remove socket functions from core
Fisher Mark [EMAIL PROTECTED] writes: Leaping to conclusions based on no tests at all is even worse... Will anyone bite the bullet and write the "Internals Decisions should be based on actual tests on multiple platforms" RFC ? BTW, I have access to Rational Software's Quantify (and PureCoverage and Purify) on WinNT and HP-UX 10.20 which I'd be glad to use for such tests. If you want to get "in the mood" it would be good to fire it up on (say) perl5.6.0 and see where the hot-spots are. === Mark Leighton Fisher [EMAIL PROTECTED] Thomson Consumer Electronics, Indianapolis IN "Display some adaptability." -- Doug Shaftoe, _Cryptonomicon_ -- Nick Ing-Simmons
Re: RFC 155 - Remove geometric functions from core
Jarkko Hietaniemi [EMAIL PROTECTED] writes:

  microperl, which has almost nothing OS dependent (*) in it:  1212416 bytes
  shared libperl (1277952 bytes) + perl (32768 bytes):         1310720 bytes
  dynamically linked perl:                                     1376256 bytes
  statically linked perl with all the core extensions:         2129920 bytes

(*) I haven't tried building it in non-UNIX boxes, so I can't be certain of how fastidiously features have been disabled. "bytes" of what? - size of executable, size of .text, ??? If we are talking executable size with -g then a lot of that is symbol table and is tedious repetition of "sv.h" and co. re-iterated in each .o file. But the basic point is that these things are small. So ripping all this 'cruft' would save us about 100-160 kB, still leaving us with well over a 1MB-plus executable. It's Perl itself that's big, not the thin glue to the system functions. My support for the idea is not to reduce the size of perl in the UNIX case, but to allow replacement. I would also like to have the mechanism worked out and "proven" on something that we know gets used so that we can have good solid testing of the mechanism. Then something less obvious (say Damian's any/all operators) which might be major extra size and not of universal appeal can use a well-tried mechanism, and we can flip the default to re-link sockets or sin/cos/tan into the core. -- Nick Ing-Simmons
RE: RFC 146 (v1) Remove socket functions from core
Al Lipscomb [EMAIL PROTECTED] writes: I wonder if you could arrange things so that you could have statically linked and dynamically linked executables. Kind of like what they do with the Linux kernel. When your installation is configured in such a way as to make dynamic linking a problem, just compile a version that has (almost) everything bolted in. Otherwise compile the features as modules. If we make it possible to move socket or math functions out of the executable into "overlays" then there will always be an option NOT to do that and build one executable - (and that will probably be the default!). We need to distinguish "module", "overlay", "loadable", ... if we are going to get into this type of discussion. Here is my 2¢: Module - separately distributable Perl and/or C code. (e.g. Tk800.022.tar.gz) Loadable - OS loadable binary e.g. Tk.so or Tk.dll Overlay - Tightly coupled ancillary loadable which is no use without its "base" - e.g. Tk/Canvas.so which can only be used when a particular Tk.so has already been loaded. Tk has these "overlays" - I think DBI has something similar. perl5 itself does not as such (although POSIX.so is close). _I_ would like to see RFC 146 mutate into or be replaced by an RFC which said perl should have a mechanism to allow parts of its functionality to be split out into separate binary (sharable) files. -- Nick Ing-Simmons
Re: RFC 146 (v1) Remove socket functions from core
Michael G Schwern [EMAIL PROTECTED] writes: Like all other optimizing attempts, the first step is analysis. People have to sit down and systematically go through and find out what parts of perl (and Perl) are eating up space and speed. The results will be very surprising, I'm sure, but it will give us a concrete idea of what we can do to really help out perl's performance. There should probably be an RFC to this effect, and I'm just visiting here in perl6-language so I dump it on somebody else. Alan Burlison [EMAIL PROTECTED] writes: Drawing conclusions based on a single test can be misleading. Leaping to conclusions based on no tests at all is even worse... Will anyone bite the bullet and write the "Internals Decisions should be based on actual tests on multiple platforms" RFC ? -- Nick Ing-Simmons
Re: RFC 155 (v1) Remove geometric functions from core
Chaim Frenkel [EMAIL PROTECTED] writes: I don't think that you should require a use. That is too violent a change. Moving things that were in the core of Perl5 out should be invisible to the user. I strenuously object to having to add a use for every stupid module. Don't worry - so do Dan and I at least. Anything that is part of the shipped perl should not need a use. That is the "definition" of the "shipped perl" ;-) Of course we need a new name (not perl or Perl) for the "bundle of a perl and some handy Modules" which will be perl-6.0.0.tar.gz The entire set of constants and namespace should be immediately available. The only possible use for a use for core functions would be to pass options or perhaps to select a non-default version. Yes - use math 'vector-processor'; use socket 'IPv7'; use getpw qw(paranoid); Modules that are from CPAN or local should be able to be promoted to autoloadable by some simple mechanism. Once we have a fast easy-to-use way of loading adjuncts for math/socket/gpw*/Registry/... and we can "byte compile" or whatever a module, there should be no reason why not. -- Nick Ing-Simmons
Re: RFC 146 (v1) Remove socket functions from core
Bart Lateur [EMAIL PROTECTED] writes: On Fri, 25 Aug 2000 12:19:24 -0400, Dan Sugalski wrote: Code you don't call won't eat up any cache space, nor crowd out some other code. And if you do call it, well, it ought to be in the cache. Probably a stupid question... But can't you group the code for the most often used constructs? We can - and we will once we know what the "often used constructs" will be in perl6. Larry started well with pp_hot.c in perl5 - but over the years the bells and whistles have got tacked on where it seemed easiest and now perl5 needs a re-write to clean it up - perl6 will be that thing but while perl5 runs a language called Perl5, perl6 (being defined here) will run a language called Perl6 - being defined on [EMAIL PROTECTED] So that, if one of those things is loaded in the cache, the others are in there with it? That is the first approximation to what happens - but it is a start... If all the less needed stuff is more at the back of the executable, it wouldn't even have to be loaded, most of the time. Besides, I'm more worried about unnecessarily loading 600k from disk, than from main memory to cache. For short-lived scripts, this loading overhead could be quite significant. Most modern (and sane) OSes will keep "useful" pages in memory till they need them for something else. This would be _the_ win for true byte-compiled (not modified at runtime) scripts/modules - those pages would not be re-loaded either. -- Nick Ing-Simmons
Re: RFC 127 (v1) Sane resolution to large function returns
Dan Sugalski [EMAIL PROTECTED] writes: At 02:25 PM 8/24/00 -0400, Chaim Frenkel wrote: But ($foo, $baz, @bar) = (1,(2,3),4) # $foo = 1 $baz=2, @bar=(3,4) Actually, looking at it like that makes it an ugly situation. The 'new' expectation would be to have it become # $foo=1 $baz=2 @bar=(4) Wouldn't that be $baz = 3, since the middle list would be taken in scalar context? Which has sanely become the length of the list rather than the last element. -- Nick Ing-Simmons
Re: Vtable speed worry
David L . Nicol [EMAIL PROTECTED] writes: No, because each table lookup takes less time than comparing one letter of a text string. Er, I don't think so. A lookup takes several cycles on a RISC machine due to memory latency even to the cache. A pipelined string compare takes less than a cycle per char. Also what has comparing got to do with SvPVX ? sv->vtable->svpvx; Isn't this going to really, really hurt? -- Nick Ing-Simmons
Re: Design by Contract for perl internals
Michael G Schwern [EMAIL PROTECTED] writes: I wouldn't mind an optional OO contract system in the core of Perl, but this may be a case of "why do it in core when a module will work?" I _think_ the proposal was to have design-by-contract in the perl core in the sense that the contract is checked when one part of the core calls another. e.g. that when someone does char *s = SvPV(sv); something checks that 'sv' is an SV * and not an HV * or whatever. Obviously this is a perl-core-compile-time option and would NOT be on for production perl as it is SLOW. Sort of automatically and liberally inserted assert() statements. -- Nick Ing-Simmons
Re: Threaded In-Line Code (was Re: Typed Intermediate Language)
Chaim Frenkel [EMAIL PROTECTED] writes: "DS" == Dan Sugalski [EMAIL PROTECTED] writes: DS I was actually thinking that @b * @c would boil down to a single vtable DS call--we'd just hit the multiply function for variable @b, and pass it a DS pointer to @c, and let it Do The Right Thing. But that was my question in the _other_ thread. How? Given N different fundamental types, we end up with NxN vtbl entries. Which is not necessarily a problem if N is small (a 4x4 vtable is easy). -- Nick Ing-Simmons [EMAIL PROTECTED] Via, but not speaking for: Texas Instruments Ltd.
Re: Internal Filename Representations (was Re: Summary of I/O related RFCs)
Jarkko Hietaniemi [EMAIL PROTECTED] writes: On Fri, Aug 11, 2000 at 02:16:31AM -0700, Nathan Wiger wrote: [cc'ed on internals as FYI] =item 36 (v1): Structured Internal Representation of Filenames I think this should be discussed a good amount. I think URIs are cool, but too much trouble for simple stuff. I don't want to have to write "file:///etc/motd" everytime I want to address a file. Too cumbersome. URI's have thought of that already - you can have a "relative URI" with a parent specified. We would have a default parent of file://localhost/$PWD The (vague) idea wasn't that "everything shall be an URI". It was the other way round: "the representation should be generic enough so that also URIs could be handled". In other words: things like the protocol, the port number, the username, the password, could be part of a "file spec". Quite. -- Nick Ing-Simmons
Re: Internal Filename Representations (was Re: Summary of I/O related RFCs)
Johan Vromans [EMAIL PROTECTED] writes: Nathan Wiger [EMAIL PROTECTED] writes: $fo = open "C:\Windows\System\IOSUBSYS\RMM.PDR"; $fo->pathdrive = "C:"; I think the drive is "C", not "C:". The reason for including the ':' is so that the rule for reconstructing the path is easy and we don't need another slot for a 'drive separator'. $fo->patharray = [ Windows, System, IOSUBSYS, RMM.PDR ]; I think the patharray is [ Windows, System, IOSUBSYS ]. The file name is RMM, the extension is PDR. $fo = open "/etc/inet/inetd.conf"; $fo->pathdrive = ""; I think this should be the mount point, e.g., "/". Splitting apart or putting together either one of these paths is trivial I think it's far from trivial, especially if you want to take into account network names, file versions, protection attributes and ACLs, ... -- Johan -- Nick Ing-Simmons
Re: Typed Intermediate Language
David L . Nicol [EMAIL PROTECTED] writes: Just in case I'm not the only one here who doesn't know what TIL means: http://www.cs.cornell.edu/home/jgm/tilt.html Well I have been using 'TIL' to mean "Threaded Interpretive Language" There is a Z80 FORTH clone defined in: "Threaded Interpretive Languages", R. G. Loeliger, Byte Books / McGraw-Hill, 1981, ISBN 0-07-038360-X -- Nick Ing-Simmons [EMAIL PROTECTED] Via, but not speaking for: Texas Instruments Ltd.
Re: RFC 35 (v1) A proposed internal base format for perl
Larry Wall [EMAIL PROTECTED] writes: Nick Ing-Simmons writes: : It's not clear to me whether the intrinsic types should have a different : solution to this than the extrinsic types. : : _This_ thread is about using vtables for intrinsic types. If we cannot : make them work there then the proposed innermost SV * replacement is flawed. Sure, but we may have to warp our ideas of what a vtable is to encompass the notion of a vtable that is the cross-product of two vtables. That wouldn't be a 'vector' table but a 'matrix' table ! only 1/2 ;-) Larry -- Nick Ing-Simmons
Re: vector and matrix calculations in core? (was: Re: Ramblings on base class for SV etc.)
Bart Lateur [EMAIL PROTECTED] writes: On Wed, 09 Aug 2000 12:46:32 -0400, Dan Sugalski wrote: @foo = @bar * @baz; Given that the default action of the multiply routine for an array in non-scalar context would be to die, allowing user-overrides of the functions would probably be a good idea... :) [Is this still -internals? Or should we stop CC'ing?] One problem: overloading requires objects, or at least one. Objects are (currently) scalars. You can't make an array into an object. We are thinking of adding "objects" in the implementation of perl. i.e. perl's primitive "things" (scalars, arrays, hashes) will have 'vtables' (table of functions that do the work). So in that sense an array as in @foo can be an "object" at some level of meaning while not being an "object" at the perl level. -- Nick Ing-Simmons
Re: Method call optimization.
David L . Nicol [EMAIL PROTECTED] writes: One assumes that if you redefine (@ISA) perl5 throws away this cache? Not all at once. It increments a "generation number". When perl finds it is about to use a cached method it checks to see if the cached value post-dates the current generation number, or does the re-lookup. If D isa C isa B and D looked up method f and found it in B's methods, then C redefines itself as an A, does perl5 figure out to throw away D->f ? Sub definition increments the generation number too. I mean, who redefines ISA at run time? A. Almost all perl code prior to the invention of 'use base' in perl5.00404. (Had to do @ISA = ... as part of the run phase.) B. Any module loaded after another is compiled sets _its_ ISA after the base class has been compiled. And how did Ing-Simmons get on the reply-to-all CC list twice? Posted from work and from home. -- Nick Ing-Simmons
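[Editor's sketch of the generation-number scheme Nick describes: any @ISA assignment or sub definition bumps a global counter, and a cached method entry is trusted only if it was stored under the current generation. Names are illustrative, not perl5's actual internals:]

```c
#include <stddef.h>

static unsigned long pl_generation = 1;

struct meth_cache_entry {
    void         *code;   /* cached method body (CV) */
    unsigned long gen;    /* generation number when cached */
};

/* Called on @ISA assignment *and* on sub definition. */
static void invalidate_caches(void) { pl_generation++; }

/* Returns the cached code, or NULL meaning "do the full re-lookup". */
static void *cache_lookup(struct meth_cache_entry *e)
{
    return (e->gen == pl_generation) ? e->code : NULL;
}

static void cache_store(struct meth_cache_entry *e, void *code)
{
    e->code = code;
    e->gen  = pl_generation;
}
```

So D->f cached from B survives until *anything* changes the class hierarchy, at which point every stale entry fails its generation check lazily, with no eager cache walk.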
Re: Method call optimization.
Dan Sugalski [EMAIL PROTECTED] writes: At 03:35 PM 8/9/00 -0700, Damien Neil wrote: On Wed, Aug 09, 2000 at 03:32:41PM -0400, Chaim Frenkel wrote: Each sub is assigned an index. This index is unique for the package the sub is in, and all ancestor packages. Add all sibling packages of all the packages involved ;-) If we are not careful we can end up making the compile NP-complete. We just had all the numbers nicely sorted and then someone reads in: package Foo; use base qw(Meth_is_1 Other_is_1); sub Meth ... sub Other ... And now we have to recompute the whole tree so that Meth and Other don't share the index. The first runtime reassignment of @ISA shoots this one down hard. Sorry. (MI also makes it more difficult, since dependency trees will have to be built...) Yes - this is why Malcolm dodged MI with the 'fields' module. -- Nick Ing-Simmons
Re: Language RFC Summary 4th August 2000
Dan Sugalski [EMAIL PROTECTED] writes: At 11:40 AM 8/5/00 +, Nick Ing-Simmons wrote: Damian Conway [EMAIL PROTECTED] writes: It definitely is, since formats do things that can't be done in modules. Such as??? Quite. Even in perl5 an XS module can do _anything at all_. It can't access data the lexer's already tossed out. A source filter can, but not elegantly. That's where the current format format (so to speak) runs you into trouble. Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk -- Nick Ing-Simmons
Re: RFC 61 (v2) Interfaces for linking C objects into pe
Perl6 Rfc Librarian [EMAIL PROTECTED] writes: This and other RFCs are available on the web at http://dev.perl.org/rfc/ =head1 TITLE Interfaces for linking C objects into perlsubs =head1 VERSION Maintainer: David Nicol [EMAIL PROTECTED] Date: 7 Aug 2000 Version: 2 Mailing List: [EMAIL PROTECTED] Number: 61 As this is all about what the interface looks like and has no details of implementation it is not really appropriate (IMHO) for the internals list yet. This document is not precisely concerned with the details of the implementation of the interfaces it specifies, beyond a general attempt to restrict itself to the possible. But this list is _only_ concerned with implementation details. -- Nick Ing-Simmons
Re: pramgas as compile-time-only
Chaim Frenkel [EMAIL PROTECTED] writes: "GB" == Graham Barr [EMAIL PROTECTED] writes: A different op would be a better performance win. Even those sections that didn't want the check have to pay for it. GB That may not be completely true. You would in effect be increasing the GB size of code for perl itself. Whether or not it would be a win would GB depend on how many times the extra code caused a cache miss and a fetch GB from main memory. GB As Chip says, human intuition is a very bad benchmark. Does the cache hit/miss depend on the nearness of the code To some extent. or simply on code path? That can have an effect too. Not just caches but pre-fetch and branch prediction mess here as well. Obviously having the checked version be a wrapper of the base op and near it on the same page would be a VM win. Caches work well with small-ish linear-ish hotspots that keep being re-used. When the access pattern does not follow that pattern things get (gradually) worse. How gradual and how -ish depends on cache architecture which is fun, often proprietary and off-topic ;-) - I can write a quick "tutorial" e-mail if there is general interest (and I must have a bibliography somewhere at work). -- Nick Ing-Simmons
Re: RFC 35 (v1) A proposed internal base format for perl
Ken Fox [EMAIL PROTECTED] writes: When we document this, can we move the low level interfaces out of the pod directory? It would be a shame to have people accidentally start using the internal interfaces just because they're well documented. ;) If they are well documented then the risks they will be taking will be obvious. (And if we say something like "this is fast" people will ignore all the warnings.) As one of the worst offenders I certainly will ;-) - Ken -- Nick Ing-Simmons
Re: Ramblings on base class for SV etc.
Ken Fox [EMAIL PROTECTED] writes: This got me thinking about whether it's necessary to define exactly what an SV struct is. The following seems over-specified: Dan's struct that includes thread sync stuff is also over-specified. I think the only thing we have to standardize on is the vtable interface and the flags. This seems like a good thing, at least during early experimentation with perl 6. True. I think just the vtable and flags is the minimal "interface" - the rest of the stuff is just data that access functions mess with (even the thread sync stuff). Nonetheless - it makes sense to have a "straw man" of how the essential types will be implemented. We could wrap the basic operations on an SV with inlines so that the abstraction won't kill performance. The entire low-level definition of SV could be done in a header file. When building perl just pick what header you want. This would have no effect on external modules since they all go through the public interface. BTW, SV isn't a good name for this struct. Agreed. It's really a value binding, not the value itself. Nor is this "base" just for scalars - so apart from the fact it isn't a scalar and isn't a value, "scalar value" is ideal :-( IMHO it would be a lot easier to read the code if we clearly differentiated between what's now called SV and the collection of xpv*'s. inline IV SvIV(SV *sv) { return (*sv->vtable.SvIV)(sv); } With the simplest case being IV nativeIV(SV *sv) { return sv->data.words.iv; } What component of the system is responsible for representation shifting? The vtable functions. It might be a really clean design for us to always shift representations so that the current vtable always points to the correct semantics. (Look, ma, no flags required!) We need to see what the flags turn out to be. Perl5 has things like IOK, ROK, NOK, POK, UTF8, ... 
While those are not always necessary with the vtable stuff, they tend to get used by Perl in DWIM mode - it looks at flags to see if thing is "naturally" a string or a number. -- Nick Ing-Simmons
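[Editor's sketch of the representation-shifting idea: instead of IOK/POK flags, the vtable itself records what the value currently is, and an assignment that changes the representation swaps the vtable. All names here (`sv_vtable`, `iv_vtbl`, `pv_vtbl`) are hypothetical:]

```c
#include <stdlib.h>
#include <string.h>

typedef struct sv sv_t;

struct sv_vtable {
    long (*get_iv)(sv_t *);
    void (*set_pv)(sv_t *, const char *);
};

struct sv {
    const struct sv_vtable *vtbl;  /* encodes "what this is" - no flags */
    long  iv;
    char *pv;
};

static const struct sv_vtable iv_vtbl, pv_vtbl;

static long iv_get_iv(sv_t *sv) { return sv->iv; }
static long pv_get_iv(sv_t *sv) { return strtol(sv->pv, NULL, 10); }

/* Assigning a string shifts the representation: swap the vtable so
 * get_iv now goes through the string path. */
static void any_set_pv(sv_t *sv, const char *s)
{
    free(sv->pv);
    sv->pv = malloc(strlen(s) + 1);
    strcpy(sv->pv, s);
    sv->vtbl = &pv_vtbl;
}

static const struct sv_vtable iv_vtbl = { iv_get_iv, any_set_pv };
static const struct sv_vtable pv_vtbl = { pv_get_iv, any_set_pv };
```

As Nick notes above, perl's DWIM behaviour still wants to ask "is this naturally a string or a number?", which is why flags may survive alongside the vtable.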
Re: RFC 35 / Re: perl6-internals-gc sublist
Chaim Frenkel [EMAIL PROTECTED] writes: And why carry around IV/NV and ptr? Make it into a union, to allow room. _my_ original "ramblings" posting did. I kept the triple to pad the thing to 8 words. Partly as devil's advocate against the "squeeze it all into one word" camp ;-) Essentially what I am proposing is removing the sv_any indirection for the simple scalar cases this reduces housekeeping, indirections and keeps actual data near "SV/PMC" for cache reasons. The string/number duality should be handled by the vtbl. So a string that is never accessed as a number, doesn't waste the space. And numbers that are rarely accessed as a string save some room. And as needed the vtbl can be promoted to a duality version that maintains both. All true - but we can minimize the malloc-ing that such conversions entail if we have a "reasonable" amount of (multi-purpose) storage in the root. -- Nick Ing-Simmons
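[Editor's sketch of the two layouts being argued over: Chaim's union, where the vtable decides which member is live, versus Nick's padded triple, where promotion to string/number "duality" never needs a malloc. Both are sketches, not proposals:]

```c
typedef long   IV;
typedef double NV;

/* Chaim's version: one slot, the vtable says which member is live.
 * A string that is never used as a number wastes no space. */
union sv_body_union {
    IV    iv;
    NV    nv;
    void *ptr;
};

/* Nick's version: keep all three in the root, padded, so converting
 * between representations entails no extra allocation and the data
 * stays near the SV/PMC for cache reasons. */
struct sv_body_triple {
    void *ptr;   /* SvPV / SvRV */
    IV    iv;    /* SvIV */
    NV    nv;    /* SvNV */
};
```

The trade is exactly as stated in the post: the union is smaller per value, the triple avoids malloc on promotion.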
Re: Ramblings on base class for SV etc.
Dan Sugalski [EMAIL PROTECTED] writes: The rest is also there to optimize the common case. (Though I do think it's overkill in many circumstances if all variables share the same base structure--arrays don't really need an integer portion, neither do hashes) So you re-use the space for AvLEN or whatever is "hot" for arrays and hashes. That's not a bad thing and, like I said, the big win's not in optimizing scalars, it's in optimizing hashes and arrays. Skimping here's likely not worth it in the long run. -- Nick Ing-Simmons
Re: C--
John Tobey [EMAIL PROTECTED] writes: Joshua N Pritikin [EMAIL PROTECTED] wrote: A few more clicks and I found: http://www.cminusminus.org/ Thanks, Joshua. Quickie summary. Implementations: one[1] semi-free (non-DFSG-compliant) complete. Others in progress. Why not specify as a C extension: I'm still looking for that. Could one do a GCC front end for C-- ? -- Nick Ing-Simmons
Re: RFC: Foreign objects in perl
Benjamin Stuhl [EMAIL PROTECTED] writes: --- Dan Sugalski [EMAIL PROTECTED] wrote: actual work. The dispatch routine has a function signature like so: int status = dispatch(void *native_obj, sv *perl_scalar, char *method_called, int *num_args_in, perl_arg_stack *arg_stack, int *num_args_out, perl_arg_stack *return_values); One thing: remember, there is a lot of talk about having perl6 use Unicode internally, which means that things like method names should be wchar_t * (or whatever). I doubt that - I guess names will be UNICODE but will be encoded in UTF8 rather than as wide chars. -- Nick Ing-Simmons
Re: RFC 35 (v1) A proposed internal base format for perl
Perl6 Rfc Librarian [EMAIL PROTECTED] writes: This is similar to the structure used in perl 5, with one major difference. Rather than having all the intelligence needed to use a variable separate from that variable, this RFC embeds that information into the variable itself. This allows for more efficient code to access the variables, and it lets us add in variable types on the fly. This way perl doesn't, for example, have to know how to access an individual element of an array of integers--it just asks the array to return it a particular element. Code MUST use the vtable functions to get or set values from variables. They MUST NOT directly access the data. This base structure should be considered immobile, so it's safe to maintain pointers to it. The data portion of a variable should be considered moveable, and may be shuffled around if a variable changes its type, or the garbage collector needs to compact the heap. Implementation on various types (arrays, hashes, scalars) as well as sub-types (integer scalars, string scalars, objects) is left to another RFC. All good so far. =head1 IMPLEMENTATION The base variable structure looks like: struct { IV GC_data; void *variable_data; IV flags; void *vtable; void *sync_data; } The fields, in order, are: =over 4 =item variable_data Equivalent to perl5's sv_any pointer, this is a pointer to the actual data structure for the variable. It may, in certain cases, be co-opted to hold the actual value. (This is likely the case for a scalar that holds just an integer, where the native int size is equal to or smaller than the native pointer size) I think we should allow more than just a pointer, and that really simple variables (IV, NV) should be able to use _just_ the structure above without auxiliary malloc'ed data. 
The actual structure that hangs off will depend both on the class of variable (scalar, hash, array) and the type of that class (integer array, integer scalar, filehandle, reference) and isn't specified here =item flags This field holds various flags that hold the status of the variable. (Flags to be RFC'd later) =item vtable The vtable field holds a pointer to the vtable for a variable. Each variable type has its own vtable, holding pointers to functions for the variable. Vtables are shared between variables of the same type. (All integer arrays have the same vtable, as do all string scalars and so on) vtable contents will be RFCd separately. All variables will share a common set of functions, though scalars, arrays, and hashes will have their own set of extensions on top of that. The vtable should be non-opaque to the perl-core. =head1 IMPACT ON EMBEDDING None. Generally embedding apps won't deal with actual perl data =head1 IMPACT ON EXTENSIONS None. Extensions get pointers to this structure, which as far as they know is a magic cookie. (In fact the official perl term for the thing handed to extensions is a Perl Magic Cookie, or PMC) Knowledge of the internals is a no-no at this level. Moot - I think there are two classes of "extension": A. As above that treat this stuff as opaque. B. External "ops" - which will assume same as ops below - i.e. they can call via vtable etc. =head1 IMPACT ON OP FUNCTIONS Op functions have intimate knowledge of the internals and unrestricted access. Therefore they're assumed to know what they're doing, and will therefore heed the info in this RFC. =head1 REFERENCES sv.h -- Nick Ing-Simmons
Re: RFC 35 / Re: perl6-internals-gc sublist
John Tobey [EMAIL PROTECTED] writes: Nick Ing-Simmons [EMAIL PROTECTED] wrote: John Tobey [EMAIL PROTECTED] writes: Dan Sugalski [EMAIL PROTECTED] wrote: Yup, and I realized one of my big problems to GCs that move memory (references that are pointers and such) really isn't, if we keep the two-level variable structure that we have now. The 'main' SV structure won't move, while the guts that the equivalent of sv_any points to can without a problem. I certainly hope this data layout factoid is still subject to change. Having an SV have a fixed address is handy for C extensions. The 'entity' has got to have some 'handle' that defines its existence. If not the SV* data structure then what is it that defines the thing? Just like in Perl, if you want a reference to it, put it somewhere and use a pointer. Otherwise, use it by value. (three words, or two if flags are dropped, or four if vptr is added, or one if everything is crammed into a pointer with low bits commandeered). So what exactly is your point here ? We currently (perl5) have the "token" being: struct sv { void* sv_any; /* pointer to something */ U32 sv_refcnt; /* how many references to us */ U32 sv_flags; /* what we are */ }; 3 words (which is a "funny" number in a binary world). (With the type hiding in flags as a small int.) Current perl6 "token" proposed by RFC 35 is: struct { IV GC_data; // REFCNT void *variable_data; // sv_any, possibly used for IV, RV IV flags; // sv_flags vtable_t *vtable; // _new_ - explicit type void *sync_data; // _new_ - for threads etc. } 5 words (which is if anything a slightly worse number...) I think I would make it 4 or 8 words: struct { vtable_t *vtable; // _new_ - explicit type IV flags_and_GC; // sv_flags void *sync_data; // _new_ - for threads etc. void *variable_data; // sv_any, possibly used for IV, RV } That squeezes flags and GC into a word - no big deal for the 'mark' bit, but if we have a REFCNT it had better be 8 or 16 bits or update will be too many cycles. 
So my own favourite right now allows a little more - to keep data and token together for cache, and to avoid extra malloc() in simple cases. Some variant like: struct { vtable_t *vtable; // _new_ - explicit type IV flags; // sv_flags void *sync_data; // _new_ - for threads etc. IV GC_data; // REFCNT void *ptr; // SvPV, SvRV IV iv; // SvIV, SvCUR/SvOFF NV nv; // SvNV } The other extreme might be just a pointer with the LS 3 bits snaffled for "mark" and two "vital" flags - but that just means the above lives via the pointer and everything is one more de-ref away _AND_ needs masking except for the "all flags zero" case (which had better be the common one). As I recall, my LISP had two pointers + flags -- Nick Ing-Simmons
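[Editor's rendering of the layouts above as compilable structs, so the word-count arithmetic can be checked. `sv5` is perl5's three-word token; `sv6_fat` is Nick's "keep data in the token" variant. Type names are illustrative:]

```c
typedef long   IV;
typedef double NV;
typedef struct vtable vtable_t;   /* opaque here */

struct sv5 {                      /* perl5's three-word token */
    void        *sv_any;          /* pointer to something */
    unsigned int sv_refcnt;       /* how many references to us */
    unsigned int sv_flags;        /* what we are */
};

struct sv6_fat {                  /* data kept in the token for cache */
    vtable_t *vtable;             /* explicit type */
    IV        flags;              /* sv_flags */
    void     *sync_data;          /* for threads etc. */
    IV        GC_data;            /* REFCNT */
    void     *ptr;                /* SvPV, SvRV */
    IV        iv;                 /* SvIV, SvCUR/SvOFF */
    NV        nv;                 /* SvNV */
};
```

The fat token trades per-value size for fewer mallocs and better locality; exact word counts depend on platform alignment, but it is always larger than the perl5 token.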
Re: Language RFC Summary 4th August 2000
Damian Conway [EMAIL PROTECTED] writes: It definitely is, since formats do things that can't be done in modules. Such as??? Quite. Even in perl5 an XS module can do _anything at all_. -- Nick Ing-Simmons
Re: RFC 27 (v1) Coroutines for Perl
Dan Sugalski [EMAIL PROTECTED] writes: At 01:17 PM 8/4/00 +0500, Tom Scola wrote: [I think this belongs on the language list, FWIW, Cc'd there] I like this, but I'd like to see this, inter-thread queues, and events all use the same communication method. Overload filehandles to pass events around instead, so: I'm proposing that events and threads be dropped in lieu of coroutines. Not gonna happen. Tk and signals, at the very least, will see to that. As far as I am aware any multi-processing problem can be reduced to message passing and these "coroutines as IO" are just one stab at that. For example, the occurrence of a signal could just "print" down the handler "pipe". Likewise a mouse click could just "print" down the Tk-ish ButtonPress-1 pipe. It is the "return path" that bothers me - and of course the thread behind the coroutine still has locking issues if it updates "global" state. -- Nick Ing-Simmons
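[Editor's note: "a signal could just 'print' down the handler pipe" is essentially the classic self-pipe trick. A minimal POSIX sketch, error handling omitted:]

```c
#include <signal.h>
#include <unistd.h>

static int sig_pipe[2];   /* created once with pipe() at startup */

/* The handler does the only async-signal-safe thing: a one-byte
 * "print" down the pipe identifying which signal fired. */
static void on_signal(int signo)
{
    unsigned char b = (unsigned char)signo;
    (void)write(sig_pipe[1], &b, 1);
}
```

The main loop then select()s on sig_pipe[0] alongside ordinary filehandles, which is exactly the unification Tom proposes; the unresolved "return path" and shared-state locking concerns Nick raises are untouched by this trick.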
Re: inline mania
Dan Sugalski [EMAIL PROTECTED] writes: At 05:39 PM 8/2/00 +0100, Tim Bunce wrote: On Wed, Aug 02, 2000 at 12:05:20PM -0400, Dan Sugalski wrote: Reference counting is going to be a fun one, that's for sure. I'd like the interface to be something like: stat = perl_get_value(sv *, int what, destination) And what type is perl_get_value declared as returning? An integer--it is a status value after all... Are we sure the value should not be returned, with the status as the extra arg? It is neater to be able to say int err; int circ = perl_get_value(radius_sv,PL_INTEGER,&err)*2*M_PI; rather than: int radius; int err = perl_get_value(radius_sv,PL_INTEGER,&radius); int circ = radius*2*M_PI; Remember the compiler cannot put anything which has its address taken in a register - so if the value is likely to be used in an expression it is better to avoid forcing it to the stack. -- Nick Ing-Simmons [EMAIL PROTECTED] Via, but not speaking for: Texas Instruments Ltd.
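[Editor's sketch of the two calling conventions side by side, with stub bodies so the shapes can be compared. The `_s`/`_v` suffixes and the stub `SV` are illustrative, not proposed API:]

```c
typedef struct sv SV;
struct sv { long iv; };
enum { PL_INTEGER };

/* Style 1 (Dan): status returned, value through a pointer.
 * The destination has its address taken, so it cannot live in a
 * register across the call. */
static int perl_get_value_s(SV *sv, int what, long *dest)
{
    (void)what;
    *dest = sv->iv;
    return 0;
}

/* Style 2 (Nick): value returned, status through a pointer.
 * The value itself never has its address taken and can stay in a
 * register when used directly in an expression. */
static long perl_get_value_v(SV *sv, int what, int *err)
{
    (void)what;
    *err = 0;
    return sv->iv;
}
```

Only the less-frequently-inspected quantity (the status) pays the address-taken cost in style 2, which is the point of Nick's objection.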
RE: inline mania
Brent Fulgham [EMAIL PROTECTED] writes: Having thought about it a bunch more (because of this) I'm proposing we let the compiler decide. The caller doesn't know enough to make that decision. Read carefully. I said we *let* the caller decide, not *make* the caller decide. What, specifically, disturbs you about my proposal? The 'inline' keyword is just a hint to the compiler. If optimization is turned off, no inlining is done. If optimization is on, the compiler may or may not decide to inline. Performance on different compilers will vary. To repeat: Even if I say "inline" on everything, the compiler is free to disregard that if its optimization routines decide not to. (Also, if I fail to say "inline" on something, the compiler may decide to inline if optimization is active). So aren't we all saying the same thing? I don't think so - it is a question which way we code the source: A. Use 'inline' every where and trust compiler not to do what we told it if it knows better. B. No inline hints in the source and trust the compiler to be able to do the right thing when prodded with -O9 or whatever. C. Make "informed" guesses at to which calls should be inlined. My view is that (B) is the way to go, aiming for (C) eventually, because (A) gives worst-case cache purging. -Brent -- Nick Ing-Simmons
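[Editor's illustration of "inline is just a hint": with optimization off the call stays a real call; with -O2 most compilers fold it away, and either way the semantics are identical. The helper name is hypothetical:]

```c
/* The compiler is free to ignore this hint, or to inline even
 * functions not marked inline, depending on optimization level. */
static inline long sq(long x) { return x * x; }
```

This is why strategy (A) above — sprinkling 'inline' everywhere — buys nothing the optimizer could not decide for itself, while risking code bloat and cache purging.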