Re: [perl #61038] parrot 0.8.0 compilation failure in Tru64 5.1B
chromatic via RT wrote:
On Wednesday 03 December 2008 18:00:32 Jarkko Hietaniemi wrote:
First we get a couple of warnings from some files, but then one file refuses to compile (see below). I didn't notice any other warnings or failures during Configure.pl and/or during compilation.

Thanks for the report.

Thanks for looking into it. I synced to r34297 and it seems to compile in Tru64, thanks! I am seeing some new warnings; if I find the time I'll file a new bug on those. An easy quick one to fix would be this:

cc: Info: ./include/parrot/sub.h, line 47: Trailing comma found in enumerator list. (trailcomma)
} sub_flags_enum;
^

Trailing commas in enum lists are not portable across cranky C compilers.
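For readers unfamiliar with the warning: C89 does not allow a comma after the last enumerator, while C99 does, which is why stricter compilers complain. A minimal sketch of the portable form follows. The type and flag names are made up for illustration and are not the real contents of include/parrot/sub.h.

```c
#include <assert.h>

/* Hypothetical flags for illustration only; not the real sub_flags_enum. */
typedef enum {
    DEMO_FLAG_A = 1 << 0,
    DEMO_FLAG_B = 1 << 1,
    DEMO_FLAG_C = 1 << 2    /* no comma after the last enumerator */
} demo_flags_enum;

/* Return non-zero if the given flag bit is set in flags. */
int demo_flag_is_set(int flags, demo_flags_enum flag)
{
    assert(flag != 0);      /* every enumerator above names one bit */
    return (flags & (int)flag) != 0;
}
```

The only change a header needs is dropping the final comma; the enumerator values themselves are unaffected.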
Re: [perl #61038] parrot 0.8.0 compilation failure in Tru64 5.1B
chromatic via RT wrote:
On Tuesday 23 December 2008 14:53:15 Jarkko Hietaniemi wrote:
I am seeing some new warnings, if I find the time I'll file a new bug on those. An easy quick one to fix would be this:

cc: Info: ./include/parrot/sub.h, line 47: Trailing comma found in enumerator list. (trailcomma)
} sub_flags_enum;
^

Trailing commas in enum lists are not portable across cranky C compilers.

Fixed in r34299, thanks. I cranked up the optimization level to -O2 and am fixing as many warnings as possible with GCC 4.3, but I'm sure that leaves plenty for pickier compilers to complain about.
-- c

Another large batch of errors seemingly came from these in nci.c:

cc: Info: src/nci.c, line 6614: In this statement, pcf_v_JOS of type pointer to function (pointer to struct parrot_interp_t, pointer to struct PMC) returning void, is being converted to pointer to void. Such a cast is not permitted by the standard. (nonstandcast)
PMC_data(temp_pmc) = (void *)pcf_v_JOS;
-^

More cowbell, errr, D2FPTR().
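As background to the nonstandcast message: ISO C does not define conversions between function pointers and `void *`, but it does allow any function pointer to be converted to another function pointer type and back. The sketch below shows that standard-conforming route. It is not the definition of Parrot's D2FPTR() macro, which is not shown in this thread; `generic_fn`, `fn_slot`, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* A generic function pointer type used only for storage. */
typedef void (*generic_fn)(void);

/* Hypothetical holder; stands in for a data slot that must carry a function. */
typedef struct {
    generic_fn fn;
} fn_slot;

/* Example function to store. */
int demo_add(int a, int b)
{
    return a + b;
}

/* Store: function pointer to function pointer conversion is permitted. */
void fn_slot_store(fn_slot *slot, int (*f)(int, int))
{
    assert(slot != NULL);
    slot->fn = (generic_fn)f;
}

/* Call: convert back to the original type before calling. */
int fn_slot_call(const fn_slot *slot, int a, int b)
{
    assert(slot != NULL && slot->fn != NULL);
    return ((int (*)(int, int))slot->fn)(a, b);
}
```

Calling through the generic type without converting back would be undefined; the round trip is what keeps this portable.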
Re: [perl #57920] [TODO] Remove Parrot Configure test of AIO
147-+ rurban, can this =item be deleted?

$ grep -in -A2 -B2 aio config/init/hints/dec_osf.pm
28- $libs .= ' -lpthread';
29-}
30:if ( $libs !~ /-laio/ ) {
31:$libs .= ' -laio';
32-}
33-$conf->data->set( libs => $libs );

Jarkko, are you available to comment on this?

Well, feel free to delete since Parrot doesn't even build ATM in dec-osf ...

Thank you very much.
kid51
[PATCH] tru64: hints tweaks
--- config/init/hints/dec_osf.pm.dist 2008-01-09 04:57:50.0 +0200
+++ config/init/hints/dec_osf.pm 2008-01-09 05:23:23.0 +0200
@@ -14,8 +14,10 @@
 if ( $ccflags !~ /-pthread/ ) {
     $ccflags .= ' -pthread';
 }
+if ( $ccflags !~ /-D_REENTRANT/ ) {
+    $ccflags .= ' -D_REENTRANT';
+}
 if ( $ccflags !~ /-D_XOPEN_SOURCE=/ ) {
-    # Request all POSIX visible (not automatic for cxx, as it is for cc)
     $ccflags .= ' -D_XOPEN_SOURCE=500';
 }
@@ -43,8 +45,9 @@
 $conf->data->set( linkflags => $linkflags );
 }

-# Required because of ICU using c++.
-$conf->data->set( link => "cxx" );
+unless ( $conf->data->get("gccversion") ) {
+    $conf->data->set( link => "cxx" );
+}

 # Perl 5 hasn't been compiled with this visible.
 $conf->data->set( has_socklen_t => 1 );
[PATCH] probe for gcc -Wxxx only when gcc (well, g++)
--- config/auto/warnings.pm.dist 2008-01-08 05:51:42.0 +0200
+++ config/auto/warnings.pm 2008-01-08 06:01:23.0 +0200
@@ -132,17 +132,22 @@
 $verbose = $conf->options->get('verbose');
 print "\n" if $verbose;

-# add on some extra warnings if requested
-push @potential_warnings, @cage_warnings
-    if $conf->options->get('cage');
-
-push @potential_warnings, '-Wlarger-than-4096'
-    if $conf->options->get('maintainer');
-
-# now try out our warnings
-for my $maybe_warning (@potential_warnings) {
-    $self->try_warning( $conf, $maybe_warning );
+my $gcc = $conf->options->get('gccversion');
+
+if (defined $gcc) {
+    # add on some extra warnings if requested
+    push @potential_warnings, @cage_warnings
+        if $conf->options->get('cage');
+
+    push @potential_warnings, '-Wlarger-than-4096'
+        if $conf->options->get('maintainer');
+
+    # now try out our warnings
+    for my $maybe_warning (@potential_warnings) {
+        $self->try_warning( $conf, $maybe_warning );
+    }
 }
+
 return 1;
 }
[PATCH] atan2(0, 0) is not portable (caused nanqs in tru64)
--- src/pmc/complex.pmc.dist 2008-01-06 00:48:21.0 +0200
+++ src/pmc/complex.pmc 2008-01-06 02:53:34.0 +0200
@@ -1180,7 +1180,10 @@
 im = 0.0;

 RE(d) = log(sqrt(re*re + im*im));
-IM(d) = atan2(im, re);
+if (re == 0.0 && im == 0.0) /* atan2(0, 0) not portable */
+    IM(d) = 0.0;
+else
+    IM(d) = atan2(im, re);

 return d;
 }
Re: Subject: Parrot 0.4.8 Released
I think much of the needed work for Tru64 would be simply to add *at least one* 64-bit platform for Parrot's core platforms. Preferably an LP64 one, instead of an LLP64, since LP64 would be more likely to shake out bad assumptions. But if LLP64 is more easily available, so be it. Superplusgood would be to have 64-bit both ways, that is, LE and BE. *) E.g. http://www.unix.org/version2/whatsnew/lp64_wp.html
Re: Subject: Parrot 0.4.8 Released
Nicholas Clark wrote: On Mon, Jan 22, 2007 at 01:48:41PM -0500, Matt Diephouse wrote: Alternatively, if you (or anyone else) wanted and were able to provide developer access to a Tru64 box, existing committers could try to fix the problems. And yes, I would be willing to take a shot at it (realizing that I may or may not be successful). Unfortunately I am not in the position to provide Tru64 access. HP already provide access to many things, but not Tru64: http://www.testdrive.hp.com/ ...anymore, grumble. Nicholas Clark
Re: Subject: Parrot 0.4.8 Released
+ extended support for non-core platforms including Tru64 Huh? News to me. All the fixes for the problems recently reported by me were to subsystems like pge. Thanks for those fixes but I would hardly call the situation extended support since several core dumps and less serious failures remain. I can't help the feeling that Parrot is a nice linux x86 experiment. Of course one can make the claim that not fixing the problems is my problem. http://www.nntp.perl.org/group/perl.perl6.internals/36204 http://www.parrotcode.org/news/2007/Parrot-0.4.8.html
Re: Subject: Parrot 0.4.8 Released
chromatic wrote: On Saturday 20 January 2007 10:36, Jarkko Hietaniemi wrote: I can't help the feeling that Parrot is a nice linux x86 experiment. Of course one can make the claim that not fixing the problems is my problem. I so do; want commit access? To which I say: I knew that would get your attention; and no, I'm past caring. From PDD01 (docs/clip/pdd01_overview.pod):
Re: [PATCH] tru64: compile (src/nci.c) and runtime (src/memory.c)
The second one: in tru64 malloc/calloc/realloc of zero bytes returns a NULL ptr (quite logical, in a way: you couldn't put anything in a memory block of zero bytes...). I guess one could be fancier and add a probe for this feature in Configure.pl, but I was feeling lazy. A third alternative would be to investigate why would anyone be allocating zero bytes; this might indicate a more serious error, depending on what the caller was expecting/intending and what were they going to do with the result.
[PATCH] tru64: compile (src/nci.c) and runtime (src/memory.c)
Two patches, the first is needed for parrot trunk to compile at all in Tru64, the second one is needed to dodge dozens of core dumps. There still are some, will take a closer look when I have more time, but at least this way there is less wading in core dumps.

In more detail:

The first one is required because otherwise the strange 0xc4 in the string constant makes the tru64 compiler quite unhappy. (I haven't looked in detail but I think that without extra flags the tru64 compiler allows only pure ASCII in string constants).

The second one: in tru64 malloc/calloc/realloc of zero bytes returns a NULL ptr (quite logical, in a way: you couldn't put anything in a memory block of zero bytes...). I guess one could be fancier and add a probe for this feature in Configure.pl, but I was feeling lazy.

--- tools/build/nativecall.pl.dist 2006-12-03 22:52:46.0 +0200
+++ tools/build/nativecall.pl 2006-12-03 22:53:01.0 +0200
@@ -678,7 +678,7 @@
 iglobals = interp->iglobals;

 if (PMC_IS_NULL(iglobals))
-    PANIC("iglobals isnÄt created yet");
+    PANIC("iglobals isn't created yet");

 HashPointer = VTABLE_get_pmc_keyed_int(interp, iglobals,
         IGLOBALS_NCI_FUNCS);
--- src/memory.c.dist 2006-12-03 23:23:58.0 +0200
+++ src/memory.c 2006-12-03 23:24:27.0 +0200
@@ -80,7 +80,7 @@
 #ifdef DETAIL_MEMORY_DEBUG
 fprintf(stderr, "Allocated %i at %p\n", size, ptr);
 #endif
-if (!ptr)
+if (!ptr && size)
     PANIC("Out of mem");
 return ptr;
 }
@@ -93,7 +93,7 @@
 fprintf(stderr, "Internal malloc %i at %p (%s/%d)\n", size, ptr, file, line);
 #endif
-if (!ptr)
+if (!ptr && size)
     PANIC("Out of mem");
 return ptr;
 }
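The zero-byte case can be illustrated outside Parrot. The C standard leaves it to the implementation whether `malloc(0)` returns NULL or a unique pointer, so a NULL result only signals failure when the requested size was non-zero. The sketch below shows the check from the second patch plus one alternative; both function names are hypothetical, and `assert` merely stands in for Parrot's PANIC.

```c
#include <assert.h>
#include <stdlib.h>

/* Same test as the patch: NULL is only an error if bytes were requested. */
void *checked_malloc(size_t size)
{
    void *ptr = malloc(size);
    assert(ptr != NULL || size == 0);   /* stand-in for PANIC("Out of mem") */
    return ptr;
}

/*
 * Alternative: never ask for zero bytes, so callers always get a
 * non-NULL pointer they can later pass to free().
 */
void *nonnull_malloc(size_t size)
{
    void *ptr = malloc(size ? size : 1);
    assert(ptr != NULL);                /* stand-in for PANIC("Out of mem") */
    return ptr;
}
```

With the first variant callers must cope with a NULL return for empty requests; with the second they never see NULL, at the cost of a one-byte allocation.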
Re: [perl #39751] unbug - [EMAIL PROTECTED]: tru64 core dump: t/dynoplibs/myops_4.pir
Chip Salzenberg via RT wrote: parrot obeys you when you ask it politely to halt and catch fire The test harness should kindly be told about this confusing anomaly I never could get my haikus to work -- Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/ There is this special biologist word we use for 'stable'. It is 'dead'. -- Jack Cohen
Re: [perl #39755] [EMAIL PROTECTED]: tru64 6 failures: getting NaNQs: t/pmc/complex.t
Jerry Gay via RT wrote: i've related this ticket to #38887: (Nobody) Result of INFINITY or NAN stringification is platform dependent [new] there are many platforms failing NaN/Inf related tests due to this issue. That is very true, and very worthy of a separate ticket, but isn't the failure I'm seeing something a bit different -- expecting non-NaNs (mostly zeros) but getting NaNQs? thanks for your report. ~jerry -- Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/ There is this special biologist word we use for 'stable'. It is 'dead'. -- Jack Cohen
Re: [BUG] parrot 0.4.5: Configure.pl: tru64
Will Coleda wrote: While you're waiting, we should improve the test for readline: we used to have similar failures where we found readline (or other probed thingees) but the version was not recent enough for us to link with. (1) Some sort of grouping for the libraries so that only the libraries really needed for an executable are used? (2) I don't know what the -lreadline test currently does but obviously it wrongly detects -lreadline as useable in this system. Regards.
Re: [BUG] parrot 0.4.5: Configure.pl: tru64
Leopold Toetsch wrote: On Jul 1, 2006, at 21:42, Jarkko Hietaniemi wrote: (1) I don't know all those -libraries are being listed, the test program certainly doesn't need them... yes, the linker should know to ignore them as unused... but: (2) This is not Linux so that -lgmp and -lreadline are not standard but have been compiled and installed by the sysadmins (not admin) and: (3) They most definitely have not been compiled with cxx, but most probably with gcc. And I have no idea whether the libreadline.so actually works, since I haven't lately tried to compile anything with it. In non-Linux systems one cannot always assume installed GNU stuff works and/or is uptodate... -lgmp or -lreadline are either just coming from (a) the equivalent perl settings or are the result of an (b) earlier test. For (a) the libs could be disabled in the hints file [1]. For (b) we'd need some commandline and hints settings like: 'no-readline' or such, which disables this lib. But the -lreadline is needed for something later? [1] config/init/hints/* leo
Re: [BUG] parrot 0.4.5: Configure.pl: tru64
Leopold Toetsch wrote:
On Jun 29, 2006, at 18:48, Jarkko Hietaniemi wrote:
Any way to add verbosity to e.g. see which commands are being run?

perl Configure.pl --verbose-step=snprintf ...

Testing snprintf...
cc -std -D_INTRINSICS -fprm d -ieee -I/p/include -DLANGUAGE_C -pthread -D_XOPEN_SOURCE=500 -I./include -c test.c
cxx -expect_unresolved '*' -O4 -msym -std -L/p/lib test.o -o test -lm -lutil -lpthread -laio -lrt -lgmp -lreadline
./test
resolve_symbols: loader error: dlopen: libreadline.so.4: symbol tgetnum unresolved
step auto::snprintf died during execution: Can't run the snprintf testing program: at config/auto/snprintf.pm line 33.

cxx is the Tru64 C++ compiler.

(1) I don't know all those -libraries are being listed, the test program certainly doesn't need them... yes, the linker should know to ignore them as unused... but:

(2) This is not Linux so that -lgmp and -lreadline are not standard but have been compiled and installed by the sysadmins (not admin) and:

(3) They most definitely have not been compiled with cxx, but most probably with gcc. And I have no idea whether the libreadline.so actually works, since I haven't lately tried to compile anything with it. In non-Linux systems one cannot always assume installed GNU stuff works and/or is uptodate...

Therefore, I am not surprised by the runtime linker getting cranky when the ./test is being run. (I have no idea who tries to call tgetnum, certainly not test.c.)

If I remove the -lreadline from the cxx line, the ./test works fine giving:

borken snprintf: n = 1

as expected. I don't know how to start fixing this.

leo
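The thread does not show what Parrot's test.c checks, so the following is only a sketch of one way to probe for C99 behaviour: C99 requires snprintf to return the number of characters that would have been written had the buffer been large enough, whereas some older implementations return something else on truncation. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Returns non-zero if snprintf reports the full, untruncated length
 * and NUL-terminates the truncated output, as C99 requires.
 * cap must be smaller than the 10-byte test string (9 chars + NUL).
 */
int snprintf_reports_full_length(char *buf, size_t cap)
{
    static const char text[] = "123456789";    /* 9 characters */
    int n;

    assert(buf != NULL && cap >= 1 && cap < sizeof text);
    n = snprintf(buf, cap, "%s", text);
    return n == (int)(sizeof text - 1) && buf[cap - 1] == '\0';
}
```

A probe of this shape needs nothing beyond the C library, which is the point made above about the link line: -lreadline and -lgmp have no business in it.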
[BUG] parrot 0.4.5: Configure.pl: tru64
Parrot 0.4.5 in Tru64 5.1B:

$ perl Configure.pl
...
Determining if your platform supports readline.yes.
Determining if your platform supports gdbm..no.
Testing snprintf...resolve_symbols: loader error: dlopen: libreadline.so.4: symbol tgetnum unresolved
step auto::snprintf died during execution: Can't run the snprintf testing program: at config/auto/snprintf.pm line 33.
at Configure.pl line 443
$

(sorry about possible linewraps, Thunderbird thinks it's doing me a favour...)

I don't know what tgetnum() from libreadline.so has to do with testing for snprintf. (I do know from other contexts that Tru64 wouldn't have a C99 snprintf.)

Any way to add verbosity to e.g. see which commands are being run?
Re: [perl #37336] [RESOLVED] [BUG] Parrot 0.3.0 t/pmc/io.t assert core dump
Joshua Hoblitt via RT wrote:
On Sat, Oct 15, 2005 at 11:09:38AM +0300, Jarkko Hietaniemi wrote:
Joshua Hoblitt via RT wrote:
According to our records, your request regarding [BUG] Parrot 0.3.0 t/pmc/io.t assert core dump has been resolved.

According to my records, it's a TODO test and therefore not quite yet resolved :-)

It's a test failure for unimplemented feature(s). There is already a TODO ticket (bug #31178) that roughly covers this. Can you make a case for why it needs to be tracked as a software defect?

A core dump is a software defect, an unacceptable failure, doesn't matter whether it is from an assert or not. If Parrot's development thinks differently or uses different terms, fine, close the ticket.

Cheers,
-J
--
Re: [perl #27003] bytecode (header?) problem in tru64/alpha
Joshua Hoblitt via RT wrote:
[doughera - Thu Oct 06 07:21:15 2005]:
I think this bug can be closed. I just got those tests to pass on Sparc/Solaris 8 with gcc -m64 -mcpu=v9. (Mind you lots of other tests fail, but that's a separate problem.)

Jarkko, Are you OK with closing this bug now?
-J

Yeah.
Re: [perl #27003] bytecode (header?) problem in tru64/alpha
-J Jarkko, I never got a response from anyone. How would you feel about closing this bug? I don't think it can be closed until at least another big-endian 64-bit platform (like IRIX 64 is/was) has been used to verify that things work. -J
Re: [perl #37339] AutoReply: [BUG] Parrot 0.3.0 tru64 t/pmc/perlstring.t #44
The latest changes by Leo seem to have fixed this one, and similarly #37338 and #37337.
[PATCH] Re: [perl #37334] AutoReply: [PATCH] Parrot 0.3.0 does not compile in Tru64 because of missing socklen_t
Jarkko Hietaniemi wrote:
Jarkko Hietaniemi wrote:
io/io_unix.c does not compile because socklen_t is not defined. According to the standards, sys/socket.h is needed to get socklen_t. One could try including that the right way into io/io_unix.c, but I do not know enough of Parrot conventions. Instead, the below patch helps:

--- io/io_unix.c.dist 2005-10-03 20:54:25.0 +0300
+++ io/io_unix.c 2005-10-03 20:56:51.0 +0300
@@ -832,7 +832,7 @@
 newio = PIO_new(interpreter, PIO_F_SOCKET, 0, PIO_F_READ|PIO_F_WRITE);

 if ((newsock = accept(io->fd, (struct sockaddr *)&newio->remote,
-        (socklen_t *)&newsize)) == -1)
+        &newsize)) == -1)
 {
 fprintf(stderr, "accept: errno=%d", errno);
 /* Didn't get far enough, free the io */

Please ignore that patch, it doesn't work since socklen_t is a long, not an int, and in Tru64 one shall not mix those.

Please ignore the ignore :-) It seems that it depends how long the socklen_t is in Tru64, and with cxx (the C++ compiler) and the flags Parrot compilation uses, int is fine. So the above patch is fine for now. In the long run the newsize really should be socklen_t. Getting that to be defined seems to be a little tricky with cxx, so please don't change that right now... in the meanwhile, I found another bug in the IO code, bug report coming soon.

The culprit seems to be that for tru64 cxx not all the POSIX APIs and types are visible by default as they are for cc, and one of those missing with -D_XOPEN_SOURCE=500 is the socklen_t.

--- config/init/hints/dec_osf.pl.dist 2005-10-05 20:29:30.0 +0300
+++ config/init/hints/dec_osf.pl 2005-10-05 20:31:25.0 +0300
@@ -6,6 +6,10 @@
 if ( $ccflags !~ /-pthread/ ) {
     $ccflags .= ' -pthread';
 }
+if ( $ccflags !~ /-D_XOPEN_SOURCE=/ ) {
+    # Request all POSIX visible (not automatic for cxx, as with cc)
+    $ccflags .= ' -D_XOPEN_SOURCE=500';
+}
 Configure::Data->set(
     ccflags => $ccflags,
 );
Re: [PATCH] Re: [perl #37334] AutoReply: [PATCH] Parrot 0.3.0 does not compile in Tru64 because of missing socklen_t
--- config/init/hints/dec_osf.pl.dist 2005-10-05 20:29:30.0 +0300
+++ config/init/hints/dec_osf.pl 2005-10-05 20:31:25.0 +0300
@@ -6,6 +6,10 @@
 if ( $ccflags !~ /-pthread/ ) {
     $ccflags .= ' -pthread';
 }
+if ( $ccflags !~ /-D_XOPEN_SOURCE=/ ) {
+    # Request all POSIX visible (not automatic for cxx, as with cc)
+    $ccflags .= ' -D_XOPEN_SOURCE=500';
+}
 Configure::Data->set(
     ccflags => $ccflags,
 );

So the above patch should be applied so that Tru64 is happy, and works, but as was pointed out to me in private email, the (socklen_t*) cast should most probably be removed, too (and the newsize made socklen_t instead of int), because the (socklen_t*)&newsize when newsize is not a socklen_t, is simply asking for trouble (misalignment and/or memory corruption).
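The long-run fix suggested above (declare the length as socklen_t and drop the cast) might look roughly like the following on a POSIX system. This is a sketch, not Parrot's io_unix.c; the function name and the use of `struct sockaddr_storage` are assumptions made for the example.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Accept a connection and report the peer address.
 * The length variable has exactly the type accept() expects, so no
 * pointer cast is needed and nothing can be misaligned or overrun.
 * Returns the new descriptor, or -1 on error (errno set by accept()).
 */
int accept_with_peer(int listen_fd, struct sockaddr_storage *peer,
                     socklen_t *peer_len)
{
    socklen_t len = sizeof *peer;
    int fd;

    assert(peer != NULL && peer_len != NULL);
    memset(peer, 0, sizeof *peer);
    fd = accept(listen_fd, (struct sockaddr *)peer, &len);
    *peer_len = (fd == -1) ? 0 : len;
    return fd;
}
```

Casting the address of an `int` to `socklen_t *` only works by accident when the two types happen to have the same size, which is exactly the accident described above for cxx.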
Re: [perl #30997] pdb labels broken in tru64/alpha
1989 /* (dbx) The line-label is an impossible pointer, so dereferencing promptly causes a bus error.

Jarkko, Can you retest and confirm that this is still an issue with pdb?

This seems to have been fixed.

Thanks,
-J
Re: [perl #37334] AutoReply: [PATCH] Parrot 0.3.0 does not compile in Tru64 because of missing socklen_t
io/io_unix.c does not compile because socklen_t is not defined. According to the standards, sys/socket.h is needed to get socklen_t. One could try including that the right way into io/io_unix.c, but I do not know enough of Parrot conventions. Instead, the below patch helps:

--- io/io_unix.c.dist 2005-10-03 20:54:25.0 +0300
+++ io/io_unix.c 2005-10-03 20:56:51.0 +0300
@@ -832,7 +832,7 @@
 newio = PIO_new(interpreter, PIO_F_SOCKET, 0, PIO_F_READ|PIO_F_WRITE);

 if ((newsock = accept(io->fd, (struct sockaddr *)&newio->remote,
-        (socklen_t *)&newsize)) == -1)
+        &newsize)) == -1)
 {
 fprintf(stderr, "accept: errno=%d", errno);
 /* Didn't get far enough, free the io */

Please ignore that patch, it doesn't work since socklen_t is a long, not an int, and in Tru64 one shall not mix those.
Re: [perl #30671] tru64 problems with nci.t and object-meths.t
Jarkko, Does this issue still occur on tru64? Works in Parrot 0.3.0. -J
Re: [perl #37334] AutoReply: [PATCH] Parrot 0.3.0 does not compile in Tru64 because of missing socklen_t
Jarkko Hietaniemi wrote:
io/io_unix.c does not compile because socklen_t is not defined. According to the standards, sys/socket.h is needed to get socklen_t. One could try including that the right way into io/io_unix.c, but I do not know enough of Parrot conventions. Instead, the below patch helps:

--- io/io_unix.c.dist 2005-10-03 20:54:25.0 +0300
+++ io/io_unix.c 2005-10-03 20:56:51.0 +0300
@@ -832,7 +832,7 @@
 newio = PIO_new(interpreter, PIO_F_SOCKET, 0, PIO_F_READ|PIO_F_WRITE);

 if ((newsock = accept(io->fd, (struct sockaddr *)&newio->remote,
-        (socklen_t *)&newsize)) == -1)
+        &newsize)) == -1)
 {
 fprintf(stderr, "accept: errno=%d", errno);
 /* Didn't get far enough, free the io */

Please ignore that patch, it doesn't work since socklen_t is a long, not an int, and in Tru64 one shall not mix those.

Please ignore the ignore :-) It seems that it depends how long the socklen_t is in Tru64, and with cxx (the C++ compiler) and the flags Parrot compilation uses, int is fine. So the above patch is fine for now. In the long run the newsize really should be socklen_t. Getting that to be defined seems to be a little tricky with cxx, so please don't change that right now... in the meanwhile, I found another bug in the IO code, bug report coming soon.
Re: [perl #27003] bytecode (header?) problem in tru64/alpha
Jarkko, Are there still outstanding issues on IRIX? AFAIK nobody else has been building parrot on that platform. Unfortunately I no more have access to that platform. -J
Re: [Fwd: a warning and a failure for parrot in Tru64]
Not true. We've done successful compiles before on Tru64. Maybe as of 0.0.6 True, not true :-) I do manual test compiles in Tru64 once in a while. Once the packfile portability problems were solved back when, the Parrot core at least has been pretty good regarding 64-bitness. Tru64 is 64-bit little-endian, with longsize=ptrsize=8 intsize=4 (shortsize=2). P.S. (I wish I still had Cray 90 access, the unusual-but-legal longsize=ptrsize=intsize=shortsize=8 nicely shook bugs to the bright light of day in Perl 5.) -- Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/ There is this special biologist word we use for 'stable'. It is 'dead'. -- Jack Cohen
Re: [Fwd: a warning and a failure for parrot in Tru64]
Nick Glencross wrote:
Jarkko Hietaniemi wrote:
Not true. We've done successful compiles before on Tru64. Maybe as of 0.0.6

Ok, so intsize=4, which is why my md5 test tried to run. I'd be really grateful if someone could run my instrumented MD5.imc from a previous post on this platform. So what I'm confused about is why intsize=4 when you say the Parrot core is 64 bit.

Weelll... I did not say *quite* that. What I said was that so far the Parrot's core seems to have worked well in systems with _some_ 64-bit integer types available. So the Parrot core has been 64-bit _safe_, which doesn't mean it has been _using_ 64-bit integers explicitly (e.g. in Tru64 it has been using 64-bit longs implicitly).

Isn't one of the points of a 64-bit processor to have larger ints (often accompanied by larger address space)? So if ints are just 4

The 64-bit type can be int, long, long long, quad_t, int64_t, ...

bytes, what would trip things up on Tru64? There are a few reasons why I'm keen to get this resolved. 1) My assumption that intsize!=4 for 64-bit processors is broken, which is why

Please do not assume such things. The only thing C promises in this regard is that sizeof(int) <= sizeof(long). 4 <= 8, or 8 <= 8 (or 4 <= 4 in the 32-bit world.) See e.g. http://www.unix.org/version2/whatsnew/lp64_wp.html

the test is seen to fail. 2) I would like the library to work on all platforms. 3) I'm curious to know why it doesn't work, as it was expected to work on different endianess and word size. 4) the md5 library has been, and hopefully will continue to be, a good way to shake problems out of the parrot core.

Thanks all,
Nick

--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
There is this special biologist word we use for 'stable'. It is 'dead'. -- Jack Cohen
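The point about integer sizes can be made concrete with a short sketch. C only guarantees an ordering of the sizes, so code that needs exactly 32 bits (an MD5-style rotate is used here purely as an example) is better off with a fixed-width type. The function names are hypothetical, and `uint32_t` is assumed to exist on the platform, which C99 makes optional.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* What C actually promises: an ordering, not particular sizes. */
void check_portable_size_assumptions(void)
{
    assert(sizeof(short) <= sizeof(int));
    assert(sizeof(int) <= sizeof(long));
    assert(CHAR_BIT >= 8);
}

/*
 * Rotate left on exactly 32 bits, whatever sizeof(int) or sizeof(long)
 * happen to be. The final cast discards any bits above bit 31 that
 * appear when int is wider than 32 bits.
 */
uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    if (n == 0)
        return x;
    return (uint32_t)((x << n) | (x >> (32u - n)));
}
```

On an LP64 system such as Tru64 the first function passes with int at 4 bytes and long at 8; on the Cray mentioned elsewhere in this thread every size is 8 and it still passes.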
Re: [Fwd: a warning and a failure for parrot in Tru64]
Forgot to add: in many environments (at least SGI/MIPS, AIX Power/PPC, HP-UX/HPPA) things are even more interesting -- one can in compile time decide between different 32-bit modes and different 64-bit modes. (E.g. in IRIX there are two of each.) I believe the new x86-ish processors and Linux/gcc offer similar options. Whether one can mix and match such executables/libraries depends on how the processors/operating system have been configured. So one can't really assume much about the integer sizes. I heartily recommend people interested in portability matters getting machines and/or accounts in different machines. It Will Make Your Code Better. -- Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/ There is this special biologist word we use for 'stable'. It is 'dead'. -- Jack Cohen
Re: [perl #34420] TODO suggestion: clean Parrot's ABI
Dave Whipp via RT wrote: Matt Diephouse wrote: There's no real point in having a plan if you don't follow it, That sounds a bit naive. The benefit of a plan is primarily in the act of making it (it forces you to think about what you want to do). The secondary benefit comes when you track how actual progress deviates from the plan: this lets you think about how/why your plan wasn't accurate. Following a plan gives very little benefit. If the plan is accurate, then people will naturally follow it, without needing to be told. They may follow priorities (which may derived from the act of planning), but that's a subtly different thing. Dave. It's nice to see so many professional project managers signing up :-) -- Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/ There is this special biologist word we use for 'stable'. It is 'dead'. -- Jack Cohen
Re: [perl #xxxxx] [PATCH] garbage characters in a comment
Robert wrote: Indeed curious. The first version was the gzip file, but utf8 encoded. Double weird that it would only happen once. Did you do it the same way both times, Jarkko? Yup. Mac OS X, Thunderbird, Attach file, the same file. -- Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/ There is this special biologist word we use for 'stable'. It is 'dead'. -- Jack Cohen
Re: [perl #34351] [PATCH] garbage characters in a comment
Leopold Toetsch via RT wrote: Jarkko Hietaniemi [EMAIL PROTECTED] wrote: Extra 0xA0 characters (Latin-1 no-break-spaces?) in the comments of a header file. Non-fatal but probably not intended, either. Patch attached. $ file noa0.pat.gz noa0.pat.gz: data Please resend, thanks leo Curious. Reattached. -- Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/ There is this special biologist word we use for 'stable'. It is 'dead'. -- Jack Cohen noa0.pat.gz Description: GNU Zip compressed data
Re: [perl #32877] parrot build broken in Tru64, cc/ld confusion
The offending line in config/gen/makefiles/dynclasses_pl.in is probably this one: $LD $CFLAGS $LDFLAGS $LD_LOAD_FLAGS $LIBPARROT That CFLAGS doesn't belong there. CFLAGS are intended to be sent to $CC, not to $LD. The command being called here is $LD, which is defined in config/init/data.pl as the Tool used to build shared libraries and dynamically loadable modules. I no longer remember why LD is set to 'ld' on Tru64 -- is it just Ultrix heritage combined with lots of inertia or is it really a sensible setting? Could well be Ultrix heritage, but in any case the parameter syntaxes of Tru64 cc and ld are rather different and non-intersecting, and the cc doesn't automatically pass through unknown parameters to ld (one needs to use the -W for explicit passing.) The cc and ld manpages for example here (blame HP for the awful URLs): http://h30097.www3.hp.com/docs/base_doc/DOCUMENTATION/V51B_HTML/MAN/MAN1/0607.HTM http://h30097.www3.hp.com/docs/base_doc/DOCUMENTATION/V51B_HTML/MAN/MAN1/0668.HTM In any case, dynclasses_pl.in is wrong. There should be no CFLAGS there. -- Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/ There is this special biologist word we use for 'stable'. It is 'dead'. -- Jack Cohen
Re: [perl #32877] parrot build broken in Tru64, cc/ld confusion
Sam Ruby via RT wrote: Andrew Dougherty wrote: The offending line in config/gen/makefiles/dynclasses_pl.in is probably this one: $LD $CFLAGS $LDFLAGS $LD_LOAD_FLAGS $LIBPARROT That CFLAGS doesn't belong there. CFLAGS are intended to be sent to $CC, not to $LD. The command being called here is $LD, which is defined in config/init/data.pl as the Tool used to build shared libraries and dynamically loadable modules. I can't find anything that fails if this is removed, so I committed the change. Thanks, that helped! - Sam Ruby -- Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/ There is this special biologist word we use for 'stable'. It is 'dead'. -- Jack Cohen
Re: cvs commit: parrot/tools/dev parrot_api.pl
Leopold Toetsch wrote:
Jarkko Hietaniemi [EMAIL PROTECTED] wrote:
+ if (/^\w+\s+(Parrot_\w+)\(/) {

Can we be slightly less strict? Current publics that ought to be APIs include these prefixes:

That's a policy decision. I would make a different policy decision (that is, *everything* parrot exports would begin with Parrot, e.g. ParrotC for compiler, ParrotD for debugger), but obviously I don't make any policy decisions regarding Parrot.

IMCC_      PASM/PIR compiler stuff
AST_       AST compiler stuff
PF_        Packfile handling low level
PackFile_  same/higher level, but needs review
PDB_       Parrot debugger
PIO_       Parrot IO

Another possible issue the program shows is: there are tons of public symbols that have a Parrot_ prefix, which are *neither* API calls:
- Parrot opcode functions (core_ops.o)
and some may be embedding APIs:
- Parrot vtable functions

The question is which of these tons do you want exposed? The sad truth is as soon as a symbol is exposed, someone will use it, and then you are stuck with it, making it harder to change the interface ever again. Therefore minimizing the number of exposed symbols is a worthy future-proofing task. Also, I do not see *any* excuse for exposing any symbol that doesn't have *any* of the approved prefixes.

Thanks Jarkko,
leo

--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
There is this special biologist word we use for 'stable'. It is 'dead'. -- Jack Cohen
[Fwd: [PATCH] Re: [perl #31046] IRIX64 perlnum_36 float output expectation]
Still not seeing this in p6i, so resending.

Original Message
Subject: [PATCH] Re: [perl #31046] IRIX64 perlnum_36 float output expectation
Date: Sat, 14 Aug 2004 15:18:01 +0300
From: Jarkko Hietaniemi [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
References: [EMAIL PROTECTED]

Duh. The best way to get -0.0 is ... -0.0. With this patch IRIX64 passes t/pmc/perlnum.t, and therefore passes the test suite 100%.

--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
There is this special biologist word we use for 'stable'. It is 'dead'. -- Jack Cohen

--- src/string.c.dist Sat Aug 14 14:42:07 2004
+++ src/string.c Sat Aug 14 15:14:57 2004
@@ -2533,9 +2533,14 @@
 if (s) {
     /*
      * XXX C99 atof interpreters 0x prefix
+     * XXX would strtod() be better for detecting malformed input?
      */
     char *cstr = string_to_cstring(interpreter, const_cast(s));
+    while (isspace(*cstr)) cstr++;
     f = atof(cstr);
+    /* Not all atof()s return -0 from -0 */
+    if (*cstr == '-' && f == 0.0)
+        f = -0.0;
     string_cstring_free(cstr);
     return f;
 }
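The idea in the patch (skip leading whitespace, then force the sign if the text starts with a minus and the value is zero) can be sketched as a standalone function. The name is hypothetical, and unlike the patch this version works on a caller-owned string, so there is no buffer to free afterwards.

```c
#include <assert.h>
#include <ctype.h>
#include <math.h>
#include <stdlib.h>

/*
 * Convert text to double, making sure that input such as "-0" or
 * "-0.0" yields negative zero even if the platform's atof() does not.
 */
double parse_num_keep_negzero(const char *cstr)
{
    double f;

    assert(cstr != NULL);
    while (isspace((unsigned char)*cstr))
        cstr++;
    f = atof(cstr);
    if (*cstr == '-' && f == 0.0)   /* also true when atof gave +0.0 */
        f = -0.0;
    return f;
}
```

The result has to be inspected with `signbit()` rather than `==`, because `-0.0 == 0.0` is true. As the XXX comment in the patch hints, this approach also turns malformed input such as "-abc" into negative zero; strtod() with an end pointer would let a caller detect that case.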
Re: native_pbc fixes
Oh, bother. I think I somehow goofed up the patch part, so here it is again regenerated. (The pbc files were okay in my original sending.) nat.pat.gz Description: GNU Zip compressed data
Re: native_pbc fixes
Jarkko Hietaniemi wrote: Oh, bother. I think I somehow goofed up the patch part, so here it is again regenerated. (The pbc files were okay in my original sending.) This is getting embarrassing. -- Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/ There is this special biologist word we use for 'stable'. It is 'dead'. -- Jack Cohen nat.pat.gz Description: GNU Zip compressed data
native_pbc fixes
Here are regenerated number_?.pbc files for the t/native_pbc/number.t, plus a couple of tweaks I found on the way in Tru64 and IRIX/64. I still have test failures in both those two and in IRIX there is a lot of fun getting the compiler selected right (even in 64-bit IRIX there are both 32 and 64-bit compilers and object files, pain...) but I managed to get parrot to link and to generate the pbc files. No time to resolve those failures now, I am afraid.

Also, to generate the number_2.pbc I had to compile a new uselongdouble Perl in Linux and in there I had to #if 0 the below in src/platform.c to get parrot linked, both those asserts were failing at some point or another.

static void*
Parrot_memcpy_aligned_mmx_debug(void* d, void* s, size_t l)
{
    assert( (l & 0xf) == 0);
#if 0
    assert( ((unsigned long) d & 7) == 0);
    assert( ((unsigned long) s & 7) == 0);
#endif
    return ((Parrot_memcpy_aligned_mmx_t)(Parrot_memcpy_aligned_mmx_code))(d, s, l);
}

Quite a lot of failures from this longdouble parrot (no wonder, after disabling two asserts), but at least it was able to generate a pbc that the other platforms are able to understand. The box has an AMD Duron, that's about all I know about it.

--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
There is this special biologist word we use for 'stable'. It is 'dead'. -- Jack Cohen

nat.tgz Description: GNU Zip compressed data
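The asserts in that function test that both pointers are 8-byte aligned and that the length is a multiple of 16. A portable way to write such checks is sketched below with hypothetical helper names. It uses `uintptr_t` instead of `unsigned long`, since a pointer does not fit in an unsigned long on every platform (LLP64 systems, for instance).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Non-zero if p is aligned to `alignment`, which must be a power of two. */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Non-zero if len is a whole number of 16-byte blocks. */
int is_multiple_of_16(size_t len)
{
    return (len & 0xf) == 0;
}
```

A caller could then fall back to plain memcpy() when either check fails, rather than asserting; whether that would suit Parrot's aligned copy is not something this thread settles.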
Re: Bit ops on strings
>> I am very confused. THIS IS WHAT WE ALL SEEM TO BE SAYING. BITOPS ONLY ON EIGHT-BIT DATA. AM I WRONG?
> No, it's not, and could you please not get emotional about this? It's

I apologize for using UPPERCASE. My only excuse is that it was not personally aimed at you: I have been griping about these things for quite some time now, and I tend to pull out the clue-by-four rather quickly these days, out of sheer frustration.
Re: Bit ops on strings
> The bitshift operations on S-register contents are valid, so long as the thing hanging off the register supports it. Binary data ought to allow this. Most 8-bit string encodings will have to support it whether it's a good idea or not, since you can do it now. If Jarkko tells me you can do bitwise operations with unicode text now in Perl 5, well... we'll support it there, too, though we shan't like it at all.

We can and I don't like it at all :-) What they basically operate on are the internal UTF-8 bit patterns, in other words utter crapola from the viewpoint of traditional bit strings. Especially fun was getting the semantics of ~ to make any sense whatsoever. None of it is anything I want to propagate anywhere.

> I *think* most of the variable-width encodings, and the character sets that sit on top of them, can reasonably forbid this.
Re: Bit ops on strings
> So it seems to me that the obvious way to go is to have all bit-s operations first convert to raw bytes (possibly throwing an exception) and then proceed to do their work.

If these conversions croak if there are code points beyond \x{ff}, I'm fine with it. But trying to mix \x{100} or higher just leads into silly discontinuities (basically we would need to decide on a word width, and I think that would be a silly move).

> This means that UTF-8 strings will be handled just fine, and (as I

Please don't mix encodings and code points. That strings might be serialized or stored as UTF-8 should have no consequence for bitops.

> understand it) some subset of Unicode-at-large will be handled as well. In other words, the burden goes on the conversion functions, not on the bit ops. It's not that it's going to be meaningful in the general case, but if

I'd rather have meaningful results.

> you have code like:
>
>     sub foo() { return "\x01" +| "\x02" }

Please consider what happens when the operands have code points beyond 0xff.

> I would expect to get the bit-string "\x03" back even though strings may default to Unicode in Perl 6.

Of course. But I would expect a horrible flaming death for "\x{100}" |+ "\x02".

> You could put this on the shoulders of the client language (by saying that the operands must be pre-converted), but that seems to be contrary to Parrot's usual MO. Let me know. I'm happy to do it either way, and I'll look at modifying the other bit-string operators if they don't conform to the decision.
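
The "convert to raw bytes first, croak beyond \x{ff}" rule argued for above can be sketched in a few lines of Python. This is only an illustration, not Parrot code: the names to_octets and string_bitor are made up here, and padding the shorter operand with zero bytes is an assumption borrowed from how Perl 5's string | behaves.

```python
def to_octets(text):
    """Convert a string to raw bytes, refusing code points above 0xFF."""
    try:
        # latin-1 maps code points U+0000..U+00FF one-to-one onto bytes.
        return text.encode("latin-1")
    except UnicodeEncodeError as exc:
        raise ValueError("bit ops need code points <= 0xFF") from exc


def string_bitor(left, right):
    """Bitwise OR of two strings, done on their raw bytes (hypothetical helper)."""
    a, b = to_octets(left), to_octets(right)
    width = max(len(a), len(b))
    # Assumption: pad the shorter operand with zero bytes on the right.
    a, b = a.ljust(width, b"\x00"), b.ljust(width, b"\x00")
    return bytes(x | y for x, y in zip(a, b)).decode("latin-1")
```

With this sketch string_bitor("\x01", "\x02") gives "\x03", while any operand containing \x{100} raises an error before a single bit is touched, so no word width ever has to be chosen.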
Re: Bit ops on strings
> How are you defining valid UTF-8? Is there a codepoint in UTF-8 between \x00 and \xff that isn't valid? Is there a reason to ever do

Like, half of them? \x80 .. \xff are all invalid as UTF-8.

> bitwise operations on anything other than 8-bit codepoints?

I am very confused. THIS IS WHAT WE ALL SEEM TO BE SAYING. BITOPS ONLY ON EIGHT-BIT DATA. AM I WRONG?
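
A small Python check of the "half of them" remark, read here as a statement about single bytes taken on their own (that reading is an assumption; the code points U+0080..U+00FF are perfectly valid, they just need two bytes each in UTF-8). The helper name decodes_alone is made up for the sketch.

```python
def decodes_alone(byte_value):
    """True if this single byte is, by itself, a complete valid UTF-8 sequence."""
    try:
        bytes([byte_value]).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Only the ASCII range survives; 0x80..0xFF are continuation bytes, lead bytes
# of longer sequences, or bytes that never occur in UTF-8 at all.
valid_alone = [b for b in range(0x100) if decodes_alone(b)]

# The code point U+00FF itself is fine, it is simply serialized as two bytes.
two_byte_form = "\xff".encode("utf-8")
```

This is the encoding versus code point distinction in miniature: the byte 0xFF is not UTF-8, the code point 0xFF is.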
Re: File stat info
Dave Mitchell wrote:
> On Thu, Apr 29, 2004 at 08:36:11AM +0300, Jarkko Hietaniemi wrote:
>> But for things like -r file && open(FH, file) they are of rather dubious value.
> Well, I have some scripts that check at the start whether all the things they are going to need are readable/executable/whatever, so that they can (mostly) bomb out right at the start rather than failing halfway through and leaving a mess.

That is more like the case of checking a set of files against some predefined set of properties than the above more immediate testing.
Re: [Q1] (Re: The strings design document)
I think you're basically forcing this concept onto national standards which lack it. I don't think that most of the national standards actually define the semantics of the characters they encode (categorizations, case mapping, sort order), and although they assign byte sequences to represent their characters, I'm not sure they actually present this in terms of assigning integers to them, in the sense of code points v. byte sequences. Yeah. Let's take, say, ISO 8859-1: http://anubis.dkuug.dk/JTC1/SC2/WG3/docs/n411.pdf No semantics, just an assignment of abstract characters to numbers and the respective bit patterns.
Re: One more thing...
Dan Sugalski wrote: Not to sound like a Jackie Chan cartoon or anything, but... I was thinking Columbo, actually... If we go MMD all the way, we can skip the bytecode-C-bytecode transition for MMD functions that are written in parrot bytecode, and instead dispatch to them like any other sub. Not to make this sound good or anything, of course. :-P
Re: File stat info
Oh, don't get me wrong! I'm not saying an abstraction isn't all keen and such, I'm just wondering why we're abstracting farther out than POSIX when the right way, as you point out, has never been a matter of consensus, and many client languages will be presenting POSIX semantics through their standard libraries anyway, which they will have to massage your representation back into.

Which is why I'm fine with yanking all the filename mangling stuff from stat here.

I would recommend leaving out from stat()ish layer. An API not dissimilar to Path::Class for the mangly bits would be rather nice. (Though it doesn't do extensions IIRC.)

(The first person to suggest duplicating the File::Spec API will be hung upside down above the scorpion pit.)

I wasn't, actually. There's a good sprinkling of VMSisms in that list, and I'm all for adding more stuff if need be. (I forgot to note the various flavors of symlink, as well as the link count in cases where it can be determined, as well as user and group of the file itself)

While I'm all for supporting cool stuff like ACLs or builtin MIME-types (a la BeFS), I doubt the feasibility of supporting them in a portable way. Rather I'd personally go for a minimal set of properties. (So minimal that even reporting the POSIXish mode bits would be too much [1], the canI interface is the minimum for the rights, I think.) Hmmm... something like this is about the minimum:

    name
    canI (method/callback that can be called with r/w/x/d)
    size
    type (method/callback that can be called with file/directory/other)

The size would be in bytes, but the name already is a bit tricky... don't say bytes, because e.g. Windows NTFS and Apple HFS+ are full Unicode beasts when it comes to filenames. So we need to solve what is a string first... :-) (Dan, please put *that* down and count to one thousand!)

[1] The POSIX bits cannot even be mapped 100% to many ACL schemes.

After those come maybe the

    rtime
    wtime

(atime and mtime in POSIX). ctime is not portable. Creation time is not available in POSIX. But for these we need to decide on the epoch issue and granularity. After those maybe the

    owner
    group

But how to return these portably? Numeric UIDs and GIDs suck for systems that have username strings (my understanding is that Windows is like this, the mapping to numbers is faked - I may be wrong here, though.) All the rest in the POSIX stat (dev, ino, nlink, rdev, ctime, blksize, blocks) are somewhat unportable to varying degrees.
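
To make the "about the minimum" list concrete, here is a rough Python sketch of such a record. Everything in it is hypothetical (the FileInfo class, the file_info function, the can_i spelling of canI); it leans on POSIX-style calls underneath, and treating "d" as "may I modify the parent directory" is an assumption, not something the thread settled.

```python
import os
import stat
from dataclasses import dataclass

# Map the r/w/x requests onto the access() flags of the host system.
_ACCESS_FLAGS = {"r": os.R_OK, "w": os.W_OK, "x": os.X_OK}


@dataclass
class FileInfo:
    name: str   # still "whatever a string turns out to be"
    size: int   # in bytes
    kind: str   # "file", "directory" or "other"

    def can_i(self, op):
        """Answer r/w/x/d for the user the program is running as."""
        if op == "d":
            # Assumption: deleting needs write+search rights on the parent.
            parent = os.path.dirname(os.path.abspath(self.name))
            return os.access(parent, os.W_OK | os.X_OK)
        return os.access(self.name, _ACCESS_FLAGS[op])


def file_info(path):
    """Collect the minimal property set for one path."""
    st = os.stat(path)
    if stat.S_ISREG(st.st_mode):
        kind = "file"
    elif stat.S_ISDIR(st.st_mode):
        kind = "directory"
    else:
        kind = "other"
    return FileInfo(name=path, size=st.st_size, kind=kind)
```

Note that no mode bits, UIDs or timestamps leak out of the record; a caller only ever asks "can I?", which leaves room for ACL-based systems to answer in their own way.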
Re: File stat info
Which is why I'm fine with yanking all the filename mangling stuff from stat here. I would recommend leaving out from stat()ish layer. An API not s/out/that out/
Re: File stat info
>> Keeping a niche open for ACLs is probably smart, esp. in the Windows world.
> I think you'll find ACL use is increasing, not decreasing. They've been tacked on to most recent filesystems, and they're coming into

This is true. But good luck in trying to map between the ACL schema of different systems :-(

> more widespread use as Linux is getting really decked out for mission-critical usage and the facilities are pushed out for everyone to use. They're certainly important for AIX, Solaris, Tru64, and HP/UX. (Whether those are useful in themselves is a separate question, of course...)
Re: File stat info
> Yech, good point. I'm not even sure you can do any sort of sane abstraction there. In that case, are we better off chopping it out entirely and leaving it to library code, or making it a simple yes/no indicator that there are some? (Chopping it out's probably the best thing)

Chopping off sounds like less coding :-) I think the same general KISS approach applies here that you chose with the time handling - no need to implement calendar algorithms in Parrot's lowest layer, so I don't think trying to abstract a Universal ACL schema is a priority. If someone after Parrot 1.0 wants to implement the Tibetan lunar calendar or POSIX 1.e ACLs in IMC, let them. The operative word being them.
Re: File stat info
On top of which, ACLs suffer the same illness of any stat-based checking, insofar as checks against them are only an approximation to reality, potentially full of race conditions. It's really the OS that's going to do the ACL checking, and it'll do it when you do the actual operation, not the stat() call. Arguably a correct way to program is to ignore stat-like stuff entirely and just try to do the thing you want to do, and be prepared for the OS to reject it--which you should have been prepared for anyway... Yup. cue in Nike slogan (Of course, fstat() does help with some of the race conditions by intentionally losing the race, as it were.) Larry
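
The "just try it and be prepared for the OS to say no" style, together with the fstat() remark, might look like this in Python. The function name read_if_allowed is made up for the sketch; the point is only that there is no separate permission test to race against, and that any stat information is taken from the already opened handle.

```python
import os


def read_if_allowed(path):
    """Try the operation itself; let the OS do the permission/ACL check."""
    try:
        with open(path, "rb") as handle:
            # fstat() describes the file we actually opened, not whatever
            # happens to live at that name a moment later.
            info = os.fstat(handle.fileno())
            return handle.read(), info.st_size
    except OSError:
        # Missing file, no permission, ACL refusal: all reported here,
        # at the moment of the real operation.
        return None
```

A caller that gets None back knows the operation failed for real, which is information no amount of up-front -r testing can guarantee.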
Re: File stat info
>> Is it possible to have something along the lines of ME_{READ,WRITE,EXECUTE,DELETE,CD} to say if, as the user the program is running as, you can perform these actions? That strikes me as rather useful. (Alternately, could we have a field indicating if the current user is OWNER, GROUP, SYSTEM, or OTHER to this file? Gives you pretty much the same info.)
> Sure, that works, and I can see it being as useful as the other permission testing stuff. (Which, arguably, is actually really really useless, but that's a separate issue. We could, I suppose,

Well, not *completely* useless... things like -w have their uses in e.g.

- warning the user before trying an operation
- ls -l or any other textual representation of rights
- checking the filesystem rights against some description of how things should be

But for things like -r file && open(FH, file) they are of rather dubious value.

> unconditionally return 'true' for all of these...)
Re: [Q1] (Re: The strings design document)
> 1) ISO-8859-1 is used to represent text in several different languages, including German and Swedish. German and Swedish differ in their sort order, even for things they have in common. (For example, ö (o-with-diaeresis) is considered a separate letter in Swedish, but is just an accented o in German.) So (assuming my strings aren't explicitly language-tagged, or are tagged with "Dunno"), what sort order does ISO-8859-1 define? I'm not sure whether the national standards themselves actually define a sort order, so are we going to

National standards yes, ISO 8859 (and the like) not. In other words, sorting standards exist, but they have (quite rightly) nothing to do with sorting standards. Real life sorting is messy (multiple passes, some parts may be ignored in some passes, acronyms, etc.) and worlds apart from "let's compare the bytes one by one" or even from "let's compare code points" or even from "let's compare grapheme (clusters)".

> define one for every character set? In addition, many languages can be represented in several different character sets, so that seems to mean that the sort order for öut v. out will vary, depending on the character set used for those strings?

FWIW, I think binding language to strings is a Mistake. But I have decided to give up trying to argue anymore about it since Dan seems to be convinced that it will solve some problems.
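
The German versus Swedish ö example quoted above can be shown with a deliberately tiny Python sketch. Both key functions are toy assumptions (real collation is multi-pass and far messier, as the message says); they only demonstrate that the same code points sort differently once a language is chosen, and that the choice lives in the sort call, not in the strings.

```python
# Toy Swedish alphabet: a-z followed by a-ring, a-diaeresis, o-diaeresis.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00e5\u00e4\u00f6"

# Toy German rule: umlauts sort like their base letters, sharp s like "ss".
GERMAN_FOLD = str.maketrans(
    {"\u00e4": "a", "\u00f6": "o", "\u00fc": "u", "\u00df": "ss"}
)


def swedish_key(word):
    """Sort key where o-diaeresis is a separate letter after z (toy rule)."""
    return [SWEDISH_ALPHABET.index(ch) for ch in word.lower()]


def german_key(word):
    """Sort key where o-diaeresis is just an accented o (toy rule)."""
    return word.lower().translate(GERMAN_FOLD)


words = ["\u00f6l", "zebra", "ost"]
swedish_order = sorted(words, key=swedish_key)  # ost, zebra, then the o-diaeresis word
german_order = sorted(words, key=german_key)    # the o-diaeresis word, ost, zebra
```

Neither ISO 8859-1 nor Unicode appears anywhere in the two key functions, which is the point being made: the character set assigns numbers, the sorting rules come from somewhere else.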
Re: [Q1] (Re: The strings design document)
Dan Sugalski wrote:
> At 7:57 PM +0300 4/27/04, Jarkko Hietaniemi wrote:
>>> 1) ISO-8859-1 is used to represent text in several different languages, including German and Swedish. German and Swedish differ in their sort order, even for things they have in common. (For example, ö (o-with-diaeresis) is considered a separate letter in Swedish, but is just an accented o in German.) So (assuming my strings aren't explicitly language-tagged, or are tagged with "Dunno"), what sort order does ISO-8859-1 define? I'm not sure whether the national standards themselves actually define a sort order, so are we going to
>> National standards yes, ISO 8859 (and the like) not. In other words, sorting standards exist, but they have (quite rightly) nothing to do with sorting standards.
> ?

Ooops. Replace the last "sorting" with "character". That's what I get, errrm, what you get, from writing email while watching evening news :-)

>> Real life sorting is messy (multiple passes, some parts may be ignored in some passes, acronyms, etc.) and worlds apart from "let's compare the bytes one by one" or even from "let's compare code points" or even from "let's compare grapheme (clusters)".
> True enough, though what I want the language for is as much case-mangling as sorting.

I just think that having languages for strings is akin to having types (dimensioned or -less) for numbers. (Making 2 kg plus 3 Hz croak, that kind of thing.)
Re: embed.h doesn't work in C++
> You're welcome to try it again, though...while you're at it, you might as well make all internal Parrot functions take an Interp * instead of a

I hope there's #undef Interp in there somewhere. Or maybe even possibly

    #ifdef Interp
    #error EEEK SOMEONE ELSE HAS DEFINED Interp.
    #endif

In other words, I'm not at all convinced about the wisdom of dropping Parrot_ prefixes. You laugh? ConvexOS had sv_flags in its system header files, which was rather unfun for Perl 5. The shorter and more generic a name is, the more likely a conflict is.

> struct Parrot_Interp *. That ought to save us a couple kilobytes.
Re: embed.h doesn't work in C++
Brent 'Dax' Royal-Gordon wrote:
> Dan Sugalski wrote:
>> I hope it's not in there in the first place. The prefix needs to stay.
> The declaration has been (along the lines of)
>
>     typedef struct Parrot_Interp { ... } Interp;
>
> for years. The Interp typedef is intended for internal use only. Why do we need the prefix on an internal-use only typedef? We don't use Parrot_String or Parrot_PMC internally.

This works as long as people (a) know of (b) stick to the policy (Interp for internal use only) (c) No application embedding Parrot has defined Interp themselves. Experience has shown that none of these is likely to happen and/or stay that way for long :-)

> Outside of Parrot, it's still Parrot_Interp, the same as I wrote it way back when I checked the embedding interface in.

Something like

    typedef struct Parrot_Interp_s { ... } Parrot_Interp;

would be more robust, I think. (A typedef setup like that is pretty common; the explicit struct Parrot_Interp_s is needed only if there is a need for a struct to point to structs of the same kind, as in linked lists.)
Re: embed.h doesn't work in C++
>> This works as long as people (a) know of (b) stick to the policy (Interp for internal use only) (c) No application embedding Parrot has defined Interp themselves. Experience has shown that none of these is likely to happen and/or stay that way for long :-)
> (c) is the reason for the separate embed.h file that doesn't actually include any other parrot header files--that cuts down on our exposure

That sounds good. But I won't be surprised if in some platform even that isn't enough :-)

> to other headers that parrot uses internally. I'm not naive enough to think that makes us immune to problems, but at least it reduces our exposure. :)
Re: Korean character set info
Ah, at this point Unicode's legacy too. Besides, as long as RAD-50 lives, nobody's got much standing to call a character set Legacy :) I suggest Parrot's native character set to be cuneiform.
Re: Korean character set info
Ah, at this point Unicode's legacy too. Besides, as long as RAD-50 lives, nobody's got much standing to call a character set Legacy :) I suggest Parrot's native character set to be cuneiform. ... but only for constants. Yeah, I was going to propose the Phaistos disc signs for the variable variables.
Re: Constant strings - again
> We need to address that, then. If we're doing unicode, we damn well need to do it right--å is å, regardless of whether it's composed or decomposed.

Agreed -- on some level. But if we want to implement Larry's :u0 (bytes) and :u1 (code points) levels we also need to have the more raw comparisons available, somehow. (I do not remember whether Larry specified that :u2 would by default do some of the Unicode normalizations, thus doing (de)compositions.)

> If people want low-level binary comparisons (and generally we *shouldn't* for most things) then they'll need to force the string to binary.

And I'm not certain whether "forcing to binary" is the right visual image or approach here. Maybe we need some sort of pragma support so that we can tweak the :u level? The default level could well be :u2, the highest we can do without picking some language rules.
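
The composed versus decomposed å case, and the wish to keep a rawer comparison available next to it, can be illustrated with Python's standard unicodedata module. The two function names are made up, and the choice of NFC as the normalization form is an assumption for the sketch, not something the thread decided.

```python
import unicodedata

composed = "\u00e5"      # LATIN SMALL LETTER A WITH RING ABOVE, one code point
decomposed = "a\u030a"   # "a" followed by COMBINING RING ABOVE, two code points


def code_point_equal(a, b):
    """Roughly a :u1-style comparison: code point by code point."""
    return a == b


def canonical_equal(a, b):
    """A higher-level comparison: normalize both sides first (NFC assumed)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

Here code_point_equal(composed, decomposed) is false and canonical_equal(composed, decomposed) is true, so both answers stay available and the level decides which one eq means.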
Re: Constant strings - again
> C-constant region of memory? For instance, if we could tell their memory address is stack base, and use that to identify them as constant?

I don't think there is much chance of getting anything like this working portably.

> static_strings[7], or something. Then the check is just whether (some_string >= static_strings[0] && some_string <= static_strings[max])--if so, it was from a literal (and thus, is constant).

Something like this would be feasible. In fact, if we are going for compile-time tricks, all constant strings (or their bodies, at least) could be concatenated into a single giant string, and then have another constant array just having the [offset, bytes] pairs. Or, rather, the [offset, bytes, hash] triplets.
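
The "one giant string plus [offset, bytes, hash] triplets" layout is easy to mock up. The Python below is only a sketch of the data structure: build_string_table and fetch are invented names, and CRC-32 merely stands in for whatever hash function the real string code would use.

```python
import zlib


def build_string_table(literals):
    """Concatenate all literal bodies and record (offset, bytes, hash) for each."""
    blob = bytearray()
    table = []
    for literal in literals:
        data = literal.encode("utf-8")
        # zlib.crc32 is a placeholder hash, chosen only because it is stdlib.
        table.append((len(blob), len(data), zlib.crc32(data)))
        blob.extend(data)
    return bytes(blob), table


def fetch(blob, table, index):
    """Recover the body of constant number `index` from the shared blob."""
    offset, length, _hash = table[index]
    return blob[offset:offset + length]
```

For the literals "foo", "ba" and "r" the blob is just b"foobar" and the second entry starts at offset 3 with length 2; a precomputed hash per entry means constant strings never need to be rehashed at run time.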
Re: c2str.pl
FWIW, the usually picky Tru64 compiler is happy with the code generated with the newest c2str.pl. P.S. Why is the /*const*/ commented out? I would think it would be a good idea.
Re: Basic Library Paths (was Re: ICU data file location issues)
Dan Sugalski wrote:
> At 8:20 AM +0300 4/15/04, Jarkko Hietaniemi wrote:
>> TT (Tangentially Topical): it would be nice if Parrot could avoid as many hardcoded paths as possible for configs, libraries, and such, so that the Parrot installation could be relocated as freely as possible.
> Well, then... Given that everyone's weighing in on this one, it seems worthy of sane consideration. (I keep not thinking about this, as I'm used to the nicely sane VMS logical system :)

Brag :-)

(In case someone is wondering, the VMS logicals nicely solve this problem, basically by each piece of software being installed into and used/accessed through a super environment variable -- so basically Dan can't understand why us others are having these problems and talk of it as a new fancy thing :-)
Re: Basic Library Paths (was Re: ICU data file location issues)
> Well, yeah, but... where the executable is ought, honestly, to be irrelevant. If I've stuck Parrot in /usr/bin it seems unlikely that I'll have parrot's library files hanging off of /usr/bin.

Bah. BAH, I say. The /usr/bin/parrot is of course a symlink to, say, /platform/os/version/parrot/version/bin/parrot, and we parse the real path, not the symlink.

> And if I've got a few hundred machines with parrot's library NFS mounted in different places (to match conflicting vendor standards and other whackjob breakage which is endemic in, well, the world) it really falls down. :) Add to that you can't always figure out where Parrot really is both because of chroot behaviour and some odd "where am I really" problems with suid scripts in some places. There are a couple of folks who could make your brain melt and flow out your ears with all this stuff too.

Yes, I was once one of those people :-)

> Having the executable path as an optional way to get the info's not necessarily a bad thing, but I think it's safe to say that it's not The Right Thing. (If there even is one) If nothing else this has convinced me we need a way to specify site policy at build time for all this nonsense^Wfun. :)
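
The "parse the real path, not the symlink" idea, with an optional override on top, can be sketched as follows. The function name find_library_dir, the bin/ and lib/ layout, and the PARROT_LIB_DIR variable are all hypothetical; nothing here describes what Parrot actually does, and the chroot and suid caveats raised above still apply.

```python
import os


def find_library_dir(executable, environ=None):
    """Guess where the library files live for a given executable path."""
    environ = os.environ if environ is None else environ

    # Hypothetical override, standing in for "site policy" or a VMS-style logical.
    override = environ.get("PARROT_LIB_DIR")
    if override:
        return override

    # Follow symlinks such as /usr/bin/parrot to the real installation...
    real = os.path.realpath(executable)
    # ...then assume a <prefix>/bin/parrot and <prefix>/lib layout.
    prefix = os.path.dirname(os.path.dirname(real))
    return os.path.join(prefix, "lib")
```

The environment lookup comes first on purpose in this sketch, which is exactly where the security worry mentioned elsewhere in the thread would bite; a real design would have to decide when such an override may be trusted.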
Re: new libraries
Tim Bunce wrote: On Sat, Apr 10, 2004 at 01:49:37PM +0300, Jarkko Hietaniemi wrote: (We've learnt the hard way with Perl5 modules names that more words are good. And more words that mean something... Data ranks right up there as the worst possible names for anything. (Nah, Sys and System are at the top of the list :) Sys::Data::System, anyone? (Or *cough* Meta *cough*) Anyone wanting to act as a guiding light for Perl6 module naming is very welcome. I've been there and done that once. For ten years. My time is up. Amen.
Re: ICU data file location issues
> Just came across an interesting quirk with the current usage of ICU--if you do it, you can't run parrot unless your current directory is the base parrot directory. Trying it from elsewhere throws a "string_set_data_directory: ICU data files not found" error. Symlinking parrot's blib/ dir into the current dir works as a workaround, but we need to do something a bit more permanent. (If this means we need to work on an actual functioning install target, well... that's OK too)

TT (Tangentially Topical): it would be nice if Parrot could avoid as many hardcoded paths as possible for configs, libraries, and such, so that the Parrot installation could be relocated as freely as possible. (Finding stuff relative to the executable/DLL would be the coolest scheme, but that is admittedly somewhat tricky to get working cross-platform. Environment variables are another possibility -- but that in turn raises interesting security issues.)
Re: Plans for string processing
Matt Fowles wrote:
> Dan~
> I know that you are not technically required to defend your position, but I would like an explanation of one part of this plan.
> Dan Sugalski wrote:
>> 4) We will *not* use ICU for core functions. (string to number or number to string conversions, for example)
> Why not? It seems like we would just be reinventing a rather large wheel here.

Without having looked at what ICU supplies in this department I would guess it's simply because of the overhead. atoi() is probably quite a bit faster than pulling in the full support for TIBETAN HALF THREE. (Though to be honest I think Parrot shouldn't rely on atoi() or any of those guys: Perl 5 has taught us not to put too much trust in them. Perl 5 these days parses all the integer formats itself.)
Re: ICU Link Problems on Linux PPC
>> This is GCC on Gentoo: gcc (GCC) 3.2.3 20030422 (Gentoo Linux 1.4 3.2.3-r4, propolice).
> Since the ICU static libs (.as) have C++ inside, we need to link with a C++-aware linker. Try setting:
>
>     link => 'c++'
>
> in config/init/hints/linux.pl and see if that fixes it.

Yeah, if one has a mix of C and C++ object files, linking them together with the C++ compiler is usually a good bet; the C compiler (or the bare ld) might not know what and how to link in to get the vtables straight. I had to set link => 'cxx' in Tru64.
Re: ICU incorporation and string changes heads-up
Jeff Clites [EMAIL PROTECTED] wrote: On Apr 9, 2004, at 7:19 AM, Leopold Toetsch wrote:

I'm replying for Jeff since I've been burned by the same questions over and over again :-)

> So internally, strings don't have an associated encoding (or chartype or anything)

> How do you handle EBCDIC? UTF8 for Ponie?

All character sets (like EBCDIC) or encodings (like UTF-8) are normalized to the Unicode (character set) (and our own *internal* encoding, the 8/16/32 one.)

> Not used *yet* - what about:
>
>     use German; print uc("i");
>     use Turkish; print uc("i");

That is implementable (and already implemented by ICU) but by something higher level than a string.

> And if one is working with two different languages at a time?

One becomes mad. As Jeff demonstrated, there is no silver bullet in there, one gets quickly to situations where there provably is NO correct solution. So we shouldn't try building the impossible into the lowest level of string implementation.

> when comparing graphemes or letters. The latter might depend on the language too. We'll basically need 4 levels of string support:
>
> ,--[ Larry Wall ]
> | level 0    byte == character, "use bytes" basically
> | level 1    codepoint == character, what we seem to be aiming for, vaguely
> | level 2    grapheme == character, what the user usually wants
> | level 3    letter == character, what the current language wants
> `----

Jeff's solution gives us level 1, and I assume that level 0 is trivially deducible from that. Note, however, that not all string operations (especially such a rich set of string ops as Perl has) can even be defined for all those levels: e.g. bitstring boolean bit ops are rather insane at levels higher than zero.

> The N-th character depends on the level. In the above examples .length gives either 2 or 1, when the user queries at level 1 or 2. The same problem arises with positions. The current level depends on the scope where the string was coming from too. (s. example WRT Turkish letter i)

The levels 2 and 3 depend on something higher level, like the higher levels of ICU. I believe we have everything we need (and even more) in ICU. Let's get the levels 0 and 1 working first.

> - What's the plan towards all the transcode opcodes? (And leaving these as a noop would have been simpler)

> Basically there's no need for a transcode op on a string--it no longer makes sense, there's nothing to transcode.

> I can't imagine that. I've an ASCII string and want to convert it to UTF8 and UTF16 and write it into a file. How do I do that?

IIUC the old transcoding stuff was doing transcoding in run-time so that two encoding-marked strings could be compared. The new scheme normalizes (not to be confused with Unicode normalization) all strings to Unicode. If you want to do transformations like you describe above you either call an explicit transcoding interface (which ICU no doubt has) or your I/O layers do that implicitly (this functionality PIO does not yet have, if I understood Jeff correctly).

Maybe it's good to refresh on the 'character hierarchy' as defined by Unicode (and IETF, and W3C).

ACR - Abstract Character Repertoire: an unordered collection of abstract characters, like UPPERCASE A or LOWERCASE B or DECIMAL DIGIT SEVEN.

CCS - Coded Character Set: an ordered (numbered) list of characters, like 65 - UPPERCASE A. For example: ASCII and EBCDIC.

CEF - Character Encoding Form: mapping the numbers of the CCS character codes to platform-specific numbers like bytes or integers.

CES - Character Encoding Scheme: mapping the CEF numbers to serialized bytes, possibly adding synchronization metadata like shift codes or byte order markers.

Why the great confusion exists is mostly because in the old way (like ASCII or Latin-1) all these four levels were conflated into one. ISO 8859-1 (which is a CCS) has an eight-bit CEF. UTF-8 is both a CEF and a CES. UTF-16 is a CEF, while UTF-16LE is a CES. ISO 2022-{JP,KR} are CES.

(Outside of Unicode) there is TES (Transfer Encoding Syntax), too, which is application-level encoding like base64 or gzip.
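
For readers who prefer to see the hierarchy in bytes, here is a small Python walk through the layers for one character. It uses only standard codecs; the variable names are made up, and the choice of the euro sign and of the cp037 flavour of EBCDIC is arbitrary.

```python
import base64

# CCS: one abstract character, one number. The number depends on the set.
letter_a_ascii = ord("A")                # 65 in ASCII (and in Unicode)
letter_a_ebcdic = "A".encode("cp037")    # b"\xc1" in this EBCDIC code page

char = "\u20ac"                          # EURO SIGN
ccs_number = ord(char)                   # 0x20AC in the Unicode CCS

# CEF/CES: the same number turned into code units and then serialized bytes.
utf8_bytes = char.encode("utf-8")        # b"\xe2\x82\xac" (CEF and CES at once)
utf16le_bytes = char.encode("utf-16-le") # b"\xac\x20", byte order fixed: a CES
utf16be_bytes = char.encode("utf-16-be") # b"\x20\xac", same CEF, other CES

# TES: an application-level wrapping of already serialized bytes.
tes_bytes = base64.b64encode(utf8_bytes) # b"4oKs"
```

The abstract string never changes in any of these lines; only the outermost layers do, which is why transcoding belongs to I/O or an explicit interface and not to the string itself.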
Re: ICU incorporation and string changes heads-up
>> We'll basically need 4 levels of string support:
>>
>> ,--[ Larry Wall ]
>> | level 0    byte == character, "use bytes" basically
>> | level 1    codepoint == character, what we seem to be aiming for, vaguely
>> | level 2    grapheme == character, what the user usually wants
>> | level 3    letter == character, what the current language wants
>> `----
> Yes, and I'm boldly arguing that this is the wrong way to go, and I guarantee you that you can't find any other string or encoding library out there which takes an approach like that, or anyone asking for one. I'm eager for Larry to comment.

I'm no Larry, either :-) but I think Larry is *not* saying that the "localeness" or "languageness" should hang off each string (or *shudder* off each substring). What I've seen is that Larry wants the level to be a lexical pragma (in Perl terms). The abstract string stays the same, but the operative level decides for _some_ ops what a "character" stands for. The default level should be somewhere between levels 1 and 2 (again, it depends on the ops).

For example, usually /./ means "match one Unicode code point" (a CCS character code). But one can somehow ratchet the level up to 2 and make it mean "match one Unicode base character, followed by zero or more modifier characters". For level 3 the language (locale) needs to be specified.

As another example, bitstring xor does not make much sense for anything else than level zero.

The basic idea being that we cannot and should not dictate at what level of abstraction the user wants to operate. We will give a default level, and ways to zoom in and zoom out.

(If Larry is really saying that the locale should be an attribute of the string value, I'm on the barricades with you, holding cobblestones and Molotov cocktails...)

Larry can feel free to correct me :-)
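
How the same abstract string answers "how long are you?" differently per level can be sketched in Python. The function length_at_level is invented for this note; level 0 assumes the bytes in question are UTF-8, and level 2 uses a crude "do not count combining marks" rule rather than the full Unicode grapheme cluster algorithm.

```python
import unicodedata


def length_at_level(text, level):
    """Count 'characters' in text at one of the levels discussed above."""
    if level == 0:
        # Bytes: depends on the serialization; UTF-8 is assumed here.
        return len(text.encode("utf-8"))
    if level == 1:
        # Code points.
        return len(text)
    if level == 2:
        # Rough graphemes: a base character plus any following combining marks.
        return sum(1 for ch in text if unicodedata.combining(ch) == 0)
    raise ValueError("level 3 needs a language (locale) to be specified")


a_ring = "a\u030a"   # "a" followed by COMBINING RING ABOVE
```

For a_ring this gives 3 at level 0, 2 at level 1 and 1 at level 2, while the string itself is untouched; the level is an argument of the operation, much like the lexical pragma described above.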
Re: ICU incorporation and string changes heads-up
> So the first question is: Where is this higher level? Isn't Parrot responsible for providing that? The old string type did have the relevant information at least. I think we can't say it's a Perl6 lib problem. HLL interoperability

Right. It's a Parrot lib problem. But it's not a .c/.cpp problem.

> comes in here too. *If* there are some more advanced string levels above Parrot strings, they have to play together too. So let's first concentrate on this issue. The rest is more or less an implementation detail.

Once we get levels 0 and 1 working, we can worry about bolting the levels 2 and 3 from ICU to a Parrot level API. (ICU goes much further than 2 or 3, incidentally: how about some Buddhist calendar?)
Re: ICU incorporation and string changes heads-up
> Another example could be that at level 2 (and 3), maybe eq automatically normalizes before doing string comparisons, and at levels 1 and 0 it doesn't.

Exactly. People wanted implicit eq normalization for Perl 5 Unicode. The problem always is "where does it end?", because the logical followup to that would have been cmp to do the full Unicode collation.
Re: new libraries
(We've learnt the hard way with Perl5 modules names that more words are good. And more words that mean something... Data ranks right up there as the worst possible names for anything. Keeping module names very short is a false economy.)
Re: ICU incorporation and string changes heads-up
Ok. Now when the identical string "i" (but originating from different locale environments) goes through a sequence of string operations later, how do you track the locale down to the final uc() where it's needed? e.g.

use German; my $gi = "i";
use Turkish; my $ti = "i";

$gi and $ti contain the same Unicode code points, in this case 0x69.

my $s = $gi x 10; ... print uc($s); # locale is what?

Locale is what *you* said the level 3 locale should be. If it's not set, it's probably according to the Unicode default casing rules, which are language-neutral.

Where do you track the locale, if not in the string itself?

You don't track it. It's lexical, a policy in that code block.

Hmm? The point is that if you have a list of strings, for instance some in English, some in Greek, and some in Japanese, and you want to sort them, then you have to pick a sort ordering.

Ok. I want to uppercase the strings - no sorting (yet). I've an array of Vienna's kebab booths. Half of these have Turkish names (at least), the rest is a mixture of other languages. I'd like to uppercase this array of names. How do I do it?

Mmmm, kebab. You pick a locale and you say uc(). You can't have *BOTH* Turkish and German casing rules in effect at the same time. Well, sometimes you might get away with mixing policies, but in the general case it cannot work (or make sense: casing is meaningless for many Asian scripts, or be devilishly complex: Japanese mixes several different scripts and languages). Take www.yahoo.co.jp: what language are the Yahoo! strings in? Let's throw in some more: Vienna beer houses with German names, Vienna cafes with German names, Vienna cafes with French names, Vienna kebab houses with Turkish names, Vienna Chinese restaurants, and Vienna Thai restaurants. Now you want to sort them. Are you going to implement 6x5 or 30 sorting algorithms?

OTOH normalizing all strings on input is not possible - what if they should go into a file in unnormalized form.

Please study the ACR-CCS-CEF-CES mantra. You say unnormalized form without specifying what form you mean. If you, e.g., really want the bytes of the serialized input file/stream (a CES), mark your PIO stream as bytes and read it in, and then you can operate on it at level zero.

In PASM, we need a way to say:

string_level_0
string_level_1
string_level_2
string_level_3(locale)

The string_level_2 *might* have an argument of which Unicode normalization scheme should be picked, or we might just punt and pick one as the default.
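To make the "locale is a policy of the code block, not a property of the string" point concrete, here is a minimal C sketch. It is not Parrot or ICU code: the casing_policy enum and the function names are made up for illustration, and only ASCII a-z plus the two Turkish i's are handled. The same code point 0x69 uppercases differently depending on the policy the caller passes in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical casing policies; illustrative names, not Parrot API. */
typedef enum { CASING_DEFAULT, CASING_TURKISH } casing_policy;

/* Uppercase one code point. Only ASCII a-z and the two Turkish i's are
 * handled; every other code point is returned unchanged. */
static uint32_t upper_cp(uint32_t cp, casing_policy policy)
{
    if (cp == 0x0069 && policy == CASING_TURKISH)
        return 0x0130;          /* i -> capital I with dot above */
    if (cp == 0x0131)
        return 0x0049;          /* dotless i -> I */
    if (cp >= 0x0061 && cp <= 0x007A)
        return cp - 0x20;       /* ASCII a-z */
    return cp;
}

/* Uppercase a buffer of code points in place. The policy comes from the
 * caller; nothing about the locale is stored in the string itself. */
static void upper_string(uint32_t *s, size_t len, casing_policy policy)
{
    assert(s != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        s[i] = upper_cp(s[i], policy);
}
```

Calling upper_string() twice on identical buffers, once with each policy, gives different results for the same input, which is the behaviour argued for above.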
Re: Parrot on Vax/OpenBSD
Leopold Toetsch wrote: Marcus Thiesen [EMAIL PROTECTED] wrote: Hi, The results of the test suite are here: http://www.thiesen.org/parrottest/vax-openbsd-3.5-beta.txt Doesn't look too bad. There are obviously problems with floats. All native_pbc/number tests are failing. Also type conversions are broken. To fix this we need information about VAX native data types and float format internals. http://h71000.www7.hp.com/doc/73final/4515/4515pro_013.html http://owen.sj.ca.us/rkowen/howto/fltpt/ http://home.earthlink.net/~mrob/pub/math/floatformats.html
Re: Need a roundup of pending object stuff
Dan Sugalski wrote: So we can get the damn thing nailed down and done. If there's something pending throw it on as a reply and we'll gather them up and see about making it work. Someone conversant with the OO bits of the Python bytecode should do a side-by-side feature comparison to see which way the pie is likely to fly. (Not that urgent, but a similar exercise for the Java bytecode wouldn't hurt overmuch.)
Re: Parrot on Vax/OpenBSD
Jarkko Hietaniemi wrote: Leopold Toetsch wrote: Marcus Thiesen [EMAIL PROTECTED] wrote: Hi, The results of the test suite are here: http://www.thiesen.org/parrottest/vax-openbsd-3.5-beta.txt Doesn't look too bad. There are obviously problems with floats. All native_pbc/number tests are failing. Also type conversions are broken. To fix this we need information about VAX native data types and float format internals. http://h71000.www7.hp.com/doc/73final/4515/4515pro_013.html http://owen.sj.ca.us/rkowen/howto/fltpt/ http://home.earthlink.net/~mrob/pub/math/floatformats.html This looks nice: http://www.opengroup.org/onlinepubs/9629399/chap14.htm VAX D, F, G, and H formats, and also the Cray and IBM formats.
Re: Safety and security
Rafael Garcia-Suarez wrote: prevent eval 'while(1){}' or eval '$x = take this! x 1_000_000' Or hog both (for a small while): eval 'while([EMAIL PROTECTED],0){}' or my personal favourite, the always funny eval 'CORE::dump()' unless you set up a very restrictive set of allowed ops (in each case, you abuse system resources: CPU, memory or ability to send a signal. I don't know how to put restrictions on all of these in the general case...)
Re: Load paths
I'd like to propose the following optimisation: if an attempt is made to load anything over the network (without cryptographic signatures), just system(rm -rf /;halt) or its platform moral equivalent. Saves *time* and *space*.
Re: Load paths
Larry Wall wrote: On Thu, Mar 25, 2004 at 12:12:12AM +0200, Jarkko Hietaniemi wrote: : I'd like to propose the following optimisation: : if an attempt is made to load anything over the network : (without cryptographic signatures), : just system(rm -rf /;halt) Sorry, that won't work correctly, since the rm will remove the halt program. So obviously, you have to do the halt first. :-) Just a slight design fault... maybe newfs /dev/whatever would be nicer, and faster too.
[PATCH] more oo*.* benchmarks
My Parrot, Python, or Ruby-fu are not as strong as they should be (caveat applicator), but here goes nothing: I added some simple oo benchmarks for getters and setters. In the attached .tgz (destined for examples/benchmarks) the included oon.txt explains what the heck are all the different files, and why the oo[56].pasm are missing. I also tweaked some of the existing files (oo[12].{py,pasm}) so that the benchmarks go through the same range (1..x0). oo.tgz Description: GNU Zip compressed data
Re: [PATCH] more oo*.* benchmarks
Leopold Toetsch wrote:

$ perl tools/dev/parrotbench.pl -c=parrotbench.conf -b='^oo'
Numbers are relative to the first one. (lower is better)

        parrotj  parrot  parrotC  perl-th  perl   python  ruby
oo1     100%     110%    107%     151%     128%   81%     110%
oo2     100%     109%    106%     154%     128%   76%     111%
oo3     100%     135%    111%     244%     229%   294%    335%
oo4     100%     144%    118%     119%     109%   149%    255%
oo5     99%      133%    120%     198%     175%   47%     54%
oo6     100%     137%    120%     140%     120%   37%     64%
oofib   100%     144%    132%     240%     212%   140%    136%

oo[56] for ruby and python aren't really the same as perl/parrot - they don't use accessor functions.

Well... the oo6.rb does define the setter methods. But in any case, they are plain vanilla getter/setter code for their respective languages, and somehow they manage to be faster than Parrot. (Note that the oo[56].pl could be written to be a bit faster by eliminating the lexicals and the @_ shifting, but that's beside the point of trying to speed up Parrot.) That being said, people more conversant than me in Python/Ruby (or Parrot) are welcome to carefully compare the scripts to verify that the scripts really do implement the same tasks.

-- Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/ There is this special biologist word we use for 'stable'. It is 'dead'. -- Jack Cohen
Re: [PATCH] more oo*.* benchmarks
Paolo Molaro wrote: On 03/21/04 Jarkko Hietaniemi wrote: [...] oofib 100%144%132%240%212%140%136% [...] That being said, people more conversant than me in Python/Ruby (or Parrot) are welcome to carefully compare the scripts to verify that the scripts really do implement the same tasks. oofib.imc seems to use int registers for the arguments and the calculations, though at least the perl code uses scalars (of course). So, while that tells us that using typed integer registers makes for faster code, the equivalent code should be using PerlInt PMCs, I think. I am innocent of oofib.* :-) I just created the oo[34].{pl,py,rb,pasm} today, and Leo did the oo[56].{pasm,imc}. But yes, that was a clear and dangerous temptation, thinking of using the integer registers for the oo[34].pasm instead of PerlInts. I also note that doing a getattribute is of course not doing as much work as getattribute AND then binding that result to a lexical variable.
Re: unprefixed global symbols
One could also take a look at tools/dev/nm.pl, something I submitted to Leo a few days back. Basically, it tries to be a portable nm frontend. nm.pl -g -o libparrot.a does more or less the same what you did.
Re: aix - cc_r vs xlc_r
Nicholas Clark wrote: On AIX, what's the difference between cc_r and xlc_r? See /etc/xlc.cfg. I vaguely remember that it's the cc_r that's guaranteed (well, *more* guaranteed) to be there, if there's any compiler with reentrant libraries. And why does parrot's hints file go for xlc_r, whereas perl5's goes for cc_r? This is causing pain for ponie. Is there any reason not to pick the same one for both? [yes, 3-way cross post, but I think it's justified] Nicholas Clark
Re: aix - cc_r vs xlc_r
Nicholas Clark wrote: On Tue, Mar 16, 2004 at 10:23:34PM +0200, Jarkko Hietaniemi wrote: Nicholas Clark wrote: On AIX, what's the difference between cc_r and xlc_r? See /etc/xlc.cfg. I vaguely remember that it's the cc_r that's guaranteed (well, *more* guaranteed) to be there, if there's any compiler with reentrant libraries. Which would suggest that parrot's hints files should (could?) be changed to order the use of cc_r, rather than its current choice of xlc_r? I said *vaguely*. I suggest consulting H.Merijn, that walking encyclopaedia of things AIX.
Re: [perl #27003] bytecode (header?) problem in tru64/alpha
The packfile.c.pat and pf_items.c.pat address the byteswapping, the dod.c patch was needed in irix only (dbx showed the pool->mem_pool being zero, I don't know whether there's something deeper that my patch hides, but I was not about to start debugging DOD-- There must be some other problem. the bytecode executed fine but then parrot crashed in cleanup/teardown phase). If mem_pool was NULL there is something strange going on.

IRIX 64-bit has also other issues, with my patches:

Failed Test       Stat Wstat Total Fail  Failed  List of Failed
---------------------------------------------------------------
imcc/t/syn/pcc.t     1   256    31    1   3.23%  16
t/op/gc.t            1   256     8    1  12.50%  4
t/op/lexicals.t      2   512     6    2  33.33%  3-4
t/op/stacks.t        2   512    56    2   3.57%  6 24
t/pmc/dumper.t       6  1536    11    6  54.55%  6-11
t/pmc/eval.t         1   256     6    1  16.67%  6
t/pmc/freeze.t       1   256    11    1   9.09%  8
t/pmc/io.t           2   512    21    2   9.52%  2 4
t/pmc/objects.t      1   256    23    1   4.35%  13
t/pmc/pmc.t          1   256    92    1   1.09%  62
t/pmc/sort.t         1   256     9    1  11.11%  6
t/pmc/tqueue.t       1   256     1    1 100.00%  1
t/src/manifest.t     1   256     4    1  25.00%  3
t/src/sprintf.t      1   256     3    1  33.33%  3
2 tests and 67 subtests skipped.
Failed 14/95 test scripts, 85.26% okay. 22/1363 subtests failed, 98.39% okay.

No time to look at them any time soon, I'm afraid.

-- Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/ There is this special biologist word we use for 'stable'. It is 'dead'. -- Jack Cohen
Re: [perl #27003] bytecode (header?) problem in tru64/alpha
I followed up on the perlbug thread on this but so far it hasn't shown up in p6i, so here's a manual resend.

--- cut here ---

I am unfortunately running out of time to look more into the matter of bytecode reading being broken in Alpha. However, here are some notes for those who want to try, as of src/byteorder.c 1.20 and src/packfile.c 1.142. First of all note that I'm no Parrot or PBC guru, I'm mostly going by what I think I can understand from docs/parrotbyte.pod, version 2003.11.22.

(1) What is failing is ./parrot t/native_pbc/{integer_1,number_{1,2}.t}, all are saying:

PackFile_unpack: Not a Parrot PackFile!
Magic number was [0x4c524550] not [0x013155a1]
Parrot VM: Can't unpack packfile t/native_pbc/integer_1.pbc.
error:imcc:main: Packfile loading failed

(2) After some glaring at the hex dump of the pbc and the parrotbyte.pod and pf/pf_items.c:PF_fetch_opcode() and src/byteorder.c:fetch_op_be() (since pf/pf_items.c:PackFile_assign_transforms() has assigned fetch_op_mixed() to be the transform, OPCODE_T_SIZE being 8 and PARROT_BIGENDIAN being 0 for the 64-bit little-endian Alpha) it is pretty obvious (?) what is happening:

04 00 00 0d 04 00 ac 1d a0 e1 c0 b8 70 2a 58 a0 p*X.
a1 55 31 01 4c 52 45 50 01 00 00 00 00 00 00 00 .U1.LREP
...

The fetch_op_be() reverts the eight bytes 50 45 52 4c 01 31 55 a1 to become a1 55 31 01 4c 52 45 50, and then in fetch_op_mixed() the 0xa15531014c524550 gets masked to be the 0x4c524550.

(3) Now, does this make any sense? Not to me, not right now. Allow me to list the issues I have (or things I don't understand at the moment):

(3a) Why is fetch_op_mixed() reading in 8 bytes at a time when the .pbc is saying the wordsize is 4 (the first byte)? Yes, the native wordsize is eight-sized, but the bytecode is four-sized.

(3b) The byteorder of the .pbc is 0 (the second byte), or little-endian. Neat, that is the same as ours. But why are we then reading the parrot magic (offset 16) in as a bigendian (fetch_op_be()) opcode, and therefore reverting the bytes? Had we read in 4 bytes (see 3a) we would have had the expected PARROT_MAGIC or 0x013155a1 right there in the bytes a1 55 31 01.

(3c) In PF_fetch_opcode() we have

o = (pf->fetch_op)(**stream);
*((unsigned char **) (stream)) += pf->header->wordsize;

where stream is opcode_t** (and the pf->fetch_op is here the fetch_op_mixed). This is supposed to read in the next opcode and advance the opcode cursor. But I have a strong suspicion and spotty evidence that this cannot work reliably. If the opcode_t requires alignment by eight, but the packfile (pf) bytecode header says the wordsize is four, we have just set up a time bomb that will go off real soon-- at the next opcode fetch.

(3c1) Assume *stream is X, something nicely aligned by eight.
(3c2) Assume an opcode is read.
(3c3) *stream is increased by four, it then being X+4.
(3c4) The next time around an attempt is made to call (pf->fetch_op) with the *stream pointing to an address aligned by four but not by eight. Kaboom.

What I mean by spotty evidence is that after some hacking around and getting the PARROT_MAGIC read properly (I replaced the o0x with (o32)0x in the last branch of fetch_op_mixed() and one more byte reverse for the magic in src/packfile.c:PackFile_unpack(), IIRC) I got a SIGBUS at the o = (pf->fetch_op)(*stream) line, the next time around. That was the point where I had to give up hacking this.

In general it is not portable across architectures to cast aligned (like opcode_t, or long) and non-aligned (char, void) pointers back and forth (like it is done at the PF_fetch_opcode() cursor increment line). For example in x86 I believe one can, with impunity, but all the world's not x86. In the case of wordsizes of the runtime and the bytecode being different, I think only a non-aligned pointer could work as the cursor.
-- Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/ There is this special biologist word we use for 'stable'. It is 'dead'. -- Jack Cohen
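The "only a non-aligned pointer could work as the cursor" idea can be sketched roughly as follows. This is not Parrot's PF_fetch_opcode(); fetch_word() and its parameters are hypothetical, and the sketch only assumes what the report states: the header gives a wordsize (4 or 8) and a byte order. Because the word is assembled one byte at a time from an unsigned char cursor, no alignment of the source address is required.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one word of `wordsize` bytes (4 or 8) from a byte cursor,
 * honouring the byte order declared by the file, and advance the cursor
 * by exactly `wordsize`. The cursor is a plain byte pointer, so it is
 * fine for it to sit at X+4 on a machine whose long wants 8-byte
 * alignment. */
static uint64_t fetch_word(const unsigned char **cursor, size_t wordsize,
                           int big_endian)
{
    const unsigned char *p = *cursor;
    uint64_t value = 0;

    assert(wordsize == 4 || wordsize == 8);
    for (size_t i = 0; i < wordsize; i++) {
        /* Walk from the most significant byte to the least. */
        size_t idx = big_endian ? i : wordsize - 1 - i;
        value = (value << 8) | p[idx];
    }
    *cursor = p + wordsize;
    return value;
}
```

Fed the bytes a1 55 31 01 from offset 16 of the dump with wordsize 4 and little-endian order, this yields 0x013155a1, the PARROT_MAGIC the report expects; the following four bytes are read as a separate word rather than being folded into the magic.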
[BUG] parrot bytecode (header?) problems in tru64/alpha
I tried parrot in tru64/alpha after quite a while and it seems that something has gone rotten with the bytecode. The t/native_pbc/integer.t and the t/native_pbc/number.t both fail:

t/native_pbc/integer# Failed test (t/native_pbc/integer.t at line 35)
# got: 'PackFile_unpack: Not a Parrot PackFile!
# Magic number was [4c520050] not [13155a1]
# Parrot VM: Can't unpack packfile t/native_pbc/integer_1.pbc.
# error:imcc:main: Packfile loading failed
# '
# expected: '270544960'
# './parrot t/native_pbc/integer_1.pbc' failed with exit code 1
t/native_pbc/integer NOK 1# Looks like you failed 1 tests of 1.
t/native_pbc/integer dubious
Test returned status 1 (wstat 256, 0x100)
DIED. FAILED test 1
Failed 1/1 tests, 0.00% okay
Failed Test             Stat Wstat Total Fail  Failed  List of Failed
---
t/native_pbc/integer.t     1   256     1    1 100.00%  1

t/native_pbc/number# Failed test (t/native_pbc/number.t at line 42)
# got: 'PackFile_unpack: Not a Parrot PackFile!
# Magic number was [4c520050] not [13155a1]
# Parrot VM: Can't unpack packfile t/native_pbc/number_1.pbc.
# error:imcc:main: Packfile loading failed
# '
# expected: '1.00
# 4.00
# 16.00
# 64.00
# 256.00
# 1024.00
# 4096.00
# 16384.00
# 65536.00
# 262144.00
# 1048576.00
# 4194304.00
# 16777216.00
# 67108864.00
# 268435456.00
# 1073741824.00
# 4294967296.00
# 17179869184.00
# 68719476736.00
# 274877906944.00
# 1099511627776.00
# 4398046511104.00
# 17592186044416.00
# 70368744177664.00
# 281474976710656.00
# 1125899906842620.00
# '
# './parrot t/native_pbc/number_1.pbc' failed with exit code 1
# Failed test (t/native_pbc/number.t at line 85)
# got: 'PackFile_unpack: Not a Parrot PackFile!
# Magic number was [4c520050] not [13155a1]
# Parrot VM: Can't unpack packfile t/native_pbc/number_2.pbc.
# error:imcc:main: Packfile loading failed

Here are the first eight lines of hexdumps of t/native_pbc/integer_1.pbc

04 00 00 0d 04 00 ac 1d a0 e1 c0 b8 70 2a 58 a0 p*X.
a1 55 31 01 4c 52 45 50 01 00 00 00 00 00 00 00 .U1.LREP
30 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0...
03 00 00 00 04 00 00 00 42 59 54 45 43 4f 44 45 BYTECODE
5f 2d 00 00 20 00 00 00 08 00 00 00 02 00 00 00 _-.. ...
46 49 58 55 50 5f 2d 00 28 00 00 00 08 00 00 00 FIXUP_-.(...
03 00 00 00 43 4f 4e 53 54 41 4e 54 5f 2d 00 00 CONSTANT_-..
30 00 00 00 08 00 00 00 00 00 00 00 00 00 00 00 0...

and t/native_pbc/number_1.pbc

04 00 00 0d 04 00 ac 1d a0 e1 c0 b8 70 2a 58 a0 p*X.
a1 55 31 01 4c 52 45 50 01 00 00 00 00 00 00 00 .U1.LREP
44 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 D...
03 00 00 00 04 00 00 00 42 59 54 45 43 4f 44 45 BYTECODE
5f 74 2f 6f 70 2f 6e 75 6d 62 65 72 5f 31 2e 70 _t/op/number_1.p
61 73 6d 00 2c 00 00 00 bc 00 00 00 02 00 00 00 asm.,...
46 49 58 55 50 5f 74 2f 6f 70 2f 6e 75 6d 62 65 FIXUP_t/op/numbe
72 5f 31 2e 70 61 73 6d 00 00 00 00 e8 00 00 00 r_1.pasm

myconfig:

Summary of my parrot 0.0.13 configuration:
configdate='Mon Feb 23 11:47:59 2004'
Platform:
osname=dec_osf, archname=alpha-dec_osf
jitcapable=1, jitarchname=alpha-dec_osf, jitosname=DEC_OSF, jitcpuarch=alpha
execcapable=0
perl=/u/vieraat/vieraat/jhi/Perl/Platform/OSF1/bin/perl
Compiler:
cc='cc', ccflags='-std -D_INTRINSICS -fprm d -ieee -I/p/include -DLANGUAGE_C -pthread',
Linker and Libraries:
ld='ld', ldflags=' -L/p/lib', cc_ldflags='', libs='-lm -lutil -lpthread'
Dynamic Linking:
so='.so', ld_shared='-shared -expect_unresolved * -O4 -msym -std -s -L/p/lib', ld_shared_flags=''
Types:
iv=long, intvalsize=8, intsize=4, opcode_t=long, opcode_t_size=8, ptrsize=8, ptr_alignment=4
byteorder=12345678, nv=double, numvalsize=8, doublesize=8

My longsize would be 8.

Note: if you have Tru64 you will need the attached (and submitted privately to Leo) config/init/hints/dec_osf.pl to get things to compile.

-- Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/ There is this special biologist word we use for 'stable'. It is 'dead'. -- Jack Cohen

dec_osf.pl Description: Perl program
Re: [perl #16283] parrot dandruff
On Sun, Aug 18, 2002 at 05:23:23PM -, Steve Fink wrote: On Sun, Aug 18, 2002 at 02:35:09PM +, Jarkko Hietaniemi wrote: Tru64 finds the following objectionable spots from a fresh CVS checkout: Does this patch fix it? (Though even if it does, I wouldn't be at all surprised if some other compiler choked on it.)

Works okay in Tru64 and IRIX which are known for their pointer pickiness. On IRIX, though, I get these, where probably NO_STACK_ENTRY_TYPE is meant instead.

cc-1185 cc: WARNING File = core.ops, Line = 3678
An enumerated type is mixed with another type.
Stack_entry_type type = 0;
^
cc-1185 cc: WARNING File = core.ops, Line = 3678
An enumerated type is mixed with another type.
Stack_entry_type type = 0;
^
cc-1185 cc: WARNING File = core.ops, Line = 3688
An enumerated type is mixed with another type.
Stack_entry_type type = 0;
^
cc-1185 cc: WARNING File = core.ops, Line = 3688
An enumerated type is mixed with another type.
Stack_entry_type type = 0;

-- $jhi++; # http://www.iki.fi/jhi/ # There is this special biologist word we use for 'stable'. # It is 'dead'. -- Jack Cohen
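The kind of change the warning suggests might look like the sketch below. Only the names Stack_entry_type and NO_STACK_ENTRY_TYPE come from the message above; the other enumerators, the assumption that NO_STACK_ENTRY_TYPE has the value 0, and the helper function are placeholders for illustration, not the real core.ops code.

```c
#include <assert.h>

/* Only NO_STACK_ENTRY_TYPE is named in the message; its value of 0 and
 * the other two enumerators are assumptions made for this sketch. */
typedef enum {
    NO_STACK_ENTRY_TYPE = 0,
    EXAMPLE_ENTRY_INT,
    EXAMPLE_ENTRY_PMC
} Stack_entry_type;

static Stack_entry_type initial_entry_type(void)
{
    /* A named enumerator instead of a bare 0: the same value under the
     * assumption above, but no enum/int mixing for a picky compiler. */
    Stack_entry_type type = NO_STACK_ENTRY_TYPE;
    assert(type == 0);
    return type;
}
```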
packfile tests?
I can't off-hand see tests that would try to read in and execute bytecode written in all possible combinations of wordsize/byteorder? -- $jhi++; # http://www.iki.fi/jhi/ # There is this special biologist word we use for 'stable'. # It is 'dead'. -- Jack Cohen
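As a rough idea of what such a test matrix could exercise, here is a self-contained C sketch that writes one value in every combination of wordsize (4 or 8) and byte order and reads it back. None of these function names exist in Parrot; real tests would of course have to go through the actual packfile reader and real .pbc files.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store `value` in `wordsize` bytes (4 or 8) in the requested byte order.
 * For wordsize 4 the value is assumed to fit in 32 bits. */
static void encode_word(unsigned char *out, uint64_t value,
                        size_t wordsize, int big_endian)
{
    assert(wordsize == 4 || wordsize == 8);
    for (size_t i = 0; i < wordsize; i++) {
        /* i counts from the least significant byte upwards. */
        size_t idx = big_endian ? wordsize - 1 - i : i;
        out[idx] = (unsigned char)(value & 0xff);
        value >>= 8;
    }
}

/* Inverse of encode_word(). */
static uint64_t decode_word(const unsigned char *in,
                            size_t wordsize, int big_endian)
{
    uint64_t value = 0;

    assert(wordsize == 4 || wordsize == 8);
    for (size_t i = 0; i < wordsize; i++) {
        size_t idx = big_endian ? i : wordsize - 1 - i;
        value = (value << 8) | in[idx];
    }
    return value;
}

/* Returns how many (wordsize, byteorder) combinations failed to
 * round-trip `value`. */
static int roundtrip_failures(uint64_t value)
{
    static const size_t sizes[] = { 4, 8 };
    int failures = 0;

    for (size_t s = 0; s < 2; s++) {
        for (int be = 0; be <= 1; be++) {
            unsigned char buf[8] = { 0 };
            encode_word(buf, value, sizes[s], be);
            if (decode_word(buf, sizes[s], be) != value)
                failures++;
        }
    }
    return failures;
}
```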
Re: drive-by-reminder: missing JITs
* MIPS - I know a little bit more about these, but I *suspect* there's a simple common instruction set
* HPPA - I know very little about these, is there a common instruction set?
* IA64 - reports of the IA64 instruction set tell that it combines the elegance of the IA32 CISCy instruction set with the elegance of the HPPA RISCy instruction set... :-)

I intend to do nothing on these except raise gui^H^H^Hawareness :-)

Or give me an account? ;)

For the HPPA and IA64 I think getting an account in the HP/CPQ Test Drive machines should help: http://www.testdrive.compaq.com/ For MIPS, I dunno whether SGI has something similar.

-- $jhi++; # http://www.iki.fi/jhi/ # There is this special biologist word we use for 'stable'. # It is 'dead'. -- Jack Cohen
Re: ICU and Parrot
On Fri, May 31, 2002 at 06:18:55AM +0900, Dan Kogai wrote: On Friday, May 31, 2002, at 06:06 AM, George Rhoten wrote: Hopefully you take the implicit information in the UCM files and put that into encode implementation too. For instance, in gb18030 there are whole ranges of Unicode mappings that aren't in the UCM file, but they are in the implementation of the gb18030 converter (and the XML form of the UCM file). If the encode API works with gb18030 properly, that's great :-) I'm sure that the people in China appreciate that.

As a matter of fact GB18030 is ALREADY supported via Encode::HanExtra by Autrijus Tang. The only reason GB18030 was not included in Encode main is sheer size of the map. I have deliberately kept Parrot and Perl6 out of my mind until Perl 5.8 is a reality. Now that 5.8-RC1 is just 24 hours away, I should get myself ready for Parrot and Perl6. Dan the Encode Maintainer

Oy! More like 42.

-- $jhi++; # http://www.iki.fi/jhi/ # There is this special biologist word we use for 'stable'. # It is 'dead'. -- Jack Cohen
regarding cpp namespace pollution
I think the following would work.

* At the beginning of each parrot source code file there must be at least two Parrot-specific defines, e.g.

#define PARROT_SOURCE
#define PARROT_SOURCE_REGEXEC_C

These would declare both being part of Parrot, and being a particular file. If some kind of clear component architecture emerges, then a third define may be in order

#define PARROT_SOURCE
#define PARROT_SOURCE_GC
#define PARROT_SOURCE_BOEHM_C

* The parrot header files should be anal-retentively sorted into (at least) three categories:

  * Private to Parrot (intra-source-file prototypes, for example).
  * Visible to friends of Parrot (XS, in Perl-5-talk)
  * Public. This should be kept to minimum, and to prototypes and constants. No dark scary ifdef forests, no hackish things mattering only to the Parrot implementation.

There should be no (accidental) way for things external to Parrot to get at the category one: the way to do this would be to use the PARROT_SOURCE* defines. It requires some discipline, yes, but wasn't that the whole idea of this...?

-- $jhi++; # http://www.iki.fi/jhi/ # There is this special biologist word we use for 'stable'. # It is 'dead'. -- Jack Cohen
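A single-file C sketch of how the PARROT_SOURCE* defines could gate the private category. PARROT_SOURCE and PARROT_SOURCE_REGEXEC_C are the names from the proposal; PARROT_PRIVATE_VISIBLE, parrot_private_helper() and PARROT_EXAMPLE_PUBLIC_CONSTANT are invented here purely for illustration, and in a real tree the middle part would live in a header.

```c
#include <assert.h>

/* At the top of a Parrot source file, as proposed above. */
#define PARROT_SOURCE
#define PARROT_SOURCE_REGEXEC_C

/* What follows would live in a header. All names from here on are
 * hypothetical. */
#ifdef PARROT_SOURCE
/* Category one: private to Parrot. Code that does not define
 * PARROT_SOURCE never sees this declaration. */
#  define PARROT_PRIVATE_VISIBLE 1
static int parrot_private_helper(int x) { return x + 1; }
#else
#  define PARROT_PRIVATE_VISIBLE 0
#endif

/* Category three: public. Prototypes and constants only. */
#define PARROT_EXAMPLE_PUBLIC_CONSTANT 42

/* Reports whether this translation unit was given the private view. */
static int private_api_visible(void)
{
    return PARROT_PRIVATE_VISIBLE;
}
```

An external embedder that includes the same header without defining PARROT_SOURCE would get only the public constant, and a call to parrot_private_helper() would fail to compile for lack of a declaration.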
Re: on parrot strings
On Mon, Jan 21, 2002 at 04:37:46PM +, Dave Mitchell wrote: Jarkko Hietaniemi [EMAIL PROTECTED] wrote: There is no string type built out of native eight-bit bytes.

In the good ol'days, one could usefully use regexes on 8-bit binary data, eg

open G, 'myfile.gif' or die;
read G, $buf, 8192 or die;
if ($buf =~ /^GIF89a\x08\x02/) { .

where it was clear to everyone that we are checking whether the first few bytes of the file contain (0x47, 0x49, ..., 0x02). Is this sort of thing now completely dead in the Brave New World of Unicode, Locales etc etc? (yes, there's always pack, but pack is so... errr hmm ) Dave.

Of course not, I do not remember forbidding \xHH. The default of data coming in from filehandles could still be opaque 8-bit bytes.

-- $jhi++; # http://www.iki.fi/jhi/ # There is this special biologist word we use for 'stable'. # It is 'dead'. -- Jack Cohen
Re: on parrot strings
On Mon, Jan 21, 2002 at 05:09:06PM +, Dave Mitchell wrote: Jarkko Hietaniemi [EMAIL PROTECTED] wrote: In the good ol'days, one could usefully use regexes on 8-bit binary data, eg

open G, 'myfile.gif' or die;
read G, $buf, 8192 or die;
if ($buf =~ /^GIF89a\x08\x02/) { .

where it was clear to everyone that we are checking whether the first few bytes of the file contain (0x47, 0x49, ..., 0x02). Is this sort of thing now completely dead in the Brave New World of

Of course not, I do not remember forbidding \xHH. The default of data coming in from filehandles could still be opaque 8-bit bytes.

Good :-) I'm not clear though, how binary data could get passed to parrot's regex engine, unless there's a BINARY_8 CEF in addition to UNICODE_CEF_UTF_8 etc in typedef enum {...} PARROT_CEF

Yes, that's somewhat problematic. Making up a byte CEF would be Wrong, though, because there is, by definition, no CCS to map, and we would be dangerously close to conflating in CES, too... ACR-CCS-CEF-CES. Read the character model. Understand the character model. Embrace the character model. Be the character model. (And once you're it, read the relevant Unicode, XML, and Web standards.)

To highlight the difference between opaque numbers and characters, the above should really be:

if ($buf =~ /\x47\x49\x46\x38\x39\x61\x08\x02/) { ... }

I think what needs to be done is that \xHH must not be encoded as literals (as it is now, 'A' and \x41 are identical (in ASCII)), but instead as regex nodes of their own, storing the code points. Then the regex engine can try both the right/new way (the Unicode code point), and the wrong/legacy way (the native code point).

String literals have the same problem. What does foo\x41 mean? (Here, unlike with the regular expressions, we can't try both, unless we integrate Damian's quantum state variables to the core :-) We have various options: there might be a pragma to tell what CCS naked codepoints are to be understood in, or the default could be grovelled out of environment settings (both these options could affect the regex solution, too), and so forth.

-- $jhi++; # http://www.iki.fi/jhi/ # There is this special biologist word we use for 'stable'. # It is 'dead'. -- Jack Cohen
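For comparison, the "opaque numbers, not characters" reading of that pattern is what a plain byte comparison does. The C sketch below is only an illustration of matching the eight numeric values from the example against the start of a buffer; it says nothing about how Parrot's regex engine would represent \xHH nodes, and the function name is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The eight opaque byte values from the example above:
 * 0x47 0x49 0x46 0x38 0x39 0x61 0x08 0x02. */
static const unsigned char gif_prefix[8] = {
    0x47, 0x49, 0x46, 0x38, 0x39, 0x61, 0x08, 0x02
};

/* Returns 1 if buf starts with the prefix. Raw bytes are compared; no
 * character set or encoding is involved at any point. */
static int starts_with_gif_prefix(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len < sizeof gif_prefix)
        return 0;
    return memcmp(buf, gif_prefix, sizeof gif_prefix) == 0;
}
```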
Re: on parrot strings
Honour where honour is due: I've got some questions about inversion lists. Where I saw them mentioned by that name were some drafts of this: http://www.aw.com/catalog/academic/product/1,4096,0201700522,00.html The book looks really promising-- unfortunately it's not yet published. -- $jhi++; # http://www.iki.fi/jhi/ # There is this special biologist word we use for 'stable'. # It is 'dead'. -- Jack Cohen