from:"Jarkko Hietaniemi"

Re: [perl #61038] parrot 0.8.0 compilation failure in Tru64 5.1B

2008-12-23 Thread Jarkko Hietaniemi

chromatic via RT wrote:
 On Wednesday 03 December 2008 18:00:32 Jarkko Hietaniemi wrote:
 
 First we get a couple of warnings from some files, but then one file
 refuses to compile (see below).  I didn't notice any other warnings or
 failures during Configure.pl and/or during compilation.
 
 Thanks for the report.

Thanks for looking into it.  I synced to r34297 and it seems to compile
in Tru64, thanks!

I am seeing some new warnings, if I find the time I'll file a new bug on
those.  An easy quick one to fix would be this:

cc: Info: ./include/parrot/sub.h, line 47: Trailing comma found in
enumerator list. (trailcomma)
} sub_flags_enum;
^

Trailing commas in enum lists are not portable across cranky C compilers.
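
A minimal sketch of the portable form, assuming hypothetical enumerator names (the actual members of sub_flags_enum are not shown in this thread). C99 accepts a comma after the last enumerator, C89 does not, so the form below should compile on both:

```c
/* Hypothetical flag names for illustration only; these are not the
 * real contents of include/parrot/sub.h. */
typedef enum {
    EXAMPLE_FLAG_FIRST  = 0x01,
    EXAMPLE_FLAG_SECOND = 0x02,
    EXAMPLE_FLAG_LAST   = 0x04   /* no trailing comma: valid C89 and C99 */
} example_flags_enum;

/* Combine two flags into a bit mask. */
int example_flags_combine(example_flags_enum a, example_flags_enum b)
{
    return (int)a | (int)b;
}
```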

Re: [perl #61038] parrot 0.8.0 compilation failure in Tru64 5.1B

2008-12-23 Thread Jarkko Hietaniemi

chromatic via RT wrote:
 On Tuesday 23 December 2008 14:53:15 Jarkko Hietaniemi wrote:
 
 I am seeing some new warnings, if I find the time I'll file a new bug on
 those.  An easy quick one to fix would be this:

 cc: Info: ./include/parrot/sub.h, line 47: Trailing comma found in
 enumerator list. (trailcomma)
 } sub_flags_enum;
 ^

 Trailing commas in enum lists are not portable across cranky C compilers.
 
 Fixed in r34299, thanks.  I cranked up the optimization level to -O2 and am 
 fixing as many warnings as possible with GCC 4.3, but I'm sure that leaves 
 plenty for pickier compilers to complain about.
 
 -- c

Another large batch of errors seemingly came from these in nci.c:

cc: Info: src/nci.c, line 6614: In this statement, pcf_v_JOS of type
pointer to function (pointer to struct parrot_interp_t, pointer to
struct PMC) returning void, is being converted to pointer to void.
Such a cast is not permitted by the standard. (nonstandcast)
PMC_data(temp_pmc) = (void *)pcf_v_JOS;
-^

More cowbell, errr, D2FPTR().
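
One way to avoid a direct cast between function and data pointers is to copy the pointer's bytes instead. The sketch below is only an illustration of that idea: the helper names are hypothetical, it is not Parrot's actual D2FPTR/F2DPTR definition, and whether it silences the nonstandcast diagnostic on a given compiler is not confirmed by this thread. It assumes both pointer kinds have the same size, which ISO C does not promise; the typedef turns a violation of that assumption into a compile error.

```c
#include <string.h>

typedef void (*example_funcptr_t)(void);

/* Fails to compile (negative array size) if the size assumption is false. */
typedef char example_ptr_size_check[
    (sizeof(void *) == sizeof(example_funcptr_t)) ? 1 : -1];

/* Function pointer -> data pointer, by copying the representation. */
void *example_func_to_data(example_funcptr_t fn)
{
    void *data = NULL;
    memcpy(&data, &fn, sizeof data);
    return data;
}

/* Data pointer -> function pointer, the reverse copy. */
example_funcptr_t example_data_to_func(void *data)
{
    example_funcptr_t fn = NULL;
    memcpy(&fn, &data, sizeof fn);
    return fn;
}

/* Hypothetical callback used to exercise the round trip. */
int example_calls = 0;
void example_callback(void) { example_calls++; }
```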

Re: [perl #57920] [TODO] Remove Parrot Configure test of AIO

2008-12-17 Thread Jarkko Hietaniemi

 147-+
 
 rurban, can this =item be deleted?
 
 $ grep -in -A2 -B2 aio config/init/hints/dec_osf.pm
 28-    $libs .= ' -lpthread';
 29-}
 30:if ( $libs !~ /-laio/ ) {
 31:    $libs .= ' -laio';
 32-}
 33-$conf->data->set( libs => $libs );
 
 Jarkko, are you available to comment on this?

Well, feel free to delete since Parrot doesn't even build ATM in dec-osf ...

 Thank you very much.
 kid51

[PATCH] tru64: hints tweaks

2008-01-09 Thread Jarkko Hietaniemi

--- config/init/hints/dec_osf.pm.dist   2008-01-09 04:57:50.0 +0200
+++ config/init/hints/dec_osf.pm    2008-01-09 05:23:23.0 +0200
@@ -14,8 +14,10 @@
 if ( $ccflags !~ /-pthread/ ) {
 $ccflags .= ' -pthread';
 }
+if ( $ccflags !~ /-D_REENTRANT/ ) {
+$ccflags .= ' -D_REENTRANT';
+}
 if ( $ccflags !~ /-D_XOPEN_SOURCE=/ ) {
-
 # Request all POSIX visible (not automatic for cxx, as it is for cc)
 $ccflags .= ' -D_XOPEN_SOURCE=500';
 }
@@ -43,8 +45,9 @@
 $conf->data->set( linkflags => $linkflags );
 }
 
-# Required because of ICU using c++.
-$conf->data->set( link => "cxx" );
+unless ( $conf->data->get("gccversion") ) {
+   $conf->data->set( link => "cxx" );
+}
 
 # Perl 5 hasn't been compiled with this visible.
 $conf->data->set( has_socklen_t => 1 );

[PATCH] probe for gcc -Wxxx only when gcc (well, g++)

2008-01-08 Thread Jarkko Hietaniemi

--- config/auto/warnings.pm.dist    2008-01-08 05:51:42.0 +0200
+++ config/auto/warnings.pm 2008-01-08 06:01:23.0 +0200
@@ -132,17 +132,22 @@
 $verbose = $conf->options->get('verbose');
 print "\n" if $verbose;
 
-# add on some extra warnings if requested
-push @potential_warnings, @cage_warnings
-if $conf->options->get('cage');
-
-push @potential_warnings, '-Wlarger-than-4096'
-if $conf->options->get('maintainer');
-
-# now try out our warnings
-for my $maybe_warning (@potential_warnings) {
-$self->try_warning( $conf, $maybe_warning );
+my $gcc = $conf->options->get('gccversion');
+
+if (defined $gcc) {
+   # add on some extra warnings if requested
+   push @potential_warnings, @cage_warnings
+   if $conf->options->get('cage');
+
+   push @potential_warnings, '-Wlarger-than-4096'
+   if $conf->options->get('maintainer');
+
+   # now try out our warnings
+   for my $maybe_warning (@potential_warnings) {
+   $self->try_warning( $conf, $maybe_warning );
+   }
 }
+
 return 1;
 }

[PATCH] atan2(0, 0) is not portable (caused nanqs in tru64)

2008-01-05 Thread Jarkko Hietaniemi

--- src/pmc/complex.pmc.dist    2008-01-06 00:48:21.0 +0200
+++ src/pmc/complex.pmc 2008-01-06 02:53:34.0 +0200
@@ -1180,7 +1180,10 @@
 im = 0.0;
 
 RE(d) = log(sqrt(re*re + im*im));
-IM(d) = atan2(im, re);
+   if (re == 0.0 && im == 0.0) /* atan2(0, 0) not portable */
+   IM(d) = 0.0;
+   else
+   IM(d) = atan2(im, re);
 
 return d;
 }

Re: Subject: Parrot 0.4.8 Released

2007-01-23 Thread Jarkko Hietaniemi

I think much of the needed work for Tru64 would be simply to
add *at least one* 64-bit platform for Parrot's core platforms.

Preferably an LP64 one, instead of an LLP64, since LP64 would be
more likely to shake out bad assumptions.  But if LLP64 is more
easily available, so be it.

Superplusgood would be to have 64-bit both ways, that is, LE and BE.

*) E.g. http://www.unix.org/version2/whatsnew/lp64_wp.html
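
For readers unfamiliar with the terms, a small sketch of how the data models differ. The function names are hypothetical; the size combinations are the usual definitions (LP64: 32-bit int, 64-bit long and pointer; LLP64: 32-bit int and long, 64-bit pointer). Tru64, with intsize=4 and longsize=ptrsize=8, falls under LP64.

```c
#include <stddef.h>

/* Classify a data model from the sizes (in bytes) of int, long and
 * pointer. */
const char *example_data_model(size_t int_size, size_t long_size,
                               size_t ptr_size)
{
    if (int_size == 4 && long_size == 8 && ptr_size == 8) return "LP64";
    if (int_size == 4 && long_size == 4 && ptr_size == 8) return "LLP64";
    if (int_size == 4 && long_size == 4 && ptr_size == 4) return "ILP32";
    if (int_size == 8 && long_size == 8 && ptr_size == 8) return "ILP64";
    return "other";
}

/* The model of whatever platform this is compiled on. */
const char *example_this_platform(void)
{
    return example_data_model(sizeof(int), sizeof(long), sizeof(void *));
}
```

Code that silently assumes sizeof(long) == sizeof(int) keeps working under LLP64 but breaks under LP64, which is why an LP64 platform is the better bug detector.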

Re: Subject: Parrot 0.4.8 Released

2007-01-22 Thread Jarkko Hietaniemi

Nicholas Clark wrote:
 On Mon, Jan 22, 2007 at 01:48:41PM -0500, Matt Diephouse wrote:
 
 Alternatively, if you (or anyone else) wanted and were able to provide
 developer access to a Tru64 box, existing committers could try to fix the
 problems. And yes, I would be willing to take a shot at it (realizing that I
 may or may not be successful).

Unfortunately I am not in the position to provide Tru64 access.

 HP already provide access to many things, but not Tru64:
 http://www.testdrive.hp.com/

...anymore, grumble.

 Nicholas Clark

Re: Subject: Parrot 0.4.8 Released

2007-01-20 Thread Jarkko Hietaniemi

+ extended support for non-core platforms including Tru64

Huh?  News to me.  All the fixes for the problems recently reported by
me were to subsystems like pge.  Thanks for those fixes but I would
hardly call the situation extended support since several core dumps
and less serious failures remain.

I can't help the feeling that Parrot is a nice linux x86 experiment.
Of course one can make the claim that not fixing the problems is my
problem.

http://www.nntp.perl.org/group/perl.perl6.internals/36204
http://www.parrotcode.org/news/2007/Parrot-0.4.8.html

Re: Subject: Parrot 0.4.8 Released

2007-01-20 Thread Jarkko Hietaniemi

chromatic wrote:
 On Saturday 20 January 2007 10:36, Jarkko Hietaniemi wrote:
 
 I can't help the feeling that Parrot is a nice linux x86 experiment.
 Of course one can make the claim that not fixing the problems is my
 problem.
 
 I so do; want commit access?

To which I say: I knew that would get your attention; and no,
I'm past caring.

 From PDD01 (docs/clip/pdd01_overview.pod):

Re: [PATCH] tru64: compile (src/nci.c) and runtime (src/memory.c)

2006-12-04 Thread Jarkko Hietaniemi

 The second one: in tru64 malloc/calloc/realloc of zero bytes returns
 a NULL ptr (quite logical, in a way: you couldn't put anything in a
 memory block of zero bytes...).  I guess one could be fancier and
 add a probe for this feature in Configure.pl, but I was feeling lazy.

A third alternative would be to investigate why would anyone be
allocating zero bytes; this might indicate a more serious error,
depending on what the caller was expecting/intending and what were
they going to do with the result.

[PATCH] tru64: compile (src/nci.c) and runtime (src/memory.c)

2006-12-03 Thread Jarkko Hietaniemi

Two patches: the first is needed for parrot trunk to compile at all
in Tru64, the second one is needed to dodge dozens of core dumps.
There still are some, will take a closer look when I have more time,
but at least this way there is less wading in core dumps.

In more detail:

The first one is required because otherwise the strange 0xc4 in the
string constant makes the tru64 compiler quite unhappy.  (I haven't
looked in detail but I think that without extra flags the tru64 compiler
allows only pure ASCII in string constants).

The second one: in tru64 malloc/calloc/realloc of zero bytes returns
a NULL ptr (quite logical, in a way: you couldn't put anything in a
memory block of zero bytes...).  I guess one could be fancier and
add a probe for this feature in Configure.pl, but I was feeling lazy.

--- tools/build/nativecall.pl.dist  2006-12-03 22:52:46.0 +0200
+++ tools/build/nativecall.pl   2006-12-03 22:53:01.0 +0200
@@ -678,7 +678,7 @@
 iglobals = interp->iglobals;
 
 if (PMC_IS_NULL(iglobals))
-PANIC("iglobals isnÄt created yet");
+PANIC("iglobals isn't created yet");
 HashPointer = VTABLE_get_pmc_keyed_int(interp, iglobals,
 IGLOBALS_NCI_FUNCS);
 
--- src/memory.c.dist   2006-12-03 23:23:58.0 +0200
+++ src/memory.c    2006-12-03 23:24:27.0 +0200
@@ -80,7 +80,7 @@
 #ifdef DETAIL_MEMORY_DEBUG
 fprintf(stderr, "Allocated %i at %p\n", size, ptr);
 #endif
-if (!ptr)
+if (!ptr && size)
 PANIC("Out of mem");
 return ptr;
 }
@@ -93,7 +93,7 @@
 fprintf(stderr, "Internal malloc %i at %p (%s/%d)\n",
 size, ptr, file, line);
 #endif
-if (!ptr)
+if (!ptr && size)
 PANIC("Out of mem");
 return ptr;
 }
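
The idea behind the memory.c hunks can be sketched outside Parrot as follows. The names are hypothetical and abort() stands in for PANIC; the second helper shows the alternative of never requesting zero bytes, so that callers always get a non-NULL pointer.

```c
#include <stdio.h>
#include <stdlib.h>

/* Allocate size bytes.  A NULL result counts as failure only when
 * size is non-zero, because malloc(0) may legitimately return NULL. */
void *example_alloc(size_t size)
{
    void *ptr = malloc(size);
    if (!ptr && size) {
        fprintf(stderr, "Out of mem\n");
        abort();
    }
    return ptr;
}

/* Alternative: round a zero-byte request up to one byte, so the
 * result is never NULL on success. */
void *example_alloc_nonzero(size_t size)
{
    return example_alloc(size ? size : 1);
}
```

Neither variant answers the question of why a caller asks for zero bytes in the first place; they only keep that case from being reported as memory exhaustion.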

Re: [perl #39751] unbug - [EMAIL PROTECTED]: tru64 core dump: t/dynoplibs/myops_4.pir

2006-08-04 Thread Jarkko Hietaniemi

Chip Salzenberg via RT wrote:
 parrot obeys you
 when you ask it politely
 to halt and catch fire

The test harness
should kindly be told about
this confusing anomaly
I never could get my
haikus to work

-- 
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/ There is this special
biologist word we use for 'stable'.  It is 'dead'. -- Jack Cohen

Re: [perl #39755] [EMAIL PROTECTED]: tru64 6 failures: getting NaNQs: t/pmc/complex.t

2006-07-07 Thread Jarkko Hietaniemi

Jerry Gay via RT wrote:
 i've related this ticket to #38887: (Nobody) Result of INFINITY or NAN
 stringification is platform dependent [new]
 
 there are many platforms failing NaN/Inf related tests due to this issue.

That is very true, and very worthy of a separate ticket, but isn't
the failure I'm seeing something a bit different -- expecting non-NaNs
(mostly zeros) but getting NaNQs?

 thanks for your report.
 ~jerry


Re: [BUG] parrot 0.4.5: Configure.pl: tru64

2006-07-03 Thread Jarkko Hietaniemi

Will Coleda wrote:
 While you're waiting, we should improve the test for readline: we  
 used to have similar failures where we found readline (or other  
 probed thingees) but the version was not recent enough for us to link  
 with.

(1) Some sort of grouping for the libraries so that only the libraries
really needed for an executable are used?

(2) I don't know what the -lreadline test currently does but obviously
it wrongly detects -lreadline as useable in this system.

 Regards.

Re: [BUG] parrot 0.4.5: Configure.pl: tru64

2006-07-02 Thread Jarkko Hietaniemi

Leopold Toetsch wrote:
 On Jul 1, 2006, at 21:42, Jarkko Hietaniemi wrote:
 
 (1) I don't know why all those -libraries are being listed, the test
 program certainly doesn't need them... yes, the linker should
 know to ignore them as unused... but:

 (2) This is not Linux so that -lgmp and -lreadline are not standard
 but have been compiled and installed by the sysadmins (not admin)
 and:

 (3) They most definitely have not been compiled with cxx,
 but most probably with gcc.  And I have no idea whether
 the libreadline.so actually works, since I haven't lately
 tried to compile anything with it.  In non-Linux systems
 one cannot always assume installed GNU stuff works and/or
 is uptodate...
 
 -lgmp or -lreadline are either just coming from (a) the equivalent perl 
 settings or are the result of an (b) earlier test.
 For (a) the libs could be disabled in the hints file [1].
 For (b) we'd need some commandline and hints settings like: 
 'no-readline' or such, which disables this lib.

But the -lreadline is needed for something later?

 [1] config/init/hints/*
 
 leo

Re: [BUG] parrot 0.4.5: Configure.pl: tru64

2006-07-01 Thread Jarkko Hietaniemi

Leopold Toetsch wrote:
 On Jun 29, 2006, at 18:48, Jarkko Hietaniemi wrote:
 
 Any way to add verbosity to e.g. see which commands are being run?
 
 perl Configure.pl --verbose-step=snprintf

...
Testing snprintf...cc -std -D_INTRINSICS -fprm d -ieee -I/p/include
-DLANGUAGE_C -pthread -D_XOPEN_SOURCE=500  -I./include -c test.c
cxx -expect_unresolved '*' -O4 -msym -std  -L/p/lib test.o  -o test -lm
-lutil -lpthread -laio -lrt -lgmp -lreadline
./test
resolve_symbols: loader error: dlopen: libreadline.so.4: symbol
tgetnum unresolved

step auto::snprintf died during execution: Can't run the snprintf
testing program:  at config/auto/snprintf.pm line 33.

cxx is the Tru64 C++ compiler.

(1) I don't know why all those -libraries are being listed, the test
program certainly doesn't need them... yes, the linker should
know to ignore them as unused... but:

(2) This is not Linux so that -lgmp and -lreadline are not standard
but have been compiled and installed by the sysadmins (not admin)
and:

(3) They most definitely have not been compiled with cxx,
but most probably with gcc.  And I have no idea whether
the libreadline.so actually works, since I haven't lately
tried to compile anything with it.  In non-Linux systems
one cannot always assume installed GNU stuff works and/or
is uptodate...

Therefore, I am not surprised by the runtime linker getting cranky
when the ./test is being run.  (I have no idea who tries to call
tgetnum, certainly not test.c.)  If I remove the -lreadline from
the cxx line, the ./test works fine giving:

borken snprintf: n = 1

as expected.  I don't know how to start fixing this.

 leo

[BUG] parrot 0.4.5: Configure.pl: tru64

2006-06-30 Thread Jarkko Hietaniemi

Parrot 0.4.5 in Tru64 5.1B:

$ perl Configure.pl
...
Determining if your platform supports readline.yes.
Determining if your platform supports gdbm..no.

Testing snprintf...resolve_symbols: loader error: dlopen:
libreadline.so.4: symbol tgetnum unresolved

step auto::snprintf died during execution: Can't run the snprintf
testing program:  at config/auto/snprintf.pm line 33.

 at Configure.pl line 443

$

(sorry about possible linewraps, Thunderbird thinks it's doing me
a favour...)

I don't know what tgetnum() from libreadline.so has to do with
testing for snprintf.  (I do know from other contexts that Tru64
wouldn't have a C99 snprintf.)

Any way to add verbosity to e.g. see which commands are being run?

Re: [perl #37336] [RESOLVED] [BUG] Parrot 0.3.0 t/pmc/io.t assert core dump

2005-10-18 Thread Jarkko Hietaniemi

Joshua Hoblitt via RT wrote:
 On Sat, Oct 15, 2005 at 11:09:38AM +0300, Jarkko Hietaniemi wrote:
 
Joshua Hoblitt via RT wrote:

According to our records, your request regarding 
  [BUG] Parrot 0.3.0 t/pmc/io.t assert core dump 
has been resolved. 

According to my records, it's a TODO test and therefore not quite
yet resolved :-)
 
 
 It's a test failure for unimplemented feature(s).  There is already a
 TODO ticket (bug #31178) that roughly covers this.  Can you make a case
 for why it needs to be tracked as a software defect?

A core dump is a software defect, an unacceptable failure, doesn't
matter whether it is from an assert or not.  If Parrot's
development thinks differently or uses different terms, fine,
close the ticket.

 Cheers,
 
 -J
 
 --

Re: [perl #27003] bytecode (header?) problem in tru64/alpha

2005-10-10 Thread Jarkko Hietaniemi

Joshua Hoblitt via RT wrote:
[doughera - Thu Oct 06 07:21:15 2005]:

I think this bug can be closed.  I just got those tests to pass on 
Sparc/Solaris 8 with gcc -m64 -mcpu=v9.  (Mind you lots of other tests 
fail, but that's a separate problem.)



 
 
 Jarkko,
 
 Are you OK with closing this bug now?
 
 -J

Yeah.

Re: [perl #27003] bytecode (header?) problem in tru64/alpha

2005-10-06 Thread Jarkko Hietaniemi

-J






 
 Jarkko,
 
 I never got a response from anyone.  How would you feel about closing
 this bug?

I don't think it can be closed until at least another big-endian 64-bit
platform (like IRIX 64 is/was) has been used to verify that things work.

 -J

Re: [perl #37339] AutoReply: [BUG] Parrot 0.3.0 tru64 t/pmc/perlstring.t #44

2005-10-06 Thread Jarkko Hietaniemi

The latest changes by Leo seem to have fixed this one, and similarly
#37338 and #37337.

[PATCH] Re: [perl #37334] AutoReply: [PATCH] Parrot 0.3.0 does not compile in Tru64 because of missing socklen_t

2005-10-06 Thread Jarkko Hietaniemi

Jarkko Hietaniemi wrote:
 Jarkko Hietaniemi wrote:
 
io/io_unix.c does not compile because socklen_t is not defined.

According to the standards, sys/socket.h is needed to get socklen_t.

One could try including that the right way into io/io_unix.c, but I do
not know enough of Parrot conventions.  Instead, the below patch helps:

--- io/io_unix.c.dist   2005-10-03 20:54:25.0 +0300
+++ io/io_unix.c    2005-10-03 20:56:51.0 +0300
@@ -832,7 +832,7 @@
newio = PIO_new(interpreter, PIO_F_SOCKET, 0, PIO_F_READ|PIO_F_WRITE);

if ((newsock = accept(io->fd, (struct sockaddr *)&newio->remote,
-  (socklen_t *)&newsize)) == -1)
+  &newsize)) == -1)
{
fprintf(stderr, "accept: errno=%d", errno);
/* Didn't get far enough, free the io */



Please ignore that patch, it doesn't work since socklen_t is a long,
not an int, and in Tru64 one shall not mix those.
 
 
 Please ignore the ignore :-)  It seems that it depends how long the
 socklen_t is in Tru64, and with cxx (the C++ compiler) and the flags
 Parrot compilation uses, int is fine.  So the above patch is fine for
 now.  In the long run the newsize really should be socklen_t.  Getting
 that to be defined seems to be little tricky with cxx, so please don't
 change that right now... in the meanwhile, I found another bug in the
 IO code, bug report coming soon.

The culprit seems to be that for tru64 cxx not all the POSIX APIs and
types are visible by default as they are for cc, and one of those
missing with -D_XOPEN_SOURCE=500 is the socklen_t.

--- config/init/hints/dec_osf.pl.dist   2005-10-05 20:29:30.0 +0300
+++ config/init/hints/dec_osf.pl    2005-10-05 20:31:25.0 +0300
@@ -6,6 +6,10 @@
 if ( $ccflags !~ /-pthread/ ) {
 $ccflags .= ' -pthread';
 }
+if ( $ccflags !~ /-D_XOPEN_SOURCE=/ ) {
+# Request all POSIX visible (not automatic for cxx, as with cc)
+$ccflags .= ' -D_XOPEN_SOURCE=500';
+}
 Configure::Data->set(
 ccflags => $ccflags,
 );

Re: [PATCH] Re: [perl #37334] AutoReply: [PATCH] Parrot 0.3.0 does not compile in Tru64 because of missing socklen_t

2005-10-06 Thread Jarkko Hietaniemi


 --- config/init/hints/dec_osf.pl.dist   2005-10-05 20:29:30.0 +0300
 +++ config/init/hints/dec_osf.pl    2005-10-05 20:31:25.0 +0300
 @@ -6,6 +6,10 @@
  if ( $ccflags !~ /-pthread/ ) {
  $ccflags .= ' -pthread';
  }
 +if ( $ccflags !~ /-D_XOPEN_SOURCE=/ ) {
 +# Request all POSIX visible (not automatic for cxx, as with cc)
 +$ccflags .= ' -D_XOPEN_SOURCE=500';
 +}
  Configure::Data->set(
  ccflags => $ccflags,
  );

So the above patch should be applied so that Tru64 is happy, and works,
but as was pointed out to me in private email, the (socklen_t*) cast
should most probably be removed, too (and the newsize made socklen_t
instead of int), because the

(socklen_t*)&newsize

when newsize is not a socklen_t, is simply asking for trouble
(misalignment and/or memory corruption).

Re: [perl #30997] pdb labels broken in tru64/alpha

2005-10-03 Thread Jarkko Hietaniemi

  1989  /*
(dbx)

The line-label is an impossible pointer, so dereferencing it promptly
causes a bus error.



 
 
 Jarkko,
 
 Can you retest and confirm that this is still an issue with pdb?

This seems to have been fixed.

 Thanks,
 
 -J

Re: [perl #37334] AutoReply: [PATCH] Parrot 0.3.0 does not compile in Tru64 because of missing socklen_t

2005-10-03 Thread Jarkko Hietaniemi

 
 io/io_unix.c does not compile because socklen_t is not defined.
 
 According to the standards, sys/socket.h is needed to get socklen_t.
 
 One could try including that the right way into io/io_unix.c, but I do
 not know enough of Parrot conventions.  Instead, the below patch helps:
 
 --- io/io_unix.c.dist   2005-10-03 20:54:25.0 +0300
 +++ io/io_unix.c    2005-10-03 20:56:51.0 +0300
 @@ -832,7 +832,7 @@
  newio = PIO_new(interpreter, PIO_F_SOCKET, 0, PIO_F_READ|PIO_F_WRITE);
 
  if ((newsock = accept(io->fd, (struct sockaddr *)&newio->remote,
 -  (socklen_t *)&newsize)) == -1)
 +  &newsize)) == -1)
  {
  fprintf(stderr, "accept: errno=%d", errno);
  /* Didn't get far enough, free the io */
 

Please ignore that patch, it doesn't work since socklen_t is a long,
not an int, and in Tru64 one shall not mix those.

Re: [perl #30671] tru64 problems with nci.t and object-meths.t

2005-10-03 Thread Jarkko Hietaniemi



 
 
 Jarkko,
 
 Does this issue still occur on tru64?

Works in Parrot 0.3.0.

 -J

Re: [perl #37334] AutoReply: [PATCH] Parrot 0.3.0 does not compile in Tru64 because of missing socklen_t

2005-10-03 Thread Jarkko Hietaniemi

Jarkko Hietaniemi wrote:
io/io_unix.c does not compile because socklen_t is not defined.

According to the standards, sys/socket.h is needed to get socklen_t.

One could try including that the right way into io/io_unix.c, but I do
not know enough of Parrot conventions.  Instead, the below patch helps:

--- io/io_unix.c.dist   2005-10-03 20:54:25.0 +0300
+++ io/io_unix.c    2005-10-03 20:56:51.0 +0300
@@ -832,7 +832,7 @@
 newio = PIO_new(interpreter, PIO_F_SOCKET, 0, PIO_F_READ|PIO_F_WRITE);

 if ((newsock = accept(io->fd, (struct sockaddr *)&newio->remote,
-  (socklen_t *)&newsize)) == -1)
+  &newsize)) == -1)
 {
 fprintf(stderr, "accept: errno=%d", errno);
 /* Didn't get far enough, free the io */

 
 
 Please ignore that patch, it doesn't work since socklen_t is a long,
 not an int, and in Tru64 one shall not mix those.

Please ignore the ignore :-)  It seems that it depends how long the
socklen_t is in Tru64, and with cxx (the C++ compiler) and the flags
Parrot compilation uses, int is fine.  So the above patch is fine for
now.  In the long run the newsize really should be socklen_t.  Getting
that to be defined seems to be little tricky with cxx, so please don't
change that right now... in the meanwhile, I found another bug in the
IO code, bug report coming soon.

Re: [perl #27003] bytecode (header?) problem in tru64/alpha

2005-09-23 Thread Jarkko Hietaniemi


 
 Jarkko,
 
 Are there still outstanding issues on IRIX?  AFAIK nobody else has been
 building parrot on that platform.

Unfortunately I no longer have access to that platform.

 -J

Re: [Fwd: a warning and a failure for parrot in Tru64]

2005-04-02 Thread Jarkko Hietaniemi

 
 Not true. We've done successful compiles before on Tru64. Maybe as of 0.0.6 

True, not true :-)  I do manual test compiles in Tru64 once in a while.
Once the packfile portability problems were solved back when, the Parrot
core at least has been pretty good regarding 64-bitness.

Tru64 is 64-bit little-endian, with longsize=ptrsize=8 intsize=4
(shortsize=2).

P.S.  (I wish I still had Cray 90 access, the unusual-but-legal
longsize=ptrsize=intsize=shortsize=8 nicely shook bugs to the bright
light of day in Perl 5.)


Re: [Fwd: a warning and a failure for parrot in Tru64]

2005-04-02 Thread Jarkko Hietaniemi

Nick Glencross wrote:
 Jarkko Hietaniemi wrote:
 
 
Not true. We've done successful compiles before on Tru64. Maybe as of 0.0.6 
   

 Ok, so intsize=4, which is why my md5 test tried to run. I'd be really
 grateful if someone could run my instrumented MD5.imc from a previous post
 on this platform.
 
 So what I'm confused about is why intsize=4 when you say the Parrot core 
 is 64 bit.  

Weelll... I did not say *quite* that.  What I said was that so far the
Parrot's core seems to have worked well in systems with _some_ 64-bit
integer types available.  So the Parrot core has been 64-bit _safe_,
which doesn't mean it has been _using_ 64-bit integers explicitly
(e.g. in Tru64 it has been using 64-bit longs implicitly).

 Isn't one of the points of a 64-bit processor to have larger
 ints (often accompanied by larger address space)? So if ints are just 4 

The 64-bit type can be int, long, long long, quad_t, int64_t, ...

 bytes, what would trip things up on Tru64?
 
 There are  a few reasons why I'm keen to get this resolved. 1) My 
 assumption that intsize!=4 for 64-bit processors is broken, which is why 

Please do not assume such things.  The only thing C promises in this
regard is that sizeof(int) <= sizeof(long).   4 <= 8, or 8 <= 8
(or 4 <= 4 in the 32-bit world.)  See e.g.
http://www.unix.org/version2/whatsnew/lp64_wp.html

 the test is seen to fail. 2) I would like the library to work on all
 platforms. 3) I'm curious to know why it doesn't work, as it was
 expected to work on different endianness and word size. 4) the md5
 library has been, and hopefully will continue to be, a good way to shake 
 problems out of the parrot core.
 
 Thanks all,
 
 Nick
 



Re: [Fwd: a warning and a failure for parrot in Tru64]

2005-04-02 Thread Jarkko Hietaniemi

Forgot to add: in many environments (at least SGI/MIPS, AIX Power/PPC,
HP-UX/HPPA) things are even more interesting -- one can at compile time
decide between different 32-bit modes and different 64-bit modes.
(E.g. in IRIX there are two of each.)  I believe the new x86-ish
processors and Linux/gcc offer similar options.

Whether one can mix and match such executables/libraries depends
on how the processors/operating system have been configured.

So one can't really assume much about the integer sizes.
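
When an algorithm needs words of an exact width, as MD5-style code does, one option is to say so with a fixed-width type instead of guessing from int or long. A minimal sketch, assuming a C99 <stdint.h> is available (which was not a given on every compiler of the time); the function name is hypothetical:

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits.  uint32_t is exactly 32 bits
 * wherever it exists, so the result does not depend on the size of
 * int or long. */
uint32_t example_rotl32(uint32_t x, unsigned n)
{
    n &= 31u;                 /* keep the shift count in range */
    if (n == 0)
        return x;             /* avoid the undefined shift by 32 */
    return (uint32_t)((x << n) | (x >> (32u - n)));
}
```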

I heartily recommend people interested in portability matters
getting machines and/or accounts in different machines.  It Will
Make Your Code Better.


Re: [perl #34420] TODO suggestion: clean Parrot's ABI

2005-03-14 Thread Jarkko Hietaniemi

Dave Whipp via RT wrote:
 Matt Diephouse wrote:
 
There's no real point in having a plan if you don't follow it,
 
 
 That sounds a bit naive. The benefit of a plan is primarily in the act 
 of making it (it forces you to think about what you want to do). The 
 secondary benefit comes when you track how actual progress deviates from 
 the plan: this lets you think about how/why your plan wasn't accurate.
 
 Following a plan gives very little benefit. If the plan is accurate, 
 then people will naturally follow it, without needing to be told. They 
 may follow priorities (which may derived from the act of planning), 
 but that's a subtly different thing.
 
 
 Dave.
 
 

It's nice to see so many professional project managers signing up :-)


Re: [perl #xxxxx] [PATCH] garbage characters in a comment

2005-03-09 Thread Jarkko Hietaniemi

Robert wrote:
Indeed curious. The first version was the gzip file, but utf8 encoded.
 
 
 Double weird that it would only happen once.  Did you do it the same way 
 both times, Jarkko?
 

Yup.  Mac OS X, Thunderbird, Attach file, the same file.


Re: [perl #34351] [PATCH] garbage characters in a comment

2005-03-06 Thread Jarkko Hietaniemi

Leopold Toetsch via RT wrote:
 Jarkko Hietaniemi [EMAIL PROTECTED] wrote:
 
 
Extra 0xA0 characters (Latin-1 no-break-spaces?) in the comments of
a header file.  Non-fatal but probably not intended, either.  Patch
attached.
 
 
 $ file noa0.pat.gz
 noa0.pat.gz: data
 
 Please resend,
 
 thanks
 leo

Curious.  Reattached.



noa0.pat.gz
Description: GNU Zip compressed data

Re: [perl #32877] parrot build broken in Tru64, cc/ld confusion

2004-12-06 Thread Jarkko Hietaniemi

 
 The offending line in config/gen/makefiles/dynclasses_pl.in
 is probably this one:
 
 $LD $CFLAGS $LDFLAGS $LD_LOAD_FLAGS $LIBPARROT
 
 That CFLAGS doesn't belong there.  CFLAGS are intended to be sent to $CC,
 not to $LD. The command being called here is $LD, which is defined in
 config/init/data.pl as the Tool used to build shared libraries and
 dynamically loadable modules.
 
 I no longer remember why LD is set to 'ld' on Tru64 -- is it just Ultrix
 heritage combined with lots of inertia or is it really a sensible setting?

Could well be Ultrix heritage, but in any case the parameter syntaxes of
Tru64 cc and ld are rather different and non-intersecting, and the cc
doesn't automatically pass through unknown parameters to ld (one needs
to use the -W for explicit passing.)

The cc and ld manpages for example here (blame HP for the awful URLs):
http://h30097.www3.hp.com/docs/base_doc/DOCUMENTATION/V51B_HTML/MAN/MAN1/0607.HTM
http://h30097.www3.hp.com/docs/base_doc/DOCUMENTATION/V51B_HTML/MAN/MAN1/0668.HTM

 In any case, dynclasses_pl.in is wrong.  There should be no CFLAGS there.



Re: [perl #32877] parrot build broken in Tru64, cc/ld confusion

2004-12-06 Thread Jarkko Hietaniemi

Sam Ruby via RT wrote:
 Andrew Dougherty wrote:
 
The offending line in config/gen/makefiles/dynclasses_pl.in
is probably this one:

$LD $CFLAGS $LDFLAGS $LD_LOAD_FLAGS $LIBPARROT

That CFLAGS doesn't belong there.  CFLAGS are intended to be sent to $CC,
not to $LD. The command being called here is $LD, which is defined in
config/init/data.pl as the Tool used to build shared libraries and
dynamically loadable modules.
 
 
 I can't find anything that fails if this is removed, so I committed the 
 change.

Thanks, that helped!

 - Sam Ruby
 
 



Re: cvs commit: parrot/tools/dev parrot_api.pl

2004-11-09 Thread Jarkko Hietaniemi

Leopold Toetsch wrote:
 Jarkko Hietaniemi [EMAIL PROTECTED] wrote:
 
 
  +   if (/^\w+\s+(Parrot_\w+)\(/) {
 
 
 Can we be slightly less strict? Current publics that ought to be APIs
 include these prefixes:

That's a policy decision.  I would make a different policy decision
(that is, *everything* parrot exports would begin with Parrot, e.g.
ParrotC for compiler, ParrotD for debugger), but obviously I don't
make any policy decisions regarding Parrot.

 IMCC_      PASM/PIR compiler stuff
 AST_       AST compiler stuff
 PF_        Packfile handling low level
 PackFile_  same/higher level, but needs review
 PDB_       Parrot debugger
 PIO_       Parrot IO
 
 Another possible issue the program shows is: there are tons of public
 symbols that have a Parrot_ prefix, which are *neither* API calls:
 
 - Parrot opcode functions (core_ops.o)
 
 and some may be embedding APIs:
 
 - Parrot vtable functions

The question is which of these tons do you want exposed?  The sad
truth is as soon as a symbol is exposed, someone will use it, and
then you are stuck with it, making it harder to change the interface
ever again.  Therefore minimizing the number of exposed symbols is
a worthy future-proofing task.  Also, I do not see *any* excuse
for exposing any symbol that doesn't have *any* of the approved
prefixes.

 Thanks Jarkko,
 leo



[Fwd: [PATCH] Re: [perl #31046] IRIX64 perlnum_36 float output expectation]

2004-08-14 Thread Jarkko Hietaniemi

Still not seeing this in p6i, so resending.

 Original Message 
Subject: [PATCH] Re: [perl #31046] IRIX64 perlnum_36 float output
expectation
Date: Sat, 14 Aug 2004 15:18:01 +0300
From: Jarkko Hietaniemi [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
References: [EMAIL PROTECTED]

Duh.  The best way to get -0.0 is ... -0.0.

With this patch IRIX64 passes t/pmc/perlnum.t, and therefore passes the
test suite 100%.






-- 
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/ There is this special
biologist word we use for 'stable'.  It is 'dead'. -- Jack Cohen
--- src/string.c.dist   Sat Aug 14 14:42:07 2004
+++ src/string.c   Sat Aug 14 15:14:57 2004
@@ -2533,9 +2533,14 @@
 if (s) {
 /*
  * XXX C99 atof interpreters 0x prefix
+ * XXX would strtod() be better for detecting malformed input?
  */
 char *cstr = string_to_cstring(interpreter, const_cast(s));
+while (isspace(*cstr)) cstr++;
 f = atof(cstr);
+/* Not all atof()s return -0 from -0 */
+if (*cstr == '-' && f == 0.0)
+f = -0.0;
 string_cstring_free(cstr);
 return f;
 }
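
The idea of the patch can be tried outside Parrot with a standalone sketch. The helper name cstr_to_num is hypothetical and the code assumes IEEE 754 doubles and C99's signbit(); it mirrors the patch rather than reproducing Parrot's actual function.

```c
#include <assert.h>
#include <ctype.h>
#include <math.h>
#include <stdlib.h>

/* Converts a C string to a double, forcing a negative zero when the
 * input starts with '-' and the value is zero, because not every
 * atof() returns -0.0 for "-0". */
static double cstr_to_num(const char *cstr)
{
    double f;

    assert(cstr != NULL);
    while (isspace((unsigned char)*cstr))
        cstr++;
    f = atof(cstr);
    if (*cstr == '-' && f == 0.0)
        f = -0.0;
    return f;
}
```

Note that, like the patch, this also yields -0.0 for malformed input such as "-foo", which is one reason the strtod() question in the XXX comment is worth asking.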

Re: native_pbc fixes

2004-07-11 Thread Jarkko Hietaniemi

Oh, bother.  I think I somehow goofed up the patch part, so here it is
again regenerated.  (The pbc files were okay in my original sending.)




nat.pat.gz
Description: GNU Zip compressed data

Re: native_pbc fixes

2004-07-11 Thread Jarkko Hietaniemi

Jarkko Hietaniemi wrote:

 Oh, bother.  I think I somehow goofed up the patch part, so here it is
 again regenerated.  (The pbc files were okay in my original sending.)

This is getting embarrassing.

-- 
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/ There is this special
biologist word we use for 'stable'.  It is 'dead'. -- Jack Cohen


nat.pat.gz
Description: GNU Zip compressed data

native_pbc fixes

2004-07-10 Thread Jarkko Hietaniemi

Here are regenerated number_?.pbc files for the t/native_pbc/number.t,
plus a couple of tweaks I found on the way in Tru64 and IRIX/64.

I still have test failures in both those two and in IRIX there is a
lot of fun getting the compiler selected right (even in 64-bit IRIX
there are both 32 and 64-bit compilers and object files, pain...) but
I managed to get parrot to link and to generate the pbc files.  No time
to resolve those failures now, I am afraid.

Also, to generate the number_2.pbc I had to compile a new uselongdouble
Perl in Linux and in there I had to #if 0 the below in src/platform.c to
get parrot linked, both those asserts were failing at some point or another.

static void*
Parrot_memcpy_aligned_mmx_debug(void* d, void* s, size_t l)
{
assert( (l & 0xf) == 0);
#if 0
assert( ((unsigned long) d & 7) == 0);
assert( ((unsigned long) s & 7) == 0);
#endif
return
((Parrot_memcpy_aligned_mmx_t)(Parrot_memcpy_aligned_mmx_code))(d, s, l);
}

Quite a lot of failures from this longdouble parrot (no wonder, after
disabling two asserts), but at least it was able to generate a pbc that
the other platforms are able to understand.  The box has an AMD Duron,
that's about all I know about it.
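
For reference, the two disabled asserts test 8-byte alignment of the source and destination pointers. A sketch of the same test written with uintptr_t instead of unsigned long follows; is_aligned is a hypothetical helper, not something in src/platform.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if `p` is aligned to `align` bytes, else 0.
 * `align` must be a power of two. */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

With a helper like this the asserts would read is_aligned(d, 8) and is_aligned(s, 8). Why they fired in the longdouble build is not established in this thread.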

-- 
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/ There is this special
biologist word we use for 'stable'.  It is 'dead'. -- Jack Cohen



nat.tgz
Description: GNU Zip compressed data

Re: Bit ops on strings

2004-05-02 Thread Jarkko Hietaniemi


I am very confused.  THIS IS WHAT WE ALL SEEM TO BE SAYING.  BITOPS ONLY
ON EIGHT-BIT DATA.  AM I WRONG?
 
 
 No, it's not, and could you please not get emotional about this? It's

I apologize for using UPPERCASE.  My only excuse is that it was not
personally aimed at you: I have been griping about these things for
quite some time now, and I tend to pull out the clue-by-four rather
quickly these days, out of sheer frustration.

-- 
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/ There is this special
biologist word we use for 'stable'.  It is 'dead'. -- Jack Cohen

Re: Bit ops on strings

2004-05-01 Thread Jarkko Hietaniemi

 
 The bitshift operations on S-register contents are valid, so long as 
 the thing hanging off the register supports it. Binary data ought 
 to allow this. Most 8-bit string encodings will have to support it 
 whether it's a good idea or not, since you can do it now. If Jarkko 
 tells me you can do bitwise operations with unicode text now in Perl 
 5, well... we'll support it there, too, though we shan't like it at 
 all.

We can and I don't like it at all :-)  What they basically operate on
are the internal UTF-8 bit patterns, in other words utter crapola from
the viewpoint of traditional bit strings.  Especially fun was
getting the semantics of ~ to make any sense whatsoever.  None of it
anything I want to propagate anywhere.

 I *think* most of the variable-width encodings, and the character 
 sets that sit on top of them, can reasonably forbid this.

Re: Bit ops on strings

2004-05-01 Thread Jarkko Hietaniemi

 
 So it seems to me that the obvious way to go is to have all bit-s
 operations first convert to raw bytes (possibly throwing an exception)
 and then proceed to do their work.

If these conversions croak if there are code points beyond \x{ff}, I'm
fine with it.  But trying to mix \x{100} or higher just leads into silly
discontinuities (basically we would need to decide on a word width, and
I think that would be a silly move).

 This means that UTF-8 strings will be handled just fine, and (as I

Please don't mix encodings and code points.  That strings might be
serialized or stored as UTF-8 should have no consequence with bitops.

 understand it) some subset of Unicode-at-large will be handled as well.
 In other-words, the burden goes on the conversion functions, not on the
 bit ops.
 
 It's not that it's going to be meaningful in the general case, but if

I'd rather have meaningful results.

 you have code like:
 
   sub foo() { return \x01+|\x02 }

Please consider what happens when the operands have code points beyond 0xff.

 I would expect the get the bit-string, \x03 back even though strings
 may default to Unicode in Perl 6.

Of course.  But I would expect a horrible flaming death for
\x{100}|+\x02.
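
A sketch of the behaviour argued for here, assuming strings are available as arrays of code points: the bitwise OR works only when every code point fits in eight bits, and fails ("croaks") otherwise. The function name and the zero-padding of the shorter operand are assumptions made for the example, not a Parrot interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ORs two code point sequences into `out` (which must hold max(na, nb)
 * bytes).  Returns 0 on success, or -1 without touching `out` if any
 * code point is above 0xFF.  The shorter operand is padded with zeros. */
static int bitor_codepoints(const uint32_t *a, size_t na,
                            const uint32_t *b, size_t nb,
                            uint8_t *out, size_t *nout)
{
    size_t i;
    size_t n = na > nb ? na : nb;

    assert(a != NULL && b != NULL && out != NULL && nout != NULL);
    for (i = 0; i < na; i++)
        if (a[i] > 0xFF) return -1;
    for (i = 0; i < nb; i++)
        if (b[i] > 0xFF) return -1;

    for (i = 0; i < n; i++) {
        uint8_t x = i < na ? (uint8_t)a[i] : 0;
        uint8_t y = i < nb ? (uint8_t)b[i] : 0;
        out[i] = (uint8_t)(x | y);
    }
    *nout = n;
    return 0;
}
```

How the string happens to be stored (UTF-8 or otherwise) never enters into it; only the code point values do.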

 You could put this on the shoulders of the client language (by saying
 that the operands must be pre-converted, but that seems to be contrary
 to Parrot's usual MO.
 
 Let me know. I'm happy to do it either way, and I'll look at modifying
 the other bit-string operators if they don't conform to the decision.
 


-- 
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/ There is this special
biologist word we use for 'stable'.  It is 'dead'. -- Jack Cohen

Re: Bit ops on strings

2004-05-01 Thread Jarkko Hietaniemi

 How are you defining valid UTF-8? Is there a codepoint in UTF-8
 between \x00 and \xff that isn't valid? Is there a reason to ever do

Like, half of them?  \x80 .. \xff are all invalid as UTF-8.

 bitwise operations on anything other than 8-bit codepoints?

I am very confused.  THIS IS WHAT WE ALL SEEM TO BE SAYING.  BITOPS ONLY
ON EIGHT-BIT DATA.  AM I WRONG?
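
The remark above that \x80 .. \xff are invalid as UTF-8 refers to those bytes standing alone. A compact validator makes this checkable; it is a sketch following the modern definition of UTF-8 (RFC 3629: at most four bytes, no overlong forms, no surrogates), which is an assumption about which definition is meant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the n bytes at s form well-formed UTF-8, else 0. */
static int utf8_valid(const unsigned char *s, size_t n)
{
    size_t i = 0;

    assert(s != NULL || n == 0);
    while (i < n) {
        unsigned char b = s[i];
        size_t need, k;
        uint32_t cp, min;

        if (b < 0x80) { i++; continue; }          /* plain ASCII */
        if ((b & 0xE0) == 0xC0)      { need = 1; cp = b & 0x1Fu; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { need = 2; cp = b & 0x0Fu; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { need = 3; cp = b & 0x07u; min = 0x10000; }
        else return 0;        /* stray continuation byte or 0xF8..0xFF */

        if (n - i - 1 < need) return 0;           /* truncated sequence */
        for (k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (s[i + k] & 0x3Fu);
        }
        /* reject overlong forms, out-of-range values and surrogates */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
        i += need + 1;
    }
    return 1;
}
```

Every one-byte string with a byte in 0x80..0xFF is rejected, while the code point U+00E5 is fine once it is encoded as the two bytes C3 A5.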

Re: File stat info

2004-04-29 Thread Jarkko Hietaniemi

Dave Mitchell wrote:
 On Thu, Apr 29, 2004 at 08:36:11AM +0300, Jarkko Hietaniemi wrote:
 
But for things like -r file && open(FH, file) they are of rather
dubious value.
 
 Well, I have some scripts that check at the start whether all the
 things they going to need are readable/executable/whatever, so that they
 can (mostly) bomb out right at the start rather than failing halfway
 through and leaving a mess.

That is more like the case checking a set of files against some
predefined set of properties than the above more immediate testing.

-- 
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/ There is this special
biologist word we use for 'stable'.  It is 'dead'. -- Jack Cohen

Re: [Q1] (Re: The strings design document)

2004-04-28 Thread Jarkko Hietaniemi

 I think you're basically forcing this concept onto national standards 
 which lack it. I don't think that most of the national standards 
 actually define the semantics of the characters they encode 
 (categorizations, case mapping, sort order), and although they assign 
 byte sequences to represent their characters, I'm not sure they 
 actually present this in terms of assigning integers to them, in the 
 sense of code points v. byte sequences.

Yeah.  Let's take, say, ISO 8859-1:

http://anubis.dkuug.dk/JTC1/SC2/WG3/docs/n411.pdf

No semantics, just an assignment of abstract characters to numbers
and the respective bit patterns.

Re: One more thing...

2004-04-28 Thread Jarkko Hietaniemi

Dan Sugalski wrote:

 Not to sound like a Jackie Chan cartoon or anything, but...

I was thinking Columbo, actually...

 If we go MMD all the way, we can skip the bytecode-C-bytecode 
 transition for MMD functions that are written in parrot bytecode, and 
 instead dispatch to them like any other sub.
 
 Not to make this sound good or anything, of course. :-P

Re: File stat info

2004-04-28 Thread Jarkko Hietaniemi

Oh, don't get me wrong! I'm not saying an abstraction isn't all keen and
such, I'm just wondering why we're abstracting farther out than POSIX
when the right way, as you point out has never been a matter of
consensus, and many client languages will be presenting POSIX semantics
through their standard libraries anyway, which they will have to massage
your representation back into.
 
 
 Which is why I'm fine with yanking all the filename mangling stuff 
 from stat here.

I would recommend leaving out from stat()ish layer.  An API not
dissimilar to Path::Class for the mangly bits would be rather nice.
(Though it doesn't do extensions IIRC.)

(The first person to suggest duplicating the File::Spec API will be hung
upside down above the scorpion pit.)

 I wasn't, actually. There's a good sprinkling of VMSisms in that 
 list, and I'm all for adding more stuff if need be. (I forgot to note 
 the various flavors of symlink, as well as the link count in cases 
 where it can be determined, as well as user and group of the file 
 itself)

While I'm all for supporting cool stuff like ACLs or builtin MIME-types
(a la BeFS), I doubt the feasibility of supporting them in a portable
way.  Rather I'd personally go for a minimal set of properties.  (So
minimal that even reporting the POSIXish mode bits would be too much
[1], the canI interface is the minimum for the rights, I think.)
Hmmm... something like this is about the minimum:

  name
  canI  (method/callback that can be called with r/w/x/d)
  size
  type  (method/callback that can be called with file/directory/other)

The size would be in bytes, but the name already is a bit tricky... don't
say bytes because e.g. Windows NTFS and Apple HFS+ are full Unicode
beasts when it comes to filenames.   So we need to solve what is a
string first... :-) (Dan, please put *that* down and count to one
thousand!)

[1] The POSIX bits cannot even be mapped 100% to many ACL schemes.

After those come maybe the

  rtime
  wtime

(atime and mtime in POSIX).  ctime is not portable.  Creation time is
not available in POSIX.  But for these we need to decide on the epoch
issue and granularity.

After those maybe the

  owner
  group

But how to return these portably?  Numeric UIDs and GIDs suck for
systems that have username strings (my understanding is that Windows
is like this, the mapping to numbers is faked - I may be wrong here,
though.)

All the rest in the POSIX stat (dev, ino, nlink, rdev, ctime, blksize,
blocks) are somewhat unportable to varying degrees.
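
The minimal property set proposed above (name, canI, size, type) could be sketched as follows on a POSIX host. All names here (file_info, file_info_get, file_info_can_i) are hypothetical, and stat() and access() are used only as one possible backend; other platforms would need their own.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>   /* POSIX stat() */
#include <unistd.h>     /* POSIX access() */

typedef enum { FI_FILE, FI_DIRECTORY, FI_OTHER } fi_type;

typedef struct {
    const char *name;          /* as passed in; what a "string" is stays open */
    unsigned long long size;   /* in bytes */
    fi_type type;
} file_info;

/* Fills `out` for `name`.  Returns 0 on success, -1 if stat() fails. */
static int file_info_get(const char *name, file_info *out)
{
    struct stat st;

    assert(name != NULL && out != NULL);
    if (stat(name, &st) != 0)
        return -1;
    out->name = name;
    out->size = (unsigned long long)st.st_size;
    out->type = S_ISREG(st.st_mode) ? FI_FILE
              : S_ISDIR(st.st_mode) ? FI_DIRECTORY
              : FI_OTHER;
    return 0;
}

/* canI: `what` is 'r', 'w' or 'x'.  Returns 1 or 0, or -1 when this
 * backend cannot answer (for example 'd', which POSIX access() lacks). */
static int file_info_can_i(const file_info *fi, char what)
{
    int mode;

    assert(fi != NULL && fi->name != NULL);
    switch (what) {
    case 'r': mode = R_OK; break;
    case 'w': mode = W_OK; break;
    case 'x': mode = X_OK; break;
    default:  return -1;
    }
    return access(fi->name, mode) == 0;
}
```

The answer from canI is only a snapshot: the rights may change before the file is actually used.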

Re: File stat info

2004-04-28 Thread Jarkko Hietaniemi

Which is why I'm fine with yanking all the filename mangling stuff 
from stat here.
 
 
 I would recommend leaving out from stat()ish layer.  An API not

s/out/that out/

Re: File stat info

2004-04-28 Thread Jarkko Hietaniemi

Keeping a niche open for ACLs is probably smart, esp. in the Windows
world.
 
 
 I think you'll find ACL use is increasing, not decreasing. They've 
 been tacked on to most recent filesystems, and they're coming into

This is true.  But good luck in trying to map between the ACL schema of
different systems :-(

 more widespread use as Linux is getting really decked out for 
 mission-critical usage and the facilities are pushed out for everyone 
 to use.

 They're certainly important for AIX, Solaris, Tru64, and HP/UX. 
 (Whether those are useful in themselves is a separate question, of 
 course...)

Re: File stat info

2004-04-28 Thread Jarkko Hietaniemi

 Yech, good point. I'm not even sure you can do any sort of sane 
 abstraction there.
 
 In that case, are we better off chopping it out entirely and leaving 
 it to library code, or making it a simple yes/no indicator that there 
 are some? (Chopping it out's probably the best thing)

Chopping off sounds like less coding :-)

I think the same general KISS approach applies here that you chose with
the time handling - no need to implement calendar algorithms in Parrot
lowest layer, so I don't think trying to abstract Universal ACL schema
is a priority.  If someone after Parrot 1.0 wants to implement Tibetan
lunar calendar or POSIX 1.e ACLs in IMC, let them.  The operative word
being them.

-- 
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/ There is this special
biologist word we use for 'stable'.  It is 'dead'. -- Jack Cohen

Re: File stat info

2004-04-28 Thread Jarkko Hietaniemi

 On top of which, ACLs suffer the same illness of any stat-based
 checking, insofar as checks against them are only an approximation
 to reality, potentially full of race conditions.  It's really the OS
 that's going to do the ACL checking, and it'll do it when you do the
 actual operation, not the stat() call.  Arguably a correct way to
 program is to ignore stat-like stuff entirely and just try to do the
 thing you want to do, and be prepared for the OS to reject it--which
 you should have been prepared for anyway...

Yup.  cue in Nike slogan

 (Of course, fstat() does help with some of the race conditions by
 intentionally losing the race, as it were.)
 
 Larry
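
The "just try it and be prepared for rejection" approach described above can be sketched like this. The helper name open_or_report is hypothetical, and the code assumes a POSIX-like C library where fopen() sets errno on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Tries to open `path` for reading without any prior stat()-style
 * check.  On failure returns NULL and stores the errno value in *err;
 * on success returns the stream and stores 0. */
static FILE *open_or_report(const char *path, int *err)
{
    FILE *fh;

    assert(path != NULL && err != NULL);
    errno = 0;
    fh = fopen(path, "r");
    *err = fh ? 0 : errno;
    return fh;
}
```

Because the operating system makes the decision at the moment of the open itself, there is no window between check and use.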

Re: File stat info

2004-04-28 Thread Jarkko Hietaniemi

Is it possible to have something along the lines of 
ME_{READ,WRITE,EXECUTE,DELETE,CD} to say if, as the user the program 
is running as, you can perform these actions?  That strikes me as 
rather useful.  (Alternately, could we have a field indicating if 
the current user is OWNER, GROUP, SYSTEM, or OTHER to this file? 
Gives you pretty much the same info.)
 
 Sure, that works, and I can see it being as useful as the other 
 permission testing stuff. (Which, arguably, is actually really really 
 useless, but that's a separate issue. We could, I suppose,

Well, not *completely* useless... things like -w have their uses in e.g.
- warning the user before trying an operation
- ls -l or any other textual representation of rights
- checking the filesystem rights against some description of how
  things should be
But for things like -r file && open(FH, file) they are of rather
dubious value.

 unconditionally return 'true' for all of these...)

Re: [Q1] (Re: The strings design document)

2004-04-27 Thread Jarkko Hietaniemi

 1) ISO-8859-1 is used to represent text in several different languages, 
 including German and Swedish. German and Swedish differ in their sort 
 order, even for things they have in common. (For example, ö 
 (o-with-diaeresis) is considered a separate letter in Swedish, but is 
 just an accented o in German.) So (assuming my strings aren't 
 explicitly language-tagged, or are tagged with "Dunno"), what sort 
 order does ISO-8859-1 define? I'm not sure whether the national 
 standards themselves actually define a sort order, so are we going to 

National standards yes, ISO 8859 (and the like) not.  In other words,
sorting standards exist, but they have (quite rightly) nothing to do
with sorting standards.  Real life sorting is messy (multiple passes,
some parts may be ignored in some passes, acronyms, etc.) and worlds
apart from "let's compare the bytes one by one" or even from "let's
compare code points" or even from "let's compare grapheme (clusters)".

 define one for every character set? In addition, many languages can 
 be represented in several different character set, so that seems to 
 mean that the sort order for öut v. out will vary, depending on the 
 character set used for those strings?

FWIW, I think binding language to strings is a Mistake. But I have
decided to give up trying to argue anymore about it since Dan seems
to be convinced that it will solve some problems.

Re: [Q1] (Re: The strings design document)

2004-04-27 Thread Jarkko Hietaniemi

Dan Sugalski wrote:
 At 7:57 PM +0300 4/27/04, Jarkko Hietaniemi wrote:
 
  1) ISO-8859-1 is used to represent text in several different languages,

 including German and Swedish. German and Swedish differ in their sort
 order, even for things they have in common. (For example, ö
 (o-with-diaeresis) is considered a separate letter in Swedish, but is
 just an accented o in German.) So (assuming my strings aren't
 explicitly language-tagged, or are tagged with "Dunno"), what sort
 order does ISO-8859-1 define? I'm not sure whether the national
 standards themselves actually define a sort order, so are we going to

National standards yes, ISO 8859 (and the like) not.  In other words,
sorting standards exist, but they have (quite rightly) nothing to do
with sorting standards.
 
 
 ?

Ooops.  Replace the last "sorting" with "character".  That's what I get,
errrm, what you get, from writing email while watching evening news :-)

  Real life sorting is messy (multiple passes,
some parts may be ignored in some passes, acronyms, etc.) and worlds
apart from "let's compare the bytes one by one" or even from "let's
compare code points" or even from "let's compare grapheme (clusters)".
 
 
 True enough, though what I want the language for 
 is as much case-mangling as sorting.

I just think that having languages for strings is akin to
having types (dimensioned or -less) for numbers.
(Making 2 kg plus 3 Hz to croak, that kind of thing.)

-- 
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/ There is this special
biologist word we use for 'stable'.  It is 'dead'. -- Jack Cohen

Re: embed.h doesn't work in C++

2004-04-23 Thread Jarkko Hietaniemi

You're welcome to try it again, though...while you're at it, you might
as well make all internal Parrot functions take an Interp * instead of a

I hope there's #undef Interp in there somewhere.  Or maybe even possibly

#ifdef Interp
#error EEEK SOMEONE ELSE HAS DEFINED Interp.
#endif

In other words, I'm not at all convinced about the wisdom of dropping
Parrot_ prefixes.

You laugh?  ConvexOS had sv_flags in its system header files, which
was rather unfun for Perl 5.  The shorter and more generic a name is,
the more likely a conflict is.

struct Parrot_Interp *.  That ought to save us a couple kilobytes.

Re: embed.h doesn't work in C++

2004-04-23 Thread Jarkko Hietaniemi

Brent 'Dax' Royal-Gordon wrote:

 Dan Sugalski wrote:
 
I hope it's not in there in the first place. The prefix needs to stay.
 
 
 The declaration has been (along the lines of)
 
  typedef struct Parrot_Interp {
  ...
  } Interp;
 
 for years.  The Interp typedef is intended for internal use only.  Why 
 do we need the prefix on an internal-use only typedef?  We don't use 
 Parrot_String or Parrot_PMC internally.

This works as long as people (a) know of (b) stick to the policy
(Interp for internal use only) (c) No application embedding Parrot
has defined Interp themselves.  Experience has shown that none of
these is likely to happen and/or stay that way for long :-)

 Outside of Parrot, it's still Parrot_Interp, the same as I wrote it way 
 back when I checked the embedding interface in. 

Something like

typedef struct Parrot_Interp_s {
...
} Parrot_Interp;

would be more robust, I think.  (A typedef setup like that is pretty
common, the explicit struct Parrot_Interp_s is needed only if there
is a need for a struct to point to structs of the same kind, as in linked
lists.)

-- 
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/ There is this special
biologist word we use for 'stable'.  It is 'dead'. -- Jack Cohen

Re: embed.h doesn't work in C++

2004-04-23 Thread Jarkko Hietaniemi

This works as long as people (a) know of (b) stick to the policy
(Interp for internal use only) (c) No application embedding Parrot
has defined Interp themselves.  Experience has shown that none of
these is likely to happen and/or stay that way for long :-)
 
 
 (c) is the reason for the separate embed.h file that doesn't actually 
 include any other parrot header files--that cuts down on our exposure 

That sounds good.  But I won't be surprised if on some platform even
that isn't enough :-)

 to other headers that parrot uses internally. I'm not naive enough to 
 think that makes us immune to problems, but at least it reduces our 
 exposure. :)

Re: Korean character set info

2004-04-22 Thread Jarkko Hietaniemi

 Ah, at this point Unicode's legacy too. Besides, as long as RAD-50 
 lives, nobody's got much standing to call a character set Legacy :)

I suggest Parrot's native character set to be cuneiform.

Re: Korean character set info

2004-04-22 Thread Jarkko Hietaniemi

Ah, at this point Unicode's legacy too. Besides, as long as RAD-50 
lives, nobody's got much standing to call a character set Legacy :)

I suggest Parrot's native character set to be cuneiform.
 
 
 ... but only for constants.

Yeah, I was going to propose the Phaistos disc signs for the variable
variables.

Re: Constant strings - again

2004-04-21 Thread Jarkko Hietaniemi

 
 We need to address that, then. If we're doing 
 unicode, we damn well need to do it right--å is 
 å, regardless of whether it's composed or 
 decomposed.

Agreed -- on some level.  But if we want to implement Larry's
:u0 (bytes) and :u1 (code points) levels we also need to have
the more raw comparisons available, somehow.  (I do not remember
whether Larry specified that :u2 would by default do some of the
Unicode normalizations, and thus (de)compositions.)

 If people want low-level binary comparisons (and 
 generally we *shouldn't* for  most things) then 
 they'll need to force the string to binary.

And I'm not certain whether forcing to binary is the right
visual image or approach here.  Maybe we need some sort of
pragma support so that we can tweak the :u level?  The
default level could well be :u2, the highest we can do without
picking some language rules.

Re: Constant strings - again

2004-04-19 Thread Jarkko Hietaniemi

 C-constant region of memory? For instance, if we could tell their 
 memory address is  stack base, and use that to identify them as 
 constant?

I don't think there is much chance of getting anything like this working
portably.

 static_strings[7], or something. Then the check is just whether 
 (some_string >= static_strings[0] && some_string <= 
 static_strings[max])--if so, it was from a literal (and thus, is 
 constant).

Something like this would be feasible.  In fact, if we are going for
compile-time tricks, all constant strings (or their bodies, at least)
could be concatenated into a single giant string, and then have another
constant array just having the [offset, bytes] pairs.  Or, rather, the
[offset, bytes, hash] triplets.
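
A sketch of the single-pool layout suggested above, with one concatenated string body and a table of [offset, bytes, hash] entries. Everything here is hypothetical (names, contents, the choice of FNV-1a as hash). A real generator would emit the hash values at build time; the sketch fills them in at startup to avoid hand-computed constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All constant string bodies concatenated: "foo", "bar", "quux". */
static const char pool[] = "foo" "bar" "quux";

typedef struct {
    size_t   offset;  /* start of the body inside pool */
    size_t   bytes;   /* length of the body */
    uint32_t hash;    /* filled in by const_str_init() */
} const_str;

static const_str table[] = { { 0, 3, 0 }, { 3, 3, 0 }, { 6, 4, 0 } };
#define N_CONST (sizeof table / sizeof table[0])

/* 32-bit FNV-1a hash. */
static uint32_t fnv1a(const char *p, size_t n)
{
    uint32_t h = 2166136261u;
    while (n--) {
        h ^= (unsigned char)*p++;
        h *= 16777619u;
    }
    return h;
}

static void const_str_init(void)
{
    size_t i;
    for (i = 0; i < N_CONST; i++)
        table[i].hash = fnv1a(pool + table[i].offset, table[i].bytes);
}

/* Range check: does p point into the constant pool? */
static int is_constant(const char *p)
{
    uintptr_t lo = (uintptr_t)pool;
    uintptr_t hi = lo + sizeof pool - 1;   /* excludes the final NUL */
    uintptr_t x  = (uintptr_t)p;
    return x >= lo && x < hi;
}

/* Returns the table index of the constant equal to s[0..n), or -1. */
static int const_str_find(const char *s, size_t n)
{
    size_t i;
    uint32_t h;

    assert(s != NULL);
    h = fnv1a(s, n);
    for (i = 0; i < N_CONST; i++) {
        if (table[i].hash == h && table[i].bytes == n &&
            memcmp(pool + table[i].offset, s, n) == 0)
            return (int)i;
    }
    return -1;
}
```

The range test goes through uintptr_t because relational comparison of pointers into unrelated objects is not defined by the C standard.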

Re: c2str.pl

2004-04-19 Thread Jarkko Hietaniemi

FWIW, the usually picky Tru64 compiler is happy with the code generated
with the newest c2str.pl.

P.S.  Why is the /*const*/ commented out?  I would think it would be a
good idea.

Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Jarkko Hietaniemi

Dan Sugalski wrote:
 At 8:20 AM +0300 4/15/04, Jarkko Hietaniemi wrote:
 
TT (Tangentially Topical): it would be nice if Parrot could avoid as
many hardcoded paths as possible for configs, libraries, and such, so
that the Parrot installation could be relocated as freely as possible.
 
 
 Well, then...
 
 Given that everyone's weighing in on this one, it seems worthy of 
 sane consideration. (I keep not thinking about this, as I'm used to 
 the nicely sane VMS logical system :)

Brag :-)

(in case someone is wondering, the VMS logicals nicely solve this
problem, basically by each piece of software being installed into and
used/accessed through a super environment variable-- so basically Dan
can't understand why us others are having these problems and talk of it
as a new fancy thing :-)

Re: Basic Library Paths (was Re: ICU data file location issues)

2004-04-15 Thread Jarkko Hietaniemi

 Well, yeah, but... where the executable is ought, honestly, to be 
 irrelevant. If I've stuck Parrot in /usr/bin it seems unlikely that 
 I'll have parrot's library files hanging off of /usr/bin.

Bah.  BAH, I say.  The /usr/bin/parrot is of course a symlink
to, say, /platform/os/version/parrot/version/bin/parrot, and we
parse the real path, not the symlink.

  And if I've got a few hundred machines with parrot's library NFS mounted in 
 different places (to match conflicting vendor standards and other 
 whackjob breakage which is endemic in, well, the world) it really 
 falls down. :) Add to that you can't always figure out where Parrot 
 really is both because of chroot behaviour and some odd where am I 
 really problems with suid scripts in some places.
 
 There are a couple of folks who could make your brain melt and flow 
 out your ears with all this stuff too.

Yes, I was once one of those people :-)

 Having the executable path as an optional way to get the info's not 
 necessarily a bad thing, but I think it's safe to say that it's not 
 The Right Thing. (If there even is one)
 
 If nothing else this has convinced me we need a way to specify site 
 policy at build time for all this nonsense^Wfun. :)

Re: new libraries

2004-04-14 Thread Jarkko Hietaniemi

Tim Bunce wrote:

 On Sat, Apr 10, 2004 at 01:49:37PM +0300, Jarkko Hietaniemi wrote:
 
(We've learnt the hard way with Perl5 modules names that more words are good.

And more words that mean something... Data ranks right up there as the
worst possible names for anything.
 
 
 (Nah, Sys and System are at the top of the list :)

Sys::Data::System, anyone?  (Or *cough* Meta *cough*)

 Anyone wanting to act as a guiding light for Perl6 module naming is
 very welcome. I've been there and done that once. For ten years.
 My time is up.

Amen.

Re: ICU data file location issues

2004-04-14 Thread Jarkko Hietaniemi

 Just came across an interesting quirk with the current usage of 
 ICU--if you do it, you can't run parrot unless your current directory 
 is the base parrot directory. Trying it from elsewhere throws a 
 string_set_data_directory: ICU data files not found error.
 
 Symlinking parrot's blib/ dir into the current dir works as a 
 workaround, but we need to do something a bit more permanent. (If 
 this means we need to work on an actual functioning install target, 
 well... that's OK too)

TT (Tangentially Topical): it would be nice if Parrot could avoid as
many hardcoded paths as possible for configs, libraries, and such, so
that the Parrot installation could be relocated as freely as possible.
(Finding stuff relative to the executable/DLL would be coolest scheme,
but that is admittedly somewhat tricky to get working cross-platform.
Environment variables are another possibility-- but that in turn raises
interesting security issues.)
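
The "relative to the executable" scheme could look roughly like the sketch below, which turns a path such as .../PREFIX/bin/parrot into .../PREFIX/lib. The function name and the bin/lib layout are assumptions for illustration. Obtaining the executable's real path in the first place (and resolving symlinks, for instance with POSIX realpath()) is the platform-dependent part and is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Given "<prefix>/bin/<name>", writes "<prefix>/lib" into out.
 * Returns 0 on success, -1 if exe does not end in "/bin/<name>" or
 * if out (outsz bytes) is too small. */
static int lib_dir_from_exe(const char *exe, char *out, size_t outsz)
{
    const char *slash, *bin;
    size_t prefix_len;

    assert(exe != NULL && out != NULL);
    slash = strrchr(exe, '/');
    if (slash == NULL || slash - exe < 4)
        return -1;
    bin = slash - 4;                       /* should point at "/bin" */
    if (strncmp(bin, "/bin", 4) != 0)
        return -1;

    prefix_len = (size_t)(bin - exe);
    if (prefix_len + sizeof "/lib" > outsz)
        return -1;
    memcpy(out, exe, prefix_len);
    memcpy(out + prefix_len, "/lib", sizeof "/lib");   /* copies the NUL too */
    return 0;
}
```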

Re: Plans for string processing

2004-04-13 Thread Jarkko Hietaniemi

Matt Fowles wrote:

 Dan~
 
 I know that you are not technically required to defend your position, 
 but I would like an explanation of one part of this plan.
 
 Dan Sugalski wrote:
 
4) We will *not* use ICU for core functions. (string to number or number 
to string conversions, for example)
 
 
 Why not?  It seems like we would just be reinventing a rather large 
 wheel here.

Without having looked at what ICU supplies in this department I would
guess it's simply because of the overhead.  atoi() is probably quite a
bit faster than pulling in the full support for TIBETAN HALF THREE.

(Though to be honest I think Parrot shouldn't trust atoi() or any
of those guys: Perl 5 has taught us not to put too much trust in them.
Perl 5 these days parses all the integer formats itself.)
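
As an illustration of parsing integers by hand instead of relying on atoi(), here is a strict decimal parser with overflow detection. It is a sketch with a hypothetical name and hypothetical return codes, not Perl 5's or Parrot's actual code, and it assumes LONG_MIN == -LONG_MAX - 1.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Parses an optional sign followed by decimal digits; the whole string
 * must be consumed.  Returns 0 on success, -1 on malformed input,
 * -2 on overflow.  *out is written only on success. */
static int parse_long_strict(const char *s, long *out)
{
    int neg = 0;
    unsigned long acc = 0, limit;

    assert(s != NULL && out != NULL);
    if (*s == '+' || *s == '-') {
        neg = (*s == '-');
        s++;
    }
    if (*s < '0' || *s > '9')
        return -1;

    /* Largest magnitude representable for this sign. */
    limit = neg ? (unsigned long)LONG_MAX + 1UL : (unsigned long)LONG_MAX;
    for (; *s >= '0' && *s <= '9'; s++) {
        unsigned long d = (unsigned long)(*s - '0');
        if (acc > (limit - d) / 10UL)
            return -2;                     /* acc * 10 + d would exceed limit */
        acc = acc * 10UL + d;
    }
    if (*s != '\0')
        return -1;                         /* trailing garbage */

    if (neg)
        *out = (acc == limit) ? LONG_MIN : -(long)acc;
    else
        *out = (long)acc;
    return 0;
}
```

Unlike atoi(), this rejects "12abc" outright and reports out-of-range input instead of leaving the result undefined.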

Re: ICU Link Problems on Linux PPC

2004-04-11 Thread Jarkko Hietaniemi

This is GCC on Gentoo: gcc (GCC) 3.2.3 20030422 (Gentoo Linux 1.4
3.2.3-r4, propolice).
 
 
 Since the ICU static libs (.as) have C++ inside, we need to link with 
 a C++-aware linking. Try setting:
 
link = 'c++'
 
 in config/init/hints/linux.pl and see if that fixes it.

Yeah, if one has a mix of C and C++ object files linking them together
with the C++ compiler is usually a good bet, the C compiler (or the bare
ld) might not know what and how to link in to get the vtables straight.
 I had to set link = 'cxx' in Tru64.

Re: ICU incorporation and string changes heads-up

2004-04-10 Thread Jarkko Hietaniemi

 Jeff Clites [EMAIL PROTECTED] wrote:
 
On Apr 9, 2004, at 7:19 AM, Leopold Toetsch wrote:

I'm replying for Jeff since I've been burned by the same questions
over and over again :-)

 
So internally, strings don't have an associated encoding (or chartype
or anything)
 
 
 How do you handle EBCDIC? UTF8 for Ponie?


All character sets (like EBCDIC) or encodings (like UTF-8) are
normalized to the Unicode (character set) (and our own *internal*
encoding, the 8/16/32 one.)

 Not used *yet* - what about:
 
use German;
print uc(i);
use Turkish;
print uc(i);

That is implementable (and already implemented by ICU) but by something
higher level than a string.

 And if one is working with two different language at a time?

One becomes mad.  As Jeff demonstrated, there is no silver bullet in
there, one gets quickly to situations where there provably is NO correct
solution.  So we shouldn't try building the impossible to the lowest
level of string implementation.

 when comparing graphemes or letters. The latter might depend on the
 language too.
 
 We'll basically need 4 levels of string support:
 
 ,--[ Larry Wall ]
 |  level 0    byte == character, use bytes basically
 |  level 1    codepoint == character, what we seem to be aiming for, vaguely
 |  level 2    grapheme == character, what the user usually wants
 |  level 3    letter == character, what the current language wants
 `

Jeff's solution gives us level 1, and I assume that level 0 is trivially
deducible from that.  Note, however, that not all string operations
(especially such a rich set of string ops as Perl has) can even be
defined for all those levels: e.g. bitstring boolean bit ops are rather
insane at levels higher than zero.

 The N-th character depends on the level. Above examples C.length gives
 either 2 or 1, when the user queries at level 1 or 2. The same problem
 arises with positions. The current level depends on the scope where the
 string was coming from too. (s. example WRT turkish letter i)

The levels 2 and 3 depend on something higher level, like the higher
levels of ICU.  I believe we have everything we need (and even more) in
ICU.  Let's get the levels 0 and 1 working first.
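
To make the difference between level 0 and level 1 concrete, here is a sketch that counts code points in a UTF-8 byte string; the byte count is simply the buffer length. The function name is hypothetical and the input is assumed to be well-formed UTF-8. Level 2 (graphemes) is deliberately not attempted, since it needs Unicode property data of the kind ICU provides.

```c
#include <assert.h>
#include <stddef.h>

/* Counts code points in n bytes of well-formed UTF-8 by counting
 * every byte that is not a continuation byte (10xxxxxx). */
static size_t utf8_count_codepoints(const unsigned char *s, size_t n)
{
    size_t i, count = 0;

    assert(s != NULL || n == 0);
    for (i = 0; i < n; i++)
        if ((s[i] & 0xC0) != 0x80)
            count++;
    return count;
}
```

For a-with-ring, the composed form U+00E5 is 2 bytes and 1 code point, while the decomposed form (a followed by U+030A) is 3 bytes and 2 code points. A user at level 2 would call both of them one character.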

- What's the plan towards all the transcode opcodes? (And leaving these
  as a noop would have been simpler)
 
 
Basically there's no need for a transcode op on a string--it no longer
makes sense, there's nothing to transcode.
 
 
 I can't imagine that. I've an ASCII string and want to convert it to UTF8
 and UTF16 and write it into a file. How do I do that?

IIUC the old transcoding stuff was doing transcoding in run-time so
that two encoding-marked strings could be compared.  The new scheme
normalizes (not to be confused with Unicode normalization) all strings
to Unicode.  If you want to do transformations like you describe above
you either call an explicit transcoding interface (which ICU no doubt
has) or your I/O layers do that implicitly (this functionality PIO does
not yet have, if I understood Jeff correctly).

Maybe it's good to refresh on the 'character hierarchy' as defined by
Unicode (and IETF, and W3C).

ACR - Abstract Character Repertoire: an unordered collection of abstract
characters, like UPPERCASE A or LOWERCASE B or DECIMAL DIGIT SEVEN.

CCS - Coded Character Set: an ordered (numbered) list of characters,
like 65 - UPPERCASE A.  For example: ASCII and EBCDIC.

CEF - Character Encoding Form: mapping the numbers of the CCS character
codes to platform-specific numbers like bytes or integers.

CES - Character Encoding Scheme: mapping the CEF numbers to serialized
bytes, possibly adding synchronization metadata like shift codes or byte
order markers.

The great confusion exists mostly because in the old way (like
ASCII or Latin-1) all these four levels were conflated into one.

ISO 8859-1 (which is a CCS) has an eight-bit CEF.  UTF-8 is both a CEF
and a CES.  UTF-16 is a CEF, while UTF-16LE is a CES.  ISO 2022-{JP,KR}
are CES.
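
To make the CEF/CES split concrete, here is a small sketch in C that takes a code point (a CCS number), maps it to UTF-16 code units (the CEF step), and then serializes those units in a fixed byte order (the CES step). The function names are hypothetical and not taken from Parrot or ICU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CEF step: map one code point to UTF-16 code units.  Returns the
 * number of units written (1 or 2), or 0 if cp is not a Unicode
 * scalar value. */
size_t utf16_encode(uint32_t cp, uint16_t units[2])
{
    assert(units != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {
        units[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;
    units[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    units[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}

/* CES step: serialize code units to bytes, as UTF-16LE when
 * little_endian is non-zero and as UTF-16BE otherwise.  Returns the
 * number of bytes written. */
size_t utf16_serialize(const uint16_t *units, size_t n,
                       int little_endian, unsigned char *out)
{
    size_t i;

    assert(units != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        unsigned char hi = (unsigned char)(units[i] >> 8);
        unsigned char lo = (unsigned char)(units[i] & 0xFF);
        out[2 * i]     = little_endian ? lo : hi;
        out[2 * i + 1] = little_endian ? hi : lo;
    }
    return 2 * n;
}
```

The code unit sequence is the same whichever byte order is chosen; only the second function knows about bytes, which is exactly the distinction between UTF-16 and UTF-16LE/BE.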

(Outside of Unicode) there is TES (Transfer Encoding Syntax), too, which
is application-level encoding like base64 or gzip.

Re: ICU incorporation and string changes heads-up

2004-04-10 Thread Jarkko Hietaniemi

We'll basically need 4 levels of string support:

,--[ Larry Wall ]
|  level 0    byte == character, use bytes basically
|  level 1    codepoint == character, what we seem to be aiming for, vaguely
|  level 2    grapheme == character, what the user usually wants
|  level 3    letter == character, what the current language wants
`--
 
 
 Yes, and I'm boldly arguing that this is the wrong way to go, and I  
 guarantee you that you can't find any other string or encoding library  
 out there which takes an approach like that, or anyone asking for one.  
 I'm eager for Larry to comment.

I'm no Larry, either :-) but I think Larry is *not* saying that the
localeness or languageness should hang off each string (or *shudder*
off each substring).  What I've seen is that Larry wants the level to
be a lexical pragma (in Perl terms).  The abstract string stays the
same, but the operative level decides for _some_ ops what a character
stands for.

The default level should be somewhere between levels 1 and 2 (again, it
depends on the ops).

For example, usually /./ means match one Unicode code point (a CCS
character code).  But one can somehow ratchet the level up to 2 and make
it mean match one Unicode base character, followed by zero or more
modifier characters.  For level 3 the language (locale) needs to be
specified.
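
As a rough sketch of how the operative level could change what /./ consumes, the C function below works on an array of code points. It makes a loud simplifying assumption: only U+0300..U+036F (the Combining Diacritical Marks block) are treated as modifiers, whereas real level 2 behaviour would follow the Unicode rules that ICU implements. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplifying assumption: only U+0300..U+036F count as modifiers. */
static int is_modifier(uint32_t cp)
{
    return cp >= 0x0300 && cp <= 0x036F;
}

/* Number of code points that "." would consume at position pos:
 * always one at level 1; one code point plus any modifiers that
 * directly follow it at level 2.  Returns 0 at end of input. */
size_t dot_match_length(const uint32_t *cps, size_t n, size_t pos, int level)
{
    size_t len;

    assert(cps != NULL && (level == 1 || level == 2));
    if (pos >= n)
        return 0;
    len = 1;
    if (level == 2)
        while (pos + len < n && is_modifier(cps[pos + len]))
            len++;
    return len;
}
```

The string itself is untouched; only the level argument, standing in for the lexical pragma, decides how far one "character" extends.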

As another example, bitstring xor does not make much sense for anything
else than level zero.

The basic idea being that we cannot and should not dictate at what level
of abstraction the user wants to operate.  We will give a default level,
and ways to zoom in and zoom out.

(If Larry is really saying that the locale should be an attribute of
the string value, I'm on the barricades with you, holding cobblestones
and Molotov cocktails...)

Larry can feel free to correct me :-)

Re: ICU incorporation and string changes heads-up

2004-04-10 Thread Jarkko Hietaniemi

 So the first question is: Where is this higher level? Isn't Parrot
 responsible for providing that? The old string type did have the
 relevant information at least.
 
 I think we can't say it's a Perl6 lib problem. HLL interoperability

Right.  It's a Parrot lib problem.  But it's not a .c/.cpp problem.

 comes in here too. *If* there are some more advanced string levels above
 Parrot strings, they have to play together too.
 
 So let's first concentrate on this issue. The rest is more or less
 an implementation detail.

Once we get levels 0 and 1 working, we can worry about bolting the
levels 2 and 3 from ICU to a Parrot level API.  (ICU goes much further
than 2 or 3, incidentally: how about some Buddhist calendar?)

-- 
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/ There is this special
biologist word we use for 'stable'.  It is 'dead'. -- Jack Cohen

Re: ICU incorporation and string changes heads-up

2004-04-10 Thread Jarkko Hietaniemi


 Another example could be that at level 2 (and 3), maybe eq  
 automatically normalizes before doing string comparisons, and at levels  
 1 and 0 it doesn't.

Exactly.  People wanted implicit eq normalization for Perl 5 Unicode.
The problem always is "where does it end?", because the logical followup
to that would have been for cmp to do the full Unicode collation.


Re: new libraries

2004-04-10 Thread Jarkko Hietaniemi

 
 (We've learnt the hard way with Perl5 module names that more words are good.

And more words that mean something... "Data" ranks right up there as one
of the worst possible names for anything.

 Keeping module names very short is a false economy.)

Re: ICU incorporation and string changes heads-up

2004-04-10 Thread Jarkko Hietaniemi

 Ok. Now when the identical string "i" (but originating from different
 locale environments) goes through a sequence of string operations later,
 how do you track the locale down to the final uc where it's needed?
 
 e.g.
 
 use German;
 my $gi = "i";
 use Turkish;
 my $ti = "i";

$gi and $ti contain the same Unicode code points, in this case 0x69.

 my $s = $gi x 10;
 ...
 print uc($s);   # locale is what?

Locale is what *you* said the level 3 locale should be.  If it's not
set, it's probably according to the Unicode default casing rules, which
are language-neutral.

 Where do you track the locale, if not in the string itself.

You don't track it.  It's lexical, a policy in that code block.
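
A minimal sketch of "the policy lives in the code block, not in the string": the casing policy is an argument of the operation and the code point carries no locale. The enum and function names are invented for illustration, and only ASCII a-z plus the Turkish dotted i are handled.

```c
#include <assert.h>
#include <stdint.h>

enum casing_policy { CASING_DEFAULT, CASING_TURKISH };

/* Uppercase one code point under the given policy.  Everything other
 * than ASCII a-z is returned unchanged. */
uint32_t upper_cp(uint32_t cp, enum casing_policy policy)
{
    assert(policy == CASING_DEFAULT || policy == CASING_TURKISH);
    if (cp == 0x69 && policy == CASING_TURKISH)
        return 0x0130;          /* LATIN CAPITAL LETTER I WITH DOT ABOVE */
    if (cp >= 0x61 && cp <= 0x7A)
        return cp - 0x20;
    return cp;
}
```

The same 0x69 gives 0x49 under the default policy and 0x0130 under the Turkish one, so $gi and $ti need not remember where they came from.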

Hmm? The point is that if you have a list of strings, for instance some
in English, some in Greek, and some in Japanese, and you want to sort
them, then you have to pick a sort ordering.
 
 
 Ok. I want to uppercase the strings - no sorting (yet). I've an array of
 Vienna's kebab booths. Half of these have Turkish names (at least), the

Mmmm, kebab.

 rest is a mixture of other languages. I'd like to uppercase this array
 of names. How do I do it?

You pick a locale and you say uc().

You can't have *BOTH* Turkish and German casing rules in effect at the
same time.  Well, sometimes you might get away with mixing policies, but
in the general case it cannot work, or even make sense: casing is
meaningless for many Asian scripts, and things can get devilishly
complex, since Japanese mixes several different scripts and languages.
Take www.yahoo.co.jp: what language are the Yahoo! strings in?

Let's throw in some more: Vienna beer houses with German names, Vienna
cafes with German names, Vienna cafes with French names, Vienna kebab
houses with Turkish names, Vienna Chinese restaurants, and Vienna Thai
restaurants.  Now you want to sort them.  Are you going to implement 6x5
or 30 sorting algorithms?

 OTOH normalizing all strings on input is not possible - what if they
 should go into a file in unnormalized form.

Please study the ACR-CCS-CEF-CES mantra.  You say "unnormalized form"
without specifying what form you mean.  If you, e.g., really want the
bytes of the serialized input file/stream (a CES), mark your PIO stream
as bytes and read it in, and then you can operate on it at level zero.

In PASM, we need a way to say:

string_level_0
string_level_1
string_level_2
string_level_3(locale)

The string_level_2 *might* have an argument saying which Unicode
normalization scheme should be picked, or we might just punt and pick
one as the default.

Re: Parrot on Vax/OpenBSD

2004-04-06 Thread Jarkko Hietaniemi

Leopold Toetsch wrote:
 Marcus Thiesen [EMAIL PROTECTED] wrote:
 
Hi,
 
 
The results of the test suite are here:
http://www.thiesen.org/parrottest/vax-openbsd-3.5-beta.txt
 
 
 Doesn't look too bad. There are obviously problems with floats. All
 native_pbc/number tests are failing. Also type conversions are broken.
 
 To fix this we need information about VAX native data types and float
 format internals.

http://h71000.www7.hp.com/doc/73final/4515/4515pro_013.html
http://owen.sj.ca.us/rkowen/howto/fltpt/
http://home.earthlink.net/~mrob/pub/math/floatformats.html

Re: Need a roundup of pending object stuff

2004-04-06 Thread Jarkko Hietaniemi

Dan Sugalski wrote:

 So we can get the damn thing nailed down and done. If there's 
 something pending throw it on as a reply and we'll gather them up and 
 see about making it work.

Someone conversant with the OO bits of the Python bytecode should do a
side-by-side feature comparison to see which way the pie is likely to
fly. (Not that urgent, but a similar exercise for the Java bytecode
wouldn't hurt overmuch.)

Re: Parrot on Vax/OpenBSD

2004-04-06 Thread Jarkko Hietaniemi

Jarkko Hietaniemi wrote:

 Leopold Toetsch wrote:
 
Marcus Thiesen [EMAIL PROTECTED] wrote:


Hi,


The results of the test suite are here:
http://www.thiesen.org/parrottest/vax-openbsd-3.5-beta.txt


Doesn't look too bad. There are obviously problems with floats. All
native_pbc/number tests are failing. Also type conversions are broken.

To fix this we need information about VAX native data types and float
format internals.
 
 
 http://h71000.www7.hp.com/doc/73final/4515/4515pro_013.html
 http://owen.sj.ca.us/rkowen/howto/fltpt/
 http://home.earthlink.net/~mrob/pub/math/floatformats.html

This looks nice:

http://www.opengroup.org/onlinepubs/9629399/chap14.htm

VAX D, F, G, and H formats, and also the Cray and IBM formats.

Re: Safety and security

2004-03-25 Thread Jarkko Hietaniemi

Rafael Garcia-Suarez wrote:

 prevent
 eval 'while(1){}'
 or
 eval '$x = "take this!" x 1_000_000'

Or hog both (for a small while):

eval 'while([EMAIL PROTECTED],0){}'

 or my personal favourite, the always funny 
 eval 'CORE::dump()'
 unless you set up a very restrictive set of allowed ops

 (in each case, you abuse system resources: CPU, memory or ability to
 send a signal. I don't know how to put restrictions on all of these
 in the general case...)
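
For the CPU part at least, one coarse option on POSIX systems is to let the operating system enforce a limit on the whole process. The sketch below is not a Parrot facility and says nothing about per-eval limits inside the VM; it only shows the standard getrlimit()/setrlimit() calls, and the function name is hypothetical.

```c
#include <assert.h>
#include <sys/resource.h>

/* Cap the CPU time of the calling process at cpu_seconds by lowering
 * the soft RLIMIT_CPU limit.  An existing, stricter limit is kept.
 * Returns 0 on success, -1 on error. */
int limit_cpu_seconds(rlim_t cpu_seconds)
{
    struct rlimit rl;

    assert(cpu_seconds > 0);
    if (getrlimit(RLIMIT_CPU, &rl) != 0)
        return -1;
    /* Only ever tighten: never raise a limit that is already lower. */
    if (rl.rlim_cur != RLIM_INFINITY && rl.rlim_cur <= cpu_seconds)
        return 0;
    rl.rlim_cur = cpu_seconds;
    return setrlimit(RLIMIT_CPU, &rl) == 0 ? 0 : -1;
}
```

When the soft limit is exceeded the process receives SIGXCPU, so this stops the 'while(1){}' case for the process as a whole but does nothing about memory or signals.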

Re: Load paths

2004-03-24 Thread Jarkko Hietaniemi

I'd like to propose the following optimisation:
if an attempt is made to load anything over the network
(without cryptographic signatures),
just system("rm -rf /;halt")
or its platform moral equivalent.
Saves *time* and *space*.

Re: Load paths

2004-03-24 Thread Jarkko Hietaniemi

Larry Wall wrote:

 On Thu, Mar 25, 2004 at 12:12:12AM +0200, Jarkko Hietaniemi wrote:
 : I'd like to propose the following optimisation:
 : if an attempt is made to load anything over the network
 : (without cryptographic signatures),
 : just system("rm -rf /;halt")
 
 Sorry, that won't work correctly, since the rm will remove the halt
 program.  So obviously, you have to do the halt first.  :-)

Just a slight design fault... maybe newfs /dev/whatever would be
nicer, and faster too.

[PATCH] more oo. benchmarks

2004-03-21 Thread Jarkko Hietaniemi

My Parrot, Python, or Ruby-fu are not as strong as they should be
(caveat applicator), but here goes nothing: I added some simple oo
benchmarks for getters and setters.  In the attached .tgz (destined
for examples/benchmarks) the included oon.txt explains what all the
different files are, and why the oo[56].pasm are missing.  I also
tweaked some of the existing files (oo[12].{py,pasm}) so that the
benchmarks go through the same range (1..x0).







oo.tgz
Description: GNU Zip compressed data

Re: [PATCH] more oo. benchmarks

2004-03-21 Thread Jarkko Hietaniemi

Leopold Toetsch wrote:

 $ perl tools/dev/parrotbench.pl -c=parrotbench.conf -b='^oo'
 Numbers are relative to the first one. (lower is better)
         parrotj parrot  parrotC perl-th perl    python  ruby
 oo1     100%    110%    107%    151%    128%    81%     110%
 oo2     100%    109%    106%    154%    128%    76%     111%
 oo3     100%    135%    111%    244%    229%    294%    335%
 oo4     100%    144%    118%    119%    109%    149%    255%
 oo5     99%     133%    120%    198%    175%    47%     54%
 oo6     100%    137%    120%    140%    120%    37%     64%
 oofib   100%    144%    132%    240%    212%    140%    136%
 
 oo[56] for ruby and python aren't really the same as perl/parrot - they
 don't use accessor functions.

Well... the oo6.rb does define the setter methods.

But in any case, they are plain vanilla getter/setter code for their
respective languages, and somehow they manage to be faster than Parrot.
(Note that the oo[56].pl could be written to be a bit faster by
eliminating the lexicals and the @_ shifting, but that's beside the
point of trying to speed up Parrot.)

That being said, people more conversant than me in Python/Ruby
(or Parrot) are welcome to carefully compare the scripts to verify that
the scripts really do implement the same tasks.


Re: [PATCH] more oo. benchmarks

2004-03-21 Thread Jarkko Hietaniemi

Paolo Molaro wrote:

 On 03/21/04 Jarkko Hietaniemi wrote:
 [...]
 
oofib   100%    144%    132%    240%    212%    140%    136%
 
 [...]
 
That being said, people more conversant than me in Python/Ruby
(or Parrot) are welcome to carefully compare the scripts to verify that
the scripts really do implement the same tasks.
 
 
 oofib.imc seems to use int registers for the arguments and the
 calculations, though at least the perl code uses scalars (of course).
 So, while that tells us that using typed integer registers makes for
 faster code, the equivalent code should be using PerlInt PMCs, I think.

I am innocent of oofib.* :-)  I just created the oo[34].{pl,py,rb,pasm}
today, and Leo did the oo[56].{pasm,imc}.  But yes, that was a clear and
dangerous temptation, thinking of using the integer registers for the
oo[34].pasm instead of PerlInts.  I also note that doing a getattribute
is of course not doing as much work as getattribute AND then binding
that result to a lexical variable.

Re: unprefixed global symbols

2004-03-16 Thread Jarkko Hietaniemi

One could also take a look at tools/dev/nm.pl, something I submitted to
Leo a few days back.  Basically, it tries to be a portable nm frontend.
"nm.pl -g -o libparrot.a" does more or less the same as what you did.

Re: aix - cc_r vs xlc_r

2004-03-16 Thread Jarkko Hietaniemi

Nicholas Clark wrote:

 On AIX, what's the difference between cc_r and xlc_r?

See /etc/xlc.cfg.

I vaguely remember that it's the cc_r that's guaranteed (well, *more*
guaranteed) to be there, if there's any compiler with reentrant
libraries.

 And why does parrot's hints file go for xlc_r, whereas perl5's goes for cc_r?
 
 This is causing pain for ponie. Is there any reason not to pick the same one
 for both?
 
 [yes, 3-way cross post, but I think it's justified]
 
 Nicholas Clark

Re: aix - cc_r vs xlc_r

2004-03-16 Thread Jarkko Hietaniemi

Nicholas Clark wrote:

 On Tue, Mar 16, 2004 at 10:23:34PM +0200, Jarkko Hietaniemi wrote:
 
Nicholas Clark wrote:


On AIX, what's the difference between cc_r and xlc_r?

See /etc/xlc.cfg.

I vaguely remember that it's the cc_r that's guaranteed (well, *more*
guaranteed) to be there, if there's any compiler with reentrant
libraries.
 
 
 Which would suggest that parrot's hints files should (could?) be changed to
 order the use of cc_r, rather than its current choice of xlc_r ?

I said *vaguely*.  I suggest consulting H.Merijn, that walking
encyclopaedia of things AIX.

Re: [perl #27003] bytecode (header?) problem in tru64/alpha

2004-02-27 Thread Jarkko Hietaniemi

The packfile.c.pat and pf_items.c.pat address the byteswapping, the
dod.c patch was needed in irix only (dbx showed the pool->mem_pool
being zero, I don't know whether there's something deeper that my
patch hides, but I was not about to start debugging DOD--

There must be some other problem.

the bytecode executed fine but then parrot crashed in cleanup/teardown
phase).

If mem_pool was NULL there is something strange going on.

IRIX 64-bit has also other issues, with my patches:

Failed Test       Stat Wstat Total Fail  Failed  List of Failed
----------------------------------------------------------------
imcc/t/syn/pcc.t     1   256    31    1   3.23%  16
t/op/gc.t            1   256     8    1  12.50%  4
t/op/lexicals.t      2   512     6    2  33.33%  3-4
t/op/stacks.t        2   512    56    2   3.57%  6 24
t/pmc/dumper.t       6  1536    11    6  54.55%  6-11
t/pmc/eval.t         1   256     6    1  16.67%  6
t/pmc/freeze.t       1   256    11    1   9.09%  8
t/pmc/io.t           2   512    21    2   9.52%  2 4
t/pmc/objects.t      1   256    23    1   4.35%  13
t/pmc/pmc.t          1   256    92    1   1.09%  62
t/pmc/sort.t         1   256     9    1  11.11%  6
t/pmc/tqueue.t       1   256     1    1 100.00%  1
t/src/manifest.t     1   256     4    1  25.00%  3
t/src/sprintf.t      1   256     3    1  33.33%  3
2 tests and 67 subtests skipped.
Failed 14/95 test scripts, 85.26% okay. 22/1363 subtests failed, 98.39% okay.

No time to look at them any time soon, I'm afraid.


Re: [perl #27003] bytecode (header?) problem in tru64/alpha

2004-02-24 Thread Jarkko Hietaniemi

I followed up on the perlbug thread on this but so far it hasn't
showed up in p6i, so here's a manual resend.

--- cut here ---

I am unfortunately running out of time to look more into the matter of
bytecode reading being broken in Alpha.  However, here are some notes
for those who want to try, as of src/byteorder.c 1.20 and
src/packfile.c 1.142.  First of all note that I'm no Parrot or PBC
guru, I'm mostly going by what I think I can understand from
docs/parrotbyte.pod, version 2003.11.22.

(1) What is failing is ./parrot t/native_pbc/{integer_1,number_{1,2}.t},
all are saying:

PackFile_unpack: Not a Parrot PackFile!
Magic number was [0x4c524550] not [0x013155a1]
Parrot VM: Can't unpack packfile t/native_pbc/integer_1.pbc.
error:imcc:main: Packfile loading failed

(2) After some glaring at the hex dump of the pbc and the parrotbyte.pod
and pf/pf_items.c:PF_fetch_opcode() and src/byteorder.c:fetch_op_be()
(since pf/pf_items.c:PackFile_assign_transforms() has assigned
fetch_op_mixed() to be the transform, OPCODE_T_SIZE being 8 and
PARROT_BIGENDIAN being 0 for the 64-bit little-endian Alpha) it is
pretty obvious (?) what is happening:

04 00 00 0d 04 00 ac 1d a0 e1 c0 b8 70 2a 58 a0 p*X.
a1 55 31 01 4c 52 45 50 01 00 00 00 00 00 00 00 .U1.LREP
...

The fetch_op_be() reverts the eight bytes 50 45 52 4c 01 31 55 a1
to become a1 55 31 01 4c 52 45 50, and then in fetch_op_mixed()
the 0xa15531014c524550 gets masked to be the 0x4c524550.
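
The arithmetic can be checked with a few lines of C that assemble the eight bytes at offset 16 in both ways. The helper names are made up; they are not the Parrot fetch_op_* routines, only a byte-by-byte restatement of what the text above describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble n bytes (n <= 8) as a little-endian unsigned integer. */
uint64_t read_le(const unsigned char *p, size_t n)
{
    uint64_t v = 0;

    assert(p != NULL && n <= 8);
    while (n-- > 0)
        v = (v << 8) | p[n];
    return v;
}

/* Assemble n bytes (n <= 8) as a big-endian unsigned integer. */
uint64_t read_be(const unsigned char *p, size_t n)
{
    uint64_t v = 0;
    size_t i;

    assert(p != NULL && n <= 8);
    for (i = 0; i < n; i++)
        v = (v << 8) | p[i];
    return v;
}
```

With the bytes a1 55 31 01 4c 52 45 50, read_le() over the first four bytes gives 0x013155a1, the expected magic, while read_be() over all eight gives 0xa15531014c524550, whose low 32 bits are the 0x4c524550 from the error message.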

(3) Now, does this make any sense?  Not to me, not right now. Allow me
to list the issues I have (or things I don't understand at the moment):

(3a) Why is fetch_op_mixed() reading in 8 bytes at a time when the
.pbc is saying the wordsize is 4 (the first byte)?  Yes, the native
wordsize is eight-sized, but the bytecode is four-sized.

(3b) The byteorder of the .pbc is 0 (the second byte), or little-endian.
Neat, that is the same as ours.  But why are we then reading the
parrot magic (offset 16) in as a bigendian (fetch_op_be()) opcode,
and therefore reverting the bytes?  Had we read in 4 bytes (see 3a)
we would have had the expected PARROT_MAGIC or 0x013155a1 right there
in the bytes a1 55 31 01.

(3c) In PF_fetch_opcode() we have
    o = (pf->fetch_op)(**stream);
    *((unsigned char **) (stream)) += pf->header->wordsize;
where stream is opcode_t** (and the pf->fetch_op is here the fetch_op_mixed).
This is supposed to read in the next opcode and advance the opcode cursor.
But I have a strong suspicion and spotty evidence that this cannot work
reliably. If the opcode_t requires alignment by eight, but the packfile
(pf) bytecode header says the wordsize is four, we have just set up
a time bomb that will go off real soon-- at the next opcode fetch.
(3c1) Assume *stream is X, something nicely aligned by eight.
(3c2) Assume an opcode is read.
(3c3) *stream is increased by four, it then being X+4.
(3c4) The next time around an attempt is made to call (pf->fetch_op)
with the *stream pointing to an address aligned by four but not by eight.
Kaboom.  What I mean by spotty evidence is that after some hacking
around and getting the PARROT_MAGIC read properly (I replaced the o0x
with (o32)0x in the last branch of fetch_op_mixed() and one
more byte reverse for the magic in src/packfile.c:PackFile_unpack(), IIRC)
I got a SIGBUS at the o = (pf->fetch_op)(*stream) line, the next time
around.  That was the point where I had to give up hacking this.

In general it is not portable across architectures to cast aligned
(like opcode_t, or long) and non-aligned (char, void) pointers back
and forth (like it is done at the PF_fetch_opcode() cursor increment
line).  For example in x86 I believe one can, with impunity, but all
the world's not x86.  In the case of wordsizes of the runtime and the
bytecode being different, I think only a non-aligned pointer could work
as the cursor.
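
A sketch of what a non-aligned cursor could look like: the cursor is an unsigned char pointer, the opcode is assembled byte by byte, and the cursor advances by the bytecode's wordsize rather than by sizeof(opcode_t). This is an illustration under my own naming, not the actual Parrot fix; int64_t stands in for a 64-bit opcode_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fetch one opcode of wordsize bytes (4 or 8) through a byte cursor
 * and advance the cursor by exactly wordsize bytes.  No load wider
 * than one byte is ever issued, so alignment cannot matter. */
int64_t fetch_op_bytes(const unsigned char **cursor, size_t wordsize,
                       int big_endian)
{
    const unsigned char *p;
    uint64_t v = 0;
    size_t i;

    assert(cursor != NULL && *cursor != NULL);
    assert(wordsize == 4 || wordsize == 8);
    p = *cursor;
    for (i = 0; i < wordsize; i++) {
        /* Visit bytes from most to least significant. */
        size_t k = big_endian ? i : wordsize - 1 - i;
        v = (v << 8) | p[k];
    }
    *cursor = p + wordsize;
    /* Sign-extend 4-byte opcodes to the wider native type. */
    if (wordsize == 4)
        return (int64_t)(int32_t)(uint32_t)v;
    return (int64_t)v;
}
```

Starting at offset 16 of the hex dump with a wordsize of 4 and little-endian order, the first fetch returns 0x013155a1 and leaves the cursor at offset 20, an address that is aligned by four but not by eight, without any bus error.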


[BUG] parrot bytecode (header?) problems in tru64/alpha

2004-02-23 Thread Jarkko Hietaniemi

I tried parrot in tru64/alpha after quite a while and it seems that
something has gone rotten with the bytecode.  The t/native_pbc/integer.t
and the t/native_pbc/number.t both fail:

t/native_pbc/integer....# Failed test (t/native_pbc/integer.t at line 35)
#  got: 'PackFile_unpack: Not a Parrot PackFile!
# Magic number was [4c520050] not [13155a1]
# Parrot VM: Can't unpack packfile t/native_pbc/integer_1.pbc.
# error:imcc:main: Packfile loading failed
# '
# expected: '270544960'
# './parrot  t/native_pbc/integer_1.pbc' failed with exit code 1
t/native_pbc/integer....NOK 1# Looks like you failed 1 tests of 1.
t/native_pbc/integer....dubious
Test returned status 1 (wstat 256, 0x100)
DIED. FAILED test 1
Failed 1/1 tests, 0.00% okay
Failed Test            Stat Wstat Total Fail  Failed  List of Failed
---------------------------------------------------------------------
t/native_pbc/integer.t    1   256     1    1 100.00%  1

t/native_pbc/number....# Failed test (t/native_pbc/number.t at line 42)
#  got: 'PackFile_unpack: Not a Parrot PackFile!
# Magic number was [4c520050] not [13155a1]
# Parrot VM: Can't unpack packfile t/native_pbc/number_1.pbc.
# error:imcc:main: Packfile loading failed
# '
# expected: '1.00
# 4.00
# 16.00
# 64.00
# 256.00
# 1024.00
# 4096.00
# 16384.00
# 65536.00
# 262144.00
# 1048576.00
# 4194304.00
# 16777216.00
# 67108864.00
# 268435456.00
# 1073741824.00
# 4294967296.00
# 17179869184.00
# 68719476736.00
# 274877906944.00
# 1099511627776.00
# 4398046511104.00
# 17592186044416.00
# 70368744177664.00
# 281474976710656.00
# 1125899906842620.00
# '
# './parrot  t/native_pbc/number_1.pbc' failed with exit code 1
# Failed test (t/native_pbc/number.t at line 85)
#  got: 'PackFile_unpack: Not a Parrot PackFile!
# Magic number was [4c520050] not [13155a1]
# Parrot VM: Can't unpack packfile t/native_pbc/number_2.pbc.
# error:imcc:main: Packfile loading failed

Here are the first eight lines of hexdumps of t/native_pbc/integer_1.pbc

04 00 00 0d 04 00 ac 1d a0 e1 c0 b8 70 2a 58 a0 p*X.
a1 55 31 01 4c 52 45 50 01 00 00 00 00 00 00 00 .U1.LREP
30 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0...
03 00 00 00 04 00 00 00 42 59 54 45 43 4f 44 45 BYTECODE
5f 2d 00 00 20 00 00 00 08 00 00 00 02 00 00 00 _-.. ...
46 49 58 55 50 5f 2d 00 28 00 00 00 08 00 00 00 FIXUP_-.(...
03 00 00 00 43 4f 4e 53 54 41 4e 54 5f 2d 00 00 CONSTANT_-..
30 00 00 00 08 00 00 00 00 00 00 00 00 00 00 00 0...

and t/native_pbc/number_1.pbc

04 00 00 0d 04 00 ac 1d a0 e1 c0 b8 70 2a 58 a0 p*X.
a1 55 31 01 4c 52 45 50 01 00 00 00 00 00 00 00 .U1.LREP
44 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 D...
03 00 00 00 04 00 00 00 42 59 54 45 43 4f 44 45 BYTECODE
5f 74 2f 6f 70 2f 6e 75 6d 62 65 72 5f 31 2e 70 _t/op/number_1.p
61 73 6d 00 2c 00 00 00 bc 00 00 00 02 00 00 00 asm.,...
46 49 58 55 50 5f 74 2f 6f 70 2f 6e 75 6d 62 65 FIXUP_t/op/numbe
72 5f 31 2e 70 61 73 6d 00 00 00 00 e8 00 00 00 r_1.pasm

myconfig:

Summary of my parrot 0.0.13 configuration:
  configdate='Mon Feb 23 11:47:59 2004'
  Platform:
osname=dec_osf, archname=alpha-dec_osf
jitcapable=1, jitarchname=alpha-dec_osf,
jitosname=DEC_OSF, jitcpuarch=alpha
execcapable=0
perl=/u/vieraat/vieraat/jhi/Perl/Platform/OSF1/bin/perl
  Compiler:
cc='cc', ccflags='-std -D_INTRINSICS -fprm d -ieee -I/p/include -DLANGUAGE_C 
-pthread',
  Linker and Libraries:
ld='ld', ldflags=' -L/p/lib',
cc_ldflags='',
libs='-lm -lutil -lpthread'
  Dynamic Linking:
so='.so', ld_shared='-shared -expect_unresolved * -O4 -msym -std -s -L/p/lib',
ld_shared_flags=''
  Types:
iv=long, intvalsize=8, intsize=4, opcode_t=long, opcode_t_size=8,
ptrsize=8, ptr_alignment=4 byteorder=12345678, 
nv=double, numvalsize=8, doublesize=8

My longsize would be 8.

Note: if you have Tru64 you will need the attached (and submitted
privately to Leo) config/init/hints/dec_osf.pl to get things to compile.



dec_osf.pl
Description: Perl program

Re: [perl #16283] parrot dandruff

2002-08-18 Thread Jarkko Hietaniemi


On Sun, Aug 18, 2002 at 05:23:23PM -, Steve Fink wrote:
 On Sun, Aug 18, 2002 at 02:35:09PM +, Jarkko Hietaniemi wrote:
  
  Tru64 finds the following objectionable spots from a fresh CVS checkout:
 
 Does this patch fix it? (Though even if it does, I wouldn't be at all
 surprised if some other compiler choked on it.)

Works okay in Tru64 and IRIX which are known for their pointer pickiness.

On IRIX, though, I get these, where probably NO_STACK_ENTRY_TYPE is
meant instead.

cc-1185 cc: WARNING File = core.ops, Line = 3678
  An enumerated type is mixed with another type.

  Stack_entry_type type = 0;
  ^

cc-1185 cc: WARNING File = core.ops, Line = 3678
  An enumerated type is mixed with another type.

  Stack_entry_type type = 0;
  ^

cc-1185 cc: WARNING File = core.ops, Line = 3688
  An enumerated type is mixed with another type.

  Stack_entry_type type = 0;
  ^

cc-1185 cc: WARNING File = core.ops, Line = 3688
  An enumerated type is mixed with another type.

  Stack_entry_type type = 0;


-- 
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen

packfile tests?

2002-08-01 Thread Jarkko Hietaniemi


I can't off-hand see any tests that would try to read in and execute
bytecode written in all possible combinations of wordsize/byteorder?
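
The shape of such a test matrix can be sketched at the level of a single word. The C below is only an illustration with invented helper names; a real test would need packfiles actually written on each kind of platform, which this does not attempt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write value as wordsize bytes (at most 8) in the given byte order. */
void pack_word(uint64_t value, size_t wordsize, int big_endian,
               unsigned char *out)
{
    size_t i;

    assert(out != NULL && wordsize <= 8);
    for (i = 0; i < wordsize; i++) {
        /* i counts bytes from the least significant end. */
        size_t k = big_endian ? wordsize - 1 - i : i;
        out[k] = (unsigned char)(value >> (8 * i));
    }
}

/* Read wordsize bytes (at most 8) in the given byte order. */
uint64_t unpack_word(const unsigned char *in, size_t wordsize,
                     int big_endian)
{
    uint64_t v = 0;
    size_t i;

    assert(in != NULL && wordsize <= 8);
    for (i = 0; i < wordsize; i++) {
        size_t k = big_endian ? i : wordsize - 1 - i;
        v = (v << 8) | in[k];
    }
    return v;
}

/* Round-trip value through every wordsize/byteorder combination and
 * return how many combinations failed to reproduce it. */
int count_roundtrip_failures(uint32_t value)
{
    static const size_t wordsizes[] = { 4, 8 };
    unsigned char buf[8];
    int failures = 0;
    size_t w;
    int be;

    for (w = 0; w < sizeof wordsizes / sizeof wordsizes[0]; w++)
        for (be = 0; be <= 1; be++) {
            pack_word(value, wordsizes[w], be, buf);
            if (unpack_word(buf, wordsizes[w], be) != value)
                failures++;
        }
    return failures;
}
```

The four combinations here (4 or 8 bytes, little or big endian) are the minimum; mixed-endian layouts would add more rows to the matrix.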


Re: drive-by-reminder: missing JITs

2002-07-30 Thread Jarkko Hietaniemi


  * MIPS - I know a little bit more about these, but I *suspect there's
a simple common instruction set
 
  * HPPA - I know very little about these, is there a common instruction set?
 
  * IA64 - reports of the IA64 instruction set tell that it combines
the elegance of the IA32 CISCy instruction set with
the elegance of the HPPA RISCy instruction set... :-)
 
  I intend to do nothing on these except raise gui^H^H^Hawareness :-)
 
 Or give me an account? ;)

For the HPPA and IA64 I think getting an account in the HP/CPQ Test
Drive machines should help:

http://www.testdrive.compaq.com/

For MIPS, I dunno whether SGI has something similar.


Re: ICU and Parrot

2002-05-30 Thread Jarkko Hietaniemi


On Fri, May 31, 2002 at 06:18:55AM +0900, Dan Kogai wrote:
 On Friday, May 31, 2002, at 06:06 AM, George Rhoten wrote:
  Hopefully you take the implicit information in the UCM files and put
  that into encode implementation too.  For instance, in gb18030 there
  are whole ranges of Unicode mappings that aren't in the UCM file, but
  they are in the implementation of the gb18030 converter (and the XML
  form of the UCM file).  If the encode API works with gb18030 properly,
  that's great :-) I'm sure that the people in China appreciate that.
 
 As a matter of fact GB18030 is ALREADY supported via Encode::HanExtra by 
 Autrijus Tang.  The only reason GB18030 was not included in Encode main 
 is sheer size of the map.
 
 I have deliberately kept Parrot and Perl6 out of my mind until Perl 5.8 
 is a reality.  Now that 5.8-RC1 is just 24 hours away, I should get 

Oy!  More like 42.

 myself ready for Parrot and Perl6
 
 Dan the Encode Maintainer


regarding cpp namespace pollution

2002-01-25 Thread Jarkko Hietaniemi


I think the following would work.

* At the beginning of each parrot source code file there must be at
  least two Parrot-specific defines, e.g.

  #define PARROT_SOURCE
  #define PARROT_SOURCE_REGEXEC_C

  These would declare both being part of Parrot, and being
  a particular file.

  If some kind of clear component architecture emerges, then a third
  define may be in order

  #define PARROT_SOURCE
  #define PARROT_SOURCE_GC
  #define PARROT_SOURCE_BOEHM_C

* The parrot header files should be anal-retentively sorted into
  (at least) three categories:

  * Private to Parrot (intra-source-file prototypes, for example).
  * Visible to friends of Parrot (XS, in Perl-5-talk)
  * Public.  This should be kept to minimum, and to prototypes
and constants.  No dark scary ifdef forests, no hackish
things mattering only to the Parrot implementation.

  There should be no (accidental) way for things external to Parrot
  to get at the category one: the way to do this would be to use the
  PARROT_SOURCE* defines.
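
A minimal sketch of how the PARROT_SOURCE define could guard a private header. Everything is shown in one listing for brevity; the header name and the helper function are hypothetical, and only the two PARROT_SOURCE* macro names come from the proposal above.

```c
/* --- top of a Parrot source file, e.g. regexec.c --- */
#define PARROT_SOURCE
#define PARROT_SOURCE_REGEXEC_C

#include <assert.h>

/* --- what a private header (say, regex_private.h) would contain --- */
#ifndef PARROT_SOURCE
#  error "private Parrot header included from outside the Parrot sources"
#endif

/* Declaration that only Parrot's own sources are meant to see. */
int parrot_private_regex_depth(int nesting);

/* --- back in regexec.c --- */
int parrot_private_regex_depth(int nesting)
{
    assert(nesting >= 0);
    return nesting + 1;
}
```

Code outside Parrot that includes the private header without defining PARROT_SOURCE fails at compile time, which is the "no accidental way in" property; anyone determined can still define the macro, so this is discipline rather than enforcement.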

It requires some discipline, yes, but wasn't that the whole idea
of this...?


Re: on parrot strings

2002-01-21 Thread Jarkko Hietaniemi


On Mon, Jan 21, 2002 at 04:37:46PM +, Dave Mitchell wrote:
 Jarkko Hietaniemi [EMAIL PROTECTED] wrote:
  There is no string type built out of native eight-bit bytes.
 
 In the good ol'days, one could usefully use regexes on 8-bit binary data,
 eg
 
 open G, 'myfile.gif' or die;
 read G, $buf, 8192 or die;
 if ($buf =~ /^GIF89a\x08\x02/) {
 .
 
 where it was clear to everyone that we are checking whether the first few
 bytes of the file contain (0x47, 0x49, ..., 0x02)
 
 Is this sort of thing now completely dead in the Brave New World of

Of course not, I do not remember forbidding \xHH.  The default of
data coming in from filehandles could still be opaque 8-bit bytes.

 Unicode, Locales etc etc? (yes, there's always pack, but pack is so... errr
 hmm )

 Dave.


Re: on parrot strings

2002-01-21 Thread Jarkko Hietaniemi


On Mon, Jan 21, 2002 at 05:09:06PM +, Dave Mitchell wrote:
 Jarkko Hietaniemi [EMAIL PROTECTED] wrote:
   In the good ol'days, one could usefully use regexes on 8-bit binary data,
   eg
   
   open G, 'myfile.gif' or die;
   read G, $buf, 8192 or die;
   if ($buf =~ /^GIF89a\x08\x02/) {
   .
   
   where it was clear to everyone that we are checking whether the first few
   bytes of the file contain (0x47, 0x49, ..., 0x02)
   
   Is this sort of thing now completely dead in the Brave New World of
  
  Of course not, I do not remember forbidding \xHH.  The default of
  data coming in from filehandles could still be opaque 8-bit bytes.
 
 Good :-)
 
 I'm not clear though, how binary data could get passed to parrot's
 regex engine, unless there's a BINARY_8 CEF in addition to
 UNICODE_CEF_UTF_8 etc in typedef enum {...} PARROT_CEF

Yes, that's somewhat problematic.  Making up a byte CEF would be
Wrong, though, because there is, by definition, no CCS to map, and
we would be dangerously close to conflating in CES, too...
ACR-CCS-CEF-CES.  Read the character model.  Understand the character
model.  Embrace the character model.  Be the character model.  (And
once you're it, read the relevant Unicode, XML, and Web standards.)

To highlight the difference between opaque numbers and characters,
the above should really be:

if ($buf =~ /\x47\x49\x46\x38\x39\x61\x08\x02/) { ... }
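
The same "opaque numbers" view can be written out in C, where there is no temptation to read the values as characters. This is a sketch with a made-up function name; it checks the eight byte values only at the start of the buffer, as the earlier ^-anchored form of the regex does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if buf starts with the eight byte values from the regex
 * above, compared as numbers rather than as characters. */
int starts_with_signature(const unsigned char *buf, size_t len)
{
    static const unsigned char sig[] = {
        0x47, 0x49, 0x46, 0x38, 0x39, 0x61, 0x08, 0x02
    };

    assert(buf != NULL || len == 0);
    return len >= sizeof sig && memcmp(buf, sig, sizeof sig) == 0;
}
```

Nothing here depends on any character set: the comparison gives the same result whether the platform's native code for 'G' is 0x47 or something else.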

I think what needs to be done is that \xHH must not be encoded as
literals (as it is now, 'A' and \x41 are identical (in ASCII)), but
instead as regex nodes of their own, storing the code points.  Then
the regex engine can try both the right/new way (the Unicode code
point), and the wrong/legacy way (the native code point).

String literals have the same problem.  What does "foo\x41" mean?
(Here, unlike with the regular expressions, we can't try both,
unless we integrate Damian's quantum state variables into the core :-)
We have various options: there might be a pragma to tell what CCS
naked codepoints are to be understood in, or the default could be
grovelled out of environment settings (both these options could affect
the regex solution, too), and so forth.


Re: on parrot strings

2002-01-19 Thread Jarkko Hietaniemi


Honour where honour is due: I've got some questions about inversion
lists.  Where I saw them mentioned by that name were some drafts of
this:

http://www.aw.com/catalog/academic/product/1,4096,0201700522,00.html

The book looks really promising-- unfortunately it's not yet published.

