In perl.git, the branch khw/ebcdic has been created
<http://perl5.git.perl.org/perl.git/commitdiff/1666902a7f0e665fde904483048433402fef70b8?hp=0000000000000000000000000000000000000000>
at 1666902a7f0e665fde904483048433402fef70b8 (commit)
- Log -----------------------------------------------------------------
commit 1666902a7f0e665fde904483048433402fef70b8
Author: Karl Williamson <[email protected]>
Date: Thu Mar 7 12:08:41 2013 -0700
XXXu8
M utfebcdic.h
commit 5862ed4338593c1d59b0178f88240a6444392f60
Author: Karl Williamson <[email protected]>
Date: Wed Mar 6 21:32:42 2013 -0700
Revert "XXX get Configure to work on Linux"
This reverts commit 587944ecf24503eddf45df4acf45ae60da17030d.
M Configure
commit d3ecc9881bd8db224deb511f7a64fc900e812408
Author: Karl Williamson <[email protected]>
Date: Mon Mar 4 09:09:29 2013 -0700
XXX get Configure to work on Linux
M Configure
commit 11ecd11c92b627092d3412cc613b221f4bec8be5
Author: Karl Williamson <[email protected]>
Date: Fri Mar 8 08:11:38 2013 -0700
l
M regcharclass.h
M regen/regcharclass.pl
commit e19baab085c00bc58dc375dc73655a6cccfb8143
Author: Karl Williamson <[email protected]>
Date: Wed Mar 6 21:47:21 2013 -0700
XXX: Turn off debug tracing in perly.c
This is somehow getting into lib/buildcustomize.pl.
M perly.c
commit 8d567935eceebfd6a8a43f05ee52359d8a6f1a0a
Author: Karl Williamson <[email protected]>
Date: Wed Mar 6 21:30:01 2013 -0700
XXX: rebase: Add cast
M utfebcdic.h
commit 2c9ce6f26574ba2e744321c81f6fe43f72ef341e
Author: Karl Williamson <[email protected]>
Date: Wed Mar 6 17:04:58 2013 -0700
XXXtemp: Use native, canned values for isFOO()
M handy.h
commit bdfea9b6e804071473e9ee595d59858e50140563
Author: Karl Williamson <[email protected]>
Date: Tue Mar 5 10:36:07 2013 -0700
XXX Enable lex debugging without -DDEBUGGING
M perly.c
commit ccdd424f7f7d982bae9705fa9ee97bc19d18cbc4
Author: Karl Williamson <[email protected]>
Date: Mon Mar 4 19:16:31 2013 -0700
XXX: perly.c: Reinstate some ebcdic code
This is an experiment to see if this fixes things
M perly.c
commit b57d3780482a97974ec0b2a88343531fec7100bd
Author: Karl Williamson <[email protected]>
Date: Mon Mar 4 13:43:26 2013 -0700
gv.c: Remove EBCDIC dependency
M gv.c
commit 4a0bcad6671e62ca6a2d4c8c7d9d1d91fe52f659
Author: Karl Williamson <[email protected]>
Date: Mon Mar 4 13:00:47 2013 -0700
toke.c: Remove EBCDIC dependency
M toke.c
commit d0e4ed97768a7b025803b3fe2055e04bdf3e32a0
Author: Karl Williamson <[email protected]>
Date: Mon Mar 4 09:14:25 2013 -0700
toke.c: Remove character set dependency
Instead of hard-coding the bit patterns that comprise the Byte Order
Mark in the UTF-8 or UTF-EBCDIC encodings, use the generated ones for
the current platform.
This removes some EBCDIC-only code.
M toke.c
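The commit replaces hard-coded BOM byte patterns with generated per-platform values. A minimal sketch of that idea follows, assuming hypothetical macro names (SKETCH_BOM_BYTES, SKETCH_BOM_LEN) rather than whatever unicode_constants.h actually defines, and the standard encodings of U+FEFF: EF BB BF in UTF-8 and DD 73 66 73 in UTF-EBCDIC (per UTR #16). The choice is keyed on an EBCDIC macro, as perl's sources are.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for generated BOM #defines; the real names may differ. */
#ifdef EBCDIC
#  define SKETCH_BOM_BYTES "\xDD\x73\x66\x73"   /* U+FEFF in UTF-EBCDIC */
#else
#  define SKETCH_BOM_BYTES "\xEF\xBB\xBF"       /* U+FEFF in UTF-8 */
#endif
#define SKETCH_BOM_LEN (sizeof(SKETCH_BOM_BYTES) - 1)   /* exclude the NUL */

/* Return how many leading bytes to skip: the BOM length if buf begins with
 * this platform's BOM, otherwise 0.  Only the #define differs per platform. */
static size_t sketch_skip_bom(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len >= SKETCH_BOM_LEN
        && memcmp(buf, SKETCH_BOM_BYTES, SKETCH_BOM_LEN) == 0)
        return SKETCH_BOM_LEN;
    return 0;
}
```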
commit afc05107ebad7c275487f943ed03cfebd6aca1e9
Author: Karl Williamson <[email protected]>
Date: Mon Mar 4 09:10:27 2013 -0700
unicode_constants.h: Add #defines for Byte Order Mark
These will be used in future commits
M regen/unicode_constants.pl
M unicode_constants.h
commit 370920f0a05e0d34b3b262e35e3b2615bfdbd4e7
Author: Karl Williamson <[email protected]>
Date: Sat Mar 2 20:53:04 2013 -0700
regen/unicode_constants.pl: Change #define name
This was added in the 5.17 series so there's no code relying on its
current name. I think that the abbreviation is clearer.
M regen/unicode_constants.pl
M unicode_constants.h
M x2p/a2py.c
commit 909dc5300ce1e0a82c29fd90923cd3428896448b
Author: Karl Williamson <[email protected]>
Date: Sat Mar 2 20:43:56 2013 -0700
regen/unicode_constants.pl: Make portable to non-ASCII
This now uses the U+ notation to indicate code points, which is
unambiguous no matter what the platform's character set is. (charnames
accepts the U+ notation.)
M regen/unicode_constants.pl
M unicode_constants.h
commit 489b5502109a4553221369a191d113a31aa8ed3f
Author: Karl Williamson <[email protected]>
Date: Sat Mar 2 20:29:33 2013 -0700
regen/unicode_constants.pl: Remove unused constant
This was added in the 5.17 series, so it can't be in the field yet; and
it isn't needed.
M regen/unicode_constants.pl
M unicode_constants.h
commit c27bd4fa16d55302b3cb55c1752f1196b66a20f7
Author: Karl Williamson <[email protected]>
Date: Sat Mar 2 19:28:43 2013 -0700
regen/unicode_constants.pl: Pass through input comments
The data can now have comments, which are converted to C comments and
passed through.
M regen/unicode_constants.pl
commit d53e53b4e1ba0b278245bba26d92c33d9dc13aa2
Author: Karl Williamson <[email protected]>
Date: Sat Mar 2 19:19:02 2013 -0700
regen/unicode_constants.pl: Convert '-' in names to '_'
Unicode character names can have dashes in them. These aren't accepted
in C macro names. Change so both blanks and the hyphen-minus are
converted to underscores.
M regen/unicode_constants.pl
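The conversion itself is done in Perl inside regen/unicode_constants.pl; the C sketch below only illustrates the rule being applied, with a hypothetical function name. Both the blank and the hyphen-minus become underscores, so a name like "NO-BREAK SPACE" yields a usable macro name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy a Unicode character name into out, mapping
 * blanks and hyphen-minus to '_' so the result can be a C macro name.
 * Writes at most out_size - 1 characters, then a terminating NUL. */
static void sketch_name_to_macro(const char *name, char *out, size_t out_size)
{
    size_t i = 0;
    assert(name != NULL && out != NULL && out_size > 0);
    for (; name[i] != '\0' && i + 1 < out_size; i++) {
        out[i] = (name[i] == ' ' || name[i] == '-') ? '_' : name[i];
    }
    out[i] = '\0';
}
```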
commit 7a95b8f019e1f10ee01e451aa12ad463aaf758ef
Author: Karl Williamson <[email protected]>
Date: Sat Mar 2 15:04:18 2013 -0700
XXX: Find a cleaner way. Handle missing is_UTF8_CHAR_utf8_safe
This macro may not be present, and is currently used exclusively in
IS_UTF8_CHAR, which itself may be undefined, and code should cope with
that. This is a work-around until a better solution is found.
M utf8.c
M utf8.h
commit c8c80721be763feb018f4d11a81fa92b45077cb8
Author: Karl Williamson <[email protected]>
Date: Sat Mar 2 14:09:04 2013 -0700
Add Porting tool for help with non-ASCII platforms
Porting/reorder_l1_char_class_tab.pl is used to bootstrap Perl onto a
non-ASCII platform with no working Perl.
M MANIFEST
A Porting/reorder_l1_char_class_tab.pl
commit f800d0aca07e8397f2b8a5b9c88664b32501716d
Author: Karl Williamson <[email protected]>
Date: Sat Mar 2 13:06:58 2013 -0700
inline.h: Reorder functions
The comment implied that all the functions below it in the file were
deprecated, but in fact only the next two were. This clarifies that,
and moves those two so they are the final ones in the file.
M inline.h
commit bfbda92d5558694bac12861bedf2a615af923ac1
Author: Karl Williamson <[email protected]>
Date: Sat Mar 2 12:33:42 2013 -0700
utfebcdic.h: Add comment
M utfebcdic.h
commit 9b02083051e2df3c5a34d514aee6dc39e17f8b66
Author: John Goodyear <[email protected]>
Date: Sat Mar 2 12:31:25 2013 -0700
XXX Temporary for z/OS long long support
M Configure
M hints/os390.sh
commit 4b3e1d5c9a18e16aeeb9de16c8b9a50b361b9cfa
Author: Karl Williamson <[email protected]>
Date: Sat Mar 2 12:12:11 2013 -0700
utf8.h: Clean up START_MARK definition and use
The previous definition broke good encapsulation rules. UTF_START_MARK
should return something that fits in a byte; it shouldn't be the caller
that does this. So the mask is moved into the definition. This means
it can apply only to the portion that creates something larger than a
byte. Further, the EBCDIC version can be simplified, since 7 is the
largest possible number of bytes in an EBCDIC UTF8 character.
M utf8.h
M utfebcdic.h
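A sketch of a start-byte macro that applies its own mask, so callers always receive a value that fits in a byte. It is not perl's actual macro text: it assumes the UTF-8/I8 layout in which a sequence of len bytes (2 through 7 here) has the top len bits of its start byte set, giving 0xC0, 0xE0, 0xF0, ... 0xFE.

```c
#include <assert.h>

/* 0xFE shifted left leaves exactly len high bits set in the low byte;
 * the mask inside the definition discards anything above bit 7. */
#define SKETCH_START_MARK(len) ((unsigned char) ((0xFE << (7 - (len))) & 0xFF))

static unsigned char sketch_start_mark(int len)
{
    assert(len >= 2 && len <= 7);   /* 7 bytes is the longest case modelled */
    return SKETCH_START_MARK(len);
}
```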
commit 85322e70306ca3ce06fd9d379a251c9c0c96220e
Author: Karl Williamson <[email protected]>
Date: Sat Mar 2 12:05:26 2013 -0700
utf8.h: Move #includes
These two files were only being #included for non-ebcdic compiles; they
should be included always.
M utf8.h
commit 7cf7364cdae81ae4f797b14bde51f965e238d60f
Author: John Goodyear <[email protected]>
Date: Sat Mar 2 11:49:14 2013 -0700
utfebcdic.h: Remove extra parameter expansions
These two macros were improperly expanding the parameters as well as
defining the operation, leading to compile errors.
M utfebcdic.h
commit b2796e5e5c9100406af006082bde5f45004da0ce
Author: Karl Williamson <[email protected]>
Date: Fri Mar 1 08:28:52 2013 -0700
utf8.h: Simplify UTF8_EIGHT_BIT_foo on EBCDIC
These macros were previously defined in terms of UTF8_TWO_BYTE_HI and
UTF8_TWO_BYTE_LO. But the EIGHT_BIT versions can use the less general
and simpler NATIVE_TO_LATIN1 instead of NATIVE_TO_UNI, because the input
domain is restricted in the EIGHT_BIT versions. Note that on ASCII
platforms, these both expand to the same thing, so the difference
matters only on EBCDIC.
M utf8.h
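Why the restricted domain helps can be seen in a sketch for ASCII platforms only, where native and Latin-1 coincide, so the NATIVE_TO_LATIN1 step is the identity. For an input between 0x80 and 0xFF the high byte can only be 0xC2 or 0xC3, and no general code-point mapping is needed. The function names are hypothetical; the EBCDIC case is not modelled.

```c
#include <assert.h>

/* ASCII-platform sketch: the two UTF-8 bytes for a code point 0x80..0xFF. */
static unsigned char sketch_eight_bit_hi(unsigned char c)
{
    assert(c >= 0x80);
    return (unsigned char) (0xC0 | (c >> 6));    /* top 2 bits of c */
}

static unsigned char sketch_eight_bit_lo(unsigned char c)
{
    assert(c >= 0x80);
    return (unsigned char) (0x80 | (c & 0x3F));  /* low 6 bits of c */
}
```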
commit 2aef1014f1e8fbdd1dd0a36d0585eecb1f3c83c6
Author: Karl Williamson <[email protected]>
Date: Thu Feb 28 21:34:38 2013 -0700
XXX temp: makedepend.SH \{1000\} doesn't work on z/OS
This tries 500 instead. We'll keep going down until we get a number
that works.
M makedepend.SH
commit 6267c19421c706e544c5f63e068f9890c9e28fbf
Author: Karl Williamson <[email protected]>
Date: Thu Feb 28 09:25:27 2013 -0700
XXX temp: show makedepend cerr
M makedepend.SH
commit 49472ae5de46cefd443f7cfab0ac94583a440b74
Author: Karl Williamson <[email protected]>
Date: Wed Feb 27 21:59:11 2013 -0700
makedepend.SH: Split too long lines; properly join
I had thought that a continuation introduced a space. But no,
a continuation can happen in the middle of a token.
This commit also splits lines that are getting very long, to avoid
preprocessor limitations.
M makedepend.SH
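makedepend.SH is a shell script, but the rule it must respect when joining lines belongs to the C preprocessor: in translation phase 2 each backslash-newline pair is simply deleted before tokenization, so no space is introduced and a single token can straddle the break. The macro and function names below are illustrative only.

```c
#include <assert.h>

/* The backslash-newline is removed outright, so this defines 1234,
 * not the two tokens 12 and 34. */
#define SKETCH_SPLIT_NUMBER 12\
34

static int sketch_split_number(void)
{
    return SKETCH_SPLIT_NUMBER;
}
```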
commit e1a89034961dbc6597375261ecf2137393041541
Author: Karl Williamson <[email protected]>
Date: Wed Feb 27 15:51:28 2013 -0700
makedepend.SH: White-space only
Align continuation backslashes
M makedepend.SH
commit f1d056ba2f5cf7c4729ee624a594e8a19b313a01
Author: Karl Williamson <[email protected]>
Date: Wed Feb 27 14:39:28 2013 -0700
makedepend.SH: Remove some unnecessary white space
Multi-line preprocessor directives are now joined into single lines.
This can create lines too long for the preprocessor to handle. This
commit removes blanks adjoining comments that get deleted. This makes
things somewhat less likely to exceed the limit.
This commit also fixes several bracketed character classes that were
each meant to match a tab or a blank, but in which editors had
converted the tab to a blank.
M makedepend.SH
commit c9246e5d0e3e21bfe24bb7c08e61b8257a62a5c4
Author: Karl Williamson <[email protected]>
Date: Wed Feb 27 14:30:51 2013 -0700
makedepend.SH: Retain '/**/' comments
These comments may actually be necessary.
M makedepend.SH
commit d644e7ec2e6c9219430a8aeaf524cf037b3d9cdf
Author: Karl Williamson <[email protected]>
Date: Wed Feb 27 08:38:19 2013 -0700
handy.h: Remove extraneous parens
M handy.h
commit dd3bf60ccfb92fb4a69fd73cd0044ebfd8b12182
Author: Andy Dougherty <[email protected]>
Date: Wed Feb 27 13:06:07 2013 -0500
Disable gcc-style function attributes on z/OS.
John Goodyear <[email protected]> reports that the z/OS C compiler
supports the attribute keyword, but not in exactly the same way as gcc.
Instead of a "warning", the compiler emits an "INFORMATIONAL" message
that Configure fails to detect. Until Configure is fixed, just disable
the attributes altogether.
John Goodyear
M hints/os390.sh
commit 5aaa4dd964d09418d9abd24fb851de2c9b266fc0
Author: Andy Dougherty <[email protected]>
Date: Wed Feb 27 09:12:13 2013 -0500
Change os390 custom cppstdin script to use fgrep.
Grep appears to be limited to 2048 characters, and truncates
the output for cppstdin. Fgrep apparently doesn't have that limit.
Thanks to John Goodyear <[email protected]> for reporting this.
M hints/os390.sh
commit 964b0eed9ba2b581183179e6bc541fa12dd9cf9e
Author: Karl Williamson <[email protected]>
Date: Tue Feb 26 13:45:19 2013 -0700
utf8.c: Use more clearly named macro
In the case of invariants these two macros should do the same thing,
but it seems to me that the latter name more clearly indicates what is
going on.
M utf8.c
commit 77cfe4a263ec2abf1a02d81e9802121f992eb6be
Author: Karl Williamson <[email protected]>
Date: Tue Feb 26 13:35:12 2013 -0700
Add macro OFFUNISKIP
This means use official Unicode code point numbering, not native. Doing
this converts the existing UNISKIP calls in the code to refer to native
code points, which is what they meant anyway. The terminology is
somewhat ambiguous, but I don't think it will cause real confusion.
NATIVESKIP is also introduced for situations where it is important to be
precise.
M toke.c
M utf8.c
M utf8.h
M utfebcdic.h
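UNISKIP gives the number of bytes a code point occupies when encoded; OFFUNISKIP makes explicit that the count is taken from the official Unicode number. A sketch of that count for perl's extended UTF-8 on an ASCII platform follows, where native and Unicode numbering coincide, so a native-numbered variant would return the same values. The function name is hypothetical, and values of 2**36 and above (which perl encodes in longer sequences) are deliberately not modelled.

```c
#include <assert.h>

/* Bytes needed to encode uv, counted from its Unicode number. */
static int sketch_offuniskip(unsigned long long uv)
{
    assert(uv < 0x1000000000ULL);   /* longer sequences not modelled */
    if (uv < 0x80)         return 1;
    if (uv < 0x800)        return 2;
    if (uv < 0x10000)      return 3;
    if (uv < 0x200000)     return 4;
    if (uv < 0x4000000)    return 5;
    if (uv < 0x80000000UL) return 6;
    return 7;                       /* start byte 0xFE, 6 continuation bytes */
}
```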
commit 8018c873df17e625f9c418b9e4d3a0d1a329e238
Author: Karl Williamson <[email protected]>
Date: Tue Feb 26 13:22:19 2013 -0700
toke.c: white space only
M toke.c
commit d4b16339ba66cd932a9bdf5b2bc18a62b355abb1
Author: Karl Williamson <[email protected]>
Date: Tue Feb 26 12:08:50 2013 -0700
utf8.c: Deprecate two functions
This is to force any code that has been using these functions to change.
Since the Unicode tables are now stored in native order, these functions
should only rarely be needed.
However, the functionality of these is needed, and in actuality, on
ASCII platforms, the native functions are #defined to these. So what
this commit does is rename the functions to something else, and create
wrappers with the old names, so that anyone using them will get the
deprecation.
M embed.fnc
M embed.h
M mathoms.c
M proto.h
M toke.c
M utf8.c
M utf8.h
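The rename-plus-wrapper technique described above can be sketched as follows. All names are hypothetical and the body is an identity placeholder (as on ASCII platforms, where native and Unicode code points coincide). The old name survives only as a wrapper declared with gcc's deprecated attribute, so existing callers still link but get a warning at every call site.

```c
#include <assert.h>

#if defined(__GNUC__)
#  define SKETCH_DEPRECATED __attribute__((deprecated))
#else
#  define SKETCH_DEPRECATED
#endif

/* New name: carries the functionality (placeholder identity body). */
unsigned long sketch_new_name(unsigned long cp);
unsigned long sketch_new_name(unsigned long cp)
{
    return cp;
}

/* Old name: a thin wrapper; calling it triggers a deprecation warning. */
unsigned long sketch_old_name(unsigned long cp) SKETCH_DEPRECATED;
unsigned long sketch_old_name(unsigned long cp)
{
    return sketch_new_name(cp);
}
```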
commit 4d0bc895c403595c2462dbef33c3417384c0828b
Author: Karl Williamson <[email protected]>
Date: Tue Feb 26 11:26:09 2013 -0700
Deprecate uvuni_to_utf8()
Code should almost never be dealing with non-native code points.
M embed.fnc
M embed.h
M proto.h
M toke.c
M utf8.c
M utf8.h
commit 8fae5f78e1fe84115e71a89ad15dac75ece13b60
Author: Karl Williamson <[email protected]>
Date: Tue Feb 26 11:02:33 2013 -0700
Deprecate utf8_to_uni_buf()
Now that the tables are stored in native order, there is almost no need
for code to be dealing in Unicode order.
M embed.fnc
M proto.h
M utf8.c
commit 1266b620adbf02c0e1b80b015f0952c2a49517a1
Author: Karl Williamson <[email protected]>
Date: Tue Feb 26 09:00:18 2013 -0700
makedepend.SH: Comment out unnecessary code
This causes problems currently for z/OS. But, since we don't know why
it was there, I'm leaving it in as a placeholder.
M makedepend.SH
commit 2fbbf5da9507193bfcc3d80e9f29cbf5b2b8cc04
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 20:26:44 2013 -0700
Deprecate valid_utf8_to_uvuni()
Now that all the tables are stored in native format, there is very
little reason to use this function; and those who do need this kind of
functionality should be using the bottom level routine, so as to make it
clear they are doing nonstandard stuff.
M embed.fnc
M proto.h
M utf8.c
commit 2ee912c297b9b805325b94cb2ae711a651d28a50
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 20:14:26 2013 -0700
utf8.c: Swap which fcn wraps the other
This is in preparation for the current wrapee becoming deprecated.
M embed.fnc
M embed.h
M proto.h
M utf8.c
M utf8.h
commit 331dadac0f476906674c11845f52a74df396de95
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 19:29:34 2013 -0700
utf8.c: Skip a no-op
Since the value is invariant (the same whether or not it is encoded in
UTF-8), we already have it in 'uv'; no need to do anything else to get it.
M utf8.c
commit 24cecd91ae5ca8a23a1f803136db676e10cfe2e7
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 19:26:50 2013 -0700
utf8.c: Move comment to where makes more sense
M utf8.c
commit 90382955d141821e0c43afe8d8296fab04f298e4
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 17:30:10 2013 -0700
APItest: Test native code points, instead of Unicode
M ext/XS-APItest/APItest.pm
M ext/XS-APItest/APItest.xs
M ext/XS-APItest/t/utf8.t
commit 46b3759bec510089913575d9e27f61cec9fceba8
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 17:25:08 2013 -0700
XXX CPAN Normalize
This converts Unicode::Normalize to use the native tables that are used
by Perl starting in XXX, while continuing to use the Unicode-ordered
ones on Perls before then.
Another alternative would be to have mktables generate just these tables
in Unicode ordering.
M cpan/Unicode-Normalize/Normalize.xs
commit 416de2aad2ace2fe067b2196b9f1d16587dc9537
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 17:22:55 2013 -0700
XXX CPAN prob wrong Collate
This changes it to implicitly use native code points. This is likely
wrong, as the module comes with its own data, which is probably in terms
of Unicode.
M cpan/Unicode-Collate/Collate.xs
commit a4a49beb29bad9ec78c4724dc91ae8bd39b92bf0
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 17:12:53 2013 -0700
XXX CPAN Encode.xs
Use core function if available. This will insulate this code from any
future changes.
M cpan/Encode/Encode.xs
commit 29a8d2b2b8c866ad418fa17123693643c3d99e47
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 17:04:24 2013 -0700
XXX CPAN and unsure Encode
M cpan/Encode/Encode.xs
M cpan/Encode/Unicode/Unicode.xs
commit 903f85fbf9e9c5ad5c176da02065897144097702
Author: Karl Williamson <[email protected]>
Date: Mon Feb 25 17:00:47 2013 -0700
XXX CPAN Encode.xs: fix indent
M cpan/Encode/Encode.xs
commit 6602048a9aaec01ef1dcf5f27230f188b86e4745
Author: Karl Williamson <[email protected]>
Date: Sun Feb 24 17:23:15 2013 -0700
Don't refer to U+XXXX when mean native
These messages say the output number is Unicode, but it is really
native, so change them to say 0xXXXX instead.
M regen/regcharclass_multi_char_folds.pl
M regexec.c
commit 1de86e61350d4fbbea2a9a8f39bd19443e21e75c
Author: Karl Williamson <[email protected]>
Date: Sun Feb 24 16:43:59 2013 -0700
Convert some uvuni() to uvchr()
All the tables are now based on the native character set, so using
uvuni() in almost all cases is wrong.
M cygwin/cygwin.c
M doop.c
M op.c
M pp_pack.c
M regcomp.c
M regexec.c
M toke.c
M utf8.c
commit bd481fc4b7254b0327fd3f47e71a572ec732bba2
Author: Karl Williamson <[email protected]>
Date: Sun Feb 24 16:25:47 2013 -0700
handy.h: White space only
M handy.h
commit 2a2492708f137a0ce38700fad7029fff7d57a8bd
Author: Karl Williamson <[email protected]>
Date: Sun Feb 24 16:19:49 2013 -0700
t/test.pl: Allow native/latin1 string conversions to work on utf8.
These functions no longer have the hard-coded definitions in them,
but now end up resolving to internal functions, so that new encodings
could be added and these would automatically understand them.
Instead of using tr///, these now go character by character, converting
to/from ord, which is slower but allows them to operate on utf8 strings.
Peephole optimization should make these essentially no-ops on ASCII
platforms.
M t/test.pl
commit 2e30485fc8ba1e91eb2563672a535a1746cb8b5c
Author: Karl Williamson <[email protected]>
Date: Sun Feb 24 16:05:55 2013 -0700
t/test.pl: Simplify ord to/from native fcns
This commit changes these functions from converting to/from a string to
calling utf8:: functions which operate on ordinals instead.
M t/test.pl
commit 5ff0a67a2e15cd104a300b6baed8266ea85a1044
Author: Karl Williamson <[email protected]>
Date: Sun Feb 24 15:35:38 2013 -0700
Make casing tables native
These are the final tables that hadn't yet been converted to native
character set casing.
M perl.h
M utfebcdic.h
commit ea3f4c8f0d8a9dee8fd4013855b78bcbd853a0ae
Author: Karl Williamson <[email protected]>
Date: Sun Feb 24 15:32:30 2013 -0700
utfebcdic.h: Remove trailing spaces
M utfebcdic.h
commit 88fe12aa11468daa24cb856699f0d8a48edf0355
Author: Karl Williamson <[email protected]>
Date: Fri Feb 22 18:55:26 2013 -0700
EBCDIC has the unicode bug too
We have not had a working modern Perl on EBCDIC for some years. When I
started out, comments and code led me to conclude erroneously that
natively it supported semantics for all 256 characters 0-255. It turns
out that I was wrong; it natively (at least on some platforms) has the
same rules (essentially none) for the characters which don't correspond
to ASCII ones as ASCII platforms have for those characters.
This commit forces those rules on EBCDIC platforms (even should there be
one that natively uses all 256). To get all 256, the same things as on
ASCII platforms, like 'use feature "unicode_strings"', must now be done.
M autodoc.pl
M handy.h
M pod/perlfunc.pod
M pod/perlre.pod
M pod/perlrecharclass.pod
M pod/perlunicode.pod
M pod/perlunifaq.pod
commit cda864f5944f770f3f9c780b1dc71323dd003e6f
Author: Karl Williamson <[email protected]>
Date: Thu Feb 21 13:47:52 2013 -0700
handy.h: Solve a failure to compile problem under EBCDIC
handy.h is included in files that don't include perl.h, and hence not
utf8.h. We can't rely therefore on the ASCII/EBCDIC conversion
macros being available to us. The best way to cope is to use the native
ctype functions. Most, but not all, of the macros in this commit
currently resolve to use those native ones, but a future commit will
change that.
M handy.h
commit 0b61494f87ec00e33ac364d9ba979bbe74d81e8e
Author: Karl Williamson <[email protected]>
Date: Thu Feb 21 13:35:12 2013 -0700
handy.h: Simplify some macro definitions
Now, only one of the macros relies on magic numbers (isPRINT), leading
to clearer definitions.
M handy.h
commit 267f39dea998502b9e7101d2d04bc0b0bf7c8b80
Author: Karl Williamson <[email protected]>
Date: Thu Feb 21 13:26:49 2013 -0700
handy.h: Combine macros that are same in ASCII, EBCDIC
These 4 macros can have the same RHS for their ASCII and EBCDIC
versions, so there is no need to duplicate their definitions.
This also enables the EBCDIC versions to not have undefined expansions
when compiling without perl.h.
M handy.h
commit b6e687b3f446a971e336bda5aee730e7423cbff3
Author: Karl Williamson <[email protected]>
Date: Wed Feb 20 10:39:48 2013 -0700
Deprecate NATIVE_TO_NEED and ASCII_TO_NEED
These macros are no longer called in the Perl core. This commit turns
them into functions so that they can use gcc's deprecation facility.
I believe these were defective right from the beginning, and I have
struggled to understand what's going on. From the name, it appears
NATIVE_TO_NEED takes a native byte and turns it into UTF-8 if the
appropriate parameter indicates that. But that is impossible to do
correctly from that API, as for variant characters, it needs to return
two bytes. It could only work correctly if ch is an I8 byte, which
isn't native, and hence the name would be wrong.
Similar arguments for ASCII_TO_NEED.
The function S_append_utf8_from_native_byte(const U8 byte, U8** dest)
does what I think NATIVE_TO_NEED intended.
M embed.fnc
M mathoms.c
M proto.h
M toke.c
M utf8.h
M utfebcdic.h
commit c93f281bcce3de7214889e3bfcf65265fdc42e67
Author: Karl Williamson <[email protected]>
Date: Wed Feb 20 10:26:43 2013 -0700
Remove remaining calls of NATIVE_TO_NEED
These calls are just copying the input to the output byte by byte.
There is no need to worry about UTF-8 or not, as the output is just an
exact copy of the input.
M toke.c
commit 4d9d049f2ba63f41df8b43332a6b5f0545a78a14
Author: Karl Williamson <[email protected]>
Date: Wed Feb 20 08:12:15 2013 -0700
toke.c: Remove some NATIVE_TO_NEED calls
I believe NATIVE_TO_NEED is defective, and will remove it in a future
commit. But, just in case I'm wrong, I'm doing it in small steps so
bisects will show the culprit. This removes the calls to it where the
parameter is clearly invariant under UTF-8 and UTF-EBCDIC, and so the
result can't be other than just the parameter.
M toke.c
commit e1fcc682bd6c16a1d161a8a3cd40b6ba15d91d8b
Author: Karl Williamson <[email protected]>
Date: Wed Feb 20 08:22:07 2013 -0700
toke.c: in [A-Za-z] use macros that exclude non-ASCII alphas
This code is attempting to deal with the problem of holes in the ranges
a-z and A-Z in EBCDIC. Prior to this patch, it accepted things like A
WITH GRAVE, etc., which shouldn't get the special processing to deal
with the holes.
M toke.c
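The holes are easy to see numerically. In EBCDIC code pages 037 and 1047 the lowercase letters occupy three runs (a-i at 0x81-0x89, j-r at 0x91-0x99, s-z at 0xA2-0xA9), so the span from 'a' to 'z' is 41 code points wide but contains only 26 letters. The sketch below, with hypothetical function names, counts the ASCII-equivalent letters in a range while skipping everything else.

```c
#include <assert.h>

/* True only for the 26 EBCDIC lowercase letters that correspond to a-z. */
static int sketch_ebcdic_is_lower_ascii_letter(unsigned char c)
{
    return (c >= 0x81 && c <= 0x89)     /* a-i */
        || (c >= 0x91 && c <= 0x99)     /* j-r */
        || (c >= 0xA2 && c <= 0xA9);    /* s-z */
}

/* Count the a-z letters in the inclusive range lo..hi, skipping holes. */
static int sketch_count_letters_in_range(unsigned char lo, unsigned char hi)
{
    int n = 0;
    assert(lo <= hi);
    for (unsigned c = lo; c <= hi; c++)  /* unsigned: no wrap at 0xFF */
        n += sketch_ebcdic_is_lower_ascii_letter((unsigned char) c);
    return n;
}
```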
commit 878483481276aedd0566edeef499b8ad406e4d4f
Author: Karl Williamson <[email protected]>
Date: Tue Feb 19 15:13:19 2013 -0700
Use real illegal UTF-8 byte
The code here was wrong in assuming that \xFF is not legal in UTF-8
encoded strings. It currently doesn't work due to a bug, but that may
eventually be fixed: [perl #116867]. The comments are also wrong in
claiming that all bytes are legal in UTF-EBCDIC.
It turns out that in well-formed UTF-8, the bytes C0 and C1 never appear
(C2, C3, and C4 as well in UTF-EBCDIC), as they would be the start byte
of an illegal overlong sequence.
This creates a #define for an illegal byte using one of the real illegal
ones, and changes the code to use that.
No test is included due to #116867.
M op.c
M toke.c
M utf8.h
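The reasoning about C0 and C1 can be checked directly for UTF-8. A two-byte sequence carries 11 payload bits, 5 from the start byte and 6 from the continuation byte. With a start byte of C0 or C1 the decoded value is at most 0x7F, which must be encoded in a single byte, so such a sequence is always overlong. The sketch uses hypothetical names and does not model the UTF-EBCDIC case (where C2 through C4 are also excluded).

```c
#include <assert.h>

/* Decode a two-byte UTF-8 sequence (start 110xxxxx, continuation 10xxxxxx). */
static unsigned sketch_decode_two_byte(unsigned char start, unsigned char cont)
{
    assert((start & 0xE0) == 0xC0 && (cont & 0xC0) == 0x80);
    return ((unsigned) (start & 0x1F) << 6) | (cont & 0x3F);
}

/* Overlong if the value would have fit in one byte. */
static int sketch_two_byte_is_overlong(unsigned char start, unsigned char cont)
{
    return sketch_decode_two_byte(start, cont) < 0x80;
}
```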
commit e963e4bd4d4fcc0233d51a1ec33165fc9fb38fe5
Author: Karl Williamson <[email protected]>
Date: Sun Feb 17 14:00:13 2013 -0700
toke.c: Don't remap \N{} for EBCDIC
Everything is now native.
M toke.c
commit 3515a462c70ad75ede28000fdce1f971247c7e6a
Author: Karl Williamson <[email protected]>
Date: Sun Feb 17 13:50:45 2013 -0700
toke.c: Remove remapping for EBCDIC for octal
The code prior to this commit converted something like \04 into its
EBCDIC equivalent only in double-quoted strings. This was not done in
patterns, and so gave inconsistent results. The correct thing to do
is the native thing: what someone who works on a platform would expect
\04 to do. Platform-independent characters are available through \N{},
either by name or by U+.
The comment changed by this was wrong, as in some cases it was native,
and in some cases Unicode.
M toke.c
commit f330cc5a6b5ce74ecc67ecac97553fc9cfa76eae
Author: Karl Williamson <[email protected]>
Date: Sun Feb 17 13:47:13 2013 -0700
Remove EBCDIC remappings
Now that the tables are stored in native format, we shouldn't be doing
remapping.
Note that this assumes that the Latin1 casing tables are stored in
native order; this hasn't been done yet.
M handy.h
M perly.c
M pp.c
M regcomp.c
M regexec.c
M utf8.c
commit 0caad749e72a4614c0d13d60a9eb63ac6e8fc631
Author: Karl Williamson <[email protected]>
Date: Sun Feb 17 12:46:05 2013 -0700
Add and use macro to return EBCDIC
The conversion from UTF-8 to code point should generally be to the
native code point. This adds a macro to do that, and converts the
core calls to the existing macro to use the new one instead. The old
macro is retained for possible backwards compatibility, though it
probably should be deprecated.
M handy.h
M pp.c
M regcomp.c
M regexec.c
M toke.c
M utf8.c
M utf8.h
commit 0db527fc1b0478c7217dd4ce5c26704630c0c99a
Author: Karl Williamson <[email protected]>
Date: Sun Feb 17 09:18:06 2013 -0700
charnames: fix nit in comment
M lib/_charnames.pm
commit b7f4305e3b8453afbc55be5387c491afb1520702
Author: Karl Williamson <[email protected]>
Date: Sat Feb 16 11:05:44 2013 -0700
charnames: Make work in EBCDIC
Now that mktables generates native tables, the only thing that was
needed was to make U+ mean Unicode instead of native.
M lib/_charnames.pm
M lib/charnames.pm
commit 752462c119e18471190dc471d2d2f26bcdc7f046
Author: Karl Williamson <[email protected]>
Date: Sat Feb 16 09:35:56 2013 -0700
Unicode::UCD: Work on non-ASCII platforms
Now that mktables generates native tables, it is a fairly simple matter
to get Unicode::UCD to work on those platforms.
M lib/Unicode/UCD.pm
commit 7b06773e3e2dc42be30feda7666582526cd6f71b
Author: Karl Williamson <[email protected]>
Date: Thu Feb 14 22:16:38 2013 -0700
mktables: Generate native code-point tables
The output tables for mktables are now in the platform's native
character set. This means there is no change for ASCII platforms, but
is a change for EBCDIC ones.
Since we currently don't have any EBCDIC test platforms, I tested this
by faking it out to generate EBCDIC data, and then eye-balled the
results.
Code that didn't realize there was a potential difference between EBCDIC
and non-EBCDIC platforms will now start to work; code that tried to do
the right thing under these circumstances will no longer work. Fixing
that comes in later commits.
M lib/unicore/mktables
commit fa0db3e97c7ea5ca2c041dcc1f1c6f7c6eca0468
Author: Karl Williamson <[email protected]>
Date: Thu Feb 14 10:50:00 2013 -0700
Fix some EBCDIC problems
These spots have native code points, so should be using the macros for
native code points, instead of Unicode ones.
M regcomp.c
M sv.c
M toke.c
commit 31f89ae93bb19cf093cd9a2b821bfb9c06951ebb
Author: Karl Williamson <[email protected]>
Date: Wed Feb 13 22:10:19 2013 -0700
Remove unnecessary temp variable in converting to UTF-8
These areas of code included a temporary that is unnecessary.
M inline.h
M regcomp.c
M sv.c
commit 962db892cae07286131e9d194cc7af4b0a14990f
Author: Karl Williamson <[email protected]>
Date: Wed Feb 13 22:00:55 2013 -0700
utf8.h: Correct macros for EBCDIC
These macros were incorrect for EBCDIC. The 3 step process given in
utfebcdic.h wasn't being followed.
M utf8.h
commit c265cfd30dc0eaae33d05de8dbb49a56fc9f4aaf
Author: Karl Williamson <[email protected]>
Date: Sat Feb 9 21:23:30 2013 -0700
Extract common code to an inline function
This fairly short paradigm is repeated in several places; a later commit
will improve it.
M embed.fnc
M embed.h
M inline.h
M pp_pack.c
M proto.h
M sv.c
M toke.c
M utf8.c
commit b60d0a8769db597c2fabe55907586e6af7caa123
Author: Karl Williamson <[email protected]>
Date: Thu Feb 7 21:35:57 2013 -0700
Don't use EBCDIC macro for a C language escape
C recognizes '\a' (for BEL); just use that instead of a look-up.
regen/unicode_constants.pl could be used to generate the character for
the ESC (set in surrounding code), but I didn't do that because of
potential bootstrapping problems when porting to an EBCDIC platform
without a working perl. (The other characters generated in that .pl are
less likely to cause problems when compiling perl.)
M regcomp.c
M toke.c
commit d4f37eacd972f87ac586279c80016d5def0a1d64
Author: Karl Williamson <[email protected]>
Date: Thu Feb 7 19:53:38 2013 -0700
Use byte domain EBCDIC/LATIN1 macro where appropriate
The macros like NATIVE_TO_UNI will work on EBCDIC, but operate on the
whole Unicode range. In the locations affected by this commit, it is
known that the domain is limited to a single byte, so the simpler ones
whose names contain LATIN1 may be used.
On ASCII platforms, all these macros are identity mappings, so there is
no effective change.
M handy.h
M regcomp.c
M utf8.c
commit b68a6369f3bef40813b6b2f9b9125330c329d15e
Author: Karl Williamson <[email protected]>
Date: Thu Feb 7 14:31:09 2013 -0700
Use new clearer named #defines
This converts several areas of code to use the more clearly named macros
introduced in a recent commit.
M op.c
M toke.c
M utf8.c
M utf8.h
M utfebcdic.h
commit 316d80bd052136276ed5856d84f0fb2ac9d2a0b3
Author: Karl Williamson <[email protected]>
Date: Thu Feb 7 13:52:31 2013 -0700
utf8.h, utfebcdic.h: Create less confusing #defines
This commit creates macros whose names mean something to me, and I don't
find confusing. The older names are retained for backwards
compatibility. Future commits will fix bugs I introduced from
misunderstanding the meaning of the older names.
The older names are now #defined in terms of the newer ones, and moved
so that they are only defined once, valid for both ASCII and EBCDIC
platforms.
M utf8.h
M utfebcdic.h
commit 905eb24a5ca83a2eaa35ccae179717c1c0d94744
Author: Karl Williamson <[email protected]>
Date: Mon Feb 4 14:22:02 2013 -0700
pp_ctl.c: Use isCNTRL instead of hard-coded mask
This is clearer and portable to EBCDIC.
M pp_ctl.c
commit 9e48f7a7c1b5f3568f07b884816396b9254a9750
Author: Karl Williamson <[email protected]>
Date: Tue Feb 26 13:51:05 2013 -0700
utf8.c: is_utf8_char_slow() should use native length
What is passed is the actual length of the native UTF-8 character. What
this was calculating was the length it would have if it were a Unicode
character, and then it compared the two: apples to oranges.
M utf8.c
-----------------------------------------------------------------------
--
Perl5 Master Repository