of modules known to pass tests on 5.6.1
There may be an older version of Cache::Memcached that works on 5.6.1
Otherwise I'd guess that your least bad choice for making progress is to locally
fork Cache::Memcached and remove the code in it which requires Encode.
Nicholas Clark
> set the STDOUT filehandle to ':raw' and
> then to restore the previous IO layers.
> Is there a way to determine the IO layers applying to a filehandle
> just from the filehandle itself?
I think you want PerlIO::get_layers($fh)
I'm not sure where it's documented.
Nicholas Clark
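As an illustration of the suggestion above, a minimal sketch (not from the
original mail; exactly restoring a layer stack is fiddly, so treat the restore
step as approximate):

    my @saved = PerlIO::get_layers(*STDOUT);   # e.g. ('unix', 'perlio')
    binmode(STDOUT, ':raw');                   # switch the handle to raw bytes
    print "binary data goes here";
    # afterwards, push the remembered layers back on, skipping the base
    # layers that ':raw' leaves in place anyway
    binmode(STDOUT, ":$_") for grep { $_ ne 'unix' && $_ ne 'perlio' } @saved;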
are not those which you want to consider reserved. I guess that one
needs to loop over all characters in the string, and verify that if
$char eq lc $char then also $char ne uc $char. (But one could first
short-circuit the common pass case with the test above.)
Nicholas Clark
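A sketch of that per-character loop (the subroutine name and the true/false
convention are my own assumptions):

    sub chars_pass_case_rule {
        my ($string) = @_;
        for my $char (split //, $string) {
            if ($char eq lc $char) {                # char is its own lowercase...
                return 0 unless $char ne uc $char;  # ...so it must differ from its uppercase
            }
        }
        return 1;    # every character satisfied the rule
    }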
print bytes::twtowtdi()
>
> Were these problems resolved in later releases of Perl?
This appears to have been solved by 5.8.7
Nicholas Clark
If DUCET 5.0.0 were shipped with the release of 5.8.9, it could break things
for people who have installed Unicode::Collate with 5.8.8 (or earlier) and are
currently using DUCET 4.1.0.
So it wouldn't be a great idea.
Nicholas Clark
I think it likely that 5.8.9 will ship with Unicode 5.0.0 data.
Nicholas Clark
> ':raw'). But obviously I'm
> wrong. Is there something I'm missing about when and/or where the IO
> layer should be set? Anyone run into something like this before?
I doubt that perl's at fault. What are you piping the data into, and
what does it think of $LANG in the environment?
Nicholas Clark
> locale but Unix is all over the map.
I don't know. I have little to no experience of doing conversion of real
data, certainly for data outside of ISO-8859-1 and UTF-8, and I've never used
I18N::Langinfo. I hope that someone else on this list can give a decent
answer.
Nicholas Clark
data, and there is a (buggy) assumption
that 8 bit data can be converted to Unicode by assuming that it's ISO-8859-1.
Definitely buggy. Not possible to change without breaking backward
compatibility.
Nicholas Clark
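A small sketch of the assumption being described (my illustration, not code
from the thread): upgrading a byte string gives the same characters as
explicitly decoding it as ISO-8859-1.

    use Encode qw(decode);

    my $bytes = "\xE9";                          # one arbitrary 8 bit byte
    utf8::upgrade(my $upgraded = $bytes);        # perl's implicit upgrade
    my $decoded = decode("iso-8859-1", $bytes);  # explicit Latin-1 decode
    print $upgraded eq $decoded ? "same\n" : "different\n";   # prints "same"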
default. Under use utf8; Unicode word characters
can also be used in identifiers.
I doubt that this will change in perl 5, because the parser is written in C,
and so it would be very hard work to replace it with something that was fully
Unicode aware.
Nicholas Clark
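For example, a minimal demonstration of the identifier behaviour described
above (assumes the source file itself is saved as UTF-8):

    use utf8;                 # lets the parser accept Unicode word characters
    my $prénom = "Nick";      # non-ASCII letter in an identifier
    print "$prénom\n";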
; NATIVE_TO_ASCII macro on the input character?
I don't know.
And if the test is only checking for invariant characters below 127, it
doesn't strike me as a very thorough test.
Nicholas Clark
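As a sketch of what a more thorough check could look like (my own
illustration, using iso-8859-1 as an arbitrary example encoding), round-trip
every 8 bit code point rather than just the invariant ones below 127:

    use Encode qw(encode decode);

    for my $cp (0 .. 255) {
        my $char = chr $cp;
        my $back = decode("iso-8859-1", encode("iso-8859-1", $char));
        printf "round trip failed for code point %d\n", $cp
            unless $back eq $char;
    }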
> How thorough are the tests? Do the tests check for the
> conversion of characters with Unicode code points >127?
You're asking questions beyond my knowledge.
Nicholas Clark
On Wed, Aug 10, 2005 at 02:11:45PM +0530, Sastry wrote:
> On 8/9/05, Nicholas Clark <[EMAIL PROTECTED]> wrote:
> > On Tue, Aug 09, 2005 at 10:58:48AM +0530, Sastry wrote:
> > > > $enc_string = encode("iso-8859-16", $string);
> > So $enc_string should
On Tue, Aug 09, 2005 at 10:58:48AM +0530, Sastry wrote:
> Hi
>
> I get 73 printed on EBCDIC platform. I think it is supposed to print
> 129 as it is the numeric equivalent of 'a'.
>
> -Sastry
>
>
>
> On 8/8/05, Nicholas Clark <[EMAIL PROTECTED]
> a regexp leave a gap, but [\x89-\x91] not.
I don't know where ranges in tr/// are parsed, but given that I grepped
for EBCDIC and didn't find any analogous code, it looks like tr/\x89-\x91//
is treated as tr/i-j//, and in turn i-j is treated as letters and always
"special cased".
I don't know if tr/i-j// and tr/\x89-\x91// should behave differently
(i.e. whether we currently have a bug).
Nicholas Clark
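One way to probe that question on a given platform (a hypothetical sketch, not
from the original thread): keep only the characters each range form matches
and compare the counts.

    my $probe = join '', map { chr } 0 .. 255;

    (my $by_letters = $probe) =~ tr/i-j//cd;        # keep only what i-j matches
    (my $by_numbers = $probe) =~ tr/\x89-\x91//cd;  # keep only what \x89-\x91 matches

    printf "i-j matches %d characters, \\x89-\\x91 matches %d\n",
        length $by_letters, length $by_numbers;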
On your EBCDIC platform, what does this give?
use Encode;
$string = "a";
$enc_string = encode("iso-8859-16", $string);
print ord ($enc_string), "\n";
__END__
Nicholas Clark
ete.
> Thanks for all your help on this.
Do you have any idea *why* this change makes things work?
Nicholas Clark
> > * would explain the failures, and be the
> > thing that needs
> > correcting. The test file would need if/else with a
> > different test on EBCDIC.
> what would you suggest be put in the if/else?
I think that the regression tests tended to do something like
if (ord 'A' == 65) {
    # Do the ASCII/UTF-8 version
} else {
    # Assume EBCDIC
}
Nicholas Clark
> that make a
> difference to miniperl?
Well, the code is linked into miniperl, so I can only assume that it's
getting called.
If so, does removing the second instance of NATIVE_TO_UTF() improve things?
Nicholas Clark
One thing I can think of is that I notice that further down that function
there is:

#ifdef EBCDIC
    uv = NATIVE_TO_UTF(uv);
#else
    if ((uv == 0xfe || uv == 0xff) &&
        !(flags & UTF8_ALLOW_FE_FF)) {
        warning = UTF8_WARN_FE_FF;
        goto malformed;
    }
#endif

Is that second call to NATIVE_TO_UTF still present in your modified code?
Nicholas Clark
1: What is the only change you've made to the source code?
2: Without that change, how does your build fail? How do the errors differ?
Nicholas Clark
On Fri, Jun 10, 2005 at 12:02:27PM +0100, Nicholas Clark wrote:
> It would be better if you sent 1 e-mail to both perl-unicode.perl.org
> and perl5-porters@perl.org to ask a given question, rather than two.
It would help if I got the address correct.
(or avoided using the format used in DNS)
spective ?
I also can't answer this, but my hunch is that from a debugging perspective,
tackling 7) and 5) first is the way to go. Until these bugs are solved, it's
quite probable that attempts to solve the other problems will be hindered
by errors introduced by these bugs.
Nicholas Clark
e bugs. CPAN module authors have also fixed bugs.
Nicholas Clark
Specifically, its XS code is not checking the internal UTF8
flag before doing things with the PV.
Nicholas Clark
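For illustration, the Perl-space analogue of that check (in XS it would be a
test of the scalar's UTF8 flag, e.g. SvUTF8(sv); this sketch just shows the
two states):

    my $bytes = "caf\xE9";              # UTF8 flag off: PV holds raw 8 bit bytes
    my $chars = "caf\x{e9}\x{2603}";    # contains a char > 0xFF, so UTF8 flag on
    for my $sv ($bytes, $chars) {
        print utf8::is_utf8($sv)
            ? "PV holds perl's internal UTF-8 encoding\n"
            : "PV holds raw 8 bit bytes\n";
    }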
use Encode qw(_utf8_on);
my $data = "\xC3\x84";    # two bytes: the UTF-8 encoding of U+00C4
_utf8_on($data);          # tell perl the scalar already holds UTF-8
open FH, ">:utf8", "aa";
print FH $data;
print length($data);      # prints 1

Using ':utf8' on the open tells perl that the file handle is expecting UTF8
rather than the default, and then you get a 2 byte file output.
Nicholas Clark
not ok 107
# Failed test 107 in ext/Unicode/Normalize/t/illegal.t at line 59 fail #11
not ok 108
# Failed test 108 in ext/Unicode/Normalize/t/illegal.t at line 60 fail #11
not ok 109
# Failed test 109 in ext/Unicode/Normalize/t/illegal.t at line 61 fail #11
not ok 110
# Failed test 110 in ext/Unicode/Normalize/t/illegal.t at line 62 fail #11
ok 111
ok 112
I don't know what is at fault here, the tests, or the patch.
Nicholas Clark
know how.
I don't think that you need to file a report, as we're now aware of it.
Jarkko managed to cut the test case down to something very small, but we
can't manage to make a fix that doesn't break regexps in something else,
seemingly completely unrelated.
Nicholas Clark
On Wed, Jun 30, 2004 at 10:15:13PM +0100, Richard Jolly wrote:
>
> On 30 Jun 2004, at 17:52, Nicholas Clark wrote:
>
> >On Tue, Jun 29, 2004 at 06:49:16PM +0100, Richard Jolly wrote:
> >> Script
> >
> >Could you resend the script/data test case as an attachment please?
On Tue, Jun 29, 2004 at 06:49:16PM +0100, Richard Jolly wrote:
> Script
Could you resend the script/data test case as an attachment please?
It's been mangled by the format=flowed handling of your mailer, and currently
I'm getting errors which suggest that I can't undo that damage.
for arguing's sake these days.
Suggest a workable solution, volunteer to actually do it and I think
that everyone will be happy.
My only thought is: should the API take full SVs, or a char pointer plus a
utf8/not flag? (possibly as 1 bit in a flags word)
Nicholas Clark
I haven't yet integrated it into the maintenance branch, but I intend to,
so unless it causes really really strange errors (very unlikely) it will
be in 5.8.5. (Which in turn will be released in mid July.)
Nicholas Clark
use if $] >= 5.006, 'utf8';
On CPAN as http://search.cpan.org/author/ILYAZ/if-0.0101/
In the core since 5.8.0
Nicholas Clark
LC_ALL and LC_CTYPE don't contain a string matching /utf-?8/i)
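A sketch of that test in Perl (my rendering of the description above, not the
actual code being referred to; including LANG in the list is an assumption):

    my $has_utf8_locale = grep { defined($_) && /utf-?8/i }
                          @ENV{qw(LC_ALL LC_CTYPE LANG)};
    print $has_utf8_locale ? "UTF-8 locale detected\n"
                           : "no UTF-8 locale detected\n";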
I don't know what sets these variables on RedHat systemwide, so I don't
know how to change them.
My personal opinion is that it was premature of RedHat to make RedHat 8.0
*default* to using UTF8 locales, given the general state o
On Mon, Nov 04, 2002 at 03:26:16AM +, [EMAIL PROTECTED] wrote:
> Nicholas Clark <[EMAIL PROTECTED]> wrote:
> :I've been experimenting with how enc2xs builds the C tables that turn into the
> :shared objects. enc2xs is building tables (arrays of struct encpage_t) which
>
On Mon, Nov 04, 2002 at 03:26:16AM +, [EMAIL PROTECTED] wrote:
> Nicholas Clark <[EMAIL PROTECTED]> wrote:
>
> :The default method is to see if my substring is already present somewhere,
> :if so note where, if not append at the end. The (currently buggy) -O optimiser
>
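A toy sketch of the "default method" quoted above (a hypothetical helper, not
the actual enc2xs code): reuse an existing occurrence of the substring in the
growing table if there is one, otherwise append it at the end.

    sub find_or_append {
        my ($table_ref, $substr) = @_;
        my $pos = index $$table_ref, $substr;   # already present somewhere?
        if ($pos < 0) {
            $pos = length $$table_ref;          # not found: append at the end
            $$table_ref .= $substr;
        }
        return $pos;                            # offset to record for this entry
    }

    my $table = '';
    my @offsets = map { find_or_append(\$table, $_) } 'abc', 'bcd', 'bc';
    # 'bc' is found inside the 'abc' already in the table, so @offsets is (0, 3, 1)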
t 5.8.0?]
Basically is there something that the perl development community needs to do
(or change) that would avoid this in future?
Nicholas Clark
--
Befunge better than perl? http://www.perl.org/advocacy/spoofathon/
Dan Kogai wrote:
> > oh wait! Encode.xs remains unchanged so Encode::* may still work
>
> Confirmed. The NC patch works w/ preexisting shlibs.
Good. It would have been worrying if it had not. The idea was not to
change any of the internal data structures visible to any code anywhere,
just to change how the U8 strings they point to were arranged.
Nicholas Clark
--
z-code better than perl? http://www.perl.org/advocacy/spoofathon/
On Sun, Nov 03, 2002 at 11:13:25PM +, Nicholas Clark wrote:
> Currently the appended patch passes all regression tests on FreeBSD on
> bleadperl. However, having experimented I know that the new -O function it
> provides is buggy in some way, as running -O on the Chinese encodi
Encode-O0-Agg/lib/auto/Encode/Symbol/Symbol.so
937328 18075-Encode-O0-Agg/lib/auto/Encode/TW/TW.so
12039 18075-Encode-O0-Agg/lib/auto/Encode/Unicode/Unicode.so
4706245 total
Nicholas Clark
--
XSLT better than perl? http://www.perl.org/advocacy/spoofathon/
--- ext/Encode/bin/enc2xs.orig S
obsessed with speed currently, so I
doubt I can find them by searching for "speed". And I can't remember why
I might have suggested that allowing optional arguments would induce
serious slowdown. (by implication even when no optional arguments are used)
Nicholas Clark
PS shameless plug for optimising your perl code talk:
http://www.ccl4.org/~nick/P/Fast_Enough/
If you're considering upgrading from something like 5.005, is there any
reason not to consider going straight to 5.8.0? The Unicode support
in 5.8.0 is much better than in 5.6.1, and it also fixes many of the bugs
still present in 5.6.1. (Nothing is perfect - a few new bugs have been
reported in 5.8.0,
cpan.org, enter Unicode::Collate in the box, hit go, top
of the returned list)
CPAN's usually the best place to start when looking for anything perl.
Nicholas Clark
On Thu, Aug 15, 2002 at 05:28:43PM -0400, David Gray wrote:
> > I'm having a bit of a problem getting Unicode pattern
> > matching to do what I would like it to.
>
> I guess my question wasn't entirely clear. I'm reading in the attached
> file and trying to split it on "\n\n".
>
> When I'm loo
On Tue, Aug 06, 2002 at 10:36:09PM +0900, SADAHIRO Tomoyuki wrote:
>
> On Mon, 5 Aug 2002 22:17:10 +0100
> Nicholas Clark <[EMAIL PROTECTED]> wrote:
>
> > I'm trying to backport ExtUtils::Constant from 5.8.0 to work on perl pre
> > 5.8.0. Currently ExtUtils:
e power of the dark side.
M-x flyspell-mode
Definitely part of the dark side because here it defaults to American.
And then refuses to start because I don't have American dictionaries
installed. ispell has no problem "just running" and finding the correct
dictionaries.
Nicholas Clark
the various hoops
the different encoding systems are forced to jump through, when one
only knows languages which use the Roman alphabet and therefore has
had no direct experience of anything other than ASCII and ISO 8859-1.
But it's more important to get Encode working well than spend time on th
            $seen{$uch} = [$page,$ch];
        }
        enter($e2u,$ech,$uch,$e2u,0);
        enter($u2e,$uch,$ech,$u2e,0);
    }
    else
    {
        # No character at this position
        # enter($e2u,$ech,undef,$e2u);
    }
    $ch++;
}
Is there a bug?
Should the $ch++ hap
On Wed, Feb 06, 2002 at 09:59:44AM +, Nick Ing-Simmons wrote:
> Nicholas Clark <[EMAIL PROTECTED]> writes:
> >On Tue, Feb 05, 2002 at 04:29:34PM +, Nick Ing-Simmons wrote:
> >> If I throw jis208.enc into the pot, then without -O it is 12s
> >> and with
disk buffers, so the OS is doing its
best however you config things]. Or am I barking up the wrong tree?
Nicholas Clark
--
EMCFT http://www.ccl4.org/~nick/CV.html
It saves 22K but is that worthwhile?
Then surely this extra searching becomes the configure question?
    Try harder to compress CJK encodings (this will slow your build considerably)?
    [no]
Unless we find a more efficient algorithm to search for common substrings.
Nicholas Clark
--
EMCFT http://www.ccl4.org/~nick/CV.html