Determining IO layer set on filehandle

2010-01-29 Thread Michael Ludwig
Filehandles may have IO layers applied to them, like :utf8 or :raw. One of the ways to achieve that is to use the binmode() function. binmode $fh, ':utf8'; What I want to achieve is to set the STDOUT filehandle to ':raw' and then to restore the previous IO layers. sub out_bin { binmode

Re: Determining IO layer set on filehandle

2010-01-29 Thread Michael Ludwig
On 29.01.2010 at 15:37, Aristotle Pagaltzis wrote: * Michael Ludwig michael.lud...@xing.com [2010-01-29 14:20]: Is there an alternative way to achieve what I want, maybe involving one of the IO modules? You may want to just dup the filehandle and then diddle the dup’d one. You may need

Re: Determining IO layer set on filehandle

2010-01-29 Thread Michael Ludwig
On 29.01.2010 at 17:28, Nicholas Clark wrote: On Fri, Jan 29, 2010 at 02:22:06PM +0100, Michael Ludwig wrote: Filehandles may have IO layers applied to them, like :utf8 or :raw. One of the ways to achieve that is to use the binmode() function. binmode $fh, ':utf8'; What I want

Re: Determining IO layer set on filehandle

2010-02-01 Thread Michael Ludwig
On 29.01.2010 at 16:10, Aristotle Pagaltzis wrote: * Michael Ludwig michael.lud...@xing.com [2010-01-29 15:50]: Like, does it work on all platforms? Ouch, good question. I don’t know whether Win32 supports dup’ing. I tried it out, it does. Same syntax, cross-platform. -- Michael.Ludwig
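
The dup approach discussed in this thread can be sketched roughly as follows. This is a minimal illustration, not code from the original messages: the helper name out_bin is borrowed from the first post, but its body here is an assumption, as is the choice of :encoding(UTF-8) for STDOUT. PerlIO::get_layers is used only to show that the layers on STDOUT itself stay as they were.

```perl
use strict;
use warnings;
use IO::Handle;    # provides the flush method on older perls

# Hypothetical helper: print octets through a :raw duplicate of STDOUT,
# so the layers on STDOUT itself never need to be saved or restored.
sub out_bin {
    my ($octets) = @_;
    STDOUT->flush;                                 # keep earlier output in order
    open my $dup, '>&', \*STDOUT or die "Cannot dup STDOUT: $!";
    binmode $dup, ':raw' or die "Cannot set :raw: $!";
    print {$dup} $octets;
    close $dup or die "Cannot close dup: $!";      # closes only the duplicate
}

binmode STDOUT, ':encoding(UTF-8)';                # assumed layer for the demo
my @before = PerlIO::get_layers(\*STDOUT);
out_bin("\xff\xfe");                               # two octets, written unchanged
my @after = PerlIO::get_layers(\*STDOUT);
print "\nlayers on STDOUT: @after\n";
```

The exact layer names printed depend on the platform and Perl build, so no particular output should be expected beyond the list being the same before and after the call.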

use encoding 'utf8' and \x{00e4} notation

2010-02-02 Thread Michael Ludwig
I was under the assumption that: use encoding 'utf8'; was equivalent to: use utf8; # source in UTF-8 binmode STDIN, ':utf8'; binmode STDOUT, ':utf8'; But that does not seem to be the case. Please consider and run the following script: use strict; use warnings; # use either (a) use

Re: use encoding 'utf8' and \x{00e4} notation

2010-02-03 Thread Michael Ludwig
On 03.02.2010 at 08:55, Aristotle Pagaltzis wrote: * Michael Ludwig michael.lud...@xing.com [2010-02-02 17:35]: use encoding 'utf8'; The `encoding` pragma is broken. Do not use it. You want use open ':encoding(UTF-8)', ':std'; Thanks, Aristotle - that works correctly! I'm also
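
A small sketch of the suggested replacement for `use encoding 'utf8'`, combining the utf8 pragma (for the source) with the `open` pragma (for IO). The sample string is an assumption chosen to match the \x{00e4} notation in the thread subject; it is not taken from the original script.

```perl
use strict;
use warnings;
use utf8;                                # the source code itself is UTF-8
use open ':encoding(UTF-8)', ':std';     # default layers for open(), plus STDIN/STDOUT/STDERR

# Assumed sample data: \x{00e4} is U+00E4, the a-umlaut in the word below.
my $str = "K\x{00e4}se";

print length($str), "\n";                # counts characters, not octets
print "$str\n";                          # encoded to UTF-8 by the STDOUT layer
```

With this setup the escape denotes a character, and the encoding step happens only at the filehandle boundary.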

Character (or byte?) escapes under utf8 pragma

2010-03-03 Thread Michael Ludwig
For convenience, I have test script source code in UTF-8. The test also deals with non-breaking spaces, which I prefer to keep as character references since they are not visible and might be mistaken by the casual onlooker for ordinary spaces. So I write them as \xa0. Or \x{a0}, or \x{00a0}. Now

Re: Character (or byte?) escapes under utf8 pragma

2010-03-08 Thread Michael Ludwig
Hi Aristotle, thanks for your answer - much appreciated! Please see my comments inline. On 07.03.2010 at 07:39, Aristotle Pagaltzis wrote: Perl does not distinguish between bytes and characters. It does distinguish between scalars that use a packed byte buffer for storage vs strings that

Re: Character (or byte?) escapes under utf8 pragma

2010-03-10 Thread Michael Ludwig
Hi Juerd, On 08.03.2010 at 16:15, Juerd Waalboer wrote: Michael Ludwig wrote 2010-03-08 15:55 (+0100): Okay. But unless I'm completely misled, you can tell whether a string is supposed to contain characters (- Encode::decode) or bytes (- Encode::encode) The result of decode

Re: Character (or byte?) escapes under utf8 pragma

2010-03-10 Thread Michael Ludwig
On 10.03.2010 at 11:02, Juerd Waalboer wrote: Michael Ludwig wrote 2010-03-10 10:34 (+0100): Okay. Let me try to see if I have understood correctly. Without the utf8 pragma in scope, so\xa0ein\xa0Käse with a-Umlaut stored as a sequence of two bytes in my source code will be stored
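
The difference the utf8 pragma makes to such a literal can be checked with a short sketch. It assumes the file is saved as UTF-8; the variable names are made up for illustration.

```perl
use strict;
use warnings;

my ($without, $with);
{
    no utf8;    # the literal is read octet by octet: a-umlaut becomes two elements
    $without = "so\xa0ein\xa0Käse";
}
{
    use utf8;   # the literal is decoded from UTF-8: a-umlaut is one character
    $with = "so\xa0ein\xa0Käse";
}

# In both cases \xa0 yields a single element with ordinal 0xA0.
printf "without utf8: %d, with utf8: %d\n", length $without, length $with;
printf "ordinal at index 2: %#x and %#x\n",
    ord(substr $without, 2, 1), ord(substr $with, 2, 1);
```

Only the literal a-umlaut is affected by the pragma; the \xa0 escapes mean the same ordinal either way.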

Use case for utf8::upgrade?

2010-04-07 Thread Michael Ludwig
Perl Unicode Advice http://juerd.nl/site.plp/perluniadvice Having read Juerd's list of useful advice, I don't understand the reason for its last three items: • utf8::upgrade before doing lc/lcfirst/uc • utf8::upgrade before doing case insensitive matching • utf8::upgrade before matching

Re: Use case for utf8::upgrade?

2010-04-07 Thread Michael Ludwig
On 07.04.2010 at 17:42, Aristotle Pagaltzis wrote: * Michael Ludwig michael.lud...@xing.com [2010-04-07 15:00]: • utf8::upgrade before doing lc/lcfirst/uc • utf8::upgrade before doing case insensitive matching • utf8::upgrade before matching predefined character classes like \w and \s

Re: Use case for utf8::upgrade?

2010-04-08 Thread Michael Ludwig
On 08.04.2010 at 01:25, Aristotle Pagaltzis wrote: * Gisle Aas gi...@aas.no [2010-04-08 00:00]: This fix was withdrawn from 5.12.0. Currently you have to use feature 'unicode_strings' to get the sane behaviour in the current lexical scope. [...] This means that the utf8::upgrade() advice

Re: Use case for utf8::upgrade?

2010-04-08 Thread Michael Ludwig
On 08.04.2010 at 10:06, Aristotle Pagaltzis wrote: Almost all of the time the performance cost is negligible and not worth sweating at the application code level. Exactly what I was hoping to hear. Thanks a lot for your exhaustive answer! -- Michael.Ludwig (#) XING.com
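
The behaviour behind the utf8::upgrade advice can be illustrated with a short sketch. It assumes no locale pragma is in effect and, for the last part, Perl 5.12 or later; the variable names are hypothetical.

```perl
use strict;
use warnings;

my $native   = "\xe4";         # U+00E4 held in the one-byte internal format
my $upgraded = "\xe4";
utf8::upgrade($upgraded);      # same character, now held internally as UTF-8

my $uc_native   = uc $native;      # byte semantics: only ASCII letters change case
my $uc_upgraded = uc $upgraded;    # character semantics: becomes U+00C4

my $uc_feature;
{
    use feature 'unicode_strings'; # Perl 5.12+: character semantics in this scope
    $uc_feature = uc $native;
}

printf "same string: %s\n", $native eq $upgraded ? "yes" : "no";
printf "uc native: %#x, uc upgraded: %#x, uc with feature: %#x\n",
    ord $uc_native, ord $uc_upgraded, ord $uc_feature;
```

The two input strings compare equal, yet uc treats them differently unless the feature is enabled, which is the inconsistency the advice works around.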

Establishing IO code conventions

2010-04-09 Thread Michael Ludwig
The `open` pragma allows you to set default values for two-argument calls to open and some other operators for a lexical scope, for example file level. Where you need something else you can call binmode or three-argument open or use the `open` pragma in a narrow lexical scope. Suppose you

Effect of -C command line switch on `warn` and `die`

2010-04-22 Thread Michael Ludwig
Consider the following script, the source of which is encoded in UTF-8: use utf8; use open qw/:utf8 :std/; my $str = "Käse\n"; print STDOUT $str; print STDERR $str; warn $str; die $str; On a UTF-8 terminal, this prints Käse three times. So far, so good. Now remove the open pragma

Re: Effect of -C command line switch on `warn` and `die`

2010-04-26 Thread Michael Ludwig
On 22.04.2010 at 17:29, Aristotle Pagaltzis wrote: * Michael Ludwig michael.lud...@xing.com [2010-04-22 17:00]: Consider the following script, the source of which is encoded in UTF-8: I can’t answer your question, but I do want to suggest that you re-post it to perl5-port...@perl.org

Don't use the \C escape in regexes - Why not?

2010-05-03 Thread Michael Ludwig
Don't use the \C escape in regexes - taken from Juerd's Unicode Advice page: http://juerd.nl/site.plp/perluniadvice Why not? -- perldoc perlre: \C Match a single C char (octet) even under Unicode. NOTE: breaks up characters into their UTF-8 bytes, so you may end up with malformed

Re: Don't use the \C escape in regexes - Why not?

2010-05-04 Thread Michael Ludwig
On 04.05.2010 at 11:09, Gisle Aas wrote: I regret that I let \C sneak into the URI module. I might have understood why one might think that \C is not a good idea to use in that method, and maybe not in general. The fact that character strings in Perl are encoded in UTF-8 is an

Re: Don't use the \C escape in regexes - Why not?

2010-05-04 Thread Michael Ludwig
On 04.05.2010 at 13:06, Michael Ludwig wrote: Is it this (theoretically fragile) implicitness in handling character strings that makes \C a bad idea? But probably not as bad an idea as relying on the default platform encoding in Java (default charset in Java API doc lingo), which may

Re: Variation In Decoding Between Encode and XML::LibXML

2010-06-19 Thread Michael Ludwig
David E. Wheeler wrote on 16.06.2010 at 13:59 (-0700): On Jun 16, 2010, at 9:05 AM, David E. Wheeler wrote: On Jun 16, 2010, at 2:34 AM, Michael Ludwig wrote: In order to print Unicode text strings (as opposed to octet strings) correctly to a terminal (UTF-8 or not), add the following

Re: Silence “Wide character” warning globally one time

2010-07-29 Thread Michael Ludwig
might also want to read the archives of this list to see how I managed to make some progress in understanding thanks to the good answers I got here. [1] http://juerd.nl/perl -- Michael Ludwig

Re: Silence “Wide character” warning globally one time

2010-07-30 Thread Michael Ludwig
send the direct URL to the part you wanted me to see? http://juerd.nl/perlunitut.html http://juerd.nl/site.plp/perluniadvice These docs are also included in recent Perl distributions. -- Michael Ludwig

Re: Workaround to a unicode bug needed

2010-09-06 Thread Michael Ludwig
+/gms; print $_, "\n" for @words; -- Michael Ludwig

Re: Workaround to a unicode bug needed

2010-09-06 Thread Michael Ludwig
mode. (2) Consequently, you don't have text in Perl: you have octets. (3) You're applying some butchery to the octets using the tr operator. (4) You're outputting the remaining octets encoding them as UTF-8. (5) You're seeing garbage on the screen. -- Michael Ludwig

Re: utf8 pragma, lexical scope

2010-09-10 Thread 'Michael Ludwig'
in stashes as raw bytes, without | the utf-8 flag set. The pad API only takes a char * pointer, | so that's all bytes too. The tokeniser ignores the UTF-8-ness of | PL_rsfp, or any SVs returned from source filters. All this | could be fixed. Thanks - I didn't know this doc. -- Michael Ludwig

Re: Am I correct in thinking that the only way to get ord() to return a value over 256 is to send the character as a Unicode string instead of a byte string?

2010-10-28 Thread Michael Ludwig
is already encoded, and also to use the correct encoding. -- Michael Ludwig
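
As a rough illustration of the point in the subject line, ord() can only return a large code point once the octets have been decoded into a character string. The sample octets (the UTF-8 encoding of U+20AC EURO SIGN) are an assumption for this sketch, not data from the thread.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $octets = "\xe2\x82\xac";             # assumed input: U+20AC encoded as UTF-8
my $text   = decode('UTF-8', $octets);   # decode using the matching encoding

# Before decoding, ord sees only the first octet; afterwards, the character.
printf "octets: length %d, ord %#x\n", length $octets, ord $octets;
printf "text:   length %d, ord %#x\n", length $text,   ord $text;
```

Decoding with the wrong encoding would still produce a string, just not the intended characters, which is why the encoding has to match the data.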

Re: Matching upper ASCII characters in RE patterns

2010-11-30 Thread Michael Ludwig
://stackoverflow.com/questions/492838/ -- Michael Ludwig

Re: encoding(UTF16-LE) on Windows

2011-01-19 Thread Michael Ludwig
. Perl/Cygwin 5.10.1 does fine because its OS is cygwin, so it doesn't translate \n to CRLF. -- Michael Ludwig

Re: encoding(UTF16-LE) on Windows

2011-01-19 Thread 'Michael Ludwig'
out with the :raw pseudo-layer: open(my $fh, :raw:encoding(UTF-16LE):crlf, $filename) or die $!; Cool, that works. thanks! :-) -- Michael Ludwig
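
A sketch of the layer stack quoted above, written so that the resulting octets can be inspected. The output mode `>` and the scratch file from File::Temp are assumptions for illustration; the layer order :raw, then :encoding(UTF-16LE), then :crlf is the one from the message.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my (undef, $filename) = tempfile(UNLINK => 1);   # hypothetical scratch file

# :raw clears the default layers, :encoding turns characters into UTF-16LE,
# and :crlf on top translates "\n" to CRLF before the text is encoded.
open(my $fh, '>:raw:encoding(UTF-16LE):crlf', $filename) or die $!;
print {$fh} "a\n";
close $fh or die $!;

# Read the file back as plain octets to see what was written.
open(my $in, '<:raw', $filename) or die $!;
my $octets = do { local $/; <$in> };
close $in or die $!;

print join(' ', map { sprintf '%02X', ord } split //, $octets), "\n";
```

The intent is that CR and LF each come out as a full two-octet UTF-16LE code unit, rather than a bare 0D octet being inserted in front of an already encoded newline.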

Re: encoding(UTF16-LE) on Windows

2011-01-20 Thread Michael Ludwig
; # appears logical, but wrong result out $e, $n; out $n, $e; out $n, $r, $e; # :crlf reset -- Michael Ludwig

Re: encoding(UTF16-LE) on Windows

2011-01-20 Thread 'Michael Ludwig'
[RE: encoding(UTF16-LE) on Windows] Jan Dubois wrote on 20.01.2011 at 12:45 (-0800): On Thu, 20 Jan 2011, Michael Ludwig wrote: Erland Sommarskog wrote on 20.01.2011 at 08:29 (-): Jan Dubois (j...@activestate.com) writes: You need to stack the I/O layers in the right order

Re: encoding(UTF16-LE) on Windows

2011-01-30 Thread Michael Ludwig
for Unicode. Or what exactly are you referring to? -- Michael Ludwig

Re: encoding(UTF16-LE) on Windows

2011-01-31 Thread Michael Ludwig
Erland Sommarskog wrote on 31.01.2011 at 23:42 (+0100): Michael Ludwig (mil...@gmx.de) writes: Erland Sommarskog wrote on 29.01.2011 at 14:02 (+0100): Yes, there certainly seems to be some more stuff to do in the Unicode support in Perl. For instance, support for Unicode filenames

Unicode on Windows Console

2012-01-07 Thread Michael Ludwig
of these produced the desired effect. Any ideas? I know I could use a Linux UTF-8 terminal or Cygwin/MinTTY, which is what I'm using to write this mail, by the way; but the question is specific to the Windows console in cmd.exe and how to make Perl use its features. -- Michael Ludwig

Re: Unicode on Windows Console

2012-01-11 Thread Michael Ludwig
Michael Ludwig wrote on 07.01.2012 at 18:30 (+0100): There's a WinAPI function that sets stdout to Unicode so you can read Cyrillic and Greek characters in the cmd.exe console window: Can I get the same feature in Perl? Yes: https://metacpan.org/module/Win32::Unicode Printing twelve hearts

Re: Unicode on Windows Console

2012-01-12 Thread Michael Ludwig
. Might seem wrong at first glance, I agree; but hey, this is Windows, and it magically works, sort of bypassing the chcp setting! You just need the C API call: _setmode(_fileno(stdout), _O_WTEXT) Or some equivalent of that, which is what Win32::Unicode seems to be doing. -- Michael Ludwig