Filehandles may have IO layers applied to them, like :utf8 or :raw.
One of the ways to achieve that is to use the binmode() function.
binmode $fh, ':utf8';
What I want to achieve is to set the STDOUT filehandle to ':raw' and
then to restore the previous IO layers.
sub out_bin {
binmode
On 29.01.2010 at 15:37, Aristotle Pagaltzis wrote:
* Michael Ludwig michael.lud...@xing.com [2010-01-29 14:20]:
Is there an alternative way to achieve what I want, maybe
involving one of the IO modules?
You may want to just dup the filehandle and then diddle the dup’d
one. You may need
On 29.01.2010 at 17:28, Nicholas Clark wrote:
On Fri, Jan 29, 2010 at 02:22:06PM +0100, Michael Ludwig wrote:
Filehandles may have IO layers applied to them, like :utf8 or :raw.
One of the ways to achieve that is to use the binmode() function.
binmode $fh, ':utf8';
What I want
On 29.01.2010 at 16:10, Aristotle Pagaltzis wrote:
* Michael Ludwig michael.lud...@xing.com [2010-01-29 15:50]:
Like, does it work on all platforms?
Ouch, good question. I don’t know whether Win32 supports dup’ing.
I tried it out, it does. Same syntax, cross-platform.
--
Michael.Ludwig
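The dup approach suggested in this thread could look roughly like the following. This is a minimal sketch, not code from the thread: the name out_bin is borrowed from the truncated snippet in the question, and its body here is an assumption. It duplicates STDOUT, sets only the duplicate to :raw, and leaves the layers on STDOUT itself untouched, so nothing needs to be restored afterwards.

```perl
use strict;
use warnings;
use IO::Handle;    # provides the flush() method on filehandles

# Hypothetical helper: write octets to standard output without
# disturbing whatever layers are currently set on STDOUT.
sub out_bin {
    my ($octets) = @_;

    # Flush pending output first so the ordering of output is preserved.
    STDOUT->flush;

    # Duplicate the underlying file descriptor into a new handle.
    open(my $raw, '>&', \*STDOUT) or die "Cannot dup STDOUT: $!";

    # Only the duplicate is switched to :raw.
    binmode($raw, ':raw') or die "Cannot set :raw: $!";

    print {$raw} $octets;
    close($raw) or die "Cannot close dup: $!";
    return;
}

binmode(STDOUT, ':encoding(UTF-8)');
print "text: K\x{e4}se\n";          # encoded by the :encoding layer
out_bin("raw: K\xc3\xa4se\n");      # passed through byte for byte
print "text again: K\x{e4}se\n";    # STDOUT still has its layers
```

Closing the duplicate does not close STDOUT, because the two handles sit on different file descriptors.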
I was under the assumption that:
use encoding 'utf8';
was equivalent to:
use utf8; # source in UTF-8
binmode STDIN, ':utf8';
binmode STDOUT, ':utf8';
But that does not seem to be the case. Please consider and run
the following script:
use strict;
use warnings;
# use either (a)
use
On 03.02.2010 at 08:55, Aristotle Pagaltzis wrote:
* Michael Ludwig michael.lud...@xing.com [2010-02-02 17:35]:
use encoding 'utf8';
The `encoding` pragma is broken. Do not use it.
You want
use open ':encoding(UTF-8)', ':std';
Thanks, Aristotle - that works correctly!
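A minimal sketch of what the recommended pragma does, under the assumption that a temporary file is an acceptable stand-in for real output. The variable names and the use of File::Temp are illustrative and not from the thread. An open() without an explicit layer picks up the pragma's default, while an explicit layer in the mode overrides it.

```perl
use strict;
use warnings;
use open ':encoding(UTF-8)', ':std';   # defaults for open() and the standard handles
use File::Temp qw(tempfile);

my $str = "K\x{e4}se\n";               # 5 characters, one of them non-ASCII

# No layer in the mode, so the default from the pragma applies.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;
open(my $out, '>', $path) or die "Cannot write $path: $!";
print {$out} $str;
close($out) or die "Cannot close $path: $!";

# An explicit layer in the mode overrides the default: read the octets back.
open(my $in, '<:raw', $path) or die "Cannot read $path: $!";
my $octets = do { local $/; <$in> };
close $in;

printf "%d characters written, %d octets on disk\n",
    length($str), length($octets);
```

On a platform without newline translation, the a-umlaut accounts for the one extra octet on disk.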
I'm also
For convenience, I have test script source code in UTF-8.
The test also deals with non-breaking spaces, which I prefer
to keep as character references since they are not visible
and might be mistaken by the casual onlooker for ordinary
spaces. So I write them as \xa0. Or \x{a0}, or \x{00a0}.
Now
Hi Aristotle,
thanks for your answer - much appreciated! Please see my comments
inline.
On 07.03.2010 at 07:39, Aristotle Pagaltzis wrote:
Perl does not distinguish between bytes and characters. It does
distinguish between scalars that use a packed byte buffer for
storage vs strings that
Hi Juerd,
On 08.03.2010 at 16:15, Juerd Waalboer wrote:
Michael Ludwig wrote 2010-03-08 15:55 (+0100):
Okay. But unless I'm completely misled, you can tell whether a
string is supposed to contain characters (the result of Encode::decode) or
bytes (the result of Encode::encode)
The result of decode
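To make the decode/encode distinction concrete, here is a small sketch using the Encode module. The sample string is an assumption chosen for illustration. decode turns octets into a character string, and encode turns a character string back into octets.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $octets = "K\xc3\xa4se";               # UTF-8 encoded bytes, e.g. read from a file
my $text   = decode('UTF-8', $octets);    # character string
my $back   = encode('UTF-8', $text);      # octets again

printf "octets: %d, characters: %d\n", length($octets), length($text);
# prints "octets: 5, characters: 4"
```

Perl itself does not record which of the two a scalar holds; keeping track is up to the program.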
On 10.03.2010 at 11:02, Juerd Waalboer wrote:
Michael Ludwig wrote 2010-03-10 10:34 (+0100):
Okay. Let me try to see if I have understood correctly. Without the utf8
pragma in scope, so\xa0ein\xa0Käse with a-Umlaut stored as a sequence
of two bytes in my source code will be stored
Perl Unicode Advice
http://juerd.nl/site.plp/perluniadvice
Having read Juerd's list of useful advice, I don't understand the reason for
its last three items:
• utf8::upgrade before doing lc/lcfirst/uc
• utf8::upgrade before doing case insensitive matching
• utf8::upgrade before matching
On 07.04.2010 at 17:42, Aristotle Pagaltzis wrote:
* Michael Ludwig michael.lud...@xing.com [2010-04-07 15:00]:
• utf8::upgrade before doing lc/lcfirst/uc
• utf8::upgrade before doing case insensitive matching
• utf8::upgrade before matching predefined character classes
like \w and \s
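The reason behind this advice can be seen in a short experiment. The sketch below is not from the thread; it assumes a perl without the unicode_strings feature in scope and without a locale in effect. In that situation a string stored internally as bytes is expected to get ASCII-only case mapping and character-class matching, while an upgraded copy of the same string gets Unicode semantics.

```perl
use strict;
use warnings;

my $native   = "\xe4";          # a-umlaut, stored internally as one byte
my $upgraded = $native;
utf8::upgrade($upgraded);       # same character, UTF-8 internal storage

# The two strings compare equal, yet may behave differently below.
printf "strings are equal:   %s\n", ($native eq $upgraded ? "yes" : "no");
printf "uc changes native:   %s\n", (uc($native)   eq "\xc4" ? "yes" : "no");
printf "uc changes upgraded: %s\n", (uc($upgraded) eq "\xc4" ? "yes" : "no");
printf "\\w matches native:   %s\n", ($native   =~ /\w/ ? "yes" : "no");
printf "\\w matches upgraded: %s\n", ($upgraded =~ /\w/ ? "yes" : "no");
```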
On 08.04.2010 at 01:25, Aristotle Pagaltzis wrote:
* Gisle Aas gi...@aas.no [2010-04-08 00:00]:
This fix was withdrawn from 5.12.0. Currently you have to use
feature 'unicode_strings' to get the sane behaviour in the
current lexical scope. [...] This means that the utf8::upgrade()
advice
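A minimal sketch of the lexically scoped alternative mentioned here, assuming Perl 5.12 or later. The variable names are illustrative. With the feature in scope, case mapping uses Unicode rules even for a string that has not been upgraded.

```perl
use strict;
use warnings;
use feature 'unicode_strings';   # available from Perl 5.12

my $str   = "\xe4";              # a-umlaut, not upgraded
my $upper = uc $str;

print $upper eq "\xc4" ? "uc: Unicode rules\n" : "uc: ASCII rules\n";
```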
On 08.04.2010 at 10:06, Aristotle Pagaltzis wrote:
Almost all of the time the performance cost is negligible and not
worth sweating at the application code level.
Exactly what I was hoping to hear.
Thanks a lot for your exhaustive answer!
--
Michael.Ludwig (#) XING.com
The `open` pragma allows you to set default values for two-argument calls to
open and some other operators for a lexical scope, for example file level.
Where you need something else you can call binmode or three-argument open or
use the `open` pragma in a narrow lexical scope.
Suppose you
Consider the following script, the source of which is encoded in UTF-8:
use utf8;
use open qw/:utf8 :std/;
my $str = "Käse\n";
print STDOUT $str;
print STDERR $str;
warn $str;
die $str;
On a UTF-8 terminal, this prints Käse three times. So far, so good. Now remove
the open pragma
On 22.04.2010 at 17:29, Aristotle Pagaltzis wrote:
* Michael Ludwig michael.lud...@xing.com [2010-04-22 17:00]:
Consider the following script, the source of which is encoded
in UTF-8:
I can’t answer your question, but I do want to suggest that you
re-post it to perl5-port...@perl.org
Don't use the \C escape in regexes - taken from Juerd's Unicode Advice page:
http://juerd.nl/site.plp/perluniadvice
Why not?
-- perldoc perlre:
\C Match a single C char (octet) even under Unicode.
NOTE: breaks up characters into their UTF-8 bytes,
so you may end up with malformed
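One way to work on octets without \C is to encode explicitly and then match single bytes. The following is a sketch only: percent_encode is a hypothetical helper and not the URI module's implementation, and the set of unescaped characters is an assumption. As far as I know, \C was removed from later Perl releases (5.24), so the explicit route is also the portable one.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: percent-encode a character string as UTF-8 octets.
sub percent_encode {
    my ($text) = @_;

    # Make the encoding explicit instead of relying on internal storage.
    my $octets = encode('UTF-8', $text);

    # Every element of $octets is now a single byte.
    $octets =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $octets;
}

print percent_encode("K\x{e4}se"), "\n";    # prints "K%C3%A4se"
```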
On 04.05.2010 at 11:09, Gisle Aas wrote:
I regret that I let \C sneak into the URI module.
I might have understood why one might think that \C is not a good idea to use
in that method, and maybe not in general.
The fact that character strings in Perl are encoded in UTF-8 is an
On 04.05.2010 at 13:06, Michael Ludwig wrote:
Is it this (theoretically fragile) implicitness in handling character strings
that makes \C a bad idea?
But probably not as bad an idea as relying on the default platform encoding
in Java (default charset in Java API doc lingo), which may
David E. Wheeler wrote on 16.06.2010 at 13:59 (-0700):
On Jun 16, 2010, at 9:05 AM, David E. Wheeler wrote:
On Jun 16, 2010, at 2:34 AM, Michael Ludwig wrote:
In order to print Unicode text strings (as opposed to octet
strings) correctly to a terminal (UTF-8 or not), add the following
might also want to read the archives of this list to see how I
managed to make some progress in understanding thanks to the good
answers I got here.
[1] http://juerd.nl/perl
--
Michael Ludwig
send the direct URL to the part you
wanted me to see?
http://juerd.nl/perlunitut.html
http://juerd.nl/site.plp/perluniadvice
These docs are also included in recent Perl distributions.
--
Michael Ludwig
+/gms;
print $_, "\n" for @words;
--
Michael Ludwig
mode.
(2) Consequently, you don't have text in Perl: you have octets.
(3) You're applying some butchery to the octets using the tr operator.
(4) You're outputting the remaining octets encoding them as UTF-8.
(5) You're seeing garbage on the screen.
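The chain of steps above can be reproduced with a short sketch. The sample string and the particular tr operation are assumptions, since the original code is not part of this snippet. The first half edits undecoded octets and then encodes them again; the second half decodes first.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $octets = "K\xc3\xa4se";    # UTF-8 bytes that were never decoded

# Operating on octets: tr sees two separate bytes, not one character.
(my $broken = $octets) =~ tr/\xa4/a/;
my $garbage = encode('UTF-8', $broken);    # the stray byte is encoded again

# Decoding first: tr sees the single character U+00E4.
(my $text = decode('UTF-8', $octets)) =~ tr/\x{e4}/a/;
my $clean = encode('UTF-8', $text);

print "garbage: ", join(' ', map { sprintf '%02X', ord } split //, $garbage), "\n";
print "clean:   $clean\n";
```

The first line of output shows the bytes 4B C3 83 61 73 65, where the leftover lead byte C3 has been turned into a two-byte sequence of its own; the second shows Kase.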
--
Michael Ludwig
in stashes as raw bytes, without
| the utf-8 flag set. The pad API only takes a char * pointer,
| so that's all bytes too. The tokeniser ignores the UTF-8-ness of
| PL_rsfp, or any SVs returned from source filters. All this
| could be fixed.
Thanks - I didn't know this doc.
--
Michael Ludwig
is already encoded, and also
to use the correct encoding.
--
Michael Ludwig
://stackoverflow.com/questions/492838/
--
Michael Ludwig
.
Perl/Cygwin 5.10.1 does fine because its OS is cygwin, so it doesn't
translate \n to CRLF.
--
Michael Ludwig
out with the :raw pseudo-layer:
open(my $fh, ">:raw:encoding(UTF-16LE):crlf", $filename) or die $!;
Cool, that works. thanks! :-)
--
Michael Ludwig
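A sketch of how that layer stack could be checked. The write mode, the temporary file, and the sample data are assumptions, since the mode character is missing from the snippet above. The intent is that :crlf, being on top, translates the newline before :encoding(UTF-16LE) turns the text into octets, so the carriage return is encoded as two bytes like every other character.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
close $tmp_fh;

# :raw first drops any default :crlf layer; :crlf is then pushed on top
# of :encoding, so line endings are translated before encoding happens.
open(my $fh, '>:raw:encoding(UTF-16LE):crlf', $filename) or die $!;
print {$fh} "a\n";
close($fh) or die $!;

# Read the file back without any translation and dump the bytes.
open(my $in, '<:raw', $filename) or die $!;
my $octets = do { local $/; <$in> };
close $in;

print join(' ', map { sprintf '%02X', ord } split //, $octets), "\n";
```

A correctly stacked result is the six bytes 61 00 0D 00 0A 00; a lone 0D squeezed in after encoding would indicate the wrong layer order.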
; # appears logical, but wrong result
out $e, $n;
out $n, $e;
out $n, $r, $e; # :crlf reset
--
Michael Ludwig
[RE: encoding(UTF16-LE) on Windows]
Jan Dubois wrote on 20.01.2011 at 12:45 (-0800):
On Thu, 20 Jan 2011, Michael Ludwig wrote:
Erland Sommarskog wrote on 20.01.2011 at 08:29 (-):
Jan Dubois (j...@activestate.com) writes:
You need to stack the I/O layers in the right order
for Unicode.
Or what exactly are you referring to?
--
Michael Ludwig
Erland Sommarskog wrote on 31.01.2011 at 23:42 (+0100):
Michael Ludwig (mil...@gmx.de) writes:
Erland Sommarskog wrote on 29.01.2011 at 14:02 (+0100):
Yes, there certainly seems to be some more stuff to do in the
Unicode support in Perl. For instance, support for Unicode
filenames
of these produced the desired effect. Any ideas?
I know I could use a Linux UTF-8 terminal or Cygwin/MinTTY, which is
what I'm using to write this mail, by the way; but the question is
specific to the Windows console in cmd.exe and how to make Perl use
its features.
--
Michael Ludwig
Michael Ludwig wrote on 07.01.2012 at 18:30 (+0100):
There's a WinAPI function that sets stdout to Unicode so you can
read Cyrillic and Greek characters in the cmd.exe console window:
Can I get the same feature in Perl?
Yes: https://metacpan.org/module/Win32::Unicode
Printing twelve hearts
.
Might seem wrong at first glance, I agree; but hey, this is Windows, and
it magically works, sort of bypassing the chcp setting! You just need
the C API call:
_setmode(_fileno(stdout), _O_WTEXT)
Or some equivalent of that, which is what Win32::Unicode seems to be
doing.
--
Michael Ludwig