At 00:27 +0100 18/6/10, I wrote:
If I save the file and undo the second decoding I get the proper output
In this case all talk of iso-8859-1 and cp1252 is a red herring. I
read several Italian websites where this same problem is manifest in
external material such as ads. The news page
At 13:24 -0700 17/6/10, David E. Wheeler wrote:
On Jun 17, 2010, at 12:30 PM, Henning Michael Møller Just wrote:
So the original character \x{010d} is represented by the bytes
\x{c4} and \x{8d}, an application thinks those are in fact
characters and encodes them again as \x{c3} + \x{84} and
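
The double encoding described in this snippet can be reproduced in a few lines. This is only a sketch using the core Encode module; the variable names and the `hex_bytes` helper are illustrative, and the repair step assumes the second encoding treated each byte as a Latin-1 character, as the message describes.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Illustrative helper: show a byte string as space-separated hex.
sub hex_bytes { join ' ', map { sprintf '%02X', ord } split //, shift }

my $char  = "\x{010d}";               # the original character
my $once  = encode('UTF-8', $char);   # correct encoding: C4 8D
my $twice = encode('UTF-8', $once);   # bytes mistaken for characters, encoded again

print "once:  ", hex_bytes($once),  "\n";   # prints "once:  C4 8D"
print "twice: ", hex_bytes($twice), "\n";   # prints "twice: C3 84 C2 8D"

# Undo the second encoding: decode, map the characters back to bytes
# via Latin-1, then decode the real UTF-8.
my $inner = encode('ISO-8859-1', decode('UTF-8', $twice));
my $fixed = decode('UTF-8', $inner);
print "repaired\n" if $fixed eq $char;
```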
Juerd Waalboer wrote:
E R skribis 2007-10-18 9:50 (-0500):
I'm preparing a presentation about Perl and Unicode support, and I'd
like to give a name for characters with ordinals above 255. Is there a
good name for that class?
They are characters outside the latin-1 range.
Latin-1 has
At 10:31 am -0700 23/6/06, Jianyang Tai wrote:
I encountered some problem with the Encode module when I convert
some Japanese contents from shift-jis to utf-8. Basically I am using
the from_to subroutine to do the job. All work well except for those
number inside a circle characters (8740 ~
At 2:40 pm -0700 7/7/06, Jianyang Tai wrote:
Thanks for the reply. Are you sure those characters don't exist in
shift-jis? Please take a look at the attached text file. It contains
two characters (1 in a circle and 2 in a circle). The file is in
shift-jis encoding.
Not possible. Here is an
At 4:20 pm -0700 7/7/06, Jianyang Tai wrote:
Thanks John. The original characters came from Japan, don't know if they
use some proprietary extension of shift_jis. Attached is the zipped
file. Hope it comes across correctly this time. It should contain
characters 0x8740 and 0x8741.
That's windows
At 1:01 pm + 30/12/05, Nick Ing-Simmons wrote:
That isn't quite right.
MIME::QuotedPrint does NOT encode space or tab.
All the more reason to forget about QP, which is a great way to
triple the size of any message in non-european languages, and use
base64. QP is designed for text that
At 11:44 am +0800 28/12/05, wing wrote:
Thanks for your prompt reply. The subject line contains some
Chinese or Japanese characters in UTF8. Can they be encoded as UTF8 with
MIME::Base64?
The script below creates a file containing the following 4 characters
谷神不死
as utf8 bytes
At 12:42 am +0800 28/12/05, wing wrote:
I need to encode the subject line in a MIME header in UTF8 (something like
Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=). I know that
this can be done by using Encode in Perl 5.8. However, in my production
environment, we can only use Perl
At 10:46 am +0100 20/12/05, [EMAIL PROTECTED] wrote:
...Let's say I have a txt file which contains a list of strings.
Some of these strings contain characters encoded in this fashion:
R\xC3\xA9union (\xC3\xA9 is one character - e with an accent).
...Now, this fails, even though when I look
At 3:54 pm +0100 23/11/05, Sven Neuhaus wrote:
this seems to be a bug:
b)
perl -MHTML::Entities -MEncode -e '$a="abc&Auml;def";
print encode("MIME-Q", HTML::Entities::decode($a)), "\n";'
Result:
=?UTF-8?Q?abc=C4def?=
What about this:
perl -MHTML::Entities -MEncode -e 'use encoding ("iso-8859-1");
At 8:03 pm + 9/3/05, [EMAIL PROTECTED] wrote:
here's my perl -V
Summary of my perl5 (revision 5 version 8 subversion 6) configuration:
So ignore anything you've been told about previous versions.
Basically I have \xC3 \x84 and let perl think it is utf-8.
It is valid utf-8, i.e. A with diaeresis.
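
The claim that these two bytes are valid UTF-8 for a single character is easy to check with the core Encode module. The following is a minimal sketch, not taken from the thread; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The two bytes from the message, as a byte string.
my $bytes = "\xC3\x84";

# Decode as UTF-8. FB_CROAK dies on malformed input; LEAVE_SRC keeps
# $bytes intact (decode otherwise empties the source when a check is set).
my $chars = decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);

printf "%d bytes -> %d character, U+%04X\n",
    length($bytes), length($chars), ord($chars);
# prints "2 bytes -> 1 character, U+00C4"
```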
At 10:22 am +0100 15/12/04, Marco Baroni wrote:
I have a long text ostensibly in utf-8, and I would like to get rid
of all the lines that contain anything BUT kanji, katakana or
hiragana (thus, throwing away Latin, but also digits, punctuation,
etc.)
There's probably a better way to do it but
At 12:39 pm +0100 15/12/04, Marco Baroni wrote:
... where can I find the hexadecimal hiragana, katakana and kanji ranges?
Get UnicodeChecker:
http://www.earthlingsoft.net/UnicodeChecker/index.html
Freeware AND you won't regret it!
e.g. press Command-F and type hirag
JD
At 12:31 am +0800 3/12/04, He Zhiqiang wrote:
Now I have encountered another problem: a few files contain
not just one charset but two or more. For example, file1
contains Japanese and Chinese; if I use open() to load the data
into memory, ord, length, etc. do not work correctly!
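
One common cause of ord and length misbehaving is reading encoded text as raw bytes. The sketch below assumes the file is really UTF-8 (a single encoding that covers both Japanese and Chinese); if a file truly mixes legacy encodings, each part would have to be decoded separately. The sample data and variable names are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample: one hiragana and one Chinese character, written as UTF-8.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out, ':encoding(UTF-8)';
print {$out} "\x{3042}\x{4E2D}";
close $out or die "close: $!";

# Read as raw bytes: length() counts bytes.
open my $raw, '<:raw', $path or die "open: $!";
my $bytes = do { local $/; <$raw> };
close $raw;

# Read through a decoding layer: length() and ord() see characters.
open my $in, '<:encoding(UTF-8)', $path or die "open: $!";
my $chars = do { local $/; <$in> };
close $in;

printf "raw: %d bytes; decoded: %d characters, first is U+%04X\n",
    length($bytes), length($chars), ord($chars);
# prints "raw: 6 bytes; decoded: 2 characters, first is U+3042"
```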
At 10:33 am +1100 18/11/04, Rick Measham wrote:
That being the case, I grab the charset and use Encode's decode function
to turn it into 'perl's internal format' .. which in 5.8.5 is utf8
right? I then store that in the db.
What happens if you do something like this? :
my $uri =
At 6:19 pm +0100 25/2/04, Sebastian Lehmann wrote:
Can anybody tell me how to work with UTF8 and UTF16 in the same script? Any
help would be greatly appreciated.
Suppose that /tmp/iba.txt contains the text
ibañez in UCS-2, preceded by the BOM, then this
works here (Perl 5.8.3)
use Encode
At 4:21 pm +0200 5/2/04, ALexander N. Treyner wrote:
Hi John,
Your code works perfectly.
But I found one strange thing.
For example, I have the following string:
hello שלום hello world
that converted by the mail client to
hello =?windows-1255?Q?=F9=EC=E5=ED_hello_world?=
After converting it by
At 5:14 pm +0200 2/2/04, ALexander N. Treyner wrote:
Hello All,
I'm using utf-8 Postgres database, where I save strings in many languages.
I have to match the database with strings encoded in MIME base64 or
quoted-printable format, like the following:
=?utf-8?B?15TXoNeUINee16nXlNeZINeR16LXkdeo15nXqi4=?=
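
For encoded words like the one above, Encode's 'MIME-Header' encoding (available in Perl 5.8 and later) is one way to get back to characters. This is a sketch only: re-encoding to UTF-8 bytes for the comparison is an assumption about how the database values are fetched.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $header = '=?utf-8?B?15TXoNeUINee16nXlNeZINeR16LXkdeo15nXqi4=?=';

# 'MIME-Header' handles both B (base64) and Q (quoted-printable) encoded words.
my $text = decode('MIME-Header', $header);

# Encode to UTF-8 bytes, e.g. to compare against values fetched as bytes.
my $utf8 = encode('UTF-8', $text);

printf "%d characters, %d UTF-8 bytes, first character U+%04X\n",
    length($text), length($utf8), ord($text);
```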
At 7:36 pm +0100 2/2/04, Guido Flohr wrote:
Unfortunately, you will be out of luck for the somewhat common case
of UTF-7 (unless it is available in Encode by now).
It is:
use Encode;
for ( Encode->encodings(":all") ) { print "$_$/" }
7bit-jis
AdobeStandardEncoding
AdobeSymbol
AdobeZdingbat
ascii
At 11:31 am +0100 16/9/03, [EMAIL PROTECTED] wrote:
I am running Perl 5.8. and trying to filter out some invalid Unicode
characters from Unicoded texts of some South Asian languages. There
are 28 such characters in my data (all control characters):
0x1, 0x10, 0x11, 0x12, 0x13, 0x14, 0x15,
At 11:47 pm + 2/1/04, I wrote:
$f = "/tmp/zili.txt";
open F, $f ;...
Sorry. I had my mailbox sorted by sender rather than by date, so
this message appeared at the bottom unread. My memory's not good
enough to recall I'd read it and actually replied 4 months ago :)
Happy new year!
JD
[ sent as utf8 ]
At 6:13 pm -0800 20/11/03, Neelima Bandla wrote:
I am trying to create a Japanese file on a Windows
machine. Below is the code I am using to do so.
my @array = (0x5f89, 0x623f, 0x5f89, 0x623f);
my $str1 = pack("U*", @array);
open(FD, $filepath\\$str1) or die(
Question 1.
In this script I would like for convenience' sake to use variables in
the second line, but I don't seem to be able to do so. Am I missing
something or is it simply not possible?
$source = 'MacRoman'; # I want to use this in the next line
use encoding qw( MacRoman ), STDOUT =>
At 3:36 pm -0800 2/11/03, Jan Dubois wrote:
Should work if you initialize the variable in a BEGIN block:
BEGIN { $source = 'MacRoman'; }
use encoding $source, STDOUT => 'utf-8';
Ah!
Yes, put single quotes around your EOT marker:
$text = <<'EOT';
$ome$tuff
$ome$tuff
$ome$tuff
At 1:12 am +0200 26/10/03, Marco Baroni wrote:
I am new to (explicit) unicode handling, and right now I am facing
this problem.
I have some data (lots of data) that in theory should be in ascii
(with entity references in place of non-ascii characters). I have
no easy way to get to know
At 11:31 am +0100 16/9/03, [EMAIL PROTECTED] wrote:
Dear PERLists,
I am running Perl 5.8. and trying to filter out some invalid Unicode
characters from Unicoded texts of some South Asian languages. There
are 28 such characters in my data (all control characters):
0x1, 0x10, 0x11, 0x12, 0x13,
At 9:07 am -0500 27/8/03, [EMAIL PROTECTED] wrote:
I'm working with a byte oriented protocol, and need to extract byte n1 through
byte n2 from a string. Problem is, the string can be UTF8, and substr() is
character oriented. What (if anything) is the best way to do this in Perl?
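
One possible approach, sketched here rather than taken from the thread: encode the character string to UTF-8 octets first, so that substr() counts bytes. The `byte_slice` helper is hypothetical, and it assumes the input is a decoded character string and that the protocol carries UTF-8.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return bytes $n1..$n2 (0-based, inclusive) of the
# UTF-8 encoding of $string.
sub byte_slice {
    my ($string, $n1, $n2) = @_;
    my $octets = encode('UTF-8', $string);   # one element per byte from here on
    return substr($octets, $n1, $n2 - $n1 + 1);
}

my $s     = "iba\x{F1}ez";          # 6 characters, 7 bytes in UTF-8
my $slice = byte_slice($s, 3, 4);   # the two bytes that encode U+00F1

print join(' ', map { sprintf '%02X', ord } split //, $slice), "\n";
# prints "C3 B1"
```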