Re: Encode-JIS2K-0.02 problem

2007-01-04 Thread Nobumi Iyanaga

Hello Joel,

On Jan 4, 2007, at 4:09 PM, Joel Rees wrote:



On 2007/01/03, at 23:52, Nobumi Iyanaga wrote:


$_ = decode ("shiftjisx0123", $_);

print;

I get this error message:
untitled text 4:21:  Unknown encoding 'shiftjisx0123'


Is that a typo?


What am I doing wrong...??


Maybe 0123 should be 0213?

(I've never seen the version number for jis tagged on the end,  
but ...)


Ah! Thank you!  That's right.  It is "shiftjisx0213"!  My excuse, if
there is any, is that I copied "shiftjisx0123" from
<search.cpan.org/~dankogai/Encode-JIS2K-0.02/JIS2K.pm>, under ABSTRACT
("Canonical").
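A misspelled encoding name like this can be caught before decoding by asking Encode whether it knows the name. The following is only a sketch: the helper name canonical_encoding_name is made up for illustration, and loading Encode::JIS2K is treated as optional, so on a machine without it the JIS X 0213 names are simply reported as unknown.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Encode::JIS2K is optional here (assumption: it may not be installed).
# If loading fails, the JIS X 0213 names are just reported as unknown.
eval { require Encode::JIS2K; 1 };

# Hypothetical helper: return the canonical name Encode uses for $name,
# or undef if Encode does not know the name at all.
sub canonical_encoding_name {
    my ($name) = @_;
    my $enc = find_encoding($name);
    return defined $enc ? $enc->name : undef;
}

for my $name (qw(shiftjisx0123 shiftjisx0213 shiftjis)) {
    my $canon = canonical_encoding_name($name);
    printf "%-15s %s\n", $name,
        defined $canon ? "known as $canon" : "unknown";
}
```

Calling such a check before decode() turns the "Unknown encoding" failure into something the script can report in its own words.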





---

And -- if I can solve this problem, I would like to find, in text
files encoded in shiftjisx0213, the characters which belong only to
JIS X 0213, not to JIS X 0212.  Is this possible...??


I'm sure it's possible, either by making something like an isprint  
boolean table for each entire character set, or by slurping the  
file and scanning it in parallel from memory. I think it should  
even be possible to open two read-only streams on the same file,  
read characters out, and throw some message when the one doesn't  
match the other.


Don't know if there are any shortcut tools for it.
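One way to approximate the "table per character set" idea without building tables by hand is to try encoding each character into the older character set and collect the ones that fail. This is a rough sketch under several assumptions: the text has already been decoded to Perl characters, the helper name chars_not_in is invented for illustration, and "euc-jp" (which in Encode covers JIS X 0208 and JIS X 0212) stands in for "the older sets".

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the distinct characters of $text (already
# decoded) that cannot be represented in the encoding named $enc.
sub chars_not_in {
    my ($text, $enc) = @_;
    my (%seen, @missing);
    for my $ch (split //, $text) {
        next if $seen{$ch}++;
        my $copy = $ch;    # encode() with a CHECK value may modify its argument
        my $ok = eval { encode($enc, $copy, Encode::FB_CROAK); 1 };
        push @missing, $ch unless $ok;
    }
    return @missing;
}

# U+20B9F is outside the BMP, so it serves here as a character that
# euc-jp cannot represent; the other two are ordinary kanji.
my $sample = "\x{65E5}\x{672C}\x{20B9F}";
printf "U+%04X\n", ord($_) for chars_not_in($sample, "euc-jp");
```

In a real script, $sample would be the file contents after decode("shiftjisx0213", ...), which requires Encode::JIS2K to be installed.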


Thank you.  I will try to study a little more on this problem.

Best regards,

Nobumi Iyanaga
Tokyo,
Japan






Encode-JIS2K-0.02 problem

2007-01-03 Thread Nobumi Iyanaga

Hello,

I downloaded and installed Encode-JIS2K-0.02.  Install log says that  
all tests were successful.  But when I do this:


#!/usr/bin/perl

use strict;
use warnings;

use Encode::JIS2K;
use Encode qw/encode decode/;

my $infile = "some_shiftjisx0123.txt";

undef $/;

open (IN, $infile);

$_ = <IN>;

close (IN);

binmode (STDOUT, ":utf8");

$_ = decode ("shiftjisx0123", $_);

print;

I get this error message:
untitled text 4:21:  Unknown encoding 'shiftjisx0123'

What am I doing wrong...??

---

And -- if I can solve this problem, I would like to find, in text
files encoded in shiftjisx0213, the characters which belong only to
JIS X 0213, not to JIS X 0212.  Is this possible...??


Thank you very much in advance.

Best regards,

Nobumi Iyanaga
Tokyo,
Japan



Re: encode qp a Unicode string

2006-10-07 Thread Nobumi Iyanaga

Hello Gisle,

On Oct 7, 2006, at 11:04 PM, Gisle Aas wrote:


Nobumi Iyanaga writes:


What am I doing wrong?


You did not read 'perldoc MIME::QuotedPrint' to the end :)

|Perl v5.8 and better allow extended Unicode characters in strings.
|Such strings cannot be encoded directly, as the quoted-printable
|encoding is only defined for single-byte characters.  The solution is
|to use the Encode module to select the byte encoding you want.  For
|example:
|
|use MIME::QuotedPrint qw(encode_qp);
|use Encode qw(encode);
|
|$encoded = encode_qp(encode("UTF-8", "\x{FFFF}\n"));
|print $encoded;



Ah, thank you very much indeed!

Best regards,

Nobumi Iyanaga
Tokyo,
Japan



encode qp a Unicode string

2006-10-07 Thread Nobumi Iyanaga

Hello,

I have a Unicode string that I would like to convert into quoted- 
printable encoding, but if I do:


#!/usr/bin/perl

use utf8;
use MIME::QuotedPrint;

my $unicode_string = "xxx";  # where I have a real Unicode string, for example Japanese characters...

$encoded = encode_qp($unicode_string);
print "qp: $encoded\n";

I get the error message:

"Wide character in subroutine entry"

If I comment out the "use utf8", I get the right result, but I need  
it for my script.


I tried also to convert the Unicode string to data using the code

$native_string  = pack("C*", unpack("U*", $Unicode_string));

that I found in perluniintro, but I get the error message:

"Character in 'C' format wrapped in pack"

What am I doing wrong?

Thank you very much in advance for any help.

Best regards,

Nobumi Iyanaga
Tokyo,
Japan
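For reference, the script above can be made to work with "use utf8" left in place by encoding the character string to bytes before handing it to encode_qp, as the MIME::QuotedPrint documentation suggests. The sketch below assumes UTF-8 is the byte encoding wanted; the helper names qp_from_unicode and unicode_from_qp are invented for illustration.

```perl
use strict;
use warnings;
use utf8;    # the string literal below is UTF-8 source text
use Encode qw(encode decode);
use MIME::QuotedPrint qw(encode_qp decode_qp);

# Hypothetical helper: character string -> UTF-8 bytes -> quoted-printable.
sub qp_from_unicode {
    my ($text) = @_;
    return encode_qp(encode("UTF-8", $text));
}

# Hypothetical helper: quoted-printable -> UTF-8 bytes -> character string.
sub unicode_from_qp {
    my ($qp) = @_;
    return decode("UTF-8", decode_qp($qp));
}

my $unicode_string = "日本語";
my $encoded = qp_from_unicode($unicode_string);
print "qp: $encoded\n";    # plain ASCII, so no "Wide character" warning
```

The pack("C*", unpack("U*", ...)) approach fails for the same underlying reason as the first attempt: code points above 255 do not fit in a single byte, so an explicit byte encoding has to be chosen first.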



How to know if a module is installed

2006-09-27 Thread Nobumi Iyanaga

Hello,

This is a newbie question: how can I determine if a specific module  
is installed on a client machine?


I would like to do something like this:

if (MacPerl installed is true) {
do this...;
}
else {
do nothing...;
}

Thank you in advance for any help.

Best regards,

Nobumi Iyanaga
Tokyo,
Japan
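A common way to do this kind of check is to try loading the module inside eval and branch on whether that succeeded. The following is a minimal sketch; the helper name module_installed is invented for illustration, and "installed" here means "can be loaded by this perl", which is usually what a script needs to know.

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if the named module can be loaded on
# this machine, 0 otherwise.
sub module_installed {
    my ($module) = @_;
    # Turn "Foo::Bar" into "Foo/Bar.pm", the form require() expects
    # when the name is held in a variable.
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

if (module_installed("MacPerl")) {
    print "MacPerl is available\n";
    # do this...
}
else {
    # do nothing...
}
```

Because the require happens at run time inside eval, a missing module does not abort the script.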



Re: Enconding, locate, etc.

2006-04-19 Thread Nobumi Iyanaga

Hello Ende,

On Apr 19, 2006, at 8:58 PM, ende wrote:



Wow!  It is near the full solution!  It is a pity it fails when you  
do not use accented chars!!


I tried again, and with the following script, it *seems* that you can  
use either non-accented or accented characters:


#!/usr/bin/perl

use utf8;
use Encode;
use Unicode::Normalize;

binmode (STDOUT, ":utf8");

my $re = join("|", @ARGV);
$re = decode ("utf8", $re);
my $listin = "/Users/me/Documents/documentos/Familia/Casa/Telistin.txt";

open my $f, "<:encoding(MacRoman)", "$listin" or die "$listin no abre: $!";

while (<$f>) {
    chomp;
    if (/$re/i) {
        print $_, "\n";
    }
    else {
        my $temp = NFD($_);
        $temp =~ s/[\x{0300}-\x{036F}\x{0081}]+//g;
        print $_, "\n" if $temp =~ /$re/i;
    }
    if ($re !~ /^[\x{0000}-\x{007F}]+$/) {
        my $temp = NFD($re);
        $temp =~ s/[\x{0300}-\x{036F}\x{0081}]+//g;
        print $_, "\n" if /$temp/i;
    }
}
close $f;

You would call this script either:

perl Ende_test.pl angeles

or

perl Ende_test.pl ángeles

to get:

Ángeles
Angeles
ángeles
angeles

Is this what you would want...?

Note that I am not sure at all if this will work for all cases.
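The same idea can be written more symmetrically by folding both the search word and each line the same way (decompose with NFD, drop combining marks, lowercase) and comparing the folded forms. This is only a sketch and not a drop-in replacement: the helper names fold_accents and accent_insensitive_grep are invented for illustration, and the word is matched literally (via quotemeta) rather than as a regular expression as in the script above.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper: decompose, drop combining marks in the range
# U+0300 to U+036F, then lowercase.
sub fold_accents {
    my ($text) = @_;
    my $folded = NFD($text);
    $folded =~ s/[\x{0300}-\x{036F}]+//g;
    return lc $folded;
}

# Hypothetical helper: return the lines that contain $word, ignoring
# accents and case.  The word is treated as literal text.
sub accent_insensitive_grep {
    my ($word, @lines) = @_;
    my $needle = quotemeta fold_accents($word);
    return grep { fold_accents($_) =~ /$needle/ } @lines;
}

binmode STDOUT, ":encoding(UTF-8)";
my @names = ("\x{C1}ngeles", "Angeles", "\x{E1}ngeles", "angeles", "Madrid");
print "$_\n" for accent_insensitive_grep("angeles", @names);
```

Since each line is folded once and matched once, a matching line is returned only once, whichever form of the word was typed.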

Best regards,

Nobumi Iyanaga
Tokyo,
Japan



Re: Enconding, locate, etc.

2006-04-19 Thread Nobumi Iyanaga

Dear Ende,

On Apr 19, 2006, at 5:22 PM, ende wrote:



Thanks  Nobumi,

Your solution is not only shorter but also more precise and correct  
than my first attempt.  But, anyway, although it works better it  
doesn't find words with different accented capitalization.  That  
is, if you look for "Ángeles" it finds neither "Angeles" nor  
"angeles" nor "ángeles"...




Well, on my machine, if I call that script with:

perl Ende_test.pl Ángeles

it does find "Ángeles" AND "ángeles" (because it has the "i" option  
in the regex).


But you seem to want to do a kind of "accent insensitive search"...?   
That would not be simple.


One possible -- and rather simple -- solution would be to use  
"Unicode::Normalize".  I just tried this script:


#!/usr/bin/perl

use utf8;
use Encode;
use Unicode::Normalize;

binmode (STDOUT, ":utf8");

my $re = join("|", @ARGV);
$re = decode ("utf8", $re);
my $listin = "/Users/me/Documents/documentos/Familia/Casa/Telistin.txt";

open my $f, "<:encoding(MacRoman)", "$listin" or die "$listin no abre: $!";

while (<$f>) {
    chomp;
    if (/$re/i) {
        print $_, "\n";
    }
    else {
        my $temp = NFD($re);
        $temp =~ s/[\x{0300}-\x{036F}\x{0081}]+//g;
        print $_, "\n" if /$temp/i;
    }
}
close $f;

I can call this script from Terminal like this:

perl Ende_test.pl Ángeles

or

perl Ende_test.pl ángeles

and get the reply:

Ángeles
Angeles
ángeles
angeles

-- But you have to use the accented character to match non-accented  
characters -- that is, you will find only 


Angeles
angeles

if you invoke the script with:

perl Ende_test.pl Angeles

or

perl Ende_test.pl angeles

Best regards,

Nobumi Iyanaga
Tokyo,
Japan