Hi Perl Gurus,

I am using decode_entities() (from HTML::Entities) and decode_utf8() (from
Encode) to decode HTML entities and UTF-8 encoded Latin characters,
respectively. Both functions work as expected for character codes up to
decimal 255, but behave differently for codes above that.
I used this URL as the reference list of HTML codes and Latin characters:
[http://www.ascii.cl/htmlcodes.htm].

I have attached a sample script. Its input values are taken from an XML SOAP
response that needs decoding. (The SOAP response does not give the HTML
numbers or HTML names shown in the list at the above URL.)

The script gives the results I expect for array values arr_val[0] to
arr_val[4] (i.e. character codes in the decimal range 0-255), but for
arr_val[5] (which has character codes greater than 255) the decoded values
are different.

Below is the list of array values and their expected decoded values. The
decoding fails for arr_val[5].
I also need to do the reverse, i.e. encode the values again.
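
The round trip described above (bytes in, characters inside the program,
bytes out) can be sketched as follows. This is a minimal illustration using
only the core Encode module. It assumes the SOAP payload arrives as UTF-8
bytes, and the helper names bytes_to_chars and chars_to_bytes are
hypothetical, not part of the attached script. HTML entity handling is left
out, and this is not a confirmed diagnosis of the problem.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn UTF-8 bytes (assumed to be what the SOAP
# response delivers) into a Perl character string.
sub bytes_to_chars {
    my ($bytes) = @_;
    return decode('UTF-8', $bytes);
}

# Hypothetical helper: turn a character string back into UTF-8 bytes.
sub chars_to_bytes {
    my ($chars) = @_;
    return encode('UTF-8', $chars);
}

# Sample input written as explicit bytes so the encoding of this file does
# not matter: C3 A9 is code point 233, C5 92 is code point 338.
my $input = "\xC3\xA9\xC5\x92";
my $chars = bytes_to_chars($input);

# Put an explicit encoding layer on STDOUT so that characters below and
# above 255 are written out the same way.
binmode(STDOUT, ':encoding(UTF-8)');
printf "%d characters, code points: %s\n",
    length($chars),
    join(' ', map { ord } split //, $chars);
print "text: $chars\n";
```

The sample deliberately mixes one code point below 256 with one above it,
since that is the boundary where the results start to differ.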

$arr_val[0] = '!&quot;#$%&amp;&apos;()*+,-./   0123456789:;&lt;=&gt;?' ;
             expected decoded values -- !"#$%&'()*+,-./ 0123456789:;<=>?

$arr_val[1] =
'@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~' ;
             expected decoded values --
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~

$arr_val[2] =
'�...@Ã~aÃ~bÃ~cÃ~dÃ~eÃ~fÃ~gÃ~hÃ~iÃ~jÃ~kÃ~lÃ~mÃ~nÃ~oÃ~pÃ~qÃ~rÃ~sÃ~tÃ~uÃ~vÃ~wÃ~xÃ~yÃ~zÃ~[Ã~\Ã~]Ã~^Ã~_'
;
             expected decoded values -- ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß

$arr_val[3] = 'Ã|
áâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ' ;
             expected decoded values -- àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ

$arr_val[4] =
'¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿' ;
             expected decoded values -- ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿

$arr_val[5] = 'others Å~RÅ~SÅ|
šŸÆ~r...@~sâ~@~t...@~xâ~@~y...@~zâ~@~\...@~]â~@~^â~@| �...@¡â~@¢...@¦' ;
             expected decoded values -- others ŒœŠšŸƒ–—‘’‚“”„†‡•…‰€™

Could you please help me work out what I am missing or doing wrong?
I would greatly appreciate the help.

Thanks
Saravanan Balaji.
#!/ms/dist/perl5/bin/perl5.8 -I ../

use strict;
use warnings;

# Site-specific version pinning; HTML::Entities may be used by HTTP::Response.
use MSDW::Version
    'HTML-Parser' => '3.56';

use Encode;
use Data::Dumper;
use HTML::Entities qw(decode_entities encode_entities encode_entities_numeric);

 
my @arr_val = ();
$arr_val[0] = '!&quot;#$%&amp;&apos;()*+,-./   0123456789:;&lt;=&gt;?' ;
$arr_val[1] = 
'@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~' ;
$arr_val[2] = '懒旅呐魄壬仕掏蜗醒矣哉肿刭谯茌捱' ;
$arr_val[3] = '噌忏溴骁栝觌祉铒瘃蝮趱鲼��������' ; 
$arr_val[4] = '、¥ウЖ┆�����氨渤吹斗腹夯冀究' ; 
$arr_val[5] = 'others ������������������' ;

my $bcp_in_file = "/tmp/testbcp.in";

# Stop if the output file cannot be opened, rather than printing to a
# closed handle later on.
open(my $temp_out, '>', $bcp_in_file)
    or die "Error: cannot open the file $bcp_in_file: $!\n";

foreach my $temp_var (@arr_val)
{
    print "\nProcessing value           [$temp_var] \n";

    # In void context decode_entities() decodes its argument in place.
    decode_entities($temp_var);
    print "After HTML decode [$temp_var] \n";

    # decode_utf8() returns the character string decoded from UTF-8 bytes.
    my $temp_var2 = decode_utf8($temp_var);
    print "After UTF8 decode [$temp_var2] \n\n";
    print {$temp_out} $temp_var2;

    #my $temp_var3 = encode_utf8($temp_var2);
    #print "After UTF8 encode [$temp_var3] \n";
    #my $temp_var4 = encode_entities($temp_var3, '"&<>' );
    #print "After HTML encode [$temp_var4] \n";
}

close($temp_out)
    or die "Error: cannot close the file $bcp_in_file: $!\n";
1;
############ End of Script #################   
