Hi,

I am trying to read a Word document using Win32::OLE. I can open the document and read its paragraphs successfully when the content is in English. However, I have a document with mixed English and Japanese content, and I get '?' in place of the Japanese characters. Can anybody suggest how to get the text without the '?' symbols?

 

use strict;
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Word';
use Win32::Clipboard;

my $word_file;
my $Word;
my $document;
my $paragraphs;
my $paragraph;
my $enumerate;
my $text;

$word_file = 'test.doc';

$Word = Win32::OLE->new('Word.Application', 'Quit');
$Word->{'Visible'} = 1;
$document = $Word->Documents->Open($word_file) || die("Unable to open document");
$Word->{Language} = 1041;
$Word->{WdOpenFormat} = 5;
$Word->{WdSaveFormat} = 7;
$paragraphs = $document->Paragraphs();
$enumerate = new Win32::OLE::Enum($paragraphs);

while (defined($paragraph = $enumerate->Next()))
{
    $paragraph->{Range}->{LanguageID} = 1041;
    $paragraph->{Range}->{LanguageIDFarEast} = 1041;
    $text = $paragraph->{Range}->{Text};
    print "$text\n";
}

$Word->ActiveDocument->Close;
$Word->Quit;

 

 

The test.doc file contains the following line:

 

こんにちは、皆 means Hello Everybody
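
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the text is being converted from Unicode to a single-byte ANSI code page somewhere between Word and the print statement, and every character with no mapping in that code page comes out as '?'. The sketch below reproduces that substitution in plain Perl with the core Encode module, without Word or Win32::OLE. The choice of cp1252 is only a stand-in for whatever ANSI code page the machine actually uses.

```perl
use strict;
use warnings;
use utf8;                      # this source file contains UTF-8 literals
use Encode qw(encode);

# The sample line from test.doc.
my $text = "こんにちは、皆 means Hello Everybody";

# Lossy conversion: cp1252 has no Japanese characters, so Encode
# substitutes '?' for each character it cannot map (default behaviour).
my $ansi = encode('cp1252', $text);

# Lossless conversion: UTF-8 can represent every character in the line.
my $utf8 = encode('UTF-8', $text);

binmode(STDOUT);               # write the encoded bytes unchanged
print "cp1252: $ansi\n";
print "UTF-8:  $utf8\n";
```

If this is indeed the cause, the CP option documented for Win32::OLE, for example Win32::OLE->Option(CP => Win32::OLE::CP_UTF8()), may be worth trying before reading the paragraphs. Whether the console then displays the Japanese text also depends on the console's own code page and font.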

 

Thanks in advance

Lalith

 

 

 


_______________________________________________
Perl-Win32-Admin mailing list