Hello all,
This is sample code for parsing MS Word 2.x documents.
Please let me know if I should make any changes to it. Your
help is always welcome and appreciated.
Thanks & regards,
Sudhakar
//Beginning of Source Code
/**
* <p>Title: Word Document Parser</p>
* <p>Description: This parser extracts the text of Microsoft Word
version 2.0 documents</p>
* <p>Copyright: Open Source Code</p>
* @author Sudhakar Chavali Sharma
* @version 1.0
*/
public class Word2 {
public Word2() {
}
public static void main(String[] args) throws Exception{
Word2 word21 = new Word2();
System.out.println(word21.getText(args[0])) ;
}
/**
* Takes the document name as an argument and reads the
document to extract its text
* @param file
* @return String
* @throws java.lang.Exception
*/
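public String getText(String file) throws Exception
{
java.io.FileInputStream stream=new java.io.FileInputStream
(file);
try {
byte bytes[]=new byte[stream.available()];
int length=stream.read(bytes);
if (length < 0) {
length = 0; // read() returns -1 for an empty file
}
// Decode one byte per char. The old String(byte[], int) constructor
// would have used length as the high byte of every char.
String buffer=new String(bytes,0,length,"ISO-8859-1");
return ParseWord2(buffer,buffer.length());
}
finally {
stream.close();
}
}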
/**
* Parses the Word Document (Version 2.0) Buffer to normal
Text Buffer
* @param sourceBuffer
* @param sourceLength
* @return String
*/
String ParseWord2(String sourceBuffer, long sourceLength) {
int counter; //source buffer pointer
long quitcounter; //pointer to quit the parsing
int incrementer; // general incrementer, used in loops
String destinationString; //destination string;
counter = 384; //starting position of text
/*
Traverse the buffer until the pointer reaches the document length
*/
destinationString = "";
while (counter < sourceLength) {
quitcounter = 0;
if (sourceBuffer.charAt(counter) == 0) {
for (incrementer = 1; incrementer <= 10; incrementer++)
{
if ( (counter + incrementer < sourceLength) &&
(sourceBuffer.charAt(counter + incrementer) == 0)) {
quitcounter = quitcounter + 1;
}
else {
break;
}
}
}
if (quitcounter >= 10) {
break;
}
if (sourceBuffer.charAt(counter) == 19) {
// disabled extra check:
// && (sourceBuffer[counter+1]='t') && (sourceBuffer[counter+2]='o')
// && (sourceBuffer[counter+3]='c'))
counter = counter + 1;
while (counter < sourceLength) {
if (sourceBuffer.charAt(counter) == 20) {
counter = counter + 1;
break;
}
counter = counter + 1;
}
while (counter < sourceLength) {
if (sourceBuffer.charAt(counter) == 21) {
counter = counter + 1;
break;
}
destinationString = destinationString +
(char) sourceBuffer.charAt(counter);
counter = counter + 1;
}
}
else {
if ( (counter + 1 < sourceLength) &&
(sourceBuffer.charAt(counter) == 13) &&
(sourceBuffer.charAt(counter + 1) == 7)) {
if ( (counter + 3 < sourceLength) &&
(sourceBuffer.charAt(counter + 2) == 13) &&
(sourceBuffer.charAt(counter + 3) == 7)) {
/*
This is row break in a table
*/
destinationString = destinationString + (char) 13;
destinationString = destinationString + (char) 10;
counter = counter + 4;
}
else {
/* This is column Break in Table
*/
destinationString = destinationString + (char) 9;
counter = counter + 2;
}
}
else {
//this is for column breaks
if ( (counter + 2 < sourceLength) &&
(sourceBuffer.charAt(counter) == 13) &&
(sourceBuffer.charAt(counter + 1) == 10) &&
(sourceBuffer.charAt(counter + 2) == 14)) {
destinationString = destinationString + (char) 13;
destinationString = destinationString + (char) 10;
counter = counter + 3;
}
else if ( (counter + 2 < sourceLength) &&
(sourceBuffer.charAt(counter) == 13) &&
(sourceBuffer.charAt(counter + 1) == 10) &&
(sourceBuffer.charAt(counter + 2) == 12)) {
/*This is Page Break*/
destinationString = destinationString + (char) 13;
destinationString = destinationString + (char) 10;
counter = counter + 3;
}
else {
/* Normal flow of characters
*/
if (sourceBuffer.charAt(counter) != 0) {
destinationString = destinationString +
(char) sourceBuffer.charAt(counter);
}
counter = counter + 1;
}
}
}
}
return destinationString;
}
}
// End of Source Code
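
One possible variation on the field handling above, offered only as a minimal sketch: the same 19 / 20 / 21 marker logic pulled into its own method, using a StringBuilder and bounds checks so that a truncated field ends at the end of the buffer. The class name FieldSketch, the method name stripFieldCodes and the constant names are hypothetical and are not part of the posted code. The marker meanings are taken from how the parser above treats those bytes, not from a format specification.

```java
public class FieldSketch {
    // Control characters as the parser above treats them (an assumption
    // drawn from that code): 19 starts a field, 20 separates the field
    // code from its result text, 21 ends the field.
    static final char FIELD_BEGIN = 19;
    static final char FIELD_SEPARATOR = 20;
    static final char FIELD_END = 21;

    /**
     * Copies the buffer, replacing each field with its result text.
     * An unterminated field simply stops at the end of the buffer.
     */
    public static String stripFieldCodes(String buffer) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < buffer.length()) {
            char c = buffer.charAt(i);
            if (c != FIELD_BEGIN) {
                out.append(c); // ordinary character, copy as-is
                i++;
                continue;
            }
            i++; // skip the begin mark
            // skip the field code up to the separator
            while (i < buffer.length() && buffer.charAt(i) != FIELD_SEPARATOR) {
                i++;
            }
            i++; // skip the separator
            // keep the result text up to the end mark
            while (i < buffer.length() && buffer.charAt(i) != FIELD_END) {
                out.append(buffer.charAt(i));
                i++;
            }
            i++; // skip the end mark
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sample = "A" + FIELD_BEGIN + "toc" + FIELD_SEPARATOR
                + "Contents" + FIELD_END + "B";
        System.out.println(stripFieldCodes(sample));
    }
}
```

Like the original loop, this assumes every field has a separator. A field with no 20 byte would be skipped along with the text that follows it, so that case may need separate handling.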