Hello All,

This is sample code for parsing MS Word 2.x documents.
Please let me know if I should make any changes to it. Your
help is always welcome and appreciated.


Thanks & Regards,
Sudhakar


//Beginning of Source  Code





/**
 * <p>Title: Word Document Parser</p>
 * <p>Description: This parser extracts the text of Microsoft Word
 * documents of version 2.0</p>
 * <p>Copyright: Open Source Code</p>
 * @author Sudhakar Chavali Sharma
 * @version 1.0
 */

public class Word2 {
  public Word2() {
  }

  public static void main(String[] args) throws Exception{
    Word2 word21 = new Word2();
    System.out.println(word21.getText(args[0])) ;
  }
  /**
   * Takes the document name as an argument and reads the
   * document to get the parsed text
   * @param file
   * @return String
   * @throws java.lang.Exception
   */
  public String getText(String file) throws Exception
  {
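    // Read the whole file; available() is not guaranteed to return the file size
    byte bytes[]=java.nio.file.Files.readAllBytes(
        java.nio.file.Paths.get(file));
    // ISO-8859-1 maps every byte to one char, so control bytes keep their
    // values; new String(bytes,length) would treat length as a high byte
    String buffer=new String(bytes,
        java.nio.charset.StandardCharsets.ISO_8859_1);
    return ParseWord2(buffer,buffer.length());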
  }

  /**
   * Parses the Word document (version 2.0) buffer into a plain
   * text buffer
   * @param sourceBuffer
   * @param sourceLength
   * @return String
   */
  String ParseWord2(String sourceBuffer, long sourceLength) {

    int counter; //source buffer pointer
    long quitcounter; //pointer to quit the parsing
    int incrementer; // general incrementer, used in loops
    String destinationString; //destination string;
    counter = 384; //starting position of text
    /*
     Traverse the buffer until the pointer reaches the document length
     */
    destinationString = "";
    while (counter < sourceLength) {
      quitcounter = 0;
      if (sourceBuffer.charAt(counter) == 0) {
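        // stop at the end of the buffer so charAt() cannot overrun it
        for (incrementer = 1; incrementer <= 10 &&
             counter + incrementer < sourceLength; incrementer++) {
          if (sourceBuffer.charAt(counter + incrementer) == 0) {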
            quitcounter = quitcounter + 1;
          }
          else {
            break;
          }
        }
      }
      if (quitcounter >= 10) {

        break;
      }

      // Byte 19 begins a field; the disabled check for a following "toc" was:
      // && (sourceBuffer[counter+1]='t') && (sourceBuffer[counter+2]='o')
      // && (sourceBuffer[counter+3]='c')
      if (sourceBuffer.charAt(counter) == 19) {
        counter = counter + 1;
        // skip the field instructions up to the separator (byte 20)
        while (counter < sourceLength) {
          if (sourceBuffer.charAt(counter) == 20) {
            counter = counter + 1;
            break;
          }
          counter = counter + 1;
        }
        // copy the field result up to the field end (byte 21)
        while (counter < sourceLength) {
          if (sourceBuffer.charAt(counter) == 21) {
            counter = counter + 1;
            break;
          }
          destinationString = destinationString +
              sourceBuffer.charAt(counter);
          counter = counter + 1;
        }
      }
      else {
        if ( (counter + 1 < sourceLength) &&
            (sourceBuffer.charAt(counter) == 13) &&
            (sourceBuffer.charAt(counter + 1) == 7)) {
          if ( (counter + 3 < sourceLength) &&
              (sourceBuffer.charAt(counter + 2) == 13) &&
              (sourceBuffer.charAt(counter + 3) == 7)) {
            /*
                This is a row break in a table
             */

            destinationString = destinationString + (char) 13;
            destinationString = destinationString + (char) 10;
            counter = counter + 4;
          }
          else {
            /* This is a column break in a table */

            destinationString = destinationString + (char) 9;
            counter = counter + 2;
          }
        }
        else {
          // this is for column breaks
          if ( (counter + 2 < sourceLength) &&
              (sourceBuffer.charAt(counter) == 13) &&
              (sourceBuffer.charAt(counter + 1) == 10) &&
              (sourceBuffer.charAt(counter + 2) == 14)) {
            destinationString = destinationString + (char) 13;
            destinationString = destinationString + (char) 10;
            counter = counter + 3;
          }
          else if ( (counter + 2 < sourceLength) &&
                   (sourceBuffer.charAt(counter) == 13) &&
                   (sourceBuffer.charAt(counter + 1) == 10) &&
                   (sourceBuffer.charAt(counter + 2) == 12)) {
            /* This is a page break */
            destinationString = destinationString + (char) 13;
            destinationString = destinationString + (char) 10;
            counter = counter + 3;
          }
          else {
            /* Normal flow of characters */
            if (sourceBuffer.charAt(counter) != 0) {
              destinationString = destinationString +
                  (char) sourceBuffer.charAt(counter);
            }
            counter = counter + 1;
          }
        }
      }
    }
    return destinationString;
  }
}





// End of Source Code
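
For anyone who wants to try the field handling on its own, here is a minimal sketch. It assumes, as the parser above does, that bytes 19, 20 and 21 mark the start of a field, the separator between the field instructions and the field result, and the end of the field. The class name FieldTextDemo and the method visibleText are made up for illustration, and the input is a synthetic string, not a real Word 2.x file. Unlike the parser above, it uses a StringBuilder and a flag instead of nested loops, so a field with no separator simply contributes no text.

```java
// Hypothetical demo class; it is not part of the Word2 parser above.
class FieldTextDemo {
  static final char FIELD_BEGIN = 19;     // start of the field instructions
  static final char FIELD_SEPARATOR = 20; // instructions end, result starts
  static final char FIELD_END = 21;       // end of the field result

  /** Drops field instructions, keeps field results and ordinary text. */
  static String visibleText(String source) {
    StringBuilder out = new StringBuilder();
    boolean inInstruction = false;
    for (int i = 0; i < source.length(); i++) {
      char c = source.charAt(i);
      if (c == FIELD_BEGIN) {
        inInstruction = true;   // hide everything up to the separator
      } else if (c == FIELD_SEPARATOR || c == FIELD_END) {
        inInstruction = false;  // marker itself is never copied
      } else if (!inInstruction) {
        out.append(c);
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // Synthetic sample: a PAGE field whose result is "3"
    String sample = "Page " + FIELD_BEGIN + "PAGE" + FIELD_SEPARATOR
        + "3" + FIELD_END + " of 9";
    System.out.println(visibleText(sample));
  }
}
```

Running main on the synthetic sample should print the text with the field instructions removed and only the field result kept.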

