I want to pull the text from each page of a .docx file with POI and keep the 
text for each page separate.  I could not find a good example, so I assumed 
that a paragraph was the same as a page in POI.  That is evidently wrong, 
because the number of paragraphs is greater than the number of pages.

How can I cycle through the pages and get the text per page?

      // try-with-resources closes both the stream and the document.
      try (FileInputStream in = new FileInputStream(aFile);
           XWPFDocument docx = new XWPFDocument(in))
      {
         // Page count as last saved in the document's extended properties;
         // POI does not compute it, so it may be stale or missing.
         int numOfPages = docx.getProperties().getExtendedProperties()
               .getUnderlyingProperties().getPages();

         String pageText;
         String md5Hash;
         int searchablePages = 0;

         // Note: this iterates over paragraphs, not pages.
         for (XWPFParagraph paragraph : docx.getParagraphs())
         {
            pageText = paragraph.getText();
            if (pageText == null || pageText.trim().isEmpty())
            {
               continue;
            }
            // Only multi-line text can contain duplicate lines.
            if (pageText.indexOf('\n') > -1)
            {
               pageText = this.removeDuplicateLines(pageText);
            }
            if (pageText != null && pageText.length() > 0)
            {
               md5Hash = this.calcHashCode(pageText);
               if (md5Hash != null)
               {
                  searchablePages++;
               }
            }
         }
      }
      catch (Exception e)
      {
         e.printStackTrace();
      }
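
The two helpers called above, removeDuplicateLines and calcHashCode, are not 
included in the snippet.  For anyone trying to reproduce this, here is a 
minimal sketch of what they might look like, using only the JDK.  The class 
name PageTextHelpers is hypothetical, and the behaviour (trimming lines, 
keeping the first occurrence of each, hex-encoded MD5 of the UTF-8 bytes) is 
an assumption rather than the actual implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-ins for the helpers referenced in the snippet above.
final class PageTextHelpers {
    private PageTextHelpers() {}

    /**
     * Drops blank and repeated lines, keeping the first occurrence of each
     * (after trimming) in its original order. Assumed behaviour.
     */
    static String removeDuplicateLines(String text) {
        if (text == null) {
            return null;
        }
        Set<String> seen = new LinkedHashSet<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                seen.add(trimmed);
            }
        }
        return String.join("\n", seen);
    }

    /**
     * Returns the lowercase hex MD5 digest of the text's UTF-8 bytes,
     * or null if the input is null or MD5 is unavailable.
     */
    static String calcHashCode(String text) {
        if (text == null) {
            return null;
        }
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            return null;
        }
    }
}
```

Returning null instead of throwing matches the null checks in the loop above, 
so a failed hash simply does not count towards searchablePages.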

