Thanks for that, I have seen people asking for just this sort of information
before on the list. Can I assume you have been able to get something to
work?

Yours

Mark B


Fabián Avilés Martínez wrote:
> 
> Hi Mark, version 3.2-FINAL is accessible in public Maven repositories;
> these are the dependencies:
> 
> <dependency>
>     <groupId>org.apache.poi</groupId>
>     <artifactId>poi</artifactId>
>     <version>3.2-FINAL</version>
> </dependency>
> <dependency>
>     <groupId>org.apache.poi</groupId>
>     <artifactId>poi-scratchpad</artifactId>
>     <version>3.2-FINAL</version>
> </dependency>
> 
> 
> Thanks, Fabi.
> 
> -----Original Message-----
> From: MSB [mailto:[email protected]]
> Sent: Tuesday, 24 November 2009 17:27
> To: [email protected]
> Subject: RE: Modify word document
> 
> 
> You are welcome.
> 
> If you do not have access to 3.2 FINAL of the API, it is possible to
> download older releases from here -
> http://archive.apache.org/dist/poi/release/bin/. I must admit that I do not
> know what changes were made to HWPF between 3.2 and 3.5, so I cannot say why
> the formatting information is being lost and can only hope that you will be
> able to revert to using 3.2 FINAL for this project.
> 
> All that you will need to do is to ensure that both the scratchpad and POI
> archives are in your classpath and you should be able to successfully
> compile and run the code. Any problems, just let me know.
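
A quick way to confirm that both archives are visible at run time is to ask the class loader for one class from each of them. This is only a minimal sketch: the class name ClasspathCheck and the helper isOnClasspath are made up for illustration, and it assumes that POIFSFileSystem is packaged in the main POI archive and HWPFDocument in the scratchpad archive.

```java
public class ClasspathCheck {

    // Hypothetical helper: returns true if the named class can be located
    // on the classpath. The class is not initialised, only looked up.
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: this class is packaged in the main POI archive.
        System.out.println("POI core archive found:       "
                + isOnClasspath("org.apache.poi.poifs.filesystem.POIFSFileSystem"));
        // Assumption: this class is packaged in the scratchpad archive.
        System.out.println("POI scratchpad archive found: "
                + isOnClasspath("org.apache.poi.hwpf.HWPFDocument"));
    }
}
```

If either line reports false, the corresponding archive is probably missing from the classpath used to run the program.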
> 
> Yours
> 
> Mark B
> 
> 
> 
> Fabián Avilés Martínez wrote:
>>
>> Wow, that's great. At least I have a new direction to work with. I have been
>> struggling myself for at least three days. I cannot try it today, but
>> tomorrow it will be the first thing I do. I will tell you the
>> results.
>>
>> Thank you so much.
>>
>> -----Original Message-----
>> From: MSB [mailto:[email protected]]
>> Sent: Tuesday, 24 November 2009 16:51
>> To: [email protected]
>> Subject: RE: Modify word document
>>
>>
>> I have had the chance to play around with some code and I have to admit
>> that I was wrong, on two counts.
>>
>> Firstly, if you do drill down to the level of the CharacterRun and perform
>> a replacement operation there, you will not retain the formatting applied
>> to the text; furthermore, it seems to fail completely: no replacements will
>> be made in the document at all. To have the search term be successfully
>> replaced, you DO need to operate at the Paragraph level.
>>
>> Secondly, if the search term is shorter than the replacement term, then
>> HWPF will throw an exception. It seems quite happy to work if the
>> replacement term is equal to or longer - in terms of the number of
>> characters - than the search term.
>>
>> Please see the code I have attached below:
>>
>> /* ====================================================================
>>    Licensed to the Apache Software Foundation (ASF) under one or more
>>    contributor license agreements.  See the NOTICE file distributed with
>>    this work for additional information regarding copyright ownership.
>>    The ASF licenses this file to You under the Apache License, Version 2.0
>>    (the "License"); you may not use this file except in compliance with
>>    the License.  You may obtain a copy of the License at
>>
>>        http://www.apache.org/licenses/LICENSE-2.0
>>
>>    Unless required by applicable law or agreed to in writing, software
>>    distributed under the License is distributed on an "AS IS" BASIS,
>>    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>    See the License for the specific language governing permissions and
>>    limitations under the License.
>> ==================================================================== */
>>
>> package newsearchreplace;
>>
>> import java.io.File;
>> import java.io.FileInputStream;
>> import java.io.FileOutputStream;
>> import java.io.FileNotFoundException;
>> import java.io.IOException;
>> import java.util.HashMap;
>> import java.util.Set;
>>
>> import org.apache.poi.hwpf.HWPFDocument;
>> import org.apache.poi.hwpf.usermodel.Range;
>> import org.apache.poi.hwpf.usermodel.Paragraph;
>> import org.apache.poi.hwpf.usermodel.CharacterRun;
>>
>>
>> /**
>>  *
>>  * @author win Mark Beardsley [msb at apache.org]
>>  * @version 1.00
>>  */
>> public class SearchReplace {
>>
>>     private HashMap<String, String> searchTerms = null;
>>     private Set<String> searchKeys = null;
>>     private HWPFDocument wordDocument = null;
>>
>>     public SearchReplace() {
>>         searchTerms = new HashMap<String, String>();
>>         // The first String is the text that will be searched for, the
>>         // second is what will be used to replace it. Of course, it is
>>         // possible to create more than one search term, replacement text
>>         // pairing.
>>         searchTerms.put("replace", "tester");
>>         searchKeys = searchTerms.keySet();
>>     }
>>
>>     public void openTemplate(String filename)
>>             throws FileNotFoundException, IOException {
>>         File file = null;
>>         FileInputStream fis = null;
>>         try {
>>             file = new File(filename);
>>             fis = new FileInputStream(file);
>>             this.wordDocument = new HWPFDocument(fis);
>>         }
>>         finally {
>>             if(fis != null) {
>>                 try {
>>                     fis.close();
>>                     fis = null;
>>                 }
>>                 catch(Exception ex) {
>>                     // I G N O R E
>>                 }
>>             }
>>         }
>>     }
>>
>>     public void searchAndReplace() {
>>         Range docRange = this.wordDocument.getRange();
>>         int numParas = docRange.numParagraphs();
>>         for(int i = 0; i < numParas; i++) {
>>             Paragraph para = docRange.getParagraph(i);
>>             int numCharRuns = para.numCharacterRuns();
>>             for(int j = 0; j < numCharRuns; j++) {
>>                 CharacterRun charRun = para.getCharacterRun(j);
>>                 String text = charRun.text();
>>                 for(String key : this.searchKeys) {
>>                     if(text.contains(key)) {
>>                         String replacementTerm = this.searchTerms.get(key);
>>                         // replaceText() expects the search term first,
>>                         // then the replacement text.
>>                         charRun.replaceText(key, replacementTerm);
>>                         System.out.println("Found: " + key + " in " + text
>>                                 + ". Will replace with: " + replacementTerm);
>>                     }
>>                 }
>>             }
>>         }
>>
>>     }
>>
>>     public void searchReplace() {
>>         Range docRange = this.wordDocument.getRange();
>>         int numParas = docRange.numParagraphs();
>>         for(int i = 0; i < numParas; i++) {
>>             Paragraph para = docRange.getParagraph(i);
>>             String text = para.text();
>>             for(String key : this.searchKeys) {
>>                 if(text.contains(key)) {
>>                     String replacementTerm = this.searchTerms.get(key);
>>                     para.replaceText(key, replacementTerm);
>>                 }
>>             }
>>         }
>>     }
>>
>>     public void saveResults(String filename)
>>             throws FileNotFoundException, IOException {
>>         File file = null;
>>         FileOutputStream fos = null;
>>         try {
>>             file = new File(filename);
>>             fos = new FileOutputStream(file);
>>             this.wordDocument.write(fos);
>>         }
>>         finally {
>>             if(fos != null) {
>>                 try {
>>                     fos.close();
>>                     fos = null;
>>                 }
>>                 catch(Exception ex) {
>>                     // I G N O R E
>>                 }
>>             }
>>         }
>>     }
>>
>>     /**
>>      * @param args the command line arguments
>>      */
>>     public static void main(String[] args) {
>>         try {
>>             SearchReplace sr = new SearchReplace();
>>             sr.openTemplate("C:/temp/Test Document.doc");
>>             sr.searchAndReplace();
>>             //sr.searchReplace();
>>             sr.saveResults("C:/temp/New Updated Document.doc");
>>         }
>>         catch(Exception ex) {
>>             System.out.println("Caught an: " + ex.getClass().getName());
>>             System.out.println("Message: " + ex.getMessage());
>>             System.out.println("Stacktrace follows............");
>>             ex.printStackTrace(System.out);
>>         }
>>     }
>> }
>>
>> More particularly, look at the main method. If you comment out the
>> sr.searchAndReplace() line and un-comment the sr.searchReplace() line, then
>> the code will work successfully. But, and this is a BIG but, it will only
>> work if you compile and run it against 3.2 FINAL of the API. I have found
>> that later versions seem to 'drop' or lose the formatting information
>> completely; to convince yourself of this, just modify the main method so
>> that it contains only these lines of code:
>>
>> SearchReplace sr = new SearchReplace();
>> sr.openTemplate("C:/temp/Test Document.doc");
>> sr.saveResults("C:/temp/New Updated Document.doc");
>>
>> If you run that against versions later than 3.2 FINAL, you should see
>> that the copy of the original document that this produces loses all of its
>> formatting.
>>
>> Yours
>>
>> Mark B
>>
>> PS. I guess that it should go without saying, you will need to replace
>> the paths and document names passed to the openTemplate() and
>> saveResults() methods so that they point to locations and files that exist
>> on your machine.
>>
>> PPS. Forgive the lack of comments please. I hope that it is apparent
>> just what the methods do.
>>
>>
>> Fabián Avilés Martínez wrote:
>>>
>>> Hi, as I told you, I have tried it, but with the same result: the
>>> resulting file is corrupted, or at least that is what MS Word says. My next
>>> approach is to create a copy of the file and make the modifications within
>>> that copy. My problem is that I do not know how to save the modifications
>>> made to the charRuns of the paragraphs; what I mean is to persist the
>>> modifications made in the resulting file, without having to copy it,
>>> calling document.write(outputStream).
>>>
>>> My code is:
>>>
>>> public File processFile(final InputStream is,
>>>         final Map<String, String> replacementText) throws IOException {
>>>         Set<String> keys = replacementText.keySet();
>>>         try {
>>>             // Makes a copy of the file.
>>>             File res = copyfile(is);
>>>             InputStream auxIs = new FileInputStream(res);
>>>             POIFSFileSystem poifs = new POIFSFileSystem(auxIs);
>>>             HWPFDocument document = new HWPFDocument(poifs);
>>>             Range range = document.getRange();
>>>
>>>             for (int i = 0; i < range.numParagraphs(); i++) {
>>>                 Paragraph paragraph = range.getParagraph(i);
>>>                 int numCharRuns = paragraph.numCharacterRuns();
>>>                 for (int j = 0; j < numCharRuns; j++) {
>>>                     CharacterRun charRun = paragraph.getCharacterRun(j);
>>>                     for (Iterator<String> it = keys.iterator(); it.hasNext();) {
>>>                         String key = it.next();
>>>                         if (charRun.text().contains(key)) {
>>>                             String value = replacementText.get(key);
>>>                             charRun.replaceText(key, value);
>>>                             range = document.getRange();
>>>                             paragraph = range.getParagraph(i);
>>>                             charRun = paragraph.getCharacterRun(j);
>>>                         }
>>>                     }
>>>                 }
>>>             }
>>>             is.close();
>>>             return res;
>>>         } catch (IOException e) {
>>>             logger.error("Error processing the Word file: " + e);
>>>             throw new IOException("Error processing the Word file");
>>>         } finally {
>>>             if (is != null) {
>>>                 is.close();
>>>             }
>>>         }
>>>     }
>>>
>>>
>>> Thanks in advance, Fabi.
>>>
>>> -----Original Message-----
>>> From: MSB [mailto:[email protected]]
>>> Sent: Tuesday, 24 November 2009 8:43
>>> To: [email protected]
>>> Subject: Re: Modify word document
>>>
>>>
>>> You have not dug down far enough into the structure of the document yet,
>>> I am afraid - all of the formatting information is stored (encapsulated)
>>> within the CharacterRun class and you need to perform the replacements at
>>> that level.
>>>
>>> I do not have any suitable code at hand as I type this, so what follows
>>> will need to be converted into Java and tested:
>>>
>>> Open the Word document.
>>> Get the overall Range for the document.
>>> Get the number of Paragraph objects the Range contains.
>>> Iterate through the Paragraphs and for each Paragraph
>>>     Get the CharacterRun(s) the Paragraph contains.
>>>     Call the method to replace the search term with the replacement text
>>>     on the CharacterRun
>>> Save the modified document away again.
>>>
>>> You do however face a couple of problems with this. It has been a long
>>> time since I tried to write a search and replace routine using HWPF and I
>>> could not get it to work if the replacement text was longer than the
>>> search term. In that case, HWPF threw an exception and would not allow me
>>> to complete the process; but that problem could well have been addressed
>>> by now as it was well known and caused by faulty bounds checking within
>>> the Range class. Only testing will prove or disprove this for you, I am
>>> afraid.
>>>
>>> Secondly, the CharacterRun class encapsulates a piece of text with
>>> common properties. So, imagine that we are searching for the phrase
>>> 'search term' and that the word 'search' has been emboldened whilst the
>>> word 'term' has been left as normal text; then my suggested approach will
>>> not work. That is because the words 'search' and 'term' will be held in
>>> different CharacterRun(s). If you do hit this problem, then I am afraid you
>>> will have to write code that searches for the term at the Paragraph level,
>>> identifies where the search terms can be found and recovers the
>>> CharacterRun(s) that encapsulate them. Once you have these, you can modify
>>> the runs or create and substitute new ones, but I have to admit that I have
>>> never tried to do this myself. Instead I chose to automate Word using OLE
>>> and to explore the possibilities offered by OpenOffice's UNO interface.
>>> Both options did work but threw up other problems that proved more
>>> limiting (in terms of architecture and platform). If you can get it to
>>> work, HWPF offers the better solution IMO.
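
The Paragraph-level search described above can be prototyped without HWPF at all. What follows is only a sketch under stated assumptions: it models a paragraph as a plain list of run texts (the class RunLocator and the method runsCovering are invented for illustration and are not part of POI), finds the search term in the concatenated text, and reports which runs the match overlaps. With HWPF, the run texts would presumably be taken from CharacterRun.text().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RunLocator {

    // Returns the indexes of the runs that overlap the first occurrence of
    // 'term' in the concatenated run texts, or an empty list if not found.
    public static List<Integer> runsCovering(List<String> runTexts, String term) {
        List<Integer> hits = new ArrayList<Integer>();
        if (term.isEmpty()) {
            return hits;
        }
        // Rebuild the paragraph text by joining the runs in order.
        StringBuilder paragraph = new StringBuilder();
        for (String run : runTexts) {
            paragraph.append(run);
        }
        int matchStart = paragraph.indexOf(term);
        if (matchStart < 0) {
            return hits;
        }
        int matchEnd = matchStart + term.length(); // exclusive

        int runStart = 0;
        for (int i = 0; i < runTexts.size(); i++) {
            int runEnd = runStart + runTexts.get(i).length(); // exclusive
            // Two half-open intervals overlap when each starts before the
            // other one ends.
            if (runStart < matchEnd && matchStart < runEnd) {
                hits.add(i);
            }
            runStart = runEnd;
        }
        return hits;
    }

    public static void main(String[] args) {
        // 'search ' is imagined to be bold and 'term' normal text, so they
        // sit in separate runs.
        List<String> runs = Arrays.asList("Find the ", "search ", "term", " here.");
        System.out.println(runsCovering(runs, "search term")); // prints [1, 2]
    }
}
```

Knowing that the match spans runs 1 and 2 is only the first half of the problem; deciding which run's formatting the replacement text should inherit is a separate choice that this sketch does not attempt.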
>>>
>>> Yours
>>>
>>> Mark B
>>>
>>>
>>> Fabián Avilés Martínez wrote:
>>>>
>>>> Hi all,
>>>>      I have a Word document that serves as a template. In this template
>>>> there are some tokenized words, which have to be modified, and the result
>>>> has to be saved into another file. The original file has some properties,
>>>> like header and footer, images, etc. The resulting file has to be the
>>>> same, but with the modified words. I am trying it with the code below, but
>>>> it does not work.
>>>>
>>>> public ByteArrayOutputStream processFile(final InputStream is,
>>>>         final Map<String, String> replacementText) throws IOException {
>>>>         Set<String> keys = replacementText.keySet();
>>>>         try {
>>>>             POIFSFileSystem poifs = new POIFSFileSystem(is);
>>>>             HWPFDocument document = new HWPFDocument(poifs);
>>>>             Range range = document.getRange();
>>>>
>>>>             for (int i = 0; i < range.numParagraphs(); i++) {
>>>>                 String newTxt = range.getParagraph(i).text();
>>>>                 String oldTxt = range.getParagraph(i).text();
>>>>                 for (Iterator<String> it = keys.iterator(); it.hasNext();) {
>>>>                     String key = it.next();
>>>>                     if (newTxt.contains(key)) {
>>>>                         newTxt = replacePlaceholders(key,
>>>>                                 replacementText.get(key), newTxt);
>>>>                     }
>>>>                 }
>>>>                 if (!oldTxt.equals(newTxt)) {
>>>>                     range.getParagraph(i).replaceText(oldTxt, newTxt);
>>>>                 }
>>>>             }
>>>>
>>>>             // Save the document away.
>>>>             ByteArrayOutputStream bos = new ByteArrayOutputStream();
>>>>             document.write(bos);
>>>>             bos.flush();
>>>>             bos.close();
>>>>             return bos;
>>>>         } catch (IOException e) {
>>>>             logger.error("Error processing the Word file: " + e);
>>>>             throw new IOException("Error processing the Word file");
>>>>         } finally {
>>>>             if (is != null) {
>>>>                 is.close();
>>>>             }
>>>>         }
>>>>     }
>>>>
>>>> Any help, please?
>>>>
>>>> Thanks in advance, Fabi.
>>>>
>>>>
>>>>
>>>> ______________________
>>>> This message including any attachments may contain confidential
>>>> information, according to our Information Security Management System,
>>>>  and intended solely for a specific individual to whom they are
>>>> addressed.
>>>>  Any unauthorised copy, disclosure or distribution of this message
>>>>  is strictly forbidden. If you have received this transmission in
>>>> error,
>>>>  please notify the sender immediately and delete it.
>>>>
>>>> ______________________
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [email protected]
>>>> For additional commands, e-mail: [email protected]
>>>>
>>>>
>>>>
>>>
>>> --
>>> View this message in context:
>>> http://old.nabble.com/Modify-word-document-tp26480450p26491636.html
>>> Sent from the POI - User mailing list archive at Nabble.com.
>>>
>>>
>>
>> --
>> View this message in context:
>> http://old.nabble.com/Modify-word-document-tp26480450p26498333.html
>> Sent from the POI - User mailing list archive at Nabble.com.
>>
>>
> 
> --
> View this message in context:
> http://old.nabble.com/Modify-word-document-tp26480450p26498547.html
> Sent from the POI - User mailing list archive at Nabble.com.
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Modify-word-document-tp26480450p26514349.html
Sent from the POI - User mailing list archive at Nabble.com.

