Thanks for that; I have seen people asking for just this sort of information on the list before. Can I assume you have been able to get something to work?
Yours

Mark B

Fabián Avilés Martínez wrote:
>
> Hi Mark, version 3.2-FINAL is accessible in the public Maven repositories; these are the dependencies:
>
> <dependency>
>     <groupId>org.apache.poi</groupId>
>     <artifactId>poi</artifactId>
>     <version>3.2-FINAL</version>
> </dependency>
> <dependency>
>     <groupId>org.apache.poi</groupId>
>     <artifactId>poi-scratchpad</artifactId>
>     <version>3.2-FINAL</version>
> </dependency>
>
> Thanks, Fabi.
>
> -----Original Message-----
> From: MSB [mailto:[email protected]]
> Sent: Tuesday, 24 November 2009 17:27
> To: [email protected]
> Subject: RE: Modify word document
>
> You are welcome.
>
> If you do not have access to 3.2 FINAL of the API, it is possible to download older releases from here: http://archive.apache.org/dist/poi/release/bin/. I must admit that I do not know what changes were made to HWPF between 3.2 and 3.5, so I cannot say why the formatting information is being lost; I can only hope that you will be able to revert to using 3.2 FINAL for this project.
>
> All that you will need to do is ensure that both the scratchpad and POI archives are on your classpath, and you should be able to compile and run the code successfully. Any problems, just let me know.
>
> Yours
>
> Mark B
>
> Fabián Avilés Martínez wrote:
>>
>> Wow, that's great. At least I have a new direction to work in. I have been struggling with this myself for at least three days. I cannot try it today, but tomorrow it will be the first thing I do. I will tell you the results.
>>
>> Thank you so much.
>>
>> -----Original Message-----
>> From: MSB [mailto:[email protected]]
>> Sent: Tuesday, 24 November 2009 16:51
>> To: [email protected]
>> Subject: RE: Modify word document
>>
>> I have had the chance to play around with some code and I have to admit that I was wrong, on two counts.
>>
>> Firstly, if you do drill down to the level of the CharacterRun and perform a replacement operation there, you will not retain the formatting applied to the text; furthermore, it seems to fail completely, and no replacements will be made in the document at all. To have the search term replaced successfully, you DO need to operate at the Paragraph level.
>>
>> Secondly, if the replacement term is shorter than the search term, then HWPF will throw an exception. It seems quite happy to work if the replacement term is equal to or longer than the search term, in terms of the number of characters.
>>
>> Please see the code I have attached below:
>>
>> /* ====================================================================
>>    Licensed to the Apache Software Foundation (ASF) under one or more
>>    contributor license agreements.  See the NOTICE file distributed with
>>    this work for additional information regarding copyright ownership.
>>    The ASF licenses this file to You under the Apache License, Version 2.0
>>    (the "License"); you may not use this file except in compliance with
>>    the License.  You may obtain a copy of the License at
>>
>>        http://www.apache.org/licenses/LICENSE-2.0
>>
>>    Unless required by applicable law or agreed to in writing, software
>>    distributed under the License is distributed on an "AS IS" BASIS,
>>    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>    See the License for the specific language governing permissions and
>>    limitations under the License.
>> ==================================================================== */
>>
>> package newsearchreplace;
>>
>> import java.io.File;
>> import java.io.FileInputStream;
>> import java.io.FileOutputStream;
>> import java.io.FileNotFoundException;
>> import java.io.IOException;
>> import java.util.HashMap;
>> import java.util.Set;
>>
>> import org.apache.poi.hwpf.HWPFDocument;
>> import org.apache.poi.hwpf.usermodel.Range;
>> import org.apache.poi.hwpf.usermodel.Paragraph;
>> import org.apache.poi.hwpf.usermodel.CharacterRun;
>>
>> /**
>>  * Simple search-and-replace over a Word (.doc) document using HWPF.
>>  *
>>  * @author Mark Beardsley [msb at apache.org]
>>  * @version 1.00
>>  */
>> public class SearchReplace {
>>
>>     private HashMap<String, String> searchTerms = null;
>>     private Set<String> searchKeys = null;
>>     private HWPFDocument wordDocument = null;
>>
>>     public SearchReplace() {
>>         searchTerms = new HashMap<String, String>();
>>         // The first String is the text that will be searched for, the second is what will be used to
>>         // replace it. Of course, it is possible to create more than one search term, replacement text
>>         // pairing.
>>         searchTerms.put("replace", "tester");
>>         searchKeys = searchTerms.keySet();
>>     }
>>
>>     public void openTemplate(String filename) throws FileNotFoundException, IOException {
>>         File file = null;
>>         FileInputStream fis = null;
>>         try {
>>             file = new File(filename);
>>             fis = new FileInputStream(file);
>>             this.wordDocument = new HWPFDocument(fis);
>>         }
>>         finally {
>>             if(fis != null) {
>>                 try {
>>                     fis.close();
>>                     fis = null;
>>                 }
>>                 catch(Exception ex) {
>>                     // I G N O R E
>>                 }
>>             }
>>         }
>>     }
>>
>>     // Attempts the replacement on each CharacterRun of each Paragraph.
>>     public void searchAndReplace() {
>>         Range docRange = this.wordDocument.getRange();
>>         int numParas = docRange.numParagraphs();
>>         for(int i = 0; i < numParas; i++) {
>>             Paragraph para = docRange.getParagraph(i);
>>             int numCharRuns = para.numCharacterRuns();
>>             for(int j = 0; j < numCharRuns; j++) {
>>                 CharacterRun charRun = para.getCharacterRun(j);
>>                 String text = charRun.text();
>>                 for(String key : this.searchKeys) {
>>                     if(text.contains(key)) {
>>                         String replacementTerm = this.searchTerms.get(key);
>>                         // replaceText(placeholder, value): the search term comes first.
>>                         charRun.replaceText(key, replacementTerm);
>>                         System.out.println("Found: " + key + " in " + text
>>                                 + ". Will replace with: " + replacementTerm);
>>                     }
>>                 }
>>             }
>>         }
>>     }
>>
>>     // Performs the replacement on the text of each Paragraph as a whole.
>>     public void searchReplace() {
>>         Range docRange = this.wordDocument.getRange();
>>         int numParas = docRange.numParagraphs();
>>         for(int i = 0; i < numParas; i++) {
>>             Paragraph para = docRange.getParagraph(i);
>>             String text = para.text();
>>             for(String key : this.searchKeys) {
>>                 if(text.contains(key)) {
>>                     String replacementTerm = this.searchTerms.get(key);
>>                     para.replaceText(key, replacementTerm);
>>                 }
>>             }
>>         }
>>     }
>>
>>     public void saveResults(String filename) throws FileNotFoundException, IOException {
>>         File file = null;
>>         FileOutputStream fos = null;
>>         try {
>>             file = new File(filename);
>>             fos = new FileOutputStream(file);
>>             this.wordDocument.write(fos);
>>         }
>>         finally {
>>             if(fos != null) {
>>                 try {
>>                     fos.close();
>>                     fos = null;
>>                 }
>>                 catch(Exception ex) {
>>                     // I G N O R E
>>                 }
>>             }
>>         }
>>     }
>>
>>     /**
>>      * @param args the command line arguments
>>      */
>>     public static void main(String[] args) {
>>         try {
>>             SearchReplace sr = new SearchReplace();
>>             sr.openTemplate("C:/temp/Test Document.doc");
>>             sr.searchAndReplace();
>>             //sr.searchReplace();
>>             sr.saveResults("C:/temp/New Updated Document.doc");
>>         }
>>         catch(Exception ex) {
>>             System.out.println("Caught an: " + ex.getClass().getName());
>>             System.out.println("Message: " + ex.getMessage());
>>             System.out.println("Stacktrace follows............");
>>             ex.printStackTrace(System.out);
>>         }
>>     }
>> }
>>
>> More particularly, look at the main method. If you comment out the sr.searchAndReplace() line and un-comment the sr.searchReplace() line, then the code will work successfully. But, and this is a BIG but, it will only work if you compile and run it against 3.2 FINAL of the API.
>> I have found that later versions seem to 'drop', or lose, the formatting information completely. To convince yourself of this, just modify the main method so that it contains only these lines of code:
>>
>>     SearchReplace sr = new SearchReplace();
>>     sr.openTemplate("C:/temp/Test Document.doc");
>>     sr.saveResults("C:/temp/New Updated Document.doc");
>>
>> If you run that against versions later than 3.2 FINAL, you should see that the copy of the original document that this produces loses all of its formatting.
>>
>> Yours
>>
>> Mark B
>>
>> PS. I guess it should go without saying that you will need to replace the paths and document names passed to the openTemplate() and saveResults() methods so that they point to locations and files that exist on your machine.
>>
>> PPS. Please forgive the lack of comments. I hope that it is apparent just what the methods do.
>>
>>
>> Fabián Avilés Martínez wrote:
>>>
>>> Hi, as I told you, I have tried it, but with the same result: the resulting file is corrupted, or at least that is what MS Word says. My next approach is to create a copy of the file and make the modifications within that copy. My problem is that I do not know how to save the modifications made to the CharacterRuns of the paragraphs; that is, how to persist the modifications in the resulting file without having to copy it by calling document.write(outputStream).
>>>
>>> My code is:
>>>
>>> public File processFile(final InputStream is, final Map<String, String> replacementText) throws IOException {
>>>     Set<String> keys = replacementText.keySet();
>>>     try {
>>>         // Makes a copy of the file.
>>>         File res = copyfile(is);
>>>         InputStream auxIs = new FileInputStream(res);
>>>         POIFSFileSystem poifs = new POIFSFileSystem(auxIs);
>>>         HWPFDocument document = new HWPFDocument(poifs);
>>>         Range range = document.getRange();
>>>
>>>         for (int i = 0; i < range.numParagraphs(); i++) {
>>>             Paragraph paragraph = range.getParagraph(i);
>>>             int numCharRuns = paragraph.numCharacterRuns();
>>>             for (int j = 0; j < numCharRuns; j++) {
>>>                 CharacterRun charRun = paragraph.getCharacterRun(j);
>>>                 for (Iterator<String> it = keys.iterator(); it.hasNext();) {
>>>                     String key = it.next();
>>>                     if (charRun.text().contains(key)) {
>>>                         String value = replacementText.get(key);
>>>                         charRun.replaceText(key, value);
>>>                         range = document.getRange();
>>>                         paragraph = range.getParagraph(i);
>>>                         charRun = paragraph.getCharacterRun(j);
>>>                     }
>>>                 }
>>>             }
>>>         }
>>>         is.close();
>>>         return res;
>>>     } catch (IOException e) {
>>>         logger.error("Error processing the Word file: " + e);
>>>         throw new IOException("Error processing the Word file");
>>>     } finally {
>>>         if (is != null) {
>>>             is.close();
>>>         }
>>>     }
>>> }
>>>
>>> Thanks in advance, Fabi.
>>>
>>> -----Original Message-----
>>> From: MSB [mailto:[email protected]]
>>> Sent: Tuesday, 24 November 2009 8:43
>>> To: [email protected]
>>> Subject: Re: Modify word document
>>>
>>> I am afraid you have not dug down far enough into the structure of the document yet; all of the formatting information is stored (encapsulated) within the CharacterRun class, and you need to perform the replacements at that level.
>>>
>>> I do not have any suitable code at hand as I type this, so what follows will need to be converted into Java and tested:
>>>
>>> Open the Word document.
>>> Get the overall Range for the document.
>>> Get the number of Paragraph objects the Range contains.
>>> Iterate through the Paragraphs and, for each Paragraph:
>>>     Get the CharacterRun(s) the Paragraph contains.
>>>     Call the method that replaces the search term with the replacement text on the CharacterRun.
>>> Save the modified document away again.
>>>
>>> You do, however, face a couple of problems with this. It has been a long time since I tried to write a search-and-replace routine using HWPF, and I could not get it to work if the replacement text was longer than the search term. In that case, HWPF threw an exception and would not allow me to complete the process; but that problem could well have been addressed by now, as it was well known and caused by faulty bounds checking within the Range class. Only testing will prove or disprove this for you, I am afraid.
>>>
>>> Secondly, the CharacterRun class encapsulates a piece of text with common properties. So, imagine that we are searching for the phrase 'search term' and that the word 'search' has been emboldened whilst the word 'term' has been left as normal text; then my suggested approach will not work, because the words 'search' and 'term' will be held in different CharacterRun(s). If you do hit this problem, then I am afraid you will have to write code that searches for the term at the Paragraph level, identifies where the search terms can be found and recovers the CharacterRun(s) that encapsulate them. Once you have these, you can modify the runs or create and substitute new ones, but I have to admit that I have never tried to do this myself. Instead I chose to automate Word using OLE and to explore the possibilities offered by OpenOffice's UNO interface. Both options did work but threw up other problems that proved more limiting (in terms of architecture and platform). If you can get it to work, HWPF offers the better solution IMO.
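The workaround described above for a term split across CharacterRuns (search at the Paragraph level, then recover the runs that hold the match) needs a little offset bookkeeping. The sketch below shows only that bookkeeping, in plain Java so that it runs without POI. It assumes that a paragraph's text is the in-order concatenation of its runs' text; the run texts are modelled as a `List<String>` that, in real code, might be filled from `paragraph.getCharacterRun(j).text()` for `j` from 0 to `numCharacterRuns() - 1`. `RunLocator` and `RunSpan` are hypothetical names, not part of HWPF, and the sketch does not attempt the edit of the runs themselves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical holder for the first and last run indices a match overlaps. */
final class RunSpan {
    final int firstRun;
    final int lastRun;

    RunSpan(int firstRun, int lastRun) {
        this.firstRun = firstRun;
        this.lastRun = lastRun;
    }
}

final class RunLocator {
    private RunLocator() {}

    /**
     * Searches the concatenated run texts for term and returns the runs that
     * the first match overlaps, or null when the term is empty or absent.
     */
    public static RunSpan locate(List<String> runTexts, String term) {
        StringBuilder joined = new StringBuilder();
        List<Integer> runStarts = new ArrayList<>();
        for (String run : runTexts) {
            runStarts.add(joined.length()); // paragraph offset where this run begins
            joined.append(run);
        }
        if (term.isEmpty()) {
            return null;
        }
        int matchStart = joined.indexOf(term);
        if (matchStart < 0) {
            return null;
        }
        int matchEnd = matchStart + term.length() - 1; // offset of the last matched character
        return new RunSpan(runIndexAt(runStarts, matchStart), runIndexAt(runStarts, matchEnd));
    }

    /** Returns the last run whose start offset is <= offset (skips empty runs). */
    private static int runIndexAt(List<Integer> runStarts, int offset) {
        int index = 0;
        for (int i = 0; i < runStarts.size(); i++) {
            if (runStarts.get(i) <= offset) {
                index = i;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> runs = Arrays.asList("Please replace the ", "search", " ", "term", " here.");
        RunSpan span = locate(runs, "search term");
        System.out.println("Runs " + span.firstRun + " to " + span.lastRun); // prints: Runs 1 to 3
    }
}
```

In the usage example, the match for "search term" spans runs 1 to 3, so any edit would have to touch three runs rather than one, which is exactly the case where a single CharacterRun replacement cannot succeed.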
>>>
>>> Yours
>>>
>>> Mark B
>>>
>>>
>>> Fabián Avilés Martínez wrote:
>>>>
>>>> Hi all,
>>>> I have a Word document that serves as a template. In this template there are some tokenized words, which have to be modified, and the result has to be saved into another file. The original file has some properties, like a header and footer, images, etc. The resulting file has to be the same, but with the modified words. I am trying it with the code below, but it does not work.
>>>>
>>>> public ByteArrayOutputStream processFile(final InputStream is, final Map<String, String> replacementText) throws IOException {
>>>>     Set<String> keys = replacementText.keySet();
>>>>     try {
>>>>         POIFSFileSystem poifs = new POIFSFileSystem(is);
>>>>         HWPFDocument document = new HWPFDocument(poifs);
>>>>         Range range = document.getRange();
>>>>
>>>>         for (int i = 0; i < range.numParagraphs(); i++) {
>>>>             String newTxt = range.getParagraph(i).text();
>>>>             String oldTxt = range.getParagraph(i).text();
>>>>             for (Iterator<String> it = keys.iterator(); it.hasNext();) {
>>>>                 String key = it.next();
>>>>                 if (newTxt.contains(key)) {
>>>>                     newTxt = replacePlaceholders(key, replacementText.get(key), newTxt);
>>>>                 }
>>>>             }
>>>>             if (!oldTxt.equals(newTxt)) {
>>>>                 range.getParagraph(i).replaceText(oldTxt, newTxt);
>>>>             }
>>>>         }
>>>>
>>>>         // Save the document away.
>>>>         ByteArrayOutputStream bos = new ByteArrayOutputStream();
>>>>         document.write(bos);
>>>>         bos.flush();
>>>>         bos.close();
>>>>         return bos;
>>>>     } catch (IOException e) {
>>>>         logger.error("Error processing the Word file: " + e);
>>>>         throw new IOException("Error processing the Word file");
>>>>     } finally {
>>>>         if (is != null) {
>>>>             is.close();
>>>>         }
>>>>     }
>>>> }
>>>>
>>>> Any help, please?
>>>>
>>>> Thanks in advance, Fabi.
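The first version of processFile calls a `replacePlaceholders(key, value, text)` helper that is not shown in the thread. A minimal sketch of what such a helper might look like is below, assuming it only needs to substitute literal tokens in a paragraph's text; `Placeholders`, `applyAll` and the `${name}` token style are hypothetical. The design point worth noting is the use of `String.replace`, which treats both arguments literally, rather than `String.replaceAll`, which would interpret the token as a regular expression and a `$` in the value as a group reference.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class Placeholders {
    private Placeholders() {}

    /**
     * Hypothetical stand-in for the replacePlaceholders(key, value, text) helper:
     * String.replace substitutes every literal occurrence of key, so tokens such
     * as "${name}" need no escaping.
     */
    public static String replacePlaceholders(String key, String value, String text) {
        return text.replace(key, value);
    }

    /** Applies every key/value pair, in the map's iteration order, to one paragraph's text. */
    public static String applyAll(Map<String, String> replacements, String text) {
        String result = text;
        for (Map.Entry<String, String> entry : replacements.entrySet()) {
            if (result.contains(entry.getKey())) {
                result = replacePlaceholders(entry.getKey(), entry.getValue(), result);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("${name}", "Fabi");
        values.put("${date}", "24/11/2009");
        // prints: Dear Fabi, see you on 24/11/2009.
        System.out.println(applyAll(values, "Dear ${name}, see you on ${date}."));
    }
}
```

A `LinkedHashMap` keeps the substitution order predictable, which only matters if one replacement value could itself contain another token.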
>>>>
>>>> ______________________
>>>> This message including any attachments may contain confidential information, according to our Information Security Management System, and intended solely for a specific individual to whom they are addressed. Any unauthorised copy, disclosure or distribution of this message is strictly forbidden. If you have received this transmission in error, please notify the sender immediately and delete it.
>>>> ______________________
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [email protected]
>>>> For additional commands, e-mail: [email protected]
>>>
>>> --
>>> View this message in context: http://old.nabble.com/Modify-word-document-tp26480450p26491636.html
>>> Sent from the POI - User mailing list archive at Nabble.com.
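On the question of persisting the changes: in the code in this thread, edits made through the HWPF Range API appear to reach disk only when `HWPFDocument.write(OutputStream)` is called, as Mark's saveResults() method does. One way to write the result over an existing file without risking a half-written .doc is to write to a temporary file in the same directory and move it over the target only once the write has finished. The sketch below shows that pattern with plain `java.nio.file` (Java 8 or later, so newer than this 2009 thread). `SafeSave` and `StreamWriter` are hypothetical names; with HWPF the callback would presumably be something like `out -> document.write(out)`.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class SafeSave {
    private SafeSave() {}

    /** Hypothetical callback that serialises a document to a stream. */
    @FunctionalInterface
    interface StreamWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Writes to a temp file next to target, then moves it over target. */
    static void save(Path target, StreamWriter writer) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "save-", ".tmp");
        try {
            // try-with-resources closes the stream even if writeTo throws
            try (OutputStream out = Files.newOutputStream(tmp)) {
                writer.writeTo(out);
            }
            // Only reached after a complete write; the old target is replaced.
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(tmp); // no-op after a successful move
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("demo-", ".doc");
        save(target, out -> out.write("updated".getBytes(StandardCharsets.UTF_8)));
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
        Files.delete(target);
    }
}
```

If the callback throws, the exception propagates, the temporary file is deleted and the existing target is left untouched.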
