Mark, thanks for trying it out. I tried it the same way you did. Can you suggest a way to resolve this? For option one, how would I go about patching HWPF? I can try it if you have a new way to approach this problem. As for option two, SWT/OLE, I am new to it and I don't know how to work with it.

Thanks
Karthik

On Sun, Aug 9, 2009 at 6:07 AM, MSB <[email protected]> wrote:

>
> This morning, I had the chance to try what I spoke of in my previous email -
> using Java to replace text in a copy of the Paragraph's text and then
> replacing all of the text in the Paragraph with the modified copy. It worked
> but only up to a point; if the replacement text was exactly the same length
> as the search term, this technique worked, but any difference in length
> rendered the resulting file corrupt, and I am guessing that this is the same
> problem as you originally encountered.
>
> So, I think I can conclude that if you are trying to replace the search term
> with a String of text that is longer than it, you will run into problems. As
> long as the replacement is shorter than the search term - or at most the
> same length - then the previous piece of code I posted seems to work well
> enough.
>
> To my mind then you have two options. Option 1 would be to patch HWPF so
> that it will work as you wish - the API is very immature and has not been
> the focus of the same sort of development effort as has HSSF for example.
> Option 2 is to use an alternative such as SWT/OLE or OpenOffice. The
> limitation with the OpenOffice approach is that whilst it can read OpenXML
> documents - Office 2007 and beyond with the .docx or similar extension - it
> cannot save a document in this format.
>
> Sorry for the bad news.
>
> Yours
>
> Mark B
>
>
> karthik-33 wrote:
> >
> > Hi Mark, Thanks for sending this program.
> > I tried this program with the POI 3.2 final version, which I am currently
> > using.
> > CharacterRun doesn't behave consistently: sometimes it splits the
> > paragraph text into several character runs, and sometimes it doesn't split
> > it and I see the whole paragraph text in one character run. So the search
> > text is not getting replaced.
> > Is there any way to solve this issue?
> >
> > On Sat, Aug 8, 2009 at 6:34 AM, MSB <[email protected]> wrote:
> >
> >>
> >> Here is the HWPF based code that I put together to play around with. It was
> >> written a very long time ago so I am not sure what testing I undertook and
> >> exactly what the results were, but I have run it this morning just to ensure
> >> that it does not crash the PC and all seems to be well. This section has been
> >> cut from a much larger class that is full of other test code that I play
> >> around with periodically. Everything you need is there I believe but, on the
> >> off-chance that it calls another method whose source I have neglected to
> >> include, just drop an email to the list please.
> >>
> >> Currently, I am running POI version 3.5 beta 7 on a PC operating under
> >> Windows XP SP2. Office 2007 is installed now and it seems able to open the
> >> files this code produces quite happily. In the back of my mind, I seem to
> >> remember that the files produced by some search and replace code I put
> >> together could be opened but not modified; I tested that problem this morning
> >> and the files this code produces seem fine, I can open them, make changes and
> >> then save the results again. But do please be prepared for problems like
> >> that.
> >>
> >> Again, can I emphasise this is test code; it is scruffy and there are going
> >> to be variables I put in there so that I could monitor the progress of the
> >> code by dumping messages to the screen. As you go through, if something seems
> >> to be superfluous, then this is likely the reason and you can comment it out
> >> or delete it.
> >>
> >> Good luck and I do hope it all works. If you have any problems, just drop a
> >> message onto the list.
> >>
> >> Yours
> >>
> >> Mark B
> >>
> >>
> >> import org.apache.poi.poifs.filesystem.POIFSFileSystem;
> >> import org.apache.poi.hwpf.HWPFDocument;
> >> import org.apache.poi.hwpf.usermodel.Range;
> >> import org.apache.poi.hwpf.usermodel.Paragraph;
> >> import org.apache.poi.hwpf.usermodel.CharacterRun;
> >>
> >> import java.io.File;
> >> import java.io.FileOutputStream;
> >> import java.io.FileInputStream;
> >> import java.io.BufferedOutputStream;
> >> import java.io.BufferedInputStream;
> >> import java.util.HashMap;
> >> import java.util.Iterator;
> >> import java.util.Set;
> >>
> >> /**
> >>  * This code is provided on an "AS IS" BASIS, WITHOUT WARRANTIES OR
> >>  * CONDITIONS OF ANY KIND, either express or implied. It is not intended to
> >>  * be used in a 'production' environment without undergoing rigorous testing.
> >>  *
> >>  * With that out of the way, an instance of this class can be used to search
> >>  * for and replace Strings of text within a Word document. To see how the
> >>  * code may be used, look into the main() method for examples.
> >>  *
> >>  * Note the replacements made by the code contained within this class ignore
> >>  * any formatting that may have been applied to the text that is replaced.
> >>  * That is to say that if the text was originally formatted to use the Arial
> >>  * font, was sized to 24 points, emboldened, underlined and red in colour,
> >>  * then all of this will be lost if it is replaced. Further, if any text is
> >>  * replaced in a Paragraph, all the formatting applied to that Paragraph's
> >>  * contents is likely to be lost.
> >>  *
> >>  * @author Mark Beardsley [msb at apache.org]
> >>  * @version 1.00 8th August 2009 (cannot remember when originally put
> >>  *          together)
> >>  */
> >> public class SearchReplace {
> >>
> >>     /**
> >>      * Search for and replace a single occurrence of a string of text
> >>      * within a Word document.
> >>      *
> >>      * Note that no checks are made on the parameters' values; that is to
> >>      * say that the file named in the inputFilename parameter will not be
> >>      * checked to ensure the file exists and neither the searchTerm nor
> >>      * the replacementText parameter will be checked to ensure it is not
> >>      * null. Also, note that I have never tested passing the same String
> >>      * to the inputFilename and outputFilename parameters but cannot see
> >>      * why that should not be possible.
> >>      *
> >>      * @param inputFilename An instance of the String class that
> >>      *                      encapsulates the name of and path to a Word
> >>      *                      document which is in the binary (OLE2CDF)
> >>      *                      format. The contents of this document will be
> >>      *                      searched for occurrences of the search term.
> >>      * @param outputFilename An instance of the String class that
> >>      *                       encapsulates the name of and path to a Word
> >>      *                       document which is in the binary (OLE2CDF)
> >>      *                       format. This document will contain the
> >>      *                       results of the search and replace operation.
> >>      * @param searchTerm An instance of the String class that encapsulates
> >>      *                   a series of characters, a word or words. The
> >>      *                   document will be searched for occurrences of this
> >>      *                   String.
> >>      * @param replacementText An instance of the String class that
> >>      *                        contains a series of characters, a word or
> >>      *                        words. The String encapsulated by the
> >>      *                        searchTerm parameter will be replaced by the
> >>      *                        'contents' of this parameter.
> >>      */
> >>     public void searchAndReplace(String inputFilename,
> >>                                  String outputFilename,
> >>                                  String searchTerm,
> >>                                  String replacementText) {
> >>
> >>         File inputFile = null;
> >>         File outputFile = null;
> >>         FileInputStream fileIStream = null;
> >>         FileOutputStream fileOStream = null;
> >>         BufferedInputStream bufIStream = null;
> >>         BufferedOutputStream bufOStream = null;
> >>         POIFSFileSystem fileSystem = null;
> >>         HWPFDocument document = null;
> >>         Range docRange = null;
> >>         Paragraph paragraph = null;
> >>         CharacterRun charRun = null;
> >>         int numParagraphs = 0;
> >>         int numCharRuns = 0;
> >>         String text = null;
> >>
> >>         try {
> >>             // Create an instance of the POIFSFileSystem class and
> >>             // attach it to the Word document using an InputStream.
> >>             inputFile = new File(inputFilename);
> >>             fileIStream = new FileInputStream(inputFile);
> >>             bufIStream = new BufferedInputStream(fileIStream);
> >>             fileSystem = new POIFSFileSystem(bufIStream);
> >>             document = new HWPFDocument(fileSystem);
> >>
> >>             // Get the overall Range object for the document. Note the
> >>             // use of the getRange() method and not the getOverallRange()
> >>             // method, this is just historic - when the code was
> >>             // originally written, I do not believe the latter method was
> >>             // part of the API.
> >>             docRange = document.getRange();
> >>
> >>             // Get the number of Paragraph(s) in the overall range and
> >>             // iterate through them
> >>             numParagraphs = docRange.numParagraphs();
> >>             for(int i = 0; i < numParagraphs; i++) {
> >>
> >>                 // Get a Paragraph and recover the text from it. This step
> >>                 // is far from necessary and I think I only got the text
> >>                 // so that I could print it to screen as a diagnostic
> >>                 // check to ensure that the Paragraph contained the text I
> >>                 // was searching for. Experiment with this.
> >>                 paragraph = docRange.getParagraph(i);
> >>                 text = paragraph.text();
> >>
> >>                 // Get the number of CharacterRuns in the Paragraph
> >>                 numCharRuns = paragraph.numCharacterRuns();
> >>                 for(int j = 0; j < numCharRuns; j++) {
> >>
> >>                     // Get a character run and recover its text - note
> >>                     // that the same text variable is used as for the
> >>                     // Paragraph above. So, it MUST be safe to remove the
> >>                     // text = paragraph.text() line above.
> >>                     charRun = paragraph.getCharacterRun(j);
> >>                     text = charRun.text();
> >>
> >>                     // Check to see if the text of the CharacterRun
> >>                     // contains the search term. If it does, find out
> >>                     // where that term starts and call the replaceText()
> >>                     // method passing the index. Maybe this is the key
> >>                     // difference between what we are doing.
> >>                     if(text.contains(searchTerm)) {
> >>                         int start = text.indexOf(searchTerm);
> >>                         charRun.replaceText(searchTerm, replacementText, start);
> >>                     }
> >>                 }
> >>             }
> >>
> >>             // Close the InputStream
> >>             bufIStream.close();
> >>             bufIStream = null;
> >>
> >>             // Open an OutputStream and write the document away.
> >>             outputFile = new File(outputFilename);
> >>             fileOStream = new FileOutputStream(outputFile);
> >>             bufOStream = new BufferedOutputStream(fileOStream);
> >>
> >>             document.write(bufOStream);
> >>
> >>         }
> >>         catch(Exception ex) {
> >>             System.out.println("Caught an: " + ex.getClass().getName());
> >>             System.out.println("Message: " + ex.getMessage());
> >>             System.out.println("Stacktrace follows.............");
> >>             ex.printStackTrace(System.out);
> >>         }
> >>         finally {
> >>             if(bufOStream != null) {
> >>                 try {
> >>                     //bufOStream.flush();
> >>                     bufOStream.close();
> >>                     bufOStream = null;
> >>                 }
> >>                 catch(Exception ex) {
> >>                     // I G N O R E //
> >>                 }
> >>             }
> >>             if(bufIStream != null) {
> >>                 try {
> >>                     bufIStream.close();
> >>                     bufIStream = null;
> >>                 }
> >>                 catch(Exception ex) {
> >>                     // I G N O R E //
> >>                 }
> >>             }
> >>         }
> >>
> >>     }
> >>
> >>     /**
> >>      * Search for and replace a series of strings of text within a Word
> >>      * document.
> >>      *
> >>      * Note that no checks are made on the parameters' values; that is to
> >>      * say that the file named in the inputFilename parameter will not be
> >>      * checked to ensure the file exists and the replacements parameter
> >>      * will not be checked to ensure it is not null. Also, note that I
> >>      * have never tested passing the same String to the inputFilename and
> >>      * outputFilename parameters but cannot see why that should not be
> >>      * possible.
> >>      *
> >>      * @param inputFilename An instance of the String class that
> >>      *                      encapsulates the name of and path to a Word
> >>      *                      document which is in the binary (OLE2CDF)
> >>      *                      format. The contents of this document will be
> >>      *                      searched for occurrences of the search term.
> >>      * @param outputFilename An instance of the String class that
> >>      *                       encapsulates the name of and path to a Word
> >>      *                       document which is in the binary (OLE2CDF)
> >>      *                       format. This document will contain the
> >>      *                       results of the search and replace operation.
> >>      * @param replacements An instance of the java.util.HashMap class that
> >>      *                     contains a series of key, value pairs. Each key
> >>      *                     is an instance of the String class that
> >>      *                     encapsulates a series of characters, a word or
> >>      *                     words that the code will search for and the
> >>      *                     accompanying value is also an instance of the
> >>      *                     String class that likewise encapsulates a
> >>      *                     series of characters, a word or words. The
> >>      *                     'contents' of the value's String will be used
> >>      *                     to replace the contents of the key's String if
> >>      *                     an occurrence of the latter is found.
> >>      */
> >>     public void searchAndReplace(String inputFilename,
> >>                                  String outputFilename,
> >>                                  HashMap<String, String> replacements) {
> >>
> >>         File inputFile = null;
> >>         File outputFile = null;
> >>         FileInputStream fileIStream = null;
> >>         FileOutputStream fileOStream = null;
> >>         BufferedInputStream bufIStream = null;
> >>         BufferedOutputStream bufOStream = null;
> >>         POIFSFileSystem fileSystem = null;
> >>         HWPFDocument document = null;
> >>         Range docRange = null;
> >>         Paragraph paragraph = null;
> >>         CharacterRun charRun = null;
> >>         Set<String> keySet = null;
> >>         Iterator<String> keySetIterator = null;
> >>         int numParagraphs = 0;
> >>         int numCharRuns = 0;
> >>         String text = null;
> >>         String key = null;
> >>         String value = null;
> >>
> >>         try {
> >>             // Create an instance of the POIFSFileSystem class and
> >>             // attach it to the Word document using an InputStream.
> >>             inputFile = new File(inputFilename);
> >>             fileIStream = new FileInputStream(inputFile);
> >>             bufIStream = new BufferedInputStream(fileIStream);
> >>             fileSystem = new POIFSFileSystem(bufIStream);
> >>             document = new HWPFDocument(fileSystem);
> >>
> >>             // Get a reference to the overall Range for the document
> >>             // and discover how many Paragraph objects there are
> >>             // in the document.
> >>             docRange = document.getRange();
> >>             numParagraphs = docRange.numParagraphs();
> >>
> >>             // Recover a Set of the keys in the HashMap
> >>             keySet = replacements.keySet();
> >>
> >>             // Step through each Paragraph
> >>             for(int i = 0; i < numParagraphs; i++) {
> >>                 paragraph = docRange.getParagraph(i);
> >>                 // This line can almost certainly be removed - see
> >>                 // the comments in the method above.
> >>                 text = paragraph.text();
> >>
> >>                 // Get the number of CharacterRuns in the Paragraph
> >>                 // and step through each one.
> >>                 numCharRuns = paragraph.numCharacterRuns();
> >>                 for(int j = 0; j < numCharRuns; j++) {
> >>                     charRun = paragraph.getCharacterRun(j);
> >>
> >>                     // Get the text from the CharacterRun and recover an
> >>                     // Iterator to step through the Set of keys.
> >>                     text = charRun.text();
> >>                     keySetIterator = keySet.iterator();
> >>                     while(keySetIterator.hasNext()) {
> >>
> >>                         // Get the key - which is also the search term -
> >>                         // and check to see if it can be found within the
> >>                         // CharacterRun's text.
> >>                         key = keySetIterator.next();
> >>                         if(text.contains(key)) {
> >>
> >>                             // If the search term was found in the text,
> >>                             // get the matching value from the HashMap,
> >>                             // find out whereabouts in the CharacterRun's
> >>                             // text the search term is and call the
> >>                             // replaceText() method to substitute the
> >>                             // replacement term for the search term.
> >>                             value = replacements.get(key);
> >>                             int start = text.indexOf(key);
> >>                             charRun.replaceText(key, value, start);
> >>
> >>                             // Note that this code was added to test
> >>                             // whether it was possible to replace multiple
> >>                             // occurrences of the search term. I cannot
> >>                             // remember if I tested it but believe that it
> >>                             // did work; either way, it could be tested
> >>                             // now and if it succeeds, then the
> >>                             // searchAndReplace() method above could be
> >>                             // modified to include this.
> >>                             docRange = document.getRange();
> >>                             paragraph = docRange.getParagraph(i);
> >>                             charRun = paragraph.getCharacterRun(j);
> >>                             text = charRun.text();
> >>                         }
> >>                     }
> >>                 }
> >>             }
> >>
> >>             // Close the InputStream
> >>             bufIStream.close();
> >>             bufIStream = null;
> >>
> >>             // Open an OutputStream and save the modified document away.
> >>             outputFile = new File(outputFilename);
> >>             fileOStream = new FileOutputStream(outputFile);
> >>             bufOStream = new BufferedOutputStream(fileOStream);
> >>             document.write(bufOStream);
> >>         }
> >>         catch(Exception ex) {
> >>             System.out.println("Caught an: " + ex.getClass().getName());
> >>             System.out.println("Message: " + ex.getMessage());
> >>             System.out.println("Stacktrace follows.............");
> >>             ex.printStackTrace(System.out);
> >>         }
> >>         finally {
> >>             if(bufIStream != null) {
> >>                 try {
> >>                     bufIStream.close();
> >>                     bufIStream = null;
> >>                 }
> >>                 catch(Exception ex) {
> >>                     // I G N O R E //
> >>                 }
> >>             }
> >>             if(bufOStream != null) {
> >>                 try {
> >>                     bufOStream.flush();
> >>                     bufOStream.close();
> >>                     bufOStream = null;
> >>                 }
> >>                 catch(Exception ex) {
> >>                     // I G N O R E //
> >>                 }
> >>             }
> >>         }
> >>
> >>     }
> >>
> >>     /**
> >>      * The main entry point to the program demonstrating how the code may
> >>      * be utilised.
> >>      *
> >>      * @param args An array of type String containing arguments passed to
> >>      *             the program on execution.
> >>      */
> >>     public static void main(String[] args) {
> >>         SearchReplace replacer = new SearchReplace();
> >>
> >>         // To search for and replace single items. Note, the code has not,
> >>         // at least as far as I can remember, been tested by passing the
> >>         // same file to both the inputFilename and outputFilename
> >>         // parameters. It ought to work but has NOT been tested I believe.
> >>         replacer.searchAndReplace("Document.doc",           // Source Document
> >>                                   "Replaced Document.doc",  // Result Document
> >>                                   "search term",            // Search term
> >>                                   "replacement term");      // Replacement term
> >>
> >>         // To search for and replace a series of items
> >>         HashMap<String, String> searchTerms = new HashMap<String, String>();
> >>         searchTerms.put("search term 1", "replacement term 1");
> >>         searchTerms.put("search term 2", "replacement term 2");
> >>         searchTerms.put("search term 3", "replacement term 3");
> >>         searchTerms.put("search term 4", "replacement term 4");
> >>
> >>         replacer.searchAndReplace("Document.doc",           // Source Document
> >>                                   "Replaced Document.doc",  // Result Document
> >>                                   searchTerms);             // Search/replacement items
> >>     }
> >> }
> >>
> >>
> >>
> >> karthik-33 wrote:
> >> >
> >> > Thanks for the reply, Mark.
> >> > I don't think I need to preserve text formatting, but I would like to try
> >> > your code and see how it works.
> >> > I think that would help me too.
> >> >
> >> > I can't go the OpenOffice route since my business requirement is to use
> >> > Microsoft Word documents.
> >> > I will be using this search and replace function on the same PC as the
> >> > application.
> >> >
> >> > If you can send me that code, I will try it and let you know how it works.
> >> >
> >> > Thanks
> >> > Karthik
> >> >
> >> >
> >> > On Fri, Aug 7, 2009 at 11:49 AM, MSB <[email protected]> wrote:
> >> >
> >> >>
> >> >> Can I ask two questions please?
> >> >>
> >> >> Do you need to preserve the formatting applied to the text? If not, then I
> >> >> think that somewhere I have a piece of HWPF code that does a search and
> >> >> replace. I am not at all certain about the state of the code and cannot
> >> >> remember if I hit the same problem as you - and I may well have - but I am
> >> >> willing to look it out if you think it might help.
> >> >>
> >> >> Secondly, do you have to use HWPF/XWPF? The API is still immature and it is
> >> >> really only suitable for relatively simple tasks. Better alternatives might
> >> >> be OpenOffice, which you can 'control' through its UNO API, or Word itself,
> >> >> which can be manipulated using OLE. You can ONLY use OLE if you are working
> >> >> on a Windows based PC and you have Word installed on that PC. OpenOffice is
> >> >> more flexible but it still cannot be used - at least as far as I am aware -
> >> >> as a document server, so it is best to have that application installed on
> >> >> the PC you will be using for the search/replace operation.
> >> >>
> >> >> Yours
> >> >>
> >> >> Mark B
> >> >>
> >> >>
> >> >> karthik-33 wrote:
> >> >> >
> >> >> > I have Microsoft Office 2007 and while saving the document, I save it as
> >> >> > a Microsoft 2003 document.
> >> >> > I am trying to replace the text using the replaceText method in Paragraph.
> >> >> > It works fine when the replacement text and search text are of equal
> >> >> > length.
> >> >> > It corrupts the document when the length of the replacement string is
> >> >> > either greater or less.
> >> >> > If anyone has gone through this issue and resolved it, or has any idea,
> >> >> > please let me know; it will be useful for me.
> >> >> > I am not sure what is causing the document to be corrupted.
> >> >> >
> >> >> > Code is:
> >> >> >
> >> >> > String replaceTxt = "Replacement";
> >> >> > String searchText = "Orginial";
> >> >> > POIFSFileSystem ps = new POIFSFileSystem(new
> >> >> >         FileInputStream("C:/Document.doc"));
> >> >> > HWPFDocument doc = new HWPFDocument(ps);
> >> >> > Range range = doc.getRange();
> >> >> > for(int x = 0; x < range.numSections(); x++)
> >> >> > {
> >> >> >     Section s = range.getSection(x);
> >> >> >     for(int y = 0; y < s.numParagraphs(); y++)
> >> >> >     {
> >> >> >         Paragraph p = s.getParagraph(y);
> >> >> >         String paraText = p.text();
> >> >> >         int offset = paraText.indexOf(searchText);
> >> >> >         if(offset != -1)
> >> >> >         {
> >> >> >             p.replaceText(searchText, replaceTxt, offset);
> >> >> >         }
> >> >> >     }
> >> >> > }
> >> >> >
> >> >> >
> >> >>
> >> >> --
> >> >> View this message in context:
> >> >> http://www.nabble.com/Replace-Text-Problem-%28Document-Corrupt%29---POI-HWPFDocument-tp24864855p24867251.html
> >> >> Sent from the POI - User mailing list archive at Nabble.com.
> >> >>
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: [email protected]
> >> >> For additional commands, e-mail: [email protected]
> >> >>
> >> >>
> >> >
> >> >
> >> > --
> >> > karthik
> >> >
> >> >
> >>
> >> --
> >> View this message in context:
> >> http://www.nabble.com/Replace-Text-Problem-%28Document-Corrupt%29---POI-HWPFDocument-tp24864855p24876942.html
> >>
> >>
> >
> >
> > --
> > karthik
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/Replace-Text-Problem-%28Document-Corrupt%29---POI-HWPFDocument-tp24864855p24885699.html
>
>

--
karthik
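
A note on the split-run problem described in this thread: when a paragraph is stored as several CharacterRuns, a search term that straddles a run boundary is never matched by a per-run text.contains() check, which is the check the SearchReplace code uses. The sketch below tries to illustrate that effect, and one possible way of at least locating such a term, using plain Java Strings in place of CharacterRun objects, so it makes no POI calls at all. The class name RunSearchSketch and both of its methods are hypothetical, made up for this illustration. It only reports which runs a match spans; it does not attempt the replacement and says nothing about the corruption seen when the lengths differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: each String stands in for the text of one
// CharacterRun in a Paragraph. No POI classes are used here.
final class RunSearchSketch {

    // Mimics the per-run check: returns the index of the first run whose own
    // text contains the term, or -1 if no single run contains it.
    static int findInSingleRun(List<String> runs, String term) {
        for (int i = 0; i < runs.size(); i++) {
            if (runs.get(i).contains(term)) {
                return i;
            }
        }
        return -1;
    }

    // Joins the run texts, searches the joined text, then maps the match back
    // to run indices. Returns {firstRun, lastRun}, or null if not found.
    static int[] findAcrossRuns(List<String> runs, String term) {
        if (term.isEmpty()) {
            return null;
        }
        StringBuilder joined = new StringBuilder();
        List<Integer> runStarts = new ArrayList<Integer>();
        for (String run : runs) {
            runStarts.add(joined.length());   // offset where this run begins
            joined.append(run);
        }
        int start = joined.indexOf(term);
        if (start < 0) {
            return null;
        }
        int end = start + term.length() - 1;  // offset of last matched char
        int first = -1;
        int last = -1;
        for (int i = 0; i < runs.size(); i++) {
            int runStart = runStarts.get(i);
            int runEnd = runStart + runs.get(i).length();  // exclusive
            if (start >= runStart && start < runEnd) first = i;
            if (end >= runStart && end < runEnd) last = i;
        }
        return new int[] {first, last};
    }

    public static void main(String[] args) {
        // "brown" is split between the first and second run.
        List<String> runs = Arrays.asList("The quick br", "own fox ", "jumps");
        System.out.println(findInSingleRun(runs, "brown"));                 // -1
        System.out.println(Arrays.toString(findAcrossRuns(runs, "brown"))); // [0, 1]
    }
}
```

With a real document the same idea would mean collecting the text of every CharacterRun in a Paragraph before searching. Whether text can then be replaced across the runs that are found without corrupting the file is a separate question that this sketch does not answer.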
