I would like to manipulate text at the word level in a docx file based upon 
neighboring words (1 word to the left, 1 word to the right). 
With a txt file this process is very simple. Now I would like to do the same 
with docx files and then later doc files.
I spent quite a bit of time searching for a reproducible example that solves 
this problem, but to no avail. I therefore thought of a possible hack to 
achieve this and would like some feedback.
Assuming each word has consistent styling:
1. Change the file extension from .docx to .zip.
2. Unzip the file.
3. Open the word folder and open the document.xml file (I assume this is 
   where all the content is?).
4. Ignore the content contained in "<...>" and concatenate the remaining 
   fragments, making sure to separate them with a space.
5. Split the single concatenated string into words, then map each word to its 
   desired form.
6. Go back through the original document.xml and change each word to its 
   mapped value.
7. Re-zip the files and change the file extension from .zip to .docx.
8. Finished.
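The steps above could be sketched in Python roughly as follows. This is only a minimal sketch, not a tested solution for real documents. It relies on two facts about the format: a .docx file is already a zip archive (so no renaming is needed), and the body text sits in <w:t> elements inside word/document.xml. The names rewrite_xml, rewrite_docx and transform are hypothetical, invented for this illustration; transform(left, word, right) stands for whatever word-level rule is wanted. It assumes, as stated above, that no word is split across differently styled runs, and it treats the last word of one paragraph and the first word of the next as neighbours.

```python
import html
import re
import zipfile
from xml.sax.saxutils import escape

# Matches a text element <w:t ...>text</w:t>; it does not match <w:tab/> or <w:tbl>.
W_T_RE = re.compile(r"(<w:t(?:\s[^>]*)?>)([^<]*)(</w:t>)")
WORD_RE = re.compile(r"\S+")


def rewrite_xml(xml, transform):
    """Return xml with every word inside <w:t> replaced by transform(left, word, right)."""
    # Pass 1: collect every word in document order.
    words = []
    for m in W_T_RE.finditer(xml):
        words.extend(WORD_RE.findall(html.unescape(m.group(2))))

    # Decide each replacement from the ORIGINAL left and right neighbours.
    replacements = []
    for i, word in enumerate(words):
        left = words[i - 1] if i > 0 else None
        right = words[i + 1] if i + 1 < len(words) else None
        replacements.append(transform(left, word, right))
    pending = iter(replacements)

    # Pass 2: substitute words in place, leaving tags and whitespace untouched.
    def fix_run(m):
        text = html.unescape(m.group(2))
        text = WORD_RE.sub(lambda _: next(pending), text)
        return m.group(1) + escape(text) + m.group(3)

    return W_T_RE.sub(fix_run, xml)


def rewrite_docx(src, dst, transform):
    """Copy the archive src to dst, rewriting only word/document.xml."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename == "word/document.xml":
                data = rewrite_xml(data.decode("utf-8"), transform).encode("utf-8")
            zout.writestr(info, data)
```

A call might look like rewrite_docx("in.docx", "out.docx", lambda left, word, right: word.upper() if left == "very" else word), where the file names and the rule are placeholders. Working on the raw XML text rather than re-serialising it with an XML library leaves the tags and namespace declarations exactly as Word wrote them. Text in headers, footers and footnotes is stored in other files inside the archive and is not touched by this sketch.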
To reiterate, I have yet to try this approach, as it's merely an idea. Hopefully 
someone can point me in the right direction. It's also not important in this 
case to separate titles from paragraphs, if that makes things easier.
Thanks                                    
