All,
  Over on Tika, I recently added an experimental SAX parser to process the 
document.xml component within .docx files.  That parser lets the user choose 
whether or not to include text within "moveFrom" regions.  Has anyone found 
a way to do the same with .doc files?
  A test document is available here [1].  If we hide the "moveFrom" run, we 
won't see "second paragraph here" twice.
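
  For anyone who hasn't looked at the .docx side, here is a rough sketch of 
the idea.  It is not the actual Tika code: it is written in Python for 
brevity, the class and function names (MoveFromAwareHandler, extract_text) 
and the sample XML are made up for illustration, and it assumes the moved 
text sits in w:t elements nested inside a w:moveFrom element.

```python
import io
import xml.sax
import xml.sax.handler

# WordprocessingML main namespace used by document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


class MoveFromAwareHandler(xml.sax.ContentHandler):
    """Hypothetical handler: collects w:t text, optionally skipping w:moveFrom."""

    def __init__(self, include_move_from=False):
        super().__init__()
        self.include_move_from = include_move_from
        self.move_from_depth = 0  # > 0 while inside a w:moveFrom element
        self.in_text = False      # True while inside a w:t element
        self.current = []         # text pieces of the paragraph being read
        self.paragraphs = []      # one string per finished w:p

    def startElementNS(self, name, qname, attrs):
        uri, local = name
        if uri != W_NS:
            return
        if local == "moveFrom":
            self.move_from_depth += 1
        elif local == "t":
            self.in_text = True
        elif local == "p":
            self.current = []

    def endElementNS(self, name, qname):
        uri, local = name
        if uri != W_NS:
            return
        if local == "moveFrom":
            self.move_from_depth -= 1
        elif local == "t":
            self.in_text = False
        elif local == "p":
            self.paragraphs.append("".join(self.current))

    def characters(self, content):
        # Keep text only if it is in a w:t and not hidden by a moveFrom region.
        if self.in_text and (self.include_move_from or self.move_from_depth == 0):
            self.current.append(content)


def extract_text(xml_bytes, include_move_from=False):
    """Parse document.xml bytes and return a list of paragraph strings."""
    handler = MoveFromAwareHandler(include_move_from)
    parser = xml.sax.make_parser()
    parser.setFeature(xml.sax.handler.feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.paragraphs


# Made-up sample: one paragraph moved from position 2 to position 3.
SAMPLE = (
    b'<w:document xmlns:w="' + W_NS.encode("ascii") + b'"><w:body>'
    b'<w:p><w:r><w:t>first paragraph</w:t></w:r></w:p>'
    b'<w:p><w:moveFrom w:id="1"><w:r><w:t>second paragraph here</w:t></w:r>'
    b'</w:moveFrom></w:p>'
    b'<w:p><w:moveTo w:id="2"><w:r><w:t>second paragraph here</w:t></w:r>'
    b'</w:moveTo></w:p>'
    b'</w:body></w:document>'
)

if __name__ == "__main__":
    print(extract_text(SAMPLE, include_move_from=False))
    print(extract_text(SAMPLE, include_move_from=True))
```

  With include_move_from=False the moved paragraph's text is kept only at its 
destination; with True it shows up at both the old and new positions, which 
is the duplication described above.  What I'm missing is the equivalent 
signal in the binary .doc format.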
  Thank you!

           Cheers,

                       Tim


[1] https://git-wip-us.apache.org/repos/asf?p=tika.git;a=blob;f=tika-parsers/src/test/resources/test-documents/testWORD_2006ml.doc;h=c8f509aea483006d40de9c2970df7988ff058b51;hb=fe20ecd83ea43e5ec6ad0e9fded9d803cb011251
