Here is the Office 2010 textbox.
There are some new tags:
<mc:AlternateContent>
<mc:Choice Requires="wps">
</mc:Choice>
<mc:Fallback>
</mc:Fallback>
</mc:AlternateContent>
[Code]
<w:p w:rsidR="00423106" w:rsidRDefault="00733140">
  <w:r>
    <mc:AlternateContent>
      <mc:Choice Requires="wps">
        <w:drawing>
          <wp:anchor ...>
            ...
            <a:graphic>
              <a:graphicData uri="http://schemas.microsoft.com/office/word/2010/wordprocessingShape">
                <wps:wsp>
                  ...
                  <wps:txbx>
                    <w:txbxContent>
                      <w:p w:rsidR="00423106" w:rsidRPr="00423106" w:rsidRDefault="00423106">
                        <w:r>
                          <w:t>Togodo</w:t>
                        </w:r>
                      </w:p>
                    </w:txbxContent>
                  </wps:txbx>
                  ...
                </wps:wsp>
              </a:graphicData>
            </a:graphic>
          </wp:anchor>
        </w:drawing>
      </mc:Choice>
      <mc:Fallback>
        <w:pict>
          ...
          <v:shape id="Text Box 2" o:spid="_x0000_s1026" type="#_x0000_t202"
                   style="position:absolute;margin-left:0;...">
            <v:textbox style="mso-fit-shape-to-text:t">
              <w:txbxContent>
                <w:p w:rsidR="00423106" w:rsidRPr="00423106" w:rsidRDefault="00423106">
                  <w:r>
                    <w:t>Togodo</w:t>
                  </w:r>
                </w:p>
              </w:txbxContent>
            </v:textbox>
          </v:shape>
        </w:pict>
      </mc:Fallback>
    </mc:AlternateContent>
  </w:r>
</w:p>
[/Code]
And here is the Office 2007 textbox, which uses only the w:pict (VML) form:
[Code]
<w:p w:rsidR="00423106" w:rsidRDefault="00423106">
  <w:r w:rsidRPr="00FB2EC2">
    ...
    <w:pict>
      ...
      <v:shape id="_x0000_s1026" type="#_x0000_t202"
               style="position:absolute;margin-left:...">
        <v:textbox style="mso-fit-shape-to-text:t">
          <w:txbxContent>
            <w:p w:rsidR="00423106" w:rsidRPr="00423106" w:rsidRDefault="00423106">
              <w:proofErr w:type="spellStart" />
              <w:r>
                <w:t>Togodo</w:t>
              </w:r>
              <w:proofErr w:type="spellEnd" />
            </w:p>
          </w:txbxContent>
        </v:textbox>
      </v:shape>
    </w:pict>
  </w:r>
</w:p>
[/Code]
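In case it helps, here is a rough sketch of how the text box text could be read from either form using only the JDK's XML classes, not POI's own API. It collects the w:t text under each w:txbxContent and skips anything inside mc:Fallback, so the 2010 markup does not yield the same text twice. The class and method names (TextBoxTextSketch, textBoxText) are made up for illustration, and the sketch assumes that mc:Choice always carries the text box content; a document whose text only appears in the fallback would need different handling.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of POI: plain DOM parsing of document.xml content.
class TextBoxTextSketch {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006";

    // Returns one string per w:txbxContent found outside mc:Fallback.
    static List<String> textBoxText(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for the *NS lookups below
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        List<String> result = new ArrayList<>();
        NodeList boxes = doc.getElementsByTagNameNS(W_NS, "txbxContent");
        for (int i = 0; i < boxes.getLength(); i++) {
            Element box = (Element) boxes.item(i);
            if (isInsideFallback(box)) {
                continue; // assumption: the mc:Choice branch holds the same text
            }
            StringBuilder text = new StringBuilder();
            NodeList runs = box.getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < runs.getLength(); j++) {
                text.append(runs.item(j).getTextContent());
            }
            result.add(text.toString());
        }
        return result;
    }

    // True if any ancestor of the node is an mc:Fallback element.
    static boolean isInsideFallback(Node node) {
        for (Node n = node.getParentNode(); n != null; n = n.getParentNode()) {
            if (MC_NS.equals(n.getNamespaceURI()) && "Fallback".equals(n.getLocalName())) {
                return true;
            }
        }
        return false;
    }
}
```

With the 2007 markup there is no mc:Fallback ancestor, so the w:pict text box is read directly; with the 2010 markup only the copy under mc:Choice is kept.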
----- Original Message -----
From: "Nick Burch" <[email protected]>
To: "POI Users List" <[email protected]>
Sent: Monday, 28 March 2011 20:46:49 GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna
Subject: Re: Extract text office 2010
On Fri, 25 Mar 2011, [email protected] wrote:
> Can POI 3.7 extract text from an Office 2010 document?
Generally it ought to be able to, but there's no explicit support for any
new 2010 features that go beyond what 2007 did.
> I can extract the text of the 2007 docx, but not all of the text of
> the Word 2010 docx: the text of the textbox is missing.
Can you identify how the xml differs? I'd suggest you try unzipping the
two .docx files (they're a zip of xml) and see if you can see what's done
differently for the text boxes.
Nick
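
As a rough illustration of the unzipping step, the main document part can also be pulled out of a .docx with java.util.zip instead of an unzip tool. The class and method names (DocxXmlDump, readDocumentXml) are hypothetical, and the sketch assumes the part is stored at word/document.xml and encoded as UTF-8, which is what Word normally writes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: reads the main document part of a .docx as text.
class DocxXmlDump {
    static String readDocumentXml(String docxPath) throws IOException {
        try (ZipFile zip = new ZipFile(docxPath)) {
            // Assumption: the usual part name for the main document
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("no word/document.xml in " + docxPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
                // Assumption: UTF-8, as Word normally writes it
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        }
    }
}
```

Running this on the 2007 and 2010 files and diffing the two strings is one way to spot where the text box markup differs.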
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
