Re: Extract text office 2010

jfmnews Tue, 29 Mar 2011 02:24:18 -0700


Here is the Office 2010 text box.
There are some new tags:
   <mc:AlternateContent>
        <mc:Choice Requires="wps">
        </mc:Choice>
        <mc:Fallback>
        </mc:Fallback>
   </mc:AlternateContent>
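The `mc:AlternateContent` wrapper comes from the Markup Compatibility part of Office Open XML (ECMA-376 Part 3). A consumer picks the first `mc:Choice` whose `Requires` prefixes all map to namespaces it understands, and otherwise uses `mc:Fallback`. The following is a rough sketch of that selection rule using the JDK's DOM API. The class name and the set of "understood" namespace URIs are hypothetical, and it ignores other Markup Compatibility features such as `mc:Ignorable`.

```java
import java.util.Set;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/**
 * Hypothetical helper: picks the branch of an mc:AlternateContent element
 * that a consumer understanding only the given namespaces should read.
 * Requires a namespace-aware DOM (DocumentBuilderFactory.setNamespaceAware(true)).
 */
class AlternateContentResolver {
    static final String MC_NS =
        "http://schemas.openxmlformats.org/markup-compatibility/2006";

    /** Returns the first satisfiable mc:Choice, else mc:Fallback, else null. */
    static Element resolve(Element alternateContent, Set<String> understoodNamespaces) {
        Element fallback = null;
        for (Node n = alternateContent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element)) continue;           // skip whitespace text nodes
            Element child = (Element) n;
            if (!MC_NS.equals(child.getNamespaceURI())) continue;
            if ("Choice".equals(child.getLocalName())) {
                if (requirementsMet(child, understoodNamespaces)) return child;
            } else if ("Fallback".equals(child.getLocalName()) && fallback == null) {
                fallback = child;
            }
        }
        return fallback;
    }

    /** Requires="wps" lists prefixes; each must resolve to a namespace we understand. */
    private static boolean requirementsMet(Element choice, Set<String> understood) {
        String requires = choice.getAttribute("Requires").trim();
        if (requires.isEmpty()) return false;
        for (String prefix : requires.split("\\s+")) {
            String uri = choice.lookupNamespaceURI(prefix); // walks up to the xmlns:wps declaration
            if (uri == null || !understood.contains(uri)) return false;
        }
        return true;
    }
}
```

An extractor that only knows the 2007 vocabulary would end up in the `mc:Fallback` (VML) branch; one that understands the `wps` namespace would read the `mc:Choice` (DrawingML) branch.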



[Code]
<w:p w:rsidR="00423106" w:rsidRDefault="00733140">
    <w:r>
        <mc:AlternateContent>
            <mc:Choice Requires="wps">
                <w:drawing>
                    <wp:anchor ...>
                        ...
                        <a:graphic>
                            <a:graphicData uri="http://schemas.microsoft.com/office/word/2010/wordprocessingShape">
                                <wps:wsp>
                                    ...
                                    <wps:txbx>
                                        <w:txbxContent>
                                            <w:p w:rsidR="00423106" w:rsidRPr="00423106" w:rsidRDefault="00423106">
                                                <w:r>
                                                    <w:t>Togodo</w:t>
                                                </w:r>
                                            </w:p>
                                        </w:txbxContent>
                                    </wps:txbx>
                                    ...
                                </wps:wsp>
                            </a:graphicData>
                        </a:graphic>
                    </wp:anchor>
                </w:drawing>
            </mc:Choice>
            <mc:Fallback>
                <w:pict>
                    ...
                    <v:shape id="Text Box 2" o:spid="_x0000_s1026" type="#_x0000_t202" style="position:absolute;margin-left:0;...">
                        <v:textbox style="mso-fit-shape-to-text:t">
                            <w:txbxContent>
                                <w:p w:rsidR="00423106" w:rsidRPr="00423106" w:rsidRDefault="00423106">
                                    <w:r>
                                        <w:t>Togodo</w:t>
                                    </w:r>
                                </w:p>
                            </w:txbxContent>
                        </v:textbox>
                    </v:shape>
                </w:pict>
            </mc:Fallback>
        </mc:AlternateContent>
    </w:r>
</w:p>
[/Code]
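In the Word 2010 markup above, the text box content appears twice: once under `mc:Choice` (DrawingML, `wps:txbx`) and once under `mc:Fallback` (VML, `v:textbox`), and both copies hold the same `w:txbxContent`. Here is a tentative Java sketch that reads text box text straight from `word/document.xml`. It collects `w:t` text from every `w:txbxContent` that is not nested under `mc:Fallback`, assuming the `mc:Choice` copy is always present. The class and method names are hypothetical; this is not POI's extractor, and it ignores tabs, breaks and nested text boxes.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helper: pulls text box paragraphs out of a document.xml part. */
class TextBoxText {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006";

    /** Returns one string per w:p inside a w:txbxContent, skipping mc:Fallback copies. */
    static List<String> extract(byte[] documentXml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // required for the *NS lookups below
        Document doc = f.newDocumentBuilder().parse(new ByteArrayInputStream(documentXml));

        List<String> paragraphs = new ArrayList<>();
        NodeList boxes = doc.getElementsByTagNameNS(W_NS, "txbxContent");
        for (int i = 0; i < boxes.getLength(); i++) {
            Element box = (Element) boxes.item(i);
            if (insideFallback(box)) continue; // same text was already taken from mc:Choice
            NodeList paras = box.getElementsByTagNameNS(W_NS, "p");
            for (int j = 0; j < paras.getLength(); j++) {
                // Concatenate all w:t runs of the paragraph; w:proofErr markers carry no text
                StringBuilder sb = new StringBuilder();
                NodeList texts = ((Element) paras.item(j)).getElementsByTagNameNS(W_NS, "t");
                for (int k = 0; k < texts.getLength(); k++) {
                    sb.append(texts.item(k).getTextContent());
                }
                paragraphs.add(sb.toString());
            }
        }
        return paragraphs;
    }

    /** True if any ancestor of n is an mc:Fallback element. */
    private static boolean insideFallback(Node n) {
        for (Node p = n.getParentNode(); p != null; p = p.getParentNode()) {
            if (MC_NS.equals(p.getNamespaceURI()) && "Fallback".equals(p.getLocalName())) {
                return true;
            }
        }
        return false;
    }
}
```

Without the `mc:Fallback` check, a naive walk over every `w:txbxContent` would report "Togodo" twice for the 2010 file. The 2007 layout (a bare `w:pict` with no `mc:AlternateContent`) passes through unchanged.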


Here is the Office 2007 text box.
[Code]
<w:p w:rsidR="00423106" w:rsidRDefault="00423106">
    <w:r w:rsidRPr="00FB2EC2">
        ...
        <w:pict>
            ...
            <v:shape id="_x0000_s1026" type="#_x0000_t202" style="position:absolute;margin-left:...">
                <v:textbox style="mso-fit-shape-to-text:t">
                    <w:txbxContent>
                        <w:p w:rsidR="00423106" w:rsidRPr="00423106" w:rsidRDefault="00423106">
                            <w:proofErr w:type="spellStart" />
                            <w:r>
                                <w:t>Togodo</w:t>
                            </w:r>
                            <w:proofErr w:type="spellEnd" />
                        </w:p>
                    </w:txbxContent>
                </v:textbox>
            </v:shape>
        </w:pict>
    </w:r>
</w:p>
[/Code]
----- Original Message -----
From: "Nick Burch" <[email protected]>
To: "POI Users List" <[email protected]>
Sent: Monday, 28 March 2011 20:46:49 GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna
Subject: Re: Extract text office 2010

On Fri, 25 Mar 2011, [email protected] wrote:
> Can POI 3.7 extract text from an Office 2010 document?

Generally it ought to be able to, but there's no explicit support for any
new 2010 features that go beyond what 2007 did.

> I can extract the text of the 2007 .docx, but not all of the text of
> the Word 2010 .docx: the text of the text box is missing.

Can you identify how the XML differs? I'd suggest you try unzipping the
two .docx files (they're a zip of XML) and see if you can spot what's done
differently for the text boxes.
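Any zip tool will do for this. As a rough sketch in Java (the language POI itself uses), `java.util.zip.ZipFile` can dump the main document part of each file so the two versions can be compared side by side. The class name is hypothetical. The sketch assumes the main part sits at its usual location `word/document.xml` (strictly, the package relationships in `_rels/.rels` name it) and that it is encoded as UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper for peeking inside a .docx package. */
class DocxPeek {
    /** Reads one part (e.g. "word/document.xml") from a .docx and returns it as UTF-8 text. */
    static String readPart(String docxPath, String partName) throws IOException {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry(partName);
            if (entry == null) {
                throw new IOException("No part " + partName + " in " + docxPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                // Copy the entry into memory; parts of a typical document are small
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Print the main document part of every .docx given on the command line
        for (String path : args) {
            System.out.println("=== " + path);
            System.out.println(readPart(path, "word/document.xml"));
        }
    }
}
```

For example, `java DocxPeek word2007.docx word2010.docx` (placeholder file names) prints both parts so the `w:pict` and `mc:AlternateContent` layouts can be compared.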

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

