Re: [users] Extract text from multiple Powerpoint documents?

JC Helary Tue, 31 Jan 2006 22:55:51 -0800

Ross wrote:

unzip -p File.odp content.xml | perl -p -e "s/<[^>]*>//g;s/ +//;"
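Roughly the same idea can be sketched in Python with only the standard library, for anyone without unzip/perl at hand. This sketch is not from the thread. It reads content.xml straight out of the zip container, so nothing is unpacked to disk. It turns closing paragraph/heading tags into line breaks, strips every remaining tag with the same <[^>]*> pattern, and decodes entities such as &amp;. The function names are made up for illustration. The code also assumes the usual text: prefix that OpenOffice.org writes.

```python
import html
import re
import sys
import zipfile

# Assumption: paragraphs/headings use the conventional "text:" prefix.
# Their closing tags become newlines so text from a single-line
# content.xml does not run together.
PARA_END = re.compile(r"</text:(?:p|h)>")
TAG = re.compile(r"<[^>]*>")  # same pattern as the perl s/<[^>]*>//g


def strip_odf_markup(xml_text: str) -> str:
    """Crude tag stripping in the spirit of the unzip | perl one-liner."""
    text = PARA_END.sub("\n", xml_text)
    text = TAG.sub("", text)
    text = html.unescape(text)  # &amp; -> &, &lt; -> <, etc.
    lines = (line.strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)


def odp_content_text(path: str) -> str:
    """Read content.xml directly from the .odp zip, no unpacking to disk."""
    with zipfile.ZipFile(path) as zf:
        return strip_odf_markup(zf.read("content.xml").decode("utf-8"))


if __name__ == "__main__":
    # Several files at once, e.g.: python odp_text.py *.odp
    for name in sys.argv[1:]:
        print(f"== {name}")
        print(odp_content_text(name))
```

Regex stripping stays crude. It only looks at content.xml, and a real XML parser is the safer route if the output matters.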

There is some text in styles.xml as well (headers/footers), and meta information that may need to be extracted from the other XML files (meta.xml, for instance).
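To pick up styles.xml and meta.xml as well as content.xml, a sketch using Python's xml.etree.ElementTree could look like this. It assumes ODF 1.x namespaces, where paragraphs are text:p and headings are text:h in urn:oasis:names:tc:opendocument:xmlns:text:1.0. The function names are hypothetical. For meta.xml it simply keeps every non-empty element text. That includes the generator string and dates, not just title and keywords.

```python
import xml.etree.ElementTree as ET
import zipfile

# Assumed ODF 1.x text namespace: paragraphs are text:p, headings are text:h.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
PARAGRAPH_TAGS = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}


def paragraphs(xml_bytes: bytes) -> list[str]:
    """Text of each text:p / text:h element; itertext() also pulls in nested spans."""
    root = ET.fromstring(xml_bytes)
    out = []
    for el in root.iter():
        if el.tag in PARAGRAPH_TAGS:
            text = "".join(el.itertext()).strip()
            if text:
                out.append(text)
    return out


def element_texts(xml_bytes: bytes) -> list[str]:
    """Every non-empty element text; for meta.xml this yields title, keywords, dates, etc."""
    root = ET.fromstring(xml_bytes)
    return [el.text.strip() for el in root.iter() if el.text and el.text.strip()]


def odf_text_by_member(path: str,
                       members=("content.xml", "styles.xml", "meta.xml")) -> dict[str, list[str]]:
    """Collect text from several XML members of one ODF file, skipping absent ones."""
    result = {}
    with zipfile.ZipFile(path) as zf:
        present = set(zf.namelist())
        for name in members:
            if name not in present:
                continue
            data = zf.read(name)
            # meta.xml has no text:p paragraphs, so fall back to raw element text.
            result[name] = element_texts(data) if name == "meta.xml" else paragraphs(data)
    return result
```

odf_text_by_member("talk.odp")["styles.xml"] would then return whatever paragraphs styles.xml carries (such as the header/footer text) separately from the slide body.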

In some cases, the PPT will need to be imported with all the embedded objects converted to OpenDocument format. This is a setting in the Microsoft Office compatibility preference pane, I think.

In that case, the unpacked file includes /Object/ directories that also contain content.xml, etc.
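Embedded objects can be covered by reading every content.xml in the archive rather than only the top-level one. This sketch assumes each converted object sits in its own sub-directory. OpenOffice.org typically names these like "Object 1/", but the code only relies on the path ending in /content.xml. It also assumes text sits in text:p/text:h elements. An embedded spreadsheet keeps its cell text in text:p, so that text gets picked up too. Function names are illustrative.

```python
import xml.etree.ElementTree as ET
import zipfile

# Assumed ODF 1.x text namespace, as written by OpenOffice.org 2.x.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
PARAGRAPH_TAGS = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}


def content_members(names: list[str]) -> list[str]:
    """Top-level content.xml first, then the content.xml of every
    sub-directory (embedded objects such as "Object 1/content.xml")."""
    top = ["content.xml"] if "content.xml" in names else []
    embedded = sorted(n for n in names if n.endswith("/content.xml"))
    return top + embedded


def paragraph_texts(xml_bytes: bytes) -> list[str]:
    # Join each paragraph's text (including nested spans) and drop empty ones.
    root = ET.fromstring(xml_bytes)
    texts = ("".join(el.itertext()).strip() for el in root.iter() if el.tag in PARAGRAPH_TAGS)
    return [t for t in texts if t]


def all_content_text(path: str) -> dict[str, list[str]]:
    """Map each content.xml in the package (including embedded ones) to its paragraphs."""
    with zipfile.ZipFile(path) as zf:
        return {name: paragraph_texts(zf.read(name)) for name in content_members(zf.namelist())}
```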

If there are no embedded objects (such as Word documents or Excel spreadsheets), the top-level content.xml contains 99.99% of the extractable data.

JC

Ross's unzip -p one-liner should work without needing to unpack the file first.

