Ross wrote:

unzip -p File.odp content.xml | perl -p -e "s/<[^>]*>//g;s/ +//;"

There is some text in styles.xml as well (headers/footers), and meta information in the other XML files that may need to be extracted.

In some cases, the PPT will need to be imported with all the embedded objects converted to OpenDocument format. This is a setting in the MS compatibility preference pane, I think.

In that case, the unpacked file includes /Object/ directories that also contain content.xml etc.

If there are no embedded objects (such as Word documents or Excel spreadsheets), the top-level content.xml contains 99.99% of the extractable data.
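
Here is a rough sketch of how the one-liner could be extended to pick up the other XML files and any embedded objects in one pass. The function names strip_tags and odf_text are made up for illustration, and I have swapped sed in for perl only to keep it portable. It assumes unzip's default wildcard matching, where '*' also matches '/', and it does not decode XML entities or handle tags split across lines.

```shell
#!/bin/sh

# strip_tags (hypothetical helper): replace every XML tag on stdin with a
# space, collapse runs of spaces, and trim the ends of each line.
strip_tags() {
    sed -e 's/<[^>]*>/ /g' -e 's/  */ /g' -e 's/^ //' -e 's/ $//'
}

# odf_text (hypothetical helper): print the text of the top-level XML files
# plus any content.xml stored under a subdirectory (embedded objects).
# Members that do not exist only produce a warning on stderr, which is hidden.
odf_text() {
    unzip -p "$1" content.xml styles.xml meta.xml '*/content.xml' 2>/dev/null |
        strip_tags
}
```

After sourcing this, something like odf_text File.odp should print the text without unpacking the file first. Replacing each tag with a space rather than nothing keeps words from adjacent paragraphs from running together.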

JC

should work without needing to unpack first.
