2014/1/10 Tom Davies <[email protected]>

> I tried opening it with gedit (a lot like Notepad), but the first 2
> letters were not PK. Then I checked a different ODT that someone
> else sent me earlier, and that one did start with PK.  I'm not convinced
> about the whole PK thing, but it's interesting.
>

As we all know, ODT files are ZIP files. "PK" happens to be found at the
beginning of almost all ZIP files.
In this document
(http://www.pkware.com/documents/APPNOTE/APPNOTE-6.2.0.txt VI.A.)
we can see that every file entry starts with the following header
signature: 0x04034b50, which is stored little-endian as the bytes
50 4b 03 04 and so reads as "PK..". So every compressed file in a ZIP
archive starts with PK.
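
As a quick illustration of the endianness point, here is a minimal Python
sketch. The function name starts_with_pk is made up for this example; the
only thing taken from the APPNOTE is the signature value itself.

```python
import struct

# 0x04034b50 written as a little-endian 32-bit integer gives the bytes
# 50 4b 03 04, which display as "PK" followed by two control characters.
LOCAL_HEADER_SIG = struct.pack("<I", 0x04034B50)


def starts_with_pk(data: bytes) -> bool:
    """Return True if the buffer begins with a ZIP local file header signature."""
    return data[:4] == LOCAL_HEADER_SIG


if __name__ == "__main__":
    print(LOCAL_HEADER_SIG)  # b'PK\x03\x04'
```

Running this check on the first 4 bytes of an .odt file only tells you
whether the very first entry header is intact, nothing more.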

This also means that if a file is somewhat corrupted, looking for this
signature and checking that the following bytes make a correct header allows
one to recover files. For example, if you find the byte sequence 504b0304
followed 22 bytes later by a 2-byte integer (the filename length), 2 more
bytes (the extra field length), then a filename, you can recover that file.
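
The scan described above could be sketched like this in Python. This is
only a rough sketch under my own assumptions: the name find_local_headers
and the plausibility checks (non-empty filename, method 0 or 8) are mine,
not something the APPNOTE prescribes.

```python
import struct

SIGNATURE = b"PK\x03\x04"
# Local file header layout from the APPNOTE: signature, version, flags,
# method, mod time, mod date, CRC-32, compressed size, uncompressed size,
# filename length, extra field length. 30 bytes in total.
LOCAL_HEADER = struct.Struct("<4s5H3L2H")


def find_local_headers(data: bytes):
    """Yield (offset, filename, method, compressed_size, data_start)
    for every plausible local file header found in the buffer."""
    pos = data.find(SIGNATURE)
    while pos != -1:
        if pos + LOCAL_HEADER.size <= len(data):
            fields = LOCAL_HEADER.unpack_from(data, pos)
            method, csize = fields[3], fields[7]
            name_len, extra_len = fields[9], fields[10]
            name_start = pos + LOCAL_HEADER.size
            name_end = name_start + name_len
            # Assumed plausibility checks: a non-empty name that fits in
            # the buffer, and a method of 0 (stored) or 8 (deflate).
            if name_len > 0 and name_end <= len(data) and method in (0, 8):
                name = data[name_start:name_end].decode("utf-8", "replace")
                yield pos, name, method, csize, name_end + extra_len
        pos = data.find(SIGNATURE, pos + 1)
```

Note that the compressed size in the header can be 0 when the writer used
a trailing data descriptor, so a real tool would need a fallback for that
case (for example, reading up to the next signature).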

As we know, ODT files are made of multiple files, some more important than
others. Losing the manifest, for example, is not a big issue, so we can
recover some ODT files with this knowledge: identify the files in the ZIP
structure, then check that we have the "important" parts.
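
Checking for the "important" parts could look roughly like the sketch
below. Which entries count as essential is my assumption: I treat only
content.xml (the document body) as mandatory and everything else as nice
to have. The names ESSENTIAL, NICE_TO_HAVE and triage are hypothetical.

```python
# Assumption: only the document body is strictly needed to get the text back.
ESSENTIAL = {"content.xml"}
# Assumption: these are useful but a document can be rebuilt without them.
NICE_TO_HAVE = {
    "mimetype",
    "styles.xml",
    "meta.xml",
    "settings.xml",
    "META-INF/manifest.xml",
    "Thumbnails/thumbnail.png",
}


def triage(recovered_names):
    """Summarise which expected ODT parts were found among recovered entries."""
    found = set(recovered_names)
    return {
        "recoverable": ESSENTIAL <= found,
        "missing_essential": sorted(ESSENTIAL - found),
        "missing_optional": sorted(NICE_TO_HAVE - found),
    }
```

For example, triage(["content.xml", "styles.xml"]) reports the document as
recoverable and lists the manifest, mimetype, and so on as missing optional
parts.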


>
> I haven't yet tried finding a tool for fixing zip files.  It might be
> worth testing on a copy of a couple of files.  There might be an ODT
> fixing tool around the internet somewhere too.
>

I don't know if such a tool exists for ODF, but it might be worth making one
based on my previous rant. In the case of minor corruption (which was *not*
the case for the OP here), retrieving the document content and possibly
losing stuff like status bar and toolbar settings, the document thumbnail,
or the initial mimetype info (which takes up 77 bytes at the beginning of
the file, enough for it to get corrupted!) is probably an acceptable
tradeoff. Even losing some content (like pictures) might be better than
losing the whole text.
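
For the curious, the 77 bytes can be checked with a little arithmetic.
This sketch assumes the mimetype entry comes first, is stored uncompressed,
and has no extra field; the variable names are mine.

```python
import struct

# 30-byte ZIP local file header (same layout as in the APPNOTE).
LOCAL_HEADER = struct.Struct("<4s5H3L2H")
ENTRY_NAME = b"mimetype"
ODT_MIMETYPE = b"application/vnd.oasis.opendocument.text"

# Header + filename + uncompressed content of the first entry.
first_entry_size = LOCAL_HEADER.size + len(ENTRY_NAME) + len(ODT_MIMETYPE)

if __name__ == "__main__":
    print(first_entry_size)  # 77
```

Other ODF types (spreadsheets, presentations) have a different mimetype
string, so the figure differs slightly for them.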

