Mike Shaver wrote:
> On 8/22/05, Dennis Jenkins <[EMAIL PROTECTED]> wrote:
> > I very much disagree. I want the entire file, header included, to be
> > encrypted. Sometimes you don't want anyone to know what the file type
> > is. Security through obscurity is not secure. However, you don't want
> > to give the bad guys a road map either...
>
> Finding out that it's a sqlite file is not a hard problem for an
> attacker who has any interesting access to your machine, since your
> programs must find that file somehow. Once they find it, are you not
> concerned about lightening their cryptanalysis burden through
> known-plaintext attacks?
>
> Mike
No, not really. The sqlite crypto engine consumes the first several
hundred bytes of the RC4 keystream (the cipher's pseudo-random output).
It is my understanding that this would significantly complicate a
known-plaintext attack. But I'm not a cryptologist. I do find it
fascinating though.
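
To make the "consume the first several hundred bytes" idea concrete, here is a minimal sketch of textbook RC4 with a drop count, assuming "consumes" means those bytes are generated and thrown away. The function name and the drop value of 256 are my own assumptions for illustration; this is not taken from the sqlite crypto engine's source.

```python
def rc4_keystream(key: bytes, n: int, drop: int = 0) -> bytes:
    """Return n RC4 keystream bytes, after discarding the first `drop` bytes."""
    if not key:
        raise ValueError("key must not be empty")

    # Key-scheduling algorithm: permute the 256-entry state using the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]

    # Output generation: each step swaps two state entries and emits one byte.
    i = j = 0
    out = bytearray()
    for count in range(drop + n):
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        if count >= drop:  # bytes before this point are generated but not used
            out.append(s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


# Hypothetical usage: skip 256 bytes, then take 16 bytes to XOR with data.
pad = rc4_keystream(b"example key", 16, drop=256)
```

The earliest output bytes are the ones most closely tied to the key setup, which is the usual reason given for skipping them.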
I do not understand how "finding the file" would give the attackers any
clue to what kind of file it is (unless I make the filename something
like "sqlite3.db3"). If the file were named "jimbob.dat", and the
contents looked like gibberish, then what do they know? They must
analyze the program that accesses the file.
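
For comparison, this is roughly all it takes to recognise an unencrypted sqlite3 file, which is the check that an encrypted header defeats. An sqlite3 database starts with the 16-byte string "SQLite format 3" followed by a NUL byte; the function name below is hypothetical.

```python
SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of a plain sqlite3 file


def looks_like_plain_sqlite(first_bytes: bytes) -> bool:
    """True if the buffer starts with the unencrypted sqlite3 header string."""
    return first_bytes[:len(SQLITE_MAGIC)] == SQLITE_MAGIC


# A plain database is recognised at once; scrambled bytes are not.
plain = SQLITE_MAGIC + bytes(84)
scrambled = bytes(b ^ 0x5A for b in plain)
print(looks_like_plain_sqlite(plain), looks_like_plain_sqlite(scrambled))
```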
I once thought that I could remove all text strings from the sqlite code
that would give the attacker any clues. I then realized that the
strings are important to the proper functioning. The ones that need to
be left behind are significant enough to be good clues that the program
uses sqlite technology. So, I do agree with you, that it is not too
difficult to determine if a data file _might_ be an sqlite database,
even if it is encrypted.
That being said, I still like having the header encrypted as it is.
Maybe it just makes me feel warm and fuzzy on the inside :)
In the end, I feel that our software is much more vulnerable to someone
attacking it with a debugger than with cryptanalytic attacks. At some
point, you must call "sqlite3_key()" and pass it three things: the
sqlite handle, a void* to the key initializer and an "int" (# of bytes
in the key). All the attacker has to do is locate that code and
determine what those last two arguments are. Personally, I find this to
be an easier approach. But then, I've been coding in assembly since I
was 8 and C for the last 10 years. I'm not much of a mathematician or
code breaker.
I have often wondered how difficult it would be to derive the rc4
initialization key given a known plain text and a known cipher text
generated from the unknown key and known plain text. I imagine it as a
breadth-first search of the key space.
Let's say that it is computationally feasible to do just that. The
sqlite header string is.. um, heck, I don't know, let's say 20 bytes.
Then you can derive the exact values for at most 20 values of the key
state vector (it might be less if a value gets mutated more than once).
What do you know about the remaining bytes of the first 256 bytes of the
sqlite file? Some of those bytes have "sane" values or other
constraints. I think that it would be too difficult to fully derive the
key b/c you don't know much of the plain text.
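
Here is a small sketch of what a known header actually hands the attacker, assuming the file is simply XORed with the RC4 keystream starting at offset 0 (that layout, the key, and the drop count of 256 are my assumptions, not a description of the real engine). The header string is in fact 16 bytes, not 20. XORing it with the ciphertext gives back 16 keystream bytes, which are outputs of the generator and not the key or the state vector themselves.

```python
def rc4_keystream(key: bytes, n: int, drop: int = 0) -> bytes:
    """Textbook RC4: return n keystream bytes after skipping `drop` bytes."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for count in range(drop + n):
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        if count >= drop:
            out.append(s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, stopping at the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


HEADER = b"SQLite format 3\x00"       # the plaintext the attacker can guess
secret_key = b"hypothetical key"      # unknown to the attacker
plaintext = HEADER + b"rest of the page, unknown to the attacker"

keystream = rc4_keystream(secret_key, len(plaintext), drop=256)
ciphertext = xor_bytes(plaintext, keystream)

# Attacker's side: only `ciphertext` and the guessed HEADER are used here.
recovered = xor_bytes(ciphertext[:len(HEADER)], HEADER)
```

Getting from those 16 recovered bytes back to the key is the hard part that the search I described would have to solve.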
This is the extent of what I know about rc4. If someone else knows
more, please enlighten me. :)