Still I don't see a reason for that situation to become a real problem:

> * Does this actually break couchdb? Ie would it be impossible to
> upload two different attachments with the same MD5?
> * To the same document?

I'm not sure on this one, but since attachments are "namespaced" by
their name, which has to be unique per document, I don't see a problem
with them having the same MD5 sum.

> * To different documents?

Different documents are "namespaced" by their ID, so no problem here.

> * Are there any other implications? Would replication get
> confused?

AFAIK replication only looks at _id and _rev.

On 23.09.2010, at 10:13, Paul Hirst wrote:

> On Thu, 2010-09-23 at 08:54 +0100, Sebastian Cohnen wrote:
>> The collision probability is quite low. MD5 is considered to be broken
>> from a cryptographical point of view - an attacker can craft a file
>> that has the exact same hash of another one. I would doubt that you
>> are going to encounter a collision in practice on "normal" usage.
>
> I really do have two crafted files with the same MD5 that I'd like to
> store in CouchDB. They are proof of concept Windows executables and they
> just happen to live in the set of files I'd like to store in Couch. It's
> just 2 out of many millions of files but I'd really value an opinion on
> if anything will break and in what way.
>
> I'll admit, this is an unusual use case.
>
> I want to use CouchDB to store files and metadata about files relating
> to vulnerabilities, exploits, malware, etc. I could decide to throw away
> these proof of concept files because they aren't actually that
> interesting but there is a good chance the database I want to build
> would see more of them in future.
>
> Obviously, under normal usage this sort of thing would never be a
> problem.
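
To make the namespacing argument in the reply concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: `ToyStore`, `weak_digest` and `find_collision` are hypothetical names invented for illustration, the store is a plain dictionary rather than anything resembling CouchDB's storage engine, and the digest is MD5 truncated to 8 bits so that a collision can be found instantly (a stand-in for the crafted files). It shows only that lookups keyed by document ID and attachment name are unaffected by equal digests; it does not say whether CouchDB itself uses the MD5 for anything beyond metadata, which the reply leaves open.

```python
import hashlib
from itertools import count


def weak_digest(data):
    # Assumption for illustration: MD5 truncated to 8 bits, so that two
    # different payloads with the same digest are trivial to find.
    return hashlib.md5(data).hexdigest()[:2]


def find_collision():
    # Brute-force two distinct payloads sharing the same weak digest.
    # With only 256 possible digests this ends within 257 iterations.
    seen = {}
    for i in count():
        payload = ("file-%d" % i).encode()
        digest = weak_digest(payload)
        if digest in seen:
            return seen[digest], payload
        seen[digest] = payload


class ToyStore:
    """Hypothetical store: attachments are addressed by (doc_id, name)."""

    def __init__(self):
        # doc_id -> {attachment name -> (digest, data)}
        self.docs = {}

    def put_attachment(self, doc_id, name, data):
        # The digest is kept as metadata only; it is never used as a key.
        self.docs.setdefault(doc_id, {})[name] = (weak_digest(data), data)

    def get_attachment(self, doc_id, name):
        return self.docs[doc_id][name][1]


a, b = find_collision()
store = ToyStore()
store.put_attachment("doc1", "poc_a.exe", a)  # same document, different names
store.put_attachment("doc1", "poc_b.exe", b)
store.put_attachment("doc2", "poc_a.exe", b)  # different document, same name
```

In this model both payloads stay retrievable under their own names even though their digests match, because the digest never takes part in addressing. A real test would be to upload the two crafted files to a scratch database and read them back.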
