Do you think the multi-hash approach is worth the added complexity?
It'll break a lot of things, I mean almost everything. All git
algorithms rely on the statements "same hash => same content" and
"same content => same hash".

I think converting is a much better option. Use single-hash storage,
and convert everything to that on import/clone/pull. I would only
introduce a two-way mapping table (old-hash <=> new-hash) to help
the users. In the normal workflow, everything can work with new
hashes only.

That leaves most algorithms and code intact, and introduces very few
new cases:

On import:
If the imported data does not match your selected new-hash format, add
its hash to the lookup table, then convert it to your selected
format, and handle it as such.
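
A minimal sketch of the import step, assuming SHA-1 as the old format
and SHA-256 as the selected new one. The `Store` class and its field
names are hypothetical, and the objects here are plain blobs with no
embedded references; real git objects also carry a header and refer to
other objects by hash, which would have to be rewritten during
conversion.

```python
import hashlib


def old_hash(data: bytes) -> str:
    # Assumed "old" format: plain SHA-1 over the raw content.
    return hashlib.sha1(data).hexdigest()


def new_hash(data: bytes) -> str:
    # Assumed "new" format: plain SHA-256 over the raw content.
    return hashlib.sha256(data).hexdigest()


class Store:
    """Hypothetical single-hash store with a two-way mapping table."""

    def __init__(self):
        self.objects = {}     # new-hash -> content
        self.old_to_new = {}  # forward lookup
        self.new_to_old = {}  # backward lookup

    def import_object(self, data: bytes, claimed_old: str) -> str:
        # Double-check the old hash before trusting the mapping entry.
        if old_hash(data) != claimed_old:
            raise ValueError("content does not match the claimed old hash")
        new = new_hash(data)
        self.objects[new] = data
        self.old_to_new[claimed_old] = new
        self.new_to_old[new] = claimed_old
        return new


store = Store()
blob = b"hello\n"
new_id = store.import_object(blob, old_hash(blob))
```

After the import, the object is stored only under `new_id`; the old
hash survives solely as an entry in the mapping table.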

If a user references an old hash:
We can look it up in the table (forward) and find the referenced
object in storage. From then on, we can handle it as normal.
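
As a rough illustration, assuming the same SHA-1 to SHA-256 setup, the
translation could happen once at the user-facing boundary. The
`resolve` function and the two dictionaries are hypothetical names,
not anything git provides.

```python
import hashlib

objects = {}     # new-hash (SHA-256) -> content
old_to_new = {}  # old-hash (SHA-1) -> new-hash

content = b"some file content\n"
new_id = hashlib.sha256(content).hexdigest()
old_id = hashlib.sha1(content).hexdigest()
objects[new_id] = content
old_to_new[old_id] = new_id


def resolve(object_id: str) -> bytes:
    # Translate an old hash to its new hash if we know it; a new hash
    # passes through unchanged. Everything past this line only ever
    # sees new hashes.
    new = old_to_new.get(object_id, object_id)
    return objects[new]
```

Both `resolve(old_id)` and `resolve(new_id)` return the same content,
and an unknown id raises `KeyError`.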

On verifying an old signature:
Look up the table forward and find the object by its new key. Look up
backwards for all referenced objects, reconstruct the "old format" of
the object, and verify the hash on that.
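
The reconstruction can be sketched like this, under loud assumptions:
a toy commit format with "parent <hash>" header lines, SHA-1 as old and
SHA-256 as new, and hypothetical helper names throughout. It only
re-derives and checks the old hash; checking an actual signature over
the reconstructed bytes is left out.

```python
import hashlib

objects = {}     # new-hash -> new-format content
old_to_new = {}  # forward table
new_to_old = {}  # backward table


def sha1(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def rewrite_parents(data: bytes, table: dict) -> bytes:
    # Translate every "parent <hash>" header line through `table`.
    # Only the header (text before the first blank line) is touched.
    header, sep, body = data.partition(b"\n\n")
    lines = []
    for line in header.split(b"\n"):
        if line.startswith(b"parent "):
            ref = line[len(b"parent "):].decode()
            line = b"parent " + table[ref].encode()
        lines.append(line)
    return b"\n".join(lines) + sep + body


def import_commit(old_data: bytes) -> str:
    # Parents must already be imported, so their hashes are in the table.
    old_id = sha1(old_data)
    new_data = rewrite_parents(old_data, old_to_new)
    new_id = sha256(new_data)
    objects[new_id] = new_data
    old_to_new[old_id] = new_id
    new_to_old[new_id] = old_id
    return new_id


def verify_old_hash(old_id: str) -> bool:
    new_id = old_to_new[old_id]                    # forward lookup
    old_data = rewrite_parents(objects[new_id], new_to_old)  # backward
    return sha1(old_data) == old_id


root_old = b"author A\n\nroot commit\n"
import_commit(root_old)
child_old = (b"parent " + sha1(root_old).encode()
             + b"\nauthor A\n\nsecond commit\n")
child_new_id = import_commit(child_old)
```

Here `verify_old_hash(sha1(child_old))` succeeds even though the stored
child only contains the SHA-256 of its parent.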

If you double-check your hashes as you build the mapping, you can even
trust it, which makes the lookups and verifications very fast. You can
introduce as many hash mapping tables as you want, so you can support
not only an old-to-new hash transition, but as many different hashes
in the world as you want. Your only rule is "you reference your work
in your current format", but you can look up any reference that was
valid at the moment it was made.

(There are slight issues with this approach if we "convert, then
convert again". For example, when you import from repo A (sha1) to
repo B (sha2), it's perfect. But when you import the same commits from
B to C (sha3), you might lose the sha1 references. That could be
considered normal if we want to keep-it-simple-stupid and only support
a few hashes, always going forward. Or you may add an extra "list"
field to objects, which could show what types of hashes you have to
keep in lookup tables for that particular object. Or you can even
include a list of old hashes in the object itself, which should make
it into the lookup table on import.)
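
A sketch of the last option, where every known hash travels with the
object, so the sha1 reference survives the A -> B -> C chain. The
`Repo` class is hypothetical, SHA-256 and SHA3-256 stand in for "sha2"
and "sha3", and the objects are plain blobs, so each alias can be
double-checked by hashing the content directly (objects with embedded
references would need reconstruction first).

```python
import hashlib


class Repo:
    """Hypothetical repo: one storage hash, any number of alias tables."""

    def __init__(self, algo: str):
        self.algo = algo   # a name accepted by hashlib.new()
        self.objects = {}  # local hash -> content
        self.aliases = {}  # foreign algo -> {foreign hash -> local hash}

    def _hash(self, algo: str, data: bytes) -> str:
        return hashlib.new(algo, data).hexdigest()

    def add(self, data: bytes) -> str:
        local = self._hash(self.algo, data)
        self.objects[local] = data
        return local

    def export(self, local: str):
        # Ship the content together with every hash it is known by here.
        known = {self.algo: local}
        for algo, table in self.aliases.items():
            for foreign, target in table.items():
                if target == local:
                    known[algo] = foreign
        return self.objects[local], known

    def import_from(self, data: bytes, known: dict) -> str:
        local = self.add(data)
        for algo, foreign in known.items():
            if algo == self.algo:
                continue
            # Double-check each alias before trusting it.
            if self._hash(algo, data) != foreign:
                raise ValueError("alias does not match content")
            self.aliases.setdefault(algo, {})[foreign] = local
        return local

    def resolve(self, any_id: str) -> bytes:
        if any_id in self.objects:
            return self.objects[any_id]
        for table in self.aliases.values():
            if any_id in table:
                return self.objects[table[any_id]]
        raise KeyError(any_id)


a, b, c = Repo("sha1"), Repo("sha256"), Repo("sha3_256")
data = b"v1\n"
id_a = a.add(data)
id_b = b.import_from(*a.export(id_a))
id_c = c.import_from(*b.export(id_b))
```

Repo C stores the object only under `id_c`, yet `c.resolve(id_a)` and
`c.resolve(id_b)` still find it, at the cost of one more table entry
per hash format the object has passed through.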

Anyway, I think single-hash storage, and the ability to hide any old
hashes from most of the internal algorithms, is a key point in making
the transition. If we want to provide a multi-hash interface to users,
then we should look for "wrapper" solutions that translate multi-hash
user needs to a single-hash backend.

Zsolt