Hey SPDX-Tech team,
I have a suggestion I want to promote. Instead of rigorously defining a
new "canonical" format for computation of hashes, I suggest to define an
algorithm which recursively computes an cryptographic hash, purely based
on the structure, and its values. Similar to the Package verification
code. See bellow for a short or [1] for a complete definition of such an
algorithm.
This resulting hash can then be used to validate that the content of the
SPDX is matching.
Advantages (in my opinion):
- it is easy to implement, just took me 2h to implement it in Haskell
and Go (see [1]). It is especially easier than building another
serializer that needs to escape strings correctly, might have
performance issues and can not use existing libraries.
- it is easy to test, and a list of test cases can be provided (see [2])
- one less "format" to support / just works with JSON output
- the canonicalisation meeting would just have the task to define,
at which point a JSON is canonical regarding different structures,
representing the same SPDX data. This is still hard.
- it feels less fragile
- it can be computed from a stream and there is no 500MB string that
needs to be created in memory
Example definition of such an algorithm:
> Lets say for an arbitrary JSON I compute its hash by defining:
>
> Hashing of base values:
> hash( null ) = sha256("null")
> hash( true ) = sha256("true")
> hash( false ) = sha256("false")
> hash( 123 ) = sha256("123")
> hash( 123e2 ) = sha256("12300")
> hash( "some string" ) = sha256("\"some string\"")
>
> Recursive datatypes can be hashed, by composing a string that contains
> the hashes of all parts and hashing that:
>
> Hashing of recursive datatypes, via recursion:
> hash( [] ) = sha256("[]")
> hash( [123, "abc", { "key": "value" }] ) =
> sha256("[" +
> hash( 123 ) +
> "," +
> hash( "abc" ) +
> "," +
> hash( { "key": "value" } ) +
> "]")
>
> The same can be done with objects, by hashing each key:value pair,
> sorting the hashes, and hashing the result similar to arrays.
There are also other similar algorithms published: [3], [4]
Best
Max
[1] https://github.com/maxhbr/recursiveHashing
[2] https://github.com/maxhbr/recursiveHashing/blob/main/testdata.csv
[3] https://github.com/marekventur/json-hash
[4] https://github.com/oyamist/merkle-json
--
Maximilian Huber * [email protected] * +49-174-3410223
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Dr. Robert Dahlke, Thomas Endres
Sitz: Unterföhring * Amtsgericht München * HRB 135082
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#4519): https://lists.spdx.org/g/Spdx-tech/message/4519
Mute This Topic: https://lists.spdx.org/mt/91310709/21656
Group Owner: [email protected]
Unsubscribe: https://lists.spdx.org/g/Spdx-tech/unsub [[email protected]]
-=-=-=-=-=-=-=-=-=-=-=-