Re: [PATCH v3 01/11] t: add tool to translate hash-related values

2018-08-29 Thread brian m. carlson
On Tue, Aug 28, 2018 at 09:05:48PM -0700, Jonathan Nieder wrote:
> >Add two additional helpers,
> > test_oid_cache, which can be used to load data for test_oid from
> > standard input, and test_oid_init, which can be used to load certain
> > fixed values from lookup charts.  Check that these functions work in
> > t, as the rest of the testsuite will soon come to depend on them.
> >
> > Implement two basic lookup charts, one for common invalid or synthesized
> > object IDs, and one for various facts about the hash function in use.
> > Provide versions for both SHA-1 and SHA-256.
> 
> What do test_oid_cache and test_oid_init do?  How can I use them?
> 
> Judging from t-basic.sh, the idea looks something like
> 
>   Add a test function helper, test_oid, that ...
> 
>   test_oid allows looking up arbitrary information about an object format:
>   the length of object ids, values of well known object ids, etc.  Before
>   calling it, a test script must invoke test_oid_cache (either directly
>   or indirectly through test_oid_init) to load the lookup charts.
> 
>   See t for an example, which also serves as a sanity-check that
>   these functions work in preparation for using them in the rest of the
>   test suite.
> 
>   There are two basic lookup charts for now: oid-info/oid, with common
>   invalid or synthesized object IDs; and oid-info/hash-info, with facts
>   such as object id length about the formats in use.  The charts contain
>   information about both SHA-1 and SHA-256.
> 
>   So now you can update existing tests to be format-independent by (1)
>   adding an invocation of test_oid_init to test setup and (2) replacing
>   format dependencies with $(test_oid foo).
> 
>   Since values are stored as shell variables, names used for lookup can
>   only consist of shell identifier characters.  If this is a problem in
>   the future, we can hash the names before use.
> 
>   Improved-by: Eric Sunshine 
> 
> Do these always use sha1 for now?  Ah, t answers but it might be
> worth mentioning in the commit message, too:
> 
>   test_set_hash allows setting which object format test_oid should look
>   up information for, and test_detect_hash returns to the default format.

I'll expand the commit message.  They do use SHA-1 for now, but I have a
branch that makes them use SHA-256 instead.

> [...]
> > --- /dev/null
> > +++ b/t/oid-info/hash-info
> > @@ -0,0 +1,8 @@
> > +rawsz sha1:20
> > +rawsz sha256:32
> 
> Can there be a README in this directory describing the files and format?

Sure, if you like.

> [...]
> > --- a/t/t-basic.sh
> > +++ b/t/t-basic.sh
> > @@ -821,6 +821,41 @@ test_expect_success 'tests clean up even on failures' "
> > EOF
> >  "
> >  
> > +test_oid_init
> 
> Can this be wrapped in test_expect_success?  That way, if it fails or
> prints an error message then the usual test machinery would handle it.

Sure.

> > +
> > +test_expect_success 'test_oid provides sane info by default' '
> > +   test_oid zero >actual &&
> > +   grep "^00*$" actual &&
> 
> nit: can save the reader some confusion by escaping the $.

Good point.

> > +   rawsz="$(test_oid rawsz)" &&
> > +   hexsz="$(test_oid hexsz)" &&
> 
> optional: no need for these quotation marks --- a command substitution
> assigned to a shell variable is treated as if it were quoted.

That's good to know.  I will honestly say that looking through the Git
codebase and getting reviews on the list has taught me huge amounts
about the intricacies of shell.

> [...]
> > --- a/t/test-lib-functions.sh
> > +++ b/t/test-lib-functions.sh
> > @@ -1155,3 +1155,47 @@ depacketize () {
> [...]
> > +test_oid_cache () {
> > +   test -n "$test_hash_algo" || test_detect_hash
> 
> Should this use an uninterrupted &&-chain?

Yes.  Will fix.

> > +   while read _tag _rest
> 
> This appears to be the first use of this naming convention.  I wonder
> if we can use "local" instead.

We probably can.  There was a discussion about this elsewhere, and we
determined that it's probably safe, and if it's not, it should be
relatively easy to back out.

> > +   esac &&
> > +
> > +   _k="${_rest%:*}" &&
> > +   _v="${_rest#*:}" &&
> > +   { echo "$_k" | egrep '^[a-z0-9]+$' >/dev/null ||
> > +   error 'bug in the test script: bad hash algorithm'; } &&
> > +   eval "test_oid_${_k}_$_tag=\"\$_v\"" || return 1
> 
> This is dense, so I'm having trouble taking it in at a glance.
> 
> I think the idea is
> 
>   key=${rest%%:*} &&
>   val=${rest#*:} &&
> 
>   if ! expr "$key" : '[a-z0-9]*$' >/dev/null
>   then
>   error ...
>   fi &&
>   eval "test_oid_${key}_${tag}=\${val}"

Yes.  I take it that you think that's easier to read, so I'll rewrite it
that way.  I will admit a tendency to write code that is more 

Re: [PATCH v3 01/11] t: add tool to translate hash-related values

2018-08-29 Thread brian m. carlson
On Wed, Aug 29, 2018 at 08:37:48AM -0400, Derrick Stolee wrote:
> On 8/28/2018 8:56 PM, brian m. carlson wrote:
> > +   rawsz="$(test_oid rawsz)" &&
> > +   hexsz="$(test_oid hexsz)" &&
> 
> These are neat helpers! The 'git commit-graph verify' tests in
> t5318-commit-graph.sh should automatically work if we use these for HASH_LEN
> instead of 20. I'll use a similar pattern when writing 'git multi-pack-index
> verify'.

Thanks.  In the future, test_oid will learn about internal versus
external object names, since we'll have what's printed by the UI (which
might be SHA-1) and what's under the hood (which might be SHA-256).
However, it should be easy enough to update all those pplaces, so if you
use them now, it should be easy enough to update them in them in the
future to refer to the right thing.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH v3 01/11] t: add tool to translate hash-related values

2018-08-29 Thread Derrick Stolee

On 8/28/2018 8:56 PM, brian m. carlson wrote:

+   rawsz="$(test_oid rawsz)" &&
+   hexsz="$(test_oid hexsz)" &&


These are neat helpers! The 'git commit-graph verify' tests in 
t5318-commit-graph.sh should automatically work if we use these for 
HASH_LEN instead of 20. I'll use a similar pattern when writing 'git 
multi-pack-index verify'.


Thanks,

-Stolee



Re: [PATCH v3 01/11] t: add tool to translate hash-related values

2018-08-28 Thread Jonathan Nieder
Hi,

brian m. carlson wrote:

> Add a test function helper, test_oid, that produces output that varies
> depending on the hash in use.

Cool!

>Add two additional helpers,
> test_oid_cache, which can be used to load data for test_oid from
> standard input, and test_oid_init, which can be used to load certain
> fixed values from lookup charts.  Check that these functions work in
> t, as the rest of the testsuite will soon come to depend on them.
>
> Implement two basic lookup charts, one for common invalid or synthesized
> object IDs, and one for various facts about the hash function in use.
> Provide versions for both SHA-1 and SHA-256.

What do test_oid_cache and test_oid_init do?  How can I use them?

Judging from t-basic.sh, the idea looks something like

Add a test function helper, test_oid, that ...

test_oid allows looking up arbitrary information about an object format:
the length of object ids, values of well known object ids, etc.  Before
calling it, a test script must invoke test_oid_cache (either directly
or indirectly through test_oid_init) to load the lookup charts.

See t for an example, which also serves as a sanity-check that
these functions work in preparation for using them in the rest of the
test suite.

There are two basic lookup charts for now: oid-info/oid, with common
invalid or synthesized object IDs; and oid-info/hash-info, with facts
such as object id length about the formats in use.  The charts contain
information about both SHA-1 and SHA-256.

So now you can update existing tests to be format-independent by (1)
adding an invocation of test_oid_init to test setup and (2) replacing
format dependencies with $(test_oid foo).

Since values are stored as shell variables, names used for lookup can
only consist of shell identifier characters.  If this is a problem in
the future, we can hash the names before use.

Improved-by: Eric Sunshine 

Do these always use sha1 for now?  Ah, t answers but it might be
worth mentioning in the commit message, too:

test_set_hash allows setting which object format test_oid should look
up information for, and test_detect_hash returns to the default format.

[...]
> --- /dev/null
> +++ b/t/oid-info/hash-info
> @@ -0,0 +1,8 @@
> +rawsz sha1:20
> +rawsz sha256:32

Can there be a README in this directory describing the files and format?

[...]
> --- a/t/t-basic.sh
> +++ b/t/t-basic.sh
> @@ -821,6 +821,41 @@ test_expect_success 'tests clean up even on failures' "
>   EOF
>  "
>  
> +test_oid_init

Can this be wrapped in test_expect_success?  That way, if it fails or
prints an error message then the usual test machinery would handle it.

> +
> +test_expect_success 'test_oid provides sane info by default' '
> + test_oid zero >actual &&
> + grep "^00*$" actual &&

nit: can save the reader some confusion by escaping the $.

> + rawsz="$(test_oid rawsz)" &&
> + hexsz="$(test_oid hexsz)" &&

optional: no need for these quotation marks --- a command substitution
assigned to a shell variable is treated as if it were quoted.

> + test "$hexsz" -eq $(wc -c  + test $(( $rawsz * 2)) -eq "$hexsz"

Makes sense.

[...]
> --- a/t/test-lib-functions.sh
> +++ b/t/test-lib-functions.sh
> @@ -1155,3 +1155,47 @@ depacketize () {
[...]
> +test_oid_cache () {
> + test -n "$test_hash_algo" || test_detect_hash

Should this use an uninterrupted &&-chain?

> + while read _tag _rest

This appears to be the first use of this naming convention.  I wonder
if we can use "local" instead.

> + do
> + case $_tag in
> + \#*)
> + continue;;
> + ?*)
> + # non-empty
> + ;;
> + *)
> + # blank line
> + continue;;
> +

unnecessary blank line here

> + esac &&
> +
> + _k="${_rest%:*}" &&
> + _v="${_rest#*:}" &&
> + { echo "$_k" | egrep '^[a-z0-9]+$' >/dev/null ||
> + error 'bug in the test script: bad hash algorithm'; } &&
> + eval "test_oid_${_k}_$_tag=\"\$_v\"" || return 1

This is dense, so I'm having trouble taking it in at a glance.

I think the idea is

key=${rest%%:*} &&
val=${rest#*:} &&

if ! expr "$key" : '[a-z0-9]*$' >/dev/null
then
error ...
fi &&
eval "test_oid_${key}_${tag}=\${val}"

> + done
> +}
> +
> +test_oid () {
> + eval "
> + test -n \"\${test_oid_${test_hash_algo}_$1+set}\" &&
> + printf '%s' \"\${test_oid_${test_hash_algo}_$1}\"
> + "

I'm also having trouble taking this one in.  Maybe splitting into two
evals would work?


[PATCH v3 01/11] t: add tool to translate hash-related values

2018-08-28 Thread brian m. carlson
Add a test function helper, test_oid, that produces output that varies
depending on the hash in use.  Add two additional helpers,
test_oid_cache, which can be used to load data for test_oid from
standard input, and test_oid_init, which can be used to load certain
fixed values from lookup charts.  Check that these functions work in
t, as the rest of the testsuite will soon come to depend on them.

Implement two basic lookup charts, one for common invalid or synthesized
object IDs, and one for various facts about the hash function in use.
Provide versions for both SHA-1 and SHA-256.

Note that due to the implementation used, names used for lookup can
currently consist only of shell identifier characters.  If this is a
problem in the future, we can hash the names before use.

Signed-off-by: Eric Sunshine 
Signed-off-by: brian m. carlson 
---
 t/oid-info/hash-info|  8 
 t/oid-info/oid  | 29 +++
 t/t-basic.sh| 35 
 t/test-lib-functions.sh | 44 +
 4 files changed, 116 insertions(+)
 create mode 100644 t/oid-info/hash-info
 create mode 100644 t/oid-info/oid

diff --git a/t/oid-info/hash-info b/t/oid-info/hash-info
new file mode 100644
index 00..ccdbfdf974
--- /dev/null
+++ b/t/oid-info/hash-info
@@ -0,0 +1,8 @@
+rawsz sha1:20
+rawsz sha256:32
+
+hexsz sha1:40
+hexsz sha256:64
+
+zero sha1:
+zero sha256:
diff --git a/t/oid-info/oid b/t/oid-info/oid
new file mode 100644
index 00..a754970523
--- /dev/null
+++ b/t/oid-info/oid
@@ -0,0 +1,29 @@
+# These are some common invalid and partial object IDs used in tests.
+001sha1:0001
+001sha256:0001
+002sha1:0002
+002sha256:0002
+003sha1:0003
+003sha256:0003
+004sha1:0004
+004sha256:0004
+005sha1:0005
+005sha256:0005
+006sha1:0006
+006sha256:0006
+007sha1:0007
+007sha256:0007
+# All zeros or Fs missing one or two hex segments.
+zero_1 sha1:000
+zero_1 
sha256:000
+zero_2 sha1:00
+zero_2 
sha256:00
+ff_1   sha1:fff
+ff_1   
sha256:fff
+ff_2   sha1:ff
+ff_2   
sha256:ff
+# More various invalid OIDs.
+numericsha1:0123456789012345678901234567890123456789
+numeric
sha256:0123456789012345678901234567890123456789012345678901234567890123
+deadbeef   sha1:deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
+deadbeef   
sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
diff --git a/t/t-basic.sh b/t/t-basic.sh
index 850f651e4e..e3cace299e 100755
--- a/t/t-basic.sh
+++ b/t/t-basic.sh
@@ -821,6 +821,41 @@ test_expect_success 'tests clean up even on failures' "
EOF
 "
 
+test_oid_init
+
+test_expect_success 'test_oid provides sane info by default' '
+   test_oid zero >actual &&
+   grep "^00*$" actual &&
+   rawsz="$(test_oid rawsz)" &&
+   hexsz="$(test_oid hexsz)" &&
+   test "$hexsz" -eq $(wc -c actual &&
+   grep "^00*$" actual &&
+   rawsz="$(test_oid rawsz)" &&
+   hexsz="$(test_oid hexsz)" &&
+   test $(wc -c actual &&
+   grep "^00*$" actual &&
+   rawsz="$(test_oid rawsz)" &&
+   hexsz="$(test_oid hexsz)" &&
+   test $(wc -c /dev/null ||
+   error 'bug in the test script: bad hash algorithm'; } &&
+   eval "test_oid_${_k}_$_tag=\"\$_v\"" || return 1
+   done
+}
+
+test_oid () {
+   eval "
+   test -n \"\${test_oid_${test_hash_algo}_$1+set}\" &&
+   printf '%s' \"\${test_oid_${test_hash_algo}_$1}\"
+   "
+}