Re: additional API for SHAKE streaming read

2024-03-14 Thread Niels Möller
Daiki Ueno  writes:

>> * One could perhaps use index == 0 instead of index == block_size for
>>   the case that there is no buffered data. But the current convention
>>   does make your "if (length <= left)" nice and simple.
>
> I agree that the current convention is a bit awkward, so in the attached
> patch I switched to index == 0 as the indicator that buffering is
> needed.  That actually makes the code simpler, as we can defer buffering
> until the data is read.  One drawback, though, is that it causes an
> additional memcpy in the corner case where _shake_output is used to
> retrieve less data than the block size.

I wonder if that will still be simpler if one also moves the
sha3_permute calls?

I have merged your previous version to a branch,
add-sha3_256_shake_output, and CI looks green. So perhaps it's best to
merge that to master and iterate from there?

>> * It looks a bit backwards to me that each iteration *first* copies data
>>   to the digest, and *then* calls sha3_permute. In case no more data is
>>   to be output, that sha3_permute call is wasted. It would be more
>>   natural to me to not call sha3_permute until we know the output is
>>   needed. But to fix that and still keep things nice for the first
>>   output block, I think one would need to reorganize _nettle_sha3_pad to
>>   not imply a call to sha3_permute (via sha3_absorb). So that's better
>>   done in a separate change.
>
> Right, I can do that after the current patch is settled.

I've done a bit of hacking locally. What I did was to take the
xoring parts of sha3_absorb out into their own function, sha3_xor_block,
and let sha3_pad_shake use that, without any call to sha3_permute. Then
sha3_permute is called only as output is needed.
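
As a rough illustration of the reorganization described above, here is
a toy sketch, not Nettle code. Everything in it is hypothetical:
toy_permute is a stand-in mixing function (not Keccak), the state is a
plain byte array rather than 64-bit words, and toy_init_output plays
the role of padding that xors in the final block without permuting.
The point is only that the permutation runs when an output block is
actually needed, so a request ending on a block boundary does not
trigger a wasted call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLOCK_SIZE 8   /* toy rate; SHAKE256 uses 136 bytes */

struct toy_sponge
{
  uint8_t state[TOY_BLOCK_SIZE];
  unsigned index;          /* bytes of the current block already output */
  unsigned permute_calls;  /* instrumentation for the example only */
};

/* Stand-in for sha3_permute: NOT Keccak, just a deterministic mix. */
static void
toy_permute (struct toy_sponge *s)
{
  unsigned i;
  for (i = 0; i < TOY_BLOCK_SIZE; i++)
    s->state[i] = (uint8_t) (s->state[i] * 5
                             + s->state[(i + 1) % TOY_BLOCK_SIZE] + i + 1);
  s->permute_calls++;
}

/* Hypothetical "pad without permute": xor the padded block into a zero
   state and mark the output buffer as exhausted. */
static void
toy_init_output (struct toy_sponge *s, const uint8_t *padded_block)
{
  memcpy (s->state, padded_block, TOY_BLOCK_SIZE);
  s->index = TOY_BLOCK_SIZE;
  s->permute_calls = 0;
}

/* Incremental output: permute only when the current block is used up
   and more bytes are requested. */
static void
toy_output (struct toy_sponge *s, size_t length, uint8_t *dst)
{
  assert (s->index <= TOY_BLOCK_SIZE);
  while (length > 0)
    {
      size_t left;
      if (s->index == TOY_BLOCK_SIZE)
        {
          toy_permute (s);
          s->index = 0;
        }
      left = TOY_BLOCK_SIZE - s->index;
      if (left > length)
        left = length;
      memcpy (dst, s->state + s->index, left);
      s->index += left;
      dst += left;
      length -= left;
    }
}
```

With this structure, reading 20 bytes in one call or as 3 + 5 + 12
bytes gives the same output and the same number of permutations, and
reading exactly two blocks costs exactly two permutations.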

>> * I'm still tempted to use ctx->index = ~index rather than ctx->index =
>>   index | INDEX_HIGH_BIT. But maybe that would just be too obfuscated.
>
> I'm actually not sure how this works.  For example, if unsigned int is
> 32-bit and index is 3, wouldn't ~index turn to 0xfffffffc, while index |
> INDEX_HIGH_BIT is 0x80000003?

It would be a different representation, with the very minor advantage
that the INDEX_HIGH_BIT value isn't needed (in source code, or handled
at runtime). Like

  index = ctx->index;

  if (index < sizeof(ctx->block))
    { ... first call to shake_output, pad and initialize...  }
  else
    index = ~index;

  assert (index <= sizeof(ctx->block));

  ... output processing ...

  ctx->index = ~index;
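
To make the representation concrete, here is a minimal standalone
sketch of the encoding. The helper names are hypothetical and not part
of Nettle, and BLOCK_SIZE is assumed to be the SHAKE256 rate of 136
bytes. It relies on the index being strictly less than the block size
while input is still being absorbed, so any complemented output
position (0 through BLOCK_SIZE) lands far above that range and the two
phases cannot be confused.

```c
#include <assert.h>

#define BLOCK_SIZE 136u   /* assumed: SHAKE256 rate in bytes */

/* Hypothetical helpers illustrating the ~index convention. */

/* Nonzero while the stored value is a plain absorb-phase index. */
static int
index_is_absorbing (unsigned stored)
{
  return stored < BLOCK_SIZE;
}

/* Store an output-phase position (0..BLOCK_SIZE) as its complement. */
static unsigned
index_encode_output (unsigned pos)
{
  assert (pos <= BLOCK_SIZE);
  return ~pos;
}

/* Recover the output-phase position from the stored value. */
static unsigned
index_decode_output (unsigned stored)
{
  return ~stored;
}
```

For example, where unsigned int is 32 bits, index_encode_output (3)
is 0xfffffffc, and decoding it gives 3 again. No separate high-bit
constant is needed, at the price of a less obvious stored value.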

>> In next step, to also support shake128, we should generalize your code
>> using an internal function _sha3_shake_output taking block and block
>> size as arguments.
>
> Yes.

I've tried that in my local hack; I think it's rather straightforward.
(I might be able to post a corresponding patch later.) What's unclear is
how much to share between _shake and _shake_output. One could define
_shake as _shake_output + _init. The drawbacks I see are that (i) we
would allow _shake_output followed by _shake, which isn't proper API
usage, and (ii) _shake needs a lot less logic, since it always starts by
padding and doesn't need to buffer any data, so it seems a bit wrong
to have it call _shake_output, which does this unneeded extra work.

/Regards,
Niels

-- 
Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
Internet email is subject to wholesale government surveillance.


Re: additional API for SHAKE streaming read

2024-03-14 Thread Daiki Ueno
Niels Möller  writes:

> Daiki Ueno  writes:
>
>> Yes, this makes the code a lot simpler.  I'm attaching an updated patch.
>
> Thanks, looks good to me. Some details I'm thinking about that might be
> improvements:
>
> * One could perhaps use index == 0 instead of index == block_size for
>   the case that there is no buffered data. But the current convention
>   does make your "if (length <= left)" nice and simple.

I agree that the current convention is a bit awkward, so in the attached
patch I switched to index == 0 as the indicator that buffering is
needed.  That actually makes the code simpler, as we can defer buffering
until the data is read.  One drawback, though, is that it causes an
additional memcpy in the corner case where _shake_output is used to
retrieve less data than the block size.

> * It looks a bit backwards to me that each iteration *first* copies data
>   to the digest, and *then* calls sha3_permute. In case no more data is
>   to be output, that sha3_permute call is wasted. It would be more
>   natural to me to not call sha3_permute until we know the output is
>   needed. But to fix that and still keep things nice for the first
>   output block, I think one would need to reorganize _nettle_sha3_pad to
>   not imply a call to sha3_permute (via sha3_absorb). So that's better
>   done in a separate change.

Right, I can do that after the current patch is settled.

> * I'm still tempted to use ctx->index = ~index rather than ctx->index =
>   index | INDEX_HIGH_BIT. But maybe that would just be too obfuscated.

I'm actually not sure how this works.  For example, if unsigned int is
32-bit and index is 3, wouldn't ~index turn to 0xfffffffc, while index |
INDEX_HIGH_BIT is 0x80000003?

> In next step, to also support shake128, we should generalize your code
> using an internal function _sha3_shake_output taking block and block
> size as arguments.

Yes.

> I'm also not sure about proper naming for shake128. If I read the
> Instances table at https://en.wikipedia.org/wiki/SHA-3 right, there's no
> standard regular hash function corresponding to shake128. We could still
> name it sha3_128_shake, but that might be confusing (there's no
> corresponding sha3_128_digest, would there be any use for that?). The
> alternative could be to use names sha3_shakeN_init, sha3_shakeN_update,
> sha3_shakeN_digest, sha3_shakeN_output (with some of the shake256
> functions, as well as the context struct, being aliases to corresponding
> sha3_256 names). But aliases also have a cost in potential confusion.

I agree; we probably shouldn't expose sha3_128_digest et al. from the
API.

>> +  if (length > 0)
>> +    {
>> +      /* Fill in the buffer for next call.  */
>> +      _nettle_write_le64 (sizeof (ctx->block), ctx->block, ctx->state.a);
>> +      sha3_permute (&ctx->state);
>> +      memcpy (digest, ctx->block, length);
>> +      ctx->index = length | INDEX_HIGH_BIT;
>> +    }
>> +  else
>> +    ctx->index = sizeof (ctx->block) | INDEX_HIGH_BIT;
>> +}
>
> If I read your code right, we actually always have length > 0 at this
> place. So either delete the if conditional, or change the condition of
> the loop above from (length > sizeof (ctx->block)) to (length >= sizeof
> (ctx->block)). The latter option would avoid a memcpy in the case that
> the requested digest ends with a full block.

Indeed, fixed.

Regards,
-- 
Daiki Ueno
From 42c6b5686d361f5572fc6e2daf5d7e355d5b90c0 Mon Sep 17 00:00:00 2001
From: Daiki Ueno 
Date: Sun, 10 Mar 2024 09:43:04 +0900
Subject: [PATCH] sha3: Extend SHAKE256 API with incremental output

This adds an alternative function, sha3_256_shake_output, to the
SHAKE256 support, which allows output to be read incrementally
across multiple calls.

Signed-off-by: Daiki Ueno 
---
 sha3.h                    |  8 +
 shake256.c                | 65 +++
 testsuite/shake256-test.c | 65 +++
 3 files changed, 138 insertions(+)

diff --git a/sha3.h b/sha3.h
index 9220829d..4b7e186c 100644
--- a/sha3.h
+++ b/sha3.h
@@ -49,6 +49,7 @@ extern "C" {
 #define sha3_256_update nettle_sha3_256_update
 #define sha3_256_digest nettle_sha3_256_digest
 #define sha3_256_shake nettle_sha3_256_shake
+#define sha3_256_shake_output nettle_sha3_256_shake_output
 #define sha3_384_init nettle_sha3_384_init
 #define sha3_384_update nettle_sha3_384_update
 #define sha3_384_digest nettle_sha3_384_digest
@@ -143,6 +144,13 @@ sha3_256_shake(struct sha3_256_ctx *ctx,
 	   size_t length,
 	   uint8_t *digest);
 
+/* Unlike sha3_256_shake, this function can be called multiple times
+   to retrieve output from shake256 in an incremental manner */
+void
+sha3_256_shake_output(struct sha3_256_ctx *ctx,
+		  size_t length,
+		  uint8_t *digest);
+
 struct sha3_384_ctx
 {
   struct sha3_state state;
diff --git a/shake256.c b/shake256.c
index f5c77a43..cba22af4 100644
--- a/shake256.c
+++ b/shake256.c
@@ -36,6 +36,8 @@
 # include "conf