Re: [PATCH v8 07/11] pkt-line: add functions to read/write flush terminated packet streams

2016-09-27 Thread Jeff King
On Tue, Sep 27, 2016 at 02:10:50PM +0200, Lars Schneider wrote:

> > That being said, why don't you just use LARGE_PACKET_MAX here? It is
> > already the accepted size for feeding to packet_read(), and we know it
> > has enough space to hold a NUL terminator. Yes, we may over-allocate by
> > 4 bytes, but that isn't really relevant. Strbufs over-allocate anyway.
> 
> TBH in that case I would prefer the "PKTLINE_DATA_MAXLEN+1" solution with
> an additional comment explaining "+1".
> 
> Would that be OK for you?
> 
> I am not worried about the extra 4 bytes. I am worried that we make it harder
> to see what is going on if we use LARGE_PACKET_MAX.

I guess I don't feel to strongly either way. My interest in
LARGE_PACKET_MAX is mostly that this is how all the rest of the
packet_read() callers behave.

-Peff


Re: [PATCH v8 07/11] pkt-line: add functions to read/write flush terminated packet streams

2016-09-27 Thread Lars Schneider

> On 27 Sep 2016, at 11:00, Jeff King  wrote:
> 
> On Tue, Sep 27, 2016 at 10:14:16AM +0200, Lars Schneider wrote:
> 
> + strbuf_grow(sb_out, PKTLINE_DATA_MAXLEN+1);
> + paket_len = packet_read(fd_in, NULL, NULL,
> + sb_out->buf + sb_out->len, PKTLINE_DATA_MAXLEN+1, 
> options);
>> [...]
>> After looking at it with fresh eyes I think the existing code is probably 
>> correct,
>> but maybe a bit confusing.
>> 
>> packet_read() adds a '\0' at the end of the destination buffer:
>> https://github.com/git/git/blob/21f862b498925194f8f1ebe8203b7a7df756555b/pkt-line.c#L206
>> 
>> That is why the destination buffer needs to be one byte larger than the 
>> expected content.
>> 
>> However, in this particular case that wouldn't be necessary because the 
>> destination
>> buffer is a 'strbuf' that allocates an extra byte for '\0' at the end. But 
>> we are not
>> supposed to write to this extra byte:
>> https://github.com/git/git/blob/21f862b498925194f8f1ebe8203b7a7df756555b/strbuf.h#L25-L31
> 
> Right. The allocation happens as part of strbuf_grow(), but whatever
> fills the buffer is expected to write the actual NUL (either manually,
> or by calling strbuf_setlen().
> 
> I see the bit you quoted warns not to touch the extra byte yourself,
> though I wonder if that is a bit heavy-handed (I guess it would matter
> if we moved the extra 1-byte growth into strbuf_setlen(), but I find
> that a rather unlikely change).
> 
> That being said, why don't you just use LARGE_PACKET_MAX here? It is
> already the accepted size for feeding to packet_read(), and we know it
> has enough space to hold a NUL terminator. Yes, we may over-allocate by
> 4 bytes, but that isn't really relevant. Strbufs over-allocate anyway.

TBH in that case I would prefer the "PKTLINE_DATA_MAXLEN+1" solution with
an additional comment explaining "+1".

Would that be OK for you?

I am not worried about the extra 4 bytes. I am worried that we make it harder
to see what is going on if we use LARGE_PACKET_MAX.


> So just:
> 
>  for (;;) {
>strbuf_grow(sb_out, LARGE_PACKET_MAX);
>packet_len = packet_read(fd_in, NULL, NULL,
>sb_out->buf + sb_out->len, LARGE_PACKET_MAX,
>options);
>if (packet_len <= 0)
>break;
>/*
> * no need for strbuf_setlen() here; packet_read always adds a
> * NUL terminator.
> */
>sb_out->len += packet_len;
>  }
> 
> You _could_ make the final line of the loop use strbuf_setlen(); it
> would just overwrite something we already know is a NUL (and we know
> that no extra allocation is necessary).
> 
> Also, using LARGE_PACKET_MAX fixes the fact that this patch is using
> PKTLINE_DATA_MAXLEN before it is actually defined. :)

Yeah, I noticed that too and fixed it in v9 :-) Thanks for the reminder!


> You might want to occasionally run:
> 
>  git rebase -x make
> 
> to make sure all of your incremental steps are valid (or even "make
> test" if you are extremely patient; I often do that once after a big
> round of complex interactive-rebase reordering).

That is a good suggestion. I'll add that to my "tool-belt" :-)


Thanks,
Lars



Re: [PATCH v8 07/11] pkt-line: add functions to read/write flush terminated packet streams

2016-09-27 Thread Jeff King
On Tue, Sep 27, 2016 at 10:14:16AM +0200, Lars Schneider wrote:

> >>> + strbuf_grow(sb_out, PKTLINE_DATA_MAXLEN+1);
> >>> + paket_len = packet_read(fd_in, NULL, NULL,
> >>> + sb_out->buf + sb_out->len, PKTLINE_DATA_MAXLEN+1, 
> >>> options);
> [...]
> After looking at it with fresh eyes I think the existing code is probably 
> correct,
> but maybe a bit confusing.
> 
> packet_read() adds a '\0' at the end of the destination buffer:
> https://github.com/git/git/blob/21f862b498925194f8f1ebe8203b7a7df756555b/pkt-line.c#L206
> 
> That is why the destination buffer needs to be one byte larger than the 
> expected content.
> 
> However, in this particular case that wouldn't be necessary because the 
> destination
> buffer is a 'strbuf' that allocates an extra byte for '\0' at the end. But we 
> are not
> supposed to write to this extra byte:
> https://github.com/git/git/blob/21f862b498925194f8f1ebe8203b7a7df756555b/strbuf.h#L25-L31

Right. The allocation happens as part of strbuf_grow(), but whatever
fills the buffer is expected to write the actual NUL (either manually,
or by calling strbuf_setlen().

I see the bit you quoted warns not to touch the extra byte yourself,
though I wonder if that is a bit heavy-handed (I guess it would matter
if we moved the extra 1-byte growth into strbuf_setlen(), but I find
that a rather unlikely change).

That being said, why don't you just use LARGE_PACKET_MAX here? It is
already the accepted size for feeding to packet_read(), and we know it
has enough space to hold a NUL terminator. Yes, we may over-allocate by
4 bytes, but that isn't really relevant. Strbufs over-allocate anyway.
So just:

  for (;;) {
strbuf_grow(sb_out, LARGE_PACKET_MAX);
packet_len = packet_read(fd_in, NULL, NULL,
 sb_out->buf + sb_out->len, LARGE_PACKET_MAX,
 options);
if (packet_len <= 0)
break;
/*
 * no need for strbuf_setlen() here; packet_read always adds a
 * NUL terminator.
 */
sb_out->len += packet_len;
  }

You _could_ make the final line of the loop use strbuf_setlen(); it
would just overwrite something we already know is a NUL (and we know
that no extra allocation is necessary).

Also, using LARGE_PACKET_MAX fixes the fact that this patch is using
PKTLINE_DATA_MAXLEN before it is actually defined. :)

You might want to occasionally run:

  git rebase -x make

to make sure all of your incremental steps are valid (or even "make
test" if you are extremely patient; I often do that once after a big
round of complex interactive-rebase reordering).

> I see two options:
> 
> 
> (1) I leave the +1 as is and add a comment why the extra byte is necessary.
> 
> Pro: No change in existing code necessary
> Con: The destination buffer has two '\0' at the end.
> 
> 
> (2) I add an option PACKET_READ_DISABLE_NUL_TERMINATION. If the option is
> set then no '\0' byte is added to the end.
> 
> Pro: Correct solution, no byte wasted.
> Con: Change in existing code required.
> 
> 
> Any preference?

Of the two, I prefer (1), though I like what I suggested above even more
(big surprise, I know).

-Peff


Re: [PATCH v8 07/11] pkt-line: add functions to read/write flush terminated packet streams

2016-09-27 Thread Lars Schneider

On 26 Sep 2016, at 22:23, Lars Schneider  wrote:

> 
> On 25 Sep 2016, at 15:46, Jakub Narębski  wrote:
> 
>> W dniu 20.09.2016 o 21:02, larsxschnei...@gmail.com pisze:
>>> From: Lars Schneider 
> 
> 
>>> +   strbuf_grow(sb_out, PKTLINE_DATA_MAXLEN+1);
>>> +   paket_len = packet_read(fd_in, NULL, NULL,
>>> +   sb_out->buf + sb_out->len, PKTLINE_DATA_MAXLEN+1, 
>>> options);
>> 
>> A question (which perhaps was answered during the development of this
>> patch series): why is this +1 in PKTLINE_DATA_MAXLEN+1 here?
> 
> Nice catch. I think this is wrong:
> https://github.com/git/git/blob/6fe1b1407ed91823daa5d487abe457ff37463349/pkt-line.c#L196
> 
> It should be "if (len > size)" ... then we don't need the "+1" here.
> (but I need to think a bit more about this)

After looking at it with fresh eyes I think the existing code is probably 
correct,
but maybe a bit confusing.

packet_read() adds a '\0' at the end of the destination buffer:
https://github.com/git/git/blob/21f862b498925194f8f1ebe8203b7a7df756555b/pkt-line.c#L206

That is why the destination buffer needs to be one byte larger than the 
expected content.

However, in this particular case that wouldn't be necessary because the 
destination
buffer is a 'strbuf' that allocates an extra byte for '\0' at the end. But we 
are not
supposed to write to this extra byte:
https://github.com/git/git/blob/21f862b498925194f8f1ebe8203b7a7df756555b/strbuf.h#L25-L31


I see two options:


(1) I leave the +1 as is and add a comment why the extra byte is necessary.

Pro: No change in existing code necessary
Con: The destination buffer has two '\0' at the end.


(2) I add an option PACKET_READ_DISABLE_NUL_TERMINATION. If the option is
set then no '\0' byte is added to the end.

Pro: Correct solution, no byte wasted.
Con: Change in existing code required.


Any preference?


Thanks,
Lars

Re: [PATCH v8 07/11] pkt-line: add functions to read/write flush terminated packet streams

2016-09-26 Thread Lars Schneider

On 25 Sep 2016, at 15:46, Jakub Narębski  wrote:

> W dniu 20.09.2016 o 21:02, larsxschnei...@gmail.com pisze:
>> From: Lars Schneider 


>> +{
>> +static char buf[PKTLINE_DATA_MAXLEN];
> 
> Sidenote: we have LARGE_PACKET_MAX (used in previous patch), but
> PKTLINE_DATA_MAXLEN not LARGE_PACKET_DATA_MAX.

Agreed, I will rename it.


> 
>> +int err = 0;
>> +ssize_t bytes_to_write;
>> +
>> +while (!err) {
>> +bytes_to_write = xread(fd_in, buf, sizeof(buf));
>> +if (bytes_to_write < 0)
>> +return COPY_READ_ERROR;
>> +if (bytes_to_write == 0)
>> +break;
>> +err = packet_write_gently(fd_out, buf, bytes_to_write);
>> +}
>> +if (!err)
>> +err = packet_flush_gently(fd_out);
>> +return err;
>> +}
> 
> Looks good: clean and readable.
> 
> Sidenote (probably outside of scope of this patch): what are the
> errors that we can get from this function, beside COPY_READ_ERROR
> of course?

Everything that is returned by "read()"


>> +
>> static int get_packet_data(int fd, char **src_buf, size_t *src_size,
>> void *dst, unsigned size, int options)
>> {
>> @@ -305,3 +346,30 @@ char *packet_read_line_buf(char **src, size_t *src_len, 
>> int *dst_len)
>> {
>>  return packet_read_line_generic(-1, src, src_len, dst_len);
>> }
>> +
>> +ssize_t read_packetized_to_buf(int fd_in, struct strbuf *sb_out)
> 
> It's a bit strange that the signature of write_packetized_from_buf() is
> that different from read_packetized_to_buf().  This includes the return
> value: int vs ssize_t.  As I have checked, write() and read() both
> use ssize_t, while fread() and fwrite() both use size_t.

read_packetized_to_buf() returns the number of bytes read or a negative 
error code.

write_packetized_from_buf() returns 0 if the call was successful and an 
error code if not.

That's the reason these two functions have a different signature


> Perhaps this function should be named read_packetized_to_strbuf()
> (err, I asked this already)?

I agree with the rename as makes it distinct from 
write_packetized_from_buf().


>> +{
>> +int paket_len;
> 
> Possible typo: shouldn't it be called packet_len?

True!


> Shouldn't it be initialized to 0?

Well, it is set for sure later. That's why I think it is not necessary.

Plus, Eric Wong thought me not to:
"Compilers complain about uninitialized variables."
http://public-inbox.org/git/20160725072745.GB11634@starla/
(Note: he was talking about pointers there :-)


>> +int options = PACKET_READ_GENTLE_ON_EOF;
> 
> Why is this even a local variable?  It is never changed, and it is
> used only in one place; we can inline it.

Removed.


>> +
>> +size_t oldlen = sb_out->len;
>> +size_t oldalloc = sb_out->alloc;
> 
> Just a nitpick (feel free to ignore): doesn't this looks better:
> 
>  +size_t old_len   = sb_out->len;
>  +size_t old_alloc = sb_out->alloc;
> 
> Also perhaps s/old_/orig_/g.

Agreed. That matches the other variables better.


>> +strbuf_grow(sb_out, PKTLINE_DATA_MAXLEN+1);
>> +paket_len = packet_read(fd_in, NULL, NULL,
>> +sb_out->buf + sb_out->len, PKTLINE_DATA_MAXLEN+1, 
>> options);
> 
> A question (which perhaps was answered during the development of this
> patch series): why is this +1 in PKTLINE_DATA_MAXLEN+1 here?

Nice catch. I think this is wrong:
https://github.com/git/git/blob/6fe1b1407ed91823daa5d487abe457ff37463349/pkt-line.c#L196

It should be "if (len > size)" ... then we don't need the "+1" here.
(but I need to think a bit more about this)

> 
>> +if (paket_len <= 0)
>> +break;
>> +sb_out->len += paket_len;
>> +}
>> +
>> +if (paket_len < 0) {
>> +if (oldalloc == 0)
>> +strbuf_release(sb_out);
>> +else
>> +strbuf_setlen(sb_out, oldlen);
> 
> A question (maybe I don't understand strbufs): why there is a special
> case for oldalloc == 0?

I tried to mimic the behavior of strbuf_read() [1]. The error handling
was introduced in 2fc647 [2] to ease error handling:

"This allows for easier error handling, as callers only need to call
strbuf_release() if A) the command succeeded or B) if they would have had
to do so anyway because they added something to the strbuf themselves."

[1] 
https://github.com/git/git/blob/cda1bbd474805e653dda8a71d4ea3790e2a66cbb/strbuf.c#L377-L383
[2] https://github.com/git/git/commit/2fc647004ac7016128372a85db8245581e493812


Thanks,
Lars

Re: [PATCH v8 07/11] pkt-line: add functions to read/write flush terminated packet streams

2016-09-25 Thread Jakub Narębski
W dniu 20.09.2016 o 21:02, larsxschnei...@gmail.com pisze:
> From: Lars Schneider 
> 
> write_packetized_from_fd() and write_packetized_from_buf() write a
> stream of packets. All content packets use the maximal packet size
> except for the last one. After the last content packet a `flush` control
> packet is written.
> 
> read_packetized_to_buf() reads arbitrary sized packets until it detects
> a `flush` packet.

I guess that read_packetized_to_fd(), for completeness, is not needed
for the filter protocol (though it might be useful for the receive
side of send-pack / receive-pack).

Also, should it be read_packetized_to_strbuf()?  I guess using strbuf
to read is here because we might need more size to read in full, isn't
it.

> 
> Signed-off-by: Lars Schneider 
> ---
>  pkt-line.c | 68 
> ++
>  pkt-line.h |  7 +++
>  2 files changed, 75 insertions(+)
> 
> diff --git a/pkt-line.c b/pkt-line.c
> index fc0ac12..a0a8543 100644
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -196,6 +196,47 @@ void packet_buf_write(struct strbuf *buf, const char 
> *fmt, ...)
>   va_end(args);
>  }
>  
> +int write_packetized_from_fd(int fd_in, int fd_out)

I wonder if it would be worth it to name parameters in such way that
it is known from the name which one is to be packetized, for example
fd_out_pkt here...

But it might be not worth it; you can get it from the function name.

> +{
> + static char buf[PKTLINE_DATA_MAXLEN];

Static buffer means not thread-safe and not reentrant. It would be
nice to have this information in a comment for this function, but
it is not necessary.

Also, is using static variable better than using global variable
`packet_buffer`?  Well, scope for weird interactions is smaller...

Sidenote: we have LARGE_PACKET_MAX (used in previous patch), but
PKTLINE_DATA_MAXLEN not LARGE_PACKET_DATA_MAX.

> + int err = 0;
> + ssize_t bytes_to_write;
> +
> + while (!err) {
> + bytes_to_write = xread(fd_in, buf, sizeof(buf));
> + if (bytes_to_write < 0)
> + return COPY_READ_ERROR;
> + if (bytes_to_write == 0)
> + break;
> + err = packet_write_gently(fd_out, buf, bytes_to_write);
> + }
> + if (!err)
> + err = packet_flush_gently(fd_out);
> + return err;
> +}

Looks good: clean and readable.

Sidenote (probably outside of scope of this patch): what are the
errors that we can get from this function, beside COPY_READ_ERROR
of course?

> +
> +int write_packetized_from_buf(const char *src_in, size_t len, int fd_out)
> +{
> + static char buf[PKTLINE_DATA_MAXLEN];
> + int err = 0;
> + size_t bytes_written = 0;
> + size_t bytes_to_write;

Those two variables, instead of modifying the values of len and/or src_in,
make code very easy to read.

> +
> + while (!err) {
> + if ((len - bytes_written) > sizeof(buf))
> + bytes_to_write = sizeof(buf);
> + else
> + bytes_to_write = len - bytes_written;
> + if (bytes_to_write == 0)
> + break;
> + err = packet_write_gently(fd_out, src_in + bytes_written, 
> bytes_to_write);
> + bytes_written += bytes_to_write;
> + }
> + if (!err)
> + err = packet_flush_gently(fd_out);
> + return err;
> +}

Looks good: clean and readable.

> +
>  static int get_packet_data(int fd, char **src_buf, size_t *src_size,
>  void *dst, unsigned size, int options)
>  {
> @@ -305,3 +346,30 @@ char *packet_read_line_buf(char **src, size_t *src_len, 
> int *dst_len)
>  {
>   return packet_read_line_generic(-1, src, src_len, dst_len);
>  }
> +
> +ssize_t read_packetized_to_buf(int fd_in, struct strbuf *sb_out)

It's a bit strange that the signature of write_packetized_from_buf() is
that different from read_packetized_to_buf().  This includes the return
value: int vs ssize_t.  As I have checked, write() and read() both
use ssize_t, while fread() and fwrite() both use size_t.

Perhaps this function should be named read_packetized_to_strbuf()
(err, I asked this already)?

> +{
> + int paket_len;

Possible typo: shouldn't it be called packet_len?
Shouldn't it be initialized to 0?

  + int packet_len = 0;

> + int options = PACKET_READ_GENTLE_ON_EOF;

Why is this even a local variable?  It is never changed, and it is
used only in one place; we can inline it.

If it would be needed in subsequent patches, then such information
should be included in the commit message.

> +
> + size_t oldlen = sb_out->len;
> + size_t oldalloc = sb_out->alloc;

Just a nitpick (feel free to ignore): doesn't this looks better:

  + size_t old_len   = sb_out->len;
  + size_t old_alloc = sb_out->alloc;

Also perhaps s/old_/orig_/g.

> +
> + for (;;) {

I see that you used the more popular way of coding forever loop:

  $ git grep 'for (;;