Re: Review Request 47481: Rewrote os::read() to be friendlier to reading binary data.

2016-05-19 Thread Kevin Klues


> On May 19, 2016, 9:12 a.m., Jan Schlicht wrote:
> > 3rdparty/stout/include/stout/os/read.hpp, line 136
> > 
> >
> > s/delete/delete[]
> > 
> > Why not use a `std::vector` for `buffer`?
> > 
> > I've opened https://reviews.apache.org/r/47585/ to fix this.

I agree that this needs to be `delete[]`. Thanks for that.

I can see why this bug was introduced over time though. If you look back at 
revision 1 of this patch, I wasn't using temporary storage at all, which would 
have avoided this bug altogether. Unfortunately, this approach was vetoed by 
the reviewer, leading to revision 2, which used a stack allocated array. This 
also didn't have the bug, but introduced other "stylistic" problems that the 
reviewers weren't happy with. In revision 3, I used a `std::vector` as 
you suggest, but it never made it back to reviewboard because it was vetoed in 
external feedback. As a compromise, I started using a `unique_ptr<>` in 
revision 4, but this was vetoed as well. Looks like I overlooked the proper 
delete to use after 5 revisions through a very simple patch.

I would personally do away with the temporary storage and go with the approach 
I have in revision 1 on review board. If we are going the temporary storage 
route, I  would prefer the `std::vector` approach as you suggest. 

Need to convince a shepherd though.


- Kevin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47481/#review133904
---


On May 18, 2016, 3:25 a.m., Kevin Klues wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47481/
> ---
> 
> (Updated May 18, 2016, 3:25 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Joris Van Remoortere, and switched 
> to 'mcypark'.
> 
> 
> Bugs: MESOS-5398
> https://issues.apache.org/jira/browse/MESOS-5398
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The previous read() implementation was based on calling getline() to
> read in chunks of data from a file. This is fine for text-based files,
> but is a little strange for binary files.
> 
> The new implementation reads in chunks of raw bytes into a stack
> allocated buffer before copying them into their final location. I
> usually don't like stack allocated buffers because of their potential
> security implications when it comes to stack buffer overflows, but I
> took extra care here to make sure there won't be any.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/read.hpp 
> e1e97c1bcb7493a734fc77721a83c230b1a23724 
> 
> Diff: https://reviews.apache.org/r/47481/diff/
> 
> 
> Testing
> ---
> 
> make check -j
> 
> 
> Thanks,
> 
> Kevin Klues
> 
>



Re: Review Request 47481: Rewrote os::read() to be friendlier to reading binary data.

2016-05-19 Thread Jan Schlicht

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47481/#review133904
---




3rdparty/stout/include/stout/os/read.hpp (lines 90 - 103)


`getline` is no longer used in the code below. Can this be removed then?



3rdparty/stout/include/stout/os/read.hpp (line 129)


s/delete/delete[]

Why not use a `std::vector` for `buffer`?

I've opened https://reviews.apache.org/r/47585/ to fix this.



3rdparty/stout/include/stout/os/read.hpp (line 142)


s/delete/delete[]


- Jan Schlicht


On May 18, 2016, 5:25 a.m., Kevin Klues wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47481/
> ---
> 
> (Updated May 18, 2016, 5:25 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Joris Van Remoortere, and switched 
> to 'mcypark'.
> 
> 
> Bugs: MESOS-5398
> https://issues.apache.org/jira/browse/MESOS-5398
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The previous read() implementation was based on calling getline() to
> read in chunks of data from a file. This is fine for text-based files,
> but is a little strange for binary files.
> 
> The new implementation reads in chunks of raw bytes into a stack
> allocated buffer before copying them into their final location. I
> usually don't like stack allocated buffers because of their potential
> security implications when it comes to stack buffer overflows, but I
> took extra care here to make sure there won't be any.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/read.hpp 
> e1e97c1bcb7493a734fc77721a83c230b1a23724 
> 
> Diff: https://reviews.apache.org/r/47481/diff/
> 
> 
> Testing
> ---
> 
> make check -j
> 
> 
> Thanks,
> 
> Kevin Klues
> 
>



Re: Review Request 47481: Rewrote os::read() to be friendlier to reading binary data.

2016-05-18 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47481/#review133834
---


Ship it!





3rdparty/stout/include/stout/os/read.hpp (line 108)


Not yours, but we should avoid including caller-available information here 
as it leads to redundant information when error messages are composed.



3rdparty/stout/include/stout/os/read.hpp (line 120)


true instead of 1?



3rdparty/stout/include/stout/os/read.hpp (line 124)


Perhaps we should mention that ferror does not modify errno if the file is 
valid.



3rdparty/stout/include/stout/os/read.hpp (line 137)


Perhaps we should assert feof here?


- Benjamin Mahler


On May 18, 2016, 3:25 a.m., Kevin Klues wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47481/
> ---
> 
> (Updated May 18, 2016, 3:25 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Joris Van Remoortere, and switched 
> to 'mcypark'.
> 
> 
> Bugs: MESOS-5398
> https://issues.apache.org/jira/browse/MESOS-5398
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The previous read() implementation was based on calling getline() to
> read in chunks of data from a file. This is fine for text-based files,
> but is a little strange for binary files.
> 
> The new implementation reads in chunks of raw bytes into a stack
> allocated buffer before copying them into their final location. I
> usually don't like stack allocated buffers because of their potential
> security implications when it comes to stack buffer overflows, but I
> took extra care here to make sure there won't be any.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/read.hpp 
> e1e97c1bcb7493a734fc77721a83c230b1a23724 
> 
> Diff: https://reviews.apache.org/r/47481/diff/
> 
> 
> Testing
> ---
> 
> make check -j
> 
> 
> Thanks,
> 
> Kevin Klues
> 
>



Re: Review Request 47481: Rewrote os::read() to be friendlier to reading binary data.

2016-05-17 Thread Kevin Klues

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47481/
---

(Updated May 18, 2016, 3:25 a.m.)


Review request for mesos, Benjamin Mahler, Joris Van Remoortere, and switched 
to 'mcypark'.


Changes
---

Removed erroneous #include


Bugs: MESOS-5398
https://issues.apache.org/jira/browse/MESOS-5398


Repository: mesos


Description
---

The previous read() implementation was based on calling getline() to
read in chunks of data from a file. This is fine for text-based files,
but is a little strange for binary files.

The new implementation reads in chunks of raw bytes into a stack
allocated buffer before copying them into their final location. I
usually don't like stack allocated buffers because of their potential
security implications when it comes to stack buffer overflows, but I
took extra care here to make sure there won't be any.


Diffs (updated)
-

  3rdparty/stout/include/stout/os/read.hpp 
e1e97c1bcb7493a734fc77721a83c230b1a23724 

Diff: https://reviews.apache.org/r/47481/diff/


Testing
---

make check -j


Thanks,

Kevin Klues



Re: Review Request 47481: Rewrote os::read() to be friendlier to reading binary data.

2016-05-17 Thread Kevin Klues

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47481/
---

(Updated May 18, 2016, 3:18 a.m.)


Review request for mesos, Benjamin Mahler, Joris Van Remoortere, and switched 
to 'mcypark'.


Changes
---

Updated to use a heap allocated buffer instead of a stack allocated one.


Bugs: MESOS-5398
https://issues.apache.org/jira/browse/MESOS-5398


Repository: mesos


Description
---

The previous read() implementation was based on calling getline() to
read in chunks of data from a file. This is fine for text-based files,
but is a little strange for binary files.

The new implementation reads in chunks of raw bytes into a stack
allocated buffer before copying them into their final location. I
usually don't like stack allocated buffers because of their potential
security implications when it comes to stack buffer overflows, but I
took extra care here to make sure there won't be any.


Diffs (updated)
-

  3rdparty/stout/include/stout/os/read.hpp 
e1e97c1bcb7493a734fc77721a83c230b1a23724 

Diff: https://reviews.apache.org/r/47481/diff/


Testing
---

make check -j


Thanks,

Kevin Klues



Re: Review Request 47481: Rewrote os::read() to be friendlier to reading binary data.

2016-05-17 Thread Kevin Klues


> On May 17, 2016, 8:44 p.m., Jojy Varghese wrote:
> > 3rdparty/stout/include/stout/os/read.hpp, line 113
> > 
> >
> > Does this assume a certain stack size?
> 
> Kevin Klues wrote:
> It would assume a certain stack size, yes. 
> Trust me, I didn't want to go this route personally. I switched to this 
> way based on offline feedback.
> 
> Benjamin Mahler wrote:
> > Trust me, I didn't want to go this route personally. I switched to this 
> way based on offline feedback.
> 
> In the interest of transparency, the initial implementation was directly 
> writing to std::string managed storage. I gave feedback to Kevin to avoid 
> this approach because it's a bit clever and it's not obviously correct 
> compared to writing to a character array and doing the copy: 
> http://stackoverflow.com/a/6700534 . While the string approach is nice 
> because it avoids the extra copy from a character array into the string, 
> `fread` already maintains an internal user-space buffer in order to provide 
> buffered I/O. So, if we really cared about eliminating user space copying we 
> have to use `read` rather than `fread` AFAICT.
> 
> Whether the array is on the heap or the stack is a separate concern IMO, 
> if assuming a particular stack size is a problem, then let's put it on the 
> heap. Curious to hear more thoughts on that however, since we have a number 
> of stack allocated buffers in the code, 1024 bytes or less FWICT.
> 
> Also, why 4096 instead of `BUFSIZ`?
> http://www.cplusplus.com/reference/cstdio/BUFSIZ/
> http://www.cplusplus.com/reference/cstdio/setbuf/
> 
> If we used `BUFSIZ` is there still a concern about stack size assumptions?

I chose 4096 because we had talked about using a PAGE sized buffer. I'm fine 
with any size though.

In general, no matter what size buffer we choose, so long as the buffer is 
stack allocated, there is always a risk of overruning the stack if we are near 
the end of the stack. This isn't just for buffers, it's true for any stack 
allocated variables -- buffers just tend to be rather large stack allocated 
variables, increasing the risk of an overflow.

That said, I'd say the risk is pretty low of overruning your stack on Linux (by 
default the stack can expand up to 8Mb and you can set this higher if you 
want). If you happen to use ulimit to set the max stack size to something 
smaller (or programatically launch threads with smaller stacks), you do 
increase your chances of overrunning the stack (but there was always a non-zero 
cahnce of overruning it anyway).  Either way, Linux will always start with a 
small stack and automatically grow it on demand up to the specified max size 
(page-fault -> allocate new stack -> copy old stack -> update stack pointer -> 
restart task).

The bigger concern for me is accidentally writing buggy code that overruns 
stack allocated buffers in the current stack frame. I've tried to make sure to 
avoid this in this particular function, but I typically like to avoid using 
stack allocated buffers myself.


- Kevin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47481/#review133624
---


On May 17, 2016, 8:36 p.m., Kevin Klues wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47481/
> ---
> 
> (Updated May 17, 2016, 8:36 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Joris Van Remoortere, and switched 
> to 'mcypark'.
> 
> 
> Bugs: MESOS-5398
> https://issues.apache.org/jira/browse/MESOS-5398
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The previous read() implementation was based on calling getline() to
> read in chunks of data from a file. This is fine for text-based files,
> but is a little strange for binary files.
> 
> The new implementation reads in chunks of raw bytes into a stack
> allocated buffer before copying them into their final location. I
> usually don't like stack allocated buffers because of their potential
> security implications when it comes to stack buffer overflows, but I
> took extra care here to make sure there won't be any.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/read.hpp 
> e1e97c1bcb7493a734fc77721a83c230b1a23724 
> 
> Diff: https://reviews.apache.org/r/47481/diff/
> 
> 
> Testing
> ---
> 
> make check -j
> 
> 
> Thanks,
> 
> Kevin Klues
> 
>



Re: Review Request 47481: Rewrote os::read() to be friendlier to reading binary data.

2016-05-17 Thread Benjamin Mahler


> On May 17, 2016, 8:44 p.m., Jojy Varghese wrote:
> > 3rdparty/stout/include/stout/os/read.hpp, line 113
> > 
> >
> > Does this assume a certain stack size?
> 
> Kevin Klues wrote:
> It would assume a certain stack size, yes. 
> Trust me, I didn't want to go this route personally. I switched to this 
> way based on offline feedback.

> Trust me, I didn't want to go this route personally. I switched to this way 
> based on offline feedback.

In the interest of transparency, the initial implementation was directly 
writing to std::string managed storage. I gave feedback to Kevin to avoid this 
approach because it's a bit clever and it's not obviously correct compared to 
writing to a character array and doing the copy: 
http://stackoverflow.com/a/6700534 . While the string approach is nice because 
it avoids the extra copy from a character array into the string, `fread` 
already maintains an internal user-space buffer in order to provide buffered 
I/O. So, if we really cared about eliminating user space copying we have to use 
`read` rather than `fread` AFAICT.

Whether the array is on the heap or the stack is a separate concern IMO, if 
assuming a particular stack size is a problem, then let's put it on the heap. 
Curious to hear more thoughts on that however, since we have a number of stack 
allocated buffers in the code, 1024 bytes or less FWICT.

Also, why 4096 instead of `BUFSIZ`?
http://www.cplusplus.com/reference/cstdio/BUFSIZ/
http://www.cplusplus.com/reference/cstdio/setbuf/

If we used `BUFSIZ` is there still a concern about stack size assumptions?


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47481/#review133624
---


On May 17, 2016, 8:36 p.m., Kevin Klues wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47481/
> ---
> 
> (Updated May 17, 2016, 8:36 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Joris Van Remoortere, and switched 
> to 'mcypark'.
> 
> 
> Bugs: MESOS-5398
> https://issues.apache.org/jira/browse/MESOS-5398
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The previous read() implementation was based on calling getline() to
> read in chunks of data from a file. This is fine for text-based files,
> but is a little strange for binary files.
> 
> The new implementation reads in chunks of raw bytes into a stack
> allocated buffer before copying them into their final location. I
> usually don't like stack allocated buffers because of their potential
> security implications when it comes to stack buffer overflows, but I
> took extra care here to make sure there won't be any.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/read.hpp 
> e1e97c1bcb7493a734fc77721a83c230b1a23724 
> 
> Diff: https://reviews.apache.org/r/47481/diff/
> 
> 
> Testing
> ---
> 
> make check -j
> 
> 
> Thanks,
> 
> Kevin Klues
> 
>



Re: Review Request 47481: Rewrote os::read() to be friendlier to reading binary data.

2016-05-17 Thread Kevin Klues


> On May 17, 2016, 8:44 p.m., Jojy Varghese wrote:
> > 3rdparty/stout/include/stout/os/read.hpp, line 113
> > 
> >
> > Does this assume a certain stack size?

It would assume a certain stack size, yes. 
Trust me, I didn't want to go this route personally. I switched to this way 
based on offline feedback.


- Kevin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47481/#review133624
---


On May 17, 2016, 8:36 p.m., Kevin Klues wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47481/
> ---
> 
> (Updated May 17, 2016, 8:36 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Joris Van Remoortere, and switched 
> to 'mcypark'.
> 
> 
> Bugs: MESOS-5398
> https://issues.apache.org/jira/browse/MESOS-5398
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The previous read() implementation was based on calling getline() to
> read in chunks of data from a file. This is fine for text-based files,
> but is a little strange for binary files.
> 
> The new implementation reads in chunks of raw bytes into a stack
> allocated buffer before copying them into their final location. I
> usually don't like stack allocated buffers because of their potential
> security implications when it comes to stack buffer overflows, but I
> took extra care here to make sure there won't be any.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/read.hpp 
> e1e97c1bcb7493a734fc77721a83c230b1a23724 
> 
> Diff: https://reviews.apache.org/r/47481/diff/
> 
> 
> Testing
> ---
> 
> make check -j
> 
> 
> Thanks,
> 
> Kevin Klues
> 
>



Re: Review Request 47481: Rewrote os::read() to be friendlier to reading binary data.

2016-05-17 Thread Jojy Varghese

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47481/#review133624
---




3rdparty/stout/include/stout/os/read.hpp (line 113)


Does this assume a certain stack size?


- Jojy Varghese


On May 17, 2016, 8:36 p.m., Kevin Klues wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47481/
> ---
> 
> (Updated May 17, 2016, 8:36 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Joris Van Remoortere, and switched 
> to 'mcypark'.
> 
> 
> Bugs: MESOS-5398
> https://issues.apache.org/jira/browse/MESOS-5398
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The previous read() implementation was based on calling getline() to
> read in chunks of data from a file. This is fine for text-based files,
> but is a little strange for binary files.
> 
> The new implementation reads in chunks of raw bytes into a stack
> allocated buffer before copying them into their final location. I
> usually don't like stack allocated buffers because of their potential
> security implications when it comes to stack buffer overflows, but I
> took extra care here to make sure there won't be any.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/read.hpp 
> e1e97c1bcb7493a734fc77721a83c230b1a23724 
> 
> Diff: https://reviews.apache.org/r/47481/diff/
> 
> 
> Testing
> ---
> 
> make check -j
> 
> 
> Thanks,
> 
> Kevin Klues
> 
>



Re: Review Request 47481: Rewrote os::read() to be friendlier to reading binary data.

2016-05-17 Thread Kevin Klues

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47481/
---

(Updated May 17, 2016, 8:36 p.m.)


Review request for mesos, Benjamin Mahler, Joris Van Remoortere, and switched 
to 'mcypark'.


Bugs: MESOS-5398
https://issues.apache.org/jira/browse/MESOS-5398


Repository: mesos


Description (updated)
---

The previous read() implementation was based on calling getline() to
read in chunks of data from a file. This is fine for text-based files,
but is a little strange for binary files.

The new implementation reads in chunks of raw bytes into a stack
allocated buffer before copying them into their final location. I
usually don't like stack allocated buffers because of their potential
security implications when it comes to stack buffer overflows, but I
took extra care here to make sure there won't be any.


Diffs (updated)
-

  3rdparty/stout/include/stout/os/read.hpp 
e1e97c1bcb7493a734fc77721a83c230b1a23724 

Diff: https://reviews.apache.org/r/47481/diff/


Testing
---

make check -j


Thanks,

Kevin Klues



Re: Review Request 47481: Rewrote os::read() to be friendlier to reading binary data.

2016-05-17 Thread Kevin Klues

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47481/
---

(Updated May 17, 2016, 8:34 p.m.)


Review request for mesos, Benjamin Mahler, Joris Van Remoortere, and switched 
to 'mcypark'.


Changes
---

Changed the logic in the read() call to copy from a stack allocated buffer 
instead of writing to the string directly. I usually don't like stack allocated 
buffers because of their potential security implications when it comes to stack 
buffer overflows, but I took extra care here to make sure there won't be any :).


Bugs: MESOS-5398
https://issues.apache.org/jira/browse/MESOS-5398


Repository: mesos


Description
---

The previous read() implementation was based on calling getline() to
read in chunks of data from a file. This is fine for text-based files,
but is a little strange for binary files.

The new implementation reads in chunks of raw bytes in exponentially
increasing chunk sizes, and works well for both text files and binary
files alike.


Diffs (updated)
-

  3rdparty/stout/include/stout/os/read.hpp 
e1e97c1bcb7493a734fc77721a83c230b1a23724 

Diff: https://reviews.apache.org/r/47481/diff/


Testing (updated)
---

make check -j


Thanks,

Kevin Klues