Re: Review Request 54996: Fix SIGBUS crash on ARM64/AArch64.

2017-01-09 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54996/#review160937
---



Patch looks great!

Reviews applied: [54996]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker_build.sh

- Mesos ReviewBot


On Jan. 9, 2017, 4:42 p.m., Aaron Wood wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54996/
> ---
> 
> (Updated Jan. 9, 2017, 4:42 p.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6835
> https://issues.apache.org/jira/browse/MESOS-6835
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently in the Linux launcher when the stack is allocated and prepared for 
> a call to clone() it is not properly aligned. This is not an issue for x86 or 
> x64 but for ARM64/AArch64 it is because of the requirement of having the 
> stack aligned to a 16 byte boundary. While x86 and x64 also expect the stack 
> to have a 16 byte aligned stack, it is not enforced. An explanation of the 
> stack and requirements for ARM64 can be found here 
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
>  (specifically section 5.2.2.1 that says SP mod 16 = 0. The stack must be 
> quad-word aligned.)
> 
> Additionally, the way that the stack is currently allocated and passed to 
> clone() accidentally chops off one entry, making a stack overflow using those 
> missing 8 bytes a possibility. Fixing this while aligning the memory will fix 
> both the issue of the stack overflow issue as well as the SIGBUS crash. We 
> should also net better performance from having the stack aligned.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/linux.hpp 530f1a55b 
>   src/linux/ns.hpp 77789717e 
> 
> Diff: https://reviews.apache.org/r/54996/diff/
> 
> 
> Testing
> ---
> 
> Built Mesos from source and am currently running it in a test cluster. 
> Launched both Docker and Mesos tasks via Marathon without any resulting crash 
> (initial crash only happened with Mesos containerizer + linux_launcher, not 
> with the posix_launcher).
> 
> `make check -j4` also passes with no errors.
> 
> 
> Thanks,
> 
> Aaron Wood
> 
>



Re: Review Request 54996: Fix SIGBUS crash on ARM64/AArch64.

2017-01-09 Thread Aaron Wood

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54996/
---

(Updated Jan. 9, 2017, 4:42 p.m.)


Review request for mesos and Jie Yu.


Bugs: MESOS-6835
https://issues.apache.org/jira/browse/MESOS-6835


Repository: mesos


Description
---

Currently in the Linux launcher when the stack is allocated and prepared for a 
call to clone() it is not properly aligned. This is not an issue for x86 or x64 
but for ARM64/AArch64 it is because of the requirement of having the stack 
aligned to a 16 byte boundary. While x86 and x64 also expect the stack to have 
a 16 byte aligned stack, it is not enforced. An explanation of the stack and 
requirements for ARM64 can be found here 
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf 
(specifically section 5.2.2.1 that says SP mod 16 = 0. The stack must be 
quad-word aligned.)

Additionally, the way that the stack is currently allocated and passed to 
clone() accidentally chops off one entry, making a stack overflow using those 
missing 8 bytes a possibility. Fixing this while aligning the memory will fix 
both the issue of the stack overflow issue as well as the SIGBUS crash. We 
should also net better performance from having the stack aligned.


Diffs (updated)
-

  3rdparty/stout/include/stout/os/linux.hpp 530f1a55b 
  src/linux/ns.hpp 77789717e 

Diff: https://reviews.apache.org/r/54996/diff/


Testing
---

Built Mesos from source and am currently running it in a test cluster. Launched 
both Docker and Mesos tasks via Marathon without any resulting crash (initial 
crash only happened with Mesos containerizer + linux_launcher, not with the 
posix_launcher).

`make check -j4` also passes with no errors.


Thanks,

Aaron Wood



Re: Review Request 54996: Fix SIGBUS crash on ARM64/AArch64.

2017-01-09 Thread James Peach


> On Jan. 7, 2017, 6:42 p.m., Jie Yu wrote:
> > 3rdparty/stout/include/stout/os/linux.hpp, line 72
> > 
> >
> > Please use errno error so that errno are in the error string.
> > 
> > `return ErrnoError("Memory allocation failed");`
> 
> Aaron Wood wrote:
> `posix_memalign` never sets `errno`. Shouldn't we just return `Error`?

You can do this:
```
int err = posix_memalign(...);
if (err) {
  return ErrnoError(err);
}
```


- James


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54996/#review160821
---


On Jan. 9, 2017, 4:19 p.m., Aaron Wood wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54996/
> ---
> 
> (Updated Jan. 9, 2017, 4:19 p.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6835
> https://issues.apache.org/jira/browse/MESOS-6835
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently in the Linux launcher when the stack is allocated and prepared for 
> a call to clone() it is not properly aligned. This is not an issue for x86 or 
> x64 but for ARM64/AArch64 it is because of the requirement of having the 
> stack aligned to a 16 byte boundary. While x86 and x64 also expect the stack 
> to have a 16 byte aligned stack, it is not enforced. An explanation of the 
> stack and requirements for ARM64 can be found here 
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
>  (specifically section 5.2.2.1 that says SP mod 16 = 0. The stack must be 
> quad-word aligned.)
> 
> Additionally, the way that the stack is currently allocated and passed to 
> clone() accidentally chops off one entry, making a stack overflow using those 
> missing 8 bytes a possibility. Fixing this while aligning the memory will fix 
> both the issue of the stack overflow issue as well as the SIGBUS crash. We 
> should also net better performance from having the stack aligned.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/linux.hpp 530f1a55b 
>   src/linux/ns.hpp 77789717e 
> 
> Diff: https://reviews.apache.org/r/54996/diff/
> 
> 
> Testing
> ---
> 
> Built Mesos from source and am currently running it in a test cluster. 
> Launched both Docker and Mesos tasks via Marathon without any resulting crash 
> (initial crash only happened with Mesos containerizer + linux_launcher, not 
> with the posix_launcher).
> 
> `make check -j4` also passes with no errors.
> 
> 
> Thanks,
> 
> Aaron Wood
> 
>



Re: Review Request 54996: Fix SIGBUS crash on ARM64/AArch64.

2017-01-09 Thread Aaron Wood

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54996/
---

(Updated Jan. 9, 2017, 4:19 p.m.)


Review request for mesos and Jie Yu.


Bugs: MESOS-6835
https://issues.apache.org/jira/browse/MESOS-6835


Repository: mesos


Description
---

Currently in the Linux launcher when the stack is allocated and prepared for a 
call to clone() it is not properly aligned. This is not an issue for x86 or x64 
but for ARM64/AArch64 it is because of the requirement of having the stack 
aligned to a 16 byte boundary. While x86 and x64 also expect the stack to have 
a 16 byte aligned stack, it is not enforced. An explanation of the stack and 
requirements for ARM64 can be found here 
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf 
(specifically section 5.2.2.1 that says SP mod 16 = 0. The stack must be 
quad-word aligned.)

Additionally, the way that the stack is currently allocated and passed to 
clone() accidentally chops off one entry, making a stack overflow using those 
missing 8 bytes a possibility. Fixing this while aligning the memory will fix 
both the issue of the stack overflow issue as well as the SIGBUS crash. We 
should also net better performance from having the stack aligned.


Diffs
-

  3rdparty/stout/include/stout/os/linux.hpp 530f1a55b 
  src/linux/ns.hpp 77789717e 

Diff: https://reviews.apache.org/r/54996/diff/


Testing (updated)
---

Built Mesos from source and am currently running it in a test cluster. Launched 
both Docker and Mesos tasks via Marathon without any resulting crash (initial 
crash only happened with Mesos containerizer + linux_launcher, not with the 
posix_launcher).

`make check -j4` also passes with no errors.


Thanks,

Aaron Wood



Re: Review Request 54996: Fix SIGBUS crash on ARM64/AArch64.

2017-01-09 Thread Aaron Wood


> On Jan. 7, 2017, 6:42 p.m., Jie Yu wrote:
> > 3rdparty/stout/include/stout/os/linux.hpp, line 72
> > 
> >
> > Please use errno error so that errno are in the error string.
> > 
> > `return ErrnoError("Memory allocation failed");`

`posix_memalign` never sets `errno`. Shouldn't we just return `Error`?


- Aaron


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54996/#review160821
---


On Jan. 4, 2017, 9:55 p.m., Aaron Wood wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54996/
> ---
> 
> (Updated Jan. 4, 2017, 9:55 p.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6835
> https://issues.apache.org/jira/browse/MESOS-6835
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently in the Linux launcher when the stack is allocated and prepared for 
> a call to clone() it is not properly aligned. This is not an issue for x86 or 
> x64 but for ARM64/AArch64 it is because of the requirement of having the 
> stack aligned to a 16 byte boundary. While x86 and x64 also expect the stack 
> to have a 16 byte aligned stack, it is not enforced. An explanation of the 
> stack and requirements for ARM64 can be found here 
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
>  (specifically section 5.2.2.1 that says SP mod 16 = 0. The stack must be 
> quad-word aligned.)
> 
> Additionally, the way that the stack is currently allocated and passed to 
> clone() accidentally chops off one entry, making a stack overflow using those 
> missing 8 bytes a possibility. Fixing this while aligning the memory will fix 
> both the issue of the stack overflow issue as well as the SIGBUS crash. We 
> should also net better performance from having the stack aligned.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/linux.hpp 530f1a55b 
>   src/linux/ns.hpp 77789717e 
> 
> Diff: https://reviews.apache.org/r/54996/diff/
> 
> 
> Testing
> ---
> 
> Built Mesos from source and am currently running it in a test cluster. 
> Launched both Docker and Mesos tasks via Marathon without any resulting crash 
> (initial crash only happened with Mesos containerizer + linux_launcher, not 
> with the posix_launcher).
> 
> 
> Thanks,
> 
> Aaron Wood
> 
>



Re: Review Request 54996: Fix SIGBUS crash on ARM64/AArch64.

2017-01-07 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54996/#review160821
---


Fix it, then Ship it!





3rdparty/stout/include/stout/os/linux.hpp (line 58)


s/DefaultSize/DEFAULT_SIZE/



3rdparty/stout/include/stout/os/linux.hpp (line 60)


It's weird that we allow copy constructor but disallow assignment operator. 
Let's just remove this.



3rdparty/stout/include/stout/os/linux.hpp (line 66)


I typically prefer one extra blank line before this comment



3rdparty/stout/include/stout/os/linux.hpp (lines 70 - 71)


I would prefer align the code like the following:
```
if (::posix_memalign(
reinterpret_cast(),
os::pagesize(),
stack.size) != 0) {
  return ErrnoError("Memory allocation failed");
}
```



3rdparty/stout/include/stout/os/linux.hpp (line 71)


please use os::pagesize() here.



3rdparty/stout/include/stout/os/linux.hpp (line 72)


Please use errno error so that errno are in the error string.

`return ErrnoError("Memory allocation failed");`


- Jie Yu


On Jan. 4, 2017, 9:55 p.m., Aaron Wood wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54996/
> ---
> 
> (Updated Jan. 4, 2017, 9:55 p.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6835
> https://issues.apache.org/jira/browse/MESOS-6835
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently in the Linux launcher when the stack is allocated and prepared for 
> a call to clone() it is not properly aligned. This is not an issue for x86 or 
> x64 but for ARM64/AArch64 it is because of the requirement of having the 
> stack aligned to a 16 byte boundary. While x86 and x64 also expect the stack 
> to have a 16 byte aligned stack, it is not enforced. An explanation of the 
> stack and requirements for ARM64 can be found here 
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
>  (specifically section 5.2.2.1 that says SP mod 16 = 0. The stack must be 
> quad-word aligned.)
> 
> Additionally, the way that the stack is currently allocated and passed to 
> clone() accidentally chops off one entry, making a stack overflow using those 
> missing 8 bytes a possibility. Fixing this while aligning the memory will fix 
> both the issue of the stack overflow issue as well as the SIGBUS crash. We 
> should also net better performance from having the stack aligned.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/linux.hpp 530f1a55b 
>   src/linux/ns.hpp 77789717e 
> 
> Diff: https://reviews.apache.org/r/54996/diff/
> 
> 
> Testing
> ---
> 
> Built Mesos from source and am currently running it in a test cluster. 
> Launched both Docker and Mesos tasks via Marathon without any resulting crash 
> (initial crash only happened with Mesos containerizer + linux_launcher, not 
> with the posix_launcher).
> 
> 
> Thanks,
> 
> Aaron Wood
> 
>



Re: Review Request 54996: Fix SIGBUS crash on ARM64/AArch64.

2017-01-05 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54996/#review160652
---



Patch looks great!

Reviews applied: [54996]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker_build.sh

- Mesos ReviewBot


On Jan. 4, 2017, 9:55 p.m., Aaron Wood wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54996/
> ---
> 
> (Updated Jan. 4, 2017, 9:55 p.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6835
> https://issues.apache.org/jira/browse/MESOS-6835
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently in the Linux launcher when the stack is allocated and prepared for 
> a call to clone() it is not properly aligned. This is not an issue for x86 or 
> x64 but for ARM64/AArch64 it is because of the requirement of having the 
> stack aligned to a 16 byte boundary. While x86 and x64 also expect the stack 
> to have a 16 byte aligned stack, it is not enforced. An explanation of the 
> stack and requirements for ARM64 can be found here 
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
>  (specifically section 5.2.2.1 that says SP mod 16 = 0. The stack must be 
> quad-word aligned.)
> 
> Additionally, the way that the stack is currently allocated and passed to 
> clone() accidentally chops off one entry, making a stack overflow using those 
> missing 8 bytes a possibility. Fixing this while aligning the memory will fix 
> both the issue of the stack overflow issue as well as the SIGBUS crash. We 
> should also net better performance from having the stack aligned.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/linux.hpp 530f1a55b 
>   src/linux/ns.hpp 77789717e 
> 
> Diff: https://reviews.apache.org/r/54996/diff/
> 
> 
> Testing
> ---
> 
> Built Mesos from source and am currently running it in a test cluster. 
> Launched both Docker and Mesos tasks via Marathon without any resulting crash 
> (initial crash only happened with Mesos containerizer + linux_launcher, not 
> with the posix_launcher).
> 
> 
> Thanks,
> 
> Aaron Wood
> 
>



Re: Review Request 54996: Fix SIGBUS crash on ARM64/AArch64.

2017-01-04 Thread Aaron Wood

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54996/
---

(Updated Jan. 4, 2017, 9:55 p.m.)


Review request for mesos and Jie Yu.


Changes
---

`posix_memalign` does not set `errno`.


Bugs: MESOS-6835
https://issues.apache.org/jira/browse/MESOS-6835


Repository: mesos


Description
---

Currently in the Linux launcher when the stack is allocated and prepared for a 
call to clone() it is not properly aligned. This is not an issue for x86 or x64 
but for ARM64/AArch64 it is because of the requirement of having the stack 
aligned to a 16 byte boundary. While x86 and x64 also expect the stack to have 
a 16 byte aligned stack, it is not enforced. An explanation of the stack and 
requirements for ARM64 can be found here 
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf 
(specifically section 5.2.2.1 that says SP mod 16 = 0. The stack must be 
quad-word aligned.)

Additionally, the way that the stack is currently allocated and passed to 
clone() accidentally chops off one entry, making a stack overflow using those 
missing 8 bytes a possibility. Fixing this while aligning the memory will fix 
both the issue of the stack overflow issue as well as the SIGBUS crash. We 
should also net better performance from having the stack aligned.


Diffs (updated)
-

  3rdparty/stout/include/stout/os/linux.hpp 530f1a55b 
  src/linux/ns.hpp 77789717e 

Diff: https://reviews.apache.org/r/54996/diff/


Testing
---

Built Mesos from source and am currently running it in a test cluster. Launched 
both Docker and Mesos tasks via Marathon without any resulting crash (initial 
crash only happened with Mesos containerizer + linux_launcher, not with the 
posix_launcher).


Thanks,

Aaron Wood



Re: Review Request 54996: Fix SIGBUS crash on ARM64/AArch64.

2017-01-04 Thread Aaron Wood

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54996/#review160541
---




3rdparty/stout/include/stout/os/linux.hpp (line 72)


Looks like `posix_memalign` never sets `errno`. Need to change this to 
return `Error`.


- Aaron Wood


On Jan. 4, 2017, 9:28 p.m., Aaron Wood wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54996/
> ---
> 
> (Updated Jan. 4, 2017, 9:28 p.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6835
> https://issues.apache.org/jira/browse/MESOS-6835
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently in the Linux launcher when the stack is allocated and prepared for 
> a call to clone() it is not properly aligned. This is not an issue for x86 or 
> x64 but for ARM64/AArch64 it is because of the requirement of having the 
> stack aligned to a 16 byte boundary. While x86 and x64 also expect the stack 
> to have a 16 byte aligned stack, it is not enforced. An explanation of the 
> stack and requirements for ARM64 can be found here 
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
>  (specifically section 5.2.2.1 that says SP mod 16 = 0. The stack must be 
> quad-word aligned.)
> 
> Additionally, the way that the stack is currently allocated and passed to 
> clone() accidentally chops off one entry, making a stack overflow using those 
> missing 8 bytes a possibility. Fixing this while aligning the memory will fix 
> both the issue of the stack overflow issue as well as the SIGBUS crash. We 
> should also net better performance from having the stack aligned.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/linux.hpp 530f1a55b 
>   src/linux/ns.hpp 77789717e 
> 
> Diff: https://reviews.apache.org/r/54996/diff/
> 
> 
> Testing
> ---
> 
> Built Mesos from source and am currently running it in a test cluster. 
> Launched both Docker and Mesos tasks via Marathon without any resulting crash 
> (initial crash only happened with Mesos containerizer + linux_launcher, not 
> with the posix_launcher).
> 
> 
> Thanks,
> 
> Aaron Wood
> 
>



Re: Review Request 54996: Fix SIGBUS crash on ARM64/AArch64.

2017-01-04 Thread Aaron Wood


> On Jan. 4, 2017, 5:41 a.m., James Peach wrote:
> > 3rdparty/stout/include/stout/os/linux.hpp, line 120
> > 
> >
> > Consider hoisting this into the `Stack` class:
> > 
> > ```
> > void * Stack::start() {
> >   return (uint8_t *)address + size;
> > }
> > ```
> 
> Aaron Wood wrote:
> Don't we want to avoid C-style casts?
> 
> James Peach wrote:
> Sure, you could `static_cast` here. Not much difference imho :)
> 
> Aaron Wood wrote:
> True. I thought it was part of the Mesos style guide which is why I ask.
> 
> Aaron Wood wrote:
> Dropping this to stick with `char*` instead.

Actually, let me mark this as fixed since I did add a `start()` method, it just 
doesn't return a `void*`.


- Aaron


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54996/#review160464
---


On Jan. 4, 2017, 9:28 p.m., Aaron Wood wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54996/
> ---
> 
> (Updated Jan. 4, 2017, 9:28 p.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6835
> https://issues.apache.org/jira/browse/MESOS-6835
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently in the Linux launcher when the stack is allocated and prepared for 
> a call to clone() it is not properly aligned. This is not an issue for x86 or 
> x64 but for ARM64/AArch64 it is because of the requirement of having the 
> stack aligned to a 16 byte boundary. While x86 and x64 also expect the stack 
> to have a 16 byte aligned stack, it is not enforced. An explanation of the 
> stack and requirements for ARM64 can be found here 
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
>  (specifically section 5.2.2.1 that says SP mod 16 = 0. The stack must be 
> quad-word aligned.)
> 
> Additionally, the way that the stack is currently allocated and passed to 
> clone() accidentally chops off one entry, making a stack overflow using those 
> missing 8 bytes a possibility. Fixing this while aligning the memory will fix 
> both the issue of the stack overflow issue as well as the SIGBUS crash. We 
> should also net better performance from having the stack aligned.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/linux.hpp 530f1a55b 
>   src/linux/ns.hpp 77789717e 
> 
> Diff: https://reviews.apache.org/r/54996/diff/
> 
> 
> Testing
> ---
> 
> Built Mesos from source and am currently running it in a test cluster. 
> Launched both Docker and Mesos tasks via Marathon without any resulting crash 
> (initial crash only happened with Mesos containerizer + linux_launcher, not 
> with the posix_launcher).
> 
> 
> Thanks,
> 
> Aaron Wood
> 
>



Re: Review Request 54996: Fix SIGBUS crash on ARM64/AArch64.

2017-01-04 Thread Aaron Wood


> On Jan. 4, 2017, 5:41 a.m., James Peach wrote:
> > 3rdparty/stout/include/stout/os/linux.hpp, line 63
> > 
> >
> > I'd recommend something like this:
> > ```
> > Try allocate(size_t nbytes) {
> > int error = ::posix_memalign(, os::getpagesize(), nbytes);
> > if (error) {
> > return ErrnoError(error);
> > }
> > 
> > return Nothing();
> > }
> > ```

While my changes are not 100% aligned with what you suggested here, I did move 
to using `getpagesize()`.


> On Jan. 4, 2017, 5:41 a.m., James Peach wrote:
> > 3rdparty/stout/include/stout/os/linux.hpp, line 60
> > 
> >
> > I don't think that you need this static constructor. You can do 
> > everything you need in `allocate`, which is them symmetric with 
> > `deallocate`.

Dropping since Jie wants to keep the private constructor.


- Aaron


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54996/#review160464
---


On Jan. 4, 2017, 9:28 p.m., Aaron Wood wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54996/
> ---
> 
> (Updated Jan. 4, 2017, 9:28 p.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6835
> https://issues.apache.org/jira/browse/MESOS-6835
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently in the Linux launcher when the stack is allocated and prepared for 
> a call to clone() it is not properly aligned. This is not an issue for x86 or 
> x64 but for ARM64/AArch64 it is because of the requirement of having the 
> stack aligned to a 16 byte boundary. While x86 and x64 also expect the stack 
> to have a 16 byte aligned stack, it is not enforced. An explanation of the 
> stack and requirements for ARM64 can be found here 
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
>  (specifically section 5.2.2.1 that says SP mod 16 = 0. The stack must be 
> quad-word aligned.)
> 
> Additionally, the way that the stack is currently allocated and passed to 
> clone() accidentally chops off one entry, making a stack overflow using those 
> missing 8 bytes a possibility. Fixing this while aligning the memory will fix 
> both the issue of the stack overflow issue as well as the SIGBUS crash. We 
> should also net better performance from having the stack aligned.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/linux.hpp 530f1a55b 
>   src/linux/ns.hpp 77789717e 
> 
> Diff: https://reviews.apache.org/r/54996/diff/
> 
> 
> Testing
> ---
> 
> Built Mesos from source and am currently running it in a test cluster. 
> Launched both Docker and Mesos tasks via Marathon without any resulting crash 
> (initial crash only happened with Mesos containerizer + linux_launcher, not 
> with the posix_launcher).
> 
> 
> Thanks,
> 
> Aaron Wood
> 
>



Re: Review Request 54996: Fix SIGBUS crash on ARM64/AArch64.

2017-01-04 Thread Aaron Wood


> On Jan. 4, 2017, 5:41 a.m., James Peach wrote:
> > 3rdparty/stout/include/stout/os/linux.hpp, line 82
> > 
> >
> > You should explictly delete the copy constructors here to ensure the 
> > stack can't be leaked if the code changes:
> > 
> > ```
> > Stack(const Stack&) = delete;
> > Stack& operator=(const Stack&) = delete;
> > ```

See comment below about why the copy constructor cannot be deleted in this case.


- Aaron


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54996/#review160464
---


On Jan. 4, 2017, 9:28 p.m., Aaron Wood wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54996/
> ---
> 
> (Updated Jan. 4, 2017, 9:28 p.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6835
> https://issues.apache.org/jira/browse/MESOS-6835
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently in the Linux launcher when the stack is allocated and prepared for 
> a call to clone() it is not properly aligned. This is not an issue for x86 or 
> x64 but for ARM64/AArch64 it is because of the requirement of having the 
> stack aligned to a 16 byte boundary. While x86 and x64 also expect the stack 
> to have a 16 byte aligned stack, it is not enforced. An explanation of the 
> stack and requirements for ARM64 can be found here 
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
>  (specifically section 5.2.2.1 that says SP mod 16 = 0. The stack must be 
> quad-word aligned.)
> 
> Additionally, the way that the stack is currently allocated and passed to 
> clone() accidentally chops off one entry, making a stack overflow using those 
> missing 8 bytes a possibility. Fixing this while aligning the memory will fix 
> both the issue of the stack overflow issue as well as the SIGBUS crash. We 
> should also net better performance from having the stack aligned.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/linux.hpp 530f1a55b 
>   src/linux/ns.hpp 77789717e 
> 
> Diff: https://reviews.apache.org/r/54996/diff/
> 
> 
> Testing
> ---
> 
> Built Mesos from source and am currently running it in a test cluster. 
> Launched both Docker and Mesos tasks via Marathon without any resulting crash 
> (initial crash only happened with Mesos containerizer + linux_launcher, not 
> with the posix_launcher).
> 
> 
> Thanks,
> 
> Aaron Wood
> 
>



Re: Review Request 54996: Fix SIGBUS crash on ARM64/AArch64.

2017-01-04 Thread Aaron Wood


> On Jan. 4, 2017, 5:41 a.m., James Peach wrote:
> > 3rdparty/stout/include/stout/os/linux.hpp, line 120
> > 
> >
> > Consider hoisting this into the `Stack` class:
> > 
> > ```
> > void * Stack::start() {
> >   return (uint8_t *)address + size;
> > }
> > ```
> 
> Aaron Wood wrote:
> Don't we want to avoid C-style casts?
> 
> James Peach wrote:
> Sure, you could `static_cast` here. Not much difference imho :)
> 
> Aaron Wood wrote:
> True. I thought it was part of the Mesos style guide which is why I ask.

Dropping this to stick with `char*` instead.


- Aaron


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54996/#review160464
---


On Jan. 4, 2017, 9:28 p.m., Aaron Wood wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54996/
> ---
> 
> (Updated Jan. 4, 2017, 9:28 p.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6835
> https://issues.apache.org/jira/browse/MESOS-6835
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently in the Linux launcher when the stack is allocated and prepared for 
> a call to clone() it is not properly aligned. This is not an issue for x86 or 
> x64 but for ARM64/AArch64 it is because of the requirement of having the 
> stack aligned to a 16 byte boundary. While x86 and x64 also expect the stack 
> to have a 16 byte aligned stack, it is not enforced. An explanation of the 
> stack and requirements for ARM64 can be found here 
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
>  (specifically section 5.2.2.1 that says SP mod 16 = 0. The stack must be 
> quad-word aligned.)
> 
> Additionally, the way that the stack is currently allocated and passed to 
> clone() accidentally chops off one entry, making a stack overflow using those 
> missing 8 bytes a possibility. Fixing this while aligning the memory will fix 
> both the issue of the stack overflow issue as well as the SIGBUS crash. We 
> should also net better performance from having the stack aligned.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/linux.hpp 530f1a55b 
>   src/linux/ns.hpp 77789717e 
> 
> Diff: https://reviews.apache.org/r/54996/diff/
> 
> 
> Testing
> ---
> 
> Built Mesos from source and am currently running it in a test cluster. 
> Launched both Docker and Mesos tasks via Marathon without any resulting crash 
> (initial crash only happened with Mesos containerizer + linux_launcher, not 
> with the posix_launcher).
> 
> 
> Thanks,
> 
> Aaron Wood
> 
>



Re: Review Request 54996: Fix SIGBUS crash on ARM64/AArch64.

2017-01-04 Thread Aaron Wood


> On Jan. 4, 2017, 5:58 p.m., Jie Yu wrote:
> > 3rdparty/stout/include/stout/os/linux.hpp, lines 57-60
> > 
> >
> > I would actually suggest keep this `create` method, but move the 
> > allocation logic in `create`. 
> > 
> > ```
> > static Try create(size_t size)
> > {
> >   Stack stack(size);
> >   
> >   if (posix_memalign(...) != 0) {
> > return ErrnoError("Failed to allocate stack");
> >   }
> >   
> >   return stack;
> > }
> > ```
> > 
> > We can get rid of the `allocate` function. Once created, it's by 
> > default allocated.
> 
> Aaron Wood wrote:
> `address` and `size` would need to be static for this. That opens up 
> another issue since all stacks would share the same data. Would it be better 
> to get rid of the private constructor, delete the copy constructor and 
> assignment, and just have `allocate()` and `deallocate()`?
> 
> Jie Yu wrote:
> Why? you can do `stack.address` and `stack.size`.  constructor is 
> private, meaning that only 'create' can construct the Stack. Removing copy 
> and assignment operator sounds good. Can you try that see if that compiles?

You're right, I was not thinking of using the local `stack` object directly.
Note that this only compiles with deleting the `=` operator since, as you had 
said, the copy constructor `Try(const U& u)` is invoked underneath.


- Aaron


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54996/#review160513
---


On Jan. 4, 2017, 9:28 p.m., Aaron Wood wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54996/
> ---
> 
> (Updated Jan. 4, 2017, 9:28 p.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6835
> https://issues.apache.org/jira/browse/MESOS-6835
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently in the Linux launcher when the stack is allocated and prepared for 
> a call to clone() it is not properly aligned. This is not an issue for x86 or 
> x64 but for ARM64/AArch64 it is because of the requirement of having the 
> stack aligned to a 16 byte boundary. While x86 and x64 also expect the stack 
> to have a 16 byte aligned stack, it is not enforced. An explanation of the 
> stack and requirements for ARM64 can be found here 
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
>  (specifically section 5.2.2.1 that says SP mod 16 = 0. The stack must be 
> quad-word aligned.)
> 
> Additionally, the way that the stack is currently allocated and passed to 
> clone() accidentally chops off one entry, making a stack overflow using those 
> missing 8 bytes a possibility. Fixing this while aligning the memory will fix 
> both the issue of the stack overflow issue as well as the SIGBUS crash. We 
> should also net better performance from having the stack aligned.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/linux.hpp 530f1a55b 
>   src/linux/ns.hpp 77789717e 
> 
> Diff: https://reviews.apache.org/r/54996/diff/
> 
> 
> Testing
> ---
> 
> Built Mesos from source and am currently running it in a test cluster. 
> Launched both Docker and Mesos tasks via Marathon without any resulting crash 
> (initial crash only happened with Mesos containerizer + linux_launcher, not 
> with the posix_launcher).
> 
> 
> Thanks,
> 
> Aaron Wood
> 
>



Re: Review Request 54996: Fix SIGBUS crash on ARM64/AArch64.

2017-01-04 Thread Aaron Wood

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54996/
---

(Updated Jan. 4, 2017, 9:28 p.m.)


Review request for mesos and Jie Yu.


Bugs: MESOS-6835
https://issues.apache.org/jira/browse/MESOS-6835


Repository: mesos


Description
---

Currently in the Linux launcher when the stack is allocated and prepared for a 
call to clone() it is not properly aligned. This is not an issue for x86 or x64 
but for ARM64/AArch64 it is because of the requirement of having the stack 
aligned to a 16 byte boundary. While x86 and x64 also expect the stack to have 
a 16 byte aligned stack, it is not enforced. An explanation of the stack and 
requirements for ARM64 can be found here 
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf 
(specifically section 5.2.2.1 that says SP mod 16 = 0. The stack must be 
quad-word aligned.)

Additionally, the way that the stack is currently allocated and passed to 
clone() accidentally chops off one entry, making a stack overflow using those 
missing 8 bytes a possibility. Fixing this while aligning the memory will fix 
both the issue of the stack overflow issue as well as the SIGBUS crash. We 
should also net better performance from having the stack aligned.


Diffs (updated)
-

  3rdparty/stout/include/stout/os/linux.hpp 530f1a55b 
  src/linux/ns.hpp 77789717e 

Diff: https://reviews.apache.org/r/54996/diff/


Testing
---

Built Mesos from source and am currently running it in a test cluster. Launched 
both Docker and Mesos tasks via Marathon without any resulting crash (initial 
crash only happened with Mesos containerizer + linux_launcher, not with the 
posix_launcher).


Thanks,

Aaron Wood



Re: Review Request 54996: Fix SIGBUS crash on ARM64/AArch64.

2017-01-04 Thread Jie Yu


> On Jan. 4, 2017, 5:58 p.m., Jie Yu wrote:
> > 3rdparty/stout/include/stout/os/linux.hpp, lines 57-60
> > 
> >
> > I would actually suggest keep this `create` method, but move the 
> > allocation logic in `create`. 
> > 
> > ```
> > static Try create(size_t size)
> > {
> >   Stack stack(size);
> >   
> >   if (posix_memalign(...) != 0) {
> > return ErrnoError("Failed to allocate stack");
> >   }
> >   
> >   return stack;
> > }
> > ```
> > 
> > We can get rid of the `allocate` function. Once created, it's by 
> > default allocated.
> 
> Aaron Wood wrote:
> `address` and `size` would need to be static for this. That opens up 
> another issue since all stacks would share the same data. Would it be better 
> to get rid of the private constructor, delete the copy constructor and 
> assignment, and just have `allocate()` and `deallocate()`?

Why? you can do `stack.address` and `stack.size`.  constructor is private, 
meaning that only 'create' can construct the Stack. Removing copy and 
assignment operator sounds good. Can you try that see if that compiles?


- Jie


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54996/#review160513
---


On Jan. 4, 2017, 12:26 a.m., Aaron Wood wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54996/
> ---
> 
> (Updated Jan. 4, 2017, 12:26 a.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6835
> https://issues.apache.org/jira/browse/MESOS-6835
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently in the Linux launcher when the stack is allocated and prepared for 
> a call to clone() it is not properly aligned. This is not an issue for x86 or 
> x64 but for ARM64/AArch64 it is because of the requirement of having the 
> stack aligned to a 16 byte boundary. While x86 and x64 also expect the stack 
> to have a 16 byte aligned stack, it is not enforced. An explanation of the 
> stack and requirements for ARM64 can be found here 
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
>  (specifically section 5.2.2.1 that says SP mod 16 = 0. The stack must be 
> quad-word aligned.)
> 
> Additionally, the way that the stack is currently allocated and passed to 
> clone() accidentally chops off one entry, making a stack overflow using those 
> missing 8 bytes a possibility. Fixing this while aligning the memory will fix 
> both the issue of the stack overflow issue as well as the SIGBUS crash. We 
> should also net better performance from having the stack aligned.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/linux.hpp 530f1a55b 
>   src/linux/ns.hpp 77789717e 
> 
> Diff: https://reviews.apache.org/r/54996/diff/
> 
> 
> Testing
> ---
> 
> Built Mesos from source and am currently running it in a test cluster. 
> Launched both Docker and Mesos tasks via Marathon without any resulting crash 
> (initial crash only happened with Mesos containerizer + linux_launcher, not 
> with the posix_launcher).
> 
> 
> Thanks,
> 
> Aaron Wood
> 
>



Re: Review Request 54996: Fix SIGBUS crash on ARM64/AArch64.

2017-01-04 Thread Aaron Wood


> On Jan. 4, 2017, 5:58 p.m., Jie Yu wrote:
> > 3rdparty/stout/include/stout/os/linux.hpp, lines 57-60
> > 
> >
> > I would actually suggest keep this `create` method, but move the 
> > allocation logic in `create`. 
> > 
> > ```
> > static Try create(size_t size)
> > {
> >   Stack stack(size);
> >   
> >   if (posix_memalign(...) != 0) {
> > return ErrnoError("Failed to allocate stack");
> >   }
> >   
> >   return stack;
> > }
> > ```
> > 
> > We can get rid of the `allocate` function. Once created, it's by 
> > default allocated.

`address` and `size` would need to be static for this. That opens up another 
issue since all stacks would share the same data. Would it be better to get rid 
of the private constructor, delete the copy constructor and assignment, and 
just have `allocate()` and `deallocate()`?


- Aaron


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54996/#review160513
---


On Jan. 4, 2017, 12:26 a.m., Aaron Wood wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54996/
> ---
> 
> (Updated Jan. 4, 2017, 12:26 a.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6835
> https://issues.apache.org/jira/browse/MESOS-6835
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently in the Linux launcher when the stack is allocated and prepared for 
> a call to clone() it is not properly aligned. This is not an issue for x86 or 
> x64 but for ARM64/AArch64 it is because of the requirement of having the 
> stack aligned to a 16 byte boundary. While x86 and x64 also expect the stack 
> to have a 16 byte aligned stack, it is not enforced. An explanation of the 
> stack and requirements for ARM64 can be found here 
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
>  (specifically section 5.2.2.1 that says SP mod 16 = 0. The stack must be 
> quad-word aligned.)
> 
> Additionally, the way that the stack is currently allocated and passed to 
> clone() accidentally chops off one entry, making a stack overflow using those 
> missing 8 bytes a possibility. Fixing this while aligning the memory will fix 
> both the issue of the stack overflow issue as well as the SIGBUS crash. We 
> should also net better performance from having the stack aligned.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/linux.hpp 530f1a55b 
>   src/linux/ns.hpp 77789717e 
> 
> Diff: https://reviews.apache.org/r/54996/diff/
> 
> 
> Testing
> ---
> 
> Built Mesos from source and am currently running it in a test cluster. 
> Launched both Docker and Mesos tasks via Marathon without any resulting crash 
> (initial crash only happened with Mesos containerizer + linux_launcher, not 
> with the posix_launcher).
> 
> 
> Thanks,
> 
> Aaron Wood
> 
>



Re: Review Request 54996: Fix SIGBUS crash on ARM64/AArch64.

2017-01-04 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54996/#review160513
---




3rdparty/stout/include/stout/os/linux.hpp (lines 57 - 60)


I would actually suggest keep this `create` method, but move the allocation 
logic in `create`. 

```
static Try create(size_t size)
{
  Stack stack(size);
  
  if (posix_memalign(...) != 0) {
return ErrnoError("Failed to allocate stack");
  }
  
  return stack;
}
```

We can get rid of the `allocate` function. Once created, it's by default 
allocated.



3rdparty/stout/include/stout/os/linux.hpp (line 58)


Please refer to our style guide:
https://github.com/apache/mesos/blob/master/docs/c%2B%2B-style-guide.md

You should move the tailing `{` to the next line. Ditto everywhere else.



3rdparty/stout/include/stout/os/linux.hpp (lines 74 - 76)


add a space after `if`



3rdparty/stout/include/stout/os/linux.hpp (line 75)


+1 to James's suggestion.



3rdparty/stout/include/stout/os/linux.hpp (lines 102 - 111)


I would simply the logics here:
```
if (stack.isNone()) {
  Try _stack = Stack::create(...);
  if (_stack.isError()) {
return -1;
  }
  
  stack = _stack.get();
  cleanup = true;
}

char* address = stack->address + stack->size;

...

if (cleanup && ...) {
  stack->deallocate();
}
```



3rdparty/stout/include/stout/os/linux.hpp (line 118)


Let's try to use `char*` consistently. Pointer math on void pointers is 
disallowed in both C and C++.



src/linux/ns.hpp (lines 432 - 439)


```
Try stack = os::Stack::create(...);
if (stack.isError()) {
  return Error("Failed to allocate stack: " + stack.error());
}

...
```


- Jie Yu


On Jan. 4, 2017, 12:26 a.m., Aaron Wood wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54996/
> ---
> 
> (Updated Jan. 4, 2017, 12:26 a.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6835
> https://issues.apache.org/jira/browse/MESOS-6835
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently in the Linux launcher when the stack is allocated and prepared for 
> a call to clone() it is not properly aligned. This is not an issue for x86 or 
> x64 but for ARM64/AArch64 it is because of the requirement of having the 
> stack aligned to a 16 byte boundary. While x86 and x64 also expect the stack 
> to have a 16 byte aligned stack, it is not enforced. An explanation of the 
> stack and requirements for ARM64 can be found here 
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
>  (specifically section 5.2.2.1 that says SP mod 16 = 0. The stack must be 
> quad-word aligned.)
> 
> Additionally, the way that the stack is currently allocated and passed to 
> clone() accidentally chops off one entry, making a stack overflow using those 
> missing 8 bytes a possibility. Fixing this while aligning the memory will fix 
> both the issue of the stack overflow issue as well as the SIGBUS crash. We 
> should also net better performance from having the stack aligned.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/linux.hpp 530f1a55b 
>   src/linux/ns.hpp 77789717e 
> 
> Diff: https://reviews.apache.org/r/54996/diff/
> 
> 
> Testing
> ---
> 
> Built Mesos from source and am currently running it in a test cluster. 
> Launched both Docker and Mesos tasks via Marathon without any resulting crash 
> (initial crash only happened with Mesos containerizer + linux_launcher, not 
> with the posix_launcher).
> 
> 
> Thanks,
> 
> Aaron Wood
> 
>



Re: Review Request 54996: Fix SIGBUS crash on ARM64/AArch64.

2017-01-04 Thread Aaron Wood


> On Jan. 4, 2017, 5:41 a.m., James Peach wrote:
> > 3rdparty/stout/include/stout/os/linux.hpp, line 120
> > 
> >
> > Consider hoisting this into the `Stack` class:
> > 
> > ```
> > void * Stack::start() {
> >   return (uint8_t *)address + size;
> > }
> > ```
> 
> Aaron Wood wrote:
> Don't we want to avoid C-style casts?
> 
> James Peach wrote:
> Sure, you could `static_cast` here. Not much difference imho :)

True. I thought it was part of the Mesos style guide which is why I ask.


- Aaron


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54996/#review160464
---


On Jan. 4, 2017, 12:26 a.m., Aaron Wood wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54996/
> ---
> 
> (Updated Jan. 4, 2017, 12:26 a.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6835
> https://issues.apache.org/jira/browse/MESOS-6835
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently in the Linux launcher when the stack is allocated and prepared for 
> a call to clone() it is not properly aligned. This is not an issue for x86 or 
> x64 but for ARM64/AArch64 it is because of the requirement of having the 
> stack aligned to a 16 byte boundary. While x86 and x64 also expect the stack 
> to have a 16 byte aligned stack, it is not enforced. An explanation of the 
> stack and requirements for ARM64 can be found here 
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
>  (specifically section 5.2.2.1 that says SP mod 16 = 0. The stack must be 
> quad-word aligned.)
> 
> Additionally, the way that the stack is currently allocated and passed to 
> clone() accidentally chops off one entry, making a stack overflow using those 
> missing 8 bytes a possibility. Fixing this while aligning the memory will fix 
> both the issue of the stack overflow issue as well as the SIGBUS crash. We 
> should also net better performance from having the stack aligned.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/linux.hpp 530f1a55b 
>   src/linux/ns.hpp 77789717e 
> 
> Diff: https://reviews.apache.org/r/54996/diff/
> 
> 
> Testing
> ---
> 
> Built Mesos from source and am currently running it in a test cluster. 
> Launched both Docker and Mesos tasks via Marathon without any resulting crash 
> (initial crash only happened with Mesos containerizer + linux_launcher, not 
> with the posix_launcher).
> 
> 
> Thanks,
> 
> Aaron Wood
> 
>



Re: Review Request 54996: Fix SIGBUS crash on ARM64/AArch64.

2017-01-04 Thread James Peach


> On Jan. 4, 2017, 5:41 a.m., James Peach wrote:
> > 3rdparty/stout/include/stout/os/linux.hpp, line 120
> > 
> >
> > Consider hoisting this into the `Stack` class:
> > 
> > ```
> > void * Stack::start() {
> >   return (uint8_t *)address + size;
> > }
> > ```
> 
> Aaron Wood wrote:
> Don't we want to avoid C-style casts?

Sure, you could `static_cast` here. Not much difference imho :)


- James


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54996/#review160464
---


On Jan. 4, 2017, 12:26 a.m., Aaron Wood wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54996/
> ---
> 
> (Updated Jan. 4, 2017, 12:26 a.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6835
> https://issues.apache.org/jira/browse/MESOS-6835
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently in the Linux launcher when the stack is allocated and prepared for 
> a call to clone() it is not properly aligned. This is not an issue for x86 or 
> x64 but for ARM64/AArch64 it is because of the requirement of having the 
> stack aligned to a 16 byte boundary. While x86 and x64 also expect the stack 
> to have a 16 byte aligned stack, it is not enforced. An explanation of the 
> stack and requirements for ARM64 can be found here 
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
>  (specifically section 5.2.2.1 that says SP mod 16 = 0. The stack must be 
> quad-word aligned.)
> 
> Additionally, the way that the stack is currently allocated and passed to 
> clone() accidentally chops off one entry, making a stack overflow using those 
> missing 8 bytes a possibility. Fixing this while aligning the memory will fix 
> both the issue of the stack overflow issue as well as the SIGBUS crash. We 
> should also net better performance from having the stack aligned.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/linux.hpp 530f1a55b 
>   src/linux/ns.hpp 77789717e 
> 
> Diff: https://reviews.apache.org/r/54996/diff/
> 
> 
> Testing
> ---
> 
> Built Mesos from source and am currently running it in a test cluster. 
> Launched both Docker and Mesos tasks via Marathon without any resulting crash 
> (initial crash only happened with Mesos containerizer + linux_launcher, not 
> with the posix_launcher).
> 
> 
> Thanks,
> 
> Aaron Wood
> 
>



Re: Review Request 54996: Fix SIGBUS crash on ARM64/AArch64.

2017-01-04 Thread Aaron Wood


> On Jan. 4, 2017, 5:41 a.m., James Peach wrote:
> > 3rdparty/stout/include/stout/os/linux.hpp, line 120
> > 
> >
> > Consider hoisting this into the `Stack` class:
> > 
> > ```
> > void * Stack::start() {
> >   return (uint8_t *)address + size;
> > }
> > ```

Don't we want to avoid C-style casts?


- Aaron


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54996/#review160464
---


On Jan. 4, 2017, 12:26 a.m., Aaron Wood wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54996/
> ---
> 
> (Updated Jan. 4, 2017, 12:26 a.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6835
> https://issues.apache.org/jira/browse/MESOS-6835
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently in the Linux launcher when the stack is allocated and prepared for 
> a call to clone() it is not properly aligned. This is not an issue for x86 or 
> x64 but for ARM64/AArch64 it is because of the requirement of having the 
> stack aligned to a 16 byte boundary. While x86 and x64 also expect the stack 
> to have a 16 byte aligned stack, it is not enforced. An explanation of the 
> stack and requirements for ARM64 can be found here 
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
>  (specifically section 5.2.2.1 that says SP mod 16 = 0. The stack must be 
> quad-word aligned.)
> 
> Additionally, the way that the stack is currently allocated and passed to 
> clone() accidentally chops off one entry, making a stack overflow using those 
> missing 8 bytes a possibility. Fixing this while aligning the memory will fix 
> both the issue of the stack overflow issue as well as the SIGBUS crash. We 
> should also net better performance from having the stack aligned.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/linux.hpp 530f1a55b 
>   src/linux/ns.hpp 77789717e 
> 
> Diff: https://reviews.apache.org/r/54996/diff/
> 
> 
> Testing
> ---
> 
> Built Mesos from source and am currently running it in a test cluster. 
> Launched both Docker and Mesos tasks via Marathon without any resulting crash 
> (initial crash only happened with Mesos containerizer + linux_launcher, not 
> with the posix_launcher).
> 
> 
> Thanks,
> 
> Aaron Wood
> 
>



Re: Review Request 54996: Fix SIGBUS crash on ARM64/AArch64.

2017-01-03 Thread James Peach

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54996/#review160464
---




3rdparty/stout/include/stout/os/linux.hpp (line 60)


I don't think that you need this static constructor. You can do everything 
you need in `allocate`, which is them symmetric with `deallocate`.



3rdparty/stout/include/stout/os/linux.hpp (line 63)


I'd recommend something like this:
```
Try allocate(size_t nbytes) {
int error = ::posix_memalign(, os::getpagesize(), nbytes);
if (error) {
return ErrnoError(error);
}

return Nothing();
}
```



3rdparty/stout/include/stout/os/linux.hpp (line 75)


You can simplify this to just:
```
void deallocate() {
  ::free(address);
  
  address = nullptr;
  size = 0;
}
```



3rdparty/stout/include/stout/os/linux.hpp (line 79)


You can avoid the `reinterpret_cast` by typing this as `void *`. If you add 
the `start()` method suggested below, you can also make it private.



3rdparty/stout/include/stout/os/linux.hpp (line 82)


You should explictly delete the copy constructors here to ensure the stack 
can't be leaked if the code changes:

```
Stack(const Stack&) = delete;
Stack& operator=(const Stack&) = delete;
```



3rdparty/stout/include/stout/os/linux.hpp (line 105)


Since this magic number appears twice I'd be inclined to hoist it into 
`Stack`:
```
class Stack {
  ...
  static constexpr size_t DefaultSize = 8 * 1024 * 1024;
  ...
};
```



3rdparty/stout/include/stout/os/linux.hpp (line 118)


Consider hoisting this into the `Stack` class:

```
void * Stack::start() {
  return (uint8_t *)address + size;
}
```


- James Peach


On Jan. 4, 2017, 12:26 a.m., Aaron Wood wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54996/
> ---
> 
> (Updated Jan. 4, 2017, 12:26 a.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6835
> https://issues.apache.org/jira/browse/MESOS-6835
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently in the Linux launcher when the stack is allocated and prepared for 
> a call to clone() it is not properly aligned. This is not an issue for x86 or 
> x64 but for ARM64/AArch64 it is because of the requirement of having the 
> stack aligned to a 16 byte boundary. While x86 and x64 also expect the stack 
> to have a 16 byte aligned stack, it is not enforced. An explanation of the 
> stack and requirements for ARM64 can be found here 
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
>  (specifically section 5.2.2.1 that says SP mod 16 = 0. The stack must be 
> quad-word aligned.)
> 
> Additionally, the way that the stack is currently allocated and passed to 
> clone() accidentally chops off one entry, making a stack overflow using those 
> missing 8 bytes a possibility. Fixing this while aligning the memory will fix 
> both the issue of the stack overflow issue as well as the SIGBUS crash. We 
> should also net better performance from having the stack aligned.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/linux.hpp 530f1a55b 
>   src/linux/ns.hpp 77789717e 
> 
> Diff: https://reviews.apache.org/r/54996/diff/
> 
> 
> Testing
> ---
> 
> Built Mesos from source and am currently running it in a test cluster. 
> Launched both Docker and Mesos tasks via Marathon without any resulting crash 
> (initial crash only happened with Mesos containerizer + linux_launcher, not 
> with the posix_launcher).
> 
> 
> Thanks,
> 
> Aaron Wood
> 
>



Re: Review Request 54996: Fix SIGBUS crash on ARM64/AArch64.

2017-01-03 Thread Aaron Wood

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54996/#review160454
---




3rdparty/stout/include/stout/os/linux.hpp (line 68)


Anyone see a way around this reinterpret_cast?


- Aaron Wood


On Jan. 4, 2017, 12:26 a.m., Aaron Wood wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54996/
> ---
> 
> (Updated Jan. 4, 2017, 12:26 a.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6835
> https://issues.apache.org/jira/browse/MESOS-6835
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently in the Linux launcher when the stack is allocated and prepared for 
> a call to clone() it is not properly aligned. This is not an issue for x86 or 
> x64 but for ARM64/AArch64 it is because of the requirement of having the 
> stack aligned to a 16 byte boundary. While x86 and x64 also expect the stack 
> to have a 16 byte aligned stack, it is not enforced. An explanation of the 
> stack and requirements for ARM64 can be found here 
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
>  (specifically section 5.2.2.1 that says SP mod 16 = 0. The stack must be 
> quad-word aligned.)
> 
> Additionally, the way that the stack is currently allocated and passed to 
> clone() accidentally chops off one entry, making a stack overflow using those 
> missing 8 bytes a possibility. Fixing this while aligning the memory will fix 
> both the issue of the stack overflow issue as well as the SIGBUS crash. We 
> should also net better performance from having the stack aligned.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/linux.hpp 530f1a55b 
>   src/linux/ns.hpp 77789717e 
> 
> Diff: https://reviews.apache.org/r/54996/diff/
> 
> 
> Testing
> ---
> 
> Built Mesos from source and am currently running it in a test cluster. 
> Launched both Docker and Mesos tasks via Marathon without any resulting crash 
> (initial crash only happened with Mesos containerizer + linux_launcher, not 
> with the posix_launcher).
> 
> 
> Thanks,
> 
> Aaron Wood
> 
>



Re: Review Request 54996: Fix SIGBUS crash on ARM64/AArch64.

2017-01-03 Thread Aaron Wood

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54996/
---

(Updated Jan. 4, 2017, 12:26 a.m.)


Review request for mesos and Jie Yu.


Changes
---

Created stack class so that the logic for creating an aligned stack can be 
shared. Used this new class in another area of the codebase where a stack was 
being allocated.


Bugs: MESOS-6835
https://issues.apache.org/jira/browse/MESOS-6835


Repository: mesos


Description
---

Currently in the Linux launcher when the stack is allocated and prepared for a 
call to clone() it is not properly aligned. This is not an issue for x86 or x64 
but for ARM64/AArch64 it is because of the requirement of having the stack 
aligned to a 16 byte boundary. While x86 and x64 also expect the stack to have 
a 16 byte aligned stack, it is not enforced. An explanation of the stack and 
requirements for ARM64 can be found here 
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf 
(specifically section 5.2.2.1 that says SP mod 16 = 0. The stack must be 
quad-word aligned.)

Additionally, the way that the stack is currently allocated and passed to 
clone() accidentally chops off one entry, making a stack overflow using those 
missing 8 bytes a possibility. Fixing this while aligning the memory will fix 
both the issue of the stack overflow issue as well as the SIGBUS crash. We 
should also net better performance from having the stack aligned.


Diffs (updated)
-

  3rdparty/stout/include/stout/os/linux.hpp 530f1a55b 
  src/linux/ns.hpp 77789717e 

Diff: https://reviews.apache.org/r/54996/diff/


Testing
---

Built Mesos from source and am currently running it in a test cluster. Launched 
both Docker and Mesos tasks via Marathon without any resulting crash (initial 
crash only happened with Mesos containerizer + linux_launcher, not with the 
posix_launcher).


Thanks,

Aaron Wood



Re: Review Request 54996: Fix SIGBUS crash on ARM64/AArch64.

2016-12-29 Thread Aaron Wood


> On Dec. 27, 2016, 1:25 p.m., Till Toenshoff wrote:
> > 3rdparty/stout/include/stout/os/linux.hpp, line 81
> > 
> >
> > As you have no further use for the returned value, I guess we could 
> > simply spare the temporary and make it a bit more concise and also 
> > C++'esque by prventing the C-style cast;
> > 
> > ```
> >   if (posix_memalign(static_cast(>address), 16, 
> > stack->size) !=
> >   0) {
> > return -1;
> >   }
> > ```

Sure, I will address this in combination with Jie's comments next week!


- Aaron


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54996/#review160159
---


On Dec. 29, 2016, 7:03 p.m., Aaron Wood wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54996/
> ---
> 
> (Updated Dec. 29, 2016, 7:03 p.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6835
> https://issues.apache.org/jira/browse/MESOS-6835
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently in the Linux launcher when the stack is allocated and prepared for 
> a call to clone() it is not properly aligned. This is not an issue for x86 or 
> x64 but for ARM64/AArch64 it is because of the requirement of having the 
> stack aligned to a 16 byte boundary. While x86 and x64 also expect the stack 
> to have a 16 byte aligned stack, it is not enforced. An explanation of the 
> stack and requirements for ARM64 can be found here 
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
>  (specifically section 5.2.2.1 that says SP mod 16 = 0. The stack must be 
> quad-word aligned.)
> 
> Additionally, the way that the stack is currently allocated and passed to 
> clone() accidentally chops off one entry, making a stack overflow using those 
> missing 8 bytes a possibility. Fixing this while aligning the memory will fix 
> both the issue of the stack overflow issue as well as the SIGBUS crash. We 
> should also net better performance from having the stack aligned.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/linux.hpp 530f1a55b 
> 
> Diff: https://reviews.apache.org/r/54996/diff/
> 
> 
> Testing
> ---
> 
> Built Mesos from source and am currently running it in a test cluster. 
> Launched both Docker and Mesos tasks via Marathon without any resulting crash 
> (initial crash only happened with Mesos containerizer + linux_launcher, not 
> with the posix_launcher).
> 
> 
> Thanks,
> 
> Aaron Wood
> 
>



Re: Review Request 54996: Fix SIGBUS crash on ARM64/AArch64.

2016-12-29 Thread Aaron Wood

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54996/
---

(Updated Dec. 29, 2016, 7:03 p.m.)


Review request for mesos and Jie Yu.


Bugs: MESOS-6835
https://issues.apache.org/jira/browse/MESOS-6835


Repository: mesos


Description (updated)
---

Currently in the Linux launcher when the stack is allocated and prepared for a 
call to clone() it is not properly aligned. This is not an issue for x86 or x64 
but for ARM64/AArch64 it is because of the requirement of having the stack 
aligned to a 16 byte boundary. While x86 and x64 also expect the stack to have 
a 16 byte aligned stack, it is not enforced. An explanation of the stack and 
requirements for ARM64 can be found here 
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf 
(specifically section 5.2.2.1 that says SP mod 16 = 0. The stack must be 
quad-word aligned.)

Additionally, the way that the stack is currently allocated and passed to 
clone() accidentally chops off one entry, making a stack overflow using those 
missing 8 bytes a possibility. Fixing this while aligning the memory will fix 
both the issue of the stack overflow issue as well as the SIGBUS crash. We 
should also net better performance from having the stack aligned.


Diffs
-

  3rdparty/stout/include/stout/os/linux.hpp 530f1a55b 

Diff: https://reviews.apache.org/r/54996/diff/


Testing
---

Built Mesos from source and am currently running it in a test cluster. Launched 
both Docker and Mesos tasks via Marathon without any resulting crash (initial 
crash only happened with Mesos containerizer + linux_launcher, not with the 
posix_launcher).


Thanks,

Aaron Wood



Re: Review Request 54996: Fix SIGBUS crash on ARM64/AArch64.

2016-12-29 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54996/#review160279
---




3rdparty/stout/include/stout/os/linux.hpp (lines 76 - 77)


`Stack` is also used in src/linux/ns.hpp. We need to update there as well. 
Would you mind running a make check (or sudo make check) after applying this 
patch? That'll expose the error.

I would probably move all the allocation/deallocation logic to `os::Stack` 
so that we don't have to impl this multiple times.
```
class Stack
{
public:
  // Allocate a stack.
  static Try create(size_t size);
  
  // Explicitly free the stack. The destructor
  // won't free the allocated stack.
  void deallocate();

private:
  explicit Stack(size_t size);
  
  size_t size;
  char* address;
};
```


- Jie Yu


On Dec. 22, 2016, 9:24 p.m., Aaron Wood wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54996/
> ---
> 
> (Updated Dec. 22, 2016, 9:24 p.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6835
> https://issues.apache.org/jira/browse/MESOS-6835
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently in the Linux launcher when the stack is allocated and prepared for 
> a call to clone() it is not properly aligned. This is not an issue for x86 or 
> x64 but for ARM64/AArch64 it is because of the requirement of having the 
> stack aligned to a 16 byte boundary. While x86 and x64 also expect the stack 
> to have a 16 byte aligned stack, it is not enforced. An explanation of the 
> stack and requirements for ARM64 can be found here 
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
>  (specifically section 5.2.2.1 that says SP mod 16 = 0. The stack must be 
> quad-word aligned.)
> 
> Additionally, the way that the stack is currently allocated and passed to 
> clone() accidentally chops off one entry, making a stack overflow using those 
> missing 8 bytes a possibility. Fixing this while aligning the memory will fix 
> both the issue of the stack overflow issue as well as the SIGBUS crash.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/linux.hpp 530f1a55b 
> 
> Diff: https://reviews.apache.org/r/54996/diff/
> 
> 
> Testing
> ---
> 
> Built Mesos from source and am currently running it in a test cluster. 
> Launched both Docker and Mesos tasks via Marathon without any resulting crash 
> (initial crash only happened with Mesos containerizer + linux_launcher, not 
> with the posix_launcher).
> 
> 
> Thanks,
> 
> Aaron Wood
> 
>



Re: Review Request 54996: Fix SIGBUS crash on ARM64/AArch64.

2016-12-28 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54996/#review160269
---



Patch looks great!

Reviews applied: [54996]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker_build.sh

- Mesos ReviewBot


On Dec. 22, 2016, 9:24 p.m., Aaron Wood wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54996/
> ---
> 
> (Updated Dec. 22, 2016, 9:24 p.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6835
> https://issues.apache.org/jira/browse/MESOS-6835
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently in the Linux launcher when the stack is allocated and prepared for 
> a call to clone() it is not properly aligned. This is not an issue for x86 or 
> x64 but for ARM64/AArch64 it is because of the requirement of having the 
> stack aligned to a 16 byte boundary. While x86 and x64 also expect the stack 
> to have a 16 byte aligned stack, it is not enforced. An explanation of the 
> stack and requirements for ARM64 can be found here 
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
>  (specifically section 5.2.2.1 that says SP mod 16 = 0. The stack must be 
> quad-word aligned.)
> 
> Additionally, the way that the stack is currently allocated and passed to 
> clone() accidentally chops off one entry, making a stack overflow using those 
> missing 8 bytes a possibility. Fixing this while aligning the memory will fix 
> both the issue of the stack overflow issue as well as the SIGBUS crash.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/linux.hpp 530f1a55b 
> 
> Diff: https://reviews.apache.org/r/54996/diff/
> 
> 
> Testing
> ---
> 
> Built Mesos from source and am currently running it in a test cluster. 
> Launched both Docker and Mesos tasks via Marathon without any resulting crash 
> (initial crash only happened with Mesos containerizer + linux_launcher, not 
> with the posix_launcher).
> 
> 
> Thanks,
> 
> Aaron Wood
> 
>



Re: Review Request 54996: Fix SIGBUS crash on ARM64/AArch64.

2016-12-27 Thread Till Toenshoff

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54996/#review160159
---




3rdparty/stout/include/stout/os/linux.hpp (line 81)


As you have no further use for the returned value, I guess we could simply 
spare the temporary and make it a bit more concise and also C++'esque by 
prventing the C-style cast;

```
  if (posix_memalign(static_cast(>address), 16, stack->size) 
!=
  0) {
return -1;
  }
```


- Till Toenshoff


On Dec. 22, 2016, 9:24 p.m., Aaron Wood wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54996/
> ---
> 
> (Updated Dec. 22, 2016, 9:24 p.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6835
> https://issues.apache.org/jira/browse/MESOS-6835
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently in the Linux launcher when the stack is allocated and prepared for 
> a call to clone() it is not properly aligned. This is not an issue for x86 or 
> x64 but for ARM64/AArch64 it is because of the requirement of having the 
> stack aligned to a 16 byte boundary. While x86 and x64 also expect the stack 
> to have a 16 byte aligned stack, it is not enforced. An explanation of the 
> stack and requirements for ARM64 can be found here 
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
>  (specifically section 5.2.2.1 that says SP mod 16 = 0. The stack must be 
> quad-word aligned.)
> 
> Additionally, the way that the stack is currently allocated and passed to 
> clone() accidentally chops off one entry, making a stack overflow using those 
> missing 8 bytes a possibility. Fixing this while aligning the memory will fix 
> both the issue of the stack overflow issue as well as the SIGBUS crash.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/linux.hpp 530f1a55b 
> 
> Diff: https://reviews.apache.org/r/54996/diff/
> 
> 
> Testing
> ---
> 
> Built Mesos from source and am currently running it in a test cluster. 
> Launched both Docker and Mesos tasks via Marathon without any resulting crash 
> (initial crash only happened with Mesos containerizer + linux_launcher, not 
> with the posix_launcher).
> 
> 
> Thanks,
> 
> Aaron Wood
> 
>



Re: Review Request 54996: Fix SIGBUS crash on ARM64/AArch64.

2016-12-22 Thread Aaron Wood

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54996/
---

(Updated Dec. 22, 2016, 9:24 p.m.)


Review request for mesos and Jie Yu.


Changes
---

Referenced ARM documentation for the required stack alignment.


Bugs: MESOS-6835
https://issues.apache.org/jira/browse/MESOS-6835


Repository: mesos


Description (updated)
---

Currently in the Linux launcher when the stack is allocated and prepared for a 
call to clone() it is not properly aligned. This is not an issue for x86 or x64 
but for ARM64/AArch64 it is because of the requirement of having the stack 
aligned to a 16 byte boundary. While x86 and x64 also expect the stack to have 
a 16 byte aligned stack, it is not enforced. An explanation of the stack and 
requirements for ARM64 can be found here 
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf 
(specifically section 5.2.2.1 that says SP mod 16 = 0. The stack must be 
quad-word aligned.)

Additionally, the way that the stack is currently allocated and passed to 
clone() accidentally chops off one entry, making a stack overflow using those 
missing 8 bytes a possibility. Fixing this while aligning the memory will fix 
both the issue of the stack overflow issue as well as the SIGBUS crash.


Diffs
-

  3rdparty/stout/include/stout/os/linux.hpp 530f1a55b 

Diff: https://reviews.apache.org/r/54996/diff/


Testing
---

Built Mesos from source and am currently running it in a test cluster. Launched 
both Docker and Mesos tasks via Marathon without any resulting crash (initial 
crash only happened with Mesos containerizer + linux_launcher, not with the 
posix_launcher).


Thanks,

Aaron Wood



Review Request 54996: Fix SIGBUS crash on ARM64/AArch64.

2016-12-22 Thread Aaron Wood

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54996/
---

Review request for mesos and Jie Yu.


Bugs: MESOS-6835
https://issues.apache.org/jira/browse/MESOS-6835


Repository: mesos


Description
---

Currently in the Linux launcher when the stack is allocated and prepared for a 
call to clone() it is not properly aligned. This is not an issue for x86 or x64 
but for ARM64/AArch64 it is because of the requirement of having the stack 
aligned to a 16 byte boundary. While x86 and x64 also expect the stack to have 
a 16 byte aligned stack, it is not enforced.

Additionally, the way that the stack is currently allocated and passed to 
clone() accidentally chops off one entry, making a stack overflow using those 
missing 8 bytes a possibility. Fixing this while aligning the memory will fix 
both the issue of the stack overflow issue as well as the SIGBUS crash.


Diffs
-

  3rdparty/stout/include/stout/os/linux.hpp 530f1a55b 

Diff: https://reviews.apache.org/r/54996/diff/


Testing
---

Built Mesos from source and am currently running it in a test cluster. Launched 
both Docker and Mesos tasks via Marathon without any resulting crash (initial 
crash only happened with Mesos containerizer + linux_launcher, not with the 
posix_launcher).


Thanks,

Aaron Wood