Re: macOS 10.12, broken PTHREAD_CANCEL_DISABLE and UNIX certification

2016-11-07 Thread Shware Systems
Given frames 6 and 7, it looks like write() is calling pthread_exit() directly,
rather than pthread_cancel(), so that would be where the bug is, unless write()
is required to exit in that particular circumstance. If it has to exit, then it
looks like the setup code necessary to avoid that is missing from the main
thread's code. As I don't see the main thread reading the data the other thread
writes, it may be that write() is generating a SIGPIPE, due to EPIPE, that is
unblocked and has a default action of T (abnormal termination of the process).
I expect this would occur after an arbitrary time with no read() by any thread
on the read descriptor; just keeping the descriptor open without even a single
read() attempt I wouldn't consider sufficient to avoid it.
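One way to take SIGPIPE out of the picture in a test like this is for the main
thread to ignore SIGPIPE before starting the writer, so that a write() to a pipe
with no remaining reader fails with EPIPE instead of terminating the process.
The following is a minimal sketch under that assumption; the function name is
hypothetical, and it provokes the condition by closing the read end, which is
the case POSIX describes for EPIPE (a pipe not open for reading by any process).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper: returns the errno produced by writing to a pipe
   whose read end has been closed, with SIGPIPE ignored beforehand. */
int write_to_closed_pipe_errno(void)
{
    int fds[2];
    int rc = pipe(fds);
    assert(rc == 0);

    /* Ignoring SIGPIPE (process-wide) means write() reports EPIPE
       instead of the default action terminating the process. */
    signal(SIGPIPE, SIG_IGN);

    close(fds[0]);              /* no process has the pipe open for reading */

    char byte = 'x';
    int err = 0;
    if (write(fds[1], &byte, 1) == -1)
        err = errno;
    close(fds[1]);
    return err;                 /* EPIPE on a conforming system */
}
```

Called from a test program, it should return EPIPE rather than killing the
process.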

I'm pretty sure someone at The Open Group has fielding non-conformance reports
in their job description, but who that would be at this point I have no idea,
sorry.



Re: macOS 10.12, broken PTHREAD_CANCEL_DISABLE and UNIX certification

2016-11-06 Thread Per Mildner

> On 5 Nov 2016, at 10:22, Shware Systems  wrote:
> 
> From the output, I'm wondering about the source of the "Illegal instruction: 4"
> diagnostic. If SIGILL isn't blocked, it would also terminate the process and,
> I believe, run cancellation cleanup handlers as part of process shutdown,
> whatever the cancel state is set to. So something about the code is suspect,
> but it may be a problem internal to the pipe reads or writes rather than the
> pthread routines or how they're being used; possibly a buffer overrun or an
> aggressive optimization issue, as a guess.
> 
> 

The illegal instruction is caused by a ud2 instruction used as a last-resort
fallback in abort() (really in __abort()). Repeating the test with a debugger
attached confirms that the cleanup handler is called when the write() in
pthread_start_routine is cancelled, i.e. something that would not happen if
PTHREAD_CANCEL_DISABLE were working.


Starting test, 1 iterations, sleep interval 10ms
cancel_leak.c:46: ERROR cancelled while PTHREAD_CANCEL_DISABLE
Process 28027 stopped
* thread #2: tid = 0x4f4c05, 0x7fffbc4ec4db libsystem_c.dylib`__abort + 172, stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
    frame #0: 0x7fffbc4ec4db libsystem_c.dylib`__abort + 172
libsystem_c.dylib`__abort:
->  0x7fffbc4ec4db <+172>: ud2

libsystem_c.dylib`abort_report_np:
    0x7fffbc4ec4dd <+0>:   pushq  %rbp
    0x7fffbc4ec4de <+1>:   movq   %rsp, %rbp
    0x7fffbc4ec4e1 <+4>:   pushq  %r14
(lldb) bt
bt
warning: could not load any Objective-C class information. This will significantly reduce the quality of type information available.
* thread #2: tid = 0x4f4c05, 0x7fffbc4ec4db libsystem_c.dylib`__abort + 172, stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
  * frame #0: 0x7fffbc4ec4db libsystem_c.dylib`__abort + 172
    frame #1: 0x7fffbc4ec42f libsystem_c.dylib`abort + 144
    frame #2: 0x00011d6a cancel_leak`cleanup_routine(arg=0x) + 74 at cancel_leak.c:48
    frame #3: 0x7fffbc671233 libsystem_pthread.dylib`_pthread_exit + 130
    frame #4: 0x7fffbc671da8 libsystem_pthread.dylib`pthread_exit + 30
    frame #5: 0x7fffbc66ee26 libsystem_pthread.dylib`_pthread_exit_if_canceled + 71
    frame #6: 0x7fffbc57fda1 libsystem_kernel.dylib`cerror + 13
    frame #7: 0x00011c25 cancel_leak`pthread_start_routine(vcookie=0x7fff5fbff8b0) + 549 at cancel_leak.c:80
    frame #8: 0x7fffbc66faab libsystem_pthread.dylib`_pthread_body + 180
    frame #9: 0x7fffbc66f9f7 libsystem_pthread.dylib`_pthread_start + 286
    frame #10: 0x7fffbc66f221 libsystem_pthread.dylib`thread_start + 13
(lldb)
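For comparison, the behaviour such a test expects can be sketched as follows.
This is not cancel_leak.c (the program itself is not included in this thread),
and names such as pending_cancel_is_deferred are hypothetical. With
cancellation disabled, a pthread_cancel() request made while the thread sits in
read() should only be marked pending; once the thread restores the enabled
state, the request should be acted on at the next cancellation point, here
pthread_testcancel(), and that is when the cleanup handler runs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

static int cleanup_calls;   /* cleanup handler invocations */
static int passed_read;     /* set once read() has returned normally */
static int reached_end;     /* set only if cancellation never acted */

static void count_cleanup(void *arg)
{
    (void)arg;
    cleanup_calls++;
}

static void *blocked_reader(void *arg)
{
    int fd = *(int *)arg;
    int oldstate;
    char byte;

    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &oldstate);
    pthread_cleanup_push(count_cleanup, NULL);

    /* read() is a cancellation point, but with cancellation disabled a
       request arriving while we block here must only be marked pending. */
    ssize_t n = read(fd, &byte, 1);
    (void)n;
    passed_read = 1;

    /* Re-enabling is not itself a cancellation point (deferred type);
       the pending request is acted on at pthread_testcancel(). */
    pthread_setcancelstate(oldstate, &oldstate);
    pthread_testcancel();

    reached_end = 1;            /* not reached on a conforming system */
    pthread_cleanup_pop(0);
    return NULL;
}

/* Returns 1 if the request stayed pending while disabled and was acted on,
   running the cleanup handler once, after cancellation was re-enabled. */
int pending_cancel_is_deferred(void)
{
    int fds[2];
    pthread_t tid;
    void *result;
    char byte = 'x';

    int rc = pipe(fds);
    assert(rc == 0);
    cleanup_calls = passed_read = reached_end = 0;

    rc = pthread_create(&tid, NULL, blocked_reader, &fds[0]);
    assert(rc == 0);

    pthread_cancel(tid);                   /* arrives while disabled */
    ssize_t n = write(fds[1], &byte, 1);   /* let the blocked read() return */
    (void)n;

    pthread_join(tid, &result);
    close(fds[0]);
    close(fds[1]);
    return result == PTHREAD_CANCELED && cleanup_calls == 1
        && passed_read && !reached_end;
}
```

If cancellation were acted on inside read(), which is the symptom reported in
this thread, passed_read would stay 0 and the function would return 0.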

> As to certification, the person running the conformance test suites and
> submitting the results probably doesn't monitor bug reports. If the test
> suite passes, they're happy, go on vacation, and figure any actual bugs are
> features that can be ignored or are some underling's job to handle. If it
> doesn't pass, they file the reports away rather than read them, and wait for
> someone to tell them to try running it again. This may be unfair, but it is
> accurate often enough. Whether the test suite has enough test cases to catch
> intermittent, environmentally induced failures is also unknown, and is
> another possibility, but at least one of the test suite maintainers does
> monitor this list.
> 
Is there a way to make formal bug reports against conformance, i.e. a formal
way to tell the Unix certification authority about non-conformance? It seems
possible that a vendor is not really interested in fixing a conformance problem
unless it is reported by many users, or unless the vendor risks losing the
marketing benefit of Unix certification. And, as you point out, it may well be
that the people responsible for certification at the vendor never even hear
about the bugs reported through the vendor's bug reporting system. A nudge from
the certification authority may be more likely to reach the right people.

Regards,


RE: macOS 10.12, broken PTHREAD_CANCEL_DISABLE and UNIX certification

2016-11-05 Thread Shware Systems
From the output, I'm wondering about the source of the "Illegal instruction: 4"
diagnostic. If SIGILL isn't blocked, it would also terminate the process and, I
believe, run cancellation cleanup handlers as part of process shutdown, whatever
the cancel state is set to. So something about the code is suspect, but it may
be a problem internal to the pipe reads or writes rather than the pthread
routines or how they're being used; possibly a buffer overrun or an aggressive
optimization issue, as a guess.

As to certification, the person running the conformance test suites and
submitting the results probably doesn't monitor bug reports. If the test suite
passes, they're happy, go on vacation, and figure any actual bugs are features
that can be ignored or are some underling's job to handle. If it doesn't pass,
they file the reports away rather than read them, and wait for someone to tell
them to try running it again. This may be unfair, but it is accurate often
enough. Whether the test suite has enough test cases to catch intermittent,
environmentally induced failures is also unknown, and is another possibility,
but at least one of the test suite maintainers does monitor this list.


macOS 10.12, broken PTHREAD_CANCEL_DISABLE and UNIX certification

2016-11-04 Thread Per Mildner
PTHREAD_CANCEL_DISABLE has never worked reliably on OS X. This is true for all 
versions of OS X from 10.8 to 10.12, despite the fact that most of these have 
received Unix certification.

This bug has been known to Apple at least since I reported the issue for OS X
10.8, in 2011.

The lack of a working PTHREAD_CANCEL_DISABLE makes pthread_cancel() more or 
less useless, and there is no workaround.
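What PTHREAD_CANCEL_DISABLE is supposed to guarantee can be checked with a
small program in the spirit of the test described here. This is a minimal
sketch, not the cancel_leak.c program whose output appears elsewhere in this
thread; the names (cancel_disable_is_honored, disabled_writer,
on_cancel_cleanup) are hypothetical. A worker thread disables cancellation and
then repeatedly calls write() and nanosleep(), both cancellation points, while
the main thread cancels it mid-loop. On a conforming implementation the request
stays pending, the cleanup handler never runs, and pthread_join() does not
report PTHREAD_CANCELED.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>
#include <unistd.h>

static int cleanup_ran;     /* set if cancellation acted on the worker */

static void on_cancel_cleanup(void *arg)
{
    (void)arg;
    cleanup_ran = 1;        /* must not happen while cancellation is disabled */
}

static void *disabled_writer(void *arg)
{
    int fd = *(int *)arg;
    int oldstate;
    struct timespec one_ms = { .tv_sec = 0, .tv_nsec = 1000000L };
    char byte = 'x';

    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &oldstate);
    pthread_cleanup_push(on_cancel_cleanup, NULL);

    /* write() and nanosleep() are cancellation points; with cancellation
       disabled, a pending request must not be acted upon in this loop. */
    for (int i = 0; i < 200; i++) {
        if (write(fd, &byte, 1) != 1)
            break;
        nanosleep(&one_ms, NULL);
    }

    pthread_cleanup_pop(0);
    return NULL;            /* normal return: join result is NULL */
}

/* Returns 1 if a cancel request sent while cancellation was disabled
   was left pending (the worker finished normally), else 0. */
int cancel_disable_is_honored(void)
{
    int fds[2];
    pthread_t tid;
    void *result;
    char byte;

    int rc = pipe(fds);
    assert(rc == 0);
    cleanup_ran = 0;

    rc = pthread_create(&tid, NULL, disabled_writer, &fds[1]);
    assert(rc == 0);

    /* The first byte is written only after cancellation was disabled, so
       once read() returns, the request below arrives while disabled. */
    ssize_t n = read(fds[0], &byte, 1);
    assert(n == 1);
    pthread_cancel(tid);

    pthread_join(tid, &result);
    close(fds[0]);
    close(fds[1]);
    return result != PTHREAD_CANCELED && !cleanup_ran;
}
```

On the OS X versions discussed here, the report is that the cleanup handler
does run in the equivalent situation, so a check like this would be expected
to fail there, possibly only intermittently.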

I never got any feedback from Apple about this bug report and would appreciate
it if anyone on this list can shed some light on the following.

1. Is my test program correct? That is, does it really expose a violation of
the Unix standard? If my test is broken, please accept my apologies and ignore
the rest of my email.

2. What is supposed to happen when a vendor gets notified about conformance
bugs but never fixes them? That is, why does Apple get certification for new
releases of their OS (macOS 10.12) when they and others know before
certification that it violates the standard?

(I understand that conformance bugs can be detected after a vendor receives
certification and that there may be a delay in fixing them. But this is
certification for new products that they knew from the start were
non-conforming.)


Regards,

Per Mildner per.mild...@sics.se
SICS Swedish ICT