Re: [1003.1(2013)/Issue7+TC1 0001016]: race condition with set -C

2016-11-06 Thread Shware Systems
To your last question: yes, but the effects are supposed to be documented, so
that generic guard code that may invoke platform-specific pre-ln handling can
be written.
This is a compromise; the alternative would be to disqualify any system that
defines additional file types from being considered conforming at all. In a
script this might look like:
handler=/app/$platform/linkhandler
if [ -e "$handler" ]; then
    . "$handler"        # platform-specific pre-ln handling
else
    ln "$src" "$dst"    # do ln directly; $src and $dst are placeholders
fi

To some extent it's also the operator's responsibility to sandbox use of
non-standard file types outside the directories the standard says portable
applications need access to, such as $TMP, to avoid issues. In other words, a
platform-aware application might create such a file in a $TMP/appsubdir
directory, but it shouldn't link it into /tmp afterwards; it should link it
into something like an ~/app/files directory instead. To me that is more a
training issue, not something the standard can reasonably address or make a
requirement.

On Sunday, November 6, 2016 Martijn Dekker wrote:

On 02-11-16 at 13:32, Martijn Dekker wrote:
> If both 'mkdir' and 'ln' operate atomically, there could be a safe
> workaround for creating a regular file directly under /tmp. It would
> involve creating a (very) temporary directory under /tmp using 'mkdir
> -m700', then creating the file inside there, setting the mode, etc. with
> no need for atomicity, then attempting to 'ln' that file back to /tmp
> until we've got an available name. Do you think this could work?

No one replied to poke holes in this, so I went ahead and implemented
this workaround in the modernish shell library implementation of
'mktemp'. 
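
For readers following along, the steps could be sketched roughly as below.
This is only an illustration and not the modernish code: the function name
mktemp_via_ln, the ".stage" directory name, the PID-plus-counter naming and
the retry limit of 100 are all assumptions made for the example. It also
assumes that 'mkdir' and 'ln' (without -f) fail atomically when the name
already exists, which is exactly the point under discussion.

```shell
# Hypothetical sketch of the mkdir + ln approach; all names are illustrative.
# Usage: mktemp_via_ln [dir [prefix]]   (prints the path of the new file)
mktemp_via_ln() {
    base=${1:-/tmp}
    prefix=${2:-tmp}

    # Step 1: make a private staging directory. mkdir fails if the name
    # is already taken, so a success means the directory is ours.
    n=0
    until mkdir -m 700 "$base/.stage.$$.$n" 2>/dev/null; do
        n=$((n + 1))
        [ "$n" -lt 100 ] || return 1
    done
    stage=$base/.stage.$$.$n

    # Step 2: create the file and set its mode inside the private
    # directory; nothing here needs to be atomic.
    if ! { : > "$stage/f" && chmod 600 "$stage/f"; }; then
        rm -rf "$stage"
        return 1
    fi

    # Step 3: try to hard-link the file under candidate names until one
    # is free. Without -f, ln is expected to fail if the name exists.
    i=0
    result=
    while [ "$i" -lt 100 ]; do
        i=$((i + 1))
        target=$base/$prefix.$$.$i
        # Caveat: if target is an existing directory, ln would create the
        # link inside it, so skip such names (this check itself is racy).
        [ -d "$target" ] && continue
        if ln "$stage/f" "$target" 2>/dev/null; then
            result=$target
            break
        fi
    done

    # Step 4: drop the staging copy; the link under $base remains.
    rm -rf "$stage"
    [ -n "$result" ] && printf '%s\n' "$result"
}
```

A call such as 'tmpfile=$(mktemp_via_ln /tmp myapp)' would then leave a
mode 600 regular file directly under /tmp. The candidate names are
predictable on purpose: the safety of the sketch rests on 'ln' refusing to
replace an existing name, not on the name being hard to guess.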

Just one thing still worries me a bit. Though 'ln' without the -f option
is never supposed to overwrite files, the spec also states:

| If the last operand specifies an existing file of a type not
| specified by the System Interfaces volume of POSIX.1-2008, the
| behavior is implementation-defined.
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/ln.html

Does this mean I cannot actually rely on 'ln' not overwriting a file or
otherwise behaving unexpectedly?

Thanks,

- M.





Re: macOS 10.12, broken PTHREAD_CANCEL_DISABLE and UNIX certification

2016-11-06 Thread Per Mildner

> On 5 Nov 2016, at 10:22, Shware Systems wrote:
> 
> From the output, I'm wondering about the source of the "Illegal instruction: 4"
> diagnostic. If SIGILL isn't blocked, it would also exit the process and, I
> believe, run cancel handlers as part of process shutdown, whatever the
> cancelstate is set to. So something about the code is suspect, but it may be a
> problem internal to the pipe reads or writes, not the pthread routines or how
> they're being used; possibly a buffer overrun or aggressive optimization
> issue, as a guess.
> 
> 

The illegal instruction comes from a ud2 instruction used as a last fallback
in abort() (really in __abort()). Repeating the test with a debugger attached
verifies that the cleanup handler is called when the write() in
pthread_start_routine is cancelled, i.e. something that would not happen if
PTHREAD_CANCEL_DISABLE were working.


Starting test, 1 iterations, sleep interval 10ms
cancel_leak.c:46: ERROR cancelled while PTHREAD_CANCEL_DISABLE
Process 28027 stopped
* thread #2: tid = 0x4f4c05, 0x7fffbc4ec4db libsystem_c.dylib`__abort + 172, stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
    frame #0: 0x7fffbc4ec4db libsystem_c.dylib`__abort + 172
libsystem_c.dylib`__abort:
->  0x7fffbc4ec4db <+172>: ud2

libsystem_c.dylib`abort_report_np:
    0x7fffbc4ec4dd <+0>:   pushq  %rbp
    0x7fffbc4ec4de <+1>:   movq   %rsp, %rbp
    0x7fffbc4ec4e1 <+4>:   pushq  %r14
(lldb) bt
bt
warning: could not load any Objective-C class information. This will significantly reduce the quality of type information available.
* thread #2: tid = 0x4f4c05, 0x7fffbc4ec4db libsystem_c.dylib`__abort + 172, stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
  * frame #0: 0x7fffbc4ec4db libsystem_c.dylib`__abort + 172
    frame #1: 0x7fffbc4ec42f libsystem_c.dylib`abort + 144
    frame #2: 0x00011d6a cancel_leak`cleanup_routine(arg=0x) + 74 at cancel_leak.c:48
    frame #3: 0x7fffbc671233 libsystem_pthread.dylib`_pthread_exit + 130
    frame #4: 0x7fffbc671da8 libsystem_pthread.dylib`pthread_exit + 30
    frame #5: 0x7fffbc66ee26 libsystem_pthread.dylib`_pthread_exit_if_canceled + 71
    frame #6: 0x7fffbc57fda1 libsystem_kernel.dylib`cerror + 13
    frame #7: 0x00011c25 cancel_leak`pthread_start_routine(vcookie=0x7fff5fbff8b0) + 549 at cancel_leak.c:80
    frame #8: 0x7fffbc66faab libsystem_pthread.dylib`_pthread_body + 180
    frame #9: 0x7fffbc66f9f7 libsystem_pthread.dylib`_pthread_start + 286
    frame #10: 0x7fffbc66f221 libsystem_pthread.dylib`thread_start + 13
(lldb) 

> As to certification, the person running the conformance test suites and
> submitting the results probably doesn't monitor bug reports. If the test
> suite passes, they are happy, go on vacation, and figure any actual bugs are
> a feature that can be ignored or are some underling's job to handle. If it
> doesn't pass, they file reports, not read them, and wait for someone to tell
> them to try running it again. This may be unfair, but it is accurate
> frequently enough. Whether the test suite has sufficient test cases to catch
> intermittent, environmentally induced failures is also unknown, and is
> another possibility, but at least one of the test suite maintainers does
> monitor this list.
> 
Is there a way to file formal bug reports against conformance, i.e. a formal 
way to tell the Unix certification authority about non-conformance? It seems 
possible that a vendor is not really interested in fixing a conformance problem 
unless it is reported by many users, or the vendor risks losing the marketing 
benefit of Unix certification. And, as you point out, it may well be that the 
ones responsible for certification at the vendor do not even hear about the 
bugs reported to the vendor bug reporting system. A nudge from the 
certification authority may be more likely to reach the right people.

Regards,

> 
> On Friday, November 4, 2016 Per Mildner wrote:
> 
> PTHREAD_CANCEL_DISABLE has never worked reliably on OS X. This is true for 
> all versions of OS X from 10.8 to 10.12, despite the fact that most of these 
> have received Unix certification.
> 
> Apple has known about this bug at least since I reported the issue for OS X
> 10.8, in 2011.
> 
> The lack of a working PTHREAD_CANCEL_DISABLE makes pthread_cancel() more or 
> less useless, and there is no workaround.
> 
> I never got any feedback from Apple about this bug report and would
> appreciate it if anyone on this list can shed some light on the following.
> 
> 1. Is my test program correct? That is, does it really expose a violation 
> against the Unix standard? If my test is broken, please accept my apologies 
> and ignore the rest of my email.
> 
> 2. What is supposed to happen when a vendor gets notified about conformance 
> bugs but never fixes 

Intended difference between waitpid() and waitid() ??

2016-11-06 Thread Robert Elz
The spec (C165) for wait() (though this is only relevant to waitpid())
says ...

If waitpid( ) was invoked with WNOHANG set in options, it
has at least one child process specified by pid for which status
is not available, and status is not available for any process
specified by pid, 0 is returned. Otherwise, -1 shall be returned,
and errno set to indicate the error.

Whereas for waitid() the similar spec is ...

If WNOHANG was specified and status is not available for any process
specified by idtype and id, 0 shall be returned. If waitid( ) returns
due to the change of state of one of its children, 0 shall be returned.
Otherwise, -1 shall be returned and errno set to indicate the error.

Note a lack in waitid() of anything corresponding to the "it has at least
one child process..." clause that exists for waitpid().

Is this intentional? That is, should I assume that a process with no
children (at all) which does a waitid() with WNOHANG set is not intended
to receive ECHILD, but just a 0 return (with the appropriate fields of the
siginfo set to 0 as well, of course)?


And while I am here, what is expected (from waitid()) if the
process identified by idtype & id (for waitid) does not exist,
but there are other child processes ?

One might expect ECHILD, as it is clear for waitpid() that is the
correct error - except in waitid() ...

   [ECHILD]  The calling process has no existing unwaited-for child processes.

This is not true in the postulated case: there are existing unwaited-for
child processes, but not the one requested.

That is, let us assume the current process is pid 4, it has a single
child process (pid 5) whose state does not matter here, and pid 4 does

	pid = waitid(P_PID, 6, ...);	/* remaining args assumed correct,
					   but are irrelevant here */

What is expected to happen in that case? The calling process (4) has
existing unwaited-for child processes (pid 5), so ECHILD is apparently
not the correct error.

Perhaps:
   [EINVAL]  An invalid value was specified for options, or idtype
             and id specify an invalid set of processes.

Is specifying a pid which is not a child of the current process "an
invalid set of processes" or does that text mean something different?

If EINVAL is correct here, is that an intended difference from waitpid()
or just an error?

kre