Re: (call-process ...) hangs in emacs
On Sep 2 22:23, Achim Gratz wrote: Corinna Vinschen writes: $ setfacl -d g:system: filename Note the trailing colon. That's not what the man page specifies, however. I'll keep it in mind. I patched setfacl to not require trailing colons anymore. This also fixes a bug in terms of the allowed acl entries when deleting. I now also fixed setfacl to add missing acl entries when modifying an acl, same way as the Linux setfacl handles this. And, this is important, given that setfacl now always creates complete acls when modifying an acl, I could finally fix the aclcheck(2) function in the Cygwin DLL to more thorougly test the incoming acl for all required entries. That means, when using the new Cygwin DLL, you also have to use the accompanying setfacl(1). With any older setfacl you'll suffer setfacl: Invalid argument error messages. I just created a new snapshot on https://cygwin.com/snapshots/ containing these patches. Please give them a try. Thanks, Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat pgpt3MhdP8NQ2.pgp Description: PGP signature
Re: (call-process ...) hangs in emacs
Corinna Vinschen writes: I patched setfacl to not require trailing colons anymore. This also fixes a bug in terms of the allowed acl entries when deleting. Great, thanks! […] I just created a new snapshot on https://cygwin.com/snapshots/ containing these patches. Please give them a try. I'll do. Regards, Achim. -- +[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]+ Factory and User Sound Singles for Waldorf Blofeld: http://Synth.Stromeko.net/Downloads.html#WaldorfSounds -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On Sep 1 19:38, Achim Gratz wrote: Corinna Vinschen writes: Over the weekend it occured to me that the acl(2) function created ACLs which not aligned well with the ACLs created by open(2),chmod(2), etc. Yesterday I fixed the acl(2) function to create ACLs the same way as the other functions, especially in terms of the owner and group entries and the SE_DACL_PROTECTED flag. The changes are in the latest snapshot from https://cygwin.com/snapshots/ Please give them a try. Installed, everything looks fine so far. Thanks for testing! Maybe you'd like to have a view into the simple as well as the more complex ACLs created by Cygwin? Icacls output is moderately easy to read. If you have questions or concerns... Thanks, Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat pgp6Zo0ZTRA2j.pgp Description: PGP signature
Re: (call-process ...) hangs in emacs
Corinna Vinschen writes: Installed, everything looks fine so far. Thanks for testing! Maybe you'd like to have a view into the simple as well as the more complex ACLs created by Cygwin? Icacls output is moderately easy to read. If you have questions or concerns... I've used icacls in the past. Is there anything that you want me to check specifically? Regards, Achim. -- +[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]+ SD adaptation for Waldorf microQ V2.22R2: http://Synth.Stromeko.net/Downloads.html#WaldorfSDada -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On Sep 2 19:29, Achim Gratz wrote: Corinna Vinschen writes: Installed, everything looks fine so far. Thanks for testing! Maybe you'd like to have a view into the simple as well as the more complex ACLs created by Cygwin? Icacls output is moderately easy to read. If you have questions or concerns... I've used icacls in the past. Is there anything that you want me to check specifically? More or less, just compare the ACLs and see if you find strange differences. This only works for the ACLs created or modified with `setfacl' and the snapshot DLL. The ACLs created or modified via setfacl with the older DLLs always were different and, I have to admit, kind of broke the default POSIX permissions created via open() or chmod(). The idea of my change was to make them always in an identical fashion. The order may only vary in secondary permissions, but never in the standard permissions, which also always come first. Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat pgpLHgeR1vMf8.pgp Description: PGP signature
Re: (call-process ...) hangs in emacs
Corinna Vinschen writes: More or less, just compare the ACLs and see if you find strange differences. This only works for the ACLs created or modified with `setfacl' and the snapshot DLL. I see, I'll have to make extra tests for this. Usually I just have to live with some inherited ACL that I can't change at all. The ACLs created or modified via setfacl with the older DLLs always were different and, I have to admit, kind of broke the default POSIX permissions created via open() or chmod(). The idea of my change was to make them always in an identical fashion. The order may only vary in secondary permissions, but never in the standard permissions, which also always come first. One thing I've noticed, but can't really say if it's related to the change, is that setfacl quite often claims an illegal ACL when trying to remove for instance the SYSTEM read permission. Removing the group owner ACL instead did the right thing in at least one instance. But I've mostly been removing all ACL from the whole tree via the explorer security tab (for ~/.ssh/ and similar stuff). Regards, Achim. -- +[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]+ Factory and User Sound Singles for Waldorf Blofeld: http://Synth.Stromeko.net/Downloads.html#WaldorfSounds -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On Sep 2 21:42, Achim Gratz wrote: Corinna Vinschen writes: More or less, just compare the ACLs and see if you find strange differences. This only works for the ACLs created or modified with `setfacl' and the snapshot DLL. I see, I'll have to make extra tests for this. Usually I just have to live with some inherited ACL that I can't change at all. The ACLs created or modified via setfacl with the older DLLs always were different and, I have to admit, kind of broke the default POSIX permissions created via open() or chmod(). The idea of my change was to make them always in an identical fashion. The order may only vary in secondary permissions, but never in the standard permissions, which also always come first. One thing I've noticed, but can't really say if it's related to the change, is that setfacl quite often claims an illegal ACL when trying to remove for instance the SYSTEM read permission. $ setfacl -d g:system: filename Note the trailing colon. Removing the group owner ACL instead did the right thing in at least one instance. ??? It shouldn't. Removing the standard ACL entries for the owner, owner group, and other is not allowed: $ setfacl -d g:: filename setfacl: No error The No error is a bug, related to the fact that the aclsort() function doesn't set errno if aclcheck() failed. I just fixed that in CVS. I've mostly been removing all ACL from the whole tree via the explorer security tab (for ~/.ssh/ and similar stuff). *All* ACL??? That sounds wrong to me. Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat pgpFZzps41_Ga.pgp Description: PGP signature
Re: (call-process ...) hangs in emacs
Corinna Vinschen writes: $ setfacl -d g:system: filename Note the trailing colon. That's not what the man page specifies, however. I'll keep it in mind. Removing the group owner ACL instead did the right thing in at least one instance. ??? It shouldn't. Removing the standard ACL entries for the owner, owner group, and other is not allowed: $ setfacl -d g:: filename setfacl: No error The No error is a bug, related to the fact that the aclsort() function doesn't set errno if aclcheck() failed. I just fixed that in CVS. I'll try that again to be sure. I've mostly been removing all ACL from the whole tree via the explorer security tab (for ~/.ssh/ and similar stuff). *All* ACL??? That sounds wrong to me. Take ownership (if not already owner), give full acces and nuke everything else (general read access for user and change permission for admin groups of various sorts mostly). Under Cygwin this then leaves a clean mode (600 or 700, depending) with no + ACL. Regards, Achim. -- +[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]+ Factory and User Sound Singles for Waldorf Q+, Q and microQ: http://Synth.Stromeko.net/Downloads.html#WaldorfSounds -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On Aug 30 03:32, Andrey Repin wrote: Greetings, Corinna Vinschen! What needs to be done is to fix the ssh-host-config script. It adds an ACE for SYSTEM on /var/empty, /etc, and /var/log for no apparent reason. If it's not apparent: you can actually prevent system from accessing the file by removing access from SYSTEM user to the file. No, you can't. You only can for accounts not having and applications not enabling backup and restore rights. But, anyway, this has nothing to do with extra permissions on /var/empty which the SYSTEM account really doesn't need. Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat pgpTBEWyUprym.pgp Description: PGP signature
Re: (call-process ...) hangs in emacs
On Aug 29 09:58, Achim Gratz wrote: Achim Gratz Stromeko at NexGo.DE writes: Please test. This fixes the read-only problem in Emacs (so that hunch was correct). Perl still doesn't play, but I think the 5.18 version should get it correct. Will need to switch a test installation over for that, though. With that snapshot in place, ssh suddenly recognized that my private key file was more readable than it liked it to be, so it looks that it's using the same general strategy of dealing with ACL as Emacs. I'm starting to like this patch very much... :-) Over the weekend it occured to me that the acl(2) function created ACLs which not aligned well with the ACLs created by open(2),chmod(2), etc. Yesterday I fixed the acl(2) function to create ACLs the same way as the other functions, especially in terms of the owner and group entries and the SE_DACL_PROTECTED flag. The changes are in the latest snapshot from https://cygwin.com/snapshots/ Please give them a try. Thanks, Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat pgpH80KCyg9ws.pgp Description: PGP signature
Re: (call-process ...) hangs in emacs
Corinna Vinschen writes: Over the weekend it occured to me that the acl(2) function created ACLs which not aligned well with the ACLs created by open(2),chmod(2), etc. Yesterday I fixed the acl(2) function to create ACLs the same way as the other functions, especially in terms of the owner and group entries and the SE_DACL_PROTECTED flag. The changes are in the latest snapshot from https://cygwin.com/snapshots/ Please give them a try. Installed, everything looks fine so far. Regards, Achim. -- +[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]+ Factory and User Sound Singles for Waldorf rackAttack: http://Synth.Stromeko.net/Downloads.html#WaldorfSounds -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: ACL behavior in Cygwin // Re: (call-process ...) hangs in emacs
On Aug 28 21:00, Andrey Repin wrote: Greetings, Corinna Vinschen! It's what acl means on Cygwin. acl means that Windowsd ACLs are used and permissions are handled and converted to and from POSIX permissions. noacl means, Cygwin ignores all ACLs and fakes ownership and POSIX permissions only based only on filetype and DOS R/O attribute, as it has to on filesystems not supporting ACLs, like FAT/FAT32. Got it. It seems, Cygwin need a middle groung between these two for cases, where FS support access control, but don't want to be mangled. I'm certainly not going to introduce another mount mode. I didn't said it has to be mount mode... besides, it doesn't make sense to implement YA mode to do what is already done, just a little different. What Cygwin could do is to perform ACL-based access checks independently of the acl/noacl mount mode on FSes supporting ACLs. However, if you want ACLs, why not use the acl mount mode in the first place? ACL inheritance, mostly. POSIX'ized permissions break inheritance on newly created files, at times making these files inaccessible to native applications, even though inheritance rules would allow it otherwise. Still, it *might* makes sense in some scenarios, even if the results of stat(2)/acl(2) may differ surprisingly from what access(2) returns. We can also simply try it out. A patch to enable this behaviour is dead-simple. Here's the prerequisite: Would more than one person want that *and* be willing to give this a *thorough* testing? I'd like to hear out expected behavior from this patch first. Same output from stat(2), different output from access(2). Access(2) would only take the actual Windows ACL into account and let Windows functions decide about granting or denying the requested access. Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat pgpHtJIogdk1V.pgp Description: PGP signature
Re: (call-process ...) hangs in emacs
Achim Gratz Stromeko at NexGo.DE writes: Please test. This fixes the read-only problem in Emacs (so that hunch was correct). Perl still doesn't play, but I think the 5.18 version should get it correct. Will need to switch a test installation over for that, though. With that snapshot in place, ssh suddenly recognized that my private key file was more readable than it liked it to be, so it looks that it's using the same general strategy of dealing with ACL as Emacs. I'm starting to like this patch very much... :-) Regards, Achim. -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On Aug 29 09:58, Achim Gratz wrote: Achim Gratz Stromeko at NexGo.DE writes: Please test. This fixes the read-only problem in Emacs (so that hunch was correct). Perl still doesn't play, but I think the 5.18 version should get it correct. Will need to switch a test installation over for that, though. With that snapshot in place, ssh suddenly recognized that my private key file was more readable than it liked it to be, so it looks that it's using the same general strategy of dealing with ACL as Emacs. ...which means, they don't deal with ACLs at all. They only see what's given in the st_mode permission bits. With this change, the group permission bits now show that *somebody* has certain permissions on the file, thus the group permissions indicate a too open access for ssh, if somebody except you have write access to the file. Downside: If you use inherited Windows permissions, you'll often have the case that Administrators and/or SYSTEM have full access to your files. This in turn shows up as rwx group permissions now. If you can't change the permissions (company requirements, etc) the ssh key file permission test will get annoying. So it's probably a very nice change (thanks a lot for bringing this up!), but it will probably have some negative side-effects for existing installations. I'm starting to like this patch very much... :-) Despite of what I'm outlining above, me too :) Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat pgpEaeJNMvNNZ.pgp Description: PGP signature
Re: (call-process ...) hangs in emacs
On 8/29/2014 7:09 AM, Corinna Vinschen wrote: On Aug 29 09:58, Achim Gratz wrote: Achim Gratz Stromeko at NexGo.DE writes: Please test. This fixes the read-only problem in Emacs (so that hunch was correct). Perl still doesn't play, but I think the 5.18 version should get it correct. Will need to switch a test installation over for that, though. With that snapshot in place, ssh suddenly recognized that my private key file was more readable than it liked it to be, so it looks that it's using the same general strategy of dealing with ACL as Emacs. ...which means, they don't deal with ACLs at all. They only see what's given in the st_mode permission bits. With this change, the group permission bits now show that *somebody* has certain permissions on the file, thus the group permissions indicate a too open access for ssh, if somebody except you have write access to the file. Downside: If you use inherited Windows permissions, you'll often have the case that Administrators and/or SYSTEM have full access to your files. This in turn shows up as rwx group permissions now. If you can't change the permissions (company requirements, etc) the ssh key file permission test will get annoying. So it's probably a very nice change (thanks a lot for bringing this up!), but it will probably have some negative side-effects for existing installations. I'm starting to like this patch very much... :-) Despite of what I'm outlining above, me too :) With the latest snapshot I can't start the sshd service. The Application Log just says, `sshd' service stopped, exit status:255. The problem doesn't occur with the 2014-08-27 snapshot. I guess this has something to do with the new permissions on various files, but I'm not sure which ones. Ken -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
Ken Brown writes: With the latest snapshot I can't start the sshd service. The Application Log just says, `sshd' service stopped, exit status:255. The problem doesn't occur with the 2014-08-27 snapshot. I guess this has something to do with the new permissions on various files, but I'm not sure which ones. Off the top of my head for the standard installation: /etc/ssh* /var/empty /var/log/sshd When you try to debug the sshd, IIR these are the files that must be chown'ed to the admin user that runs sshd from the terminal. Running in debug mode (either from the terminal or via sshd_config) should produce messages which file or directory sshd is choking on. Regards, Achim. -- +[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]+ Factory and User Sound Singles for Waldorf rackAttack: http://Synth.Stromeko.net/Downloads.html#WaldorfSounds -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On 8/29/2014 3:23 PM, Achim Gratz wrote: Ken Brown writes: With the latest snapshot I can't start the sshd service. The Application Log just says, `sshd' service stopped, exit status:255. The problem doesn't occur with the 2014-08-27 snapshot. I guess this has something to do with the new permissions on various files, but I'm not sure which ones. Off the top of my head for the standard installation: /etc/ssh* /var/empty /var/log/sshd When you try to debug the sshd, IIR these are the files that must be chown'ed to the admin user that runs sshd from the terminal. Running in debug mode (either from the terminal or via sshd_config) should produce messages which file or directory sshd is choking on. I just checked /var/log/sshd.log. (I hadn't thought to do that before.) The last message in it is, /var/empty must be owned by root and not group or world-writable. So the problem seems to be that /var/empty appears to sshd to be group writable under the latest snapshot. This is the downside that Corinna mentioned. What needs to be done to /var/empty to fix this? Ken -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
Ken Brown writes: I just checked /var/log/sshd.log. (I hadn't thought to do that before.) The last message in it is, /var/empty must be owned by root and not group or world-writable. So the problem seems to be that /var/empty appears to sshd to be group writable under the latest snapshot. This is the downside that Corinna mentioned. What needs to be done to /var/empty to fix this? You need to remove all ACL from the directory, either with setfacl or (from cmd) icacls or even the security tab in Explorer. Most likely these are inherited from the parent directory of the Cygwin installation. Regards, Achim. -- +[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]+ Factory and User Sound Singles for Waldorf rackAttack: http://Synth.Stromeko.net/Downloads.html#WaldorfSounds -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
Greetings, Ken Brown! With the latest snapshot I can't start the sshd service. The Application Log just says, `sshd' service stopped, exit status:255. The problem doesn't occur with the 2014-08-27 snapshot. I guess this has something to do with the new permissions on various files, but I'm not sure which ones. Off the top of my head for the standard installation: /etc/ssh* /var/empty /var/log/sshd When you try to debug the sshd, IIR these are the files that must be chown'ed to the admin user that runs sshd from the terminal. Running in debug mode (either from the terminal or via sshd_config) should produce messages which file or directory sshd is choking on. I just checked /var/log/sshd.log. (I hadn't thought to do that before.) The last message in it is, /var/empty must be owned by root and not group or world-writable. So the problem seems to be that /var/empty appears to sshd to be group writable under the latest snapshot. This is the downside that Corinna mentioned. What needs to be done to /var/empty to fix this? I think, setting ACL that will directly translate to -rwx-- without any + should help. I'm in the middle of transfer to Win7/64, can't test anything right now. -- WBR, Andrey Repin (anrdae...@yandex.ru) 29.08.2014, 23:56 Sorry for my terrible english... -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On 8/29/2014 4:00 PM, Achim Gratz wrote: Ken Brown writes: I just checked /var/log/sshd.log. (I hadn't thought to do that before.) The last message in it is, /var/empty must be owned by root and not group or world-writable. So the problem seems to be that /var/empty appears to sshd to be group writable under the latest snapshot. This is the downside that Corinna mentioned. What needs to be done to /var/empty to fix this? You need to remove all ACL from the directory, either with setfacl or (from cmd) icacls or even the security tab in Explorer. Most likely these are inherited from the parent directory of the Cygwin installation. The ACLs aren't inherited. They're explicitly set by ssh-host-config: if ! /usr/bin/setfacl -m u:system:rwx ${LOCALSTATEDIR}/empty /dev/null 21 then csih_warning Can't set extended permissions on ${LOCALSTATEDIR}/empty! let ++warning_cnt fi This must be done for a reason, but I don't know what it is. Ken -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On Aug 29 15:36, Ken Brown wrote: On 8/29/2014 3:23 PM, Achim Gratz wrote: Ken Brown writes: With the latest snapshot I can't start the sshd service. The Application Log just says, `sshd' service stopped, exit status:255. The problem doesn't occur with the 2014-08-27 snapshot. I guess this has something to do with the new permissions on various files, but I'm not sure which ones. Off the top of my head for the standard installation: /etc/ssh* /var/empty /var/log/sshd When you try to debug the sshd, IIR these are the files that must be chown'ed to the admin user that runs sshd from the terminal. Running in debug mode (either from the terminal or via sshd_config) should produce messages which file or directory sshd is choking on. I just checked /var/log/sshd.log. (I hadn't thought to do that before.) The last message in it is, /var/empty must be owned by root and not group or world-writable. So the problem seems to be that /var/empty appears to sshd to be group writable under the latest snapshot. This is the downside that Corinna mentioned. What needs to be done to /var/empty to fix this? What needs to be done is to fix the ssh-host-config script. It adds an ACE for SYSTEM on /var/empty, /etc, and /var/log for no apparent reason. I just sent a patch upstream which removes the code trying to generate /etc and /var/log entirely (done by setup.exe) and which drops adding a SYSTEM ACE to /var/empty. A temporary workaround is either to remove the SYSTEM ACE: $ setfacl -d g:18: /var/empty or to change /etc/sshd_config not to use privilege separation: UsePrivilegeSeparation no However, this is obviously a problem for all existing installations. OpenSSH 6.7p1 will be released pretty soon. I will add a postinstall script which removes the SYSTEM ACE from /var/empty at installation time. Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat pgpvDRoiP3Mx5.pgp Description: PGP signature
Re: (call-process ...) hangs in emacs
Greetings, Corinna Vinschen! What needs to be done is to fix the ssh-host-config script. It adds an ACE for SYSTEM on /var/empty, /etc, and /var/log for no apparent reason. If it's not apparent: you can actually prevent system from accessing the file by removing access from SYSTEM user to the file. Windows ACL's are THAT nasty. [C:\WINDOWS\system32]$ cacls.exe cmd.exe C:\WINDOWS\system32\cmd.exe DAEMON2\AnrDaemon:R NT AUTHORITY\SYSTEM:R BUILTIN\Administrators:R BUILTIN\Advanced users:R BUILTIN\Users:R [C:\WINDOWS\system32]$ Ergo, Windows now unable to delete/overwrite cmd.exe... (It's not actually a cmd.exe - it's a renamed 4nt.exe. Windows SFC always trying to repair it. And always fail. So do windowsupdate.) -- WBR, Andrey Repin (anrdae...@yandex.ru) 30.08.2014, 03:25 Sorry for my terrible english... -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
Achim Gratz Stromeko at NexGo.DE writes: Let's get one issue out of the way first that may be a Cygwin bug: on Linux a file with all access removed via standard POSIX modes and then access granted via ACL would place the mask bits of the ACL (the maximum permission that can be granted via ACL, usually rwx) into the group portion of the POSIX modes (ls --color would even show these in different color if you didn't happen to notice the +). That doesn't happen on Cygwin and it seems that some software optimizes based on that information to not traverse the ACL when there's no chance to ever get a permission granted. This behaviour is mandated by POSIX IIUC (and it is what Linux is doing) so unless Cygwin explicitly follows a different ACL model, I think that should be taken care of before diving into this further. As a concrete example, in the following the directory x86 shows up on Cygwin as follows: getfacl x86 # file: x86 # owner: otheruser # group: Domain Users user::--- group::--- group:FilerAdmins:rwx group:ShareOwners:rwx mask:rwx other:--- default:user::--- default:group::--- default:group:FilerAdmins:rwx default:group:ShareOwners:rwx default:mask:rwx default:other:--- ls -ld x86 d-+ 1 otheruser Domain Users 0 Jun 23 14:09 x86/ Under Linux in the same situation you'd get ls -ld x86 d---rwx---+ 1 otheruser Domain Users 0 Jun 23 14:09 x86/ instead (i.e. the mask bits shown in the group portion of the standard mode flags). If the file was owned by your uid, then you'd get indeed ls -ld x86 d-+ 1 myself Domain Users 0 Jun 23 14:09 x86/ but you'd also really have no permissions. On Windows you do have permission to the file in that situation since the POSIX part of the ACL (particularly the user::--- part that revokes all access for the file owner) are faked by Cygwin and not taken into account when the file gets finally accessed: icacls x86 x86 DOM\FilerAdmins:(I)(OI)(IO)(F) DOM\FilerAdmins:(I)(CI)(F) DOM\ShareOwners:(I)(OI)(IO)(M) DOM\ShareOwners:(I)(CI)(M) If getting at the correct mask is too expensive, simply always faking an rwx mask might actually be better than what we have now, since once the ACL are fully processed you'll get the correct permissions anyway. Regards, Achim. -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On Aug 28 07:25, Achim Gratz wrote: Achim Gratz Stromeko at NexGo.DE writes: Let's get one issue out of the way first that may be a Cygwin bug: on Linux a file with all access removed via standard POSIX modes and then access granted via ACL would place the mask bits of the ACL (the maximum permission that can be granted via ACL, usually rwx) into the group portion of the POSIX modes (ls --color would even show these in different color if you didn't happen to notice the +). That doesn't happen on Cygwin and it seems that some software optimizes based on that information to not traverse the ACL when there's no chance to ever get a permission granted. This behaviour is mandated by POSIX IIUC (and it is what Linux is doing) so unless Cygwin explicitly follows a different ACL model, I think that should be taken care of before diving into this further. As a concrete example, in the following the directory x86 shows up on Cygwin as follows: getfacl x86 # file: x86 # owner: otheruser # group: Domain Users user::--- group::--- group:FilerAdmins:rwx group:ShareOwners:rwx mask:rwx other:--- default:user::--- default:group::--- default:group:FilerAdmins:rwx default:group:ShareOwners:rwx default:mask:rwx default:other:--- ls -ld x86 d-+ 1 otheruser Domain Users 0 Jun 23 14:09 x86/ Under Linux in the same situation you'd get ls -ld x86 d---rwx---+ 1 otheruser Domain Users 0 Jun 23 14:09 x86/ instead (i.e. the mask bits shown in the group portion of the standard mode flags). If the file was owned by your uid, then you'd get indeed ls -ld x86 d-+ 1 myself Domain Users 0 Jun 23 14:09 x86/ but you'd also really have no permissions. On Windows you do have permission to the file in that situation since the POSIX part of the ACL (particularly the user::--- part that revokes all access for the file owner) are faked by Cygwin and not taken into account when the file gets finally accessed: icacls x86 x86 DOM\FilerAdmins:(I)(OI)(IO)(F) DOM\FilerAdmins:(I)(CI)(F) DOM\ShareOwners:(I)(OI)(IO)(M) DOM\ShareOwners:(I)(CI)(M) If getting at the correct mask is too expensive, simply always faking an rwx mask might actually be better than what we have now, since once the ACL are fully processed you'll get the correct permissions anyway. Handling of the CLASS object (aka mask) has never been fully implemented, especially because there's no such thing as a CLASS object in a Windows ACL. I guess it will always be some fake, but, yes, we can try to change stat() so that the st_mode group permissions reflect the or'ed bits of all permissions given to non-primary users and groups. Same in acl(2). That might be useful. Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat pgpqbboDPSg8s.pgp Description: PGP signature
Re: (call-process ...) hangs in emacs
On Aug 28 01:02, Andrey Repin wrote: Greetings, Corinna Vinschen! faccessat/access/eaccess don't try to be intelligent by themselves. Rather they just call a Windows function if the filesystem is mounted with acl mount flags: - Fetch file's security descriptor - Create process impersonation token. - Call NtAccessCheck - If NtAccessCheck returns not allowed, check for backup/restore privileges via NtPrivilegeCheck. In noacl mode or on filesystems not supporting ACLs, access uses the st_mode flags from stat() to figure out the permissions. I'm not very much into Cygwin internals, so beg pardon if I got something wrong here... But reading this makes my internal sanity checker go into red alarm state. Here's why: When Cygwin mount a filesystem with 'acl' flag set, it mangles current ACL's set on the files to produce something that can be understood as basic POSIX 'ugly'...erm, 'ugo' permissions. Behavior least desirable in many cases. You say, it will then use native functions to determine access rights... No wonder they will work, since you already mangled them to suit your needs. When Cygwin mount a filesystem with 'noacl' flag, thus let OS use true ACL's (a feature Windows implemented surprisingly fast, while *NIX was only proposing it... for far too long without any result in sight), it is then followed by some magic and guesswork on Cygwin's end to find out access rights. If you ask me, something isn't quite right here. Or something is missing. It's what acl means on Cygwin. acl means that Windowsd ACLs are used and permissions are handled and converted to and from POSIX permissions. noacl means, Cygwin ignores all ACLs and fakes ownership and POSIX permissions only based only on filetype and DOS R/O attribute, as it has to on filesystems not supporting ACLs, like FAT/FAT32. Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat pgplhDoMgJaB_.pgp Description: PGP signature
Re: (call-process ...) hangs in emacs
On 08/27/2014 09:15 AM, Achim Gratz wrote: Ken Brown kbrown at cornell.edu writes: Achim, could you send me a recipe for reproducing the problem so that I can test further? Please be very detailed; I have no experience with ACLs. Let's get one issue out of the way first that may be a Cygwin bug: on Linux a file with all access removed via standard POSIX modes and then access granted via ACL would place the mask bits of the ACL (the maximum permission that can be granted via ACL, usually rwx) into the group portion of the POSIX modes (ls --color would even show these in different color if you didn't happen to notice the +). That doesn't happen on Cygwin and it seems that some software optimizes based on that information to not traverse the ACL when there's no chance to ever get a permission granted. This behaviour is mandated by POSIX IIUC (and it is what Linux is doing) so unless Cygwin explicitly follows a different ACL model, I think that should be taken care of before diving into this further. Sadly, POSIX doesn't mandate ACLs (in part because there are so many differing implementations that it was impossible to standardize a common ground). So code using ACLs across multiple platforms has to deal with quirks, and although gnulib attempts to do so, it may be a shortcoming in the gnulib wrappers that emacs is using. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: (call-process ...) hangs in emacs
On Aug 28 11:55, Corinna Vinschen wrote: On Aug 28 07:25, Achim Gratz wrote: As a concrete example, in the following the directory x86 shows up on Cygwin as follows: getfacl x86 # file: x86 # owner: otheruser # group: Domain Users user::--- group::--- group:FilerAdmins:rwx group:ShareOwners:rwx mask:rwx other:--- default:user::--- default:group::--- default:group:FilerAdmins:rwx default:group:ShareOwners:rwx default:mask:rwx default:other:--- ls -ld x86 d-+ 1 otheruser Domain Users 0 Jun 23 14:09 x86/ Under Linux in the same situation you'd get ls -ld x86 d---rwx---+ 1 otheruser Domain Users 0 Jun 23 14:09 x86/ instead (i.e. the mask bits shown in the group portion of the standard mode flags). If the file was owned by your uid, then you'd get indeed ls -ld x86 d-+ 1 myself Domain Users 0 Jun 23 14:09 x86/ but you'd also really have no permissions. On Windows you do have permission to the file in that situation since the POSIX part of the ACL (particularly the user::--- part that revokes all access for the file owner) are faked by Cygwin and not taken into account when the file gets finally accessed: icacls x86 x86 DOM\FilerAdmins:(I)(OI)(IO)(F) DOM\FilerAdmins:(I)(CI)(F) DOM\ShareOwners:(I)(OI)(IO)(M) DOM\ShareOwners:(I)(CI)(M) If getting at the correct mask is too expensive, simply always faking an rwx mask might actually be better than what we have now, since once the ACL are fully processed you'll get the correct permissions anyway. Handling of the CLASS object (aka mask) has never been fully implemented, especially because there's no such thing as a CLASS object in a Windows ACL. I guess it will always be some fake, but, yes, we can try to change stat() so that the st_mode group permissions reflect the or'ed bits of all permissions given to non-primary users and groups. Same in acl(2). That might be useful. I implemented this preliminary and uploaded a snapshot to https://cygwin.com/snapshots/ Preliminary, because this change introduces an API change: Since the CLASS_OBJ and DEF_CLASS_OBJ entries only exist if secondary user and group (default) entries exist, that means the default permission entry only consists of 3 ACEs. This in turn means, the constant MIN_ACL_ENTRIES changed from 4 to 3. This might negatively affect coreutils, at least `ls', even though in my local testing it looked all normal. Please test. Thanks, Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat pgpfJAadC7Sk0.pgp Description: PGP signature
Re: (call-process ...) hangs in emacs
Greetings, Corinna Vinschen! faccessat/access/eaccess don't try to be intelligent by themselves. Rather they just call a Windows function if the filesystem is mounted with acl mount flags: - Fetch file's security descriptor - Create process impersonation token. - Call NtAccessCheck - If NtAccessCheck returns not allowed, check for backup/restore privileges via NtPrivilegeCheck. In noacl mode or on filesystems not supporting ACLs, access uses the st_mode flags from stat() to figure out the permissions. I'm not very much into Cygwin internals, so beg pardon if I got something wrong here... But reading this makes my internal sanity checker go into red alarm state. Here's why: When Cygwin mount a filesystem with 'acl' flag set, it mangles current ACL's set on the files to produce something that can be understood as basic POSIX 'ugly'...erm, 'ugo' permissions. Behavior least desirable in many cases. You say, it will then use native functions to determine access rights... No wonder they will work, since you already mangled them to suit your needs. When Cygwin mount a filesystem with 'noacl' flag, thus let OS use true ACL's (a feature Windows implemented surprisingly fast, while *NIX was only proposing it... for far too long without any result in sight), it is then followed by some magic and guesswork on Cygwin's end to find out access rights. If you ask me, something isn't quite right here. Or something is missing. It's what acl means on Cygwin. acl means that Windowsd ACLs are used and permissions are handled and converted to and from POSIX permissions. noacl means, Cygwin ignores all ACLs and fakes ownership and POSIX permissions only based only on filetype and DOS R/O attribute, as it has to on filesystems not supporting ACLs, like FAT/FAT32. Got it. It seems, Cygwin need a middle groung between these two for cases, where FS support access control, but don't want to be mangled. -- WBR, Andrey Repin (anrdae...@yandex.ru) 28.08.2014, 17:22 Sorry for my terrible english... -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On Aug 28 17:23, Andrey Repin wrote: It's what acl means on Cygwin. acl means that Windowsd ACLs are used and permissions are handled and converted to and from POSIX permissions. noacl means, Cygwin ignores all ACLs and fakes ownership and POSIX permissions only based only on filetype and DOS R/O attribute, as it has to on filesystems not supporting ACLs, like FAT/FAT32. Got it. It seems, Cygwin need a middle groung between these two for cases, where FS support access control, but don't want to be mangled. I'm certainly not going to introduce another mount mode. What Cygwin could do is to perform ACL-based access checks independently of the acl/noacl mount mode on FSes supporting ACLs. However, if you want ACLs, why not use the acl mount mode in the first place? Still, it *might* makes sense in some scenarios, even if the results of stat(2)/acl(2) may differ surprisingly from what access(2) returns. We can also simply try it out. A patch to enable this behaviour is dead-simple. Here's the prerequisite: Would more than one person want that *and* be willing to give this a *thorough* testing? Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat pgp7tQxst6ic0.pgp Description: PGP signature
Re: (call-process ...) hangs in emacs
Corinna Vinschen corinna-cygwin at cygwin.com writes: I implemented this preliminary and uploaded a snapshot to https://cygwin.com/snapshots/ Oh, great! I'll bump my machine to that snapshot tomorrow. Since I can now compile my own DLL, would that be a good time to ask what could be done for that link count problem on NetApp volumes? I guess the first step would be to patch out the forced ihash mount option? Regards, Achim. -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On Aug 28 15:04, Achim Gratz wrote: Corinna Vinschen corinna-cygwin at cygwin.com writes: I implemented this preliminary and uploaded a snapshot to https://cygwin.com/snapshots/ Oh, great! I'll bump my machine to that snapshot tomorrow. Since I can now compile my own DLL, would that be a good time to ask what could be done for that link count problem on NetApp volumes? I guess the first step would be to patch out the forced ihash mount option? It's a bug in Netapp which you won't be able to workaround in Cygwin. Netapp inode numbers are unstable. This is non-trivial. Hardlinks with different inode numbers will be a major PITA. Unless, of course, this has been fixed. But then again there's no way for Cygwin to know whether it's acessing a fixed Netapp or an older non-fixed one. Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat pgpuQHgRriWr1.pgp Description: PGP signature
Re: (call-process ...) hangs in emacs
Corinna Vinschen corinna-cygwin at cygwin.com writes: Since the CLASS_OBJ and DEF_CLASS_OBJ entries only exist if secondary user and group (default) entries exist, that means the default permission entry only consists of 3 ACEs. This in turn means, the constant MIN_ACL_ENTRIES changed from 4 to 3. This might negatively affect coreutils, at least `ls', even though in my local testing it looked all normal. Please test. This fixes the read-only problem in Emacs (so that hunch was correct). Perl still doesn't play, but I think the 5.18 version should get it correct. Will need to switch a test installation over for that, though. Thanks! Regards, Achim. -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
ACL behavior in Cygwin // Re: (call-process ...) hangs in emacs
Greetings, Corinna Vinschen! It's what acl means on Cygwin. acl means that Windowsd ACLs are used and permissions are handled and converted to and from POSIX permissions. noacl means, Cygwin ignores all ACLs and fakes ownership and POSIX permissions only based only on filetype and DOS R/O attribute, as it has to on filesystems not supporting ACLs, like FAT/FAT32. Got it. It seems, Cygwin need a middle groung between these two for cases, where FS support access control, but don't want to be mangled. I'm certainly not going to introduce another mount mode. I didn't said it has to be mount mode... besides, it doesn't make sense to implement YA mode to do what is already done, just a little different. What Cygwin could do is to perform ACL-based access checks independently of the acl/noacl mount mode on FSes supporting ACLs. However, if you want ACLs, why not use the acl mount mode in the first place? ACL inheritance, mostly. POSIX'ized permissions break inheritance on newly created files, at times making these files inaccessible to native applications, even though inheritance rules would allow it otherwise. Still, it *might* makes sense in some scenarios, even if the results of stat(2)/acl(2) may differ surprisingly from what access(2) returns. We can also simply try it out. A patch to enable this behaviour is dead-simple. Here's the prerequisite: Would more than one person want that *and* be willing to give this a *thorough* testing? I'd like to hear out expected behavior from this patch first. I might be able to do some testing, but not in the nearest month, I'm afraid. The list of things to do grew out of control, and I'm trying hard to make it shorter. If there's no other interested parties, let's put it on ice, until you come back from your vacation? -- WBR, Andrey Repin (anrdae...@yandex.ru) 28.08.2014, 20:37 Sorry for my terrible english... -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: ACL behavior in Cygwin // Re: (call-process ...) hangs in emacs
Andrey Repin writes: What Cygwin could do is to perform ACL-based access checks independently of the acl/noacl mount mode on FSes supporting ACLs. However, if you want ACLs, why not use the acl mount mode in the first place? ACL inheritance, mostly. POSIX'ized permissions break inheritance on newly created files, at times making these files inaccessible to native applications, even though inheritance rules would allow it otherwise. You can prevent this from happening if you forbid users to change the ACL and enforce inheritance. That's the reason I can't give those files sensible POSIX permissions since they'd need to be translated into ACL which I can't write. All our filers are set up that way. No I don't think this is a good idea, but I guess there'd been one support call too many with a share that somebody made inaccessible by fiddling with the ACL. Regards, Achim. -- +[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]+ DIY Stuff: http://Synth.Stromeko.net/DIY.html -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
Corinna Vinschen writes: Here's the prerequisite: Would more than one person want that *and* be willing to give this a *thorough* testing? That really becomes an issue only if you have to use external shares that are set up in peculiar ways and AD integration. The number of people that fall into that category is… small, let's say. For the ACL part it would be possible to set up a test bed on a local machine, but my guess for the number of people doing that just for the fun of testing is that it's likely to be an even smaller number. Regards, Achim. -- +[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]+ SD adaptations for KORG EX-800 and Poly-800MkII V0.9: http://Synth.Stromeko.net/Downloads.html#KorgSDada -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
Greetings, Achim Gratz! Here's the prerequisite: Would more than one person want that *and* be willing to give this a *thorough* testing? That really becomes an issue only if you have to use external shares that are set up in peculiar ways and AD integration. I've managed to do that on a local machine that is not a member of any domain. Using noacl mount flag ever since, to avoid similar things from happening. The number of people that fall into that category is… small, let's say. For the ACL part it would be possible to set up a test bed on a local machine, but my guess for the number of people doing that just for the fun of testing is that it's likely to be an even smaller number. -- WBR, Andrey Repin (anrdae...@yandex.ru) 28.08.2014, 23:40 Sorry for my terrible english...
Re: (call-process ...) hangs in emacs
On Aug 26 18:12, Ken Brown wrote: On 8/26/2014 2:55 PM, Achim Gratz wrote: Ken Brown writes: It looks like my idea is going to work, but it needs testing to make sure I've implemented it correctly. If anyone is willing to test it, you can download emacs-24.3.93-2 from my personal Cygwin repository: http://sanibeltranquility.com/cygwin/ Instructions can be found at that URL. I've switched to this version today. I've noticed that two bugs are still present at least in the emacs-w32 version: 1) When showing the Windows desktop with Win-D and then restoring it (including Emacs) with Win-D again, the cursor becomes a hollow rectangle that doesn't blink. To get the normal cursor behaviour back you have to minimize and restore the Emacs window in the normal way. This one has nothing to do with emacs. I see the same thing in mintty, with just a shell prompt (bash in my case). 2) Files that have no POSIX permissions (filemode ) and where access is granted via ACL only get always opened as read-only and you have to C-x C-q them before saving. It appears that this is Cygwin specific since on Linux the same version copes with that situation correctly (however, the mask bits in the ACL get displayed in the group portion of the file mode, which I've never seen happen on Cygwin, so this may be something that Cygwin needs to do -- maybe that'd even solve the problems that Perl has in the same situation). AFAICT, emacs decides whether the file is writable via the system call faccessat. (See the function 'check_writable' in src/fileio.c.) This is not Cygwin specific. So faccessat must be returning failure in the scenario you described. I don't know if that's a Cygwin bug or not. faccessat/access/eaccess don't try to be intelligent by themselves. Rather they just call a Windows function if the filesystem is mounted with acl mount flags: - Fetch file's security descriptor - Create process impersonation token. - Call NtAccessCheck - If NtAccessCheck returns not allowed, check for backup/restore privileges via NtPrivilegeCheck. In noacl mode or on filesystems not supporting ACLs, access uses the st_mode flags from stat() to figure out the permissions. The relevant parts of the implementation are the check_file_access and subsequently called check_access functions in security.cc. If you see a bug there, please let me know. BTW, emacs on Cygwin doesn't directly check ACLs, because the relevant configure test fails. Works for vim. Does the Emacs configure test only check for POSIX ACL functions and not for Solaris ACL functions, by any chance? Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat pgpRL0KS725JD.pgp Description: PGP signature
Re: (call-process ...) hangs in emacs
On 8/27/2014 4:42 AM, Corinna Vinschen wrote: On Aug 26 18:12, Ken Brown wrote: On 8/26/2014 2:55 PM, Achim Gratz wrote: 2) Files that have no POSIX permissions (filemode ) and where access is granted via ACL only get always opened as read-only and you have to C-x C-q them before saving. It appears that this is Cygwin specific since on Linux the same version copes with that situation correctly (however, the mask bits in the ACL get displayed in the group portion of the file mode, which I've never seen happen on Cygwin, so this may be something that Cygwin needs to do -- maybe that'd even solve the problems that Perl has in the same situation). AFAICT, emacs decides whether the file is writable via the system call faccessat. (See the function 'check_writable' in src/fileio.c.) This is not Cygwin specific. So faccessat must be returning failure in the scenario you described. I don't know if that's a Cygwin bug or not. faccessat/access/eaccess don't try to be intelligent by themselves. Rather they just call a Windows function if the filesystem is mounted with acl mount flags: - Fetch file's security descriptor - Create process impersonation token. - Call NtAccessCheck - If NtAccessCheck returns not allowed, check for backup/restore privileges via NtPrivilegeCheck. In noacl mode or on filesystems not supporting ACLs, access uses the st_mode flags from stat() to figure out the permissions. The relevant parts of the implementation are the check_file_access and subsequently called check_access functions in security.cc. If you see a bug there, please let me know. Achim, could you send me a recipe for reproducing the problem so that I can test further? Please be very detailed; I have no experience with ACLs. BTW, emacs on Cygwin doesn't directly check ACLs, because the relevant configure test fails. Works for vim. Does the Emacs configure test only check for POSIX ACL functions and not for Solaris ACL functions, by any chance? I spoke too soon. It does detect that Cygwin has certain ACL functions. But the feature that Achim was asking about seems to get used only on systems that have acl_get_file. I guess that's a POSIX ACL function. Ken -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On Aug 27 08:52, Ken Brown wrote: On 8/27/2014 4:42 AM, Corinna Vinschen wrote: On Aug 26 18:12, Ken Brown wrote: On 8/26/2014 2:55 PM, Achim Gratz wrote: 2) Files that have no POSIX permissions (filemode ) and where access is granted via ACL only get always opened as read-only and you have to C-x C-q them before saving. It appears that this is Cygwin specific since on Linux the same version copes with that situation correctly (however, the mask bits in the ACL get displayed in the group portion of the file mode, which I've never seen happen on Cygwin, so this may be something that Cygwin needs to do -- maybe that'd even solve the problems that Perl has in the same situation). AFAICT, emacs decides whether the file is writable via the system call faccessat. (See the function 'check_writable' in src/fileio.c.) This is not Cygwin specific. So faccessat must be returning failure in the scenario you described. I don't know if that's a Cygwin bug or not. faccessat/access/eaccess don't try to be intelligent by themselves. Rather they just call a Windows function if the filesystem is mounted with acl mount flags: - Fetch file's security descriptor - Create process impersonation token. - Call NtAccessCheck - If NtAccessCheck returns not allowed, check for backup/restore privileges via NtPrivilegeCheck. In noacl mode or on filesystems not supporting ACLs, access uses the st_mode flags from stat() to figure out the permissions. The relevant parts of the implementation are the check_file_access and subsequently called check_access functions in security.cc. If you see a bug there, please let me know. Achim, could you send me a recipe for reproducing the problem so that I can test further? Please be very detailed; I have no experience with ACLs. I'd be interested in a way to reproduce this as well. On *real* local or remote NTFS, if possible. BTW, emacs on Cygwin doesn't directly check ACLs, because the relevant configure test fails. Works for vim. Does the Emacs configure test only check for POSIX ACL functions and not for Solaris ACL functions, by any chance? I spoke too soon. It does detect that Cygwin has certain ACL functions. But the feature that Achim was asking about seems to get used only on systems that have acl_get_file. I guess that's a POSIX ACL function. Yes, it is. It's pretty much the same as the Solaris/Cygwin function int acl (const char *path, int cmd, int nentries, aclent_t *aclbufp); See http://docs.oracle.com/cd/E23823_01/html/816-5167/acl-2.html for a description. We're only supporting the aclent_t type (funny, isn't it?) which is pretty much based on POSIX ACLs and which is defined in /usr/include/cygwin/acl.h. Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat pgp2AFk7MA9Gp.pgp Description: PGP signature
Re: (call-process ...) hangs in emacs
On 08/27/2014 07:47 AM, Corinna Vinschen wrote: Works for vim. Does the Emacs configure test only check for POSIX ACL functions and not for Solaris ACL functions, by any chance? I spoke too soon. It does detect that Cygwin has certain ACL functions. But the feature that Achim was asking about seems to get used only on systems that have acl_get_file. I guess that's a POSIX ACL function. Yes, it is. It's pretty much the same as the Solaris/Cygwin function int acl (const char *path, int cmd, int nentries, aclent_t *aclbufp); See http://docs.oracle.com/cd/E23823_01/html/816-5167/acl-2.html for a description. We're only supporting the aclent_t type (funny, isn't it?) which is pretty much based on POSIX ACLs and which is defined in /usr/include/cygwin/acl.h. Hmm; isn't emacs using gnulib's acl wrappers? (Paul Eggert would know; he's the developer that's done the most recent work on Gnulib ACL support as well as on emacs). In that case, shouldn't the behavior be the same as for coreutils, which also uses gnulib? I'm wondering if there are any bugs in the gnulib acl wrappers which might affect more than just emacs, and/or where a cygwin patch would make the gnulib wrappers happier. Sadly, I'm also not the best expert on ACLs. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: (call-process ...) hangs in emacs
Ken Brown kbrown at cornell.edu writes: Achim, could you send me a recipe for reproducing the problem so that I can test further? Please be very detailed; I have no experience with ACLs. Let's get one issue out of the way first that may be a Cygwin bug: on Linux a file with all access removed via standard POSIX modes and then access granted via ACL would place the mask bits of the ACL (the maximum permission that can be granted via ACL, usually rwx) into the group portion of the POSIX modes (ls --color would even show these in different color if you didn't happen to notice the +). That doesn't happen on Cygwin and it seems that some software optimizes based on that information to not traverse the ACL when there's no chance to ever get a permission granted. This behaviour is mandated by POSIX IIUC (and it is what Linux is doing) so unless Cygwin explicitly follows a different ACL model, I think that should be taken care of before diving into this further. Regards, Achim. -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On 8/27/2014 10:40 AM, Eric Blake wrote: On 08/27/2014 07:47 AM, Corinna Vinschen wrote: Works for vim. Does the Emacs configure test only check for POSIX ACL functions and not for Solaris ACL functions, by any chance? I spoke too soon. It does detect that Cygwin has certain ACL functions. But the feature that Achim was asking about seems to get used only on systems that have acl_get_file. I guess that's a POSIX ACL function. Yes, it is. It's pretty much the same as the Solaris/Cygwin function int acl (const char *path, int cmd, int nentries, aclent_t *aclbufp); See http://docs.oracle.com/cd/E23823_01/html/816-5167/acl-2.html for a description. We're only supporting the aclent_t type (funny, isn't it?) which is pretty much based on POSIX ACLs and which is defined in /usr/include/cygwin/acl.h. Hmm; isn't emacs using gnulib's acl wrappers? (Paul Eggert would know; he's the developer that's done the most recent work on Gnulib ACL support as well as on emacs). In that case, shouldn't the behavior be the same as for coreutils, which also uses gnulib? I'm wondering if there are any bugs in the gnulib acl wrappers which might affect more than just emacs, and/or where a cygwin patch would make the gnulib wrappers happier. Sadly, I'm also not the best expert on ACLs. I'll follow up on the emacs-devel list. Ken -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
Greetings, Corinna Vinschen! faccessat/access/eaccess don't try to be intelligent by themselves. Rather they just call a Windows function if the filesystem is mounted with acl mount flags: - Fetch file's security descriptor - Create process impersonation token. - Call NtAccessCheck - If NtAccessCheck returns not allowed, check for backup/restore privileges via NtPrivilegeCheck. In noacl mode or on filesystems not supporting ACLs, access uses the st_mode flags from stat() to figure out the permissions. I'm not very much into Cygwin internals, so beg pardon if I got something wrong here... But reading this makes my internal sanity checker go into red alarm state. Here's why: When Cygwin mount a filesystem with 'acl' flag set, it mangles current ACL's set on the files to produce something that can be understood as basic POSIX 'ugly'...erm, 'ugo' permissions. Behavior least desirable in many cases. You say, it will then use native functions to determine access rights... No wonder they will work, since you already mangled them to suit your needs. When Cygwin mount a filesystem with 'noacl' flag, thus let OS use true ACL's (a feature Windows implemented surprisingly fast, while *NIX was only proposing it... for far too long without any result in sight), it is then followed by some magic and guesswork on Cygwin's end to find out access rights. If you ask me, something isn't quite right here. Or something is missing. The relevant parts of the implementation are the check_file_access and subsequently called check_access functions in security.cc. If you see a bug there, please let me know. BTW, emacs on Cygwin doesn't directly check ACLs, because the relevant configure test fails. Works for vim. Does the Emacs configure test only check for POSIX ACL functions and not for Solaris ACL functions, by any chance? -- WBR, Andrey Repin (anrdae...@yandex.ru) 28.08.2014, 00:48 Sorry for my terrible english... -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On Mon, Aug 25, 2014 at 8:00 PM, Ken Brown kbr...@cornell.edu wrote: It looks like my idea is going to work, but it needs testing to make sure I've implemented it correctly. If anyone is willing to test it, you can download emacs-24.3.93-2 from my personal Cygwin repository: I've downloaded it - no problems so far but I'll run with it from now on and let know you if anything untoward happens. Pete -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
Ken Brown writes: It looks like my idea is going to work, but it needs testing to make sure I've implemented it correctly. If anyone is willing to test it, you can download emacs-24.3.93-2 from my personal Cygwin repository: http://sanibeltranquility.com/cygwin/ Instructions can be found at that URL. I've switched to this version today. I've noticed that two bugs are still present at least in the emacs-w32 version: 1) When showing the Windows desktop with Win-D and then restoring it (including Emacs) with Win-D again, the cursor becomes a hollow rectangle that doesn't blink. To get the normal cursor behaviour back you have to minimize and restore the Emacs window in the normal way. 2) Files that have no POSIX permissions (filemode ) and where access is granted via ACL only get always opened as read-only and you have to C-x C-q them before saving. It appears that this is Cygwin specific since on Linux the same version copes with that situation correctly (however, the mask bits in the ACL get displayed in the group portion of the file mode, which I've never seen happen on Cygwin, so this may be something that Cygwin needs to do -- maybe that'd even solve the problems that Perl has in the same situation). Regards, Achim. -- +[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]+ Factory and User Sound Singles for Waldorf Blofeld: http://Synth.Stromeko.net/Downloads.html#WaldorfSounds -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On 8/26/2014 2:55 PM, Achim Gratz wrote: Ken Brown writes: It looks like my idea is going to work, but it needs testing to make sure I've implemented it correctly. If anyone is willing to test it, you can download emacs-24.3.93-2 from my personal Cygwin repository: http://sanibeltranquility.com/cygwin/ Instructions can be found at that URL. I've switched to this version today. I've noticed that two bugs are still present at least in the emacs-w32 version: 1) When showing the Windows desktop with Win-D and then restoring it (including Emacs) with Win-D again, the cursor becomes a hollow rectangle that doesn't blink. To get the normal cursor behaviour back you have to minimize and restore the Emacs window in the normal way. This one has nothing to do with emacs. I see the same thing in mintty, with just a shell prompt (bash in my case). 2) Files that have no POSIX permissions (filemode ) and where access is granted via ACL only get always opened as read-only and you have to C-x C-q them before saving. It appears that this is Cygwin specific since on Linux the same version copes with that situation correctly (however, the mask bits in the ACL get displayed in the group portion of the file mode, which I've never seen happen on Cygwin, so this may be something that Cygwin needs to do -- maybe that'd even solve the problems that Perl has in the same situation). AFAICT, emacs decides whether the file is writable via the system call faccessat. (See the function 'check_writable' in src/fileio.c.) This is not Cygwin specific. So faccessat must be returning failure in the scenario you described. I don't know if that's a Cygwin bug or not. BTW, emacs on Cygwin doesn't directly check ACLs, because the relevant configure test fails. That would explain the ACL display you see on Linux but not on Cygwin. But I think this is a separate matter, not related to the bug you're reporting. Ken -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On 8/18/2014 8:28 AM, Ken Brown wrote: On 8/8/2014 9:26 AM, Ken Brown wrote: On 8/7/2014 5:42 PM, Eric Blake wrote: On 08/07/2014 12:53 PM, Ken Brown wrote: On 8/7/2014 11:30 AM, Eric Blake wrote: On 08/07/2014 05:51 AM, Ken Brown wrote: I think I found the problem with NORMAL mutexes. emacs calls pthread_atfork after initializing the mutexes, and the resulting 'prepare' handler locks the mutexes. (The parent and child handlers unlock them.) So when emacs calls fork, the mutexes are locked, and shortly thereafter the Cygwin DLL calls calloc, leading to a deadlock. Here's a gdb backtrace showing the sequence of calls: Arguably, that's an upstream bug in emacs. POSIX has declared pthread_atfork to be fundamentally useless; it is broken by design, because you cannot use it for anything that is not async-signal-safe without risking deadlock. And (except for sem_post()), NONE of the standardized locking functions are async-signal-safe. http://austingroupbugs.net/view.php?id=858 That said, it would still be nice to support this, since even though the theory says it is broken, there are still lots of (broken) programs/libraries still trying to use it. So what do you think emacs should do instead of using pthread_atfork? Or is it better to just remove it? I don't know how likely it is that this would cause a problem. The POSIX recommendation is that multithreaded apps limit themselves solely to async-signal-safe functions in the window between fork and exec (or to use pthread_spawn instead of fork/exec). I don't know what emacs is trying to do in that window, but at this point, it's certainly worth reporting it upstream. If you need a pointer to the full list of async-signal-safe functions: http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04 and search for The following table defines a set of functions that shall be async-signal-safe. The most common deadlocks when violating async-signal-safety rules look like this in single-threaded programs: function calls malloc() malloc() grabs a non-recursive mutex async signal arrives signal handler called signal handler calls malloc() malloc() can't grab the mutex - deadlock and this counterpart in multithreaded programs: thread1 calls malloc() malloc() grabs a non-recursive mutex thread 2 gains control and calls fork() because of the fork, thread1 no longer exists to release the lock child process calls malloc() malloc() tries to grab mutex, but it is locked with no thread to release it Switching malloc() to a recursive lock may or may not solve the single-threaded deadlock (in that malloc can now obtain the mutex), but it is probably NOT what you want to happen (unless malloc is fully re-entrant, the inner instance will see incomplete data and either be totally clobbered itself, or else totally clobber the outer instance when it returns). So it's GOOD that malloc does NOT use a recursive mutex by default. In the multithreaded case, you are flat out hosed. Switching to a recursive lock does not change the picture - you are still deadlocked waiting on thread1 to release the lock, but thread1 doesn't exist. Thanks for the explanations, Eric. I've filed an emacs bug report: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18222 I've just made a new emacs test release that includes a workaround for this bug. I think I see a way to make emacs use Cygwin's malloc; if this works, it will provide a better fix for the bug. It looks like my idea is going to work, but it needs testing to make sure I've implemented it correctly. If anyone is willing to test it, you can download emacs-24.3.93-2 from my personal Cygwin repository: http://sanibeltranquility.com/cygwin/ Instructions can be found at that URL. Ken -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On 8/8/2014 9:26 AM, Ken Brown wrote: On 8/7/2014 5:42 PM, Eric Blake wrote: On 08/07/2014 12:53 PM, Ken Brown wrote: On 8/7/2014 11:30 AM, Eric Blake wrote: On 08/07/2014 05:51 AM, Ken Brown wrote: I think I found the problem with NORMAL mutexes. emacs calls pthread_atfork after initializing the mutexes, and the resulting 'prepare' handler locks the mutexes. (The parent and child handlers unlock them.) So when emacs calls fork, the mutexes are locked, and shortly thereafter the Cygwin DLL calls calloc, leading to a deadlock. Here's a gdb backtrace showing the sequence of calls: Arguably, that's an upstream bug in emacs. POSIX has declared pthread_atfork to be fundamentally useless; it is broken by design, because you cannot use it for anything that is not async-signal-safe without risking deadlock. And (except for sem_post()), NONE of the standardized locking functions are async-signal-safe. http://austingroupbugs.net/view.php?id=858 That said, it would still be nice to support this, since even though the theory says it is broken, there are still lots of (broken) programs/libraries still trying to use it. So what do you think emacs should do instead of using pthread_atfork? Or is it better to just remove it? I don't know how likely it is that this would cause a problem. The POSIX recommendation is that multithreaded apps limit themselves solely to async-signal-safe functions in the window between fork and exec (or to use pthread_spawn instead of fork/exec). I don't know what emacs is trying to do in that window, but at this point, it's certainly worth reporting it upstream. If you need a pointer to the full list of async-signal-safe functions: http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04 and search for The following table defines a set of functions that shall be async-signal-safe. The most common deadlocks when violating async-signal-safety rules look like this in single-threaded programs: function calls malloc() malloc() grabs a non-recursive mutex async signal arrives signal handler called signal handler calls malloc() malloc() can't grab the mutex - deadlock and this counterpart in multithreaded programs: thread1 calls malloc() malloc() grabs a non-recursive mutex thread 2 gains control and calls fork() because of the fork, thread1 no longer exists to release the lock child process calls malloc() malloc() tries to grab mutex, but it is locked with no thread to release it Switching malloc() to a recursive lock may or may not solve the single-threaded deadlock (in that malloc can now obtain the mutex), but it is probably NOT what you want to happen (unless malloc is fully re-entrant, the inner instance will see incomplete data and either be totally clobbered itself, or else totally clobber the outer instance when it returns). So it's GOOD that malloc does NOT use a recursive mutex by default. In the multithreaded case, you are flat out hosed. Switching to a recursive lock does not change the picture - you are still deadlocked waiting on thread1 to release the lock, but thread1 doesn't exist. Thanks for the explanations, Eric. I've filed an emacs bug report: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18222 I've just made a new emacs test release that includes a workaround for this bug. I think I see a way to make emacs use Cygwin's malloc; if this works, it will provide a better fix for the bug. Ken -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On Mon, Aug 18, 2014 at 1:28 PM, Ken Brown kbr...@cornell.edu wrote: I've just made a new emacs test release that includes a workaround for this bug. I think I see a way to make emacs use Cygwin's malloc; if this works, it will provide a better fix for the bug. I'd like to give this a try. I've selected Exp mode in setup.exe but I don't see the latest version - do I need to do anything else or is it just a matter of waiting for the mirror to catch up (I am in the UK)? Pete -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On 08/18/2014 10:58 AM, Peter Hull wrote: On Mon, Aug 18, 2014 at 1:28 PM, Ken Brown kbr...@cornell.edu wrote: I've just made a new emacs test release that includes a workaround for this bug. I think I see a way to make emacs use Cygwin's malloc; if this works, it will provide a better fix for the bug. I'd like to give this a try. I've selected Exp mode in setup.exe but I don't see the latest version - do I need to do anything else or is it just a matter of waiting for the mirror to catch up (I am in the UK)? Probably the latter. If you don't want to wait, check out some other mirrors. -- Larry _ A: Yes. Q: Are you sure? A: Because it reverses the logical flow of conversation. Q: Why is top posting annoying in email? -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On 8/7/2014 5:42 PM, Eric Blake wrote: On 08/07/2014 12:53 PM, Ken Brown wrote: On 8/7/2014 11:30 AM, Eric Blake wrote: On 08/07/2014 05:51 AM, Ken Brown wrote: I think I found the problem with NORMAL mutexes. emacs calls pthread_atfork after initializing the mutexes, and the resulting 'prepare' handler locks the mutexes. (The parent and child handlers unlock them.) So when emacs calls fork, the mutexes are locked, and shortly thereafter the Cygwin DLL calls calloc, leading to a deadlock. Here's a gdb backtrace showing the sequence of calls: Arguably, that's an upstream bug in emacs. POSIX has declared pthread_atfork to be fundamentally useless; it is broken by design, because you cannot use it for anything that is not async-signal-safe without risking deadlock. And (except for sem_post()), NONE of the standardized locking functions are async-signal-safe. http://austingroupbugs.net/view.php?id=858 That said, it would still be nice to support this, since even though the theory says it is broken, there are still lots of (broken) programs/libraries still trying to use it. So what do you think emacs should do instead of using pthread_atfork? Or is it better to just remove it? I don't know how likely it is that this would cause a problem. The POSIX recommendation is that multithreaded apps limit themselves solely to async-signal-safe functions in the window between fork and exec (or to use pthread_spawn instead of fork/exec). I don't know what emacs is trying to do in that window, but at this point, it's certainly worth reporting it upstream. If you need a pointer to the full list of async-signal-safe functions: http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04 and search for The following table defines a set of functions that shall be async-signal-safe. The most common deadlocks when violating async-signal-safety rules look like this in single-threaded programs: function calls malloc() malloc() grabs a non-recursive mutex async signal arrives signal handler called signal handler calls malloc() malloc() can't grab the mutex - deadlock and this counterpart in multithreaded programs: thread1 calls malloc() malloc() grabs a non-recursive mutex thread 2 gains control and calls fork() because of the fork, thread1 no longer exists to release the lock child process calls malloc() malloc() tries to grab mutex, but it is locked with no thread to release it Switching malloc() to a recursive lock may or may not solve the single-threaded deadlock (in that malloc can now obtain the mutex), but it is probably NOT what you want to happen (unless malloc is fully re-entrant, the inner instance will see incomplete data and either be totally clobbered itself, or else totally clobber the outer instance when it returns). So it's GOOD that malloc does NOT use a recursive mutex by default. In the multithreaded case, you are flat out hosed. Switching to a recursive lock does not change the picture - you are still deadlocked waiting on thread1 to release the lock, but thread1 doesn't exist. Thanks for the explanations, Eric. I've filed an emacs bug report: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18222 Ken -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
A bug in Emacs? Gosh I thought it was probably just me doing something silly! Thanks for your help everyone in tracking this down. Ken, do you know if it's possible to subscribe to the bug report - I'd be interested in knowing how it pans out. Pete -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On 8/8/2014 11:39 AM, Peter Hull wrote: A bug in Emacs? Gosh I thought it was probably just me doing something silly! Thanks for your help everyone in tracking this down. Ken, do you know if it's possible to subscribe to the bug report - I'd be interested in knowing how it pans out. No, I don't think so. (You can subscribe to the bug mailing list, but then you'd get all emacs bug reports.) But I'll CC you the next time I write to the report, and then you'll probably stay in the CC. And you can keep checking back at http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18222. Ken -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
Hi Corinna, On 8/5/2014 2:40 PM, Corinna Vinschen wrote: Hi Ken, On Aug 5 13:55, Ken Brown wrote: On 8/5/2014 9:58 AM, Corinna Vinschen wrote: On Aug 5 08:21, Ken Brown wrote: === modified file 'src/gmalloc.c' --- src/gmalloc.c 2014-03-04 19:02:49 + +++ src/gmalloc.c 2014-08-05 01:35:38 + @@ -490,8 +490,8 @@ } #ifdef USE_PTHREAD -pthread_mutex_t _malloc_mutex = PTHREAD_MUTEX_INITIALIZER; -pthread_mutex_t _aligned_blocks_mutex = PTHREAD_MUTEX_INITIALIZER; +pthread_mutex_t _malloc_mutex; +pthread_mutex_t _aligned_blocks_mutex; int _malloc_thread_enabled_p; static void @@ -526,8 +526,11 @@ initialized mutexes when they are used first. To avoid such a situation, we initialize mutexes here while their use is disabled in malloc etc. */ - pthread_mutex_init (_malloc_mutex, NULL); - pthread_mutex_init (_aligned_blocks_mutex, NULL); + pthread_mutexattr_t attr1, attr2; + pthread_mutexattr_settype (attr1, PTHREAD_MUTEX_NORMAL); + pthread_mutexattr_settype (attr2, PTHREAD_MUTEX_NORMAL); + pthread_mutex_init (_malloc_mutex, attr1); + pthread_mutex_init (_aligned_blocks_mutex, attr2); pthread_atfork (malloc_atfork_handler_prepare, malloc_atfork_handler_parent, malloc_atfork_handler_child); The first hunk avoids the double initialization, but I don't understand why the second hunk does anything. Since PTHREAD_MUTEX_NORMAL is now the default, shouldn't calling pthread_mutex_init with NULL second argument be equivalent to my calls to pthread_mutexattr_settype? Does this indicate a Cygwin bug, or am I misunderstanding something? AFAICS you're missing something. Your pthread_mutexattr_t attr1, attr2 are not initialized. They contain some random values, thus they are not good objects. The calls to pthread_mutexattr_settype as well as the calls to pthread_mutex_init will fail with EINVAL, but you won't see it due to missing error handling, and you end up without mutexes at all. If you call pthread_mutexattr_init before calling pthread_mutexattr_settype the situation shoul;d be the same as before. Thanks for catching my mistake. Your earlier suggestion about explicitly setting the pthread_mutexes to be ERRORCHECK mutexes seems to fix the problem (as long as I remember to call pthread_mutexattr_init). The revised patch is attached. I went back to using both the static and dynamic initializations as in the original code, since you said that's harmless. I'm glad to read that, but I'm still a little bit concerned. If your code works with ERRORCHECK mutexes but hangs with NORMAL mutexes, you *might* miss an error case. I'd suggest to tweak the pthread_mutex_lock/unlock calls and log the threads calling it. It looks like the same thread calls malloc from malloc for some reason and it might be interesting to learn how that happens and if it's really ok in this scenario, because it seems to be unexpected by the code. I think I found the problem with NORMAL mutexes. emacs calls pthread_atfork after initializing the mutexes, and the resulting 'prepare' handler locks the mutexes. (The parent and child handlers unlock them.) So when emacs calls fork, the mutexes are locked, and shortly thereafter the Cygwin DLL calls calloc, leading to a deadlock. Here's a gdb backtrace showing the sequence of calls: #0 malloc (size=size@entry=40) at gmalloc.c:919 #1 0x0053fc28 in calloc (nmemb=1, size=40) at gmalloc.c:1510 #2 0x61082074 in calloc (nmemb=1, size=40) at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/malloc_wrapper.cc:100 #3 0x61003177 in operator new (s=s@entry=40) at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/cxx.cc:23 #4 0x610fc9d3 in pthread_mutex::init (mutex=0x61187d34 reent_data+852, attr=0x0, initializer=0x12) at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/thread.cc:3118 #5 0x610fcc13 in pthread_mutex_lock (mutex=0x61187d34 reent_data+852) at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/thread.cc:3170 #6 0x611319d8 in __fp_lock (ptr=0x61187cd0 reent_data+752) at /usr/src/debug/cygwin-1.7.31-3/newlib/libc/stdio/findfp.c:287 #7 0x61154f75 in _fwalk (ptr=0x28d544, function=function@entry=0x611319c0 __fp_lock) at /usr/src/debug/cygwin-1.7.31-3/newlib/libc/stdio/fwalk.c:50 #8 0x61131dea in __fp_lock_all () at /usr/src/debug/cygwin-1.7.31-3/newlib/libc/stdio/findfp.c:307 #9 0x610fa45e in pthread::atforkprepare () at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/thread.cc:2031 #10 0x61076292 in lock_pthread (this=synthetic pointer) at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/sigproc.h:137 #11 hold_everything (x=synthetic pointer, this=synthetic pointer) at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/sigproc.h:169 #12 fork () at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/fork.cc:582 Is there a better way to deal with this issue than using ERRORCHECK mutexes? Ken -- Problem reports: http://cygwin.com/problems.html FAQ:
Re: (call-process ...) hangs in emacs
Hi Ken, On Aug 7 07:51, Ken Brown wrote: Hi Corinna, On 8/5/2014 2:40 PM, Corinna Vinschen wrote: I'm glad to read that, but I'm still a little bit concerned. If your code works with ERRORCHECK mutexes but hangs with NORMAL mutexes, you *might* miss an error case. I'd suggest to tweak the pthread_mutex_lock/unlock calls and log the threads calling it. It looks like the same thread calls malloc from malloc for some reason and it might be interesting to learn how that happens and if it's really ok in this scenario, because it seems to be unexpected by the code. I think I found the problem with NORMAL mutexes. emacs calls pthread_atfork after initializing the mutexes, and the resulting 'prepare' handler locks the mutexes. (The parent and child handlers unlock them.) So when emacs calls fork, the mutexes are locked, and shortly thereafter the Cygwin DLL calls calloc, leading to a deadlock. Here's a gdb backtrace showing the sequence of calls: First question: Why does emacs use its own malloc on Cygwin rather than the system-provided one? Is that really necessary? #0 malloc (size=size@entry=40) at gmalloc.c:919 #1 0x0053fc28 in calloc (nmemb=1, size=40) at gmalloc.c:1510 #2 0x61082074 in calloc (nmemb=1, size=40) at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/malloc_wrapper.cc:100 #3 0x61003177 in operator new (s=s@entry=40) at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/cxx.cc:23 #4 0x610fc9d3 in pthread_mutex::init (mutex=0x61187d34 reent_data+852, attr=0x0, initializer=0x12) at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/thread.cc:3118 #5 0x610fcc13 in pthread_mutex_lock (mutex=0x61187d34 reent_data+852) at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/thread.cc:3170 #6 0x611319d8 in __fp_lock (ptr=0x61187cd0 reent_data+752) Right, __fp_lock needs a pthread lock and since this lock hasn't been used yet, it has to create it. The pthread_mutex creation calls the new operator which in turn calls calloc. at /usr/src/debug/cygwin-1.7.31-3/newlib/libc/stdio/findfp.c:287 #7 0x61154f75 in _fwalk (ptr=0x28d544, function=function@entry=0x611319c0 __fp_lock) at /usr/src/debug/cygwin-1.7.31-3/newlib/libc/stdio/fwalk.c:50 #8 0x61131dea in __fp_lock_all () at /usr/src/debug/cygwin-1.7.31-3/newlib/libc/stdio/findfp.c:307 #9 0x610fa45e in pthread::atforkprepare () at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/thread.cc:2031 #10 0x61076292 in lock_pthread (this=synthetic pointer) at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/sigproc.h:137 #11 hold_everything (x=synthetic pointer, this=synthetic pointer) at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/sigproc.h:169 #12 fork () at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/fork.cc:582 Is there a better way to deal with this issue than using ERRORCHECK mutexes? Did you check if you get an error from pthread_mutex_lock on the second invocation of malloc? Is it EDEADLK? If so, you can ignore the error, but if you want to go ahead without adding lots of error checking you might be better off using a RECURSIVE mutex. Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat pgpHtwDijEEXV.pgp Description: PGP signature
Re: (call-process ...) hangs in emacs
On 08/07/2014 05:51 AM, Ken Brown wrote: I think I found the problem with NORMAL mutexes. emacs calls pthread_atfork after initializing the mutexes, and the resulting 'prepare' handler locks the mutexes. (The parent and child handlers unlock them.) So when emacs calls fork, the mutexes are locked, and shortly thereafter the Cygwin DLL calls calloc, leading to a deadlock. Here's a gdb backtrace showing the sequence of calls: Arguably, that's an upstream bug in emacs. POSIX has declared pthread_atfork to be fundamentally useless; it is broken by design, because you cannot use it for anything that is not async-signal-safe without risking deadlock. And (except for sem_post()), NONE of the standardized locking functions are async-signal-safe. http://austingroupbugs.net/view.php?id=858 That said, it would still be nice to support this, since even though the theory says it is broken, there are still lots of (broken) programs/libraries still trying to use it. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: (call-process ...) hangs in emacs
On 8/7/2014 8:51 AM, Corinna Vinschen wrote: Hi Ken, On Aug 7 07:51, Ken Brown wrote: Hi Corinna, On 8/5/2014 2:40 PM, Corinna Vinschen wrote: I'm glad to read that, but I'm still a little bit concerned. If your code works with ERRORCHECK mutexes but hangs with NORMAL mutexes, you *might* miss an error case. I'd suggest to tweak the pthread_mutex_lock/unlock calls and log the threads calling it. It looks like the same thread calls malloc from malloc for some reason and it might be interesting to learn how that happens and if it's really ok in this scenario, because it seems to be unexpected by the code. I think I found the problem with NORMAL mutexes. emacs calls pthread_atfork after initializing the mutexes, and the resulting 'prepare' handler locks the mutexes. (The parent and child handlers unlock them.) So when emacs calls fork, the mutexes are locked, and shortly thereafter the Cygwin DLL calls calloc, leading to a deadlock. Here's a gdb backtrace showing the sequence of calls: First question: Why does emacs use its own malloc on Cygwin rather than the system-provided one? Is that really necessary? Cygwin's malloc lacks a few features that emacs requires because of the unusual way emacs is built. The most important such features (or maybe even the only ones) are malloc_set_state and malloc_get_state. #0 malloc (size=size@entry=40) at gmalloc.c:919 #1 0x0053fc28 in calloc (nmemb=1, size=40) at gmalloc.c:1510 #2 0x61082074 in calloc (nmemb=1, size=40) at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/malloc_wrapper.cc:100 #3 0x61003177 in operator new (s=s@entry=40) at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/cxx.cc:23 #4 0x610fc9d3 in pthread_mutex::init (mutex=0x61187d34 reent_data+852, attr=0x0, initializer=0x12) at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/thread.cc:3118 #5 0x610fcc13 in pthread_mutex_lock (mutex=0x61187d34 reent_data+852) at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/thread.cc:3170 #6 0x611319d8 in __fp_lock (ptr=0x61187cd0 reent_data+752) Right, __fp_lock needs a pthread lock and since this lock hasn't been used yet, it has to create it. The pthread_mutex creation calls the new operator which in turn calls calloc. at /usr/src/debug/cygwin-1.7.31-3/newlib/libc/stdio/findfp.c:287 #7 0x61154f75 in _fwalk (ptr=0x28d544, function=function@entry=0x611319c0 __fp_lock) at /usr/src/debug/cygwin-1.7.31-3/newlib/libc/stdio/fwalk.c:50 #8 0x61131dea in __fp_lock_all () at /usr/src/debug/cygwin-1.7.31-3/newlib/libc/stdio/findfp.c:307 #9 0x610fa45e in pthread::atforkprepare () at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/thread.cc:2031 #10 0x61076292 in lock_pthread (this=synthetic pointer) at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/sigproc.h:137 #11 hold_everything (x=synthetic pointer, this=synthetic pointer) at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/sigproc.h:169 #12 fork () at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/fork.cc:582 Is there a better way to deal with this issue than using ERRORCHECK mutexes? Did you check if you get an error from pthread_mutex_lock on the second invocation of malloc? Is it EDEADLK? If so, you can ignore the error, but if you want to go ahead without adding lots of error checking you might be better off using a RECURSIVE mutex. I didn't check the error, but it seemed clear from the code that that was what was happening. Yes, using a RECURSIVE mutex sounds like a good idea. Or maybe it would be just as good to remove the call to pthread_atfork. See my reply to Eric later in the thread. Ken -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On 8/7/2014 11:30 AM, Eric Blake wrote: On 08/07/2014 05:51 AM, Ken Brown wrote: I think I found the problem with NORMAL mutexes. emacs calls pthread_atfork after initializing the mutexes, and the resulting 'prepare' handler locks the mutexes. (The parent and child handlers unlock them.) So when emacs calls fork, the mutexes are locked, and shortly thereafter the Cygwin DLL calls calloc, leading to a deadlock. Here's a gdb backtrace showing the sequence of calls: Arguably, that's an upstream bug in emacs. POSIX has declared pthread_atfork to be fundamentally useless; it is broken by design, because you cannot use it for anything that is not async-signal-safe without risking deadlock. And (except for sem_post()), NONE of the standardized locking functions are async-signal-safe. http://austingroupbugs.net/view.php?id=858 That said, it would still be nice to support this, since even though the theory says it is broken, there are still lots of (broken) programs/libraries still trying to use it. So what do you think emacs should do instead of using pthread_atfork? Or is it better to just remove it? I don't know how likely it is that this would cause a problem. Ken -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On 08/07/2014 12:53 PM, Ken Brown wrote: On 8/7/2014 11:30 AM, Eric Blake wrote: On 08/07/2014 05:51 AM, Ken Brown wrote: I think I found the problem with NORMAL mutexes. emacs calls pthread_atfork after initializing the mutexes, and the resulting 'prepare' handler locks the mutexes. (The parent and child handlers unlock them.) So when emacs calls fork, the mutexes are locked, and shortly thereafter the Cygwin DLL calls calloc, leading to a deadlock. Here's a gdb backtrace showing the sequence of calls: Arguably, that's an upstream bug in emacs. POSIX has declared pthread_atfork to be fundamentally useless; it is broken by design, because you cannot use it for anything that is not async-signal-safe without risking deadlock. And (except for sem_post()), NONE of the standardized locking functions are async-signal-safe. http://austingroupbugs.net/view.php?id=858 That said, it would still be nice to support this, since even though the theory says it is broken, there are still lots of (broken) programs/libraries still trying to use it. So what do you think emacs should do instead of using pthread_atfork? Or is it better to just remove it? I don't know how likely it is that this would cause a problem. The POSIX recommendation is that multithreaded apps limit themselves solely to async-signal-safe functions in the window between fork and exec (or to use pthread_spawn instead of fork/exec). I don't know what emacs is trying to do in that window, but at this point, it's certainly worth reporting it upstream. If you need a pointer to the full list of async-signal-safe functions: http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04 and search for The following table defines a set of functions that shall be async-signal-safe. The most common deadlocks when violating async-signal-safety rules look like this in single-threaded programs: function calls malloc() malloc() grabs a non-recursive mutex async signal arrives signal handler called signal handler calls malloc() malloc() can't grab the mutex - deadlock and this counterpart in multithreaded programs: thread1 calls malloc() malloc() grabs a non-recursive mutex thread 2 gains control and calls fork() because of the fork, thread1 no longer exists to release the lock child process calls malloc() malloc() tries to grab mutex, but it is locked with no thread to release it Switching malloc() to a recursive lock may or may not solve the single-threaded deadlock (in that malloc can now obtain the mutex), but it is probably NOT what you want to happen (unless malloc is fully re-entrant, the inner instance will see incomplete data and either be totally clobbered itself, or else totally clobber the outer instance when it returns). So it's GOOD that malloc does NOT use a recursive mutex by default. In the multithreaded case, you are flat out hosed. Switching to a recursive lock does not change the picture - you are still deadlocked waiting on thread1 to release the lock, but thread1 doesn't exist. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: (call-process ...) hangs in emacs
On Aug 6 11:30, Katsumi Yamaoka wrote: On Tue, 05 Aug 2014 13:55:31 -0400, Ken Brown wrote: Angelo and Katsumi, could you test it and see if it solves the problems you reported? If so, I'll issue new emacs releases. Thanks. But currently I cannot test it since the autogen.sh script doesn't work as the following. I must make it work, somehow or other... % ./autogen.sh [...] Running 'autoreconf -fi -I m4' ... 0 [main] perl 4508 child_info_fork::abort: address space needed by 'POSIX.dll' (0x2D) is already occupied Can't fork, trying again in 5 seconds at /usr/bin/autoreconf-2.69 line 188. 0 [main] perl 6264 child_info_fork::abort: address space needed by 'POSIX.dll' (0x26) is already occupied Can't fork, trying again in 5 seconds at /usr/lib/perl5/5.14/i686-cygwin-threads-64int/IO/File.pm line 188. [...] rebaseall nor reinstalling of perl and some things doesn't help. : You definitely have a DLL collision. What about perlrebase? Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat pgpf_G6HgnVy7.pgp Description: PGP signature
Re: (call-process ...) hangs in emacs
On Wed, 06 Aug 2014 10:48:49 +0200, Corinna Vinschen wrote: On Aug 6 11:30, Katsumi Yamaoka wrote: % ./autogen.sh [...] Running 'autoreconf -fi -I m4' ... 0 [main] perl 4508 child_info_fork::abort: address space needed by 'POSIX.dll' (0x2D) is already occupied [...] You definitely have a DLL collision. What about perlrebase? Oh, I did never know such a help tool existence. Thanks so much! But strangely enough, autogen.sh and perl work today without doing anything in particular. What I did since last night was only to shutdown and to boot the PC. Such is one of mysteries of Cygwin. On Tue, 05 Aug 2014 13:55:31 -0400, Ken Brown wrote: Angelo and Katsumi, could you test it and see if it solves the problems you reported? If so, I'll issue new emacs releases. Great! Now I'm running trunk Emacs that I built with the patch. I also verified it runs with no problem for the batch jobs; I'm using newly built ELisp packages, such as Gnus, for the first time in these 7 days. Thanks a lot! -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
Greetings, Katsumi Yamaoka! % ./autogen.sh [...] Running 'autoreconf -fi -I m4' ... 0 [main] perl 4508 child_info_fork::abort: address space needed by 'POSIX.dll' (0x2D) is already occupied [...] You definitely have a DLL collision. What about perlrebase? Oh, I did never know such a help tool existence. Thanks so much! But strangely enough, autogen.sh and perl work today without doing anything in particular. What I did since last night was only to shutdown and to boot the PC. Such is one of mysteries of Cygwin. That's no mystery, but looks more like BLODA. One or another DLL got loaded into different address, and everything went up in flames. -- WBR, Andrey Repin (anrdae...@yandex.ru) 07.08.2014, 04:20 Sorry for my terrible english... -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On 8/4/2014 9:45 AM, Corinna Vinschen wrote: On Aug 4 09:34, Ken Brown wrote: On 8/4/2014 4:00 AM, Corinna Vinschen wrote: On Aug 3 21:02, Ken Brown wrote: On 8/1/2014 9:32 AM, Corinna Vinschen wrote: It could be a problem with the new default pthread mutexes being NORMAL, rather then ERRORCHECK mutexes. That does seem to be the problem, since I can reproduce the bug starting with the 2014-07-14 snapshot. More precisely, I can reproduce it using emacs-nox (which is what the OP was using according to his cygcheck output) but not using emacs-X11 or emacs-w32. I tried running emacs under gdb with a breakpoint at call_process, but all I could see from that is that emacs tries to fork a subprocess, but the call to fork() never returns. I also tried running it under strace, but again all I can see is that fork() is called and then everything seems to be at a standstill. Corinna, if you want to take a look, here's the precise recipe: 1. emacs-nox -Q [This should start emacs and put you in the *scratch* buffer.] 2. Enter the following text into the buffer: (call-process pwd nil t) 3. Position the cursor at the end of the line and type Ctrl-j. What should happen, and what does happen prior to the 2014-07-14 snapshot, is that the current directory is displayed, followed by the exit code of 0. What happens instead is that emacs appears to hang. How does emacs start a process? Does it create a thread and then forks and execs from the thread? Does it use its own pthread_mutex to control the job? Is there a chance to create an STC of this process? emacs does some bookkeeping and then calls vfork. It does not create a new thread, nor does it create a pthread_mutex. The only pthread_mutexes created anywhere in the emacs source code are in its implementation of malloc and friends, not in anything directly related to controlling subprocesses. (FWIW, this malloc implementation is used in the Cygwin build of emacs but not in the Linux build.) Can you take a close look here? This malloc will be used by Cygwin as well if it's implemented in the usual way and... I did think about trying to create an STC, but I'm stymied because the problem depends so strongly on how emacs is run: - If emacs is run interactively, the problem only occurs with emacs-nox, not with emacs-X11 or emacs-w32. - If emacs is run non-interactively (i.e., in batch mode), the problem occurs with emacs-w32 and emacs-X11 too, as Angelo and Katsumi pointed out earlier in the thread. I can't think of any way to capture these peculiarities in an STC. ...this, and the fact that fork/exec (vfork == fork on Cygwin) still works nicely in other scenarios points to some problem with the usage of pthread_mutexes in the application may be the culprit. For instance, is it possible that emacs expects the pthread_mutexes in malloc to be ERRORCHECK mutexes? What if you explicitely set them to ERRORCHECK at creation time? That doesn't seem to be the issue, but I think I did find the problem, and it looks like there might be both an emacs bug and a Cygwin bug. Here's the relevant code from emacs's gmalloc.c: pthread_mutex_t _malloc_mutex = PTHREAD_MUTEX_INITIALIZER; pthread_mutex_t _aligned_blocks_mutex = PTHREAD_MUTEX_INITIALIZER; [...] /* Some pthread implementations call malloc for statically initialized mutexes when they are used first. To avoid such a situation, we initialize mutexes here while their use is disabled in malloc etc. */ pthread_mutex_init (_malloc_mutex, NULL); pthread_mutex_init (_aligned_blocks_mutex, NULL); The pthread_mutexes are initialized twice, resulting in undefined behavior according to Posix. That's the emacs bug. But simply removing the static initialization doesn't fix the problem. On the other hand, the following patch does seem to fix it, at least in preliminary testing: === modified file 'src/gmalloc.c' --- src/gmalloc.c 2014-03-04 19:02:49 + +++ src/gmalloc.c 2014-08-05 01:35:38 + @@ -490,8 +490,8 @@ } #ifdef USE_PTHREAD -pthread_mutex_t _malloc_mutex = PTHREAD_MUTEX_INITIALIZER; -pthread_mutex_t _aligned_blocks_mutex = PTHREAD_MUTEX_INITIALIZER; +pthread_mutex_t _malloc_mutex; +pthread_mutex_t _aligned_blocks_mutex; int _malloc_thread_enabled_p; static void @@ -526,8 +526,11 @@ initialized mutexes when they are used first. To avoid such a situation, we initialize mutexes here while their use is disabled in malloc etc. */ - pthread_mutex_init (_malloc_mutex, NULL); - pthread_mutex_init (_aligned_blocks_mutex, NULL); + pthread_mutexattr_t attr1, attr2; + pthread_mutexattr_settype (attr1, PTHREAD_MUTEX_NORMAL); + pthread_mutexattr_settype (attr2, PTHREAD_MUTEX_NORMAL); + pthread_mutex_init (_malloc_mutex, attr1); + pthread_mutex_init (_aligned_blocks_mutex, attr2); pthread_atfork (malloc_atfork_handler_prepare, malloc_atfork_handler_parent,
Re: (call-process ...) hangs in emacs
On Tue, Aug 5, 2014 at 1:21 PM, Ken Brown kbr...@cornell.edu wrote: - pthread_mutex_init (_malloc_mutex, NULL); - pthread_mutex_init (_aligned_blocks_mutex, NULL); + pthread_mutexattr_t attr1, attr2; + pthread_mutexattr_settype (attr1, PTHREAD_MUTEX_NORMAL); + pthread_mutexattr_settype (attr2, PTHREAD_MUTEX_NORMAL); + pthread_mutex_init (_malloc_mutex, attr1); + pthread_mutex_init (_aligned_blocks_mutex, attr2); pthread_atfork (malloc_atfork_handler_prepare, malloc_atfork_handler_parent, malloc_atfork_handler_child); Does there need to be a 'pthread_mutexattr_init' in there? I don't think that's the problem though... Pete -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On Aug 5 08:21, Ken Brown wrote: On 8/4/2014 9:45 AM, Corinna Vinschen wrote: ...this, and the fact that fork/exec (vfork == fork on Cygwin) still works nicely in other scenarios points to some problem with the usage of pthread_mutexes in the application may be the culprit. For instance, is it possible that emacs expects the pthread_mutexes in malloc to be ERRORCHECK mutexes? What if you explicitely set them to ERRORCHECK at creation time? That doesn't seem to be the issue, but I think I did find the problem, and it looks like there might be both an emacs bug and a Cygwin bug. Here's the relevant code from emacs's gmalloc.c: pthread_mutex_t _malloc_mutex = PTHREAD_MUTEX_INITIALIZER; pthread_mutex_t _aligned_blocks_mutex = PTHREAD_MUTEX_INITIALIZER; [...] /* Some pthread implementations call malloc for statically initialized mutexes when they are used first. To avoid such a situation, we initialize mutexes here while their use is disabled in malloc etc. */ pthread_mutex_init (_malloc_mutex, NULL); pthread_mutex_init (_aligned_blocks_mutex, NULL); The pthread_mutexes are initialized twice, resulting in undefined behavior according to Posix. That's the emacs bug. That's not the problem. It's not necessary to call pthread_mutex_init on statically initialized mutexes, but it doesn't hurt either. Only when calling pthread_mutex_init twice on the same object it goes downhill, especially when the first incarnation of the mutex was already locked. But simply removing the static initialization doesn't fix the problem. On the other hand, the following patch does seem to fix it, at least in preliminary testing: === modified file 'src/gmalloc.c' --- src/gmalloc.c 2014-03-04 19:02:49 + +++ src/gmalloc.c 2014-08-05 01:35:38 + @@ -490,8 +490,8 @@ } #ifdef USE_PTHREAD -pthread_mutex_t _malloc_mutex = PTHREAD_MUTEX_INITIALIZER; -pthread_mutex_t _aligned_blocks_mutex = PTHREAD_MUTEX_INITIALIZER; +pthread_mutex_t _malloc_mutex; +pthread_mutex_t _aligned_blocks_mutex; int _malloc_thread_enabled_p; static void @@ -526,8 +526,11 @@ initialized mutexes when they are used first. To avoid such a situation, we initialize mutexes here while their use is disabled in malloc etc. */ - pthread_mutex_init (_malloc_mutex, NULL); - pthread_mutex_init (_aligned_blocks_mutex, NULL); + pthread_mutexattr_t attr1, attr2; + pthread_mutexattr_settype (attr1, PTHREAD_MUTEX_NORMAL); + pthread_mutexattr_settype (attr2, PTHREAD_MUTEX_NORMAL); + pthread_mutex_init (_malloc_mutex, attr1); + pthread_mutex_init (_aligned_blocks_mutex, attr2); pthread_atfork (malloc_atfork_handler_prepare, malloc_atfork_handler_parent, malloc_atfork_handler_child); The first hunk avoids the double initialization, but I don't understand why the second hunk does anything. Since PTHREAD_MUTEX_NORMAL is now the default, shouldn't calling pthread_mutex_init with NULL second argument be equivalent to my calls to pthread_mutexattr_settype? Does this indicate a Cygwin bug, or am I misunderstanding something? AFAICS you're missing something. Your pthread_mutexattr_t attr1, attr2 are not initialized. They contain some random values, thus they are not good objects. The calls to pthread_mutexattr_settype as well as the calls to pthread_mutex_init will fail with EINVAL, but you won't see it due to missing error handling, and you end up without mutexes at all. If you call pthread_mutexattr_init before calling pthread_mutexattr_settype the situation shoul;d be the same as before. Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat pgprPE0TImYR3.pgp Description: PGP signature
Re: (call-process ...) hangs in emacs
In my experiments, not calling pthread_mutexattr_init caused errors such that the final mutex was invalid and could not be locked. The difference between the explicitly initialised mutex and the statically initialised one is that the latter does get 'lazily' initialised when the mutex is first locked (I think...?) so maybe the problem is something to do with the timing of that? Pete -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On 8/5/2014 9:58 AM, Corinna Vinschen wrote: On Aug 5 08:21, Ken Brown wrote: On 8/4/2014 9:45 AM, Corinna Vinschen wrote: ...this, and the fact that fork/exec (vfork == fork on Cygwin) still works nicely in other scenarios points to some problem with the usage of pthread_mutexes in the application may be the culprit. For instance, is it possible that emacs expects the pthread_mutexes in malloc to be ERRORCHECK mutexes? What if you explicitely set them to ERRORCHECK at creation time? That doesn't seem to be the issue, but I think I did find the problem, and it looks like there might be both an emacs bug and a Cygwin bug. Here's the relevant code from emacs's gmalloc.c: pthread_mutex_t _malloc_mutex = PTHREAD_MUTEX_INITIALIZER; pthread_mutex_t _aligned_blocks_mutex = PTHREAD_MUTEX_INITIALIZER; [...] /* Some pthread implementations call malloc for statically initialized mutexes when they are used first. To avoid such a situation, we initialize mutexes here while their use is disabled in malloc etc. */ pthread_mutex_init (_malloc_mutex, NULL); pthread_mutex_init (_aligned_blocks_mutex, NULL); The pthread_mutexes are initialized twice, resulting in undefined behavior according to Posix. That's the emacs bug. That's not the problem. It's not necessary to call pthread_mutex_init on statically initialized mutexes, but it doesn't hurt either. Only when calling pthread_mutex_init twice on the same object it goes downhill, especially when the first incarnation of the mutex was already locked. But simply removing the static initialization doesn't fix the problem. On the other hand, the following patch does seem to fix it, at least in preliminary testing: === modified file 'src/gmalloc.c' --- src/gmalloc.c 2014-03-04 19:02:49 + +++ src/gmalloc.c 2014-08-05 01:35:38 + @@ -490,8 +490,8 @@ } #ifdef USE_PTHREAD -pthread_mutex_t _malloc_mutex = PTHREAD_MUTEX_INITIALIZER; -pthread_mutex_t _aligned_blocks_mutex = PTHREAD_MUTEX_INITIALIZER; +pthread_mutex_t _malloc_mutex; +pthread_mutex_t _aligned_blocks_mutex; int _malloc_thread_enabled_p; static void @@ -526,8 +526,11 @@ initialized mutexes when they are used first. To avoid such a situation, we initialize mutexes here while their use is disabled in malloc etc. */ - pthread_mutex_init (_malloc_mutex, NULL); - pthread_mutex_init (_aligned_blocks_mutex, NULL); + pthread_mutexattr_t attr1, attr2; + pthread_mutexattr_settype (attr1, PTHREAD_MUTEX_NORMAL); + pthread_mutexattr_settype (attr2, PTHREAD_MUTEX_NORMAL); + pthread_mutex_init (_malloc_mutex, attr1); + pthread_mutex_init (_aligned_blocks_mutex, attr2); pthread_atfork (malloc_atfork_handler_prepare, malloc_atfork_handler_parent, malloc_atfork_handler_child); The first hunk avoids the double initialization, but I don't understand why the second hunk does anything. Since PTHREAD_MUTEX_NORMAL is now the default, shouldn't calling pthread_mutex_init with NULL second argument be equivalent to my calls to pthread_mutexattr_settype? Does this indicate a Cygwin bug, or am I misunderstanding something? AFAICS you're missing something. Your pthread_mutexattr_t attr1, attr2 are not initialized. They contain some random values, thus they are not good objects. The calls to pthread_mutexattr_settype as well as the calls to pthread_mutex_init will fail with EINVAL, but you won't see it due to missing error handling, and you end up without mutexes at all. If you call pthread_mutexattr_init before calling pthread_mutexattr_settype the situation shoul;d be the same as before. Thanks for catching my mistake. Your earlier suggestion about explicitly setting the pthread_mutexes to be ERRORCHECK mutexes seems to fix the problem (as long as I remember to call pthread_mutexattr_init). The revised patch is attached. I went back to using both the static and dynamic initializations as in the original code, since you said that's harmless. Angelo and Katsumi, could you test it and see if it solves the problems you reported? If so, I'll issue new emacs releases. Ken === modified file 'src/gmalloc.c' --- src/gmalloc.c 2014-03-04 19:02:49 + +++ src/gmalloc.c 2014-08-05 17:30:18 + @@ -490,8 +490,8 @@ } #ifdef USE_PTHREAD -pthread_mutex_t _malloc_mutex = PTHREAD_MUTEX_INITIALIZER; -pthread_mutex_t _aligned_blocks_mutex = PTHREAD_MUTEX_INITIALIZER; +pthread_mutex_t _malloc_mutex = PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP; +pthread_mutex_t _aligned_blocks_mutex = PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP; int _malloc_thread_enabled_p; static void @@ -526,8 +526,13 @@ initialized mutexes when they are used first. To avoid such a situation, we initialize mutexes here while their use is disabled in malloc etc. */ - pthread_mutex_init (_malloc_mutex, NULL); - pthread_mutex_init (_aligned_blocks_mutex, NULL); + pthread_mutexattr_t attr1,
Re: (call-process ...) hangs in emacs
Hi Ken, On Aug 5 13:55, Ken Brown wrote: On 8/5/2014 9:58 AM, Corinna Vinschen wrote: On Aug 5 08:21, Ken Brown wrote: === modified file 'src/gmalloc.c' --- src/gmalloc.c 2014-03-04 19:02:49 + +++ src/gmalloc.c 2014-08-05 01:35:38 + @@ -490,8 +490,8 @@ } #ifdef USE_PTHREAD -pthread_mutex_t _malloc_mutex = PTHREAD_MUTEX_INITIALIZER; -pthread_mutex_t _aligned_blocks_mutex = PTHREAD_MUTEX_INITIALIZER; +pthread_mutex_t _malloc_mutex; +pthread_mutex_t _aligned_blocks_mutex; int _malloc_thread_enabled_p; static void @@ -526,8 +526,11 @@ initialized mutexes when they are used first. To avoid such a situation, we initialize mutexes here while their use is disabled in malloc etc. */ - pthread_mutex_init (_malloc_mutex, NULL); - pthread_mutex_init (_aligned_blocks_mutex, NULL); + pthread_mutexattr_t attr1, attr2; + pthread_mutexattr_settype (attr1, PTHREAD_MUTEX_NORMAL); + pthread_mutexattr_settype (attr2, PTHREAD_MUTEX_NORMAL); + pthread_mutex_init (_malloc_mutex, attr1); + pthread_mutex_init (_aligned_blocks_mutex, attr2); pthread_atfork (malloc_atfork_handler_prepare, malloc_atfork_handler_parent, malloc_atfork_handler_child); The first hunk avoids the double initialization, but I don't understand why the second hunk does anything. Since PTHREAD_MUTEX_NORMAL is now the default, shouldn't calling pthread_mutex_init with NULL second argument be equivalent to my calls to pthread_mutexattr_settype? Does this indicate a Cygwin bug, or am I misunderstanding something? AFAICS you're missing something. Your pthread_mutexattr_t attr1, attr2 are not initialized. They contain some random values, thus they are not good objects. The calls to pthread_mutexattr_settype as well as the calls to pthread_mutex_init will fail with EINVAL, but you won't see it due to missing error handling, and you end up without mutexes at all. If you call pthread_mutexattr_init before calling pthread_mutexattr_settype the situation shoul;d be the same as before. Thanks for catching my mistake. Your earlier suggestion about explicitly setting the pthread_mutexes to be ERRORCHECK mutexes seems to fix the problem (as long as I remember to call pthread_mutexattr_init). The revised patch is attached. I went back to using both the static and dynamic initializations as in the original code, since you said that's harmless. I'm glad to read that, but I'm still a little bit concerned. If your code works with ERRORCHECK mutexes but hangs with NORMAL mutexes, you *might* miss an error case. I'd suggest to tweak the pthread_mutex_lock/unlock calls and log the threads calling it. It looks like the same thread calls malloc from malloc for some reason and it might be interesting to learn how that happens and if it's really ok in this scenario, because it seems to be unexpected by the code. Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat pgpnZYfsETNUF.pgp Description: PGP signature
Re: (call-process ...) hangs in emacs
Ciao, Ken Ken Brown wrote: Angelo and Katsumi, could you test it and see if it solves the problems you reported? for what I can see, with your patch (pthread_mutex.patch), the things work better.. at least the build does not hang and with repeated 'make -j3' it was also completed in Cygwin-1.7.31... :) Ciao, Angelo. -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On Tue, 05 Aug 2014 13:55:31 -0400, Ken Brown wrote: Angelo and Katsumi, could you test it and see if it solves the problems you reported? If so, I'll issue new emacs releases. Thanks. But currently I cannot test it since the autogen.sh script doesn't work as the following. I must make it work, somehow or other... % ./autogen.sh [...] Running 'autoreconf -fi -I m4' ... 0 [main] perl 4508 child_info_fork::abort: address space needed by 'POSIX.dll' (0x2D) is already occupied Can't fork, trying again in 5 seconds at /usr/bin/autoreconf-2.69 line 188. 0 [main] perl 6264 child_info_fork::abort: address space needed by 'POSIX.dll' (0x26) is already occupied Can't fork, trying again in 5 seconds at /usr/lib/perl5/5.14/i686-cygwin-threads-64int/IO/File.pm line 188. [...] rebaseall nor reinstalling of perl and some things doesn't help. : -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On Aug 3 21:02, Ken Brown wrote: On 8/1/2014 9:32 AM, Corinna Vinschen wrote: On Aug 1 14:17, Peter Hull wrote: On Fri, Aug 1, 2014 at 1:50 PM, Angelo Graziosi angelo.grazi...@alice.it wrote: Since I upgraded to Cygwin-1.7.31*, I see similar issue in building Emacs trunk (--with-w32 build)... The build always hangs in compiling some .el file. CTRL-C does not work and I have to search the running processes with ps and kill them with 'kill -9'. Downgrading to 1.7.30, things work better. Now I am using 1.7.30... By better, do you mean 'perfectly'? It seems like it might be a little bit intermittent, from the reports I have seen. It's easy enough to do a cvs rdiff between the releases if 1.7.30 is known to be good - I am happy to help but I am unfamiliar with the code so I don't know where to start looking... It could be a problem with the new default pthread mutexes being NORMAL, rather then ERRORCHECK mutexes. That does seem to be the problem, since I can reproduce the bug starting with the 2014-07-14 snapshot. More precisely, I can reproduce it using emacs-nox (which is what the OP was using according to his cygcheck output) but not using emacs-X11 or emacs-w32. I tried running emacs under gdb with a breakpoint at call_process, but all I could see from that is that emacs tries to fork a subprocess, but the call to fork() never returns. I also tried running it under strace, but again all I can see is that fork() is called and then everything seems to be at a standstill. Corinna, if you want to take a look, here's the precise recipe: 1. emacs-nox -Q [This should start emacs and put you in the *scratch* buffer.] 2. Enter the following text into the buffer: (call-process pwd nil t) 3. Position the cursor at the end of the line and type Ctrl-j. What should happen, and what does happen prior to the 2014-07-14 snapshot, is that the current directory is displayed, followed by the exit code of 0. What happens instead is that emacs appears to hang. How does emacs start a process? Does it create a thread and then forks and execs from the thread? Does it use its own pthread_mutex to control the job? Is there a chance to create an STC of this process? Thanks, Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat pgpWyGUWfvPyo.pgp Description: PGP signature
Re: (call-process ...) hangs in emacs
On Mon, Aug 4, 2014 at 2:02 AM, Ken Brown kbr...@cornell.edu wrote: That does seem to be the problem, since I can reproduce the bug starting with the 2014-07-14 snapshot. More precisely, I can reproduce it using emacs-nox (which is what the OP was using according to his cygcheck output) but not using emacs-X11 or emacs-w32. Thanks for your help in resolving this. I am indeed using emacs-nox. Do you think emacs-x11 or emacs-w32 would be a good alternative to work round the problem in the short term? Peter -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On 8/4/2014 4:00 AM, Corinna Vinschen wrote: On Aug 3 21:02, Ken Brown wrote: On 8/1/2014 9:32 AM, Corinna Vinschen wrote: On Aug 1 14:17, Peter Hull wrote: On Fri, Aug 1, 2014 at 1:50 PM, Angelo Graziosi angelo.grazi...@alice.it wrote: Since I upgraded to Cygwin-1.7.31*, I see similar issue in building Emacs trunk (--with-w32 build)... The build always hangs in compiling some .el file. CTRL-C does not work and I have to search the running processes with ps and kill them with 'kill -9'. Downgrading to 1.7.30, things work better. Now I am using 1.7.30... By better, do you mean 'perfectly'? It seems like it might be a little bit intermittent, from the reports I have seen. It's easy enough to do a cvs rdiff between the releases if 1.7.30 is known to be good - I am happy to help but I am unfamiliar with the code so I don't know where to start looking... It could be a problem with the new default pthread mutexes being NORMAL, rather then ERRORCHECK mutexes. That does seem to be the problem, since I can reproduce the bug starting with the 2014-07-14 snapshot. More precisely, I can reproduce it using emacs-nox (which is what the OP was using according to his cygcheck output) but not using emacs-X11 or emacs-w32. I tried running emacs under gdb with a breakpoint at call_process, but all I could see from that is that emacs tries to fork a subprocess, but the call to fork() never returns. I also tried running it under strace, but again all I can see is that fork() is called and then everything seems to be at a standstill. Corinna, if you want to take a look, here's the precise recipe: 1. emacs-nox -Q [This should start emacs and put you in the *scratch* buffer.] 2. Enter the following text into the buffer: (call-process pwd nil t) 3. Position the cursor at the end of the line and type Ctrl-j. What should happen, and what does happen prior to the 2014-07-14 snapshot, is that the current directory is displayed, followed by the exit code of 0. What happens instead is that emacs appears to hang. How does emacs start a process? Does it create a thread and then forks and execs from the thread? Does it use its own pthread_mutex to control the job? Is there a chance to create an STC of this process? emacs does some bookkeeping and then calls vfork. It does not create a new thread, nor does it create a pthread_mutex. The only pthread_mutexes created anywhere in the emacs source code are in its implementation of malloc and friends, not in anything directly related to controlling subprocesses. (FWIW, this malloc implementation is used in the Cygwin build of emacs but not in the Linux build.) I did think about trying to create an STC, but I'm stymied because the problem depends so strongly on how emacs is run: - If emacs is run interactively, the problem only occurs with emacs-nox, not with emacs-X11 or emacs-w32. - If emacs is run non-interactively (i.e., in batch mode), the problem occurs with emacs-w32 and emacs-X11 too, as Angelo and Katsumi pointed out earlier in the thread. I can't think of any way to capture these peculiarities in an STC. Ken -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On 8/4/2014 4:05 AM, Peter Hull wrote: On Mon, Aug 4, 2014 at 2:02 AM, Ken Brown kbr...@cornell.edu wrote: That does seem to be the problem, since I can reproduce the bug starting with the 2014-07-14 snapshot. More precisely, I can reproduce it using emacs-nox (which is what the OP was using according to his cygcheck output) but not using emacs-X11 or emacs-w32. Thanks for your help in resolving this. I am indeed using emacs-nox. Do you think emacs-x11 or emacs-w32 would be a good alternative to work round the problem in the short term? Yes. Ken -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On Aug 4 09:34, Ken Brown wrote: On 8/4/2014 4:00 AM, Corinna Vinschen wrote: On Aug 3 21:02, Ken Brown wrote: On 8/1/2014 9:32 AM, Corinna Vinschen wrote: It could be a problem with the new default pthread mutexes being NORMAL, rather then ERRORCHECK mutexes. That does seem to be the problem, since I can reproduce the bug starting with the 2014-07-14 snapshot. More precisely, I can reproduce it using emacs-nox (which is what the OP was using according to his cygcheck output) but not using emacs-X11 or emacs-w32. I tried running emacs under gdb with a breakpoint at call_process, but all I could see from that is that emacs tries to fork a subprocess, but the call to fork() never returns. I also tried running it under strace, but again all I can see is that fork() is called and then everything seems to be at a standstill. Corinna, if you want to take a look, here's the precise recipe: 1. emacs-nox -Q [This should start emacs and put you in the *scratch* buffer.] 2. Enter the following text into the buffer: (call-process pwd nil t) 3. Position the cursor at the end of the line and type Ctrl-j. What should happen, and what does happen prior to the 2014-07-14 snapshot, is that the current directory is displayed, followed by the exit code of 0. What happens instead is that emacs appears to hang. How does emacs start a process? Does it create a thread and then forks and execs from the thread? Does it use its own pthread_mutex to control the job? Is there a chance to create an STC of this process? emacs does some bookkeeping and then calls vfork. It does not create a new thread, nor does it create a pthread_mutex. The only pthread_mutexes created anywhere in the emacs source code are in its implementation of malloc and friends, not in anything directly related to controlling subprocesses. (FWIW, this malloc implementation is used in the Cygwin build of emacs but not in the Linux build.) Can you take a close look here? This malloc will be used by Cygwin as well if it's implemented in the usual way and... I did think about trying to create an STC, but I'm stymied because the problem depends so strongly on how emacs is run: - If emacs is run interactively, the problem only occurs with emacs-nox, not with emacs-X11 or emacs-w32. - If emacs is run non-interactively (i.e., in batch mode), the problem occurs with emacs-w32 and emacs-X11 too, as Angelo and Katsumi pointed out earlier in the thread. I can't think of any way to capture these peculiarities in an STC. ...this, and the fact that fork/exec (vfork == fork on Cygwin) still works nicely in other scenarios points to some problem with the usage of pthread_mutexes in the application may be the culprit. For instance, is it possible that emacs expects the pthread_mutexes in malloc to be ERRORCHECK mutexes? What if you explicitely set them to ERRORCHECK at creation time? Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat pgpvmwycDyZow.pgp Description: PGP signature
Re: (call-process ...) hangs in emacs
On 8/1/2014 9:32 AM, Corinna Vinschen wrote: On Aug 1 14:17, Peter Hull wrote: On Fri, Aug 1, 2014 at 1:50 PM, Angelo Graziosi angelo.grazi...@alice.it wrote: Since I upgraded to Cygwin-1.7.31*, I see similar issue in building Emacs trunk (--with-w32 build)... The build always hangs in compiling some .el file. CTRL-C does not work and I have to search the running processes with ps and kill them with 'kill -9'. Downgrading to 1.7.30, things work better. Now I am using 1.7.30... By better, do you mean 'perfectly'? It seems like it might be a little bit intermittent, from the reports I have seen. It's easy enough to do a cvs rdiff between the releases if 1.7.30 is known to be good - I am happy to help but I am unfamiliar with the code so I don't know where to start looking... It could be a problem with the new default pthread mutexes being NORMAL, rather then ERRORCHECK mutexes. That does seem to be the problem, since I can reproduce the bug starting with the 2014-07-14 snapshot. More precisely, I can reproduce it using emacs-nox (which is what the OP was using according to his cygcheck output) but not using emacs-X11 or emacs-w32. I tried running emacs under gdb with a breakpoint at call_process, but all I could see from that is that emacs tries to fork a subprocess, but the call to fork() never returns. I also tried running it under strace, but again all I can see is that fork() is called and then everything seems to be at a standstill. Corinna, if you want to take a look, here's the precise recipe: 1. emacs-nox -Q [This should start emacs and put you in the *scratch* buffer.] 2. Enter the following text into the buffer: (call-process pwd nil t) 3. Position the cursor at the end of the line and type Ctrl-j. What should happen, and what does happen prior to the 2014-07-14 snapshot, is that the current directory is displayed, followed by the exit code of 0. What happens instead is that emacs appears to hang. Ken -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
I can't reproduce this. I tested both emacs-X11 and emacs-w32, on both 32-bit Cygwin and 64-bit Cygwin. Can you think of anything on your system that has changed in the last few days? And have you tried starting emacs with the '-Q' option to rule out a problem in your initialization files? No effect from emacs -Q. I've found that if I run call-process and then quit, subsequent invocations work. The last 2 things I can remember installing were numpy and guile. At the same time there were quite a number of updates to my packages and to cygwin itself IIRC. I've not installed anything new on Windows itself recently though there have been WIndows Updates (mostly Defender files) cygcheck output below. Thanks for your help, Pete Cygwin Configuration Diagnostics Current System Time: Thu Jul 31 15:48:46 2014 Windows 8.1 Professional Ver 6.3 Build 9600 Running under WOW64 on AMD64 Path:C:\cygwin\usr\local\bin C:\cygwin\bin C:\Program Files (x86)\Intel\iCLS Client C:\Program Files\Intel\iCLS Client C:\WINDOWS C:\WINDOWS\system32 C:\WINDOWS\system32\wbem C:\Program Files (x86)\Windows Kits\8.1\Windows Performance Toolkit C:\Program Files (x86)\Microchip\MPLAB C32 Suite\bin C:\Program Files\Microsoft\Web Platform Installer C:\Program Files (x86)\Microsoft SDKs\TypeScript\1.0 C:\Program Files (x86)\Windows Live\Shared C:\Program Files\Intel\Intel(R) Management Engine Components\DAL C:\Program Files\Intel\Intel(R) Management Engine Components\IPT C:\Program Files (x86)\Intel\Intel(R) Management Engine Components\DAL C:\Program Files (x86)\Intel\Intel(R) Management Engine Components\IPT C:\Program Files\Microsoft SQL Server\120\Tools\Binn C:\cygwin\lib\lapack Output from C:\cygwin\bin\id.exe UID: 1001(Peter)GID: 545(Users) 545(Users) 578(Hyper-V Administrators) 559(Performance Log Users) SysDir: C:\WINDOWS\system32 WinDir: C:\WINDOWS USER = 'Peter' PWD = '/home/Peter' HOME = '/home/Peter' USERDOMAIN_ROAMINGPROFILE = 'DELL_E5530' HOMEPATH = '\Users\Peter' APPDATA = 'C:\Users\Peter\AppData\Roaming' ProgramW6432 = 'C:\Program Files' HOSTNAME = 'DELL_E5530' SHELL = '/bin/bash' TERM = 'xterm' PROCESSOR_IDENTIFIER = 'Intel64 Family 6 Model 58 Stepping 9, GenuineIntel' PROFILEREAD = 'true' WINDIR = 'C:\WINDOWS' PUBLIC = 'C:\Users\Public' OLDPWD = '/tmp' ORIGINAL_PATH = '/cygdrive/c/Program Files (x86)/Intel/iCLS Client:/cygdrive/c/Program Files/Intel/iCLS Client:/cygdrive/c/WINDOWS:/cygdrive/c/WINDOWS/system32:/cygdrive/c/WINDOWS/system32/wbem:/cygdrive/c/Program Files (x86)/Windows Kits/8.1/Windows Performance Toolkit:/cygdrive/c/Program Files (x86)/Microchip/MPLAB C32 Suite/bin:/cygdrive/c/Program Files/Microsoft/Web Platform Installer:/cygdrive/c/Program Files (x86)/Microsoft SDKs/TypeScript/1.0:/cygdrive/c/Program Files (x86)/Windows Live/Shared:/cygdrive/c/Program Files/Intel/Intel(R) Management Engine Components/DAL:/cygdrive/c/Program Files/Intel/Intel(R) Management Engine Components/IPT:/cygdrive/c/Program Files (x86)/Intel/Intel(R) Management Engine Components/DAL:/cygdrive/c/Program Files (x86)/Intel/Intel(R) Management Engine Components/IPT:/cygdrive/c/Program Files/Microsoft SQL Server/120/Tools/Binn' USERDOMAIN = 'DELL_E5530' CommonProgramFiles(x86) = 'C:\Program Files (x86)\Common Files' OS = 'Windows_NT' ALLUSERSPROFILE = 'C:\ProgramData' !:: = '::\' TEMP = '/tmp' COMMONPROGRAMFILES = 'C:\Program Files (x86)\Common Files' USERNAME = 'Peter' PROCESSOR_LEVEL = '6' ProgramFiles(x86) = 'C:\Program Files (x86)' LIBGL_ALWAYS_INDIRECT = '1' PSModulePath = 'C:\WINDOWS\system32\WindowsPowerShell\v1.0\Modules\' FP_NO_HOST_CHECK = 'NO' SYSTEMDRIVE = 'C:' PROCESSOR_ARCHITEW6432 = 'AMD64' LANG = 'en_US.UTF-8' VS120COMNTOOLS = 'C:\Program Files (x86)\Microsoft Visual Studio 12.0\Common7\Tools\' USERPROFILE = 'C:\Users\Peter' TZ = 'Europe/London' Data_FuncRetVal = 'True' PS1 = '\[\e]0;\w\a\]\n\[\e[32m\]\u@\h \[\e[33m\]\w\[\e[0m\]\n\$ ' LOGONSERVER = '\\MicrosoftAccount' CommonProgramW6432 = 'C:\Program Files\Common Files' PROCESSOR_ARCHITECTURE = 'x86' LOCALAPPDATA = 'C:\Users\Peter\AppData\Local' ProgramData = 'C:\ProgramData' EXECIGNORE = '*.dll' SHLVL = '1' PATHEXT = '.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC' HOMEDRIVE = 'C:' VBOX_MSI_INSTALL_PATH = 'C:\Program Files\Oracle\VirtualBox\' COMSPEC = 'C:\WINDOWS\system32\cmd.exe' TMP = '/tmp' SYSTEMROOT = 'C:\WINDOWS' MERGE_INI = 'C:\Mortara Instrument Inc\ELI Link\merge.ini' PRINTER = 'Canon MX510 series Printer (Copy 1)' PROCESSOR_REVISION = '3a09' INFOPATH = '/usr/local/info:/usr/share/info:/usr/info' PROGRAMFILES = 'C:\Program Files (x86)' VS110COMNTOOLS = 'C:\Program Files (x86)\Microsoft Visual Studio 11.0\Common7\Tools\' NUMBER_OF_PROCESSORS = '4' SESSIONNAME = 'Console' COMPUTERNAME = 'DELL_E5530' _ = '/usr/bin/cygcheck' HKEY_CURRENT_USER\Software\Cygwin HKEY_CURRENT_USER\Software\Cygwin\Installations (default) = '\??\C:\cygwin'
Re: (call-process ...) hangs in emacs
On Thu, 31 Jul 2014 15:51:49 +0100, Peter Hull wrote: VC integration in emacs has stopped working for me in the past few days. Using emacs debugger I found the last function call was to call-process which never returns. I can reproduce this by evaluating in Lisp Interaction mode (using ^J) (call-process pwd nil t) I would expect to see the PWD and exit code but instead it just hangs until I Quit it (^G) I am using GNU Emacs 24.3.1 and confirmed cygwin and all packages up to date. (cygwin DLL 1.7.31) I'm troubled with a similar problem since Wednesday[1]. /usr/bin/emacs that Cygwin distributed (24.3) seems ok, but /usr/local/bin/emacs that I built from the Emacs trunk Tuesday (24.4.50) got not to work conditionally. With that version of Emacs: Evaluating the form `(call-process pwd nil t)' on normally running Emacs works. It returns /home/yamaoka immediately. However, if it is done in the bacth mode, /usr/local/bin/emacs -Q -batch -eval '(call-process pwd nil t)' it never returns; the Emacs process eats no cpu but stays consistently[2]. `kill -9 PID' has no effect. Using the Windows Task Manager is the only means to kill it. I tried to rebuild that version of Emacs from scratch, but failed. During bootstrap, bootstrap-emacs for the first use never returns as follows (a trigger that makes bootstrap-emacs hang might be different, though): make[3]: Entering directory '/Work/emacs/lisp' EMACSLOADPATH= '../src/bootstrap-emacs.exe' -batch --no-site-file --no-site-lisp -l autoload \ --eval (setq generate-autoload-cookie \;;;###cal-autoload\) \ --eval (setq generated-autoload-file (expand-file-name (unmsys--file-name \calendar/cal-loaddefs.el\))) \ -f batch-update-autoloads ./calendar Wrote /Work/emacs/lisp/calendar/cal-loaddefs.el (Note: the cal-loaddefs.el file is created but bootstrap-emacs doesn't exit.) Thanks. [1] I did updating my Cygwin installation on Wednesday morning for the first time in the last 7 days. It must be the trigger of the problem. I also tried clean install of Cygwin, however it didn't help. Since it didn't seem to finish because of texlive-collection-basic.sh (the bash process didn't eat cpu, but never returned), I redid it by marking all the texlive packages to be `Skip'. [2] The configure script of some packages uses `call-process' in the batch mode of Emacs in order to examine something. It prevents the package from being built because of this problem. -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On Fri, Aug 1, 2014 at 11:22 AM, Katsumi Yamaoka yama...@jpl.org wrote: On Thu, 31 Jul 2014 15:51:49 +0100, Peter Hull wrote: I'm troubled with a similar problem since Wednesday[1]. I checked my /var/log/setup.log. The last time I installed anything was 2014/07/30 (Wednesday). That update included the 'cygwin' package which for me now is at version 1.7.31-3 Pete -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
Katsumi Yamaoka wrote: I'm troubled with a similar problem since Wednesday[1]. /usr/bin/emacs that Cygwin distributed (24.3) seems ok, but /usr/local/bin/emacs that I built from the Emacs trunk Tuesday (24.4.50) got not to work conditionally. With that version of Emacs: Evaluating the form `(call-process pwd nil t)' on normally running Emacs works. It returns /home/yamaoka immediately. However, if it is done in the bacth mode, /usr/local/bin/emacs -Q -batch -eval '(call-process pwd nil t)' it never returns; the Emacs process eats no cpu but stays consistently[2]. `kill -9 PID' has no effect. Using the Windows Task Manager is the only means to kill it. [1] I did updating my Cygwin installation on Wednesday morning for the first time in the last 7 days. It must be the trigger of the problem. I also tried clean install of Cygwin, however it didn't help. Since it didn't seem to finish because of texlive-collection-basic.sh (the bash process didn't eat cpu, but never returned), I redid it by marking all the texlive packages to be `Skip'. Just for completeness.. or to add another story to the eternal Cygwin - Emacs saga.. Since I upgraded to Cygwin-1.7.31*, I see similar issue in building Emacs trunk (--with-w32 build)... The build always hangs in compiling some .el file. CTRL-C does not work and I have to search the running processes with ps and kill them with 'kill -9'. Downgrading to 1.7.30, things work better. Now I am using 1.7.30... Here is on Win7 64 and cygwin64. Ciao, Angelo. -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On Fri, Aug 1, 2014 at 1:50 PM, Angelo Graziosi angelo.grazi...@alice.it wrote: Since I upgraded to Cygwin-1.7.31*, I see similar issue in building Emacs trunk (--with-w32 build)... The build always hangs in compiling some .el file. CTRL-C does not work and I have to search the running processes with ps and kill them with 'kill -9'. Downgrading to 1.7.30, things work better. Now I am using 1.7.30... By better, do you mean 'perfectly'? It seems like it might be a little bit intermittent, from the reports I have seen. It's easy enough to do a cvs rdiff between the releases if 1.7.30 is known to be good - I am happy to help but I am unfamiliar with the code so I don't know where to start looking... Pete -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: (call-process ...) hangs in emacs
On Aug 1 14:17, Peter Hull wrote: On Fri, Aug 1, 2014 at 1:50 PM, Angelo Graziosi angelo.grazi...@alice.it wrote: Since I upgraded to Cygwin-1.7.31*, I see similar issue in building Emacs trunk (--with-w32 build)... The build always hangs in compiling some .el file. CTRL-C does not work and I have to search the running processes with ps and kill them with 'kill -9'. Downgrading to 1.7.30, things work better. Now I am using 1.7.30... By better, do you mean 'perfectly'? It seems like it might be a little bit intermittent, from the reports I have seen. It's easy enough to do a cvs rdiff between the releases if 1.7.30 is known to be good - I am happy to help but I am unfamiliar with the code so I don't know where to start looking... It could be a problem with the new default pthread mutexes being NORMAL, rather then ERRORCHECK mutexes. Somebody would have to debug it... Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat pgphnQOHVLz_y.pgp Description: PGP signature
Re: (call-process ...) hangs in emacs
On 7/31/2014 10:51 AM, Peter Hull wrote: VC integration in emacs has stopped working for me in the past few days. Using emacs debugger I found the last function call was to call-process which never returns. I can reproduce this by evaluating in Lisp Interaction mode (using ^J) (call-process pwd nil t) I would expect to see the PWD and exit code but instead it just hangs until I Quit it (^G) I am using GNU Emacs 24.3.1 and confirmed cygwin and all packages up to date. (cygwin DLL 1.7.31) I can't reproduce this. I tested both emacs-X11 and emacs-w32, on both 32-bit Cygwin and 64-bit Cygwin. Can you think of anything on your system that has changed in the last few days? And have you tried starting emacs with the '-Q' option to rule out a problem in your initialization files? You should also attach cygcheck output, as suggested at http://cygwin.com/problems.html. Ken -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple