[issue8299] Improve GIL in 2.7

2010-08-06 Thread Terry J. Reedy
Terry J. Reedy added the comment: OK. It would probably be better to expend energy on the 3.x new GIL, should issues arise. -- resolution: -> out of date status: open -> closed

[issue8299] Improve GIL in 2.7

2010-08-06 Thread Kristján Valur Jónsson
Kristján Valur Jónsson added the comment: Although I did finally manage to explain the point of this patch (after a long, long discussion), I think the issue is still too controversial. We did, for example, see some strange behaviour in my last comment (Date: 2010-04-21 23:22) regarding affi

[issue8299] Improve GIL in 2.7

2010-08-05 Thread Raymond Hettinger
Raymond Hettinger added the comment: That question should probably be raised on python-dev, not the bug tracker. -- nosy: +rhettinger

[issue8299] Improve GIL in 2.7

2010-08-05 Thread Terry J. Reedy
Terry J. Reedy added the comment: For better or worse, this did not make it into 2.7. Is it the sort of thing that could still go into a bug-fix release, without alpha/beta testing? Or should this be closed as out-of-date? -- nosy: +terry.reedy

[issue8299] Improve GIL in 2.7

2010-04-21 Thread Kristján Valur Jónsson
Kristján Valur Jónsson added the comment: Sorry, all the benchmarks were missing from my last comment. Here they are: total time avg time/request stddev LEGACY_GIL serial ((30, 500), (2.6908777512225717, 0.08968708313486773, 0.00151646440612788

[issue8299] Improve GIL in 2.7

2010-04-21 Thread Kristján Valur Jónsson
Kristján Valur Jónsson added the comment: David, trying to get some more realistic IO benchmarks I did some more tests. The idea is to have a threaded socket server, serving requests that take different amounts of time to process, and see how io response measures up for two classes of reques

[issue8299] Improve GIL in 2.7

2010-04-20 Thread Kristján Valur Jónsson
Kristján Valur Jónsson added the comment: It is interesting to see, David, the difference in the behaviour of the semaphore-based and condition-variable-based lock on Linux. It is clear that the semaphore and the condition variable have different queuing characteristics. I wouldn't be surpri

[issue8299] Improve GIL in 2.7

2010-04-20 Thread Kristján Valur Jónsson
Kristján Valur Jónsson added the comment: Sorry Martin, I meant issue 8410. I have so many of these going on :)

[issue8299] Improve GIL in 2.7

2010-04-18 Thread David Beazley
David Beazley added the comment: Here are the results of running the fair.py test on a Mac OS-X system using a "fair" GIL implementation (modified condition variable): [ Fair GIL, Dual-Core, OS-X ] Sequential execution slow: 5.490943 (0 left) fast: 0.369257 (0 left) Threaded execution slow: 6.

[issue8299] Improve GIL in 2.7

2010-04-18 Thread Martin v . Löwis
Martin v. Löwis added the comment: > Martin, I've explained it in my other issue, issue 8411, with a step by step > example. Hmm. Can't find it there. What message or file should I be looking at?

[issue8299] Improve GIL in 2.7

2010-04-18 Thread Kristján Valur Jónsson
Kristján Valur Jónsson added the comment: Martin, I've explained it in my other issue, issue 8411, with a step by step example. It is unfair because a thread can _bypass_ the condition variable. A thread just woken up from the condition variable has to race to get the lock, and it is a race
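
The bypass described in this comment can be contrasted with a lock that hands out tickets, so that a late arrival can never overtake a thread that is already waiting. The following is only a minimal sketch of that idea in Python, not the code from the patch: `FairLock`, `queued` and `demo` are hypothetical names introduced for illustration, and the ticket scheme is one textbook way to get FIFO ordering, assumed here rather than taken from the thread.

```python
import threading
import time


class FairLock:
    """FIFO lock: waiters are served strictly in ticket (arrival) order."""

    def __init__(self):
        self._cond = threading.Condition()
        self._next_ticket = 0   # next ticket to hand out
        self._now_serving = 0   # ticket that currently owns the lock

    def acquire(self):
        with self._cond:
            ticket = self._next_ticket
            self._next_ticket += 1
            # A newcomer cannot bypass the queue: it proceeds only
            # when its own ticket number comes up.
            while ticket != self._now_serving:
                self._cond.wait()

    def release(self):
        with self._cond:
            self._now_serving += 1
            self._cond.notify_all()

    def queued(self):
        """Number of threads holding or waiting for the lock."""
        with self._cond:
            return self._next_ticket - self._now_serving


def demo(n=4):
    """Queue n workers behind the main thread; return the order they ran in."""
    lock = FairLock()
    order = []
    threads = []

    def worker(i):
        lock.acquire()
        order.append(i)   # safe: only the lock holder appends
        lock.release()

    lock.acquire()        # main thread holds the lock first
    for i in range(n):
        t = threading.Thread(target=worker, args=(i,))
        t.start()
        threads.append(t)
        # Wait until worker i has taken its ticket before starting the next.
        while lock.queued() != i + 2:
            time.sleep(0.001)
    lock.release()
    for t in threads:
        t.join()
    return order
```

With a plain "flag plus condition variable" emulation, a thread that calls acquire just after a release may take the flag before the woken waiter gets to run; with tickets, the wake-up order is fixed at arrival time.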

[issue8299] Improve GIL in 2.7

2010-04-18 Thread Martin v . Löwis
Martin v. Löwis added the comment: > Martin, I don't know if you were suggesting that a "fair" mutex would > make the emulated semaphore fair too. You probably weren't, but just > in case, the fairness of the mutex is immaterial because it is only > held for a short time to guard the internal s

[issue8299] Improve GIL in 2.7

2010-04-17 Thread David Beazley
David Beazley added the comment: As a followup, since I'm not sure anyone here actually tried a fair GIL on Linux, I incorporated your suggested fairness patch to the condition-variable version of the GIL (using this pseudocode you wrote as a guide): with gil.cond: if gil.n_waitin

[issue8299] Improve GIL in 2.7

2010-04-17 Thread David Beazley
David Beazley added the comment: I'm definitely sure that semaphores were being used in my test---I stuck a print statement inside the code that creates locks just to make sure it was using the semaphore version :-). Unfortunately, at this point I think most of this discussion is academic sin

[issue8299] Improve GIL in 2.7

2010-04-17 Thread Kristján Valur Jónsson
Kristján Valur Jónsson added the comment: > I'm not trying to be a pain here, but do you have any explanation as to why, with fair scheduling, the observed execution time of multiple CPU-bound threads is substantially worse than with unfair scheduling? Yes. This is because the GIL yield no

[issue8299] Improve GIL in 2.7

2010-04-16 Thread David Beazley
David Beazley added the comment: One other comment. Running the modified fair.py file on my Linux system using Python compiled with semaphores shows that they are *definitely* not fair. Here's the relevant part of your test: Treaded, balanced execution, with quickstop: fast C: 1.580815 (0 l

[issue8299] Improve GIL in 2.7

2010-04-16 Thread David Beazley
David Beazley added the comment: I'm not trying to be a pain here, but do you have any explanation as to why, with fair scheduling, the observed execution time of multiple CPU-bound threads is substantially worse than with unfair scheduling? From your own benchmarks, consider this result (Fa

[issue8299] Improve GIL in 2.7

2010-04-16 Thread Kristján Valur Jónsson
Kristján Valur Jónsson added the comment: What your fair.py is doing is demonstrating the superior behaviour of a time-based GIL interrupt to a bytecode based one. I have no quibbles with that and I agree that it is superior. But I also think that your example is a very artificial one. On

[issue8299] Improve GIL in 2.7

2010-04-16 Thread David Beazley
David Beazley added the comment: I've attached a test "fair.py" that gives an example of the fair CPU scheduling issue. In this test, there are two threads, one of which has fast-running ticks, one of which has slow-running ticks. Here is their sequential performance (OS-X, Python 2.6): s

[issue8299] Improve GIL in 2.7

2010-04-16 Thread David Beazley
David Beazley added the comment: I'm sorry, but even in the presence of fair locking, I still don't like this patch. The main problem is that it confuses fair locking with fair CPU use---something that this patch does not and can not achieve on any platform. The main problem is that everyth
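
The distinction between fair locking and fair CPU use can be illustrated with a small back-of-the-envelope simulation. This is a sketch under stated assumptions, not a model of the actual patch: it assumes a strict round-robin hand-off after a fixed quota of ticks, and the function name `cpu_share` and the tick costs used below are hypothetical.

```python
def cpu_share(tick_costs, quota=100, turns=1000):
    """Simulate strict round-robin hand-off every `quota` ticks.

    tick_costs: seconds per tick for each thread (hypothetical values).
    Returns the fraction of total run time each thread received.
    """
    used = [0.0] * len(tick_costs)
    for _ in range(turns):
        for i, cost in enumerate(tick_costs):
            # Every turn lasts `quota` ticks, however long those ticks take.
            used[i] += quota * cost
    total = sum(used)
    return [u / total for u in used]


# Two threads, one whose ticks are 15 times slower than the other's.
shares = cpu_share([1e-6, 15e-6])
print(["%.1f%%" % (100 * s) for s in shares])
```

Even though both threads get exactly the same number of turns, the thread with slow ticks ends up with roughly 94% of the CPU time in this made-up case and the fast one with roughly 6%, because the quota is counted in ticks rather than in time.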

[issue8299] Improve GIL in 2.7

2010-04-15 Thread Antoine Pitrou
Antoine Pitrou added the comment: > Googling a bit gave me this: > http://lists.apple.com/archives/darwin-kernel/2005/Dec/msg00022.html > It would appear that mac os X was at least lacking full posix semaphore > support in 2005. Hmm. OS X really sucks.

[issue8299] Improve GIL in 2.7

2010-04-15 Thread Kristján Valur Jónsson
Kristján Valur Jónsson added the comment: Googling a bit gave me this: http://lists.apple.com/archives/darwin-kernel/2005/Dec/msg00022.html It would appear that mac os X was at least lacking full posix semaphore support in 2005.

[issue8299] Improve GIL in 2.7

2010-04-15 Thread Kristján Valur Jónsson
Kristján Valur Jónsson added the comment: David, I urge you to reconsider: The "emulated" semaphore is broken because it is unfair. It is clearly a programming error, born out of naivete about how to implement such primitives. Proper semaphores therefore cannot be implemented using the "exac

[issue8299] Improve GIL in 2.7

2010-04-15 Thread David Beazley
David Beazley added the comment: I hope everyone realizes that all of this bike-shedding about emulated semaphores versus "real" semaphores is mostly a non-issue. For one thing, go look at how a "real" semaphore is implemented by reading the source code to pthreads or some other thread libr

[issue8299] Improve GIL in 2.7

2010-04-15 Thread R. David Murray
R. David Murray added the comment: Also note that his results were much worse on MacOS than anyone was seeing on Linux, which may support this theory :)

[issue8299] Improve GIL in 2.7

2010-04-15 Thread R. David Murray
R. David Murray added the comment: My understanding is that David noticed the problem originally on MacOS. If the emulation is indeed being used on that platform (and a little googling indicates the MacOS posix semaphore implementation is considered at least slightly broken, and FreeBSD didn

[issue8299] Improve GIL in 2.7

2010-04-15 Thread Antoine Pitrou
Antoine Pitrou added the comment: > You do realize that if we enable the USE_SEMAPHORE, we get the GIL > behaviour as seen on Windows and with my ROUNDROBIN_GIL > implementation, right? I haven't studied this argument, but I don't see how that contradicts anything. The main issue witnessed wit

[issue8299] Improve GIL in 2.7

2010-04-15 Thread Kristján Valur Jónsson
Kristján Valur Jónsson added the comment: You do realize that if we enable the USE_SEMAPHORE, we get the GIL behaviour as seen on Windows and with my ROUNDROBIN_GIL implementation, right? Also, at the GIL open space talk at PyCon, David did show us the "emulation" source code as if it were _

[issue8299] Improve GIL in 2.7

2010-04-15 Thread Antoine Pitrou
Antoine Pitrou added the comment: > Yes, we put #error in both places (defining and undefining > USE_SEMAPHORES). The colleague in question is Christian Tismer, he is > unlikely to have gotten it wrong. Ok, so can you or Christian open an issue about it? We should try to fix it. > I am also c

[issue8299] Improve GIL in 2.7

2010-04-15 Thread Kristján Valur Jónsson
Kristján Valur Jónsson added the comment: Yes, we put #error in both places (defining and undefining USE_SEMAPHORES). The colleague in question is Christian Tismer, he is unlikely to have gotten it wrong. I am also curious why David Beazley kept talking about the "binary semaphore" when it

[issue8299] Improve GIL in 2.7

2010-04-15 Thread Antoine Pitrou
Antoine Pitrou added the comment: > However, I just asked a colleague with OS X to compile python 2.7 > and _POSIX_SEMAPHORES isn't defined, and so, it is running using the > emulation. Why, I wonder? Isn't it defined in unistd.h? Perhaps a bad combination of defines. Has he checked that th

[issue8299] Improve GIL in 2.7

2010-04-15 Thread Kristján Valur Jónsson
Kristján Valur Jónsson added the comment: Oh dear. I was assuming that the mutex+condition variable were the actual implementation mostly in use on pthreads. This is because of David's GIL open talk at PyCon, where we were looking at the source and bickering about the placement of "pthread_

[issue8299] Improve GIL in 2.7

2010-04-15 Thread Antoine Pitrou
Antoine Pitrou added the comment: > if _POSIX_SEMAPHORES is defined, thread_pthread.h is designed to use > the (fair) semaphore. If it is not present, or > HAVE_BROKEN_POSIX_SEMAPHORES defined, the semaphore is supposed to be > emulated using a condition variable. > Now, I don't have access to

[issue8299] Improve GIL in 2.7

2010-04-15 Thread Kristján Valur Jónsson
Kristján Valur Jónsson added the comment: Here is yet another point: if _POSIX_SEMAPHORES is defined, thread_pthread.h is designed to use the (fair) semaphore. If it is not present, or HAVE_BROKEN_POSIX_SEMAPHORES defined, the semaphore is supposed to be emulated using a condition variable. N

[issue8299] Improve GIL in 2.7

2010-04-13 Thread David Beazley
David Beazley added the comment: What bothers me most about this discussion is that the Windows implementation (legacy GIL) is being held up as an example of what we should be doing on POSIX. Yet, if I go run the same thread tests that I presented in my GIL talks on a multicore Windows machi

[issue8299] Improve GIL in 2.7

2010-04-13 Thread Martin v . Löwis
Martin v. Löwis added the comment: > Maybe the state of this discussion is my fault for not being clear > enough. Let's abandon terms such as "broken" and "roundrobin." CS > theory has the perfectly useful terms "fair" and "unfair." The fact > of the matter is this: the pthread GIL (implemente

[issue8299] Improve GIL in 2.7

2010-04-13 Thread Antoine Pitrou
Antoine Pitrou added the comment: Kristjan, > Maybe the state of this discussion is my fault for not being clear enough. It's quite a bit simpler. The first 2.7 beta has been released and there's IMO no way such patches will be accepted. It doesn't seem to be a pressing enough issue to be cons

[issue8299] Improve GIL in 2.7

2010-04-13 Thread Kristján Valur Jónsson
Kristján Valur Jónsson added the comment: Maybe the state of this discussion is my fault for not being clear enough. Let's abandon terms such as "broken" and "roundrobin." CS theory has the perfectly useful terms "fair" and "unfair." The fact of the matter is this: the pthread GIL (implemente

[issue8299] Improve GIL in 2.7

2010-04-11 Thread David Beazley
David Beazley added the comment: I'm sorry, I still don't get the supposed benefits of this round-robin patch over the legacy GIL. Given that using interpreter ticks as a basis for thread scheduling is problematic to begin with (mostly due to the fact that ticks have totally unpredictable e

[issue8299] Improve GIL in 2.7

2010-04-11 Thread Antoine Pitrou
Antoine Pitrou added the comment: > Antoine (2): The need to have do_yield is a symptom of the brokenness > of the GIL. Of course it is. But the point of the benchmark is to give valid results even with the old broken GIL. I could remove do_yield and still have it give valid results, but that

[issue8299] Improve GIL in 2.7

2010-04-11 Thread Kristján Valur Jónsson
Kristján Valur Jónsson added the comment: David, I don't necessarily think it is reasonable to yield every 100 opcodes, but that is the _intent_ of the current code base. Checkinterval is set to 100. If you don't want that, then set it higher. Your statement is like saying: "Why would you w
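
For reference, the interval mentioned here is adjustable at run time. The sketch below shows one way to raise it in a version-tolerant manner; the helper name `widen_switch_window` and the values passed to it are hypothetical. It relies on `sys.setcheckinterval()` (the 2.x tick count, 100 by default) and on `sys.setswitchinterval()`, which Python 3.2 and later provide for the time-based new GIL.

```python
import sys


def widen_switch_window(ticks=1000, seconds=0.05):
    """Ask the interpreter to switch threads less often.

    Returns the name of the knob that was actually used.
    """
    if hasattr(sys, "setswitchinterval"):
        # Python 3.2+ (new GIL): the interval is a time in seconds.
        sys.setswitchinterval(seconds)
        return "switchinterval"
    # Python 2.x (legacy GIL): the interval is a count of bytecode ticks.
    sys.setcheckinterval(ticks)
    return "checkinterval"


print(widen_switch_window())
```

A larger value means fewer forced hand-offs for CPU-bound threads, at the cost of slower response for threads waiting to run; the right trade-off depends on the workload.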

[issue8299] Improve GIL in 2.7

2010-04-11 Thread Antoine Pitrou
Antoine Pitrou added the comment:
> SHA1 hashing (C)
>
> threads= 1: 1275 iterations/s.      balance
> threads= 2: 1267 ( 99%)             0.7238
> threads= 3: 1271 ( 99%)             0.2405
> threads= 4: 1270 ( 99%)             0.1508
>
> Using the forced "do_yield" helps balance things, but not much. We

[issue8299] Improve GIL in 2.7

2010-04-11 Thread David Beazley
David Beazley added the comment: Sorry, but I don't see how you can say that the round-robin GIL and the legacy GIL have the same behavior based solely on the result of a performance benchmark. Do you have any kind of thread scheduling trace that proves they are scheduling threads in exactl

[issue8299] Improve GIL in 2.7

2010-04-11 Thread David Beazley
David Beazley added the comment: I must be missing something, but why, exactly would you want multiple CPU-bound threads to yield every 100 ticks? Frankly, that sounds like a horrible idea that is going to hammer your system with excessive context switching overhead and cache performance pr

[issue8299] Improve GIL in 2.7

2010-04-11 Thread Kristján Valur Jónsson
Kristján Valur Jónsson added the comment: FYI, here is the output using the unmodified Windows GIL, i.e. without my patch being active: C:\pydev\python\trunk\PCbuild>python.exe ..\Tools\ccbench\ccbench.py -t -y == CPython 2.7a4+.0 (trunk) == == AMD64 Windows on 'Intel64 Family 6 Model 23 Steppi

[issue8299] Improve GIL in 2.7

2010-04-11 Thread Kristján Valur Jónsson
Kristján Valur Jónsson added the comment: I looked at ccbench. It's a great tool. I've added two features to it (see the attached patch) -y option to turn off the "do_yield" option in throughput, and so measure thread scheduling without assistance, and the throughput option now also compute

[issue8299] Improve GIL in 2.7

2010-04-09 Thread anatoly techtonik
anatoly techtonik added the comment: If it really improves multicore performance and none of our tests fail (even in memory/resource/time survival tests) then I'd give it a try even after a beta. 2.x is still the best practical version out there. -- nosy: +techtonik

[issue8299] Improve GIL in 2.7

2010-04-09 Thread Kristján Valur Jónsson
Kristján Valur Jónsson added the comment: David, yes, messing about with processor affinities is certainly not nice. Especially since the issue is cross-platform. The pthreads API doesn't offer much. There is pthread_setschedparam(), and pthread_setconcurrency(). Unfortunately I don't have a

[issue8299] Improve GIL in 2.7

2010-04-06 Thread Antoine Pitrou
Antoine Pitrou added the comment: > The counter is "stall cycles". > During the 10 second run on my 2.4 GHz CPU, we had instruction cache > miss stalls for 2 billion cycles (2000 samples of 100 cycles per > sample). That does account for around 10% of the available CPU. Ok, thanks. > 2) Th

[issue8299] Improve GIL in 2.7

2010-04-06 Thread David Beazley
David Beazley added the comment: The analysis of instruction cache behavior is interesting---I could definitely see that coming into play given the heavy penalty that one sees going to multiple cores (it's a side effect in addition everything else that goes wrong such as a huge increase in th

[issue8299] Improve GIL in 2.7

2010-04-06 Thread Kristján Valur Jónsson
Kristján Valur Jónsson added the comment: The counter is "stall cycles". During the 10 second run on my 2.4 GHz CPU, we had instruction cache miss stalls for 2 billion cycles (2000 samples of 100 cycles per sample). That does account for around 10% of the available CPU. I'm observing some

[issue8299] Improve GIL in 2.7

2010-04-06 Thread Antoine Pitrou
Antoine Pitrou added the comment: [...] > _PyObject_Call 403 99,02 [...] > affinity off: > Functions Causing Most Work > Name Samples % [...] > _PyObject_Call 1.936 99,23 [...] > _threadstartex 1.934 99,13 > > When we run on both cores, we get four times as many L1

[issue8299] Improve GIL in 2.7

2010-04-06 Thread Kristján Valur Jónsson
Kristján Valur Jónsson added the comment: I just did some profiling. I'm using Visual Studio Team Edition, which has some fancy built-in profiling. I decided to compare the performance of the iotest.py script with two cpu threads, running for 10 seconds with processor affinity enabled and di

[issue8299] Improve GIL in 2.7

2010-04-05 Thread Florent Xicluna
Changes by Florent Xicluna: -- nosy: +flox

[issue8299] Improve GIL in 2.7

2010-04-05 Thread Kristján Valur Jónsson
Kristján Valur Jónsson added the comment: Sorry, what I meant with the "original problem" was the phenomenon observed by Antoine (IIRC) that the same CPU thread tends to hog the GIL, even when releasing it in ceval.c. What I have been looking at up to now is chiefly IO performance using David

[issue8299] Improve GIL in 2.7

2010-04-03 Thread David Beazley
David Beazley added the comment: It's not a simple mutex because if you did that, you would have performance problems much worse than those described in issue 7946. http://bugs.python.org/issue7946

[issue8299] Improve GIL in 2.7

2010-04-03 Thread Torsten Landschoff
Torsten Landschoff added the comment: Silly question, I know, but why isn't the GIL just implemented as a lock of the host operating system? After all, we want mutual exclusion, I don't see why condition variables are required for this. I have to admit that I did not look at the source, so th

[issue8299] Improve GIL in 2.7

2010-04-03 Thread Antoine Pitrou
Antoine Pitrou added the comment: Kristjan, I agree with Martin, it's probably too late to make such changes for 2.7. Additionally, your "round-robin" scheme only seems round-robin when there are two threads competing. Otherwise, you could have three threads A, B and C, and the GIL bouncing betw

[issue8299] Improve GIL in 2.7

2010-04-03 Thread David Beazley
David Beazley added the comment: Just ran the CPU-bound GIL test on my wife's dual core Windows Vista machine. The code runs twice as slow using two threads as it does using no threads (original observed behavior in my GIL talk).

[issue8299] Improve GIL in 2.7

2010-04-03 Thread David Beazley
David Beazley added the comment: I'm not sure where you're getting your information, but the original GIL problem *DEFINITELY* exists on multicore Windows machines. I've had numerous participants try it in training classes and workshops, and they've all observed severely degraded performance for

[issue8299] Improve GIL in 2.7

2010-04-03 Thread Kristján Valur Jónsson
Kristján Valur Jónsson added the comment: Antoine: Please take a look, the change is really simple, particularly the ROUNDROBIN_GIL variant which fixes the originally observed problem. The GIL is still a lock, implemented using a mutex and a semaphore. It is modified to work exactly as the l

[issue8299] Improve GIL in 2.7

2010-04-03 Thread Kristján Valur Jónsson
Kristján Valur Jónsson added the comment: Martin: Well, this patch was originally conceived more as a demonstration of the GIL problem and an alternative fix proposal. However, it is possible to configure it so that there is no change from existing functionality, simply by not including thread

[issue8299] Improve GIL in 2.7

2010-04-03 Thread David Beazley
David Beazley added the comment: Without looking at this patch, I think it would be wise to proceed with caution on incorporating any kind of GIL patch into 2.X. If there is anything to be taken away from my own work studying the GIL, it's that the problem is far more tricky than it looks an

[issue8299] Improve GIL in 2.7

2010-04-03 Thread Martin v . Löwis
Martin v. Löwis added the comment: I think this is too late for 2.7. 2.7b1 will be released RSN, and we should not implement such a change after the first beta release. -- nosy: +loewis

[issue8299] Improve GIL in 2.7

2010-04-03 Thread Kristján Valur Jónsson
New submission from Kristján Valur Jónsson: This patch does several things: 1) Creates a separate lock type PyThread_type_gil and locking functions for that. This allows tweaking of the GIL without affecting regular lock behaviour. 2) Creates a uniform implementation of the GIL on windows/pthr