Re: [Python-Dev] C99

2016-08-06 Thread Sturla Molden
Guido van Rossum  wrote:

> This feels close to a code of conduct violation. This kind of language
> may be okay on the Linux kernel list, but I don't see the point of it
> here.

Sorry, I should have found a more diplomatic formulation. But the principle
remains: build problems arising from a missing Xcode installation should not
be CPython's problem. It is OK to assume that Xcode is always installed
when CPython is built on OSX.



Re: [Python-Dev] C99

2016-08-06 Thread Sturla Molden
"Stephen J. Turnbull"  wrote:

> I may be talking through my hat here, but Apple has been using LLVM
> for several major releases now.  They seem to be keeping the GCC
> frontend stuck at 4.2.1, though.  So just because we've been using GCC
> 4.2.1 on Mac, including on buildbots, doesn't mean that there is no
> C99 compiler available on Macs -- it's quite possible that the
> available clang frontend does already support the C99 features we
> would use.  I suppose that might mean fiddling with the installer
> build process as well as the buildbots.

On OSX 10.8 and earlier, the default CC is llvm-gcc-4.2.1, available as the
gcc command. clang is also installed, so we can always 

$ export CC=clang
$ export CXX=clang++

to remedy the problem.

On OSX 10.9 and later, gcc is just a symlink to clang.

Xcode must be installed to build anything on Mac. It is not optional. Users
who need to build Python without installing Xcode need to fix their heads.
Because that is where their problem resides. There is no remedy for
stubbornness to the level of stupidity. Either they install Xcode or they
don't get to build anything. 


Sturla



Re: [Python-Dev] C99

2016-08-06 Thread Sturla Molden
Ned Deily <n...@python.org> wrote:

> On Aug 6, 2016, at 01:16, Stephen J. Turnbull 
> <turnbull.stephen...@u.tsukuba.ac.jp> wrote:
>> I may be talking through my hat here, but Apple has been using LLVM
>> for several major releases now.  They seem to be keeping the GCC
>> frontend stuck at 4.2.1, though.  So just because we've been using GCC
>> 4.2.1 on Mac, including on buildbots, doesn't mean that there is no
>> C99 compiler available on Macs -- it's quite possible that the
>> available clang frontend does already support the C99 features we
>> would use.  I suppose that might mean fiddling with the installer
>> build process as well as the buildbots.
> 
> Sorry, I wasn't clear; I did not mean to imply there is no C99 compiler
> available from Apple for OS X.  On current OS X releases, clang is the
> default and only supported compiler.  I was bringing up the example of
> the impact on building on older releases with the supported build tools
> for those releases where clang is either not available or was too
> immature to be usable.  As I said, there are a number of solutions to
> that problem - building on newer systems with deployment targets,
> installing third-party compilers, etc.



Clang is also available (and installed) on OSX 10.8 and earlier, although
gcc 4.2.1 is the default frontend to LLVM. The easiest solution to get C99
on those platforms is

$ export CC=clang

Not very difficult, and highly recommended.



Sturla Molden



Re: [Python-Dev] C99

2016-08-06 Thread Sturla Molden
Matthias Klose  wrote:

> GCC 5 and GCC 6 default to C11 (-std=gnu11), does the restriction to C99 mean
> that -std=gnu99 should be passed explicitly?

Also note that -std=c99 is not the same as -std=gnu99. The latter allows
GNU extensions like computed goto. Does the interpreter depend on any of
those? (Presumably they could be a benefit.)
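For readers unfamiliar with the extension: computed goto means taking the
address of a label with && and jumping through a pointer, which lets a
bytecode interpreter dispatch without a switch. A minimal sketch (not
CPython's actual code) that compiles with -std=gnu99 but is rejected by
-std=c99:

    #include <stdio.h>

    /* Opcodes for a toy accumulator machine. */
    enum { OP_INC, OP_DEC, OP_HALT };

    int run(const unsigned char *code)
    {
        /* GNU extension: an array of label addresses (&&label). */
        static const void *dispatch[] = { &&op_inc, &&op_dec, &&op_halt };
        int acc = 0;

    #define NEXT() goto *dispatch[*code++]   /* GNU extension: goto *expr */

        NEXT();
    op_inc:
        acc++; NEXT();
    op_dec:
        acc--; NEXT();
    op_halt:
        return acc;
    }

    int main(void)
    {
        const unsigned char prog[] = { OP_INC, OP_INC, OP_DEC, OP_HALT };
        printf("%d\n", run(prog));   /* prints 1 */
        return 0;
    }

(CPython's ceval loop does use this dispatch style when the compiler
supports it, falling back to a switch otherwise.)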

Sturla



Re: [Python-Dev] C99

2016-06-07 Thread Sturla Molden
Nathaniel Smith  wrote:

> No-one's proposing to use C99 indiscriminately;

> There's no chance that CPython is going to drop MSVC support in 3.6.

Stinner effectively proposed that when he wrote:

"Is it worth to support a compiler that in 2016 doesn't support the C
standard released in 1999, 17 years ago?"

This is basically a suggestion to drop MSVC support, as I read it. That is
never going to happen.


Sturla



Re: [Python-Dev] C99

2016-06-07 Thread Sturla Molden
Victor Stinner  wrote:

> Is it worth to support a compiler that in 2016 doesn't support the C
> standard released in 1999, 17 years ago?

MSVC only supports those parts of C99 that are needed for C++11 or for some
MS extension to C.

Is it worth supporting MSVC? If not, Intel C, Clang and Cygwin GCC are the
viable options we have on Windows (and perhaps Embarcadero, but I haven't
used C++ Builder for a very long time). Even MinGW does not fully support
C99, because it depends on Microsoft's CRT. If we think MSVC and MinGW are
worth supporting, we cannot just use C99 indiscriminately.



Re: [Python-Dev] C99

2016-06-06 Thread Sturla Molden
Guido van Rossum  wrote:

> I'm not sure I meant that. But if I have a 3rd party extension that
> compiles with 3.5 headers using C89, then it should still compile with
> 3.6 headers using C99. Also if I compile it for 3.5 and it only uses
> the ABI it should still be linkable with 3.6.

OK, but if third-party developers are to be free to use a C89 compiler for
their own code, we cannot have C99 in the include files. Otherwise the
include files will taint the C89 purity of their source code.

Personally I don't think we need to worry about compilers that don't
implement C99 features like inline functions in C. How long has the Linux
kernel used inline functions instead of macros? 20 years or more?
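To make the point concrete, here is the kind of header change this is
about, as a sketch (the names PY_MAX and py_max are made up for
illustration, not CPython's API):

    /* C89 header: must be a macro, so the arguments may be evaluated
     * twice, e.g. PY_MAX(x++, y) increments x twice. */
    #define PY_MAX(a, b) ((a) > (b) ? (a) : (b))

    /* C99 header: a static inline function is type-checked and
     * evaluates each argument exactly once. */
    static inline int
    py_max(int a, int b)
    {
        return a > b ? a : b;
    }

If the public headers must stay C89, only the macro form is possible.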

Sturla



Re: [Python-Dev] C99

2016-06-05 Thread Sturla Molden
 wrote:

> I share Guido's priority there - source compatibility is more important than
> smoothing a few of C's rough edges.  Maybe the next breaking change release
> this should be considered (python 4000... python 5000?)

I was simply pointing out that Guido's priority removes a lot of the
usefulness of C99 at the source level. I was not saying I disagreed. If we
have to keep header files clean of C99, I think this proposal just adds
clutter.



Re: [Python-Dev] C99

2016-06-05 Thread Sturla Molden
Guido van Rossum <gu...@python.org> wrote:

> I'm talking about 3rd party extensions. Those may require source
> compatibility with older Python versions. All I'm asking for is to not
> require source-level use of C99 features. 

This of course removes a lot of its usefulness. E.g. macros cannot be
replaced by inline functions, as header files must still be plain C89.


Sturla Molden



Re: [Python-Dev] Is there a reference manual for Python bytecode?

2015-12-28 Thread Sturla Molden
Brett Cannon  wrote:

> Ned also neglected to mention his byterun project which is a pure Python
> implementation of the CPython eval loop: https://github.com/nedbat/byterun

I would also encourage you to take a look at Numba. It is an LLVM based JIT
compiler for Python bytecode, written for hardcore numerical algorithms in
Python. It can often achieve the same performance as -O2 in C after a short
burn-in while inferring the types of the arguments and variables. Using it
is mostly as easy as adding an @numba.jit decorator to the function we want
to accelerate. Numba is rapidly becoming what Google's long dead swallow
should have been.

:-)

Sturla



Re: [Python-Dev] Computed Goto dispatch for Python 2

2015-05-28 Thread Sturla Molden

On 28/05/15 21:37, Chris Barker wrote:


I think it's great for it to be used by end users as a system library /
utility, i.e. like you would the system libc -- so if you can write a
little python script that only uses the stdlib -- you can simply deliver
that script.


No it is not, because someone will be 'clever' and try to upgrade it 
with pip or install packages into it.


Sturla



Re: [Python-Dev] Computed Goto dispatch for Python 2

2015-05-28 Thread Sturla Molden
Donald Stufft don...@stufft.io wrote:

 Honestly, I’m on an OS that *does* ship Python (OS X) and part of me hopes
 that they stop shipping it. It’s very rare that someone ships Python as
 part of their OS without modifying it in some way, and those modifications
 almost always cause pain to some set of users (and since I work on pip, they
 tend to come to us with the weirdo problems). Case in point: Python on OS X
 adds some preinstalled software, but they put this pre-installed software
 before site-packages in sys.path, so pip can’t upgrade those pre-installed software
 packages at all. 

Many Unix tools need Python, so Mac OS X (like Linux distros and FreeBSD)
will always need a system Python. Yes, it would be great if it could be
called spython or something else than python. But the main problem is that
it is used by end-users as well, not just the operating system.

Anyone who uses Python on OSX should install their own Python. The system
Python should be left alone as it is.

If the system Python needs updating, it is the responsibility of Apple to
distribute the upgrade. Nobody should attempt to use pip to update the
system Python. Who knows what side-effects it might have. Preferably pip
should have a check for it and bluntly refuse to do it.

Sturla



Re: [Python-Dev] Status of C compilers for Python on Windows

2014-10-11 Thread Sturla Molden
Antoine Pitrou solip...@pitrou.net wrote:

 But you can compile OpenBLAS with one compiler and then link it to
 Python using another compiler, right? There is a single C ABI.

BLAS and LAPACK are actually Fortran, which does not have a single C ABI.
The ABI depends on the Fortran compiler: g77 and gfortran will produce
different C ABIs. This is a consistent source of PITA in any scientific
programming that combines C and Fortran.

There is cblas though, which is a C API, but it does not include LAPACK.
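As a sketch of what the ABI dependence looks like in practice: there is no
Fortran header to include, so C code calling Fortran BLAS has to hard-code
the compiler's name mangling and calling convention. The declaration below
assumes the common gfortran/f2c convention (lowercase name plus one
trailing underscore, all arguments passed by reference); other Fortran
compilers mangle differently, and g77 additionally returned REAL function
results as C double where gfortran returns float. The helper dot3 is just
for illustration:

    /* ddot from Fortran BLAS, gfortran-style mangling assumed. */
    extern double ddot_(const int *n, const double *x, const int *incx,
                        const double *y, const int *incy);

    double dot3(const double *x, const double *y)
    {
        int n = 3, inc = 1;
        return ddot_(&n, x, &inc, y, &inc);
    }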

Another thing is that libraries are different. MSVC wants a .lib file, but
MinGW produces .a files like GCC does on Linux. Perhaps you can generate a
.lib file from a .a file, but I have never tried.

Sturla



Re: [Python-Dev] Status of C compilers for Python on Windows

2014-10-11 Thread Sturla Molden
Sturla Molden sturla.mol...@gmail.com wrote:

 BLAS and LAPACK are actually Fortran, which does not have a single C ABI.
 The ABI depends on the Fortran compiler. g77 and gfortran will produce
 different C ABIs. This is a consistent source of PITA  in any scientific
 programming that combines C and Fortran.
 
 There is cblas though, which is a C API, but it does not include LAPACK.
 
 Another thing is that libraries are different. MSVC wants a .lib file, but
 MinGW produces .a files like GCC does on Linux. Perhaps you can generate a
 .lib file from a .a file, but I have never tried.

And not to mention that the Fortran run-time depends on the C runtime...
What Carl Kleffner did for SciPy was to use a static libgfortran, which is
not linked against any specific CRT, so it could be linked with msvcr90.dll
when the Python extension is built. The vanilla libgfortran.dll from MinGW
is linked with msvcrt.dll. However, not linking with msvcrt.dll broke the
pthreads library, which in turn broke OpenMP, so he had to patch the
pthreads library for this... This just shows some of the difficulties of
trying to combine the GNU and Microsoft compilers. There are many others,
like different stack alignment, different exception handling, and the mingw
runtime (which causes segfaults when linked dynamically to MSVC
executables). It's not just getting the CRT right.

Sturla



Re: [Python-Dev] Status of C compilers for Python on Windows

2014-10-11 Thread Sturla Molden
Antoine Pitrou solip...@pitrou.net wrote:

 It sound like whatever MSVC produces should be the defacto standard
 under Windows.

Yes, and that is what Clang does on Windows. It is not as usable as MinGW
yet, but soon it will be. Clang also suffers from the lack of a Fortran
compiler, though.

Sturla



Re: [Python-Dev] Status of C compilers for Python on Windows

2014-10-11 Thread Sturla Molden
Steve Dower steve.do...@microsoft.com wrote:

 Is there some reason the Fortran part can't be separated out into a DLL? 

DLL hell, I assume. Using the Python extension module loader makes it less
of a problem. If we stick with .pyd files where everything is statically
linked we can rely on the Python dev team to make sure that DLL hell does
not bite us. Most of the contributors to projects like NumPy and SciPy are
not computer scientists. So the KISS principle is important, which is why
scientific programmers often use Fortran in the first place. Making sure
DLLs are resolved and loaded correctly, or using stuff like COM or .NET to
mitigate DLL hell, is just in a different league. That is for computer
engineers to take care of, but we are trained as physicists, mathematicians,
astronomers, chemists, biologists, or whatever... I am sure that engineers
at Microsoft could do this correctly, but we are not the kind of guys you
would hire :-)


OT: Contrary to common belief, there is no speed advantage of using Fortran
on a modern CPU, because the long pipeline and the hierarchical memory
alleviates the problems with pointer aliasing. C code tends to run faster
than Fortran, often 10 to 20 % faster, and C++ tends to be slightly faster
than C. In 2014, Fortran is only used because it is easier to program for
non-specialists. And besides, correctness is far more important than speed,
which is why we prefer Python or MATLAB in the first place. If you ever see
the argument that Fortran is used because of pointer aliasing, please feel
free to ignore it.


Sturla



Re: [Python-Dev] Status of C compilers for Python on Windows

2014-10-10 Thread Sturla Molden
Nathaniel Smith n...@pobox.com wrote:

 You may want to get in touch with Carl Kleffner -- he's done a bunch
 of work lately on getting a mingw-based toolchain to the point where
 it can build numpy and scipy. 

To build *Python extensions*, one can use Carl's toolchain or the VC9
compiler for Python 2.7 that Microsoft just released.

To build *Python* you need Visual Studio, Visual Studio Express, Windows
SDK, or Cygwin because there is no other build process available on
Windows. Python cannot be built with MinGW.

The official 64-bit Python installer from Python.org is built with the
Windows SDK compiler, not Visual Studio. The Windows SDK is a free
download. The 32-bit installer is built with Visual Studio.

Sturla



Re: [Python-Dev] Status of C compilers for Python on Windows

2014-10-10 Thread Sturla Molden
Paul Moore p.f.mo...@gmail.com wrote:

 Having said that, I'm personally not interested in this, as I am happy
 with MSVC Express. Python 3.5 will be using MSVC 14, where the express
 edition supports both 32 and 64 bit.

If you build Python yourself, you can (more or less) use whichever version
of Visual Studio you want. There is nothing that prevents you from building
Python 2.7 or 3.4 with MSVC 14. But then you have to build all Python
extensions with this version of Visual Studio as well.

Sturla



Re: [Python-Dev] Status of C compilers for Python on Windows

2014-10-10 Thread Sturla Molden
Larry Hastings la...@hastings.org wrote:

 Just to make something clear that may not be clear to non-Windows
 developers: the C library is implicitly part of the ABI.  

MacOS X also has this issue, but it is less known among Mac developers!
There tend to be multiple versions of the C library, one for each SDK
version. If you link the wrong one when building a Python extension it can
crash. For example, if you have a Python built with the 10.6 SDK (e.g.
Enthought) but have Xcode with the 10.9 SDK as default, you need to build
with the flag -mmacosx-version-min=10.6, and for C++ also
-stdlib=libstdc++. Not doing so will cause all sorts of mysterious errors.


Two other ABI problems on Windows are the stack alignment and the MinGW
runtime: In 32-bit applications, MSVC uses 4-byte stack alignment whereas
MinGW assumes 16-byte alignment. This is a common cause of segfaults for
Python extensions built with MinGW. Most developers just assume it is
sufficient to link the same CRT as Python. Another problem is the MinGW
runtime (mingw32.a or mingw32.dll) which conflicts with MSVC and can cause
segfaults unless it is statically linked. The vanilla MinGW distro defaults
to dynamic linkage for this library. Because of this a special MinGW
toolchain was created for building SciPy on Windows:

https://github.com/numpy/numpy/wiki/Mingw-static-toolchain
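On the GCC side the alignment problem can also be mitigated per function.
A sketch, assuming a 32-bit MSVC-compiled host calls back into MinGW-built
code (callback_t and register_callback are made-up names, not a real API):

    typedef void (*callback_t)(void *arg);
    void register_callback(callback_t cb);   /* hypothetical host API */

    /* GCC i386 attribute: realign the stack to 16 bytes on entry, so
     * SSE code is safe even though the MSVC caller only guarantees
     * 4-byte alignment. */
    __attribute__((force_align_arg_pointer))
    static void my_callback(void *arg)
    {
        (void)arg;   /* ... SSE code can run safely here ... */
    }

    void setup(void)
    {
        register_callback(my_callback);
    }

GCC also has -mstackrealign to apply the same realignment globally.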



Sturla



Re: [Python-Dev] Status of C compilers for Python on Windows

2014-10-10 Thread Sturla Molden
Victor Stinner victor.stin...@gmail.com wrote:

 Is MinGW fully compatible with MSVS ABI? I read that it reuses the
 MSVCRT, but I don't know if it's enough. 

Not out of the box. See:

https://github.com/numpy/numpy/wiki/Mingw-static-toolchain



Sturla



Re: [Python-Dev] Status of C compilers for Python on Windows

2014-10-10 Thread Sturla Molden
Larry Hastings la...@hastings.org wrote:

 So as a practical matter I think I'd prefer if we continued to only
 support MSVC.  In fact I'd prefer it if we removed support for other
 Windows compilers, instead asking those maintainers to publish their own
 patches / repos, in the way that Stackless does.

The scientific community needs to use MinGW or Intel compilers because of
Fortran. So some support for other compilers will be good, at least for
building C extensions.

Sturla



Re: [Python-Dev] Status of C compilers for Python on Windows

2014-10-10 Thread Sturla Molden
Merlijn van Deen valhall...@arctus.nl wrote:

 VC++ 2008/2010 EE do not *bundle* a 64-bit compiler, 

Actually it does, but it is not available from the UI. You can use it from
the command line, though.

Sturla



Re: [Python-Dev] Status of C compilers for Python on Windows

2014-10-10 Thread Sturla Molden
Steve Dower steve.do...@microsoft.com wrote:

 I don't have any official confirmation, but my guess would be that the
 64-bit compilers were omitted from the VC 2008 Express to save space
 (bearing in mind that WinXP was the main target at that time, which had
 poor 64-bit support, and very few people cared about building 64-bit
 binaries) and were not available in the IDE for VC 2010 Express by
 mistake. For building extensions, the former is resolved by the package at
 http://aka.ms/vcpython27, and the latter works fine since the 64-bit
 compiler is there, just not exposed in the IDE. Neither of these will be
 an issue with VC14 - 64-bit is far too important these days.


The 64-bit compiler is in VC 2008 Express as well, just not exposed in the
IDE. I know this because when I got the Absoft Fortran compiler I was told
to download VC 2008 Express, because Absoft uses the VC9 linker. And indeed
there was a 64-bit compiler in VC 2008 Express as well, just not available
from the IDE. If I remember correctly, some fiddling with vcvars.bat was
required to turn it on. I never tried to build Python extensions with it,
though. In the beginning I thought Absoft had given me the wrong product,
because I had ordered a 64-bit Fortran compiler and I knew VC 2008
Express was only 32-bit. But they assured me the 64-bit VC9 compiler was
there as well, and indeed it was.

Sturla



Re: [Python-Dev] Status of C compilers for Python on Windows

2014-10-10 Thread Sturla Molden
Larry Hastings la...@hastings.org wrote:

 CPython doesn't require OpenBLAS.  Not that I am not receptive to the
 needs of the numeric community... but, on the other hand, who in the
 hell releases a library with Windows support that doesn't work with MSVC?!

It uses AT&T assembly syntax instead of Intel assembly syntax.
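For those who have not seen the two dialects side by side, a small sketch
of the same addition written both ways (x86; 32-bit for the MSVC variant,
since 64-bit MSVC dropped inline assembly):

    /* GCC/Clang inline assembly, AT&T syntax: source before destination,
     * registers prefixed with %. This is what OpenBLAS's kernels use. */
    int add_att(int a, int b)
    {
        __asm__("addl %1, %0" : "+r"(a) : "r"(b));
        return a;
    }

    /* MSVC inline assembly, Intel syntax: destination first, no
     * prefixes. MSVC's toolchain only understands this dialect. */
    #ifdef _MSC_VER
    int add_intel(int a, int b)
    {
        __asm {
            mov eax, a
            add eax, b
            mov a, eax
        }
        return a;
    }
    #endif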

Sturla



Re: [Python-Dev] Microsoft Visual C++ Compiler for Python 2.7

2014-09-27 Thread Sturla Molden
Christian Heimes christ...@python.org wrote:

 Is it possible to compile extensions from Python's numerical stack such
 as NumPy, SciPy and SciKit, too?

The official NumPy installer is currently built with VC9, so probably yes.
Other parts of the SciPy stack needs a Fortran compiler as well, so those
might be more tricky. The limitation to Fortran 77 is now considered
lifted: Fortran 90 and later will be allowed, so g77 is no longer an
option. In practice you will need Intel ifort or a patched MinGW gfortran. 

Because of this the SciPy community has been creating a customized MinGW
toolchain (including gfortran) for building binary wheels on Windows. It is
patched to make sure that e.g. the MinGW runtime does not conflict with the
VC9 code in the official Python 2.7 installer and that libgfortran uses the
correct C runtime. The stack alignment is also changed to make it VC9
compatible. There was also a customization of the C++ exception handling.
In addition to this the MinGW runtime and libgfortran are statically
linked, so there are no extra runtime DLLs to install.

https://github.com/numpy/numpy/wiki/Mingw-static-toolchain

The toolchain also contains a build of OpenBLAS to use as BLAS and LAPACK
when building NumPy and the SciPy stack. Intel MKL or ATLAS might be
preferred though, due to concerns about the maturity of OpenBLAS.

Sturla Molden



Re: [Python-Dev] Microsoft Visual C++ Compiler for Python 2.7

2014-09-27 Thread Sturla Molden
Steve Dower steve.do...@microsoft.com wrote:
 It'll help with the numerical stack, but only a little. The devs involved
 have largely figured it out already and I can't provide a good Fortran
 compiler or BLAS library, which is what they need.

We finally have a MinGW based toolchain that can be used. Making sure it
was compatible with VC9 is actually worse than most would expect, as there
are subtle incompatibilities between vanilla MinGW and VC9, beyond just
linking the same C runtime DLL. But it was worked out:

https://github.com/numpy/numpy/wiki/Mingw-static-toolchain

As for BLAS, the NumPy/SciPy devs have procured permission from Intel to
use MKL in binary wheels. But official binaries linked with a free BLAS
library will still be available. Currently we use ATLAS, but the plan is
to use OpenBLAS (successor to GotoBLAS2) when it matures.

OpenBLAS is currently the fastest and most scalable BLAS library available,
actually better than MKL, but it is severely underfunded. It is not a good
situation for the industry that the only open BLAS library with the
performance of MKL is a Chinese student project in HPC. ATLAS is
unfortunately far less performant and scalable.

Apple and Cray solved the problem on their platforms by building
high-performance BLAS and LAPACK libraries into their operating systems
(Apple Accelerate Framework and Cray libsci). But AFAIK, Windows does not
have a BLAS library from Microsoft. 

Sturla Molden



Re: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes

2014-06-06 Thread Sturla Molden
Julian Taylor jtaylor.deb...@googlemail.com wrote:
 
 The problem with this approach is that it is already difficult enough to
 handle memory in numpy.

I would not do this in a way that complicates memory management in NumPy. I
would just replace malloc and free with temporarily cached versions. From
the perspective of NumPy the API should be the same.

 Having a cache that potentially stores gigabytes
 of memory out of the users sight will just make things worse.

Buffers don't need to stay in the cache forever, just long enough to allow
reuse within an expression. We are probably talking about delaying the call
to free by just a few microseconds.

We could e.g. have a setup like this:

NumPy thread on malloc:
- tries to grab memory off the internal heap
- calls system malloc on failure

NumPy thread on free:
- returns a buffer to the internal heap
- signals a condition

Background daemonic GC thread:
- wakes after sleeping on the condition
- sleeps for another N microseconds (N = magic number)
- flushes or shrinks the internal heap with system free
- goes back to sleeping on the condition 
 
It can be implemented with the same API as malloc and free, and plugged
directly into the existing NumPy code. 

We would in total need two mutexes, one condition variable, a pthread, and
a heap.
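A minimal sketch of that setup with POSIX threads (names are illustrative,
not NumPy's API; size-binning and error handling are omitted, and one mutex
is used here where the text above budgets two):

    #include <pthread.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define CACHE_SLOTS 16

    static void *slots[CACHE_SLOTS];          /* the internal heap */
    static size_t sizes[CACHE_SLOTS];
    static int nslots = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t returned = PTHREAD_COND_INITIALIZER;

    void *np_malloc(size_t n)                 /* drop-in for malloc */
    {
        void *p = NULL;
        pthread_mutex_lock(&lock);
        for (int i = 0; i < nslots; i++) {
            if (sizes[i] >= n) {              /* grab off the heap */
                p = slots[i];
                slots[i] = slots[--nslots];
                sizes[i] = sizes[nslots];
                break;
            }
        }
        pthread_mutex_unlock(&lock);
        return p ? p : malloc(n);             /* system malloc on failure */
    }

    void np_free(void *p, size_t n)           /* drop-in for free */
    {
        pthread_mutex_lock(&lock);
        if (nslots < CACHE_SLOTS) {
            slots[nslots] = p;                /* return buffer to heap */
            sizes[nslots++] = n;
            p = NULL;
            pthread_cond_signal(&returned);   /* signal the condition */
        }
        pthread_mutex_unlock(&lock);
        free(p);                              /* free(NULL) is a no-op */
    }

    static void *gc_thread(void *unused)      /* background daemon */
    {
        (void)unused;
        for (;;) {
            pthread_mutex_lock(&lock);
            while (nslots == 0)               /* sleep on the condition */
                pthread_cond_wait(&returned, &lock);
            pthread_mutex_unlock(&lock);
            usleep(50);                       /* N microseconds of grace */
            pthread_mutex_lock(&lock);
            while (nslots > 0)                /* flush with system free */
                free(slots[--nslots]);
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    void np_cache_init(void)                  /* start the daemon once */
    {
        pthread_t t;
        pthread_create(&t, NULL, gc_thread, NULL);
        pthread_detach(t);
    }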

Sturla



Re: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes

2014-06-06 Thread Sturla Molden
Nathaniel Smith n...@pobox.com wrote:

 The proposal in my initial email requires zero pthreads, and is
 substantially more effective. (Your proposal reduces only the alloc
 overhead for large arrays; mine reduces both alloc and memory access
 overhead for both large and small arrays.)

My suggestion prevents the kernel from zeroing pages in the middle of a
computation, which is an important part. It would also be an optimization
the Python interpreter could benefit from independently of NumPy, by
allowing reuse of allocated memory pages within CPU-bound portions of the
Python code. And no, the method I suggested does not only work for large
arrays.

If we really want to take out the memory access overhead, we need to
consider lazy evaluation. E.g. a context manager that collects a symbolic
expression and triggers evaluation on exit:

with numpy.accelerate:
    x = expression
    y = expression
    z = expression
# evaluation of x,y,z happens here

Sturla



Re: [Python-Dev] Moving Python 3.5 on Windows to a new compiler

2014-06-06 Thread Sturla Molden
Brett Cannon bcan...@gmail.com wrote:

 Nope. A new minor release of Python is a massive undertaking which is why
 we have saved ourselves the hassle of doing a Python 2.8 or not giving a
 clear signal as to when Python 2.x will end as a language.

Why not just define Python 2.8 as Python 2.7 except with a newer compiler?
I cannot see why that would be a massive undertaking, if changing compiler
for 2.7 is necessary anyway.

Sturla



Re: [Python-Dev] Moving Python 3.5 on Windows to a new compiler

2014-06-06 Thread Sturla Molden
Brian Curtin br...@python.org wrote:

 Adding features into 3.x is already not enough of a carrot on the
 stick for many users. Intentionally leaving 2.7 on a dead compiler is
 like beating them with the stick.

Those who want to build extensions on Windows will just use MinGW
(currently GCC 4.8.2) instead.

NumPy and SciPy are planning a switch to a GCC based toolchain with static
linkage of the MinGW runtime on Windows.  It is carefully configured to be
binary compatible with VS2008 on Python 2.7. The major reason for this is
to use gfortran also on Windows. But the result will be a GCC based
toolchain that anyone can use to build extensions on Windows.

Sturla



Re: [Python-Dev] Moving Python 3.5 on Windows to a new compiler

2014-06-06 Thread Sturla Molden
Brian Curtin br...@python.org wrote:

 Well we're certainly not going to assume such a thing. I know people do
 that, but many don't (I never have).

If Python 2.7 users are left with a dead compiler on Windows, they will
find a solution. For example, Enthought is already bundling their Python
distribution with GCC 4.8.1 on Windows.

Sturla



Re: [Python-Dev] Moving Python 3.5 on Windows to a new compiler

2014-06-06 Thread Sturla Molden
Eli Bendersky eli...@gmail.com wrote:

 While we're at it, Clang is nearing a stage where it can compile C and C++
 on Windows *with ABI-compatibility to MSVC* (yes, even C++) -- see
 http://clang.llvm.org/docs/MSVCCompatibility.html
 for more details. Could this help?

Possibly. clang-cl is exciting and I hope distutils will support it one
day. Clang is not as well known among Windows users as it is among users of
Unix (Apple, Linux, FreeBSD, et al.). It would be even better if Python
were bundled with Clang on Windows.

The MinGW-based SciPy toolchain has ABI compatibility with MSVC only for
C (and Fortran), not C++. The differences from vanilla MinGW are mainly
static linkage of the MinGW runtime, different stack alignment (4 bytes
instead of 16), and linking with msvcr90.dll instead of msvcrt.dll.

Sturla



Re: [Python-Dev] Moving Python 3.5 on Windows to a new compiler

2014-06-06 Thread Sturla Molden
Brian Curtin br...@python.org wrote:

 If Python 2.7 users are left with a dead compiler on Windows, they will
 find a solution. For example, Enthought is already bundling their Python
 distribution with gcc 2.8.1 on Windows.
 
 Again, not something I think we should depend on. A lot of people use
 python.org installers.

I am not talking about changing the python.org installers. Let it remain on
VS2008 for Python 2.7. I am only suggesting we make it easier to find a
free C compiler compatible with the python.org installers. 

The NumPy/SciPy dev team has taken on the burden of building a MinGW toolchain
that is configured to be 100 % ABI compatible with the python.org
installer. I am only suggesting a link to it or something like that,
perhaps even host it as a separate download. (It is GPL, so anyone can do
that.) That way it would be easy to find a compatible C compiler. We have
to consider that VS2008 will be unobtainable abandonware long before the
promised Python 2.7 support expires. When that happens, users of Python 2.7
will need to find another compiler to build C extensions. If Python.org
makes this easier it would hurt less to have Python 2.7 remain on VS2008
forever.

Sturla



Re: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes

2014-06-06 Thread Sturla Molden
Greg Ewing greg.ew...@canterbury.ac.nz wrote:
 Julian Taylor wrote:
 tp_can_elide receives two objects and returns one of three values:
 * can work inplace, operation is associative
 * can work inplace but not associative
 * cannot work inplace
 
 Does it really need to be that complicated? Isn't it
 sufficient just to ask the object potentially being
 overwritten whether it's okay to overwrite it?

How can it know this without help from the interpreter? 

Sturla



Re: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes

2014-06-06 Thread Sturla Molden
Nathaniel Smith n...@pobox.com wrote:

 with numpy.accelerate:
 x = expression
 y = expression
 z = expression
 # evaluation of x,y,z happens here
 
 Using an alternative evaluation engine is indeed another way to
 optimize execution, which is why projects like numexpr, numba, theano,
 etc. exist. But this is basically switching to a different language in
 a different VM.

I was not thinking of anything that complicated. Let us focus on what an unmodified
CPython can do.

A compound expression with arrays can also be seen as a pipeline. Imagine
what would happen if in NumPy 2.0 arithmetic operators returned
coroutines instead of temporary arrays. That way an expression could be
evaluated chunkwise, and the chunks would be small enough to fit in cache.

Sturla



Re: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes

2014-06-05 Thread Sturla Molden

On 05/06/14 22:51, Nathaniel Smith wrote:


This gets evaluated as:

tmp1 = a + b
tmp2 = tmp1 + c
result = tmp2 / c

All these temporaries are very expensive. Suppose that a, b, c are
arrays with N bytes each, and N is large. For simple arithmetic like
this, then costs are dominated by memory access. Allocating an N byte
array requires the kernel to clear the memory, which incurs N bytes of
memory traffic.


It seems to be the case that a large portion of the run-time in Python 
code using NumPy can be spent in the kernel zeroing pages (which the 
kernel does for security reasons).


I think this can also be seen as a 'malloc problem'. It comes about 
because each new NumPy array starts with a fresh buffer allocated by 
malloc. Perhaps buffers can be reused?


Sturla



Re: [Python-Dev] Should standard library modules optimize for CPython?

2014-06-03 Thread Sturla Molden
Stefan Behnel stefan...@behnel.de wrote:

 Thus my proposal to compile the modules in CPython with Cython, rather than
 duplicating their code or making/keeping them CPython specific. I think
 reducing the urge to reimplement something in C is a good thing.

For algorithmic and numerical code, Numba has already proven that Python
can be JIT compiled comparable to -O2 in C. For non-algorithmic code, the
speed determinants are usually outside Python (e.g. the network
connection). Numba is becoming what the dead swallow should have been.
The question is rather whether the standard library should use a JIT
compiler like Numba. Cython is great for writing C extensions while
avoiding all the details of the Python C API. But for speeding up
algorithmic code, Numba is easier to use.

Sturla



Re: [Python-Dev] Should standard library modules optimize for CPython?

2014-06-03 Thread Sturla Molden
Stefan Behnel stefan...@behnel.de wrote:

 So the
 argument in favour is mostly a pragmatic one. If you can have 2-5x faster
 code essentially for free, why not just go for it?

It would be easier if the GIL, or Cython's use of it, was redesigned. Cython
just grabs the GIL and holds on to it until it is manually released. The
standard lib cannot have packages that hold the GIL forever, as a Cython
compiled module would do. Cython has to start sharing access to the GIL
like the interpreter does.

Sturla



Re: [Python-Dev] Python 2.7.7. on Windows

2014-04-28 Thread Sturla Molden
Mike Miller python-...@mgmiller.net wrote:

 The main rationale given (for not using the standard %ProgramFiles%) has been 
 that the full path to python is too long to type, and ease of use is more 
 important than the security benefits given by following Windows conventions.

C:\Program Files\Python27 contains a space in the path. If you
want to randomly break build tools for C extensions, then go ahead and
change it. 

Sturla



Re: [Python-Dev] PEP 465: A dedicated infix operator for matrix multiplication

2014-04-08 Thread Sturla Molden
Björn Lindqvist bjou...@gmail.com wrote:

 import numpy as np
 from numpy.linalg import inv, solve
 
 # Using dot function:
 S = np.dot((np.dot(H, beta) - r).T,
np.dot(inv(np.dot(np.dot(H, V), H.T)), np.dot(H, beta) - r))
 
 # Using dot method:
 S = (H.dot(beta) - r).T.dot(inv(H.dot(V).dot(H.T))).dot(H.dot(beta) - r)
 
 Don't keep your reader hanging! Tell us what the magical variables H,
 beta, r and V are. And why import solve when you aren't using it?
 Curious readers that aren't very good at matrix math, like me, should
 still be able to follow your logic. Even if it is just random data,
 it's better than nothing!

Perhaps. But you don't need to know matrix multiplication to see that those
expressions are not readable. And by extension, you can still imagine that
bugs can easily hide in unreadable code.  

Matrix multiplications are used extensively in anything from engineering to
statistics to computer graphics (2D and 3D). This operator will be a good
thing for a lot of us.

Sturla



Re: [Python-Dev] Pyston: a Python JIT on LLVM

2014-04-05 Thread Sturla Molden
Kevin Modzelewski k...@dropbox.com wrote:
 Using optional type annotations is a really promising strategy and may
 eventually be added to Pyston, but our primary target right now is
 unmodified and untyped Python code

What I meant to say is that Numba has already done the boilerplate coding.
Even if you use no type annotations, it is already a Python bytecode
JIT-compiler based on LLVM that is hooked up with CPython. You might have
to add optimizations to it, yes, but it has the skeleton for a CPython
LLVM-based JIT compiler set up and running.

If you provide no type annotations, Numba's autojit decorator will do a
data-guided specialization. The types will be inferred from running the
code through the CPython interpreter, and then Numba will generate a
specialization. This is somewhat similar to the information-gathering that
GCC does when we run profile-guided optimizations.

Sturla



Re: [Python-Dev] Pyston: a Python JIT on LLVM

2014-04-03 Thread Sturla Molden
Kevin Modzelewski k...@dropbox.com wrote:

 Since it's the question that I think most people will inevitably (and
 rightly) ask, why do we think there's a place for Pyston when there's PyPy
 and (previously) Unladen Swallow?

Have you seen Numba, the Python JIT that integrates with NumPy?

http://numba.pydata.org

It uses LLVM to compile Python bytecode. When I have tried it I tend to get
speed comparable to -O2 in C for numerical and algorithmic code.

Here is an example, giving a 150 times speed boost to Python:

http://stackoverflow.com/questions/21811381/how-to-shove-this-loop-into-numpy/21818591#21818591


Sturla



Re: [Python-Dev] Start writing inlines rather than macros?

2014-02-27 Thread Sturla Molden
Brett Cannon br...@python.org wrote:

 The Visual Studio team has publicly stated they will never support C99,
 so dropping C89 blindly is going to alienate a big part of our user base
 unless we switch to C++ instead. I'm fine with trying to pull in C99
 features, though, that we can somehow support in a backwards-compatible way 
 with VS.

So you are saying that Python should use the C that Visual Studio
supports? I believe Microsoft is not competent to define the C standard.
If they cannot provide a compiler, that is their bad. There are plenty of
other standard-compliant compilers we can use, including Intel, clang and
gcc (MinGW).

Sturla



Re: [Python-Dev] Coverity Scan Spotlight Python

2013-08-29 Thread Sturla Molden

Do the numbers add up?

0.005 defects per 1,000 lines of code is one defect in every 200,000 lines
of code.

However they also claim that to date, the Coverity Scan service has analyzed 
nearly 400,000 lines of Python code and identified 996 new defects – 860 of 
which have been fixed by the Python community.

Sturla

Sent from my iPad

On 30 Aug 2013 at 00:10, Christian Heimes christ...@python.org wrote:

 Hello,
 
 Coverity has published its Coverity Scan Spotlight Python a couple
 of hours ago. It features a summary of Python's ecosystem, an
 interview with me about Python core development and a defect report.
 The report is awesome. We have reached a defect density of .005
 defects per 1,000 lines of code. In 2012 the average defect density of
 Open Source Software was 0.69.
 
 http://www.coverity.com/company/press-releases/read/coverity-finds-python-sets-new-level-of-quality-for-open-source-software
 
 http://wpcme.coverity.com/wp-content/uploads/2013-Coverity-Scan-Spotlight-Python.pdf
 
 The internet likes it, too.
 
 http://www.prnewswire.com/news-releases/coverity-finds-python-sets-new-level-of-quality-for-open-source-software-221629931.html
 
 http://www.securityweek.com/python-gets-high-marks-open-source-software-security-report
 
 
 Thank you very much to Kristin Brennan and Dakshesh Vyas from Coverity
 as well as everybody who has helped to fix the remaining issues!
 
 Christian
 


Re: [Python-Dev] The end of 2.7

2013-04-12 Thread Sturla Molden

On 07.04.2013 21:50, Martin v. Löwis wrote:


So I believe that extension building is becoming more and more
painful on Windows for Python 2.7 as time passes (and it is already
way more painful than it is on Linux), and I see no way to do much
about that. The stable ABI would have been a solution, but it's
too late now for 2.7.


I think extension building for Python 2.7 on Windows for this reason is 
moving from VS2008 to GCC 4.7 (MinGW). When using VS, we are stuck with 
an old compiler (i.e. the .NET 3.5 SDK). With GCC, there is no such 
issue - we just link with whatever CRT is appropriate. Thus, providing 
link libraries for GCC/MinGW (both for the Python and the CRT DLL) 
somewhat alleviates the problem, unless using VS is mandatory.


A long-term solution might be to expose the CRT used by the Python 2.7 
DLL with DLL forwarding. That way, linking with the Python DLL's import 
library would also link the correct CRT.


Sturla



Re: [Python-Dev] Slides from today's parallel/async Python talk

2013-03-21 Thread Sturla Molden
On 14 Mar 2013 at 23:23, Trent Nelson tr...@snakebite.org wrote:

 
For the record, here are all the Windows calls I'm using that have
no *direct* POSIX equivalent:
 
Interlocked singly-linked lists:
- InitializeSListHead()
- InterlockedFlushSList()
- QueryDepthSList()
- InterlockedPushEntrySList()
- InterlockedPushListSList()
- InterlockedPopEntrySlist()
 
Synchronisation and concurrency primitives:
- Critical sections
- InitializeCriticalSectionAndSpinCount()
- EnterCriticalSection()
- LeaveCriticalSection()
- TryEnterCriticalSection()
- Slim reader/writer locks (some pthread implementations have
  rwlocks):
- InitializeSRWLock()
- AcquireSRWLockShared()
- AcquireSRWLockExclusive()
- ReleaseSRWLockShared()
- ReleaseSRWLockExclusive()
- TryAcquireSRWLockExclusive()
- TryAcquireSRWLockShared()
- One-time initialization:
- InitOnceBeginInitialize()
- InitOnceComplete()
- Generic event, signalling and wait facilities:
- CreateEvent()
- SetEvent()
- WaitForSingleObject()
- WaitForMultipleObjects()
- SignalObjectAndWait()
 
Native thread pool facilities:
- TrySubmitThreadpoolCallback()
- StartThreadpoolIo()
- CloseThreadpoolIo()
- CancelThreadpoolIo()
- DisassociateCurrentThreadFromCallback()
- CallbackMayRunLong()
- CreateThreadpoolWait()
- SetThreadpoolWait()
 
Memory management:
- HeapCreate()
- HeapAlloc()
- HeapDestroy()
 
Structured Exception Handling (#ifdef Py_DEBUG):
- __try/__except
 
Sockets:
- ConnectEx()
- AcceptEx()
- WSAEventSelect(FD_ACCEPT)
- DisconnectEx(TF_REUSE_SOCKET)
- Overlapped WSASend()
- Overlapped WSARecv()
 
 
Don't get me wrong, I grew up with UNIX and love it as much as the
next guy, but you can't deny the usefulness of Windows' facilities
for writing high-performance, multi-threaded IO code.  It's decades
ahead of POSIX.  (Which is also why it bugs me when I see select()
being used on Windows, or IOCP being used as if it were a poll-type
generic IO multiplexor -- that's like having a Ferrari and speed
limiting it to 5mph!)
 
So, before any of this has a chance of working on Linux/BSD, a lot
more scaffolding will need to be written to provide the things we
get for free on Windows (threadpools being the biggest freebie).
 
 
 


Have you considered using OpenMP instead of Windows API or POSIX threads 
directly? OpenMP gives you a thread pool and synchronization primitives for 
free as well, with no special code needed for Windows or POSIX. 

OpenBLAS (and GotoBLAS2) uses OpenMP to provide a thread pool on POSIX
systems (and actually the Windows API on Windows). The OpenMP portion of
the C code is wrapped so it looks like sending an async task to a thread
pool; the C code is not littered with OpenMP pragmas. If you need something
like Windows thread pools on POSIX, just look at the BSD-licensed OpenBLAS
code. It is written to be scalable for the world's largest supercomputers
(but also beautifully written and very easy to read).
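As an illustration of how little code that takes, here is a sketch (not
OpenBLAS's actual code; pool_map and work are made-up names) of hiding
OpenMP behind a plain "run these tasks on the pool" call, so no pragmas
leak into the rest of the code base:

    #include <stdio.h>
    #include <omp.h>

    typedef void (*task_fn)(void *arg);

    /* Run fn(args[0]) ... fn(args[ntasks-1]) on the OpenMP thread pool.
     * Compile with -fopenmp (GCC) or /openmp (MSVC). */
    static void pool_map(task_fn fn, void **args, int ntasks)
    {
    #pragma omp parallel for schedule(dynamic)
        for (int i = 0; i < ntasks; i++)
            fn(args[i]);
    }

    static void work(void *arg)
    {
        printf("task %d on thread %d\n", *(int *)arg, omp_get_thread_num());
    }

    int main(void)
    {
        int ids[4] = {0, 1, 2, 3};
        void *args[4] = {&ids[0], &ids[1], &ids[2], &ids[3]};
        pool_map(work, args, 4);
        return 0;
    }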

Cython has code to register OpenMP threads as Python threads, in case that is 
needed. So that problem is also solved.


Sturla



Re: [Python-Dev] ctypes is not an acceptable implementation strategy for modules in the standard library?

2012-11-05 Thread Sturla Molden


On 05.11.2012 15:14, Xavier Morel wrote:

 Such as segfaulting the interpreter. I seem to reliably segfault
 everything every time I try to use ctypes.

You can do that with C extensions too, by the way. Apart from that,
dependency on an ABI is more annoying to maintain across platforms than
dependency on an API. Function calls with ctypes are also very slow. For C
extensions in the stdlib, Cython might be a better choice than ctypes.


ctypes might be a good choice if you are to use a DLL on your own 
computer. Because then you only have one ABI to worry about. Not so for 
Python's standard library.



Sturla


[Python-Dev] possible bug in distutils (Mingw32CCompiler)?

2012-05-24 Thread Sturla Molden


Mingw32CCompiler in cygwincompiler.py emits the flag -mno-cygwin.

This is used to make Cygwin's gcc behave as mingw. As of gcc 4.6 it is
not recognized by the mingw gcc compiler itself, and causes a crash. It
should be removed because it is never needed for mingw (in any version),
only for cross-compilation to mingw from other gcc versions.


Instead, those who use CygwinCCompiler or Linux GCC to cross-compile 
to plain Win32 can set -mno-cygwin manually. It also means -mcygwin 
should be removed from the output of CygwinCCompiler.


I think...


Sturla





Re: [Python-Dev] GIL removal question

2011-08-17 Thread Sturla Molden

On 10.08.2011 13:43, Guido van Rossum wrote:

They have a specific plan, based on Software Transactional Memory:
http://morepypy.blogspot.com/2011/06/global-interpreter-lock-or-how-to-kill.html



Microsoft's experiment to use STM in .NET failed though. And Linux got 
rid of the BKL without STM.


There is a similar but simpler paradigm called bulk synchronous
parallel (BSP) which might work too. Threads work independently for a
particular amount of time with private objects (e.g. copy-on-write
memory), then enter a barrier, changes to global objects are
synchronized and the GC collects garbage, after which worker threads
leave the barrier, and the cycle repeats.


To communicate changes to shared objects between synchronization 
barriers, Python code must use explicit locks and flush statements. But 
for the C code in the interpreter, BSP should give the same atomicity 
for Python bytecodes as the GIL  (there is just one active thread inside 
the barrier).


BSP is much simpler to implement than STM because of the barrier
synchronization. BSP also cannot deadlock or livelock. And because
threads in BSP work with private memory, there will be no thrashing
(false sharing) from the reference counting GC.
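A minimal sketch of one BSP superstep with POSIX threads (compile with
-pthread): everybody computes on private data, meets at a barrier, exactly
one elected thread synchronizes the shared state (this is where a GC pass
would go), and then all threads leave together:

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4
    static pthread_barrier_t barrier;
    static int shared_total;               /* touched only inside barrier */
    static int private_work[NTHREADS];     /* one slot per thread */

    static void *worker(void *arg)
    {
        int id = *(int *)arg;
        for (int step = 0; step < 3; step++) {
            private_work[id] += id + 1;    /* compute on private data */
            if (pthread_barrier_wait(&barrier) ==
                    PTHREAD_BARRIER_SERIAL_THREAD) {
                /* exactly one thread runs this: sync global objects */
                shared_total = 0;
                for (int i = 0; i < NTHREADS; i++)
                    shared_total += private_work[i];
            }
            pthread_barrier_wait(&barrier);   /* leave the barrier */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NTHREADS];
        int ids[NTHREADS];
        pthread_barrier_init(&barrier, NULL, NTHREADS);
        for (int i = 0; i < NTHREADS; i++) {
            ids[i] = i;
            pthread_create(&t[i], NULL, worker, &ids[i]);
        }
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        printf("total: %d\n", shared_total);   /* 3*(1+2+3+4) = 30 */
        pthread_barrier_destroy(&barrier);
        return 0;
    }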


Sturla



Re: [Python-Dev] GIL removal question

2011-08-13 Thread Sturla Molden

On 13.08.2011 17:43, Antoine Pitrou wrote:

These days we have PyGILState_Ensure():
http://docs.python.org/dev/c-api/init.html#PyGILState_Ensure


With the most recent Cython (0.15) we can just do:

    with gil:
        suite

to ensure holding the GIL.

And similarly from a thread holding the GIL

    with nogil:
        suite

to temporarily release it.
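For comparison, the plain C API spelling of "with gil", for code running
on a thread the interpreter has never seen (on_event is a made-up callback,
and the interpreter is assumed to be initialized):

    #include <Python.h>

    void on_event(const char *msg)
    {
        PyGILState_STATE gstate = PyGILState_Ensure();  /* take the GIL */
        PySys_WriteStdout("event: %s\n", msg);
        PyGILState_Release(gstate);                     /* give it back */
    }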

There is also some OpenMP support in Cython 0.15. OpenMP is much easier
than messing around with threads manually (it moves all the hard parts
of multithreading to the compiler). Now Cython almost makes it look
Pythonic:


http://docs.cython.org/src/userguide/parallelism.html


Sturla




Re: [Python-Dev] GIL removal question

2011-08-12 Thread Sturla Molden

On 12.08.2011 18:51, Xavier Morel wrote:
* Erlang uses erlang processes, which are very cheap preempted 
*processes* (no shared memory). There have always been tens to 
thousands to millions of erlang processes per interpreter source 
contention within the interpreter going back to pre-SMP by setting the 
number of schedulers per node to 1 can yield increased overall 
performances) 


Technically, one can make threads behave like processes if they don't 
share memory pages (though they will still share address space). Erlang's 
use of 'process' instead of 'thread' does not mean an Erlang process has 
to be implemented as an OS process. With one interpreter per thread, and 
a malloc that does not let threads share memory pages (one heap per 
thread), Python could do the same.


On Windows, there is an API function called HeapAlloc, which lets us 
allocate memory from a dedicated heap. The common use case is to prevent 
threads from sharing memory, thus behaving like light-weight processes 
(except address space is shared). On Unix, it is more common to use 
fork() to create new processes instead, as processes are more 
light-weight than on Windows.
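
A rough ctypes sketch of that Windows mechanism (HeapCreate, HeapAlloc, 
HeapFree and HeapDestroy are the real Win32 calls; the sizes are 
illustrative, and this is Windows-only):

import ctypes
from ctypes import wintypes

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
kernel32.HeapCreate.restype = wintypes.HANDLE
kernel32.HeapCreate.argtypes = [wintypes.DWORD, ctypes.c_size_t, ctypes.c_size_t]
kernel32.HeapAlloc.restype = wintypes.LPVOID
kernel32.HeapAlloc.argtypes = [wintypes.HANDLE, wintypes.DWORD, ctypes.c_size_t]
kernel32.HeapFree.restype = wintypes.BOOL
kernel32.HeapFree.argtypes = [wintypes.HANDLE, wintypes.DWORD, wintypes.LPVOID]
kernel32.HeapDestroy.restype = wintypes.BOOL
kernel32.HeapDestroy.argtypes = [wintypes.HANDLE]

heap = kernel32.HeapCreate(0, 0, 0)       # a private, growable heap
mem = kernel32.HeapAlloc(heap, 0, 4096)   # 4 KiB taken from that heap only
kernel32.HeapFree(heap, 0, mem)
kernel32.HeapDestroy(heap)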


Sturla









Re: [Python-Dev] GIL removal question

2011-08-12 Thread Sturla Molden

On 12.08.2011 18:57, Rene Nejsum wrote:

My two danish kroner on GIL issues….

I think I understand the background and need for GIL. Without it 
Python programs would have been cluttered with lock/synchronized 
statements and C-extensions would be harder to write. Thanks to Sturla 
Molden for he's explanation earlier in this thread.


It doesn't seem I managed to explain it :(

Yes, C extensions would be cluttered with synchronization statements, 
and that is annoying. But that was not my point at all!


Even with fine-grained locking in place, a system using reference 
counting will not scale on a multi-processor computer. Cache-lines 
containing reference counts will become incoherent between the 
processors, causing a traffic jam on the memory bus.


The technical term in the parallel computing literature is "false sharing".


However, the GIL is also from a time, where single threaded programs 
running in single core CPU's was the common case.


On a new MacBook Pro I have 8 core's and would expect my multithreaded 
Python program to run significantly fast than on a one-core CPU.


Instead the program slows down to a much worse performance than on a 
one-core CPU.


A multi-threaded program can be slower on a multi-processor computer as 
well, if it suffers from extensive false sharing (which Python 
programs nearly always will do).


That is, instead of doing useful work, the processors are stepping on 
each others toes. So they spend the bulk of the time synchronizing cache 
lines with RAM instead of computing.


On a computer with a single processor, there cannot be any false 
sharing. So even without a GIL, a multi-threaded program can often run 
faster on a single-processor computer. That might seem counter-intuitive 
at first. I have seen this inverted scaling blamed on the GIL many 
times, but it is dead wrong.


Multi-threading is hard to get right, because the programmer must ensure 
that processors don't access the same cache lines. This is one of the 
reasons why numerical programs based on MPI (multiple processes and IPC) 
are likely to perform better than numerical programs based on OpenMP 
(multiple threads and shared memory).


As for Python, it means that it is easier to make a program based on 
multiprocessing scale well on a multi-processor computer, than a program 
based on threading and releasing the GIL. And that has nothing to do 
with the GIL! Albeit, I'd estimate 99% of Python programmers would blame 
it on the GIL. It has to do with what shared memory does if cache lines 
are shared. Intuition about what affects the performance of a 
multi-threaded program is very often wrong. If one needs parallel 
computing, multiple processes is much more likely to scale correctly. 
Threads are better reserved for things like non-blocking I/O.
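
A minimal sketch of the contrast, with an illustrative CPU-bound 
function (timing code omitted; on a multi-core machine the process pool 
scales, the threads do not):

import multiprocessing
import threading

def burn(n):
    s = 0
    for i in range(n):
        s += i * i
    return s

if __name__ == '__main__':
    N = 10**6
    # Threads: CPU-bound pure-Python work serializes on the GIL.
    ts = [threading.Thread(target=burn, args=(N,)) for _ in range(4)]
    for t in ts: t.start()
    for t in ts: t.join()
    # Processes: the same work runs on all cores.
    pool = multiprocessing.Pool(4)
    results = pool.map(burn, [N] * 4)
    pool.close(); pool.join()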


The problem with the GIL is merely what people think it does -- not what 
it actually does. It is so easy to blame a performance issue on the GIL, 
when it is actually the use of threads and shared memory per se that is 
the problem.


Sturla


Re: [Python-Dev] GIL removal question

2011-08-11 Thread Sturla Molden

On 09.08.2011 11:33, Марк Коренберг wrote:

Probably I want to re-invent a bicycle. I want developers to say me
why we can not remove GIL in that way:

1. Remove GIL completely with all current logick.
2. Add it's own RW-locking to all mutable objects (like list or dict)
3. Add RW-locks to every context instance
4. use RW-locks when accessing members of object instances

Only one reason, I see, not do that -- is performance of
singlethreaded applications. Why not to fix locking functions for this
4 cases to stubs when only one thread present?


This has been discussed to death before, and is probably OT to this list.

There is another reason than speed of single-threaded applications, but 
it is rather technical: As CPython uses reference counting for garbage 
collection, we would get false sharing of reference counts -- which 
would work as an invisible GIL (synchronization bottleneck) anyway. 
That is, if one processor writes to memory in a cache-line shared by 
another processor, they must stop whatever they are doing to synchronize 
the dirty cache lines with RAM. Thus, updating reference counts would 
flood the memory bus with traffic and be much worse than the GIL. 
Instead of doing useful work, the processors would be stuck 
synchronizing dirty cache lines. You can think of it as a severe traffic 
jam.


To get rid of the GIL, CPython would either need

(a) another GC method (e.g. similar to .NET or Java)

or

(b) another threading model (e.g. one interpreter per thread, as in Tcl, 
Erlang, or .NET app domains).


As CPython has neither, we are better off with the GIL.

Nobody likes the GIL, fork a project to write a GIL free CPython if you 
can. But note that:


1. With Cython, you have full manual control over the GIL. IronPython 
and Jython do not have a GIL at all.


2. Much of the FUD against the GIL is plain ignorance: The GIL slows 
down parallel computational code, but any serious number crunching 
should use numerical performance libraries (i.e. C extensions) anyway. 
Libraries are free to release the GIL or spawn threads internally. Also, 
the GIL does not matter for (a) I/O bound code such as network servers 
or clients and (b) background threads in GUI programs -- which are the 
two common use-cases for threads in Python programs. If the GIL bites 
you, it's most likely a warning that your program is badly written, 
independent of the GIL issue.


There seems to be a common misunderstanding that Python threads work 
like fibers due to the GIL. They do not! Python threads are native OS 
threads and can do anything a thread can do, including executing library 
code in parallel. If one thread is blocking on I/O, the other threads 
can continue with their business.
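
A small sketch of that point: time.sleep blocks in the OS with the GIL 
released, so four "blocking" threads finish in about one second, not 
four (the sleep stands in for any blocking I/O call):

import threading
import time

def blocking_io():
    time.sleep(1)        # the GIL is released while this thread blocks

t0 = time.time()
threads = [threading.Thread(target=blocking_io) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print("elapsed: %.1f s" % (time.time() - t0))   # ~1.0, not ~4.0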


The only thing Python threads cannot do is access the Python interpreter 
concurrently. And the reason CPython needs that restriction is reference 
counting.


Sturla




Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread Sturla Molden

On 24.05.2011 11:55, Artur Siekielski wrote:


PYRO/multiprocessing proxies isn't a comparable solution because of
ORDERS OF MAGNITUDE worser performance. You compare here direct memory
access vs serialization/message passing through sockets/pipes.


The bottleneck is likely the serialization, but only if you serialize 
large objects. IPC is always very fast, at least on localhost.


Just out of curiosity, have you considered using a database? Sqlite and 
BSD DB can even be put in shared memory if you want. It sounds like you 
are trying to solve a database problem using os.fork, something which is 
more or less doomed to fail (i.e. you have to replicate all effort put 
into scaling up databases). If a database is too slow, I am rather sure 
you need something else than Python as well.


Sturla


Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread Sturla Molden

On 24.05.2011 13:31, Maciej Fijalkowski wrote:


Not sure what scenario exactly are you discussing here, but storing
reference counts outside of objects has (at least on a single
processor) worse cache locality than inside objects.



Artur Siekielski is not talking about cache locality, but copy-on-write 
fork on Linux et al.


When reference counts are updated after forking, memory pages marked 
copy-on-write are copied if they store reference counts. And then he 
quickly runs out of memory. He wants to put reference counts and 
PyObjects in different pages, so only the pages with reference counts 
get copied.


I don't think he cares about cache locality at all, but the rest of us 
do :-)



Sturla







Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread Sturla Molden

On 24.05.2011 11:55, Artur Siekielski wrote:


POSH might be good, but the project is dead for 8 years. And this
copy-on-write is nice because you don't need changes/restrictions to
your code, or a special garbage collector.


Then I have a solution for you, one that is cheaper than anything else 
you are trying to do (taking work hours into account):


BUY MORE RAM!

RAM is damn cheap. You just need more of it. And 64-bit Python :-)


Sturla



Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread Sturla Molden

On 24.05.2011 17:39, Artur Siekielski wrote:


Disk access is about 1000x slower than memory access in C, and Python
in a worst case is 50x slower than C, so there is still a huge win
(not to mention that in a common case Python is only a few times
slower).


You can put databases in shared memory (e.g. Sqlite and BSDDB have options
for this). On Linux you can also mount /dev/shm as a ramdisk. Also, why
not trust the database developers at Oracle et al. to have done the
sufficient optimizations?
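
For example, a minimal sketch of the ramdisk variant (the path and 
schema are illustrative; /dev/shm is a tmpfs mount on most Linux 
systems):

import sqlite3

# The database file lives in RAM but is visible to every process on the host.
conn = sqlite3.connect("/dev/shm/cache.sqlite")
conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
conn.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", ("answer", "42"))
conn.commit()
print(conn.execute("SELECT v FROM kv WHERE k = ?", ("answer",)).fetchone())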

Sturla




Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-23 Thread Sturla Molden

On 23.05.2011 06:59, Martin v. Löwis wrote:


My expectation is that your approach would likely make the issues
worse in a multi-CPU setting. If you put multiple reference counters
into a contiguous block of memory, unrelated reference counters will
live in the same cache line. Consequentially, changing one reference
counter on one CPU will invalidate the cached reference counters of
that cache line on other CPU, making your problem a) actually worse.


In a multi-threaded setting with concurrent threads accessing reference 
counts, this would certainly worsen the situation.


In a single-threaded setting, this will likely be an improvement.

CPython, however, has a GIL. Thus there is only one concurrently active 
thread with access to reference counts. On a thread switch in the 
interpreter, I think the performance result will depend on the nature of 
the Python code: If threads share a lot of objects, it could help to 
reduce the number of dirty cache lines. If threads mainly work on 
private objects, it would likely have the effect you predict. Which will 
dominate is hard to tell.


Instead, we could use multiple heaps:

Each Python thread could manage its own heap for malloc and free (cf. 
HeapAlloc and HeapFree in Windows). Objects local to one thread only 
reside in the locally managed heap.


When an object becomes shared by several Python threads, it is moved 
from a local heap to the global heap of the process. Some objects, such 
as modules, would be stored directly onto the global heap.


This way, objects only used by only one thread would never dirty cache 
lines used by other threads.


This would also be a way to reduce the CPython dependency on the GIL. 
Only the global heap would need to be protected by the GIL, whereas the 
local heaps would not need any global synchronization.



(I am setting follow-up to the Python Ideas list, it does not belong on 
Python dev.)


Sturla Molden


Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-23 Thread Sturla Molden

On 24.05.2011 00:07, Artur Siekielski wrote:


Oh, and using explicit shared memory or mmap is much harder, because
you have to map the whole object graph into bytes.


It sounds like you need PYRO, POSH or multiprocessing's proxy objects.

Sturla


Re: [Python-Dev] Bugs in thread_nt.h

2011-03-10 Thread Sturla Molden

On 10.03.2011 11:06, Scott Dial wrote:

http://www.kernel.org/doc/Documentation/volatile-considered-harmful.txt





The important part here (forgive me for being a pedant) is that (1) 
register allocation of the 'owned' field is actually unwanted, and (2) 
Microsoft specifies 'volatile' in calls to the Interlocked* API functions.


I am sorry for misreading 32 bits (4 bytes) as 32 bytes. That is 
obviously very different. If Microsoft's malloc is sufficient, why does 
MSDN tell us to use _aligned_malloc instead of malloc?



The rest is a comment to the link and a bit OT:

Their argument is that (1) volatile suppresses optimization; (2) within a 
critical section, access to shared data is synchronized; and thus (3) 
volatile is unnecessary there.


That is true, to some extent.

Volatile is not a memory barrier, nor a mutex. Volatile's main purpose 
is to prevent the compiler from storing a variable in a register. 
Volatile might be used incorrectly if we don't understand this.


Obvious use cases for volatile are:

- Implementation of a spinlock, where register allocation is detrimental.
- A buffer that is filled from the outside with some DMA mechanism.
- Real-time programs and games where order of execution and timing 
is critical, so optimization must be suppressed.


Even though volatile is not needed for processing within a critical 
section, we still need the shared data to be re-loaded upon entering and 
re-written upon leaving. We can use a typecast to non-volatile within a 
critical section, and achieve both data consistency and compiler 
optimization. OpenMP has a better mechanism for this, where a 
flush directive (#pragma omp flush) will force a synchronization of shared 
data among threads (write and reload), and volatile is never needed for 
consistency of shared data.


Sturla








[Python-Dev] Bugs in thread_nt.h

2011-03-09 Thread Sturla Molden


Atomic operations (InterlockedCompareExchange, et al.) are used on the 
field 'owned' in NRMUTEX. These functions require the memory to be aligned 
on 32-byte boundaries. They also require the volatile qualifier. Three 
small changes are therefore needed (see below).



Regards,
Sturla Molden





typedef struct NRMUTEX {
volatile LONG   owned ;  /* Bugfix: remember volatile */
DWORD  thread_id ;
HANDLE hevent ;
} NRMUTEX, *PNRMUTEX;


PNRMUTEX
AllocNonRecursiveMutex(void)
{
    PNRMUTEX mutex = (PNRMUTEX)_aligned_malloc(sizeof(NRMUTEX),32) ; /* Bugfix: align to 32 bytes */

    if (mutex && !InitializeNonRecursiveMutex(mutex))
    {
        _aligned_free(mutex) ; /* must pair with _aligned_malloc */
        mutex = NULL ;
    }
    return mutex ;
}

void
FreeNonRecursiveMutex(PNRMUTEX mutex)
{
if (mutex)
{
DeleteNonRecursiveMutex(mutex) ;
_aligned_free(mutex) ; /* Bugfix: align to 32-bytes */
}
}









Re: [Python-Dev] Bugs in thread_nt.h

2011-03-09 Thread Sturla Molden

On 10.03.2011 03:02, Mark Hammond wrote:
These issues are best put in the tracker so they don't get lost - 
especially at the moment with lots of regulars at pycon.



 Ok, sorry :-)


It would also be good to know if there is an actual behaviour bug 
caused by this (ie, what problems can be observed which are caused by 
the current code?)


None that I have observed, but this is required according to MSDN.

Theoretically, an optimizing compiler could cache the 'owned' field if 
it's not declared volatile. It currently works because a wait on the 
lock is implemented with a WaitForSingleObject on a kernel event object 
when the wait flag is set. If the wait mechanism is changed to a much 
less expensive user-space spinlock, just releasing the time-slice by 
Sleep(0) for each iteration, it will certainly fail without a volatile 
qualifier.


As for InterlockedCompareExchange et al., MSDN says this: "The 
parameters for this function must be aligned on a 32-bit boundary; 
otherwise, the function will behave unpredictably on multiprocessor x86 
systems and any non-x86 systems. See _aligned_malloc."


Well, it does not hurt to obey :-)

Regards,
Sturla





Re: [Python-Dev] mingw support?

2010-08-13 Thread Sturla Molden

 The problem really is that when people ask for MingW support, they mean
 all kinds of things,

Usually it means they want to build C or C++ extensions, don't have Visual
Studio, don't know about the SDK compiler, and have misunderstood the CRT
problem.

As long as Python builds with the free Windows 7 SDK, I think it is
sufficient that mingw is supported for extensions (and the only reasons
for selecting mingw over Microsoft C/C++ on Windows are Fortran and C99 --
the Windows SDK compiler is a free download as well.)

Enthought (32-bit) ships with a mingw gcc compiler configured to build
extensions. That might be something to consider for Python on Windows. It
will prevent accidental linking with wrong libraries (particularly the
CRT).

Sturla







Re: [Python-Dev] mingw support?

2010-08-13 Thread Sturla Molden
Cesare Di Mauro:

 I like to use Windows because it's a comfortable and productive
 environment,
 certainly not because someone forced me to use it.

 Also, I have limited time, so I want to spend it the better I can,
 focusing
 on solving real problems. Setup, Next, Next, Finish, and I want it working
 without thinking about anything else.
[...]
 Give users a better choice, and I don't see logical reasons because
 they'll
 not change their mind.


I use Windows too, even though I am a scientist and most people seem to
prefer Linux for scientific computing. I do like to just click on the
installer from Enthought and forget about building all the binaries and
libraries myself. Maybe I am just lazy...

But likewise, I think that most Windows users don't care which C compiler
was used to build Python. Nor do we/they care which compiler was used to
build any other third-party software, as long as the MSI installers work
and the binaries are free of malware.

Also note that there are non-standard things on Windows that mingw does
not support properly, such as COM and structured exceptions. Extensions
like pywin32 depend on Microsoft C/C++ for that reason.

So for Windows I think it is sufficient to support mingw for extension
libraries. The annoying part is the CRT DLL hell, which is the fault of
Microsoft. An easy fix would be a Python/mingw bundle, or a correctly
configured mingw compiler from python.org. Or Python devs could consider
not using Microsoft's CRT at all on Windows, and replacing it with a
custom CRT or plain Windows API calls.

Sturla











Re: [Python-Dev] mingw support?

2010-08-11 Thread Sturla Molden

David Cournapeau:
 Autotools only help for posix-like platforms. They are certainly a big
 hindrance on windows platform in general,

That is why mingw has MSYS.

mingw is not just a gcc port, but also a miniature gnu environment for
windows. MSYS' bash shell allows us to do things like:

$ ./configure
$ make && make install



Sturla



Re: [Python-Dev] mingw support?

2010-08-09 Thread Sturla Molden
 Terry Reedy:

MingW has become less attractive in recent years by the difficulty
 in downloading and installing a current version and finding out how to
 do so. Some projects have moved on to the TDM packaging of MingW.

 http://tdm-gcc.tdragon.net/


MinGW has become a mess. Equation.com used to have a decent installer, but
at some point they started to ship mingw builds with a Trojan. TDM looks
OK for now.

Building 32-bit Python extension works with MinGW. 64-bit extensions are
not possible due to lacking import libraries (no libmsvcr90.a and
libpython26.a for amd64). It is not possible to build Python with mingw,
only extensions.

I think it is possible to build Python with Microsoft's SDK compiler, as
it has nmake. The latest is Windows 7 SDK for .NET 4, but we need the
version for .NET 3.5 to maintain CRT compatibility with current Python
releases.

Python's distutils do not work with the SDK compiler, only Visual Studio.
Building Python extensions with the SDK compiler is not as easy as it
could (or should) be.

One advantage of mingw for scientific programmers (who are frequent users
of Python) is the gfortran compiler. Although it is not as capable as
Absoft or Intel Fortran, it is still decent and can be used with f2py.
This makes the lack of 64-bit support for Python extensions with mingw
particularly annoying. Microsoft's SDK does not have a Fortran compiler,
and commercial versions are very expensive (though I prefer to pay for
Absoft anyway).

I do not wish for a complete build process for mingw. But support for
64-bit extensions with mingw and distutils support for Microsoft's SDK
compiler would be nice.

Sturla



Re: [Python-Dev] mingw support?

2010-08-09 Thread Sturla Molden

 Please understand that this very choice is there already.

 That's great. Is that what the documentation refers to when it says

 
 MSVCCompiler will normally choose the right compiler, linker etc. on its
 own. To override this choice, the environment variables
 DISTUTILS_USE_SDK and MSSdk must be both set. MSSdk indicates that the
 current environment has been setup by the SDK's SetEnv.Cmd script, or
 that the environment variables had been registered when the SDK was
 installed; DISTUTILS_USE_SDK indicates that the distutils user has made
 an explicit choice to override the compiler selection by MSVCCompiler.
 

 That isn't particularly clear to me, but it may be to others more
 familiar with building on Windows.


Ahh...

"MSSdk must be set" typically means we must use the Windows 7 SDK command
prompt.

Without DISTUTILS_USE_SDK, the build fails:


C:\DEVELOPMENT\test-distutils>setup.py build_ext
running build_ext
building 'test' extension
Traceback (most recent call last):
  File "C:\DEVELOPMENT\test-distutils\setup.py", line 6, in <module>
    ext_modules=[Extension('test', ['test.c'])],
  File "C:\Python26\lib\distutils\core.py", line 152, in setup
    dist.run_commands()
  File "C:\Python26\lib\distutils\dist.py", line 975, in run_commands
    self.run_command(cmd)
  File "C:\Python26\lib\distutils\dist.py", line 995, in run_command
    cmd_obj.run()
  File "C:\Python26\lib\distutils\command\build_ext.py", line 340, in run
    self.build_extensions()
  File "C:\Python26\lib\distutils\command\build_ext.py", line 449, in build_extensions
    self.build_extension(ext)
  File "C:\Python26\lib\distutils\command\build_ext.py", line 499, in build_extension
    depends=ext.depends)
  File "C:\Python26\lib\distutils\msvc9compiler.py", line 449, in compile
    self.initialize()
  File "C:\Python26\lib\distutils\msvc9compiler.py", line 359, in initialize
    vc_env = query_vcvarsall(VERSION, plat_spec)
  File "C:\Python26\lib\distutils\msvc9compiler.py", line 275, in query_vcvarsall
    raise ValueError(str(list(result.keys())))
ValueError: [u'path', u'include', u'lib']



Now let's do what the documentations says:

C:\DEVELOPMENT\test-distutils>set DISTUTILS_USE_SDK=1

C:\DEVELOPMENT\test-distutils>setup.py build_ext
running build_ext
building 'test' extension
creating build
creating build\temp.win-amd64-2.6
creating build\temp.win-amd64-2.6\Release
C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\Bin\amd64\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -IC:\Python26\include -IC:\Python26\PC /Tctest.c /Fobuild\temp.win-amd64-2.6\Release\test.obj
test.c
creating build\lib.win-amd64-2.6
C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\Bin\amd64\link.exe /DLL /nologo /INCREMENTAL:NO /LIBPATH:C:\Python26\libs /LIBPATH:C:\Python26\PCbuild\amd64 /EXPORT:inittest build\temp.win-amd64-2.6\Release\test.obj /OUT:build\lib.win-amd64-2.6\test.pyd /IMPLIB:build\temp.win-amd64-2.6\Release\test.lib /MANIFESTFILE:build\temp.win-amd64-2.6\Release\test.pyd.manifest
test.obj : warning LNK4197: export 'inittest' specified multiple times; using first specification
   Creating library build\temp.win-amd64-2.6\Release\test.lib and object build\temp.win-amd64-2.6\Release\test.exp
C:\Program Files\Microsoft SDKs\Windows\v7.0\Bin\x64\mt.exe -nologo -manifest build\temp.win-amd64-2.6\Release\test.pyd.manifest -outputresource:build\lib.win-amd64-2.6\test.pyd;2


:-D


Sturla



Re: [Python-Dev] mingw support?

2010-08-09 Thread Sturla Molden

 At one point Mike Fletcher published a patch to make distutils use the
 SDK compiler. It would make a lot of sense if it were built in to
 distutils as a further compiler choice.

 Please understand that this very choice is there already.


Yes you are right. I did not know about DISTUTILS_USE_SDK.

Sturla



Re: [Python-Dev] Reworking the GIL

2009-11-02 Thread Sturla Molden

Martin v. Löwis wrote:

b) notice that, on Windows, minimum wait resolution may be as large as
   15ms (e.g. on XP, depending on the hardware). Not sure what this
   means for WaitForMultipleObjects; most likely, if you ask for a 5ms
   wait, it waits until the next clock tick. It would be bad if, on
   some systems, a wait of 5ms would mean that it immediately returns.
  

Which is why one should use multimedia timers with QPC on Windows.

To get a wait function with much better resolution than Windows' 
default, do this:


1. Set a high resolution with timeBeginPeriod.

2. Loop using a time-out of 0 for WaitForMultipleObjects and put a 
Sleep(0) in the loop not to burn the CPU. Call QueryPerformanceCounter 
to get a precise timing, and break the loop when the requested time-out 
has been reached.


3. When you are done, call timeEndPeriod to turn the multimedia timer off.

This is how you create usleep() in Windows as well: Just loop on 
QueryPerformanceCounter and Sleep(0) after setting timeBeginPeriod(1).
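
For the curious, the same recipe can be tried from Python via ctypes 
(Windows only; a rough sketch, not production code -- the helper name 
is mine):

import ctypes

winmm = ctypes.WinDLL("winmm")
kernel32 = ctypes.WinDLL("kernel32")

def usleep(us):
    """Sleep for approximately `us` microseconds."""
    winmm.timeBeginPeriod(1)                   # 1 ms scheduler granularity
    freq = ctypes.c_longlong()
    cnt = ctypes.c_longlong()
    kernel32.QueryPerformanceFrequency(ctypes.byref(freq))
    kernel32.QueryPerformanceCounter(ctypes.byref(cnt))
    deadline = cnt.value + us * freq.value // 1000000
    while cnt.value < deadline:
        kernel32.Sleep(0)                      # yield the rest of the time slice
        kernel32.QueryPerformanceCounter(ctypes.byref(cnt))
    winmm.timeEndPeriod(1)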









Re: [Python-Dev] Reworking the GIL

2009-11-02 Thread Sturla Molden

Martin v. Löwis wrote:

Maybe you should study the code under discussion before making such
a proposal.

I did, and it does nothing of what I suggested. I am sure I can make the 
Windows GIL in ceval_gil.h and the mutex in thread_nt.h a lot more 
precise and efficient.


This is the kind of code I was talking about, from ceval_gil.h:

r = WaitForMultipleObjects(2, objects, TRUE, milliseconds);

I would turn on the multimedia timer (it is not on by default), and 
replace this call with a loop, approximately like this:

for (;;) {
  r = WaitForMultipleObjects(2, objects, TRUE, 0);
  /* blah blah blah */
  QueryPerformanceCounter(&cnt);
  if (cnt > timeout) break;

  Sleep(0);
}

And the timeout milliseconds would now be computed from querying the 
performance counter, instead of unreliably by the Windows NT kernel.



Sturla












Re: [Python-Dev] Reworking the GIL

2009-11-02 Thread Sturla Molden

Sturla Molden wrote:


I would turn on the multimedia timer (it is not on by default), and 
replace this call with a loop, approximately like this:

for (;;) {
  r = WaitForMultipleObjects(2, objects, TRUE, 0);
  /* blah blah blah */
  QueryPerformanceCounter(&cnt);
  if (cnt > timeout) break;

  Sleep(0);
}


And just so you don't ask: There should not just be a Sleep(0) in the 
loop, but a sleep that gets shorter and shorter until a lower threshold 
is reached, where it skips to Sleep(0). That way we avoid hammering on 
WaitForMultipleObjects and QueryPerformanceCounter more than we need. 
And for all that to work better than just giving a timeout to 
WaitForMultipleObjects, we need the multimedia timer turned on.


Sturla





Re: [Python-Dev] Reworking the GIL

2009-11-02 Thread Sturla Molden

Sturla Molden wrote:


And just so you don't ask: There should not just be a Sleep(0) in the 
loop, but a sleep that gets shorter and shorter until a lower 
threshold is reached, where it skips to Sleep(0). That way we avoid 
hammering on WaitForMultipleObjects and QueryPerformanceCounter more 
than we need. And for all that to work better than just giving a 
timeout to WaitForMultipleObjects, we need the multimedia timer turned 
on.


The important thing about multimedia timer is that the granularity of 
Sleep() and WaitForMultipleObjects() by default is 10 ms or at most 20 
ms. But if we call


timeBeginPeriod(1);

the MM timer is on and granularity becomes 1 ms or at most 2 ms. But we 
can get even more precise than that by hammering on Sleep(0) for 
timeouts less than 2 ms. We can get typical granularity in the order of 
10 µs, with the occasional 100 µs now and then. I know this because I 
was using Windows 2000 to generate TTL signals on the LPT port some 
years ago, and watched them on the oscilloscope.


~ 15 ms granularity is Windows default. But that is brain dead.

By the way Antoine, if you think granularity of 1-2 ms is sufficient, 
i.e. no need for µs precision, then just calling timeBeginPeriod(1) will 
be sufficient.


Sturla










Re: [Python-Dev] Reworking the GIL

2009-11-02 Thread Sturla Molden

Antoine Pitrou wrote:

It certainly is.
But once again, I'm no Windows developer and I don't have a native Windows host
to test on; therefore someone else (you?) has to try.
  
I'd love to try, but I don't have VC++ to build Python, I use GCC on 
Windows.


Anyway, the first thing to try then is to call
 
  timeBeginPeriod(1);


once on startup, and leave the rest of the code as it is. If 2-4 ms is 
sufficient we can use timeBeginPeriod(2), etc. Microsoft claims Windows 
performs better with coarse timer granularity, which is why the default 
is 10 ms.



Sturla








Re: [Python-Dev] Reworking the GIL

2009-11-02 Thread Sturla Molden

Martin v. Löwis wrote:

I did, and it does nothing of what I suggested. I am sure I can make the
Windows GIL in ceval_gil.h and the mutex in thread_nt.h a lot more precise and
efficient.



Hmm. I'm skeptical that your code makes it more accurate, and I
completely fail to see that it makes it more efficient (by what
measurement of efficiency?)

Also, why would making it more accurate make it better? IIUC,
accuracy is completely irrelevant here, though efficiency
(low overhead) does matter.

  

This is the kind of code I was talking about, from ceval_gil.h:

r = WaitForMultipleObjects(2, objects, TRUE, milliseconds);

I would turn on the multimedia timer (it is not on by default), and
replace this call with a loop, approximately like this:

for (;;) {
  r = WaitForMultipleObjects(2, objects, TRUE, 0);
  /* blah blah blah */
  QueryPerformanceCounter(&cnt);
  if (cnt > timeout) break;
  Sleep(0);
}

And the timeout milliseconds would now be computed from querying the
performance counter, instead of unreliably by the Windows NT kernel.



Hmm. This creates a busy wait loop; if you add larger sleep values,
then it loses accuracy.

  
Actually a usleep looks like this, and the call to the wait function 
must go into the for loop. But no, it's not a busy sleep.


static int inited = 0;
static __int64 hz;
static double dhz;
const double sleep_granularity = 2.0E-3;

void usleep( long us )
{
    __int64 cnt, end;
    double diff;
    if (!inited) {
        timeBeginPeriod(1);
        QueryPerformanceFrequency((LARGE_INTEGER*)&hz);
        dhz = (double)hz;
        inited = 1;
    }
    QueryPerformanceCounter((LARGE_INTEGER*)&cnt);

    end = cnt + (__int64)(1.0E-6 * (double)(us) * dhz);
    for (;;) {
        QueryPerformanceCounter((LARGE_INTEGER*)&cnt);
        if (cnt >= end) break;
        diff = (double)(end - cnt)/dhz;
        if (diff > sleep_granularity)
            Sleep((DWORD)(1.0E3 * (diff - sleep_granularity))); /* diff is in seconds, Sleep takes ms */
        else
            Sleep(0);
    }
}



Why not just call timeBeginPeriod, and then rely on the higher clock
rate for WaitForMultipleObjects?

  

That is what I suggested when Antoine said 1-2 ms was enough.



Sturla






Re: [Python-Dev] 2.7 Release? 2.7 == last of the 2.x line?

2009-11-02 Thread Sturla Molden
I'd just like to mention that the scientific community is highly 
dependent on NumPy. As long as NumPy is not ported to Py3k, migration is 
out of the question. Porting NumPy is not a trivial issue. It might take 
a complete rewrite of the whole C base using Cython. NumPy's ABI is not 
even PEP 3118 compliant. Changing the ABI for Py3k might break extension 
code written for NumPy using C. And scientists tend to write CPU-bound 
routines in languages like C and Fortran, not Python, so that is a major 
issue as well. If we port NumPy to Py3k, everyone using NumPy will have 
to port their C code to the new ABI. There are a lot of people stuck with 
Python 2.x for this reason. It does not just affect individual 
scientists, but also large projects like IBM and CERN's Blue Brain and 
NASA's space telescope. So please, do not cancel 2.x support before we 
have ported NumPy, Matplotlib and most of their dependent extensions to 
Py3k. The community of scientists and engineers using Python is growing, 
but shutting down 2.x support might bring an end to that.


Sturla Molden




[Python-Dev] Integer behaviour in Python 2.6.4

2009-11-01 Thread Sturla Molden


Why does this happen?

>>> type(2**31-1)
<type 'long'>

It seems to have broken NumPy's RNG on Win32.








Re: [Python-Dev] Integer behaviour in Python 2.6.4

2009-11-01 Thread Sturla Molden

Curt Hagenlocher wrote:

Does that not happen on non-Windows platforms? 2**31 can't be
represented as a 32-bit signed integer, so it's automatically promoted
to a long.

  

Yes you are right.

I've now traced down the problem to an integer overflow in NumPy.

It seems to have this Pyrex code:

cdef long lo, hi, diff
[...]
diff = hi - lo - 1

:-D
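
For illustration, the wraparound is easy to reproduce from Python, with 
ctypes.c_int32 standing in for the 32-bit C long on Win32 (the values 
are chosen to match the worst case):

import ctypes

hi, lo = 2**31 - 1, -2**31          # full range of a 32-bit signed long
diff = ctypes.c_int32(hi - lo - 1)  # the Pyrex expression, truncated to 32 bits
print(diff.value)                   # -2, not the mathematically correct 4294967294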


Sturla




Re: [Python-Dev] Reworking the GIL

2009-10-26 Thread Sturla Molden

Antoine Pitrou wrote:

- priority requests, which is an option for a thread requesting the GIL
to be scheduled as soon as possible, and forcibly (rather than any other
threads). 
So Python threads become preemptive rather than cooperative? That would 
be great. :-)


time.sleep should generate a priority request to re-acquire the GIL; and 
so should all other blocking standard library functions with a time-out.



S.M.



Re: [Python-Dev] Reworking the GIL

2009-10-26 Thread Sturla Molden

Antoine Pitrou wrote:

- priority requests, which is an option for a thread requesting the GIL
to be scheduled as soon as possible, and forcibly (rather than any other
threads). T

Should a priority request for the GIL take a priority number?

- If two threads make a priority requests for the GIL, the one with the 
higher priority should get the GIL first. 

- If a thread with a low priority makes a priority request for the GIL, 
it should not be allowed to preempt (take the GIL away from) a 
higher-priority thread, in which case the priority request would be 
ignored. 

Related issue: Should Python threads have priorities? They are after all 
real OS threads.


S.M.



Re: [Python-Dev] time.clock() on windows

2009-10-23 Thread Sturla Molden

Kristján Valur Jónsson wrote:

Thanks, I'll take a look in that direction.

  

I have a suggestion, forgive me if I am totally ignorant. :-)

Sturla Molden



#include <windows.h>

union __reftime {
    double   us;
    __int64  bits;
};

static volatile union __reftime __ref_perftime, __ref_filetime;


double clock()
{
    __int64 cnt, hz, init;
    double us;
    union __reftime ref_filetime;
    union __reftime ref_perftime;
    for (;;) {

        ref_filetime.bits = __ref_filetime.bits;
        ref_perftime.bits = __ref_perftime.bits;
        if(!QueryPerformanceCounter((LARGE_INTEGER*)&cnt)) goto error;
        if(!QueryPerformanceFrequency((LARGE_INTEGER*)&hz)) goto error;
        us = ref_filetime.us + ((double)(100*cnt)/(double)hz -
             ref_perftime.us);

        /* verify that values did not change */
        init = InterlockedCompareExchange64((LONGLONG*)&__ref_filetime.bits,
                   (LONGLONG)ref_filetime.bits,
                   (LONGLONG)ref_filetime.bits);
        if (init != ref_filetime.bits) continue;
        init = InterlockedCompareExchange64((LONGLONG*)&__ref_perftime.bits,
                   (LONGLONG)ref_perftime.bits,
                   (LONGLONG)ref_perftime.bits);
        if (init == ref_perftime.bits) break;
    }
    return us;
error:
    /* only if there is no performance counter */
    return -1; /* or whatever */
}


int periodic_reftime_check()
{
    /* call this function at regular intervals, e.g. once every second */
    __int64 cnt1, cnt2, hz;
    FILETIME systime;
    if(!QueryPerformanceFrequency((LARGE_INTEGER*)&hz)) goto error;
    if(!QueryPerformanceCounter((LARGE_INTEGER*)&cnt1)) goto error;
    GetSystemTimeAsFileTime(&systime);
    __ref_filetime.us = (double)(((((__int64)(systime.dwHighDateTime)) << 32)
        | ((__int64)(systime.dwLowDateTime))) / 10);
    if(!QueryPerformanceCounter((LARGE_INTEGER*)&cnt2)) goto error;

    __ref_perftime.us = 50*(cnt1 + cnt2)/((double)hz);
    return 0;
error:
    /* only if there is no performance counter */
    return -1;
}












Re: [Python-Dev] time.clock() on windows

2009-10-23 Thread Sturla Molden
Sturla Molden wrote:

I have a suggestion, forgive me if I am totally ignorant. :-)

Ah, damn... Since there is a GIL, we don't need any of that crappy 
synchronization. And my code does not correct for the 20 ms time jitter 
in GetSystemTimeAsFileTime. Sorry!



S.M.





Re: [Python-Dev] GIL behaviour under Windows

2009-10-22 Thread Sturla Molden

Antoine Pitrou wrote:

This number lacks the elapsed time. 61 switches in one second is probably
enough, the same amount of switches in 10 or 20 seconds is too small (at least
for threads needing good responsivity, e.g. I/O threads).

Also, fair has to take into account the average latency and its relative
stability, which is why I wrote ccbench.
  


Since I am a scientist and statistics interests me, let's do this 
properly :-) Here is a suggestion:


_Py_Ticker is a circular variable. Thus, it can be transformed to an 
angle measured in radians, using:


  a = 2 * pi * _Py_Ticker / _Py_CheckInterval

With simultaneous measurements of a, check interval count x, and time y 
(µs), we can fit the multiple regression:


  y = b0 + b1*cos(a) + b2*sin(a) + b3*x + err

using some non-linear least squares solver. We can then extract all the 
statistics we need on interpreter latencies for ticks with and without 
periodic checks.
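
Although the model is non-linear in the angle a, it is linear in the 
coefficients b0..b3, so an ordinary least squares fit would do. A 
minimal sketch, assuming a, x and y are NumPy arrays of the 
simultaneous measurements described above (rcond=None requires a newer 
NumPy):

import numpy as np

def fit_latency_model(a, x, y):
    # design matrix for y = b0 + b1*cos(a) + b2*sin(a) + b3*x
    X = np.column_stack([np.ones_like(a), np.cos(a), np.sin(a), x])
    b, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
    return b    # b[0]..b[3] estimate b0..b3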


On a Python setup with many missed thread switches (pthreads according 
to D. Beazley), we could just extend the model to take into account 
successful and unsuccessful check intervals:


  y = b0 + b1*cos(a) + b2*sin(a) + b3*x1 + b4*x2 + err

where x1 is the number of successful thread switches and x2 the number 
of missed thread switches. But at least on Windows we can use the 
simpler model.


The reason why multiple regression is needed is that the record method 
of my GIL_Battle class is not called on every interpreter tick. I thus 
cannot measure precisely each latency, which I could have done with a 
direct hook into ceval.c. So statistics to the rescue. But on the bright 
side, it reduces the overhead of the profiler.


Would that help?


Sturla Molden



Re: [Python-Dev] [Python-ideas] Remove GIL with CAS instructions?

2009-10-22 Thread Sturla Molden

Phillip Sitbon wrote:

Some of this is more low-level. I did see higher performance when
using non-Event objects, although I have not had time to follow up and
do a deeper analysis. The GIL flashing problem with critical
sections can very likely be rectified with a call to Sleep(0) or
YieldProcessor() for those who are worried about it. 
For those who don't know what Sleep(0) on Windows does: It returns the 
remainder of the current time-slice back to the system if a thread with 
equal or higher priority is ready to run. Otherwise it does nothing.


GIL flashing is a serious issue if it happens often; with the current 
event-based GIL on Windows, it never happens (61 cases of GIL flash in 
100,000 periodic checks is as good as never).


S.M.




Re: [Python-Dev] GIL behaviour under Windows

2009-10-21 Thread Sturla Molden

Antoine Pitrou wrote:

(*) http://svn.python.org/view/sandbox/trunk/ccbench/
  

I've run it twice on my dual core machine. It hangs every time, but not in the
same place:
D:\pydev\python\trunk\PCbuild>python.exe \tmp\ccbench.py



Ah, you should report a bug then. ccbench is pure Python (and not
particularly evil Python), it shouldn't be able to crash the
interpreter.

  

It does not crash the interpreter, but it seems it can deadlock.

Here is what I get on a quad-core (Vista, Python 2.6.3).




D:\>ccbench.py
--- Throughput ---

Pi calculation (Python)

threads=1: 568 iterations/s.
threads=2: 253 ( 44 %)
threads=3: 274 ( 48 %)
threads=4: 283 ( 49 %)

regular expression (C)

threads=1: 510 iterations/s.
threads=2: 508 ( 99 %)
threads=3: 503 ( 98 %)
threads=4: 502 ( 98 %)

bz2 compression (C)

threads=1: 456 iterations/s.
threads=2: 892 ( 195 %)
threads=3: 1320 ( 289 %)
threads=4: 1743 ( 382 %)

--- Latency ---

Background CPU task: Pi calculation (Python)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 0 ms. (std dev: 0 ms.)
CPU threads=2: 0 ms. (std dev: 0 ms.)
CPU threads=3: 0 ms. (std dev: 0 ms.)
CPU threads=4: 0 ms. (std dev: 0 ms.)

Background CPU task: regular expression (C)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 37 ms. (std dev: 21 ms.)
CPU threads=2: 379 ms. (std dev: 175 ms.)
CPU threads=3: 625 ms. (std dev: 310 ms.)
CPU threads=4: 718 ms. (std dev: 381 ms.)

Background CPU task: bz2 compression (C)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 0 ms. (std dev: 0 ms.)
CPU threads=2: 0 ms. (std dev: 0 ms.)
CPU threads=3: 0 ms. (std dev: 0 ms.)
CPU threads=4: 1 ms. (std dev: 3 ms.)




D:\>ccbench.py
--- Throughput ---

Pi calculation (Python)

threads=1: 554 iterations/s.
threads=2: 400 ( 72 %)
threads=3: 273 ( 49 %)
threads=4: 231 ( 41 %)

regular expression (C)

threads=1: 508 iterations/s.
threads=2: 509 ( 100 %)
threads=3: 509 ( 100 %)
threads=4: 509 ( 100 %)

bz2 compression (C)

threads=1: 456 iterations/s.
threads=2: 897 ( 196 %)
threads=3: 1316 ( 288 %)

DEADLOCK





D:\>ccbench.py
--- Throughput ---

Pi calculation (Python)

threads=1: 559 iterations/s.
threads=2: 397 ( 71 %)
threads=3: 274 ( 49 %)
threads=4: 238 ( 42 %)

regular expression (C)

threads=1: 507 iterations/s.
threads=2: 499 ( 98 %)
threads=3: 505 ( 99 %)
threads=4: 495 ( 97 %)

bz2 compression (C)

threads=1: 455 iterations/s.
threads=2: 896 ( 196 %)
threads=3: 1320 ( 290 %)
threads=4: 1736 ( 381 %)

--- Latency ---

Background CPU task: Pi calculation (Python)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 0 ms. (std dev: 0 ms.)
CPU threads=2: 0 ms. (std dev: 0 ms.)
CPU threads=3: 0 ms. (std dev: 0 ms.)
CPU threads=4: 0 ms. (std dev: 0 ms.)

Background CPU task: regular expression (C)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 34 ms. (std dev: 21 ms.)
CPU threads=2: 358 ms. (std dev: 174 ms.)
CPU threads=3: 619 ms. (std dev: 312 ms.)
CPU threads=4: 742 ms. (std dev: 382 ms.)

Background CPU task: bz2 compression (C)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 0 ms. (std dev: 0 ms.)
CPU threads=2: 0 ms. (std dev: 0 ms.)
CPU threads=3: 0 ms. (std dev: 0 ms.)
CPU threads=4: 6 ms. (std dev: 13 ms.)











Re: [Python-Dev] GIL behaviour under Windows

2009-10-21 Thread Sturla Molden

Sturla Molden wrote:

does not crash the interpreter, but it seems it can deadlock.

Here is what I get con a quadcore (Vista, Python 2.6.3).



This is what I get with the affinity mask set to 3.

There are deadlocks happening at random locations in ccbench.py. It gets 
worse with affinity set to one processor.


Sturla






D:\>start /AFFINITY 3 /B ccbench.py

D:\>--- Throughput ---

Pi calculation (Python)

threads=1: 554 iterations/s.
threads=2: 257 ( 46 %)
threads=3: 272 ( 49 %)
threads=4: 280 ( 50 %)

regular expression (C)

threads=1: 501 iterations/s.
threads=2: 505 ( 100 %)
threads=3: 493 ( 98 %)
threads=4: 507 ( 101 %)

bz2 compression (C)

threads=1: 455 iterations/s.
threads=2: 889 ( 195 %)
threads=3: 1309 ( 287 %)
threads=4: 1710 ( 375 %)

--- Latency ---

Background CPU task: Pi calculation (Python)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 0 ms. (std dev: 0 ms.)
CPU threads=2: 0 ms. (std dev: 0 ms.)
CPU threads=3: 0 ms. (std dev: 0 ms.)
CPU threads=4: 0 ms. (std dev: 0 ms.)

Background CPU task: regular expression (C)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 40 ms. (std dev: 22 ms.)
CPU threads=2: 384 ms. (std dev: 179 ms.)
CPU threads=3: 618 ms. (std dev: 314 ms.)
CPU threads=4: 713 ms. (std dev: 379 ms.)

Background CPU task: bz2 compression (C)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 0 ms. (std dev: 0 ms.)
CPU threads=2: 0 ms. (std dev: 0 ms.)
CPU threads=3: 0 ms. (std dev: 3 ms.)
CPU threads=4: 0 ms. (std dev: 1 ms.)



Re: [Python-Dev] GIL behaviour under Windows

2009-10-21 Thread Sturla Molden

Antoine Pitrou wrote:

Kristján sent me a patch which I applied and is supposed to fix this.
Anyway, thanks for the numbers. The GIL does seem to fare a bit better (zero
latency with the Pi calculation in the background) than under Linux, although it
may be caused by the limited resolution of time.time() under Windows.

  


My criticism of the GIL on python-ideas was partly motivated by this:

http://blip.tv/file/2232410

However, David Beazley is not talking about Windows. Since the GIL is 
apparently not a mutex on Windows, it could behave differently. So I 
wrote a small script that constructs a GIL battle, and records how often 
a check interval results in a thread-switch or not. For monitoring check 
intervals, I used a small C extension to read _Py_Ticker from ceval.c. 
It is not declared static, so I could easily hack into it.


With two threads and a check interval of 100, only 61 of 100,000 check 
intervals failed to produce a thread-switch in the interpreter. I'd call 
that rather fair. :-)


And in case someone asks, the nthreads=1 case is just for verification.

S.M.




D:\>test.py
check interval = 1
nthreads=1, swiched=0, missed=100000
nthreads=2, swiched=57809, missed=42191
nthreads=3, swiched=91535, missed=8465
nthreads=4, swiched=99751, missed=249
nthreads=5, swiched=95839, missed=4161
nthreads=6, swiched=100000, missed=0

D:\>test.py
check interval = 10
nthreads=1, swiched=0, missed=100000
nthreads=2, swiched=99858, missed=142
nthreads=3, swiched=99992, missed=8
nthreads=4, swiched=100000, missed=0
nthreads=5, swiched=100000, missed=0
nthreads=6, swiched=100000, missed=0

D:\>test.py
check interval = 100
nthreads=1, swiched=0, missed=100000
nthreads=2, swiched=99939, missed=61
nthreads=3, swiched=100000, missed=0
nthreads=4, swiched=100000, missed=0
nthreads=5, swiched=100000, missed=0
nthreads=6, swiched=100000, missed=0

D:\>test.py
check interval = 1000
nthreads=1, swiched=0, missed=100000
nthreads=2, swiched=99999, missed=1
nthreads=3, swiched=100000, missed=0
nthreads=4, swiched=100000, missed=0
nthreads=5, swiched=100000, missed=0
nthreads=6, swiched=100000, missed=0


Re: [Python-Dev] GIL behaviour under Windows

2009-10-21 Thread Sturla Molden

Sturla Molden wrote:


However, David Beazley is not talking about Windows. Since the GIL is 
apparently not a mutex on Windows, it could behave differently. So I 
wrote a small script that constructs a GIL battle, and records how often 
a check interval results in a thread-switch or not. For monitoring 
check intervals, I used a small C extension to read _Py_Ticker from 
ceval.c. It is not declared static, so I could easily hack into it.


Anyway, if anyone wants to run a GIL battle, here is the code I used.

If it turns out the GIL is far worse with pthreads, as it is implemented 
with a mutex, it might be a good idea to reimplement it with an event 
object as it is on Windows.


Sturla Molden



In python:

from giltest import *
from time import clock
import threading
import sys


def thread(rank, battle, start):

    while not start.isSet():
        if rank == 0:
            start.set()

    try:
        while 1:
            battle.record(rank)
    except:
        pass



if __name__ == '__main__':

    sys.setcheckinterval(1000)

    print "check interval = %d" % sys.getcheckinterval()

    for nthreads in range(1,7):

        start = threading.Event()

        battle = GIL_Battle(100000)
        threads = [threading.Thread(target=thread, args=(i,battle,start))
                   for i in range(1,nthreads)]
        for t in threads:
            t.setDaemon(True)
            t.start()

        thread(0, battle, start)
        for t in threads: t.join()

        s,m = battle.report()

        print "nthreads=%d, swiched=%d, missed=%d" % (nthreads, s, m)



In Cython or Pyrex:


from exceptions import Exception

cdef extern from *:
    ctypedef int vint "volatile int"
    vint _Py_Ticker

class StopBattle(Exception):
   pass

cdef class GIL_Battle:

    """tests the fairness of the GIL"""

    cdef vint prev_tick, prev_rank, switched, missed
    cdef int trials

    def __cinit__(GIL_Battle self, int trials=100000):
        self.prev_tick = _Py_Ticker
        self.prev_rank = -1
        self.missed = 0
        self.switched = 0
        self.trials = trials

    def record(GIL_Battle self, int rank):
        if self.trials == self.switched + self.missed:
            raise StopBattle
        if self.prev_rank == -1:
            self.prev_tick = _Py_Ticker
            self.prev_rank = rank
        else:
            if _Py_Ticker > self.prev_tick:
                if self.prev_rank == rank:
                    self.missed += 1
                else:
                    self.switched += 1
                self.prev_tick = _Py_Ticker
                self.prev_rank = rank
            else:
                self.prev_tick = _Py_Ticker

    def report(GIL_Battle self):
        return int(self.switched), int(self.missed)










Re: [Python-Dev] PEP 3142: Add a while clause to generator expressions

2009-01-20 Thread Sturla Molden

On 1/20/2009 4:45 PM, Gerald Britton wrote:

OK, so your suggestion:

 g = (n for n in range(100) if n*n < 50 or raiseStopIteration())

really means: return n in the range 0-99 if n-squared is less than 50
or the function raiseStopIteration() returns True.

How would this get the generator to stop once n*n >= 50? It looks
instead like the first time around, StopIteration will be raised and
(presumably) the generator will terminate.



I still find it odd to invent new syntax for simple things like


import itertools

def quit(): raise StopIteration

gen = itertools.imap(lambda x: x if x <= 50 else quit(),
                     (i for i in range(100)))

for i in gen: print i
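
(For what it is worth, itertools already spells this pattern directly;
the following one-liner should be equivalent to the example above:)

import itertools

# stop drawing from the source as soon as the predicate fails
gen = itertools.takewhile(lambda x: x <= 50, (i for i in range(100)))

for i in gen: print i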




Sturla Molden


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 3142: Add a while clause to generator expressions

2009-01-19 Thread Sturla Molden

On 1/19/2009 6:51 PM, Terry Reedy wrote:

The other, posted by Steven Bethard, is that it fundamentally breaks the 
current semantics of abbreviating (except for iteration variable 
scoping) an 'equivalent' for loop. 


The proposed syntax would suggest that this should be legal as well:

for i in iterable while cond:
   blahblah

or perhaps:

while cond for i in iterable:
   blahblah
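
(Both are, of course, already expressible with an explicit break, so
the new syntax would buy very little; a sketch of the equivalent:)

for i in iterable:
    if not cond:
        break
    blahblah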


A while-for or for-while loop would be a novel invention, not seen in 
any other language I know of. I seriously doubt its usefulness, 
though...




Sturla Molden
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] The endless GIL debate: why not remove thread support instead?

2008-12-12 Thread Sturla Molden

On 12/12/2008 11:52 AM, Lennart Regebro wrote:


The use of threads for load balancing should be discouraged, yes. That
is not what they are designed for. Threads are designed to allow
blocking processes to go on in the background without blocking the
main process.


It seems that most programmers with a Java or Windows background don't 
understand this; hence the everlasting GIL debate.


With multiple interpreters - one interpreter per thread - this could 
still be accomplished. Let one interpreter block while another continues 
to work. Then the result of the blocking operation is messaged back. 
Multi-threaded C libraries could be used in the same way. But there 
would be no need for a GIL, because each interpreter would be a 
single-threaded compartment.


.NET has something similar in what it calls 'appdomains'.

I am not suggesting the removal of threads as such, but rather of the 
Java threading model. I just think it is a mistake to let multiple OS 
threads touch the same interpreter.
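
(For the blocking-I/O case, the messaging pattern is already available 
at thread level today; a minimal sketch with a queue, assuming Python 
2.x stdlib names:)

import threading, Queue, urllib2

def fetch(url, results):
    # the blocking read happens off the main thread; the GIL is
    # released while the socket waits
    results.put(urllib2.urlopen(url).read())

results = Queue.Queue()
threading.Thread(target=fetch,
                 args=("http://python.org", results)).start()
# ... the main thread keeps working ...
data = results.get()   # collect the messaged-back result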


Sturla Molden
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] The endless GIL debate: why not remove thread support instead?

2008-12-11 Thread Sturla Molden
Last month there was a discussion on Python-Dev regarding removal of
reference counting to remove the GIL. I hope you forgive me for continuing
the debate.

I think reference counting is a good feature. It prevents huge piles of
garbage from building up. It makes the interpreter run more smoothly. It
is not just important for games and multimedia applications, but also for
servers under high load. Python does not pause to look for garbage like
Java or .NET. It only pauses to look for dead reference cycles. The cycle
collector can be safely turned off temporarily; it can be turned off
completely if you do not create reference cycles. With Java and .NET, no
garbage is ever reclaimed except by the intermittent garbage collection.
Python always reclaims an object when its reference count drops to zero,
whether the GC is enabled or not. This makes Python programs well-behaved.
For this reason, I think removing reference counting is a genuinely bad
idea. Even if the GIL is evil, this remedy is even worse.
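
(To make the point concrete, the sketch below is all that the gc module
controls; refcounted objects are freed promptly either way:)

import gc

gc.disable()              # no pauses to hunt for reference cycles


class Node(object):
    pass

a, b = Node(), Node()
a.other, b.other = b, a   # a reference cycle: refcounts never reach zero
del a, b                  # with gc disabled, this pair is now leaked

gc.collect()              # an explicit collection reclaims the cycle
gc.enable()               # ... or turn the collector back on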

I am not a Python core developer; I am a research scientist who uses
Python because Matlab is (or used to be) a bad programming language,
albeit a good computing environment. As most people who have worked with
scientific computing know, there are better paradigms for concurrency
than threads. In particular, there are message-passing systems like MPI
and Erlang, and there are autovectorizing compilers for OpenMP and
Fortran 90/95. There are specialized LAPACK, BLAS and FFT libraries for
parallel computer architectures. There are fork-join systems like Cilk
and java.util.concurrent. Threads seem to be used only because mediocre
programmers don't know what else to use.

I genuinely think the use of threads should be discouraged. It leads to
code that is full of bugs and difficult to maintain - race conditions,
deadlocks, and livelocks are common pitfalls. Very few developers are
capable of implementing efficient load balancing by hand. Multi-threaded
programs tend to scale badly because they are badly written. If the GIL
discourages the abuse of threads, it serves a purpose, albeit being as
evil as the Linux kernel's BKL.

Python could be better off doing what Tcl does. Allow each process to
embed multiple interpreters, and run each interpreter in its own thread.
Implement a fast message-passing system between the interpreters (e.g.
copy-on-write by making communicated objects immutable), and Python would
be closer to Erlang than Java.

I thus think the main offenders are the thread and threading modules - not
the GIL. Without thread support in the interpreter, there would be no
threads. Without threads, there would be no need for a GIL. Both sources
of evil can be removed by just removing thread support from the Python
interpreter. In addition, it would make Python faster at executing linear
code. Just copy the concurrency model of Erlang instead of Java and get
rid of those nasty threads. In the meantime, I'll continue to experiment
with multiprocessing.
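
(A minimal sketch of that message-passing style with the stdlib
multiprocessing module: each worker is a separate interpreter in a
separate process, and communication goes over queues rather than
shared state:)

from multiprocessing import Process, Queue

def worker(inbox, outbox):
    # runs in its own interpreter and process; no GIL is shared
    for n in iter(inbox.get, None):   # None is the shutdown sentinel
        outbox.put(n * n)

if __name__ == '__main__':
    inbox, outbox = Queue(), Queue()
    p = Process(target=worker, args=(inbox, outbox))
    p.start()
    for n in range(10):
        inbox.put(n)
    inbox.put(None)                   # tell the worker to quit
    results = [outbox.get() for _ in range(10)]
    p.join()
    print results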

Removing reference counting to encourage the use of threads is like
shooting ourselves in the foot twice. That's my two cents on this issue.

There is another issue to note as well: if you can endure a 200x loss of
efficiency by using Python instead of Fortran, scalability on dual- or
quad-core processors may not be that important. Just move the bottlenecks
out of Python and you are much better off.


Regards,
Sturla Molden


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com