Re: [Numpy-discussion] Linker script, smaller source files and symbol visibility

2009-04-22 Thread David Cournapeau
On Wed, Apr 22, 2009 at 2:24 PM, Charles R Harris
charlesr.har...@gmail.com wrote:



 The list of visible symbols has grown ;)

Yes. Except for PyArray_DescrHash, which is my own mistake, there is
nothing we can do at the moment about the npy_* symbols, because they
come from a pure C (static) library. That's one of the rationales in the
original email :)

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Linker script, smaller source files and symbol visibility

2009-04-21 Thread Charles R Harris
On Mon, Apr 20, 2009 at 11:06 PM, Charles R Harris 
charlesr.har...@gmail.com wrote:





The list of visible symbols has grown ;)

./multiarray.so
00039360 T PyArray_DescrHash
00073698 T _fini
0003e200 T _flat_copyinto
7c58 T _init
00027d70 T initmultiarray
000728e0 T npy_acos
000723d0 T npy_acosf
00072880 T npy_acosh
00072370 T npy_acoshf
00072e50 T npy_acoshl
00072ee0 T npy_acosl
00072900 T npy_asin
000723f0 T npy_asinf
000728a0 T npy_asinh
00072390 T npy_asinhf
00072e80 T npy_asinhl
00072f10 T npy_asinl
000728c0 T npy_atan
000726b0 T npy_atan2
000721b0 T npy_atan2f
00072c10 T npy_atan2l
000723b0 T npy_atanf
00072860 T npy_atanh
00072350 T npy_atanhf
00072e20 T npy_atanhl
00072eb0 T npy_atanl
00071f70 T npy_ceil
00072040 T npy_ceilf
00071eb0 T npy_ceill
00072a90 T npy_cos
00072580 T npy_cosf
00072a30 T npy_cosh
00072520 T npy_coshf
000730b0 T npy_coshl
00073140 T npy_cosl
00071fd0 T npy_deg2rad
000720a0 T npy_deg2radf
00071f00 T npy_deg2radl
000726e0 T npy_exp
000727b0 T npy_exp2
00072720 T npy_exp2_1m
00072220 T npy_exp2_1mf
00072cc0 T npy_exp2_1ml
000722b0 T npy_exp2f
00072d50 T npy_exp2l
000721e0 T npy_expf
00072c60 T npy_expl
00072920 T npy_expm1
00072410 T npy_expm1f
00072f40 T npy_expm1l
00071f20 T npy_fabs
00071ff0 T npy_fabsf
00071e70 T npy_fabsl
00071f30 T npy_floor
00072000 T npy_floorf
00071e80 T npy_floorl
000725f0 T npy_fmod
000720f0 T npy_fmodf
00072b10 T npy_fmodl
00072680 T npy_hypot
00072180 T npy_hypotf
00072bc0 T npy_hypotl
00072940 T npy_log
00072960 T npy_log10
00072450 T npy_log10f
00072fa0 T npy_log10l
000727d0 T npy_log1p
000722d0 T npy_log1pf
00072d80 T npy_log1pl
00072700 T npy_log2
00072200 T npy_log2f
00072c90 T npy_log2l
000727f0 T npy_logaddexp
00073290 T npy_logaddexp2
00073390 T npy_logaddexp2f
000731a0 T npy_logaddexp2l
000722f0 T npy_logaddexpf
00072db0 T npy_logaddexpl
00072430 T npy_logf
00072f70 T npy_logl
000725c0 T npy_modf
000720c0 T npy_modff
00072ad0 T npy_modfl
00072650 T npy_pow
00072150 T npy_powf
00072b70 T npy_powl
00071fb0 T npy_rad2deg
00072080 T npy_rad2degf
00071ee0 T npy_rad2degl
000729f0 T npy_rint
000724e0 T npy_rintf
00073050 T npy_rintl
00072ab0 T npy_sin
000725a0 T npy_sinf
00072a50 T npy_sinh
00072540 T npy_sinhf
000730e0 T npy_sinhl
00073170 T npy_sinl
00072980 T npy_sqrt
00072470 T npy_sqrtf
00072fd0 T npy_sqrtl
00072a70 T npy_tan
00072560 T npy_tanf
00072a10 T npy_tanh
00072500 T npy_tanhf
00073080 T npy_tanhl
00073110 T npy_tanl
000729d0 T npy_trunc
000724c0 T npy_truncf
00073020 T npy_truncl
./umath_tests.so
1a38 T _fini

Re: [Numpy-discussion] Linker script, smaller source files and symbol visibility

2009-04-21 Thread Charles R Harris
On Tue, Apr 21, 2009 at 11:24 PM, Charles R Harris 
charlesr.har...@gmail.com wrote:



 

[Numpy-discussion] Linker script, smaller source files and symbol visibility

2009-04-20 Thread David Cournapeau
Hi,

For quite a long time I have been bothered by the very large files
needed for python extensions. In particular for numpy.core, which
consists of a few files which are ~1 MB, I find this a pretty high
barrier to entry for newcomers, and it has quite a big impact on the
code organization. I think I have found a way to split things on common
platforms (this includes at least Windows, Mac OS X, Linux and Solaris),
without impacting other, potentially less capable platforms, or static
linking of numpy.

Assuming my idea is technically sound and that I can demonstrate it
works on, say, Linux without impacting other platforms (see example
below), would that be considered useful?

cheers,

David

Technical details
=================

The rationale for doing things as they are is a C limitation: the only
way the language offers to hide a symbol is to limit it to file scope,
i.e. if you want to share a function across several files without making
it public in the binary, you have to tag the function static, and include
all .c files which use this function into one giant .c file. That's how we
do it in numpy. Many binary formats (ELF, COFF and Mach-O) have a mechanism
to limit the symbol visibility, so that we can explicitly set the functions
we do want to export. With a couple of defines, we could either include
every file and tag the implementation functions as static, or link
every file together and limit symbol visibility with some linker magic.

Example
-------

I use the spam example from the official Python documentation, with one
function, PySpam_System, which is exported in a C API; the actual
implementation is _pyspam_implementation.

* spammodule.c: defines the interface available from the Python interpreter:

#include <Python.h>
#include <stdio.h>

#define SPAM_MODULE
#include "spammodule.h"
#include "spammodule_imp.h"

/* if we don't know how to deal with symbol visibility on the platform,
   just include everything in one file */
#ifdef SYMBOL_SCRIPT_UNSUPPORTED
#include "spammodule_imp.c"
#endif

/* C API for spam module */
static int
PySpam_System(const char *command)
{
    _pyspam_implementation(command);
    return 0;
}

* spammodule_imp.h: declares the implementation; it should only be included
by spammodule.c and by spammodule_imp.c, which implements the actual function:

#ifndef _IMP_H_
#define _IMP_H_

#ifndef SPAM_MODULE
#error this should not be included unless you really know what you are doing
#endif

#ifdef SYMBOL_SCRIPT_UNSUPPORTED
#define SPAM_PRIVATE static
#else
#define SPAM_PRIVATE
#endif

SPAM_PRIVATE int
_pyspam_implementation(const char *command);

#endif

For supported platforms (where SYMBOL_SCRIPT_UNSUPPORTED is not
defined), _pyspam_implementation would not be visible because we would
have a list of functions to export (only initspam in this case).
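
As a rough illustration of the same idea, GCC-compatible compilers also offer a per-function `visibility` attribute. This is a different mechanism from the export list proposed above, shown only as a sketch: the third branch of the macro and the stand-in function bodies are assumptions, not part of the original example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variant of SPAM_PRIVATE: fall back to static when nothing
 * better is known, otherwise ask the compiler to hide the symbol. */
#if defined(SYMBOL_SCRIPT_UNSUPPORTED)
#define SPAM_PRIVATE static
#elif defined(__GNUC__) && !defined(_WIN32)
#define SPAM_PRIVATE __attribute__((visibility("hidden")))
#else
#define SPAM_PRIVATE
#endif

/* What spammodule_imp.h would declare. */
SPAM_PRIVATE int _pyspam_implementation(const char *command);

/* Stand-in body (assumption): a real implementation would run the command.
 * Here it only reports the command length, or -1 for a NULL command. */
SPAM_PRIVATE int
_pyspam_implementation(const char *command)
{
    return command == NULL ? -1 : (int)strlen(command);
}

/* Simplified public wrapper: callable from other files in the same
 * binary, while the helper above stays out of the dynamic symbol table. */
int
PySpam_System(const char *command)
{
    return _pyspam_implementation(command) < 0 ? -1 : 0;
}
```

With the attribute, the helper can still be called from any object file linked into the same shared library, which is what makes splitting the source into several .c files possible.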

Advantages
----------

This has several advantages on platforms where this is supported:
- more approachable code: source files which are thousands of lines long
are difficult to follow
- faster compilation times: in my experience, compilation time
doesn't scale linearly with the amount of code.
- compilation can be better parallelized
- changing one file does not force a whole multiarray/ufunc module
recompilation (which can be pretty long when you chase bugs in it)

Another advantage is related to namespace pollution. Since library
extensions are static libraries for now, any symbol from those
libraries used by any extension is publicly available. For example, now
that multiarray.so uses the npy_math library, every symbol in npy_math
is in the public namespace. That's also true for every scipy extension
(for example, _fftpack.so exports the whole dfftpack public API). If we
want to go further down the road of making core computational code
publicly available, I think we should improve this first.

Disadvantage
------------

We need to code it. There are two parts:
- numpy.distutils support: I already have something working for
Linux. Once we have one platform working, adding others should not be a
problem
- changing the C code: we could at first split things into .c
files while still including everything, and then start the conversion.



Re: [Numpy-discussion] Linker script, smaller source files and symbol visibility

2009-04-20 Thread Charles R Harris
Hi David

On Mon, Apr 20, 2009 at 6:51 AM, David Cournapeau 
da...@ar.media.kyoto-u.ac.jp wrote:

 Hi,

For quite a long time I have been bothered by the very large files
 needed for python extensions. In particular for numpy.core, which
 consists in a few files which are ~ 1 Mb, I find this a pretty high
 barrier of entry for newcomers, and it has quite a big impact on the
 code organization. I think I have found a way to split things on common
 platforms (this includes at least windows, mac os x, linux and solaris),
 without impacting other  potentially less capable platforms, or static
 linking of numpy.


There was a discussion of this a couple of years ago. I was in favor of many
small files, maybe in subdirectories. Robert, IIRC, thought too many small
files could become confusing, so there is a fine line in there somewhere. I
am generally in favor of breaking the files up into their functional
components and maybe rewriting some of the upper level interface files in
Cython. But it does need some agreement and we should probably start by just
breaking up a few files. I don't have a problem with big files that are just
collections of small routines all of the same type, umath_loops.inc.src for
instance.



 Assuming my idea is technically sound and that I can demonstrate it
 works on say Linux without impacting other platforms (see example
 below), would that be considered useful ?


Definitely worth consideration.



 cheers,

 David

 Technical details
 ==

The rationale for doing things as they are is a C limitation related
 to symbol visibility being limited to file scope, i.e. if you want to
 share a function into several files without making it public in the
 binary, you have to tag the function static, and include all .c files
 which use this function into one giant .c file. That's how we do it in
 numpy. Many binary format (elf, coff and Mach-O) have a mechanism to
 limit the symbol visibility, so that we can explicitly set the functions
 we do want to export. With a couple of defines, we could either include
 every files and tag the implementation functions as static, or link
 every file together and limit symbol visibility with some linker magic.


Maybe just not worry about symbol visibility on other platforms. It is one
of those warts that only becomes apparent when you go looking for it. For
instance, the current *.so has some extraneous symbols but I don't hear
folks complaining.




Re: [Numpy-discussion] Linker script, smaller source files and symbol visibility

2009-04-20 Thread Charles R Harris
On Mon, Apr 20, 2009 at 9:48 AM, Charles R Harris charlesr.har...@gmail.com
 wrote:




Here is a link to the start of the old discussion:
http://article.gmane.org/gmane.comp.python.numeric.general/12974/match=exported+symbols+code+reorganization
You took part in it also.

Chuck


Re: [Numpy-discussion] Linker script, smaller source files and symbol visibility

2009-04-20 Thread David Cournapeau
Charles R Harris wrote:
 

 Here is a link to the start of the old discussion
 http://article.gmane.org/gmane.comp.python.numeric.general/12974/match=exported+symbols+code+reorganization.
 You took part in it also.

Thanks, I remembered we had the discussion, but could not find it. The
difference is that I am much more familiar with the technical details and
the numpy codebase now :) I know how to control exported symbols on most
platforms which matter (I can't test on AIX or HP-UX unfortunately - but
I am perfectly fine with ignoring namespace pollution on those anyway),
and I would guess that the only platforms which do not support symbol
visibility in one way or another do not support shared libraries anyway
(some CRAY stuff, for example).
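
To make "controlling exported symbols" concrete, here is a minimal sketch of the usual per-platform export macro. `DEMO_EXPORT` and the two functions are hypothetical names chosen for illustration; this is not numpy's actual mechanism, and the thread itself proposes an export list rather than annotations.

```c
/* DEMO_EXPORT is a hypothetical macro name, not one used by numpy. */
#if defined(_WIN32) || defined(__CYGWIN__)
/* In a Windows DLL, symbols are exported only when marked like this
 * (or listed in a .def file). */
#define DEMO_EXPORT __declspec(dllexport)
#elif defined(__GNUC__)
/* With GCC-compatible compilers, this keeps a symbol visible even when
 * the library is built with -fvisibility=hidden. */
#define DEMO_EXPORT __attribute__((visibility("default")))
#else
/* Unknown toolchain: accept whatever the default visibility is. */
#define DEMO_EXPORT
#endif

DEMO_EXPORT int demo_public_answer(void);

/* File-local helper: never appears in the symbol table of the library. */
static int
demo_private_helper(void)
{
    return 21;
}

/* The only function meant to be part of the public interface. */
DEMO_EXPORT int
demo_public_answer(void)
{
    return 2 * demo_private_helper();
}
```

On a platform where neither branch applies, the macro expands to nothing and the code still builds, which matches the attitude above of simply tolerating namespace pollution there.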

Concerning the file size, I don't think anyone would disagree that they
are too big, but we don't need to go the Java way of one file per
class or function either. One first split which I personally like is
API/implementation. For example, for multiarray.c, we would only keep
the public PyArray_* functions, and put everything else in another file.
The other very big file is arrayobject.c, and this one is already mostly
organized in independent parts (buffer protocol, number protocol, etc.).

Another thing I would like to do is to make the global C API array
pointer a 'true' global variable instead of a static one. It took me a
while when I was working on the hashing protocol for dtype to understand
why it was crashing (the array pointer being static, every file has its
own copy, so it was never initialized in the hashdescr.c file). I think
a true global variable, hidden through a symbol map, is easier to
understand and more reliable.
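
The difference between the two kinds of pointer can be sketched as follows. All names are hypothetical stand-ins (the real numpy pointer and table are not reproduced here), and the three parts that would live in separate files are collapsed into one listing.

```c
#include <stddef.h>

/* --- shared private header (hypothetical) ---
 * With "static void **demo_array_api;" here instead, every .c file that
 * includes the header would get its own copy, and a file that never runs
 * the initialization would see NULL and crash on first use. */
extern void **demo_array_api;

/* --- exactly one .c file defines the object and initializes it --- */
static void *demo_api_table[1] = { NULL }; /* function pointers would go here */
void **demo_array_api = NULL;

int
demo_init_api(void)
{
    demo_array_api = demo_api_table;
    return 0;
}

/* --- any other .c file, e.g. the hashing code, sees the same object --- */
int
demo_api_is_ready(void)
{
    return demo_array_api != NULL;
}
```

Because the variable has external linkage, it would show up in the shared library's symbol table unless it is hidden at link time, which is why the message pairs it with a symbol map.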

cheers,

David


Re: [Numpy-discussion] Linker script, smaller source files and symbol visibility

2009-04-20 Thread Charles R Harris
On Mon, Apr 20, 2009 at 10:13 PM, David Cournapeau 
da...@ar.media.kyoto-u.ac.jp wrote:



I made an experiment along those lines a couple of years ago. There were
compilation problems because the needed include files weren't available. No
doubt that could be fixed in the build, but at some point I would like to
have real include files, not the generated variety. Generated include files
are kind of bogus IMHO, as they don't define an interface but rather reflect
whatever the function definition happens to be. So as part of any split I
would also suggest writing the associated include files. That would also
make separate compilation possible, which would make it easier to do test
compilations during development.

Chuck