On 2007-09-24, at 1120, Joshua Megerman wrote:

First off, let me preface this by saying that while I understand the
concept of shared libraries, I don't understand the underlying mechanics
of how the OS handles them,

i'm not sure exactly how far down "underlying" goes for you, but here's a fairly simple overview of the seedy underside of program linking and the difference between static (i.e. compile-time) and dynamic (i.e. run-time) linking.

the compiler generates ".o" files, containing the following:

- one or more "text" segments, which contain byte sequences of executable code

- a list of "exports", symbols which are available in the module, usually functions which may be called from, or global variables (such as "errno") which may be accessed by other modules.

- a list of "imports", symbols which this module needs in order to execute correctly, and the "fixup" locations where the final memory address of each symbol should be stored.

there are other types of files called ".a" files, which are basically a collection of .o files joined together for easier management- a "library", in other words. libvpopmail.a is one of these.

you can see the various imports and exports in a .o or .a file using the program "nm". for example, in vpopmail 5.4.22, the file "md5.o" contains the following symbols:

$ nm md5.o
000007e0 T MD5Final
00000038 T MD5Init
0000006c T MD5Transform
000006ec T MD5Update
00000000 T byteReverse
         U memcpy
         U memset

the symbols with "T" are exports, the functions in the module. these function names are available to be matched against other modules which may need them. the symbols with "U" are imports, names which need to be matched against other modules in order to build a final working program. in this case, the "memcpy" and "memset" functions are defined in the "memcpy.o" and "memset.o" modules within "libc.a", or in "libc.so".
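
to make the "T" versus "U" split concrete, here is a minimal sketch in python which sorts the nm output above into exports and imports. the function name classify_symbols is made up for this example, and it only understands the two symbol types discussed here- real nm output has many more.

```python
# a toy parser for "nm" output, splitting symbols into exports and imports.
# only the "T" and "U" symbol types discussed above are handled.

NM_OUTPUT = """\
000007e0 T MD5Final
00000038 T MD5Init
0000006c T MD5Transform
000006ec T MD5Update
00000000 T byteReverse
         U memcpy
         U memset
"""


def classify_symbols(nm_text):
    """Return (exports, imports) as sorted lists of symbol names."""
    exports, imports = [], []
    for line in nm_text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[1] == "T":
            # address, type, name: a function defined in this module
            exports.append(fields[2])
        elif len(fields) == 2 and fields[0] == "U":
            # type, name: undefined here, must come from another module
            imports.append(fields[1])
    return sorted(exports), sorted(imports)


if __name__ == "__main__":
    exports, imports = classify_symbols(NM_OUTPUT)
    print("exports:", exports)
    print("imports:", imports)
```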

the compile-time linker gathers a bunch of these .o and .a files, matches up the "imports" with the "exports" from the various modules, and produces a final executable with any interior links resolved. for a statically linked program, ALL links must be resolved in order to have a working program. so if your "main()" called any or all of the MD5 functions listed above, your ".o" would have "MD5Init" and friends as imports, and the linker would match those against the md5.o module. that adds "memcpy" and "memset" to the list of imports, so the linker would then bring in the "memcpy.o" and "memset.o" modules from "libc.a" as part of your program's final executable.
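
the "keep pulling in modules until nothing is left unresolved" process can be sketched as a toy model in python. everything here is made up for illustration- the LIBRARY table, the static_link function, and the module contents are not real symbol tables, and a real linker also has to lay out the code and patch the fixup addresses.

```python
# a toy model of static linking: start from the program's own imports and
# keep pulling in library modules until no unresolved imports remain.
# the module contents below are illustrative, not real symbol tables.

LIBRARY = {
    "md5.o":    {"exports": {"MD5Init", "MD5Update", "MD5Final"},
                 "imports": {"memcpy", "memset"}},
    "memcpy.o": {"exports": {"memcpy"}, "imports": set()},
    "memset.o": {"exports": {"memset"}, "imports": set()},
    "strlen.o": {"exports": {"strlen"}, "imports": set()},
}


def static_link(main_imports, library):
    """Return the set of library modules needed to resolve every import."""
    # map each exported symbol to the module which defines it
    providers = {}
    for name, module in library.items():
        for symbol in module["exports"]:
            providers[symbol] = name

    linked = set()
    pending = list(main_imports)
    while pending:
        symbol = pending.pop()
        if symbol not in providers:
            raise LookupError("undefined symbol: " + symbol)
        module = providers[symbol]
        if module not in linked:
            linked.add(module)
            # the new module's own imports must be resolved as well
            pending.extend(library[module]["imports"])
    return linked


if __name__ == "__main__":
    print(sorted(static_link({"MD5Init", "MD5Final"}, LIBRARY)))
```

note that "strlen.o" is never linked in, because nothing asked for it- only the modules which are actually needed end up in the result.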

there are two problems with this scenario:

- some functions, like printf(), have a LOT of dependencies. a three-line program which might normally generate a 4K executable can grow to over 800K because of these dependencies.

- if the underlying library changes, you have to re-compile this program to gain the benefits (security fixes, new features, etc.) of the new library.

if a program is being compiled to support dynamic linking, then instead of looking at "libc.a", the linker looks at "libc.so". and instead of copying the code from the .so into the final executable, it builds a list of "run-time fixups", which is stored in the final executable.

then, when the program is actually executed, the first thing it does is call a "run-time linker", usually called "ld.so". the run-time linker loads the necessary .so files into your program's memory space, performs the "fixups" (i.e. stores the final in-memory address of the library functions into the correct memory locations in your code), and then jumps to the starting point of your program.
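
you can watch a name being turned into an address at run time from python, using its ctypes module. this is a rough sketch and not what ld.so itself does- ctypes loads things explicitly (dlopen/dlsym style) after the program has started, instead of working through a fixup list at startup- but the lookup-by-name idea is the same. it assumes a typical linux or BSD system where the C library is already mapped into the python process.

```python
# looking up a library function by name at run time, using python's ctypes
# module. this is explicit loading rather than the automatic fixups that
# ld.so performs at startup, but in both cases a symbol name is resolved
# to an in-memory address while the program is running.
import ctypes

# CDLL(None) asks for the symbols already loaded into this process, which
# on a typical linux or BSD system includes the C library.
libc = ctypes.CDLL(None)

# describe the C signature: size_t strlen(const char *)
strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t

if __name__ == "__main__":
    print(strlen(b"vpopmail"))
```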

because modern CPUs support the concept of making a particular segment of memory "read only", and because most memory management hardware makes it possible to map a particular physical segment of memory to appear at any logical address within the address space, it is possible for shared libraries to physically exist in memory only one time, while visible to multiple processes, possibly at different addresses. this is why, if you look at a process with "ps" or "top", you'll see two memory-usage numbers- the "virtual size", which is how much address space the process has mapped in total, and the "resident set", which is how much of that is actually sitting in physical memory right now. both numbers include shared libraries like libc.so, so adding them up across processes overstates the real memory use- a shared library's pages are counted once per process, but exist in memory only once.
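
on linux, the same two numbers that "ps" and "top" show can be read directly from /proc. here is a minimal sketch, assuming the usual /proc/<pid>/status layout with "VmSize" and "VmRSS" lines measured in kB. the parse_vm_fields function name is made up for this example, and other systems expose these numbers through different interfaces.

```python
# pull the virtual size and resident set size out of the text of linux's
# /proc/<pid>/status. linux-specific; other systems report these numbers
# through different interfaces.
import os


def parse_vm_fields(status_text):
    """Return {"VmSize": kB, "VmRSS": kB} for the fields that are present."""
    wanted = ("VmSize", "VmRSS")
    result = {}
    for line in status_text.splitlines():
        name, _, rest = line.partition(":")
        if name in wanted:
            # the value looks like "   12345 kB"
            result[name] = int(rest.split()[0])
    return result


if __name__ == "__main__":
    path = "/proc/self/status"
    if os.path.exists(path):
        with open(path) as handle:
            print(parse_vm_fields(handle.read()))
```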

and thus am not sure exactly how things can be affected performance-wise.

the vpopmail programs are already dynamically loaded- it's just the "libvpopmail.a" functions which are not loaded dynamically. the performance hit would be minimal- it already has to load libc.so at run-time, one more library won't take long enough to make any real difference.

1) A shared library with a stable API would make recompiling outside
programs (e.g., QmailAdmin) unnecessary, which would be a Good Thing (tm).

as long as it's the same API for all of the authentication modules.

i can also see having "libvpopmail.so" for the client-facing programs, then modules like "libvpopmailauth_cdb.so", "libvpopmailauth_mysql.so", and so forth, for the back-end code to handle the mechanics for that particular authentication back-end, similar to how courier-authlib is structured.

2) There has been some question regarding performance of the vpopmail
programs when compiled against shared vs. static libraries. I suggest the
following options be added for shared libraries at compile-time:
a) --disable-shared - don't build libvpopmail.so, which is what vpopmail
does now.
b) --enable-shared - build libvpopmail.so, but don't link the vpopmail
binaries against it - this gives other programs the ability to use the
shared library, but keeps the vpopmail binaries statically linked.
c) --enable-shared-binaries - build libvpopmail.so and link the vpopmail
binaries against it.  Implies --enable-shared.
d) possibly, if it's not too difficult, have a --enable-shared-binaries= and/or --enable-static-binaries= option, which takes a list of binaries
to link against the stated library, and links the rest against the
other.  So you could have static vdelivermail and vchkpw, but not
vadduser, for example. Not sure if that really is necessary, but static
linking does save space...

i vote for "a" and "c" during a transition period, then "c" as the only option after that.

in either case, i think "d" might be taking the idea too far.

3) In all cases, even if the vpopmail binaries are linked against the
shared library, the static library libvpopmail.a should be built since
some programs expect it.

maybe for interim versions, to give other programs' developers time to deal with the change... but i think that a "vpopmail version 6" should be "shared only".

Also, just a supposition on my part, but if you're running (e.g.)
courier-authdaemon linked against libvpopmail.so all the time, wouldn't that (theoretically) mean that other dynamically linked vpopmail programs would run faster than the static version since the library would already
be loaded in memory?

yes, but the difference wouldn't really be noticeable- it would still be a few milliseconds slower than having the functions hard-coded into the binaries.

If so, perhaps the speed solution for a dynamic
(e.g.) vdelivermail would be to run something that was dynamically linked
all the time, so libvpopmail stayed in memory...

if you're on a system which is busy enough that these few milliseconds are a significant issue, you will already have tens or hundreds of other processes with libvpopmail.so mapped into their memory space anyway- so again, it won't be an issue.

Anyway, that's it for now - I haven't even tried the patch against the
latest vpopmail, though I'm guessing it should be fairly easy (albeit
possibly tedious) to integrate since it's not much in the way of actual
code changes...

if you have a URL for that patch, i'd like to play with it myself.

| John M. Simpson    ---   KG4ZOW   ---    Programmer At Large |
| http://www.jms1.net/                         <[EMAIL PROTECTED]> |
| http://video.google.com/videoplay?docid=-1656880303867390173 |
