Re: Replacement for GMP: Update

2007-01-24 Thread Thorkil Naur
Hello,

On Wednesday 24 January 2007 16:30, Peter Tanski wrote:
> ...
> Thorkil Naur and others have suggested writing the whole
> thing as small assembler operations and piecing them together in
> Haskell; I have been looking into that as well but it seems to entail
> inlining every Integer function--imagine the code bloat.
> ...

Just a quick adjustment: I have suggested writing raw operations in 
(hopefully) portable C. And although I would consider eliminating some of the 
levels of calls (from the compiled Haskell code via Num.lhs and the handcoded 
PrimOps.cmm to the specific C function implementing the desired operation), I 
agree that inlining the entire function implementing the operation would be 
excessive.

Best regards
Thorkil
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Replacement for GMP: Update

2007-01-24 Thread Peter Tanski

On Tue, 23 Jan 2007, John Meacham wrote:

I think the benchmarks are flawed in an important way. I believe (but
am not positive) that ARPREC uses a special factorized form of
representing numbers, which makes multiplicative operations
extremely fast, but simple things like addition/subtraction quite
slow.


Oh, don't worry.  I've given up on ARPREC for the simple reason that  
*every* operation on small-size operands is very slow, to say nothing
of the time it would take to initialise ARPREC every time.  ARPREC
does store numbers using a representation similar to floating point,  
essentially an array of doubles where:

(int)array[0]  = array_size
(int)array[1]  = no_mantissa_words
array[2]       = exponent
array[3 ..]    = mantissa
I hope the tests aren't "flawed": the main purpose of performing
those operations was to see the median-level performance (extreme
performance being Integers with operands greater than 10,000 or
30,000 bits).  The crummy graphs do show that I made a low cutoff of
256 bits, so the most common Integer use (< 128 bits, 4
uint32_t) wasn't even tested.  I probably didn't clarify why I tested
larger size operands.  GHC should have something that is comparable  
to GMP, and that means speed (and precision) for medium-large  
operands as well as small operands.



you are only benchmarking multiplicative or similar routines, giving
ARPREC a huge lead, when in practice it might end up being slowest, as
addition/subtraction is far more common than multiplication.


In the common case it is slowest--but not by much.

Also, how are you choosing the numbers to test with? It is possible that
some packages are using 'sparse' representations or other specialized
forms if all your numbers happen to be powers of two or something.


Random numbers--a different number for each iteration in size.  I
used a PRNG based on the SNOW2 stream cipher--something I wrote
a while ago; it is as fast as arc4random and tests well on DIEHARD and
other statistical suites.  The GMP and OpenSSL random number generators
were slower and I wanted to use the same generator across libraries.


also, pretty much all uses of integers will be for very small integers;
we should be careful to not lose sight of speed for the common case due
to pretty asymptotic bounds.


All numbers were the same size, so cases like multiplying a 1024-bit  
operand by a 256-bit operand weren't tested; that could be a real  
flaw.  It's all very sloppy--just to get an idea of where things  
generally line up.


So, where am I now?  I worked on the GHC-Win off and on and then went  
back to making a uniform API between GHC and the replacement, and I  
am re-writing the replacement.  I thought I would be done several  
weeks ago but of course little things take over...  One area of GHC I  
would really like to change is its memory use.  ForeignPtr seems to  
work well but places the full burden of memory management on the  
Integer library; for SIMD-use (AltiVec, SSE2), the library would  
require GHC to allocate memory aligned to 16-bytes.  Right now, the  
only choice would be allocatePinned() (in rts/sm/Storage.c), which is  
4-byte aligned and it is, of course, pinned so the GC can't move it.   
Imagine the RTS-memory problems you could have if you wrote a Haskell  
program using lots of small Integers allocated with, say, an  
allocatePinned_16() (16-byte aligned).  The alternative would be to  
use normal memory but re-align the operands for the vectorized  
operations by peeling; o.k., doable (especially easy with AltiVec),  
but slower.  In the meantime I have been working through SSE2  
assembler, which doesn't have an addc (add-carry) operation and  
doesn't set a flag for overflow, so I have been experimenting with a  
SWAR-like algorithm--essentially the same thing as GMP's Nails--to  
make the normally 32-bit operands 31 bits with the last bit for a  
carry flag.  Thorkil Naur and others have suggested writing the whole
thing as small assembler operations and piecing them together in
Haskell; I have been looking into that as well but it seems to entail
inlining every Integer function--imagine the code bloat.


Cheers,
Pete



Re: Replacement for GMP: Update

2007-01-23 Thread John Meacham
I think the benchmarks are flawed in an important way. I believe (but
am not positive) that ARPREC uses a special factorized form of
representing numbers, which makes multiplicative operations
extremely fast, but simple things like addition/subtraction quite slow.

you are only benchmarking multiplicative or similar routines, giving
ARPREC a huge lead, when in practice it might end up being slowest, as
addition/subtraction is far more common than multiplication.

Also, how are you choosing the numbers to test with? It is possible that
some packages are using 'sparse' representations or other specialized
forms if all your numbers happen to be powers of two or something.

also, pretty much all uses of integers will be for very small integers;
we should be careful to not lose sight of speed for the common case
due to pretty asymptotic bounds.

I mean, it would be great to be proven wrong and find out ARPREC was
great across the board. :)

John

-- 
John Meacham - ⑆repetae.net⑆john⑈


Re: Replacement for GMP: Update

2007-01-02 Thread Peter Tanski

On Dec 29, 2006, at 8:32 AM, Thorkil Naur wrote:

On Friday 01 September 2006 06:43, Peter Tanski wrote:

...
For a brief overview of the speed of the libraries I looked at
carefully, see http://hackage.haskell.org/trac/ghc/wiki/ReplacingGMPNotes
(I added a few charts to show the speed of ARPREC, OpenSSL's BN,
GMP and LibTomMath.)  I tested GMP and OpenSSL's BN for reference.


I must confess that it took me a while to understand what you were
doing here and I am still uncertain: For example, how many
multiplications were actually performed for bit size 4096?


4096 = size 4 (counting from zero: sizes[5] = {256,512,1024,2048,4096})
1,000,000 / ( 4 * 3 ) = 83,333 rounds

My reason for doing this was simple: as a basic comparison, rather  
than an absolute measurement, the number of rounds doesn't matter as  
long as the results are measurable.  Besides, more rounds means more  
time running each test.  I did a lot of tweaking, especially with  
ARPREC, to get each library to (1) a generally available speed and  
(2) a configuration similar to what it would be when used with GHC.   
I could have measured the speed in nanoseconds, with one iteration  
for each calculation using random numbers of a specified size, and  
posted the mean for a number of trials, but that would have required  
me to use OS X-specific profiling software like Shark in order to get  
reliable measurements--a bit more labour-intensive, as it would  
require me to manually perform each test.  (I did use random numbers  
of a specified size.)



In addition, for "Powers", the markings on the horizontal axis
("256 pow(n,3)", "512 pow(n,4)", "1024 pow(n5)" (missing a ","
here?), ...) on your graph seem to indicate that you are changing
two different variables (the bit size and the exponent) at the same
time.


Yes, the testing is a bit sloppy there (so is the graph; ugly typo).  
The graph shows a general trend more than anything else.  I actually  
tested the Exponentials (Powers) individually for each size and  
timing, but posting a 3-D graph or making the graph (time =
exponent/size) seemed like overkill or would conflict with the
previous measurements.  Not a bad idea, though, just for clarity.


I would suggest that you also quoted the original measurements
directly. And perhaps (especially in case of the "Powers" and
"Division") some more details about what was actually measured.


I did quote the original measurements directly.  There wasn't much  
variance overall and I took what seemed like median results from a  
number of tests.  What matters is the relative time to initialise and  
perform each computation, since in a GHC implementation each  
computation would require some minimal initialisation.  ARPREC was  
built for this, and in ARPREC-only tests the major factor in the speed  
of initialisation was the time to compute the architecture- and
precision-specific constants for PI, Log_2 and Log_10 (the Epsilon  
constant doesn't require much time).  Log_2 and Log_10 are necessary  
for printing Integers because computations in ARPREC are performed as  
floating-point values and must be converted to decimal digits by  
multiplying by Log_10.  (Note that printing Integers also requires a  
size increase, as the mantissa holds the Integer value, requiring  
further multiplication by the float-exponent.)


Details on differences between algorithms used in each library would  
be fairly complex: as you already know, each library (ARPREC, GMP,  
LibTomMath, etc.) uses a different algorithm based on the size of the  
number-array *and* each may have a different implementation of an  
algorithm--LibTomMath uses a simpler Karatsuba algorithm, for example.



It is distinctly odd that squaring seems to be more expensive than
multiplication for some bit sizes in 3 of the 4 measured cases.


This is also an implementation matter: the particular algorithms  
change with size and squaring may require some extra initialisation  
for, say, computing the size of the result and the number of  
operations.  All of the libraries provide special squaring functions  
and I used those.  LibTomMath is a good example: it uses its  
"baseline" squaring algorithm for small sizes and Comba-optimised  
Toom-Cook or Karatsuba algorithms for larger sizes.  (I purposely did  
not tune LibTomMath or any of the libraries because I wanted a  
more-or-less average-case comparison, so the Karatsuba algorithm was
used for size=512 bytes (128 digits) and Toom-Cook was used for
size=2048 bytes (512 digits).)  So where you see LibTomMath's time dip
in the middle of the 'Squaring' graph, it is using the Karatsuba
algorithm.
ARPREC uses a FFT for everything above size=256 and calculates with  
fewer actual digits (it stores extra "size" as an exponent, just like  
ordinary floating point numbers).  The trends in the graphs for  
ARPREC versus GMP generally hold true until GMP's FFT kicks in, at  
size > 32768.



Also, I
wonder what div

Re: Replacement for GMP: Update

2006-12-29 Thread Thorkil Naur
Hello,

Thanks a lot for your reply. Here are some comments on it. As is customary, 
I must apologise for the late reply (the response time for this conversation 
seems to increase exponentially with time), which may very well make some 
of the comments totally out-dated.

On Friday 01 September 2006 06:43, Peter Tanski wrote:
> ...
> For a brief overview of the speed of the libraries I looked at
> carefully, see http://hackage.haskell.org/trac/ghc/wiki/ReplacingGMPNotes
> (I added a few charts to show the speed of ARPREC, OpenSSL's BN, GMP  
> and LibTomMath.)  I tested GMP and OpenSSL's BN for reference.

I must confess that it took me a while to understand what you were doing here 
and I am still uncertain: For example, how many multiplications were actually 
performed for bit size 4096? In addition, for "Powers", the markings on the 
horizontal axis ("256 pow(n,3)", "512 pow(n,4)", "1024 pow(n5)" (missing a 
"," here?), ...) on your graph seem to indicate that you are changing two 
different variables (the bit size and the exponent) at the same time. I would 
suggest that you also quoted the original measurements directly. And perhaps 
(especially in case of the "Powers" and "Division") some more details about 
what was actually measured.

It is distinctly odd that squaring seems to be more expensive than 
multiplication for some bit sizes in 3 of the 4 measured cases. Also, I 
wonder what divisions these libraries are capable of carrying out so much 
faster than comparable multiplications. For the division measurements, I may 
again simply be presuming something about your experiments that simply isn't 
true.

> ...
> I keep talking about ARPREC--why?  For three reasons:
> (1) I trust its level of precision--this has been tested, see FFTW's  
> benchmark page for accuracy: http://www.fftw.org/accuracy/G4-1.06GHz- 
> gcc4/

I am not sure I understand here: With respect to Haskell Integers, I would not 
say that there is any concept of a "level of precision": If Integer 
computations are not accurate, they are wrong.

> (2) if you look at the charts, although ARPREC is bad in division it  
> simply blows GMP, OpenSSL and LibTomMath away: at 4096 bits (85  
> doubles--each double has conservatively only 48.3 or so bits of  
> integer precision), ARPREC can take a full random number to pow(n,7)  
> in 0.98 seconds, compared to 77.61 or so seconds for the leader of the  
> Integer-based libraries, GMP.  (I ran the tests many times to make  
> sure readings weren't flukes.)

Again, some additional details about these measurements would be most welcome.

> (3) of all the FFT-based arbitrary precision libraries available,  
> ARPREC is the only BSD-licensed one--Takuya Ooura's library (used in  
> MAPM) is only a FFT algorithm and not necessarily either fast or  
> accurate.  The rest of the FFT libraries available are essentially  
> GMP-licensed.
> 
> So I am in an unenviable position: I intend to fulfill my promise and  
> get a replacement for GHC (and maybe more), but I have to essentially  
> build better functionality into the front-runner, ARPREC.  At present  
> I have been working with vector-based algorithms that would enable me  
> to use hardware-tuned code for Single Instruction Multiple Data  
> (SIMD) chipsets.  Currently I am researching algorithms based on  
> operating-system supplied vector libraries.  Part of this  
> modification involves a fast transformation between a vector of large  
> integers and an array of doubles, without loss of precision (although  
> vectors of doubles are also working well, they do not have the same  
> library support.)  I am also working on enhancing ARPREC's division  
> algorithm.
> 

I looked briefly at ARPREC: It seems that it works with an explicitly set 
maximum bound on the number of digits desired. Although it also appears that 
it is possible to change this bound as computations proceed, such additional 
administration seems difficult to embrace.

> This is the problem I face: GHC unfortunately does not use Integer as  
> a mathematical operation but as a primitive type, complete with  
> bitwise operations.  

I do not understand what problem you are referring to here.

> ...
> On Aug 24, 2006, at 2:39 PM, Thorkil Naur wrote:
> 
> > I am truly unable to tell what I would consider the ideal  
> > situation. On the
> > one hand, I value greatly the freedom of choosing my circumstances  
> > without
> > restraints. And I appreciate the desire of noble people like the GHC
> > developers to be equally free of such restraints. This would point  
> > in the
> > direction of basing Integer computations on a fairly generic  
> > implementation.
> > Which insightful consideration would then force us to admit would  
> > probably
> > cost a factor of, say, 2 or more in performance in some cases.
> 
> The problem I face is: performance at which level?  The low-bit-size  
> (low precision) range or the high precision range?
> 
> > Two proble

Re: Replacement for GMP: Update

2006-08-31 Thread Peter Tanski

Hello Thorkil,

I am very sorry for the late reply.  I have been extremely busy and I  
wanted to give you a coherent answer.


For a brief overview of the speed of the libraries I looked at  
carefully, see

http://hackage.haskell.org/trac/ghc/wiki/ReplacingGMPNotes
(I added a few charts to show the speed of ARPREC, OpenSSL's BN, GMP  
and LibTomMath.  I did not add speed tests for Crypto++ and Botan  
because they don't measure up.  The original timings I obtained for  
them were based on their own benchmarks which are inadequate and (for  
Crypto++) based on tuned assembly code only available on Pentium4  
processors with SSE.)  I tested GMP and OpenSSL's BN for reference.


Over the past few weeks I tore Crypto++ apart and modified a few  
things, only to find out that it has a persistent bug: woven  
throughout the library is a conversion from 32-bit to 64-bit ints  
using unions.  This kind of transformation breaks C's (and C++'s)  
aliasing rules (thanks to John Skaller for pointing out the problem),  
so compiling Crypto++ with optimisations turned on (g++ -O3)  
introduces failures, especially in the Division algorithms.  I could  
change the unions to bitwise transformations with masks but I would  
have to really dig out all the references.  After some more rigorous  
timing tests I found that I would have to essentially rewrite the  
algorithms anyway.  What a mess.


After some more research I found that there really are no other good  
public domain or BSD-compatible licensed libraries available.  I  
tested two other free arbitrary precision integer libraries, MPI and  
MAPM, but they were also too slow, sometimes as much as 4 times  
slower.  MAPM uses a Fast Fourier Transform (FFT) from Takuya Ooura  
(see http://momonga.t.u-tokyo.ac.jp/~ooura/fft.html) and should have  
been fast but turned out to be even slower than MPI.


If you look at the ReplacingGMPNotes page I mentioned at the top of  
this email, the charts show that LibTomMath is weak in  
multiplication--at larger integer sizes (2048-4096 bits) it is half  
as fast as GMP, or worse.  On the other hand ARPREC, which also uses  
a FFT algorithm, is slow at lower precisions (256-512 bits) for two  
reasons: (1) at relatively low precisions, ARPREC defaults to  
"faster" standard algorithms instead of its FFT and (2) when using  
its fast FFT at medium levels of precision (512 bits) the FFT is too  
ponderous to keep up with the relatively lighter and faster algorithms  
of the int-based libraries (GMP, OpenSSL's BN and LibTomMath).  (As a  
little history, ARPREC used to be fairly bad relative to other FFT  
programs available but was redone after 2003; by 2004 it was fairly  
good, and it is up to version 2.1.94 now and much better.  If you  
are looking at ARPREC benchmarks online prior to 2003, they are too  
old to be good indicators of its present capability.)


I keep talking about ARPREC--why?  For three reasons:
(1) I trust its level of precision--this has been tested, see FFTW's  
benchmark page for accuracy: http://www.fftw.org/accuracy/G4-1.06GHz- 
gcc4/
(2) if you look at the charts, although ARPREC is bad in division it  
simply blows GMP, OpenSSL and LibTomMath away: at 4096 bits (85  
doubles--each double has conservatively only 48.3 or so bits of  
integer precision), ARPREC can take a full random number to pow(n,7)  
in 0.98 seconds, compared to 77.61 or so seconds for the leader of the  
Integer-based libraries, GMP.  (I ran the tests many times to make  
sure readings weren't flukes.)
(3) of all the FFT-based arbitrary precision libraries available,  
ARPREC is the only BSD-licensed one--Takuya Ooura's library (used in  
MAPM) is only a FFT algorithm and not necessarily either fast or  
accurate.  The rest of the FFT libraries available are essentially  
GMP-licensed.


So I am in an unenviable position: I intend to fulfill my promise and  
get a replacement for GHC (and maybe more), but I have to essentially  
build better functionality into the front-runner, ARPREC.  At present  
I have been working with vector-based algorithms that would enable me  
to use hardware-tuned code for Single Instruction Multiple Data  
(SIMD) chipsets.  Currently I am researching algorithms based on  
operating-system supplied vector libraries.  Part of this  
modification involves a fast transformation between a vector of large  
integers and an array of doubles, without loss of precision (although  
vectors of doubles are also working well, they do not have the same  
library support.)  I am also working on enhancing ARPREC's division  
algorithm.


This is the problem I face: GHC unfortunately does not use Integer as  
a mathematical operation but as a primitive type, complete with  
bitwise operations.  From my experience, GHC users typically use  
Integers at lower precisions (typically under 256 bits) and they do  
not use Integer for higher math.  (Integer math operations are basic  
primitives, as you already know.)  I 

Re: Replacement for GMP: Update

2006-08-24 Thread Thorkil Naur
Hello Peter,

Sorry for the late reply. From your latest communication which seems to be

Date: Sat, 12 Aug 2006 21:12:05 -0400
From: Peter Tanski <[EMAIL PROTECTED]>
Subject: Re: OpenSSL License (was Replacement for GMP: Update)
To: John Goerzen <[EMAIL PROTECTED]>

I am a bit uncertain where the matter stands right now. Nevertheless, I wish 
to thank you for your reply. Clearly, you have done a lot of work, not least 
to investigate alternatives.

You write

> I hope I have answered a few of your points; some of them have given  
> me more work but that is all right, I guess.  ...

And you most certainly have, much more detailed than I could have hoped for. 
And hopefully, this additional work that you mention is not something that 
you considered a waste of time.

I am also most happy to read that

> Before I actually bind any library with GHC, I will thoroughly test  
> it as a stand-alone library   ...

since "correctness" of Integer computation is by far its most important 
property. (I would say that "performance" comes second, then what could be 
termed "interoperability" as third.)

The support for Integer computation is a messy subject: On the one hand, even 
children out of the first few years of school are capable of understanding 
the basic functionality involved. On the other hand, providing efficient 
support of this functionality, especially if required in a wide selection of 
circumstances, requires a surprising amount of hard work and insight. I would 
say, if only really large integers were routinely required in actual, 
real-world computations, our CPUs would support these computations and there 
would be no need to consider the question in the, admittedly, fairly limited 
context of GHC. The situation seems to be that the functionality is perhaps 
considered obviously available, but the reality is that it can only be 
attained at significant cost (either real money or honest sweat).

I am truly unable to tell what I would consider the ideal situation. On the 
one hand, I value greatly the freedom of choosing my circumstances without 
restraints. And I appreciate the desire of noble people like the GHC 
developers to be equally free of such restraints. This would point in the 
direction of basing Integer computations on a fairly generic implementation. 
Which insightful consideration would then force us to admit would probably 
cost a factor of, say, 2 or more in performance in some cases.

On the other hand, the existence of libraries like GMP seems to offer an 
economic path out of this jungle. However, as the discussion over the past 
couple of weeks illustrates, the path may be unacceptably thorny.

Let me quote some additional statements from this debate:

On Thursday 10 August 2006 19:00, Simon Peyton-Jones wrote:
...
> OK so you are talking of taking a copy of the BN code, changing the
> function names, and statically linking its allocator to GHC's RTS.
> 
> If someone wants to make a program that uses BN for some other purpose,
> they get the original BN, whose function names don't clash, and which
> uses malloc.
> 
> Sounds ok to me.
> 
> Simon
...

Two problems seem to be lurking here: The first is that, although taking a 
copy of some existing library and massaging it to become working under some 
presently available circumstances may be fine, what is really needed is 
something that keeps on working, even under changing and future 
circumstances. The second is that if I wish to mix Haskell and non-Haskell 
code that processes the same data, I may find myself in need of a way to 
convert between whatever is required by this copy of some existing library 
that has been created and the presently available, but potentially quite 
different, version of that same library.

No offence is meant here: I know that I am simply re-iterating what I have 
said earlier. However, I believe that this is sufficiently important to risk 
being ridiculed for pointing out the obvious.

Then you ask

> ... If you have the time, would you  
> be able to find any programs you might have that displayed poor  
> Integer performance or possibly bottlenecks in the current system?   

Let me say first that I have not experienced any really surprisingly poor 
performance of the current system. To be sure, I have seen programs similar 
to my own and not written in Haskell that performed better on comparable 
tasks. But I have not been able to say specifically that this poorer 
performance of my program was caused by some inadequacy in the Haskell (with 
GMP) implementation.

I better be specific here: What I do is integer factorization (splitting 
integers into their prime factors). And, for example, I have implemented a 
version of the ECM (Elliptic Curve Method) in Haskell. Having done that 
(independently and mainly to learn what it was about), I eventually compared 
it in performance to GMP-ECM which

RE: Re[2]: Replacement for GMP: Update

2006-08-22 Thread Simon Marlow
On 21 August 2006 16:20, Bulat Ziganshin wrote:

> Hello Simon,
> 
> Monday, August 21, 2006, 6:55:47 PM, you wrote:
>
>> Parallel major GC was worked on by Roshan James during his
>> internship here this summer, and we have some working code, but it
>> needs a lot more testing and tuning, and it may be some time before
>> we can incorporate this into the mainline. 
> 
> so it seems that it will not be included in 6.6.* ?

Unfortunately, no.  I hope to work on it after the 6.6 release.

Cheers,
Simon


Re[2]: Replacement for GMP: Update

2006-08-21 Thread Bulat Ziganshin
Hello Simon,

Monday, August 21, 2006, 6:55:47 PM, you wrote:

>> is not exactly true. look at "Non-stop Haskell"

> Simply because it adds overhead (both in runtime and code complexity), and the
> benefits are relatively modest.

i think that a GC that doesn't stop the world is just a different
product, even if it makes a program, say, 2 times slower overall. just
imagine a game like Quake written in GHC ;)

of course, i understand that you don't want to support one more GC
architecture; i just mentioned why it may matter for some people

> Parallel major GC was worked on by Roshan James during his internship here 
> this
> summer, and we have some working code, but it needs a lot more testing and
> tuning, and it may be some time before we can incorporate this into the 
> mainline.

so it seems that it will not be included in 6.6.* ?


-- 
Best regards,
 Bulat    mailto:[EMAIL PROTECTED]



Re: Replacement for GMP: Update

2006-08-21 Thread Simon Marlow

Bulat Ziganshin wrote:

Hello skaller,

Sunday, August 13, 2006, 4:34:14 AM, you wrote:


But the state of the art is then two stages behind the
requirement: Haskell still has to 'world stop' threads
to do a major collection.
 
is not exactly true. look at "Non-stop Haskell"

(http://www.haskell.org/~simonmar/papers/nonstop.pdf)

i don't know why it is not included in 6.6 or previous version


Simply because it adds overhead (both in runtime and code complexity), and the 
benefits are relatively modest.


To comment on another point in this thread: currently the generation 0 GCs 
(minor GCs) also stop-the-world, even on a multiprocessor.  This is a 
significant problem, and we have some ideas for fixing it, but no code yet (it's 
  a possible intern project, if anyone is interested).


Parallel major GC was worked on by Roshan James during his internship here this 
summer, and we have some working code, but it needs a lot more testing and 
tuning, and it may be some time before we can incorporate this into the mainline.


Cheers,
Simon


Re[8]: Replacement for GMP: Update

2006-08-14 Thread Bulat Ziganshin
Hello skaller,

Sunday, August 13, 2006, 4:34:14 AM, you wrote:

> I know very little about Haskell, let alone GHC internals

me too. so it's better to wait for comments about your thoughts from
the GHC team than from me. but at least i can say that

> But the state of the art is then two stages behind the
> requirement: Haskell still has to 'world stop' threads
> to do a major collection.

is not exactly true. look at "Non-stop Haskell"
(http://www.haskell.org/~simonmar/papers/nonstop.pdf)

i don't know why it is not included in 6.6 or previous version

> So I'm bringing into question whether these nice
> 'optimisations' are actually worthwhile. They actually
> seem to *degrade* performance, not improve it, when we're
> running with a large number of CPUs. Stopping the world
> if you have 20,000 CPU's will happen so often, all the
> CPU's will be idle 99.99% of the time :)

btw, one GHC intern worked on multi-processor GC and i hope that it
will be included in 6.6. so, the GC will also use all these 20k cpus :)
or Intel/IBM/Sun will start making some FP chips. they already do this
actually, just these chips still emulate x86/sparc/... ones :)




-- 
Best regards,
 Bulat    mailto:[EMAIL PROTECTED]



Re: OpenSSL License (was Replacement for GMP: Update)

2006-08-12 Thread Peter Tanski

John,

Have you carefully investigated the OpenSSL license?  We in Debian have
had repeated problems since the OpenSSL license is, as written,
incompatible with the GPL (even linking to OpenSSL is incompatible with
the GPL).  I would hate to have a situation where all GHC-compiled
programs can't be under the GPL.


I have been discussing this very issue with several people in this  
list.  At first I thought it was a peripheral issue because I had not  
carefully reviewed the GPL license until Florian Weimer pointed out  
what the problem was.  By that time I had already decided to work  
with another replacement library--I have not yet decided which  
because I am still testing things out--based on good arguments other  
people gave for not subjecting users to OpenSSL's advertising  
clause.  The libraries I am working with are either public domain or  
BSD licenses.  Sorry to scare you.


Best regards,
Peter
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Re[6]: Replacement for GMP: Update

2006-08-12 Thread skaller
On Sat, 2006-08-12 at 16:58 +0400, Bulat Ziganshin wrote:
> Hello skaller,

hi .. :)

> > The problem will occur if the 'stack' is aged: in that case
> > the sweep can miss the mutation and reap a live object.
> 
> well, i don't know how updatable vars in stack interoperate with GC.
> > let someone else answer this question.

I know very little about Haskell, let alone GHC internals
(I'm mainly using Ocaml at the moment, to implement my
own programming language).

Haskell seems to be on the right track in many ways: I view the
desirable upgrade path for programmers as roughly:

C --> C++ --> Felix --> ML --> Haskell

which represents a gradual move towards a more functional
and declarative style. It is not just that this is 
'easier to reason about' for the human programmer, and thus
provide better ways of ensuring correctness, but also that
it is 'easier to reason about' for machine algorithms,
and thus provides for better optimisation and much faster code.

The move to multi-processing on desktop computers seems to
accelerate the need for better languages than C++ (don't even
mention that J*** language .. :)

>  but afaik GHC 6.5 supports
> multi-cpu execution of multiple Haskell threads (which makes it a
> better language for modern multi-core cpus), tail-call optimization
> and 2-generation GC (actually, it supports any number of generations).
> also, GHC supports mutable arrays. you can search GHC bug tickets for the
> one that optimized GC with mutable vars: in ghc64 _each_ GC, including
> minor ones, needs to scan _every_ updatable reference (IORef/STRef) and
> _each_ element of _every_ updatable array (IOArray/STArray), and this
> significantly slows down many programs, including GHC itself. in ghc65
> a better scheme is now used

Cool. I guess what I'm saying is: there seems to be a contradiction
between multi-processing and mutation, perhaps more specifically
shared state. The tail-rec and array optimisation stuff may map
functional code to a procedural model, and obtain good performance
on a uni-processor.. but perhaps it does so at the cost of
bad performance on a multi-processor?

It may turn out that the underlying reason for preferring a
lazy purely functional high level programming language is 
also a reason for a similar low level execution model.

In particular .. it seems to me it defeats high performance
garbage collection of the very kind Haskell already uses.
The issue isn't making the collector faster .. but being
able to run it asynchronously. We all surely conceive the
functional model as constructive: you make new objects
as you need them, but never change old ones. This is basically
how 'mathematics' works. You don't worry about objects you
don't need any more. Since we don't have infinite memory
on real computers, we use a garbage collector to recycle
the unreachable store. And we all 'conceive' that as
a transparent background process.
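
The constructive picture above (make new objects as you need them, never
change old ones) can be sketched in a couple of lines of Haskell; the
names here are purely illustrative, not from any message in this thread:

```haskell
-- A minimal sketch of the write-once model described above:
-- "updating" a list builds a new spine and shares the old tail,
-- so previously constructed cells are never mutated.
xs :: [Int]
xs = [1, 2, 3]

-- ys replaces the head without touching xs; the tail [2, 3] is
-- physically shared between both lists, and xs is unchanged.
ys :: [Int]
ys = 0 : tail xs
```

A collector sweeping such a heap can never see a pointer disappear from a
constructed cell, which is exactly the property the background-sweep
argument relies on.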

Clearly, we need not just an asynchronous reaper -- we
also need to be able to execute multiple reapers in 
parallel, otherwise the system will not scale to a large
number of CPUs.

But the state of the art is then two stages behind the
requirement: Haskell still has to 'world stop' threads
to do a major collection. My suggestion is that the cause
of this is precisely that it is possible to age the
independent per-thread young heaps so that the aged
objects, now in the shared major heap, are mutable.
To prevent the mutable parts changing during collection,
all the threads have to be stopped. 

Do I understand correctly this is what actually happens
at the moment?

To fix the problem, we have to prevent the mutable
store ever getting into the major heap. That can
be done by throwing out the tail-rec, array, and
other optimisations that 'cheat' the collector
by manually reusing store, and bypassing the collection.
That works for a single thread only because you can't
collect and spew objects at the same time, ensuring
synchronisation. 

So I'm bringing into question whether these nice
'optimisations' are actually worthwhile. They actually
seem to *degrade* performance, not improve it, when we're
running with a large number of CPUs. Stopping the world
if you have 20,000 CPU's will happen so often, all the
CPU's will be idle 99.99% of the time :)

Removing the optimisation will slow down each CPU,
but remove any need to stop the world for long periods.
The world WILL have to pause to serialise aging the
young heaps.

It would be even better if the collection space could
somehow be partitioned so you could dedicate say 1%
of the CPU's to collection, that is, support multiple
parallel background collection: that is mandatory 
for scalability.

I note in passing that because the problem is with
*aging* mutable store .. the problem also goes away
if it is not aged. In particular if all mutable
objects are forcibly retained in the young heap,
there's no problem. For tail-rec optimisation this
is probably viable, and the implementation is possibly
quite simple, at least in p

Re[6]: Replacement for GMP: Update

2006-08-12 Thread Bulat Ziganshin
Hello skaller,

Saturday, August 12, 2006, 12:34:54 PM, you wrote:

>> > My point here is that actually this is a disastrous optimisation
>> > in a  multi-processing environment, because in general, the
>> > assignment of a pointer means the store isn't write once.
>> 
>> :)  all the variables rewritten are local to the function. _without_
>> tail call optimization, we create a new stack frame for each recursive
>> call. _with_ optimization, we just update vars in the first stack
>> frame created because we know that these vars will not be reused after
>> return from the call

> Yes, but this defeats the use of the kind of collector I attempted
> to describe. 

> The problem will occur if the 'stack' is aged: in that case
> the sweep can miss the mutation and reap a live object.

well, i don't know how updatable vars in stack interoperate with GC.
let someone else answer this question. but afaik GHC 6.5 supports
multi-cpu execution of multiple Haskell threads (which makes it a
better language for modern multi-core cpus), tail-call optimization
and 2-generation GC (actually, it supports any number of generations).
also, GHC supports mutable arrays. you can search GHC bug tickets for the
one that optimized GC with mutable vars: in ghc64 _each_ GC, including
minor ones, needs to scan _every_ updatable reference (IORef/STRef) and
_each_ element of _every_ updatable array (IOArray/STArray), and this
significantly slows down many programs, including GHC itself. in ghc65
a better scheme is now used
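
For concreteness, a small hypothetical example of the updatable
references in question; under the ghc64 behaviour described above, every
one of these IORefs would be rescanned on each minor collection (the
names bumpAll and demo are illustrative, not from GHC):

```haskell
import Data.IORef

-- Bump a batch of mutable cells. IORefs like these are exactly the
-- updatable references that, per the account above, ghc64's minor
-- collections had to rescan in full on every GC.
bumpAll :: [IORef Int] -> IO ()
bumpAll = mapM_ (\r -> modifyIORef r (+ 1))

demo :: IO Int
demo = do
  refs <- mapM newIORef (replicate 1000 0)  -- 1000 mutable cells
  bumpAll refs
  vals <- mapM readIORef refs
  return (sum vals)
```

A program holding many such refs (or large IOArrays) pays the scan cost
on every minor GC under the old scheme, which is the slowdown reported.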

-- 
Best regards,
 Bulat  mailto:[EMAIL PROTECTED]

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Replacement for GMP: Update

2006-08-12 Thread Peter Tanski

Florian:


On most systems, readline is GPLed.  There is a non-copyleft
reimplementation somewhere, but I don't think it's widely used.


The non-copyleft implementation is an upgrade from BSD's lineedit,
called libedit.  Apple's OS X includes it as a version of libreadline
by adding "/usr/lib/libreadline.dylib" as a symbolic link to
libedit.dylib.  (I had forgotten about this because I have a version of
Darwin Ports's libreadline I had installed so I could build GHC on OS
10.4: libedit does not fully implement libreadline's API so GHC
requires libreadline).


That GHC includes libreadline is not a problem for developers  
compiling programs using GHC because libreadline is used for GHC's  
interactive bytecode interpreter, GHCi, which is part of the compiler  
system but is not linked into the RTS of compiled programs.  (This  
might be a problem with GHC's BSD license; I don't know.)  Clearly  
the best solution would be to write to the FSF.  The whole purpose of  
licensing GHC as BSD3 is to promote the broad use of Haskell as a  
programming language and developer involvement in the support,  
maintenance and research of Haskell and the GHC compiler.  Those  
goals realistically require commercial developer support, otherwise  
small company or individual developers who lack the market  
recognition to sell software as a service could not write  
applications in Haskell and compile them with GHC.  There are a few  
companies that do profitably sell Haskell as a service, notably  
Galois Connections, Aetion Technologies, Bluespec, Erlang Training  
and Consulting and OM Consulting.



 the "no further restriction" clause is
so broad


It has to be very broad, otherwise developers could bypass the
copyleft concept.


True.


the same clause (verbatim) in section 10 of the LGPL means the GPL
license is incompatible with the terms of the LGPL!


The LGPL permits relicensing of covered works under the GPL.  Without
that provision, it would indeed be incompatible with the GPL.


Are you referring to section 3 of the LGPL?  The drafters should have  
referenced section 10 there to avoid confusion: section 3 and section  
10 are clearly conflicting provisions with the only indicator of  
precedence being the order (section 3 comes before 10).  I should  
note that this would not be a problem for licenses assigned directly  
by the FSF since they could refer to other documentation that would  
clarify the discrepancy.  It might be a problem for an individual  
programmer who simply adopted the LGPL license and then had to  
enforce it.


Best regards,
Peter





___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Replacement for GMP: Update

2006-08-12 Thread John Goerzen
On 2006-08-10, Peter Tanski <[EMAIL PROTECTED]> wrote:
> Summary: I finally settled on modifying OpenSSL, since that would be  
> the easiest to work with under GHC's hood (plain C code, not C++).  I  
> have to:

Have you carefully investigated the OpenSSL license?  We in Debian have
had repeated problems since the OpenSSL license is, as written,
incompatible with the GPL (even linking to OpenSSL is incompatible with
the GPL).  I would hate to have a situation where all GHC-compiled
programs can't be under the GPL.

-- John

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Re[4]: Replacement for GMP: Update

2006-08-12 Thread skaller
On Sat, 2006-08-12 at 10:58 +0400, Bulat Ziganshin wrote:
> Hello skaller,
> 
> Saturday, August 12, 2006, 7:06:15 AM, you wrote:
> 
> > My point here is that actually this is a disastrous optimisation
> > in a  multi-processing environment, because in general, the
> > assignment of a pointer means the store isn't write once.
> 
> :)  all the variables rewritten are local to the function. _without_
> tail call optimization, we create a new stack frame for each recursive
> call. _with_ optimization, we just update vars in the first stack
> frame created because we know that these vars will not be reused after
> return from call

Yes, but this defeats the use of the kind of collector I attempted
to describe. 

The problem will occur if the 'stack' is aged: in that case
the sweep can miss the mutation and reap a live object.

-- 
John Skaller 
Felix, successor to C++: http://felix.sf.net

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re[4]: Replacement for GMP: Update

2006-08-12 Thread Bulat Ziganshin
Hello skaller,

Saturday, August 12, 2006, 7:06:15 AM, you wrote:

> My point here is that actually this is a disastrous optimisation
> in a  multi-processing environment, because in general, the
> assignment of a pointer means the store isn't write once.

:)  all the variables rewritten are local to the function. _without_
tail call optimization, we create a new stack frame for each recursive
call. _with_ optimization, we just update vars in the first stack
frame created because we know that these vars will not be reused after
return from call
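
In Haskell terms the situation is a strict tail-recursive loop like the
one below (a hypothetical example, not from the thread): the recursive
call is the last thing `go` does, so the vars in that first frame can
simply be overwritten on each iteration instead of a new frame being
pushed.

```haskell
-- Tail-recursive sum of 1..n: `go` calls itself in tail position,
-- so the loop variables (n and acc) can be updated in place on each
-- iteration, as described above, rather than allocating a new frame.
sumTo :: Int -> Int
sumTo n0 = go n0 0
  where
    go :: Int -> Int -> Int
    go 0 acc = acc
    go n acc = let acc' = acc + n
               in acc' `seq` go (n - 1) acc'  -- force acc so no thunk
                                              -- chain builds up
```

The `seq` matters: without it a lazy accumulator would allocate a chain
of thunks and undo the space benefit the frame reuse is meant to buy.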

> I haven't figured all the details. I am imagining a generational
> copying collector with a separate young heap for every
> thread, whilst the great grand mother heap is collected
> in the background.

it's close to the GHC scheme - 2-generational GC with 1st generation
local to the thread and having a default size of 256kb (tied to the size
of L2 caches of modern CPUs) while the 2nd generation is global for all
threads and collected at moments when the memory currently allocated
from the OS is exhausted. but currently 2nd-generation GCs don't run in
the background and can take as long as several seconds. it's definitely
hard to write games (or even web servers) with such an architecture! :)

so memory allocation is very fast. 1st-level GC is also fast because
it runs inside the L2 cache. access to "local variables" (i.e. data
structures allocated in the near past) is fast because they are all
inside the L2 cache. although Haskell doesn't support "auto" variables
which don't need GC and are freed automatically, this scheme
significantly lowers the cost of managing short-lived data structures

-- 
Best regards,
 Bulat  mailto:[EMAIL PROTECTED]

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Replacement for GMP: Update

2006-08-11 Thread Florian Weimer
* Peter Tanski:

> Quite right; my mistake: under the OpenSSL license a developer cannot
> mention features of the software in advertising materials, so the
> license grant of the GPL-OpenSSL program to the developer is void.
> The reason I mentioned "users" only was that in the particular
> problem we have here GHC does not use any other GPL programs (I think
> I am correct--readline is the unix version, not the GPL version,
> correct?)

On most systems, readline is GPLed.  There is a non-copyleft
reimplementation somewhere, but I don't think it's widely used.

> The advertising requirement in the OpenSSL license would certainly
> constitute a "further restriction" under GPLv2 section 6; the
> strange implication is that the "no further restriction" clause is
> so broad

It has to be very broad, otherwise developers could bypass the
copyleft concept.

> the same clause (verbatim) in section 10 of the LGPL means the GPL
> license is incompatible with the terms of the LGPL!

The LGPL permits relicensing of covered works under the GPL.  Without
that provision, it would indeed be incompatible with the GPL.
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Re[2]: Replacement for GMP: Update

2006-08-11 Thread skaller
On Fri, 2006-08-11 at 18:56 -0400, Peter Tanski wrote:
> John,
> 
> > After all on the average call where an object of that
> > size is free already it is a single array lookup, we have:
> >
> > (a) fetch pointer (one read)
> > (b) fetch next (one read)
> > (c) store next as current (one write)
> 
> This is true for memory access; it is not true for memory  
> allocation.

That's an *allocation* obtained by popping a suitable
block off a linked list (sorry should have explained better)

>   I do not know how malloc allocates memory on Windows but  
> on general POSIX systems the kernel uses a linked list and lots of  
> other management things to reduce fragmentation, such as KMEMSTAT.

Yes, but most allocations don't require a kernel call:
malloc is implemented in libc and suballocates an block
obtained from the kernel. Free'd blocks go on a free list
indexed by block size .. so the typical program will get
to a state where most allocations reduce to popping a list.

> Malloc may also block, which is something that you have more control  
> over in your own garbage collected heap.

Yes.

>  A really good explanation
> for the problem of rapidly allocating and deallocating temporary
> blocks of memory under 35kb is here:
> http://ridiculousfish.com/blog/archives/2006/05/16/36/ .

Eh? That explains why OSX is slower with 35K blocks because
it switches to kernel calls for larger allocations at that 
size .. the article doesn't say anything about smaller blocks.

I assume we're considering small blocks here: list nodes and
so on are small, and these would be the majority case in a
functional programming language.

> > A purely functional system -- one which does NOT convert
> > self tail calls into jumps and reuse storage -- can
> > perhaps be faster, since each thread can have its own
> > local arena to allocate from (without need of any write
> > barrier) .. however it isn't clear to me what the trade
> > off is between cache misses and parallelism.
> 
> That is interesting but I do not understand whether your mention of  
> self tail calls turned into jumps was low or high level. 

What I mean is the optimisation (which Felix does for functional
code) whereby you convert:

int f(int x) {
    int y = x + x;
    if (x > 100) return x;
    return f(y);
}

to

int f(int x) {
    int y;            /* declared before the label: a label must be
                         followed by a statement, not a declaration */
start:
    y = x + x;
    if (x > 100) return x;
    x = y;            /* assign argument */
    goto start;
}

This optimisation eliminates the need to spawn a new stack
frame, and will eliminate cache misses.

So it looks very efficient.

My point here is that actually this is a disastrous optimisation
in a  multi-processing environment, because in general, the
assignment of a pointer means the store isn't write once.

With write once store -- ignoring registers -- you can
collect lazily in a background thread. Although you're
not scanning all of memory .. new memory is allocated
as time goes on .. it doesn't matter. Any garbage you
find is certain to be garbage.

This is because you cannot miss a heap block in your sweep.
Even though new blocks are being allocated in parallel,
they can only obtain a pointer from the old picture
of the heap (implying the block is reachable), and,
the pointer in the old block from which it was obtained
cannot be erased: since constructed blocks are read only.

I haven't figured all the details. I am imagining a generational
copying collector with a separate young heap for every
thread, whilst the great grand mother heap is collected
in the background.

This means (a) mother collection, (b) young heap compaction 
both work on a per thread basis independently
of other threads and without any synchronisation or locking.

What requires synchronisation is aging. Moving objects
out of the child heaps into a parent needs to be
serialised, and the start of a new sweep of the parent
heap must twiddle a lock or something to ensure it
has a coherent snapshot of the heap at some point
of this serialisation (however the picture of the heap
only needs to be synchronised once).

for example if you have a linked list of the mother heap
blocks, you have to serialise pushing stuff onto it during
ageing .. and you have to get a valid pointer to start the
sweep from.

The point is, this scheme reduces the need for serialisation
to the infrequent need to age objects or start a new
sweep of the mother heap: the actual sweep, raw allocations,
and even young heap compaction, all occur in separate threads and
don't interfere with each other.

Thus, it seems a lot more scalable to a multi-processing
environment to actually have LOW LEVEL purely functional
paradigm: no store can be written more than once.

More precisely, pointers are initialised to NULL
and may be assigned a non-NULL heap pointer once, but a non-NULL
pointer can never be erased.

Which means: converting tail calls to a loop and
using mutable store is a nono

Re: Replacement for GMP: Update

2006-08-11 Thread skaller
On Fri, 2006-08-11 at 21:08 +0100, Brian Hulley wrote:
> Florian Weimer wrote:
> > This is the offending part:
> >
> > * 3. All advertising materials mentioning features or use of this
> > software
> > *must display the following acknowledgement:
> > *"This product includes cryptographic software written by
> > * Eric Young ([EMAIL PROTECTED])"
> > *The word 'cryptographic' can be left out if the rouines from the
> > library
> > *being used are not cryptographic related :-).

> Also, what if your company made a T-shirt with a character from a computer 
> game you're developing using Haskell - such legal smallprint would totally 
> spoil the effect... ;-)

Does that T-shirt mention the OpenSSL library?

-- 
John Skaller 
Felix, successor to C++: http://felix.sf.net

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Re[2]: Replacement for GMP: Update

2006-08-11 Thread Peter Tanski

John,


After all on the average call where an object of that
size is free already it is a single array lookup, we have:

(a) fetch pointer (one read)
(b) fetch next (one read)
(c) store next as current (one write)


This is true for memory access; it is not true for memory
allocation.  I do not know how malloc allocates memory on Windows but
on general POSIX systems the kernel uses a linked list and lots of
other management things to reduce fragmentation, such as KMEMSTAT.
Malloc may also block, which is something that you have more control
over in your own garbage collected heap. A really good explanation
for the problem of rapidly allocating and deallocating temporary
blocks of memory under 35kb is here:
http://ridiculousfish.com/blog/archives/2006/05/16/36/ .


In any case, Simon Marlow had previously mentioned that alloc (from  
GHC's heap) is faster than malloc.  He is almost certainly correct,  
although I hope the difference will not be that great and the only  
thing I have to worry about is ForeignPtr.  We shall see whether  
malloc-memory makes a difference in the benchmarks.



A purely functional system -- one which does NOT convert
self tail calls into jumps and reuse storage -- can
perhaps be faster, since each thread can have its own
local arena to allocate from (without need of any write
barrier) .. however it isn't clear to me what the trade
off is between cache misses and parallelism.


That is interesting but I do not understand whether your mention of  
self tail calls turned into jumps was low or high level.  From the  
context it seems as if you are talking about a high level  
implementation; each function running in a separate thread.  GHC's  
RTS does use many separate threads (the RTS is threaded by default  
for the latest version, 6.6).


As for turning self tail calls into jumps at the low level, GHC does  
do this through C-- (the GHC implementation of C-- is called Cmm).  I  
believe that is both faster and more memory efficient than a high  
level threaded system.  Philosophically speaking, even if Simon  
Peyton-Jones developed Cmm to solve the problem of efficient  
functional computations, Cmm has turned the research of Haskell from  
research on a computer language to research on a system of  
computation.  (Maybe that is what he meant when some time ago he  
wrote John Meacham and said that they (the GHC researchers)  
considered compiling via C a dead end.)  A language can be anything:  
all it requires is a means of being translated into machine code;  
pseudo-intelligent and advanced compiler systems such as gcc, JHC for  
Haskell or OCaml for the Caml version of ML, may translate  
programming languages into machine code but the underlying  
computation remains largely sequential.  The curious thing about GHC- 
Haskell is that through the prism of Cmm, which enforces such things  
as immutable variables and recursion right at the machine level,  
Haskell is less a language of translation to sequential machine code  
and more a description of a computational model.  If you still think  
I am wrong about this, consider the possibility that Haskell with Cmm
is a modern research project in the same spirit that motivated Lisp:
a different model of computation.


Best regards,
Peter

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Replacement for GMP: Update

2006-08-11 Thread Peter Tanski

Brian,

Therefore I'd recommend that licenses for code used by GHC runtime  
should be either BSD or public domain.


I agree.  I was working on a rewrite of OpenSSL's BN from scratch-- 
maybe a rewrite of GMP would be better--but that is a huge  
undertaking for no other reason than these are big, complex  
projects.  I will have to test Crypto++ and Botan to see if they are  
comparable in a Haskell context (both are written in C++; Crypto++ is  
essentially public domain while Botan has a BSD2 license--reproduce  
copyright in distributions of binary and source code).  I will have  
to write least-common-multiple, bitwise operators and conversions to  
and from floating point representations.


If the FFI was used for bignum then (talking about Windows OS for  
the moment) the bignum implementation could just be supplied as a C  
DLL, perhaps even several different C DLL's for people to choose  
which one they wanted to distribute with their program based on  
speed vs licencing issues. Eg if GMP was in a DLL then it would be  
sufficient to just supply gmp.dll + the gmp LGPL as two files along  
with the app binary and licensing issues would disappear afaiu.  
Another advantage of this direction would be that any slowness in  
the FFI would have to be ironed out, leading to a faster FFI which  
would be good for other things too eg matrix libs, graphics libs  
etc. Finally, separating bignum out of GHC runtime would make GHC  
runtime leaner therefore (hopefully)easier to maintain.


I am testing two versions of GMP against the current internal  
version: one using FFI and another with the ForeignPtrs written in  
C--.  If either is comparable to the internal version that is  
definitely a preferable solution for flexibility.  I have to be very  
picky, though: Simon Marlow, Simon Peyton-Jones and the rest of the  
GHC Team are primarily interested in performance and the integrity of  
the RTS (no one would be happy if the RTS broke for bad FFI calls).


Thanks for the encouragement.

Best regards,
Peter Tanski
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Re[2]: Replacement for GMP: Update

2006-08-11 Thread Peter Tanski

Simon PJ and Bulat,


[the] ForeignPtr
solution [has] gotten a lot cheaper in GHC 6.6 than it used to be, so
it's worth trying.  A merit of the approach is that is avoids fiddling
with the bignum allocator at all.


I actually did not know that until today; I have tried to keep up  
with the rapid changes going on but until Simon Marlow posted the FFI  
syntax patch on cvs-ghc-request I had not read into it that much.  It  
won't be too much trouble for me to do a bare FFI binding to GMP or  
another library (people seem to be having problems with OpenSSL's  
license) but what I have been doing still applies: writing bitwise  
operators and cleaning things up.  I don't know how much the  
indirection of a FFI binding would degrade the speed compared to a  
direct C-- binding (you have an extra function call with FFI); it  
should not be any more costly than indirection through two pointers.   
This will be quick: I will set up a GMP-FFI binding as a speed- 
reference, for starters.
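
For reference, the shape of such an FFI binding is just a foreign import
declaration. The sketch below binds a libm function as a stand-in rather
than GMP itself (a real GMP binding would import entry points such as
__gmpz_add and also marshal the mpz_t operands), but the ccall mechanism
whose cost is under discussion looks the same:

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

-- Illustrative only: libm's sqrt as a stand-in for a GMP entry point.
-- The extra function call this indirection costs is the overhead
-- being weighed against a direct C-- binding.
foreign import ccall unsafe "math.h sqrt"
  c_sqrt :: Double -> Double
```

The `unsafe` annotation skips the RTS bookkeeping for calls that cannot
call back into Haskell or block, which keeps the per-call overhead close
to a plain C function call.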


Best regards,
Peter



___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Replacement for GMP: Update

2006-08-11 Thread Brian Hulley

Florian Weimer wrote:

This is the offending part:

* 3. All advertising materials mentioning features or use of this
software
*must display the following acknowledgement:
*"This product includes cryptographic software written by
* Eric Young ([EMAIL PROTECTED])"
*The word 'cryptographic' can be left out if the rouines from the
library
*being used are not cryptographic related :-).


I would find this a real problem. Not because I wouldn't gladly acknowledge 
contributions in the text of the license to be distributed with the binary, 
or in the Help->About dialog box in the program, but because I wouldn't want 
to have to list contributions in "all advertising materials" eg an 
advertisement for one's program in a magazine etc - it would just make the 
product look amateurish as well as giving undue importance to just one of 
the many people whose work it was making use of.
Also, what if your company made a T-shirt with a character from a computer 
game you're developing using Haskell - such legal smallprint would totally 
spoil the effect... ;-)


Therefore I'd recommend that licenses for code used by GHC runtime should be 
either BSD or public domain.


If the FFI was used for bignum then (talking about Windows OS for the 
moment) the bignum implementation could just be supplied as a C DLL, perhaps 
even several different C DLL's for people to choose which one they wanted to 
distribute with their program based on speed vs licencing issues. Eg if GMP 
was in a DLL then it would be sufficient to just supply gmp.dll + the gmp 
LGPL as two files along with the app binary and licensing issues would 
disappear afaiu. Another advantage of this direction would be that any 
slowness in the FFI would have to be ironed out, leading to a faster FFI 
which would be good for other things too eg matrix libs, graphics libs etc. 
Finally, separating bignum out of GHC runtime would make GHC runtime leaner 
therefore (hopefully)easier to maintain.


Of course this depends on whether or not an FFI implementation of bignum 
would be fast enough compared to directly coding it in the runtime.


(If you choose the DLL method I'd recommend incorporating the GHC version 
number into the name of the DLL to avoid DLL conflicts eg ghc-6.4.2 would 
use gmp-6.4.2.dll etc)


Anyway good luck with whatever way you choose,

Regards, Brian.
--
Logic empowers us and Love gives us purpose.
Yet still phantoms restless for eras long past,
congealed in the present in unthought forms,
strive mightily unseen to destroy us.

http://www.metamilk.com 


___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Replacement for GMP: Update

2006-08-11 Thread Peter Tanski

Florian,


This is the offending part:

 * 3. All advertising materials mentioning features or use of this  
software

 *must display the following acknowledgement:
 *"This product includes cryptographic software written by
 * Eric Young ([EMAIL PROTECTED])"
 *The word 'cryptographic' can be left out if the rouines from  
the library

 *being used are not cryptographic related :-).

It's generally believed that this is a further restriction in the
sense of section 6 of the GPL (version 2).


In any case, I think it would be more of a restriction to someone
*using* the OpenSSL program, not a developer.


It's a problem for a developer who wants to use a GPLed library
written by someone else, too.


Quite right; my mistake: under the OpenSSL license a developer cannot  
mention features of the software in advertising materials, so the  
license grant of the GPL-OpenSSL program to the developer is void.   
The reason I mentioned "users" only was that in the particular  
problem we have here GHC does not use any other GPL programs (I think  
I am correct--readline is the unix version, not the GPL version,  
correct?) so until the developer compiles a Haskell program with GHC  
(with OpenSSL) *and* that program uses a GPL program, the Haskell  
developer is still able to transfer a valid license to users.


The way the OpenSSL FAQ stated the problem, the implication was that  
there was specific mention of OpenSSL in a GPL license.  The  
advertising requirement in the OpenSSL license would certainly  
constitute a "further restriction" under GPLv2 section 6; the strange  
implication is that the "no further restriction" clause is so broad  
the same clause (verbatim) in section 10 of the LGPL means the GPL  
license is incompatible with the terms of the LGPL!  It's all very  
touchy.


Best regards,
Peter
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Replacement for GMP: Update

2006-08-11 Thread Florian Weimer
* Peter Tanski:

> On the other hand, the OpenSSL FAQ at
> <http://www.openssl.org/support/faq.html#LEGAL2> mentions that some "GPL"
> programs do not allow binary combination (static linking) or interoperation
> (dynamic linking) with OpenSSL.  Honestly I have not seen any "GPL" licenses
> like this.  The GNU GPL version 2, at <http://www.gnu.org/licenses/gpl.html>,
> does not mention OpenSSL, nor does version 1, nor does the
> GNU LGPL, at .

This is the offending part:

 * 3. All advertising materials mentioning features or use of this software
 *must display the following acknowledgement:
 *"This product includes cryptographic software written by
 * Eric Young ([EMAIL PROTECTED])"
 *The word 'cryptographic' can be left out if the rouines from the library
 *being used are not cryptographic related :-).

It's generally believed that this is a further restriction in the
sense of section 6 of the GPL (version 2).

> In any case, I think it would be more of a restriction to someone
> *using* the OpenSSL program, not a developer.

It's a problem for a developer who wants to use a GPLed library
written by someone else, too.
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Replacement for GMP: Update

2006-08-11 Thread Simon Marlow

Einar Karttunen wrote:

On 10.08 11:16, Peter Tanski wrote:


Paragraph 6 of the OpenSSL (1998-2005) license states that:

* 6. Redistributions of any form whatsoever must retain the following
*acknowledgment:
*"This product includes software developed by the OpenSSL Project
*for use in the OpenSSL Toolkit (http://www.openssl.org/)".

All developers would have to do is include the acknowledgment stated  
above.



I think this is not bad for specific applications, but forcing this
upon all code compiled by GHC would be bad. I think the compiler
should not link applications by default to things that force
license related things. 


Most packages (including the base package) already force you to do 
license-related things when you distribute binaries:



- Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.


The OpenSSL license is no worse in this respect.

Cheers,
Simon
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


RE: Re[2]: Replacement for GMP: Update

2006-08-11 Thread Simon Peyton-Jones
The issue isn't that malloc is slow, but rather that we don't know when
to call 'free'.   To make sure that we do call free, we must make GHC's
garbage collector remember when it drops the reference to the object,
and then call free.  That is what Peter refers to as the ForeignPtr
solution.  It's gotten a lot cheaper in GHC 6.6 than it used to be, so
it's worth trying.  A merit of the approach is that it avoids fiddling
with the bignum allocator at all.
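A toy sketch in C of the mechanism Simon describes, assuming only that a finalizer is a function pointer the collector runs when the last reference disappears (all names here are invented for illustration; GHC's real machinery is ForeignPtr with an attached finalizer):

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model of the "ForeignPtr solution": the collector remembers a
 * finalizer for each foreign object and runs it (here, free) exactly once
 * when the reference is dropped.  Invented names, for illustration only. */
typedef void (*finalizer_fn)(void *);

typedef struct {
    void *obj;          /* the malloc'd bignum payload */
    finalizer_fn fin;   /* what to run when the GC drops the reference */
    int live;
} foreign_ref;

foreign_ref make_foreign(void *obj, finalizer_fn fin) {
    foreign_ref r = { obj, fin, 1 };
    return r;
}

/* Called by the (toy) collector when the last reference disappears. */
void drop_foreign(foreign_ref *r) {
    if (r->live) {
        r->fin(r->obj);   /* e.g. free the malloc'd bignum */
        r->live = 0;      /* guard against running the finalizer twice */
    }
}
```

The point of the pattern is exactly what the paragraph says: the mutator never calls free; the collector does, so the bignum allocator itself stays untouched.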

Simon

| -Original Message-
| From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
| On Behalf Of skaller
| Sent: 11 August 2006 09:21
| To: Bulat Ziganshin
| Cc: Peter Tanski; glasgow-haskell-users@haskell.org
| Subject: Re: Re[2]: Replacement for GMP: Update
| 
| On Fri, 2006-08-11 at 09:34 +0400, Bulat Ziganshin wrote:
| 
| > why you say that ForeignPtr is slow? afaik, malloc/free is slow, but
| > starting from ghc 6.6 speed of _using_ ForeignPtr is the same as for
Ptr
| 
| Er, malloc/free is not slow .. it's very fast. I implemented
| an arena-based allocator (one which just increments a pointer)
| and it was slower than malloc .. ok, so my implementation was
| naive, but still, that does make malloc pretty good.
| 
| After all, on the average call, where an object of that size is
| already free, it is a single array lookup; we have:
| 
| (a) fetch pointer (one read)
| (b) fetch next (one read)
| (c) store next as current (one write)
| 
| It is very hard to beat that. Indeed, the whole cost
| of this technique is probably in the mutex
| based locking that wraps it on a multi-processor.
| 
| A purely functional system -- one which does NOT convert
| self tail calls into jumps and reuse storage -- can
| perhaps be faster, since each thread can have its own
| local arena to allocate from (without need of any write
| barrier) .. however it isn't clear to me what the trade
| off is between cache misses and parallelism.
| 
| Doesn't Haskell do that optimisation?
| 
| It is of course hard to test this today without a
| fairly expensive box with enough CPUs on the one bus.
| 
| --
| John Skaller 
| Felix, successor to C++: http://felix.sf.net
| 
| ___
| Glasgow-haskell-users mailing list
| Glasgow-haskell-users@haskell.org
| http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Re[2]: Replacement for GMP: Update

2006-08-11 Thread skaller
On Fri, 2006-08-11 at 09:34 +0400, Bulat Ziganshin wrote:

> why you say that ForeignPtr is slow? afaik, malloc/free is slow, but
> starting from ghc 6.6 speed of _using_ ForeignPtr is the same as for Ptr

Er, malloc/free is not slow .. it's very fast. I implemented
an arena-based allocator (one which just increments a pointer)
and it was slower than malloc .. ok, so my implementation was
naive, but still, that does make malloc pretty good.

After all, on the average call, where an object of that size is
already free, it is a single array lookup; we have:

(a) fetch pointer (one read)
(b) fetch next (one read)
(c) store next as current (one write)

It is very hard to beat that. Indeed, the whole cost
of this technique is probably in the mutex 
based locking that wraps it on a multi-processor.
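The three-step fast path above can be sketched as a free list in C; this is an illustrative toy (one size class, no locking), not any real allocator's code:

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal free-list sketch of the malloc fast path: when a block of the
 * right size class is already on the list, allocation is (a) fetch the
 * list head, (b) fetch its next pointer, (c) store next as the new head. */
typedef struct block { struct block *next; } block;

static block *free_list = NULL;

static void *alloc_block(size_t sz) {
    if (free_list) {
        block *b = free_list;      /* (a) fetch pointer  */
        free_list = b->next;       /* (b) fetch next, (c) store as head */
        return b;
    }
    /* slow path: nothing recycled yet, fall back to malloc */
    return malloc(sz < sizeof(block) ? sizeof(block) : sz);
}

static void free_block(void *p) {
    block *b = p;
    b->next = free_list;           /* push back onto the free list */
    free_list = b;
}
```

Freeing and then reallocating the same size returns the recycled block, which is the "very hard to beat" case the message describes; a real multi-processor allocator adds the mutex around exactly these three steps.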

A purely functional system -- one which does NOT convert
self tail calls into jumps and reuse storage -- can
perhaps be faster, since each thread can have its own
local arena to allocate from (without need of any write
barrier) .. however it isn't clear to me what the trade
off is between cache misses and parallelism.

Doesn't Haskell do that optimisation?

It is of course hard to test this today without a
fairly expensive box with enough CPUs on the one bus.

-- 
John Skaller 
Felix, successor to C++: http://felix.sf.net

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re[2]: Replacement for GMP: Update

2006-08-10 Thread Bulat Ziganshin
Hello Peter,

Friday, August 11, 2006, 4:00:40 AM, you wrote:

> OpenSSL's BN library is primarily tuned to support cryptography,
> particularly the generation of very large primes for public key  
> cryptosystems.  It is possible to separate the BN library out (I am  
> having some success there already).  It is also possible to use the  
> library separately from Haskell using ForeignPtr; essentially doing  
> everything through Haskell's FFI.  I have honestly not benchmarked a  
> FFI-ForeignPtr interface against the current internal-GMP  
> implementation, partly because the overhead required to use  
> ForeignPtr and the availability of garbage-collected memory for GMP  
> indicate that an internal GHC Bignum library would clearly be  
> faster.

Why do you say that ForeignPtr is slow? AFAIK, malloc/free is slow, but
starting from GHC 6.6 the speed of _using_ a ForeignPtr is the same as for Ptr.

-- 
Best regards,
 Bulat  mailto:[EMAIL PROTECTED]

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Replacement for GMP: Update

2006-08-10 Thread Reilly Hayes
I have a Mac as well, but it is an intel one.  GHC does NOT support dynamic 
linking of Haskell libraries on intel macs, but C libraries like readline & 
GMP dynamically link just fine.  For example:

$ otool -L /usr/local/ghc/lib/ghc-6.5/ghc-6.5
/usr/local/ghc/lib/ghc-6.5/ghc-6.5:
        /usr/local/lib/libreadline.5.0.dylib (compatibility version 5.0.0, current version 5.0.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 88.1.3)
        GMP.framework/Versions/A/GMP (compatibility version 7.0.0, current version 7.0.0)

On Aug 10, 2006, at 7:15 PM, Alec Berryman wrote:
> ...

Reilly Hayes
[EMAIL PROTECTED]

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Replacement for GMP: Update

2006-08-10 Thread Peter Tanski

Reilly,


...  this shouldn't prohibit linking
GMP in dynamically, should it?  It's just a C library and GCC should
happily generate relocatable code.  As a dynamically linked library,
there should be no tainting issues to worry about even if the
dynamically linked code is shipped with the executable.

Am I missing something?


Not at all.  GHC builds the GMP library only if it is not already  
available on the system.  On my Mac OS X computer it uses a dynamic  
library.  I have not tried using gmp.dll on Windows since I have not  
built GHC on my Windows computer (it is a bit slow--a 600MHz P3).   
But the dynamic library form of GMP only solves the licensing problem  
(admittedly, for my purposes the worst of the bunch).  It should be  
easy to change GMP's build settings so GHC is distributed with a  
dynamic GMP library.


The other problem is that GMP has a mechanism to let the user  
determine its memory allocator, with the caveat that only one  
allocator can be used by a single program.  GHC configures GMP to use  
GHC's RTS garbage collector for allocation, so GHC-compiled programs  
can't use GMP separately.  (This would not be such a big problem for  
general programs, but C-Haskell cryptographic or scientific programs  
that might benefit from GMP's additional functionality would suffer.)
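The single-allocator constraint can be sketched as process-global allocation hooks, in the style of GMP's documented `mp_set_memory_functions`; the `xb_*` names are invented for illustration and this is not GMP's actual code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Sketch of a GMP-style replaceable allocator: the library routes every
 * allocation through process-global function pointers, so whichever client
 * installs its hooks last wins for the whole process.  That is why a
 * GHC-configured GMP and an independent C client cannot each have their
 * own allocator. */
typedef void *(*xb_alloc_fn)(size_t);
typedef void  (*xb_free_fn)(void *);

static void *xb_default_alloc(size_t n) { return malloc(n); }
static void  xb_default_free(void *p)   { free(p); }

static xb_alloc_fn xb_alloc = xb_default_alloc;
static xb_free_fn  xb_free  = xb_default_free;

/* One global switch per process; NULL restores the defaults. */
void xb_set_memory_functions(xb_alloc_fn a, xb_free_fn f) {
    xb_alloc = a ? a : xb_default_alloc;
    xb_free  = f ? f : xb_default_free;
}

/* Library internals allocate only through the hooks: */
void *xb_new_limb_block(size_t nlimbs) { return xb_alloc(nlimbs * sizeof(long)); }
void  xb_free_limb_block(void *p)      { xb_free(p); }
```

In GHC's case the hooks point into the RTS garbage collector, so a C program linked against the same GMP image cannot install plain malloc/free at the same time.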


On a side note, if you have been reading this user-list recently it  
seems that programmers (including myself, I guess) do not want to  
have to package a dynamic library (GMP) with programs compiled with  
GHC--a particularly irksome task if your Haskell program doesn't even  
*use* Integer.  Not only do users have to package the separate dll,  
they also have to package a notice of the GMP copyright along with  
the binary.  Just today Einar Karttunen mentioned that:



*"This product includes software developed by the OpenSSL Project
*for use in the OpenSSL Toolkit (http://www.openssl.org/)".

All developers would have to do is include the acknowledgment stated
above.

...
ps. personally I don't think the advertising clause is bad, but
I think it is bad to force it on other users.


Einar does have a good point, here.  Personally speaking, such  
packaging and licensing stuff is o.k. for free software but for a  
clean commercial distribution it would be a bad thing; a reason to  
choose not to use GHC (or nhc98, for that matter).


Best regards,
Peter
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Replacement for GMP: Update

2006-08-10 Thread Alec Berryman
Reilly Hayes on 2006-08-10 18:36:49 -0700:

> There's one thing I don't entirely understand about the GMP problem.   
> I understand that there are some limitations on GHC's ability to  
> generate relocatable (and therefore dynamically linkable) code on x86  
> (a register allocation problem related to the mangler if I recall the  
> comments in the code correctly).  But this shouldn't prohibit linking  
> GMP in dynamically, should it?  It's just a C library and GCC should  
> happily generate relocatable code.  As a dynamically linked library,  
> there should be no tainting issues to worry about even if the  
> dynamically linked code is shipped with the executable.
> 
> Am I missing something?

No, the LGPL doesn't mandate source redistribution when you redistribute
a binary that is dynamically linked to a LGPL-licensed library.  If GHC
did support dynamically linking programs, it wouldn't be an issue, but
GHC only supports that on OS X.  

I was wondering something similar - is it really easier to replace the
functionality and speed of GMP than to extend GHC's dynamic library
support to other platforms?



signature.asc
Description: Digital signature
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Replacement for GMP: Update

2006-08-10 Thread Reilly Hayes
There's one thing I don't entirely understand about the GMP problem.  I 
understand that there are some limitations on GHC's ability to generate 
relocatable (and therefore dynamically linkable) code on x86 (a register 
allocation problem related to the mangler if I recall the comments in the 
code correctly).  But this shouldn't prohibit linking GMP in dynamically, 
should it?  It's just a C library and GCC should happily generate 
relocatable code.  As a dynamically linked library, there should be no 
tainting issues to worry about even if the dynamically linked code is 
shipped with the executable.

Am I missing something?

On Aug 10, 2006, at 5:00 PM, Peter Tanski wrote:
> ...

Re: Replacement for GMP: Update

2006-08-10 Thread Peter Tanski

Thorkil,

I would like to mention a few things that I have not seen  
discussed: Clearly,
using an existing library unmodified would be preferable: New  
developments,
error corrections, documentation, wide exposure, all of these  
things would be

available.


Agreed.  Unfortunately for us it seems that, apart from GMP, none of  
the fully developed and fast libraries available have all the  
functionality Haskell requires: ARPREC (and its predecessor, MPFUN)  
lack all binary operators (AND, OR, XOR, left-shift, right-shift,  
bit-flip, word-flip); OpenSSL's BN library lacks AND, OR, XOR and the  
least common multiple operation (lcm), as do the libraries in Botan  
and Crypto++.  I did not mention this before, but they also lack  
conversion functions to and from floating point representations.  So  
it would not be possible to use these in GHC and still maintain the  
same functionality without writing those conversion functions  
either separately (slow) or in the libraries themselves (modification).
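For the missing lcm specifically, the operation falls straight out of gcd, which is why its absence from a bignum library is more an inconvenience than a blocker; a machine-word sketch (not bignum code):

```c
#include <assert.h>

/* lcm(a,b) = a / gcd(a,b) * b.  Word-sized versions for illustration;
 * a bignum library would express the same identity over its own type. */
unsigned long gcd_ul(unsigned long a, unsigned long b) {
    while (b != 0) {            /* Euclid's algorithm */
        unsigned long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

unsigned long lcm_ul(unsigned long a, unsigned long b) {
    if (a == 0 || b == 0)
        return 0;
    return a / gcd_ul(a, b) * b;   /* divide first to delay overflow */
}
```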


There are libraries that have the functionality we require: MIRACL  
(at http://indigo.ie/~mscott/) is free for educational use but  
requires a premium for commercial use; LibTomMath is essentially beta  
(version 0.38), slow (less than half as fast as OpenSSL, which is  
equivalent in speed to GMP) and from my experience when putting it  
together, a little hacked (after all, it is a beta version); MPI  
(from http://www.cs.dartmouth.edu/~sting/mpi/) is a good library and  
has all the math functions but lacks the binary operations and has  
not been updated since 2003; maybe it has not required updating (it  
was originally written in 1998).  I have read that MPI is not as fast  
as LibTomMath (I will do a real comparison if anyone is interested).   
I do know from timing tests I did run in the MPI distribution that  
MPI tends to operate better on small integers (128-256 bits).   
LibTomMath and MPI are both essentially public domain.  I have heard  
that LibTomMath is used as the Tcl bignum library; I do not know whether  
that library supports Perl's bignum (Math::BigInt) but I could look into it.


Bignum libraries are actually fairly widely available but few are  
portable across a wide variety of domains: Microsoft's .NET bignum  
library and Apple's vecLib BigNum library, for example.  (Apple's  
vecLib vBigNum is also limited since it offers big integers only at  
256, 512 and 1024 bits and it does not provide all the mathematical  
operations we require.)


You may have noticed from reading the discussions so far that an  
internal library *is* a possibility.


I have looked briefly at the OpenSSL Bignum library and in the  
areas of memory
management, but also error handling, it seems clearly intertwined  
to some
extent with OpenSSL in ways which would appear to rule out the  
direct use of

this library for GHC Integers.


OpenSSL's BN library is primarily tuned to support cryptography,  
particularly the generation of very large primes for public key  
cryptosystems.  It is possible to separate the BN library out (I am  
having some success there already).  It is also possible to use the  
library separately from Haskell using ForeignPtr; essentially doing  
everything through Haskell's FFI.  I have honestly not benchmarked a  
FFI-ForeignPtr interface against the current internal-GMP  
implementation, partly because the overhead required to use  
ForeignPtr and the availability of garbage-collected memory for GMP  
indicate that an internal GHC Bignum library would clearly be  
faster.  Many people have given good arguments for using an  
FFI-interface to such a library (for one, GMP supports dll's and posix  
dynamic libraries, which would eliminate the licensing issue), so I  
think before I go on I will set up a benchmark to try the two out.   
The one thing I cannot do without taking GMP out of the GHC compiler  
is a side-to-side comparison of GMP-FFI and GMP-internal because GMP  
can use only one memory allocator at a time: either GHC's runtime  
system Garbage Collector or malloc.



However, considering the advantages of using
an existing library unchanged, we might consider another  
possibility: Working

with the OpenSSL people to modify their library to allow the sort of
interfacing that is needed for its direct and efficient use in GHC.  
While, of

course, retaining its value as part of OpenSSL.


I haven't looked at that for reasons of time--perhaps this is my  
personal target and perhaps it is for the benefit of the next GHC  
release on 26 August: it may take a long time to work with OpenSSL to  
modify their library.  It might be worth looking into, just to cover  
all bases.  The problem for me is that I seem to be doing all the  
work, albeit rather slowly; I simply do not have the time to follow  
every possibility.


(And way further back: Have we tried to discuss the LGPL licence of  
GMP with the authors? I am not really into all these matters, sorry  
if this doesn't make sense.)

Re: Replacement for GMP: Update

2006-08-10 Thread Thorkil Naur
Hello,

On Thursday 10 August 2006 07:31, Peter Tanski wrote:
...
> Summary: I finally settled on modifying OpenSSL, since that would be  
...

Being a heavy user of Haskell Integers, I have followed this development with 
great interest. Although your decision has its drawbacks, it could very well 
be the best, all things considered.

I would like to mention a few things that I have not seen discussed: Clearly, 
using an existing library unmodified would be preferable: New developments, 
error corrections, documentation, wide exposure, all of these things would be 
available.

I have looked briefly at the OpenSSL Bignum library and in the areas of memory 
management, but also error handling, it seems clearly intertwined to some 
extent with OpenSSL in ways which would appear to rule out the direct use of 
this library for GHC Integers. However, considering the advantages of using 
an existing library unchanged, we might consider another possibility: Working 
with the OpenSSL people to modify their library to allow the sort of 
interfacing that is needed for its direct and efficient use in GHC. While, of 
course, retaining its value as part of OpenSSL.

(And way further back: Have we tried to discuss the LGPL licence of GMP with 
the authors? I am not really into all these matters, sorry if this doesn't 
make sense.)

Failing that, I would suggest considering the development of the modified 
library to a form that would allow independent use, apart from its use in 
GHC. This would add valuable possibilities to your options when choosing the 
precise mixture of Haskell and, perhaps, raw C code that best balances your 
performance desires and needs for convenience.

I wish you the best of luck with your work.

Regards
Thorkil
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Replacement for GMP: Update

2006-08-10 Thread Peter Tanski

Einar,


*"This product includes software developed by the OpenSSL Project
*for use in the OpenSSL Toolkit (http://www.openssl.org/)".

All developers would have to do is include the acknowledgment stated
above.


I think this is not bad for specific applications, but forcing this
upon all code compiled by GHC would be bad. I think the compiler
should not link applications by default to things that force
license related things.

I think this is one reason GMP is being replaced.

ps. personally I don't think the advertising clause is bad, but
I think it is bad to force it on other users.


You may be right.  The licensing problem with GHC, as I understood  
it, is summed up at .  LGPL is very restrictive.


As I have been working on separating BN out of the main OpenSSL  
distribution, renaming symbols, and generally reforming it into a  
custom, stand-alone library for GHC, I could take it one step further  
library from scratch may take some time but I will give it a shot and  
see if I can't get better benchmarks.  The downside is that I would  
have more incentive to remove some Cryptography-based cruft, such as  
BN_nnmod, BN_mod_add, BN_mod_sub and the BN-random routines, as these  
are unnecessary for Prelude and GHC.


Best regards,
Peter
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Replacement for GMP: Update

2006-08-10 Thread Einar Karttunen
On 10.08 11:16, Peter Tanski wrote:
> Paragraph 6 of the OpenSSL (1998-2005) license states that:
> 
>  * 6. Redistributions of any form whatsoever must retain the following
> *acknowledgment:
> *"This product includes software developed by the OpenSSL Project
> *for use in the OpenSSL Toolkit (http://www.openssl.org/)".
> 
> All developers would have to do is include the acknowledgment stated  
> above.

I think this is not bad for specific applications, but forcing this
upon all code compiled by GHC would be bad. I think the compiler
should not link applications by default to things that force
license related things. 

I think this is one reason GMP is being replaced.

ps. personally I don't think the advertising clause is bad, but
I think it is bad to force it on other users.

- Einar Karttunen
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Replacement for GMP: Update

2006-08-10 Thread Peter Tanski

Einar,

In my previous email I wrote something potentially confusing (really  
a typo):



For developers (commercial or open source), the OpenSSL license  
only mentions redistribution of the OpenSSL code in binary form  
(paragraph 2).  In this context "binary form" means the complete  
program binary, not partial binary as with statically linking to a  
library, so developers of GHC programs would *not* have to include  
the whole OpenSSL&SSLeay license in their source code.


I meant in their code, not "source code" -- source or binary.  I hope  
that helps.


Best regards,
Peter
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Replacement for GMP: Update

2006-08-10 Thread Peter Tanski

Einar,


Summary: I finally settled on modifying OpenSSL, since that would be
the easiest to work with under GHC's hood (plain C code, not C++).  I
have to:


Does this cause license problems?

I think OpenSSL license had the advertising clause which means
problems if Haskell programs want to link against GPLed
libraries too.


The relevant portion of the OpenSSL license (reproduced in all source  
code and documentation) is:


 * The licence and distribution terms for any publically available version or
 * derivative of this code cannot be changed.  i.e. this code cannot simply be
 * copied and put under another distribution licence
 * [including the GNU Public Licence.]

(NOTE: this section is actually from the original SSLeay (1995-1998)  
license, but applies because OpenSSL is distributed under a dual  
license.)


Otherwise the OpenSSL license is a BSD-style license that gives the  
authors credit (BSD2 and BSD3 do not require names in all the source  
code and documentation, only the distribution).


The copying restriction means that I cannot simply take the BN  
library out of the OpenSSL distribution and put that code under the  
GHC license; it does not mean that I cannot distribute the OpenSSL  
code (still under the OpenSSL license) along with GHC code.


For developers (commercial or open source), the OpenSSL license only  
mentions redistribution of the OpenSSL code in binary form (paragraph  
2).  In this context "binary form" means the complete program binary,  
not partial binary as with statically linking to a library, so  
developers of GHC programs would *not* have to include the whole  
OpenSSL&SSLeay license in their source code.


Paragraph 6 of the OpenSSL (1998-2005) license states that:

 * 6. Redistributions of any form whatsoever must retain the following
*acknowledgment:
*"This product includes software developed by the OpenSSL Project
*for use in the OpenSSL Toolkit (http://www.openssl.org/)".

All developers would have to do is include the acknowledgment stated  
above.


On the other hand, the OpenSSL FAQ at  
<http://www.openssl.org/support/faq.html#LEGAL2> mentions that some "GPL"  
programs do not allow binary combination (static linking) or interoperation  
(dynamic linking) with OpenSSL.  Honestly I have not seen any "GPL" licenses  
like this.  The GNU GPL version 2, at <http://www.gnu.org/licenses/gpl.html>,  
does not mention OpenSSL, nor does version 1, nor does the GNU LGPL, at .   
In any case, I think it would be more of a restriction to someone *using*  
the OpenSSL program, not a developer.


Best regards,
Peter

P.S. On a related note, OpenSSL's BN library does not contain any  
patented algorithms.



___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Replacement for GMP: Update

2006-08-10 Thread Peter Tanski

Remember that the memory-allocation mechanism is crucial.  How does BN
do that?


BN uses a structure called "CTX"--OpenSSL calls all such structures
"CTX"--to hold the local static variables for reentrancy.  CTX
structures do not affect memory allocation, though they *do* require
malloc'd memory.  For my purposes, the BN_CTX structure does give me
an easy way to handle thread-local storage.  Otherwise, BN uses
standard malloc'd memory.  To create a BN-MP (called a BIGNUM; really
a struct), you either do:


BIGNUM x;
BN_init(&x);            // defined in bn_lib.c; uses memset

or,

// uses OpenSSL-named checked malloc:
// OPENSSL_malloc == CRYPTO_malloc
BIGNUM *y = BN_new();

It would be easy to change the definition of OPENSSL_malloc to call
RTS memory allocation as necessary.  It would be more efficient for BN
to be garbage collected (these bignum libraries allocate and delete a
lot of small memory blocks, ~1-2KB for large integers).  Since
ForeignPtr is simply too slow, I am bringing BN value allocations into
the RTS as close as possible to how you did it with GMP.




Making a single word contain either a pointer or a non-pointer
(depending on the setting of some bit) would have some quite global
implications, as would losing 32 bit precision from Int#.  I suggest
that you do not couple these two projects together!  Do the GMP thing
first, then investigate this later.  We have quite a few bit-twiddling
ideas that we have never followed up.


The original idea was to create a specialized Int30# for that
purpose.  In any case, implementing it would certainly make my
intended job--getting this done in time for the next release--a bit
more difficult.


Best regards,
Peter


Re: Replacement for GMP: Update

2006-08-10 Thread Peter Tanski

Simon,


Possibly, yes.  IIRC, -O3 was mainly to get some loop unrolling.  This
is a performance-critical part of the system though, and when making any
changes we like to measure things to make sure we haven't pessimised
performance.


I will try to test it both ways, then.  Also, -optc-O3 is turned on  
even for debug builds since GC_HC_OPTS is set unconditionally.  I  
could change it to:


# add -optc-O3 only when SRC_HC_OPTS does not contain -DDEBUG
ifeq (,$(findstring -DDEBUG,$(SRC_HC_OPTS)))
GC_HC_OPTS += -optc-O3
endif



(3) I have been looking at how to implement a dual-constructor-in-a-
pointer for Integer (i.e., merge constructors of small Integers and
big Integers into the Int#).  Would that solution be workable or
might it break current Haskell programs?  Just a thought.


Which representation in particular are you talking about here?


I was talking about the definition of Integer in
packages/base/GHC/Num.lhs, particularly the implementation in
PrimOps.cmm, which returns Integers as:


   /* returns (# size :: Int#,
                 data :: ByteArray# #)
    */
   RET_NP(s,p);

If the ByteArray# were indicated through the pointer itself, the
representation in Haskell would be


J# IntegerT#
{- new type, depending on pointer size,
   either holding 30 or 62 bits of precision -}

and a Cmm procedure returning an Integer would be:

RET_P(p);

I hope this doesn't confuse you.

Best regards,
Peter


RE: Replacement for GMP: Update

2006-08-10 Thread Simon Marlow
On 10 August 2006 06:32, Peter Tanski wrote:

>   for the Makefile in ghc/rts, in lines 300-346,
>     GC_HC_OPTS += -optc-O3
>   --isn't this problematic?  gcc, from -O2 on includes -fgcse which
>   may *reduce* runtime performance in programs using computed
>   gotos; -fgcse is actually run twice, because
>   -frerun-cse-after-loop is also set at -O2.  Would it be better to
>   pass individual flags, such as -funroll-loops and
>   -falign-loops=16 (ppc, Intel setting)?

Possibly, yes.  IIRC, -O3 was mainly to get some loop unrolling.  This
is a performance-critical part of the system though, and when making any
changes we like to measure things to make sure we haven't pessimised
performance.
 
> (3) I have been looking at how to implement a dual-constructor-in-a-
> pointer for Integer (i.e., merge constructors of small Integers and
> big Integers into the Int#).  Would that solution be workable or
> might it break current Haskell programs?  Just a thought.

Which representation in particular are you talking about here?

Cheers,
Simon


RE: Replacement for GMP: Update

2006-08-10 Thread Simon Peyton-Jones
Sounds good! 

Remember that the memory-allocation mechanism is crucial.  How does BN
do that?


| (3) I have been looking at how to implement a dual-constructor-in-a-
| pointer for Integer (i.e., merge constructors of small Integers and
| big Integers into the Int#).  Would that solution be workable or
| might it break current Haskell programs?  Just a thought.

Making a single word contain either a pointer or a non-pointer
(depending on the setting of some bit) would have some quite global
implications, as would losing 32 bit precision from Int#.  I suggest
that you do not couple these two projects together!  Do the GMP thing
first, then investigate this later.  We have quite a few bit-twiddling
ideas that we have never followed up.

Simon


| -----Original Message-----
| From: Peter Tanski [mailto:[EMAIL PROTECTED]]
| Sent: 10 August 2006 06:32
| To: Simon Peyton-Jones; Simon Marlow; Esa Ilari Vuokko; John Meacham
| Cc: glasgow-haskell-users@haskell.org
| Subject: Replacement for GMP: Update
|
| Simon PJ, Simon, Esa and John,
|
| Here is an update on what I have been doing so far in making a grand
| attempt to replace GMP.
|
| (1) evaluate replacement libraries
|   LibTomMath:
|     Pros-
|     * has all the operators GMP offered
|     Cons-
|     * slow; about half as fast as OpenSSL in my tests for simple
|       mathematical operations, much slower if you account for time to
|       write or resize memory.  (There is another MP library, which
|       LibTomMath is based on, that is also very slow--student work)
|     * not even reentrant; needs significant modification
|     * beta-release; needs a bit of work to get it to production level
|     * configuration needs to be written (not really portable, messy)
|   ARPREC:
|     Pros-
|     * very fast (essentially does everything through the
|       Floating Point Unit of a CPU)
|     * complex mathematical operations
|     * very precise
|     * already thread safe (through C++ thread-safe statics)
|     Cons-
|     * no bitwise operations (not even left and right-shifts)
|     * difficult configuration (everything runs by setting a precision
|       level; ("precision level" ~= number of words (doubles) in array)
|       it does not automatically resize memory; conversion from
|       MP Real to Integer relies specially on careful precision-level)
|     * memory inefficient (underestimates the number of real digits you
|       can fit into a double, i.e., a 64-bit double has 48 bits of
|       precision, holding about 9.6 digits per double, resulting in an
|       848-byte array to hold an MP number with 1000 digits).
|   OpenSSL:
|   Crypto++ (http://www.eskimo.com/~weidai/cryptlib.html):
|   Botan (http://botan.randombit.net/):
|     Pros-
|     * all of these are fast, since all use Integers to support
|       cryptography; (Crypto++ and Botan are C++ crypto-libraries,
|       all licenses good)
|     * all of these provide most basic mathematical operations
|     Cons-
|     * none of these provide &, |, ^(xor) bitwise operators
|     * Botan has least mathematical operators of the three
|     * none provide lcm operator
|     * all would realistically have to have the Integer libraries
|       stripped out of the distribution and repackaged for GHC
|
| Summary: I finally settled on modifying OpenSSL, since that would be
| the easiest to work with under GHC's hood (plain C code, not C++).  I
| have to:
|   a. make OpenSSL's BN thread-safe (add thread local storage, at least)
|   b. optimally add configure-conditional parallelism to BN (through PVM)
|   c. add &, |, ^, lcm and a few other operations
|   d. separate the BN from the rest of OpenSSL and rename the symbols
|      to avoid conflicts (necessary because I have to modify the
|      library anyway)
|
| (2) work on GHC:
|   * finally understand C--; know what I need to modify
|   * work through Makefiles: touch and go; I haven't mapped out all the
|     variable settings from configure.in on down when it comes to DLLs
|   Comment:
|     for the Makefile in ghc/rts, in lines 300-346,
|       GC_HC_OPTS += -optc-O3
|     --isn't this problematic?  gcc, from -O2 on includes -fgcse which
|       may *reduce* runtime performance in programs using computed
|       gotos; -fgcse is actually run twice, because
|       -frerun-cse-after-loop is also set at -O2.  Would it be better
|       to pass individual flags, such as -funroll-loops and
|       -falign-loops=16 (ppc, Intel setting)?

Replacement for GMP: Update

2006-08-09 Thread Peter Tanski

Simon PJ, Simon, Esa and John,

Here is an update on what I have been doing so far in making a grand  
attempt to replace GMP.


(1) evaluate replacement libraries
  LibTomMath:
    Pros-
    * has all the operators GMP offered
    Cons-
    * slow; about half as fast as OpenSSL in my tests for simple
      mathematical operations, much slower if you account for time to
      write or resize memory.  (There is another MP library, which
      LibTomMath is based on, that is also very slow--student work)
    * not even reentrant; needs significant modification
    * beta-release; needs a bit of work to get it to production level
    * configuration needs to be written (not really portable, messy)
  ARPREC:
    Pros-
    * very fast (essentially does everything through the
      Floating Point Unit of a CPU)
    * complex mathematical operations
    * very precise
    * already thread safe (through C++ thread-safe statics)
    Cons-
    * no bitwise operations (not even left and right-shifts)
    * difficult configuration (everything runs by setting a precision
      level; ("precision level" ~= number of words (doubles) in array)
      it does not automatically resize memory; conversion from
      MP Real to Integer relies specially on careful precision-level)
    * memory inefficient (underestimates the number of real digits you
      can fit into a double, i.e., a 64-bit double has 48 bits of
      precision, holding about 9.6 digits per double, resulting in an
      848-byte array to hold an MP number with 1000 digits).
  OpenSSL:
  Crypto++ (http://www.eskimo.com/~weidai/cryptlib.html):
  Botan (http://botan.randombit.net/):
    Pros-
    * all of these are fast, since all use Integers to support
      cryptography; (Crypto++ and Botan are C++ crypto-libraries,
      all licenses good)
    * all of these provide most basic mathematical operations
    Cons-
    * none of these provide &, |, ^(xor) bitwise operators
    * Botan has least mathematical operators of the three
    * none provide lcm operator
    * all would realistically have to have the Integer libraries
      stripped out of the distribution and repackaged for GHC

Summary: I finally settled on modifying OpenSSL, since that would be
the easiest to work with under GHC's hood (plain C code, not C++).  I
have to:
  a. make OpenSSL's BN thread-safe (add thread local storage, at least)
  b. optimally add configure-conditional parallelism to BN (through PVM)
  c. add &, |, ^, lcm and a few other operations
  d. separate the BN from the rest of OpenSSL and rename the symbols
     to avoid conflicts (necessary because I have to modify the library
     anyway)

(2) work on GHC:
  * finally understand C--; know what I need to modify
  * work through Makefiles: touch and go; I haven't mapped out all the
    variable settings from configure.in on down when it comes to DLLs
  Comment:
    for the Makefile in ghc/rts, in lines 300-346,
      GC_HC_OPTS += -optc-O3
    --isn't this problematic?  gcc, from -O2 on includes -fgcse which
      may *reduce* runtime performance in programs using computed
      gotos; -fgcse is actually run twice, because
      -frerun-cse-after-loop is also set at -O2.  Would it be better
      to pass individual flags, such as -funroll-loops and
      -falign-loops=16 (ppc, Intel setting)?

(3) I have been looking at how to implement a dual-constructor-in-a- 
pointer for Integer (i.e., merge constructors of small Integers and  
big Integers into the Int#).  Would that solution be workable or  
might it break current Haskell programs?  Just a thought.


-Peter


