Re: [Caml-list] How does OCaml update references when values are moved by the GC?

2010-11-02 Thread Xavier Leroy

Jon Harrop wrote:

I was hoping for a little more detail, of course. :-)

How is the mapping from old to new pointers stored?


With forwarding pointers.  Just take 10 minutes to familiarize yourself
with the standard stop-and-copy algorithm and everything will be clear:

http://en.wikipedia.org/wiki/Cheney's_algorithm
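
To make the role of forwarding pointers concrete, here is a minimal
sketch of a Cheney-style copying pass over a toy heap.  It is not the
OCaml runtime's code: the types value and block, the forward field and
the function collect are all made up for illustration, and addresses
are simply array indices.

```ocaml
(* Toy model: a value is an immediate integer or the address of a block. *)
type value = Int of int | Ptr of int

type block = {
  fields : value array;
  mutable forward : int option;
  (* Some a' once this from-space block has been copied to address a'
     in to-space: this is the forwarding pointer. *)
}

(* Copy everything reachable from [roots] out of [from_space] and return
   the new to-space.  [roots] is updated in place to the new addresses. *)
let collect (from_space : block array) (roots : value array) : block array =
  let dummy = { fields = [||]; forward = None } in
  let to_space = Array.make (Array.length from_space) dummy in
  let free = ref 0 in
  (* Translate one value: integers are unchanged, pointers are redirected
     to the copy, which is created the first time the block is reached. *)
  let copy v =
    match v with
    | Int _ -> v
    | Ptr a ->
      let b = from_space.(a) in
      (match b.forward with
       | Some a' -> Ptr a'              (* already moved: follow forwarding *)
       | None ->
         let a' = !free in
         to_space.(a') <- { fields = Array.copy b.fields; forward = None };
         incr free;
         b.forward <- Some a';          (* leave a forwarding pointer behind *)
         Ptr a')
  in
  Array.iteri (fun i r -> roots.(i) <- copy r) roots;
  (* Cheney scan: blocks between [scan] and [free] are copied but their
     fields still point into from-space. *)
  let scan = ref 0 in
  while !scan < !free do
    let b = to_space.(!scan) in
    Array.iteri (fun i f -> b.fields.(i) <- copy f) b.fields;
    incr scan
  done;
  Array.sub to_space 0 !free
```

The old-to-new mapping is thus stored in the old blocks themselves, so
no separate table is needed; a block reached twice is copied once and
every later reference is redirected through its forward field.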

- Xavier Leroy

___
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs


Re: [Caml-list] Generalized Algebraic Datatypes

2010-10-29 Thread Xavier Leroy
Jacques Le Normand wrote:

 Assuming I understand this syntax, the following currently valid type
 definition would have two interpretations: [...]

Don't take the syntax from my 2008 CUG talk too seriously, it was just
a mock-up for the purpose of the talk.  Besides, it's too early for a
syntax war :-)

This said, Coq could be another source of syntactic inspiration: it
has several equivalent syntaxes for inductive type declarations (a
superset of GADTs), one Haskell-like, others more Caml-like.

- Xavier Leroy



Re: [Caml-list] Re: Random segfaults / out of memory

2010-03-30 Thread Xavier Leroy

So, is it really forbidden to release the global lock in a noalloc function?


Yes.  Actually, it is forbidden to call any function of the OCaml
runtime system from a noalloc function.

Explanation: ocamlopt-generated code caches in registers some global
variables of importance to the OCaml runtime system, such as the
current allocation pointer.

When calling a regular (non-noalloc) C function from OCaml, these
global variables are updated with the cached values so that everything
goes well if the C function allocates, triggers a GC, or releases the
global lock (enabling a context switch).

This updating is skipped when the C function has been declared
noalloc -- this is why calls to noalloc functions are slightly
faster.  The downside is that the runtime system is not in a
functioning state while within a noalloc C function, and must
therefore not be invoked.

The cost of updating global variables is small, so noalloc makes
sense only for short-running C functions (say, fewer than 100 instructions) like
those from the math library (sin, cos, etc).  If the C function does
significant work (1000 instructions or more), just play it safe and
don't declare it noalloc.
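
As an illustration of the kind of function for which noalloc is
appropriate, here is a small sketch binding the C sin function.  It
uses the attribute syntax of later OCaml releases (4.03 and up), not
the 3.x syntax current when this message was written, where the same
information was given as extra strings in the external declaration.
The name fast_sin is made up; the primitive names are believed to be
the ones the standard library itself uses for sin.

```ocaml
(* sin is short, never allocates in the OCaml heap, never raises an
   OCaml exception and never touches the runtime or the global lock,
   so skipping the runtime-state update is safe for it. *)
external fast_sin : float -> float = "caml_sin_float" "sin"
  [@@unboxed] [@@noalloc]

let () = Printf.printf "sin 1.0 = %g\n" (fast_sin 1.0)
```

A stub that releases the global lock, allocates, or runs for a long
time should be declared without the noalloc annotation.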

Hope this helps,

- Xavier Leroy



Re: [Caml-list] testers wanted for experimental SSE2 back-end

2010-03-29 Thread Xavier Leroy
Hello Dmitry,

 This is a call for testers concerning an experimental OCaml compiler
 back-end that uses SSE2 instructions for floating-point arithmetic.[...]
 
 I cannot provide any benchmark yet

Too bad :-( I got very little feedback to my call: just one data point
(thanks Gaetan).  Perhaps most OCaml users interested in numerical
computations have switched to x86-64bits already?  At any rate, given
such a lack of interest, this x86-32/SSE2 port isn't going to make it
into the OCaml distribution.

 but even not taking into account
 the better register organization there are at least two areas where
 SSE2 can outperform x87 significantly.
 
 1. Float to integer conversion
 Is quite inefficient on x87 because you have to explicitly set and
 restore rounding mode.

Right.  The mode change makes the conversion about 10x slower on x87
than on SSE2.  Apparently, float-to-int conversion is uncommon in
numerical code, otherwise we'd observe bigger speedups on real
applications...

 2. Float compare
 Does not set flags on x87 so

The SSE2 code is prettier than the x87 code, but this doesn't seem to
translate into a significant performance gain, in my limited testing.

 As for SSE2 backend presented I have some thoughts regarding the code
 (fast math functions via x87 are questionable,

Most x86-32bits C libraries implement sin(), cos(), etc with the x87
instructions, so I'm curious to know what you find objectionable here.

 optimization of floating compare etc.) Where to discuss that - just
 here or there is some entry in Mantis?

Why not start on this list?  We'll move to private e-mail if the
discussion becomes too heated :-)

- Xavier Leroy



Re: [Caml-list] testers wanted for experimental SSE2 back-end

2010-03-11 Thread Xavier Leroy

Mike Lin wrote:
I have a bunch of biological sequence analysis stuff that could be 
interesting but I am already in x86-64 (Wow! A 64 bit architecture!). 
The above seems pretty clear but just to verify - I would not benefit 
from this new back-end, right?


Right.  Sorry for not mentioning this.  The x86-64 bit code generator for
OCaml uses SSE2 floats, like all C compilers for this platform.  The
experimental back-end I announced is for x86-32 bit.  Some more Q&A:

Q: I have OCaml installed on my x86 machine, how do I know if it's 32
or 64 bits?

A: Do:

  grep ^ARCH `ocamlopt -where`/Makefile.config

If it says amd64, it's 64 bits with SSE2 floats.
If it says i386, it's 32 bits with x87 floats.
If it says ia32, it's the experimental back-end: 32 bits with SSE2 floats.

Q: If I compile from sources, which code generator is chosen by
default? 32 or 64 bits?

A: OCaml's configure script chooses whatever mode the C compiler
defaults to.  For instance, on a 32-bit Linux installation, the 32-bit
generator is selected, and on 64-bit Linux installation, it's the
64-bit generator.  Mac OS X is more tricky: 10.5 and earlier default
to 32 bits, but 10.6 defaults to 64 bits...

Will Farr wrote:


Oops.  I just ran a bunch of tests on my Mac OS 10.6 system---does
that mean that I compared two sse2 backends?  The ocaml-sse2 branch
definitely produced different code than the trunk, but that could
easily be due to any small difference in the two compilers, and not
due to a change of architecture.


It is quite possible you ended up with two 64-bit, SSE2-float back-ends.
Oops.  Sorry for the lost time.  And, yes, unrelated changes between
release 3.11.2 and the experimental sources I released (based on what
will become 3.12.0) can account for small speed differences.

- Xavier Leroy



[Caml-list] testers wanted for experimental SSE2 back-end

2010-03-09 Thread Xavier Leroy

Hello list,

This is a call for testers concerning an experimental OCaml compiler
back-end that uses SSE2 instructions for floating-point arithmetic.
This code generation strategy was discussed before on this list, and I
include below a summary in Q&A style.

The new back-end is being considered for inclusion in the next major
release (3.12), but performance testing done so far at INRIA and by
Caml Consortium members is not conclusive.  Additional results
from members of this list would therefore be very welcome.

We're not terribly interested in small (< 50 LOC), Shootout-style
benchmarks, since their performance is very sensitive to code and data
placement.  However, if some of you have a sizeable (> 500 LOC) body
of float-intensive Caml code, we'd be very interested to hear about
the compared speed of the SSE2 back-end and the old back-end on your
code.

Switching to Q&A style:

Q: Where can I get the code?

A: From the SVN repository:

svn checkout http://caml.inria.fr/svn/ocaml/branches/sse2 ocaml-sse2

Source-code only.  Very lightly tested under Windows, so you might be
better off testing under Unix.

Q: What is this SSE2 thingy?

A: An extension of the Intel/AMD x86 instruction set that provides,
among other things, 64-bit float arithmetic instructions operating
over 64-bit float registers.  Before SSE2, the only way to perform
64-bit float arithmetic on x86 was the x87 instructions, which compute
in 80-bit precision and use a stack instead of registers.

Q: Why this sudden interest in SSE2?

A: SSE2 has several potential advantages over x87, including:

- The register-based SSE2 model fits the OCaml back-end much better
  than the stack-based x87 model.  In particular, let-bound intermediate
  results of type float can be kept in SSE2 registers, while in
  the current x87 mode they are systematically flushed to the stack.

- SSE2 implements exactly 64-bit IEEE arithmetic, giving float results
  that are consistent with those obtained on other platforms and with
  the OCaml bytecode interpreter.  The 80-bit format of x87 produces
  different results and can cause surprises such as double rounding
  errors.  (For more explanations, see David Monniaux's excellent article,
  http://hal.archives-ouvertes.fr/hal-00128124/ )

- Some x86 processors execute SSE2 instructions faster than their x87
  counterparts.  This speed difference was notable on the Pentium 4
  in particular, but is much smaller on more recent processors such as
  Core 2.

Note that x86-64 systems as well as Mac OS X already use SSE2 as
their default floating-point model.
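
One way to check the consistency point above on your own machine is to
print the exact bit pattern of a float result and compare it across
ocamlc, the x87 ocamlopt and the SSE2 ocamlopt.  The following is only
a sketch: the functions dot and bits and the input data are made up
for the occasion, and whether the outputs differ depends on the
back-end and on the computation.

```ocaml
(* Dot product with a local accumulator: on x87 the intermediate sums
   may be held in 80-bit precision, on SSE2 they are always 64-bit. *)
let dot a b =
  let s = ref 0.0 in
  for i = 0 to Array.length a - 1 do
    s := !s +. a.(i) *. b.(i)
  done;
  !s

(* Hexadecimal image of the 64-bit IEEE representation of a float. *)
let bits x = Printf.sprintf "%016Lx" (Int64.bits_of_float x)

let () =
  let a = Array.init 1000 (fun i -> 1.0 /. float_of_int (i + 1)) in
  let b = Array.init 1000 (fun i -> sqrt (float_of_int (i + 1))) in
  print_endline (bits (dot a b))
```

Comparing the printed strings is stricter than comparing decimal
output, since two floats that differ only in the last bit usually
print identically with the default formats.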

SSE2 also has some potential disadvantages:

- The instructions are bigger than x87 instructions, causing some
  increase in code size and potentially some decrease in instruction
  cache efficiency.

- Computing intermediate results in 80-bit precision, like x87 does,
  can improve the numerical stability of poorly-conditioned float
  computations, although it doesn't make a difference for well-written
  numerical code.

Q: Is SSE2 universally available on x86 processors?

A: Not universally but pretty close.  SSE2 made its debut in 2000, in
the Pentium 4 processor.  All x86 machines built in the last 4 years
or so support SSE2, but pre-Pentium 4 and pre-Athlon64 processors do not.

Q: So if you adopt this new back-end, OCaml will stop working on my
trusty 1995-vintage Pentium?

A: No.  Under friendly pressure from our Debian friends, we agreed to
keep the x87 back-end alive for a while in parallel with the SSE2
back-end.  The x87 back-end is selected at configuration time if the
processor doesn't support SSE2 or if a special flag is given to the
configure script.

Q: I observed a 20% (speedup|slowdown)!  Should I tell the world about it?

A: If your benchmark spends all its time in 10 lines of OCaml, maybe
not.  On such small codes, variations in code and data placement alone
(without changing the instructions that are actually executed) can
result in performance variations by 20%, so this is just experimental
noise.  Larger programs are less sensitive to this noise, which is why
we're much more interested in results obtained on real OCaml
applications.  Finally, one micro-benchmark slowed down by a factor of
2 for reasons we couldn't explain.
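
To get a feeling for this noise before reporting a number, it helps to
repeat the measurement and look at the spread rather than at a single
run.  Below is a minimal sketch using only Sys.time from the standard
library; time_once and spread are hypothetical helper names, and
processor time as returned by Sys.time is a fairly coarse clock.

```ocaml
(* Processor time, in seconds, taken by one call to f. *)
let time_once f =
  let t0 = Sys.time () in
  f ();
  Sys.time () -. t0

(* Run f several times and return the fastest and slowest timings. *)
let spread ~runs f =
  let samples = List.init runs (fun _ -> time_once f) in
  let lo = List.fold_left min infinity samples in
  let hi = List.fold_left max neg_infinity samples in
  (lo, hi)

let () =
  let work () = ignore (Array.init 100_000 (fun i -> sqrt (float_of_int i))) in
  let lo, hi = spread ~runs:10 work in
  Printf.printf "fastest %.6fs, slowest %.6fs\n" lo hi
```

If the gap between fastest and slowest on a single back-end is already
comparable to the difference between the two back-ends, the difference
is probably not worth reporting.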

Q: What are those inconclusive results you mentioned?

A: On medium-sized numerical kernels (e.g. FFT, Gaussian process
regression), we've observed speedups of about 8% on Core 2 processors
and somewhat higher on recent AMD processors.  On bigger OCaml
applications that perform floating-point computations but not
exclusively, the performance difference was lost in the noise.

Looking forward to interesting experimental results,

- Xavier Leroy


Re: [Caml-list] Alignment of data

2010-01-27 Thread Xavier Leroy
I am working on some ppc architecture, and I realize that I have a 
(very) big slowdown due to bad alignment of data by ocamlopt. I need to 
have my data aligned in memory depending of the size of the data : 
floats are to be aligned on 8 bytes, int on 4 bytes, etc


First, make sure that misalignment is really the source of your
slowdown.  The PowerPC processors I'm familiar with can access
4-aligned 8-byte floats with minimal overhead, while the penalty is
much bigger for other misalignments.  Indeed, the PowerPC calling
conventions mandate that some 8-byte float arguments are passed on the
stack at 4-aligned addresses, so that's strong incentive for the
hardware people to implement those accesses efficiently.

BUT, after verification, I remark that ocamlopt doesn't align as I need. 
I tried to use ARCH_ALIGN_DOUBLE, but it doesn't seem to be what I 
thought, and doesn't change anything for my needs. Is there ANY way to 
obtain what I need easily or at least quickly ?


Data allocated in the Caml heap is word-aligned, where a word is 4
bytes on a 32-bit platform and 8 bytes on a 64-bit platform.  This is
deeply ingrained in the Caml GC and allocator, so don't expect to
change this easily.

What you can do, however:

1- Use the 64-bit PowerPC port.  Everything will be 8-aligned then.

2- Use a bigarray instead of a float array.  Bigarray data is
allocated outside the heap, at naturally-aligned addresses.
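
A minimal sketch of option 2 follows, assuming the data is a plain
vector of 64-bit floats.  The helper names make_vector and sum are
made up.  In recent OCaml releases the Bigarray module is part of the
standard library; in the 3.x releases it was a separate library that
had to be linked explicitly.

```ocaml
open Bigarray

(* Allocate n float64 cells outside the Caml heap and fill them. *)
let make_vector n =
  let v = Array1.create float64 c_layout n in
  for i = 0 to n - 1 do
    v.{i} <- float_of_int i
  done;
  v

(* Same access pattern as with a float array, using the .{ } syntax. *)
let sum v =
  let s = ref 0.0 in
  for i = 0 to Array1.dim v - 1 do
    s := !s +. v.{i}
  done;
  !s

let () = Printf.printf "sum = %g\n" (sum (make_vector 1024))
```

The code using the vector changes very little compared with a float
array: mainly the creation call and the .{i} indexing.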

- Xavier Leroy



Re: [Caml-list] dynamic link library path under mingw

2010-01-27 Thread Xavier Leroy

I'm doing something wrong, but can't figure out what?


Try setting the OCAMLRUNPARAM environment variable to the value v=256.
The run-time system will then print additional debug messages
concerning DLL searching and loading.

- Xavier Leroy



Re: [Caml-list] Re: multicore wish

2009-12-28 Thread Xavier Leroy
Gerd Stolpmann wrote:

 It works with all types:
 
 https://godirepo.camlcity.org/svn/lib-ocamlnet2/trunk/code/src/netsys/netsys_mem.mli
 
 look for init_value. It's non-released code yet.
 
 However, there are some problems: Values outside the heap do not support
 the polymorphic comparison and hash functions. That's a hard limitation,
 e.g. you cannot even compare two strings, or build a hash table with
 strings as keys. That limits the usefulness of shared memory.

In OCaml 3.11 and later, you can call

   caml_page_table_add(In_static_area, start, end)

to inform the run-time system that the address range [start, end)
contains well-formed Caml data that polymorphic primitives can safely
work on.  This should solve your problem.

- Xavier Leroy



[Caml-list] old Caml projects looking for new maintainers and contributors

2009-12-11 Thread Xavier Leroy
Hello list,

Prompted by Stefano Zacchiroli and Grégoire Henry's recent success in
resurrecting CamlJava, I just migrated a number of my inactive OCaml
side projects to the forge on ocamlcore.org:

- CamlIDL (stub code generator for Caml/C interface)
- CamlJava (low-level interface between Caml and Java through JNI)
- CamlZIP (library to handle zip and gzip files)
- Cryptokit (library of cryptographic primitives)
- OCamlAgrep (library for string searching with errors)
- OCamlMPI (Caml interface for the MPI parallel programming model)
- SpamOracle (Bayesian mail filter)

The purpose of this migration is to make it easy for others to
participate in (or even take over) the maintenance and development of
these projects, two tasks that I've very much neglected lately.

So, if you find these projects useful and feel like participating in
one way or another, just create an account on forge.ocamlcore.org and
ask to join these projects.

Also, the kind folks who package these projects are welcome to update
their upstream URLs to point to the files on ocamlcore.org instead of
my home pages: currently, they are identical, but further releases, if
any, will be on ocamlcore.org.

Cheers,

- Xavier Leroy



Re: [Caml-list] Gc.compact surprisingly helpful

2009-12-05 Thread Xavier Leroy
Aaron Bohannon wrote:

 In order to prevent irregular GC pauses, I decided to try inserting
 a call to Gc.compact once per loop.  I was hoping the overall
 throughput wouldn't suffer too badly.  To my very pleasant surprise,
 I found the throughput *increased* by about 2%!!  So in a 15 second
 run (with no idle time, as I said), it now does about 130 heap
 compactions instead of 3 and gets better total performance because
 of it, utterly defying my GC intuition.

As Damien said, maybe the original code ran into a bad case of free
list fragmentation which the compactor cured.  But maybe the 2% is
just measurement noise.  Some of my favorite horror stories about
timings:

http://www-plan.cs.colorado.edu/diwan/asplos09.pdf
  We see that something external and orthogonal to the program,
   i.e., changing the size (in bytes) of an unused environment variable,
   can dramatically (frequently by about 33% and once by almost 300%)
   change the performance of our program.

http://compilers.iecc.com/comparch/article/96-02-165
  [ Execution speed for the same binary varies by a factor of 2
depending on cache placement ]

I have also personally observed speed differences of 20% just from
inserting or deleting dead code in a program...
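
To tell the two explanations apart, one can look at the free-list
statistics before and after a compaction instead of relying on timings
alone.  Here is a small sketch using the standard Gc module; the
report function is a made-up helper, and the exact meaning of some
counters varies between OCaml versions.

```ocaml
(* Print the counters that are relevant to free-list fragmentation. *)
let report label =
  let s = Gc.stat () in
  Printf.printf
    "%s: heap=%d words, free_blocks=%d, largest_free=%d, fragments=%d, compactions=%d\n"
    label s.Gc.heap_words s.Gc.free_blocks s.Gc.largest_free
    s.Gc.fragments s.Gc.compactions

let () =
  (* Allocate blocks of varying sizes and drop most of them, to leave
     holes in the major heap. *)
  let keep = ref [] in
  for i = 1 to 10_000 do
    let a = Array.make (1 + i mod 300) 0.0 in
    if i mod 7 = 0 then keep := a :: !keep
  done;
  Gc.full_major ();
  report "before compaction";
  Gc.compact ();
  report "after compaction";
  ignore (List.length !keep)
```

A large number of free blocks with a small largest_free before the
compaction would point towards fragmentation; similar figures before
and after would point towards measurement noise.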

- Xavier Leroy



[Caml-list] public access to the SVN repository for OCaml

2009-10-22 Thread Xavier Leroy

Hello list,

This is an announcement for those of you who track the development of
OCaml between releases.

The OCaml development team recently switched from CVS to Subversion
(SVN) as our version control system.  Public read-only access to our
new SVN repository is now available as described in the following
page:

http://caml.inria.fr/ocaml/anonsvn.en.html

The old CVS repository on camlcvs.inria.fr is still accessible but no
longer updated.  It will go away in a few weeks.

Subversively yours,

- Xavier Leroy



Re: [Caml-list] Incremental linking

2009-09-30 Thread Xavier Leroy

Dawid Toton wrote:

I have lot of modules and they are compiled to native code.
So I have .cmx and .o files and want to link them faster.
Is it possible to make linking an associative operation acting on modules?
[...]
Documentation of ld says that files produced with --relocatable can be 
used as intermediate partially linked files. Can something like this be 
done with object code produced by ocamlopt?


Yes.  ocamlopt -pack actually calls ld -r underneath to
consolidate several compilation units in a single .cmx/.o file.
ld -r will resolve references between these compilation units.

Gerd Stolpmann wrote:

Well, you can link several .cmx files (and their accompanying .o files)
to a .cmxa file (and an accompanying .a file): ocamlopt -a


From a linking standpoint, ocamlopt -a is equivalent to ar: it
does not resolve any references, just concatenates individual
.cmx/.o files in a single .cmxa/.a file.   That can still speed up
linking a bit, since reading one big .a file is faster than reading a
zillion small .o files.

Generally speaking, I'm somewhat surprised that linking time is an
issue for Dawid.  Modern Unix linkers are quite fast, and the
additional link-time work that OCaml does is small.  Let us know if
you manage to narrow the problem.

- Xavier Leroy



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-25 Thread Xavier Leroy

Jon Harrop wrote:

On Thursday 24 September 2009 13:39:40 Stefano Zacchiroli wrote:

On Thu, Sep 24, 2009 at 12:52:24PM +0100, Jon Harrop wrote:

The next steps are to get oc4mc into the apt repositories and build

Uhm, I'm curious: how do you plan to achieve that?


Good question. I have no idea, of course. :-)


That would be suicidal.  I definitely do not want to belittle the work
of Philippe and his teammates -- what they did is an amazing hack
indeed --, but you need to keep in mind the difference between a
proof-of-concept experiment and a product.

In a proof-of-concept experiment, you implement the feature you want to
experiment with and keep everything else as simple as possible
(otherwise there is little chance that you'll complete the
experiment).  That's exactly what Philippe et al did, and rightly so:
their GC is about the simplest you can think of, they didn't bother
adapting some features of the run-time system, they target AMD64/Unix
only, etc.  Now they have a platform they can experiment with and make
measurements on: mission accomplished.

In a product, you'd need something that is essentially a drop-in
replacement for today's OCaml and can run, say, Coq with at most a 10%
slowdown.  That's a long way to go (I'd say a couple of years of work).
For example, single-generation stop-and-copy GC is known to have
terrible performance (both in running time and in latency) for
programs that have large data sets and allocate intensively.  This is
true in the sequential case and even worse in a stop-the-world
parallel setting, by Amdahl's law.  Note that the programs I mentioned
above are exactly those that the Caml user community cares most about
-- not matrix multiply nor ray tracers, Harrop's propaganda
notwithstanding -- and those for which OCaml has been delivering
top-class performance for the last 12 years -- again, Harrop's
propaganda notwithstanding.

On your way to a product, you'd need independently-collectable
generations (which means some work on the compiler as well), plus a
parallel or even better concurrent major collector.  And of course a
lot more work on the runtime system and C interface to make everything
truly reentrant while remaining portable.  And probably some kind of
two-level scheduler for threads.  And after all that work
you'd end up with an extremely low-level and unsafe parallel
programming model that you'd need to tame by developing clever
libraries that mere mortals can use effectively (Apple's Grand Central
was mentioned on this thread; it's a good example)...

In summary, Philippe and his coauthors do deserve a round of applause,
but please keep a cool head.

- Xavier Leroy



Re: [Caml-list] Optimizing Float Ref's

2009-09-03 Thread Xavier Leroy
I'm running OCaml 3.11.1, and I noticed something strange in some native 
code for matrix multiply today.  The code was

[...]
[Local float ref being unboxed or not? ]


You omitted the definition of dims, but after adding the obvious
definition, the float ref sum is indeed completely unboxed and is
kept in a float register (on x86-64 bits) or stack location (on x86-32
bits).  No modify takes place in the inner loop.  So, I don't
understand the problem you observed.  Feel free to post a report on
the BTS with a *complete* piece of code that reproduces the problem.

But, I thought that float ref's were automatically unboxed by the 
compiler when they didn't escape the local context.


Yes, if all uses of the float ref are unboxed, which is the case in
your code.
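
For reference, this is the shape of code being discussed, written out
as a complete function.  It is a reconstruction, not the poster's
program (which was elided above): the name matmul and the use of
arrays of float arrays are assumptions, and the matrices are assumed
to be non-empty and rectangular.

```ocaml
(* Naive matrix product.  sum is a float ref that never escapes the
   loop body: it is only read with ! and written with :=, so ocamlopt
   can keep it unboxed instead of allocating a reference cell. *)
let matmul a b =
  let n = Array.length a
  and m = Array.length b
  and p = Array.length b.(0) in
  let c = Array.make_matrix n p 0.0 in
  for i = 0 to n - 1 do
    for j = 0 to p - 1 do
      let sum = ref 0.0 in
      for k = 0 to m - 1 do
        sum := !sum +. a.(i).(k) *. b.(k).(j)
      done;
      c.(i).(j) <- !sum
    done
  done;
  c
```

Passing sum itself to another function, or storing it in a data
structure, would be a use that is not unboxed and would force the
reference to be allocated.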

- Xavier Leroy



[Caml-list] book Le langage Caml

2009-07-20 Thread Xavier Leroy

A few months ago, there was a discussion on this list about
Le langage Caml, an early book on Caml (Light) programming written
by Pierre Weis and me.  The book was out of print, but the publisher,
Dunod Éditions, graciously agreed to relinquish its rights and give
them back to the authors.

Pierre and I are therefore happy to announce that the full text of the
book (2nd edition) is now available freely:

   http://caml.inria.fr/pub/distrib/books/llc.pdf

There was a companion book, Manuel de référence du langage Caml,
which is the French translation of the Caml Light 0.7 reference
manual.  For completeness, we also made it available:

  http://caml.inria.fr/pub/distrib/books/manuel-cl.pdf

Both texts are distributed under the Creative Commons BY-NC-SA
license.

Enjoy,

- Xavier Leroy



Re: [Caml-list] Ocamlopt x86-32 and SSE2

2009-05-12 Thread Xavier Leroy

This is an interesting discussion with many relevant points being
made.  Some comments:

Matteo Frigo:

Do you guys have any sort of empirical evidence that scalar SSE2 math is
faster than plain old x87?
I ask because every time I tried compiling FFTW with gcc -m32
-mfpmath=sse, the result has been invariably slower than the vanilla x87
compilation.  (I am talking about scalar arithmetic here.  FFTW also
supports SSE2 2-way vector arithmetic, which is of course faster.)


gcc does rather clever tricks with the x87 float stack and the fxch
instruction, making it look almost like a flat register set and
managing to expose some instruction-level parallelism despite the
dependencies on the top of the stack.  In contrast, ocamlopt uses the
x87 stack in a pedestrian, reverse-Polish-notation way, so the
benefits of having real float registers are bigger.

Using the experimental x86-sse2 port that I did in 2003 on a Core2
processor, I see speedups of 10 to 15% on my few standard float
benchmarks.  However, these benchmarks were written in such a way that
the generated x87 code isn't too awful.  It is easy to construct
examples where the SSE2 code is twice as fast as x87.

More generally, the SSE2 code generator is much more forgiving towards
changes in program style, and its performance characteristics are more
predictable than the x87 code generator.  For instance, manual
elimination of common subexpressions is almost always a win with SSE2
but quite often a loss with x87 ...
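
As an example of what manual elimination of common subexpressions
means here, consider the two hypothetical functions below, which
compute the same value.  The second names the shared product once; as
described above, one would expect this to help with register-based
SSE2 code and often to hurt with the x87 stack, though the actual
effect has to be measured.

```ocaml
(* The product x *. y is written, and possibly computed, twice. *)
let f_naive x y = (x *. y +. 1.0) *. (x *. y -. 1.0)

(* Manual CSE: the product is let-bound and reused. *)
let f_cse x y =
  let p = x *. y in
  (p +. 1.0) *. (p -. 1.0)

let () = Printf.printf "%g %g\n" (f_naive 2.0 3.0) (f_cse 2.0 3.0)
```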

Pascal Cuoq:

According to http://en.wikipedia.org/wiki/SSE2, someone using a Via C7
should be fine.


Richard Jones:

AMD Geode then ...


Apparently, recent versions of the Geode support SSE2 as well.
Low-power people love vector instruction sets, because it lets them do
common tasks like audio and video decoding more efficiently, ergo with
less energy.

Sylvain Le Gall:

If INRIA choose to switch to SSE2 there should be at least still a way
to compile on older architecture. Doesn't mean that INRIA need to keep
the old code generator, but should provide a simple emulation for it. In
this case, we will have good performance on new arch for float and we
will still be able to compile on old arch. 


The least complicated way to preserve backward compatibility with
pre-SSE2 hardware is to keep the existing x87 code generator and bolt
the SSE2 generator on top of it, Frankenstein-style.  Well, either
that, or rely on the kernel to trap unimplemented SSE2 instructions
and emulate them in software.  This is theoretically possible but I'm
pretty sure neither Linux nor Windows implement it.

David Mentre:

Regarding option 2, I assume that byte-code would still work on i386
pre-SSE2 machines? So OCaml programs would still work on those machines.


You're correct, provided the bytecode interpreter isn't compiled in
SSE2 mode itself (see below for one reason one might want to do this).
However, packagers would still be unhappy about this: packaged OCaml
applications like Unison or Coq are usually compiled to native-code
(the additional speed is most welcome in the case of Coq...).
Therefore, packagers would have to choose between making these
applications SSE2-only or making them slower by compiling them to bytecode.

Dmitry Bely:

[Reproducibility of results between bytecode and native]
I wouldn't be so sure. Bytecode runtime is C compiler-dependent (that
does use x87 for floating-point calculations), so rounding errors can
lead to different results.


That's right: even though it stores all intermediate float results in
64-bit format, a bytecode interpreter compiled in default x87 mode still
exhibits double rounding anomalies.  One would have to compile it with
gcc in SSE2 mode (like MacOS X does by default) to have complete
reproducibility between bytecode and native.


Floating point is always approximate...


I used to believe strongly in this viewpoint, but after discussion
with people who do static analysis or program proof over float
programs, I'm not so sure: static analysis and program proof are
difficult enough that one doesn't want to complicate them even further
to take extended-precision intermediate results and double rounding
into account...

To finish: I'm still very interested in hearing from packagers.  Does
Debian, for example, already have some packages that are SSE2-only?
Are these packages specially tagged so that the installer will refuse
to install them on pre-SSE2 hardware?  What's the party line?

- Xavier Leroy



Re: [Caml-list] Ocamlopt x86-32 and SSE2

2009-05-08 Thread Xavier Leroy
Dmitry Bely wrote:

 I see. Why I asked this: trying to improve floating-point performance
 on 32-bit x86 platform I have merged floating-point SSE2 code
 generator from amd64 ocamlopt back end to i386 one, making ia32sse2
 architecture. It also inlines sqrt() via -ffast-math flag and slightly
 optimizes emit_float_test (usually eliminates an extra jump) -
 features that are missed in the original amd64 code generator.

You just passed black belt in OCaml compiler hacking :-)

 Is this of any interest to anybody?

I'm definitely interested in the potential improvements to the amd64
code generator.

Concerning the i386 code generator (x86 in 32-bit mode), SSE2 float
arithmetic does improve performance and fit ocamlopt's compilation
model much better than the current x87 float arithmetic, which is a
bit of a hack.  Several options can be considered:

1- Have an additional ia32sse2 port of ocamlopt in parallel with the
   current i386 port.

2- Declare pre-SSE2 processors obsolete and convert the current
   i386 port to always use SSE2 float arithmetic.

3- Support both x87 and SSE2 float arithmetic within the same i386
   port, with a command-line option to activate SSE2, like gcc does.

I'm really not keen on approach 1.  We have too many ports (and
their variants for Windows/MSVC) already.  Moreover, I suspect
packagers would stick to the i386 port for compatibility with old
hardware, and most casual users would, too, out of laziness, so this
hypothetical ia32sse2 port would receive little testing.

Approach 2 is tempting for me because it would simplify the x86-32
code generator and remove some historical cruft.  The issue is that it
demands a processor that implements SSE2.  For a list of processors, see
  http://en.wikipedia.org/wiki/SSE2
As a rule of thumb, almost all desktop PCs bought since 2004 have SSE2,
as well as almost all notebooks since 2006.  That should be OK for
professional users (it's nearly impossible to purchase maintenance
beyond 3 years, anyway) and serious hobbyists.  However, packagers are
going to be very unhappy: Debian still lists i486 as its bottom line;
for Fedora, it's Pentium or Pentium II; for Windows, it's a 1GHz
processor, meaning Pentium III.  All these processors lack SSE2
support.  Only MacOS X is SSE2-compatible from scratch.

Approach 3 is probably the best from a user's point of view.  But it's
going to complicate the code generator: the x87 cruft would still be
there, and new cruft would need to be added to support SSE2.  Code
compiled with the SSE2 flag could link with code compiled without,
provided the SSE2 registers are not used for parameter and result
passing.  But as Dmitry observed, this is already the case in the
current ocamlopt compiler.

Jean-Marc Eber:
 But again, having better floating point performance (and
 predictable behaviour, compared to the bytecode version) would be a
 big plus for some applications.

Dmitry Bely:
 Don't quite understand what is predictable behavior - any generator
 should conform to specs. In my tests x87 and SSE2 backends show the
 same results (otherwise it would be called a bug).

You haven't tested enough :-).  The x87 backend keeps some intermediate
results in 80-bit float format, while the SSE2 backend (as well as all
other backends and the bytecode interpreter) compute everything in
64-bit format.  See David Monniaux's excellent tutorial:
  http://hal.archives-ouvertes.fr/hal-00128124/en/
Computing intermediate results in extended precision has pros and
cons, but my understanding is that the cons slightly outweigh the pros.

- Xavier Leroy



Re: [Caml-list] Ocamlopt code generator question

2009-05-05 Thread Xavier Leroy

For amd64 we have in asmcomp/amd64/proc_nt.mlp:

(*  xmm0 - xmm15  100 - 115   xmm0 - xmm9: Caml function arguments
xmm0 - xmm3: C function arguments
xmm0: Caml and C function results
xmm6-xmm15 are preserved by C *)

let loc_arguments arg =
  calling_conventions 0 9 100 109 outgoing arg
let loc_parameters arg =
  let (loc, ofs) = calling_conventions 0 9 100 109 incoming arg in loc
let loc_results res =
  let (loc, ofs) = calling_conventions 0 0 100 100 not_supported res in loc

What these first_float=100 and last_float=109 for loc_arguments and
loc_parameters affect? My impression is that floats are always passed
boxed, so xmm registers are in fact never used to pass parameters. And
float values are returned as a pointer in eax, not a value in xmm0 as
loc_results would suggest.


The ocamlopt code generators support unboxed floats as function
parameters and results, as well as returning multiple results in
several registers.  (Except for the 32-bit x86 port, because of the
weird floating-point model of this architecture.)  You're right that
the ocamlopt middle-end does not currently take advantage of this
possibility, since floats are passed between functions in boxed state.

- Xavier Leroy



Re: [Caml-list] arm backend

2009-05-05 Thread Xavier Leroy

Joel Reymont wrote:


Is the ARM backend (ocamlopt) usable and actively maintained?


In brief: yes modulo ABI issues; yes.

In more detail:

In OCaml 3.11 and earlier, the ARM port uses an old ABI (software
conventions on using registers, etc).  This ABI corresponds to the
arm port of Debian; I don't know about other Linux distros.
OCaml/ARM works like a charm on platforms supported by Debian/arm.
I use it on a Linksys NSLU2.

However, most embedded Linux/ARM platforms use a more recent,
incompatible ABI called EABI.  In Debian Lenny, it's available under
the name armel.

I recently revised the OCaml/ARM port to adapt it to EABI and to
software floating-point emulation.  You can find it in the CVS trunk,
and testing and feedback are most welcome.  It works fine under Debian
Lenny armel.

Floating-point performance is better than with the old port, because
the latter used floating-point instructions that are no longer
available on contemporary ARM processors and therefore had to be
trapped and emulated by the kernel.  In contrast, with soft
floating-point, emulation is performed in user land by C library
functions, which can also take advantage of vector float
instructions if the processor supports them.

Concerning the iPhone, it is not supported out of the box by 3.11 nor
by the CVS trunk code.  For 3.11, several patches have been mentioned
on this list; it would be great if someone with iPhone development
experience could combine them and publish a unified patch.

For the CVS trunk code, it seems we are getting close: as far as I
could see, MacOSX/ARM uses EABI plus Apple's signature approach to
dynamic linking.  However, I haven't yet succeeded in running Apple's
iPhone SDK compilers from the command-line.  (It looks like one of
those Microsoft SDKs that assume everyone is developing from the
vendor-supplied IDE...)  Again, I welcome feedback and patches from
iPhone development experts.

- Xavier Leroy



Re: [Caml-list] OCaml and Boehm

2009-04-13 Thread Xavier Leroy
 Is the OCaml runtime Boehm-safe? That is, can it be run with Boehm
 turned on and traversing OCaml's heap? (So that the OCaml heap can
 provide roots to Boehm.)

I conjecture the answer is yes, although it's hard to tell for sure
without a precise specification of what is/is not OK with the
Boehm-Demers-Weiser collector.

From the standpoint of this collector, OCaml's heap is just a set of
large-ish blocks allocated with malloc()  (*) and containing a zillion
pointers within those blocks.  OCaml doesn't play any dirty tricks
with pointers: no xoring of two pointers, no pointers represented as
offsets from a base, no pointers one below or one above a malloc-ed
block.  Most pointers are word-aligned but we sometimes play tricks
with the low 2 bits.

Of course, almost all Caml pointers point inside those malloc-ed
blocks, not to the beginning, but I'm confident that the B-D-W collector
can handle this, otherwise it would fail on pretty much any existing C
code.

This said, I agree with Basile that what you're trying to achieve
(coexistence between several GCs) is risky, and that a design based on
message passing and separated memory spaces would be more robust, if
feasible.

- Xavier Leroy


(*) In 3.10 and earlier releases, OCaml sometimes used mmap() instead
of malloc() to obtain these blocks.  Starting from 3.11, malloc() is
the only interface OCaml uses to obtain memory from the OS.



Re: [Caml-list] questions

2009-04-01 Thread Xavier Leroy

I saw Xavier Leroy teach caml at the CNAM in france, and he know how
to teach.


Just for the record: I never lectured at CNAM, but probably you're
thinking of Pierre Weis, who taught a great "programming in Caml"
course there for several years.  That course was the main starting
point for our book "Le langage Caml".

Regards,

- Xavier Leroy



Re: [Caml-list] questions

2009-03-31 Thread Xavier Leroy
 There must be some reason why the manual and other materials on the
 official site are of such poor quality. I've thought a bit about it, and
 the only reason I see is that the authors do not have a feel for what it
 takes to learn/understand/use that language. They obviously know it all
 through, but that's still far removed from being able to explain it to
 someone else. I don't know, of course, how it is that one understands
 something well yet is not able to explain it to somebody else. To me,
 that's very fragile knowledge.

Because we are autistic morons who lack your rock-solid knowledge, if
I properly catch your (rather insulting) drift?

At the very least, you're confusing "to be able to" with "to intend to".
The tutorial part of the OCaml reference manual was a quick job
targeted at readers who already know functional programming and just
want a quick overview of what's standard and what's different in
OCaml.  Maybe that shouldn't be titled "tutorial" at all.

Teaching functional programming in OCaml to beginners is a rather
different job, for which there are plenty of good books already.  Most
of them happen to be in French for various reasons: O'Reilly's refusal
to publish the English translation of the Chailloux-Manoury-Pagano
book; the Hickey-Rentsch controversy, etc.  But, yes, some talented
teachers invested huge amounts of time in writing good "intro to Caml
programming" books.  Don't brush their efforts aside.

One last word to you, that Xah Lee troll, and anyone else on this
list: if you're not happy with the existing material, write something
better.  Everyone will thank you and you'll get to better appreciate
the difficulty of the task.

- Xavier Leroy



Re: [Caml-list] PowerPC 405

2009-03-27 Thread Xavier Leroy
Just to complement Basile's excellent answers:

 Do you know if it is possible to compile caml code on a PowerPC 405
 from the Vertex 4 family ?
 We'd like to put this processor in a FPGA.  On the Caml's website,
 it is written PowerPC but is it only for Macintosch ?

Not just Macintosh: PowerPC/Linux is also supported and works very
well.

 Yes, it will run Linux. It will have the uclibC or even the lib C.

Then you're in good shape.  I would expect a basic OCaml system to
work with uClibc, although a number of external libraries might not.
With GNU libc, everything will work but watch out for code size:
glibc is big!

 The best case is to run native code for better performance. We'd
 like to cross-compile for the PowerPC.

Setting up OCaml as a cross-compiler is a bit of a challenge at the
moment.  As a prerequisite, you'll need a complete cross-compilation
environment for C: cross-compiler, cross-binutils, libraries and
header files for your target.  It sounds obvious but in my experience
that's quite hard to get right.  Then, there is a bit of configuration
magic to be done on the OCaml side.  Write back for help if you're
going to follow this way.

A perhaps simpler alternative would be to compile on a bigger
PowerPC/Linux platform.  An old Mac would be handy for this, but you
can also use a Sony Playstation 3 (if you happen to have one around
for, ahem, R&D purposes) after installing YellowDog Linux on it.

- Xavier "doing lots of R&D with my PS3" Leroy



Re: [Caml-list] Récursivité terminale

2009-03-26 Thread Xavier Leroy
  Je voudrais savoir s'il existait un moyen de transformer une fonction 
récursive non terminale en fonction récursive terminale avec Caml.


[ Translation: is there a way to transform a non-tail-recursive function
  into a tail-recursive function? ]

A technique that always works is to convert your function to
continuation-passing style.  The resulting code is hard to read and
not particularly efficient, though.

It is possible to do better in a number of specific cases.  Functions
operating over lists can often be made tail-rec by adding an
accumulator parameter and reversing the accumulator at the end.
For instance, List.map f l (not tail-rec) can be rewritten as
List.rev (List.rev_map f l) (tail-rec).
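
To make the two techniques concrete, here is a minimal sketch of the same
function written three ways.  The names map_direct, map_acc and map_cps are
made up for this illustration; only List.rev and List.init come from the
standard library.

```ocaml
(* Direct version: not tail-recursive, because the cons is performed
   after the recursive call returns. *)
let rec map_direct f = function
  | [] -> []
  | x :: rest -> let y = f x in y :: map_direct f rest

(* Accumulator version: results are pushed in reverse order onto [acc],
   which is reversed once at the end.  The call to [loop] is a tail call. *)
let map_acc f l =
  let rec loop acc = function
    | [] -> List.rev acc
    | x :: rest -> loop (f x :: acc) rest
  in
  loop [] l

(* Continuation-passing version: [k] represents "what remains to be done
   with the mapped tail".  Every call is a tail call, but one closure is
   allocated on the heap per list element. *)
let map_cps f l =
  let rec loop l k =
    match l with
    | [] -> k []
    | x :: rest -> let y = f x in loop rest (fun ys -> k (y :: ys))
  in
  loop l (fun ys -> ys)

let () =
  let big = List.init 100_000 (fun i -> i) in
  Printf.printf "%d %d\n"
    (List.length (map_acc succ big))
    (List.length (map_cps succ big))
```

The accumulator version is usually the one to prefer for lists; the CPS
version mainly shows that the transformation is always possible.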

For more complex data structures than lists, Huet's zippers can often
be used for the same purpose.

Happy Googling,

- Xavier Leroy



Re: [Caml-list] Google summer of Code proposal

2009-03-24 Thread Xavier Leroy

   2- OCaml's strategy is close to optimal for symbolic computing.


Is MLton not several times faster than OCaml for symbolic computing?


No, only in your dreams.  If there was a Caml or SML compiler that was
twice as fast as Caml on codes like Coq or Isabelle/HOL, everyone (me
included) would have switched to that compiler a long time ago.
MLton can probably outperform Caml on some symbolic codes, but not by
a large factor and not because of data representation strategies (but
rather because of more aggressive inlining and the like).

- Xavier Leroy



Re: [Caml-list] The new OCaml book (Objective Caml Programming Language by Tim Rentsch)

2009-03-02 Thread Xavier Leroy
 Overall, I can't help seeing that any author who isn't known on this
 list ends up with a questionable book -- first Smith and now Rentsch. 
 Perhaps the elders should form a book vetting committee?

Well, the power to decide is in the hands of publishers (initially)
and customers (later).  But I can assure you that reputable publishing
houses like Springer, Cambridge University Press or MIT Press do
solicit opinions from academics like me and take them into account.
Their area editors attend major conferences like Principles of
Programming Languages and it's always a pleasure to chat with them.
But there isn't much that can be done with less reputable publishers
and self-publishing, as Alexy remarked.

Coming back to the Hickey/Rentsch book(s), I feel deeply sad about
the mess that is unfolding on this list.  I proofread a draft of Jason
Hickey's book, at his request, and found it very good and just what
the OCaml community is still missing: a well-written, English-language
book on Caml appropriate both as a reference and as teaching material.
(I'm not criticizing the other books in English on OCaml -- thank God
they exist! -- just noting that they don't quite fit this exact purpose.)
What we now have is lawsuit material...  I sincerely hope some kind of
agreement can still be found at this point.

- Xavier Leroy



Re: [Caml-list] Lazy and Threads

2009-02-20 Thread Xavier Leroy
Victor Nicollet wrote:
 I'm working with both lazy expressions and threads, and noticed that the
 evaluation of lazy expressions is not thread-safe:

Yaron Minsky wrote:
 At a minimum, this seems like a bug in the documentation. The
 documentation states very clearly that Undefined is called when a value
 is recursively forced.  Clearly, you get the same error when you force a
 lazy value that is in the process of being forced for the first time
 It does seem like fixing the behavior to match the current documentation
 would be superior to fixing the documentation to match the current behavior.

It's not just the Lazy module: in general, the whole standard library
is not thread-safe.  Probably that should be stated in the
documentation for the threads library, but there isn't much point in
documenting it per standard library module.
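
For reference, the documented single-threaded case that Yaron mentions
(a suspension forced recursively) can be reproduced with the small sketch
below.  It only illustrates the Lazy.Undefined exception; it does not
reproduce the multi-threaded scenario, and the names self_forcing and
raised_undefined are invented for the example.

```ocaml
(* A suspension whose own computation forces itself. *)
let rec self_forcing : int Lazy.t = lazy (Lazy.force self_forcing + 1)

(* Forcing it detects the recursion and raises Lazy.Undefined. *)
let raised_undefined =
  try
    ignore (Lazy.force self_forcing);
    false
  with Lazy.Undefined -> true

let () =
  if raised_undefined then print_endline "Lazy.Undefined was raised"
```

Victor's observation is that a second thread forcing a suspension that is
still being computed runs into this same "already being forced" check.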

As to making the standard library thread-safe by sprinkling it with
mutexes, Java-style: no way.  There is one part of the stdlib that is
made thread-safe this way: buffered I/O operations.  (The reason is
that, owing to the C implementation of some of these operations, a
race condition in buffered I/O could actually crash the whole program,
rather than just result in unexpected results as in the case of pure
Caml modules.)  You (Yaron) and others recently complained that such
locking around buffered I/O made some operations too slow for your
taste.  Wait until you wrap a mutex around all Lazy.force
operations...

More generally speaking, locking within a standard library is the
wrong thing to do: that doesn't prevent race conditions at the
application level, and for reasonable performance you need to lock at
a much coarser grain, again at the application level.  (That's one of
the things that make shared-memory programming with threads and locks
so incredibly painful and non-modular.)

Coming back to Victor's original question:

 Aside from handling a mutex myself (which I don't find very elegant for
 a read operation in a pure functional program) is there a solution I can
 use to manipulate lazy expressions in a pure functional multi-threaded
 program?

You need to think more / tell us more about what you're trying to
achieve with sharing lazy values between threads.

If your program is really purely functional (i.e. no I/O of any kind),
OCaml's multithreading is essentially useless, as you're not going to
get any speedup from it and would be better off with sequential
computations.  If your program does use threads to overlap computation
and I/O, using threads might be warranted, but then what is the
appropriate granularity of locking that you'd need?

A somewhat related question is: what semantics do you expect from
concurrent Lazy.force operations on a shared suspension?  One thread
blocks while the other completes the computation?  Same but with busy
waiting?  (if the computations are generally small).  Or do you want
speculative execution?  (Both threads may evaluate the suspended
computation.)

There is no unique answer to these questions: it all depends on what
you're trying to achieve...

- Xavier Leroy



Re: [Caml-list] speeding up matrix multiplication (newbie question)

2009-02-20 Thread Xavier Leroy
 I'm working on speeding up some code, and I wanted to check with
 someone before implementation.
 
 As you can see below, the code primarily spends its time multiplying
 relatively small matrices. Precision is of course important but not
 an incredibly crucial issue, as the most important thing is relative
 comparison between things which *should* be pretty different.

You need to post your matrix multiplication code so that the regulars
on this list can tear it to pieces :-)

From the profile you gave, it looks like you parameterized your matrix
multiplication code over the + and * operations over matrix elements.
This is good for genericity but not so good for performance, as it
will result in more boxing (heap allocation) of floating-point values.
The first thing you should try is write a version of matrix
multiplication that is specialized for type float.

Then, there are several ways to write the textbook matrix
multiplication algorithm, some of which perform less boxing than
others.  Again, post your code and we'll let you know.
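
Since the original poster's code is not shown, here is only a hedged
sketch of the difference between a multiplication parameterized over the
element operations and one specialized to float.  The function names
matmul_generic and matmul_float and the float array array representation
are assumptions made for the illustration.

```ocaml
(* Generic version: [add] and [mul] are arbitrary functions, so the
   compiler cannot assume the elements are floats, and intermediate
   results are typically boxed. *)
let matmul_generic ~zero ~add ~mul a b =
  let n = Array.length a and m = Array.length b in
  let p = if m = 0 then 0 else Array.length b.(0) in
  Array.init n (fun i ->
    Array.init p (fun j ->
      let acc = ref zero in
      for k = 0 to m - 1 do
        acc := add !acc (mul a.(i).(k) b.(k).(j))
      done;
      !acc))

(* Version specialized to float: the type annotations let the compiler
   use the float primitives +. and *. directly, and the local
   accumulator [s] does not escape the inner loop. *)
let matmul_float (a : float array array) (b : float array array) =
  let n = Array.length a and m = Array.length b in
  let p = if m = 0 then 0 else Array.length b.(0) in
  let c = Array.make_matrix n p 0.0 in
  for i = 0 to n - 1 do
    let ai = a.(i) and ci = c.(i) in
    for j = 0 to p - 1 do
      let s = ref 0.0 in
      for k = 0 to m - 1 do
        s := !s +. ai.(k) *. b.(k).(j)
      done;
      ci.(j) <- !s
    done
  done;
  c
```

Both functions compute the same textbook product; how much the
specialized one gains should be measured on the actual matrices.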

 Currently I'm just using native (double-precision) ocaml floats and
 the native ocaml arrays for a first pass on the problem.  Now I'm
 thinking about moving to using float32 bigarrays, and I'm hoping
 that the code will double in speed. I'd like to know: is that
 realistic? Any other suggestions?

It won't double in speed: arithmetic operations will take exactly the
same time in single or double precision.  What single-precision
bigarrays buy you is halving the memory footprint of your matrices.
That could result in better cache behavior and therefore slightly
better speed, but it depends very much on the sizes and number of your
matrices.
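
The memory-footprint point can be checked with a small sketch using the
standard Bigarray module.  The matrix size and the names m32, m64,
bytes32, bytes64 and roundtrip_exact are arbitrary choices for the
example.

```ocaml
let n = 100

(* Same dimensions, single- versus double-precision elements. *)
let m32 = Bigarray.Array2.create Bigarray.float32 Bigarray.c_layout n n
let m64 = Bigarray.Array2.create Bigarray.float64 Bigarray.c_layout n n

(* Storage used by the element data. *)
let bytes32 = Bigarray.Array2.size_in_bytes m32
let bytes64 = Bigarray.Array2.size_in_bytes m64

let () =
  Bigarray.Array2.fill m32 0.0;
  Bigarray.Array2.fill m64 0.0;
  (* Values are rounded to single precision when stored in m32. *)
  Bigarray.Array2.set m32 0 0 0.1;
  Bigarray.Array2.set m64 0 0 0.1

(* Reading back from m32 gives an OCaml (double-precision) float that is
   no longer equal to the literal 0.1. *)
let roundtrip_exact = (Bigarray.Array2.get m32 0 0 = 0.1)

let () =
  Printf.printf "float32: %d bytes, float64: %d bytes, exact: %b\n"
    bytes32 bytes64 roundtrip_exact
```

Elements read from a float32 bigarray are converted back to ordinary
double-precision floats, so the arithmetic itself is unchanged; only the
storage is smaller and less precise.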

- Xavier Leroy



Re: [Caml-list] What is a future of ocaml?

2009-01-18 Thread Xavier Leroy
 [*] INRIA: Are you interested in handling control of http://ocaml.org
 to OcamlCore?  I think we (Red Hat) can kick in some money to pay a
 graphic designer and a user interface specialist to work on a good
 looking site that appeals to beginners and directs people to the
 necessary resources.

That sounds like an interesting offer indeed.  We'd have to discuss
actual contents of the site, but, yes, this is an area where outside
help would be welcome.

- Xavier Leroy




Re: [Caml-list] Shared libraries with ocamlopt callable from C (without main())?

2009-01-11 Thread Xavier Leroy
 =  $ cat ./main.c =
 #include <stdio.h>
 #include "libadd5wrapper.h"
 int main (int argc,char **argv){
printf("Gimme - %d \n", add5wrapper());

Should be add5wrapper(argv) -- as gcc's warnings told you.

return 0;
 }
 
 Now I try to BUILD the whole thing:
 [...]
 Number 4 is where it breaks. I get a ton of errors and this is where
 it ends for me. Because # 4 is also the point where I definitely have
 no idea what I'm doing.

Two things:

- You're not linking in add5-prog.o as far as I can see

- In static mode, the Unix linker is very picky about the relative
  order of -lxxx arguments on the command-line.  For more information,
  see the Info pages for GNU ld.  You probably don't need -static anyway.

The following works:

ocamlopt -output-obj add5.ml -o add5-prog.o
gcc -I`ocamlc -where` -c add5wrapperlib.c
gcc -c main.c
gcc -o mainprog.opt main.o add5wrapperlib.o add5-prog.o \
   -L`ocamlc -where` -lasmrun -ldl -lm

Add -static to the last line if you know you really need it.
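
The contents of add5.ml were not quoted in this message, so the following
is only a guess at what the OCaml side of such a setup typically looks
like.  The function name add5 and the registered name "add5" are
assumptions.

```ocaml
(* add5.ml: hypothetical contents, compiled with ocamlopt -output-obj. *)

(* The function to be called from C. *)
let add5 (x : int) : int = x + 5

(* Register the closure under a name that the C wrapper can look up
   with caml_named_value. *)
let () = Callback.register "add5" add5
```

On the C side, the wrapper would presumably call caml_startup(argv) once
to initialize the runtime, which is why add5wrapper needs argv, and then
invoke the function with caml_callback on the value returned by
caml_named_value("add5").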

Hope this puts you back on track.

- Xavier Leroy



Re: [Caml-list] Why does value restriction not apply to the empty list ?

2009-01-11 Thread Xavier Leroy
Antoine Delignat-Lavaud wrote:

 I chose to solve the problem of polymorphic references by adding value
 restriction* to my inferer, using ocaml to check my results.
 Not knowing whether the empty list should be considered a value or an
 expression, I copied Ocaml's behavior and made it a value.

Yes, the empty list is a value, like all other constants.

 As a result, my inferer gave the following expression the integer type :
 let el = [] in if hd el then 1 else hd el ;;
 which is the expected result since el has polymorphic type 'a list
 but does not look right because it is used as both a bool list and an
 int list.

It is perfectly right.  The empty list can of course be used both as a
bool list and an int list; that's exactly what parametric polymorphism
is all about.

Richard Jones wrote:

 But the same if statement within a function definition causes an error:

 # let f el =
   if List.length el > 0 then (List.hd el)+(int_of_string (List.hd el)) else 0;;
   ^^
 This expression has type int but is here used with type string

This is Hindley-Milner polymorphism at work: only let-bound
variables can have polymorphic types, while function parameters are
monomorphic.
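
A small sketch of the distinction, with names (both_lengths, cell)
invented for the example:

```ocaml
(* [el] is let-bound to a value, so its type is generalized to 'a list
   and it can be used at two different instances in the body. *)
let both_lengths =
  let el = [] in
  List.length (el : bool list) + List.length (el : int list)

(* A function parameter is monomorphic inside the function body.
   The following definition is rejected by the type checker:

   let f el = List.length (el : bool list) + List.length (el : int list)
*)

(* For comparison, the value restriction: [ref []] is a function
   application, not a value, so its type is not generalized.  The first
   use fixes the element type once and for all. *)
let cell = ref []
let () = cell := [1]   (* cell is an int list ref from now on *)
```

After the assignment, using cell with a bool list would be a type error,
which is precisely how the value restriction rules out polymorphic
references.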

- Xavier Leroy



Re: [Caml-list] Freeze in 64-bit Windows Num module

2008-12-18 Thread Xavier Leroy
 I have an issue with using the Num module in my Win64 OCaml compile.
 Whenever I try to operate on numbers that are too large it will lock up.

I'll try to look into this, but it would help if you'd submit a
problem report through the bug tracking system.

- Xavier Leroy



Re: [Caml-list] OCaml version 3.11.0 released.

2008-12-11 Thread Xavier Leroy
 It is our pleasure to celebrate the 51st birthday of Eric S. Raymond
 by releasing OCaml version 3.11.0. [...]
 Please note: at this time it is only available as source and as
 binary for Mac OS X on Intel processors.  Other binary versions
 will be added to the Web site next week.

Binary distributions for Windows (MSVC and Mingw toolchains) are now
available from the usual place:

  http://caml.inria.fr/download.html

Enjoy,

- Xavier Leroy



Re: [Caml-list] OCaml 3.11.0 release candidate

2008-11-26 Thread Xavier Leroy
Damien Doligez wrote:

 We are closing in on version 3.11.0.  A Release Candidate is now available.
 The release candidate is available here:
  http://caml.inria.fr/pub/distrib/ocaml-3.11/ 

The documentation for 3.11 is also available from the same place.  The
HTML manual can be browsed online at

http://caml.inria.fr/pub/docs/manual-ocaml-311/index.html

Enjoy,

- Xavier Leroy



Re: [Caml-list] OCaml version 3.11.0+beta1

2008-10-15 Thread Xavier Leroy
 Native dynlink used to work on Mac OS X < 10.5 (x86 only). The new
 linker in 10.5 does not support linking shared libraries with non-PIC
 code. It is still possible to use the old linker, called ld_classic,
 but some libraries (like X11) does not work, so this has been disabled
 in the configure script.

 The clean solution to make natdynlink work on recent Mac OS X systems
 (beside convincing Apple to support the old behavior of their linker
 in their new implementation) is to change OCaml's x86 backend so that
 it produces only PIC code (this has been done for the AMD64 port). I
 don't think there is currently any plan to work on that.
 
 Ouch, this makes it almost a dead end for us. I can offer some time to
 help in this effort, working in the port, or providing feedback. The
 native dynlink and toplevel are, at least to me, the killer features in
 3.11, but adding another hole for Mac OS X intel (in addition to not
 supporting x86_64) does not seem like the best choice for an
 increasingly popular architecture.

Well, we'd very much like to support native dynlink on OS X 10.5,
but Apple is not helping in the least by crippling their linker
compared with the one in 10.4.  If anyone from Apple is on this list,
feel free to contact us at [EMAIL PROTECTED] for more
information on this regression.

- Xavier Leroy



Re: [Caml-list] Re: Road to native windows OCaml...

2008-10-15 Thread Xavier Leroy
 Nasm and masm syntaxes differ. You cannot simply interchange them.
 
 What I meant was that caml would generate one more asm syntax, it already
 supports two (masm and gas).

... and it's already a major pain, with quite a bit of code
duplication between the masm and gas code emitters.  The world doesn't
need yet another syntax for x86 assembly language.

- Xavier Leroy



Re: [Caml-list] [ANN] OCamlSpotter: OCaml compiler extension for source browsing

2008-09-18 Thread Xavier Leroy
 I have written a small compiler patch called ocamlspotter. It extends
 -annot option and records all the variable definition locations, so
 that we can jump from variable uses to their definitions easily from
 editors such as emacs.
 
 You have completely blown my mind.  I was thinking about this very
 idea about 10 minutes ago in my car, and *blam* there it is.  I should
 think about some other, more profitable ideas...
 
 I would suggest submitting this as a patch for inclusion.  I've heard
 there are going to be enhancements to the .annot format in 3.11 so
 it's not unprecedented.

From what I've heard, there's also an OCaml summer of code project
that enriched the info found in .annot files.  So, it's certainly time
to discuss extensions to .annot files, but let's do that globally, not
one at a time.  It is probably too late for inclusion in 3.11, but as
long as these extensions are backward compatible, inclusion in bugfix
releases can be considered.

- Xavier Leroy



Re: [Caml-list] Problem of compilation with ocamlopt

2008-07-11 Thread Xavier Leroy
 I wrote some code which compiles without any problem with ocamlc.
 When I try to compile it with ocamlopt on a first computer, where the
 version of ocamlopt is 3.09.1, I get the following message:

Please submit a proper bug report on the bug tracking system,
including code that reproduces the issue.  Make sure to mention the OS
and the architecture (x86 or x86-64).  I'll look into it.

- Xavier Leroy



Re: [Caml-list] ocamlopt generates binaries with executable stacks

2008-06-16 Thread Xavier Leroy
 I posted a patch which should fix the issue that ocamlopt generates
 binaries with executable stacks:

   http://caml.inria.fr/mantis/view.php?id=4564

 However this patch affects every assembly target, far more than I
 could possibly test.  Could people using OCaml on non-Linux platforms
 have a look at the patch, or even test it for me?

I'm pretty sure this patch is Linux-specific.  My fear is that it
might be specific to particular versions of binutils and/or particular
Linux distributions...  I smell a portability nightmare!

Note that in 3.11, the configure script will have options to specify
how to call the assembler (for ocamlopt-generated assembly code and
for the hand-written asm files in the runtime system).  So it might be
sufficient to configure the Gentoo packages with e.g.

  configure -as "as --noexecstack" -aspp "gcc -c -Wa,--noexecstack"

This could be one of the rare cases where addressing the issue at the
level of the packages is safer than by changing the source distribution.
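
One way to check the result on Linux is to look at the flags of the
PT_GNU_STACK program header of the produced binary (the entry that
readelf -lW lists as GNU_STACK). The following is a minimal sketch in
Python, assuming a little-endian ELF file; the function name
stack_is_executable is made up for this illustration and is not part of
any existing tool.

```python
import struct

PT_GNU_STACK = 0x6474E551  # program header type emitted by GNU toolchains
PF_X = 0x1                 # "executable" bit in p_flags

def stack_is_executable(data):
    """Return True or False from the PT_GNU_STACK flags, or None if that header is absent."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF")
    if data[4] == 2:
        # ELF64: e_phoff at offset 32, e_phentsize/e_phnum at 54, p_flags at +4
        (phoff,) = struct.unpack_from("<Q", data, 32)
        phentsize, phnum = struct.unpack_from("<HH", data, 54)
        flags_at = 4
    else:
        # ELF32: e_phoff at offset 28, e_phentsize/e_phnum at 42, p_flags at +24
        (phoff,) = struct.unpack_from("<I", data, 28)
        phentsize, phnum = struct.unpack_from("<HH", data, 42)
        flags_at = 24
    for i in range(phnum):
        base = phoff + i * phentsize
        (p_type,) = struct.unpack_from("<I", data, base)
        if p_type == PT_GNU_STACK:
            (p_flags,) = struct.unpack_from("<I", data, base + flags_at)
            return bool(p_flags & PF_X)
    return None  # no PT_GNU_STACK header: the kernel falls back to its default
```

It would be called as stack_is_executable(open(path, "rb").read()) on an
ocamlopt-produced executable. A result of None means the header is missing
altogether, in which case the stack permissions depend on the kernel and
architecture defaults.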

- Xavier Leroy
