Re: Fw: HaskellDoc?

2000-03-21 Thread Jonathan King


On Tue, 21 Mar 2000, Jan Brosius wrote:

  Jonathan King [EMAIL PROTECTED] wrote:
 
  (Well, the person touting lout seemed to ignore the HTML
  requirement...)
 
 I invite you to look at the following website
 
 http://www.ptc.spbu.ru/~uwe/lout/lout.html

But, of course, I had.  Whenever I look at some piece of software new to
me, and have questions about it, I look at the FAQ.  In this case, I saw a
FAQ referenced there, read it, and saw, in the answer to question 3.1:

   3. Lout Packages

   3.1 Can I produce HTML from my Lout documents? 

   The short answer is no, not easily. The problem is that Lout is 
   quite different from HTML. Lout is a powerful language which
   allows you to place text or graphics at a specific point on the 
   page, and the ease with which this is possible is one of its nicest
   features. HTML is --purposefully-- very poor in this respect. 

OK, so the FAQ may be out of date, but I don't see that as my problem.  
I'd suggest that *this* paragraph is a fairly straightforward answer to
the question, one that I could best paraphrase as "no".  Which isn't to
say that lout has no nice points, but it strikes me as a better backend
(alongside, e.g., HTML) for processing than as an input format for a
literate programming tool.

jking






Re: speed of compiled Haskell code.

2000-03-21 Thread Jonathan King


On 21 Mar 2000, Ketil Malde wrote:

 "Jan Brosius" [EMAIL PROTECTED] writes:
 
  But this example task was chosen as unlucky for Haskell.
  In other, average case, I expect the ratio of  6-10.
 
  This seems that Haskell cannot be considered as a language for real
  world applications but merely as a toy for researchers .
 
 Yeah.  Let's just lump Haskell in with Perl, Python, Java, Lisp and
 all those other toy languages, completely useless in the "real world".

Not sure I understand this statement.  Compiled common lisp and scheme are
pretty competitive with C at this point for a great many things.  Java is
designed to be portable as byte codes, which is a different aim than the
others.  On the string/text processing programs where perl excels, the
performance of both perl and C (at least in my hands :-)) tends to be I/O
bound, and perl is waaay easier to get written.  I know less about python,
but I do know they've got a version (jpython) that produces java
bytecodes, while there is also an effort going on to introduce stronger
type-checking into the language (to improve its performance).  I'm not
sure there is any pair of these 6 languages I would lump together other
than possibly perl and python.

A more reasonable question might be: what real-world applications would
you choose Haskell for today?  And other people have given better
answers here than I could.  

Which I guess is a cue to bring up my hunch that hugs/ghc *could*
eventually "out-perl perl" in some applications, since Haskell is a good
choice for parsing, and many text-filtering applications look like good
matches for a functional programming language.
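
(As a sketch of the kind of text filter I mean, and not a benchmark: a
crude grep that keeps only the lines containing a given word.  This
assumes the Haskell 98 "System" library module for getArgs.)

module Main where

import System (getArgs)

main :: IO ()
main = do
  [word] <- getArgs                    -- the word to look for
  interact (unlines . filter (hasWord word) . lines)

-- does the line contain the word as a whitespace-separated token?
hasWord :: String -> String -> Bool
hasWord word line = word `elem` words line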

In that regard, I think the biggest problems remaining are the lack of a
standard "fast" string type, and some remaining warts in hugs.  These are
maybe easiest to see when you do something like "strace -c" on a hugs
program and the comparable perl program.  So, in my naive version of
"hello world", the hugs version generates 803 calls to 'lstat', 102 calls
to 'stat', and a performance-killing 13 calls to 'write'; yup, that's
one for every character. :-(  Throw most of those out, and you're within
shouting distance of perl.  And that would be something to shout about.

Oh yeah: my code. :-)

#!/usr/bin/runhugs
module Main where
main = putStr "Hello, world\n"

---

#!/usr/bin/perl
print "Hello, world\n";
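
(A hedged aside on those 13 writes: Haskell 98 does let a program ask for
block buffering explicitly; whether runhugs honors the request is
something I haven't verified, so treat this as a sketch:)

#!/usr/bin/runhugs
module Main where

import IO (hSetBuffering, stdout, BufferMode(BlockBuffering))

main :: IO ()
main = do
  -- buffer stdout instead of writing a character at a time
  hSetBuffering stdout (BlockBuffering Nothing)
  putStr "Hello, world\n"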

jking






Re: HaskellDoc?

2000-03-20 Thread Jonathan King


Well, I probably should have skipped it, I guess.  But just to clarify:

On Mon, 20 Mar 2000, Frank Atanassow wrote:

 Jonathan King writes:
  
   [snip about the off-sides rule vs. actual delimiters, etc. jking]
   
  Am I the only person who finds that really, really weird?
 
 Most people slam the offside rule because it depends on trapping
 parser errors, not because they hate the notion of layout. 

That's true.  I don't "hate the notion of layout", but I wanted to
point out that it's possibly not as intuitive as might be expected,
and does complicate the tools side of things.  This is particularly true
if you combine layout with a literate programming system that goes against
that grain.

 There is a simpler notion of the layout which doesn't depend on
 error-trapping and is preferable, at least in my mind.

That would be nice. :-)
 
 As for stray "where" clauses and the like, I think that happens to me
 maybe once or twice a year. Once my definitions get longer than one
 screenful or so, I know it's time to factor some things out.

Yes, but I was more worried about novices than experts.  True, they won't
get burned by a "where" clause 200 lines back, but they can get burned by
other things.  I'm afraid I'm beginning to subscribe to the idea that if
you can't reliably cut and paste the code from email...
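
(For what it's worth, Bird-track literate style is one format that
survives cut and paste reasonably well, since only the lines beginning
with "> " are code and everything else is commentary by definition.  A
minimal sketch:

> module Main where

> main :: IO ()
> main = putStrLn "still intact after a round trip"

At least it makes obvious exactly which lines have to be preserved.)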

[snip]

Meanwhile:

 [Was there a #2?]

There was, but I edited it out and didn't change the numbering. :-(

[my rant on the state of Haskell documentation and so forth deleted,
 except for:]

   There's probably a lesson there.  You can stipulate any old format
   you like, but if it won't easily produce HTML (like lout), or
   produces a psychotic approximation to HTML (like W**d), you're hosed.
   Any browser on the planet can dump HTML to text or postscript, and
   no, it won't be artsy, but, gosh, it might just be good enough.
 
 What's your point? I think we all want to be able to produce HTML...
 or did I miss something? I also agree that we should not shoot for too
 much; but I think we should agree on what we shoot for, first.

(Well, the person touting lout seemed to ignore the HTML requirement...)

All I was suggesting is that people were concentrating on the part of the
problem that might be academically more interesting, but which doesn't
actually produce something as quickly or easily as people might expect
these days.  

Or, which won't reliably be used by people who write code, so that it
won't achieve the goal of universal anything.  Again, I'm not sure you can
shoot too low here.  I have a hunch that if you have to use anything like
a DTD to use the system, it's all over.  If it doesn't look enough like
Haskell to make writing a helpful editor for it easy, that's probably a
problem, too.  (I'm trying to imagine a good emacs mode that does both
layout and XML...).

So, while I like XML in many respects, I'm really dubious that it's a good
idea to make that the basis for a human-typable literate programming
documentation system.  Now, as an intermediate format for such a system,
that might be another story.  Again, I'd like to point out the unsettling
charm of perl's pod format.  If you're a computer scientist, reading the
description could bring tears to your eyes, or even cause internal
bleeding.  Yet, it is the key to one of the most impressive distributed
software documentation efforts that I know of.  Personally, I know I
rarely used to do man pages for small, tool-like things I did in perl. But
now anything I write for perl has a man page, since it's so easy, and
fits right into the perl I just wrote.

jking




Re: Reverse composition

1999-10-08 Thread Jonathan King


On Fri, 8 Oct 1999, Hamilton Richards Jr. wrote:

 At 1:01 PM -0500 10/8/1999, [EMAIL PROTECTED] wrote:
 
 Some time ago there was a discussion about what to call reverse
 composition (I can't find it in the archive - needs a search option?)
 
 Just now I thought of .~ from . for composition and ~ (tilde, but
 commonly called twiddle) for twiddling the order about.
 
 Maybe we could adopt that as normal usage?
 
 Assuming that "reverse composition" means
 
   f .~ g  =  g . f
 
 I kind of like >.> ("forward composition"), which I first saw in Simon
 Thompson's book.

Discussion of what glyph(s) to use for the reverse composition operator
just reminded me of the fact that you might really want to think up
a new "forward" composition operator, as well.  Twice in the past few
months I've seen the suggestion that "." really should be used for
what the vast majority of programming languages already use it for,
namely, a part of the record syntax.   If you look at the code 
examples in the "Lightweight extensible records" paper:

   http://research.microsoft.com/Users/simonpj/#records

or those in the O'Haskell work:

   http://www.cs.chalmers.se/~nordland/ohaskell/index.html

I think you might see the point.  (No pun back there, I promise...) I
understand where using "." to mean composition came from, and I know that
it's a long-standing tradition in at least the Haskell community, but I
don't think the visual correspondence of . to the typographic glyph
"raised open circle" is so close that you'd really like to explain why
you diverged from current usage so much as to choose "." to mean
"composition".  Especially since to "reverse" it, you end up using >.>.
 
 It makes pipelines easy to read:
 
   f >.> g >.> h >.> ...

How about:

f |> g |> h |> ...

for the above, and

g <| f

for "normal" composition?  (Or does this step on some other notation out
there already?) You save a character, get a nice, reversible glyph, and
get to make a different point somewhere else.
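
(To make that concrete, a sketch of the definitions I have in mind; the
fixities are just a guess:)

infixl 9 |>
infixr 9 <|

-- "forward" pipeline: f |> g |> h applies f first, then g, then h
(|>) :: (a -> b) -> (b -> c) -> (a -> c)
f |> g = g . f

-- "normal" composition, with the glyph reversed
(<|) :: (b -> c) -> (a -> b) -> (a -> c)
g <| f = g . f

So f |> g |> h denotes the same function as h <| g <| f.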

jking







Re: Haskell's efficiency

1999-09-23 Thread Jonathan King


On Thu, 23 Sep 1999, Manuel M. T. Chakravarty wrote:

 [EMAIL PROTECTED] (Marcin 'Qrczak' Kowalczyk) wrote,

  S.D.Mechveliani [EMAIL PROTECTED] pisze:
   So far, no clear program example appeared in this list to demonstrate
   Haskell's in-efficiency in comparison to other languages.
  
  I have not done benchmarking myself yet, but in
  http://www.brookes.ac.uk/~p0071749/papers/bridging.ps.gz
  they describe an algorithm for text formatting.
  
            | lines | chars | size(KB) | time(s) | memory(KB)
   ---------+-------+-------+----------+---------+-----------
   Haskell  |   163 |  4640 |      410 |    45.2 |       4505
   Modula-2 |   551 | 10005 |       74 |    18.6 |        356
   C++      |   389 |  5832 |      328 |     3.9 |        368
  
  It is not quite fair because in Modula-2 and C++ all data structures
  were of fixed size, but...
 
 This may lead to a number of different conclusions:
 * Haskell is hopeless,
 * the author of the program has no clue about Haskell,
 * the Haskell compiler is hopeless,
 * the Haskell interpreter is really only an interpreter, or
 * the author forgot to pass -O when calling the Haskell
   compiler.

The original URL cited by Kowalczyk seems to be dead, but you can get
what I believe is an incarnation of the same work in:

http://www.brookes.ac.uk/~p0071749/publications.html#para

Please note that this paper does not actually concentrate on a "Haskell
vs. the world" comparison, but on the derivation of a new, linear-time
paragraph formatting algorithm.  The date on it was September 9, 1999, so
this is pretty much "hot off the press".

This version of the paper drops the Modula-2 line and the memory size
column; the rest of the new table would look like this:

                 | lines | chars | size(KB) | time(s)
-----------------+-------+-------+----------+---------
Haskell ghc 4.01 |   183 |  4676 |      453 |     4.72
Haskell hbc 0..4 |   183 |  4676 |      196 |    10.34
C++ g++ 2.90.27  |   416 |  6310 |        8 |     0.43

The C++ code was a direct (by hand) translation of the Haskell, except
that they did things the way a C++ programmer would do them: using
destructive array updates, and data structures that were of fixed size
for a given problem.  The code size of the C++ version suggests it was
dynamically linked to the usual stuff.

The text they formatted was the ASCII version of _Far from the Madding
Crowd_; clearly a problem of decent size.

You can probably learn a lot more from reading the whole paper, and
presumably from playing with the code.  But that's still a factor of 10
speed difference between the ghc version and the C++ version.  This is
actually a bit encouraging, I think, but does point out some possible room
for improvement.  If I had to make a wild guess, I'd bet the major source
of performance loss is in the Haskell data structures used.  While the
pedagogical niceness of defining String to be [Char] is obvious, I know
that (at least in Hugs98) this can really ruin your day.  
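
(One standard trick that helps here, offered as a sketch rather than
something I've profiled against the paper's code: build output as a
function of type String -> String, the ShowS idiom, so that gluing
pieces together is function composition instead of (++) on ever-longer
left operands:)

type Builder = String -> String     -- same shape as the Prelude's ShowS

lit :: String -> Builder
lit s = (s ++)

-- compose the pieces and apply to "" once at the very end
build :: [Builder] -> String
build pieces = foldr (.) id pieces ""

title :: String
title = build [lit "Far ", lit "from ", lit "the ", lit "Madding ", lit "Crowd"]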

(Some of you might like to try doing:

  strace -c your_favorite_standalone_hugs_texthack

and note what you see.  Then write the equivalent 5-line perl script (:-))
and do the strace exercise again.  Ouch.  But since the next version of
Hugs98 is apparently due out imminently, this situation may well have
improved in the new version.)

But, man, things are getting *this close*; if it were really true that all
ghc needs right now to get in line with C++ on text-formatting problems is
a kicking string library and a way for the compiler to know when to use
it, then the arguments usually made in Haskell's favor become much more
compelling to a far larger audience.

jking







Re: Units of measure

1999-08-26 Thread Jonathan King


On Thu, 26 Aug 1999, Christian Sievers wrote:

 Anatoli Tubman wrote:
 
  I once wrote a C++ template library that did exactly that.  Arbitrary 
  units, rational exponents -- you can have (m^(3/2)/kg^(5/16)) 
  dimensioned value. All at compile time, without runtime checking 
  whatsoever.
 
 Is there any sense physically in rational exponents?

Physically? Probably not much, other than if you get them, you might have
made a dimensional error (which could be caught, as you suggest below).  

Well...actually, I take a bit of that back.  There are occasional "rules
of thumb" that relate various physical quantities in "natural" situations.  
So, for example, the speed of a fish is more or less proportional to the
square root of its length (assuming fish of some reasonably "fishy"  
shape).

Other situations come up with "real" data where you're doing a regression
analysis (or something similar) and the limitations of the technique
require a data transformation in order for you to fulfill statistical
model assumptions.  Not a pleasant business...

But in both cases it's not clear how (or that) you want to enforce
dimensional correctness.

 If not, we could use this extra information for less permissive type
 checking, for example only allowing a square root from an argument
 that has only even exponents.

And I can see that logarithmic data transformations are really going to
shake your world view. :-)

But getting back to Haskell (or functional programming), I did take a look
at the suggested Kennedy papers on "Dimension Types".  Boy, does this
problem turn out to be subtle; I was in over my head in no time.  But, in
addition to the solution he proposes there (which, by the way, would
handle the square root problem you mention), he discusses Wand and
O'Keefe's attempt to do a similar thing with an ML-like type system.

This is not as general, but might be good enough for some purposes.  What
you do there is fix the number of base dimensions at N and express
dimension types as N-tuples, so that if you had dimensions M, L, and T,
then Q (n1,n2,n3) represents [M^n1 L^n2 T^n3] in dimensional speak.  The
"fun" here begins with things like sqrt, which can have type
Q (n1, n2, n3) -> Q (0.5*n1, 0.5*n2, 0.5*n3).  Oh, and the type
inference algorithm requires Gaussian elimination.
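
(Haskell's type system can't express those exponent-polymorphic types
directly, but to make the idea concrete, here is a value-level sketch in
which a quantity carries its (M, L, T) exponents and the operations check
or combine them at run time; all the names are made up for illustration,
and Wand and O'Keefe of course do this statically:)

module Dimensional where

-- mass, length, and time exponents, plus the magnitude
data Quantity = Q Rational Rational Rational Double
                deriving Show

qmul :: Quantity -> Quantity -> Quantity
qmul (Q m1 l1 t1 x) (Q m2 l2 t2 y) = Q (m1+m2) (l1+l2) (t1+t2) (x*y)

qadd :: Quantity -> Quantity -> Quantity
qadd (Q m1 l1 t1 x) (Q m2 l2 t2 y)
  | (m1,l1,t1) == (m2,l2,t2) = Q m1 l1 t1 (x+y)
  | otherwise                = error "qadd: dimension mismatch"

-- the "fun" case: square root halves every exponent
qsqrt :: Quantity -> Quantity
qsqrt (Q m l t x) = Q (m/2) (l/2) (t/2) (sqrt x)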

jking







TclHaskell manual woes...

1999-08-13 Thread Jonathan King


TclHaskell was just announced here, and I grabbed the distribution this
morning.  I was very eager to get started messing around with this, but
the TclHaskell manual is distributed as a Microsoft Word document which I
can't use and as a file that purports to be HTML. By "purports" I mean
that my version of Netscape kinda chokes on it. :-)  When I "used the
source" for the manual, the grim truth of the matter was revealed...I
won't reveal that truth, but after a non-trivial mission with my
X-wing, er, Xemacs, I ended up with something that more or less passes
for html using characters that actually appear in the Latin1 character
set.

If anybody wants my hacked-up, unofficial, but more hygienic html copy
of this, let me know.  

jking






Haskell conventions (was: RE: how to write a simple cat)

1999-06-10 Thread Jonathan King


Well, the cat has been skinned and boned, but I think I see a
shred of meat or two that hasn't been picked over yet...

On Thu, 10 Jun 1999, Frank A. Christoph wrote:

[some attributions missing...I hope you know who you are]

[big snip, about the fact that Haskell programs can be quite brief]

 Third, it is actually quite common in Haskell (in my experience at
 least) to use very non-descriptive names for local variables if the
 definition is short.  I think it helps readability because the program
 fragment is shorter as a result, and it is easier to see the
 relationship between all the elements. For example, compare:
 
   map :: (a -> b) -> [a] -> [b]
   map f [] = []
   map f (x:xs) = (f x) : map f xs
 
   transformListElems :: (elem -> elem') -> List elem -> List elem'
   transformListElems transform Nil = Nil
   transformListElems transform (Cons elem elemRest) =
 Cons (transform elem) (transformListElems transform elemRest)

Well, the second version does more than just use descriptive variable
names (and some not very descriptive, for that matter).  It also spells
out constructors, has an especially long-winded function name, and uses
one name for both a type variable and an argument (and a "primed" version
for a second type variable).  I would prefer to compare with:

   map :: (atype -> btype) -> [atype] -> [btype]
   map transform [] = []
   map transform (head:rest) = (transform head) : map transform rest

Now, I wouldn't necessarily prefer that form to the "canonical" one, but I
think this is because the canonical one *is* canonical.  That is, there
are some additional Haskell conventions that really are not spelled out as
well as they might be.

You point out that short variable names keep code segments short, but my
take on why Haskell seems to "prefer" short names in many situations is
that they are easier to think of as being *generic*.  (Intuitively, when
you make a concept more specific, it tends to get a longer name.)

So, the name of a type is always at least a full word, as are the names of
specific functions.  But type variables are almost always single
characters, and distinct from the names of any type.  Conventionally, they
are also usually "a", "b", and "c", although "m" is for monad.
Conventionally also, generic function arguments are "f" and "g", the
conventional predicate is "p". Generic arguments are "x" and "y" (or "xs"
and "ys" if they are lists); arguments with specified types are usually
the first letter of their type name (e.g., "c" for Char, "i" for an Int;
"n" and "m" are indices)... that covers most of it, I think.

I think most of the Haskell code I've ever seen that *wasn't* written by
me follows these conventions pretty closely.  But the strange thing is...I
haven't found a prominent place on, e.g., the Haskell home page where this
is spelled out. (Please tell me if I'm missing anything obvious.) In a
way, I guess this is trivial, but I know from hard experience it can often
take a long time to become completely aware of trivial things.
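
(To see most of those conventions in one place, here is the standard
rendering of a Prelude function, written exactly to that style:)

   filter :: (a -> Bool) -> [a] -> [a]
   filter p []     = []
   filter p (x:xs)
     | p x       = x : filter p xs
     | otherwise = filter p xs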

It's amusing, actually.  Haskell basically defines your indentation for
you, these conventions give you all your variable names, static
type-checking almost ensures program correctness...I guess students taking
their first Haskell course should all get As and all produce identical
code on their homework assignments. :-)

 Of course, for more involved definitions, it is better to use 
 descriptive names.

Well, for more specific definitions, anyway.  If I've got the style right.

jking