from:"Jan\-Willem Maessen"

Re: [Haskell-cafe] What class for splittable data / balanced-fold?

2013-09-30 Thread Jan-Willem Maessen

On Sun, Sep 29, 2013 at 9:13 PM, Ryan Newton rrnew...@gmail.com wrote:

 Thanks, that's interesting to know (re: Fortress).

 Interestingly, in my Fortress days we looked at both using a split-like
 interface and at a more foldMap / reduce - like interface, and it seemed
 like the latter worked better – it requires a lot less boilerplate for
 controlling recursion, and better matches the fanout of whatever structure
 you're actually using underneath.


 Ok, we'll have to try that.  I may be underestimating the power of a
 newtype and a monoid instance to expose the structure..  I was wrong about
 this before [1].  Here's the foldMap instance for Data.Map:

   foldMap _ Tip = mempty  foldMap f (Bin _ _ v l r) = Foldable.foldMap f l 
 `mappend` f v `mappend` Foldable.foldMap f r

 Simon Marlow in his recent Haxl talk also had a domain where they wanted a 
 symmetric (non-monadic) parallel spawn operation...

 But it remains pretty hard for me to reason about the operational behavior of 
 these things... especially since foldMap instances may vary.


I'll note that there's really a documentation responsibility here that
hasn't been honored as much as it should (possibly because lots of folks
are driving Foldable, which other commenters have noted doesn't seem to do
what you want for tree-like data structures – I certainly didn't realize
that).

It'd be worth thinking about doing the derivation of foldMap directly from
the structure of the underlying type.

It'd also be worth documenting when we get tree-structured traversal out of
a Foldable instance, and fixing the ones that don't provide it.

And I agree that getting down to non-allocating traversals is the ultimate
goal here.  If we leak space or lose parallelism we might as well not
bother.

-Jan

Thanks,

-Ryan

 [1] For example, here is a non-allocating traverseWithKey_ that I failed to 
 come up with:


 -- Version of traverseWithKey_ from Shachaf Ben-Kiki
 -- (See thread on Haskell-cafe.)
 -- Avoids O(N) allocation when traversing for side-effect.

 newtype Traverse_ f = Traverse_ { runTraverse_ :: f () }
 instance Applicative f = Monoid (Traverse_ f) where
   mempty = Traverse_ (pure ())
   Traverse_ a `mappend` Traverse_ b = Traverse_ (a * b)
 -- Since the Applicative used is Const (newtype Const m a = Const m), the
 -- structure is never built up.
 --(b) You can derive traverseWithKey_ from foldMapWithKey, e.g. as follows:
 traverseWithKey_ :: Applicative f = (k - a - f ()) - M.Map k a - f ()
 traverseWithKey_ f = runTraverse_ .
  foldMapWithKey (\k x - Traverse_ (void (f k x)))
 foldMapWithKey :: Monoid r = (k - a - r) - M.Map k a - r
 foldMapWithKey f = getConst . M.traverseWithKey (\k x - Const (f k x))



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] What class for splittable data / balanced-fold?

2013-09-29 Thread Jan-Willem Maessen

On Sat, Sep 28, 2013 at 1:09 PM, Ryan Newton rrnew...@gmail.com wrote:

Hi all,

We all know and love Data.Foldable and are familiar with left folds and
right folds. But what you want in a parallel program is a balanced fold
over a tree. Fortunately, many of our datatypes (Sets, Maps) actually ARE
balanced trees. Hmm, but how do we expose that?

It seems like it would be nice to have a* standard class t*hat allows you
to split a datatype into roughly even halves, until you get down to the
leaves. This goes along with Guy Steele's argument that we should use
append lists as primitive rather than cons-lists, and it's why we added
append-lists
within the monad-par
libraryhttp://hackage.haskell.org/package/monad-par-extras-0.3.3/docs/Control-Monad-Par-AList.html
.

Interestingly, in my Fortress days we looked at both using a split-like
interface and at a more foldMap / reduce - like interface, and it seemed
like the latter worked better – it requires a lot less boilerplate for
controlling recursion, and better matches the fanout of whatever structure
you're actually using underneath.

So I'd just go with a hand-written Foldable instance here.

But I'd love to hear if you've come up with an application that requires
split itself, and that *isn't* zip. I recall we decided zip was better
done with element-and-index iteration over one of the structures and
indexing into the other since most tree structures don't actually zip
properly anyway.

-Jan-Willem Maessen
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Optimization flag changing result of code execution

2013-07-17 Thread Jan-Willem Maessen

This has all the marks of a 64-bit-only code running on a 32 bit machine.
 It looks like you're getting the high bits of the rng with a signed shift
right, ultimately yielding only the sign bit.

I suspect mwc-random needs to use Int64 rather than Int internally in a few
critical places.


On Wed, Jul 17, 2013 at 7:22 AM, kudah kudahkuka...@gmail.com wrote:

 Test triggers the bug, only zeros and ones like you said, but
 only for native-sized types:

 -O2:
 Int
 0 0 0 0 0 1 0 0 1 0 1 1 0 0 0 0 0 0 0 0
 Int32
 41 37 25 85 27 84 70 8 70 32 36 1 14 92 1 74 17 28 38 76
 Int64
 37 77 57 75 17 58 28 77 23 51 1 13 50 35 21 11 70 43 6 5
 Word
 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0
 Word32
 52 45 86 4 85 44 71 59 91 10 65 89 41 78 84 88 3 60 71 0
 Word64
 12 82 25 1 11 14 76 58 1 77 9 25 57 20 41 8 2 29 21 29

 ghci:
 Int
 53 13 24 58 66 71 19 16 73 54 95 87 2 34 62 67 2 45 56 2
 Int32
 41 37 25 85 27 84 70 8 70 32 36 1 14 92 1 74 17 28 38 76
 Int64
 37 77 57 75 17 58 28 77 23 51 1 13 50 35 21 11 70 43 6 5
 Word
 41 19 99 69 27 58 92 45 9 38 51 39 50 14 2 21 25 94 96 2
 Word32
 52 45 86 4 85 44 71 59 91 10 65 89 41 78 84 88 3 60 71 0
 Word64
 12 82 25 1 11 14 76 58 1 77 9 25 57 20 41 8 2 29 21 29

 I run a 32-bit system, as I've said before.

  Gentoo Linux _i686_ 3.8.2-pf

 Could perhaps be something with my system, I'll test on Ubuntu later
 today, and if there are no problems with compiling mwc-random on ghc
 git — on it too.

 On Wed, 17 Jul 2013 12:19:29 +0400 Aleksey Khudyakov
 alexey.sklad...@gmail.com wrote:

  On 10 July 2013 14:10, kudah kudahkuka...@gmail.com wrote:
   Yes, it does. Without optimizations the result is
   ndgorsfesnywaiqraloa, while with optimizations the result is
   always aabb.
  
  Sorry for taking so long. So problem is uniformR. You can reproduce
  bug reliably and I cannot. Are you on 32-bit system? I only tested on
  64-bit ones. If this isn't the case I'm out of ideas.
 
  I finally wrote test case that doesn't depends on anything besides
  mwc-random (it's
  in attachment). Could you check whether it still triggers the bug

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Defining a Strict language pragma

2012-11-06 Thread Jan-Willem Maessen

On Mon, Nov 5, 2012 at 5:52 PM, Johan Tibell johan.tib...@gmail.com wrote:

 The tricky part is to define the semantics of this pragma in terms of
 Haskell, instead of in terms of Core. While we also need the latter, we
 cannot describe the feature to users in terms of Core. The hard part is to
 precisely define the semantics, especially in the presence of separate
 compilation (i.e. we might import lazy functions).

 I'd like to get the Haskell communities input on this. Here's a strawman:

  * Every function application f _|_ = _|_, if f is defined in this module
 [1]. This also applies to data type constructors (i.e. the code acts if all
 fields are preceded by a bang).

  * lets and where clauses act like (strict) case statements.


What ordering constraints will exist on let and where clauses?  Is the
compiler free to re-order them in dependency order?

Must they be strictly evaluated in the context in which they occur?
 Haskell syntax readily lends itself to a style a bit like this:

f x y z
  | p x = ... a ... b
  | q y = ... a ... c
  | otherwise = ... d ...
  where a = ...
  b = ...
  c = ...
  d = ...

This tripped us up a lot in pH and Eager Haskell, where we at least wanted
to be able to float d inwards and where it was sometimes surprising and
costly if we missed the opportunity.  But that changes the semantics if d =
_|_.  It's even worse if d = _|_ exactly when p x || q y.

Part of the answer, I'm sure, is don't do that, but it might mean some
code ends up surprisingly less readable than you'd expect.

 * It's still possible to define strict arguments, using ~. In essence
 the Haskell lazy-by-default with opt-out via ! is replaced with
 strict-by-default with opt-out via ~.

 Thoughts?


I found myself wondering about free variables of lambdas, but realized that
would be handled at the point where those variables are bound (the binding
will either be strict or lazy).

-Jan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] strict version of Haskell - does it exist?

2012-01-30 Thread Jan-Willem Maessen

On Mon, Jan 30, 2012 at 6:24 AM, Malcolm Wallace malcolm.wall...@me.comwrote:


 On 29 Jan 2012, at 22:25, Ertugrul Söylemez wrote:
  A strict-by-default Haskell comes with the
  implication that you can throw away most of the libraries, including the
  base library.  So yes, a strict-by-default Haskell is very well
  possible, but the question is whether you actually want that.  I
  wouldn't, because a lot of my code relies on the standard semantics.

 At work, we have a strict version of Haskell, and indeed we do not use the
 standard libraries, but have built up our own versions of the ones we use.
  However, our compiler is smart enough to transform and optimise the source
 code *as if* it were non-strict: it is only at runtime that things are
 evaluated strictly.  This means that, in general, you can rely on the
 standard semantics to a surprisingly large extent.


I wanted to emphasize Malcolm's point here.  Optimizing using the original
Haskell semantics turned out to be pretty important back when I was working
on Eager Haskell.  For example, a lot of Haskell functions are written in
the following style:

f a b c
  | guard1 d = g h i
  | guard2 e = h
  | guard3 f = i
  | otherwise = j
  where d = ...expensive...
e = ...expensive...
f = ...expensive...
g = ...expensive...
h = ...expensive...
i = ...expensive...
j = ... expensive...

An an ordinary procedural language, where function calls in g, h, i, and j
might have side effects, we can't sink bindings down to the point of use.
 Even in the absence of side effects, we have to account for the fact that
some of these computations are used in some -- but not all -- right-hand
sides, and that often we need to do some inlining to discover that a value
isn't going to be used.  It turns out Haskell code relies on this sort of
thing all over the place, and simply coding around it leads to
deeply-nested let bindings that walk off the right-hand edge of the screen.

It's not difficult to rewrite most of the prelude functions in this style,
but it's no longer pretty, and it's recognizably not idiomatic Haskell.

However, bulk operations do transform the entire data structure, not merely
 the fragments that are needed for the onward computation, so it can often
 be a net performance loss.  The standard lazy computational paradigm of
 generate-and-test is therefore hideously expensive, on occasion.


This was a huge issue in Eager Haskell.  By far our worst performance was
on stream-like programs that generated infinite lists of results, and then
sliced away the useless tail.  With completely strict evaluation, this of
course doesn't work at all, but it can be tricky to bound the required
sizes of inputs even if you know how much of the output you want (imagine a
call to takeWhile or filter on an infinite list).

-Jan-Willem Maessen



 Regards,
Malcolm
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Question: Lazy Incremental Evaluation and Haskell?

2011-10-08 Thread Jan-Willem Maessen

On Fri, Oct 7, 2011 at 2:46 PM, Brandon Moore brandon_m_mo...@yahoo.com wrote:
 Margnus Carlsson did something monadic several years ago.

 http://dl.acm.org/citation.cfm?doid=581478.581482

 Perhaps there is an implementation on Hackage or on his website.

 This stuff also goes by the moniker adaptive computation. See the
 references and citations of that paper for more on this.

 Umut Acar now seems to refer to this as self-adjusting computation,
 and has some work here:

 http://umut.mpi-sws.org/self-adjusting-computation

 In particular, there seems to be a modified version of Mlton.

To tie things together a bit, Magnus Carlsson's paper was based on
Umut Acar's earlier work.  Note in particular that there's a lot of
emphasis placed on efficiently figuring out what computation needs to
be re-done (and some theory to support those efficiency claims).  FRP
frameworks, etc. naively re-do rather too much computation (all of it,
in particularly poor cases) compared to systems specifically tailored
to self-adjustment.

-Jan-Willem Maessen

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Robert Harper on monads and laziness

2011-05-03 Thread Jan-Willem Maessen

On Tue, May 3, 2011 at 1:32 AM, Manuel M T Chakravarty
c...@cse.unsw.edu.au wrote:
...  Interestingly, today (at least the academic fraction of) the Haskell 
community appears to hold the purity of the language in higher regard than its 
laziness.

As someone who implemented Haskell with quite a bit less laziness, I'm
inclined to agree.

That said, I think it's easy to underestimate just how much of the
structure of the language really encourages a lazy evaluation
strategy.  One example: where blocks scope over groups of conditional
RHSs.  This is very handy, in that we can bind variables that are then
used in some, but not all, of the disjuncts.  Grabbing the first
example that comes to hand from my own code:

tryOne (gg, uu) e
  | not (consistent uu)  = (gg', uu)
  | uu==uu' = (gg, uu)
  | otherwise = (gg', uu')
  where gg' = gg `addEquation` e
uu' = uu `addEquation` e

This kind of thing happens all over the place in Haskell code -- it's
a very natural coding style -- but if you want to preserve purity it's
tough to compile without laziness.

-Jan-Willem Maessen

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Class constraints for associated type synonyms

2011-04-01 Thread Jan-Willem Maessen

On Thu, Mar 24, 2011 at 11:36 AM, Simon Peyton-Jones
simo...@microsoft.com wrote:
 | class Monoid (GeneratorOf a) = Generable a where
 |   type GeneratorOf a :: * - *
 |   construct :: GeneratorOf a - a
 |
 | Now, it seems I need FlexibleInstances to do this when I'm using an
 | associated type synonym, but I don't need the flexibility when using a
 | multiparameter type class.


 Suppose you have these wierd instances:
        type instance GeneratorOf (Tree a) = Tree (Tree a)
        instance Generable a = Monoid (Tree a)
        instance Generable (Tree a)

 Now, in the last of these we need to cough up an instance of Generable (Tree 
 a)'s superclasses.  Ah, that's Monoid (GeneratorOf (Tree a))
 Ah, that's Monoid (Tree (Tree a))
 We have an instance of Monoid, but it needs, well Generable (Tree a), which 
 is where we started.

 If I'd nested things a bit more deeply you can see I'd get into an infinite 
 regress.   So you have to take responsibility that instance solving will 
 terminate, hence FlexibleInstances.

 As you say, the same thing can happen with fundeps. The fact that the thing 
 is allowed is probably a bug in the Fundep stuff.

Thanks, it's good to know that I was, in fact, being naughty in both
instances (and not merely being constrained from doing Good Things by
the typing rules for associated types).  Back to the drawing board.

-Jan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Class constraints for associated type synonyms

2011-03-23 Thread Jan-Willem Maessen

Hi all -

I've been trying to construct a class declaration with an associated
type synonym, but I'd like to constrain that type to belong to a
particular class.

Consider the following class:

class Monoid m = Constructs c m | c - m where
  construct :: m - c

This captures the idea that the collection c ought to be constructed
using the monoid m (say if we're doing the construction using the
Writer monad)--the functional dependency indicates the desire for the
type c to injectively determine the choice of monoid m.  For example:

newtype ListBuilder a = Builder ([a] - [a]) deriving (Monoid)

instance Constructs [a] (ListBuilder a) where
  construct (Builder f) = f []

instance (Ord a) = Constructs (Set a) (Set a) where
  construct = id

Now I'd like to be able to do the same thing using an associated type
synonym, something like this:

class Monoid (GeneratorOf a) = Generable a where
  type GeneratorOf a :: * - *
  construct :: GeneratorOf a - a

Now, it seems I need FlexibleInstances to do this when I'm using an
associated type synonym, but I don't need the flexibility when using a
multiparameter type class.  In both cases the instance constraint
involves types that can be injectively inferred (if I have my
terminology straight; work with me here) from a single type mentioned
in the class head.  In particular, I can imagine storing the
dictionary for Monoid (GeneratorOf a) in the dictionary for Generable
a, and thus allowing context reduction of (Monoid (GeneratorOf tyvar))
to (Generable tyvar).  Meanwhile, I think there are things that are
permitted by FlexibleInstances that I'd rather *not* have creeping
into my programs.

Is there a fundamental constraint I'm missing here that requires the
full generality of FlexibleInstances?  Do I need to use
FlexibleInstances whenever I use associated types in my programs?

-Jan-Willem Maessen

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-23 Thread Jan-Willem Maessen

On Wed, Feb 23, 2011 at 11:08 AM, Johan Tibell johan.tib...@gmail.com wrote:
 ...
 HashTable is not a concurrent data structure. You need e.g. a lock
 free mutable hash table.

Good implementations of which are *not* thick on the ground.  Even
java.util.concurrent isn't fully lock-free.

-Jan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-21 Thread Jan-Willem Maessen

On Mon, Feb 21, 2011 at 8:07 AM, Felipe Almeida Lessa
felipe.le...@gmail.com wrote:
 On Mon, Feb 21, 2011 at 12:58 PM, Max Cantor mxcan...@gmail.com wrote:
 If you want to use the library and need a short term fix, just write a small 
 wrapper type/module

 newtype SizedMap = SizedMap (Int, HashMap) and track the size yourself.  
 only complication is that on inserts and deletes you'll need to check if the 
 key existed.  other than that, it shouldn't be too difficult.

 This way, the library stays super optimized but, if you need, you can track 
 the size. As Johan said, it would slow down insert and delete a bit.  
 shouldn't affect lookup though..

 This isn't sufficient in all cases.  How would you know the resulting
 size of a union or intersection?

Note that the library does not at present support union or
intersection!  There are various ways to deal with the problem without
caching sizes at nodes, one of which is to count overlap during union
or intersection operations (since this involves counting leaves that
are visited during these operations).

-Jan-Willem Maessen

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Misleading MVar documentation

2011-01-15 Thread Jan-Willem Maessen

On Wed, Jan 12, 2011 at 11:23 AM, Neil Brown nc...@kent.ac.uk wrote:
 On 12/01/11 15:53, Edward Z. Yang wrote:

 These are interesting, opposed perspectives, and I suspect what would be
 good is to treat both situations.  I think perhaps what would be good
 to put in the introduction is the conceptual model of MVars: that is,
 take and put are the fundamental operations, and everything else is
 composed of them.  With additional constraints on who is writing and
 reading
 MVars, you can assume more safety properties, but you have to ensure
 that those are indeed held (or you should use STM instead.)

 I'll try another writeup. Does anyone know where the original papers for
 MVars might be?

 I think the original paper is Concurrent Haskell, available here:

 http://www.haskell.org/ghc/docs/papers/concurrent-haskell.ps.gz

 and here:

 http://research.microsoft.com/en-us/um/people/simonpj/papers/concurrent-haskell.ps.gz

Actually, the first presentation of M-structures is rather older than
that.  See Barth, Nikhil, and Arvind's FPCA '91 paper:
http://portal.acm.org/citation.cfm?id=652538

The original formulation was indeed in terms of take and put,
though unconditional read and write primitives were prtty commonly
used in Id programs.  The take/put view can also usefully be thought
of as a 1-element blocking channel.

-Jan-Willem MAessen


 Thanks,

 Neil.

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Guy Steele's Praise For Haskell @ Strange Loop Keynote

2011-01-15 Thread Jan-Willem Maessen

On Sat, Jan 15, 2011 at 9:02 PM, Jake McArthur jake.mcart...@gmail.com wrote:
 So everybody doesn't have to go watch it, here is a shortened version of
 what Steele said in the video:

 Although Fortress is originally designed as an object-oriented framework
 in which to build an array-style scientific programming language, [...] as
 we've experimented with it and tried to get the parallelism going we found
 ourselves pushed more and more in the direction of using immutable data
 structures and a functional style of programming. [...] If I'd known seven
 years ago what I know now, I would have started with Haskell and pushed it a
 tenth of the way toward Fortran instead of starting with Fortran and pushing
 it nine tenths of the way toward Haskell.

 I think I might use this in some slides soon. :) Thanks for pointing it out!

The big things I can recall missing were pattern matching and
Haskell-style classes rather than OO + generic typing.  The Fortress
type system actually approximates pattern matching in some interesting
ways, but it's not the same.

-Jan-Willem Maessen
Experienced Fortress programmer (!)


 - Jake

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Eta-expansion destroys memoization?

2010-10-07 Thread Jan-Willem Maessen

What people seem to be missing here is that the location of the
where-binding with respect to the lambda changes in each case.  As a
result, I think the forgoing explanations were rather confusing;
there's no magic going on here.

On Thu, Oct 7, 2010 at 8:17 AM, Brent Yorgey byor...@seas.upenn.edu wrote:

 - Forwarded message from Yue Wang yulew...@gmail.com -

 From: Yue Wang yulew...@gmail.com
 Then there is a clever way to do that on haskell wiki:

 fib = ((map fib' [0 ..]) !!)
    where
      fib' 0 = 0
      fib' 1 = 1
      fib' n =trace(show(n)) fib (n - 1) + fib (n - 2)

This is indeed equivalent to:
fib =
  let fib' 0 = 0
   fib' 1 = 1
   fib' n = fib (n-1) + fib (n-2)
  in (map fib' [0..] !!)

But adding the argument embeds the let inside the function call:
fib x =
  let fib' 0 = 0
   fib' 1 = 1
   fib' n = fib (n-1) + fib (n-2)
  in (map fib' [0..] !!)

Now we create a new fib' for each invocation of fib.  Not efficient at
all!  (Much *less* efficient the the recursive fib).

There's no evaluation magic here---all that's happening is GHC is
executing the program exactly as written.  It can't float the list out
of the function, as that can lead to unexpected space leaks (if you
didn't intend to keep the list of fibs around forever).

-Jan-Willem Maessen
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: Shared thunk optimization. Looking to solidify my understanding

2010-09-25 Thread Jan-Willem Maessen

No one seems to have mentioned that this is a non-optimization in
call-by-need lambda-calculus (Ariola et al.), where it follows from
the standard reduction rules.  Since lazy implementations of Haskell
all use call-by-need evaluation in some form, I'd call this playing
by the rules rather than optimization.  Unoptimized call-by-need
indeed evaluates (nthPrime 10) twice in test2, but only once in
test1.  (Challenge: prove observationl equivalence of these two
fragments under call-by-need.)

-Jan-Willem Maessen

On Fri, Sep 24, 2010 at 5:58 PM, David Sankel cam...@gmail.com wrote:
 On Wed, Sep 22, 2010 at 11:10 AM, David Sankel cam...@gmail.com wrote:

 snip
 My questions are:

 What is the optimization that test1 is taking advantage of called?
 Is said optimization required to conform to the Haskell98 standard? If so,
 where is it stated?
 Could someone explain or point to a precise explanation of this
 optimization? If I'm doing an operational reduction by hand on paper, how
 would I take account for this optimization?

 Thanks everyone for your responses. I found them very helpful. This is my
 current understanding, please correct me where I am wrong:
 When using Launchbury's Natural Semantics (LNS) as an operational model,
 this optimization is called sharing which would lie in a category of
 optimizations called common subexpression elimination. Holger Siegel's email
 provided steps of an evaluation using LNS to show the different runtimes
 between test1 and test2.
 Because Haskell98 does not specify an operational semantics, there is
 no guarantee that an implementation will provide a sharing optimization. On
 the other hand, Haskell implementations are all similar enough that the
 sharing optimization can be depended on. LNS was indeed written to model
 what is common in implementations for languages characteristically like
 Haskell.
 When compiled with ghc with optimizations, test1 and test2 result in
 identical runtime behaviors. This is an artifact of another, more
 aggressive, optimization that falls within common subexpression elimination
 optimizations. It is not easy to describe or predict when this optimization
 occurs so depending on it when writing code is problematic.
 wren ng thornton provided an evaluation using another operational semantics
 (reference?). Under this semantics, this optimization would be called
 partial evaluation. Unfortunately I couldn't follow the steps or the
 reasoning behind them, perhaps a reference to the semantics would help.
 Thanks again!

 David
 --
 David Sankel
 Sankel Software
 www.sankelsoftware.com
 585 617 4748 (Office)

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] and from standard Prelude

2010-08-24 Thread Jan-Willem Maessen

On Wed, Aug 18, 2010 at 9:56 PM, wren ng thornton w...@freegeek.org wrote:
 Oleg Lobachev wrote:

 #ifdef USE_REPORT_PRELUDE
 and                     =  foldr () True
 or                      =  foldr (||) False
 #else
 and []          =  True
 and (x:xs)      =  x  and xs
 or []           =  False
 or (x:xs)       =  x || or xs

 {-# RULES

 and/build     forall (g::forall b.(Bool-b-b)-b-b) .
                and (build g) = g () True

 or/build      forall (g::forall b.(Bool-b-b)-b-b) .
               or (build g) = g (||) False

 #-}
 #endif

 The thing I find puzzling is that the foldr is inlined. The (regular) clever
 optimizations for build/foldr seem like they should already handle this
 without the need for the extra rules. I wonder how much is gained by
 specializing to ()/True instead of relying on the regular deforestation?

The above code does not inline and / or, *unless they are fused using
the RULES.*  There's not really any benefit to inlining them
otherwise, and it duplicates code.

-Jan-Willem Maessen


 --
 Live well,
 ~wren
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] trees and pointers

2010-07-16 Thread Jan-Willem Maessen

2010/7/16 wren ng thornton w...@freegeek.org:
 Jake McArthur wrote:

 On 07/15/2010 05:33 PM, Victor Gorokhov wrote:

 From the docs, lookup is O(min(n,W))

 Actually worse than O(log n).

 Perhaps I am misunderstanding you, but O(min(n,W)) is either better than
 or the same as O(log n), depending on how you look at things, but I don't
 see any way that the former could be *worse* than the latter.

 For n  W: min(n,W)  log n

 So, when you can guarantee that n  W ---which is almost always the case for
 IntMap---, then O(min(n,W)) is linear and therefore worse than O(log n).

Indeed---though you see worst-case behavior only for carefully-chosen
key sets (eg successive powers of 2).  If the n values in an IntMap
are, say, consecutive or nearly-consecutive, we obtain a log n bound.
I suspect in practice most programmers will see logarithmic behavior.

 But even so, if your constant factors are k  c, then k*n  c*log n is
 perfectly possible for all n  W, and therefore what matters in the real
 world here is the constant factors. The reason why is that for asymptotic
 purposes O(min(n,W)) and O(log n) belong to the same class of functions
 between constant and linear, so they're effectively the same (in
 asymptotic-land).

The argument for constant-time IntMap performance is essentially
similar to the following argument:

There are balanced trees that provide an O(lg n) bound on tree depth
for a tree containing n elements.  Our computer has only k bits of
address space and therefore the number of elements in any in-memory
tree is O(k).  Thus there is a constant (and smallish) upper bound on
tree depth, O(lg k).  Therefore every balanced tree implementation
offers constant-time access.

As you observe, it's really down to constant factors.  The reason
IntMap (or any digital trie) is so interesting is that it is simple
enough that the constant factors are quite good---in particular we
don't waste a lot of time figuring out if we're going to need to
rearrange the tree structure on the fly.  That turns out to amortize
quite a few extra level traversals in a lot of settings.

It also offers really elegant implementations of union and unions.
Whether that means they're quickish I leave to the benchmarkers.

-Jan-Willem Maessen


 --
 Live well,
 ~wren
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Speed of Error handling with Continuations vs. Eithers

2010-05-10 Thread Jan-Willem Maessen

On Mon, May 10, 2010 at 5:38 AM, Max Cantor mxcan...@gmail.com wrote:

 Based on some discussions in #haskell, it seemed to be a consensus that
 using a modified continuation monad for Error handling instead of Eithers
 would be a significant optimization since it would eliminate a lot of
 conditional branching (everytime = is called in the Either monad, there is
 a conditional.


My suspicion, based on using a similar monad to implement IO in Eager
Haskell, is that you're creating a lot of closures.  This is rather more
expensive in general than the extra control flow required to inspect the
Eithers.

In more detail: CPS works well if the compiler can inline most of the
continuation passing and turn your code back into direct style, at least
along the no failures path.  In this case you can avoid creating closures
except at what would have been actual function call points in your original
code, and at catch points for the error continuation.  However, I expect
that you're probably calling functions that are polymorphic in Monad (stuff
like mapM etc.) that is not being inlined or specialized.  These end up
building a continuation rather naively on the heap.  You're essentially
moving the call stack to the heap, and the compiler can't assist you in
moving it back again; hence, slow code.

To make matters worse, you get a lot more branch prediction leverage with
pointer-tagged Either than you possibly could with a closure invocation on a
modern architecture.  But I suspect that's rather unimportant compared to
allocation time / memory footprint issues here.

-Jan-Willem Maessen



 I implemented a ErrCPS monad which does exactly that, but the speed has
 been disappointing.  It runs almost exactly 3x slower than a drop in
 replacement using the MonadError instance of Either from mtl.

 mkEMA and midError are basically toy functions but I dont know why Either
 is so much faster.  I've experimented with putting some seq's in the
 bindErrCPS and even {-# INLINE (=) #-} in the Monad instance, but to no
 avail.

 I've copy/pasted the code below, any suggestions on optimization, or if
 this is simply a bad idea would be much appreciated.  Strangely, compiling
 with -O2 seems to have no effect on the speed:


 -Max
 [... code snipped]
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] GHC vs GCC

2010-03-28 Thread Jan-Willem Maessen

On Sat, Mar 27, 2010 at 8:16 PM, Roman Leshchinskiy r...@cse.unsw.edu.auwrote:

 On 28/03/2010, at 01:36, Jan-Willem Maessen wrote:

  It's worth pointing out that there's a bit of bang-pattern mysticism
 going on in this conversation (which has not been uncommon of late!).  A
 non-buggy strictness analyzer should expose the strictness of these
 functions without difficulty.

 Actually, rangeJ is lazy in i and rangeK is lazy in i and j. GHC does unbox
 everything important here but that needs more optimisations than just
 strictness analysis. You are right, though, that GHC doesn't need bang
 patterns here.


Quite right, the condition in rangeK that mentions all variables is under
another condition:

   rangeK :: Int - Int - Int - Int - Int
   rangeK i j k acc
   | k  1000 =
   if i * i + j * j + k * k `mod` 7 == 0
   ...

So we need to apply some constructor specialization as well to notice that i
and j are always of the form (Int# i#).

-Jan

Roman



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: Are there any female Haskellers?

2010-03-28 Thread Jan-Willem Maessen

2010/3/28 Günther Schmidt gue.schm...@web.de

 Dear Philippa,

 display themselves on demand is putting it rather harshly don't you
 think?


In the context of an existing, lengthy discussion that displays the
ignorance of some of its participants, no.  I could easily see reading the
discussion thus far and deciding to take the path of least resistance and
keep quiet.

It's worth noting the following study, which received quite a bit of media
attention:
  http://www.pnas.org/content/106/22/8801.abstract
This argues that gender differences in math performance vary among cultures,
and that differences in math performance are thus more likely cultural
rather than genetic.  Discussions of the study often mention that fact that
previous work citing evidence for innateness of ability tended to focus on
participants with a shared cultural background.

A relatively recent article in CACM made much the same point for CS;
particularly noteworthy to me is the rather different proportion of
undergrad CS majors in different countries (the US is particularly low):
http://portal.acm.org/citation.cfm?id=1461928.1461947coll=portaldl=ACMidx=J79part=magazineWantType=Magazinestitle=Communications

-Jan-Willem Maessen
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] GHC vs GCC

2010-03-27 Thread Jan-Willem Maessen

On Sat, Mar 27, 2010 at 12:43 AM, Rafael Almeida almeida...@gmail.comwrote:

 On Fri, Mar 26, 2010 at 6:49 PM, Jason Dagit da...@codersbase.com wrote:
 
 
  On Fri, Mar 26, 2010 at 2:33 PM, Bryan O'Sullivan b...@serpentine.com
  wrote:
 
  On Fri, Mar 26, 2010 at 10:46 AM, Rafael Cunha de Almeida
  almeida...@gmail.com wrote:
 
  During a talk with a friend I came up with two programs, one written in
  C and another in haskell.
 
  Your Haskell code builds a huge thunked accumulator value, so of course
  it's slow (put bang patterns on all arguments). Also, you should use rem
  instead of mod. Make those tiny changes and you'll get a 5x speedup, to
 half
  the performance of the C code.
 
  Interesting.  I had to add -fvia-C to get within half the performance of
 C.
  Just bang patterns and rem and I'm 1/5th of C.  I'm on a x86_64 machine.
 I
  wonder if that plays in.
 
  Jason
 

 Using bang patterns didn't help almost anything here. Using rem
 instead of mod made the time go from 45s to 40s. Now, using -fvia-C
 really helped (when I used rem but not using mod). It went down to
 10s.


It's worth pointing out that there's a bit of bang-pattern mysticism going
on in this conversation (which has not been uncommon of late!).  A non-buggy
strictness analyzer should expose the strictness of these functions without
difficulty.  If bang patterns make any difference at all with a -O flag,
either there's a strictness analysis bug, or some very interesting effects
from shifting the order of forcing of strict variables.

Putting in bang patterns is a good idea to plug the obvious space leak when
run without optimization, but isn't going to make a difference for
optimizing compilation of obviously-strict functions.

-Jan-Willem Maessen
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] generalized newtype deriving allows the definition of otherwise undefinable functions

2010-03-09 Thread Jan-Willem Maessen


On Mar 9, 2010, at 5:26 AM, Simon Peyton-Jones wrote:
 ...
 Stephanie Weirich, Steve Zdancewic, Dimitrios Vytiniotis and I have been 
 working hard on a development of the FC intermediate language, and hence of 
 the source language, that will close this (embarrassing) loophole, and allow 
 some new expressiveness.  Nothing written down in a form that someone other 
 than us can make sense of, but there will be!  In brief, though, we're going 
 to end up with kinds looking like
   * = *
 as well as the existing
   * - *
 The new form means a type-indexed function whereas the latter means a 
 type-parametric function. 
 
 John Meacham's example is also very interesting. Even if the data type 
 doesn't use type functions, it might have invariants concerning type classes 
 (his example is Set), and converting all the elements might destroy the 
 invariants.  Excellent point!  There's no type-soundness issue (no run-time 
 seg fault) but something nearly as bad.  Will have to think about that.  
 Probably declaring Set to have kind (* = *) will do the job.

It occurs to me to observe: if we give class constraints in data types some 
force, and write:

data Ord a = Set a = ...[internals go here]...

Would this be enough to cue us that Set has a more interesting kind than just * 
- * ?

-Jan-Willem Maessen

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] generalized newtype deriving allows the definition of otherwise undefinable functions

2010-03-09 Thread Jan-Willem Maessen


On Mar 9, 2010, at 5:53 AM, Max Cantor wrote:

 Isn't this just an extension of the notion that multi-parameter typeclasses 
 without functional dependencies or type families are dangerous and allow for 
 type-naughtiness?  

I wondered the same thing, but came up with an analogous problematic case that 
*only* uses generalized newtype deriving:

 {-# LANGUAGE GeneralizedNewtypeDeriving #-}
 module Main(main) where
 import Data.Set

 class IsoInt a where
 stripToInt :: item a - item Int
 convFromInt :: item Int - item a

 instance IsoInt Int where
 stripToInt = id
 convFromInt = id

 newtype Down a = Down a deriving (Eq, Show, IsoInt)

 instance Ord a = Ord (Down a) where
 compare (Down a) (Down b) = compare b a

 asSetDown :: Set (Down Int) - Set (Down Int)
 asSetDown = id

 a1 = toAscList . asSetDown . convFromInt . fromAscList $  [0..10]
 a2 = toAscList . asSetDown . fromAscList . reverse . convFromInt $ [0..10]

 main = do
 print a1
 print a2

-Jan-Willem Maessen___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: GHC's parallel garbage collector -- what am I doing wrong?

2010-03-07 Thread Jan-Willem Maessen


On Mar 3, 2010, at 8:44 AM, Simon Marlow wrote:

 On 01/03/2010 21:20, Michael Lesniak wrote:
 Hello Bryan,
 
 The parallel GC currently doesn't behave well with concurrent programs that
 uses multiple capabilities (aka OS threads), and the behaviour you see is
 the known symptom of this.. I believe that Simon Marlow has some fixes in
 hand that may go into 6.12.2.
 
 It's more correct to say the parallel GC has difficulty when one of its 
 threads is descheduled by the OS, because the other threads just spin waiting 
 for it.  Presumably some kernels are more susceptible than others due to 
 differences in scheduling policy, I know they've been fiddling around with 
 this a lot in Linux recently.
 
 You typically don't see a problem when there are spare cores, the slowdown 
 manifests when you are trying to use all the cores in your machine, so it 
 affects people on dual-cores quite a lot. This probably explains why I've not 
 been particularly affected by this myself, since I do most of my benchmarking 
 on an 8-core box.
 
 The fix that will be in 6.12.2 is to insert some yields, so that threads will 
 yield rather than spinning indefinitely, and this seems to help a lot.

Be warned that inserting yield into a spin loop is also non-portable, and may 
make the problem *worse* on some systems.

The problem is that yield calls can be taken by the scheduler to mean See, 
I'm a nice thread, giving up the core when I don't need it.  Please give me 
extra Scheduling Dubloons.

Now let's say 7 of your 8 threads are doing this.  It's likely that each one 
will yield to the next, and the 8th thread (the one you actually want 
on-processor) could take a long time to bubble up and get its moment.  At one 
time on Solaris you could even livelock (because the scheduler didn't try 
particularly hard to be fair in the case of multiple yielding threads in a 
single process)---but that was admittedly a long time ago.

The only recourse I know about is to tell the OS you're doing synchronization 
(by using OS-visible locking calls, say the ones in pthreads or some of the 
lightweight calls that Linux has added for the purpose).  Obviously this has a 
cost if anyone falls out of the spin loop---and it's pretty likely some thread 
will have to wait a while.

-Jan-Willem Maessen

 
 Cheers,
   Simon
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] vector stream fusion, inlining and compilation time

2010-03-07 Thread Jan-Willem Maessen


On Mar 4, 2010, at 9:05 PM, Roman Leshchinskiy wrote:

 On 05/03/2010, at 04:34, stefan kersten wrote:
 
 i've been hunting down some performance problems in DSP code using vector and
 the single most important transformation seems to be throwing in INLINE 
 pragmas
 for any function that uses vector combinators and is to be called from
 higher-level code. failing to do so seems to prevent vector operations from
 being fused and results in big performance hits (the good news is that the
 optimized code is quite competitive)
 
 the downside after adding the INLINE pragmas is that now some of my modules 
 take
 _really_ long to compile (up to a couple of minutes); any ideas where i can
 start looking to bring the compilation times down again?
 
 Alas, stream fusion (and fusion in general, I guess) requires what I would 
 call whole loop compilation - you need to inline everything into loops. That 
 tends to be slow. I don't know what your code looks like but you could try to 
 control inlining a bit more. For instance, if you have something like this:
 
 foo ... = ... map f xs ...
  where
f x = ...
 

I can confirm that this is a general problem with libraries based on fusion / 
deforestation (having done the independent implementation of fusion in pH back 
in the day).  No INLINE pragma?  No fusion for you!

That said, as Roman points out it'd be nice if when GHC discovers something is 
inlinable, it would inline the original definition (or perhaps the inlined, 
simplified, no-rules-firing version of same).  The problem is that this 
duplicates a lot of the work of the optimizer a lot of the time.

 you could tell GHC not to inline f until fairly late in the game by adding
 
  {-# INLINE [0] f #-}
 
 to the where clause. This helps sometimes.

Hands up if you can remember what things are legal in the braces, and what they 
mean. :-)  I suspect I'm not the only one for whom remembering this stuff is 
fairly hard, in part because it doesn't come up too often.

-Jan-Willem Maessen___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Seen on reddit: or, foldl and foldr considered slightly harmful

2010-02-11 Thread Jan-Willem Maessen


On Feb 11, 2010, at 3:41 AM, Johann Höchtl wrote:

 In a presentation of Guy Steele for ICFP 2009 in Edinburgh:
 http://www.vimeo.com/6624203
 he considers foldl and foldr harmful as they hinder parallelism
 because of Process first element, then the rest Instead he proposes
 a divide and merge aproach, especially in the light of going parallel.
 
 The slides at
 http://docs.google.com/viewer?url=http%3A%2F%2Fresearch.sun.com%2Fprojects%2Fplrg%2FPublications%2FICFPAugust2009Steele.pdf
 [Bware: Google docs]

There's no need to use Google docs.  A direct url for the pdf:

http://research.sun.com/projects/plrg/Publications/ICFPAugust2009Steele.pdf

I recently gave a followup talk at Portland State, arguing that notation 
matters, and that even with better notation programmer mindset is also going to 
be hard to change:

http://research.sun.com/projects/plrg/Publications/PSUJan2010-Maessen.pdf

The key thing here isn't *just* the handedness of lists, but the handedness of 
foldl/foldr *irrespective of the underlying data structure*.  So switching to 
tree-structured data a la fingertrees is a necessary step, but not a sufficient 
one.  The use of monoidal reductions has always been an important part of 
parallel programming.

 are somewhat  geared towards Fortress, but I wonder what Haskellers
 have to say about his position.

Now, what if list comprehensions were really shorthand for construction of 
Applicative or Monoid structures by traversing a mixture of data types with a 
common interface (something like this one)?

class Generator t e | t - e
mapReduce :: (Monoid m) = t - (e - m) - m


-Jan-Willem Maessen
 Another Fortress/Haskell crossover

 Greetings,
 
   Johann
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Map unionWith generalization

2010-01-27 Thread Jan-Willem Maessen


On Jan 27, 2010, at 5:53 AM, Hans Aberg wrote:

 I need ideally some generalizations of unionWith and unionWithKey, for 
 efficiency matters (i.e. avoiding conversions and traversing the maps more 
 than once). I could use a modification of the code in Map.hs, but then the 
 problem is that the module Map interface does not export the constructors of 
 data Map. So suggestions are welcome.
 
 For example, in Map String Integer (sparse representation of monomials) 
 compute the minimum value of all associative pairs with the same key (the 
 gcd); if only one key is present, the absent should be treated as having 
 value 0. So
  unionWith min xs ys
 will not work, because unionWith will always apply the identity to the 
 remaining value when one key is missing, whereas it should be sent to 0.
 
 So here, one would want:
  (a - c) - (b - c) - (a - b - c) - Map k a - Map k b - Map k c
 where the two first functions are applied when the first or second key is 
 missing.

Ah, the swiss army knife function on maps.  This particular formulation works 
well for the application you describe above, where you're completely traversing 
both maps.  The following really grubby variant can be used to implement 
asymptotically efficient variations of union, intersection, difference, etc., 
etc:

swissArmy ::
  (Map k a - Map k c) --- Used to process submaps unique to the left map
  (Map k b - Map k c) --- Used to process submaps unique to the right map
  (a - b - Maybe c) - -- Used to process a single common entry
  Map k a - Map k b - Map k c

Then your function appears to be:

-- helper
just2 :: (a - b - c) - a - b - Maybe c
just2 f a b = Just (f a b)

swissArmy (fmap (const 0)) (fmap (const 0)) (just2 gcd)

Here are unionWith and intersectionWith:

unionWith f = swissArmy id id (just2 f)
intersectionWith = swissArmy (const empty) (const empty) (just2 f)
differenceWith = swissArmy id (const empty) (\a b - Nothing)

When throwing together tree-like data structures, I often start by writing this 
function; it handles most of the bulk operations I might want to do as 
one-liners.  It's got a messy, ugly type signature, but it does everything you 
want as efficiently as you want.*

-Jan-Willem Maessen

* Actually, this is only true if you add the key as an argument to the third 
function, so that you can also encode unionWithKey etc!  I've skipped that here.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Map unionWith generalization

2010-01-27 Thread Jan-Willem Maessen


On Jan 27, 2010, at 9:42 AM, Hans Aberg wrote:

 On 27 Jan 2010, at 14:56, Jan-Willem Maessen wrote:
 
 So here, one would want:
 (a - c) - (b - c) - (a - b - c) - Map k a - Map k b - Map k c
 where the two first functions are applied when the first or second key is 
 missing.
 
 Ah, the swiss army knife function on maps.  This particular formulation 
 works well for the application you describe above, where you're completely 
 traversing both maps.  The following really grubby variant can be used to 
 implement asymptotically efficient variations of union, intersection, 
 difference, etc., etc:
 
 swissArmy ::
 (Map k a - Map k c) --- Used to process submaps unique to the left map
 (Map k b - Map k c) --- Used to process submaps unique to the right map
 (a - b - Maybe c) - -- Used to process a single common entry
 Map k a - Map k b - Map k c
 
 I'm not sure why you want to throw in functions between maps in the two first 
 arguments. Then there is no requirement that single keys are preserved.
 
 But it is a good idea to have a Maybe so that one can remove keys on the fly.

A good point.  Technically, one should delimit the scope of the first two 
arguments:

data KeyPreservingMapOperation k a b = AlwaysEmpty | Identity | MapMaybeWithKey 
(k - a - Maybe b)

perform :: KeyPreservingMapOperation a b - Map k a - Map k b
perform AlwaysEmpty = const empty
perform Identity = id
perform (MapMaybeWithKey f) = mapMaybeWithKey f

 When throwing together tree-like data structures, I often start by writing 
 this function; it handles most of the bulk operations I might want to do as 
 one-liners.  It's got a messy, ugly type signature, but it does everything 
 you want as efficiently as you want.*
 
 My guess is that is when you write things from scratch.

Yes.  On the other hand, I've repeatedly run into the same problem you're 
describing: a api that doesn't give me an efficient way to perform an operation 
I know to be very simple.  If every map provided an operation like this one, 
then I can avoid writing my own library from scratch when I need it --- 
especially when from scratch means fork the code and add what I need.

So, library implementors: think hard about your swiss army knife combinators. 
 You end up with messy functions with gigantic signatures.  On the other hand, 
you can often add a couple of judicious INLINE annotations and remove tons of 
code from the rest of your library.  Then expose them, and mark them as the 
functions of last resort that they truly are.

I bet there's even a fusion framework in here somewhere.

-Jan___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Map unionWith generalization

2010-01-27 Thread Jan-Willem Maessen


On Jan 27, 2010, at 10:54 AM, Hans Aberg wrote:

 On 27 Jan 2010, at 16:33, Jan-Willem Maessen wrote:
 
 I'm not sure why you want to throw in functions between maps in the two 
 first arguments. Then there is no requirement that single keys are 
 preserved.
 
 But it is a good idea to have a Maybe so that one can remove keys on the 
 fly.
 
 A good point.  Technically, one should delimit the scope of the first two 
 arguments:
 
 data KeyPreservingMapOperation k a b = AlwaysEmpty | Identity | 
 MapMaybeWithKey (k - a - Maybe b)
 
 perform :: KeyPreservingMapOperation a b - Map k a - Map k b
 perform AlwaysEmpty = const empty
 perform Identity = id
 perform (MapMaybeWithKey f) = mapMaybeWithKey f
 
 I'm thinking about
  (k - Maybe a - Maybe b - Maybe c) - Map k a - Map k b - Map k c
 The first two Maybe's tell if the keys are present, the last if one wants it 
 in the resulting map.

That has the same behavior semantically, but it's no more efficient than 
performing a unionWith followed by a filter.  For example, consider 
implementing:

xs `intersection` singleton k v

in this way.  With the function given, the complexity is necessarily O(size 
xs): we must traverse every key/value pair in xs.  By contrast, by aggregating 
the operations on keys that exist only in a single map, we can write functions 
like intersection with the desired complexity (which is O(log (size xs)) in 
this case).

 Yes.  On the other hand, I've repeatedly run into the same problem you're 
 describing: a api that doesn't give me an efficient way to perform an 
 operation I know to be very simple.  If every map provided an operation like 
 this one, then I can avoid writing my own library from scratch when I need 
 it --- especially when from scratch means fork the code and add what I 
 need.
 
 So, library implementors: think hard about your swiss army knife 
 combinators.  You end up with messy functions with gigantic signatures.  On 
 the other hand, you can often add a couple of judicious INLINE annotations 
 and remove tons of code from the rest of your library.  Then expose them, 
 and mark them as the functions of last resort that they truly are.
 
 One can transverse the product of keys. Then I'm thinking about
  (k1 - k2 - a - b - Maybe c - Maybe(k, c)) - Map k1 a - Map k2 b - 
 Map k c
 The first Maybe tells if the key is already present; the second if one wants 
 it inserted.

Traversing cross products is a very different operation from zipping in the key 
space.  Again I wouldn't want to mistakenly substitute one for the other!  But 
in this case I think you'll find that you're already well served by the 
functions that already exist---adding this function doesn't enable you to do 
anything more efficiently (at least in a big-O sense).

 The idea in both cases is to combine the modifying functions into one. This 
 simplifies the interface.

Understood, and with a sufficiently smart compiler we might analyze the 
behavior of the function and do the right thing with the single-function 
interface in both cases.  I have yet to encounter a compiler that is both smart 
and reliable on this count.  That is why I've found it necessary to expose 
these kinds of functions.

-Jan

 
  Hans
 
 

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Boxed Mutable Arrays

2009-12-16 Thread Jan-Willem Maessen


On Dec 16, 2009, at 5:50 AM, Serguey Zefirov wrote:

 2009/12/16 Matt Morrow moonpa...@gmail.com:
 What are peoples' thoughts on this?
 http://hackage.haskell.org/trac/ghc/ticket/650#comment:16
 
 I think it won't get any better.
 
 Either we have O(log(N)) updates because we have to update
 hierarchical structure to speed up GC scanning (to get it to
 O(Mlog(N)), where M is a number of updated cells), or we have O(N)
 scanning.
 
 As far as I can tell, other systems (Java, for example) suffer from
 that problem as well.

The ticket suggests using VM protection to track writes.  This has been tried a 
number of times over the years by the GC community, and has usually gone badly 
- it turns out getting the OS to tell you about the page fault in reasonable 
time is hard.  It usually turns out to be cheaper just to make a cheap write 
barrier.

There are tricks that let us avoid re-scanning an array frequently during 
generational GC (and in particular avoid the problem of re-scanning the entire 
array during minor GC just to catch a single write).  But they require that we 
design appropriate read or write barriers for array accesses.  This is a Simple 
Matter of Programming, but hasn't risen to the top of anyone's priority list 
(in spite of the fact that this bug has existed for over a decade now, as far 
as I know).  It's fiddly coding and annoying to debug.

For those who are curious, here's one trick that avoids repeated re-scanning of 
arrays during GC:
  * During post-allocation tracing, move all data pointed to by the array into 
contiguous chunks of memory.
  * Group together that memory logically with the memory containing the array 
itself.
  * Keep track of whether anything points in to this memory; if nothing does, 
free the lot of it (or use the bad old tracing method; you won't find the big 
array and won't pay anything).
  * If there are subsequent writes, trace those and move them to the region.
  * Re-scan the whole region only if there have been enough writes (presumably 
causing the region to fill with data that has been overwritten and thrown away).

Here the cost is roughly proportional to the number of writes you perform.  
Haskell being lazy, that might be more than you would expect.  There are (lots 
of) other tricks along the same lines with differing tradeoffs, and naturally 
I've skimmed over some important details.  What you should take away is that 
GCing an array of pointers need not be expensive---we can do O(writes) scanning 
over its lifespan, rather than O(N) scanning work at every single GC.  And note 
that in general we don't pay any GC costs at all unless we actually keep the 
array around for a while.

-Jan-Willem Maessen

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Structural sharing in haskell data structures?

2009-05-14 Thread Jan-Willem Maessen



On May 13, 2009, at 6:58 PM, wren ng thornton wrote:


Jan-Willem Maessen wrote:

I wanted to clear up one misconception here...
wren ng thornton wrote:
 In heavily GCed languages like Haskell allocation and collection  
is  cheap, so we don't mind too much; but in Java and the like,  
both  allocation and collection are expensive so the idea of cheap  
throwaway  objects is foreign.

Not true!


I was speaking of Java, not Clojure. I believe the costs in Java are  
well documented, though I don't know enough about the JVM to know  
where the blame belongs. (All I know of Clojure is that it's a Lisp- 
like on the JVM :)


I think you're missing the point here: the code I refer to below *is  
in Java* and is running on a standard JVM; the costs you refer to  
simply don't exist!  As Vladimir Ivanov points out, and as Rich Hickey  
is happy to observe in his talks on Clojure, the JVM handles  
allocation-intensive garbage-intensive programs very well.


If you look at the internals of Clojure, you'll discover they're  
using trees with *very* wide fanout (eg fanout-64 leaf trees for  
lists).  Why?  Because it's so cheap to allocate and GC these  
structures!  By using shallow-but-wide trees we reduce the cost of  
indexing and accessing list elements.  I suspect you'd still be  
hard-pressed to support this kind of allocation behavior in any of  
the present Haskell implementations, and Haskell implementations of  
the same kinds of structures have limited fanout to 2-4 elements or  
so.


I was under the impression that the reason datastructures in Haskell  
tend to be limited to 4-fanout had more to do with the cleanliness  
of the implementations--- pattern matching on 64-wide cells is quite  
ugly, as is dealing with the proliferation of corner cases for  
complex structures like finger trees, patricia trees, etc. The use  
of view patterns could clean this up significantly. On the other  
hand, we do have things like lazy ByteStrings and UVector which do  
have wide fanouts.


Hmm, I think neither of the data structures you name actually support  
both O(lg n) indexing and O(lg n) cons or append.  That said, your  
point is well taken, so let's instead state it as a challenge:


Can you, oh Haskellers, implement a fast, wide-fanout (say = 8) tree- 
based sequence implementation in Haskell, which supports at-least-log- 
time indexing and at-least-log-time cons with a large base for the  
logarithm?  Can you do it without turning off array bounds checking  
(either by using unsafe operations or low-level peeking and poking)  
and without using an algebraic data type with O(f) constructors for  
fanout of f?  You can turn off bounds checks if your program encodes  
static guarantees that indices cannot be out of bounds (there are a  
couple of libraries to do this).


The spirit here is Work in Haskell with safe operations and no FFI  
except through safe libraries, but otherwise use any extensions you  
like.


I actually think this *is* doable, but it touches a few areas where  
Haskell doesn't presently do well (the bounds checking in particular  
is a challenge).  I threw in the bounds checking when I realized that  
in fact the equivalent Java code is always bounds checked, and these  
bounds checks are then optimized away where possible.  Actually, I'd  
*love* to see an *in*efficient solution to eliminating as many bounds  
checks as possible!


-Jan




--
Live well,
~wren
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Structural sharing in haskell data structures?

2009-05-14 Thread Jan-Willem Maessen



On May 14, 2009, at 10:17 AM, Duncan Coutts wrote:


On Thu, 2009-05-14 at 09:03 -0400, Jan-Willem Maessen wrote:


Hmm, I think neither of the data structures you name actually support
both O(lg n) indexing and O(lg n) cons or append.  That said, your
point is well taken, so let's instead state it as a challenge:

Can you, oh Haskellers, implement a fast, wide-fanout (say = 8)  
tree-
based sequence implementation in Haskell, which supports at-least- 
log-

time indexing and at-least-log-time cons with a large base for the
logarithm?  Can you do it without turning off array bounds checking
(either by using unsafe operations or low-level peeking and poking)
and without using an algebraic data type with O(f) constructors for
fanout of f?  You can turn off bounds checks if your program encodes
static guarantees that indices cannot be out of bounds (there are a
couple of libraries to do this).


Can we motivate the restriction of not using multiple constructors? If
we're only talking about a fanout of 8 then it doesn't look like a
problem.


I actually expect this will cause some fairly nasty code bloat, but  
I'm happy to be proven wrong. :-)



It sounds like you're really asking for an array but without
wanting to say so explicitly. Perhaps you should ask for a variable
fanout or a fanout of something bigger like 32 (and presumably these
requirements could be justified too?).


Wide fanout seems fair.

-Jan




Duncan



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Structural sharing in haskell data structures?

2009-05-14 Thread Jan-Willem Maessen



On May 14, 2009, at 11:01 AM, Dan Doel wrote:


On Thursday 14 May 2009 9:03:30 am Jan-Willem Maessen wrote:

Hmm, I think neither of the data structures you name actually support
both O(lg n) indexing and O(lg n) cons or append.  That said, your
point is well taken, so let's instead state it as a challenge:


Data.Sequence has O(log n) index, concatenation, update, take, drop  
and
splitAt, and O(1) cons, snoc, and viewing at both ends, according to  
the

documentation.


Yes.  But large sequences end up being quite deep.  Can a wide-fanout  
version be made that is actually faster?  Note that the effective  
fanout of Hinze's finger trees is approximately e; consider effective  
fanouts of e^2 to e^4 (which may require substantially higher maximum  
fanout).


-Jan


-- Dan


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Numeric Prelude and identifiers (Was: fad 1.0 -- Forward AutomaticDifferentiation library)

2009-04-05 Thread Jan-Willem Maessen



On Apr 5, 2009, at 9:33 AM, Henning Thielemann wrote:



On Sun, 5 Apr 2009, Kalman Noel wrote:


Henning Thielemann schrieb:

with advanced type classes:
http://hackage.haskell.org/packages/archive/numeric-prelude/0.0.5/doc/html/MathObj-PowerSeries.html


I'll take this as another opportunity to point out that the Haddock  
docs
of the Numeric Prelude are highly unreadable, due to all qualified  
class

and type names appearing as just C or T.


It's Haddock's fault. :-) I have written a Trac ticket, but  
trac.haskell.org does currently not respond.


I may be treading in murky waters here, but I do think a large part of  
the problem is that the Numeric Prelude has chosen to use ML naming  
conventions (which refer to types in a module as T, etc.) when you're  
writing a Haskell program.  Surely if the types, classes, and so forth  
were given evocative names, numeric prelude programs would become  
readable?  And as a special bonus, though it may offend your  
sensibilities, numeric prelude programs might be able to use  
unqualified import in certain circumstances?


-Jan-Willem Maessen
 [For each language, its own idiom!]
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Elegant powerful replacement for CSS

2009-02-04 Thread Jan-Willem Maessen

You might be interested in some of the recent work by Leo Meyerovich  
et al. at Berkeley.  They wanted to parallelize CSS processing, and  
found they had to construct a simplified, orthogonal version of CSS  
and express their algorithms in terms of that subset.


This is very much work in progress, but there's a description and  
pointer to talk slides here:


http://www.eecs.berkeley.edu/~lmeyerov/#projects

Presumably their forthcoming HotPar paper will yield more details.

-Jan-Willem Maessen

On Feb 3, 2009, at 2:39 PM, Conal Elliott wrote:

[Spin-off from the haskell-cafe discussion on functional/ 
denotational GUI toolkits]


I've been wondering for a while now what a well-designed alternative  
to CSS could be, where well-designed would mean consistent,  
composable, orthogonal, functional, based on an elegantly compelling  
semantic model (denotational).


   - Conal

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] ANN: HLint 1.2

2009-01-12 Thread Jan-Willem Maessen



On Jan 12, 2009, at 9:01 AM, Duncan Coutts wrote:


No because the current definition are recursive and ghc cannot inline
recursive functions.

map :: (a - b) - [a] - [b]
map _ [] = []
map f (x:xs) = f x : map f xs

It has to be manually transformed into a version that is not recursive
at the top level:

map :: (a - b) - [a] - [b]
map f = go
 where
   go [] = []
   go (x:xs) = f x : go xs

Then the map can be inlined at the call site and the 'f' inlined into
the body of 'go'.


This seems like exactly the sort of mechanical transformation that  
computers do quickly and accurately, and humans get wrong.  Surely it  
wouldn't be that hard for GHC to transform self recursion in this way  
(possibly subject to the condition that the result be worth inlining)?


[phc did this, and I think it was inherited from Lennart's program  
transformations.]


-Jan-Willem Maessen

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] ANN: HLint 1.2

2009-01-12 Thread Jan-Willem Maessen



On Jan 12, 2009, at 12:47 PM, Max Bolingbroke wrote:


2009/1/12 Jan-Willem Maessen jmaes...@alum.mit.edu:

On Jan 12, 2009, at 9:01 AM, Duncan Coutts wrote:

No because the current definition are recursive and ghc cannot  
inline

recursive functions.



Then the map can be inlined at the call site and the 'f' inlined  
into

the body of 'go'.


This seems like exactly the sort of mechanical transformation that  
computers
do quickly and accurately, and humans get wrong.  Surely it  
wouldn't be that
hard for GHC to transform self recursion in this way (possibly  
subject to

the condition that the result be worth inlining)?


GHC should indeed be doing so. I'm working (on and off) to work out
some suitable heuristics and put the transformation into ghc -O2.
There are a few wrinkles that still need sorting out, but preliminary
indications are that it decreases the runtime of our standard
benchmark suite, nofib, by 12% or so.


This is excellent news, quite apart from Don's observation that it  
isn't particularly relevant for map (where we are essentially using  
RULES to instantiate an alternative definition in terms of foldr/ 
build, if I understand his message rightly).  Self recursion is abut  
so much more than map!


-Jan


Cheers,
Max


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Comparing on multiple criteria

2008-12-21 Thread Jan-Willem Maessen


On Dec 21, 2008, at 8:52 AM, Martijn van Steenbergen wrote:


Hello all,

Data.Ord has a handy function called comparing, and its  
documentation shows an example of its use.


But what if you want to sort a list of values based on multiple  
criteria? It turns out there is a neat way to do this:


compareTuple = mconcat [comparing fst, comparing snd]

The default Monoid instances for Ordering and functions work exactly  
as required here. (Thanks to vixey in #haskell for the hint to look  
at monoids!)


Indeed, this is great to know.  I can't help but notice that there is  
no documentation of any kind at all for the Monoid instance of  
Ordering; how were we supposed to know this behavior existed in the  
first place, except by hunting down the source code for the instance  
declaration?


-Jan-Willem Maessen

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Re[2]: [Haskell-cafe] implementing python-style dictionary in Haskell

2008-11-18 Thread Jan-Willem Maessen



On Nov 18, 2008, at 7:03 AM, Bulat Ziganshin wrote:


Hello Tillmann,

Tuesday, November 18, 2008, 2:46:47 PM, you wrote:


Why should a Haskell hash table need more memory then a Python hash
table? I've heard that Data.HashTable is bad, so maybe writing a good
one could be an option.


about Data.HashTable: it uses one huge array to store all the entries.
the catch is that GC need to scan entire array on every (major) GC.


Actually, the scan on every major (full) GC is unavoidable.  What  
*can* be avoided is a scan on every *minor* GC that occurs after an  
update.  I forget what the exact strategy is here, but I know that one  
write used to cause the entire array to be re-scanned; what I don't  
remember is when/if the array transitions back to a state where it  
isn't being scanned by minor GC anymore.



using array of hashtables may improve situation a lot


Yes, this would be worth trying.  Understanding the current GC  
strategy would make it easier to make the right tradeoffs here; we  
expect n insertions will touch O(n) subtables, so repeated insertion  
will make life worse if we're not careful.


-Jan-Willem Maessen


plus check GC times for every version: +RTS -Soutfile


--
Best regards,
Bulatmailto:[EMAIL PROTECTED]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] number of references to a variable

2008-11-01 Thread Jan-Willem Maessen



On Nov 1, 2008, at 9:38 AM, Andrew Coppin wrote:


Alberto G. Corona wrote:
Is there a way to know the number of memory references for a  
variable?. The runtime must know it but i do not know if this  
available for the program trough any low level trick


More precisely, the GC computes it each time it runs. (And only  
computes it precisely during a major pass, not the more frequent  
minor passes.)


Even this isn't quite true for most GC algorithms.  The GC only needs  
to compute whether there is 0 or = 1 reference to a given location  
(with some special weasel words for stuff with finalizers defined).   
If you can see it, the answer is always =1, so this information is  
much less useful than you might think!


Usually the clever thing you want to know is this is the sole  
reference to the pointed-to object.  If that's what you're interested  
in, try looking up one-bit reference counting, but note that like  
any accurate reference counting technique it's really inefficient in  
practice compared to GC.  [There are efficient reference counting  
techniques, but they defer refcount updates in various subtle ways.   
Also, every ref count technique requires a cycle detector.]


-Jan-Willem Maessen

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Haskell on the JVM

2008-10-12 Thread Jan-Willem Maessen



On Oct 12, 2008, at 6:19 AM, Jon Harrop wrote:


On Saturday 11 October 2008 17:45:39 John A. De Goes wrote:

I have strong interest in hosting GHC on the JVM. And I suspect it
would be good for the Haskell community, as the JVM already runs on
nearly every machine known to man, has a wealth of cross-platform
libraries, and is getting improved support for dynamic and functional
languages (method handles, tail call).


The JVM has been about to get basic features like tail calls for  
several years

now.


As a gentle correction of fact, while folks have been *asking* for  
tail calls in the JVM for over 10 years, it's only in the last year or  
two that there's actually been support from VM implementors for their  
inclusion.  I've had specific discussions with JVM folks on this very  
topic on numerous occasions over the last 5 years.  [I can't speak to  
what other basic features might have been slated for inclusion  
several years ago; I'll note that invokedynamic has been slated for  
inclusion for much longer, but required some wrangling between  
competing VM implementors and language designers to actually settle on  
an implementable spec.  The result looks nothing at all like my memory  
of the original proposal.]


Back to your regularly scheduled Haskell discussion.

-Jan


--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Re[2]: [Haskell-cafe] Pure hashtable library

2008-08-28 Thread Jan-Willem Maessen



On Aug 27, 2008, at 4:31 PM, Bulat Ziganshin wrote:


Hello Jan-Willem,

Wednesday, August 27, 2008, 4:06:11 PM, you wrote:


One obvious way to make non-modifiable hash tables useful is to eat
your own tail non-strictly--- aggregate a set of hash table entries,
construct a hash table from them, and plumb the resulting hash table
into the original computation by tying the knot.  This works really
well if you can construct the bucket lists lazily and if you specify
the table size up front.


i think it's impossible since you need to scan whole assoclist to
build list of entries for any value of hash function.


I think this is because I wasn't quite clear enough about the problem  
I was solving.  I think you'll agree that we can use an assocList non- 
strictly in the following sense:
  * We can do lookups that succeed so long as we can compute all keys  
up to and including the key that matches.
  * We never do non-strict lookups that fail, as that would require  
examining the entire assocList.


Now I can build a hashtable with the same property from an assocList,  
albeit very inefficiently, using code like this (untested):


lazyArray :: (Ix i) = (i,i) - [(i,v)] - Array i [v]
lazyArray bnds kvs =
array bnds [ (k, map snd . filter ((k==) . fst) $ kvs) | k -  
range bnds ]


makeHash :: (Eq k, Hashable k) = [(k,v)] - Array Int [(k,v)]
makeHash assocList =
lazyArray (0,n-1) labeledAssocList
  where labeledAssocList = [ (hashToSize n k, (k,v)) | (k,v) -  
assocList ]


We label each assocList element with its corresponding hash bucket  
(labeledAssocList); each bucket then contains exactly the elements of  
assocList that map to that bucket, in exactly the order in which they  
occurred in assocList.


The LazyArray library in hbc essentially did exactly what the  
lazyArray function I've shown above does, only the input list is  
traversed once rather than having a separate traversal for each  
bucket.  This can actually be implemented using the ST monad augmented  
by unsafeFreezeSTArray and unsafeInterleaveST; indeed the State in  
Haskell paper by Peyton Jones and Launchbury gives the implementation  
of a very similar function.


I have code for LazyArray based on the above paper that works with  
GHC, but I haven't needed it in a while.


-Jan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Pure hashtable library

2008-08-27 Thread Jan-Willem Maessen


On Aug 27, 2008, at 3:41 AM, Bulat Ziganshin wrote:


Hello haskell-cafe,

solving one more task that uses English dictionary, i've thought:  
why we don't yet have pure hashtable library? There is imperative  
hashtables, pretty complex as they need to rebuild entire table as  
it grows. There is also simple assoc lists and tree/trie  
implementations, but there is no simple non-modifiable hashes.


I know that Lennart had such a hashtable implementation as part of the  
hbcc source tree (so dating back to the late stream age or the very  
very early monad age), though I think it relied upon hbc's LazyArray.


One obvious way to make non-modifiable hash tables useful is to eat  
your own tail non-strictly--- aggregate a set of hash table entries,  
construct a hash table from them, and plumb the resulting hash table  
into the original computation by tying the knot.  This works really  
well if you can construct the bucket lists lazily and if you specify  
the table size up front.  You can't make this trick work at all for  
tree-based maps in general, because the structure of the tree depends  
upon all the keys inserted.  You also can't make this trick work if  
you base the size of the hash table on the number of keys inserted,  
maximum bucket load, etc.  Finally, it doesn't work with strict arrays  
at all.


So a nice niche for a small and clever static hash table.

-Jan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Data.Complex.magnitude slow?

2008-07-20 Thread Jan-Willem Maessen



On Jul 20, 2008, at 8:11 PM, Richard A. O'Keefe wrote:


I read this as

	Is there any way to take a 32-bit float in a register and end up  
with a 32-bit int

in a register, without going through memory, in GHC?
 How about the other way around?
 How about 64-bit floats and integers?

It is rather hard to do portably in GHC what some hardware does not  
do.


The reason to provide this as a primitive is so that code like Complex  
magnitude doesn't have to go through a completely opaque-to-the- 
compiler interface in order to extract bits from the underlying IEEE  
float representation.  Sure, code generation can't assign it to a  
register, but it could be kept on the stack.  Once we're peeking and  
poking through a Ptr (the trick that was suggested the last time this  
came up, too, if I remember rightly) we're sunk---GHC doesn't reason  
well at all about this stuff, and we need to allocate a Ptr to peek  
and poke.  If instead we use a foreign call, again, it's completely  
compiler-opaque.


A long time ago hardware architects decided that separating the  
integer and

floating point register files was a Good Thing.
The machine I know best is SPARC, where there are no instructions to  
move

data directly between integer registers and floating point registers.


Ironically, this is no longer true on Niagara-class (T1 and T2) SPARC  
machines.  So nowadays this may be the only architecture that *does*  
provide this ability.  But again, the really important thing is that  
the compiler can see that bits is bits and that the conversion  
involves no magic and no extra storage beyond a spill location on the  
stack.



... [stuff about SIMD snipped]...


All things considered, I wouldn't worry about it too much:  the  
memory in
question should be at the top of the stack, so should be in L1  
cache, so should

be pretty fast to access, and other things may be more significant.


But only if GHC can treat them transparently---which at the moment it  
evidently can't!


All it'd take is a primitive from Float# - Int32# and another from  
Double# - Int64#...


-Jan-Willem Maessen




___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Announce: Fortress talk in New York City

2008-06-13 Thread Jan-Willem Maessen



On Jun 13, 2008, at 10:43 AM, Bayley, Alistair wrote:


From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Henning Thielemann


There will be a talk on Fortress...

---

This e-mail may contain confidential and/or privileged information.


You send potentially confidential information to a public
mailing list?


I realise this is tongue-in-cheek...



 Is this new language secret? :-)


.. but I'm not so sure about this. If this is a serious question, here
are some links:

 http://research.sun.com/projects/plrg/
 http://research.sun.com/projects/plrg/Publications/index.html
 http://projectfortress.sun.com/Projects/Community


Let me assure the readers there's nothing confidential or privileged  
about Christine's talk! :-)  It is, after all, an open source project.


-Jan-Willem Maessen
 Project Fortress, Sun Microsystems Laboratories
 [who will be off teaching Fortress in Prague at the time]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Design your modules for qualified import

2008-06-06 Thread Jan-Willem Maessen


On Jun 6, 2008, at 8:12 AM, Wolfgang Jeltsch wrote:


Am Donnerstag, 5. Juni 2008 17:19 schrieb Johan Tibell:

[…]



2. It's the default. You have to add qualified to all your imports
to make them qualified. In most language imports are qualified by
default. I think the latter would have been a better choice but we
have to live with the current design so bite the bullet and add those
qualified keywords to your imports.


If you leave out “qualified”, you still get the qualified names  
imported.  And
if you use conflicting identifiers always qualified then there’s no  
problem.

For example, you can use

   import Data.Set as Set
   import Data.List as List

and then just say Set.null or List.null.


There's one caveat: Always choose descriptive names, even if you are  
assuming that you will usually use a qualified import.  The following  
are wonderful names, even though they conflict with the prelude:

  null
  filter
  map
  lookup

The following are terrible names:
  T
  C

What's a T?  What's a C?  There's no excuse to give something a lousy  
name just because the enclosing module is descriptively named.  I  
reject the naming conventions used by ML modules when writing Haskell  
code: Haskell modules are not ML modules.


-Jan-Willem Maessen

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: Design your modules for qualified import

2008-06-06 Thread Jan-Willem Maessen



On Jun 6, 2008, at 12:54 PM, Henning Thielemann wrote:



On Fri, 6 Jun 2008, Achim Schneider wrote:


Jan-Willem Maessen [EMAIL PROTECTED] wrote:


There's one caveat: Always choose descriptive names, even if you are
assuming that you will usually use a qualified import.  The
following are wonderful names, even though they conflict with the
prelude: null
  filter
  map
  lookup


import Prelude as P


Precisely.  If I import the prelude qualified and your library  
unqualified, is my code readable?  I should hope it is.  And if the  
library used the overlapping names reasonably, you shouldn't be left  
wondering when you read my code.



The following are terrible names:
  T
  C


Not to mention that this usage is hideously confusing while looking  
at

the haddock docs.


But that will be resolved when Haddock can show identifiers with
qualifications.


I specifically *didn't* bring up the Haddock issue, because I think  
it's a side show.  Fundamentally, these types are neither clear nor  
descriptive.  Their treatment by one or another documentation tool is,  
at some level, beside the point.



It's good to have fine grained modules, because you can more easily
exchange the parts you want different from the standard way. For  
reducing
import lists for simple songs I think we could provide wrapper  
modules.


Make your modules as small as you like; small modules are great.  But  
keep things readable, please!


-Jan



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] lookup tables style guidelines

2008-04-26 Thread Jan-Willem Maessen



On Apr 26, 2008, at 7:41 AM, Adrian Hey wrote:


Jan-Willem Maessen wrote:

On Apr 24, 2008, at 11:33 AM, Adrian Hey wrote:
Also, if you're likely to be using union/intersection a lot you  
should

know that Data.Map/Set are very slow for this because they use the
not efficient hedge algorithm :-)
OK, I'm going to bite here: What's the efficient algorithm for  
union on balanced trees, given that hedge union was chosen as being  
more efficient than naive alternatives (split and merge, repeated  
insertion)?  My going hypothesis has been hedge union is an  
inefficient algorithm, except that it's better than all those other  
inefficient algorithms.


Divide and conquer seems to be the most efficient, though not the
algorithm presented in the Adams paper.


Just to clarify: divide and conquer splits one tree on the root value  
of the other (possibly avoiding enforcing the balance metric until  
after joining trees, though not obvious how / if that's useful)?  The  
definition of divide and conquer on trees without a fixed structure  
is rather unclear, which is why the question comes up in the first  
place.



Hedge algorithm performs many
more comparisons than are needed, which is obviously bad if you don't
know how expensive those comparisons are going to be.


That makes sense.  I found myself having the same kinds of thoughts  
when reading Knuth's analyses of (eg) different binary search  
algorithms in TAOCP v.3; if comparison was the dominant cost, which  
algorithm looked best suddenly changed.


-Jan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] lookup tables style guidelines

2008-04-25 Thread Jan-Willem Maessen



On Apr 24, 2008, at 11:33 AM, Adrian Hey wrote:


Also, if you're likely to be using union/intersection a lot you should
know that Data.Map/Set are very slow for this because they use the
not efficient hedge algorithm :-)


OK, I'm going to bite here: What's the efficient algorithm for union  
on balanced trees, given that hedge union was chosen as being more  
efficient than naive alternatives (split and merge, repeated  
insertion)?  My going hypothesis has been hedge union is an  
inefficient algorithm, except that it's better than all those other  
inefficient algorithms.


For IntSet/IntMap of course the split structure of the tree is fixed  
(we can think of these as being compressed versions of a complete  
binary tree) and union and intersection are quite efficient.


-Jan-Willem Maessen

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Stronger STM primitives needed? Or am I just doing it wrong?

2008-04-23 Thread Jan-Willem Maessen



On Apr 23, 2008, at 12:13 PM, Ryan Ingram wrote:


On Wed, Apr 23, 2008 at 7:54 AM, Tim Harris (RESEARCH)
[EMAIL PROTECTED] wrote:

What do you think about a slight change:

  readTVarWhen :: TVar a - (a - bool) - STM a


This seems strictly less powerful than retryUntil:

readTVarWhen v p = retryUntil v p  readTVar v


Consider the following transaction:


intV :: TVar Int
boolV :: TVar Bool

interesting = atomically $ do
  retryUntil intV ( 50)
  retryUntil boolV id


Lets say that intV contains 100 and boolV contains False.  Then this
transaction retries.  Now, if intV changes to 101, this transaction
doesn't need to re-run; we can see immediately that no predicate
changed.


How can we tell, though?  In effect, I need to either say I care when  
intV changes or I need read intV again and make sure that ( 50)  
still holds before I can commit.



 Using readTVarWhen, this is less clear; the transaction
log would hold a read on intV which would be more difficult to ignore.


In order to guarantee that your test is atomic wrt the rest of the  
transaction, you'll need to do the same.  What you do in response to a  
change in intV might be different, though.


I've been trying to decide whether either of these is implementable in  
terms of `orElse`, in such a way that we immediately check the  
predicate upon retry before doing anything else.   I can't quite make  
up my mind whether this is possible or not.


-Jan-Willem Maessen


 -- ryan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] compressed pointers?

2008-04-16 Thread Jan-Willem Maessen



On Apr 16, 2008, at 4:45 AM, Ketil Malde wrote:

I notice BEA uses something called compressed pointers to get the
64-bit (more registers, etc) benefits without paying the
(cache-thrashing) cost.


But only if you're not *actually* using a 64-bit address space.  From  
their own documentation:


The heap size will be limited to less than 4 GB; therefore, you can  
only use this option for applications that demand less than 4 GB of  
live data. The heap will be reduced to meet this size limitation if  
you specify a larger initial (-Xms) or maximum (-Xmx) heap size.


(http://edocs.bea.com/jrockit/jrdocs/refman/optionXX.html#wp1021022)

So this amounts to saying we can use the 64-bit ISA but still use 32- 
bit pointers with all the restrictions that accompany them.  You  
might be able to keep non-heap data around in excess of 4GB (eg it  
might be possible to mmap a file *and* have 4GB of heap data, and  
maybe even keep thread stacks off-heap as well).


You can take advantage of pointer alignment to get address spaces of  
8-32GB (by shifting 32-bit pointers before dereferencing them), but  
that requires taking back the pointer bits that GHC just stole for  
pointer tagging.


-Jan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] floating point operations and representation

2008-03-13 Thread Jan-Willem Maessen



On Mar 12, 2008, at 8:35 PM, Jacob Schwartz wrote:


My second question is how to get at the IEEE bit representation for a
Double.


My (rhetorical) question on this front isn't how do I get the  
representation, but why is it so hard and non-portable to get the  
representation sensibly?  A candidate for standardization, surely?


-Jan-Willem Maessen

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] STAMP benchmark in Haskell?

2008-03-02 Thread Jan-Willem Maessen



On Mar 1, 2008, at 6:41 PM, Tom Davies wrote:


I'm experimenting with STM (in CAL[1] rather than Haskell)
and want to run the STAMP[2] benchmarks.


Hmm, I don'tknow of a particularly good STM-in-Haskell benchmark, but  
I'd say that the STAMP benchmarks are written in a rather imperative,  
object-oriented style.  You wouldn't get very meaningful data about  
anything if you were to naively translate them to Haskell; you'd  
instead have to rewrite them completely (at which point head-to-head  
comparisons are difficult).



Is there a Haskell translation available, or can anyone
suggest a better/different benchmark suite for STM?


Good question.  Because we tend to eschew mutable state in Haskell,  
I'd expect the characteristics of such an application to be *very*  
different.


-Jan-Willem Maessen




Thanks,
 Tom

[1] http://openquark.org/Open_Quark/Welcome.html
[2] http://stamp.stanford.edu/

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: fast graph algorithms without object identities

2008-02-26 Thread Jan-Willem Maessen



On Feb 23, 2008, at 5:48 PM, apfelmus wrote:


Henning Thielemann wrote:

It seems that algorithms on graphs can be implemented particularly
efficient in low-level languages with pointers and in-place  
updates. E.g.

topological sort needs only linear time, provided that dereferencing
pointers requires constant time. I could simulate pointer  
dereferencings

and pointer updates by Map yielding linear logarithmic time for
topological sort. I wonder if it is possible to write a linear time
topological sort using lazy evaluation, since the runtime system of
Haskell implementations is a graph processor based on pointers.


First of all, topological sorting is only linear time because the 32  
or 64 bit used to label nodes aren't counted. Put differently,  
random access in constant time to a collection of  n  elements  
doesn't exist.


That being said, we want to use arrays of course. Preferably in a  
whole-meal way that doesn't involve incremental state updates. A  
few minutes ago, I stumbled upon the  lazyarray  packages which  
points to the paper


 Thomas Johnsson.
 Efficient Graph Algorithms Using Lazy Monolithic Arrays
 http://citeseer.ist.psu.edu/95126.html

which offers such a way! (Although I currently don't quite  
understand why this works, and these ad-hoc unique numbers bother me.)


I have an implementation of this in GHC based on some of the early ST  
papers, if anyone is interested.  It's rather old and may have bit- 
rotted; cabalizing it is at the top of whenever-I-get-time, along  
with generic splittable supplies.  Note that getting the laziness  
right is somewhat tricky.


-Jan-Willem Maessen

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Designing DSL with explicit sharing [was: I love purity, but it's killing me]

2008-02-15 Thread Jan-Willem Maessen



On Feb 15, 2008, at 1:15 PM, Tony Finch wrote:


On Thu, 14 Feb 2008, [EMAIL PROTECTED] wrote:


As I understand the original problem had less to do with the number  
of

comparison but more to do with the cost of a single comparison. In an
impure language, we can use constant-time physical equality. It is
usually provided natively as pointer comparison, and can be trivially
emulated via mutation.


In Haskell you can (with care) use System.Mem.StableName.
http://research.microsoft.com/~simonpj/Papers/weak.htm


Extra points to anyone who can demonstrate this actually in use for  
caching.


-Jan-Willem Maessen

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] A question about monad laws

2008-02-12 Thread Jan-Willem Maessen



On Feb 12, 2008, at 1:50 AM, David Benbennick wrote:

On Feb 11, 2008 10:18 PM, Uwe Hollerbach [EMAIL PROTECTED]  
wrote:

If I fire up ghci, import
Data.Ratio and GHC.Real, and then ask about the type of infinity,  
it

tells me Rational, which as far as I can tell is Ratio Integer...?


Yes, Rational is Ratio Integer.  It might not be a good idea to import
GHC.Real, since it doesn't seem to be documented at
http://www.haskell.org/ghc/docs/latest/html/libraries/.  If you just
import Data.Ratio, and define


pinf :: Integer
pinf = 1 % 0



ninf :: Integer
ninf = (-1) % 0


Then things fail the way you expect (basically, Data.Ratio isn't
written to support infinity).  But it's really odd the way the
infinity from GHC.Real works.  Anyone have an explanation?


An educated guess here: the value in GHC.Real is designed to permit  
fromRational to yield the appropriate high-precision floating value  
for infinity (exploiting IEEE arithmetic in a simple, easily- 
understood way).  If I'm right, it probably wasn't intended to be used  
as a Rational at all, nor to be exploited by user code.


-Jan-Willem Maessen

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] fast graph algorithms without object identities

2008-02-01 Thread Jan-Willem Maessen



On Feb 1, 2008, at 9:41 AM, Alfonso Acosta wrote:


You'd probably be interested to read
http://www.cs.chalmers.se/~koen/pubs/entry-asian99-lava.html


It is indeed an interesting paper (that I've read and referred to  
several times over the years).  But it's tricky to get right in  
practice!  And sadly, while it solves the problem of sharing (or  
object equivalence) it doesn't give us some sort of total order or  
hash key, so there's no way to efficiently associate transient mutable  
state uniquely with each reference we encounter.  For that we need one  
of the other solutions discussed and rejected.  This is why  
Data.Unique provides a pure hashUnique function.


The best option I know of here to get the desired time bounds with a  
purely-functional abstraction is to use a splittable supply of unique  
labels (which can be encapsulated in a monad if we like), then use ST  
to associate a hash table of references with the graph nodes while  
traversing the graph.


-Jan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] fast graph algorithms without object identities

2008-01-31 Thread Jan-Willem Maessen



On Jan 31, 2008, at 5:39 AM, Henning Thielemann wrote:



It seems that algorithms on graphs can be implemented particularly
efficient in low-level languages with pointers and in-place updates.  
E.g.

topological sort needs only linear time, provided that dereferencing
pointers requires constant time. I could simulate pointer  
dereferencings

and pointer updates by Map yielding linear logarithmic time for
topological sort. I wonder if it is possible to write a linear time
topological sort using lazy evaluation, since the runtime system of
Haskell implementations is a graph processor based on pointers.


If so, I'd love to see this written up; I think it may be publishable  
if it isn't published already.


Note that even using ST techniques can take more than linear time,  
given an arbitrary purely-functionally-defined graph as input.  We  
can't (eg) assume that each node contains a reference, or that they  
are densely numbered, so we end up having to look them up in some  
fashion (though using a hash table can be reasonably quick if we  
uniquely number nodes).


-Jan-Willem Maessen



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: The programming language market (was Re: [Haskell-cafe] Why functional programming matters

2008-01-28 Thread Jan-Willem Maessen



On Jan 27, 2008, at 11:05 PM, Dipankar Ray wrote:



thanks for the correction - very informative! that'll teach me to  
just go to the opencourseware site at MIT only...


On that note, I'll point out that many (roughly half?) the  
undergraduate CS majors at MIT do a 5 year combined bachelor's /  
master's program.  Many of them take the graduate programming  
languages course (6.821), which has good coverage of semantics and  
type inference.


That said, there's a good deal of skepticism about functional  
programming among many MIT faculty members just as there is at  
Berkeley.  If you want to hear an entertaining story some day, ask me  
in person about my Ph.D. thesis defense.


-Jan-Willem Maessen
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Why functional programming matters

2008-01-24 Thread Jan-Willem Maessen



On Jan 24, 2008, at 6:04 PM, Evan Laforge wrote:


Well... ghc still has a single-threaded garbage collector, so all the
par threads must stop for garbage collection.  So scaling to the
level of a cluster would be significantly sub-linear.


A real time incremental gc would be really cool.  Some people claim
they exist, but which languages have one?


Define real time.  I'll note that, after all the mud that's been  
slung at Java, you've been able to get low-pause-time parallel GC in  
Java for years and years, and can get real time GC for various of  
your favorite definitions of that term if you're willing to look a  
little.


Relatively few other language implementations can claim the same.   
Writing a decent parallel GC (eg, faster than the tuned sequential GC  
it's replacing on 2 or more CPUs) is hard.  Sounds like GHC is nearly  
there, though.  GC implementors dream of systems where read barriers  
are nearly free.  Better still, everything that's been learned about  
and published in Java-land carries across to Haskell (though the  
tradeoffs in eg mutation behavior are often different).


-Jan-Willem Maessen



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: [Haskell] IVar

2007-12-05 Thread Jan-Willem Maessen



On Dec 5, 2007, at 3:58 AM, Simon Marlow wrote:


Jan-Willem Maessen wrote:


Consider this:

do
  x - newIVar
  let y = readIVar x
  writeIVar x 3
  print y

(I wrote the let to better illustrate the problem, of course you  
can inline y if you want).  Now suppose the compiler decided to  
evaluate y before the writeIVar.  What's to prevent it doing  
that?  Nothing in the Haskell spec, only implementation convention.
Nope, semantics.  If we have a cyclic dependency, we have to  
respect it---it's just like thunk evaluation order in that respect.


Ah, so I was thinking of the following readIVar:

readIVar = unsafePerformIO . readIVarIO

But clearly there's a better one.  Fair enough.


Hmm, so unsafePerformIO doesn't deal with any operation that blocks?   
I'm wondering about related sorts of examples now, as well:


do
  x - newIVar
  y - unsafeInterleaveIO (readIVarIO x)
  writeIVar x 3
  print y

Or the equivalent things to the above with MVars.

-Jan




Cheers,
Simon



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] A tale of Project Euler

2007-11-29 Thread Jan-Willem Maessen



On Nov 29, 2007, at 6:19 PM, Stefan O'Rear wrote:


On Thu, Nov 29, 2007 at 09:10:16PM +, Andrew Coppin wrote:

Sebastian Sylvan wrote:

On Nov 29, 2007 6:43 PM, Andrew Coppin [EMAIL PROTECTED]
wrote:


I don't understand the ST monad.


...[and ST uses language extensions Andrew doesn't understand.]

(As far as ST goes, runST is unsafePerformIO renamed.  The only tricky
bit is proving safety.)


To put it another way, runST is unsafePerformIO where somebody has  
already done the safety proof for you (so you know it's 100% safe).   
The strange extensions are simply a device to make the safety proof  
work.  Indeed, if you drop the extensions it can all be made to work  
(just say runST :: ST () a - a) but you lose the safety proof and  
it's equivalent to unsafePerformIO.


-Jan

[The trick used in runST is one of my all-time favorite bits of type  
theory, and is what convinced me we wanted second-order types back  
before the first Haskell workshop.]


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] isWHNF :: a - IO Bool ?

2007-09-27 Thread Jan-Willem Maessen



On Sep 27, 2007, at 9:14 AM, Pepe Iborra wrote:


Actually, in 6.8 we can build isWHNF on top of the GHC-API.

First, you need to import the ghc package:


ghci -package ghc
GHCi, version 6.7: http://www.haskell.org/ghc/  :? for help


Then, you can define the isWHNF function as follows:


Prelude :m +RtClosureInspect
Prelude RtClosureInspect let isWHNF = fmap (isConstr . tipe) .  
getClosureData


Prelude RtClosureInspect :t isWHNF
isWHNF :: a - IO Bool


What the code above does is to inspect the info table associated to  
the value given, and check if the closure is a Constructor closure,  
i.e. in WHNF.


Very cool.  This is much nicer than when I asked much the same  
question a few years back (and I can think of all sorts of  
interesting things I can learn from the interface in that module).   
But what about indirection chasing?  Surely we want isWHNF to return  
True if we have an indirection to a WHNF.  Possibly one wants  
something a bit like this (untested, and rather depends on GHC's  
indirection semantics):


removingIndirections :: (forall c . c - IO b) - a - IO b
removingIndirections k a = do
closureData - getClosureData a
if isConstr (tipe closureData) then
removingIndirections (ptrs closureData ! 0)
else
k a


simpleIsWHNF :: a - IO Boolean
simpleIsWHNF =  fmap (isConstr . tipe) . getClosureData

isWHNF = removingIndirections simpleIsWHNF

-Jan-Willem Maessen


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] transparent parallelization

2007-09-18 Thread Jan-Willem Maessen



On Sep 18, 2007, at 4:24 PM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:


http://haskell.org/haskellwiki/GHC/Data_Parallel_Haskell


Wow this is cool stuff! It would be nice to have something like  
this for the Playstation 3 :-)


Regarding parallelism, I wander how this extension will compare to  
Sun's Fortress language, if/when it gets finally released.


The scope of the two is very different.  DPH proposes a single rather  
flexible data structure---nested Data Parallel Arrays (really as much  
list-like as array-like).  The underlying data structure is  
manipulated using bulk operations like zip, sum, and comprehensions.


By contrast, Fortress defines the notion of a Generator which you  
can think of as being akin to a parallel version of Data.Traversable  
or ListLike, where the fundamental primitive is a generalization of  
foldP and mapP.  This is strictly more general---we can define many  
of the operations in Data.PArr on arbitrary data types, permitting us  
to talk about the sum of the elements of a set, or mapping a function  
across a distributed array.  We can define nested data parallel  
arrays in Fortress.  There isn't (yet) an equivalent of the RULES  
pragma that permits Fortress to optimize combinations of function  
calls.  However, clever uses of type information and function  
overloading let Fortress do some interesting program transformations  
of its own (eg early exit for reductions with left zeros).  Finally,  
Fortress actually has a mechanism for defining new types of  
comprehensions (though this isn't in the language specification yet).
The other nice thing about Generators is that we can support  
consumption of large or infinite things, if we're very careful about  
how we do the consumption.  We're planning to write the equivalent of  
hGetContents that works over blocks of file data in parallel where  
possible, but processes streams as chunks of data become available.   
It remains to be seen how this will work out in practice, though.   
Our goal is something LazyByteString or rope-like.



So: DPH: available today (-ish), one (very flexible) data structure.   
Bulk operations on a data structure run in parallel.  Relies on RULES  
+ special compiler support (am I getting that right?  You can fuse  
multiple traversals of a common argument, which isn't doable using  
RULES, right?) to make it all run nicely.  A well-established  
performance model, cribbed from NESL, for the PArr bits.


Fortress: Arrays and lists currently built in.  Bulk operations on a  
data structure can run in parallel.  Ability to define new parallel  
types with carefully-tailored traversal (eg we have a PureList that's  
based on Ralf Hinze and Ross Paterson's finger tree where traversal  
walks the tree structure in parallel).  No optimization RULES yet (an  
interpreter doesn't optimize), but a good deal of type-based code  
selection.  Implementation less complete than DPH in general (even  
the Generator API is in flux, though the fundamental use of a foldP- 
like operation hasn't changed over time).


-Jan-Willem Maessen
 Longtime Haskell Hacker
 Fortress Library Developer

PS - By the way, working on Generators has increased my suspicion  
that comprehensions do NOT have a naturally monadic structure (which  
bugged me when I worked on parallel list traversal optimization in pH  
back in 1994).  It just happens that for cons-lists the two  
structures happen to coincide.  If anyone else has had similarly  
subversive thoughts I'd love to chat.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Dynamic thread management?

2007-08-13 Thread Jan-Willem Maessen



On Aug 13, 2007, at 2:53 PM, Mitar wrote:


Hi!

I am thinking about a model where you would have only n threads on a
n-core (or processor) machine. They would be your worker threads and
you would spawn them only once (at the beginning of the program) and
then just delegate work between them.

On 8/13/07, Jan-Willem Maessen [EMAIL PROTECTED] wrote:

The problem here is that while Cilk spawns are incredibly cheap,
they're still more than a simple procedure call (2-10x as expensive
if my fading memory serves me rightly).  Let's imagine we have a
nice, parallelizable computation that we've expressed using recursive
subdivision (the Cilk folks like to use matrix multiplication as an
example here).  Near the leaves of that computation we still spend
the majority of our time paying the overhead of spawning.  So we end
up actually imposing a depth bound, and writing two versions of our
computation---the one that spawns, which we use for coarse-grained
computations, and the one that doesn't, which we use when computation
gets fine-grained.  It makes a really big difference in practice.


But this could be done at the runtime too. If the
lazy-non-evaluated-yet chunk is big then divide it into a few parts
and run each part in its thread. But if the chunk is small (you are at
the end of the evaluation and you already evaluated necessary
subexpressions) you do it in the thread which encounters this
situation (which is probably near the end of the program or the end of
the monadic IO action).


I didn't make my point very well.  The hard part is determining  
exactly when a chunk is big or small without actually computing  
its value.  Consider recursive fib (which I use only because it's  
easy to understand, and has been used as an example of this problem  
by the Cilk implementors):


fib n = if n = 1 then n else fib (n-1) + fib (n-2)

Imagine we're computing (fib 30).  We can divide and conquer; plenty  
of parallelism there!  But do most calls to fib represent enough work  
to justify running them in parallel?  No, because most of the calls  
are to (fib 0) or (fib 1)!  We should only pay the spawn cost up to a  
certain bound --- say n = 5 --- and then run serially for smaller  
n.  This has a dramatic effect on how fast fib runs, but of course  
the best possible choice of bound is going to be machine-dependent.


We can instrument our program and have some chance of doing OK for  
fib :: Int - Int; it's not at all obvious what to do for:

   myFunction :: [Int] - (Int - Bool) - [Frob]

In effect, I end up needing to write myFunction with a built-in bound  
on computation, and I need to do it in such a way that the underlying  
systems knows that one branch should be serial and the other branch  
parallel.  This is the problem I was attempting to allude to above.   
Yes, we can decide on a function-by-function or callsite-by-callsite  
basis that enough work is being done to justify parallelism.  But  
the answer to this question is often maybe, or even no (as in fib).



Yes, you have parMap but the problem I saw with it (and please correct
me if I am wrong) is that it spawns a new thread for every application
of the function to the element?



But what if functions are small? Then
this is quite an overhead. And you have to know this in advance if you
want to use something else than the default parMap which is not always
possible (if we are passing a function as an argument to the function
which calls map).

For example:

calculate f xs = foldl (+) 0 $ map f xs -- or foldr, I am not sure


You seem to be arguing that we can pick the right implementation of  
map and fold (serial or parallel) here if we only knew that xs  
was big enough and f expensive enough.


I agree.  But that begs the question: let's assume calculate is a  
function that's called from numerous places, with a mixture of big  
and small arguments.  Now I need two versions of calculate, and I  
need to decide at each call site whether to call big calculate or  
small calculate.  We also need to make sure any eagerness we  
introduce is semantically sound, too, but I think we've got a pretty  
good handle on that part in practice between my work on resource- 
bounded evaluation in Eager Haskell, Rob Ennals' work on eager  
evaluation in GHC, and the Singh  Harris paper.


That isn't to say that any of this is impossible---but it's going to  
take a while to figure out how to get it right [it'd be a great Ph.D.  
project to even knock off the low-hanging fruit like fib and  
recursive matrix multiply].


Meanwhile, we also need to work hard educating programmers how to  
write code that'll run in parallel in the first place, and giving  
them the libraries that'll make it easy.


-Jan-Willem Maessen



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Dynamic thread management?

2007-08-13 Thread Jan-Willem Maessen



On Aug 11, 2007, at 12:35 PM, Brian Hurt wrote:



You guys might also want to take a look at the Cilk programming  
language, and how it managed threads.  If you know C, learning Cilk  
is about 2 hours of work, as it's C with half a dozen extra  
keywords and a few new concepts.  I'd love to see Cilk - C +  
Haskell as a programming language.


It was called pH, and we (meaning Alejandro Caro and myself)  
implemented it back in the mid/late 90's using Lennart Augustsson's  
hbcc front end (which he hacked to add a bunch of pH-specific  
syntax).  Arvind and Nikhil wrote a textbook on pH programming.


There are two problems, still: one is that laziness means you can't  
actually prove you need something until very close to the time you  
actually want it.  By the time I know that I'm adding f x to g y,  
it's probably too late to usefully run them in parallel (unless  
they're both *very* large).  We used eager evaluation in pH---to the  
point that we actually gave up the ability to manipulate infinite  
lazy data structures.  In NDP they've done much the same thing, first  
instrumenting the program to see that the eagerness they introduce  
won't disrupt execution.  Even the par annotation has this feel: we  
are telling the implementation that it's OK to do some computation  
even if it isn't yet obvious that we'll need the results.


The second problem is controlling the overhead.  More on this below.

The key idea of Cilk is that it's easier to deparallelize than it  
is to parallelize, especially automatically.  So the idea is that  
the program is written incredibly parallel, with huge numbers of  
microthreads, which are (on average) very cheap to spawn.  The  
runtime then deparallelizes the microthreads, running multiple  
microthreads sequentially within a single real thread (a worker  
thread).  Microthreads that live their entire life within a single  
real thread are cheap to spawn (as in not much more expensive than  
a normal function call cheap).


The problem here is that while Cilk spawns are incredibly cheap,  
they're still more than a simple procedure call (2-10x as expensive  
if my fading memory serves me rightly).  Let's imagine we have a  
nice, parallelizable computation that we've expressed using recursive  
subdivision (the Cilk folks like to use matrix multiplication as an  
example here).  Near the leaves of that computation we still spend  
the majority of our time paying the overhead of spawning.  So we end  
up actually imposing a depth bound, and writing two versions of our  
computation---the one that spawns, which we use for coarse-grained  
computations, and the one that doesn't, which we use when computation  
gets fine-grained.  It makes a really big difference in practice.


The programmer is free to use this trick in any programming  
language.  But we haven't yet figured out a way to *avoid* the need  
to do so.  This continues to worry me to this day, because making the  
right choices is black magic and specific to a particular combination  
of algorithm and machine.


That said, there is some reason for optimism: the overhead of  
creating work in Cilk is comparable to the overhead of creating a  
thunk in Haskell.


The problem that Cilk runs into is that it's, well, C.  It doesn't  
deal with contention at well at all- a single microthread blocking  
blocks the whole worker thread- meaning, among other things, that  
you can have false deadlocks, where one microthread blocks on  
another microthread in the same real thread, and thus acts like  
it's deadlocked even though it really isn't.


This is actually a fundamental problem with the threading model:  
there is no guarantee of fairness using work stealing, so if you do  
something that requires fair scheduling you get into a lot of trouble  
fast.  It's not fair to blame C for this.  You have to be very  
careful to define the interaction between fair IO-style threads and  
unfair get-my-work-done threads.


You have greatly increased the likelyhood of raceconditions as well  
(mutable data and multithreading just don't mix).  Plus you have  
all the normal fun you have with C bugs- wild pointers, buffer over  
runs, etc.


This, however, *is* C's fault. :-)

More on pH: we got our programs to scale, but had troubles going past  
8 threads.  We found ourselves limited by a non-parallel GC (my  
fault, but labor-intensive to get right) and the absence of  
parallelism in the underlying algorithms.  For the latter problem  
there simply is no magic bullet.


-Jan-Willem Maessen



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Dynamic thread management?

2007-08-10 Thread Jan-Willem Maessen



On Aug 10, 2007, at 9:31 AM, Hugh Perkins wrote:

Not many replies on this thread?  Am I so wrong that no-one's even  
telling me?  I find it hard to believe that if there were obvious  
errors in the proposition that anyone would resist pointing them  
out to me ;-)


So, that leaves a couple of possibilites: some people are agreeing,  
but see no point in saying; or noone cares, because we all only  
have 1 or 2 core machines.


I'm going to kindof run with the second possibility for now.   
However, I do believe it's the right time to solve this, what with  
64-core Niagara's around the corner and so on.


What would be neat would be a way to test solutions on simulated  
1024-core machines, using a single-core machine.  Are there any  
utilities or virtual environments around that might make this kind  
of testing feasible?


It's actually difficult to do realistic simulations of large machines  
like this; most of the performance effects you'll see depend on the  
behavior of the cache and memory subsystems, and it's difficult and  
expensive to simulate those well.  So, for example, you could use  
something like Simics to simulate a 1024-core machine, but it'd be  
expensive (Simics costs money), slow (100x? slower than ordinary  
execution) and non-parallel (so you wouldn't be able to run it on a  
currently-extant multiprocessor box in the hopes of speeding up the  
simulation).  Between the simulator slowdown and the fact that you're  
simulating 1024 cores using only 1 thread, you can expect to wait a  
long time for simulation results.


Also, these things tend to require an awful lot of care and feeding.

[Full disclosure: I don't personally work with Simics or its ilk, but  
my colleagues do.]


-Jan-Willem Maessen



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] A probably-stupid question about a Prelude implementation.

2007-06-22 Thread Jan-Willem Maessen



On Jun 22, 2007, at 2:30 PM, Dan Weston wrote:


This is how I think of it:

lazyIntMult :: Int - Int - Int
lazyIntMult 0 _ = 0
lazyIntMult _ 0 = 0
lazyIntMult a b = a * b

*Q 0 * (5 `div` 0)
*** Exception: divide by zero
*Q 0 `lazyIntMult` (5 `div` 0)
0

foldr evaluates a `f` (b `f` (c `f` ...))

Only f knows which arguments are strict and in which order to  
evaluate them. foldr knows nothing about evaluation order.


And, indeed, if you foldr a function with left zeroes, and you check  
for them explicitly as lazyIntMult and (||) do, then foldr is  
guaranteed to terminate early if it finds a zero.


z is a left zero of op if for all x, z `op` x = z.

This isn't the only time foldr will terminate early, but it is an  
important one.


-Jan-Willem Maessen



Dan




smime.p7s
Description: S/MIME cryptographic signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] String Hashing

2007-06-18 Thread Jan-Willem Maessen



On Jun 17, 2007, at 9:55 PM, Thomas Conway wrote:


Hi All,

I'm trying to figure out how to maximum performance out of one of my
inner loops which involves string hashing.

...
   mix :: Triple - Triple

This looks like a version of the Bob Jenkins hash function from  
burtleburtle.net.  I implemented the 32-bit version of this as follows:


mix :: Int32 - Int32 - Int32 - (Int32 - Int32 - Int32 - a) - a
mix a0 b0 c0 k0 =
  let mixR k a b c = (a-b-c) `xor` (c `shiftR` k)
  mixL k b c a = (b-c-a) `xor` (a `shiftL` k)
  mix3 k1 k2 k3 k a  b  c  =
  let a' = mixR k1 a  b  c
  b' = mixL k2 b  c  a'
  c' = mixR k3 c  a' b'
  in k a' b' c'
  in  (mix3 13 8 13 $ mix3 12 16 5 $ mix3 3 10 15 $ k0) a0 b0 c0

I mention this because your code writes the whole thing out  
longhand---which might be faster, or might not, but certainly misses  
the highest-level structural patterns in the original.  Your use of a  
data type to represent triples is probably better nowadays than my  
rather quirky use of CPS (in other words, this could have been a  
function Triple - Triple instead of the rather odd type you see above).


That said, I assume you instrumented your code and determined that  
hash collisions are actually a bottleneck for you, and that a hash  
table is the right structure to begin with?  I fell back on much- 
simpler multiplicative hashing schemes for Data.HashTable.  A  
multiply is much faster than vast amounts of bit-fiddling---but of  
course its collision behavior isn't nearly as good and this can be a  
problem with large data sets.  And note that the multiplicative  
hashing currently used in Data.HashTable doesn't require prime table  
sizes; in fact we use powers of two and table doubling.  When last I  
checked the result was faster than Data.Map, but not by much, and  
using strings probably wipes out that advantage vs. tries.


-Jan-Willem Maessen

smime.p7s
Description: S/MIME cryptographic signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Finding points contained within a convex hull.

2007-06-07 Thread Jan-Willem Maessen



On Jun 6, 2007, at 11:38 PM, Daniel McAllansmith wrote:
[Trying to find the domain of a bounded integer linear program]


How would you go about finding extreme vertices?  Would it be  
quicker than

solving the constraints for each max/min?


If you're just looking to find bounding coordinates in each  
dimension, you should be able to do this using linear programming.   
This will yield non-integer coordinate bounds which you can narrow as  
appropriate.


-Jan-Willem Maessen





smime.p7s
Description: S/MIME cryptographic signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: Frisby grammars that have context

2007-05-30 Thread Jan-Willem Maessen



On May 29, 2007, at 10:44 AM, apfelmus wrote:


Mark T.B. Carroll wrote:
I've been playing with Text.Parsers.Frisby to see how it stacks  
against

other options and, while it's been great so far, I am finding that I
can't encode a grammar where what's acceptable depends on what's  
already

been parsed in some nontrivial way.
[...]
Is this supposed to not be possible in Frisby, or (quite likely) am I
missing something that allows me to?


It's intentionally impossible. Frisby uses a dynamic programming
approach that crucially depends on the fact that the grammar in  
question

is context-free (actually something related, but the effect is the
same). You're trying to parse a context-sensitive language.


Interestingly, Rats (a packrat-based parser generator for Java)  
permits you to insert arbitrary boolean conditions into the grammar;  
if the test fails, we simply record this as parsing this nonterminal  
failed in the memo table, I believe.  So I believe it'd actually  
feasible to incorporate some of the checking you're looking for into  
Frisby.  Of course, as others point out, you can always generate  
grammar fragments up front if you have a fixed set of things you're  
looking for in any given program run (something a parser tool like  
Rats isn't capable of---though with its parametric module system Rats  
can come *very* close, doing multiple compile-time instantiations of  
grammar fragments).


Packrat parsing, by the way, has made it vastly easier to structure  
and maintain a grammar for a highly ambiguous, hard-to-parse language  
(Fortress).  I recommend it.



Sometimes, you can avoid context-sensitivity if there's a way to parse
the token in question regardless of whether it's valid. For example,
Pascal is a context-sensitive language because you may not use a
variable before it has been declared:

  procedure Foo(x:Integer)
  begin
y := 1;
  end;

This not a correct Pascal program, nevertheless the parse succeeds  
just

fine. ...


I'm pretty sure predicates in the grammar would let you catch this  
error at parse time (if you maintained a symbol table and looked up  
expression occurrences in it as you parsed).  That said, I wouldn't  
necessarily try to structure my parser that way.


-Jan-Willem Maessen


smime.p7s
Description: S/MIME cryptographic signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Language extensions

2007-05-28 Thread Jan-Willem Maessen



On May 28, 2007, at 7:32 AM, Claus Reinke wrote:

I meant to imply more that it's very difficult to understand why  
it's useful. If an extension were truely *useless*, I doubt those  
guys at GHC would have bothered spending years implementing them.


Most of the documents that describe these things begin with  
suppose we have this extremely complicated and difficult to  
understand situation... now, we want to do X, but the type system  
won't let us. Which makes it seem like these extensions are only  
useful in extremely complicated and rare situations.


keep in mind that paper space is a precious and limited resource. the
need for extensions tends to arise in practice first, but those  
real examples
are far too big and complex to fit into those limitations. it is  
very difficult to
come up with examples that are small enough to fit, yet complex  
enough to
exhibit the problem. which means that the examples usually look  
artificial,
but small and complete, or realistic, but so large that their  
presentation

has to be shallow enough to border on vague.


But I do wonder if we shouldn't declare a moratorium on examples that  
involve interpreters for simply-typed languages (which tend to  
characterize none of the problems I'm actually trying to solve---and  
that includes fiddling with non-simply-typed languages of a similar  
sort) in favor of examples which actually perform some sort of a  
useful manipulation.


This is why I absolutely LOVE functional pearls of all sorts, by the  
way.


-Jan-Willem Maessen




smime.p7s
Description: S/MIME cryptographic signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] New HaskODell user groups

2007-04-27 Thread Jan-Willem Maessen

Yes.  All of BlueSpec, myself, Rob Dockins (right?), and undoubtedly  
others.  I know a few classes at MIT (none of them required) use  
Haskell.


I expect it'd be easy to get room space at MIT if there were interest  
in a Boston-area group.  For a less formal group one could meet  
somewhere like the Diesel in Davis Sq.


I've given this a teensy bit of thought, but I've had too many other  
fish to fry lately. :-)


-Jan-Willem Maessen

On Apr 27, 2007, at 11:22 AM, Seth Gordon wrote:


Donald Bruce Stewart wrote:


P.S.  Some obvious user group candidates, in my opinion, would be a
Portland group, a Bay Area group and something at Chalmers... ;-)


Are there any other Haskellers in the Boston area?

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe




smime.p7s
Description: S/MIME cryptographic signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: [Haskell] ANN: Data.Proposition 0.1

2007-04-18 Thread Jan-Willem Maessen

Hmm, your BDD implementation claims (in the comment at the top) that  
Equality is fast and accurate.  But you don't do sharing  
optimizations, and use derived Eq (rather than object identity), so  
it's exponential in the number of nodes.  Consider:


A
   / \
  B   |
   \ /
C
   / \
  D   |
   \ /
E

Here there are five nodes, but we will do two calls to see if C==C,  
four calls to see if E==E.  If I continue the diagram downwards I add  
two nodes and double the number of equality calls on the leaf.


The sharing optimizations are rather important to making an efficient  
BDD implementation.  I haven't yet seen a Haskell-only BDD  
implementation that didn't have one or more of the following flaws:
  1) IO in all the result types, even though the underlying  
abstraction is pure.

  2) Inability to garbage collect unused nodes.
  3) Loss of sharing leading to loss of fast equality.

Does anyone have a Haskell-only BDD library which avoids all these  
problems?  I wrote one with flaw #2, but was unable to make weak  
pointers and finalization behave in anything like a sensible fashion  
and gave up as it was a weekend project.  The concurrency +  
unsafePerformIO mix was trickier than I initially expected, too.


That said, this BDD implementation is pretty similar to the  
performance  behavior you'd get from Data.IntSet (where the bits of  
your int correspond to the True/False values of your variables).


-Jan-Willem Maessen


On Apr 16, 2007, at 9:07 AM, Neil Mitchell wrote:


Hi,

I am now releasing Data.Proposition. This library handles
propositions, logical formulae consisting of literals without
quantification. It automatically simplifies a proposition as it is
constructed using simple rules provided by the programmer.
Implementations of propositions in terms of an abstract syntax tree
and as a Binary Decision Diagram (BDD) are provided. A standard
interface is provided for all propositions.

Website: http://www-users.cs.york.ac.uk/~ndm/proposition/
Darcs: darcs get --partial http://www.cs.york.ac.uk/fp/darcs/ 
proposition/
Haddock: http://www.cs.york.ac.uk/fp/haddock/proposition/Data- 
Proposition.html


The Haddock documentation also has a short example in it.

This library is used substantially in the tool I have developed for my
PhD, and has been extensively tested.

Thanks

Neil
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell




smime.p7s
Description: S/MIME cryptographic signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Weaving fun

2007-04-13 Thread Jan-Willem Maessen



On Apr 12, 2007, at 9:39 PM, Matthew Brecknell wrote:


Jan-Willem Maessen:

Interestingly, in this particular case what we obtain is isomorphic
to constructing and reversing a list.


Jan-Willem's observation also hints at some interesting performance
characteristics of difference lists. It's well known that difference
lists give O(1) concatenation, but I haven't seen much discussion  
of the

cost of conversion to ordinary lists.


Nice analysis, thanks to both of you.  I think a lot of this folklore  
isn't widely understood, particularly the fact that the closures  
constructed by difference lists are isomorphic to trees, with nodes  
corresponding to append/compose and leaves corresponding to empty/ 
singleton.
So you pay the cost for append each time you flatten---the difference  
list trick is only a win if you flatten to an ordinary list once and/ 
or consume the entire list each time you flatten it.  I'd had an  
intuitive notion of how this worked, but this spells it out nicely.


Of course, once you represent things like so:

data DiffList a = Segment [a]
| DiffList a :++ DiffList a

toList :: DiffList a - [a]
toList dl = toListApp dl []

toListApp :: DiffList a - [a] - [a]
toListApp (Segment s) = (s++)
toListApp (a:++b) = toListApp a . toListApp b

You can start thinking about all sorts of other interesting  
questions, beyond just transforming to a list and eta-abstracting  
toListApp.  The next thing you know, you're writing a better pretty  
printer or otherwise mucking about with the DiffList type itself to  
tailor it for your own nefarious purposes.


-Jan


smime.p7s
Description: S/MIME cryptographic signature
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: [Haskell] Fixpoint combinator without recursion

2007-04-04 Thread Jan-Willem Maessen



On Apr 4, 2007, at 5:01 PM, Edsko de Vries wrote:


Yeah, it's rather cool. IIRC, this style of encoding of recursion
operators is attributed to Morris.


Do you have a reference?


Before the advent of equality coercions, GHC typically had problems
generating code for these kinds of definitions. Did you test this
with a release version? If so, how did you get around the code-
generation problem? Is it the NOINLINE pragma that does the trick?


Yep. Without the NOINLINE pragma, ghc tries to inline the  
definition of
fac, expanding it ad infinitum (this is a known bug in ghc and  
mentioned

in the ghc manual). Hugs doesn't have a problem though.


I keep waiting for someone to use this fact to cook up a poor man's  
partial evaluation---use fix for static recursion, and ordinary  
recursive definitions for dynamic recursion.  I fiddled with this a  
bit in the pH days (it had the same bug, for much the same reason).


-Jan-Willem Maessen



Edsko
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe




smime.p7s
Description: S/MIME cryptographic signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] idea for avoiding temporaries

2007-03-09 Thread Jan-Willem Maessen

 is the pessimal representation for  
them given that we're updating the whole array).


* Linear or Uniqueness types are almost what we want.  I think Josef  
Svenningson was the one who captured this the best: Uniqueness type  
*require* that the *caller* of these routines make sure that it is  
not sharing the data.  So we need two copies of our linear algebra  
library---one which takes unique arrays as arguments and updates in  
place, and one which takes non-unique arrays and allocates.  And we  
have to pick the right one based on context.  What we want, it seems  
to me, is one library and a compiler which can make informed judgments.


* We could imagine tracking uniqueness dynamically at run time, using  
something like reference counting for all our arrays.  But we need to  
do the reference counting precisely---this is pretty much the most  
inefficient way possible of tracking the storage, and doesn't play  
well at all with using efficient GC elsewhere in our programs.  The  
inefficiency might be worth it for arrays, but Haskell is polymorphic  
and in many cases we need to treat all our data the same way.


* Finally, I'll observe that we often want to use slightly different  
algorithms depending upon whether we're updating in place or  
computing into fresh storage.  Often copying the data and then  
updating it in place is not actually a good idea.


I'd love to hear if anyone has insights / pointers to related work on  
any of the issues above; I'm especially keen to learn if there's work  
I didn't know about on fusion of multiple traversals.  In my day job  
with Fortress we are looking at RULES-like approaches, but they  
founder quickly because the kind of problems David is trying to solve  
are 90% of what our programmers want to do.


-Jan-Willem Maessen

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: Article review: Category Theory

2007-01-19 Thread Jan-Willem Maessen



On Jan 19, 2007, at 1:07 PM, Brian Hulley wrote:


Lennart Augustsson wrote:


On Jan 19, 2007, at 08:05 , [EMAIL PROTECTED] wrote:
Thus, Hask is not a category, at least not as defined in the  
article.

The problem is that (either) morphisms or the morphism composition
('.')
are not internalized correctly in Haskell.




And this is why some of us think that adding polymorphic seq to
Haskell was a mistake. :(


I've often wondered why seq is the primitive and not $!
Would this solve the problem?


Sadly, no:

seq = (const id $!)

-Jan-Willem Maessen

Is there any solution that would allow excess laziness to be  
removed from a Haskell program such that Hask would be a category?


Thanks, Brian.
--
http://www.metamilk.com
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe




smime.p7s
Description: S/MIME cryptographic signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Stronger static types, was Re: [Haskell-cafe] Re: Versioning

2006-12-22 Thread Jan-Willem Maessen



On Dec 21, 2006, at 5:03 PM, Jacques Carette wrote:


...
What must be remembered is that full dependent types are NOT needed  
to get a lot of the benefits of dependent-like types.  This is what  
some of Oleg's type gymnastics shows (and others too).  My interest  
right now lies in figuring out exactly how much can be done  
statically.
For example, if one had decent naturals at the type level (ie  
naturals encoded not-in-unary) with efficient arithmetic AND a few  
standard decision procedures (for linear equalities and  
inequalities say), then most of the things that people currently  
claim need dependent types are either decidable or have very strong  
heuristics that work [1].


My understanding is that BlueSpec did roughly this.  As we're  
discovering in Fortress, type-level naturals are a big help; faking  
it really is horrible, as unary representations are unusable for real  
work and digital representations require a ton of stunts to get the  
constraints to solve in every direction (and they're still ugly).


I for one would welcome a simple extension of Haskell with type-level  
nats (the implementor gets to decide if they're a new kind, or can  
interact with * somehow).


-Jan-Willem Maessen

[PS: hadn't seen the LNCS reference before, thanks to Jacques for  
sending that along.]





smime.p7s
Description: S/MIME cryptographic signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: Debugging partial functions by the rules

2006-11-15 Thread Jan-Willem Maessen



On Nov 15, 2006, at 3:21 AM, [EMAIL PROTECTED] wrote:



Donald Bruce Stewart wrote:

So all this talk of locating head [] and fromJust failures got me
thinking:

Couldn't we just use rewrite rules to rewrite *transparently*
all uses of fromJust to safeFromJust, tagging the call site
with a location?


I'm sorry for shifting the topic: I'm wondering if, rather than trying
to make an error message more informative, we ought to make sure that
no error will ever arise?
...
This topic has been discussed at length on this list. It seems that
the discussion came to the conclusion that eliminating head of
the empty list error is possible and quite easy in Haskell.

  http://www.haskell.org/pipermail/haskell-cafe/2006-September/ 
017915.html


But this code contains a function with_non_empty_list (or perhaps  
with_nonempty_list or withNonemptyList or...) which has the same  
confusing failure mode as the examples under discussion.   
Fundamentally, if we try to package up check for failure in a  
function, whether the function does something useful as well (head,  
tail, fromJust) or not (withNonemptyList), we miss out on useful  
contextual information when our program fails.


In addition, we have this rather nice assembly of functions which  
work on ordinary lists.  Sadly, rewriting them all to also work on  
NonEmptyList or MySpecialInvariantList is a nontrivial task.


Which isn't to say that I disapprove of this style: check your  
invariants early, maintain them as you go.  I'm quite enjoying the  
escH paper, but I get through about a column per day between  
compiles. :-)


-Jan-Willem Maessen





smime.p7s
Description: S/MIME cryptographic signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: non-total operator precedence order (was:Fractional/negative fixity?)

2006-11-10 Thread Jan-Willem Maessen



On Nov 9, 2006, at 7:16 PM, Benjamin Franksen wrote:


Carl Witty wrote:


On Thu, 2006-11-09 at 22:20 +0100, Benjamin Franksen wrote:

Henning Thielemann wrote:
 Maybe making fixity declarations like type class instance  
declarations

 is
good.


I thought so too at first but, having thought about it for a  
while, I now

see this will cause /major/ problems. The precedence relations as
declared explicitly by the programmer must form a DAG, with the  
vertices
being the operator classes with equal precedence. There are two  
ways you
can break the DAG: by introducing a 'smaller' or 'larger'  
relation when
another module has already declared them to have equal precedence  
(resp.

the other way around); or by introducing a cycle. Both can be caused
simply by importing yet another module. I think it would be  
unacceptable
not to provide some way for the programmer to resolve such  
conflicts.


[ ... possibilities for resolving conflicts omitted ... ]

Another possibility is:

If you have operators op1 and op2, where the compiler sees  
conflicting
requirements for the precedence of op1 and op2, then they are  
treated as

non-associative relative to each other: the expression
  a op1 b op2 c
is illegal, and the programmer must instead write
  (a op1 b) op2 c
or
  a op1 (b op2 c)


It's a possibility. However, I fear that such conflicting  
precedences might
not come in nice little isolated pairs. For instance, each operator  
that is
in the same precedence class as op1 (i.e. has been declared as  
having equal
precedence) will now be 'incompatible' with any that is in the same  
class

as op2, right?


Well, look at it from the perspective of the reader.  Does the reader  
of your code know beyond a shadow of a doubt what the intended  
precedence will be in these cases?  If not, there should be  
parentheses there---quite apart from what the parser may or may not  
permit you to do.  If the parser can't figure it out, you can bet  
your readers will have trouble as well.



It gets worse if the conflict creates a cycle in a chain of
large operator classes. Thus one single bad declaration can tear a  
gaping

hole into an otherwise perfectly nice and consistent DAG of precedence
order relations, possibly invalidating a whole lot of code.


Requiring parenthesization solves these problems in a stroke.

-Jan-Willem Maessen
 who can't reliably parenthesize the C expression   a==b  34 | 17
 (yes, the horrific whitespace is deliberate!)



Although one
could view this as a bug in the offending module it makes me somewhat
uneasy that one additional import can have such a drastic effect on  
the

code in a module /even if you don't use anything from that module/.

Ben

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe




smime.p7s
Description: S/MIME cryptographic signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] function result caching

2006-10-13 Thread Jan-Willem Maessen



On Oct 13, 2006, at 4:05 AM, Tomasz Zielonka wrote:


On Thu, Oct 12, 2006 at 08:40:44PM -0700, John Meacham wrote:
it is too bad IntSet and IntMap are strict in their subtrees, it  
would

have been nice to provide things like

out of curiosity, why are IntMap and IntSet strict in their subtrees.


I guess the reason is balancing. I can't think of any way of  
balancing a

lazy tree that wouldn't break abstraction.


Uh, Patricia trees aren't balanced in the usual sense.  There is  
exactly one tree structure for a given set of keys, regardless of  
insertion order etc.  (IntSet and IntMap workes approximately as Carl  
Witty described last I checked, though I won't swear to whether bits  
are taken low to high or vice versa.)


I had assumed the strictness was to avoid deferring O(n) insertion  
work to the first lookup operation---though really it makes no  
difference in an amortized sense.


-Jan-Willem Maessen



Perhaps I would be possible to use some trick to rebalance an existing
tree to account for what's currently evaluated. But it could be very
tricky to get it right and it would certainly go beyond Haskell 98.

Best regards
Tomasz
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe




smime.p7s
Description: S/MIME cryptographic signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Traversing a graph in STM

2006-09-19 Thread Jan-Willem Maessen



On Sep 18, 2006, at 4:47 AM, Einar Karttunen wrote:


On 18.09 01:23, Josef Svenningsson wrote:

On 9/17/06, Jan-Willem Maessen [EMAIL PROTECTED] wrote:

You can associate a unique name with each traversal, and store a set
of traversals at each node (instead of a mark bit).  But this set
grows without bound unless you follow each traversal with a  
cleaning

traversal which removes a traversal tag from the set.  And you'd
need some way to generate the unique names.


Well, if your set implementation used weak pointers there would be no
need for a cleaning traversal. The garbage collector will take  
care of

that. The only slightly tricky thing is to keep a live pointer to the
unique traversal name during the entire of the traversal. But I don't
think that should be a big problem.



This just amounts to saying we can use the GC to implement the  
cleanup traversal on our behalf.  I'd be quite surprised if this  
were actually more efficient.  But this is all a bit moot, as Einar  
observes:



This suffers from the problem that two traversals reading the
same parts of the graph would have a good chance to make each other
retry.


Any solution which stores traversal states in the nodes has this  
problem.  Fundamentally you can't  update the state of graph nodes in  
any way using STM and expect to run multiple traversals concurrently  
over the same subgraph.



I am thinking of going the StableName route. But as this happens
inside STM Data.HashTable does not help much (without using
unsafeIOToSTM and dealing with retries).


I'd like to make an STM version of Data.HashTable, but it requires  
implementing some sort of STMArray, or using an array of TVars and  
slowing the hashtable implementation to a crawl.  Without access to  
the internals of STM, implementing some form of STMArray turns out to  
be awfully difficult (I understand the implementation techniques, but  
the ones I understand involve adding frobs to the STM implementation  
itself or degenerating to maps).  This would also address the lack of  
a concurrent hash table (the other alternatives for which are to run  
into a similar set of shortcomings for IOArrays or STArrays, where  
I'd want to have a CAS operation or some sort of compact array of  
MVars).


I'm always a little conflicted about making StableNames for keys into  
a hash table.  Internally GHC creates a giant invisible hash table of  
StableNames, just so we can look things up in it and then use the  
result to insert stuff into our user-visible hashtable.



If StableNames were in Ord using Set (StableName T)
would be nice. But in the current implementation one has to resort
to IntSet Int [StableName T] which is not pretty at all.


I agree.  I wish StableNames were ordered.

-Jan-Willem Maessen

[PS - Does the StableName-internal hash table use the same hash  
function as the old Data.HashTable did?  The comments in  
Data.HashTable suggested it might.  If so, it might be a good idea to  
switch to something like the multiplicative hash function in my  
Data.HashTable code; this gets much of the benefit of switching hash  
table implementations at pretty low cost.  It's not the best hash by  
a long stretch, but it's much better than what was there.]




- Einar Karttunen




smime.p7s
Description: S/MIME cryptographic signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Traversing a graph in STM

2006-09-17 Thread Jan-Willem Maessen



On Sep 13, 2006, at 3:37 AM, Einar Karttunen wrote:


Hello

Is there an elegant way of traversing a directed graph in STM?

type Node  nt et = TVar (NodeT nt et)
type Edge  et= TVar et
data NodeT nt et = NodeT nt [(Node nt et, Edge et)]

type MyGraph = Node String Int

When implementing a simple depth first search we need a way to
mark nodes (= TVars) as visited. In addition multiple concurrent
searches should be possible.

Is it possible to avoid passing around an explicit Set of visited
nodes?


You can associate a unique name with each traversal, and store a set  
of traversals at each node (instead of a mark bit).  But this set  
grows without bound unless you follow each traversal with a cleaning  
traversal which removes a traversal tag from the set.  And you'd  
need some way to generate the unique names.


If you're performing no side effects or accesses to TVars (which you  
aren't, as your graph representation requires reading TVars all over  
the place), you can in principle use the following horrid kludge:

  1) keep a TVar indicating visited at each node.
  2) within an atomic, perform your traversal in the usual  
sequential way, setting the TVar

 each time a node is visited.
  3) when you're done, package up your desired result in an  
exception and throw it.

 All your marking work will be un-done and your result will emerge.
  4) catch the exception outside the atomic and extract the result  
again.


However, this will still preclude two simultaneous traversals of  
overlapping portions of the graph.  Really, you're just asking the  
STM mechanism to maintain the hash table on your behalf; in practice  
you will be better off doing it yourself.


Really, there's no such thing as a free lunch here, I'm afraid.  If  
you want to concurrently traverse a graph, you need to keep separate  
cycle-avoidance state for each traversal.  Using TVars doesn't change  
that basic algorithmic detail.



And is there a better way of getting TVar identity than
StableNames?


Would that there were!

-Jan-Willem Maessen



- Einar Karttunen
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe




smime.p7s
Description: S/MIME cryptographic signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Why does this program eat RAM?

2006-09-13 Thread Jan-Willem Maessen



On Sep 5, 2006, at 7:05 AM, Chris Kuklewicz wrote:


Bulat Ziganshin wrote:

Hello Bertram,
Tuesday, September 5, 2006, 12:24:57 PM, you wrote:

A quick hack up to use Data.ByteString uses a lot less ram, though
profiling still shows 95% of time spent in the building the Map.

Data.HashTable may be a faster alternative for Map (if ordering isn't
required)


I found Data.HashTable a bit slow (ghc 6.4).  Perhaps HsJudy (see  
http://cmarcelo.blogspot.com/ and http://judy.sourceforge.net/ and  
http://www.mail-archive.com/haskell@haskell.org/msg18766.html )


I'd urge programmers to give the version of Data.HashTable in 6.6 a  
try.  It uses a simple multiplicative hash function (a la Knuth)  
which seems to be dramatically better in practice.  It also uses a  
rather simpler hash table implementation which seems to perform  
slightly better in practice (if this isn't true for your application  
I'm keen to know).


As Udo Stenzel points out, we still need to examine the entire string  
in order to hash, and some problems may do better with something like  
a StringMap---I understand many information retrieval applications  
use Trie-like data structures for exactly this reason.


-Jan-Willem Maessen


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe




smime.p7s
Description: S/MIME cryptographic signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] practice problems?

2006-09-03 Thread Jan-Willem Maessen



On Sep 3, 2006, at 8:22 AM, Brian Hulley wrote:


Tamas K Papp wrote:

On Sun, Sep 03, 2006 at 12:47:45PM +0400, Bulat Ziganshin wrote:

i also suggest you to start write some library. there is enough
useful libs that are still bnot implemented because lack of time  
(and

insterest in such simple code) on side of more experienced
programmers. i once proposed you to write strings library, another
interesting and useful thing will be gzip/bzip2-lib bindings


Bulat,

I would be happy to write a strings library, I just don't know  
that it

is supposed to do... (I have no CS education, only math/economics).
If you show me the specifications or documentation in another
language, I would write one as practice.


What about a library for interval arithmetic [1]?


I'll sign up to write this just as soon as I can control the floating- 
point rounding mode in purely functional code with low overhead. :-)


Without this control, you end up with a toy which can't actually be  
used for real work.  Sadly, I don't think GHC's built-in thread  
scheduler plays nicely with floating-point mode changes unless you do  
them across the entire program for the entire run.


The interval multiplication algorithm turns out to be exciting. :-)

-Jan-Willem Maessen



smime.p7s
Description: S/MIME cryptographic signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] multiple computations, same input

2006-03-28 Thread Jan-Willem Maessen



On Mar 28, 2006, at 1:02 AM, Tomasz Zielonka wrote:


On Mon, Mar 27, 2006 at 03:10:18PM -0800, Greg Fitzgerald wrote:

hold a part of the data in memory while you show the first one,


Here would be a better example then.

f lst = show (sum (filter ( 1) lst), sum (filter ( 2) lst))

It ought to be *possible* to compute both operations without  
holding onto

any of the list elements.


I wonder if it would be possible to remove the space-leak by  
running both

branches concurrently, and scheduling threads in a way that would
minimise the space-leak. I proposed this before

  http://www.haskell.org/pipermail/haskell-cafe/2005-December/ 
013428.html


I would like to hear opinions from some compiler gurus.


This is possible in principle with something like resource-bounded  
eagerness, but it's not at all easy.  The problem is this: when lst  
gets big, you need to identify who's hanging on to it, and figure out  
that they are actually planning to consume it and come up with  
something smaller as a result.  This is all pretty heavyweight---not  
hard in principle, but hard enough in practice that it may not be  
worth the investment.


That said, there's a transformation that goes something like this:

a = foldr f z xs ==   (a,b) = foldr (f `cross` g)  
(z,y) xs

b = foldr g y xs

This could in principle at least pluck the lowest-hanging fruit (sum,  
filter, etc.).

However it runs into some significant problems:
- Only works with folds
- Has some problems with bottoms, if I recall rightly
- Not expressible using something like RULES;
   requires a special transformation in the compiler.
- It is often a pessimization.

That last point is a killer.



Best regards
Tomasz
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Positive integers

2006-03-27 Thread Jan-Willem Maessen



On Mar 26, 2006, at 4:35 PM, Daniel McAllansmith wrote:

[Discussion of positive integers and Words]



I was thinking about several things in this thread, torsors, overflow
semantics, bounds checking...
I wonder if there would be any merit in being able to define  
constrained

subsets of integers and the semantics when/if they overflow.


Oddly, I've just finished coding this up, with type-level bounds  
(represented by type-level ints).  It's a big pile of modules on top  
of John Meacham's type-level Nats library, which add type-level Ints  
(as non-normalized Nat pairs), unknown endpoints (permitting Integer  
to fit in the same framework), and the actual bounded ints themselves.


Naturally, I needed to define my own versions of the functionality in  
Eq, Ord, and Num.  These resolve statically as much as possible  
(including some possibly dubious behavior with singleton intervals).   
That said, I don't try to do everything at the type level---it became  
too tiring, with not enough plausible benefit.


Any and all: Drop me a line if you are interested, it's a stack of  
3-4 modules and at best half-baked.  I'd just gotten a mostly- 
complete version.


-Jan-Willem Maessen


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Badly designed Parsec combinators?

2006-02-17 Thread Jan-Willem Maessen



On Feb 16, 2006, at 7:32 PM, John Meacham wrote:


...

  Again that doesn't compile, because when requires a ()-returning
monad as its second parameter, but the string parser returns  
String.
Same thing with if-then-else, when used to switch IO actions and  
such:
the IO actions must fully match in type, even if the returned  
value will

be discarded, and again that can be trivially resolved by adding the
return ().


This is a straight up bug in the definition of when I hope we fix. it
should have type

when :: Bool - IO a - IO ()
when = ...


Arguably this could be made true of *every* function which presently  
takes m () as an argument.  That is, we could systematically go  
through the libraries and convert every function of type:


f :: (Monad m) =  - m () - ...

into

f :: (Monad m) =  - m otherwiseUnusedTypeVariable - ...

This would basically eliminate the need for ignore.  I can see  
taste arguments in either direction, but really the language ought to  
pick an alternative and use it everywhere (including for ).


-Jan-Willem Maessen



John


--
John Meacham - ⑆repetae.net⑆john⑈
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: Hashtable woes

2006-02-15 Thread Jan-Willem Maessen



On Feb 15, 2006, at 3:42 AM, Ketil Malde wrote:



Not sure how relevant this is, but I see there is a recently released
hash library here that might be a candidate for FFIing?

https://sourceforge.net/projects/goog-sparsehash/


The real issue isn't the algorithms involved; I saw the best  
performance from the stupidest hash algorithm (well, and switching to  
multiplicative hashing rather than mod-k).  The problem is GC of hash  
table elements.  FFI-ing this library would give us really good  
algorithms, but the GC would all indirect through the FFI and I'd  
expect that to make things *worse*, not better.


-Jan



| An extremely memory-efficient hash_map implementation. 2 bits/entry
| overhead! The SparseHash library contains several hash-map
| implementations, including implementations that optimize for space
| or speed.

-k
--
If I haven't seen further, it is by standing in the footprints of  
giants


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re[2]: strict Haskell dialect

2006-02-05 Thread Jan-Willem Maessen



On Feb 5, 2006, at 2:02 PM, Brian Hulley wrote:


...
I wonder if current compilation technology for lazy Haskell (or  
Clean) has reached the theoretical limits on what is possible for  
the compiler to optimize away, or if it is just that optimization  
has not received so much attention as work on the type system etc?


I would answer resoundingly that there is still a good deal to  
learn / perfect in the compilation technology, but there's been a  
lack of manpower/funding to make it happen.


-Jan-Willem Maessen



Regards, Brian.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re[2]: strict Haskell dialect

2006-02-04 Thread Jan-Willem Maessen



On Feb 3, 2006, at 8:16 PM, Brian Hulley wrote:


Jan-Willem Maessen wrote:


I pointed out some problems with strict Haskell in a recent talk, but
I think it'd be worth underscoring them here in this forum.


Is the text of this talk or points raised in it available online  
anywhere?


snip There is one very difficult piece of syntax in a strict  
setting: The

*where* clause.  The problem is that it's natural to write a bunch of
bindings in a where clause which only scope over a few conditional
clauses.  I'm talking about stuff like this:

f x
  | p x   = . a ...a . a  a ...
  | complex_condition = . b .. b ... b ..
  | otherwise = . a ... b .
  where a = horrible expression in x which is bottom when
complex_condition is true.
b = nasty expression in x which doesn't terminate when p x
is true.
complex_condition = big expression which
 goes on for lines and lines
 and would drive the reader
 insane if it occurred in line.


Surely it would not be too difficult for the compiler to only  
evaluate the where bindings that are relevant depending on which  
guard evaluates to True ie in your example, the binding for a would  
be evaluated if p x is True, otherwise the complex_condition would  
be evaluated, and if True, b would be evaluated, otherwise a and b  
would be evaluated: ...


In principle, yes, this is eminently doable.  But the translation  
becomes surprisingly messy when the bindings in question are mutually  
recursive.  Certainly it's not a simple syntax-directed translation,  
in contrast to essentially every other piece of syntactic sugar in  
the language.


-Jan-Willem Maessen

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re[2]: strict Haskell dialect

2006-02-03 Thread Jan-Willem Maessen



On Feb 3, 2006, at 2:33 PM, Brian Hulley wrote:


Bulat Ziganshin wrote:

Hello Wolfgang,

Friday, February 03, 2006, 1:46:56 AM, you wrote:

i had one idea, what is somewhat corresponding to this discussion:

make a strict Haskell dialect. implement it by translating all
expressions of form f x into f $! x and then going to the
standard (lazy) haskell translator. the same for data fields - add
to all field definitions ! in translation process. then add to
this strict
Haskell language ability to _explicitly_ specify lazy fields and
lazy evaluation, for example using this ~ sign


[Apologies for replying to a reply of a reply but I don't seem to  
have received the original post]


I've been thinking along these lines too, because it has always  
seemed to me that laziness is just a real nuisance because it hides  
a lot of inefficiency under the carpet as well as making the time/ 
space behaviour of programs difficult to understand...


I pointed out some problems with strict Haskell in a recent talk, but  
I think it'd be worth underscoring them here in this forum.


First off, I should mention that I was one of the main implementors  
of pH, which had Haskell's syntax, but used eager evaluation.  So  
what I'm about to say is based on my experience with Haskell code  
which was being eagerly evaluated.


There is one very difficult piece of syntax in a strict setting: The  
*where* clause.  The problem is that it's natural to write a bunch of  
bindings in a where clause which only scope over a few conditional  
clauses.  I'm talking about stuff like this:


f x
  | p x   = . a ...a . a  a ...
  | complex_condition = . b .. b ... b ..
  | otherwise = . a ... b .
  where a = horrible expression in x which is bottom when  
complex_condition is true.
b = nasty expression in x which doesn't terminate when p x  
is true.

complex_condition = big expression which
 goes on for lines and lines
 and would drive the reader
 insane if it occurred in line.

Looks pretty reasonable, right?  Not when you are using eager or  
strict evaluation.  I think a strict variant of Haskell would either  
end up virtually where-free (with tons of lets instead---a pity as I  
often find where clauses more readable) or the semantics of where  
would need to change.


This came up surprisingly more often than I expected, though it was  
hardly a universal problem.  The more interesting the code, the  
more likely there would be trouble in my experience.


A bunch of other stuff would have to be added, removed, or modified.   
The use of lists as generators would need to be re-thought (and  
probably discarded), idioms involving infinite lists would have to  
go, etc., etc.  But this is a simple matter of libraries (well, and  
which type(s) get(s) to use square brackets as special builtin  
notation).


-Jan-Willem Maessen
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: Haskell Speed

2006-01-04 Thread Jan-Willem Maessen



On Jan 4, 2006, at 5:30 AM, Simon Marlow wrote:


On 30 December 2005 01:23, Jan-Willem Maessen wrote:


Probably.  The minimum table chunk size was rather large.  I have
been experimenting (tests are running even as I type) with alternate
implementations of Data.HashTable.  So far the winning implementation
is one based on multiplicative hashing and simple table doubling.
This seems to be consistently competitive with / faster than
Data.Map.  At this point my inclination is to make the whole suite
available:

Data.HashTable.Class
Data.HashTable.DataMap
Data.HashTable.Doubling
Data.HashTable.Flat
Data.HashTable.Freeze
Data.HashTable.Modulus
Data.HashTable.Multiplicative
Data.HashTable.Original
Data.HashTable.Test
Data.HashTable

I've separated out the hashing technique (Modulus, Multiplicative)
from the hash table implementation.  Note that a few of the above are
bogus, and this is only a fraction of the implementations tried thus
far.

I need to get distribution clearance for a bunch of this code from my
employer, and figure out how to package it.  The latter may be
tricky, as Data.Hashtable is currently rather intimately tied to a
bunch of rather tricky bits of GHC.


I wonder if you could put together a drop-in replacement for
Data.HashTable that we can incorporate?  There's not much point in us
still providing the inefficient version as standard after you've done
all this work to figure out how to do it better.


That'd be good---though what qualifies as the efficient version  
will, I think, change based on the GC changes you said you made (I  
haven't had the time/patience to try to bootstrap the development  
version of GHC).  This is one of the reasons I've been saving so many  
bread crumbs along the way.  All of the above will work as drop-in  
Data.HashTable replacements, with Data.HashTable simply re-exporting  
multiplicative hashing and table doubling.


The tricky bit, as you probably know, is the rather intimate ties  
between Data.HashTable and the rest of the builtin code.  This  
requires the shipping Data.HashTable library to import most things  
from the source, rather than from the place where they're made  
publicly available.  Do you have any suggestions as to how I might go  
about testing that integration?


-Jan



Cheers,
Simon


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Progress on shootout entries

2006-01-04 Thread Jan-Willem Maessen



On Jan 4, 2006, at 8:11 AM, Chris Kuklewicz wrote:


Krasimir Angelov wrote:

...
In this particular case the flop function is very slow.
...
It can be optimized using a new mangle function:

mangle :: Int - [a] - [a]
mangle m xs = xs'
  where
(rs,xs') = splitAt m xs rs

splitAt :: Int - [a] - [a] - ([a], [a])
splitAt 0xs  ys = (xs,ys)
splitAt _[]  ys = ([],ys)
splitAt m (x:xs) ys = splitAt (m - 1) xs (x:ys)

The mangle function transforms the list in one step while the  
original

implementation is using reverse, (++) and splitAt. With this function
the new flop is:

flop :: Int8 - [Int8] - Int8
flop acc (1:xs) = acc
flop acc list@(x:xs) = flop (acc+1) (mangle (fromIntegral x) list)


You seem to have also discovered the fast way to flop.

This benchmarks exactly as fast as the similar entry assembled by
Bertram Felgenhauer using Jan-Willem Maessen's flop code:


...
flop :: Int - [Int] - [Int]
flop n xs = rs
  where (rs, ys) = fl n xs ys
fl 0 xs ys = (ys, xs)
fl n (x:xs) ys = fl (n-1) xs (x:ys)


Indeed, I believe these are isomorphic.  My fl function is the  
splitAt function above, perhaps more descriptively named  
splitAtAndReverseAppend...


-Jan-Willem Maessen

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: Progress on shootout entries

2006-01-03 Thread Jan-Willem Maessen


I was surprised to learn that indexed insertion:

permutations (x:xs) =
[insertAt n x perms | perms - permutations xs,
  n - [0..length xs] ]

insertAt :: Int - a - [a] - [a]
insertAt 0 y xs = y:xs
insertAt n y (x:xs) = x:(insertAt (n-1) y xs)

was faster than the usual version of permutation based on inserts:

permutations (x:xs) =
[insertAt n x perms | perms - permutations xs,
  n - [0..length xs] ]
insertAt 0 y xs = y:xs
insertAt n y (x:xs) = x:(insertAt (n-1) y xs)

However, try these on for size.  The non-strict flop, which  
traverses its input exactly once, is the most surprising and made by  
far the biggest difference:



findmax :: [[Int]] - Int
findmax xss = fm xss 0
  where fm [] mx = mx
fm (p:ps) mx = fm ps $! (countFlops p `max` mx)

countFlops :: [Int] - Int
countFlops as = cf as 0
  where cf(1:_) flops = flops
cf xs@(x:_) flops = cf (flop x xs) $! (flops+1)

flop :: Int - [Int] - [Int]
flop n xs = rs
  where (rs,ys) = fl n xs ys
fl 0 xs ys = (ys, xs)
fl n (x:xs) ys = fl (n-1) xs (x:ys)


On Jan 3, 2006, at 8:01 PM, Kimberley Burchett wrote:

I took a quick crack at optimizing fannkuch.hs.  I got it down from  
33s to 1.25s on my machine, with N=9.  That should put it between  
forth and ocaml(bytecode) in the shootout page.  The main changes I  
made were using Int instead of Int8, foldl' to accumulate the max  
number of folds, a custom flop function rather than a combination  
of reverse and splitAt, and a simpler definition for permutations.


   http://kimbly.com/code/fannkuch.hs

Kimberley Burchett
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: Haskell Speed

2005-12-29 Thread Jan-Willem Maessen



On Dec 29, 2005, at 7:44 PM, Branimir Maksimovic wrote:

To comment some observation on this program.
Most of the pressure now is on Data.HashTable.
I've susspected such large memory usage on substring from array  
conversions,
so mad version with data MyString = MakeMyStrinf { buf :: Ptr Char,  
size :: Int }

and there was no substrings in map or anywhere else, but memory
consumption remains.
So after eliminating inserts and updates into HashTable memory was ok.
Finally I've realized that updates into hash table actually increase
memory usage to large extent instead to keep memory same
on average. So I guess this is bug in HashTable?


Probably.  The minimum table chunk size was rather large.  I have  
been experimenting (tests are running even as I type) with alternate  
implementations of Data.HashTable.  So far the winning implementation  
is one based on multiplicative hashing and simple table doubling.   
This seems to be consistently competitive with / faster than  
Data.Map.  At this point my inclination is to make the whole suite  
available:


Data.HashTable.Class
Data.HashTable.DataMap
Data.HashTable.Doubling
Data.HashTable.Flat
Data.HashTable.Freeze
Data.HashTable.Modulus
Data.HashTable.Multiplicative
Data.HashTable.Original
Data.HashTable.Test
Data.HashTable

I've separated out the hashing technique (Modulus, Multiplicative)  
from the hash table implementation.  Note that a few of the above are  
bogus, and this is only a fraction of the implementations tried thus  
far.


I need to get distribution clearance for a bunch of this code from my  
employer, and figure out how to package it.  The latter may be  
tricky, as Data.Hashtable is currently rather intimately tied to a  
bunch of rather tricky bits of GHC.


-Jan-Willem Maessen


Second in this test, hash function needs to be very strong,
as even with following I got longest chain of 16 elements.
myHashString = fromIntegral . ff''' . ff'' . ff' . foldr f 0
 where f c m = f'' $ f' (ord c + m)
   f' m = m + (m `shiftL` 10)
   f'' m = m `xor` (m `shiftR` 6)
   ff' m = m + (m `shiftL` 3)
   ff'' m = m `xor` (m `shiftR` 11)
   ff''' m = m + (m `shiftL` 15)
Default hashString has longestChain of 18 elements.
Perhaps someone can explain if such a memory leaks from HashTable  
updates

are normal or are bugs?
All in all functional version consumes less memory and is twice as
fast.

Greetings, Bane.

_
Express yourself instantly with MSN Messenger! Download today it's  
FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Can't Haskell catch up with Clean's uniqueness typing?

2005-12-07 Thread Jan-Willem Maessen



On Dec 6, 2005, at 9:17 AM, [EMAIL PROTECTED] wrote:


Hi all,

being occupied with learning both languages, I'm getting curious if  
Haskell couldn't achieve most of the performance gains
resulting from uniqueness typing in Clean by *automatically*  
determining the reference count of arguments wherever
possible and subsequently allowing them to be physically replaced  
immediately by (the corresponding part of) the
function's result. Are there any principal obstacles, or *could*  
this be done, or *is* this even done already, e. g. in

ghc?


Yes, this could be done.  The principle obstacles are the same as for  
any reference counting scheme: It imposes more run-time overhead than  
GC does, unless the data structures involved are large.  Let me  
repeat that: accurate up-to-the-moment reference counting is  
dramatically slower than GC.  Techniques exist to make ref counting  
fast, but they all require the equivalent of a full stack walk in  
order to get an accurate count.


That said, clever techniques (like 1-bit ref counting) are available  
that will get 80% of what is desired.  1-bit reference counting keeps  
a single bit which says either this is certainly the only reference  
or other references may exist.  The bit can be kept in the pointer  
itself.  There's still run-time overhead, though---the bit must be  
masked on each pointer dereference.


Wearing my Fortress language designer hat, we've given serious  
thought to these techniques for very large arrays.  Copying such  
structures is terribly expensive, or even impossible (imagine copying  
a 1PB array).  I'd think hard before I used them for, say, cons cells.


Shae: All this is very, very different from eager / optimistic  
evaluation.


-Jan-Willem Maessen

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Records (was Re: [Haskell] Improvements to GHC)

2005-11-17 Thread Jan-Willem Maessen



On Nov 17, 2005, at 1:52 PM, Benjamin Franksen wrote:

...
Yes, yes, yes. I'd rather use a different operator for record  
selection.
For instance the colon (:). Yes, I know it is the 'cons' operator  
for a

certain concrete data type that implements stacks (so called 'lists').
However I am generally opposed to wasting good operator and function
names as well as syntactic sugar of any kind on a /concrete/ data  
type,

and especially not for stacks aka lists.


Would you be happier if it were the yield operator for iterators?

Yours lazily,

Jan-Willem Maessen



Just my 2 cent.

Ben
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: FPS: Finalizers not running (was Memoryusageoutside of the Haskell heap)

2005-11-10 Thread Jan-Willem Maessen



On Nov 9, 2005, at 4:46 AM, Ketil Malde wrote:


Sebastian Sylvan wrote:


On 11/8/05, Jan-Willem Maessen [EMAIL PROTECTED] wrote:


Just wanted to let people know that I've been working on improving
Data.HashTable, with the help of Ketil Malde's badly performing code



Always happy to help, of course - bad performance R us:-)


I think that if I can get unsafeFreeze/unsafeThaw to work reliably,  
it'll finally outperform Data.Map on your example.  I haven't yet  
played with the hash function, which looks kind of bad; there may be  
hope for improvement there as well.



Request: Data.HashTable.Immutable

I'm not sure you really want that - do you wish to copy a large  
array each time you do an update?

Freezing and thawing might be a good idea, though.


A true read-only hash table, with a freeze for the mutable version,  
but NO THAW, could potentially be useful.  I seem to recall that hbc  
had an immutable hash table (along with a derivable Hashable class),  
where the content was specified as in an array comprehension.


User-level Thaw is a bad idea, and I will resist it, even if the  
library ends up using Freeze/Thaw internally to work around GC  
shortcomings.


HashTable's interface is rather impoverished, compared to Map and  
even Array.
Ideally, it should support many of the same operations, and  
presumably it could work with the ST monad as well as IO.


It's certainly possible to code many of these up---the cleanest code  
I've gotten so far would even make it easy, as everything but lookup  
goes through 2 higher-order INLINE functions (a generic insert/delete  
and a map/reduce, each on a list of pairs).


But I'm going to focus for the moment on the most pressing need,  
which is acceptable performance for what we've got.  I hope the  
result will make new functionality easier to provide.


-Jan



-k
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

1 2 >

1 - 100 of 132 matches

Mail list logo