Re: [Haskell-cafe] Interesting data structure

2007-12-29 Thread Steve Lihn
What you described has a Java parallel:

 In the first program, I am reading on-demand from a database - just
 reading, not making any changes.

This is similar to EJB's entity bean.  Usually EJB requires XA driver
to preserve database state across the cluster, but in your read-only
case, this is not needed. HDBC puts you into the IO monad. You need to
get yourself out of IO monad if you believe your code is pure.

 In the second, I am requesting computations to be evaluated
 externally, in order to take advantage of a grid of machines.

This is similar to J2EE's clustering, where beans can sit on any node
of the cluster and be accessed via RMI. The ability to handle
(de)serialization is the key here. Java's solution is proprietary,
while Microsoft dot NET has choosen SOAP to request computation over
network. Network calls will also put you into the IO monad, which has
to be hidden in your case..

If you came up with a solution, it might have invented the H2EE.

Steve
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Interesting data structure

2007-12-28 Thread Paul Johnson

Tim Docker wrote:

I found it worthwhile to try and visualise what's going on here. Let's
say I have 4 calculations that I want to run in parallel. The first
doesn't need a request; the second needs to make a single request
(A1); the third needs to make two requests where the second (B2)
depends on the result of the first (B1), etc. The resulting parallel
operations will be done in 3 batches, looking like:
  
This sounds similar to futures, where a request for a parallel 
computation returns immediately, but the value returned is just a 
placeholder for the result, which will be filled in when it becomes 
available.


Paul.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Interesting data structure

2007-12-27 Thread Tim Docker
I'm using a control structure that's a variation of a monad and I'm
interested in whether

- it's got a name
- it deserves a name (!)
- anything else similar is used elsewhere

Please excuse the longer post...


I have two programs that need to interact with the outside world, and
I want to constrain the nature of these interactions. I don't want to
just sprinkle IO throughout the code.

In the first program, I am reading on-demand from a database - just
reading, not making any changes.

In the second, I am requesting computations to be evaluated
externally, in order to take advantage of a grid of machines.

In both of these cases, the external requests don't change the state
of the world, and the programs can be consided pure as long as
the world isn't being changed by some other means. Hence the requests
can be reordered, performed in parallel, and optimised in various ways
without affecting the result.

To make this concrete, if an external request has a type like:

a - m b

where a is the input to the request, b is the response, and m is some
monad, probably IO. Then a data structure capturing calculations over
these requests, with result type c can be:

data SCalc a b c = SCResult c
 | SCStage {
 sc_req :: a,
 sc_calc :: b - SCalc a b c
   }

The idea is that we either have a result, or we need to make a
external request, whose response is used to generate a new
calculation.

Running such a calculation is straightforward:

runSC :: (Monad m ) = SCalc a b c - (a - m b) - m c
runSC (SCResult v) _ = return v
runSC (SCStage a cf) reqf = do
b - reqf a
runSC (cf b) reqf

and calculations can be sequence by making a monad instance:

instance Monad (SCalc a b) where
   return = SCResult

   (=) (SCResult v)cf = cf v
   (=) (SCStage a cf1) cf = SCStage a (\b - cf1 b = cf)

Where it gets interesting, however, is that when they don't depend on
each other, we can run these calculations in parallel. We need the
ability to merge requests, and split responses. Hence,

class Req a where
   merge :: a - a - a

class Resp b where
   split :: b - (b,b)

par :: (Req a, Resp b) = SCalc a b c - SCalc a b d - SCalc a b (c,d)

The par primitive above can be used to define other parallel
operations, such as

parList :: (Req a, Resp b) = [SCalc a b c] - SCalc a b [c]

I found it worthwhile to try and visualise what's going on here. Let's
say I have 4 calculations that I want to run in parallel. The first
doesn't need a request; the second needs to make a single request
(A1); the third needs to make two requests where the second (B2)
depends on the result of the first (B1), etc. The resulting parallel
operations will be done in 3 batches, looking like:

 batch1  batch2   batch3   result

calc1 V0
calc2A1   V1
calc3B1  B2   --- V2
calc4C1  C2   C3   -- V3

(excuse the ascii art). batch1 will consist of A1,B1,C1 merged
together; batch2 of B2,C3 merged; etc.

In practice, I've used the above data types to abstract out database
access in some reporting code. It works quite well, as use of the
parallel primitive above means that the haskell code talking to the
database sees all of the information in each batch simultaneously, so
it can optimise the queries, remove redundant requests etc. It also
makes the reporting code pure despite the fact that information is
being loaded on demand from the db (without any unsafe calls behind
the scene). I guess the use of the term pure here should be
qualified: the impure code has been factored out to a single function
in a different module that has a limited and well defined interface.

I haven't implemented the grid calculation example described above,
though I see that it ought to be able to work similarly, potentially
removing duplicate calculation requests, etc.

So my questions are: does this sort of monad allowing parallel
evaluation structure have a name? Is it an existing design pattern
in fp somewhere that I haven't seen?

thanks,

Tim



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Interesting data structure

2007-12-27 Thread Ryan Ingram
This monad seems to be basically the same as Prompt; see
http://www.haskell.org/pipermail/haskell-cafe/2007-November/034830.html, the
only difference I see is that Prompt allows the return value's type to
be based on the request instead of forcing everything to be wrapped in a
single result type.

You implemented the monad operations exactly the same as Prompt, and your
bind operator suffers from the same quadratic behavior problem that was
pointed out in that thread.

As was pointed out there, what you are doing is turning the potential side
effects of your computations into a term algebra, which allows you to write
different interpretation functions to use when running the calculation
(the reqf passed to runSC).  As far as I can tell, this pattern is general
enough to implement any computation, so it's not surprising that you found
it possible to use it to implement parallel computation.

As an example, here's the State monad implemented in terms of SCalc:

 data StateReq s = Get | Put s
 get :: SCalc (StateReq s) s s
 get = SCStage Get return
 put :: s - SCalc (StateReq s) s ()
 put s = SCStage (Put s) (const $ return ())

 runState :: SCalc (StateReq a) s b - s - (a, s)
 runState (SCResult v) s = (v, s)
 runState (SCStage Get cont) s = runState (cont s) s
 runState (SCStage (Put s) cont) _ = runState (cont s) s

I think it's a useful pattern and I definitely am getting a lot of use out
of Prompt in my code.  But I'm trying to figure out the best way
to eliminate the quadratic behavior of (=) that is exhibited by, for
example:

foldl1 (=) $ take 100 $ repeat $ (\x - put (x+1) = get) $ 0

The only way I've found so far is to wrap Prompt inside of ContT which
solves the problem in much the same way that difference lists (newtype DList
a = [a] - [a]) solve the problem of quadratic time append for lists.

  -- ryan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe