Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures

2012-04-06 Thread Heinrich Apfelmus

Ben wrote:

perhaps it is too late to suggest things for GSOC --

but stephen tetley on a different thread pointed at aaron turon's
work, which there's a very interesting new concurrency framework he
calls reagents which seems to give the best of all worlds : it is
declarative and compositional like STM, but gives performance akin to
hand-coded lock-free data structures.  he seems to have straddled the
duality of isolation vs message-passing nicely, and can subsume
things like actors and the join calculus.

http://www.ccs.neu.edu/home/turon/reagents.pdf

he has a BSD licensed library in scala at

https://github.com/aturon/ChemistrySet

if someone doesn't want to pick this up for GSOC i might have a hand
at implementing it myself.


That looks great! While I didn't take the time to understand the 
concurrency model in detail, the overall idea is to use arrows that can 
be run atomically


   runAtomically :: Reagent a b - (a - IO b)

This is very similar to STM: combining computations within the 
monad/arrow is atomic while combining computations outside the 
monad/arrow can interleave them.


   runAtomically (f . g)  -- atomic
   runAtomically f . runAtomically g  -- interleaving


Actually, it turns out that the  Reagent  arrow is also a monad, but the 
author seems to claim that the static arrow style enables certain 
optimizations. I haven't checked his model in detail to see whether this 
is really the case and how exactly it differs from STM, but we know that 
situations like this happen for parser combinators. Maybe it's enough to 
recast reagents as an applicative functor?


To summarize: the way I understand it is that it's apparently possible 
to improve the STM monad by turning it into an arrow. (I refer to STM 
in a very liberal sense here: whether memory is transactional or not is 
unimportant, the only thing that matters is a computation that composes 
atomically.)



Best regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures

2012-04-05 Thread Ben
perhaps it is too late to suggest things for GSOC --

but stephen tetley on a different thread pointed at aaron turon's work, which 
there's a very interesting new concurrency framework he calls reagents which 
seems to give the best of all worlds : it is declarative and compositional like 
STM, but gives performance akin to hand-coded lock-free data structures.  he 
seems to have straddled the duality of isolation vs message-passing nicely, and 
can subsume things like actors and the join calculus.

http://www.ccs.neu.edu/home/turon/reagents.pdf

he has a BSD licensed library in scala at

https://github.com/aturon/ChemistrySet

if someone doesn't want to pick this up for GSOC i might have a hand at 
implementing it myself.

b

On Mar 29, 2012, at 6:46 AM, Tim Harris (RESEARCH) wrote:

 Hi,
 
 Somewhat related to this...
 
 Next month we have a paper coming up at EuroSys about a middle-ground between 
 using STM and programming directly with CAS:
 
 http://research.microsoft.com/en-us/um/people/tharris/papers/2012-eurosys.pdf
 
 This was done in the context of shared memory data structures in C/C++, 
 rather than Haskell. 
 
 It might be interesting to see how the results carry over to Haskell, e.g. 
 adding short forms of specialized transactions that interact correctly with 
 normal STM-Haskell transactions.
 
 In the paper we have some examples of using short specialized transactions 
 for the fast paths in data structures, while keeping the full STM available 
 as a fall-back for expressing the cases that cannot be implemented using 
 short transactions.
 
 --Tim
 
 
 
 
 -Original Message-
 From: haskell-cafe-boun...@haskell.org 
 [mailto:haskell-cafe-boun...@haskell.org] On Behalf Of Heinrich Apfelmus
 Sent: 29 March 2012 13:30
 To: haskell-cafe@haskell.org
 Subject: Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures
 
 Florian Hartwig wrote:
 Heinrich Apfelmus wrote:
 
 So while the two are related, CAS is a machine primitive that works 
 for a single operation and on a single word while STM is a software 
 abstraction that isolates sequences of operations on multiple memory 
 locations from each other.
 
 Is it possible to implement every data structure based on CAS in 
 terms of STM? What are the drawbacks? The other way round?
 
 I think so. Atomically reading and writing a single memory location 
 (which CAS does) is just a very simple transaction. But using a CAS 
 instruction should be more efficient, since STM has to maintain a 
 transaction log and commit transactions, which creates some overhead.
 
 Ah, I see. In that case, it may be worthwhile to implement the CAS 
 instruction in terms of STM as well and measure the performance difference 
 this makes for the final data structure. After all, STM is a lot more 
 compositional than CAS, so I'd like to know whether the loss of 
 expressiveness is worth the gain in performance.
 
 
 Best regards,
 Heinrich Apfelmus
 
 --
 http://apfelmus.nfshost.com
 
 
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe
 
 
 
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures

2012-04-05 Thread Ben Gamari
Ben midfi...@gmail.com writes:

 perhaps it is too late to suggest things for GSOC --

 but stephen tetley on a different thread pointed at aaron turon's
 work, which there's a very interesting new concurrency framework he
 calls reagents which seems to give the best of all worlds : it is
 declarative and compositional like STM, but gives performance akin to
 hand-coded lock-free data structures.  he seems to have straddled the
 duality of isolation vs message-passing nicely, and can subsume things
 like actors and the join calculus.

 http://www.ccs.neu.edu/home/turon/reagents.pdf

 he has a BSD licensed library in scala at

 https://github.com/aturon/ChemistrySet

 if someone doesn't want to pick this up for GSOC i might have a hand
 at implementing it myself.

Keep use in the loop if you do. I have a very nice application that has
been needing a nicer approach to concurrency than IORefs but
really can't afford STM.

Cheers,

- Ben


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures

2012-04-05 Thread Ryan Newton
+1 -- the reagents model is interesting and it would be good to see a
Haskell implementation.


On Thu, Apr 5, 2012 at 3:05 PM, Ben Gamari bgamari.f...@gmail.com wrote:

 Ben midfi...@gmail.com writes:

  perhaps it is too late to suggest things for GSOC --
 
  but stephen tetley on a different thread pointed at aaron turon's
  work, which there's a very interesting new concurrency framework he
  calls reagents which seems to give the best of all worlds : it is
  declarative and compositional like STM, but gives performance akin to
  hand-coded lock-free data structures.  he seems to have straddled the
  duality of isolation vs message-passing nicely, and can subsume things
  like actors and the join calculus.
 
  http://www.ccs.neu.edu/home/turon/reagents.pdf
 
  he has a BSD licensed library in scala at
 
  https://github.com/aturon/ChemistrySet
 
  if someone doesn't want to pick this up for GSOC i might have a hand
  at implementing it myself.
 
 Keep use in the loop if you do. I have a very nice application that has
 been needing a nicer approach to concurrency than IORefs but
 really can't afford STM.

 Cheers,

 - Ben


 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Google Summer of Code - Lock-free data

2012-04-01 Thread wren ng thornton

On 3/30/12 4:27 AM, Krzysztof Skrzętnicki wrote:

You mention benchmarking TChans: one particular problem with TChans and
Chans is that they are unbounded. If the producers outpace consumers it
soon ends with memory exhaustion.


If that's the case, then you should consider TBChan[1] which is a 
bounded TChan, to solve exactly that problem. (There's also TMChan for 
closeable channels, and TBMChan for bounded closeable channels, as needed.)



[1] http://hackage.haskell.org/package/stm-chans

--
Live well,
~wren

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Google Summer of Code - Lock-free data

2012-03-30 Thread Krzysztof Skrzętnicki
Hi,

I was interested in Distruptor few months ago and started to write a
Haskell implementation. Soon I discovered the lack of native CAS operations
and abandoned project for a while. I don't really have time to develop it
further right now, but the code is available:
https://github.com/Tener/disruptor-hs

The code as it is now is mostly benchmarking code. I think it is worth
trying out.

You mention benchmarking TChans: one particular problem with TChans and
Chans is that they are unbounded. If the producers outpace consumers it
soon ends with memory exhaustion.

In the process of writing above code I discovered particularly simple data
structure that performs suprisingly well (full implementation:
https://github.com/Tener/disruptor-hs/blob/master/Test/ManyQueue.hs) :

type Queue a = [MVar a]
mkQueue size = cycle `fmap` (replicateM size newEmptyMVar)


The throutput is very high, it is bounded and scales well with the number
of producer/consumers threads. In the presence of multiple
consumers/producers it's not a FIFO queue though, but rather a kind of
buffer with funny ordering.

Best regards,
Krzysztof Skrzętnicki

On Fri, Mar 30, 2012 at 00:03, John Lato jwl...@gmail.com wrote:

 Slightly related: I think it would be interesting to compare a
 Disruptor-based concurrency communications mechanism and compare it to
 e.g. TChans

 1.  Disruptor - http://code.google.com/p/disruptor/

  From: Ryan Newton rrnew...@gmail.com
 
  I think so. Atomically reading and writing a single memory location
  (which CAS does) is just a very simple transaction. But using a CAS
  instruction should be more efficient, since STM has to maintain a
  transaction log and commit transactions, which creates some overhead.
 
 
  Ah, I see. In that case, it may be worthwhile to implement the CAS
  instruction in terms of STM as well and measure the performance
 difference
  this makes for the final data structure. After all, STM is a lot more
  compositional than CAS, so I'd like to know whether the loss of
  expressiveness is worth the gain in performance.
 
 
  There's one annoying hitch with doing apples-to-apples comparisons here.
 
  The problem is that CAS falls short of being a hardware-accelerated
 version
  of a simple transaction (read TVar, (==) against expected value,
  conditionally update TVar), because CAS is based on pointer equality
 rather
  than true equality.  (eq? rather than equal? for any Schemers out there.)
 
  For this reason our Fake version of CAS -- for older GHCs and for
  performance comparison -- has to use reallyUnsafePointerEquality#:
 
 
 
 http://hackage.haskell.org/packages/archive/IORefCAS/0.2/doc/html/Data-CAS-Internal-Fake.html
 
  But we do provide a CASable type class in that package which is
 precisely
  for comparing the performance of:
 
- Hardware CAS
- Fake CAS -- atomicModifyIORef + ptrEquality
- Foreign CAS -- gcc intrinsic + function call

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures

2012-03-29 Thread Heinrich Apfelmus

Florian Hartwig wrote:

Hi everyone,
I would like to do the GSoC project outlined in
http://hackage.haskell.org/trac/summer-of-code/ticket/1608

One of Haskell's great strengths is its support for all kinds of concurrent
and parallel programmming, so I think that the Haskell ecosystem would
benefit from having a wider range of efficient concurrent data structures.
Also, more selfishly, I remember reading the article in CACM last summer and
thinking that the whole idea of updating data structures directly using
atomic compare-and-swap was really cool, so I would love to have an excuse
to play around with it.

Some (initial) ideas for lock-free data structures worth implementing:
1. A lock-free priority queue, e.g. using the approach based on skip-lists
explained in [2]
2. Stacks, see [1] and [3]
3. Hash tables [4]
but if anyone has other suggestions, I would obviously happy to hear them.


Since I don't know much about concurrency, I have a simple question: 
what is the difference between atomic compare-and-swap and software 
transactional memory? Both are lock-free? Is it possible to implement 
every data structure based on CAS in terms of STM? What are the 
drawbacks? The other way round?



Best regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures

2012-03-29 Thread Eugene Kirpichov
Though of course you can implement CAS in terms of STM, CAS is much
more low-level and will probably be many times (though not
asymptotically) faster.

On Thu, Mar 29, 2012 at 12:01 PM, Heinrich Apfelmus
apfel...@quantentunnel.de wrote:
 Florian Hartwig wrote:

 Hi everyone,
 I would like to do the GSoC project outlined in
 http://hackage.haskell.org/trac/summer-of-code/ticket/1608

 One of Haskell's great strengths is its support for all kinds of
 concurrent
 and parallel programmming, so I think that the Haskell ecosystem would
 benefit from having a wider range of efficient concurrent data structures.
 Also, more selfishly, I remember reading the article in CACM last summer
 and
 thinking that the whole idea of updating data structures directly using
 atomic compare-and-swap was really cool, so I would love to have an excuse
 to play around with it.

 Some (initial) ideas for lock-free data structures worth implementing:
 1. A lock-free priority queue, e.g. using the approach based on skip-lists
 explained in [2]
 2. Stacks, see [1] and [3]
 3. Hash tables [4]
 but if anyone has other suggestions, I would obviously happy to hear them.


 Since I don't know much about concurrency, I have a simple question: what is
 the difference between atomic compare-and-swap and software transactional
 memory? Both are lock-free? Is it possible to implement every data structure
 based on CAS in terms of STM? What are the drawbacks? The other way round?


 Best regards,
 Heinrich Apfelmus

 --
 http://apfelmus.nfshost.com


 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe



-- 
Eugene Kirpichov
Principal Engineer, Mirantis Inc. http://www.mirantis.com/
Editor, http://fprog.ru/

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures

2012-03-29 Thread Florian Hartwig
On 29 March 2012 05:57, Ryan Newton rrnew...@gmail.com wrote:
 I just read in your proposal that you started looking into the casMutArray#
 issue as well.  How far have you gotten with that?  Do you want to work on
 this together a bit?

 I've got an implementation of a casArray# primop that passes a basic test,
 but I'm not sure if the GC write barrier is correct:


 https://github.com/rrnewton/ghc/commit/18ed460be111b47a759486677960093d71eef386

The write barrier is what I got stuck on as well. I don't know much
about Haskell's GC. I'm going to read up on it, but my Master's
project is due in in 3 weeks, so it's difficult to find the time right
now. I'd be happy to work with you on this, but it'll probably have to
wait until then.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures

2012-03-29 Thread Florian Hartwig
On 29 March 2012 at 12:01 PM, Heinrich Apfelmus
apfelmus at quantentunnel.de wrote:
 Since I don't know much about concurrency, I have a simple question: what is
 the difference between atomic compare-and-swap and software transactional
 memory? Both are lock-free?

Well, terminology-wise it would probably be more accurate to say that
you can implement lock-free algorithms using both (i.e. it's not
really CAS and STM that are lock-free, but the algorithms implemented
using them). But maybe this is a pedantic distinction.

As to the difference between the two: CAS is just a machine
instruction that takes an old expected value, a new value and an
address and updates the memory pointed to by the address iff it
contains the old/expected value. All this happens atomically, so you
avoid the potential conflicts between threads concurrently reading
from and writing to the same address.

STM is much higher-level. It's an abstraction that allows the
programmer to treat any sequence of reads and writes as a single
atomic operation. This is, as the name says, implemented in software.

So while the two are related, CAS is a machine primitive that works
for a single operation and on a single word while STM is a software
abstraction that isolates sequences of operations on multiple memory
locations from each other.

 Is it possible to implement every data structure
 based on CAS in terms of STM? What are the drawbacks? The other way round?

I think so. Atomically reading and writing a single memory location
(which CAS does) is just a very simple transaction. But using a CAS
instruction should be more efficient, since STM has to maintain a
transaction log and commit transactions, which creates some overhead.

I hope this makes some sense.
Cheers,
Florian

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures

2012-03-29 Thread Heinrich Apfelmus

Florian Hartwig wrote:

Heinrich Apfelmus wrote:

So while the two are related, CAS is a machine primitive that works
for a single operation and on a single word while STM is a software
abstraction that isolates sequences of operations on multiple memory
locations from each other.


Is it possible to implement every data structure
based on CAS in terms of STM? What are the drawbacks? The other way round?


I think so. Atomically reading and writing a single memory location
(which CAS does) is just a very simple transaction. But using a CAS
instruction should be more efficient, since STM has to maintain a
transaction log and commit transactions, which creates some overhead.


Ah, I see. In that case, it may be worthwhile to implement the CAS 
instruction in terms of STM as well and measure the performance 
difference this makes for the final data structure. After all, STM is a 
lot more compositional than CAS, so I'd like to know whether the loss of 
expressiveness is worth the gain in performance.



Best regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures

2012-03-29 Thread Gregory Collins
On Thu, Mar 29, 2012 at 6:57 AM, Ryan Newton rrnew...@gmail.com wrote:

 The ByteArray versions will be more annoying, requiring more variations,
 but they are also less essential, because the user can always use
 ForeignPtr and bits-atomic in this case, and I believe for our concurrent
 data structures we want to store arbitrary pointers (hence casArray#).


This is true, although using bits-atomic does a function call (i.e the
calls are not inlined), which would be pretty bad for performance.

G
-- 
Gregory Collins g...@gregorycollins.net
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures

2012-03-29 Thread Ryan Newton
On Thu, Mar 29, 2012 at 9:01 AM, Gregory Collins g...@gregorycollins.netwrote:

 On Thu, Mar 29, 2012 at 6:57 AM, Ryan Newton rrnew...@gmail.com wrote:

 The ByteArray versions will be more annoying, requiring more variations,
 but they are also less essential, because the user can always use
 ForeignPtr and bits-atomic in this case, and I believe for our concurrent
 data structures we want to store arbitrary pointers (hence casArray#).


 This is true, although using bits-atomic does a function call (i.e the
 calls are not inlined), which would be pretty bad for performance.


Yes, absolutely... I'd like to add the byte array versions.  Actually,
those don't have GC write barriers so they should be much easier to get
right!
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures

2012-03-29 Thread Ryan Newton

 I think so. Atomically reading and writing a single memory location
 (which CAS does) is just a very simple transaction. But using a CAS
 instruction should be more efficient, since STM has to maintain a
 transaction log and commit transactions, which creates some overhead.


 Ah, I see. In that case, it may be worthwhile to implement the CAS
 instruction in terms of STM as well and measure the performance difference
 this makes for the final data structure. After all, STM is a lot more
 compositional than CAS, so I'd like to know whether the loss of
 expressiveness is worth the gain in performance.


There's one annoying hitch with doing apples-to-apples comparisons here.

The problem is that CAS falls short of being a hardware-accelerated version
of a simple transaction (read TVar, (==) against expected value,
conditionally update TVar), because CAS is based on pointer equality rather
than true equality.  (eq? rather than equal? for any Schemers out there.)

For this reason our Fake version of CAS -- for older GHCs and for
performance comparison -- has to use reallyUnsafePointerEquality#:


http://hackage.haskell.org/packages/archive/IORefCAS/0.2/doc/html/Data-CAS-Internal-Fake.html

But we do provide a CASable type class in that package which is precisely
for comparing the performance of:

   - Hardware CAS
   - Fake CAS -- atomicModifyIORef + ptrEquality
   - Foreign CAS -- gcc intrinsic + function call

Cheers,
   -Ryan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures

2012-03-29 Thread Tim Harris (RESEARCH)
Hi,

Somewhat related to this...

Next month we have a paper coming up at EuroSys about a middle-ground between 
using STM and programming directly with CAS:

http://research.microsoft.com/en-us/um/people/tharris/papers/2012-eurosys.pdf

This was done in the context of shared memory data structures in C/C++, rather 
than Haskell. 

It might be interesting to see how the results carry over to Haskell, e.g. 
adding short forms of specialized transactions that interact correctly with 
normal STM-Haskell transactions.

In the paper we have some examples of using short specialized transactions for 
the fast paths in data structures, while keeping the full STM available as a 
fall-back for expressing the cases that cannot be implemented using short 
transactions.

 --Tim




-Original Message-
From: haskell-cafe-boun...@haskell.org 
[mailto:haskell-cafe-boun...@haskell.org] On Behalf Of Heinrich Apfelmus
Sent: 29 March 2012 13:30
To: haskell-cafe@haskell.org
Subject: Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures

Florian Hartwig wrote:
 Heinrich Apfelmus wrote:
 
 So while the two are related, CAS is a machine primitive that works 
 for a single operation and on a single word while STM is a software 
 abstraction that isolates sequences of operations on multiple memory 
 locations from each other.
 
 Is it possible to implement every data structure based on CAS in 
 terms of STM? What are the drawbacks? The other way round?
 
 I think so. Atomically reading and writing a single memory location 
 (which CAS does) is just a very simple transaction. But using a CAS 
 instruction should be more efficient, since STM has to maintain a 
 transaction log and commit transactions, which creates some overhead.

Ah, I see. In that case, it may be worthwhile to implement the CAS instruction 
in terms of STM as well and measure the performance difference this makes for 
the final data structure. After all, STM is a lot more compositional than CAS, 
so I'd like to know whether the loss of expressiveness is worth the gain in 
performance.


Best regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Google Summer of Code - Lock-free data

2012-03-29 Thread John Lato
Slightly related: I think it would be interesting to compare a
Disruptor-based concurrency communications mechanism and compare it to
e.g. TChans

1.  Disruptor - http://code.google.com/p/disruptor/

 From: Ryan Newton rrnew...@gmail.com

 I think so. Atomically reading and writing a single memory location
 (which CAS does) is just a very simple transaction. But using a CAS
 instruction should be more efficient, since STM has to maintain a
 transaction log and commit transactions, which creates some overhead.


 Ah, I see. In that case, it may be worthwhile to implement the CAS
 instruction in terms of STM as well and measure the performance difference
 this makes for the final data structure. After all, STM is a lot more
 compositional than CAS, so I'd like to know whether the loss of
 expressiveness is worth the gain in performance.


 There's one annoying hitch with doing apples-to-apples comparisons here.

 The problem is that CAS falls short of being a hardware-accelerated version
 of a simple transaction (read TVar, (==) against expected value,
 conditionally update TVar), because CAS is based on pointer equality rather
 than true equality.  (eq? rather than equal? for any Schemers out there.)

 For this reason our Fake version of CAS -- for older GHCs and for
 performance comparison -- has to use reallyUnsafePointerEquality#:


 http://hackage.haskell.org/packages/archive/IORefCAS/0.2/doc/html/Data-CAS-Internal-Fake.html

 But we do provide a CASable type class in that package which is precisely
 for comparing the performance of:

   - Hardware CAS
   - Fake CAS -- atomicModifyIORef + ptrEquality
   - Foreign CAS -- gcc intrinsic + function call

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures

2012-03-28 Thread Florian Hartwig
Hi again,
I just submitted my proposal on the GSoC website. You can find it here:
http://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/florianhartwig/1
I would be very grateful if someone could read over it and tell me if
it makes sense and if/how it could be improved.
Cheers,
Florian

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures

2012-03-28 Thread Ryan Newton
 GHC already has a CAS primitive on MutVar#, it just needs to be extended
 to MutableArray# and MutableByteArray# (at all of the bit-widths the CAS
 instruction would support, e.g. see readWordXxArray# in
 http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-prim-0.2.0.0/GHC-Prim.html).
 The implementation should be almost identical to casMutVar#.
 In particular I think you need:
 casMutArray# :: MutableArray# s a - Int# - a - a - State# s - (#
 State# s, Int#, a #)
 casWord16MutByteArray :: MutableByteArray# s - Int# - Word# - Word#
 - State# s - (# State# s, Int#, Word#)


FYI, I started working on adding these.  I'm hoping to have it working in
GHC HEAD for any students who need to use it.  To my knowledge the only two
patches required to implement casMutVar# were these two (plus the
preexisting cas() definition in SMP.h):


https://github.com/ghc/ghc/commit/521b792553bacbdb0eec138b150ab0626ea6f36b

https://github.com/ghc/ghc/commit/606f6e1cfcb2e79abaadcc5ed643817d2a4585d8

The latter is a bugfix to the former.

Florian, your proposal looks good to me (
http://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/florianhartwig/1).
 You touched on the major things we need to know.

I just read in your proposal that you started looking into the casMutArray#
issue as well.  How far have you gotten with that?  Do you want to work on
this together a bit?

I've got an implementation of a casArray# primop that passes a basic test,
but I'm not sure if the GC write barrier is correct:


https://github.com/rrnewton/ghc/commit/18ed460be111b47a759486677960093d71eef386

The ByteArray versions will be more annoying, requiring more variations,
but they are also less essential, because the user can always use
ForeignPtr and bits-atomic in this case, and I believe for our concurrent
data structures we want to store arbitrary pointers (hence casArray#).

Cheers,
  -Ryan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures

2012-03-20 Thread Gregory Collins
On Tue, Mar 20, 2012 at 2:53 AM, Florian Hartwig 
florian.j.hart...@gmail.com wrote:

 On 19 March 2012 11:46, Ryan Newton rrnew...@gmail.com wrote:
  As Adam Foltzer mentioned in the trac ticket a really good structure
 would
  be the concurrent bags from this paper:
 
 
 http://www.cse.chalmers.se/~tsigas/papers/Lock%20Free%20Bag%20SPAA11.pdf


Looks like they rely on thread-local storage, which would have to be worked
around in Haskell somehow.


 I've just read the paper and they look both useful and interesting to
 implement. Adam mentioned that GHC would need to be extended first,
 and I can't really judge how much work that would be. Can anyone chime
 in on how feasible that is?


GHC already has a CAS primitive on MutVar#, it just needs to be extended to
MutableArray# and MutableByteArray# (at all of the bit-widths the CAS
instruction would support, e.g. see readWordXxArray# in
http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-prim-0.2.0.0/GHC-Prim.html).
The implementation should be almost identical to casMutVar#.

In particular I think you need:

casMutArray# :: MutableArray# s a - Int# - a - a - State# s - (#
State# s, Int#, a #)
casWord16MutByteArray :: MutableByteArray# s - Int# - Word# - Word#
- State# s - (# State# s, Int#, Word#)

and equivalents for Word32. Word64, Int16, Int32, Int64.

G
-- 
Gregory Collins g...@gregorycollins.net
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures

2012-03-19 Thread Gregory Collins
A lock-free concurrent queue alone would be worth a summer project IMO.

G

On Mon, Mar 19, 2012 at 3:25 AM, Florian Hartwig 
florian.j.hart...@gmail.com wrote:

 On 19 March 2012 00:59, Chris Smith cdsm...@gmail.com wrote:
  On Mar 18, 2012 6:39 PM, Florian Hartwig florian.j.hart...@gmail.com
  wrote:
  GSoC stretches over 13 weeks. I would estimate that implementing a data
  structure, writing tests, benchmarks, documentation etc. should not take
  more
  than 3 weeks (it is supposed to be full-time work, after all), which
 means
  that I could implement 4 of them in the time available and still have
 some
  slack.
 
  Don't underestimate the time required for performance tuning, and be
 careful
  to leave yourself learning time, unless you have already extensively used
  ThreadScope, read GHC Core, and worked with low-level strictness,
 unpacking,
  possibly even rewrite rules.  I suspect that the measurable performance
  benefit from lockless data structures might be tricky to tease out of the
  noise created by unintentional strictness or unboxing issues.  And we'd
 be
  much happier with one or two really production quality implementations
 than
  even six or seven at a student project level.
 
  --
  Chris Smith

 Thank you, Hofstadter's law definitely rears its head in many of my
 projects.
 I do have some experience with ThreadScope and strictness issues, but
 you I agree that I'm probably underestimating the time I need to
 learn.
 I also agree that my focus would be on quality rather than quantity. I
 quite like the modularity of this project, because it minimises the
 chance of having a lot of half-finished but useless code at the end of
 summer.

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe




-- 
Gregory Collins g...@gregorycollins.net
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures

2012-03-19 Thread Florian Hartwig
On 19 March 2012 09:56, Gregory Collins g...@gregorycollins.net wrote:
 A lock-free concurrent queue alone would be worth a summer project IMO.

 G

Ryan Newton is already doing that
(https://github.com/rrnewton/haskell-lockfree-queue).

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures

2012-03-19 Thread Ryan Newton
The MichaelScott lockefree queues in that repository pass tests and should
work.  Additional stress testing and feedback would be appreciated.  There
are some other queues in the literature that might be worth implementing
but I think other data structures are higher priority.

As Adam Foltzer mentioned in the trac ticket a really good structure would
be the concurrent bags from this paper:

   http://www.cse.chalmers.se/~tsigas/papers/Lock%20Free%20Bag%20SPAA11.pdf

We separately did a C implementation of those and they performed really
well in our bake-off of producer consumer data structures (vs. TBB queues,
and many other implementations).  By the way, we can share the code for
this little bake-off as a performance baseline for the Haskell versions.

I'm less familiar with the literature on concurrent hash tables.  TBB's
have been a little bit underwhelming in performance.  But it's definitely
an important structure.   Ditto for priority queues.

In any case, I welcome your interest in the topic, Florian!

Cheers,
   -Ryan



On Mon, Mar 19, 2012 at 7:33 AM, Florian Hartwig 
florian.j.hart...@gmail.com wrote:

 On 19 March 2012 09:56, Gregory Collins g...@gregorycollins.net wrote:
  A lock-free concurrent queue alone would be worth a summer project IMO.
 
  G

 Ryan Newton is already doing that
 (https://github.com/rrnewton/haskell-lockfree-queue).

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures

2012-03-19 Thread Florian Hartwig
On 19 March 2012 11:46, Ryan Newton rrnew...@gmail.com wrote:
 As Adam Foltzer mentioned in the trac ticket a really good structure would
 be the concurrent bags from this paper:

    http://www.cse.chalmers.se/~tsigas/papers/Lock%20Free%20Bag%20SPAA11.pdf

 We separately did a C implementation of those and they performed really well
 in our bake-off of producer consumer data structures (vs. TBB queues, and
 many other implementations).

I've just read the paper and they look both useful and interesting to
implement. Adam mentioned that GHC would need to be extended first,
and I can't really judge how much work that would be. Can anyone chime
in on how feasible that is?

 I'm less familiar with the literature on concurrent hash tables.  TBB's have
 been a little bit underwhelming in performance.  But it's definitely an
 important structure.   Ditto for priority queues.

The data structures I mentioned were just my initial ideas, I'm open
to any other suggestions. In my (limited) experience with parallel
programming, queues and priority queues tend to come up quite a bit,
but I'd be happy to get input from anyone with more experience
regarding what would be useful/important.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Google Summer of Code - Lock-free data structures

2012-03-18 Thread Florian Hartwig
Hi everyone,
I would like to do the GSoC project outlined in
http://hackage.haskell.org/trac/summer-of-code/ticket/1608

One of Haskell's great strengths is its support for all kinds of concurrent
and parallel programmming, so I think that the Haskell ecosystem would
benefit from having a wider range of efficient concurrent data structures.
Also, more selfishly, I remember reading the article in CACM last summer and
thinking that the whole idea of updating data structures directly using
atomic compare-and-swap was really cool, so I would love to have an excuse
to play around with it.

Some (initial) ideas for lock-free data structures worth implementing:
1. A lock-free priority queue, e.g. using the approach based on skip-lists
explained in [2]
2. Stacks, see [1] and [3]
3. Hash tables [4]
but if anyone has other suggestions, I would obviously happy to hear them.

GSoC stretches over 13 weeks. I would estimate that implementing a data
structure, writing tests, benchmarks, documentation etc. should not take more
than 3 weeks (it is supposed to be full-time work, after all), which means
that I could implement 4 of them in the time available and still have some
slack.

About me: My name is Florian Hartwig, I'm a fifth year (Master's) student in
Computing Science at the University of Glasgow. I've been using Haskell for a
bit more than two years now (both for university courses and my recreational
programming) and I'm currently using it for my Master's project, so I won't
have to spend any time learning the Haskell basics.
I don't have any other plans for the summer that might conflict with the
project.

I'd be thankful for any thoughts, questions and suggestions.
Cheers,
Florian

[1] 
http://cacm.acm.org/magazines/2011/3/105308-data-structures-in-the-multicore-age/fulltext
[2] http://www.sciencedirect.com/science/article/pii/S0743731504002333
[3] http://dl.acm.org/citation.cfm?id=1007944
[4] http://dl.acm.org/citation.cfm?id=564881

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures

2012-03-18 Thread Chris Smith
On Mar 18, 2012 6:39 PM, Florian Hartwig florian.j.hart...@gmail.com
wrote:
 GSoC stretches over 13 weeks. I would estimate that implementing a data
 structure, writing tests, benchmarks, documentation etc. should not take
more
 than 3 weeks (it is supposed to be full-time work, after all), which means
 that I could implement 4 of them in the time available and still have some
 slack.

Don't underestimate the time required for performance tuning, and be
careful to leave yourself learning time, unless you have already
extensively used ThreadScope, read GHC Core, and worked with low-level
strictness, unpacking, possibly even rewrite rules.  I suspect that the
measurable performance benefit from lockless data structures might be
tricky to tease out of the noise created by unintentional strictness or
unboxing issues.  And we'd be much happier with one or two really
production quality implementations than even six or seven at a student
project level.

-- 
Chris Smith
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures

2012-03-18 Thread Florian Hartwig
On 19 March 2012 00:59, Chris Smith cdsm...@gmail.com wrote:
 On Mar 18, 2012 6:39 PM, Florian Hartwig florian.j.hart...@gmail.com
 wrote:
 GSoC stretches over 13 weeks. I would estimate that implementing a data
 structure, writing tests, benchmarks, documentation etc. should not take
 more
 than 3 weeks (it is supposed to be full-time work, after all), which means
 that I could implement 4 of them in the time available and still have some
 slack.

 Don't underestimate the time required for performance tuning, and be careful
 to leave yourself learning time, unless you have already extensively used
 ThreadScope, read GHC Core, and worked with low-level strictness, unpacking,
 possibly even rewrite rules.  I suspect that the measurable performance
 benefit from lockless data structures might be tricky to tease out of the
 noise created by unintentional strictness or unboxing issues.  And we'd be
 much happier with one or two really production quality implementations than
 even six or seven at a student project level.

 --
 Chris Smith

Thank you, Hofstadter's law definitely rears its head in many of my projects.
I do have some experience with ThreadScope and strictness issues, but
you I agree that I'm probably underestimating the time I need to
learn.
I also agree that my focus would be on quality rather than quantity. I
quite like the modularity of this project, because it minimises the
chance of having a lot of half-finished but useless code at the end of
summer.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe