Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures
Ben wrote: perhaps it is too late to suggest things for GSOC -- but stephen tetley on a different thread pointed at aaron turon's work, which there's a very interesting new concurrency framework he calls reagents which seems to give the best of all worlds : it is declarative and compositional like STM, but gives performance akin to hand-coded lock-free data structures. he seems to have straddled the duality of isolation vs message-passing nicely, and can subsume things like actors and the join calculus. http://www.ccs.neu.edu/home/turon/reagents.pdf he has a BSD licensed library in scala at https://github.com/aturon/ChemistrySet if someone doesn't want to pick this up for GSOC i might have a hand at implementing it myself. That looks great! While I didn't take the time to understand the concurrency model in detail, the overall idea is to use arrows that can be run atomically runAtomically :: Reagent a b - (a - IO b) This is very similar to STM: combining computations within the monad/arrow is atomic while combining computations outside the monad/arrow can interleave them. runAtomically (f . g) -- atomic runAtomically f . runAtomically g -- interleaving Actually, it turns out that the Reagent arrow is also a monad, but the author seems to claim that the static arrow style enables certain optimizations. I haven't checked his model in detail to see whether this is really the case and how exactly it differs from STM, but we know that situations like this happen for parser combinators. Maybe it's enough to recast reagents as an applicative functor? To summarize: the way I understand it is that it's apparently possible to improve the STM monad by turning it into an arrow. (I refer to STM in a very liberal sense here: whether memory is transactional or not is unimportant, the only thing that matters is a computation that composes atomically.) Best regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures
perhaps it is too late to suggest things for GSOC -- but stephen tetley on a different thread pointed at aaron turon's work, which there's a very interesting new concurrency framework he calls reagents which seems to give the best of all worlds : it is declarative and compositional like STM, but gives performance akin to hand-coded lock-free data structures. he seems to have straddled the duality of isolation vs message-passing nicely, and can subsume things like actors and the join calculus. http://www.ccs.neu.edu/home/turon/reagents.pdf he has a BSD licensed library in scala at https://github.com/aturon/ChemistrySet if someone doesn't want to pick this up for GSOC i might have a hand at implementing it myself. b On Mar 29, 2012, at 6:46 AM, Tim Harris (RESEARCH) wrote: Hi, Somewhat related to this... Next month we have a paper coming up at EuroSys about a middle-ground between using STM and programming directly with CAS: http://research.microsoft.com/en-us/um/people/tharris/papers/2012-eurosys.pdf This was done in the context of shared memory data structures in C/C++, rather than Haskell. It might be interesting to see how the results carry over to Haskell, e.g. adding short forms of specialized transactions that interact correctly with normal STM-Haskell transactions. In the paper we have some examples of using short specialized transactions for the fast paths in data structures, while keeping the full STM available as a fall-back for expressing the cases that cannot be implemented using short transactions. --Tim -Original Message- From: haskell-cafe-boun...@haskell.org [mailto:haskell-cafe-boun...@haskell.org] On Behalf Of Heinrich Apfelmus Sent: 29 March 2012 13:30 To: haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures Florian Hartwig wrote: Heinrich Apfelmus wrote: So while the two are related, CAS is a machine primitive that works for a single operation and on a single word while STM is a software abstraction that isolates sequences of operations on multiple memory locations from each other. Is it possible to implement every data structure based on CAS in terms of STM? What are the drawbacks? The other way round? I think so. Atomically reading and writing a single memory location (which CAS does) is just a very simple transaction. But using a CAS instruction should be more efficient, since STM has to maintain a transaction log and commit transactions, which creates some overhead. Ah, I see. In that case, it may be worthwhile to implement the CAS instruction in terms of STM as well and measure the performance difference this makes for the final data structure. After all, STM is a lot more compositional than CAS, so I'd like to know whether the loss of expressiveness is worth the gain in performance. Best regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures
Ben midfi...@gmail.com writes: perhaps it is too late to suggest things for GSOC -- but stephen tetley on a different thread pointed at aaron turon's work, which there's a very interesting new concurrency framework he calls reagents which seems to give the best of all worlds : it is declarative and compositional like STM, but gives performance akin to hand-coded lock-free data structures. he seems to have straddled the duality of isolation vs message-passing nicely, and can subsume things like actors and the join calculus. http://www.ccs.neu.edu/home/turon/reagents.pdf he has a BSD licensed library in scala at https://github.com/aturon/ChemistrySet if someone doesn't want to pick this up for GSOC i might have a hand at implementing it myself. Keep use in the loop if you do. I have a very nice application that has been needing a nicer approach to concurrency than IORefs but really can't afford STM. Cheers, - Ben ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures
+1 -- the reagents model is interesting and it would be good to see a Haskell implementation. On Thu, Apr 5, 2012 at 3:05 PM, Ben Gamari bgamari.f...@gmail.com wrote: Ben midfi...@gmail.com writes: perhaps it is too late to suggest things for GSOC -- but stephen tetley on a different thread pointed at aaron turon's work, which there's a very interesting new concurrency framework he calls reagents which seems to give the best of all worlds : it is declarative and compositional like STM, but gives performance akin to hand-coded lock-free data structures. he seems to have straddled the duality of isolation vs message-passing nicely, and can subsume things like actors and the join calculus. http://www.ccs.neu.edu/home/turon/reagents.pdf he has a BSD licensed library in scala at https://github.com/aturon/ChemistrySet if someone doesn't want to pick this up for GSOC i might have a hand at implementing it myself. Keep use in the loop if you do. I have a very nice application that has been needing a nicer approach to concurrency than IORefs but really can't afford STM. Cheers, - Ben ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Google Summer of Code - Lock-free data
On 3/30/12 4:27 AM, Krzysztof Skrzętnicki wrote: You mention benchmarking TChans: one particular problem with TChans and Chans is that they are unbounded. If the producers outpace consumers it soon ends with memory exhaustion. If that's the case, then you should consider TBChan[1] which is a bounded TChan, to solve exactly that problem. (There's also TMChan for closeable channels, and TBMChan for bounded closeable channels, as needed.) [1] http://hackage.haskell.org/package/stm-chans -- Live well, ~wren ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Google Summer of Code - Lock-free data
Hi, I was interested in Distruptor few months ago and started to write a Haskell implementation. Soon I discovered the lack of native CAS operations and abandoned project for a while. I don't really have time to develop it further right now, but the code is available: https://github.com/Tener/disruptor-hs The code as it is now is mostly benchmarking code. I think it is worth trying out. You mention benchmarking TChans: one particular problem with TChans and Chans is that they are unbounded. If the producers outpace consumers it soon ends with memory exhaustion. In the process of writing above code I discovered particularly simple data structure that performs suprisingly well (full implementation: https://github.com/Tener/disruptor-hs/blob/master/Test/ManyQueue.hs) : type Queue a = [MVar a] mkQueue size = cycle `fmap` (replicateM size newEmptyMVar) The throutput is very high, it is bounded and scales well with the number of producer/consumers threads. In the presence of multiple consumers/producers it's not a FIFO queue though, but rather a kind of buffer with funny ordering. Best regards, Krzysztof Skrzętnicki On Fri, Mar 30, 2012 at 00:03, John Lato jwl...@gmail.com wrote: Slightly related: I think it would be interesting to compare a Disruptor-based concurrency communications mechanism and compare it to e.g. TChans 1. Disruptor - http://code.google.com/p/disruptor/ From: Ryan Newton rrnew...@gmail.com I think so. Atomically reading and writing a single memory location (which CAS does) is just a very simple transaction. But using a CAS instruction should be more efficient, since STM has to maintain a transaction log and commit transactions, which creates some overhead. Ah, I see. In that case, it may be worthwhile to implement the CAS instruction in terms of STM as well and measure the performance difference this makes for the final data structure. After all, STM is a lot more compositional than CAS, so I'd like to know whether the loss of expressiveness is worth the gain in performance. There's one annoying hitch with doing apples-to-apples comparisons here. The problem is that CAS falls short of being a hardware-accelerated version of a simple transaction (read TVar, (==) against expected value, conditionally update TVar), because CAS is based on pointer equality rather than true equality. (eq? rather than equal? for any Schemers out there.) For this reason our Fake version of CAS -- for older GHCs and for performance comparison -- has to use reallyUnsafePointerEquality#: http://hackage.haskell.org/packages/archive/IORefCAS/0.2/doc/html/Data-CAS-Internal-Fake.html But we do provide a CASable type class in that package which is precisely for comparing the performance of: - Hardware CAS - Fake CAS -- atomicModifyIORef + ptrEquality - Foreign CAS -- gcc intrinsic + function call ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures
Florian Hartwig wrote: Hi everyone, I would like to do the GSoC project outlined in http://hackage.haskell.org/trac/summer-of-code/ticket/1608 One of Haskell's great strengths is its support for all kinds of concurrent and parallel programmming, so I think that the Haskell ecosystem would benefit from having a wider range of efficient concurrent data structures. Also, more selfishly, I remember reading the article in CACM last summer and thinking that the whole idea of updating data structures directly using atomic compare-and-swap was really cool, so I would love to have an excuse to play around with it. Some (initial) ideas for lock-free data structures worth implementing: 1. A lock-free priority queue, e.g. using the approach based on skip-lists explained in [2] 2. Stacks, see [1] and [3] 3. Hash tables [4] but if anyone has other suggestions, I would obviously happy to hear them. Since I don't know much about concurrency, I have a simple question: what is the difference between atomic compare-and-swap and software transactional memory? Both are lock-free? Is it possible to implement every data structure based on CAS in terms of STM? What are the drawbacks? The other way round? Best regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures
Though of course you can implement CAS in terms of STM, CAS is much more low-level and will probably be many times (though not asymptotically) faster. On Thu, Mar 29, 2012 at 12:01 PM, Heinrich Apfelmus apfel...@quantentunnel.de wrote: Florian Hartwig wrote: Hi everyone, I would like to do the GSoC project outlined in http://hackage.haskell.org/trac/summer-of-code/ticket/1608 One of Haskell's great strengths is its support for all kinds of concurrent and parallel programmming, so I think that the Haskell ecosystem would benefit from having a wider range of efficient concurrent data structures. Also, more selfishly, I remember reading the article in CACM last summer and thinking that the whole idea of updating data structures directly using atomic compare-and-swap was really cool, so I would love to have an excuse to play around with it. Some (initial) ideas for lock-free data structures worth implementing: 1. A lock-free priority queue, e.g. using the approach based on skip-lists explained in [2] 2. Stacks, see [1] and [3] 3. Hash tables [4] but if anyone has other suggestions, I would obviously happy to hear them. Since I don't know much about concurrency, I have a simple question: what is the difference between atomic compare-and-swap and software transactional memory? Both are lock-free? Is it possible to implement every data structure based on CAS in terms of STM? What are the drawbacks? The other way round? Best regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe -- Eugene Kirpichov Principal Engineer, Mirantis Inc. http://www.mirantis.com/ Editor, http://fprog.ru/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures
On 29 March 2012 05:57, Ryan Newton rrnew...@gmail.com wrote: I just read in your proposal that you started looking into the casMutArray# issue as well. How far have you gotten with that? Do you want to work on this together a bit? I've got an implementation of a casArray# primop that passes a basic test, but I'm not sure if the GC write barrier is correct: https://github.com/rrnewton/ghc/commit/18ed460be111b47a759486677960093d71eef386 The write barrier is what I got stuck on as well. I don't know much about Haskell's GC. I'm going to read up on it, but my Master's project is due in in 3 weeks, so it's difficult to find the time right now. I'd be happy to work with you on this, but it'll probably have to wait until then. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures
On 29 March 2012 at 12:01 PM, Heinrich Apfelmus apfelmus at quantentunnel.de wrote: Since I don't know much about concurrency, I have a simple question: what is the difference between atomic compare-and-swap and software transactional memory? Both are lock-free? Well, terminology-wise it would probably be more accurate to say that you can implement lock-free algorithms using both (i.e. it's not really CAS and STM that are lock-free, but the algorithms implemented using them). But maybe this is a pedantic distinction. As to the difference between the two: CAS is just a machine instruction that takes an old expected value, a new value and an address and updates the memory pointed to by the address iff it contains the old/expected value. All this happens atomically, so you avoid the potential conflicts between threads concurrently reading from and writing to the same address. STM is much higher-level. It's an abstraction that allows the programmer to treat any sequence of reads and writes as a single atomic operation. This is, as the name says, implemented in software. So while the two are related, CAS is a machine primitive that works for a single operation and on a single word while STM is a software abstraction that isolates sequences of operations on multiple memory locations from each other. Is it possible to implement every data structure based on CAS in terms of STM? What are the drawbacks? The other way round? I think so. Atomically reading and writing a single memory location (which CAS does) is just a very simple transaction. But using a CAS instruction should be more efficient, since STM has to maintain a transaction log and commit transactions, which creates some overhead. I hope this makes some sense. Cheers, Florian ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures
Florian Hartwig wrote: Heinrich Apfelmus wrote: So while the two are related, CAS is a machine primitive that works for a single operation and on a single word while STM is a software abstraction that isolates sequences of operations on multiple memory locations from each other. Is it possible to implement every data structure based on CAS in terms of STM? What are the drawbacks? The other way round? I think so. Atomically reading and writing a single memory location (which CAS does) is just a very simple transaction. But using a CAS instruction should be more efficient, since STM has to maintain a transaction log and commit transactions, which creates some overhead. Ah, I see. In that case, it may be worthwhile to implement the CAS instruction in terms of STM as well and measure the performance difference this makes for the final data structure. After all, STM is a lot more compositional than CAS, so I'd like to know whether the loss of expressiveness is worth the gain in performance. Best regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures
On Thu, Mar 29, 2012 at 6:57 AM, Ryan Newton rrnew...@gmail.com wrote: The ByteArray versions will be more annoying, requiring more variations, but they are also less essential, because the user can always use ForeignPtr and bits-atomic in this case, and I believe for our concurrent data structures we want to store arbitrary pointers (hence casArray#). This is true, although using bits-atomic does a function call (i.e the calls are not inlined), which would be pretty bad for performance. G -- Gregory Collins g...@gregorycollins.net ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures
On Thu, Mar 29, 2012 at 9:01 AM, Gregory Collins g...@gregorycollins.netwrote: On Thu, Mar 29, 2012 at 6:57 AM, Ryan Newton rrnew...@gmail.com wrote: The ByteArray versions will be more annoying, requiring more variations, but they are also less essential, because the user can always use ForeignPtr and bits-atomic in this case, and I believe for our concurrent data structures we want to store arbitrary pointers (hence casArray#). This is true, although using bits-atomic does a function call (i.e the calls are not inlined), which would be pretty bad for performance. Yes, absolutely... I'd like to add the byte array versions. Actually, those don't have GC write barriers so they should be much easier to get right! ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures
I think so. Atomically reading and writing a single memory location (which CAS does) is just a very simple transaction. But using a CAS instruction should be more efficient, since STM has to maintain a transaction log and commit transactions, which creates some overhead. Ah, I see. In that case, it may be worthwhile to implement the CAS instruction in terms of STM as well and measure the performance difference this makes for the final data structure. After all, STM is a lot more compositional than CAS, so I'd like to know whether the loss of expressiveness is worth the gain in performance. There's one annoying hitch with doing apples-to-apples comparisons here. The problem is that CAS falls short of being a hardware-accelerated version of a simple transaction (read TVar, (==) against expected value, conditionally update TVar), because CAS is based on pointer equality rather than true equality. (eq? rather than equal? for any Schemers out there.) For this reason our Fake version of CAS -- for older GHCs and for performance comparison -- has to use reallyUnsafePointerEquality#: http://hackage.haskell.org/packages/archive/IORefCAS/0.2/doc/html/Data-CAS-Internal-Fake.html But we do provide a CASable type class in that package which is precisely for comparing the performance of: - Hardware CAS - Fake CAS -- atomicModifyIORef + ptrEquality - Foreign CAS -- gcc intrinsic + function call Cheers, -Ryan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures
Hi, Somewhat related to this... Next month we have a paper coming up at EuroSys about a middle-ground between using STM and programming directly with CAS: http://research.microsoft.com/en-us/um/people/tharris/papers/2012-eurosys.pdf This was done in the context of shared memory data structures in C/C++, rather than Haskell. It might be interesting to see how the results carry over to Haskell, e.g. adding short forms of specialized transactions that interact correctly with normal STM-Haskell transactions. In the paper we have some examples of using short specialized transactions for the fast paths in data structures, while keeping the full STM available as a fall-back for expressing the cases that cannot be implemented using short transactions. --Tim -Original Message- From: haskell-cafe-boun...@haskell.org [mailto:haskell-cafe-boun...@haskell.org] On Behalf Of Heinrich Apfelmus Sent: 29 March 2012 13:30 To: haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures Florian Hartwig wrote: Heinrich Apfelmus wrote: So while the two are related, CAS is a machine primitive that works for a single operation and on a single word while STM is a software abstraction that isolates sequences of operations on multiple memory locations from each other. Is it possible to implement every data structure based on CAS in terms of STM? What are the drawbacks? The other way round? I think so. Atomically reading and writing a single memory location (which CAS does) is just a very simple transaction. But using a CAS instruction should be more efficient, since STM has to maintain a transaction log and commit transactions, which creates some overhead. Ah, I see. In that case, it may be worthwhile to implement the CAS instruction in terms of STM as well and measure the performance difference this makes for the final data structure. After all, STM is a lot more compositional than CAS, so I'd like to know whether the loss of expressiveness is worth the gain in performance. Best regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Google Summer of Code - Lock-free data
Slightly related: I think it would be interesting to compare a Disruptor-based concurrency communications mechanism and compare it to e.g. TChans 1. Disruptor - http://code.google.com/p/disruptor/ From: Ryan Newton rrnew...@gmail.com I think so. Atomically reading and writing a single memory location (which CAS does) is just a very simple transaction. But using a CAS instruction should be more efficient, since STM has to maintain a transaction log and commit transactions, which creates some overhead. Ah, I see. In that case, it may be worthwhile to implement the CAS instruction in terms of STM as well and measure the performance difference this makes for the final data structure. After all, STM is a lot more compositional than CAS, so I'd like to know whether the loss of expressiveness is worth the gain in performance. There's one annoying hitch with doing apples-to-apples comparisons here. The problem is that CAS falls short of being a hardware-accelerated version of a simple transaction (read TVar, (==) against expected value, conditionally update TVar), because CAS is based on pointer equality rather than true equality. (eq? rather than equal? for any Schemers out there.) For this reason our Fake version of CAS -- for older GHCs and for performance comparison -- has to use reallyUnsafePointerEquality#: http://hackage.haskell.org/packages/archive/IORefCAS/0.2/doc/html/Data-CAS-Internal-Fake.html But we do provide a CASable type class in that package which is precisely for comparing the performance of: - Hardware CAS - Fake CAS -- atomicModifyIORef + ptrEquality - Foreign CAS -- gcc intrinsic + function call ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures
Hi again, I just submitted my proposal on the GSoC website. You can find it here: http://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/florianhartwig/1 I would be very grateful if someone could read over it and tell me if it makes sense and if/how it could be improved. Cheers, Florian ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures
GHC already has a CAS primitive on MutVar#, it just needs to be extended to MutableArray# and MutableByteArray# (at all of the bit-widths the CAS instruction would support, e.g. see readWordXxArray# in http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-prim-0.2.0.0/GHC-Prim.html). The implementation should be almost identical to casMutVar#. In particular I think you need: casMutArray# :: MutableArray# s a - Int# - a - a - State# s - (# State# s, Int#, a #) casWord16MutByteArray :: MutableByteArray# s - Int# - Word# - Word# - State# s - (# State# s, Int#, Word#) FYI, I started working on adding these. I'm hoping to have it working in GHC HEAD for any students who need to use it. To my knowledge the only two patches required to implement casMutVar# were these two (plus the preexisting cas() definition in SMP.h): https://github.com/ghc/ghc/commit/521b792553bacbdb0eec138b150ab0626ea6f36b https://github.com/ghc/ghc/commit/606f6e1cfcb2e79abaadcc5ed643817d2a4585d8 The latter is a bugfix to the former. Florian, your proposal looks good to me ( http://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/florianhartwig/1). You touched on the major things we need to know. I just read in your proposal that you started looking into the casMutArray# issue as well. How far have you gotten with that? Do you want to work on this together a bit? I've got an implementation of a casArray# primop that passes a basic test, but I'm not sure if the GC write barrier is correct: https://github.com/rrnewton/ghc/commit/18ed460be111b47a759486677960093d71eef386 The ByteArray versions will be more annoying, requiring more variations, but they are also less essential, because the user can always use ForeignPtr and bits-atomic in this case, and I believe for our concurrent data structures we want to store arbitrary pointers (hence casArray#). Cheers, -Ryan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures
On Tue, Mar 20, 2012 at 2:53 AM, Florian Hartwig florian.j.hart...@gmail.com wrote: On 19 March 2012 11:46, Ryan Newton rrnew...@gmail.com wrote: As Adam Foltzer mentioned in the trac ticket a really good structure would be the concurrent bags from this paper: http://www.cse.chalmers.se/~tsigas/papers/Lock%20Free%20Bag%20SPAA11.pdf Looks like they rely on thread-local storage, which would have to be worked around in Haskell somehow. I've just read the paper and they look both useful and interesting to implement. Adam mentioned that GHC would need to be extended first, and I can't really judge how much work that would be. Can anyone chime in on how feasible that is? GHC already has a CAS primitive on MutVar#, it just needs to be extended to MutableArray# and MutableByteArray# (at all of the bit-widths the CAS instruction would support, e.g. see readWordXxArray# in http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-prim-0.2.0.0/GHC-Prim.html). The implementation should be almost identical to casMutVar#. In particular I think you need: casMutArray# :: MutableArray# s a - Int# - a - a - State# s - (# State# s, Int#, a #) casWord16MutByteArray :: MutableByteArray# s - Int# - Word# - Word# - State# s - (# State# s, Int#, Word#) and equivalents for Word32. Word64, Int16, Int32, Int64. G -- Gregory Collins g...@gregorycollins.net ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures
A lock-free concurrent queue alone would be worth a summer project IMO. G On Mon, Mar 19, 2012 at 3:25 AM, Florian Hartwig florian.j.hart...@gmail.com wrote: On 19 March 2012 00:59, Chris Smith cdsm...@gmail.com wrote: On Mar 18, 2012 6:39 PM, Florian Hartwig florian.j.hart...@gmail.com wrote: GSoC stretches over 13 weeks. I would estimate that implementing a data structure, writing tests, benchmarks, documentation etc. should not take more than 3 weeks (it is supposed to be full-time work, after all), which means that I could implement 4 of them in the time available and still have some slack. Don't underestimate the time required for performance tuning, and be careful to leave yourself learning time, unless you have already extensively used ThreadScope, read GHC Core, and worked with low-level strictness, unpacking, possibly even rewrite rules. I suspect that the measurable performance benefit from lockless data structures might be tricky to tease out of the noise created by unintentional strictness or unboxing issues. And we'd be much happier with one or two really production quality implementations than even six or seven at a student project level. -- Chris Smith Thank you, Hofstadter's law definitely rears its head in many of my projects. I do have some experience with ThreadScope and strictness issues, but you I agree that I'm probably underestimating the time I need to learn. I also agree that my focus would be on quality rather than quantity. I quite like the modularity of this project, because it minimises the chance of having a lot of half-finished but useless code at the end of summer. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe -- Gregory Collins g...@gregorycollins.net ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures
On 19 March 2012 09:56, Gregory Collins g...@gregorycollins.net wrote: A lock-free concurrent queue alone would be worth a summer project IMO. G Ryan Newton is already doing that (https://github.com/rrnewton/haskell-lockfree-queue). ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures
The MichaelScott lockefree queues in that repository pass tests and should work. Additional stress testing and feedback would be appreciated. There are some other queues in the literature that might be worth implementing but I think other data structures are higher priority. As Adam Foltzer mentioned in the trac ticket a really good structure would be the concurrent bags from this paper: http://www.cse.chalmers.se/~tsigas/papers/Lock%20Free%20Bag%20SPAA11.pdf We separately did a C implementation of those and they performed really well in our bake-off of producer consumer data structures (vs. TBB queues, and many other implementations). By the way, we can share the code for this little bake-off as a performance baseline for the Haskell versions. I'm less familiar with the literature on concurrent hash tables. TBB's have been a little bit underwhelming in performance. But it's definitely an important structure. Ditto for priority queues. In any case, I welcome your interest in the topic, Florian! Cheers, -Ryan On Mon, Mar 19, 2012 at 7:33 AM, Florian Hartwig florian.j.hart...@gmail.com wrote: On 19 March 2012 09:56, Gregory Collins g...@gregorycollins.net wrote: A lock-free concurrent queue alone would be worth a summer project IMO. G Ryan Newton is already doing that (https://github.com/rrnewton/haskell-lockfree-queue). ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures
On 19 March 2012 11:46, Ryan Newton rrnew...@gmail.com wrote: As Adam Foltzer mentioned in the trac ticket a really good structure would be the concurrent bags from this paper: http://www.cse.chalmers.se/~tsigas/papers/Lock%20Free%20Bag%20SPAA11.pdf We separately did a C implementation of those and they performed really well in our bake-off of producer consumer data structures (vs. TBB queues, and many other implementations). I've just read the paper and they look both useful and interesting to implement. Adam mentioned that GHC would need to be extended first, and I can't really judge how much work that would be. Can anyone chime in on how feasible that is? I'm less familiar with the literature on concurrent hash tables. TBB's have been a little bit underwhelming in performance. But it's definitely an important structure. Ditto for priority queues. The data structures I mentioned were just my initial ideas, I'm open to any other suggestions. In my (limited) experience with parallel programming, queues and priority queues tend to come up quite a bit, but I'd be happy to get input from anyone with more experience regarding what would be useful/important. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Google Summer of Code - Lock-free data structures
Hi everyone, I would like to do the GSoC project outlined in http://hackage.haskell.org/trac/summer-of-code/ticket/1608 One of Haskell's great strengths is its support for all kinds of concurrent and parallel programmming, so I think that the Haskell ecosystem would benefit from having a wider range of efficient concurrent data structures. Also, more selfishly, I remember reading the article in CACM last summer and thinking that the whole idea of updating data structures directly using atomic compare-and-swap was really cool, so I would love to have an excuse to play around with it. Some (initial) ideas for lock-free data structures worth implementing: 1. A lock-free priority queue, e.g. using the approach based on skip-lists explained in [2] 2. Stacks, see [1] and [3] 3. Hash tables [4] but if anyone has other suggestions, I would obviously happy to hear them. GSoC stretches over 13 weeks. I would estimate that implementing a data structure, writing tests, benchmarks, documentation etc. should not take more than 3 weeks (it is supposed to be full-time work, after all), which means that I could implement 4 of them in the time available and still have some slack. About me: My name is Florian Hartwig, I'm a fifth year (Master's) student in Computing Science at the University of Glasgow. I've been using Haskell for a bit more than two years now (both for university courses and my recreational programming) and I'm currently using it for my Master's project, so I won't have to spend any time learning the Haskell basics. I don't have any other plans for the summer that might conflict with the project. I'd be thankful for any thoughts, questions and suggestions. Cheers, Florian [1] http://cacm.acm.org/magazines/2011/3/105308-data-structures-in-the-multicore-age/fulltext [2] http://www.sciencedirect.com/science/article/pii/S0743731504002333 [3] http://dl.acm.org/citation.cfm?id=1007944 [4] http://dl.acm.org/citation.cfm?id=564881 ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures
On Mar 18, 2012 6:39 PM, Florian Hartwig florian.j.hart...@gmail.com wrote: GSoC stretches over 13 weeks. I would estimate that implementing a data structure, writing tests, benchmarks, documentation etc. should not take more than 3 weeks (it is supposed to be full-time work, after all), which means that I could implement 4 of them in the time available and still have some slack. Don't underestimate the time required for performance tuning, and be careful to leave yourself learning time, unless you have already extensively used ThreadScope, read GHC Core, and worked with low-level strictness, unpacking, possibly even rewrite rules. I suspect that the measurable performance benefit from lockless data structures might be tricky to tease out of the noise created by unintentional strictness or unboxing issues. And we'd be much happier with one or two really production quality implementations than even six or seven at a student project level. -- Chris Smith ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Google Summer of Code - Lock-free data structures
On 19 March 2012 00:59, Chris Smith cdsm...@gmail.com wrote: On Mar 18, 2012 6:39 PM, Florian Hartwig florian.j.hart...@gmail.com wrote: GSoC stretches over 13 weeks. I would estimate that implementing a data structure, writing tests, benchmarks, documentation etc. should not take more than 3 weeks (it is supposed to be full-time work, after all), which means that I could implement 4 of them in the time available and still have some slack. Don't underestimate the time required for performance tuning, and be careful to leave yourself learning time, unless you have already extensively used ThreadScope, read GHC Core, and worked with low-level strictness, unpacking, possibly even rewrite rules. I suspect that the measurable performance benefit from lockless data structures might be tricky to tease out of the noise created by unintentional strictness or unboxing issues. And we'd be much happier with one or two really production quality implementations than even six or seven at a student project level. -- Chris Smith Thank you, Hofstadter's law definitely rears its head in many of my projects. I do have some experience with ThreadScope and strictness issues, but you I agree that I'm probably underestimating the time I need to learn. I also agree that my focus would be on quality rather than quantity. I quite like the modularity of this project, because it minimises the chance of having a lot of half-finished but useless code at the end of summer. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe