Re: [Haskell-cafe] Efficient mutable arrays in STM

2011-10-30 Thread Ben Franksen
Benjamin Franksen wrote:
 David Barbour wrote:
 Create an extra TVar Int for every `chunk` in the array (e.g every 256
 elements, tuned to your update patterns). Read-write it (increment it, be
 sure to force evaluation) just before every time you write an element or
 slice it or slice the array element.
 
 Incrementing and forcing evaluation should not be necessary, a  TVar ()
 should be enough. 

I was wrong, though not completely.

 I would be very much surprised if the internal STM
 machinery compares the actual _values_ of what is inside a TVar, I guess
 it just notes that a read or a write happened. 

According to the original STM paper the implementation does an equality 
test, albeit only for pointer equality. So, one should make sure that 
whatever is written to the TVar is a new object. An incremented integer 
would probably be ok, (no need to evaluate it, since the closure is newly 
allocated, thus a new object), a little more on the safe side is a new TVar 
i.e. use TVar (TVar ()).

Cheers
Ben


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Efficient mutable arrays in STM

2011-10-30 Thread David Barbour
On Sun, Oct 30, 2011 at 6:20 PM, Ben Franksen ben.frank...@online.dewrote:

 According to the original STM paper the implementation does an equality
 test, albeit only for pointer equality.


It strikes me as bad form to depend on characteristics of `the
implementation`.



An incremented integer would probably be ok, (no need to evaluate it,

since the closure is newly allocated, thus a new object)


Evaluation would be necessary to avoid a subtle space-leak with laziness
semantics. The size of the closure is potentially linear with the number of
allocations.



A little more on the safe side is a new TVar


That works too.

Regards,

Dave
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Efficient mutable arrays in STM

2011-10-28 Thread Benjamin Franksen
David Barbour wrote:
 Create an extra TVar Int for every `chunk` in the array (e.g every 256
 elements, tuned to your update patterns). Read-write it (increment it, be
 sure to force evaluation) just before every time you write an element or
 slice it or slice the array element.

Incrementing and forcing evaluation should not be necessary, a  TVar ()  
should be enough. I would be very much surprised if the internal STM 
machinery compares the actual _values_ of what is inside a TVar, I guess it 
just notes that a read or a write happened. Anyway, this is just a guess, I 
wonder if these details are documented somewhere...

Cheers
Ben


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Efficient mutable arrays in STM

2011-10-27 Thread Ryan Ingram
On Tue, Oct 25, 2011 at 1:46 PM, Ben Franksen ben.frank...@online.dewrote:

  IME, there are (at least) two possible problems
  here, 1) transactions scale (quadratically, I think) with the number of
  TVars touched,

 Ouch! What would be the reason for that? I thought it would be linear... I
 mean what happens is that the transaction log gets built when the
 transaction runs, which should take a constant time per TVar, and then on
 commit we have to traverse the log, which is again linear in the number of
 TVars...


Your first assumption is incorrect.  Every time you access a TVar it needs
to check in the log if that TVar was already accessed in this transaction,
so that it can get/update the current value.  Right now the log is just a
list, so accessing a TVar takes O(number of TVars already accessed).

Consider this code:

f :: TVar Int - STM ()
f v = do
x - readTVar v
writeTVar v $! (x+1)

g :: Int - TVar Int - STM ()
g n v = mapM_ (\_ - f v) [1..n]

In order for f to get the current value of the TVar, it has to check if the
variable is in the log already; otherwise later calls to f will just see the
original value in memory and not repeatedly increment it.

As to your second assumption, it's been a while since I read the STM source,
so I can't comment there.

  -- ryan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Efficient mutable arrays in STM

2011-10-27 Thread Alberto G. Corona


 The main question is: does the STM transaction actually see that I
 changed
 part of the underlying array, so that the transaction gets re-tried? Or do
 I
 have to implement this manually, and if yes: how?


The transaction does not detect anything inside the unsafeIOtoSTM.  But to
implement this manually is  simple: use retry whenever you need to retry
the local  transaction. if the affected transactions are more than the local
trnasaction, create an special TVar to be inspected by all of them. for
example  doesTheArrayHasChanged :: TVar Bool.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Efficient mutable arrays in STM

2011-10-25 Thread Ben Franksen
I have an application in mind where concurrent access to large arrays (up to 
millions of elements) of mostly small elements (Int or Double) is common. 
Typical access patterns would be chunk-wise, i.e. reads or writes from index 
n up to index m. Imagine stuff like images, scientific data, etc.

All this suggests that Control.Concurrent.STM.TArray, in its current 
implementation, is not appropriate. Quoting the docs:

It is currently implemented as Array ix (TVar e), but it may be replaced by 
a more efficient implementation in the future (the interface will remain the 
same, however).

An array of TVars is certainly *much* too inefficient for what I have in 
mind w.r.t. both memory and cpu time. In fact I had already decided to use 
Data.Vector.Unboxed from the vector package.

I see that Data.Vector.Unboxed.Mutable provides

  slice :: Unbox a = Int - Int - MVector s a - MVector s a
Yield a part of the mutable vector without copying it.

which is almost what I need... Can I use this together with unsafeIOToSTM 
internally inside a library to provide shared transactional access to an 
IOArray? The docs warn that using unsafeIOToSTM is (obviously) highly 
dangerous, but for what I have in mind the listed problems are not an 
issue:

 * running the IO code multiple times is ok
 * aborting is ok, too
 * inconsistent views are ok, too

The main question is: does the STM transaction actually see that I changed 
part of the underlying array, so that the transaction gets re-tried? Or do I 
have to implement this manually, and if yes: how?

Has anyone done something like that before? (If you tried this and found it 
doesn't work, please tell me, it would save me a lot of work repeating the 
effort).

Is someone working on providing a more efficient version of TArray?

Would it help if I said I'd be a happy user of a better TArray? ;-)

If what I sketched above is infeasible (I would not be surprised if it was, 
I haven't yet spent much effort trying it out), what other options do I 
have? Is there an internal API for the STM stuff, i.e. a C header file or 
something which would make it possible to add efficient TArrays?

Or should I use a high-level approach, something like a Data.Sequence.Seq of 
medium sized chunks (TVar (IOVector e))?

Any comments are highly appreciated!

Cheers
Ben


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Efficient mutable arrays in STM

2011-10-25 Thread Antoine Latter
As far as I lnow the function 'unsafeIOToSTM' is not transactional in nature
- IO actions will be performed immediately and are not rolled back, and are
then re-performed on retry.
On Oct 25, 2011 12:49 PM, Ben Franksen ben.frank...@online.de wrote:

 I have an application in mind where concurrent access to large arrays (up
 to
 millions of elements) of mostly small elements (Int or Double) is common.
 Typical access patterns would be chunk-wise, i.e. reads or writes from
 index
 n up to index m. Imagine stuff like images, scientific data, etc.

 All this suggests that Control.Concurrent.STM.TArray, in its current
 implementation, is not appropriate. Quoting the docs:

 It is currently implemented as Array ix (TVar e), but it may be replaced
 by
 a more efficient implementation in the future (the interface will remain
 the
 same, however).

 An array of TVars is certainly *much* too inefficient for what I have in
 mind w.r.t. both memory and cpu time. In fact I had already decided to use
 Data.Vector.Unboxed from the vector package.

 I see that Data.Vector.Unboxed.Mutable provides

  slice :: Unbox a = Int - Int - MVector s a - MVector s a
Yield a part of the mutable vector without copying it.

 which is almost what I need... Can I use this together with unsafeIOToSTM
 internally inside a library to provide shared transactional access to an
 IOArray? The docs warn that using unsafeIOToSTM is (obviously) highly
 dangerous, but for what I have in mind the listed problems are not an
 issue:

  * running the IO code multiple times is ok
  * aborting is ok, too
  * inconsistent views are ok, too

 The main question is: does the STM transaction actually see that I
 changed
 part of the underlying array, so that the transaction gets re-tried? Or do
 I
 have to implement this manually, and if yes: how?

 Has anyone done something like that before? (If you tried this and found it
 doesn't work, please tell me, it would save me a lot of work repeating the
 effort).

 Is someone working on providing a more efficient version of TArray?

 Would it help if I said I'd be a happy user of a better TArray? ;-)

 If what I sketched above is infeasible (I would not be surprised if it was,
 I haven't yet spent much effort trying it out), what other options do I
 have? Is there an internal API for the STM stuff, i.e. a C header file or
 something which would make it possible to add efficient TArrays?

 Or should I use a high-level approach, something like a Data.Sequence.Seq
 of
 medium sized chunks (TVar (IOVector e))?

 Any comments are highly appreciated!

 Cheers
 Ben


 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Efficient mutable arrays in STM

2011-10-25 Thread David Barbour
On Tue, Oct 25, 2011 at 10:47 AM, Ben Franksen ben.frank...@online.dewrote:

 The main question is: does the STM transaction actually see that I
 changed

part of the underlying array, so that the transaction gets re-tried? Or do I
 have to implement this manually, and if yes: how?


Create an extra TVar Int for every `chunk` in the array (e.g every 256
elements, tuned to your update patterns). Read-write it (increment it, be
sure to force evaluation) just before every time you write an element or
slice it or slice the array element. The IO mutable array is then adjusted
unsafely, but there is enough transactional context to restart transactions
that see an inconsistent state. You will have extra read/write and
write/write conflicts relative to a pure STM solution, but only within each
chunk.

A cleaner alternative is to create a `chunked` TArray, i.e. that works with
fixed-width immutable chunks in a spine. This should have good performance
characteristics, and would be a lot safer for general purpose use.

Regards,

Dave
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Efficient mutable arrays in STM

2011-10-25 Thread Ben Franksen
David Barbour wrote:
 On Tue, Oct 25, 2011 at 10:47 AM, Ben Franksen
 ben.frank...@online.dewrote:
 
 The main question is: does the STM transaction actually see that I
 changed
 
 part of the underlying array, so that the transaction gets re-tried? Or do
 I
 have to implement this manually, and if yes: how?

 
 Create an extra TVar Int for every `chunk` in the array (e.g every 256
 elements, tuned to your update patterns). Read-write it (increment it, be
 sure to force evaluation) just before every time you write an element or
 slice it or slice the array element. The IO mutable array is then adjusted
 unsafely, but there is enough transactional context to restart
 transactions that see an inconsistent state. You will have extra
 read/write and write/write conflicts relative to a pure STM solution, but
 only within each chunk.

Your idea is quite similar to what I had in mind, and it encourages me that 
you think it should be possible to do it like that. My idea was to use 
variable-size chunks, based on which slices are currently in use and how 
they overlap, i.e. calculate the maximal non-overlapping index intervals. 
Such a solution would automatically adapt to the usage pattern, but is 
harder to implement efficiently.

 A cleaner alternative is to create a `chunked` TArray, i.e. that works
 with fixed-width immutable chunks in a spine. This should have good
 performance characteristics, and would be a lot safer for general purpose
 use.

This is also an interesting idea, probably much easier to implement, too.

Thanks for the feedback

Cheers
Ben


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Efficient mutable arrays in STM

2011-10-25 Thread Ketil Malde
Ben Franksen ben.frank...@online.de writes:

 An array of TVars is certainly *much* too inefficient for what I have in 
 mind w.r.t. both memory and cpu time. 

You must be a lot more confident than I if you say this without
benchmarking first. :-) IME, there are (at least) two possible problems
here, 1) transactions scale (quadratically, I think) with the number of
TVars touched, so if any transaction touch a large part of the array,
it's going to cost you, and 2) every element of your array will have a
pointer to it, making GC potentially expensive.  Perhaps you can get
around the latter by tuning GC, e.g. +RTS -A100M might help.

 Or should I use a high-level approach, something like a Data.Sequence.Seq of 
 medium sized chunks (TVar (IOVector e))?

I'm not sure exactly what you mean here, but if you're going to touch
contigous segments of the array, why not TArray (Vector e) or similar?

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Efficient mutable arrays in STM

2011-10-25 Thread Bryan O'Sullivan
On Tue, Oct 25, 2011 at 1:24 PM, Ketil Malde ke...@malde.org wrote:

 You must be a lot more confident than I if you say this without
 benchmarking first. :-) IME, there are (at least) two possible problems
 here, 1) transactions scale (quadratically, I think) with the number of
 TVars touched, so if any transaction touch a large part of the array,
 it's going to cost you, [...]


That woud remain true no matter what, but the current quadratic behaviour is
I believe easily enough fixed by switching to a better data structure than a
list.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Efficient mutable arrays in STM

2011-10-25 Thread Ben Franksen
Ketil Malde wrote:

 Ben Franksen ben.frank...@online.de writes:
 
 An array of TVars is certainly *much* too inefficient for what I have in
 mind w.r.t. both memory and cpu time.
 
 You must be a lot more confident than I if you say this without
 benchmarking first. :-) 

Ok, not science, but an informed guess based on what I read about how STM is 
implemented in ghc. Cache locality is one of the main reasons why unboxed 
arrays perform so much better in practice than boxed ones, and TVars are 
most certainly boxed...

 IME, there are (at least) two possible problems
 here, 1) transactions scale (quadratically, I think) with the number of
 TVars touched, 

Ouch! What would be the reason for that? I thought it would be linear... I 
mean what happens is that the transaction log gets built when the 
transaction runs, which should take a constant time per TVar, and then on 
commit we have to traverse the log, which is again linear in the number of 
TVars...

 so if any transaction touch a large part of the array,
 it's going to cost you, and 2) every element of your array will have a
 pointer to it, making GC potentially expensive.  Perhaps you can get
 around the latter by tuning GC, e.g. +RTS -A100M might help.
 
 Or should I use a high-level approach, something like a Data.Sequence.Seq
 of medium sized chunks (TVar (IOVector e))?
 
 I'm not sure exactly what you mean here, but if you're going to touch
 contigous segments of the array, why not TArray (Vector e) or similar?

Yes, that was what David suggested, too. Have to think about it.

Cheers
Ben


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe