Re: [Haskell-cafe] Best bit LIST data structure
On Sun, Oct 9, 2011 at 6:18 AM, Ryan Newton rrnew...@gmail.com wrote: Yep, it is simple. But I prefer to only use well-tested data structure libraries where I can! Here's an example simple implementation (partial -- missing some common functions): module Data.BitList ( BitList , cons, head, tail, empty , pack, unpack, length, drop ) where import Data.Int import Data.Bits import Prelude as P hiding (head,tail,drop,length) import qualified Data.List as L import Test.HUnit data BitList = One {-# UNPACK #-} !Int {-# UNPACK #-} !Int64 | More {-# UNPACK #-} !Int {-# UNPACK #-} !Int64 BitList I suggest data BitTail = Zero | More {-# UNPACK #-} !Int64 BitTail data BitList = Head {-# UNPACK #-} !Int {-# UNPACK #-} !Int64 BitTail empty = Head 0 0 Zero or else just data BitList = Head {-# UNPACK #-} !Int {-# UNPACK #-} !Int64 [Int64] empty = Head 0 0 [] length (Head n _ xs) = n + 64 * List.length xs unpack :: BitList - [Bool] unpack (One 0 _) = [] unpack (One i bv)= (bv `testBit` (i-1)) : unpack (One (i-1) bv) unpack (More 0 _ r) = unpack r unpack (More i bv r) = (bv `testBit` (i-1)) : unpack (More (i-1) bv r) I'd implement as view :: BitList - Maybe (Bool, BitList) view (One 0 _) = Nothing view bl = Just (head bl, tail bl) unpack = unfoldr view drop :: Int - BitList - BitList drop 0 bl = bl drop n bl | n = 64 = case bl of One _ _- error drop: not enough elements in BitList More i _ r - drop (n-i) r drop n bl = case bl of One i bv - One (i-n) bv More i bv r - More (i-n) bv r This is wrong. drop 5 (More 1 0 (One 64 0)) - More (-4) 0 (One 64 0) Fixed version (also gives same behavior as List.drop when n length l) drop :: Int - BitList - BitList drop n (One i bv) | n = i = empty | otherwise = One (i - n) bv drop n (More i bv r) | n = i = drop (n - i) r | otherwise = More (i - n) bv r -- ryan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Best bit LIST data structure
On Sun, Oct 9, 2011 at 12:11 PM, Roman Beslik ber...@ukr.net wrote: Yes, if you do not use high-level concepts and optimize everything by hand, it requires a lot of testing. :) There are probably more constructive, jibe-free ways to frame this suggestion... Regarding testing: my preference for using a preexisting solution is a product of 18 years of programming in Scheme without a large base of shared infrastructure -- I've seen way too much roll your own X leading to trouble. Regarding high-performance data-structures in Haskell: I wish high-level concepts were sufficient for their optimization. But if you look at all the tricks played by, for example, Johan Tibell and Greg Collins in their excellent hashmaps and hashtables libraries, that, alas, seems not to be the case yet. GHC is in a good position to do inlining and specialization (making the world safe for type classes), but it can't add unpack and strictness annotations, nor can it change data representations themselves. For example, to answer Yves question: I fail to understand. Why not just: data BitList b = Nil | BitList Int b (BitList b) ?? That was a data structure unrolling to optimize the memory representation in the common case (64 bits). Starting with: type I = Int64 -- or whatever data BitList = Nil | BL Int I BitList The recursive datatype can be inlined (once): data BitList = Nil | BL Int I (Nil | BL Int I BitList) *-- not real syntax * data BitList = Nil | BL2 Int I Nil | BL3 Int I Int I BitList* -- distribute * data BitList = Nil | BL2 Int I | BL3 Int I Int I BitList *-- prune Nil* This unrolled data structure has two advantages. It can directly represent the common case 64 bits with one object, and it can use half the tail pointers for longer lists. GHC could conceivably transform code automatically to enable this unrolling (but it can't now). However, there are some further observations that really require a human. Because we are using that extra Int to track the bit position inside the I the Nil case is redundant -- BL2 0 0 can represent empty. Further one of the Ints in the BL3 case is always 64 (sizeof I) and needn't be represented. That gives us: data BitList = BL2 Int I | BL3 Int I I BitList *-- prune Nil* Which is pretty much what I used. Actually, I skipped the double wide second case because I was really only worried about simplifying the representation for shorter lists and that would indeed complicate the code. data *(Bits b) =* BitList b FYI, in the bit of code I sent I didn't generalize over the Bits class because it's really an implementation detail what size chunk is used (just like in Lazy ByteStrings). I didn't want to pollute the interface. That said, the code should certainly be CSEd to make the 64/Int64 choice swappable. Best regards, -Ryan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Best bit LIST data structure
I am not aware of such a library, but IMHO this code will be very simple. data Bits b = BitList b = BitList Int {- number of used bits in the next component -} b [b] Write an isomorphism between @BitList b@ and @ListStep (BitList b)@ where data ListStep e rc = Nil | Cons e rc On 07.10.11 17:52, Ryan Newton wrote: Hi Cafe, We are lucky to have a plethora of data structures out there. But it does make choosing one off hackage difficult at times. In this case I'm *not* looking for a O(1) access bit vector (Data.Vector.Unboxed seems to be the choice there), but an efficient representation for a list of bits (cons,head,tail). Let's say that you want to represent tree indices as you walk down a binary tree. [Bool] is a simple choice, you only add to the front of the list (0/1 = Left/Right), sharing the tails. But [Bool] is quite space inefficient. Something like [Int] would allow packing the bits more efficiently. A Lazy ByteString could amortize the space overhead even more... but in both cases there's a tiny bit of work to do in wrapping those structures for per-bit access. That's probably the right thing but I wanted to check to see if there's something else recommended, perhaps more off-the-shelf. What about just using the Data.Bits instance of Integer? Well, presently, the setBit instance for very large integers creates a whole new integer, shifts, and xors: http://haskell.org/ghc/docs/latest/html/libraries/base/src/Data-Bits.html#setBit (I don't know if it's possible to do better. From quick googling GMP seems to use an array of limbs rather than a chunked list, so maybe there's no way to treat large Integers as a list and update only the front...) Advice appreciated! Thanks, -Ryan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Best bit LIST data structure
Yep, it is simple. But I prefer to only use well-tested data structure libraries where I can! Here's an example simple implementation (partial -- missing some common functions): module Data.BitList ( BitList , cons, head, tail, empty , pack, unpack, length, drop ) where import Data.Int import Data.Bits import Prelude as P hiding (head,tail,drop,length) import qualified Data.List as L import Test.HUnit data BitList = One {-# UNPACK #-} !Int {-# UNPACK #-} !Int64 | More {-# UNPACK #-} !Int {-# UNPACK #-} !Int64 BitList instance Show BitList where show bl = BitList ++ show (map (\b - case b of True - '1'; False - '0') (unpack bl)) -- show bl = pack ++ show (unpack bl) empty :: BitList empty = One 0 0 cons :: Bool - BitList - BitList cons True x@(One 64 _ ) = More 1 1 x cons False x@(One 64 _ ) = More 1 0 x cons True x@(More 64 bv _) = More 1 1 x cons False x@(More 64 bv _) = More 1 0 x cons True(One i bv) = One (i+1) (bv `setBit` i) cons False (One i bv) = One (i+1) (bv ) cons True(More i bv r) = More (i+1) (bv `setBit` i) r cons False (More i bv r) = More (i+1) (bv ) r -- TODO: May consider (More 0 _ _) representation to reduce extra -- allocation when size of the BitList is fluctuating back and forth. head :: BitList - Bool head (One 0 _ ) = error tried to take head of an empty BitList head (More 0 _ r) = error BitList: data structure invariant failure! head (One i bv ) = bv `testBit` (i-1) head (More i bv r) = bv `testBit` (i-1) tail :: BitList - BitList tail (One 0 _ ) = error tried to take the tail of an empty BitList tail (One i bv ) = One (i-1) bv tail (More 1 bv r) = r tail (More i bv r) = More (i-1) bv r pack :: [Bool] - BitList pack [] = One 0 0 pack (h:t) = cons h (pack t) unpack :: BitList - [Bool] unpack (One 0 _) = [] unpack (One i bv)= (bv `testBit` (i-1)) : unpack (One (i-1) bv) unpack (More 0 _ r) = unpack r unpack (More i bv r) = (bv `testBit` (i-1)) : unpack (More (i-1) bv r) drop :: Int - BitList - BitList drop 0 bl = bl drop n bl | n = 64 = case bl of One _ _- error drop: not enough elements in BitList More i _ r - drop (n-i) r drop n bl = case bl of One i bv - One (i-n) bv More i bv r - More (i-n) bv r length :: BitList - Int length (One i _) = i length (More i _ r) = i + length r -- TODO: index, take, etc -- TODO: functor instance, etc. -- Testing: t1 = pack (L.concat$ L.replicate 10 [True,False,True]) t2 = L.length $ unpack $ pack $ replicate 1000 True t3 = pack $ replicate 1000 True t4 = drop 500 t3 p3 = L.and (unpack t3) p4 = L.and (unpack t4) t5 = iterate tail t4 !! 250 t5a = length t5 t5b = L.length (unpack t5) tests :: Test tests = TestList [ show t1 ~=? BitList \101101101101101101101101101101\ , t2 ~=? 1000 , t5a ~=? 250 , t5b ~=? 250 , p3 ~=? True , p4 ~=? True ] -- TODO: QuickCheck On Sun, Oct 9, 2011 at 7:50 AM, Roman Beslik ber...@ukr.net wrote: I am not aware of such a library, but IMHO this code will be very simple. data Bits b = BitList b = BitList Int {- number of used bits in the next component -} b [b] Write an isomorphism between @BitList b@ and @ListStep (BitList b)@ where data ListStep e rc = Nil | Cons e rc On 07.10.11 17:52, Ryan Newton wrote: Hi Cafe, We are lucky to have a plethora of data structures out there. But it does make choosing one off hackage difficult at times. In this case I'm *not* looking for a O(1) access bit vector (Data.Vector.Unboxed seems to be the choice there), but an efficient representation for a list of bits (cons,head,tail). Let's say that you want to represent tree indices as you walk down a binary tree. [Bool] is a simple choice, you only add to the front of the list (0/1 = Left/Right), sharing the tails. But [Bool] is quite space inefficient. Something like [Int] would allow packing the bits more efficiently. A Lazy ByteString could amortize the space overhead even more... but in both cases there's a tiny bit of work to do in wrapping those structures for per-bit access. That's probably the right thing but I wanted to check to see if there's something else recommended, perhaps more off-the-shelf. What about just using the Data.Bits instance of Integer? Well, presently, the setBit instance for very large integers creates a whole new integer, shifts, and xors: http://haskell.org/ghc/docs/latest/html/libraries/base/src/Data-Bits.html#setBit (I don't know if it's possible to do better. From quick googling GMP seems to use an array of limbs rather than a chunked list, so maybe there's no way to treat large Integers as a list and update only the front...) Advice appreciated! Thanks, -Ryan ___ Haskell-Cafe mailing
Re: [Haskell-cafe] Best bit LIST data structure
Hi, Am Freitag, den 07.10.2011, 10:52 -0400 schrieb Ryan Newton: What about just using the Data.Bits instance of Integer? Well, presently, the setBit instance for very large integers creates a whole new integer, shifts, and xors: http://haskell.org/ghc/docs/latest/html/libraries/base/src/Data-Bits.html#setBit (I don't know if it's possible to do better. From quick googling GMP seems to use an array of limbs rather than a chunked list, so maybe there's no way to treat large Integers as a list and update only the front...) interesting idea. Should this be considered a bug in ghc? (Not that it cannot represent the result, but that it crashes even out of ghci): $ ghci GHCi, version 7.0.4: http://www.haskell.org/ghc/ :? for help Loading package ghc-prim ... linking ... done. Loading package integer-gmp ... linking ... done. Loading package base ... linking ... done. Prelude :m + Data.Bits Prelude Data.Bits setBit 0 (2^63-1::Int) gmp: overflow in mpz type Abgebrochen Greetings, Joachim -- Joachim nomeata Breitner m...@joachim-breitner.de | nome...@debian.org | GPG: 0x4743206C xmpp: nome...@joachim-breitner.de | http://www.joachim-breitner.de/ signature.asc Description: This is a digitally signed message part ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Best bit LIST data structure
On Sunday 09 October 2011, 15:54:14, Joachim Breitner wrote: Hi, Am Freitag, den 07.10.2011, 10:52 -0400 schrieb Ryan Newton: What about just using the Data.Bits instance of Integer? Well, presently, the setBit instance for very large integers creates a whole new integer, shifts, and xors: http://haskell.org/ghc/docs/latest/html/libraries/base/src/Data-Bits.h tml#setBit (I don't know if it's possible to do better. From quick googling GMP seems to use an array of limbs rather than a chunked list, so maybe there's no way to treat large Integers as a list and update only the front...) interesting idea. Should this be considered a bug in ghc? (Not that it cannot represent the result, but that it crashes even out of ghci): $ ghci GHCi, version 7.0.4: http://www.haskell.org/ghc/ :? for help Loading package ghc-prim ... linking ... done. Loading package integer-gmp ... linking ... done. Loading package base ... linking ... done. Prelude :m + Data.Bits Prelude Data.Bits setBit 0 (2^63-1::Int) gmp: overflow in mpz type Abgebrochen says info gmp: `_mp_size' and `_mp_alloc' are `int', although `mp_size_t' is usually a `long'. This is done to make the fields just 32 bits on some 64 bits systems, thereby saving a few bytes of data space but still providing plenty of range. So it seems to be GMP itself. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Best bit LIST data structure
On 9 October 2011 14:54, Joachim Breitner m...@joachim-breitner.de wrote: Hi, Am Freitag, den 07.10.2011, 10:52 -0400 schrieb Ryan Newton: What about just using the Data.Bits instance of Integer? Well, presently, the setBit instance for very large integers creates a whole new integer, shifts, and xors: http://haskell.org/ghc/docs/latest/html/libraries/base/src/Data-Bits.html#setBit (I don't know if it's possible to do better. From quick googling GMP seems to use an array of limbs rather than a chunked list, so maybe there's no way to treat large Integers as a list and update only the front...) interesting idea. Should this be considered a bug in ghc? (Not that it cannot represent the result, but that it crashes even out of ghci): $ ghci GHCi, version 7.0.4: http://www.haskell.org/ghc/ :? for help Loading package ghc-prim ... linking ... done. Loading package integer-gmp ... linking ... done. Loading package base ... linking ... done. Prelude :m + Data.Bits Prelude Data.Bits setBit 0 (2^63-1::Int) gmp: overflow in mpz type Abgebrochen Yes, that's a bug. GMP shouldn't call abort(), but it should be turned into a Haskell exception. It probably doesn't make much of a difference in practise, but safe could should never crash GHC. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Best bit LIST data structure
Yes, if you do not use high-level concepts and optimize everything by hand, it requires a lot of testing. :) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Best bit LIST data structure
Must it be a list? What about a Bloom Filter? On Sun, Oct 9, 2011 at 9:11 AM, Roman Beslik ber...@ukr.net wrote: Yes, if you do not use high-level concepts and optimize everything by hand, it requires a lot of testing. :) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe -- -- Regards, KC ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Best bit LIST data structure
data *(Bits b) =* BitList b Is deprecated and soon to be removed from the language. I fail to understand. Why not just: data BitList b = Nil | BitList Int b (BitList b) ?? 2011/10/9 Roman Beslik ber...@ukr.net I am not aware of such a library, but IMHO this code will be very simple. data Bits b = BitList b = BitList Int {- number of used bits in the next component -} b [b] Write an isomorphism between @BitList b@ and @ListStep (BitList b)@ where data ListStep e rc = Nil | Cons e rc On 07.10.11 17:52, Ryan Newton wrote: Hi Cafe, We are lucky to have a plethora of data structures out there. But it does make choosing one off hackage difficult at times. In this case I'm *not* looking for a O(1) access bit vector (Data.Vector.Unboxed seems to be the choice there), but an efficient representation for a list of bits (cons,head,tail). Let's say that you want to represent tree indices as you walk down a binary tree. [Bool] is a simple choice, you only add to the front of the list (0/1 = Left/Right), sharing the tails. But [Bool] is quite space inefficient. Something like [Int] would allow packing the bits more efficiently. A Lazy ByteString could amortize the space overhead even more... but in both cases there's a tiny bit of work to do in wrapping those structures for per-bit access. That's probably the right thing but I wanted to check to see if there's something else recommended, perhaps more off-the-shelf. What about just using the Data.Bits instance of Integer? Well, presently, the setBit instance for very large integers creates a whole new integer, shifts, and xors: http://haskell.org/ghc/docs/latest/html/libraries/base/src/Data-Bits.html#setBit (I don't know if it's possible to do better. From quick googling GMP seems to use an array of limbs rather than a chunked list, so maybe there's no way to treat large Integers as a list and update only the front...) Advice appreciated! Thanks, -Ryan ___ Haskell-Cafe mailing listHaskell-Cafe@haskell.orghttp://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Best bit LIST data structure
Hi Cafe, We are lucky to have a plethora of data structures out there. But it does make choosing one off hackage difficult at times. In this case I'm *not* looking for a O(1) access bit vector (Data.Vector.Unboxed seems to be the choice there), but an efficient representation for a list of bits (cons,head,tail). Let's say that you want to represent tree indices as you walk down a binary tree. [Bool] is a simple choice, you only add to the front of the list (0/1 = Left/Right), sharing the tails. But [Bool] is quite space inefficient. Something like [Int] would allow packing the bits more efficiently. A Lazy ByteString could amortize the space overhead even more... but in both cases there's a tiny bit of work to do in wrapping those structures for per-bit access. That's probably the right thing but I wanted to check to see if there's something else recommended, perhaps more off-the-shelf. What about just using the Data.Bits instance of Integer? Well, presently, the setBit instance for very large integers creates a whole new integer, shifts, and xors: http://haskell.org/ghc/docs/latest/html/libraries/base/src/Data-Bits.html#setBit (I don't know if it's possible to do better. From quick googling GMP seems to use an array of limbs rather than a chunked list, so maybe there's no way to treat large Integers as a list and update only the front...) Advice appreciated! Thanks, -Ryan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe