Re: [Haskell-cafe] Best bit LIST data structure

2011-10-11 Thread Ryan Ingram
On Sun, Oct 9, 2011 at 6:18 AM, Ryan Newton rrnew...@gmail.com wrote:


 Yep, it is simple.  But I prefer to only use well-tested data structure
 libraries where I can!  Here's an example simple implementation (partial --
 missing some common functions):


 module Data.BitList
   ( BitList
   , cons, head, tail, empty
   , pack, unpack, length, drop
   )
 where

 import Data.Int
 import Data.Bits
 import Prelude as P hiding (head,tail,drop,length)
 import qualified Data.List as L
 import Test.HUnit

 data BitList = One  {-# UNPACK #-} !Int {-# UNPACK #-} !Int64
  | More {-# UNPACK #-} !Int {-# UNPACK #-} !Int64 BitList


I suggest

data BitTail = Zero | More {-# UNPACK #-} !Int64 BitTail
data BitList = Head {-# UNPACK #-} !Int {-# UNPACK #-} !Int64 BitTail
empty = Head 0 0 Zero

or else just
data BitList = Head {-# UNPACK #-} !Int {-# UNPACK #-} !Int64 [Int64]
empty = Head 0 0 []
length (Head n _ xs) = n + 64 * List.length xs

unpack :: BitList - [Bool]
 unpack (One 0 _) = []
 unpack (One i bv)= (bv `testBit` (i-1)) : unpack (One (i-1) bv)
 unpack (More 0 _ r)  =  unpack r
 unpack (More i bv r) = (bv `testBit` (i-1)) : unpack (More (i-1) bv r)


I'd implement as

view :: BitList - Maybe (Bool, BitList)
view (One 0 _) = Nothing
view bl = Just (head bl, tail bl)

unpack = unfoldr view


 drop :: Int - BitList - BitList
 drop 0 bl   = bl
 drop n bl | n = 64 = case bl of
 One _ _- error drop: not enough elements in BitList
 More i _ r - drop (n-i) r
 drop n bl = case bl of
   One i  bv   - One  (i-n) bv
   More i bv r - More (i-n) bv r


This is wrong.

drop 5 (More 1 0 (One 64 0))
-
More (-4) 0 (One 64 0)

Fixed version (also gives same behavior as List.drop when n  length l)

drop :: Int - BitList - BitList
drop n (One i bv)
   | n = i = empty
   | otherwise = One (i - n) bv
drop n (More i bv r)
   | n = i = drop (n - i) r
   | otherwise = More (i - n) bv r

  -- ryan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Best bit LIST data structure

2011-10-10 Thread Ryan Newton
On Sun, Oct 9, 2011 at 12:11 PM, Roman Beslik ber...@ukr.net wrote:

 Yes, if you do not use high-level concepts and optimize everything by hand,
 it requires a lot of testing. :)


There are probably more constructive, jibe-free ways to frame this
suggestion...

Regarding testing:  my preference for using a preexisting solution is a
product of 18 years of programming in Scheme without a large base of shared
infrastructure -- I've seen way too much roll your own X leading to
trouble.

Regarding high-performance data-structures in Haskell: I wish high-level
concepts were sufficient for their optimization.  But if you look at all the
tricks played by, for example, Johan Tibell and Greg Collins in their
excellent hashmaps and hashtables libraries, that, alas, seems not to be the
case yet.  GHC is in a good position to do inlining and specialization
(making the world safe for type classes), but it can't add unpack and
strictness annotations, nor can it change data representations themselves.

For example, to answer Yves question:

I fail to understand. Why not just:
  data BitList b = Nil | BitList Int b (BitList b)
 ??


That was a data structure unrolling to optimize the memory representation
in the common case (64 bits).  Starting with:

 type I = Int64 -- or whatever
 data BitList = Nil | BL Int I BitList

The recursive datatype can be inlined (once):

 data BitList = Nil | BL  Int I (Nil | BL Int I BitList) *-- not real syntax
*
 data BitList = Nil | BL2 Int I Nil | BL3 Int I Int I BitList* -- distribute
*
 data BitList = Nil | BL2 Int I | BL3 Int I Int I BitList *-- prune Nil*

This unrolled data structure has two advantages.  It can directly represent
the common case 64 bits with one object, and it can use half the tail
pointers for longer lists.  GHC could conceivably transform code
automatically to enable this unrolling (but it can't now).

However, there are some further observations that really require a human.
Because we are using that extra Int to track the bit position inside the I
the Nil case is redundant -- BL2 0 0 can represent empty.  Further one of
the Ints in the BL3 case is always 64 (sizeof I) and needn't be
represented.  That gives us:

 data BitList = BL2 Int I | BL3 Int I I BitList *-- prune Nil*

Which is pretty much what I used.  Actually, I skipped the double wide
second case because I was really only worried about simplifying the
representation for shorter lists and that would indeed complicate the code.

 data *(Bits b) =* BitList b

FYI, in the bit of code I sent I didn't generalize over the Bits class
because it's really an implementation detail what size chunk is used (just
like in Lazy ByteStrings).  I didn't want to pollute the interface.  That
said, the code should certainly be CSEd to make the 64/Int64 choice
swappable.

Best regards,
  -Ryan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Best bit LIST data structure

2011-10-09 Thread Roman Beslik

I am not aware of such a library, but IMHO this code will be very simple.
 data Bits b = BitList b = BitList Int {- number of used bits in the 
next component -} b [b]

Write an isomorphism between @BitList b@ and @ListStep (BitList b)@
where
 data ListStep e rc = Nil | Cons e rc

On 07.10.11 17:52, Ryan Newton wrote:

Hi Cafe,

We are lucky to have a plethora of data structures out there.  But it 
does make choosing one off hackage difficult at times.  In this case 
I'm *not* looking for a O(1) access bit vector (Data.Vector.Unboxed 
seems to be the choice there), but an efficient representation for a 
list of bits (cons,head,tail).


Let's say that you want to represent tree indices as you walk down a 
binary tree.  [Bool] is a simple choice, you only add to the front of 
the list (0/1 = Left/Right), sharing the tails.  But [Bool] is quite 
space inefficient.


Something like [Int] would allow packing the bits more efficiently.  A 
Lazy ByteString could amortize the space overhead even more... but in 
both cases there's a tiny bit of work to do in wrapping those 
structures for per-bit access.  That's probably the right thing but I 
wanted to check to see if there's something else recommended, perhaps 
more off-the-shelf.


What about just using the Data.Bits instance of Integer?  Well, 
presently, the setBit instance for very large integers creates a whole 
new integer, shifts, and xors:
http://haskell.org/ghc/docs/latest/html/libraries/base/src/Data-Bits.html#setBit 

(I don't know if it's possible to do better.  From quick googling GMP 
seems to use an array of limbs rather than a chunked list, so maybe 
there's no way to treat large Integers as a list and update only the 
front...)


Advice appreciated!

Thanks,
  -Ryan


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Best bit LIST data structure

2011-10-09 Thread Ryan Newton
Yep, it is simple.  But I prefer to only use well-tested data structure
libraries where I can!  Here's an example simple implementation (partial --
missing some common functions):


module Data.BitList
  ( BitList
  , cons, head, tail, empty
  , pack, unpack, length, drop
  )
where

import Data.Int
import Data.Bits
import Prelude as P hiding (head,tail,drop,length)
import qualified Data.List as L
import Test.HUnit

data BitList = One  {-# UNPACK #-} !Int {-# UNPACK #-} !Int64
 | More {-# UNPACK #-} !Int {-# UNPACK #-} !Int64 BitList

instance Show BitList where
 show bl = BitList  ++ show (map (\b - case b of True - '1'; False -
'0') (unpack bl))
-- show bl = pack  ++ show (unpack bl)

empty :: BitList
empty = One 0 0

cons :: Bool - BitList - BitList
cons True  x@(One  64 _ )   = More 1 1 x
cons False x@(One  64 _ )   = More 1 0 x
cons True  x@(More 64 bv _) = More 1 1 x
cons False x@(More 64 bv _) = More 1 0 x
cons True(One   i bv)   = One  (i+1) (bv `setBit` i)
cons False   (One   i bv)   = One  (i+1) (bv   )
cons True(More  i bv r) = More (i+1) (bv `setBit` i) r
cons False   (More  i bv r) = More (i+1) (bv   ) r

-- TODO: May consider (More 0 _ _) representation to reduce extra
-- allocation when size of the BitList is fluctuating back and forth.

head :: BitList - Bool
head (One  0 _   ) = error tried to take head of an empty BitList
head (More 0 _  r) = error BitList: data structure invariant failure!
head (One  i bv  ) = bv `testBit` (i-1)
head (More i bv r) = bv `testBit` (i-1)

tail :: BitList - BitList
tail (One  0 _   ) = error tried to take the tail of an empty BitList
tail (One  i bv  ) = One  (i-1) bv
tail (More 1 bv r) = r
tail (More i bv r) = More (i-1) bv r

pack :: [Bool] - BitList
pack  []   = One 0 0
pack (h:t) = cons h (pack t)

unpack :: BitList - [Bool]
unpack (One 0 _) = []
unpack (One i bv)= (bv `testBit` (i-1)) : unpack (One (i-1) bv)
unpack (More 0 _ r)  =  unpack r
unpack (More i bv r) = (bv `testBit` (i-1)) : unpack (More (i-1) bv r)

drop :: Int - BitList - BitList
drop 0 bl   = bl
drop n bl | n = 64 = case bl of
One _ _- error drop: not enough elements in BitList
More i _ r - drop (n-i) r
drop n bl = case bl of
  One i  bv   - One  (i-n) bv
  More i bv r - More (i-n) bv r

length :: BitList - Int
length (One  i _)   = i
length (More i _ r) = i + length r


-- TODO: index, take, etc

-- TODO: functor instance, etc.



-- Testing:

t1 = pack (L.concat$ L.replicate 10 [True,False,True])

t2 = L.length $ unpack $ pack $ replicate 1000 True

t3 = pack $ replicate 1000 True
t4 = drop 500 t3
p3 = L.and (unpack t3)
p4 = L.and (unpack t4)

t5 = iterate tail t4 !! 250
t5a = length t5
t5b = L.length (unpack t5)

tests :: Test
tests =
  TestList
[
  show t1 ~=? BitList \101101101101101101101101101101\
, t2  ~=? 1000
, t5a ~=? 250
, t5b ~=? 250
, p3  ~=? True
, p4  ~=? True
]

-- TODO: QuickCheck



On Sun, Oct 9, 2011 at 7:50 AM, Roman Beslik ber...@ukr.net wrote:

  I am not aware of such a library, but IMHO this code will be very simple.
  data Bits b = BitList b = BitList Int {- number of used bits in the next
 component -} b [b]
 Write an isomorphism between @BitList b@ and @ListStep (BitList b)@
 where
  data ListStep e rc = Nil | Cons e rc


 On 07.10.11 17:52, Ryan Newton wrote:

 Hi Cafe,

 We are lucky to have a plethora of data structures out there.  But it does
 make choosing one off hackage difficult at times.  In this case I'm *not*
 looking for a O(1) access bit vector (Data.Vector.Unboxed seems to be the
 choice there), but an efficient representation for a list of bits
 (cons,head,tail).

 Let's say that you want to represent tree indices as you walk down a binary
 tree.  [Bool] is a simple choice, you only add to the front of the list (0/1
 = Left/Right), sharing the tails.  But [Bool] is quite space inefficient.

 Something like [Int] would allow packing the bits more efficiently.  A Lazy
 ByteString could amortize the space overhead even more... but in both cases
 there's a tiny bit of work to do in wrapping those structures for per-bit
 access.  That's probably the right thing but I wanted to check to see if
 there's something else recommended, perhaps more off-the-shelf.

 What about just using the Data.Bits instance of Integer?  Well, presently,
 the setBit instance for very large integers creates a whole new integer,
 shifts, and xors:

 http://haskell.org/ghc/docs/latest/html/libraries/base/src/Data-Bits.html#setBit
 (I don't know if it's possible to do better.  From quick googling GMP seems
 to use an array of limbs rather than a chunked list, so maybe there's no
 way to treat large Integers as a list and update only the front...)

 Advice appreciated!

 Thanks,
   -Ryan


 ___
 Haskell-Cafe mailing 
 

Re: [Haskell-cafe] Best bit LIST data structure

2011-10-09 Thread Joachim Breitner
Hi,

Am Freitag, den 07.10.2011, 10:52 -0400 schrieb Ryan Newton:
 What about just using the Data.Bits instance of Integer?  Well,
 presently, the setBit instance for very large integers creates a whole
 new integer, shifts, and xors:
 
 http://haskell.org/ghc/docs/latest/html/libraries/base/src/Data-Bits.html#setBit
  
 (I don't know if it's possible to do better.  From quick googling GMP
 seems to use an array of limbs rather than a chunked list, so maybe
 there's no way to treat large Integers as a list and update only the
 front...)

interesting idea. Should this be considered a bug in ghc? (Not that it
cannot represent the result, but that it crashes even out of ghci):

$ ghci
GHCi, version 7.0.4: http://www.haskell.org/ghc/  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude :m + Data.Bits
Prelude Data.Bits setBit 0 (2^63-1::Int)
gmp: overflow in mpz type
Abgebrochen


Greetings,
Joachim
-- 
Joachim nomeata Breitner
  m...@joachim-breitner.de  |  nome...@debian.org  |  GPG: 0x4743206C
  xmpp: nome...@joachim-breitner.de | http://www.joachim-breitner.de/



signature.asc
Description: This is a digitally signed message part
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Best bit LIST data structure

2011-10-09 Thread Daniel Fischer
On Sunday 09 October 2011, 15:54:14, Joachim Breitner wrote:
 Hi,
 
 Am Freitag, den 07.10.2011, 10:52 -0400 schrieb Ryan Newton:
  What about just using the Data.Bits instance of Integer?  Well,
  presently, the setBit instance for very large integers creates a whole
  new integer, shifts, and xors:
  
  http://haskell.org/ghc/docs/latest/html/libraries/base/src/Data-Bits.h
  tml#setBit (I don't know if it's possible to do better.  From quick
  googling GMP seems to use an array of limbs rather than a chunked
  list, so maybe there's no way to treat large Integers as a list and
  update only the front...)
 
 interesting idea. Should this be considered a bug in ghc? (Not that it
 cannot represent the result, but that it crashes even out of ghci):
 
 $ ghci
 GHCi, version 7.0.4: http://www.haskell.org/ghc/  :? for help
 Loading package ghc-prim ... linking ... done.
 Loading package integer-gmp ... linking ... done.
 Loading package base ... linking ... done.
 Prelude :m + Data.Bits
 Prelude Data.Bits setBit 0 (2^63-1::Int)
 gmp: overflow in mpz type
 Abgebrochen

says info gmp:

  `_mp_size' and `_mp_alloc' are `int', although `mp_size_t' is
usually a `long'.  This is done to make the fields just 32 bits on some
64 bits systems, thereby saving a few bytes of data space but still
providing plenty of range.

So it seems to be GMP itself.


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Best bit LIST data structure

2011-10-09 Thread Thomas Schilling
On 9 October 2011 14:54, Joachim Breitner m...@joachim-breitner.de wrote:
 Hi,

 Am Freitag, den 07.10.2011, 10:52 -0400 schrieb Ryan Newton:
 What about just using the Data.Bits instance of Integer?  Well,
 presently, the setBit instance for very large integers creates a whole
 new integer, shifts, and xors:

 http://haskell.org/ghc/docs/latest/html/libraries/base/src/Data-Bits.html#setBit
 (I don't know if it's possible to do better.  From quick googling GMP
 seems to use an array of limbs rather than a chunked list, so maybe
 there's no way to treat large Integers as a list and update only the
 front...)

 interesting idea. Should this be considered a bug in ghc? (Not that it
 cannot represent the result, but that it crashes even out of ghci):

 $ ghci
 GHCi, version 7.0.4: http://www.haskell.org/ghc/  :? for help
 Loading package ghc-prim ... linking ... done.
 Loading package integer-gmp ... linking ... done.
 Loading package base ... linking ... done.
 Prelude :m + Data.Bits
 Prelude Data.Bits setBit 0 (2^63-1::Int)
 gmp: overflow in mpz type
 Abgebrochen

Yes, that's a bug.  GMP shouldn't call abort(), but it should be
turned into a Haskell exception.  It probably doesn't make much of a
difference in practise, but safe could should never crash GHC.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Best bit LIST data structure

2011-10-09 Thread Roman Beslik
Yes, if you do not use high-level concepts and optimize everything by 
hand, it requires a lot of testing. :)


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Best bit LIST data structure

2011-10-09 Thread KC
Must it be a list?

What about a Bloom Filter?

On Sun, Oct 9, 2011 at 9:11 AM, Roman Beslik ber...@ukr.net wrote:
 Yes, if you do not use high-level concepts and optimize everything by hand,
 it requires a lot of testing. :)

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe




-- 
--
Regards,
KC

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Best bit LIST data structure

2011-10-09 Thread Yves Parès
 data *(Bits b) =* BitList b
Is deprecated and soon to be removed from the language.

I fail to understand. Why not just:
 data BitList b = Nil | BitList Int b (BitList b)
??

2011/10/9 Roman Beslik ber...@ukr.net

  I am not aware of such a library, but IMHO this code will be very simple.
  data Bits b = BitList b = BitList Int {- number of used bits in the next
 component -} b [b]
 Write an isomorphism between @BitList b@ and @ListStep (BitList b)@
 where
  data ListStep e rc = Nil | Cons e rc


 On 07.10.11 17:52, Ryan Newton wrote:

 Hi Cafe,

 We are lucky to have a plethora of data structures out there.  But it does
 make choosing one off hackage difficult at times.  In this case I'm *not*
 looking for a O(1) access bit vector (Data.Vector.Unboxed seems to be the
 choice there), but an efficient representation for a list of bits
 (cons,head,tail).

 Let's say that you want to represent tree indices as you walk down a binary
 tree.  [Bool] is a simple choice, you only add to the front of the list (0/1
 = Left/Right), sharing the tails.  But [Bool] is quite space inefficient.

 Something like [Int] would allow packing the bits more efficiently.  A Lazy
 ByteString could amortize the space overhead even more... but in both cases
 there's a tiny bit of work to do in wrapping those structures for per-bit
 access.  That's probably the right thing but I wanted to check to see if
 there's something else recommended, perhaps more off-the-shelf.

 What about just using the Data.Bits instance of Integer?  Well, presently,
 the setBit instance for very large integers creates a whole new integer,
 shifts, and xors:

 http://haskell.org/ghc/docs/latest/html/libraries/base/src/Data-Bits.html#setBit
 (I don't know if it's possible to do better.  From quick googling GMP seems
 to use an array of limbs rather than a chunked list, so maybe there's no
 way to treat large Integers as a list and update only the front...)

 Advice appreciated!

 Thanks,
   -Ryan


 ___
 Haskell-Cafe mailing 
 listHaskell-Cafe@haskell.orghttp://www.haskell.org/mailman/listinfo/haskell-cafe


 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Best bit LIST data structure

2011-10-07 Thread Ryan Newton
Hi Cafe,

We are lucky to have a plethora of data structures out there.  But it does
make choosing one off hackage difficult at times.  In this case I'm *not*
looking for a O(1) access bit vector (Data.Vector.Unboxed seems to be the
choice there), but an efficient representation for a list of bits
(cons,head,tail).

Let's say that you want to represent tree indices as you walk down a binary
tree.  [Bool] is a simple choice, you only add to the front of the list (0/1
= Left/Right), sharing the tails.  But [Bool] is quite space inefficient.

Something like [Int] would allow packing the bits more efficiently.  A Lazy
ByteString could amortize the space overhead even more... but in both cases
there's a tiny bit of work to do in wrapping those structures for per-bit
access.  That's probably the right thing but I wanted to check to see if
there's something else recommended, perhaps more off-the-shelf.

What about just using the Data.Bits instance of Integer?  Well, presently,
the setBit instance for very large integers creates a whole new integer,
shifts, and xors:

http://haskell.org/ghc/docs/latest/html/libraries/base/src/Data-Bits.html#setBit
(I don't know if it's possible to do better.  From quick googling GMP seems
to use an array of limbs rather than a chunked list, so maybe there's no
way to treat large Integers as a list and update only the front...)

Advice appreciated!

Thanks,
  -Ryan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe