Re: [Haskell-cafe] Layered maps

2010-10-10 Thread wren ng thornton

On 10/9/10 2:02 AM, Alex Rozenshteyn wrote:

This came up as I was doing homework for natural language processing.

I'm constructing a trigram model from training data, but I also need the
bigram and unigram counts.  I could, for each triple of characters, add the
3 tuple to a trigram map and increment its count (I know I'm not actually
mutating anything), add the last two letters to a bigram map, and add the
last character to a unigram map.

But this looks very much like a tree...  each node contains the character
(the key) and list of characters that can appear after it.  (I think this is
a trie, but I don't know very much about tries.)  The problem is that lookup
is more horribly inefficient than I can stand, if I use lists.


Yep, that's a trie :)

For what it's worth, in the HMM tagger I'm working on now, I'm using 
something to the effect of:


type IntTrie a = IntMap (a, IntTrie a)

Of course that's not a valid type synonym and I'm not really using 
infinite depth (though there's no reason not to allow it), but that's 
the idea. The Ints are gotten by interning the word/tag strings and I 
actually do a lot of newtype wrapping to ensure that my IDs for words, 
tags, etc and their maps are all kept distinct. In order to update your 
counts during training you can use things like:


succZ z = alter (Just . succ . maybe 0 id) z

succYZ y z =
alter (Just . (succ *** succZ z) . maybe (0,empty) id) y

succYZ x y z =
alter (Just . (succ *** succYZ y z) . maybe (0,empty) id) x

And to look things up you can use things like:

lookupZ   z = maybe 0 id . fst . lookup z
lookupYZy z = lookupZz . snd = lookup y
lookupXYZ x y z = lookupYZ y z . snd = lookup x

Clearly there's a lot of boilerplate there, but it works and it's very 
efficient. Ironically, combining the unigram, bigram, and trigram tables 
actually decreases performance (slightly) on my benchmarks; but I'm 
keeping them combined in order to reduce memory pressure.


You can also take a look at bytestring-trie[1] which is very similar and 
provides a nicer interface. Unfortunately, there's currently a big 
performance hit for using bytestring-trie, which I'm in the process of 
tracking down. Most of it is probably because it's Word8-based instead 
of Int-based, but I'd like to be sure. bytestring-trie does expose the 
trie-ness (more to come in the HEAD version). Of course, for HMM stuff, 
you don't really take advantage of the trie structure, so there's 
nothing special about using a trie instead of a Map NGram where,


data NGram
= Unigram {-# UNPACK #-} !Int
| Bigram  {-# UNPACK #-} !Int {-# UNPACK #-} !Int
| Trigram {-# UNPACK #-} !Int {-# UNPACK #-} !Int
  {-# UNPACK #-} !Int
deriving (Eq, Ord, Read, Show)

Expanding that out to three different IntMaps or the nested IntMaps is 
just a performance improvement, which I'm sure won't be important for 
your homework (i.e., it's not asymptotically significant).



[1] http://hackage.haskell.org/package/bytestring-trie

--
Live well,
~wren
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Layered maps

2010-10-09 Thread Alex Rozenshteyn
This came up as I was doing homework for natural language processing.

I'm constructing a trigram model from training data, but I also need the
bigram and unigram counts.  I could, for each triple of characters, add the
3 tuple to a trigram map and increment its count (I know I'm not actually
mutating anything), add the last two letters to a bigram map, and add the
last character to a unigram map.

But this looks very much like a tree...  each node contains the character
(the key) and list of characters that can appear after it.  (I think this is
a trie, but I don't know very much about tries.)  The problem is that lookup
is more horribly inefficient than I can stand, if I use lists.

As for what constitutes convenient, I haven't thought about it that much; I
just noted that my naive attempt was full of boilerplate and decided that
since there was already python code provided I used that.  I'm planning to
port the code *after* I have the assignment finished.

On Fri, Oct 8, 2010 at 11:18 PM, wren ng thornton w...@freegeek.org wrote:

 On 10/8/10 5:46 PM, Thomas DuBuisson wrote:

 Alex,

 The containers library can do this already - there are no constraints
 on the elements of a Map.  For example:

  type TripleNestedMap a = Map Int (Map Char (Map String a))


 But this is rather silly as you can just do:

  type MapOfTriples a = Map (Int ,Char, String) a


 for most uses.


 However, Map is a lot less efficient than IntMap when dealing with Ints.
 And the IntMap (IntMap ... a) type requires you to write (lookup m = ...
 = lookup n) instead of just one lookup. Unfortunately, when you're
 interested in performance issues, the standard tricks for implementing
 polyvariadic functions aren't very useful.

 FWIW, the monadic combinators are usually sufficient to create your own
 functions legiblely (e.g., using (=) for lookup), but it's still a lot
 noiser than it could be--- especially if you want a trie instead of a
 product map, since you have to add fst and snd everywhere. I've been playing
 around with some ad-hoc tries like these a lot lately, both for HMM tagging
 and for hunting down performance issues in bytestring-trie.

 --
 Live well,
 ~wren

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe




-- 
  Alex R
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Layered maps

2010-10-09 Thread Jake McArthur
What you describe sounds like a perfect job for a trie, so that's what I 
think you should look into.


- Jake
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Layered maps

2010-10-09 Thread Alex Rozenshteyn
Hmm.

It seems that both the trie packages I found just provide a standard
map-like interface, without exposing their trie-ness.  Makes sense, but
limits usefulness for me.

Thanks.

On Sat, Oct 9, 2010 at 3:08 PM, Jake McArthur jake.mcart...@gmail.comwrote:

 What you describe sounds like a perfect job for a trie, so that's what I
 think you should look into.

 - Jake

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe




-- 
  Alex R
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Layered maps

2010-10-08 Thread Alex Rozenshteyn
Does there exist a library which allows me to have maps whose elements are
maps whose elements ... with a convenient syntax.

Alternatively, does there exist a library like Data.Tree where forests are
sets rather than lists?

-- 
  Alex R
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Layered maps

2010-10-08 Thread Thomas DuBuisson
Alex,

The containers library can do this already - there are no constraints
on the elements of a Map.  For example:

 type TripleNestedMap a = Map Int (Map Char (Map String a))

But this is rather silly as you can just do:

 type MapOfTriples a = Map (Int ,Char, String) a

for most uses.

Cheers,
Thomas

On Fri, Oct 8, 2010 at 2:23 PM, Alex Rozenshteyn rpglove...@gmail.com wrote:
 Does there exist a library which allows me to have maps whose elements are
 maps whose elements ... with a convenient syntax.

 Alternatively, does there exist a library like Data.Tree where forests are
 sets rather than lists?

 --
           Alex R

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Layered maps

2010-10-08 Thread Jake McArthur

On 10/08/2010 04:23 PM, Alex Rozenshteyn wrote:

Does there exist a library which allows me to have maps whose elements
are maps whose elements ... with a convenient syntax.


It sounds like you might be looking for a trie of some sort. Would 
something like the TrieMap package suit your needs? It's hard to guess 
based only on the question as posed.


- Jake
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Layered maps

2010-10-08 Thread wren ng thornton

On 10/8/10 5:46 PM, Thomas DuBuisson wrote:

Alex,

The containers library can do this already - there are no constraints
on the elements of a Map.  For example:


type TripleNestedMap a = Map Int (Map Char (Map String a))


But this is rather silly as you can just do:


type MapOfTriples a = Map (Int ,Char, String) a


for most uses.


However, Map is a lot less efficient than IntMap when dealing with Ints. 
And the IntMap (IntMap ... a) type requires you to write (lookup m = 
... = lookup n) instead of just one lookup. Unfortunately, when you're 
interested in performance issues, the standard tricks for implementing 
polyvariadic functions aren't very useful.


FWIW, the monadic combinators are usually sufficient to create your own 
functions legiblely (e.g., using (=) for lookup), but it's still a lot 
noiser than it could be--- especially if you want a trie instead of a 
product map, since you have to add fst and snd everywhere. I've been 
playing around with some ad-hoc tries like these a lot lately, both for 
HMM tagging and for hunting down performance issues in bytestring-trie.


--
Live well,
~wren
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe