Re: [Haskell-cafe] Layered maps
On 10/9/10 2:02 AM, Alex Rozenshteyn wrote: This came up as I was doing homework for natural language processing. I'm constructing a trigram model from training data, but I also need the bigram and unigram counts. I could, for each triple of characters, add the 3 tuple to a trigram map and increment its count (I know I'm not actually mutating anything), add the last two letters to a bigram map, and add the last character to a unigram map. But this looks very much like a tree... each node contains the character (the key) and list of characters that can appear after it. (I think this is a trie, but I don't know very much about tries.) The problem is that lookup is more horribly inefficient than I can stand, if I use lists. Yep, that's a trie :) For what it's worth, in the HMM tagger I'm working on now, I'm using something to the effect of: type IntTrie a = IntMap (a, IntTrie a) Of course that's not a valid type synonym and I'm not really using infinite depth (though there's no reason not to allow it), but that's the idea. The Ints are gotten by interning the word/tag strings and I actually do a lot of newtype wrapping to ensure that my IDs for words, tags, etc and their maps are all kept distinct. In order to update your counts during training you can use things like: succZ z = alter (Just . succ . maybe 0 id) z succYZ y z = alter (Just . (succ *** succZ z) . maybe (0,empty) id) y succYZ x y z = alter (Just . (succ *** succYZ y z) . maybe (0,empty) id) x And to look things up you can use things like: lookupZ z = maybe 0 id . fst . lookup z lookupYZy z = lookupZz . snd = lookup y lookupXYZ x y z = lookupYZ y z . snd = lookup x Clearly there's a lot of boilerplate there, but it works and it's very efficient. Ironically, combining the unigram, bigram, and trigram tables actually decreases performance (slightly) on my benchmarks; but I'm keeping them combined in order to reduce memory pressure. You can also take a look at bytestring-trie[1] which is very similar and provides a nicer interface. Unfortunately, there's currently a big performance hit for using bytestring-trie, which I'm in the process of tracking down. Most of it is probably because it's Word8-based instead of Int-based, but I'd like to be sure. bytestring-trie does expose the trie-ness (more to come in the HEAD version). Of course, for HMM stuff, you don't really take advantage of the trie structure, so there's nothing special about using a trie instead of a Map NGram where, data NGram = Unigram {-# UNPACK #-} !Int | Bigram {-# UNPACK #-} !Int {-# UNPACK #-} !Int | Trigram {-# UNPACK #-} !Int {-# UNPACK #-} !Int {-# UNPACK #-} !Int deriving (Eq, Ord, Read, Show) Expanding that out to three different IntMaps or the nested IntMaps is just a performance improvement, which I'm sure won't be important for your homework (i.e., it's not asymptotically significant). [1] http://hackage.haskell.org/package/bytestring-trie -- Live well, ~wren ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Layered maps
This came up as I was doing homework for natural language processing. I'm constructing a trigram model from training data, but I also need the bigram and unigram counts. I could, for each triple of characters, add the 3 tuple to a trigram map and increment its count (I know I'm not actually mutating anything), add the last two letters to a bigram map, and add the last character to a unigram map. But this looks very much like a tree... each node contains the character (the key) and list of characters that can appear after it. (I think this is a trie, but I don't know very much about tries.) The problem is that lookup is more horribly inefficient than I can stand, if I use lists. As for what constitutes convenient, I haven't thought about it that much; I just noted that my naive attempt was full of boilerplate and decided that since there was already python code provided I used that. I'm planning to port the code *after* I have the assignment finished. On Fri, Oct 8, 2010 at 11:18 PM, wren ng thornton w...@freegeek.org wrote: On 10/8/10 5:46 PM, Thomas DuBuisson wrote: Alex, The containers library can do this already - there are no constraints on the elements of a Map. For example: type TripleNestedMap a = Map Int (Map Char (Map String a)) But this is rather silly as you can just do: type MapOfTriples a = Map (Int ,Char, String) a for most uses. However, Map is a lot less efficient than IntMap when dealing with Ints. And the IntMap (IntMap ... a) type requires you to write (lookup m = ... = lookup n) instead of just one lookup. Unfortunately, when you're interested in performance issues, the standard tricks for implementing polyvariadic functions aren't very useful. FWIW, the monadic combinators are usually sufficient to create your own functions legiblely (e.g., using (=) for lookup), but it's still a lot noiser than it could be--- especially if you want a trie instead of a product map, since you have to add fst and snd everywhere. I've been playing around with some ad-hoc tries like these a lot lately, both for HMM tagging and for hunting down performance issues in bytestring-trie. -- Live well, ~wren ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe -- Alex R ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Layered maps
What you describe sounds like a perfect job for a trie, so that's what I think you should look into. - Jake ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Layered maps
Hmm. It seems that both the trie packages I found just provide a standard map-like interface, without exposing their trie-ness. Makes sense, but limits usefulness for me. Thanks. On Sat, Oct 9, 2010 at 3:08 PM, Jake McArthur jake.mcart...@gmail.comwrote: What you describe sounds like a perfect job for a trie, so that's what I think you should look into. - Jake ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe -- Alex R ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Layered maps
Does there exist a library which allows me to have maps whose elements are maps whose elements ... with a convenient syntax. Alternatively, does there exist a library like Data.Tree where forests are sets rather than lists? -- Alex R ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Layered maps
Alex, The containers library can do this already - there are no constraints on the elements of a Map. For example: type TripleNestedMap a = Map Int (Map Char (Map String a)) But this is rather silly as you can just do: type MapOfTriples a = Map (Int ,Char, String) a for most uses. Cheers, Thomas On Fri, Oct 8, 2010 at 2:23 PM, Alex Rozenshteyn rpglove...@gmail.com wrote: Does there exist a library which allows me to have maps whose elements are maps whose elements ... with a convenient syntax. Alternatively, does there exist a library like Data.Tree where forests are sets rather than lists? -- Alex R ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Layered maps
On 10/08/2010 04:23 PM, Alex Rozenshteyn wrote: Does there exist a library which allows me to have maps whose elements are maps whose elements ... with a convenient syntax. It sounds like you might be looking for a trie of some sort. Would something like the TrieMap package suit your needs? It's hard to guess based only on the question as posed. - Jake ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Layered maps
On 10/8/10 5:46 PM, Thomas DuBuisson wrote: Alex, The containers library can do this already - there are no constraints on the elements of a Map. For example: type TripleNestedMap a = Map Int (Map Char (Map String a)) But this is rather silly as you can just do: type MapOfTriples a = Map (Int ,Char, String) a for most uses. However, Map is a lot less efficient than IntMap when dealing with Ints. And the IntMap (IntMap ... a) type requires you to write (lookup m = ... = lookup n) instead of just one lookup. Unfortunately, when you're interested in performance issues, the standard tricks for implementing polyvariadic functions aren't very useful. FWIW, the monadic combinators are usually sufficient to create your own functions legiblely (e.g., using (=) for lookup), but it's still a lot noiser than it could be--- especially if you want a trie instead of a product map, since you have to add fst and snd everywhere. I've been playing around with some ad-hoc tries like these a lot lately, both for HMM tagging and for hunting down performance issues in bytestring-trie. -- Live well, ~wren ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe