Re: [Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

2007-04-24 Thread Udo Stenzel
Pete Kazmier wrote: train:: [B.ByteString] - WordFreq train words = frequencyMap where frequencyMap = foldr incWordCount M.empty words incWordCount w m = M.insertWith (+) w 1 m So is 'incWordCount' strict in its second argument? I'm still not sure exactly

Re: [Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

2007-04-24 Thread Bryan O'Sullivan
Udo Stenzel wrote: There is another bug of this sort in your code. Consider incWordCount w m = M.insertWith (+) w 1 m There is no reason to evaluate the sum inside the map, instead an unevaluated thunk is put in there. Would not Data.Map.insertWith' do the trick? b

Re: [Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

2007-04-24 Thread Udo Stenzel
Bryan O'Sullivan wrote: Udo Stenzel wrote: There is another bug of this sort in your code. Consider incWordCount w m = M.insertWith (+) w 1 m There is no reason to evaluate the sum inside the map, instead an unevaluated thunk is put in there. Would not Data.Map.insertWith'

Re: [Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

2007-04-23 Thread Albert Y. C. Lai
Albert Y. C. Lai wrote: I try using WordSet = [String] (plus corresponding change in code) and get great speedup, actually way more than 3x. There was also a memory growth phenomenon using Set String, and replacement by [String] stops that too, now it's constant space (constant = 20M). It is

Re: [Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

2007-04-22 Thread Bryan O'Sullivan
Ketil Malde wrote: Hm - nobody suggested using ByteStrings yet? I wrote an independent port of Norvig's spellchecker, because I figured it would make the basis of an interesting tutorial. For such a small program, it's been quite a challenge! I started out using lazy ByteStrings,

Re: [Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

2007-04-22 Thread Ketil Malde
On Sun, 2007-04-22 at 00:25 -0400, Pete Kazmier wrote: Pete Kazmier [EMAIL PROTECTED] writes: I'd love to see other Haskell implementations as well if anyone has a few moments to spare. Admittedly, it took me several hours to get my version working, but I'm a Haskell newbie.

[Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

2007-04-22 Thread Pete Kazmier
Ketil Malde [EMAIL PROTECTED] writes: On Sun, 2007-04-22 at 00:25 -0400, Pete Kazmier wrote: Pete Kazmier [EMAIL PROTECTED] writes: I'd love to see other Haskell implementations as well if anyone has a few moments to spare. Admittedly, it took me several hours to get my version

[Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

2007-04-22 Thread Pete Kazmier
Bryan O'Sullivan [EMAIL PROTECTED] writes: After this switch, I found that spellchecking one word still took 4x as long in Haskell as Norvig's Python program. Since I was checking only one word in each case, essentially all of the runtime was taken up by building the word frequency map.

Re: [Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

2007-04-22 Thread Derek Elkins
Pete Kazmier wrote: Bryan O'Sullivan [EMAIL PROTECTED] writes: After this switch, I found that spellchecking one word still took 4x as long in Haskell as Norvig's Python program. Since I was checking only one word in each case, essentially all of the runtime was taken up by building the word

Re: [Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

2007-04-22 Thread Bryan O'Sullivan
Pete Kazmier wrote: Bryan, out of curiosity, is a non bytestring version of your code faster? No, but the difference is smaller than I expected: the lazy ByteString version is about 1.8x faster than using String. b ___ Haskell-Cafe

[Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

2007-04-22 Thread Pete Kazmier
Derek Elkins [EMAIL PROTECTED] writes: After reading Bryan's post, I switched my right fold to a strict left fold and almost tripled my original speed. Could someone provide some guidelines on when to use each? I thought foldr should be used when building some large data structure such as

Re: [Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

2007-04-22 Thread Derek Elkins
Pete Kazmier wrote: Derek Elkins [EMAIL PROTECTED] writes: After reading Bryan's post, I switched my right fold to a strict left fold and almost tripled my original speed. Could someone provide some guidelines on when to use each? I thought foldr should be used when building some large data

Re: [Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

2007-04-22 Thread Ketil Malde
On Sun, 2007-04-22 at 11:51 -0400, Pete Kazmier wrote: type WordFreq = M.Map B.ByteString Int train:: [B.ByteString] - WordFreq train words = frequencyMap where frequencyMap = foldr incWordCount M.empty words incWordCount w m = M.insertWith (+) w 1 m

Re: [Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

2007-04-22 Thread Stefan O'Rear
On Sun, Apr 22, 2007 at 10:10:44PM +0200, Ketil Malde wrote: On Sun, 2007-04-22 at 11:51 -0400, Pete Kazmier wrote: type WordFreq = M.Map B.ByteString Int train:: [B.ByteString] - WordFreq train words = frequencyMap where frequencyMap = foldr incWordCount

Re: [Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

2007-04-22 Thread Felipe Almeida Lessa
On 4/22/07, Pete Kazmier [EMAIL PROTECTED] wrote: Worse - and this is true for ByteStrings, too - String comparisons are O(n), which means lookups in Sets and Maps are expensive. A trie (i.e, a search tree where each internal node corresponds to a word prefix, and has one branch per letter

Re: [Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

2007-04-22 Thread Duncan Coutts
On Sun, 2007-04-22 at 12:55 -0400, Pete Kazmier wrote: After reading Bryan's post, I switched my right fold to a strict left fold and almost tripled my original speed. Could someone provide some guidelines on when to use each? I thought foldr should be used when building some large data

Re: [Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

2007-04-22 Thread Bryan O'Sullivan
Bryan O'Sullivan wrote: In my profile results, I find that simply converting words to lower case accounts for a whopping 40% of time and allocation (see the attachment for my definition of the train function). COST CENTREMODULE %time %alloc lower

[Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

2007-04-21 Thread Arie Peterson
Hi Pete, Recently I read an interesting article by Peter Norvig[1] on how to write a spelling corrector in 21-lines of Python. I wanted to try and implement it in Haskell. My implementation is terribly slow and I was hoping for some tips on how to improve it and make it idiomatic. I had a

[Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

2007-04-21 Thread Pete Kazmier
Pete Kazmier [EMAIL PROTECTED] writes: I'd love to see other Haskell implementations as well if anyone has a few moments to spare. Admittedly, it took me several hours to get my version working, but I'm a Haskell newbie. Unfortunately, I think it runs as slow as it took me to write it!

Re: [Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

2007-04-21 Thread Albert Y. C. Lai
I try using WordSet = [String] (plus corresponding change in code) and get great speedup, actually way more than 3x. There was also a memory growth phenomenon using Set String, and replacement by [String] stops that too, now it's constant space (constant = 20M). It is possible to attribute