Re: [Haskell-cafe] Re: Optimizing spelling correction program

2009-06-24 Thread wren ng thornton
Kamil Dworakowski wrote: On Jun 22, 10:03 am, Eugene Kirpichov ekirpic...@gmail.com wrote: Hey, you're using String I/O! nWORDS <- fmap (train . map B.pack . words) (readFile "big.txt") This should be nWORDS <- fmap (train . B.words) (B.readFile "big.txt") By the way, which exact file do you use

[Haskell-cafe] Re: Optimizing spelling correction program

2009-06-23 Thread Kamil Dworakowski
Using Bryan O'Sullivan's fantastic BloomFilter I got it down below Python's run time! Now it is 35.56s, 28% of the time is spent on GC, which I think means there is still some room for improvement. One easy way to fix the GC time is to increase the default heap size.  ./a.out +RTS -A200M

[Haskell-cafe] Re: Optimizing spelling correction program

2009-06-23 Thread Kamil Dworakowski
On Jun 22, 10:12 pm, Bulat Ziganshin bulat.zigans...@gmail.com wrote: Hello Kamil, Tuesday, June 23, 2009, 12:54:49 AM, you wrote: I went back to using Strings instead of ByteStrings and with that hashtable the program finishes in 31.5s! w00t! and GC times are? also, try ByteString+HT,

Re: [Haskell-cafe] Re: Optimizing spelling correction program

2009-06-23 Thread Bulat Ziganshin
Hello Kamil, Tuesday, June 23, 2009, 11:17:43 AM, you wrote: One easy way to fix the GC time is to increase the default heap size.  ./a.out +RTS -A200M It does make the GC only 1.4% of run time but it increases it overall by 14s. not surprising - you lose L2 cache locality. try to use -A

[Haskell-cafe] Re: Optimizing spelling correction program

2009-06-22 Thread Kamil Dworakowski
On Jun 22, 10:03 am, Eugene Kirpichov ekirpic...@gmail.com wrote: Hey, you're using String I/O! nWORDS <- fmap (train . map B.pack . words) (readFile "big.txt") This should be nWORDS <- fmap (train . B.words) (B.readFile "big.txt") By the way, which exact file do you use as a misspellings file?
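
For readers following the String vs. ByteString point above, here is a minimal sketch of the suggested change. The thread does not show the definition of train, so the word-counting version below is an assumption, as are the helper name loadCounts and the use of Data.Map.Strict (which needs a newer containers than what shipped with GHC 6.10).

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.Map.Strict as M
import Data.List (foldl')

-- Assumed stand-in for the thread's `train`: count how often each word occurs.
train :: [B.ByteString] -> M.Map B.ByteString Int
train = foldl' (\m w -> M.insertWith (+) w 1 m) M.empty

-- Hypothetical helper: read the corpus as a strict ByteString and split it
-- with B.words, instead of readFile + words + B.pack on a String.
loadCounts :: FilePath -> IO (M.Map B.ByteString Int)
loadCounts path = fmap (train . B.words) (B.readFile path)

main :: IO ()
main = do
  -- Small in-memory sample so the sketch runs without big.txt.
  let sample = B.pack "the quick fox and the lazy dog"
  print (M.toList (train (B.words sample)))
```

With the real corpus, the binding from the message would read nWORDS <- loadCounts "big.txt".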

[Haskell-cafe] Re: Optimizing spelling correction program

2009-06-22 Thread Kamil Dworakowski
On Jun 22, 6:46 am, Bulat Ziganshin bulat.zigans...@gmail.com wrote: Hello Kamil, Monday, June 22, 2009, 12:01:40 AM, you wrote: Right... Python uses hashtables while here I have a tree with log n you can try this pure hashtable approach: import Prelude hiding (lookup) import qualified

Re: [Haskell-cafe] Re: Optimizing spelling correction program

2009-06-22 Thread Daniel Fischer
Am Montag 22 Juni 2009 21:31:50 schrieb Kamil Dworakowski: On Jun 22, 6:46 am, Bulat Ziganshin bulat.zigans...@gmail.com wrote: Hello Kamil, Monday, June 22, 2009, 12:01:40 AM, you wrote: Right... Python uses hashtables while here I have a tree with log n you can try this pure

[Haskell-cafe] Re: Optimizing spelling correction program

2009-06-22 Thread Kamil Dworakowski
On Jun 22, 9:10 am, Ketil Malde ke...@malde.org wrote: Kamil Dworakowski ka...@dworakowski.name writes: Right... Python uses hashtables while here I have a tree with log n access time. I did not want to use the Data.HashTable, it would pervade my program with IO. The alternative is an ideal

[Haskell-cafe] Re: Optimizing spelling correction program

2009-06-22 Thread Kamil Dworakowski
On Jun 22, 9:06 pm, Daniel Fischer daniel.is.fisc...@web.de wrote: Am Montag 22 Juni 2009 21:31:50 schrieb Kamil Dworakowski: On Jun 22, 6:46 am, Bulat Ziganshin bulat.zigans...@gmail.com wrote: Hello Kamil, Monday, June 22, 2009, 12:01:40 AM, you wrote: Right... Python uses

Re: [Haskell-cafe] Re: Optimizing spelling correction program

2009-06-22 Thread Bulat Ziganshin
Hello Kamil, Tuesday, June 23, 2009, 12:54:49 AM, you wrote: I went back to using Strings instead of ByteStrings and with that hashtable the program finishes in 31.5s! w00t! and GC times are? also, try ByteString+HT, it should be pretty easy to write hashByteString -- Best regards, Bulat
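
The thread does not show what hashByteString ended up looking like. One possible sketch, assuming an FNV-1a fold over the bytes (a technique swapped in here purely for illustration) and an Int32 result to match the shape of the old Data.HashTable.hashString:

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Bits (xor)
import Data.Char (ord)
import Data.Int (Int32)
import Data.Word (Word32)

-- Illustrative FNV-1a hash; not the function used in the thread.
-- The fold runs in Word32 so multiplication wraps modulo 2^32,
-- and the result is reinterpreted as Int32 at the end.
hashByteString :: B.ByteString -> Int32
hashByteString = fromIntegral . B.foldl' step (2166136261 :: Word32)
  where
    step h c = (h `xor` fromIntegral (ord c)) * 16777619

main :: IO ()
main = print (hashByteString (B.pack "spelling"))
```

Data.HashTable has since been removed from base, so on a current GHC this function would have to be paired with a different table implementation.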

Re: [Haskell-cafe] Re: Optimizing spelling correction program

2009-06-22 Thread Daniel Fischer
Am Montag 22 Juni 2009 22:54:49 schrieb Kamil Dworakowski: Wait! Have you typed that definition into the msg off the top of your head? :) No, took a bit of looking. I went back to using Strings instead of ByteStrings and with that hashtable the program finishes in 31.5s! w00t! Nice :D

Re: [Haskell-cafe] Re: Optimizing spelling correction program

2009-06-22 Thread Don Stewart
kamil: On Jun 22, 9:10 am, Ketil Malde ke...@malde.org wrote: Kamil Dworakowski ka...@dworakowski.name writes: Right... Python uses hashtables while here I have a tree with log n access time. I did not want to use the Data.HashTable, it would pervade my program with IO. The alternative

Re[2]: [Haskell-cafe] Re: Optimizing spelling correction program

2009-06-22 Thread Bulat Ziganshin
Hello Don, Tuesday, June 23, 2009, 1:22:46 AM, you wrote: One easy way to fix the GC time is to increase the default heap size. ./a.out +RTS -A200M to be exact, -A isn't a heap size - it's frequency of generation-1 collections. by default, a collection is performed every 512 kbytes, tied to L2

[Haskell-cafe] Re: Optimizing spelling correction program

2009-06-21 Thread Kamil Dworakowski
What are the keys in your Map? Strict ByteStrings? And you're using ghc 6.10.x? For the record, I use 6.10.1 and strict ByteString everywhere now. I used to have some lazy IO with vanilla strings, but switching to Data.ByteString.Char8.readFile didn't change the time at all. The big.txt is