RE: [Haskell-cafe] Optimizing a title matcher

2006-09-29 Thread Bayley, Alistair
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bulat Ziganshin Sent: 28 September 2006 17:29 To: Lyle Kopnicky now i will work on edit-distance algorithm. i'm have an idea of checking the real distance between chars - as on my keyboard, so for example sprry can be

Re: [Haskell-cafe] Optimizing a title matcher

2006-09-28 Thread Bulat Ziganshin
Hello Lyle, Wednesday, September 27, 2006, 12:44:05 AM, you wrote: It's supposed to match movie titles from an imported database to a reference database. The import files each have 3,000 records, and the reference table has 137,986 records. Building the hash tables out of the files is

Re: [Haskell-cafe] Optimizing a title matcher

2006-09-27 Thread Johan Tibell
On 9/27/06, Lyle Kopnicky [EMAIL PROTECTED] wrote: Hi folks, It turns out Haskell is vindicated. It's my algorithm that was slow. As Robert Dockins pointed out, the double nested loop is just going to take a long time. As evidence, it turns out my C++ version is just as slow as the Haskell

Re: [Haskell-cafe] Optimizing a title matcher

2006-09-27 Thread Ketil Malde
Lyle Kopnicky [EMAIL PROTECTED] writes: If you have some other metric other than prefix in mind for partial matches, then things probably get a lot more complicated. You're probably looking at calculating minimum distances in some feature-space, which calls for pretty sophisticated

Re: [Haskell-cafe] Optimizing a title matcher

2006-09-27 Thread Johan Tibell
On 9/27/06, Johan Tibell [EMAIL PROTECTED] wrote: On 9/27/06, Lyle Kopnicky [EMAIL PROTECTED] wrote: Hi folks, It turns out Haskell is vindicated. It's my algorithm that was slow. As Robert Dockins pointed out, the double nested loop is just going to take a long time. As evidence, it

Re: [Haskell-cafe] Optimizing a title matcher

2006-09-27 Thread Lyle Kopnicky
Ketil Malde wrote: Do you really need that to search for movie titles? At any rate, an exact-match finite-map implementation is a good start - to get good performance, you probably will need to use some kind of index to reduce the amount of data to search exhaustively (all-against-all). For

Re: [Haskell-cafe] Optimizing a title matcher

2006-09-27 Thread Brandon Moore
Ketil Malde wrote: Lyle Kopnicky [EMAIL PROTECTED] writes: If you have some other metric other than prefix in mind for partial matches, then things probably get a lot more complicated. You're probably looking at calculating minimum distances in some feature-space, which calls for pretty

[Haskell-cafe] Optimizing a title matcher

2006-09-26 Thread Lyle Kopnicky
Hi folks, I'm competing in a contest at work, and we're allowed to use whatever language we want. I decided this was my chance to prove to people that Haskell was up to the challenge. Unfortunately, I ran into performance problems. Since the contest ends this Friday, I've decided to switch to

Re: [Haskell-cafe] Optimizing a title matcher

2006-09-26 Thread Robert Dockins
On Tuesday 26 September 2006 16:44, Lyle Kopnicky wrote: Hi folks, I'm competing in a contest at work, and we're allowed to use whatever language we want. I decided this was my chance to prove to people that Haskell was up to the challenge. Unfortunately, I ran into performance problems.

Re: [Haskell-cafe] Optimizing a title matcher

2006-09-26 Thread Lyle Kopnicky
Lemmih wrote: Do you have some test input online? I've attached some (very short) input files. Sorry I can't provide more - they're proprietary databases. I know that means you can't actually test the performance, but can only give me advice. At least you can run the program on them, and

Re: [Haskell-cafe] Optimizing a title matcher

2006-09-26 Thread Lyle Kopnicky
Robert Dockins wrote: Humm... well, double nested loops seems like the wrong approach. It may be. I had hoped it would be fast enough that way. Also, if you are using GHC, it's hashtable implementation has farily well-known performance problems. If all you care about is exact matching, then

Re: [Haskell-cafe] Optimizing a title matcher

2006-09-26 Thread Bertram Felgenhauer
Lyle Kopnicky wrote: [snip] data TextTable s = TextTable { tableFields :: ![s], keyFieldIndex :: !Int, tableRecords :: !(HashTable s (Array Int s)) } [snip] listRecords :: AbsString s = TextTable s - IO [TextRecord s] listRecords

Re: [Haskell-cafe] Optimizing a title matcher

2006-09-26 Thread Lyle Kopnicky
Bertram Felgenhauer wrote: Lyle Kopnicky wrote: [snip] listRecords :: AbsString s = TextTable s - IO [TextRecord s] listRecords (TextTable fields _ records) = do keyRecs - HT.toList records return $ map (fromList . zip fields . elems . snd) keyRecs Doing

Re: [Haskell-cafe] Optimizing a title matcher

2006-09-26 Thread Lyle Kopnicky
Hi folks, It turns out Haskell is vindicated. It's my algorithm that was slow. As Robert Dockins pointed out, the double nested loop is just going to take a long time. As evidence, it turns out my C++ version is just as slow as the Haskell version. So, I'm going to go back to Haskell, but