Re: Suggestions for optimization?

2020-02-18 Thread cblake
Incidentally, @lagerratrobe, once you save a lot of time by replacing sorting 
with a perfect commutative hash, the next optimization is making the IO part 
faster (you need cligen-0.9.43 for this): 


import strutils, tables, cligen/[mfile, mslice]

const prime: array[26, uint64] = [ #9/267751 oflow
  7'u64, 61, 41, 53, 2, 71, 47, 29, 3, 97, 89, 17, 59,
  19, 5, 31, 101, 11, 13, 23, 37, 79, 73, 67, 43, 83 ]

proc product(word: MSlice): uint64 =
  result = 1'u64  #Assumes whole file is uppercase
  for ch in word: #iterator needs cligen>=0.9.43
    result *= prime[ord(ch) - ord('A')]

proc pBuildQry(dict="words", query: seq[string]) =
  var anas = initTable[uint64, seq[MSlice]]()
  let mf = mopen(dict)
  if mf == nil: return
  defer: mf.close
  for word in mf.mSlices():
    anas.mgetOrPut(word.product, @[]).add word
  for word in query:
    let word = word.toUpperAscii
    let prod = word.toMSlice.product
    echo word, ":"
    try:
      for ana in anas[prod]: echo "  ", ana
    except: discard
import cligen; dispatch(pBuildQry)


Run

With that, my 129 ms goes down by 57 ms to 72 ms (1.8x better) on that SOWPODS 
dictionary. Compiling with `--gc:arc` takes it down further, to about 47 ms 
(1.53x better), and gcc's PGO takes it to 42 ms (1.12x better).

I suspect that with your dictionary and CPU you would see a similar overall 4x 
to 5x improvement, taking you from 286 ms down to under 70 ms. Language 
comparison-wise, I do not believe R or CPython could be sped up similarly. 
(Perhaps PyPy could make the product fast, but perhaps not, if guarding against 
overflow makes it switch to arbitrary-precision ints.)


Re: Suggestions for optimization?

2020-02-14 Thread cblake
Incidentally, in the small alphabet case (and realistically anagrams are 
usually related to word games over small alphabets) there is a neat trick for 
rapid signature computation: map the 26 letters to the first 26 primes (2, 3, 
5, 7, .., 97, 101). Then make the signature of a word the product of the 
primes. If you never overflow the product this is guaranteed to be a valid 
anagram signature by the uniqueness of prime factorization and commutativity of 
multiplication. Note that this **removes sorting from the equation entirely**.

In the worst case, that product can start to overflow a 64-bit integer past 
about 9 letters (since 101^9 =~ 2^60 and 101^10 > 2^66). However, neither worst 
nor average cases really matter, since we have a static dictionary of every 
possible case. So it's actually easy to just test a given dictionary to see 
whether any overflowed products collide in our exact case. An attacker could 
easily construct collisions, but for "organic/natural" dictionaries they are 
likely to be very rare, because 2^64 is much more than the square of the number 
of words in most dictionaries; we are in a regime very far from birthday 
paradox collisions. Words aren't really random, of course, but the "load on the 
address space" is also much less than the square of the number of words: it is 
roughly the product of the number of words whose products overflow and the 
number that do not, which is much smaller (assuming most words do not 
overflow). So this is really a somewhat empirical question about words, 
dictionaries, and letters.
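
For concreteness, here is a minimal sketch of such a dictionary check (my own 
code, not from the original post), assuming a letters-only word list, one word 
per line, and the naive 'A'..'Z' --> 2..101 mapping. A collision is a product 
shared by two different true (sorted-letter) signatures, so overflow-induced 
false matches show up directly:


import strutils, tables, sets

const prime: array[26, uint64] = [2'u64, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31,
  37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]

proc product(word: string): uint64 =
  result = 1          # unsigned math wraps on overflow; that is what we test
  for ch in word: result *= prime[ord(ch) - ord('A')]

proc sortedLetters(word: string): string =
  var cnt: array[26, int]             # true anagram signature via counting
  for ch in word: cnt[ord(ch) - ord('A')].inc
  for i, c in cnt:
    for n in 0 ..< c: result.add chr(ord('A') + i)

proc checkCollisions(dict: string) =
  var sigs = initTable[uint64, HashSet[string]]()
  for line in lines(dict):
    let word = line.toUpperAscii
    sigs.mgetOrPut(word.product, initHashSet[string]()).incl word.sortedLetters
  for prod, ss in sigs:     # >1 distinct signature for one product = collision
    if ss.len > 1: echo "collision at product ", prod, ": ", ss

checkCollisions("sowpods.txt")        # dictionary path is an assumption


Run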

With just a naive 'A' .. 'Z' --> 2 .. 101 mapping and this dictionary: 
[https://github.com/jesstess/Scrabble/blob/master/scrabble/sowpods.txt](https://github.com/jesstess/Scrabble/blob/master/scrabble/sowpods.txt)
 I got 15_294 words that overflowed (and no collisions!) out of 267_751 words. 
One can do better, though. At least in English, letters have a very skewed 
usage distribution, so we can make overflows even less likely. You can just 
look that frequency up on Wikipedia 
([https://en.wikipedia.org/wiki/Letter_frequency](https://en.wikipedia.org/wiki/Letter_frequency))
 and sort the first 26 primes by that frequency, giving the smallest primes to 
the most frequent letters. Doing that with the above SOWPODS dictionary reduces 
the number of overflows to 223.

You can do better still with a pre-pass that _measures_ this frequency in the 
dictionary itself and sorts the primes list by that measured frequency; I get 
175 overflows that way. Better still is measuring only the frequency of letter 
usage in words _longer than 9 letters_ (where it actually matters for 
containing overflow); I get only 136 overflows (over 100x better than the 
initial 15_294) that way. You might be able to do slightly better yet by 
measuring the frequency of letters in just the 136 overflowing words and 
adjusting, but that is more work and upsets the apple cart of the prior 
measurement, perhaps requiring iteration. I think that 136 yields an a priori 
probability of any collision of under 2e-12 (under certain not-quite-right 
randomness assumptions).
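
That pre-pass is easy to automate. Here is a rough sketch of the measurement 
(the original was a short Python script; this Nim version is my own 
approximation): count letter usage in dictionary words longer than 9 letters, 
then hand the smallest primes to the most frequent letters:


import strutils, algorithm

const firstPrimes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
                     43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]

proc primeAssignment(dict: string): array[26, uint64] =
  var freq: array[26, int]
  for line in lines(dict):            # measure only where overflow can bite
    let word = line.toUpperAscii
    if word.len > 9:
      for ch in word:
        if ch in 'A'..'Z': freq[ord(ch) - ord('A')].inc
  var order = newSeq[int](26)         # letter indices, most frequent first
  for i in 0 ..< 26: order[i] = i
  order.sort(proc (a, b: int): int = cmp(freq[b], freq[a]))
  for rank, letter in order:          # most frequent letter gets prime 2, etc.
    result[letter] = firstPrimes[rank].uint64

echo primeAssignment("sowpods.txt")   # dictionary path is an assumption


Run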

So, while you might worry this "product signature" technique is not applicable 
due to overflow, it probably is applicable in any language where anagrams are 
interesting. Applying all these ideas (except the measurement, which I did in a 
15-line Python script for its convenient arbitrary-precision arithmetic) in Nim 
we get: 


import strutils,tables,sets #toUpperAscii,Table,HashSet

const prime: array[26, uint] = [  # >.len==9 frequency
  7'u, 59, 29, 31, 2, 67, 47, 53, 5, 101, 73, 23, 43,
  13, 17, 41, 97, 11, 3, 19, 37, 71, 79, 89, 61, 83 ]

proc product(word: string): uint =
  result = 1   #You might worry about overflow, BUT
  for ch in word:  #..long anagrams are so rare it's ok!
    result *= prime[ord(ch) - ord('A')] #223/267751 oflow

proc sortByLetter(word: string): string =
  var cnt: array[26, int16]
  for ch in word: cnt[ord(ch) - ord('A')].inc
  for i, c in cnt:
    for n in 0 ..< c: result.add chr(ord('A') + i)

proc pBuildQry(dict="words", query: seq[string]) =
  var anas = initTable[uint, seq[string]]()
  for line in lines(dict):
    let word = line.toUpperAscii
    anas.mgetOrPut(word.product, @[]).add word
  for word in query:
    let word = word.toUpperAscii
    echo word, ":"
    try:
      for ana in anas[word.product]: echo "  ", ana
    except: discard

proc sBuildQry(dict="words", query: seq[string]) =
  var anas = initTable[string, seq[string]]()
  for line in lines(dict):
    let word = line.toUpperAscii
    anas.mgetOrPut(word.sortByLetter, @[]).add word
  for word in query:
    let word = word.toUpperAscii
    echo word, ":"
    try:
      for ana in anas[word.sortByLetter]: echo "  ", ana
    except: discard

proc collisions(dict="words") =
  var t = initTable[uint, seq[string]]()
  for line in lines(dict):
let word = 

Re: Suggestions for optimization?

2020-02-11 Thread cblake
To clarify how @cdome's comment differs from @SolitudeSF's and @lagerratrobe's 
own observation: the intent was always for `-d:danger` to imply `-d:release`, 
but there are several Nim versions out there where this implication does not 
hold. On those versions you have to define both to disable all checking. Going 
forward (from `devel`, and probably from version 1.2), `-d:danger` alone will 
suffice and defining both will be redundant.

Another thing you can do along the build-options line of `-d:release 
-d:danger`, though, is use GCC's profile-guided optimization (PGO) in the C 
backend. I've seen up to 2x speed boosts for Nim-generated code with that.
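
In case it helps, a typical PGO cycle with the C backend looks something like 
this (assuming gcc and the `anagram_redux.nim` program from the original post; 
exact flags may vary by setup):


nim c -d:danger --passC:-fprofile-generate --passL:-fprofile-generate anagram_redux.nim
./anagram_redux triangle   # run a representative workload to collect profile data
nim c -f -d:danger --passC:-fprofile-use --passL:-fprofile-use anagram_redux.nim
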

I suspect there is much more improvement to be had from the various algorithmic 
improvements I mentioned, though, and some are pretty easy. They do start to 
seem more like "cheating" in cross-language comparisons, especially moving all 
the hard work to compile time. So, it depends on whether you're actually doing 
anything real with this code.


Re: Suggestions for optimization?

2020-02-11 Thread cdome
-d:release -d:danger might give extra speed


Re: Suggestions for optimization?

2020-02-11 Thread cblake
BTW, the Unicode version of this with 65536 to ~2e6 "letters" (depending on 
whether you can restrict to one Unicode plane) is definitely trickier. You 
might think a 2-pass radix sort would help, but since most words are short, 
most of your sorts are _very_ small. Indeed, even insertion sort might well 
beat the `algorithm.sort` merge sort in some average sense for this problem. 
Insertion sort tends to beat almost everything for N < 16..32 on modern CPUs 
due to cache effects, and almost any dictionary will have average, median, and 
mode word lengths below 16 just because long words are unpopular. So, a 
"wrapper sort" that switches to insertion sort for N < 20 and falls back to 
`algorithm.sort` for bigger inputs is probably best. It might also be valuable 
to the community if you looked into the stdlib `algorithm.sort` and had it 
switch to insertion sort at small N. It doesn't right now.
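
A minimal sketch of that wrapper (my code; the cutoff of 20 is just the guess 
from above, not a measured optimum):


import algorithm

proc insertionSort[T](a: var openArray[T]) =
  for i in 1 ..< a.len:               # classic insertion sort; great for tiny N
    let x = a[i]
    var j = i - 1
    while j >= 0 and a[j] > x:
      a[j + 1] = a[j]
      dec j
    a[j + 1] = x

proc hybridSort[T](a: var openArray[T], cutoff = 20) =
  if a.len < cutoff: insertionSort(a) # tiny inputs: insertion sort
  else: a.sort(system.cmp)            # bigger inputs: stdlib merge sort


Run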

The trouble you run into with most array/table-based approaches is that the 
alphabet is so much larger than the word length. So a Fenwick tree, or anything 
with "implicit order" from the indices, is just too big to iterate over 
effectively. Without implicit order you still wind up having to sort the "used 
letters", and letter repeats are not usually very dramatic. About the only 
thing I can think of that _might_ help Unicode (beyond just 
in-place/`-d:danger` tricks) in that counting-sort-style approach is another 
structure from Preston Briggs in a 1993 Rice University tech report/thesis. 
That sparse-dense thing can sometimes be useful for managing iteration over 
very sparse subsets of small-ish universes like this Unicode code-point space. 
It's conceivable you could use it as a Unicode histogram, but you would still 
have to sort a small array (of the _unique_ letters). So it's hard to say it 
would be faster than the insertion-or-merge idea, but it's obscure enough to be 
worth mentioning as also neglected.
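
For the curious, a rough sketch of that sparse-dense structure used as a 
code-point histogram (the layout follows the usual Briggs/Torczon description; 
the names and details here are mine):


type
  SparseHist = object
    sparse: seq[int32]  # universe-sized: code point -> slot in `dense`
    dense: seq[int32]   # distinct code points actually seen, in insertion order
    count: seq[int32]   # count[i] is the multiplicity of dense[i]

proc initSparseHist(universe: int): SparseHist =
  SparseHist(sparse: newSeq[int32](universe))   # dense/count start empty

proc add(h: var SparseHist, cp: int32) =
  let s = h.sparse[cp]
  if s < h.dense.len.int32 and h.dense[s] == cp:
    inc h.count[s]                    # seen before: bump its count
  else:                               # first occurrence: register it
    h.sparse[cp] = h.dense.len.int32
    h.dense.add cp
    h.count.add 1'i32

# Iterating h.dense visits only the code points one word actually uses, so a
# signature still needs a sort, but only over the few *unique* letters.


Run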

Of course, **if you find yourself running this calculation a lot on a small-ish 
fixed set of basically static dictionaries**, the single biggest optimization 
you might do is simply to **save the signature index** to a file. You could 
extend @juancarlospaco's idea and have a `const Table` built into the binary 
executable on a per-dictionary basis (e.g. one program per dictionary). You 
would have to time several hundred or even a thousand anagram queries to get a 
reliable reading. Or, if you wanted a generic program to work with many 
dictionary files, then instead of a `Table` you could sort the signatures 
themselves, paired with their words. Then you could just save that sorted list 
to a file and do lookup via binary search on the file with at most 
O(lg(Nwords)) disk probes per anagram query (plus time to open/mmap the file). 
You could also hash-structure that file to get that down to one probe, at the 
cost of substantial code baggage. 
([https://github.com/c-blake/suggest](https://github.com/c-blake/suggest)/ has 
a fully worked out example of a much more complex persistent hash-structure 
store along those lines.) You _could_ try using Nim's `marshal` module to save 
your `Table` to disk and load it back whenever you want it, but in this case I 
think the loading of the table would be no faster than just building it from 
scratch. You could also "save in memory" by having a long-running server that 
processes each dictionary just once and then answers anagram queries over a 
local network or pipe or something.
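
As a sketch of the first option, here is @juancarlospaco's `staticRead` idea 
extended to bake the whole index into the executable (my code; it assumes a 
"dict.txt" next to the module, and compile time plus binary size grow with the 
dictionary):


import strutils, tables, os

proc sortByLetter(word: string): string =
  var cnt: array[26, int16]     # counting-sort signature, as elsewhere in this thread
  for ch in word:
    if ch in 'A'..'Z': cnt[ord(ch) - ord('A')].inc
  for i, c in cnt:
    for n in 0 ..< c: result.add chr(ord('A') + i)

proc buildIndex(text: string): Table[string, seq[string]] =
  result = initTable[string, seq[string]]()
  for line in text.strip.splitLines:
    let word = line.toUpperAscii
    result.mgetOrPut(word.sortByLetter, @[]).add word

const anas = buildIndex(staticRead("dict.txt"))  # built entirely at compile time

when isMainModule:
  for word in commandLineParams():
    echo word, ": ", anas.getOrDefault(word.toUpperAscii.sortByLetter)


Run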

Personally, I think _all_ of the above are good coding exercises.


Re: Suggestions for optimization?

2020-02-11 Thread juancarlospaco
import strutils  # for strip & splitLines at compile time
const file = staticRead("dict.txt").strip.splitLines


Re: Suggestions for optimization?

2020-02-11 Thread cblake
Besides @SolitudeSF's important suggestion, an algorithmic improvement you can 
use is to replace `algorithm.sorted(lower, system.cmp)` with a hand-written 
"counting sort". For ASCII (i.e. 1 byte textual characters) this is (very?) 
easy and should be quite a bit faster than `algorithm.sorted`. Something like 
this:


proc sortByLetter(word: string): string =
  var cnt: array[26, int8]
  for ch in word:   #simple counting sort
    cnt[ord(ch) - ord('A')].inc
  for i, c in cnt:
    for n in 0 ..< c:
      result.add chr(ord('A') + i)


Run

You probably need a `toUpper` in there if your dictionary is stored in 
lowercase (although that could also be done prior to entry to the above `proc`, 
or you could just change `'A'` to `'a'`). That `int8` type covers up to 127 
repeats of any single letter, and I think the longest real word in any spoken 
language's dictionary is shorter than that. You could always count and print a 
warning, or switch to `int16` or `int32`, which could help if chemical names or 
other really crazy strings are possibly in play.

The same idea could be adapted to Unicode, but then you would probably want a 
`Table`, not an `array`, which would slow it down more than the above minor 
modifications.
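
A rough sketch of that `Table`-based adaptation, using a `CountTable` over 
runes (the shape is mine, not from the thread):


import tables, unicode, algorithm, sequtils

proc signatureUni(word: string): string =
  var cnt = initCountTable[Rune]()
  for r in word.toLower.runes: cnt.inc r        # count each code point
  var keys = toSeq(cnt.keys)
  keys.sort(proc (a, b: Rune): int = cmp(a.int32, b.int32))
  for r in keys:                                # emit runes in code-point order
    for n in 0 ..< cnt[r]: result.add r.toUTF8


Run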

I actually think this is a good problem context in which to introduce the 
oft-neglected counting sort, which (unlike here) can also be used with 
non-empty satellite data, at slightly increased code baggage.
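
To illustrate that satellite-data variant (a generic sketch, not 
anagram-specific): a stable counting sort over records keyed by a small integer 
in `0 ..< keyRange`, using prefix sums so each record moves exactly once:


proc countingSortBy[T](a: openArray[T], key: proc (x: T): int,
                       keyRange: int): seq[T] =
  var cnt = newSeq[int](keyRange + 1)
  for x in a: inc cnt[key(x) + 1]               # histogram, shifted by one
  for k in 1 .. keyRange: cnt[k] += cnt[k - 1]  # prefix sums = start offsets
  result = newSeq[T](a.len)
  for x in a:                                   # stable: records keep scan order
    result[cnt[key(x)]] = x
    inc cnt[key(x)]

# e.g. countingSortBy(words, proc (w: string): int = w.len, 64) orders words by
# length while carrying the words themselves as satellite data.


Run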


Re: Suggestions for optimization?

2020-02-11 Thread SolitudeSF
use -d:danger if you need performance


Re: Suggestions for optimization?

2020-02-10 Thread lagerratrobe
Oh! strtabs would have saved some time. Still, it was good to understand how to 
use the core features.

Thank you for the feedback.


Re: Suggestions for optimization?

2020-02-10 Thread Hlaaftana
Tiny optimization to create less copies:


proc makeSignature(word: string): string =
  let ascii = unidecode(word)
  result = strutils.toLowerAscii(ascii)
  result.sort(system.cmp) # or algorithm.sort(result, system.cmp)


Run

Note that there is also the [strtabs](https://nim-lang.org/docs/strtabs.html) 
module for efficient tables of strings to strings and case insensitivity.


Re: Suggestions for optimization?

2020-02-10 Thread lagerratrobe
Answering my own question on this. Using the `-d:release` compile option made 
all the difference. With that, I was able to reduce the time to:


real    0m0.286s
user    0m0.269s
sys     0m0.012s


Run


Suggestions for optimization?

2020-02-10 Thread lagerratrobe
Hi All,

I read about Nim for the 1st time last week and decided to make something over 
the weekend. I was pleasantly surprised that I was able to cobble something 
together that did what I wanted it to do, but am a bit surprised that it's not 
faster. Maybe some of you could give some suggestions on how to improve 
performance?

The goal was to write a program that takes a word as input from the user, reads 
a local file of dictionary words, and returns all of the words in the 
dictionary that are anagrams of the user’s original word.

The code I came up with is this.


# anagram_redux.nim

#  Imports #
import strutils
import algorithm
import os
import strformat
import tables
import unidecode

#  Converts a word into its lower-case, sorted "signature" ---#
proc makeSignature(word: string): string =
  let ascii = unidecode(word)
  let lower = strutils.toLowerAscii(ascii)
  let sorted_word = algorithm.sorted(lower, system.cmp)
  let signature = sorted_word.join()
  result = signature

#  Main body of program ---#
proc main =
  # -- Parse the lookup word from the argument passed in
  if os.paramCount() < 1:
  quit("Usage: anagram_redux ")
  let lookup_word = os.paramStr(1)
  let lookup_signature = makeSignature(lookup_word)
  echo strformat.fmt("Looking up '{lookup_word}'")
  
  # -- Create empty table to store results in
  var anagram_table = tables.initTable[string, seq[string]]()
  
  # -- Open the text file and process it
  let file = open("dict.txt")
  
  for word in file.lines():
let signature = makeSignature(word)
if anagram_table.hasKey(signature):
  anagram_table[signature].add(word)
else:
  anagram_table[signature] = @[word]
  
  # -- Lookup user's word via its signature in the anagram_table
  if anagram_table[lookup_signature].len == 1:
echo strformat.fmt("'{lookup_word}' does not have any anagrams")
  else:
echo anagram_table[lookup_signature]

main()


Run

The "dict.txt" file with dictionary words is on github at 
[https://github.com/lagerratrobe/nim_dev](https://github.com/lagerratrobe/nim_dev).
 For comparison, here are the performance numbers I get from this and similar 
implementations in Python and R.

  * Nim unoptimized




real    0m1.809s
user    0m1.782s
sys     0m0.012s


Run

  * Nim --opt:speed




real    0m0.599s
user    0m0.575s
sys     0m0.016s


Run

  * Python




real    0m0.348s
user    0m0.325s
sys     0m0.020s


Run

  * R




  user  system elapsed
  1.773   0.020   1.803


Run