On 8/3/16, 7:32 PM, "bilbosax" <waspenc...@comcast.net> wrote:
>True, I can live with the speed that it is currently running. I was
>willing to construct 4 workers in hopes of getting it down to 15 minutes.
>To be able to have it running in my main application in under three
>minutes is a dream. Now it is just a fun academic exercise for me.

Well, if we're being academic, it may be that this is more of a sorting
problem than a comparison problem. While changing the looping as I
suggested in my previous post can turn an n^2 algorithm into an n log n
algorithm, hashing might make the algorithm linear.

One thing that might matter is how many matches you expect to find in the
database. If there are always going to be relatively few matches (the
other extreme would be all 38K items having the same 8 properties), then
hashing into buckets might be an alternative implementation. Then the
algorithm is more like:

    var hashTable:Object = {};
    for (var i:int = 0; i < n; i++)
    {
        var hash:String = computeHash(items[i]);
        // if no entry in table, create an array to hold items
        if (hashTable[hash] == null)
            hashTable[hash] = [];
        hashTable[hash].push(items[i]);
    }

The hash has to be designed so that items that match on all 8 properties
have the same hash value. Then, after just one loop, the hashTable has
buckets of matching items, and you can scan the table and run the math:

    // for each..in iterates over the values (the buckets), not the keys
    for each (var matches:Array in hashTable)
    {
        if (matches.length > 1)
        {
            runMathOnMatches(matches);
        }
    }

The hash could be a simple concatenation of the strings of all 8
properties, as long as you are sure there won't be inadvertent collisions.
For example, if one property is State and the other is City, you can
compute a hash of State+City. Such a hash could produce long strings, so
hash computation and lookup won't be optimal. If you really want to
squeeze a few more cycles out of it, you could consider encoding the
properties or running better hash algorithms to generate numeric hashes or
other shorter hashes.

Food for thought,
-Alex
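
A minimal sketch of the bucketing idea above, written in TypeScript rather than ActionScript so it can be run outside Flash. The Item shape, the property names "state" and "city", and the groupMatches helper are assumptions made up for illustration; nothing here comes from the actual application. The one deliberate change is the key: plain concatenation lets "AB"+"C" and "A"+"BC" collide, so this version serializes the values with JSON.stringify, which keeps the field boundaries intact.

```typescript
// Hypothetical record shape: the thread only says each item has 8
// properties that must all match, so the field names are invented.
type Item = { [property: string]: string | number };

// Build a bucket key from the properties that define a "match".
// JSON.stringify quotes and escapes each value, so ["AB","C"] and
// ["A","BC"] produce different keys, unlike raw concatenation.
function computeHash(item: Item, keys: string[]): string {
  return JSON.stringify(keys.map((k) => item[k]));
}

// One pass to fill the buckets, then keep only buckets with 2+ items.
function groupMatches(items: Item[], keys: string[]): Item[][] {
  const table: { [hash: string]: Item[] } = {};
  for (const item of items) {
    const hash = computeHash(item, keys);
    const bucket = table[hash];
    if (bucket === undefined) {
      // No entry in the table yet: start a new bucket with this item.
      table[hash] = [item];
    } else {
      bucket.push(item);
    }
  }
  return Object.keys(table)
    .map((hash) => table[hash])
    .filter((bucket) => bucket.length > 1);
}

// Usage: two of the three sample items share both properties.
const sample: Item[] = [
  { state: "AB", city: "C" },
  { state: "A", city: "BC" },
  { state: "AB", city: "C" },
];
console.log(groupMatches(sample, ["state", "city"]).length); // 1
```

Each returned group would then be handed to whatever runMathOnMatches does. The JSON key is a bit longer than a raw concatenation, so it trades a little speed for not having to worry about collisions.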