Malte,

Here is a quick example of what binary search can do. It's a bit of work to maintain the sorted data, but the speed is impressive if you have very large data. If you are not familiar with binary search (forgive me if you are), it is very intuitive -- it searches like you might search an alphabetized phonebook. Look at the middle page, and determine if the name is after or before the page you are on. At each step you cut the search in half by looking at the "middle" page of the remaining section and deciding which side to look on next. Doing the simple math, you can see that you can search 2^n records in n steps. Suppose you have 1,000,000 records. A linear search will take 1,000,000 steps. Binary will take 20!

There are two caveats, though:

1) To do a binary search, you must have your data sorted first
2) Even with binary search you will still need to read all of the matching rows after you have found one. So if all of your records are the same, you lose some benefit.

This sample creates 500,000 rows with 1000 different names.

Whether this is worth doing is going to depend very much on your application, but the times I get for this data set are roughly:

BINARY SEARCH:

Sort time (upfront cost): 2.5 seconds
Search time: <= 1 millisecond

ORIGINAL METHOD:

Sort time (upfront cost): 0
Search time: 300 milliseconds

global sortedData

on mouseUp

  ## prep the test data
  local testarray,tprocess,test
  repeat with i=1 to 500000
     put "bob"&random(1000) into testarray[i]["name"]
  end repeat
  answer the number of lines of the keys of testarray

  ## create a sorted version of the data
  put the millisecs into test
  put the keys of testarray into tprocess
  repeat for each line theLine in tprocess
     put testarray[theLine]["name"]&cr after temp
  end repeat
  delete char -1 of temp
  sort lines of temp
  put 1 into i
  put empty into sortedData
  repeat for each line l in temp
     put l into sortedData[i]
     add 1 to i
  end repeat
  put i-1 into sortedData[0]
  answer ("data sorted")&cr&(the millisecs-test)

  ## lookup bob100
  put the millisecs into test
  get count("bob100")
  answer ("found="&it&&(the millisecs-test))
end mouseUp

## calls binary search algorithm and then scans before and after found record
function count pName
   put nameOffset(pName) into startLine

   if (startLine < 0) then return 0

   repeat with i=startLine down to 1
      if (sortedData[i] = pName) then
         add 1 to tCount
      else
         exit repeat
      end if
   end repeat

   repeat with i=startLine+1 to sortedData[0]
      if (sortedData[i] = pName) then
         add 1 to tCount
      else
         exit repeat
      end if
   end repeat

   return tCount
end count

## binary search
function nameOffset pName,startKey, endKey
   if (startKey is empty) then put 1 into startKey
   if (endKey is empty) then put sortedData[0] into endKey

   if (startKey > endKey) then return -1

   put ((startKey + endKey) div 2) into tIndex
   put sortedData[tIndex] into tName

   if (pName > tName) then put tIndex + 1 into startKey
   else if (pName < tName) then put tIndex - 1 into endKey
   else if (pName = tName) then return tIndex

   return nameOffset(pName, startKey, endKey)
end nameOffset

Thank you all.

I ended up creating a new array anduse repeat for each key.

Brian, I would be very much interested in the binary method on sorted data.

Cheers,

Malte



_______________________________________________
use-revolution mailing list
[email protected]
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Reply via email to