Lynch, Jonathan wrote:
Say you had a custom property set - "theProps"
And the keys of this set were 1 through 500,000

So, basically, you had an array with half a million elements stored as a
custom property set. Then, you wanted to search that array, and do it in
such a way that the search returned the name and content of the first
custom property that contains the item for which you search.

Would the following method be fastest?

  set the customPropertySet of <myObject> to "theProps"
  put 0 into Z
  repeat for each element E in the customProperties of <myObject>
    add 1 to Z
    if E contains <searchTerm> then exit repeat
  end repeat
  put Z && the theProps[Z] of <myObject> into field "feedback"


My questions are these: 1) Is there a better or faster-access way to store the array than as a custom property set?

Using "repeat for each line" on a string benchmarked about 20% faster here than using "repeat for each element" on an array.


I only tested on 40,000 lines, though. In general "repeat for each..." scales well, so I would feel fairly confident extrapolating my results to larger data sets.
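As a rough sketch of the sort of thing I benchmarked -- assuming the data has first been combined into a return-delimited string (here called theData, with hypothetical variable names throughout) -- the line-based equivalent of your search looks something like:

  -- minimal sketch: walk the lines rather than the array elements;
  -- theData holds one record per line, tSearchTerm is the target
  put 0 into tLineNum
  repeat for each line tLine in theData
    add 1 to tLineNum
    if tLine contains tSearchTerm then
      put tLineNum && tLine into field "feedback"
      exit repeat
    end if
  end repeat

Since "repeat for each" hands you each line as it goes, tLineNum is just a counter you maintain yourself -- the engine never has to count lines.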

2) Is there a search method that is faster than doing all those
comparisons in Transcript? For example, lineOffset and itemOffset are
supposed to be very fast. Is it possible to use itemOffset on an array,
or is there anything that works the way an elementOffset command would,
if it existed?

3) Would it be faster to combine the array and use itemOffset?

Offset can rip through large blocks of text very fast, but it's not very precise.


For example, in my case I had to do comparisons on specific items within a line. LineOffset will get you to that line, but won't tell you where within that line it is.

With a low number of hits, lineOffset can be faster at finding the line, and then you could evaluate specific items within it if needed.

But for larger numbers of hits it should be slower, since once you find the line you still need to "get line x", which requires the engine to count lines from the top of the data each time. That repeated counting is the bottleneck.

So for my purposes, using "repeat for each line" gave me a consistently scalable solution which allowed me to query any items within a line without ever having to count lines. So lazy person that I am, I stopped there and moved on to other things. :)
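For instance -- with hypothetical variable names, and assuming tab-delimited records where the field of interest happens to be item 3 -- testing a specific item as you walk the lines means the engine never counts lines at all:

  -- sketch: compare a specific item of each record while walking
  -- the lines; tData and tSearchTerm are hypothetical names
  set the itemDelimiter to tab
  repeat for each line tLine in tData
    if item 3 of tLine is tSearchTerm then
      put tLine into tMatch
      exit repeat
    end if
  end repeat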

4) Could filter be made to work in this situation, or would it only give
the value of the element, but not the name of the element?

Filter may benchmark the fastest if you're looking for an item anywhere in a line (never tried it myself, since I need to find matches for specific items within lines, but worth testing).
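An untested sketch of how filter might be tried here, again with hypothetical variable names -- note that it hands back only the matching lines, not their positions or the names of the elements they came from:

  -- filter keeps only the lines matching the wildcard pattern;
  -- you get the values, but the element names are lost
  put tData into tMatches
  filter tMatches with "*" & tSearchTerm & "*"

So if you need the name of the custom property as well as its content, you would still have to map each surviving line back to its key somehow.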


5) Anything faster that I am not thinking of?

Probably. Search algorithms are a deep subject, and there's always one more clever way to solve a given problem.


I stopped benchmarking these things once I found that "repeat for each line" was coming out okay. Because I have so many comparisons to perform on specific items within a line, it gave me a robust (though admittedly brute-force) solution with acceptable performance on data sets larger than my app's audience will need in real-world use.

But with half a million records it raises the question: why not consider a database, where searching is done by compiled code optimized by people who specialize in such things?

--
 Richard Gaskin
 Fourth World Media Corporation
 __________________________________________________
 Rev tools and more: http://www.fourthworld.com/rev
_______________________________________________
use-revolution mailing list
[email protected]
http://lists.runrev.com/mailman/listinfo/use-revolution
