Re: Filtering algorithm strategies to mimic intellisense

Michael Minutillo Mon, 17 May 2010 04:33:34 -0700

I agree. Possible but probably not a good idea.

If speed is more important than space I'd consider the trie approach because
you can get the list of terms that contain an N character substring in N
steps. That's true even if you have 1,000,000 x N items in the list. If the
list is particularly volatile then a tiered cache approach is probably
better. If the end user is likely to be "IntelliSensing" a small subset of
common terms then use a scan but cache the 1000 most common terms in a trie
or something. I realize that IntelliSense in VS2010 doesn't search every
possible combination but I'd guess that any algorithm that does is likely to
pretty close to the "just scan the whole list" approach in performance.


I guess the final answer is: It depends ;)

On Mon, May 17, 2010 at 6:24 PM, Bill McCarthy <[email protected]> wrote:

> Hi Michael,
>
> Doing an index for every possible substring would be slow and add a large
> overhead: it would generally be unsuitable for substring matching. In
> Visual
> Studio 2010 intellisense only matches based on starts with OR pascal cased
> words inside the string; so that does make it more viable for indexing the
> same string multiple times based on words inside the string .
> If the requirement is exactly the same as 2010, then a tree would be a
> better approach as you could index each Word in each string based purely on
> Case. If the requirement is a case insensitive substring search, then a
> tree
> is likely to be much slower and have a large memory overhead.  For typical
> business applications the "Like" matching via SQL, or a case insensitive
> sub
> string is what is usually wanted, definitely not a case sensitive.
>
>
> |-----Original Message-----
> |From: [email protected] [mailto:ozdotnet-
> |[email protected]] On Behalf Of Michael Minutillo
> |Sent: Monday, 17 May 2010 7:07 PM
> |To: ozDotNet
> |Subject: Re: Filtering algorithm strategies to mimic intellisense
> |
> |This is how it works and I'd bet that the IntelliSense index is based on
> some kind
> |of trie (or a http://en.wikipedia.org/wiki/Radix_tree) built from this
> data. You
> |can still do a "contains" search with a trie but you'd have to index every
> |possible substring (which could become a time/space concern): i.e.
> |
> |void AddString(string newString) {
> |  AddCore(newString); // Actually adds the string to the trie
> |  if(newString.Length > 1)
> |    AddString(newString.SubString(1));
> |}
> |
> |On Mon, May 17, 2010 at 10:04 AM, Matt Siebert <[email protected]>
> |wrote:
> |
> |
> |       I think VS2010 uses more than just simple 'contains' logic for
> matching
> |substrings.  It distinguishes between upper and lower case chars to
> delimit
> |substrings.
> |
> |       For example, suppose I have a variable named "fooBar", then
> |intellisense filters nicely when I type "foo" or "bar".  "FB" kind of
> works
> but
> |doesn't actually select "fooBar".  "fBar" and "ooBar" don't seem to do
> anything.
> |
> |       I suspect its done this way so intellisense can break up the
> identifiers
> |into reasonable substrings that are likely to be used for lookups.  These
> |substrings can then be used to index the identifiers using an appropriate
> |structure.  Matches on these indexed substrings are then done using
> |StartsWith() instead of Contains(), so if the substrings are sorted then a
> more
> |efficient search can be used.
> |
> |       On Mon, May 17, 2010 at 11:13 AM, Winston Pang
> |<[email protected]> wrote:
> |
> |
> |               yeah i did realise it must be a O(n^2) operation.
> |
> |               Still trying to figure out how I can improve it. Since a
> contains
> |operation will naturally be an O(n).
> |
> |               I'm wondering if there's other forms of optimizations i can
> do as
> |well.
> |
> |
> |               On Mon, May 17, 2010 at 10:35 AM, Mitch Wheat
> |<[email protected]> wrote:
> |
> |
> |                       Sorry, I meant order O(N) for the linear search.
> |
> |
> |
> |                       If you search degrades rapidly as the list
> increases, that
> |doesn't sound like O(N) performance, but more like O(N^2).
> |
> |
> |
> |                       Mitch
> |
> |
> |
> |                       From: [email protected]
> [mailto:ozdotnet-
> |[email protected]] On Behalf Of Mitch Wheat
> |                       Sent: Monday, 17 May 2010 8:33 AM
> |                       To: 'ozDotNet'
> |                       Subject: RE: Filtering algorithm strategies to
> mimic
> |intellisense
> |
> |
> |
> |                       Performing a linear search is O(N^2), so it will
> perform
> |poorly as the list grows, as you have observed.
> |
> |
> |
> |                       Either keep the list sorted and use binary search
> O(log N)
> |or use some other standard data structure such as a tree.
> |
> |
> |
> |                       Mitch
> |
> |
> |
> |                       From: [email protected]
> [mailto:ozdotnet-
> |[email protected]] On Behalf Of Winston Pang
> |                       Sent: Monday, 17 May 2010 7:41 AM
> |                       To: ozDotNet
> |                       Subject: Filtering algorithm strategies to mimic
> |intellisense
> |
> |
> |
> |                       Hey everyone,
> |
> |                       So I'm building a intellisense like autocomplete.
> I've
> |stumbled on some perf issues because its literally just iterating over the
> list and
> |re-applying a filter function on every item, based on every key stroke.
> The
> perf
> |degrades obviously as more times is in the list.
> |
> |                       I was wondering, does anyone have any strategies or
> past
> |expeirence with smarter ways to filter a list. Now this list is just a
> list
> of a
> |custom objects, with a particular display field that is a string.
> Filtering
> is filtered
> |based on a contains for each item. So it's essentially mimicing VS2010's
> |contains for intellisense.
> |
> |                       The VS2010 intellisense seems to be extremely
> speedy.
> |
> |                       Cheers,
> |
> |
> |                       Winston
> |
> |
> |
> |
> |
> |
> |--
> |Michael M. Minutillo
> |Indiscriminate Information Sponge
> |Blog: http://wolfbyte-net.blogspot.com
>
>
>


-- 
Michael M. Minutillo
Indiscriminate Information Sponge
Blog: http://wolfbyte-net.blogspot.com

Re: Filtering algorithm strategies to mimic intellisense

Reply via email to