As I research performance and scalability problems, I often run into related UX problems, such as unexpected or less-than-useful search results. For the most part, fixing those seems uncontroversial. In one case, though, a change I've made in local CalCentral code eliminates a behavior that has been discussed on-list.
The behavior is that user-entered text is automatically treated as a substring that may be embedded anywhere inside a word, as when playing Scrabble or solving a crossword puzzle. (This differs from the familiar auto-complete style of search, which matches the *beginning* of key words, and from Solr/Lucene's smart fuzzy searches.) Implicit embedded-string search used to be implemented by automatically surrounding user-entered text with wildcards. It is now implemented via a massive, automatically generated NGram index.

What do I mean by "massive"? On our not-all-that-big CalCentral pilot, when I drop automatic embedded-string search and restrict auto-complete search to match user expectations, our Solr index storage drops from 905.3 MB to 306.9 MB, roughly a two-thirds reduction.

However, my motive for making the change was usability. At least at an institution the size of UC Berkeley, the results are just too noisy. [1]

The last few times the "implicit wildcards" feature was brought up on list, it was described in historical terms rather than as user stories. And there's evidence in the code base that at least a few developers brought over invalid assumptions from SQL-based search.

Does anyone recall the actual requirement here? There may be a better way to meet it. And if not, I'm certain we can at least reduce the worst side effects of the current approach.

Best,
Ray

[1] https://jira.sakaiproject.org/browse/KERN-2806
    https://jira.sakaiproject.org/browse/SAKIII-5611
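For readers less familiar with the two index styles, the Python sketch below roughly simulates what an infix NGram analyzer and an edge (prefix) NGram analyzer emit for each indexed word. It shows why infix indexing both inflates the index and returns noisier matches. The gram sizes (2 to 10), the function names, and the sample names are hypothetical illustrations, not CalCentral's actual Solr schema. In Solr itself this work is done by analysis filters such as NGramFilterFactory and EdgeNGramFilterFactory, not by code like this.

```python
"""Compare infix NGram tokens with edge (prefix) NGram tokens.

Hypothetical illustration: the gram sizes and sample names below are
assumptions, not CalCentral's actual Solr configuration.
"""

MIN_GRAM = 2   # hypothetical minimum gram length
MAX_GRAM = 10  # hypothetical maximum gram length


def ngrams(word, min_gram=MIN_GRAM, max_gram=MAX_GRAM):
    """Every substring of length min_gram..max_gram (supports infix matching)."""
    grams = []
    for n in range(min_gram, max_gram + 1):
        for start in range(len(word) - n + 1):
            grams.append(word[start:start + n])
    return grams


def edge_ngrams(word, min_gram=MIN_GRAM, max_gram=MAX_GRAM):
    """Only the prefixes of length min_gram..max_gram (supports auto-complete)."""
    return [word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)]


def matches(query, words, analyzer):
    """Words whose indexed grams include the (un-analyzed) query string."""
    return [w for w in words if query in set(analyzer(w))]


NAMES = ["ann", "joanne", "hannah", "annette", "shannon"]  # hypothetical sample data

if __name__ == "__main__":
    print(len(ngrams("berkeley")), len(edge_ngrams("berkeley")))  # 28 7
    print(matches("ann", NAMES, ngrams))       # all five names
    print(matches("ann", NAMES, edge_ngrams))  # ['ann', 'annette']
```

For the 8-letter word "berkeley", the infix analyzer emits 28 grams versus 7 prefixes. The query "ann" matches every sample name under infix matching, but only "ann" and "annette" under prefix matching. The per-word gram count is only a rough proxy for on-disk index size, but it shows in miniature the same growth-versus-noise trade-off described above.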
