As I research performance-and-scalability problems, I often find related 
UX problems such as unexpected or less-than-useful results. For the most 
part, those seem pretty uncontroversial. In one case, though, a change 
I've made in local CalCentral code eliminates behavior that's been 
discussed on-list.

The behavior is for user-entered text to be automatically treated as a 
substring embedded anywhere inside a word, as when playing Scrabble or 
solving a crossword puzzle. (This differs from the familiar 
auto-complete style of search, which matches the *beginning* of 
keywords, and from Solr/Lucene's smart-fuzzy searches.) Implicit 
embedded-string search used to be implemented by automatically 
surrounding user-entered text with wildcards; now it's implemented via 
a massive automatically generated NGram index.
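
To make the distinction concrete, here is a minimal Python sketch of the 
two matching styles. It is not the CalCentral or Solr code: the helper 
functions and the sample names are hypothetical, and real Solr analyzers 
apply configurable minimum and maximum gram sizes that this toy ignores.

```python
def ngrams(word):
    """Every contiguous substring of word (what an NGram index stores)."""
    return {word[i:i + n]
            for n in range(1, len(word) + 1)
            for i in range(len(word) - n + 1)}


def edge_ngrams(word):
    """Only the prefixes of word (what auto-complete needs)."""
    return {word[:n] for n in range(1, len(word) + 1)}


def search(gram_fn, names, query):
    """Return the names with at least one token whose grams include query."""
    q = query.lower()
    return [name for name in names
            if any(q in gram_fn(token) for token in name.lower().split())]


# Hypothetical directory entries, for illustration only.
names = ["Ray Davis", "Murray Hill", "Tracy Brayton", "Rayna Ortiz"]

embedded = search(ngrams, names, "ray")      # embedded-string matching
prefix = search(edge_ngrams, names, "ray")   # auto-complete matching

print(embedded)
print(prefix)
```

With these sample names, the embedded-string search returns all four 
entries (including "Murray" and "Brayton"), while the prefix search 
returns only "Ray Davis" and "Rayna Ortiz".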

What do I mean by "massive"? On our not-all-that-big CalCentral pilot, 
when I drop automatic embedded-string search and restrict auto-complete 
search to match user expectations, our Solr index storage drops from 
905.3 MB to 306.9 MB (roughly a two-thirds reduction).
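
As a rough illustration of why a full NGram index grows so quickly, the 
following Python sketch counts the grams generated for a single word. 
The gram-size bounds are assumptions (the pilot's actual analyzer 
settings aren't stated here), and the counts are not meant to predict 
the storage figures above.

```python
def ngram_count(length, min_len=1, max_len=None):
    """Number of gram positions for all substrings of a word of `length`."""
    max_len = length if max_len is None else min(max_len, length)
    return sum(length - n + 1 for n in range(min_len, max_len + 1))


def edge_ngram_count(length, min_len=1, max_len=None):
    """Number of prefixes of a word of `length` within the size bounds."""
    max_len = length if max_len is None else min(max_len, length)
    return max(0, max_len - min_len + 1)


for length in (5, 10, 20):
    print(length, ngram_count(length), edge_ngram_count(length))
```

For an unbounded gram size, the full count grows quadratically with word 
length (n(n+1)/2), whereas the prefix-only count grows linearly.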

However, my motive for making the change was usability. At least for an 
institution the size of UC Berkeley, the results are just too noisy. [1]

The last few times the "implicit wildcards" feature was brought up 
on-list, it was described in historical terms rather than as user 
stories. And there's evidence in the code base that at least a few 
developers brought over invalid assumptions from SQL-based search. Does 
anyone recall the actual requirement here? There may be a better way to 
meet it. And if not, I'm certain we can at least reduce the worst side 
effects of the current approach.

Best,
Ray

[1] https://jira.sakaiproject.org/browse/KERN-2806
     https://jira.sakaiproject.org/browse/SAKIII-5611
_______________________________________________
oae-dev mailing list
[email protected]
http://collab.sakaiproject.org/mailman/listinfo/oae-dev
