Re: [SlimDevices: Plugins] Full-text search missed results

Michael Herger Thu, 22 Feb 2018 00:21:56 -0800

- There are three searches logged for each "1999" and "2000" while I
   only searched once.

As this is the live search, the search action isn't triggered when you hit the enter key, but on a timer. It can happen that a search is triggered more than once under certain circumstances. Which explains why performance is crucial for this search action. See below.
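To illustrate how a timer-driven live search can fire more than once for what feels like a single search, here is a minimal sketch. It assumes a simple restartable timer: every keystroke restarts it, and a search runs whenever the timer expires. The function name, the 500 ms delay and the keystroke timings are all made up for the example; this is not the actual web UI code.

```python
def fired_searches(keystrokes, delay_ms):
    """Simulate a restartable search timer.

    keystrokes: list of (time_ms, text_in_search_box), sorted by time.
    Returns a list of (fire_time_ms, text_searched).
    """
    fired = []
    for i, (t, text) in enumerate(keystrokes):
        is_last = i + 1 == len(keystrokes)
        # The timer started by this keystroke expires unless the next
        # keystroke arrives first and restarts it.
        if is_last or keystrokes[i + 1][0] - t >= delay_ms:
            fired.append((t + delay_ms, text))
    return fired


# Typing "1999" with a short pause after the second digit (hypothetical timings).
events = [(0, "1"), (100, "19"), (900, "199"), (1000, "1999")]
result = fired_searches(events, delay_ms=500)
print(result)
```

With these timings the pause after "19" is long enough for the timer to expire, so a search for the partial term runs before the one for the full term.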

Why do some searches get a "w10"?

The full text index stores multiple sets of data for every item. They are put in different columns, which are later weighted by their importance. w10 has the highest weight, w1 the lowest.

And here comes the lengthy explanation of what I've found investigating this very special case. Most likely (depending on your music collection) this really is a rather special case:


- the keyword is very precise: you know what you expect
- the keyword is very short
- the keyword likely is very popular

Now that popularity thing might be a bit irritating. You probably only have one single track with this name. Why would it be popular? Because we're dealing with a full text index, covering not only titles, but lots of other pieces of information, too. E.g. years, file paths, comments, even MusicBrainz IDs.


Digging into the 99 case in my collection I found a lot of these:

Comment: ExactAudioCopy v0.99pb5

Yep. Or something like that:

UFID: [ http://musicbrainz.org, ebe13618-bbdd-4ef3-9a91-9981602e528f ]

That -9981602e528f at the end would match, too, as our search term is at the start of that "word".
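A rough sketch of why these hidden values match: it assumes the index splits text into "words" at anything that isn't a letter or digit, and that a search term matches any word it is a prefix of. Both the tokenizer and the helper names are assumptions for illustration, not the real indexing code.

```python
import re


def words(text):
    # Assumed tokenizer: lower-case, split at every non-alphanumeric character.
    return re.findall(r"[a-z0-9]+", text.lower())


def prefix_hits(text, term):
    # A word is a hit when the search term is at the start of it.
    term = term.lower()
    return [w for w in words(text) if w.startswith(term)]


comment = "Comment: ExactAudioCopy v0.99pb5"
ufid = "UFID: [ http://musicbrainz.org, ebe13618-bbdd-4ef3-9a91-9981602e528f ]"

print(prefix_hits("99 Luftballons", "99"))
print(prefix_hits(comment, "99"))
print(prefix_hits(ufid, "99"))
```

Under these assumptions the ripper version string and the tail of the MusicBrainz ID each contribute a hit for "99", just like the track title does.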

That would explain the popularity of the search term. But why would an obvious hit not show up, while some obscure, hidden data wins?

Now this is getting complicated. Many factors play a role: optimization for speed (which might penalize this particular case), the nature of full text search indexing not only the obvious data, but anything. And some poor, deliberate choices. And bugs. Wow. Searching for "99" brought quite a few issues to light :-).

So there's some optimization going on because the search needs to be fast. One of these optimizations is to try to limit the result set when we risk dealing with a large number of hits. E.g. short search terms, or single terms. In this case we're limiting the results to hits in the highest priority column only (which explains the "w10:99").
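The restriction could look something like the sketch below. Only the "w10:" column prefix comes from the logged query; the function name, the length threshold for "short" and the exact rule for when to restrict are hypothetical.

```python
def build_match_query(search, short_len=3):
    """Restrict risky searches to the highest priority column (sketch only).

    short_len is a hypothetical threshold for what counts as a short term.
    """
    terms = search.split()
    # Assumed rule: a single term, or any short term, risks a huge result set.
    risky = len(terms) == 1 or any(len(t) <= short_len for t in terms)
    if risky:
        return " ".join(f"w10:{t}" for t in terms)
    return " ".join(terms)


print(build_match_query("99"))
print(build_match_query("luftballons nena"))
```

The trade-off is visible right away: the restricted query is cheaper, but anything stored only in a lower priority column can no longer be found by it.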

If we know that we are still dealing with a large result set (>500 items found), the current implementation would only pick the top 500 items. And that's where I would say there is/was a bug: we pick the top items out of an unordered list... which means that even if the score of "99 Luftballons" was high, it would be cut off if it was far down the "randomly" ordered result list.
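The effect of cutting an unordered list is easy to reproduce. The sketch below uses invented scores and item names, and shows one possible way to avoid the problem (rank by score before cutting), which is not necessarily how the new builds handle it.

```python
import heapq

LIMIT = 500


def top_hits_unordered(hits, limit=LIMIT):
    # Problematic variant: cut the list before it has been ranked.
    return hits[:limit]


def top_hits_ranked(hits, limit=LIMIT):
    # One possible remedy: keep the `limit` best hits by score.
    return heapq.nlargest(limit, hits, key=lambda hit: hit[1])


# 600 low-scoring hits from ripper comments, then the hit we actually want.
hits = [(f"track with EAC comment #{i}", 1) for i in range(600)]
hits.append(("99 Luftballons", 10))

unordered_names = [name for name, _ in top_hits_unordered(hits)]
ranked_names = [name for name, _ in top_hits_ranked(hits)]

print("99 Luftballons" in unordered_names)
print(ranked_names[0])
```

Because the wanted hit sits at position 601 of the unordered list, the first variant drops it no matter how high its score is.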

When the search is being run, it does weigh the results based on the aforementioned columns. If Nena's album had one track called "99 Luftballons", but another album had ten tracks with the EAC version string in the comment, the latter might outweigh Nena, because the track title on an album has weight 5, but the comment has 10x weight 1.
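Spelled out as arithmetic, assuming the score of an album is simply the sum of weight times number of matching entries (the additive formula and the names are my simplification for this example):

```python
WEIGHT_TRACK_TITLE = 5  # a track title match on an album
WEIGHT_COMMENT = 1      # a comment match, lowest priority


def album_score(title_matches, comment_matches):
    # Assumed scoring: every matching entry adds the weight of its column.
    return title_matches * WEIGHT_TRACK_TITLE + comment_matches * WEIGHT_COMMENT


nena = album_score(title_matches=1, comment_matches=0)
other = album_score(title_matches=0, comment_matches=10)

print(nena, other)
```

One match at weight 5 gives 5, ten matches at weight 1 give 10, so the album that only matches in its comments ends up ranked above the obvious hit.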

This is where a stupid decision kicks in: for whatever reason I decided it was a good idea to put the MusicBrainz IDs in w10. Sure, it's a unique value for every item. But nothing else should have them, right? Therefore they should always bring up exactly one track, even if the value is stored in the lowest priority column.

New builds are due out in a bit. Unfortunately my shiny new build system still isn't installed in a decent place. Therefore I have to upload from behind this super slow 10Mb connection... So please be patient.


Thanks for an interesting test/edge case! :-)

--

Michael
_______________________________________________
plugins mailing list
[email protected]
http://lists.slimdevices.com/mailman/listinfo/plugins
