In a twist of irony, this issue was actually caused by a patch I wrote <https://gerrit.wikimedia.org/r/#/c/207727/> to fix an annoying little bug <https://phabricator.wikimedia.org/T96944> in the app where the namespace of some pages was being set to null when they were saved to the user's storage.
You can see in the changes I made to the persistence helper <https://gerrit.wikimedia.org/r/#/c/207727/3/wikipedia/src/main/java/org/wikipedia/history/HistoryEntryPersistenceHelper.java> that I took the column that was the timestamp and used it for the namespace instead. This was my first change to the database layer of the app, and I didn't quite realise the ramifications of doing what I did. Since Dmitry's fix <https://gerrit.wikimedia.org/r/#/c/228766/> noted that it was silly to ever use column indices rather than looking them up by name, I don't feel *too* bad about it.. ;-)

99 little bugs in the code, 99 little bugs,
take one down, patch it around,
127 little bugs in the code.

Dan

On 2 August 2015 at 17:14, Oliver Keyes <[email protected]> wrote:

> Hey all,
>
> This Friday, Trey Jones (our awesome Relevance Engineer) and I spent
> some time playing detective with the sampled request logs and a list
> of the most common queries resulting in zero results. We found a lot
> of interesting things. In particular:
>
> 1. A common pattern in which queries, for no particular reason, had a
> UNIX timestamp preceding them (example: "1436336857594:2019 FIFA
> Women's World Cup"). This is responsible, on its own, for 3% of zero
> results queries - and it appears to be caused by the Wikimedia Apps.
> 2. A search for strings in quotes followed by 'film' (example:
> "\"Seventh Son\" film"). This is caused by a media player and is
> responsible for around 0.5% of zero results queries.
> 3. A search for "quot" strings (example: " quot James Tree quot").
> This is from the National Library of Australia and is again around
> 0.5% of zero results queries.
> 4. A search for a page title and the name of a page that appears as a
> link within that page (example: "\"2C-T-19\" AND \"JWH-081\""). This
> is about 6% of queries and appears to come from a German IP address.
> We're unaware of who this person is or what they're trying, so if
> anyone knows what on earth this is, we'd appreciate the hint ;).
>
> https://phabricator.wikimedia.org/T107724 is a card representing the
> need to reach out to these people, where possible (obviously this will
> be easier for the app team than anyone else ;p). If we can get all of
> these solved for, we could drop the zero results rate for full text by
> about 10% Obviously cutting /all/ of it out is improbable, but we're
> hopeful that we can drop this number and get a better understanding of
> what third-party users are trying to achieve, to boot.
>
> --
> Oliver Keyes
> Count Logula
> Wikimedia Foundation
>
> _______________________________________________
> Wikimedia-search mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikimedia-search
>

--
Dan Garry
Lead Product Manager, Discovery
Wikimedia Foundation
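
For anyone curious about the index-versus-name point, here is a minimal sketch of how reading a column by position can go wrong. It is written in Python with the standard sqlite3 module rather than the app's Android Java, and the table layout, column names and helper functions are all hypothetical, not taken from the actual patches. It assumes the namespace is joined to the title with a colon, which would be consistent with the timestamp-prefixed queries described in the quoted message.

```python
import sqlite3


def open_history_db():
    """Create an in-memory table with a hypothetical history layout."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # rows allow lookup by index and by name
    conn.execute(
        "CREATE TABLE history "
        "(site TEXT, title TEXT, timestamp INTEGER, namespace TEXT)"
    )
    conn.execute(
        "INSERT INTO history VALUES (?, ?, ?, ?)",
        ("en.wikipedia.org", "2019 FIFA Women's World Cup", 1436336857594, None),
    )
    return conn


def prefixed_title(namespace, title):
    """Join namespace and title with a colon; no prefix for a null namespace."""
    return title if namespace is None else f"{namespace}:{title}"


def title_by_index(row):
    # Fragile: assumes the namespace is the third column, but in this
    # layout position 2 actually holds the timestamp.
    return prefixed_title(row[2], row["title"])


def title_by_name(row):
    # Robust: the column is looked up by name, so its position is irrelevant.
    return prefixed_title(row["namespace"], row["title"])


conn = open_history_db()
row = conn.execute("SELECT * FROM history").fetchone()
by_index = title_by_index(row)
by_name = title_by_name(row)
print(by_index)  # 1436336857594:2019 FIFA Women's World Cup
print(by_name)   # 2019 FIFA Women's World Cup
```

On Android, the rough equivalent of the name-based lookup would be resolving the position with Cursor.getColumnIndexOrThrow() before reading the value, instead of hard-coding the index.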
