Hello again,
Below you'll find the weekly update from the Search Platform team for the
week starting 2018-01-29.

== Highlights ==
* Several members of the Search Platform team participated in the annual
Developer Summit held January 22-23, 2018 in Berkeley, California. [0]
* During the WMF's two-day All Hands event, the Search Platform team met
with the WMDE team and the Multimedia team to talk about the future of
search and how the teams can work together on Structured Data on Commons.
* The Search Platform team held their team offsite on the two days after
the All Hands meetings, with lots of great conversations about the future
of search.
* Q3 Goals can be found on MediaWiki.org [1]

== Discussions ==

=== Search ===
* Chris Schilling opened a Phab ticket (T185721) on the difficulties of
searching in the Khmer script. [2] [3] The Unicode encoding for the script
uses many diacritics, and (with proper font support) the same glyph can be
properly written with the underlying Unicode characters in any of several
different orders, which complicates searching. If you are interested in
learning more—or if you have any experience with computing in Khmer—please
check out the Phab ticket.
* Erik wrote a new utility script that reads in the Spark DataFrame and
writes binary XGBoost datasets to HDFS, as part of switching Mjolnir to
file-based training [4]
* Gehel cleaned up multiple definitions of the Logstash endpoint in Puppet /
Hiera so that almost all references to the Logstash host are now
consolidated in a single variable [5]
* Stas added hidden status to category dumps, to be deployed on Feb 7 [6]
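
To illustrate the ordering problem described in the Khmer item above, here is
a minimal sketch using Latin combining marks rather than Khmer ones (an
assumption made for simplicity; it is not taken from the Phab ticket). Two
strings that display identically compare as different until both are put into
a canonical order. The helper name search_key is hypothetical. Whether
standard Unicode normalization is sufficient for Khmer is not something this
sketch claims; a script-specific reordering step may well be needed.

```python
import unicodedata

# The same visible text, "q" with a dot below and a dot above,
# stored with the two combining marks in opposite orders.
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW (canonical combining class 220)
DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE (canonical combining class 230)

a = "q" + DOT_BELOW + DOT_ABOVE
b = "q" + DOT_ABOVE + DOT_BELOW


def search_key(text: str) -> str:
    """Hypothetical helper: NFD puts combining marks into canonical order."""
    return unicodedata.normalize("NFD", text)


# A naive comparison treats the two encodings as different strings.
raw_match = a == b

# After normalization, both encodings produce the same search key.
normalized_match = search_key(a) == search_key(b)

print(raw_match, normalized_match)
```

A search index that stores search_key(text) instead of the raw text would
match both spellings in this example, provided queries are normalized the
same way.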

=== Analysis ===
* Chelsy finished up a draft analysis of an MLR test on Hebrew wiki; it is
now being reviewed by the Search Platform team [7] [8]

=== WDQS ===
* Gehel took on the large-ish task of defining the constraints of the new
WDQS cluster and getting that information on wiki. [9] [10]
* Stas added continuation support for WDQS queries to the MediaWiki API [11]

== Did you know? ==
English has a very large vocabulary—possibly larger than any other
language. [12] In part this is because English likes to "[pursue] other
languages down alleyways to beat them unconscious and rifle their pockets
for new vocabulary," [13] and in part because of its history
[14]—particularly the Norman conquest, which brought in a ruling class that
spoke a dialect of Old French.

An interesting consequence of this history is that English has distinct
words for the animal, "cow," and for the meat of that animal, "beef."
Surprisingly, both come from the same Proto-Indo-European [15] root:
"gʷṓws." [16] The word "cow" derives from Proto-Germanic "kūz," while
"beef" was borrowed from Old French "boef," derived in turn from Latin
"bōs." Both "kūz" and "bōs" come from PIE "gʷṓws," though they obviously
followed very different sound change [17] paths along the way.

Chris Koerner
Community Liaison
Wikimedia Foundation
