On Thu, May 15, 2014 at 10:03 PM, John Mark Vandenberg <jay...@gmail.com> wrote:

> We're getting a long way off topic of the still frame on MOTD, but I
> agree, and wish that the WMF would make this a priority for their
> multimedia and search team.
> Many improvements have been suggested by the community, and both sides
> of the fence have even agreed on some of them, such as clustered
> search results:
>
> https://meta.wikimedia.org/wiki/Controversial_content/Brainstorming#Clustering_for_search_results_on_Commons
> https://bugzilla.wikimedia.org/show_bug.cgi?id=35701

First, as general background, WMF recently started migrating its
search infrastructure over to ElasticSearch. See:

https://www.mediawiki.org/wiki/Search
https://www.mediawiki.org/wiki/Help:CirrusSearch

The new search is available on Commons as a BetaFeature. It's worth
running queries whose results are viewed as problematic through the
new search and comparing the output. For example, the results for
"Asian" are markedly different in the new search.

I would caution against a simplistic characterization of technology as
a solution for what's inherently a complex socio-technical problem.
That was a core issue with the image filter proposal and it's a
similar issue here. If people insist on uploading pictures of
masturbation with toothbrushes, those pictures will come up in
searches. If we insist on not having a distinction between explicit
and non-explicit materials in file metadata, search results won't have
it either. We can point the finger at technology because that's easy,
but it's not magical pixie dust.

To get a feel for ElasticSearch's capabilities, please see the help
page above, as well as the tech talk that Nik gave earlier today on
the subject:
https://www.youtube.com/watch?v=FubXExbAvOA

Capabilities that exist today with the new search include
template-based "boosting" of results, a feature that's already enabled
on Commons and which will boost quality content in search results:
https://commons.wikimedia.org/w/index.php?title=MediaWiki:Cirrussearch-boost-templates&action=edit
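To make the idea of template-based boosting concrete, here is a minimal
Python sketch. It is not CirrusSearch code. The line format
("Template:Name|150%"), the template names, the percentages, and the
assumption that boosts multiply the base relevance score are all
illustrative guesses rather than a description of the actual
implementation.

```python
# Hypothetical sketch of template-based boosting; not CirrusSearch code.
# Template names, percentages, and the line format are assumptions.

def parse_boosts(text):
    """Parse lines of the assumed form 'Template:Name|150%' into multipliers."""
    boosts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "|" not in line:
            continue  # skip blank or malformed lines
        name, _, percent = line.rpartition("|")
        boosts[name.strip()] = float(percent.strip().rstrip("%")) / 100.0
    return boosts


def boosted_score(base_score, templates, boosts):
    """Multiply the base score by the boost of every template on the page."""
    score = base_score
    for template in templates:
        score *= boosts.get(template, 1.0)  # unlisted templates are neutral
    return score


def rank(results, boosts):
    """Sort (title, base_score, templates) tuples by boosted score, best first."""
    return sorted(
        results,
        key=lambda r: boosted_score(r[1], r[2], boosts),
        reverse=True,
    )


boosts = parse_boosts("Template:Featured picture|200%\nTemplate:Quality image|150%")
results = [
    ("File:A.jpg", 1.0, []),
    ("File:B.jpg", 0.75, ["Template:Featured picture"]),
    ("File:C.jpg", 0.5, ["Template:Quality image"]),
]
ranked = [title for title, _, _ in rank(results, boosts)]
```

In this toy data set, File:B.jpg has a lower base score than File:A.jpg
but carries a boosted template, so it ends up ranked first.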

ElasticSearch has support for faceting (see
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets.html
), which might come in handy for creating a breakdown of search
results.
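As a rough illustration of what a facet breakdown means, the following
Python sketch counts how many hits carry each value of a field. It does
not call ElasticSearch; the hit structure, the "categories" field name,
and the file titles are made up for the example.

```python
from collections import Counter

# Hypothetical sketch of a facet breakdown over search hits.
# The hit layout and the "categories" field are assumptions.


def facet_counts(hits, field):
    """Count how many hits carry each value of a multi-valued field."""
    counts = Counter()
    for hit in hits:
        for value in hit.get(field, []):
            counts[value] += 1
    return counts


hits = [
    {"title": "File:Example 1.jpg", "categories": ["Toothbrushes"]},
    {"title": "File:Example 2.jpg", "categories": ["Toothbrushes", "Dental hygiene"]},
    {"title": "File:Example 3.jpg", "categories": ["Dental hygiene"]},
]
counts = facet_counts(hits, "categories")
```

Each key of `counts` would correspond to one facet heading, with the
count shown next to it.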

However, keep in mind that unless you collapse each facet by default,
you're still going to show explicit thumbs -- and collapsing results
by default could compromise usability to an unacceptable degree for
the common use case. The more ambitious suggestions, which take the
full category tree into account, also seem fairly expensive to
implement: ElasticSearch has no awareness of the actual category tree
structure, which is complex to traverse. And a faceted search that
only operates on the specific categories associated with a given file
might not be very useful, because the category structure is so
fine-grained.
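To show why category-tree-aware facets cost more than facets on a
file's own categories, here is a small Python sketch. The parent map
and file titles are hypothetical, and this is plain Python rather than
anything ElasticSearch provides: every hit's categories have to be
expanded to all of their ancestors before counting, which requires a
graph walk that guards against multiple parents and cycles.

```python
from collections import Counter, deque

# Hypothetical child -> parents map. The real Commons category graph is
# far larger, allows multiple parents, and can contain cycles.
PARENTS = {
    "Electric toothbrushes": ["Toothbrushes"],
    "Toothbrushes": ["Dental hygiene", "Brushes"],
    "Dental hygiene": ["Hygiene"],
    "Brushes": ["Tools"],
    "Hygiene": [],
    "Tools": [],
}


def ancestors(category, parents):
    """Return the category plus all of its ancestors (breadth-first)."""
    seen = {category}
    queue = deque([category])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:  # guards against cycles and repeats
                seen.add(parent)
                queue.append(parent)
    return seen


def rolled_up_facets(hits, parents):
    """Count each hit once per category it belongs to, directly or via ancestors."""
    counts = Counter()
    for hit in hits:
        expanded = set()
        for category in hit["categories"]:
            expanded |= ancestors(category, parents)
        counts.update(expanded)
    return counts


hits = [
    {"title": "File:Example 1.jpg", "categories": ["Electric toothbrushes"]},
    {"title": "File:Example 2.jpg", "categories": ["Toothbrushes"]},
]
rolled = rolled_up_facets(hits, PARENTS)
```

With only the files' own categories, these two hits would land in two
separate single-file facets; after the roll-up they share "Toothbrushes"
and everything above it, at the price of one graph traversal per
category per hit.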

I'd encourage Nik and Chad (search engineers) to weigh in here & on
the bug as they see fit, as well as correct me if I'm misrepresenting
anything in the above.

Cheers,
Erik
-- 
Erik Möller
VP of Engineering and Product Development, Wikimedia Foundation

_______________________________________________
Wikimedia-l mailing list, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, 
<mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe>
