Hello!

Please join Nuria Ruiz and Andrew Otto next Tuesday, July 15th at 10am SF
time/5pm UTC
<http://www.timeanddate.com/worldclock/fixedtime.html?msg=Analytics+Tech+Talk&iso=20140715T10&p1=224&am=30>
for a 30-minute tech talk. You can join our hangout or follow along on
YouTube:
https://plus.google.com/u/0/b/103470172168784626509/events/c53ho5esd0luccd09a1c30rlrmg
(please note that a link to join the hangout will be posted in the comments
of this event just as it starts).

You can ask questions on IRC during the talk in #wikimedia-dev.

If you are not able to follow along live, a video recording will be posted
here
<https://plus.google.com/u/0/b/103470172168784626509/103470172168784626509/videos>,
on the MediaWiki YouTube channel, immediately following the tech talk for
you to view at any time.

More information about the tech talk:

*Hadoop and Beyond. An overview of Analytics infrastructure*

In this tech talk we will present the analytics infrastructure that we
have recently rolled out in production. By now probably everybody knows
that Wikimedia hosts an instance of Hadoop from which we are going to
extract pageview data in the near future. But how exactly does the data
get there?

We will go over the path that webrequest log data takes from Varnish to
Kafka (a distributed log buffer) to Hadoop and the challenges of deploying
this Java-based infrastructure in production. We will also talk about how
we can query the data with Hive, an SQL-like interface, how you can set up
this stack on Vagrant to play with and, last but not least, how we recently
used Hive to provide GLAM folks with image view stats:
https://commons.wikimedia.org/wiki/Commons:GLAMwiki_Toolset_Project/NARA_analytics_pilot

Thanks!
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
