[Analytics] Re: Missing Page View Data for September 21st, 2022

2022-09-23 Thread Joseph Allemandou
there awareness of this issue and/or an estimate of when the data might > be available? > > Regards, > Ben > ___ > Analytics mailing list -- analytics@lists.wikimedia.org > To unsubscribe send an email to analytics-le...@lists.wikimedia.org >

[Analytics] Re: Wikimedia AQS Pageviews API Question

2022-04-19 Thread Joseph Allemandou
> accessible for the latest day via the AQS Pageviews REST API? > > Best, > Ben > > ___ > Analytics mailing list -- analytics@lists.wikimedia.org > To unsubscribe send an email to analytics-le...@lists.wikimedia.org > -- Jose

[Analytics] Re: Backfill the public api for daily top pages per country

2022-01-20 Thread Joseph Allemandou
_day_ > > [2] http://www.europeansocialsurvey.org/downloadwizard/ > ___ > Analytics mailing list -- analytics@lists.wikimedia.org > To unsubscribe send an email to analytics-le...@lists.wikimedia.org > -- Joseph Allemandou (joal) (he / him) Staff Data Engin

Re: [Analytics] Pageview-complete entries labeled as "-"

2021-03-15 Thread Joseph Allemandou
r in the > future, which could simplify my aggregation process. > > Thank you again for your answer. > Regards, > Ogier > > > Le 15 mars 2021 à 14:10, Joseph Allemandou a > écrit : > > Hello Ogier, > Thank you a lot for the wikimaps work, and your thorough a

Re: [Analytics] Pageview-complete entries labeled as "-"

2021-03-15 Thread Joseph Allemandou
>> >> Sorry for the long introduction and thank you for your time. >> >> Regards, >> Ogier >> -- Joseph Allemandou (joal) (he / him) Staff Data Engineer Wikimedia Foundation ___ Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics

[Analytics] Fwd: About: refine_webrequest.hql

2021-03-15 Thread Joseph Allemandou
Forwarding to the analytics list for reference. -- Forwarded message - From: Ho Chung Date: Mon, Mar 15, 2021 at 11:45 AM Subject: Re: [Analytics] About: refine_webrequest.hql To: Joseph Allemandou Hello Thanks for your reply Because i was research your Analytics team

Re: [Analytics] About: refine_webrequest.hql

2021-03-15 Thread Joseph Allemandou
ttps://github.com/wikimedia/analytics-refinery/blob/master/oozie/webrequest/load/refine_webrequest.hql >> >> I emailed wiki legal request 3 month they not sure , can you clearly ask >> me . >> >> If not use utc, is use your server clock or , my computer clock? >>

Re: [Analytics] Clickstream: mobile vs. desktop, empty referrers

2020-06-09 Thread Joseph Allemandou
is clouded in mystery and > seems to depend a lot on browser and website specificities. Any insights > (small or big) would be appreciated! > > Thanks a lot! > Bob > ___ > Analytics mailing list > Analytics@lists.wikimedia.org > http

Re: [Analytics] [Please read if you use Superset] Superset, Druid and SQLAlchemy

2020-04-08 Thread Joseph Allemandou
ns in your daily workflows. > > I created a task to report thoughts/suggestions/bugs/etc.. to avoid > spamming too many people: https://phabricator.wikimedia.org/T249681 > > Thanks! > > Luca > ___ > Analytics mailing list >

Re: [Analytics] [Wiki-research-l] Announcement - Mediawiki History Dumps

2020-02-18 Thread Joseph Allemandou
vailable to the public and the media upon > request.* > > > On Thu, Feb 13, 2020 at 9:27 AM Joseph Allemandou < > jalleman...@wikimedia.org> > wrote: > > > Hi Giovanni, > > Thank you for your message :) > > You are correct in that there is no information on

Re: [Analytics] [Wiki-research-l] Announcement - Mediawiki History Dumps

2020-02-13 Thread Joseph Allemandou
> <https://www.usf.edu/engineering/cse/> ∙ University > of South Florida <https://www.usf.edu/> > > *Due to Florida’s broad open records law, email to or from university > employees is public record, available to the public and the media upon > request.* > > > On Mo

[Analytics] Announcement - Mediawiki History Dumps

2020-02-10 Thread Joseph Allemandou
it, and we're eager to hear from you [7], whether for issues, ideas or usage of the data. Analytically yours, -- Joseph Allemandou (joal) (he / him) Sr Data Engineer Wikimedia Foundation [1] https://dumps.wikimedia.org/other/mediawiki_history/readme.html [2] https://wikitech.wikimedia.org/wiki

Re: [Analytics] Completeness of Wikipedia Clickstream dataset

2019-05-13 Thread Joseph Allemandou
Adding Simon Back as he might not be in the list. On Mon, May 13, 2019 at 5:58 PM Joseph Allemandou wrote: > Hi Simon, > Thanks for reaching out :) > > I tried a similar analysis on our cluster with the same original files as > the ones in dumps.wikimedia.org, using Sp

Re: [Analytics] Completeness of Wikipedia Clickstream dataset

2019-05-13 Thread Joseph Allemandou
link233 2018-07 > 7 Air_pollution Smog link 45 2018-09 > 8 Smog Air_pollution link 96 2018-10 > 9 Smog Air_pollution link 90 2018-12 > > Am I missing something here? > > Thanks in advance, > Simon > _____

Re: [Analytics] When is the new pages API updated?

2018-10-11 Thread Joseph Allemandou
-- *Joseph Allemandou* Data Engineer @ Wikimedia Foundation IRC: joal ___ Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics

Re: [Analytics] temporary drop in pageviews to ig.wikipedia

2018-06-21 Thread Joseph Allemandou
s updated. >> >> Federico >> -- *Joseph Allemandou* Data Engineer @ Wikimedia Foundation IRC: joal ___ Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics

[Analytics] Turnilo / Superset / Druid datasource names changed

2018-06-04 Thread Joseph Allemandou
Hi Analytics folks, Last Friday we changed the names of some datasources in Druid, which impacts Turnilo and Superset. We renamed every datasource containing a `-`, replacing it with `_`. The reason for this is to facilitate future SQL querying in Druid. Turnilo configuration
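
As a rough illustration of the rename described in the message above, the sketch below maps an old datasource name to its new form by replacing every `-` with `_`. The function name and the example datasource names are hypothetical, chosen only for illustration; they are not taken from the actual Druid, Turnilo, or Superset configuration.

```python
def renamed_datasource(old_name: str) -> str:
    """Return the post-rename name: every '-' becomes '_'."""
    return old_name.replace("-", "_")


# Hypothetical datasource names, for illustration only.
old_names = ["example-hourly", "example_daily", "another-example-datasource"]
mapping = {name: renamed_datasource(name) for name in old_names}

for old, new in mapping.items():
    status = "renamed" if old != new else "unchanged"
    print(f"{old} -> {new} ({status})")
```

A mapping like this could help when updating saved dashboards or queries that still refer to the old names; names without a `-` pass through unchanged.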

Re: [Analytics] Latency of hourly vs daily endpoints?

2018-03-26 Thread Joseph Allemandou
he best way to get daily > updates for all of them (i.e., edit the query every day, create a new > query for each day, etc.). Using Quarry seems much easier than > generating these daily numbers from the Wikimedia EventStreams: > > https://stream.wikimedia.org/?doc

[Analytics] Hadoop Cluster Maintenance - Now

2018-02-13 Thread Joseph Allemandou
Hi! The Hadoop cluster maintenance (upgrade to Java 8) was planned to happen earlier today but is finally happening now. It will require a complete shutdown and should not last longer than a couple of hours (expected less than one). Thanks! Joseph on behalf of the Analytics-Team

[Analytics] [Engineering] Analytics Hadoop cluster maintenance postponed - Tue 13th February

2018-02-05 Thread Joseph Allemandou
the maintenance of the cluster to next week, allowing for those jobs to be finished. We are very sorry about the short notice and will send another email the day before maintenance. Best Joseph Allemandou on behalf of the Analytics-Team Data Engineer @ Wikimedia Foundation IRC: joal

[Analytics] Fwd: Help needed on web request analytics

2018-01-29 Thread Joseph Allemandou
Hive so I can get it by myself? > > I'm looking forward to your reply. Thank you sincerely! > > > Simon > 2018.01.26 > -- *Joseph Allemandou* Data Engineer @ Wikimedia Foundation IRC: joal -- *Joseph Allemandou* Data Engineer @ Wikimedia Foundation IRC: joal ___

Re: [Analytics] is there an hourly pageviews API?

2018-01-19 Thread Joseph Allemandou
$year-$month/pageviews-$year$ > month$day-[012][0-9].gz > > Is there an API faster than zgreping those? > > ___ > Analytics mailing list > Analytics@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/analytics >

Re: [Analytics] Tool to visualize which wiki pages link to which wiki pages?

2017-11-21 Thread Joseph Allemandou
php/Visualizations/Clust > erBall> and a graph of links between user pages, which was made perhaps > in 2014. > > Federico > > > ___ > Analytics mailing list > Analytics@lists.wikimedia.org > https://lists.wikimedia.org/

[Analytics] Hive issue yesterday

2017-08-24 Thread Joseph Allemandou
, we know which jobs have failed and we've taken care of it, however for jobs that are not monitored (report-updater, manual scripts etc), some silent failures might have occurred. Please check your logs :) Cheers -- *Joseph Allemandou* Data Engineer @ Wikimedia Foundation IRC: joal

[Analytics] Unique Devices get a lifting

2017-06-15 Thread Joseph Allemandou
Hello Analytics Fellows, In preparation for a future where unique devices will be counted per-domain and project-wide, we have renamed the unique_devices (also named last_access_uniques in some places) to unique_devices_per_domain. - New URL for dumps:

Re: [Analytics] Question regarding specific pageviews graph

2017-04-13 Thread Joseph Allemandou
<https://tools.wmflabs.org/pageviews/?project=fr.wikipedia.org=all-access=user=last-year=Batman_v_Superman_:_L%27Aube_de_la_Justice%7CBatman_v_Superman_:_L%27Aube_de_la_justice> > [2] https://tools.wmflabs.org/redirectviews/?project=fr. > wikipedia.org=all-access=user=last- > ye

Re: [Analytics] Question regarding specific pageviews graph

2017-04-13 Thread Joseph Allemandou
-- *Joseph Allemandou* Data Engineer @ Wikimedia Foundation IRC: joal ___ Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics

Re: [Analytics] Missing mediacounts for 2016-12-01

2017-02-16 Thread Joseph Allemandou
-- *Joseph Allemandou* Data Engineer @ Wikimedia Foundation IRC: joal ___ Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics

[Analytics] Fwd: Wiktionary page view counts?

2016-12-06 Thread Joseph Allemandou
onary. *en Furniture_Brands_International 1 0en George_Coventry,_9th_Earl_of_Coventry 2 0en George_Palaiologos 1 0en Leningrad_(song) 2 0en Olivet_Discourse 9 0* Thanks, Michael Douma www.idea.org -- *Joseph Allemandou* Data Engineer @ Wikimedia Foun

Re: [Analytics] High number of pageviews on page with single hyphen as title

2016-11-08 Thread Joseph Allemandou
o I am wondering where these numbers are coming from. > > Best regards, > Issa > > > ___ > Analytics mailing list > Analytics@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/analytics > > -- *Joseph Allem

Re: [Analytics] Pageviews dumps behind

2016-11-05 Thread Joseph Allemandou
-- *Joseph Allemandou* Data Engineer @ Wikimedia Foundation IRC: joal ___ Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics

Re: [Analytics] Pagecount Datasets to be Deprecated at the end of May

2016-08-08 Thread Joseph Allemandou
-- *Joseph Allemandou* Data Engineer @ Wikimedia Foundation IRC: joal ___ Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics

Re: [Analytics] Clickhouse

2016-06-15 Thread Joseph Allemandou
attributes per event > registered in 2011. > > ___ > Analytics mailing list > Analytics@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/analytics > -- *Joseph Allemandou* Data Engineer @ Wikimedia Foundation IRC: joal _

Re: [Analytics] Wikipedia Clickstream dataset refreshed (March 2016)

2016-05-03 Thread Joseph Allemandou
E- >>> >>> ___ >>> Analytics mailing list >>> Analytics@lists.wikimedia.org >>> https://lists.wikimedia.org/mailman/listinfo/analytics >>> >> >> >> >> -- >> >> >> *Dario Tarabo

Re: [Analytics] Query Improvement Question

2016-04-26 Thread Joseph Allemandou
eph On Mon, Apr 25, 2016 at 11:26 AM, Joseph Allemandou < jalleman...@wikimedia.org> wrote: > Again, without misclick sending (sorry for the spam). > > Hi Justin, > > First, one important thing: the data you are trying to get is VERY > sensitive data in term of potentia

Re: [Analytics] Query Improvement Question

2016-04-25 Thread Joseph Allemandou
as looking for, and way easier than starting from scratch. > > - Justin > > > ___ > Analytics mailing list > Analytics@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/anal

Re: [Analytics] Query Improvement Question

2016-04-25 Thread Joseph Allemandou
t experimentation, but more options to try are totally >> welcome. >> >> Thanks, >> Justin >> >> ___ >> Analytics mailing list >> Analytics@lists.wikimedia.org >> https://lists.wikimedia.org/mailman/listinfo/a

Re: [Analytics] Beeline as Hive client

2016-04-22 Thread Joseph Allemandou
:) > > > > ___ > > Analytics mailing list > > Analytics@lists.wikimedia.org > > https://lists.wikimedia.org/mailman/listinfo/analytics > > > > > > -- > Tilman Bayer > Senior Analyst > Wikimedia Foundati

Re: [Analytics] Researcher Student

2016-04-12 Thread Joseph Allemandou
>> >> ___ >> Analytics mailing list >> Analytics@lists.wikimedia.org >> https://lists.wikimedia.org/mailman/listinfo/analytics >> >> > > ___ > Analytics mailing list > Analytics@lists.wikime

Re: [Analytics] [Engineering] Hadoop - Last week data needs to be backfilled

2016-03-02 Thread Joseph Allemandou
to detect this issue (too costly for occurrence probability, particularly if we force file.encoding) Cheers Joseph On Wed, Mar 2, 2016 at 11:24 AM, Joseph Allemandou < jalleman...@wikimedia.org> wrote: > @Ori: Needs to be discussed with the team - My 2 cents > >- Dete

Re: [Analytics] [Engineering] Hadoop - Last week data needs to be backfilled

2016-03-02 Thread Joseph Allemandou
iki/Analytics/Data/Projectview_hourly> has > not been affected? > > On Tue, Mar 1, 2016 at 7:24 AM, Joseph Allemandou < > jalleman...@wikimedia.org> wrote: > >> Hey Oliver, >> It depends on what data you've used: if page_title or other 'encoding >> s

Re: [Analytics] [Engineering] Hadoop - Last week data needs to be backfilled

2016-03-01 Thread Joseph Allemandou
Hi Again, @Dan: We will indeed reload data into cassandra. @Bo: Actually the two datasets are fairly different. The one called pagecounts is slowly being deprecated in favor of the one called pageview, defined by Research people at WMF: https://meta.wikimedia.org/wiki/Research:Page_view The

Re: [Analytics] [Engineering] Hadoop - Last week data needs to be backfilled

2016-03-01 Thread Joseph Allemandou
<bo.ning@gmail.com> wrote: > >> > >> Hi, > >> > >> Would you mind linking the bug fix here? I couldn't find it on > >> phabricator. > >> > >> Thanks, > >> Bo > >> > >> On Tue, Mar 1, 2016 at 7:2

Re: [Analytics] [Engineering] Hadoop - Last week data needs to be backfilled

2016-03-01 Thread Joseph Allemandou
letting us know. So we should delete and backfill last > week's data, for our regularly scheduled scripts? > > On 1 March 2016 at 08:26, Joseph Allemandou <jalleman...@wikimedia.org> > wrote: > > Hi, > > > > TL,DR: Please don't use hive / spark / hadoop before next w

[Analytics] Hadoop - Last week data needs to be backfilled

2016-03-01 Thread Joseph Allemandou
. This means you shouldn't query last week data during this week, first because it is incorrect, and second because you'll curse the cluster for being too slow :) We are sorry for the inconvenience. Don't hesitate to contact us if you have any question -- *Joseph Allemandou* Data Engineer @ Wikimedia

Re: [Analytics] Issues on Cluster

2016-01-31 Thread Joseph Allemandou
Hi All, Everything is back to normal on the cluster, no data loss has been incurred and jobs are up-to-date. You can get back to your normal utilisation ! Thanks Joseph ___ Analytics mailing list Analytics@lists.wikimedia.org

[Analytics] Issues on Cluster

2016-01-28 Thread Joseph Allemandou
Hi Analytics fellows, We are experiencing issues with loading data into the hadoop cluster, therefore blocking the full job pipeline. When fixed, the cluster will be heavily loaded trying to catch up, so please, be nice with it and don't run heavy jobs in the next hours. We'll keep you posted

Re: [Analytics] Vital signs no longer displaying legacy pageview data

2016-01-22 Thread Joseph Allemandou
cs mailing list >> Analytics@lists.wikimedia.org >> https://lists.wikimedia.org/mailman/listinfo/analytics >> >> > > ___ > Analytics mailing list > Analytics@lists.wikimedia.org > https://lists.wikimedia.org/mailman/l

Re: [Analytics] mobile and zero legacy tsvs on stat1002

2015-12-14 Thread Joseph Allemandou
> -- > Oliver Keyes > Count Logula > Wikimedia Foundation > > -- *Joseph Allemandou* Data Engineer @ Wikimedia Foundation IRC: joal ___ Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics

Re: [Analytics] Confusing pageviews

2015-12-02 Thread Joseph Allemandou
it.m.wikipedia.org /w/index.php ?search==Ricerca 22 On Wed, Dec 2, 2015 at 6:16 PM, Oliver Keyes <oke...@wikimedia.org> wrote: > I mean, now I want to know how we can have a condition where there's > no page title but it registers as a pageview. > > On 2 December 2015 at 12:14, Joseph Alle

Re: [Analytics] Issue on cluster - Delay in data

2015-09-17 Thread Joseph Allemandou
Hi again ! The issue is fixed, cluster is still catching up. Hopefully everything will be in place by tomorrow morning. Sorry for the inconvenience On Thu, Sep 17, 2015 at 12:26 PM, Joseph Allemandou < jalleman...@wikimedia.org> wrote: > Hi Analytics listeners, > > We are expe

Re: [Analytics] [Survey] Pageview API

2015-09-16 Thread Joseph Allemandou
oint for his >> use case. But I think it might be hard to make that useful generally. I >> think for now, let's just collect these one-off pageview querying use cases >> and slowly build them into the API when we can generalize two or more of >> them into one endpoint. >> >> ___

Re: [Analytics] pageviews_hourly table

2015-08-17 Thread Joseph Allemandou
when the definition is adapted. On 17 August 2015 at 10:31, Oliver Keyes oke...@wikimedia.org wrote: Excellent; thank you. On 17 August 2015 at 04:42, Joseph Allemandou jalleman...@wikimedia.org wrote: Oliver, It was a mistake from me to add the 'outreach' subdomain without asking

Re: [Analytics] pageviews_hourly table

2015-08-17 Thread Joseph Allemandou
___ Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics -- *Joseph Allemandou* Data Engineer @ Wikimedia Foundation IRC: joal ___ Analytics mailing

Re: [Analytics] [Technical] Pick storage for pageview cubes

2015-06-13 Thread Joseph Allemandou
, Joseph Allemandou jalleman...@wikimedia.org wrote: I think we could add Impala in storage technologies to assess. It allows reading / computing straight from HDFS and should be fast enough for not too bad UEx. Maybe ? On Thu, Jun 11, 2015 at 11:11 PM, Marcel Ruiz Forns mfo...@wikimedia.org

[Analytics] New fields in wmf.webrequest hive table

2015-04-10 Thread Joseph Allemandou
web | desktop] - agent_type - To differentiate easily between spiders and users (more values may be added later). These additions are based on the tags, as defined here: https://meta.wikimedia.org/wiki/Research:Page_view Have a good weekend ! -- *Joseph Allemandou* Data Engineer

Re: [Analytics] MySQL binlog - JSON in Kafka

2015-03-17 Thread Joseph Allemandou
Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics -- *Joseph Allemandou* Data Engineer @ Wikimedia Foundation IRC: joal ___ Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman

Re: [Analytics] [Cluster] Monitoring the impact Hive jobs have on the Analytics cluster

2015-03-09 Thread Joseph Allemandou
@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics ___ Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics -- *Joseph Allemandou* Data Engineer @ Wikimedia Foundation

Re: [Analytics] [Technical][Debate] Historical client ip and geocoded data

2015-02-23 Thread Joseph Allemandou
? Country? City? :D On 23 February 2015 at 13:59, Joseph Allemandou jalleman...@wikimedia.org wrote: As per the IRC discussion, we won't recompute historical data, but start computing new values from the deploy time onward. A new version field, and associated documentation will also be provided