[Analytics] Thank you !

2015-02-19 Thread Joseph Allemandou
heers -- *Joseph Allemandou* Data Engineer @ Wikimedia Foundation IRC: joal ___ Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics

[Analytics] [Technical][Debate] Historical client ip and geocoded data

2015-02-23 Thread Joseph Allemandou
ns gives me the right to run the script) :) Thanks

Re: [Analytics] [Technical][Debate] Historical client ip and geocoded data

2015-02-23 Thread Joseph Allemandou
r data): I would expect city too. > > What's the status on getting an oozie job or similar to compute going > forward? To me that's more of a priority than historical data. > > On 23 February 2015 at 10:53, Joseph Allemandou > wrote: > > Hi, > > > >

Re: [Analytics] [Technical][Debate] Historical client ip and geocoded data

2015-02-23 Thread Joseph Allemandou
itude":"47.913","country":"United States"} Cheers Joseph On Mon, Feb 23, 2015 at 8:24 PM, Oliver Keyes wrote: > Gotcha. So, for transparency...what are we calculating? Country? City? :D > > On 23 February 2015 at 13:59, Joseph Allemandou > wrot

Re: [Analytics] [Cluster] Monitoring the impact Hive jobs have on the Analytics cluster

2015-03-09 Thread Joseph Allemandou

Re: [Analytics] MySQL binlog -> JSON in Kafka

2015-03-17 Thread Joseph Allemandou

[Analytics] [Release] Hive wmf.webrequest new fields

2015-04-01 Thread Joseph Allemandou
est Enjoy :)

[Analytics] New fields in wmf.webrequest hive table

2015-04-10 Thread Joseph Allemandou
web | desktop] - agent_type - To differentiate easily between spiders and users (more values may be added later). These additions are based on the "tags", as defined here: https://meta.wikimedia.org/wiki/Research:Page_view Have a good weekend !
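The two new fields make it easy to split traffic by access method and to drop spider traffic. The Python sketch below shows that split. It assumes rows have already been fetched as dicts with `access_method` and `agent_type` keys. The rows and the helper name are hypothetical. The values `mobile web`/`desktop` are inferred from the partially visible value list, and `user`/`spider` are inferred from the description of `agent_type`.

```python
from collections import Counter

# Hypothetical rows standing in for wmf.webrequest records; only the two
# fields discussed in the announcement are modelled. Values are assumed.
rows = [
    {"access_method": "desktop", "agent_type": "user"},
    {"access_method": "mobile web", "agent_type": "user"},
    {"access_method": "desktop", "agent_type": "spider"},
    {"access_method": "mobile web", "agent_type": "user"},
]


def count_user_requests(records):
    """Count requests per access_method, keeping only agent_type == 'user'."""
    return Counter(r["access_method"] for r in records if r["agent_type"] == "user")


print(count_user_requests(rows))
```

On the cluster itself, the same filter and grouping would be expressed in the Hive query rather than in client code.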

Re: [Analytics] New fields in wmf.webrequest hive table

2015-04-10 Thread Joseph Allemandou
've had many requests in the > past that required agent_type and access_method information and having them > readily available is awesome! :-) > > Have a great weekend! > > Leila > > On Fri, Apr 10, 2015 at 1:21 PM, Joseph Allemandou < > jalleman...@wikimedia.org> wr

Re: [Analytics] New fields in wmf.webrequest hive table

2015-04-10 Thread Joseph Allemandou
'" and "WHERE user_agent_map['device_family'] != 'Spider'"? > Does agent_type include the isCrawler UDF results? > > On 10 April 2015 at 16:47, Joseph Allemandou > wrote: > > And I forgot one field : > > > > is_zero - True if a requ

Re: [Analytics] [Apps][IOS] Adding build number to user agent string

2015-05-07 Thread Joseph Allemandou

Re: [Analytics] [Apps][IOS] Adding build number to user agent string

2015-05-07 Thread Joseph Allemandou
Done ! I had planned to do that yesterday and it went out of my mind ... thanks for reminding :) On Thu, May 7, 2015 at 12:32 PM, Joseph Allemandou < jalleman...@wikimedia.org> wrote: > Hi Corey, > We don't use version in our parsing of user agent for apps, so it won't >

Re: [Analytics] [Technical] Pick storage for pageview cubes

2015-06-12 Thread Joseph Allemandou
;>>> >>>> The API used to define these tables is described in >>>> https://github.com/wikimedia/restbase/blob/master/doc/TableStorageAPI.md, >>>> and the algorithm used to keep those indexes up to date is described in >>>> https://github.com/wikimedia/restbase-mod-table-cassandra/blob/master/doc/Second

Re: [Analytics] [Technical] Pick storage for pageview cubes

2015-06-13 Thread Joseph Allemandou
pageview > API to be impacted. > > -Toby > > On Fri, Jun 12, 2015 at 9:46 AM, Andrew Otto wrote: > >> I think we could add Impala in storage technologies to assess. >> >> I think we don’t want to build the pageview API on top of the Analytics >> Cluster. >

Re: [Analytics] pageviews_hourly table

2015-08-17 Thread Joseph Allemandou

Re: [Analytics] pageviews_hourly table

2015-08-17 Thread Joseph Allemandou
tly informing > > customers when the definition is adapted. > > > > On 17 August 2015 at 10:31, Oliver Keyes wrote: > >> Excellent; thank you. > >> > >> On 17 August 2015 at 04:42, Joseph Allemandou < > jalleman...@wikimedia.org> wrote:

Re: [Analytics] [Survey] Pageview API

2015-09-14 Thread Joseph Allemandou
generating the underlying data, > which would be a lot of added complexity. > > -- > - Andrew Gray

Re: [Analytics] Users changing language version through interwiki links

2015-09-14 Thread Joseph Allemandou
at it can be used as a > signal of demand in the destination language. > > Best, > Leila > > >> >> Thanks, >> Strainu

Re: [Analytics] corrupted and missing log files

2015-09-14 Thread Joseph Allemandou
essing it, but I can give you a full log file) Could you provide some feedback concerning the above cases? Best regards, George

Re: [Analytics] [Survey] Pageview API

2015-09-16 Thread Joseph Allemandou
's just collect these one-off pageview querying use cases >> and slowly build them into the API when we can generalize two or more of >> them into one endpoint.

[Analytics] Issue on cluster - Delay in data

2015-09-17 Thread Joseph Allemandou
when solved.

Re: [Analytics] Issue on cluster - Delay in data

2015-09-17 Thread Joseph Allemandou
Hi again ! The issue is fixed, cluster is still catching up. Hopefully everything will be in place by tomorrow morning. Sorry for the inconvenience On Thu, Sep 17, 2015 at 12:26 PM, Joseph Allemandou < jalleman...@wikimedia.org> wrote: > Hi Analytics listeners, > > We are experie

Re: [Analytics] Confusing pageviews

2015-12-02 Thread Joseph Allemandou

Re: [Analytics] Confusing pageviews

2015-12-02 Thread Joseph Allemandou
&ns0=1 23 en.wikipedia.org /w/index.php ?redirs=0&search=The%20Bridge%20%2B%20Film%20%2B%20Sofia%20Helin%20%2B%202011&fulltext=Search&ns0=1 23 en.wikipedia.org /w/index.php ?redirs=0&search=Rob%20the%20Mob%20%2B%20Film%20%2B%20Michael%20Pitt%20%2B%202014&fulltext=Search&n
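The `/w/index.php` requests quoted in this thread carry the user's query in the percent-encoded `search` parameter. The Python standard library can decode it, which makes it easier to see what was actually searched for. The query string below is copied from the thread. The snippet only decodes it and makes no claim about how such requests are classified as pageviews.

```python
from urllib.parse import parse_qs

# Query string copied from one of the search requests quoted in the thread.
query = (
    "redirs=0&search=The%20Bridge%20%2B%20Film%20%2B%20Sofia%20Helin"
    "%20%2B%202011&fulltext=Search&ns0=1"
)

params = parse_qs(query)
# parse_qs percent-decodes values: %20 becomes a space and %2B a literal "+".
print(params["search"][0])
print(params["fulltext"][0])
```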

Re: [Analytics] mobile and zero legacy tsvs on stat1002

2015-12-14 Thread Joseph Allemandou
> > > [1] https://phabricator.wikimedia.org/T109286

Re: [Analytics] Vital signs no longer displaying legacy pageview data

2016-01-22 Thread Joseph Allemandou

[Analytics] Issues on Cluster

2016-01-28 Thread Joseph Allemandou
Hi Analytics fellows, We are experiencing issues with loading data into the Hadoop cluster, which is blocking the full job pipeline. Once it is fixed, the cluster will be heavily loaded trying to catch up, so please be gentle with it and don't run heavy jobs in the next few hours. We'll keep you posted abou

Re: [Analytics] Issues on Cluster

2016-01-28 Thread Joseph Allemandou
h On Thu, Jan 28, 2016 at 11:57 AM, Joseph Allemandou < jalleman...@wikimedia.org> wrote: > Hi Analytics fellows, > > We are experiencing issues with loading data into the hadoop cluster, > therefore blocking the full job pipeline. > When fixed, the cluster will be heavily loa

Re: [Analytics] Issues on Cluster

2016-01-29 Thread Joseph Allemandou
Hi, The Hadoop cluster is still having issues this morning (ingestion from Kafka fails on one partition). Please refrain from using it; it needs all possible resources to catch up :) Thanks Joseph On Thu, Jan 28, 2016 at 2:55 PM, Joseph Allemandou < jalleman...@wikimedia.org> wrote: >

Re: [Analytics] Issues on Cluster

2016-01-29 Thread Joseph Allemandou
Hi Folks, The data ingestion problem has been solved without data loss, but a lot of re-computation is needed. I hope it'll be done over the weekend, but if it's not I'll send an email on Monday. Have a good weekend ! Joseph On Fri, Jan 29, 2016 at 12:03 PM, Joseph Allemandou < jallema

Re: [Analytics] Issues on Cluster

2016-01-31 Thread Joseph Allemandou
Hi All, Everything is back to normal on the cluster: no data loss was incurred and jobs are up to date. You can get back to your normal usage ! Thanks Joseph

[Analytics] Hadoop - Last week data needs to be backfilled

2016-03-01 Thread Joseph Allemandou
nward. This means you shouldn't query last week's data during this week, first because it is incorrect, and second because you'll curse the cluster for being too slow :) We are sorry for the inconvenience. Don't hesitate to contact us if you have any questions.

Re: [Analytics] [Engineering] Hadoop - Last week data needs to be backfilled

2016-03-01 Thread Joseph Allemandou
know. So we should delete and backfill last > week's data, for our regularly scheduled scripts? > > On 1 March 2016 at 08:26, Joseph Allemandou > wrote: > > Hi, > > > > TL,DR: Please don't use hive / spark / hadoop before next week. > > > > Las

Re: [Analytics] [Engineering] Hadoop - Last week data needs to be backfilled

2016-03-01 Thread Joseph Allemandou
, > >> > >> Would you mind linking the bug fix here? I couldn't find it on > >> phabricator. > >> > >> Thanks, > >> Bo > >> > >> On Tue, Mar 1, 2016 at 7:24 AM, Joseph Allemandou > >> wrote: > >> > Hey Oliver, >

Re: [Analytics] [Engineering] Hadoop - Last week data needs to be backfilled

2016-03-01 Thread Joseph Allemandou
Hi Again, @Dan: We will indeed reload data into Cassandra. @Bo: Actually the two datasets are fairly different. The one called pagecounts is slowly being deprecated in favor of the one called pageview, defined by Research people at WMF: https://meta.wikimedia.org/wiki/Research:Page_view The pagevi

Re: [Analytics] [Engineering] Hadoop - Last week data needs to be backfilled

2016-03-02 Thread Joseph Allemandou
tview_hourly> has > not been affected? > > On Tue, Mar 1, 2016 at 7:24 AM, Joseph Allemandou < > jalleman...@wikimedia.org> wrote: > >> Hey Oliver, >> It depends on what data you've used: if page_title or other 'encoding >> sensitive' data (I can&

Re: [Analytics] [Engineering] Hadoop - Last week data needs to be backfilled

2016-03-02 Thread Joseph Allemandou
I'd love to have your thoughts / ideas and discuss them with the team. Thanks On Wed, Mar 2, 2016 at 10:53 AM, Ori Livneh wrote: > So: what is the planning for making sure this doesn't happen the next time > around? :) > > On Tue, Mar 1, 2016 at 5:26 AM, Joseph Allemandou &l

Re: [Analytics] [Engineering] Hadoop - Last week data needs to be backfilled

2016-03-02 Thread Joseph Allemandou
rying to detect this issue (too costly for occurrence probability, particularly if we force file.encoding) Cheers Joseph On Wed, Mar 2, 2016 at 11:24 AM, Joseph Allemandou < jalleman...@wikimedia.org> wrote: > @Ori: Needs to be discussed with the team - My 2 cents > >- De

Re: [Analytics] Hadoop - Last week data needs to be backfilled

2016-03-07 Thread Joseph Allemandou
Hi, Quick follow-up: All data has been backfilled, you can get back to normal cluster activity :) Sorry for the inconvenience. Joseph On Tue, Mar 1, 2016 at 2:26 PM, Joseph Allemandou wrote: > Hi, > > *TL,DR: Please don't use hive / spark / hadoop before next week.* > > Las

Re: [Analytics] Researcher Student

2016-04-12 Thread Joseph Allemandou

Re: [Analytics] Beeline as Hive client

2016-04-22 Thread Joseph Allemandou

Re: [Analytics] Query Improvement Question

2016-04-25 Thread Joseph Allemandou
there's no magic bullet and everything will take about as long as > everything else). If no one can say quickly off the top of their head, I > can just do that experimentation, but more options to try are totally > welcome. > > Thanks, > Justin > >

Re: [Analytics] Query Improvement Question

2016-04-25 Thread Joseph Allemandou
>> everything else). If no one can say quickly off the top of their head, I >> can just do that experimentation, but more options to try are totally >> welcome. >> >> Thanks, >> Justin

Re: [Analytics] Query Improvement Question

2016-04-25 Thread Joseph Allemandou
> exactly what I was looking for, and way easier than starting from scratch. > > - Justin

Re: [Analytics] Query Improvement Question

2016-04-26 Thread Joseph Allemandou
Best Joseph On Mon, Apr 25, 2016 at 11:26 AM, Joseph Allemandou < jalleman...@wikimedia.org> wrote: > Again, without misclick sending (sorry for the spam). > > Hi Justin, > > First, one important thing: the data you are trying to get is VERY > sensitive data in term

Re: [Analytics] Wikipedia Clickstream dataset refreshed (March 2016)

2016-05-03 Thread Joseph Allemandou

Re: [Analytics] University project to make entire English Wikipedia history searchable on Hadoop using Solr

2016-05-18 Thread Joseph Allemandou

Re: [Analytics] Clickhouse

2016-06-15 Thread Joseph Allemandou
attributes per event > registered in 2011.

Re: [Analytics] [Wiki-research-l] question about Pageviews dumps

2016-06-29 Thread Joseph Allemandou

Re: [Analytics] Pagecount Datasets to be Deprecated at the end of May

2016-08-07 Thread Joseph Allemandou

Re: [Analytics] Question re. PageCounts

2016-09-29 Thread Joseph Allemandou
_Bourne_(film) 55 1226127) > > (pagecounts-20160731-22.gz,hu Jason_Bourne_(film) 3 34335) > > (pagecounts-20160731-22.gz,it Jason_Bourne_(film) 29 579129) > > (pagecounts-20160731-22.gz,nl Jason_Bourne_(film) 11 125928)

Re: [Analytics] Pageviews dumps behind

2016-11-05 Thread Joseph Allemandou

Re: [Analytics] High number of pageviews on page with single hyphen as title

2016-11-08 Thread Joseph Allemandou
so I am wondering where these numbers are coming from. > > Best regards, > Issa

[Analytics] Fwd: Wiktionary page view counts?

2016-12-06 Thread Joseph Allemandou
ded Wiktionary.
en Furniture_Brands_International 1 0
en George_Coventry,_9th_Earl_of_Coventry 2 0
en George_Palaiologos 1 0
en Leningrad_(song) 2 0
en Olivet_Discourse 9 0
Thanks, Michael Douma www.idea.org

Re: [Analytics] Missing mediacounts for 2016-12-01

2017-02-16 Thread Joseph Allemandou
't notice it). >> >> Luca

Re: [Analytics] Question regarding specific pageviews graph

2017-04-13 Thread Joseph Allemandou
e? Even if for > some reason the page was not very popular in FR, shouldn't it have received > at least some views? > > Many thanks in advance for any ideas, > Gheorghe

Re: [Analytics] Question regarding specific pageviews graph

2017-04-13 Thread Joseph Allemandou
ages in IT or ES show quite a lot of views: >>> >>> https://tools.wmflabs.org/pageviews/?project=it.wikipedia.org&platform=all-access&agent=user&range=last-year&pages=Batman_v_Superman:_Dawn_of_Justice >>> https://tools.wmflabs.org/pa
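Links like the Pageviews tool URL quoted above are easy to break when they wrap in email. Building them from parameters avoids that. The Python sketch below uses only the query parameters visible in that link (`project`, `platform`, `agent`, `range`, `pages`). The helper name and its defaults are illustrative assumptions.

```python
from urllib.parse import urlencode

# Base URL taken from the link quoted in the thread.
BASE = "https://tools.wmflabs.org/pageviews/"


def pageviews_tool_url(project, page, platform="all-access", agent="user",
                       date_range="last-year"):
    """Build a Pageviews tool link from the parameters seen in the thread."""
    params = {
        "project": project,
        "platform": platform,
        "agent": agent,
        "range": date_range,
        "pages": page,
    }
    # Keep ':' unescaped so titles like "Batman_v_Superman:_Dawn_of_Justice"
    # stay readable; other reserved characters are still percent-encoded.
    return BASE + "?" + urlencode(params, safe=":")


print(pageviews_tool_url("it.wikipedia.org", "Batman_v_Superman:_Dawn_of_Justice"))
```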

[Analytics] Unique Devices get a lifting

2017-06-15 Thread Joseph Allemandou
Hello Analytics Fellows, In preparation for a future where unique devices will be counted both per domain and project-wide, we have renamed the unique_devices dataset (also named last_access_uniques in some places) to unique_devices_per_domain. - New URL for dumps: https://dumps.wikimedia.org/other/unique

[Analytics] Hive issue yesterday

2017-08-24 Thread Joseph Allemandou
, we know which jobs have failed and we've taken care of them; however, for jobs that are not monitored (report-updater, manual scripts, etc.), some silent failures might have occurred. Please check your logs :)

Re: [Analytics] Tool to visualize which wiki pages link to which wiki pages?

2017-11-21 Thread Joseph Allemandou
index.php/Visualizations/ClusterBall> and a graph of links between user pages, which was made perhaps > in 2014. > > Federico

Re: [Analytics] is there an hourly pageviews API?

2018-01-19 Thread Joseph Allemandou
ear$month$day-[012][0-9].gz > > Is there an API faster than zgrepping those?
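As a scripted equivalent of the zgrep approach, the Python sketch below reads one gzipped hourly dump and sums the views for a single page. It assumes each line is space-separated as `domain_code page_title view_count bytes`, matching lines such as `en Olivet_Discourse 9 0` quoted elsewhere in this archive. The function name is hypothetical, and the dump file naming is left unspecified because the pattern above is truncated. This does not claim to be faster than zgrep. It only makes the filtering explicit and reusable.

```python
import gzip


def views_for(path, domain_code, title):
    """Sum view counts for one (domain_code, page_title) in a gzipped hourly dump.

    Assumes space-separated lines: domain_code page_title view_count bytes.
    Malformed lines are skipped.
    """
    total = 0
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            parts = line.rstrip("\n").split(" ")
            if len(parts) != 4 or not parts[2].isdigit():
                continue  # not in the expected four-column layout
            dom, page, views, _size = parts
            if dom == domain_code and page == title:
                total += int(views)
    return total
```

Looping this over the hourly files for a day gives an hourly series for one page.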

[Analytics] Fwd: Help needed on web request analytics

2018-01-29 Thread Joseph Allemandou
can get it by myself? > > I'm looking forward to your reply. Thank you sincerely! > > > Simon > 2018.01.26

[Analytics] [Engineering] Analytics Hadoop cluster maintenance postponed - Tue 13th February

2018-02-05 Thread Joseph Allemandou
e the maintenance of the cluster to next week, allowing for those jobs to be finished. We are very sorry about the short notice and will send another email the day before maintenance. Best Joseph Allemandou on behalf of the Analytics-Team Data Engineer @ Wikimedia Foundation IRC:

[Analytics] Hadoop Cluster Maintenance - Now

2018-02-13 Thread Joseph Allemandou
Hi ! The Hadoop cluster maintenance (upgrade to Java 8) was planned to happen earlier today but is finally happening now. It will require a complete shutdown and should not last longer than a couple of hours (expected less than one). Thanks ! Joseph on behalf of the Analytics-Team

Re: [Analytics] Latency of hourly vs daily endpoints?

2018-03-23 Thread Joseph Allemandou
Hi Ahmed and Neil, Super interesting project you have, Ahmed :) Thanks Neil for the very precise answer you gave to Ahmed's question ! Some comments about the number disparity below: > >> https://quarry.wmflabs.org/query/25783 > > >> >> and I see that Quarry reports 168668 while the REST API reports 169754 >>

Re: [Analytics] Latency of hourly vs daily endpoints?

2018-03-26 Thread Joseph Allemandou
might be the hardest?), and see what's the best way to get daily > updates for all of them (i.e., edit the query every day, create a new > query for each day, etc.). Using Quarry seems much easier than > generating these daily numbers from the Wiki

[Analytics] Turnilo / Superset / Druid datasource names changed

2018-06-04 Thread Joseph Allemandou
Hi Analytics folks, Last Friday we changed the names of some datasources in Druid, which impacts Turnilo and Superset. We renamed every datasource whose name contained a `-`, replacing each `-` with `_`. The reason for this is to facilitate future SQL querying in Druid. Turnilo configuration
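The rename rule is mechanical: every `-` in a datasource name became `_`. The Python sketch below applies that rule to update saved queries or bookmarks that still reference old names. The datasource names in it are made up for illustration and are not the actual Druid datasources.

```python
def new_datasource_name(old_name: str) -> str:
    """Return the post-rename Druid datasource name: every '-' becomes '_'."""
    return old_name.replace("-", "_")


def rename_all(names):
    """Map each old datasource name to its new name (unchanged names included)."""
    return {name: new_datasource_name(name) for name in names}


# Hypothetical datasource names, for illustration only.
old_names = ["pageviews-hourly", "unique-devices-daily", "edits"]
for old, new in rename_all(old_names).items():
    if old != new:
        print(f"{old} -> {new}")
```

Names without a `-` map to themselves, so the mapping can be applied blindly to a whole list.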

Re: [Analytics] temporary drop in pageviews to ig.wikipedia

2018-06-21 Thread Joseph Allemandou
t rate the GeoIP information is updated. >> >> Federico

Re: [Analytics] When is the new pages API updated?

2018-10-11 Thread Joseph Allemandou

Re: [Analytics] Completeness of Wikipedia Clickstream dataset

2019-05-13 Thread Joseph Allemandou
Smog link 233 2018-07 > 7 Air_pollution Smog link 45 2018-09 > 8 Smog Air_pollution link 96 2018-10 > 9 Smog Air_pollution link 90 2018-12 > > Am I missing something here? > > Thanks in advance, > Simon

Re: [Analytics] Completeness of Wikipedia Clickstream dataset

2019-05-13 Thread Joseph Allemandou
Adding Simon back, as he might not be on the list. On Mon, May 13, 2019 at 5:58 PM Joseph Allemandou wrote: > Hi Simon, > Thanks for reaching out :) > > I tried a similar analysis on our cluster with the same original files as > the ones in dumps.wikimedia.org, using Sp
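One way to check which months a given (prev, curr) pair appears in is to tabulate the monthly clickstream rows per pair. The Python sketch below does this, assuming rows in the `prev, curr, type, n` layout plus the month each row came from, as in the table quoted in this thread. The rows are copied from that table with the index column dropped. The helper names are hypothetical. The sketch only reports presence and absence. It does not explain why a pair is missing in a given month.

```python
from collections import defaultdict

# Rows from the table quoted in the thread: (prev, curr, type, n, month).
rows = [
    ("Air_pollution", "Smog", "link", 45, "2018-09"),
    ("Smog", "Air_pollution", "link", 96, "2018-10"),
    ("Smog", "Air_pollution", "link", 90, "2018-12"),
]


def monthly_counts(records, prev, curr):
    """Sum n per month for one (prev, curr) pair."""
    totals = defaultdict(int)
    for p, c, _type, n, month in records:
        if p == prev and c == curr:
            totals[month] += n
    return dict(totals)


def missing_months(records, prev, curr, expected_months):
    """Months from expected_months in which the pair does not appear at all."""
    present = monthly_counts(records, prev, curr)
    return [m for m in expected_months if m not in present]


print(monthly_counts(rows, "Smog", "Air_pollution"))
print(missing_months(rows, "Smog", "Air_pollution", ["2018-10", "2018-11", "2018-12"]))
```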

[Analytics] Announcement - Mediawiki History Dumps

2020-02-10 Thread Joseph Allemandou
, and we're eager to hear from you [7], whether for issues, ideas or usage of the data. Analytically yours, -- Joseph Allemandou (joal) (he / him) Sr Data Engineer Wikimedia Foundation [1] https://dumps.wikimedia.org/other/mediawiki_history/readme.html [2] https://wikitech.wikimedia.org

Re: [Analytics] [Wiki-research-l] Announcement - Mediawiki History Dumps

2020-02-13 Thread Joseph Allemandou

Re: [Analytics] [Wiki-research-l] Announcement - Mediawiki History Dumps

2020-02-18 Thread Joseph Allemandou
vailable to the public and the media upon > request.* > > > On Thu, Feb 13, 2020 at 9:27 AM Joseph Allemandou < > jalleman...@wikimedia.org> > wrote: > > > Hi Giovanni, > > Thank you for your message :) > > You are correct in that there is no information on pag

Re: [Analytics] [Please read if you use Superset] Superset, Druid and SQLAlchemy

2020-04-08 Thread Joseph Allemandou
ere are pros/cons in your daily workflows. > > I created a task to report thoughts/suggestions/bugs/etc.. to avoid > spamming too many people: https://phabricator.wikimedia.org/T249681 > > Thanks! > > Luca

Re: [Analytics] Clickstream: mobile vs. desktop, empty referrers

2020-06-09 Thread Joseph Allemandou
, but the issue is clouded in mystery and > seems to depend a lot on browser and website specificities. Any insights > (small or big) would be appreciated! > > Thanks a lot! > Bob

Re: [Analytics] About: refine_webrequest.hql

2021-03-15 Thread Joseph Allemandou
> >> https://github.com/wikimedia/analytics-refinery/blob/master/oozie/webrequest/load/refine_webrequest.hql >> >> I emailed wiki legal request 3 month they not sure , can you clearly ask >> me . >> >> If not use utc, is use your server clock or , my computer

[Analytics] Fwd: About: refine_webrequest.hql

2021-03-15 Thread Joseph Allemandou
Forwarding to the analytics list for reference. -- Forwarded message - From: Ho Chung Date: Mon, Mar 15, 2021 at 11:45 AM Subject: Re: [Analytics] About: refine_webrequest.hql To: Joseph Allemandou Hello Thanks for your reply Because i was research your Analytics team

Re: [Analytics] Pageview-complete entries labeled as "-"

2021-03-15 Thread Joseph Allemandou
ile-app" appears two times ? >> >> Sorry for the long introduction and thank you for your time. >> >> Regards, >> Ogier -- Joseph Allemandou (joal) (he / him) Staff Data Engineer Wikimedia Foundation

Re: [Analytics] Pageview-complete entries labeled as "-"

2021-03-15 Thread Joseph Allemandou
entry disappear in the > future, which could simplify my aggregation process. > > Thank you again for your answer. > Regards, > Ogier > > > Le 15 mars 2021 à 14:10, Joseph Allemandou a > écrit : > > Hello Ogier, > Thank you a lot for the wikimaps work, and

[Analytics] Re: Backfill the public api for daily top pages per country

2022-01-20 Thread Joseph Allemandou
nth___day_ > > [2] http://www.europeansocialsurvey.org/downloadwizard/

[Analytics] Re: Wikimedia AQS Pageviews API Question

2022-04-19 Thread Joseph Allemandou
> accessible for the latest day via the AQS Pageviews REST API? > > Best, > Ben
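For anyone scripting checks against the AQS Pageviews REST API, the Python sketch below builds a per-article request URL. It assumes the public path layout `/metrics/pageviews/per-article/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}` under `https://wikimedia.org/api/rest_v1`, with `YYYYMMDD` timestamps. Treat that layout as an assumption and check it against the current API documentation. The helper only builds the URL and does not fetch anything. It also says nothing about how soon the latest day becomes available, which is what the question above asks.

```python
from urllib.parse import quote

# Assumed path layout of the public per-article pageviews endpoint.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def per_article_url(project, article, start, end,
                    access="all-access", agent="user", granularity="daily"):
    """Build (but do not fetch) a per-article pageviews URL.

    start/end are timestamps such as "20220401" (YYYYMMDD).
    """
    # Titles use underscores instead of spaces; encode everything else,
    # including '/', so the title stays a single path segment.
    title = quote(article.replace(" ", "_"), safe="")
    return "/".join([BASE, project, access, agent, title, granularity, start, end])


print(per_article_url("en.wikipedia.org", "Albert Einstein", "20220401", "20220418"))
```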

[Analytics] Re: Missing Page View Data for September 21st, 2022

2022-09-23 Thread Joseph Allemandou
reness of this issue and/or an estimate of when the data might > be available? > > Regards, > Ben