[Analytics] Analytics Hadoop cluster is going to be rebooted for kernel upgrades

2016-06-29 Thread Luca Toscano
Hi again! Forgot to mention that I have currently stopped all Analytics jobs on Hadoop (Camus, Oozie) as a preparation step for a complete Hadoop cluster reboot to install new Linux kernel upgrades. The reboots will be performed in small batches to limit the blast radius but it might affect your

Re: [Analytics] pagecounts dumps a couple hours behind

2016-02-19 Thread Luca Toscano
AM, Luca Toscano <ltosc...@wikimedia.org> wrote: > Hi! > > we have some problems with data ingestion at the moment, I'll update the > list with more precise information later on. > > Sorry and thanks for the patience! > > Luca > > On Fri, Feb 19, 2016 at 8:

[Analytics] All the nodes in the Analytics Hadoop cluster will be rebooted today

2016-03-16 Thread Luca Toscano
Hi folks! Due to a kernel upgrade for a security fix we need to reboot each node of the Hadoop cluster. The task will be started later on today and it will be done in small batches to avoid causing major delays to outstanding jobs. Please contact me if you notice any major issue (elukey or the

[Analytics] Hadoop cluster - Added automatic failover to the HDFS Namenode

2016-03-29 Thread Luca Toscano
Hi! TL;DR: The Analytics team added automatic failover for the HDFS Namenode. A new daemon called hadoop-hdfs-zkfc (port 8019) is running on the analytics1001/1002 hosts; it is responsible for talking to Zookeeper and running periodic health checks. Monitoring and Ferm rules have been added. More info:

[Analytics] Planned downtime/maintenance for stat1001.eqiad.wmnet

2016-04-27 Thread Luca Toscano
Hi all, as part of https://phabricator.wikimedia.org/T76348 the Analytics team is going to re-image stat1001.wikimedia.org with Debian Jessie. The activity will start on Monday May 2nd at 14:00 CEST (UTC+2). Three major services will not be available during the downtime: -

Re: [Analytics] Upcoming reboots of stat1002, stat1003 and stat1004 (on Jun 30th)

2016-06-30 Thread Luca Toscano
On Thu, Jun 30, 2016 at 12:17 PM, Luca Toscano <ltosc...@wikimedia.org> wrote: > > On Wed, Jun 29, 2016 at 10:38 AM, Luca Toscano <ltosc...@wikimedia.org> > wrote: > >> Hi! >> >> Tomorrow morning (Jun 30th - CET timezone) I'd need to reboot stat1002, >

Re: [Analytics] Upcoming reboots of stat1002, stat1003 and stat1004 (on Jun 30th)

2016-06-30 Thread Luca Toscano
On Wed, Jun 29, 2016 at 10:38 AM, Luca Toscano <ltosc...@wikimedia.org> wrote: > Hi! > > Tomorrow morning (Jun 30th - CET timezone) I'd need to reboot stat1002, > stat1003 and stat1004 for kernel upgrades (Ubuntu security patches). This > could potentially terminate long ru

[Analytics] Delays in loading newest Pageview data

2017-02-08 Thread Luca Toscano
Hi everybody, as FYI we are currently tracking some issues between Hadoop worker nodes and the AQS Cassandra cluster in https://phabricator.wikimedia.org/T157533 This means that the last Pageview data will be loaded with a bit of delay. We expect to solve the issue during the next couple of

[Analytics] AQS API outage

2017-02-23 Thread Luca Toscano
Hi everybody, we are currently experiencing a wide outage of the AQS API, so Pageviews are not available at the moment. I am currently working on it, will update this list as soon as possible. Really sorry for the trouble, Luca ___ Analytics mailing

Re: [Analytics] AQS API outage

2017-02-23 Thread Luca Toscano
2017-02-23 11:10 GMT+01:00 Luca Toscano <ltosc...@wikimedia.org>: > Hi everybody, > > we are currently experiencing a wide outage of the AQS API, so Pageviews > are not available at the moment. I am currently working on it, will update > this list as soon as possible. >

Re: [Analytics] Missing mediacounts for 2016-12-01

2017-02-16 Thread Luca Toscano
2017-02-16 15:18 GMT+01:00 Federico Leva (Nemo) : > As far as I can see, mediacounts.2016-12-01.v00.tsv.bz2 is missing: > > http://dumps.wikimedia.your.org/other/mediacounts/daily/2016/ > https://dumps.wikimedia.org/other/mediacounts/daily/2016/ > > Thanks a lot for the

[Analytics] Upcoming reboots of stat and Hadoop hosts due to Kernel upgrades

2016-09-21 Thread Luca Toscano
Hi everybody, the Analytics team is going to reboot all the stat hosts (stat1002, stat1003 and stat1004) and the Hadoop cluster nodes to install new kernels (security upgrade required). The work will start tomorrow morning (Sep 22nd) at around 9:00 AM CEST. This task might interfere with ongoing

Re: [Analytics] Upcoming reboots of stat and Hadoop hosts due to Kernel upgrades

2016-09-21 Thread Luca Toscano
<dandree...@wikimedia.org >> > wrote: >> >>> + research >>> >>> (btw, if we cc analytics and research does that reach everyone using >>> these boxes? Like discovery, fundraising, etc? Basically, everyone who >>> doesn't see this m

[Analytics] Java daemons restart on the Hadoop cluster (security upgrades)

2016-08-26 Thread Luca Toscano
Hi all, I need to restart all the Java daemons on the Analytics Hadoop cluster for security upgrades. This procedure might affect ongoing jobs so please let me know if you see any issue during the next hours. IRC: elukey (#wikimedia-analytics Freenode) Thanks for the patience! Luca

[Analytics] Upcoming reboots of stat100[234] and most of the Analytics hosts (Kafka and Hadoop)

2016-10-20 Thread Luca Toscano
Hi everybody, due to a severe kernel vulnerability ( https://access.redhat.com/security/vulnerabilities/2706661) I need to reboot the stat1002, stat1003 and stat1004 hosts to install the new kernel. The reboots are scheduled for 9 AM CEST tomorrow (Oct 21st), please follow up with me or anybody

Re: [Analytics] Resources stat1005

2017-08-14 Thread Luca Toscano
Hi Adrian, you should open a phab task like the following: https://phabricator.wikimedia.org/T158053 to get into the nda LDAP group (if you really need it as Nuria mentioned :). Luca 2017-08-13 0:52 GMT+02:00 Adrian Bielefeldt < adrian.bielefe...@mailbox.tu-dresden.de>: > Hi Andrew, > > thanks

[Analytics] Ongoing Network maintenance will affect all analytics websites

2017-04-27 Thread Luca Toscano
Tracking task: T148506 Should last approximately 10/15 mins from now and it will affect all the Analytics websites (Yarn, Hue, Pivot, etc..). Please reach out to me (elukey) on IRC if you have further questions. Luca

Re: [Analytics] Short Hive, Oozie, Druid & Pivot downtime Tuesday April 25th

2017-04-24 Thread Luca Toscano
Quick update: the maintenance procedure will begin in a few minutes! 2017-04-21 20:14 GMT+02:00 Luca Toscano <ltosc...@wikimedia.org>: > Update: > > Due to an opsen that forgot about his bank holiday (namely me) we decided > to move the downtime up to Monday April 24th (sa

[Analytics] Upcoming reboots of all the Analytics hosts (including stat1002, stat1003 and stat1004)

2017-06-20 Thread Luca Toscano
Hi everybody, due to a severe kernel vulnerability (https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt) we need to reboot the stat1002, stat1003 and stat1004 hosts to install the new kernel. The reboots are scheduled for

[Analytics] Alter tables for the log database on the analytics slaves

2017-06-27 Thread Luca Toscano
Hi everybody, the Analytics team is working on some alter tables for the Eventlogging 'log' database on analytics-store (dbstore1002) and analytics-slave (db1047) as part of https://phabricator.wikimedia.org/T167162. The list of alter tables is the following:

[Analytics] Eventlogging - Mysql inserts to m4-master/db1107 are temporarily suspended to allow a backup

2017-12-18 Thread Luca Toscano
Hi everybody, just wanted to let you know that we have stopped the Eventlogging Mysql Kafka consumers on eventlog1001 for https://phabricator.wikimedia.org/T183123. They will be re-enabled as soon as possible. Thanks! Luca

Re: [Analytics] The notebook1002 host will be temporary repurposed as Kafka Analytics broker

2017-12-18 Thread Luca Toscano
2017-12-15 20:29 GMT+01:00 Tilman Bayer <tba...@wikimedia.org>: > > > On Fri, Dec 15, 2017 at 5:35 AM, Luca Toscano <ltosc...@wikimedia.org> > wrote: >> >> >> >>> We are in the process of ordering new hardware to replace the current >>>>

Re: [Analytics] Eventlogging - Mysql inserts to m4-master/db1107 are temporarily suspended to allow a backup

2017-12-18 Thread Luca Toscano
Hi again, everything is back running as expected. Thanks! Luca 2017-12-18 15:25 GMT+01:00 Luca Toscano <ltosc...@wikimedia.org>: > Hi everybody, > > just wanted to let you know that we have stopped the Eventlogging Mysql > Kafka consumers on eventlog1001 for h

Re: [Analytics] The notebook1002 host will be temporary repurposed as Kafka Analytics broker

2017-12-15 Thread Luca Toscano
Hi Tilman, 2017-12-15 8:53 GMT+01:00 Tilman Bayer <tba...@wikimedia.org>: > > On Wed, Dec 6, 2017 at 11:03 AM, Luca Toscano <ltosc...@wikimedia.org> > wrote: > >> Hi everybody, >> >> as outlined in https://phabricator.wikimedia.org/T18151

Re: [Analytics] Analytics maintenance windows announced for stat boxes, db1046 and thorium (analytics websites)

2017-11-21 Thread Luca Toscano
3764603, apologies for the trouble and the time wasted :( Good news is that the master database was switched without any data loss and we are now using a more powerful host! Thanks! Luca 2017-11-14 18:59 GMT+01:00 Luca Toscano <ltosc...@wikimedia.org>: > Hi everybody, > > the Analyti

Re: [Analytics] Important news about Analytics databases

2017-11-21 Thread Luca Toscano
to proceed, please ping us otherwise. The log database is scheduled to be dropped from dbstore1002 on Tuesday 28th. After that, the log database will be available only on db1108 (analytics-slave.eqiad.wmnet). Thanks a lot! Luca 2017-11-08 12:02 GMT+01:00 Luca Toscano <ltosc...@wikimedia.org>:

[Analytics] Analytics maintenance windows announced for stat boxes, db1046 and thorium (analytics websites)

2017-11-14 Thread Luca Toscano
Hi everybody, the Analytics team needs to do the following maintenance operations: 1) migrate the Event-Logging master db ('log', currently on db1046) to the new host db1107 (T156844). This should happen on *Wed Nov 15th (EU morning)*, and it should be transparent to all the Event Logging users.

[Analytics] Important news about Analytics databases

2017-11-08 Thread Luca Toscano
Hi everybody, the Analytics team needs to make some changes to the current configuration and deployment of the Analytics databases. Before starting, a little refresher to be on the same page: - db1046 - eventlogging master database - db1047 - also known as analytics-slave.eqiad.wmnet - replicates

Re: [Analytics] [Analytics Cluster] Downtime announcement for Oozie/Hive - Dec 7 10AM CET

2017-12-07 Thread Luca Toscano
GMT+01:00 Luca Toscano <ltosc...@wikimedia.org>: > Hi everybody, > > we'd need to reboot the analytics1003 host for Linux kernel and openjdk > updates tomorrow Dec 07 at 10 AM CET. Hive and Oozie will stop for a > (hopefully) brief amount of time, but since they'll need to sto

Re: [Analytics] [Analytics Cluster] Downtime announcement for Oozie/Hive - Dec 7 10AM CET

2017-12-07 Thread Luca Toscano
a temporary unavailability of Hive. Thanks for the patience! Luca 2017-12-07 12:36 GMT+01:00 Luca Toscano <ltosc...@wikimedia.org>: > Hi everybody, > > we are experiencing some issues with the Hive daemon, so currently Hive > queries are not available. I am going to update t

[Analytics] [Analytics Cluster] Downtime announcement for Oozie/Hive - Dec 7 10AM CET

2017-12-06 Thread Luca Toscano
Hi everybody, we'd need to reboot the analytics1003 host for Linux kernel and openjdk updates tomorrow Dec 07 at 10 AM CET. Hive and Oozie will stop for a (hopefully) brief amount of time, but since they'll need to stop before the reboot it might happen that in flight jobs/queries fail. We'll try

[Analytics] The notebook1002 host will be temporary repurposed as Kafka Analytics broker

2017-12-06 Thread Luca Toscano
Hi everybody, as outlined in https://phabricator.wikimedia.org/T181518 the Analytics team needs to repurpose the notebook1002 host (one of the PAWS/Jupyter nodes) as a Kafka Analytics broker for an urgent maintenance procedure. We are not aware of anybody actively using it (as it happens with

[Analytics] Stopping mysql on db1047 (analytics slave) for maintenance

2017-10-26 Thread Luca Toscano
Hi everybody, mysql will not be available on db1047 for some time due to maintenance for https://phabricator.wikimedia.org/T177405. Ping me on IRC (elukey) or the #wikimedia-analytics channel if you encounter any issue or if you have any questions. Thanks! Luca

Re: [Analytics] Data ingestion issue with Webrequest 2018-06-14-11

2018-06-18 Thread Luca Toscano
Hi again, everything is back on track! Please let us know in the Phabricator task (T197281) if you still see something out of the ordinary. Thanks! Luca Il giorno ven 15 giu 2018 alle ore 18:55 Luca Toscano < ltosc...@wikimedia.org> ha scritto: > Hi everybody, > > we ha

[Analytics] Data ingestion issue with Webrequest 2018-06-14-11

2018-06-15 Thread Luca Toscano
Hi everybody, we have been working on an issue while refining webrequest data for the 2018-06-14-11 hour, tracked in https://phabricator.wikimedia.org/T197281. We have a fix that will be deployed on Monday, so we apologize in advance if today and during the weekend some data will be missing.

[Analytics] Piwik maintenance ongoing

2018-06-27 Thread Luca Toscano
Hi everybody, as FYI piwik.wikimedia.org will be in maintenance mode for a couple of hours due to a software upgrade. More info in https://phabricator.wikimedia.org/T192298 Thanks! Luca

[Analytics] dbstore1002 / analytics-store.eqiad.wmnet downtime announcement for Jan 09 15:00 UTC

2018-01-08 Thread Luca Toscano
Hi everybody, dbstore1002 (also known as analytics-store.eqiad.wmnet) needs to be shut down for maintenance tomorrow Jan 09 at around 15:00 UTC for https://phabricator.wikimedia.org/T183771. We don't expect the downtime to last more than a couple of hours, but there are some outstanding issues

[Analytics] Reboot of eventlog1001 for kernel upgrades

2018-01-15 Thread Luca Toscano
Hi everybody, I am about to reboot eventlog1001 for kernel upgrades. This host runs all the Eventlogging daemons that pull data from Kafka, process it and then push it to Mysql. The maintenance is needed to deploy the new Linux Kernel that fixes the Meltdown vulnerability. If you see a dip in

[Analytics] Analytics hosts reboot announcement - Wed 17th 10 AM CET

2018-01-15 Thread Luca Toscano
Hi everybody, as you already know we are deploying the new Linux kernel to fix the Meltdown vulnerability across the production fleet. This means that I need to reboot all the stat boxes (stat100[456]) and also analytics1003 (running Oozie, Camus, Hive, etc..), probably interfering with the work

Re: [Analytics] Page hourly views

2018-02-12 Thread Luca Toscano
Hi everybody, thanks a lot for the ping. The pageviews archives are copied from a HDFS read-only mount point to the host that serves them statically via HTTP, and during the weekend an event happened that caused the rsync to stop working. The issue should now be fixed. I opened

Re: [Analytics] [Engineering] Analytics Hadoop cluster maintenance postponed - Tue 13th February

2018-02-13 Thread Luca Toscano
Hi everybody, the Analytics Hadoop upgrade to Java 8 has been postponed due to unexpected hw failures (https://phabricator.wikimedia.org/T187164 and https://phabricator.wikimedia.org/T187162) that happened this morning and caused some HDFS blocks to be under-replicated/corrupted. To be on the safe

[Analytics] Maintenance window for db1107 (Event Logging log database)

2018-01-02 Thread Luca Toscano
Hi everybody, as part of https://phabricator.wikimedia.org/T168414 the Analytics team needs to execute a lot of alter tables to the log database to be able to complete the work of data purging/sanitization. The plan is to stop the Eventlogging Mysql Consumer on eventlog1001 tomorrow Jan 03 during

[Analytics] Archiva moves to a new host and gets upgraded to 2.2.3

2018-08-28 Thread Luca Toscano
Hi everybody, if you are not an Archiva user (https://archiva.wikimedia.org/) you can stop reading this email. Tomorrow morning EU time I am going to move archiva.wikimedia.org to a new host, as explained in detail in T192639. Important changes: - Archiva gets upgraded to the latest upstream

[Analytics] Cron jobs running on Analytics stat hosts and firewall rules

2018-08-27 Thread Luca Toscano
Hi everybody, as part of T198623 the Analytics and Traffic team worked on a better set of firewall rules for ipv4/ipv6 traffic generated within the Analytics VLAN and going towards Production. For example, we are now enforcing the usage of https://wikitech.wikimedia.org/wiki/HTTP_proxy for all

[Analytics] Hive and Oozie unavailable for a brief hardware maintenance on Sept 7th

2018-09-06 Thread Luca Toscano
Hi everybody, just wanted to let you know that tomorrow morning EU time I'll stop Hive and Oozie on analytics1003 (Hadoop Cluster) to allow the host to be rebooted. I don't plan to kill any job but to wait for their completion. While the maintenance happens some errors might happen to your jobs.

Re: [Analytics] Hive and Oozie unavailable for a brief hardware maintenance on Sept 7th

2018-09-10 Thread Luca Toscano
ic_...@wikimedia.de> ha scritto: > Hi, > > I am sorry if it turns out that you have already informed us about this: > please, is the cluster reboot completed? > > I have a lot of Hive jobs waiting. > > Thanks, > > Goran > > On Fri, Sep 7, 2018, 13:57 Luca Toscano w

Re: [Analytics] Hive and Oozie unavailable for a brief hardware maintenance on Sept 7th

2018-09-10 Thread Luca Toscano
----- > > > On Mon, Sep 10, 2018 at 3:09 PM Luca Toscano > wrote: > >> Hi Goran, >> >> sorry for the delay but this morning we didn't manage to drain the >> cluster and reboot, I am going to do it now. It should take a maximum of 2 >

[Analytics] Upcoming reboot of stat* and notebook* hosts - Sept 13th

2018-09-11 Thread Luca Toscano
Hi everybody, on Thursday Sept 13th (EU morning) I am planning to reboot the stat hosts (stat1004, stat1005 and stat1006) and the notebook hosts (notebook1003, notebook1004) for Linux kernel upgrades. Please let me know if this impacts your work in https://phabricator.wikimedia.org/T203165 or on

Re: [Analytics] Kafka Main Eqiad outage and failover of Eventbus/Eventstreams to codfw

2018-07-12 Thread Luca Toscano
during the next couple of hours. The consumers of the Eventstreams service may get some failures or data drops, apologies in advance for the trouble. Cheers, Luca Il giorno gio 12 lug 2018 alle ore 00:00 Luca Toscano < ltosc...@wikimedia.org> ha scritto: > Hi everybody, > > as you

Re: [Analytics] Eventlogging mysql consumers temporarily stopped due to maintenance

2018-03-06 Thread Luca Toscano
13:55 GMT+01:00 Luca Toscano <ltosc...@wikimedia.org>: > Hi everybody, > > today, while performing maintenance to the Eventlogging Master database, > we ended up in https://phabricator.wikimedia.org/T188991 (TL;DR: two > hours of data inserted to the slave database and no

[Analytics] Eventlogging daemons migrated to a new host - a brief hole in metrics is expected

2018-03-08 Thread Luca Toscano
Hi everybody, today as part of https://phabricator.wikimedia.org/T114199 we migrated all the eventlogging daemons (except the zmq-forwarder, see https://gerrit.wikimedia.org/r/#/c/415218/) from eventlog1001 (Ubuntu Trusty) to eventlog1002 (Debian Stretch). The maintenance that we followed

[Analytics] Introducing the HDFS Trash directory in the Analytics Hadoop cluster

2018-04-12 Thread Luca Toscano
Hi everybody, in T189051 the Analytics team introduced a new feature in the Hadoop cluster, namely the HDFS Trash directory. This means that now if you use the hdfs -rm CLI command you will not directly delete a file or a directory, but you'll move it under /user/$yourusername/.Trash. The Trash
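As a rough illustration of the Trash behaviour described above, the following Python sketch computes where a removed path would be expected to land. Only the /user/$yourusername/.Trash location comes from the announcement; the "Current" subdirectory and the preservation of the original absolute path are assumptions based on stock Hadoop behaviour, and the function names are hypothetical.

```python
import posixpath


def trash_root(username: str) -> str:
    """Per-user HDFS Trash directory, as described in the announcement."""
    return posixpath.join("/user", username, ".Trash")


def expected_trash_path(username: str, deleted_path: str) -> str:
    """Guess where a removed HDFS path ends up inside the Trash.

    Assumption (not stated in the announcement): recently removed files
    are kept under a "Current" subdirectory that mirrors the original
    absolute path, as stock Hadoop does.
    """
    relative = deleted_path.lstrip("/")
    return posixpath.join(trash_root(username), "Current", relative)


if __name__ == "__main__":
    # Hypothetical user name and path, for illustration only.
    print(expected_trash_path("alice", "/tmp/data/part-0000"))
```

A helper like this only predicts a location; whether a file is still there depends on the cluster's Trash retention settings, which the snippet above does not show.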

[Analytics] Upcoming reboot of stat100[56] and analytics1003 (Hive, Oozie) for kernel security upgrades

2018-03-06 Thread Luca Toscano
Hi everybody, tomorrow EU morning (Wed Mar 7th) I'd need to reboot stat100[56] and analytics1003 for kernel security updates. Hive and Oozie (Analytics Hadoop cluster) will not be available for a (hopefully) brief period of time. Please let me know if there is an important work that you are doing

[Analytics] Eventlogging mysql consumers temporarily stopped due to maintenance

2018-03-06 Thread Luca Toscano
Hi everybody, today, while performing maintenance to the Eventlogging Master database, we ended up in https://phabricator.wikimedia.org/T188991 (TL;DR: two hours of data inserted to the slave database and not the master one). We are working to find a feasible solution to avoid losing data and

Re: [Analytics] Piwik maintenance ongoing

2018-06-28 Thread Luca Toscano
Hi again! I am about to upgrade Piwik again, this time to the new Matomo version (the last rebranding name), hopefully the last invasive action for a while. Progress tracked in https://phabricator.wikimedia.org/T192298 Luca Il giorno mer 27 giu 2018 alle ore 14:49 Luca Toscano < lt

[Analytics] Hive and Oozie unavailable for maintenance on Tue Oct 9th 10 AM CEST

2018-10-05 Thread Luca Toscano
Hi everybody, the Analytics team is going to move the Oozie and Hive daemons from the analytics1003 host to an-coord1001 (new host, hardware refresh) on Tuesday Oct 9th at 10 AM CEST. This will require downtime for Oozie and Hive, so some jobs might fail or not work at all during the maintenance.

[Analytics] Druid upgrade - Thu 25th 11 AM CEST

2018-10-23 Thread Luca Toscano
Hi everybody, the Analytics team will upgrade the Druid cluster behind Superset/Turnilo (druid100[1-3]) to version 0.12.3 on Thursday 25th at 11AM CEST. At the same time, we'll upgrade Turnilo to version 1.8.1. Since it will be a rolling upgrade, you shouldn't see a major impact but possibly

[Analytics] Upcoming move of users from stat1005 to stat1007

2018-10-31 Thread Luca Toscano
Hi everybody, as part of https://phabricator.wikimedia.org/T205846 we are going to ask all the stat1005 users to move to stat1007 during the next two weeks. The deadline is November 14th, by which time ssh access to stat1005 will be removed. Background: on stat1005 we have a GPU (more

[Analytics] Analytics Hadoop Cloudera upgrade scheduled for Nov 12th (Monday) at 14:00 CEST

2018-11-08 Thread Luca Toscano
Hi everybody, the Analytics team will completely shut down the Hadoop cluster for a couple of hours on Monday Nov 12th at 14:00 CEST to upgrade the Cloudera distribution to 5.15 (currently 5.10). No big updates but only a collection of small/medium fixes that (hopefully) will improve the

Re: [Analytics] Upcoming move of users from stat1005 to stat1007

2018-11-06 Thread Luca Toscano
Hi everybody, this is a reminder that in a week stat1005 will not be usable anymore. Please follow up with me or the Analytics team if you need more time or if you have any question :) Thanks! Luca Il giorno mer 31 ott 2018 alle ore 16:03 Luca Toscano < ltosc...@wikimedia.org> ha s

Re: [Analytics] [Wiki-research-l] Hive and Oozie unavailable for maintenance on Tue Oct 9th 10 AM CEST

2018-10-10 Thread Luca Toscano
has moved, you'll have to update > its url from *analytics1003.eqiad.wmnet *to *an-coord1001.eqiad.wmnet *in > any scripts you have. > > On Fri, 5 Oct 2018 at 09:54, Luca Toscano wrote: > >> Hi everybody, >> >> the Analytics team is going to move the Oozie and H
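The hostname change mentioned in this thread can be applied to script contents mechanically. The sketch below is a minimal Python example; the two hostnames come from the message, while the function name and the sample connection string (including its port) are hypothetical.

```python
OLD_HOST = "analytics1003.eqiad.wmnet"
NEW_HOST = "an-coord1001.eqiad.wmnet"


def update_host(text: str) -> str:
    """Replace every occurrence of the old hostname with the new one."""
    return text.replace(OLD_HOST, NEW_HOST)


if __name__ == "__main__":
    # Hypothetical connection string, used only to show the substitution.
    sample = "jdbc:hive2://analytics1003.eqiad.wmnet:10000/default"
    print(update_host(sample))
```

A plain string replacement is enough here because the old fully qualified hostname is unlikely to appear as part of a longer, unrelated name.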

[Analytics] Eventlogging host rebooted, metrics might show some dips

2018-10-11 Thread Luca Toscano
Hi everybody, I stopped Eventlogging completely from 14:16 to 14:17 UTC to allow a host reboot for kernel upgrades. This might show up as a dip in some Kafka throughput metrics related to the Eventlogging schemas. If you have any question please feel free to follow up with me or the

Re: [Analytics] Analytics Hadoop cluster full shutdown scheduled for Sept 25th

2018-09-24 Thread Luca Toscano
Hi everybody, this is a reminder that the maintenance will happen tomorrow (Tue 25th, 10 CEST). Luca Il giorno ven 14 set 2018 alle ore 12:13 Luca Toscano < ltosc...@wikimedia.org> ha scritto: > Hi everybody, > > the Analytics team needs to replace the Hadoop master node hosts

[Analytics] Brief unavailability scheduled for the Event Logging database replica

2018-09-26 Thread Luca Toscano
Hi everybody, Tomorrow Sept 27th at 10 CEST db1108 (alias analytics-slave) will be down for a brief (max 30 mins) maintenance (Mariadb and Linux kernel upgrade). This means that the log database will not be available for querying during this time frame. Please reach out to me or to the Analytics

Re: [Analytics] dbstore1002 currently under unexpected maintenance

2019-01-17 Thread Luca Toscano
Hi everybody, we are again back into a maintenance window due to an unexpected mysql crash. More info in https://phabricator.wikimedia.org/T213670. Luca (on behalf of the Analytics team and the Data persistence team) Il giorno lun 14 gen 2019 alle ore 09:47 Luca Toscano < ltosc...@wikimedia.

[Analytics] Eventlogging master database down for maintenance

2019-01-17 Thread Luca Toscano
Hi everybody, as FYI the Eventlogging master database (on db1107) is currently down to ease rack maintenance. More info in https://phabricator.wikimedia.org/T213748 Recent data on the db1108/analytics-slave's log database will be delayed. Let me know if this is an issue for you on IRC or via

[Analytics] Investigation ongoing about data loss error for Webrequest 2018-12-01 hour 14

2018-12-03 Thread Luca Toscano
Hi everybody, during the weekend Oozie alerted us about a suspect data loss for the Webrequest dataset for hour 14 of 2018-12-01. We opened a task to investigate: https://phabricator.wikimedia.org/T211000 This means that related datasets will be missing until we have a final fix/answer,

[Analytics] Unscheduled reboot of stat1007

2018-11-22 Thread Luca Toscano
Hi everybody, as FYI today I rebooted stat1007 due to unexpected maintenance (an error from my side) while investigating a Spark2 issue (that is now fixed). Apologies if this has impacted your work! Luca

Re: [Analytics] Upcoming move of users from stat1005 to stat1007

2018-11-20 Thread Luca Toscano
check all your data on stat1007 as soon as possible (and let us know if you are missing something). Thanks a lot! Luca (on behalf of the Analytics team) Il giorno mer 7 nov 2018 alle ore 07:32 Luca Toscano ha scritto: > Hi everybody, > > this is a reminder that in a week

[Analytics] dbstore1002 currently under unexpected maintenance

2019-01-14 Thread Luca Toscano
Hi everybody, analytics-store/dbstore1002 is currently experiencing issues, more info in https://phabricator.wikimedia.org/T213670. The mysql daemon on the host will likely experience downtime while we attempt to fix the issue, apologies in advance for the trouble. For any question feel free to

[Analytics] Analytics Hadoop cluster full shutdown scheduled for Sept 25th

2018-09-14 Thread Luca Toscano
Hi everybody, the Analytics team needs to replace the Hadoop master node hosts (analytics100[1,2]) and the Hive/Oozie host (analytics1003) as part of regular hardware refresh (hosts getting out of warranty). In order to do things safely we decided to proceed with a full cluster shutdown on Sept

Re: [Analytics] Trouble getting yesterday's pageviews data

2019-04-02 Thread Luca Toscano
Hi Collin, you have anticipated my email :) We are tracking the issue in https://phabricator.wikimedia.org/T219842, we had a Kafka outage yesterday and we are still fixing jobs that didn't run. Luca Il giorno mar 2 apr 2019 alle ore 14:47 Collin Stedman ha scritto: > Hello, > > I'm having

Re: [Analytics] Maintenance proposal for dbstore1002 - staging database migration to dbstore1005 on Monday 18th (EU Morning)

2019-02-19 Thread Luca Toscano
M Manuel Arostegui > wrote: > >> Hello, >> >> I am setting dbstore1002 on read-only now, to start the migration. >> >> Thanks >> Manuel. >> >> On Tue, Feb 12, 2019 at 10:11 AM Luca Toscano >> wrote: >> >>> Hi everybody, >

[Analytics] DEPRECATION WARNING: dbstore1002 is going to be decommissioned on March 4th

2019-02-22 Thread Luca Toscano
Hi everybody, the Analytics team has been working with the SRE Data Persistence team during the last months to replace dbstore1002 with three brand new nodes, dbstore100[3-5]. We are moving from a single mysql instance (multi-source) to a multi-instance environment. For more info please check: *

[Analytics] Hadoop Yarn running application data moved from zookeeper to HDFS

2019-02-28 Thread Luca Toscano
Hi everybody, today we configured Hadoop Yarn to store its application/jobs data (called rmstore) in HDFS instead of Zookeeper. We are going to remove a lot of data from our Zookeeper cluster in eqiad (several thousands of znodes), hopefully increasing its reliability (it is shared with all the Kafka

[Analytics] Maintenance proposal for dbstore1002 - staging database migration to dbstore1005 on Monday 18th (EU Morning)

2019-02-12 Thread Luca Toscano
Hi everybody, as described here (https://phabricator.wikimedia.org/T215589#4946535) I am proposing a maintenance window to allow the Data Persistence and Analytics teams to move the staging database from dbstore1002 to dbstore1005 (its new home) on Monday 18th during the EU morning. This will

[Analytics] Rename of sX-analytics-slave DNS CNAMEs to sX-analytics-replica

2019-02-05 Thread Luca Toscano
Hi everybody, as you already know dbstore1002 is going to be migrated to dbstore100[3-5] moving from a multisource (single host) scheme to a multi instance (multi host) one. As part of the migration we have to upgrade the following DNS CNAMEs (they all point now to dbstore1002):

[Analytics] Analytics Hadoop cluster offline for maintenance on Wed Apr 17th at 15:00 CET

2019-04-15 Thread Luca Toscano
Hi everybody, the Analytics team is planning to upgrade the Hadoop cluster to CDH 5.16.1 (changelog in https://phabricator.wikimedia.org/T218343) on Wed Apr 17th at 15:00 CET. All services (HDFS, Hive, Oozie, Notebooks, etc..) will be unavailable for one hour if everything goes according to plan,

Re: [Analytics] Superset 0.32 upgrade coming tomorrow (May 15th, early EU morning)

2019-05-16 Thread Luca Toscano
> Edits per platform in Indonesia and Arabic Wikipedia last month: > https://bit.ly/2JlKNIG > > Thanks, > > Nuria > > On Tue, May 14, 2019 at 10:57 AM Luca Toscano > wrote: > >> Hi everybody, >> >> as FYI I am going to upgrade Superset tomorr

[Analytics] Hive and Oozie unavailable for maintenance on Wed 26th 9 AM CEST

2019-06-25 Thread Luca Toscano
Hi everybody, as part of https://phabricator.wikimedia.org/T225306 I need to reboot the an-coord1001 host, that runs the Hive server/metastore and Oozie. Tomorrow June 26th I'll reboot the host at around 9 AM CEST, the maintenance window should last 10/15 minutes more or less. This means that

[Analytics] Maintenance to the Eventlogging databases

2019-06-24 Thread Luca Toscano
Hi everybody, at 10 AM CEST the SRE/Analytics team will take down db1107 and db1108 (where the log database is stored/accessed) for maintenance. It should last half an hour. If you have any questions please reach out to me (elukey) on IRC or to the a-team alias at #wikimedia-analytics on

[Analytics] Superset 0.32 upgrade coming tomorrow (May 15th, early EU morning)

2019-05-14 Thread Luca Toscano
Hi everybody, as FYI I am going to upgrade Superset tomorrow (May 15th) to 0.32. This will involve moving to a new host based on Debian Buster and Python 3.7, so the move will require some time and it will be hopefully fully done early during the EU morning. Tracking task:

[Analytics] Reboot of stat1004-6-7 and notebook1003-4 happening on May 21st (early EU morning)

2019-05-20 Thread Luca Toscano
Hi everybody, the stat1004-6-7 and notebook1003-4 hosts will be rebooted tomorrow morning, May 21st, during the EU morning for security upgrades (Linux kernel upgrades). Please let me or anybody in the Analytics team know if this is problematic for your work so we can schedule a better

[Analytics] Reboot of the Analytics dbstore database hosts and upgrade of Mariadb

2019-04-19 Thread Luca Toscano
Hi everybody, on Monday 22nd the SRE Data Persistence team will reboot the Analytics dbstore database hosts for Linux kernel and MariaDB upgrades during the early EU morning. It shouldn't affect anybody, but please let me know if you have any issues with it. Thanks! Luca (on behalf of Analytics and Data

[Analytics] Firewall on stat100x and notebook100x hosts

2019-07-05 Thread Luca Toscano
TL;DR: In https://phabricator.wikimedia.org/T170826 the Analytics team wants to add base firewall rules to the stat100x and notebook100x hosts, which will block by default any traffic that is neither localhost nor otherwise known. Please let us know in the task if this is a problem for you. Hi everybody, the

Re: [Analytics] [Wiki-research-l] Firewall on stat100x and notebook100x hosts

2019-07-10 Thread Luca Toscano
Hi Isaac, On Wed, Jul 10, 2019 at 16:14, Isaac Johnson wrote: > Hey Luca, > We discussed this in Research and it all sounds good to us with one > question below. If something else arises, we'll ping you. Thanks for the > heads up! > > > We assumed that instructing Spark to use a

Re: [Analytics] [Wiki-research-l] Analytics clients (stat/notebook hosts) and backups of home directories

2019-07-11 Thread Luca Toscano
uestion for you: As you allow/encourage for more copies of >>> the files to exist, what is the mechanism you'd like to put in place >>> for reducing the chances of PII to be copied in new folders that then >>> will be even harder (for your team) to keep track of? Havin

[Analytics] Analytics clients (stat/notebook hosts) and backups of home directories

2019-07-04 Thread Luca Toscano
Hi everybody, as part of https://phabricator.wikimedia.org/T201165 the Analytics team would like to make it clear to everybody that the home directories on the stat/notebook nodes are not backed up periodically. They run on a software RAID configuration spanning multiple disks of

[Analytics] Urgent maintenance to an-coord1001 requires a brief stop of Oozie/Hive/Spark/etc..

2019-07-15 Thread Luca Toscano
Hi everybody, due to https://phabricator.wikimedia.org/T227941 we need to take down Oozie/Hive/etc.. on an-coord1001. The maintenance should not last long, but if you have any issues please reach out to us on IRC (#wikimedia-analytics on Freenode). Thanks! Luca (on behalf of the Analytics

Re: [Analytics] Urgent maintenance to an-coord1001 requires a brief stop of Oozie/Hive/Spark/etc..

2019-07-15 Thread Luca Toscano
Maintenance completed! Luca On Mon, Jul 15, 2019 at 16:22, Luca Toscano < ltosc...@wikimedia.org> wrote: > Hi everybody, > > due to https://phabricator.wikimedia.org/T227941 we'd need to take down > Oozie/Hive/etc.. on an-coord1001. The maintenance sh

[Analytics] Python 2 is going EOL on January 1st - Do you need Python 2 packages in Analytics?

2019-09-13 Thread Luca Toscano
Hi everybody, as https://www.python.org/doc/sunset-python-2/ says, Python 2 is finally going EOL on January 1st. We (as the Analytics team) have a lot of packages deployed on stat/notebook/hadoop hosts via puppet that should be removed, but before doing so we need to know if any of you is

[Analytics] stat1005 back in the pool of Analytics client hosts

2019-09-26 Thread Luca Toscano
Hi everybody, stat1005 was replaced almost a year ago by stat1007 to allow GPU research and testing (https://phabricator.wikimedia.org/T148843). After a long journey we are happy to add stat1005 back in the pool of available Analytics client hosts. I have updated the documentation in:

[Analytics] Enable Kerberos authentication for Hadoop (please read if you use Hadoop for your daily work)

2019-11-18 Thread Luca Toscano
Hi everybody, the Analytics team is going to enable Kerberos authentication for Hadoop on Monday December 2nd. The procedure will start around 10 AM CET and should last 3 to 4 hours, but since this is an invasive change it might take longer. If you have

Re: [Analytics] Brief shutdown of stat1007 for maintenance - Thu Dec 12th 15:30 CET

2019-12-17 Thread Luca Toscano
Hi again, the maintenance has been postponed to tomorrow (Dec 18th) around 15:00 CET. Please let me know if it will be a problem for you. The whole maintenance shouldn't last more than 30 mins :) Thanks! Luca On Wed, Dec 11, 2019 at 08:36, Luca Toscano < ltosc...@wikimedia.org>

Re: [Analytics] Enable Kerberos authentication for Hadoop (please read if you use Hadoop for your daily work)

2019-12-16 Thread Luca Toscano
take more if we encounter unexpected issues. Thanks for the patience! Luca On Tue, Nov 26, 2019 at 09:46, Luca Toscano < ltosc...@wikimedia.org> wrote: > Hi everybody, > > to avoid any conflict with the start of the Fundraising season, we moved > the Kerberos

Re: [Analytics] Enable Kerberos authentication for Hadoop (please read if you use Hadoop for your daily work)

2019-11-26 Thread Luca Toscano
://wikitech.wikimedia.org/wiki/Analytics/Systems/Kerberos/UserGuide#Get_a_password_for_Kerberos Thanks! Luca (on behalf of the Analytics team) On Mon, Nov 18, 2019 at 16:11, Luca Toscano < ltosc...@wikimedia.org> wrote: > Hi everybody, > > the Analytics team is going to

[Analytics] Maintenance window for the Hadoop cluster - Tue Oct 15th 14:30 CET - 15:30 CET

2019-10-14 Thread Luca Toscano
Hi everybody, the Analytics team is going to stop HDFS and Yarn services for a (hopefully) brief time window tomorrow, Tue Oct 15th, from 14:30 to 15:30 CEST. We are going to swap the Zookeeper cluster from the one currently used by all the Kafka production services to a dedicated one within the

[Analytics] Home directories of users belonging to analytics-privatedata-users will change permissions

2020-03-03 Thread Luca Toscano
Hi everybody, as part of https://phabricator.wikimedia.org/T246578 we'd like to enforce some basic permissions via puppet on all the home directories of analytics-privatedata-users on the analytics clients (stat/notebooks), setting them to $user:analytics-privatedata-users 750. For example, let's pick my home,

Re: [Analytics] Home directories of users belonging to analytics-privatedata-users will change permissions

2020-03-03 Thread Luca Toscano
ansm > > to be able to read and write any file in any directory in my home > directory. > > Thanks. > > Best, > Goran > > > > On Tue, Mar 3, 2020, 19:06 Luca Toscano wrote: > >> Hi everybody, >> >> as part of https://phabricator.wikimedia.o

Re: [Analytics] SparkContext stopped and cannot be restarted

2020-02-05 Thread Luca Toscano
Hey Neil, there were two Yarn jobs running related to your notebooks. I just killed them; let's see if that solves the problem (you might need to restart your notebook again). If not, let's open a task and investigate :) Luca On Thu, Feb 6, 2020 at 02:08, Neil Shah-Quinn <
