Re: [Analytics] pagecounts dumps a couple hours behind

2016-02-19 Thread Luca Toscano
Hi! We are having some problems with data ingestion at the moment; I'll update the list with more precise information later on. Sorry and thanks for the patience! Luca On Fri, Feb 19, 2016 at 8:36 AM, Bo Han wrote: > Hey team, > > I noticed that the dumps at > http://dumps.wikimedia.org/other/pagec

Re: [Analytics] pagecounts dumps a couple hours behind

2016-02-19 Thread Luca Toscano
AM, Luca Toscano wrote: > Hi! > > we have some problems with data ingestion at the moment, I'll update the > list with more precise information later on. > > Sorry and thanks for the patience! > > Luca > > On Fri, Feb 19, 2016 at 8:36 AM, Bo Han wrote: >

Re: [Analytics] Requesting access to Wikimedia Pageview Dumps for Research

2016-03-03 Thread Luca Toscano
Hi Gonzalo, I believe that the issue you were experiencing was caused by some maintenance tasks we had to perform yesterday; it should be gone now, can you double check? There are no issues in consuming the data, but please be a good citizen and avoid sending tons of requests to our servers at the

[Analytics] All the nodes in the Analytics Hadoop cluster will be rebooted today

2016-03-16 Thread Luca Toscano
Hi folks! Due to a kernel upgrade for a security fix we need to reboot each node of the Hadoop cluster. The task will start later today and it will be done in small batches to avoid causing major delays to outstanding jobs. Please contact me if you notice any major issues (elukey or the #wi

[Analytics] Hadoop cluster - Added automatic failover to the HDFS Namenode

2016-03-29 Thread Luca Toscano
Hi! TL;DR: The Analytics team added automatic failover for the HDFS Namenode. A new daemon called hadoop-hdfs-zkfc (port 8019) is running on the analytics1001/1002 hosts; it is responsible for talking to Zookeeper and executing periodic health checks. Monitoring and Ferm rules have been added. More info:

[Analytics] Planned downtime/maintenance for stat1001.eqiad.wmnet

2016-04-27 Thread Luca Toscano
Hi all, as part of https://phabricator.wikimedia.org/T76348 the Analytics team is going to re-image stat1001.wikimedia.org with Debian Jessie. The activity will start on Monday May 2nd at 14:00 CEST (UTC+2). Three major services will not be available during the downtime: - datasets.wikimedia

[Analytics] Upcoming reboots of stat1002, stat1003 and stat1004 (on Jun 30th)

2016-06-29 Thread Luca Toscano
Hi! Tomorrow morning (Jun 30th - CET timezone) I need to reboot stat1002, stat1003 and stat1004 for kernel upgrades (Ubuntu security patches). This could potentially terminate long-running queries or jobs, so please ping me on IRC or email me if your work can't be postponed or stopped. Thanks!

[Analytics] Analytics Hadoop cluster is going to be rebooted for kernel upgrades

2016-06-29 Thread Luca Toscano
Hi again! I forgot to mention that I have stopped all Analytics jobs on Hadoop (Camus, Oozie) as a preparation step for a complete Hadoop cluster reboot to install new Linux kernels. The reboots will be performed in small batches to limit the blast radius, but they might affect your job

Re: [Analytics] Upcoming reboots of stat1002, stat1003 and stat1004 (on Jun 30th)

2016-06-30 Thread Luca Toscano
On Wed, Jun 29, 2016 at 10:38 AM, Luca Toscano wrote: > Hi! > > Tomorrow morning (Jun 30th - CET timezone) I'd need to reboot stat1002, > stat1003 and stat1004 for kernel upgrades (Ubuntu security patches). This > could potentially terminate long running queries or jobs, so

Re: [Analytics] Upcoming reboots of stat1002, stat1003 and stat1004 (on Jun 30th)

2016-06-30 Thread Luca Toscano
On Thu, Jun 30, 2016 at 12:17 PM, Luca Toscano wrote: > > On Wed, Jun 29, 2016 at 10:38 AM, Luca Toscano > wrote: > >> Hi! >> >> Tomorrow morning (Jun 30th - CET timezone) I'd need to reboot stat1002, >> stat1003 and stat1004 for kernel upgrades

[Analytics] Java daemons restart on the Hadoop cluster (security upgrades)

2016-08-26 Thread Luca Toscano
Hi all, I need to restart all the Java daemons on the Analytics Hadoop cluster for security upgrades. This procedure might affect ongoing jobs so please let me know if you see any issue during the next hours. IRC: elukey (#wikimedia-analytics Freenode) Thanks for the patience! Luca

[Analytics] Upcoming reboots of stat and Hadoop hosts due to Kernel upgrades

2016-09-21 Thread Luca Toscano
Hi everybody, the Analytics team is going to reboot all the stat hosts (stat1002, stat1003 and stat1004) and the Hadoop cluster nodes to install new kernels (security upgrade required). The work will start tomorrow morning (Sep 22nd) at around 9:00 AM CEST. This task might interfere with ongoing H

Re: [Analytics] Upcoming reboots of stat and Hadoop hosts due to Kernel upgrades

2016-09-21 Thread Luca Toscano
> >>> (btw, if we cc analytics and research does that reach everyone using >>> these boxes? Like discovery, fundraising, etc? Basically, everyone who >>> doesn't see this message, raise your hand :)) >>> >>> On Wed, Sep 21, 2016 at 10:44 A

[Analytics] Upcoming reboots of stat100[234] and most of the Analytics hosts (Kafka and Hadoop)

2016-10-20 Thread Luca Toscano
Hi everybody, due to a severe kernel vulnerability (https://access.redhat.com/security/vulnerabilities/2706661) I need to reboot the stat1002, stat1003 and stat1004 hosts to install the new kernel. The reboots are scheduled for 9 AM CEST tomorrow (Oct 21st); please follow up with me or anybody in

[Analytics] Delays in loading newest Pageview data

2017-02-08 Thread Luca Toscano
Hi everybody, as an FYI, we are currently tracking some issues between Hadoop worker nodes and the AQS Cassandra cluster in https://phabricator.wikimedia.org/T157533. This means that the latest Pageview data will be loaded with some delay. We expect to solve the issue within the next couple of hours

Re: [Analytics] Missing mediacounts for 2016-12-01

2017-02-16 Thread Luca Toscano
2017-02-16 15:18 GMT+01:00 Federico Leva (Nemo) : > As far as I can see, mediacounts.2016-12-01.v00.tsv.bz2 is missing: > > http://dumps.wikimedia.your.org/other/mediacounts/daily/2016/ > https://dumps.wikimedia.org/other/mediacounts/daily/2016/ > > Thanks a lot for the report! We are currently re

[Analytics] AQS API outage

2017-02-23 Thread Luca Toscano
Hi everybody, we are currently experiencing a widespread outage of the AQS API, so Pageviews are not available at the moment. I am currently working on it and will update this list as soon as possible. Really sorry for the trouble, Luca ___ Analytics mailing li

Re: [Analytics] AQS API outage

2017-02-23 Thread Luca Toscano
2017-02-23 11:10 GMT+01:00 Luca Toscano : > Hi everybody, > > we are currently experiencing a wide outage of the AQS API, so Pageviews > are not available at the moment. I am currently working on it, will update > this list as soon as possible. > > Update: the AQS API is no

Re: [Analytics] Short Hive, Oozie, Druid & Pivot downtime Tuesday April 25th

2017-04-21 Thread Luca Toscano
Update: Because an opsen (namely me) forgot about his bank holiday, we decided to move the downtime forward to Monday April 24th (same time, 13:30 UTC). Please let me know if this causes you any trouble. Thanks and sorry for the late notification! Luca 2017-04-13 18:36 GMT+02:00 Andrew Otto

Re: [Analytics] Short Hive, Oozie, Druid & Pivot downtime Tuesday April 25th

2017-04-24 Thread Luca Toscano
Quick update: the maintenance procedure will begin in a few minutes! 2017-04-21 20:14 GMT+02:00 Luca Toscano : > Update: > > Due to a opsen that forgot about his bank holiday (namely me) we decided > to anticipate the downtime on Monday April 24th (same time, 13:30 UTC). > > Ple

[Analytics] Ongoing Network maintenance will affect all analytics websites

2017-04-27 Thread Luca Toscano
Tracking task: T148506. The maintenance should last approximately 10-15 minutes from now and it will affect all the Analytics websites (Yarn, Hue, Pivot, etc.). Please reach out to me (elukey) on IRC if you have further questions. Luca

[Analytics] analytics-slave down for maintenance on June 1st 1600 UTC

2017-05-30 Thread Luca Toscano
Hi everybody, analytics-slave (db1047) will be down for a while (a couple of hours at most) for hardware maintenance on June 1st at 1600 UTC. More info in https://phabricator.wikimedia.org/T159266 Luca

[Analytics] Upcoming reboots of all the Analytics hosts (including stat1002, stat1003 and stat1004)

2017-06-20 Thread Luca Toscano
Hi everybody, due to a severe kernel vulnerability (https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt) we need to reboot the stat1002, stat1003 and stat1004 hosts to install the new kernel. The reboots are scheduled for

[Analytics] Alter tables for the log database on the analytics slaves

2017-06-27 Thread Luca Toscano
Hi everybody, the Analytics team is running some ALTER TABLE statements on the Eventlogging 'log' database on analytics-store (dbstore1002) and analytics-slave (db1047) as part of https://phabricator.wikimedia.org/T167162. The list of ALTER TABLE statements is the following: https://phabricator.wikimedia.org/P55

Re: [Analytics] Resources stat1005

2017-08-14 Thread Luca Toscano
Hi Adrian, you should open a phab task like the following: https://phabricator.wikimedia.org/T158053 to get into the nda LDAP group (if you really need it as Nuria mentioned :). Luca 2017-08-13 0:52 GMT+02:00 Adrian Bielefeldt < adrian.bielefe...@mailbox.tu-dresden.de>: > Hi Andrew, > > thanks

[Analytics] Stopping mysql on db1047 (analytics slave) for maintenance

2017-10-26 Thread Luca Toscano
Hi everybody, mysql will not be available on db1047 for some time due to maintenance for https://phabricator.wikimedia.org/T177405. Ping me on IRC (elukey) or the #wikimedia-analytics channel if you encounter any issues or if you have any questions. Thanks! Luca

[Analytics] Important news about Analytics databases

2017-11-08 Thread Luca Toscano
Hi everybody, the Analytics team needs to make some changes to the current configuration and deployment of the Analytics databases. Before starting, a quick refresher so that we are all on the same page: - db1046 - eventlogging master database - db1047 - also known as analytics-slave.eqiad.wmnet - replicates vi

[Analytics] Analytics maintenance windows announced for stat boxes, db1046 and thorium (analytics websites)

2017-11-14 Thread Luca Toscano
Hi everybody, the Analytics team needs to do the following maintenance operations: 1) migrate the Event-Logging master db ('log', currently on db1046) to the new host db1107 (T156844). This should happen on *Wed Nov 15th (EU morning)*, and it should be transparent to all the Event Logging users.

Re: [Analytics] Analytics maintenance windows announced for stat boxes, db1046 and thorium (analytics websites)

2017-11-21 Thread Luca Toscano
.wikimedia.org/T179914#3764603, apologies for the trouble and the time wasted :( Good news is that the master database was switched without any data loss and we are now using a more powerful host! Thanks! Luca 2017-11-14 18:59 GMT+01:00 Luca Toscano : > Hi everybody, > > the Analytics tea

Re: [Analytics] Important news about Analytics databases

2017-11-21 Thread Luca Toscano
osition to proceed, please ping us otherwise. The log database is scheduled to be dropped from dbstore1002 on Tuesday 28th. After that, the log database will be available only on db1108 (analytics-slave.eqiad.wmnet). Thanks a lot! Luca 2017-11-08 12:02 GMT+01:00 Luca Toscano : > Hi everybody, >

[Analytics] [Analytics Cluster] Downtime announcement for Oozie/Hive - Dec 7 10AM CET

2017-12-06 Thread Luca Toscano
Hi everybody, we need to reboot the analytics1003 host for Linux kernel and openjdk updates tomorrow Dec 07 at 10 AM CET. Hive and Oozie will stop for a (hopefully) brief amount of time, but since they'll need to be stopped before the reboot, in-flight jobs/queries might fail. We'll try

[Analytics] The notebook1002 host will be temporarily repurposed as Kafka Analytics broker

2017-12-06 Thread Luca Toscano
Hi everybody, as outlined in https://phabricator.wikimedia.org/T181518 the Analytics team needs to repurpose the notebook1002 host (one of the PAWS/Jupyter nodes) as a Kafka Analytics broker for an urgent maintenance procedure. We are not aware of anybody actively using it (as it happens with noteboo

Re: [Analytics] [Analytics Cluster] Downtime announcement for Oozie/Hive - Dec 7 10AM CET

2017-12-07 Thread Luca Toscano
GMT+01:00 Luca Toscano : > Hi everybody, > > we'd need to reboot the analytics1003 host for Linux kernel and openjdk > updates tomorrow Dec 07 at 10 AM CET. Hive and Oozie will stop for a > (hopefully) brief amount of time, but since they'll need to stop before the > re

Re: [Analytics] [Analytics Cluster] Downtime announcement for Oozie/Hive - Dec 7 10AM CET

2017-12-07 Thread Luca Toscano
nly a temporary unavailability of Hive. Thanks for the patience! Luca 2017-12-07 12:36 GMT+01:00 Luca Toscano : > Hi everybody, > > we are experiencing some issues with the Hive daemon, so currently Hive > queries are not available. I am going to update this thread as soon as th

Re: [Analytics] The notebook1002 host will be temporarily repurposed as Kafka Analytics broker

2017-12-15 Thread Luca Toscano
Hi Tilman, 2017-12-15 8:53 GMT+01:00 Tilman Bayer : > > On Wed, Dec 6, 2017 at 11:03 AM, Luca Toscano > wrote: > >> Hi everybody, >> >> as outlined in https://phabricator.wikimedia.org/T181518 the Analytics >> team needs to repurpose the notebook1002 host (

[Analytics] Eventlogging - Mysql inserts to m4-master/db1107 are temporarily suspended to allow a backup

2017-12-18 Thread Luca Toscano
Hi everybody, just wanted to let you know that we have stopped the Eventlogging Mysql Kafka consumers on eventlog1001 for https://phabricator.wikimedia.org/T183123. They will be re-enabled as soon as possible. Thanks! Luca

Re: [Analytics] The notebook1002 host will be temporarily repurposed as Kafka Analytics broker

2017-12-18 Thread Luca Toscano
2017-12-15 20:29 GMT+01:00 Tilman Bayer : > > > On Fri, Dec 15, 2017 at 5:35 AM, Luca Toscano > wrote: >> >> >> >>> We are in the process of ordering new hardware to replace the current >>>> notebook1001 and 1002 hosts, so the absence of no

Re: [Analytics] Eventlogging - Mysql inserts to m4-master/db1107 are temporarily suspended to allow a backup

2017-12-18 Thread Luca Toscano
Hi again, everything is back running as expected. Thanks! Luca 2017-12-18 15:25 GMT+01:00 Luca Toscano : > Hi everybody, > > just wanted to let you know that we have stopped the Eventlogging Mysql > Kafka consumers on eventlog1001 for https://phabricator. > wikimedia.org/T18312

[Analytics] Maintenance window for db1107 (Event Logging log database)

2018-01-02 Thread Luca Toscano
Hi everybody, as part of https://phabricator.wikimedia.org/T168414 the Analytics team needs to run a lot of ALTER TABLE statements on the log database to complete the data purging/sanitization work. The plan is to stop the Eventlogging Mysql Consumer on eventlog1001 tomorrow Jan 03 during

[Analytics] dbstore1002 / analytics-store.eqiad.wmnet downtime announcement for Jan 09 15:00 UTC

2018-01-08 Thread Luca Toscano
Hi everybody, dbstore1002 (also known as analytics-store.eqiad.wmnet) needs to be shut down for maintenance tomorrow Jan 09 at around 15:00 UTC for https://phabricator.wikimedia.org/T183771. We don't expect the downtime to last more than a couple of hours, but there are some outstanding issues that

[Analytics] Analytics hosts reboot announcement - Wed 17th 10 AM CET

2018-01-15 Thread Luca Toscano
Hi everybody, as you already know, we are deploying the new Linux kernel to fix the Meltdown vulnerability across the production fleet. This means that I need to reboot all the stat boxes (stat100[456]) and also analytics1003 (running Oozie, Camus, Hive, etc.), probably interfering with the work t

[Analytics] Reboot of eventlog1001 for kernel upgrades

2018-01-15 Thread Luca Toscano
Hi everybody, I am about to reboot eventlog1001 for kernel upgrades. This host runs all the Eventlogging daemons that pull data from Kafka, process it and then push it to Mysql. The maintenance is needed to deploy the new Linux Kernel that fixes the Meltdown vulnerability. If you see a dip in Even

[Analytics] Analytics Hadoop cluster maintenance announce for Feb 6th

2018-01-23 Thread Luca Toscano
*TL;DR*: The Analytics Hadoop cluster will be completely down for max 2h on *Feb 6th* (EU/CET morning) to upgrade all the daemons to Java 8. Hi everybody, we are planning to upgrade the Analytics Hadoop cluster to Java 8 on *Feb 6th* (EU/CET morning) for https://phabricator.wikimedia.org/T166248.

Re: [Analytics] Analytics Hadoop cluster maintenance announce for Feb 6th

2018-02-05 Thread Luca Toscano
Hi everybody, just a reminder that the upgrade is scheduled for tomorrow EU/CET morning. Luca 2018-01-23 17:58 GMT+01:00 Luca Toscano : > *TL;DR*: The Analytics Hadoop cluster will be completely down for max 2h > on *Feb 6th* (EU/CET morning) to upgrade all the daemons to Java 8.

Re: [Analytics] Page hourly views

2018-02-12 Thread Luca Toscano
Hi everybody, thanks a lot for the ping. The pageviews archives are copied from an HDFS read-only mount point to the host that serves them statically via HTTP, and during the weekend an event happened that caused the rsync to stop working. The issue should now be fixed. I opened https://phabricato

Re: [Analytics] [Engineering] Analytics Hadoop cluster maintenance postponed - Tue 13th February

2018-02-13 Thread Luca Toscano
Hi everybody, the Analytics Hadoop upgrade to Java 8 has been postponed due to unexpected hardware failures (https://phabricator.wikimedia.org/T187164 and https://phabricator.wikimedia.org/T187162) that happened this morning and caused some HDFS blocks to be under-replicated/corrupted. To be on the safe sid

[Analytics] Upcoming reboot of stat100[56] and analytics1003 (Hive, Oozie) for kernel security upgrades

2018-03-06 Thread Luca Toscano
Hi everybody, tomorrow EU morning (Wed Mar 7th) I need to reboot stat100[56] and analytics1003 for kernel security updates. Hive and Oozie (Analytics Hadoop cluster) will not be available for a (hopefully) brief period of time. Please let me know if there is important work that you are doing

[Analytics] Eventlogging mysql consumers temporarily stopped due to maintenance

2018-03-06 Thread Luca Toscano
Hi everybody, today, while performing maintenance on the Eventlogging Master database, we ended up in https://phabricator.wikimedia.org/T188991 (TL;DR: two hours of data inserted into the slave database and not the master one). We are working to find a feasible solution to avoid losing data and get

Re: [Analytics] Eventlogging mysql consumers temporarily stopped due to maintenance

2018-03-06 Thread Luca Toscano
13:55 GMT+01:00 Luca Toscano : > Hi everybody, > > today, while performing maintenance to the Eventlogging Master database, > we ended up in https://phabricator.wikimedia.org/T188991 (TL;DR: two > hours of data inserted to the slave database and not the master one). We > are

[Analytics] Eventlogging daemons migrated to a new host - a brief hole in metrics is expected

2018-03-08 Thread Luca Toscano
Hi everybody, today as part of https://phabricator.wikimedia.org/T114199 we migrated all the eventlogging daemons (except the zmq-forwarder, see https://gerrit.wikimedia.org/r/#/c/415218/) from eventlog1001 (Ubuntu Trusty) to eventlog1002 (Debian Stretch). The maintenance that we followed involved

[Analytics] Eventlogging Analytics migrates to Kafka Jumbo

2018-03-14 Thread Luca Toscano
Hi everybody, as part of https://phabricator.wikimedia.org/T183297 the Analytics team is migrating all the Varnishkafka Eventlogging traffic from Kafka Analytics to Kafka Jumbo. The procedure that we are going to use is the following: 1) change the varnishkafka configuration - this will effective

[Analytics] Introducing the HDFS Trash directory in the Analytics Hadoop cluster

2018-04-12 Thread Luca Toscano
Hi everybody, in T189051 the Analytics team introduced a new feature in the Hadoop cluster, namely the HDFS Trash directory. This means that if you now use the hdfs dfs -rm CLI command, you will not directly delete a file or a directory but will move it under /user/$yourusername/.Trash. The Trash di
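The following is a minimal sketch of where a deleted path would land. It is for illustration only and assumes the stock Hadoop trash policy, which files deletions under a "Current" checkpoint inside the user's .Trash directory and keeps the original absolute path. The helper name trash_destination is hypothetical and is not part of any Wikimedia or Hadoop tool. Under that assumption, a file can be restored by moving it back with hdfs dfs -mv, and hdfs dfs -rm -skipTrash bypasses the Trash entirely.

```python
import posixpath


def trash_destination(user: str, path: str, checkpoint: str = "Current") -> str:
    """Return where `path` would land in HDFS Trash (hypothetical helper).

    Assumes the default Hadoop trash layout:
    /user/<user>/.Trash/<checkpoint>/<original absolute path>
    """
    if not path.startswith("/"):
        # Assumption: relative HDFS paths resolve against the user's home directory.
        path = posixpath.join("/user", user, path)
    # Collapse "..", "." and duplicate slashes before mapping into the Trash.
    path = posixpath.normpath(path)
    return posixpath.join("/user", user, ".Trash", checkpoint, path.lstrip("/"))


if __name__ == "__main__":
    print(trash_destination("elukey", "/wmf/tmp/foo.parquet"))
    print(trash_destination("elukey", "data/x"))
```

With the paths above, an absolute path keeps its full structure under .Trash/Current, and a relative one is first expanded against /user/elukey.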

[Analytics] Data ingestion issue with Webrequest 2018-06-14-11

2018-06-15 Thread Luca Toscano
Hi everybody, we have been working on an issue while refining webrequest data for the 2018-06-14-11 hour, tracked in https://phabricator.wikimedia.org/T197281. We have a fix that will be deployed on Monday, so we apologize in advance if some data is missing today and over the weekend. Bria

Re: [Analytics] Data ingestion issue with Webrequest 2018-06-14-11

2018-06-18 Thread Luca Toscano
Hi again, everything is back on track! Please let us know in the Phabricator task (T197281) if you still see something out of the ordinary. Thanks! Luca On Fri, Jun 15, 2018 at 18:55 Luca Toscano < ltosc...@wikimedia.org> wrote: > Hi everybody, > > we have bee

[Analytics] Piwik maintenance ongoing

2018-06-27 Thread Luca Toscano
Hi everybody, as an FYI, piwik.wikimedia.org will be in maintenance mode for a couple of hours due to a software upgrade. More info in https://phabricator.wikimedia.org/T192298 Thanks! Luca

Re: [Analytics] Piwik maintenance ongoing

2018-06-28 Thread Luca Toscano
Hi again! I am about to upgrade Piwik again, this time to the new Matomo version (the project's latest name after the rebranding), hopefully the last invasive action for a while. Progress is tracked in https://phabricator.wikimedia.org/T192298 Luca On Wed, Jun 27, 2018 at 14:49 Luca Toscano < lt

Re: [Analytics] Kafka Main Eqiad outage and failover of Eventbus/Eventstreams to codfw

2018-07-12 Thread Luca Toscano
during the next couple of hours. The consumers of the Eventstreams service may experience some failures or data drops; apologies in advance for the trouble. Cheers, Luca On Thu, Jul 12, 2018 at 00:00 Luca Toscano < ltosc...@wikimedia.org> wrote: > Hi everybody, > > as you

[Analytics] Eventlogging parsing problem, down during the weekend

2018-07-29 Thread Luca Toscano
Hi everybody, there is an ongoing task for Eventlogging ( https://phabricator.wikimedia.org/T200630) related to the parsing of weird user agents. The issue caused Eventlogging to be down during the weekend, but we hope to make it work again today. This implies that recent data is delayed, apologie

[Analytics] Cron jobs running on Analytics stat hosts and firewall rules

2018-08-27 Thread Luca Toscano
Hi everybody, as part of T198623 the Analytics and Traffic teams worked on a better set of firewall rules for ipv4/ipv6 traffic generated within the Analytics VLAN and going towards Production. For example, we are now enforcing the usage of https://wikitech.wikimedia.org/wiki/HTTP_proxy for all the

[Analytics] Archiva moves to a new host and gets upgraded to 2.2.3

2018-08-28 Thread Luca Toscano
Hi everybody, if you are not an Archiva user (https://archiva.wikimedia.org/) you can stop reading this email. Tomorrow morning EU time I am going to move archiva.wikimedia.org to a new host, as explained in detail in T192639. Important changes: - Archiva gets upgraded to the latest upstream versi

[Analytics] Hive and Oozie unavailable for a brief hardware maintenance on Sept 7th

2018-09-06 Thread Luca Toscano
Hi everybody, just wanted to let you know that tomorrow morning EU time I'll stop Hive and Oozie on analytics1003 (Hadoop Cluster) to allow the host to be rebooted. I don't plan to kill any jobs; I will wait for them to complete instead. During the maintenance your jobs might encounter some errors. T

Re: [Analytics] Hive and Oozie unavailable for a brief hardware maintenance on Sept 7th

2018-09-07 Thread Luca Toscano
fects your work :) Thanks and sorry for the late notice! Luca On Thu, Sep 6, 2018 at 17:23 Luca Toscano wrote: > Hi everybody, > > just wanted to let you know that tomorrow morning EU time I'll stop Hive > and Oozie on analytics1003 (Hadoop Cluster) to allow t

Re: [Analytics] Hive and Oozie unavailable for a brief hardware maintenance on Sept 7th

2018-09-10 Thread Luca Toscano
vanovic_...@wikimedia.de> wrote: > Hi, > > I am sorry if it turns out that you have already informed us about this: > please, is the cluster reboot completed? > > I have a lot of Hive jobs waiting. > > Thanks, > > Goran > > On Fri, Sep 7, 2018, 13:57 Luca Tosca

Re: [Analytics] Hive and Oozie unavailable for a brief hardware maintenance on Sept 7th

2018-09-10 Thread Luca Toscano
------- > > > On Mon, Sep 10, 2018 at 3:09 PM Luca Toscano > wrote: > >> Hi Goran, >> >> sorry for the delay but this morning we didn't manage to drain the >> cluster and reboot, I am going to do it now. It should take a m

[Analytics] Upcoming reboot of stat* and notebook* hosts - Sept 13th

2018-09-10 Thread Luca Toscano
Hi everybody, on Thursday Sept 13th (EU morning) I am planning to reboot the stat hosts (stat1004, stat1005 and stat1006) and the notebook hosts (notebook1003, notebook1004) for Linux kernel upgrades. Please let me know if this impacts your work in https://phabricator.wikimedia.org/T203165 or on I

[Analytics] Analytics Hadoop cluster full shutdown scheduled for Sept 25th

2018-09-14 Thread Luca Toscano
Hi everybody, the Analytics team needs to replace the Hadoop master node hosts (analytics100[1,2]) and the Hive/Oozie host (analytics1003) as part of a regular hardware refresh (hosts getting out of warranty). In order to do things safely we decided to proceed with a full cluster shutdown on Sept 25

Re: [Analytics] Analytics Hadoop cluster full shutdown scheduled for Sept 25th

2018-09-24 Thread Luca Toscano
Hi everybody, this is a reminder that the maintenance will happen tomorrow (Tue 25th, 10 CEST). Luca On Fri, Sep 14, 2018 at 12:13 Luca Toscano < ltosc...@wikimedia.org> wrote: > Hi everybody, > > the Analytics team needs to replace the Hadoop master node hosts

Re: [Analytics] Analytics Hadoop cluster full shutdown scheduled for Sept 25th

2018-09-25 Thread Luca Toscano
y for Hive/Oozie only), please don't hate me :) If you see any issues, please contact us (via https://phabricator.wikimedia.org/T203635 or IRC Freenode #wikimedia-analytics). Thanks! Luca On Mon, Sep 24, 2018 at 16:50 Luca Toscano < ltosc...@wikimedia.org> wrote: > H

[Analytics] Brief unavailability scheduled for the Event Logging database replica

2018-09-26 Thread Luca Toscano
Hi everybody, tomorrow Sept 27th at 10 CEST db1108 (alias analytics-slave) will be down for a brief (max 30 mins) maintenance (Mariadb and Linux kernel upgrade). This means that the log database will not be available for querying during this time frame. Please reach out to me or to the Analytics t

[Analytics] Hive and Oozie unavailable for maintenance on Tue Oct 9th 10 AM CEST

2018-10-05 Thread Luca Toscano
Hi everybody, the Analytics team is going to move the Oozie and Hive daemons from the analytics1003 host to an-coord1001 (new host, hardware refresh) on Tuesday Oct 9th at 10 AM CEST. This will require downtime for Oozie and Hive, so some jobs might fail or not work at all during the maintenance.

Re: [Analytics] [Wiki-research-l] Hive and Oozie unavailable for maintenance on Tue Oct 9th 10 AM CEST

2018-10-10 Thread Luca Toscano
ator has moved, you'll have to update > its url from *analytics1003.eqiad.wmnet *to *an-coord1001.eqiad.wmnet *in > any scripts you have. > > On Fri, 5 Oct 2018 at 09:54, Luca Toscano wrote: > >> Hi everybody, >> >> the Analytics team is going to move the Oozie a

[Analytics] Eventlogging host rebooted, metrics might show some dips

2018-10-11 Thread Luca Toscano
Hi everybody, I stopped Eventlogging completely from 14:16 to 14:17 UTC to allow a host reboot for kernel upgrades. This might show up as a dip in some Kafka throughput metrics related to the Eventlogging schemas. If you have any question please feel free to follow up with me or the Analytic

[Analytics] Druid upgrade - Thu 25th 11 AM CEST

2018-10-23 Thread Luca Toscano
Hi everybody, the Analytics team will upgrade the Druid cluster behind Superset/Turnilo (druid100[1-3]) to version 0.12.3 on Thursday 25th at 11AM CEST. At the same time, we'll upgrade Turnilo to version 1.8.1. Since it will be a rolling upgrade, you shouldn't see a major impact but possibly spora

[Analytics] Upcoming move of users from stat1005 to stat1007

2018-10-31 Thread Luca Toscano
Hi everybody, as part of https://phabricator.wikimedia.org/T205846 we are going to ask all stat1005 users to move to stat1007 over the next two weeks. The deadline is November 14th, by which time ssh access to stat1005 will be removed. Background: on stat1005 we have a GPU (more detail

Re: [Analytics] Upcoming move of users from stat1005 to stat1007

2018-11-06 Thread Luca Toscano
Hi everybody, this is a reminder that in a week stat1005 will not be usable anymore. Please follow up with me or the Analytics team if you need more time or if you have any questions :) Thanks! Luca On Wed, Oct 31, 2018 at 16:03 Luca Toscano < ltosc...@wikimedia.org> wrote

[Analytics] Analytics Hadoop Cloudera upgrade scheduled for Nov 12th (Monday) at 14:00 CET

2018-11-08 Thread Luca Toscano
Hi everybody, the Analytics team will completely shut down the Hadoop cluster for a couple of hours on Monday Nov 12th at 14:00 CET to upgrade the Cloudera distribution to 5.15 (currently 5.10). No major updates, only a collection of small/medium fixes that will (hopefully) improve the reliabilit

Re: [Analytics] Upcoming move of users from stat1005 to stat1007

2018-11-20 Thread Luca Toscano
so please check all your data on stat1007 as soon as possible (and let us know if you are missing something). Thanks a lot! Luca (on behalf of the Analytics team) On Wed, Nov 7, 2018 at 07:32 Luca Toscano wrote: > Hi everybody, > > this is a reminder that in a week s

[Analytics] Unscheduled reboot of stat1007

2018-11-22 Thread Luca Toscano
Hi everybody, as an FYI, today I rebooted stat1007 for unplanned maintenance (an error on my side) while investigating a Spark2 issue (now fixed). Apologies if this has impacted your work! Luca

[Analytics] Investigation ongoing about data loss error for Webrequest 2018-12-01 hour 14

2018-12-03 Thread Luca Toscano
Hi everybody, during the weekend Oozie alerted us about a suspect data loss for the Webrequest dataset for hour 14 of 2018-12-01. We opened a task to investigate: https://phabricator.wikimedia.org/T211000 This means that related datasets will be missing until we have a final fix/answer, apologies

[Analytics] dbstore1002 currently under unexpected maintenance

2019-01-14 Thread Luca Toscano
Hi everybody, analytics-store/dbstore1002 is currently experiencing issues, more info in https://phabricator.wikimedia.org/T213670. The mysql daemon on the host will likely experience downtime while we attempt to fix the issue, apologies in advance for the trouble. For any question feel free to r

[Analytics] Eventlogging master database down for maintenance

2019-01-17 Thread Luca Toscano
Hi everybody, as an FYI, the Eventlogging master database (on db1107) is currently down to ease rack maintenance. More info in https://phabricator.wikimedia.org/T213748 Recent data on the db1108/analytics-slave's log database will be delayed. Let me know if this is an issue for you on IRC or via emai

Re: [Analytics] dbstore1002 currently under unexpected maintenance

2019-01-17 Thread Luca Toscano
Hi everybody, we are once again in a maintenance window due to an unexpected mysql crash. More info in https://phabricator.wikimedia.org/T213670. Luca (on behalf of the Analytics team and the Data persistence team) On Mon, Jan 14, 2019 at 09:47 Luca Toscano < ltosc...@wikimedia.

[Analytics] Rename of sX-analytics-slave DNS CNAMEs to sX-analytics-replica

2019-02-05 Thread Luca Toscano
Hi everybody, as you already know, dbstore1002 is going to be migrated to dbstore100[3-5], moving from a multi-source (single host) scheme to a multi-instance (multi-host) one. As part of the migration we have to update the following DNS CNAMEs (they all currently point to dbstore1002): s1-analytics-slav

[Analytics] Maintenance proposal for dbstore1002 - staging database migration to dbstore1005 on Monday 18th (EU Morning)

2019-02-12 Thread Luca Toscano
Hi everybody, as described in here (https://phabricator.wikimedia.org/T215589#4946535) I am proposing a maintenance window to allow the Data Persistence and Analytics teams to move the staging database from dbstore1002 to dbstore1005 (its new home) on Monday 18th during the EU morning. This will m

Re: [Analytics] Maintenance proposal for dbstore1002 - staging database migration to dbstore1005 on Monday 18th (EU Morning)

2019-02-19 Thread Luca Toscano
M Manuel Arostegui > wrote: > >> Hello, >> >> I am setting dbstore1002 on read-only now, to start the migration. >> >> Thanks >> Manuel. >> >> On Tue, Feb 12, 2019 at 10:11 AM Luca Toscano >> wrote: >> >>> Hi everybody, >

[Analytics] DEPRECATION WARNING: dbstore1002 is going to be decommissioned on March 4th

2019-02-22 Thread Luca Toscano
Hi everybody, the Analytics team has been working with the SRE Data Persistence team during the last months to replace dbstore1002 with three brand new nodes, dbstore100[3-5]. We are moving from a single mysql instance (multi-source) to a multi-instance environment. For more info please check: *

[Analytics] Hadoop Yarn running application data moved from zookeeper to HDFS

2019-02-28 Thread Luca Toscano
Hi everybody, today we configured Hadoop Yarn to move the storage of its application/job data (called rmstore) from Zookeeper to HDFS. We are going to remove a lot of data from our Zookeeper cluster in eqiad (several thousand znodes), hopefully increasing its reliability (it is shared with all the Kafka cl

Re: [Analytics] Trouble getting yesterday's pageviews data

2019-04-02 Thread Luca Toscano
Hi Collin, you beat me to my email :) We are tracking the issue in https://phabricator.wikimedia.org/T219842. We had a Kafka outage yesterday and we are still fixing jobs that didn't run. Luca On Tue, Apr 2, 2019 at 14:47 Collin Stedman wrote: > Hello, > > I'm having trou

[Analytics] Analytics Hadoop cluster offline for maintenance on Wed Apr 17th at 15:00 CET

2019-04-15 Thread Luca Toscano
Hi everybody, the Analytics team is planning to upgrade the Hadoop cluster to CDH 5.16.1 (changelog in https://phabricator.wikimedia.org/T218343) on Wed Apr 17th at 15:00 CET. All services (HDFS, Hive, Oozie, Notebooks, etc.) will be unavailable for one hour if everything goes according to plan,

[Analytics] Reboot of the Analytics dbstore database hosts and upgrade of Mariadb

2019-04-19 Thread Luca Toscano
Hi everybody, on Monday 22nd the SRE Data Persistence team will reboot the Analytics dbstore database hosts for Linux kernel + Mariadb upgrade during early EU morning. It shouldn't affect anybody but please let me know if you have any issue with it. Thanks! Luca (on behalf of Analytics and Data

[Analytics] Superset 0.32 upgrade coming tomorrow (May 15th, early EU morning)

2019-05-14 Thread Luca Toscano
Hi everybody, as an FYI, I am going to upgrade Superset to 0.32 tomorrow (May 15th). This involves moving to a new host based on Debian Buster and Python 3.7, so the move will take some time; hopefully it will be fully done early in the EU morning. Tracking task: https://phabricator.wik

Re: [Analytics] Superset 0.32 upgrade coming tomorrow (May 15th, early EU morning)

2019-05-16 Thread Luca Toscano
Bua > > Edits per platform in Indonesia and Arabic Wikipedia last month: > https://bit.ly/2JlKNIG > > Thanks, > > Nuria > > On Tue, May 14, 2019 at 10:57 AM Luca Toscano > wrote: > >> Hi everybody, >> >> as FYI I am going to upgrade Superset tom

[Analytics] Reboot of stat1004-6-7 and notebook1003-4 happening on May 21st (early EU morning)

2019-05-20 Thread Luca Toscano
Hi everybody, the stat1004-6-7 and notebook1003-4 hosts will be rebooted tomorrow, May 21st, during the EU morning for security upgrades (Linux kernel upgrades). Please let me or anybody in the Analytics team know if this is problematic for your work so we can schedule a better maintenance

[Analytics] Maintenance to the Eventlogging databases

2019-06-24 Thread Luca Toscano
Hi everybody, at 10 AM CEST the SRE/Analytics team will take down db1107 and db1108 (where the log database is stored/accessed) for maintenance. It should last half an hour. If you have any questions please reach out to me (elukey) on IRC or to the a-team alias at #wikimedia-analytics on Freenode

[Analytics] Hive and Oozie unavailable for maintenance on Wed 26th 9 AM CEST

2019-06-24 Thread Luca Toscano
Hi everybody, as part of https://phabricator.wikimedia.org/T225306 I need to reboot the an-coord1001 host, which runs the Hive server/metastore and Oozie. Tomorrow June 26th I'll reboot the host at around 9 AM CEST; the maintenance window should last roughly 10-15 minutes. This means that hive

[Analytics] Analytics clients (stat/notebook hosts) and backups of home directories

2019-07-04 Thread Luca Toscano
Hi everybody, as part of https://phabricator.wikimedia.org/T201165 the Analytics team wanted to reach out to everybody to make it clear that the home directories on the stat/notebook nodes are not backed up periodically. They run on a software RAID configuration spanning multiple disks of cou

[Analytics] Firewall on stat100x and notebook100x hosts

2019-07-05 Thread Luca Toscano
TL;DR: In https://phabricator.wikimedia.org/T170826 the Analytics team wants to add base firewall rules to the stat100x and notebook100x hosts, which will cause any traffic that is neither local (localhost) nor explicitly allowed to be blocked by default. Please let us know in the task if this is a problem for you. Hi everybody, the A

Re: [Analytics] [Wiki-research-l] Firewall on stat100x and notebook100x hosts

2019-07-10 Thread Luca Toscano
Hi Isaac, On Wed, Jul 10, 2019 at 16:14 Isaac Johnson wrote: > Hey Luca, > We discussed this in Research and it all sounds good to us with one > question below. If something else arises, we'll ping you. Thanks for the > heads up! > > > We assumed that instructing Spark to use a p

Re: [Analytics] [Wiki-research-l] Analytics clients (stat/notebook hosts) and backups of home directories

2019-07-11 Thread Luca Toscano
side. >>> >>> I have one question for you: As you allow/encourage for more copies of >>> the files to exist, what is the mechanism you'd like to put in place >>> for reducing the chances of PII to be copied in new folders that then >>> will be e

[Analytics] Urgent maintenance to an-coord1001 requires a brief stop of Oozie/Hive/Spark/etc..

2019-07-15 Thread Luca Toscano
Hi everybody, due to https://phabricator.wikimedia.org/T227941 we need to take down Oozie/Hive/etc. on an-coord1001. The maintenance should not last long, but if you have any issues please reach out to us on IRC (#wikimedia-analytics on Freenode). Thanks! Luca (on behalf of the Analytics team)
