Re: [Analytics] [Wiki-research-l] Analytics clients (stat/notebook hosts) and backups of home directories

2019-07-10 Thread Nuria Ruiz
> I have one question for you: As you allow/encourage for more copies of the files to exist
To be extra clear, we do not encourage keeping data on the notebook hosts at all; they have no capacity to either process or host large amounts of data. Data that you are working with is best

Re: [Analytics] [Wiki-research-l] Analytics clients (stat/notebook hosts) and backups of home directories

2019-07-10 Thread Kate Zimmerman
I second Leila's question. The issue of how we flag PII data and ensure it's appropriately scrubbed came up in our team meeting yesterday. We're discussing team practices for data/project backups tomorrow and plan to come out with some proposals, at least for the short term. Are there any

Re: [Analytics] [Wiki-research-l] Analytics clients (stat/notebook hosts) and backups of home directories

2019-07-10 Thread Leila Zia
Hi Luca, Thanks for the heads up. Isaac is coordinating a response from the Research side. I have one question for you: As you allow/encourage for more copies of the files to exist, what is the mechanism you'd like to put in place for reducing the chances of PII being copied into new folders that

Re: [Analytics] [Wiki-research-l] Firewall on stat100x and notebook100x hosts

2019-07-10 Thread Isaac Johnson
Sounds perfect, Luca -- thanks for the clarification!
On Wed, Jul 10, 2019 at 9:20 AM Luca Toscano wrote:
> Hi Isaac,
>
> On Wed, Jul 10, 2019 at 16:14, Isaac Johnson <is...@wikimedia.org> wrote:
>
> > Hey Luca,
> > We discussed this in Research and it all sounds good to us

Re: [Analytics] [Wiki-research-l] Firewall on stat100x and notebook100x hosts

2019-07-10 Thread Luca Toscano
Hi Isaac,
On Wed, Jul 10, 2019 at 16:14, Isaac Johnson wrote:
> Hey Luca,
> We discussed this in Research and it all sounds good to us with one question below. If something else arises, we'll ping you. Thanks for the heads up!
>
> > We assumed that instructing Spark to use a

Re: [Analytics] [Wiki-research-l] Firewall on stat100x and notebook100x hosts

2019-07-10 Thread Isaac Johnson
Hey Luca, We discussed this in Research and it all sounds good to us with one question below. If something else arises, we'll ping you. Thanks for the heads up!
> We assumed that instructing Spark to use a predefined range of random ports was not possible, but in