Inside Infra: Greg Stein --Part II

Sally Khudairi Mon, 29 Jun 2020 07:56:42 -0700

[this interview is available online at https://s.apache.org/InsideInfra-Greg2 ]


The "Inside Infra" interview continues with ASF Infrastructure Administrator 
Greg Stein, who shares his experience with Sally Khudairi, ASF VP Marketing & 
Publicity.

- - -
"Who are these crazy guys spread around the world that are keeping 200 machines 
up and running for all these different projects and committers and 
contributors?"
- - -

PART TWO.

- How or what would you describe the Infra "brand" to be?

I don't really know. I've never really thought about branding or marketing 
ourselves, so ...

- Well, you guys have a certain persona, you have those funky t-shirts you wear 
at ApacheCon ...there's definitely some kind of street cred that's different 
from everybody else. I was curious to see if that's part of your natural sense 
of hip, or is that something that you guys deliberately planned for.

The t-shirts and other things go back to the team bonding kind of thing. We'll 
give ourselves an identity, but haven't tried to create or market ourselves. I 
think it is something that we do need to take some control over. We hired a 
part-time writer in December and he's been organizing our content to provide a 
better and more useful front to Infrastructure.

There were a lot of pages on www.apache.org that have now moved over to 
infra.apache.org. That creates a more coherent Web space, if you will. We can 
really talk about those different channels. "How do you reach Infrastructure? 
Do I go to the Slack channel or do I file a JIRA ticket: how do I decide?" So 
he's helping to, while I wouldn't say "market a new face", he's certainly 
helping people figure out who we are, what we do, what we can help with and 
getting that information organized.

- Which is good. That's new. Even to have you guys featured in a project like 
this, it's unusual and it's refreshing. I'm personally curious, and I'm sure 
other people are also curious about what's behind Infra.

Right, right. Who are these crazy guys spread around the world that are keeping 
200 machines up and running for all these different projects and committers and 
contributors?

So Andrew (technical writer Andrew Wetmore) is primarily going to work on the 
infrastructure docs until those are whipped into shape because a lot of the 
material that we have, a lot of the Webpages, is really infrastructure related. 
He has been working with the team on those pages. What's going to be harder 
though is when he's kind of at a stopping point for that, what to turn his 
focus to, and that would be www.apache. But then it gets a lot more difficult 
because when he wants to update the How It Works page, who does he talk to? 
Who's authoritative? He can do some edits for flow and word consistency, 
punctuation, clarity, right, but he can't really update the process.

- Right. Right. That's the Foundation thing.

Yeah. But the problem is we don't really even have a concept of who's in charge 
of that How It Works page, who is, you know, it's just there's nobody that the 
foundation is willing to say, "That person controls that process." You know 
what I mean?

- I totally do --I come across the same pages and people go, "Are they yours?" 
It's hard to determine not only evolving processes, but who signs off on this 
or who gets it. I hear you.

I've recommended for the past year, or three, that Marketing is the owner of 
DubDubDub (www.), but you know, that's the "face" of Apache. You know? But the 
raw content, as you point out, who approves the raw content?

- One thing that I asked Drew and Chris, and I'm always curious with people who 
are super busy and juggling 50 things, is to describe a typical workday for you.

I wake up, I look for email first, generally, sometimes I'll hop onto Slack 
because sometimes people ask me directly for something. Then I go look at email 
and sort through a number of different categories between direct team stuff, 
operations, the Apache Board, and then Apache in general. And then of course, 
if there's any vendor email to deal with. So there's a bunch of different 
categories in priority order. After I get through that initial work, then it's 
go and read all the back scroll in the team channel, which is anywhere from 200 
to 400 lines of back scroll ...

- Can you get any work done? Beyond just catching up on the communications?

Yes. But it does take like 30 minutes to read that back scroll. For me there's 
a lot in there about what the guys are doing and what they're working on, how 
to solve a particular problem when they're asking somebody else, "Hey, can you 
look at this? Can you help me with this?" But I don't, for the most part, 
"serve", you know ...they are the technical staff... I can do it: I have 
technical chops, but I let them do their jobs as they know best. I do like 
reading the back scroll because I'm also looking at it from the angle of "how 
is the team working together? Is that going well? Is there something that I 
need to poke and prod to improve how they're working? Are they getting jammed 
up on something that I can unblock for them so that they can get their work 
done?"

Stuff like that. That's what I look for when I go through that back scrolling, 
so it's important to me to read that back scroll. Most of the guys do tend to, 
when they first sign in in the morning, go back and scan for stuff where they 
might be needed. I've never really asked them how detailed they get, but I 
think pretty much everybody reads all of it to catch up, but they're going to 
be looking at it with a different lens than how I look at it. Mostly I'm 
looking at unblocking --are they running into problems that I can ease for them?

- How do you keep your workload organized?

I don't.

- Fair enough. Again, there's a lot, so it's curious to me, like everything at 
Apache, with the exception of a handful of things, everything could be a 
priority, if you're always on fire and always running around, putting out 
fires, you know? It's funny when I've talked to the Infra guys and you also, 
you all have the same reaction to that question, which is the laugh. I think 
that's the nature of the beast with the ASF.

Yes. That really is the nature of system administration work. My career has 
been product development, and you can reasonably plot that out. You can say, 
"We're going to develop these five new features, which is going to take us 
between two and four months." We'll see...we might cut a feature to try and 
limit our development time. The features are going to change, so we'll plan 
in time for change. But system administration is very reactive, so it's a very 
different beast. This is where, like I said, we were kind of treading water 
with four people, but we could see as Apache was growing we were not going to 
be able to keep up. And we certainly weren't going to be able to move ahead of 
the curve and do things like selfserve.apache.org where, you know, before we 
would get a dozen tickets to create repositories and that took time. Now we 
don't have to do anything.

It's all selfserve.apache.org, but we had to write the tool first and have 
enough air time to get that tool written. So I think we're ahead of the curve. 
We're getting some of our longer-term initiatives done, but it is still a very 
reactive thing. For myself, my back office work is pretty straightforward and 
it's a lot of email and Website work, you know, going in, paying an invoice, 
putting in the infrastructure credit card, sending out a purchase order, stuff 
like verifying and approving payroll, that doesn't require me sitting down and 
writing Python scripts.

The other half of my job is being present on that channel because I also help 
to set priorities. When something comes up, I ask, "Is this a thing that we 
want to do? Do we want to take on this new task? Do we want to provide this new 
tool to the projects?" You know, like a project is going to say, "Well, we want 
to integrate this thing into our GitHub repository," and we go and review it. 
It may require permissions that we simply don't want to allow. So there's some 
of those kinds of policy kind of things that I also help with. And there's 
always being present to help set policies and priorities.

- OK... so how do you work with (VP Infrastructure) David Nalley? Are you 
making the decisions? Infra is an unusual type of group as opposed to other 
areas of activity operationally at the ASF. How do you work together?

Correct: I'm the day-to-day, so I look at it like he's the brains and I'm the 
hands. That said, he's like the strategic brain and I do all the tactical 
decisions.

I make all the tactical decisions. I am an officer of the corporation. I can 
make any decision that I need to, related to Infrastructure. If I feel it's a 
little bit weird, then I'll bounce that off David, but for most of the stuff, 
he doesn't feel a need to inject himself in. He feels comfortable letting me go 
ahead and run with the things, and rely upon me asking when it seems a little 
sketchy.

- That's good: that process suits both of your personalities, both your 
sensibilities. It sounds like a good fit.

I report to the VP of Infrastructure, and that is still David, even though he 
became Executive VP and is now (ASF) President. He still holds that title. He's 
asked me, "Well, Greg, maybe you should just be VP Infra," and I said, "No 
way." Because we're paid people, but the Foundation is all volunteers. I told 
him I do not want to be a VP, because I want to report to a volunteer. I think 
that I (and the team) should report to a volunteer that always has a volunteer 
eye on the Foundation's long-term goals.

Because I manage all the day-to-day, it's a very lightweight hat for him. That 
VP hat is a tiny aspect compared to his President hat. One day, he'll find 
somebody to take over that VP Infra hat, but I've essentially mandated to him 
that it has to be a volunteer position.

It's not that I see we're going to go all out of control and we need a check 
from a volunteer; I just want a volunteer to always be able to say, "Okay, you 
guys are a little bit crazy, let's redirect our long-term thinking more in line 
with what the Foundation wants," and have a volunteer interpret what the 
Foundation wants.

- That perfectly dovetails into what folks referred to in our ("Trillions and 
Trillions Served") documentary, where they were talking about Greg Stein's 
famous "plan for the ASF for 50 years..." This super long-term vision, which 
again, everyone goes back and says, "Greg Stein said..." What does that mean 
exactly, and how does that translate to Infra, considering that you can't 
really plan that far out? How does that work?

Well, actually we can plan that far out. I wrote that "50 years" in one of my 
Director's statements, I think it was 2014 or 2012 ...maybe earlier. Where I 
was going in that Director statement was the Board doesn't deal with the 
communities. The Board is there to support the communities. So we want the 
Foundation to exist for 50 years so that these communities can continue to run 
and see through evolution.

Some communities are going to move to the Attic, new ones are going to come 
along, but we want the Foundation to be viable. To say "forever" is okay. 
Nobody can really put that in their brain. So I just said, "OK, we can think 
what 50 years means." That is long enough out, but still within people's brain 
capacity to think, "Okay, what _does_ 50 years mean?"

And so that's where I came up with that. What does the Board need to think 
about to ensure that we are here 50 years from now and our projects are 
successful and can run through their lifetime, lifecycles. Apache HTTP and 
Tomcat, I don't think they are ever going to go away, but you could see maybe 
in 30 years they might. There might be some other mechanism in computing that 
would obsolete them, but the model of Apache does need to exist for at least 
that long.

Now, within Infra, I think we actually can plan that far out because we have 
growth curves. We see what kinds of computing resources people need. So we can 
plan for project growth, for machine growth. We can do long-term planning on 
how we allocate machines among our various cloud resources that we have, and 
start to schedule those further out. None of that really affects our day to 
day, but it is something that we can project out a ways and think about what 
kinds of resources we are going to need two, three, five years from now.

There isn't anything really that we can do for 50 years, but we can keep it in 
mind. Okay, that is going to be a larger team. That is going to need a larger 
staff, a full time manager, a full time HR person, a full time... There's 
different things that will change over that time, but we can actually do some 
of that projection, although we haven't bothered.

I do the five year plans for the Board, but mostly that is a simple cost growth 
as opposed to actually changing the structure of the team or the role 
assignments, because like I said, I think probably within 10 years, we'll 
probably need to add one or two more staff on top of the head count of six that 
we have right now. And I think supporting that would still be fine for a 
part-time person like myself. But once it grows to 10 or 12, then I think it's 
going to need a real change, where we need to have a full-time person managing, 
and so we'll need to adjust the budget considerably to make that happen.

But if we ever get there, the Foundation is going to be likely in a very 
different position. We're talking 10 years from now. And so, who knows.

- So with more than 350 projects and initiatives as we've discussed before, how 
do you guys stay ahead of the demand? And again, if you're trying to plan for 
five, ten years out, you mentioned earlier cloud computing. Not so long ago, 
cloud computing was a novelty. How do you plan for this?

And that is where we try and move more things to selfserve.apache.org, where we 
look at the kinds of requests that we're getting and the kinds of tasks that 
we're performing, and find a way to automate that workflow and create more 
self-serve options for the kinds of tasks that we regularly get tickets on.

Where we used to get tickets on creating Git repositories, we get zero now. 
And if we can see that over the past six months we've had 20 tickets to do X, 
is there a way that we can automate that, so we don't have to get our hands on 
that ourselves and can save our hands for doing things like machine upgrades, 
or rebalancing some of our computer resources, where things are running on an 
old operating system and we need to get that onto a newer version? Right now, all 
of our machines are managed by a system called Puppet, which does the basic 
configuration work for us. But today, we're on two different versions of 
Puppet, a really old one and a reasonably new one.

And we're trying to get everything migrated off the old stuff onto the new, but 
once we finish that migration, we're going to have to start all over again, 
or maybe switch to a different tool. We're looking at a tool called Ansible to 
use instead of Puppet.

And so there's this never-ending ongoing set of tasks, but each time we do it, 
it reduces our workload by that much more. So when we upgrade from Puppet 3 to 
Puppet 6, we get an improvement in the maintainability of that server. And that 
means that we spend less time with that server going forward and have more time 
to do other things or to deal with project growth.

- Regarding a scale of efficiency, how do you close your skills gaps? When I 
spoke to Chris and Drew, they both said, "We do everything." How do you do 
that? How do you know all of this? Do you look at this big picture and say, 
"Okay, we need a person to specialize in X and Y and Z," and then you send them 
out to learn about it? How do you cope with that?

The team definitely specializes. And the guys have specializations around 
different areas. We do a little bit of cross training, but not a lot, 
because as I mentioned, we've got like 200 machines, each individually doing 
their own thing. If we cross trained everybody in everything, we'd get nothing 
done. So, there's a little bit of cross training, but mostly some specialties. 
It does create a little bit of bus factor...

- Which is very scary. I was just going to say, your bus factor is very scary. 
Talk about that.

The thing is that Puppet allows us to create configurations and that's in 
version control. If all of a sudden somebody leaves, another person can 
backfill them because if somebody leaves, it's not like they take their work 
with them: all the work is in version control. And so that work doesn't go with 
them, but we may need to backfill some education on that particular specialized 
area. For example, Chris (ASF Infra team member Chris Thistlethwaite) does a 
lot of our monitoring work. If he left, now we need somebody to get a little 
more familiar with NodePing and a little more familiar with Datadog, but 
that'll be like a week for somebody to pick that up.

It wouldn't be, "Oh my God, this is three years of expertise that we need to go 
backfill" ...we don't have anything that is that highly specialized.

- Is that because the team is more well rounded or because you guys are more 
efficient or what about it? Because of technology evolution, or...

We don't deal with systems of that level of complexity. We've got 200 machines, 
like I said, each doing their thing, but it's not like we've got a cluster of 
200 machines all trying to coordinate to create one particular outcome. It's, 
here's a MySQL server, here's a JIRA server, here's a Puppet server. Things like 
that, where the amount of technology is pretty small in each little pocket ... 
but we just have a hundred pockets on our pants.

[END OF PART II]

= = =

NOTE: you are receiving this message because you are subscribed to the 
announce@apache.org distribution list. To unsubscribe, send email from the 
recipient account to announce-unsubscr...@apache.org with the word 
"Unsubscribe" in the subject line.
