Re: git-scm.com status report

2017-05-18 Thread Jeff King
On Thu, May 18, 2017 at 02:06:16PM +0200, Lars Schneider wrote:

> > I haven't ever tried to do this in the local development environment.
> > The production site currently just uses a cloud-hosted ES (Bonsai). They
> > have free "Sandbox" plans for testing, so you could probably use that as
> > a test resource after setting up the relevant environment variables. Or
> > alternatively, I think ElasticSearch folks produce binary builds you can
> > try, and you could host locally.
> > 
> > There's a rake job that inserts documents into the search index (see
> > lib/tasks/search.rake).
> 
> Disclaimer: I am jumping in here without much knowledge. Feel free
> to ignore me :-)
> 
> In our TravisCI builds we create the AsciiDoc/Doctor documentation
> already. Couldn't we push that result to some static hosting service?
> Would that help in any way with git-scm.com?

Not really. The site builds the asciidoctor documentation already via an
automated job. This question was just about putting it into the search
index (which also happens in production with an automated job; this is
just about setting up the search database).  So I don't think there's
any real problem to be solved with respect to generating pages.

-Peff


Re: git-scm.com status report

2017-05-18 Thread Lars Schneider

> On 17 May 2017, at 04:03, Jeff King wrote:
> 
> On Tue, May 16, 2017 at 09:56:37PM -0400, Samuel Lijin wrote:
> 
>> So I've finally found the time to get everything set up (in the
>> process discovering that remote_genbook2 consistently induces a
>> segfault in VirtualBox's networking driver, impressively enough) and
>> am taking a look at how much work it would take to get the site
>> running on AWS EC2/DO or some other hosting provider.
>> 
>> Some things I'm wondering about:
>> 
>> - You mentioned a lot of people reaching out off-list about hosting
>> options. Do any of them look particularly appealing at the moment?
> 
> Yes. I actually have stuff to announce there soon, but was holding off
> until the final pieces are in place. But basically, the architecture
> would remain largely the same, but hosted on community-owned accounts
> (that I can share access to), with sponsorship from various hosting
> services.
> 
>> - How do I set up the ES service?
> 
> I haven't ever tried to do this in the local development environment.
> The production site currently just uses a cloud-hosted ES (Bonsai). They
> have free "Sandbox" plans for testing, so you could probably use that as
> a test resource after setting up the relevant environment variables. Or
> alternatively, I think ElasticSearch folks produce binary builds you can
> try, and you could host locally.
> 
> There's a rake job that inserts documents into the search index (see
> lib/tasks/search.rake).

Disclaimer: I am jumping in here without much knowledge. Feel free
to ignore me :-)

In our TravisCI builds we create the AsciiDoc/Doctor documentation
already. Couldn't we push that result to some static hosting service?
Would that help in any way with git-scm.com?

- Lars


Re: git-scm.com status report

2017-05-16 Thread Jeff King
On Tue, May 16, 2017 at 09:56:37PM -0400, Samuel Lijin wrote:

> So I've finally found the time to get everything set up (in the
> process discovering that remote_genbook2 consistently induces a
> segfault in VirtualBox's networking driver, impressively enough) and
> am taking a look at how much work it would take to get the site
> running on AWS EC2/DO or some other hosting provider.
> 
> Some things I'm wondering about:
> 
> - You mentioned a lot of people reaching out off-list about hosting
> options. Do any of them look particularly appealing at the moment?

Yes. I actually have stuff to announce there soon, but was holding off
until the final pieces are in place. But basically, the architecture
would remain largely the same, but hosted on community-owned accounts
(that I can share access to), with sponsorship from various hosting
services.

> - How do I set up the ES service?

I haven't ever tried to do this in the local development environment.
The production site currently just uses a cloud-hosted ES (Bonsai). They
have free "Sandbox" plans for testing, so you could probably use that as
a test resource after setting up the relevant environment variables. Or
alternatively, I think ElasticSearch folks produce binary builds you can
try, and you could host locally.

There's a rake job that inserts documents into the search index (see
lib/tasks/search.rake).

-Peff
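
As a rough illustration of the indexing step mentioned above, here is a minimal Python sketch (not the site's actual rake task) that builds a request body in the newline-delimited JSON format accepted by Elasticsearch's `_bulk` endpoint. The index name, the document fields, and the `ELASTICSEARCH_URL` variable are all assumptions made for the example; the thread only says "relevant environment variables". Older Elasticsearch releases may also expect a type in the action line.

```python
import json
import os


def build_bulk_payload(index_name, docs):
    """Serialize docs into the newline-delimited JSON body used by
    Elasticsearch's _bulk endpoint: one action line, then the document."""
    lines = []
    for doc in docs:
        action = {"index": {"_index": index_name, "_id": doc["id"]}}
        lines.append(json.dumps(action, sort_keys=True))
        lines.append(json.dumps(doc, sort_keys=True))
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


# Hypothetical variable name; falls back to a locally hosted instance.
es_url = os.environ.get("ELASTICSEARCH_URL", "http://localhost:9200")

# Hypothetical documents standing in for rendered manpages.
docs = [
    {"id": "git-commit", "section": "manpage", "title": "git-commit"},
    {"id": "git-push", "section": "manpage", "title": "git-push"},
]
payload = build_bulk_payload("gitscm-test", docs)
print(f"would POST {len(docs)} documents to {es_url}/_bulk")
```

The sketch only prints what it would send, so it can be tried without a running search service; the payload would be sent with a POST request and a `Content-Type: application/x-ndjson` header.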


Re: git-scm.com status report

2017-05-16 Thread Samuel Lijin
On Mon, Feb 6, 2017 at 1:27 PM, Jeff King wrote:
> On Thu, Feb 02, 2017 at 03:33:50AM +0100, Jeff King wrote:
>
>> We (the Git project) got control of the git-scm.com domain this year. We
>> have never really had an "official" website, but I think a lot of people
>> consider this to be one.
>>
>> This is an overview of the current state, as well as some possible
>> issues and future work.
>
> Thanks everybody, for your responses here and off-list. After my mail
> got posted to HN, I got quite a lot of private responses, including
> offers to sponsor hosting, work on the site, etc. I'm still working my
> way through them, but I wanted to try to respond in aggregate here.
>
> First, a few clarifications:
>
>   - The money for the site wasn't mentioned to me by GitHub at all.  I'm
> quite sure they would continue to sponsor the site financially if
> need be. The only reason I didn't promise that is because I hadn't
> arranged it specifically, and "step 0" seemed like first making sure
> our costs were reasonable.
>
>   - Spinning the site out of GitHub's Heroku account isn't an urgent or
> impending change. It came out of a conversation I had with people
> auditing the GitHub account, where it is clearly a funny historical
> anomaly. So I suspect we could just stay there indefinitely if need
> be. But it seems to me like the right thing is to move it out for
> two reasons:
>
>   1. The site was always intended to serve the Git community, not
>  GitHub, and it has increasingly become a community asset (e.g.,
>  with the transfer of the domain name). The hosting assets
>  should be held by the community, too, to help with things like
>  continuity. If I get hit by a bus, the rest of the Git PLC
>  should have access to the site without having to figure out who
>  owns what.
>
>   2. Right now I can't add any other co-admins to handle operational
>  issues. So the bus factor and load of that part of operating
>  the site can't be spread.
>
> The responses I've gotten fall into a few buckets, I think:
>
>   - Yes, the current hosting cost really is unnecessarily high. Most of
> this is due to scaling wrong. The main costs are:
>
>   1. Using 2x dynos; these have 1GB of RAM versus 512MB. The site
>  does seem to use about 750MB. I have no idea why that is the
>  case. There's probably some low-hanging fruit in reducing the
>  memory use to keep it below 512MB, but I don't think anybody
>  has dug in there.
>
>   2. The site is scaled by using 3 dynos. It would be simpler and
>  cheaper to stick a CDN in front of it, since the pages change
>  very rarely. That's something I haven't looked into setting up
>  yet.
>
>  The prerequisite to using a CDN is actually making sure the
>  content is deterministic and cacheable. There was a nice PR
>  opened at https://github.com/git/git-scm.com/pull/941 towards
>  that end.
>
>   - It's mostly silly for this to be a Rails app at all. It's a static
> site which occasionally sucks in and formats new content (like the
> latest git version, new manpages, etc). The intent here was to make
> something that would "just run" forever and pick up new versions
> without human intervention. And that _does_ work, but it also makes
> things more expensive and complicated than they need to be.
>
> So a viable alternative is to use some kind of static site
> generator and have someone (or something) responsible for pulling in
> the new git versions occasionally.
>
> A few people have expressed interest in this. There's some
> preliminary work here:
>
>   https://github.com/git/git-scm.com/pull/941
>
> and at least GitLab has expressed some interest. So I'll let people
> coordinate in that PR or a new one what the result should look like.
> Working patches trump discussion. :)
>
> I have also talked with the GitHub Pages people, and they think
> hosting it as a Jekyll page wouldn't be a big deal performance-wise
> (with the caveat that we'd need to pre-render the asciidoctor bits
> ourselves, as Jekyll doesn't do asciidoc). So that's a viable option
> for hosting it for effectively free (though I think we _would_ still
> want to put a CDN in front of it). But if somebody has an
> alternative option, that's fine, too.
>
>   - Some people offered to help with running the site, or making major
> transitions (like converting to a static site). The most important
> thing to me there is that we have a solid maintenance plan. So I
> would want some evidence that anybody doing major work would stick
> around in the community afterwards, or that it be done in a way that
> the handoff back to community members is easy. So I'd probably look
> for somebody already involved 

Re: git-scm.com status report

2017-02-19 Thread Jeff King
On Sat, Feb 18, 2017 at 10:27:51PM +0000, pedro rijo wrote:

> I would say everyone did an amazing job, closing more than 150 old issues
> in a single week! I think the amount of issues is finally manageable (40
> issues currently).

Yes, thank you to all who have been helping. But especially you and
Samuel, who obviously spent a lot of time sifting through old issues.

> And if you agree, I would like to start looking at old PRs (some
> probably won't make sense anymore), and will start reviewing them as soon
> as I have the time to set up the RoR app on my machine so that I can
> understand and see the changes introduced by the PRs.

Sounds good.

>  Many PRs seem to introduce small and innocent changes, but I always like
> to run the code to see :)

Yeah, many of the display-oriented changes are pretty obvious from
reading the code, but I have caught a couple of regressions just by
running the PRs and making sure the rendered result is sane.

-Peff


Re: git-scm.com status report

2017-02-08 Thread Eric Wong
Jeff King wrote:
> I agree we should continue to serve HTTPS. The usual solution for our
> use case is to stick a CDN like Cloudflare in front of GitHub Pages (and
> I think we'd want to do that anyway for performance).
> 
> I haven't done it, but there are various guides. Here's the one from
> Cloudflare:
> 
>   https://blog.cloudflare.com/secure-and-fast-github-pages-with-cloudflare/

AFAIK, there's a way to keep CloudFlare stuff accessible to Tor
users.  If there is, please do so.  As a Tor user, it's been
disappointing to see so much of the web walled off by CAPTCHAs.

Thank you.

Heck, maybe a .onion mirror would be nice :)
I wouldn't mind hosting one myself if it's static.


Re: git-scm.com status report

2017-02-08 Thread Jeff King
On Thu, Feb 09, 2017 at 02:12:09AM +0000, brian m. carlson wrote:

> My only concern with using GitHub Pages is that I don't believe it
> currently supports TLS on custom domains.  Since we currently have TLS
> enabled, along with HTTP Strict Transport Security (as we should), such
> a configuration literally wouldn't work[0].  I think it's important that
> we continue to serve HTTPS only, anyway.

I agree we should continue to serve HTTPS. The usual solution for our
use case is to stick a CDN like Cloudflare in front of GitHub Pages (and
I think we'd want to do that anyway for performance).

I haven't done it, but there are various guides. Here's the one from
Cloudflare:

  https://blog.cloudflare.com/secure-and-fast-github-pages-with-cloudflare/

> I agree that a static site is the way to go from a maintenance
> perspective, though.  Jekyll does support Asciidoctor with a plugin,
> just not on GitHub Pages, so it would theoretically be possible to build
> the site as one big unit if we did it that way.  I've played around with
> that plugin, so I'm happy to provide guidance if we want to do that.

We already massage the data coming from Git (and from the Pro Git books)
a bit before and after feeding it to asciidoctor. So I always assumed
that any static site would involve some import steps for those things,
and we'd commit the intermediate product into the repository.

-Peff
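
Putting a CDN in front of the site works best when responses are deterministic and cacheable, as noted elsewhere in this thread. A minimal Python sketch of that idea follows: the validator is derived from the page bytes, so identical content always produces the same `ETag`, and a conditional request can be answered with `304 Not Modified`. The function names and the `max_age` value are illustrative assumptions, not anything taken from the site's code.

```python
import hashlib


def etag_for(body: bytes) -> str:
    """Derive a strong ETag from the response body, so identical content
    always yields the same validator."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match=None, max_age=3600):
    """Return (status, headers, body) for a cacheable page.

    max_age is an arbitrary illustrative value, not a recommendation."""
    etag = etag_for(body)
    headers = {"ETag": etag, "Cache-Control": f"public, max-age={max_age}"}
    if if_none_match == etag:
        # The cache already holds this exact content; send no body.
        return 304, headers, b""
    return 200, headers, body


page = b"<html>git-commit manpage</html>"
status, headers, _ = respond(page)
revalidated, _, _ = respond(page, if_none_match=headers["ETag"])
print(status, revalidated)
```

If a page embedded anything that changed per request (a timestamp, a random token), the hash would differ every time and the CDN could never reuse its copy, which is why determinism is the prerequisite.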


Re: git-scm.com status report

2017-02-08 Thread brian m. carlson
On Mon, Feb 06, 2017 at 07:27:54PM +0100, Jeff King wrote:
>   - It's mostly silly for this to be a Rails app at all. It's a static
> site which occasionally sucks in and formats new content (like the
> latest git version, new manpages, etc). The intent here was to make
> something that would "just run" forever and pick up new versions
> without human intervention. And that _does_ work, but it also makes
> things more expensive and complicated than they need to be.
> 
> So a viable alternative is to use some kind of static site
> generator and have someone (or something) responsible for pulling in
> the new git versions occasionally.
> 
> A few people have expressed interest in this. There's some
> preliminary work here:
> 
>   https://github.com/git/git-scm.com/pull/941
> 
> and at least GitLab has expressed some interest. So I'll let people
> coordinate in that PR or a new one what the result should look like.
> Working patches trump discussion. :)
> 
> I have also talked with the GitHub Pages people, and they think
> hosting it as a Jekyll page wouldn't be a big deal performance-wise
> (with the caveat that we'd need to pre-render the asciidoctor bits
> ourselves, as Jekyll doesn't do asciidoc). So that's a viable option
> for hosting it for effectively free (though I think we _would_ still
> want to put a CDN in front of it). But if somebody has an
> alternative option, that's fine, too.

My only concern with using GitHub Pages is that I don't believe it
currently supports TLS on custom domains.  Since we currently have TLS
enabled, along with HTTP Strict Transport Security (as we should), such
a configuration literally wouldn't work[0].  I think it's important that
we continue to serve HTTPS only, anyway.

I agree that a static site is the way to go from a maintenance
perspective, though.  Jekyll does support Asciidoctor with a plugin,
just not on GitHub Pages, so it would theoretically be possible to build
the site as one big unit if we did it that way.  I've played around with
that plugin, so I'm happy to provide guidance if we want to do that.

[0] HSTS would prevent anyone who had visited the page from downgrading
to an insecure connection for the next year.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204
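
To make footnote [0] concrete, here is a minimal Python sketch of how long a browser that has seen an HSTS header keeps insisting on HTTPS. The header value is an assumption (a `max-age` of 31536000 seconds, i.e. one year, matching the "next year" in the footnote); the site's actual header is not shown in this thread.

```python
from datetime import datetime, timedelta, timezone


def parse_hsts(header_value):
    """Parse a Strict-Transport-Security header into its directives."""
    policy = {"max_age": None, "include_subdomains": False}
    for part in header_value.split(";"):
        directive = part.strip().lower()
        if directive.startswith("max-age="):
            policy["max_age"] = int(directive.split("=", 1)[1].strip('"'))
        elif directive == "includesubdomains":
            policy["include_subdomains"] = True
    return policy


def https_only_until(last_visit, header_value):
    """Return the time until which a browser that saw this header
    refuses plain HTTP for the host."""
    policy = parse_hsts(header_value)
    return last_visit + timedelta(seconds=policy["max_age"])


# Assumed header value: one year, expressed in seconds.
header = "max-age=31536000; includeSubDomains"
visit = datetime(2017, 2, 8, tzinfo=timezone.utc)
print(https_only_until(visit, header).date())
```

Any visitor inside that window would get a hard failure rather than a fallback if the new host could not serve TLS for the custom domain.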



Re: git-scm.com status report

2017-02-06 Thread Jeff King
On Thu, Feb 02, 2017 at 03:33:50AM +0100, Jeff King wrote:

> We (the Git project) got control of the git-scm.com domain this year. We
> have never really had an "official" website, but I think a lot of people
> consider this to be one.
> 
> This is an overview of the current state, as well as some possible
> issues and future work.

Thanks everybody, for your responses here and off-list. After my mail
got posted to HN, I got quite a lot of private responses, including
offers to sponsor hosting, work on the site, etc. I'm still working my
way through them, but I wanted to try to respond in aggregate here.

First, a few clarifications:

  - The money for the site wasn't mentioned to me by GitHub at all.  I'm
quite sure they would continue to sponsor the site financially if
need be. The only reason I didn't promise that is because I hadn't
arranged it specifically, and "step 0" seemed like first making sure
our costs were reasonable.

  - Spinning the site out of GitHub's Heroku account isn't an urgent or
impending change. It came out of a conversation I had with people
auditing the GitHub account, where it is clearly a funny historical
anomaly. So I suspect we could just stay there indefinitely if need
be. But it seems to me like the right thing is to move it out for
two reasons:

  1. The site was always intended to serve the Git community, not
 GitHub, and it has increasingly become a community asset (e.g.,
 with the transfer of the domain name). The hosting assets
 should be held by the community, too, to help with things like
 continuity. If I get hit by a bus, the rest of the Git PLC
 should have access to the site without having to figure out who
 owns what.

  2. Right now I can't add any other co-admins to handle operational
 issues. So the bus factor and load of that part of operating
 the site can't be spread.

The responses I've gotten fall into a few buckets, I think:

  - Yes, the current hosting cost really is unnecessarily high. Most of
this is due to scaling wrong. The main costs are:

  1. Using 2x dynos; these have 1GB of RAM versus 512MB. The site
 does seem to use about 750MB. I have no idea why that is the
 case. There's probably some low-hanging fruit in reducing the
 memory use to keep it below 512MB, but I don't think anybody
 has dug in there.

  2. The site is scaled by using 3 dynos. It would be simpler and
 cheaper to stick a CDN in front of it, since the pages change
 very rarely. That's something I haven't looked into setting up
 yet.

 The prerequisite to using a CDN is actually making sure the
 content is deterministic and cacheable. There was a nice PR
 opened at https://github.com/git/git-scm.com/pull/941 towards
 that end.

  - It's mostly silly for this to be a Rails app at all. It's a static
site which occasionally sucks in and formats new content (like the
latest git version, new manpages, etc). The intent here was to make
something that would "just run" forever and pick up new versions
without human intervention. And that _does_ work, but it also makes
things more expensive and complicated than they need to be.

So a viable alternative is to use some kind of static site
generator and have someone (or something) responsible for pulling in
the new git versions occasionally.

A few people have expressed interest in this. There's some
preliminary work here:

  https://github.com/git/git-scm.com/pull/941

and at least GitLab has expressed some interest. So I'll let people
coordinate in that PR or a new one what the result should look like.
Working patches trump discussion. :)

I have also talked with the GitHub Pages people, and they think
hosting it as a Jekyll page wouldn't be a big deal performance-wise
(with the caveat that we'd need to pre-render the asciidoctor bits
ourselves, as Jekyll doesn't do asciidoc). So that's a viable option
for hosting it for effectively free (though I think we _would_ still
want to put a CDN in front of it). But if somebody has an
alternative option, that's fine, too.

  - Some people offered to help with running the site, or making major
transitions (like converting to a static site). The most important
thing to me there is that we have a solid maintenance plan. So I
would want some evidence that anybody doing major work would stick
around in the community afterwards, or that it be done in a way that
the handoff back to community members is easy. So I'd probably look
for somebody already involved in the community, or somebody who
wants to join it building up that trust by taking on site
responsibilities over time.

  - Lots of people asked about small tasks to do. Mostly reviewing and
responding to issues and PR is 

Re: git-scm.com status report

2017-02-06 Thread Jeff King
On Mon, Feb 06, 2017 at 01:41:04AM +0530, Pranit Bauva wrote:

> On Thu, Feb 2, 2017 at 8:03 AM, Jeff King wrote:
> > ## What's on the site
> >
> > We have the domains git-scm.com and git-scm.org (the latter we've had
> > for a while). They both point to the same website, which has general
> > information about Git, including:
> 
> Since we have "official" control over the website, shouldn't we be
> using the .org domain more because we are more of an organization?
> What I mean is that in many places, we have referred to git-scm.com,
> which was perfectly fine because it was done by GitHub, which is a
> company, but now I think it would be more appropriate to use the
> git-scm.org domain. We can forward all .com requests to .org and try
> to move all references we know about to .org. What do you all think?

I don't have a preference myself. I know a lot of non-commercial groups
(which I think the Git project is) try to prefer ".org" to signal that.

Switching it around would require some DNS changes. I think ".org" goes
to a server the DNS provider (Gandi) runs which issues an HTTP 301 to
".com". So we'd want to reverse that, or possibly just treat them both
as equals. That shouldn't be too hard, and will have to be done via
Conservancy.

I don't know what it would mean in terms of search-engine optimization.
I know Google tries to detect duplicate names for sites and treat one as
canonical. And that's going to be ".com" now, based on the existing
redirect and on the fact that most people will have linked to .com.

I'm not sure what disadvantages there are to switching now, or if there
are things we should be doing to tell search engines (I seem to recall
Google's Webmaster tools have options to say "this is the canonical
name"). This is pretty far outside my area of expertise, so it may not
even be something to care about at all. Just things to consider (and
hopefully more clueful people than I can comment on it).

-Peff
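
For illustration, reversing the redirect described above could look roughly like the following Python sketch, which maps a request URL on the old name to a permanent redirect on the new one. It assumes .org becomes the canonical host and that HTTPS is kept; the `www` alias is hypothetical, and the real change would happen at the DNS provider rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "git-scm.org"  # assumption: .org becomes canonical
ALIAS_HOSTS = {"git-scm.com", "www.git-scm.com"}  # www alias is hypothetical


def redirect_for(url):
    """Return (status, location) for a request URL, or None when the
    URL is already on the canonical host."""
    parts = urlsplit(url)
    if parts.hostname in ALIAS_HOSTS:
        # Keep path and query so existing deep links continue to work.
        target = urlunsplit(
            ("https", CANONICAL_HOST, parts.path or "/", parts.query, "")
        )
        return 301, target
    return None


print(redirect_for("http://git-scm.com/docs/git-commit"))
print(redirect_for("https://git-scm.org/docs/git-commit"))
```

A 301 is the conventional signal to search engines that the target is the permanent, canonical name.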


Re: git-scm.com status report

2017-02-05 Thread Pranit Bauva
Hey Peff,

On Thu, Feb 2, 2017 at 8:03 AM, Jeff King wrote:
> ## What's on the site
>
> We have the domains git-scm.com and git-scm.org (the latter we've had
> for a while). They both point to the same website, which has general
> information about Git, including:

Since we have "official" control over the website, shouldn't we be
using the .org domain more because we are more of an organization?
What I mean is that in many places, we have referred to git-scm.com,
which was perfectly fine because it was done by GitHub, which is a
company, but now I think it would be more appropriate to use the
git-scm.org domain. We can forward all .com requests to .org and try
to move all references we know about to .org. What do you all think?

Regards,
Pranit Bauva


Re: git-scm.com status report

2017-02-03 Thread Jeff King
On Fri, Feb 03, 2017 at 09:23:33PM +0000, pedro rijo wrote:

> Seems a good idea. I will start by going through some old prs/issues to
> look for trash. If I do find some like the one I referred I will let you
> know by mentioning you. After that I will have a look at simpler issues/prs.
> 
> Let me know if you do agree (or you recommend another workflow) so that I
> can start looking at it this weekend :)

That sounds perfect. Thanks!

-Peff


Re: git-scm.com status report

2017-02-03 Thread Samuel Lijin
On Fri, Feb 3, 2017 at 5:58 AM, Jeff King wrote:
> On Thu, Feb 02, 2017 at 12:54:53AM -0600, Samuel Lijin wrote:
>
>> In theory, you could also dump the build artifacts to a GH Pages repo
>> and host it from there, although I don't know if you would run up
>> against any of the usage limits[0]. The immediate problem I see with
>> that approach, though, is that I have no idea how any of the dynamic
>> stuff (e.g. search) would be replaced.
>
> I've talked with Pages people and they say it shouldn't be a big deal to
> host. The main issue is that it's not _just_ a static site. It's a site
> that's static once built, but a lot of the content is auto-generated
> from other sources (git manpages, Pro Git and its translations, etc).
>
> So there's work involved in moving that generation step to whatever the
> new process is (it's fine if it's running "make" locally after a Git
> release and pushing up the result).

Yep, noticed that when I cloned the repo the other day. Still
wrangling with my own setup so that I can build everything locally. I
imagine it would also be possible to set up some sort of CI/CD
pipeline to handle generating build artifacts automatically; so to be
honest, I don't think any of the static assets would pose a
significant problem.

The bigger issue, in my opinion, is that there seems to be a fair
amount of non-trivial back-end stuff
(https://github.com/git/git-scm.com/blob/master/spec/controllers/site_controller_spec.rb,
https://github.com/git/git-scm.com/blob/master/app/controllers/site_controller.rb)
including an Elasticsearch layer. (The redirects would be mildly
inconvenient to handle with Pages, but like the static asset
generation, should be more than doable.)

>> A question: there's a DB schema in there. Does the site still use a DB?
>
> It does use the database to hold all of the bits that aren't checked
> into Git. So renderings of the manpages, the latest release git version,
> etc. AFAIK, it's all things that I would be comfortable committing into
> a git repository.
>
> -Peff

In the meantime, I've also pinged a friend at Digital Ocean about
their hosting options and they've expressed interest. At the very
least, they seem to offer a lot more than Heroku for $230/mo[0], and I
imagine it wouldn't be impossible to reduce the hosting costs by an
order of magnitude. Think it's worth looking into?

[0] https://www.digitalocean.com/pricing/#droplet


Re: git-scm.com status report

2017-02-03 Thread Jeff King
On Thu, Feb 02, 2017 at 10:01:45PM +0000, pedro rijo wrote:

> While I’m not experienced with Rails apps, I would like to give my
> contribution to the Git project. I could help doing some kind of
> triage, removing abusive PRs/issues (like
> https://github.com/git/git-scm.com/pull/557), looking for typos and
> other tasks that wouldn’t require a lot of RoR knowledge to get started.
> Also, completely free and available to start digging into the RoR
> stuff of course!

Thanks! I think a good first step is just to start watching the
repository and jump in on issues where you think you can contribute.

Clicking "close" or "merge" on an issue is something only I can do for
now, but having a group of people reviewing and responding to issues and
PRs is a big help (so I _can_ just click those buttons). And then
over time hopefully we can grow a stable of reviewers, and hand out
repo privileges to the active ones.

-Peff


Re: git-scm.com status report

2017-02-03 Thread Jeff King
On Thu, Feb 02, 2017 at 12:54:53AM -0600, Samuel Lijin wrote:

> In theory, you could also dump the build artifacts to a GH Pages repo
> and host it from there, although I don't know if you would run up
> against any of the usage limits[0]. The immediate problem I see with
> that approach, though, is that I have no idea how any of the dynamic
> stuff (e.g. search) would be replaced.

I've talked with Pages people and they say it shouldn't be a big deal to
host. The main issue is that it's not _just_ a static site. It's a site
that's static once built, but a lot of the content is auto-generated
from other sources (git manpages, Pro Git and its translations, etc).

So there's work involved in moving that generation step to whatever the
new process is (it's fine if it's running "make" locally after a Git
release and pushing up the result).

> A question: there's a DB schema in there. Does the site still use a DB?

It does use the database to hold all of the bits that aren't checked
into Git. So renderings of the manpages, the latest release git version,
etc. AFAIK, it's all things that I would be comfortable committing into
a git repository.

-Peff
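
One small piece of the generation step is deciding which Git version counts as the latest release. A minimal Python sketch, assuming release tags look like `v2.11.1` and that release candidates (`-rc0` and similar) should be ignored; how the tag list is obtained is left out, and none of this is taken from the site's code.

```python
import re

# Final releases only, e.g. v2.11.1 or v1.8.5.6; anything with a suffix
# such as -rc0 is skipped.
TAG_RE = re.compile(r"^v(\d+(?:\.\d+)+)$")


def latest_release(tags):
    """Return the highest final-release tag, comparing version
    components numerically rather than as strings."""
    versions = []
    for tag in tags:
        match = TAG_RE.match(tag)
        if match:
            versions.append(tuple(int(p) for p in match.group(1).split(".")))
    if not versions:
        return None
    return "v" + ".".join(str(p) for p in max(versions))


print(latest_release(["v2.9.3", "v2.11.0", "v2.11.1", "v2.12.0-rc0"]))
```

Comparing integer tuples matters here: a plain string sort would rank `v2.9.3` above `v2.11.1`.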


Re: git-scm.com status report

2017-02-02 Thread Samuel Lijin
For anyone interested, this thread is on the HN front page right now[0].

There's one suggestion in particular that stands out to me - shifting
to Digital Ocean[1], which for $240/mo offers way more than what it
sounds like the current Heroku costs are.

[0] https://news.ycombinator.com/item?id=13554065
[1] https://news.ycombinator.com/item?id=13554632

On Thu, Feb 2, 2017 at 4:01 PM, pedro rijo wrote:
> Hey,
>
> While I’m not experienced with Rails apps, I would like to give my
> contribution to the Git project. I could help doing some kind of triage,
> removing abusive PRs/issues (like
> https://github.com/git/git-scm.com/pull/557), looking for typos and other
> tasks that wouldn’t require a lot of RoR knowledge to get started. Also,
> completely free and available to start digging into the RoR stuff of course!
>
> If you are interested, just let me know :)
>
> Thanks,
> Pedro Rijo


Re: git-scm.com status report

2017-02-01 Thread Samuel Lijin
In theory, you could also dump the build artifacts to a GH Pages repo
and host it from there, although I don't know if you would run up
against any of the usage limits[0]. The immediate problem I see with
that approach, though, is that I have no idea how any of the dynamic
stuff (e.g. search) would be replaced.

A question: there's a DB schema in there. Does the site still use a DB?

[0] https://help.github.com/articles/what-is-github-pages/#usage-limits

On Wed, Feb 1, 2017 at 10:36 PM, Eric Wong wrote:
> Jeff King wrote:
>> With the caveat that I know very little about web hosting, $230/mo
>> sounds like an awful lot for what is essentially a static web site.
>
> Yes, that's a lot.
>
> Fwiw, that covers a year of low-end VPS hosting for the main
> public-inbox.org/git machine + mail host
> (~1GB git objects + ~3GB Xapian index).
>
>> The site does see a lot of hits, but most of the content is a few basic
>> web pages, and copies of other static content that is updated
>> only occasionally (manpage content, lists of downloads, etc).  The biggest
>> dynamic component is the site search, I think.
>
> Maybe optimize search if that's slowest, first.  public-inbox
> uses per-host Xapian indexes so there's no extra network latency
> and it seems to work well.  But maybe you don't get FS write
> access without full VPS access on Heroku...
>
> nginx handles static content easily, and it looks like you
> guys use unicorn[*] for running the Ruby part.  I really hope
> nginx is in front of unicorn, since (AFAIK) Heroku doesn't put
> nginx in front of it by default.
>
>
> [*] I wrote and maintain unicorn; and have not yet recommended
> any reverse proxy besides nginx to buffer for it.
> However, having varnish or any other caching layer in
> between nginx and unicorn is great, too.  I dunno how Heroku
> (or any proprietary deployment systems) handle it, though.
>
>> I do wonder if there's room for improvement either:
>>
>>   - by measuring and optimizing the Heroku deploy. I have no idea about
>> scaling Rails or Heroku apps. Do we really need three expensive
>> dynos, or a $50/mo database plan? I'm not even sure what to measure,
>> or how. There are some analytics on the site, but I don't have
>> access to them (I could probably dig around for access if there was
>> somebody who actually knew how to do something productive with
>> them).
>
> I track down the most expensive requests in per-request timing
> logs and work on profiling + optimizations from there...
> Nothing fancy and no relying on proprietary tools like NewRelic.
>
> I also watch for queueing in the listen socket backlog (with
> raindrops or ss(8)) to notice overloading.  Again, I don't know
> how much visibility you have with Heroku.
>
>>   - by moving to a simpler model. I wonder if we could build the site
>> once and then deploy a more static variant of it to a cheaper
>> hosting platform. I'm not really sure what our options would be, how
>> much work it would take to do the conversion, and if we'd lose any
>> functionality.
>
> *shrug*  That'd be more work, at least.  I'd figure out what's
> slow, first.
>
> Fwiw, Varnish definitely helps public-inbox when slammed by
> HN/Reddit traffic.  It's great as long as you don't have
> per-user data to invalidate, which seems to be the case for
> git-scm.
>
>> If anybody is interested in tackling a project like this, let me know,
>> and I can try to provide access to whatever parts are needed.
>
> While I'm not up-to-date with modern Rails or deployment stuff,
> I'm available via email if you have any lower-level
> Ruby/unicorn/nginx-related questions.  I'm sure GitHub/GitLab
> also has folks familiar with nginx+unicorn deployment on
> bare metal or VPS who could also help.


Re: git-scm.com status report

2017-02-01 Thread Eric Wong
Jeff King wrote:
> With the caveat that I know very little about web hosting, $230/mo
> sounds like an awful lot for what is essentially a static web site.

Yes, that's a lot.

Fwiw, that covers a year of low-end VPS hosting for the main
public-inbox.org/git machine + mail host
(~1GB git objects + ~3GB Xapian index).

> The site does see a lot of hits, but most of the content is a few basic
> web pages, and copies of other static content that is updated
> only occasionally (manpage content, lists of downloads, etc).  The biggest
> dynamic component is the site search, I think.

Maybe optimize search if that's slowest, first.  public-inbox
uses per-host Xapian indexes so there's no extra network latency
and it seems to work well.  But maybe you don't get FS write
access without full VPS access on Heroku...

nginx handles static content easily, and it looks like you
guys use unicorn[*] for running the Ruby part.  I really hope
nginx is in front of unicorn, since (AFAIK) Heroku doesn't put
nginx in front of it by default.


[*] I wrote and maintain unicorn; and have not yet recommended
any reverse proxy besides nginx to buffer for it.
However, having varnish or any other caching layer in
between nginx and unicorn is great, too.  I dunno how Heroku
(or any proprietary deployment system) handles it, though.

> I do wonder if there's room for improvement either:
> 
>   - by measuring and optimizing the Heroku deploy. I have no idea about
>     scaling Rails or Heroku apps. Do we really need three expensive
>     dynos, or a $50/mo database plan? I'm not even sure what to measure,
>     or how. There are some analytics on the site, but I don't have
>     access to them (I could probably dig around for access if there was
>     somebody who actually knew how to do something productive with
>     them).

I track down the most expensive requests in per-request timing
logs and work on profiling + optimizations from there...
Nothing fancy and no relying on proprietary tools like NewRelic.
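
As a rough illustration of that workflow (a sketch only, not the tooling
actually used here), the snippet below ranks request paths by total time
spent. The log format, the sample data, and the function name
slowest_endpoints are all assumptions made up for the example; a real
timing log would need its own parser.

```python
from collections import defaultdict

# Hypothetical log format (an assumption for this sketch):
#   "<METHOD> <URL> <STATUS> <SECONDS>"
SAMPLE_LOG = """\
GET /docs/git-log 200 0.042
GET /search?q=rebase 200 0.910
GET /docs/git-log 200 0.038
GET /search?q=merge 200 1.250
GET /book/en/v2 200 0.120
"""


def slowest_endpoints(log_text, top=3):
    """Return (path, total_seconds, hits) tuples, highest total time first."""
    totals = defaultdict(lambda: [0.0, 0])  # path -> [total seconds, hit count]
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skip lines that do not match the assumed format
        _method, url, _status, seconds = parts
        path = url.split("?", 1)[0]  # group by path, ignoring the query string
        totals[path][0] += float(seconds)
        totals[path][1] += 1
    ranked = sorted(totals.items(), key=lambda kv: kv[1][0], reverse=True)
    return [(path, round(total, 3), hits) for path, (total, hits) in ranked[:top]]


if __name__ == "__main__":
    for path, total, hits in slowest_endpoints(SAMPLE_LOG):
        print(f"{path}\t{total:.3f}s over {hits} requests")
```

Ranking by total time rather than by single slowest request tends to
surface the paths where optimization pays off most, which is the point
of starting from the logs.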

I also watch for queueing in listen socket backlog (with
raindrops or ss(8)) to
notice overloading.  Again, I don't know how much visibility
you have with Heroku.
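
A minimal sketch of the ss(8) side of this, assuming Linux, where for
sockets in the LISTEN state the Recv-Q column reports connections waiting
to be accepted and Send-Q the backlog limit. The sample output, the
threshold, and the function name queued_listeners are illustrative
assumptions, not anything taken from the git-scm.com deployment.

```python
# Sample text in the shape of `ss -ltn` output (made-up values).
SAMPLE_SS_OUTPUT = """\
State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
LISTEN  0       128     0.0.0.0:80          0.0.0.0:*
LISTEN  37      1024    127.0.0.1:8080      0.0.0.0:*
"""


def queued_listeners(ss_output, threshold=1):
    """Return (local_address, queued, backlog_limit) for busy listeners."""
    busy = []
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a listening socket.
        if len(fields) < 4 or fields[0] != "LISTEN":
            continue
        recv_q, send_q, local = int(fields[1]), int(fields[2]), fields[3]
        if recv_q >= threshold:
            busy.append((local, recv_q, send_q))
    return busy


if __name__ == "__main__":
    for local, queued, limit in queued_listeners(SAMPLE_SS_OUTPUT):
        print(f"{local}: {queued} connections queued (backlog limit {limit})")
```

On a real host the text could come from
subprocess.run(["ss", "-ltn"], capture_output=True, text=True).stdout,
sampled periodically; a queue that stays above zero suggests the workers
are not accepting connections fast enough.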

>   - by moving to a simpler model. I wonder if we could build the site
>     once and then deploy a more static variant of it to a cheaper
>     hosting platform. I'm not really sure what our options would be, how
>     much work it would take to do the conversion, and if we'd lose any
>     functionality.

*shrug*  That'd be more work, at least.  I'd figure out what's
slow, first.

Fwiw, Varnish definitely helps public-inbox when slammed by
HN/Reddit traffic.  It's great as long as you don't have
per-user data to invalidate, which seems to be the case for
git-scm.

> If anybody is interested in tackling a project like this, let me know,
> and I can try to provide access to whatever parts are needed.

While I'm not up-to-date with modern Rails or deployment stuff,
I'm available via email if you have any lower-level
Ruby/unicorn/nginx-related questions.  I'm sure GitHub/GitLab
also has folks familiar with nginx+unicorn deployment on
bare metal or VPS who could also help.


git-scm.com status report

2017-02-01 Thread Jeff King
We (the Git project) got control of the git-scm.com domain this year. We
have never really had an "official" website, but I think a lot of people
consider this to be one.

This is an overview of the current state, as well as some possible
issues and future work.

## What's on the site

We have the domains git-scm.com and git-scm.org (the latter we've had
for a while). They both point to the same website, which has general
information about Git, including:

  - a general overview of Git

  - links to the latest releases (both source and some binary
    installers)

  - HTML-rendered copies of the manpages (both for the current version
    and historical versions)

  - an HTML rendering of the contents of the Pro Git book, along with
    translations. The book content is licensed cc-by-nc-sa and developed
    openly.

  - various external links to books, tutorials, GUI tools, etc

## How is it developed and hosted

The site is a Ruby on Rails app. The git repository is
https://github.com/git/git-scm.com. Modifications are generally done by
pull requests there. I have admin access on the repository.

The deployed site is hosted on Heroku. It's part of GitHub's
meta-account, and they pay the bills. I have access to it, and am the
only person who deploys updates. Other technical staff at GitHub have
access, too, because of the account setup, but don't generally
participate in maintenance.

It uses three 1GB Heroku dynos for scaling, which is $150/mo. It also
uses some Heroku addons which add up to another $80/mo.
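
For reference, the arithmetic behind those figures, as a small sketch.
The per-dyno price is derived here by dividing the dyno total evenly and
is an assumption, not a quoted Heroku rate; the yearly total is simply
the monthly figure times twelve.

```python
DYNO_COUNT = 3
DYNO_TOTAL_PER_MONTH = 150  # USD, all three dynos together
ADDONS_PER_MONTH = 80       # USD, all addons together

# Assumes the three dynos are priced identically.
per_dyno = DYNO_TOTAL_PER_MONTH / DYNO_COUNT
monthly_total = DYNO_TOTAL_PER_MONTH + ADDONS_PER_MONTH
yearly_total = monthly_total * 12

print(f"per dyno: ${per_dyno:.0f}/mo")
print(f"total:    ${monthly_total}/mo (${yearly_total}/yr)")
```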

## Who's the maintainer

These days, it's pretty much me, with a lot of help from Jean-Noël Avila
on issues with the Pro Git import and formatting code.

Long ago, the site content and code were done by Scott Chacon, with
graphic design help from Jason Long.  Scott maintained the site with
help from Bryan Turner for many years. But over time, they both seemed
to get less active, and I haven't seen a peep from either on the site's
GitHub repo in the past year. I've started trying to respond to issues
and pull requests to keep things healthy.

The site is mostly in maintenance mode, but things do need addressing.
People show up with new additions, fixes for typos, broken links and
other formatting problems, etc. There are a lot of long-standing
Asciidoc formatting problems both for the manpages and the imported Pro
Git content.

## What next

We can probably continue in maintenance mode like this for a while.
We've fixed a lot of the long-standing formatting issues over the
past year, so maintenance seems to have subsided in the past few months
to mostly just merging or rejecting the occasional PR.

Still, if anybody is interested in helping with this work, I'd love to
have more eyes on it. I can give people access to the GitHub repo.
Unfortunately, I can't do so for the Heroku deploy, and part of the
maintenance burden is that the site is finicky and often needs manual
intervention (e.g., a fix to formatting requires rebuilding the
manpages, which needs a job run manually on Heroku).

It's possible that the content or visual design of the site could be
improved in various ways. I don't have any strong desires myself, but
maybe others do. If people start doing larger work, though, we have a
real lack of reviewers, and I have very little expertise with Rails or
with visual design. So anybody who wants to do this should be prepared to
take maintenance ownership.

At some point, GitHub may boot us off of the shared Heroku account,
because my impression is that it's somewhat of an administrative
headache. I don't think the Git project could afford the $230/mo hosting
fees; that's basically all the money we make. On the other hand, we
haven't actively solicited funds to any great degree, and it's possible
we could get GitHub or some other entity to just sponsor us with site
fees (I've heard zero complaints from GitHub about the money; it's
mostly just that the site is an oddball among their other assets).

With the caveat that I know very little about web hosting, $230/mo
sounds like an awful lot for what is essentially a static web site.
The site does see a lot of hits, but most of the content is a few basic
web pages, and copies of other static content that is updated
only occasionally (manpage content, lists of downloads, etc).  The biggest
dynamic component is the site search, I think.

I do wonder if there's room for improvement either:

  - by measuring and optimizing the Heroku deploy. I have no idea about
    scaling Rails or Heroku apps. Do we really need three expensive
    dynos, or a $50/mo database plan? I'm not even sure what to measure,
    or how. There are some analytics on the site, but I don't have
    access to them (I could probably dig around for access if there was
    somebody who actually knew how to do something productive with
    them).

  - by moving to a simpler model. I wonder if we could build the site
once and then deploy a more static variant of it to a cheaper