Re: Proposal to move website source to arrow-site, add automatic builds

2019-08-12 Thread Neal Richardson
I started https://github.com/apache/arrow/pull/5015 for the removal
last week; will finish that up today or tomorrow.

Neal

On Sun, Aug 11, 2019 at 8:23 AM Wes McKinney  wrote:
>
> It looks like the git pruning is done. So we can remove the site/
> directory from the main repository at some point soon.
>
> On Thu, Aug 8, 2019 at 2:29 PM Neal Richardson
>  wrote:
> >
> > I need a committer to make a master branch on arrow-site so that I can
> > PR to it. I thought it could be just an empty orphan branch but that
> > proved not to work, so a committer will need to do the following:
> >
> > ```
> > git clone g...@github.com:$YOURGITHUB/arrow.git arrow-copy
> > cd arrow-copy
> > git filter-branch --prune-empty --subdirectory-filter site master
> > vi .git/config
> > # Change remote "origin"'s URL to be g...@github.com:apache/arrow-site.git
> > git push -f origin master
> > ```
> >
> > On Thu, Aug 8, 2019 at 12:07 PM Wes McKinney  wrote:
> > >
> > > Yes, I think we have adequate lazy consensus. Can you spell out what
> > > the next steps are?
> > >
> > > On Thu, Aug 8, 2019 at 2:01 PM Neal Richardson
> > >  wrote:
> > > >
> > > > Have we reached "lazy consensus" here? No further comments in the last
> > > > three days.
> > > >
> > > > Thanks,
> > > > Neal
> > > >
> > > > On Mon, Aug 5, 2019 at 1:46 PM Joris Van den Bossche
> > > >  wrote:
> > > > >
> > > > > This sounds like a good proposal to me (at least at the moment where we have
> > > > > separate docs and main site).
> > > > > I agree that documentation should indeed stay with the code, as you want to
> > > > > update those together in PRs. But the website is something you can
> > > > > typically update separately and also might want to update independently
> > > > > from code releases. And certainly if this proposal makes it easier to work
> > > > > on the site, all the better.
> > > > >
> > > > > Joris
> > > > >
> > > > > On Mon, Aug 5, 2019 at 20:30, Wes McKinney wrote:
> > > > >
> > > > > > Let's wait a little while to collect any additional opinions about this.
> > > > > >
> > > > > > There's pretty good evidence from other Apache projects that this
> > > > > > isn't too bad of an idea
> > > > > >
> > > > > > Apache Calcite: https://github.com/apache/calcite-site
> > > > > > Apache Kafka: https://github.com/apache/kafka-site
> > > > > > Apache Spark: https://github.com/apache/spark-website
> > > > > >
> > > > > > The Apache projects I've seen where the same repository is used for
> > > > > > $FOO.apache.org tend to be ones where the documentation _is_ the
> > > > > > website. I think we would need to commission a significant web design
> > > > > > overhaul to be able to make our documentation page adequate as the
> > > > > > landing point for visitors to https://arrow.apache.org.
> > > > > >
> > > > > > On Sat, Aug 3, 2019 at 3:46 PM Neal Richardson
> > > > > >  wrote:
> > > > > > >
> > > > > > > Given the status quo, it would be difficult for this to make the Arrow
> > > > > > > website less maintained. In fact, arrow-site is currently missing the
> > > > > > > most recent two patches that modified the site directory in
> > > > > > > apache/arrow. Having multiple manual deploy steps increases the
> > > > > > > likelihood that the website stays stale.
> > > > > > >
> > > > > > > As someone who has been working on the arrow site lately, this
> > > > > > > proposal makes it easier for me to make changes to the website because
> > > > > > > I can automatically deploy my changes to a test site, and that lets
> > > > > > > others in the community, who perhaps don't touch the website much,
> > > > > > > verify that they're good.
> > > > > > >
> > > > > > > I agree that the documentation situation needs attention, but as I
> > > > > > > said initially, that's orthogonal to this static site generation. I'd
> > > > > > > like to work on that next, and I think these changes will make it
> > > > > > > easier to do. I would not propose moving doc generation out of
> > > > > > > apache/arrow--that belongs with the code.
> > > > > > >
> > > > > > > Neal
> > > > > > >
> > > > > > > On Sat, Aug 3, 2019 at 9:49 AM Wes McKinney wrote:
> > > > > > > >
> > > > > > > > I think that the project website and the project documentation are
> > > > > > > > currently distinct entities. The current Jekyll website is independent
> > > > > > > > from the Sphinx documentation project aside from a link to the
> > > > > > > > documentation from the website.
> > > > > > > >
> > > > > > > > I am guessing that we would want to maintain some amount of separation
> > > > > > > > between the main site at arrow.apache.org and the code / format
> > > > > > > > documentation, at minimum because we may want to make documentation
> > > > > > > > available for multiple

Re: Proposal to move website source to arrow-site, add automatic builds

2019-08-11 Thread Wes McKinney
It looks like the git pruning is done. So we can remove the site/
directory from the main repository at some point soon.

On Thu, Aug 8, 2019 at 2:29 PM Neal Richardson
 wrote:
>
> I need a committer to make a master branch on arrow-site so that I can
> PR to it. I thought it could be just an empty orphan branch but that
> proved not to work, so a committer will need to do the following:
>
> ```
> git clone g...@github.com:$YOURGITHUB/arrow.git arrow-copy
> cd arrow-copy
> git filter-branch --prune-empty --subdirectory-filter site master
> vi .git/config
> # Change remote "origin"'s URL to be g...@github.com:apache/arrow-site.git
> git push -f origin master
> ```
>
> On Thu, Aug 8, 2019 at 12:07 PM Wes McKinney  wrote:
> >
> > Yes, I think we have adequate lazy consensus. Can you spell out what
> > the next steps are?
> >
> > On Thu, Aug 8, 2019 at 2:01 PM Neal Richardson
> >  wrote:
> > >
> > > Have we reached "lazy consensus" here? No further comments in the last
> > > three days.
> > >
> > > Thanks,
> > > Neal
> > >
> > > On Mon, Aug 5, 2019 at 1:46 PM Joris Van den Bossche
> > >  wrote:
> > > >
> > > > This sounds like a good proposal to me (at least at the moment where we have
> > > > separate docs and main site).
> > > > I agree that documentation should indeed stay with the code, as you want to
> > > > update those together in PRs. But the website is something you can
> > > > typically update separately and also might want to update independently
> > > > from code releases. And certainly if this proposal makes it easier to work
> > > > on the site, all the better.
> > > >
> > > > Joris
> > > >
> > > > On Mon, Aug 5, 2019 at 20:30, Wes McKinney wrote:
> > > >
> > > > > Let's wait a little while to collect any additional opinions about this.
> > > > >
> > > > > There's pretty good evidence from other Apache projects that this
> > > > > isn't too bad of an idea
> > > > >
> > > > > Apache Calcite: https://github.com/apache/calcite-site
> > > > > Apache Kafka: https://github.com/apache/kafka-site
> > > > > Apache Spark: https://github.com/apache/spark-website
> > > > >
> > > > > The Apache projects I've seen where the same repository is used for
> > > > > $FOO.apache.org tend to be ones where the documentation _is_ the
> > > > > website. I think we would need to commission a significant web design
> > > > > overhaul to be able to make our documentation page adequate as the
> > > > > landing point for visitors to https://arrow.apache.org.
> > > > >
> > > > > On Sat, Aug 3, 2019 at 3:46 PM Neal Richardson
> > > > >  wrote:
> > > > > >
> > > > > > Given the status quo, it would be difficult for this to make the Arrow
> > > > > > website less maintained. In fact, arrow-site is currently missing the
> > > > > > most recent two patches that modified the site directory in
> > > > > > apache/arrow. Having multiple manual deploy steps increases the
> > > > > > likelihood that the website stays stale.
> > > > > >
> > > > > > As someone who has been working on the arrow site lately, this
> > > > > > proposal makes it easier for me to make changes to the website because
> > > > > > I can automatically deploy my changes to a test site, and that lets
> > > > > > others in the community, who perhaps don't touch the website much,
> > > > > > verify that they're good.
> > > > > >
> > > > > > I agree that the documentation situation needs attention, but as I
> > > > > > said initially, that's orthogonal to this static site generation. I'd
> > > > > > like to work on that next, and I think these changes will make it
> > > > > > easier to do. I would not propose moving doc generation out of
> > > > > > apache/arrow--that belongs with the code.
> > > > > >
> > > > > > Neal
> > > > > >
> > > > > > On Sat, Aug 3, 2019 at 9:49 AM Wes McKinney wrote:
> > > > > > >
> > > > > > > I think that the project website and the project documentation are
> > > > > > > currently distinct entities. The current Jekyll website is independent
> > > > > > > from the Sphinx documentation project aside from a link to the
> > > > > > > documentation from the website.
> > > > > > >
> > > > > > > I am guessing that we would want to maintain some amount of separation
> > > > > > > between the main site at arrow.apache.org and the code / format
> > > > > > > documentation, at minimum because we may want to make documentation
> > > > > > > available for multiple versions of the project (this has already been
> > > > > > > cited as an issue -- when we release, we're overwriting the previous
> > > > > > > version of the docs)
> > > > > > >
> > > > > > > On Sat, Aug 3, 2019 at 11:33 AM Antoine Pitrou wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > I am concerned with this.  What happens if we happen to move part of the
> > > > 

Re: Proposal to move website source to arrow-site, add automatic builds

2019-08-08 Thread Neal Richardson
I need a committer to make a master branch on arrow-site so that I can
PR to it. I thought it could be just an empty orphan branch but that
proved not to work, so a committer will need to do the following:

```
git clone g...@github.com:$YOURGITHUB/arrow.git arrow-copy
cd arrow-copy
git filter-branch --prune-empty --subdirectory-filter site master
vi .git/config
# Change remote "origin"'s URL to be g...@github.com:apache/arrow-site.git
git push -f origin master
```
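
For reference, here is a minimal sketch of the same sequence run end to end against throwaway local repositories, so it can be tried without touching the real arrow or arrow-site remotes. Everything other than the `git filter-branch --prune-empty --subdirectory-filter site` call and the forced push is an assumption made for illustration: the scratch directory, the `demo-repo` and `site-remote.git` names, the sample files, and the use of `git remote set-url` in place of editing `.git/config` by hand.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch area; nothing here touches a real remote.
workdir=$(mktemp -d)
cd "$workdir"

# demo-repo stands in for the clone of the main repository,
# site-remote.git for the (empty) destination repository.
git init -q demo-repo
git init -q --bare site-remote.git

cd demo-repo
git config user.email "demo@example.invalid"
git config user.name "Demo User"

# One commit that does not touch site/, one that does.
mkdir cpp site
echo "int main() {}" > cpp/main.cc
git add cpp
git commit -q -m "Add C++ source"
echo "title: demo" > site/_config.yml
git add site
git commit -q -m "Add site config"

# The branch may be called master or main depending on local git config.
branch=$(git rev-parse --abbrev-ref HEAD)

# Rewrite history so site/ becomes the repository root; commits that
# never touched site/ are dropped.
FILTER_BRANCH_SQUELCH_WARNING=1 \
  git filter-branch --prune-empty --subdirectory-filter site "$branch" >/dev/null 2>&1

# Point "origin" at the destination and force-push the rewritten branch.
git remote add origin "$workdir/placeholder.git"
git remote set-url origin "$workdir/site-remote.git"
git push -q -f origin "$branch"

git ls-files                                                         # prints _config.yml
git --git-dir="$workdir/site-remote.git" rev-list --count "$branch"  # prints 1
```

Against the real repositories only the clone URL and the destination URL would differ. The forced push overwrites whatever is on the destination branch, so it is worth checking `git remote -v` before running it.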

On Thu, Aug 8, 2019 at 12:07 PM Wes McKinney  wrote:
>
> Yes, I think we have adequate lazy consensus. Can you spell out what
> the next steps are?
>
> On Thu, Aug 8, 2019 at 2:01 PM Neal Richardson
>  wrote:
> >
> > Have we reached "lazy consensus" here? No further comments in the last
> > three days.
> >
> > Thanks,
> > Neal
> >
> > On Mon, Aug 5, 2019 at 1:46 PM Joris Van den Bossche
> >  wrote:
> > >
> > > This sounds like a good proposal to me (at least at the moment where we have
> > > separate docs and main site).
> > > I agree that documentation should indeed stay with the code, as you want to
> > > update those together in PRs. But the website is something you can
> > > typically update separately and also might want to update independently
> > > from code releases. And certainly if this proposal makes it easier to work
> > > on the site, all the better.
> > >
> > > Joris
> > >
> > > On Mon, Aug 5, 2019 at 20:30, Wes McKinney wrote:
> > >
> > > > Let's wait a little while to collect any additional opinions about this.
> > > >
> > > > There's pretty good evidence from other Apache projects that this
> > > > isn't too bad of an idea
> > > >
> > > > Apache Calcite: https://github.com/apache/calcite-site
> > > > Apache Kafka: https://github.com/apache/kafka-site
> > > > Apache Spark: https://github.com/apache/spark-website
> > > >
> > > > The Apache projects I've seen where the same repository is used for
> > > > $FOO.apache.org tend to be ones where the documentation _is_ the
> > > > website. I think we would need to commission a significant web design
> > > > overhaul to be able to make our documentation page adequate as the
> > > > landing point for visitors to https://arrow.apache.org.
> > > >
> > > > On Sat, Aug 3, 2019 at 3:46 PM Neal Richardson
> > > >  wrote:
> > > > >
> > > > > Given the status quo, it would be difficult for this to make the Arrow
> > > > > website less maintained. In fact, arrow-site is currently missing the
> > > > > most recent two patches that modified the site directory in
> > > > > apache/arrow. Having multiple manual deploy steps increases the
> > > > > likelihood that the website stays stale.
> > > > >
> > > > > As someone who has been working on the arrow site lately, this
> > > > > proposal makes it easier for me to make changes to the website because
> > > > > I can automatically deploy my changes to a test site, and that lets
> > > > > others in the community, who perhaps don't touch the website much,
> > > > > verify that they're good.
> > > > >
> > > > > I agree that the documentation situation needs attention, but as I
> > > > > said initially, that's orthogonal to this static site generation. I'd
> > > > > like to work on that next, and I think these changes will make it
> > > > > easier to do. I would not propose moving doc generation out of
> > > > > apache/arrow--that belongs with the code.
> > > > >
> > > > > Neal
> > > > >
> > > > > On Sat, Aug 3, 2019 at 9:49 AM Wes McKinney wrote:
> > > > > >
> > > > > > I think that the project website and the project documentation are
> > > > > > currently distinct entities. The current Jekyll website is independent
> > > > > > from the Sphinx documentation project aside from a link to the
> > > > > > documentation from the website.
> > > > > >
> > > > > > I am guessing that we would want to maintain some amount of separation
> > > > > > between the main site at arrow.apache.org and the code / format
> > > > > > documentation, at minimum because we may want to make documentation
> > > > > > available for multiple versions of the project (this has already been
> > > > > > cited as an issue -- when we release, we're overwriting the previous
> > > > > > version of the docs)
> > > > > >
> > > > > > On Sat, Aug 3, 2019 at 11:33 AM Antoine Pitrou wrote:
> > > > > > >
> > > > > > >
> > > > > > > I am concerned with this.  What happens if we happen to move part of the
> > > > > > > current site to e.g. the Sphinx docs in the Arrow repository (we already
> > > > > > > did that, so it's not theoretical)?
> > > > > > >
> > > > > > > More generally, I also think that any move towards separating website
> > > > > > > and code repo more will lead to an even less maintained website.
> > > > > > >
> > > > > > > Regards
> > > > > > >
> > > > > > > Antoine.
> > > > > > >
> > > > > > >
> > > > > > > On Aug 2, 2019 at 22:39, Wes McKinney wrote:
> > > > > > > > hi Neal,

Re: Proposal to move website source to arrow-site, add automatic builds

2019-08-08 Thread Wes McKinney
Yes, I think we have adequate lazy consensus. Can you spell out what
the next steps are?

On Thu, Aug 8, 2019 at 2:01 PM Neal Richardson
 wrote:
>
> Have we reached "lazy consensus" here? No further comments in the last
> three days.
>
> Thanks,
> Neal
>
> On Mon, Aug 5, 2019 at 1:46 PM Joris Van den Bossche
>  wrote:
> >
> > This sounds like a good proposal to me (at least at the moment where we have
> > separate docs and main site).
> > I agree that documentation should indeed stay with the code, as you want to
> > update those together in PRs. But the website is something you can
> > typically update separately and also might want to update independently
> > from code releases. And certainly if this proposal makes it easier to work
> > on the site, all the better.
> >
> > Joris
> >
> > On Mon, Aug 5, 2019 at 20:30, Wes McKinney wrote:
> >
> > > Let's wait a little while to collect any additional opinions about this.
> > >
> > > There's pretty good evidence from other Apache projects that this
> > > isn't too bad of an idea
> > >
> > > Apache Calcite: https://github.com/apache/calcite-site
> > > Apache Kafka: https://github.com/apache/kafka-site
> > > Apache Spark: https://github.com/apache/spark-website
> > >
> > > The Apache projects I've seen where the same repository is used for
> > > $FOO.apache.org tend to be ones where the documentation _is_ the
> > > website. I think we would need to commission a significant web design
> > > overhaul to be able to make our documentation page adequate as the
> > > landing point for visitors to https://arrow.apache.org.
> > >
> > > On Sat, Aug 3, 2019 at 3:46 PM Neal Richardson
> > >  wrote:
> > > >
> > > > Given the status quo, it would be difficult for this to make the Arrow
> > > > website less maintained. In fact, arrow-site is currently missing the
> > > > most recent two patches that modified the site directory in
> > > > apache/arrow. Having multiple manual deploy steps increases the
> > > > likelihood that the website stays stale.
> > > >
> > > > As someone who has been working on the arrow site lately, this
> > > > proposal makes it easier for me to make changes to the website because
> > > > I can automatically deploy my changes to a test site, and that lets
> > > > others in the community, who perhaps don't touch the website much,
> > > > verify that they're good.
> > > >
> > > > I agree that the documentation situation needs attention, but as I
> > > > said initially, that's orthogonal to this static site generation. I'd
> > > > like to work on that next, and I think these changes will make it
> > > > easier to do. I would not propose moving doc generation out of
> > > > apache/arrow--that belongs with the code.
> > > >
> > > > Neal
> > > >
> > > > On Sat, Aug 3, 2019 at 9:49 AM Wes McKinney  wrote:
> > > > >
> > > > > I think that the project website and the project documentation are
> > > > > currently distinct entities. The current Jekyll website is independent
> > > > > from the Sphinx documentation project aside from a link to the
> > > > > documentation from the website.
> > > > >
> > > > > I am guessing that we would want to maintain some amount of separation
> > > > > between the main site at arrow.apache.org and the code / format
> > > > > documentation, at minimum because we may want to make documentation
> > > > > available for multiple versions of the project (this has already been
> > > > > cited as an issue -- when we release, we're overwriting the previous
> > > > > version of the docs)
> > > > >
> > > > > On Sat, Aug 3, 2019 at 11:33 AM Antoine Pitrou wrote:
> > > > > >
> > > > > >
> > > > > > I am concerned with this.  What happens if we happen to move part of the
> > > > > > current site to e.g. the Sphinx docs in the Arrow repository (we already
> > > > > > did that, so it's not theoretical)?
> > > > > >
> > > > > > More generally, I also think that any move towards separating website
> > > > > > and code repo more will lead to an even less maintained website.
> > > > > >
> > > > > > Regards
> > > > > >
> > > > > > Antoine.
> > > > > >
> > > > > >
> > > > > > On Aug 2, 2019 at 22:39, Wes McKinney wrote:
> > > > > > > hi Neal,
> > > > > > >
> > > > > > > In general the improvements to the site sound good, and I agree with
> > > > > > > moving the site into the apache/arrow-site repository.
> > > > > > >
> > > > > > > It sounds like a committer will have to volunteer a PAT for the Travis
> > > > > > > CI settings in
> > > > > > >
> > > > > > > https://travis-ci.org/apache/arrow-site/settings
> > > > > > >
> > > > > > > Even though you can't get at such an environment variable there after
> > > > > > > it's set, it could still technically be compromised. Personally I
> > > > > > > wouldn't be comfortable having a token with "repo" scope out there. We
> > > > > > > might need to think about this some more -- the general idea of making
> > > > > > > it easier to deploy the website

Re: Proposal to move website source to arrow-site, add automatic builds

2019-08-08 Thread Neal Richardson
Have we reached "lazy consensus" here? No further comments in the last
three days.

Thanks,
Neal

On Mon, Aug 5, 2019 at 1:46 PM Joris Van den Bossche
 wrote:
>
> This sounds like a good proposal to me (at least at the moment where we have
> separate docs and main site).
> I agree that documentation should indeed stay with the code, as you want to
> update those together in PRs. But the website is something you can
> typically update separately and also might want to update independently
> from code releases. And certainly if this proposal makes it easier to work
> on the site, all the better.
>
> Joris
>
> On Mon, Aug 5, 2019 at 20:30, Wes McKinney wrote:
>
> > Let's wait a little while to collect any additional opinions about this.
> >
> > There's pretty good evidence from other Apache projects that this
> > isn't too bad of an idea
> >
> > Apache Calcite: https://github.com/apache/calcite-site
> > Apache Kafka: https://github.com/apache/kafka-site
> > Apache Spark: https://github.com/apache/spark-website
> >
> > The Apache projects I've seen where the same repository is used for
> > $FOO.apache.org tend to be ones where the documentation _is_ the
> > website. I think we would need to commission a significant web design
> > overhaul to be able to make our documentation page adequate as the
> > landing point for visitors to https://arrow.apache.org.
> >
> > On Sat, Aug 3, 2019 at 3:46 PM Neal Richardson
> >  wrote:
> > >
> > > Given the status quo, it would be difficult for this to make the Arrow
> > > website less maintained. In fact, arrow-site is currently missing the
> > > most recent two patches that modified the site directory in
> > > apache/arrow. Having multiple manual deploy steps increases the
> > > likelihood that the website stays stale.
> > >
> > > As someone who has been working on the arrow site lately, this
> > > proposal makes it easier for me to make changes to the website because
> > > I can automatically deploy my changes to a test site, and that lets
> > > others in the community, who perhaps don't touch the website much,
> > > verify that they're good.
> > >
> > > I agree that the documentation situation needs attention, but as I
> > > said initially, that's orthogonal to this static site generation. I'd
> > > like to work on that next, and I think these changes will make it
> > > easier to do. I would not propose moving doc generation out of
> > > apache/arrow--that belongs with the code.
> > >
> > > Neal
> > >
> > > On Sat, Aug 3, 2019 at 9:49 AM Wes McKinney  wrote:
> > > >
> > > > I think that the project website and the project documentation are
> > > > currently distinct entities. The current Jekyll website is independent
> > > > from the Sphinx documentation project aside from a link to the
> > > > documentation from the website.
> > > >
> > > > I am guessing that we would want to maintain some amount of separation
> > > > between the main site at arrow.apache.org and the code / format
> > > > documentation, at minimum because we may want to make documentation
> > > > available for multiple versions of the project (this has already been
> > > > cited as an issue -- when we release, we're overwriting the previous
> > > > version of the docs)
> > > >
> > > > On Sat, Aug 3, 2019 at 11:33 AM Antoine Pitrou wrote:
> > > > >
> > > > >
> > > > > I am concerned with this.  What happens if we happen to move part of the
> > > > > current site to e.g. the Sphinx docs in the Arrow repository (we already
> > > > > did that, so it's not theoretical)?
> > > > >
> > > > > More generally, I also think that any move towards separating website
> > > > > and code repo more will lead to an even less maintained website.
> > > > >
> > > > > Regards
> > > > >
> > > > > Antoine.
> > > > >
> > > > >
> > > > > On Aug 2, 2019 at 22:39, Wes McKinney wrote:
> > > > > > hi Neal,
> > > > > >
> > > > > > In general the improvements to the site sound good, and I agree with
> > > > > > moving the site into the apache/arrow-site repository.
> > > > > >
> > > > > > It sounds like a committer will have to volunteer a PAT for the Travis
> > > > > > CI settings in
> > > > > >
> > > > > > https://travis-ci.org/apache/arrow-site/settings
> > > > > >
> > > > > > Even though you can't get at such an environment variable there after
> > > > > > it's set, it could still technically be compromised. Personally I
> > > > > > wouldn't be comfortable having a token with "repo" scope out there. We
> > > > > > might need to think about this some more -- the general idea of making
> > > > > > it easier to deploy the website I'm totally on board with
> > > > > >
> > > > > > - Wes
> > > > > >
> > > > > >
> > > > > > On Fri, Aug 2, 2019 at 1:35 PM Neal Richardson
> > > > > >  wrote:
> > > > > >>
> > > > > >> Hi all,
> > > > > >> https://issues.apache.org/jira/browse/ARROW-5746 requested to move the
> > > > > >> source for https://arrow.apache.org out of `apache/arrow` due to the
> > > > > >> 

Re: Proposal to move website source to arrow-site, add automatic builds

2019-08-05 Thread Joris Van den Bossche
This sounds like a good proposal to me (at least at the moment where we have
separate docs and main site).
I agree that documentation should indeed stay with the code, as you want to
update those together in PRs. But the website is something you can
typically update separately and also might want to update independently
from code releases. And certainly if this proposal makes it easier to work
on the site, all the better.

Joris

On Mon, Aug 5, 2019 at 20:30, Wes McKinney wrote:

> Let's wait a little while to collect any additional opinions about this.
>
> There's pretty good evidence from other Apache projects that this
> isn't too bad of an idea
>
> Apache Calcite: https://github.com/apache/calcite-site
> Apache Kafka: https://github.com/apache/kafka-site
> Apache Spark: https://github.com/apache/spark-website
>
> The Apache projects I've seen where the same repository is used for
> $FOO.apache.org tend to be ones where the documentation _is_ the
> website. I think we would need to commission a significant web design
> overhaul to be able to make our documentation page adequate as the
> landing point for visitors to https://arrow.apache.org.
>
> On Sat, Aug 3, 2019 at 3:46 PM Neal Richardson
>  wrote:
> >
> > Given the status quo, it would be difficult for this to make the Arrow
> > website less maintained. In fact, arrow-site is currently missing the
> > most recent two patches that modified the site directory in
> > apache/arrow. Having multiple manual deploy steps increases the
> > likelihood that the website stays stale.
> >
> > As someone who has been working on the arrow site lately, this
> > proposal makes it easier for me to make changes to the website because
> > I can automatically deploy my changes to a test site, and that lets
> > others in the community, who perhaps don't touch the website much,
> > verify that they're good.
> >
> > I agree that the documentation situation needs attention, but as I
> > said initially, that's orthogonal to this static site generation. I'd
> > like to work on that next, and I think these changes will make it
> > easier to do. I would not propose moving doc generation out of
> > apache/arrow--that belongs with the code.
> >
> > Neal
> >
> > On Sat, Aug 3, 2019 at 9:49 AM Wes McKinney  wrote:
> > >
> > > I think that the project website and the project documentation are
> > > currently distinct entities. The current Jekyll website is independent
> > > from the Sphinx documentation project aside from a link to the
> > > documentation from the website.
> > >
> > > I am guessing that we would want to maintain some amount of separation
> > > between the main site at arrow.apache.org and the code / format
> > > documentation, at minimum because we may want to make documentation
> > > available for multiple versions of the project (this has already been
> > > cited as an issue -- when we release, we're overwriting the previous
> > > version of the docs)
> > >
> > > On Sat, Aug 3, 2019 at 11:33 AM Antoine Pitrou wrote:
> > > >
> > > >
> > > > I am concerned with this.  What happens if we happen to move part of the
> > > > current site to e.g. the Sphinx docs in the Arrow repository (we already
> > > > did that, so it's not theoretical)?
> > > >
> > > > More generally, I also think that any move towards separating website
> > > > and code repo more will lead to an even less maintained website.
> > > >
> > > > Regards
> > > >
> > > > Antoine.
> > > >
> > > >
> > > > On Aug 2, 2019 at 22:39, Wes McKinney wrote:
> > > > > hi Neal,
> > > > >
> > > > > In general the improvements to the site sound good, and I agree with
> > > > > moving the site into the apache/arrow-site repository.
> > > > >
> > > > > It sounds like a committer will have to volunteer a PAT for the Travis
> > > > > CI settings in
> > > > >
> > > > > https://travis-ci.org/apache/arrow-site/settings
> > > > >
> > > > > Even though you can't get at such an environment variable there after
> > > > > it's set, it could still technically be compromised. Personally I
> > > > > wouldn't be comfortable having a token with "repo" scope out there. We
> > > > > might need to think about this some more -- the general idea of making
> > > > > it easier to deploy the website I'm totally on board with
> > > > >
> > > > > - Wes
> > > > >
> > > > >
> > > > > On Fri, Aug 2, 2019 at 1:35 PM Neal Richardson
> > > > >  wrote:
> > > > >>
> > > > >> Hi all,
> > > > >> https://issues.apache.org/jira/browse/ARROW-5746 requested to move the
> > > > >> source for https://arrow.apache.org out of `apache/arrow` due to the
> > > > >> growing number of binary files (mostly images) there.
> > > > >>
> > > > >> https://issues.apache.org/jira/browse/ARROW-4473 requested
> > > > >> improvements to the ability to make a test deploy of the website and
> > > > >> noted challenges/bugs in trying to do this when the site `baseurl` is
> > > > >> a subdirectory.
> > > > >>
> > > > >> On my fork of `arrow-site` [1] I have a 

Re: Proposal to move website source to arrow-site, add automatic builds

2019-08-05 Thread Wes McKinney
Let's wait a little while to collect any additional opinions about this.

There's pretty good evidence from other Apache projects that this
isn't too bad of an idea

Apache Calcite: https://github.com/apache/calcite-site
Apache Kafka: https://github.com/apache/kafka-site
Apache Spark: https://github.com/apache/spark-website

The Apache projects I've seen where the same repository is used for
$FOO.apache.org tend to be ones where the documentation _is_ the
website. I think we would need to commission a significant web design
overhaul to be able to make our documentation page adequate as the
landing point for visitors to https://arrow.apache.org.

On Sat, Aug 3, 2019 at 3:46 PM Neal Richardson
 wrote:
>
> Given the status quo, it would be difficult for this to make the Arrow
> website less maintained. In fact, arrow-site is currently missing the
> most recent two patches that modified the site directory in
> apache/arrow. Having multiple manual deploy steps increases the
> likelihood that the website stays stale.
>
> As someone who has been working on the arrow site lately, this
> proposal makes it easier for me to make changes to the website because
> I can automatically deploy my changes to a test site, and that lets
> others in the community, who perhaps don't touch the website much,
> verify that they're good.
>
> I agree that the documentation situation needs attention, but as I
> said initially, that's orthogonal to this static site generation. I'd
> like to work on that next, and I think these changes will make it
> easier to do. I would not propose moving doc generation out of
> apache/arrow--that belongs with the code.
>
> Neal
>
> On Sat, Aug 3, 2019 at 9:49 AM Wes McKinney  wrote:
> >
> > I think that the project website and the project documentation are
> > currently distinct entities. The current Jekyll website is independent
> > from the Sphinx documentation project aside from a link to the
> > documentation from the website.
> >
> > I am guessing that we would want to maintain some amount of separation
> > between the main site at arrow.apache.org and the code / format
> > documentation, at minimum because we may want to make documentation
> > available for multiple versions of the project (this has already been
> > cited as an issue -- when we release, we're overwriting the previous
> > version of the docs).
> >
> > On Sat, Aug 3, 2019 at 11:33 AM Antoine Pitrou  wrote:
> > >
> > >
> > > I am concerned about this.  What happens if we happen to move part of the
> > > current site to e.g. the Sphinx docs in the Arrow repository (we already
> > > did that, so it's not theoretical)?
> > >
> > > More generally, I also think that any move towards separating website
> > > and code repo more will lead to an even less maintained website.
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > > On 02/08/2019 at 22:39, Wes McKinney wrote:
> > > > hi Neal,
> > > >
> > > > In general the improvements to the site sound good, and I agree with
> > > > moving the site into the apache/arrow-site repository.
> > > >
> > > > It sounds like a committer will have to volunteer a PAT for the Travis
> > > > CI settings in
> > > >
> > > > https://travis-ci.org/apache/arrow-site/settings
> > > >
> > > > Even though you can't get at such an environment variable there after
> > > > it's set, it could still technically be compromised. Personally I
> > > > wouldn't be comfortable having a token with "repo" scope out there. We
> > > > might need to think about this some more -- the general idea of making
> > > > it easier to deploy the website I'm totally on board with.
> > > >
> > > > - Wes
> > > >
> > > >
> > > > On Fri, Aug 2, 2019 at 1:35 PM Neal Richardson
> > > >  wrote:
> > > >>
> > > >> Hi all,
> > > >> https://issues.apache.org/jira/browse/ARROW-5746 requested to move the
> > > >> source for https://arrow.apache.org out of `apache/arrow` due to the
> > > >> growing number of binary files (mostly images) there.
> > > >>
> > > >> https://issues.apache.org/jira/browse/ARROW-4473 requested
> > > >> improvements to the ability to make a test deploy of the website and
> > > >> noted challenges/bugs in trying to do this when the site `baseurl` is
> > > >> a subdirectory.
> > > >>
> > > >> On my fork of `arrow-site` [1] I have a solution to both. I created a
> > > >> `master` branch and copied the contents of the `site/` directory in
> > > >> `apache/arrow` to that, using `git filter-branch --prune-empty
> > > >> --subdirectory-filter site master` to preserve the commit history [2].
> > > >> Then I added a build script [3] that gets executed by Travis-CI [4].
> > > >>
> > > >> The script builds the Jekyll site and pushes it to a branch that gets
> > > >> published. On `apache/arrow-site`, commits to the `master` branch
> > > >> trigger a build of the Jekyll site and push the result to the
> > > >> `asf-site` branch. On forks, commits to `master` build the site and
> > > >> publish to the `gh-pages` branch, which can 

Re: Proposal to move website source to arrow-site, add automatic builds

2019-08-03 Thread Neal Richardson
Given the status quo, it would be difficult for this to make the Arrow
website less maintained. In fact, arrow-site is currently missing the
most recent two patches that modified the site directory in
apache/arrow. Having multiple manual deploy steps increases the
likelihood that the website stays stale.

As someone who has been working on the arrow site lately, this
proposal makes it easier for me to make changes to the website because
I can automatically deploy my changes to a test site, and that lets
others in the community, who perhaps don't touch the website much,
verify that they're good.

I agree that the documentation situation needs attention, but as I
said initially, that's orthogonal to this static site generation. I'd
like to work on that next, and I think these changes will make it
easier to do. I would not propose moving doc generation out of
apache/arrow--that belongs with the code.

Neal

On Sat, Aug 3, 2019 at 9:49 AM Wes McKinney  wrote:
>
> I think that the project website and the project documentation are
> currently distinct entities. The current Jekyll website is independent
> from the Sphinx documentation project aside from a link to the
> documentation from the website.
>
> I am guessing that we would want to maintain some amount of separation
> between the main site at arrow.apache.org and the code / format
> documentation, at minimum because we may want to make documentation
> available for multiple versions of the project (this has already been
> cited as an issue -- when we release, we're overwriting the previous
> version of the docs).
>
> On Sat, Aug 3, 2019 at 11:33 AM Antoine Pitrou  wrote:
> >
> >
> > I am concerned about this.  What happens if we happen to move part of the
> > current site to e.g. the Sphinx docs in the Arrow repository (we already
> > did that, so it's not theoretical)?
> >
> > More generally, I also think that any move towards separating website
> > and code repo more will lead to an even less maintained website.
> >
> > Regards
> >
> > Antoine.
> >
> >
> > On 02/08/2019 at 22:39, Wes McKinney wrote:
> > > hi Neal,
> > >
> > > In general the improvements to the site sound good, and I agree with
> > > moving the site into the apache/arrow-site repository.
> > >
> > > It sounds like a committer will have to volunteer a PAT for the Travis
> > > CI settings in
> > >
> > > https://travis-ci.org/apache/arrow-site/settings
> > >
> > > Even though you can't get at such an environment variable there after
> > > it's set, it could still technically be compromised. Personally I
> > > wouldn't be comfortable having a token with "repo" scope out there. We
> > > might need to think about this some more -- the general idea of making
> > > it easier to deploy the website I'm totally on board with.
> > >
> > > - Wes
> > >
> > >
> > > On Fri, Aug 2, 2019 at 1:35 PM Neal Richardson
> > >  wrote:
> > >>
> > >> Hi all,
> > >> https://issues.apache.org/jira/browse/ARROW-5746 requested to move the
> > >> source for https://arrow.apache.org out of `apache/arrow` due to the
> > >> growing number of binary files (mostly images) there.
> > >>
> > >> https://issues.apache.org/jira/browse/ARROW-4473 requested
> > >> improvements to the ability to make a test deploy of the website and
> > >> noted challenges/bugs in trying to do this when the site `baseurl` is
> > >> a subdirectory.
> > >>
> > >> On my fork of `arrow-site` [1] I have a solution to both. I created a
> > >> `master` branch and copied the contents of the `site/` directory in
> > >> `apache/arrow` to that, using `git filter-branch --prune-empty
> > >> --subdirectory-filter site master` to preserve the commit history [2].
> > >> Then I added a build script [3] that gets executed by Travis-CI [4].
> > >>
> > >> The script builds the Jekyll site and pushes it to a branch that gets
> > >> published. On `apache/arrow-site`, commits to the `master` branch
> > >> trigger a build of the Jekyll site and push the result to the
> > >> `asf-site` branch. On forks, commits to `master` build the site and
> > >> publish to the `gh-pages` branch, which can deploy to GitHub Pages.
> > >>
> > >> ## Features
> > >>
> > >> * Automatic building of the arrow.apache.org site whenever changes are
> > >> made to the Jekyll source--no manual build step required.
> > >> * Automatic building of a test site from your fork, which will enable
> > >> reviewers to verify your changes without having to build and serve
> > >> locally and trust that what works locally will work when deployed.
> > >> * Relative URL problems are fixed: links work regardless of whether
> > >> the "base URL" is top level or a subdirectory.
> > >> * Reduced size of the core `apache/arrow` repository
> > >> * Documentation publishing is not affected. Updating the contents of
> > >> the `docs/` directory in the published `asf-site` branch can continue
> > >> to happen by whatever other process. The automatic building and
> > >> publishing of the Jekyll site does not overwrite the 

Re: Proposal to move website source to arrow-site, add automatic builds

2019-08-03 Thread Wes McKinney
I think that the project website and the project documentation are
currently distinct entities. The current Jekyll website is independent
from the Sphinx documentation project aside from a link to the
documentation from the website.

I am guessing that we would want to maintain some amount of separation
between the main site at arrow.apache.org and the code / format
documentation, at minimum because we may want to make documentation
available for multiple versions of the project (this has already been
cited as an issue -- when we release, we're overwriting the previous
version of the docs).

On Sat, Aug 3, 2019 at 11:33 AM Antoine Pitrou  wrote:
>
>
> I am concerned about this.  What happens if we happen to move part of the
> current site to e.g. the Sphinx docs in the Arrow repository (we already
> did that, so it's not theoretical)?
>
> More generally, I also think that any move towards separating website
> and code repo more will lead to an even less maintained website.
>
> Regards
>
> Antoine.
>
>
> On 02/08/2019 at 22:39, Wes McKinney wrote:
> > hi Neal,
> >
> > In general the improvements to the site sound good, and I agree with
> > moving the site into the apache/arrow-site repository.
> >
> > It sounds like a committer will have to volunteer a PAT for the Travis
> > CI settings in
> >
> > https://travis-ci.org/apache/arrow-site/settings
> >
> > Even though you can't get at such an environment variable there after
> > it's set, it could still technically be compromised. Personally I
> > wouldn't be comfortable having a token with "repo" scope out there. We
> > might need to think about this some more -- the general idea of making
> > it easier to deploy the website I'm totally on board with.
> >
> > - Wes
> >
> >
> > On Fri, Aug 2, 2019 at 1:35 PM Neal Richardson
> >  wrote:
> >>
> >> Hi all,
> >> https://issues.apache.org/jira/browse/ARROW-5746 requested to move the
> >> source for https://arrow.apache.org out of `apache/arrow` due to the
> >> growing number of binary files (mostly images) there.
> >>
> >> https://issues.apache.org/jira/browse/ARROW-4473 requested
> >> improvements to the ability to make a test deploy of the website and
> >> noted challenges/bugs in trying to do this when the site `baseurl` is
> >> a subdirectory.
> >>
> >> On my fork of `arrow-site` [1] I have a solution to both. I created a
> >> `master` branch and copied the contents of the `site/` directory in
> >> `apache/arrow` to that, using `git filter-branch --prune-empty
> >> --subdirectory-filter site master` to preserve the commit history [2].
> >> Then I added a build script [3] that gets executed by Travis-CI [4].
> >>
> >> The script builds the Jekyll site and pushes it to a branch that gets
> >> published. On `apache/arrow-site`, commits to the `master` branch
> >> trigger a build of the Jekyll site and push the result to the
> >> `asf-site` branch. On forks, commits to `master` build the site and
> >> publish to the `gh-pages` branch, which can deploy to GitHub Pages.
> >>
> >> ## Features
> >>
> >> * Automatic building of the arrow.apache.org site whenever changes are
> >> made to the Jekyll source--no manual build step required.
> >> * Automatic building of a test site from your fork, which will enable
> >> reviewers to verify your changes without having to build and serve
> >> locally and trust that what works locally will work when deployed.
> >> * Relative URL problems are fixed: links work regardless of whether
> >> the "base URL" is top level or a subdirectory.
> >> * Reduced size of the core `apache/arrow` repository
> >> * Documentation publishing is not affected. Updating the contents of
> >> the `docs/` directory in the published `asf-site` branch can continue
> >> to happen by whatever other process. The automatic building and
> >> publishing of the Jekyll site does not overwrite the `docs/`
> >> directory.
> >>
> >> ## Usage
> >>
> >> Local development and serving of the Jekyll site is not affected by
> >> this build process--it works exactly the same as before, just located
> >> in the `arrow-site` repository instead of the `site/` directory of
> >> `apache/arrow`.
> >>
> >> To enable the automatic building on your fork, there are a couple of
> >> quick setup steps to enable GitHub Pages and Travis-CI, described here
> >> [5].
> >>
> >> In order to set up the automatic deploy on `apache/arrow-site`, a
> >> committer will need to set a GITHUB_PAT there. I imagine there could
> >> be some hesitation about doing this, but it is safe because:
> >>
> >> 1. Builds only happen on the master branch, and only committers can
> >> modify the master branch, so by accepting a patch to `master`, they're
> >> implicitly accepting a patch to `asf-site`
> >> 2. Malicious actors can't modify the build script in a pull request
> >> and use the token because Travis does "not provide [repository-setting
> >> environment variables] to untrusted builds, triggered by pull requests
> >> from another repository" [6]
> >> 3. 

Re: Proposal to move website source to arrow-site, add automatic builds

2019-08-03 Thread Antoine Pitrou


I am concerned about this.  What happens if we happen to move part of the
current site to e.g. the Sphinx docs in the Arrow repository (we already
did that, so it's not theoretical)?

More generally, I also think that any move towards separating website
and code repo more will lead to an even less maintained website.

Regards

Antoine.


On 02/08/2019 at 22:39, Wes McKinney wrote:
> hi Neal,
> 
> In general the improvements to the site sound good, and I agree with
> moving the site into the apache/arrow-site repository.
> 
> It sounds like a committer will have to volunteer a PAT for the Travis
> CI settings in
> 
> https://travis-ci.org/apache/arrow-site/settings
> 
> Even though you can't get at such an environment variable there after
> it's set, it could still technically be compromised. Personally I
> wouldn't be comfortable having a token with "repo" scope out there. We
> might need to think about this some more -- the general idea of making
> it easier to deploy the website I'm totally on board with.
> 
> - Wes
> 
> 
> On Fri, Aug 2, 2019 at 1:35 PM Neal Richardson
>  wrote:
>>
>> Hi all,
>> https://issues.apache.org/jira/browse/ARROW-5746 requested to move the
>> source for https://arrow.apache.org out of `apache/arrow` due to the
>> growing number of binary files (mostly images) there.
>>
>> https://issues.apache.org/jira/browse/ARROW-4473 requested
>> improvements to the ability to make a test deploy of the website and
>> noted challenges/bugs in trying to do this when the site `baseurl` is
>> a subdirectory.
>>
>> On my fork of `arrow-site` [1] I have a solution to both. I created a
>> `master` branch and copied the contents of the `site/` directory in
>> `apache/arrow` to that, using `git filter-branch --prune-empty
>> --subdirectory-filter site master` to preserve the commit history [2].
>> Then I added a build script [3] that gets executed by Travis-CI [4].
>>
>> The script builds the Jekyll site and pushes it to a branch that gets
>> published. On `apache/arrow-site`, commits to the `master` branch
>> trigger a build of the Jekyll site and push the result to the
>> `asf-site` branch. On forks, commits to `master` build the site and
>> publish to the `gh-pages` branch, which can deploy to GitHub Pages.
>>
>> ## Features
>>
>> * Automatic building of the arrow.apache.org site whenever changes are
>> made to the Jekyll source--no manual build step required.
>> * Automatic building of a test site from your fork, which will enable
>> reviewers to verify your changes without having to build and serve
>> locally and trust that what works locally will work when deployed.
>> * Relative URL problems are fixed: links work regardless of whether
>> the "base URL" is top level or a subdirectory.
>> * Reduced size of the core `apache/arrow` repository
>> * Documentation publishing is not affected. Updating the contents of
>> the `docs/` directory in the published `asf-site` branch can continue
>> to happen by whatever other process. The automatic building and
>> publishing of the Jekyll site does not overwrite the `docs/`
>> directory.
>>
>> ## Usage
>>
>> Local development and serving of the Jekyll site is not affected by
>> this build process--it works exactly the same as before, just located
>> in the `arrow-site` repository instead of the `site/` directory of
>> `apache/arrow`.
>>
>> To enable the automatic building on your fork, there are a couple of
>> quick setup steps to enable GitHub Pages and Travis-CI, described here
>> [5].
>>
>> In order to set up the automatic deploy on `apache/arrow-site`, a
>> committer will need to set a GITHUB_PAT there. I imagine there could
>> be some hesitation about doing this, but it is safe because:
>>
>> 1. Builds only happen on the master branch, and only committers can
>> modify the master branch, so by accepting a patch to `master`, they're
>> implicitly accepting a patch to `asf-site`
>> 2. Malicious actors can't modify the build script in a pull request
>> and use the token because Travis does "not provide [repository-setting
>> environment variables] to untrusted builds, triggered by pull requests
>> from another repository" [6]
>> 3. Non-committers cannot access the Travis-CI settings to alter the
>> GITHUB_PAT (and even committers cannot view the value of the token
>> once it is set)
>> 4. IIUC there is still a manual action required to get the ASF to
>> update arrow.apache.org with the contents of the `asf-site` branch
>>
>> While it would be useful, it is not required that we enable automatic
>> deploy on `apache/arrow-site` in order to get benefit from this
>> proposal because this enables contributors to opt in to deploying test
>> sites from their forks, and those test sites will actually work.
>>
>> Let me know if you have any questions or concerns. If there are no
>> objections, then to proceed I'll need a committer to create an orphan
>> `master` branch on `apache/arrow-site`, and then I can make a pull
>> request to that, which we'd want to merge 

Re: Proposal to move website source to arrow-site, add automatic builds

2019-08-02 Thread Neal Richardson
Cool. Happy to defer the PAT on apache/arrow-site for now while we
evaluate options. As I mentioned, that's not a hard requirement for
this proposal--we still get benefit by having the source in arrow-site
and enabling contributors to auto-deploy test sites from their forks
if they're comfortable with storing a GitHub token in Travis.

Neal

On Fri, Aug 2, 2019 at 1:40 PM Wes McKinney  wrote:
>
> hi Neal,
>
> In general the improvements to the site sound good, and I agree with
> moving the site into the apache/arrow-site repository.
>
> It sounds like a committer will have to volunteer a PAT for the Travis
> CI settings in
>
> https://travis-ci.org/apache/arrow-site/settings
>
> Even though you can't get at such an environment variable there after
> it's set, it could still technically be compromised. Personally I
> wouldn't be comfortable having a token with "repo" scope out there. We
> might need to think about this some more -- the general idea of making
> it easier to deploy the website I'm totally on board with.
>
> - Wes
>
>
> On Fri, Aug 2, 2019 at 1:35 PM Neal Richardson
>  wrote:
> >
> > Hi all,
> > https://issues.apache.org/jira/browse/ARROW-5746 requested to move the
> > source for https://arrow.apache.org out of `apache/arrow` due to the
> > growing number of binary files (mostly images) there.
> >
> > https://issues.apache.org/jira/browse/ARROW-4473 requested
> > improvements to the ability to make a test deploy of the website and
> > noted challenges/bugs in trying to do this when the site `baseurl` is
> > a subdirectory.
> >
> > On my fork of `arrow-site` [1] I have a solution to both. I created a
> > `master` branch and copied the contents of the `site/` directory in
> > `apache/arrow` to that, using `git filter-branch --prune-empty
> > --subdirectory-filter site master` to preserve the commit history [2].
> > Then I added a build script [3] that gets executed by Travis-CI [4].
> >
> > The script builds the Jekyll site and pushes it to a branch that gets
> > published. On `apache/arrow-site`, commits to the `master` branch
> > trigger a build of the Jekyll site and push the result to the
> > `asf-site` branch. On forks, commits to `master` build the site and
> > publish to the `gh-pages` branch, which can deploy to GitHub Pages.
> >
> > ## Features
> >
> > * Automatic building of the arrow.apache.org site whenever changes are
> > made to the Jekyll source--no manual build step required.
> > * Automatic building of a test site from your fork, which will enable
> > reviewers to verify your changes without having to build and serve
> > locally and trust that what works locally will work when deployed.
> > * Relative URL problems are fixed: links work regardless of whether
> > the "base URL" is top level or a subdirectory.
> > * Reduced size of the core `apache/arrow` repository
> > * Documentation publishing is not affected. Updating the contents of
> > the `docs/` directory in the published `asf-site` branch can continue
> > to happen by whatever other process. The automatic building and
> > publishing of the Jekyll site does not overwrite the `docs/`
> > directory.
> >
> > ## Usage
> >
> > Local development and serving of the Jekyll site is not affected by
> > this build process--it works exactly the same as before, just located
> > in the `arrow-site` repository instead of the `site/` directory of
> > `apache/arrow`.
> >
> > To enable the automatic building on your fork, there are a couple of
> > quick setup steps to enable GitHub Pages and Travis-CI, described here
> > [5].
> >
> > In order to set up the automatic deploy on `apache/arrow-site`, a
> > committer will need to set a GITHUB_PAT there. I imagine there could
> > be some hesitation about doing this, but it is safe because:
> >
> > 1. Builds only happen on the master branch, and only committers can
> > modify the master branch, so by accepting a patch to `master`, they're
> > implicitly accepting a patch to `asf-site`
> > 2. Malicious actors can't modify the build script in a pull request
> > and use the token because Travis does "not provide [repository-setting
> > environment variables] to untrusted builds, triggered by pull requests
> > from another repository" [6]
> > 3. Non-committers cannot access the Travis-CI settings to alter the
> > GITHUB_PAT (and even committers cannot view the value of the token
> > once it is set)
> > 4. IIUC there is still a manual action required to get the ASF to
> > update arrow.apache.org with the contents of the `asf-site` branch
> >
> > While it would be useful, it is not required that we enable automatic
> > deploy on `apache/arrow-site` in order to get benefit from this
> > proposal because this enables contributors to opt in to deploying test
> > sites from their forks, and those test sites will actually work.
> >
> > Let me know if you have any questions or concerns. If there are no
> > objections, then to proceed I'll need a committer to create an orphan
> > `master` branch on 

Re: Proposal to move website source to arrow-site, add automatic builds

2019-08-02 Thread Wes McKinney
hi Neal,

In general the improvements to the site sound good, and I agree with
moving the site into the apache/arrow-site repository.

It sounds like a committer will have to volunteer a PAT for the Travis
CI settings in

https://travis-ci.org/apache/arrow-site/settings

Even though you can't get at such an environment variable there after
it's set, it could still technically be compromised. Personally I
wouldn't be comfortable having a token with "repo" scope out there. We
might need to think about this some more -- the general idea of making
it easier to deploy the website I'm totally on board with.

- Wes


On Fri, Aug 2, 2019 at 1:35 PM Neal Richardson
 wrote:
>
> Hi all,
> https://issues.apache.org/jira/browse/ARROW-5746 requested to move the
> source for https://arrow.apache.org out of `apache/arrow` due to the
> growing number of binary files (mostly images) there.
>
> https://issues.apache.org/jira/browse/ARROW-4473 requested
> improvements to the ability to make a test deploy of the website and
> noted challenges/bugs in trying to do this when the site `baseurl` is
> a subdirectory.
>
> On my fork of `arrow-site` [1] I have a solution to both. I created a
> `master` branch and copied the contents of the `site/` directory in
> `apache/arrow` to that, using `git filter-branch --prune-empty
> --subdirectory-filter site master` to preserve the commit history [2].
> Then I added a build script [3] that gets executed by Travis-CI [4].
>
> The script builds the Jekyll site and pushes it to a branch that gets
> published. On `apache/arrow-site`, commits to the `master` branch
> trigger a build of the Jekyll site and push the result to the
> `asf-site` branch. On forks, commits to `master` build the site and
> publish to the `gh-pages` branch, which can deploy to GitHub Pages.
>
> ## Features
>
> * Automatic building of the arrow.apache.org site whenever changes are
> made to the Jekyll source--no manual build step required.
> * Automatic building of a test site from your fork, which will enable
> reviewers to verify your changes without having to build and serve
> locally and trust that what works locally will work when deployed.
> * Relative URL problems are fixed: links work regardless of whether
> the "base URL" is top level or a subdirectory.
> * Reduced size of the core `apache/arrow` repository
> * Documentation publishing is not affected. Updating the contents of
> the `docs/` directory in the published `asf-site` branch can continue
> to happen by whatever other process. The automatic building and
> publishing of the Jekyll site does not overwrite the `docs/`
> directory.
>
> ## Usage
>
> Local development and serving of the Jekyll site is not affected by
> this build process--it works exactly the same as before, just located
> in the `arrow-site` repository instead of the `site/` directory of
> `apache/arrow`.
>
> To enable the automatic building on your fork, there are a couple of
> quick setup steps to enable GitHub Pages and Travis-CI, described here
> [5].
>
> In order to set up the automatic deploy on `apache/arrow-site`, a
> committer will need to set a GITHUB_PAT there. I imagine there could
> be some hesitation about doing this, but it is safe because:
>
> 1. Builds only happen on the master branch, and only committers can
> modify the master branch, so by accepting a patch to `master`, they're
> implicitly accepting a patch to `asf-site`
> 2. Malicious actors can't modify the build script in a pull request
> and use the token because Travis does "not provide [repository-setting
> environment variables] to untrusted builds, triggered by pull requests
> from another repository" [6]
> 3. Non-committers cannot access the Travis-CI settings to alter the
> GITHUB_PAT (and even committers cannot view the value of the token
> once it is set)
> 4. IIUC there is still a manual action required to get the ASF to
> update arrow.apache.org with the contents of the `asf-site` branch
>
> While it would be useful, it is not required that we enable automatic
> deploy on `apache/arrow-site` in order to get benefit from this
> proposal because this enables contributors to opt in to deploying test
> sites from their forks, and those test sites will actually work.
>
> Let me know if you have any questions or concerns. If there are no
> objections, then to proceed I'll need a committer to create an orphan
> `master` branch on `apache/arrow-site`, and then I can make a pull
> request to that, which we'd want to merge without squashing in order
> to preserve the git history of the site from `apache/arrow`.
>
> Thanks,
> Neal
>
> [1] https://github.com/nealrichardson/arrow-site/
> [2] https://github.com/nealrichardson/arrow-site/commits/master
> [3] https://github.com/nealrichardson/arrow-site/blob/master/build-and-deploy.sh
> [4] https://github.com/nealrichardson/arrow-site/blob/master/.travis.yml
> [5] https://github.com/nealrichardson/arrow-site/tree/master#previewing-the-site
> [6] 
> 

Proposal to move website source to arrow-site, add automatic builds

2019-08-02 Thread Neal Richardson
Hi all,
https://issues.apache.org/jira/browse/ARROW-5746 requested to move the
source for https://arrow.apache.org out of `apache/arrow` due to the
growing number of binary files (mostly images) there.

https://issues.apache.org/jira/browse/ARROW-4473 requested
improvements to the ability to make a test deploy of the website and
noted challenges/bugs in trying to do this when the site `baseurl` is
a subdirectory.

On my fork of `arrow-site` [1] I have a solution to both. I created a
`master` branch and copied the contents of the `site/` directory in
`apache/arrow` to that, using `git filter-branch --prune-empty
--subdirectory-filter site master` to preserve the commit history [2].
Then I added a build script [3] that gets executed by Travis-CI [4].
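
To illustrate the history-preserving copy described above, here is a minimal sketch that runs the same `git filter-branch` invocation on a throwaway repository. The scratch directory, the `demo-repo` name, the file names, and the commit messages are all made up for the example; only the `filter-branch` flags come from the text. It assumes `git` is installed.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch repository standing in for a clone of apache/arrow.
workdir=$(mktemp -d)
cd "$workdir"
git init -q demo-repo
cd demo-repo
git config user.email "dev@example.com"
git config user.name "Example Dev"
# Point HEAD at "master" explicitly so the sketch does not depend on the
# local default branch name.
git symbolic-ref HEAD refs/heads/master

# Three commits: two touch site/, one touches only cpp/.
mkdir site cpp
echo "hello" > site/index.md
echo "int main() {}" > cpp/main.cc
git add .
git commit -q -m "Add site and code"
echo "more" >> site/index.md
git commit -q -am "Update site"
echo "// change" >> cpp/main.cc
git commit -q -am "Update code only"

# Rewrite history so that the contents of site/ become the repository root.
# Commits that never touched site/ do not survive the rewrite.
# The environment variable silences the warning newer git versions print.
FILTER_BRANCH_SQUELCH_WARNING=1 \
  git filter-branch --prune-empty --subdirectory-filter site master

# Inspect the rewritten history and the new top-level contents.
git log --format=%s
ls
```

After the rewrite, the log should list only the commits that touched `site/`, and `index.md` should sit at the top level of the working tree.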

The script builds the Jekyll site and pushes it to a branch that gets
published. On `apache/arrow-site`, commits to the `master` branch
trigger a build of the Jekyll site and push the result to the
`asf-site` branch. On forks, commits to `master` build the site and
publish to the `gh-pages` branch, which can deploy to GitHub Pages.
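
The branch selection described above could be sketched roughly as follows. This is not the actual build script [3]; the `target_branch` function and the fallback slug are hypothetical. `TRAVIS_REPO_SLUG` is the variable Travis-CI sets to `owner/repo` for the current build.

```shell
#!/bin/sh
# Sketch only: choose the publish branch from the repository slug.
# "apache/arrow-site", "asf-site", and "gh-pages" come from the proposal;
# the function name and its argument are illustrative.
target_branch() {
  if [ "$1" = "apache/arrow-site" ]; then
    echo "asf-site"
  else
    echo "gh-pages"
  fi
}

# The fallback value only makes the sketch runnable outside Travis-CI.
slug="${TRAVIS_REPO_SLUG:-someuser/arrow-site}"
echo "Would publish $slug to branch: $(target_branch "$slug")"
```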

## Features

* Automatic building of the arrow.apache.org site whenever changes are
made to the Jekyll source--no manual build step required.
* Automatic building of a test site from your fork, which will enable
reviewers to verify your changes without having to build and serve
locally and trust that what works locally will work when deployed.
* Relative URL problems are fixed: links work regardless of whether
the "base URL" is top level or a subdirectory.
* Reduced size of the core `apache/arrow` repository
* Documentation publishing is not affected. Updating the contents of
the `docs/` directory in the published `asf-site` branch can continue
to happen by whatever other process. The automatic building and
publishing of the Jekyll site does not overwrite the `docs/`
directory.
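
One way to handle the "base URL" difference between the top-level site and a fork served from a subdirectory is to pass the value to Jekyll at build time. The sketch below assumes forks are served as GitHub Pages project sites under `/<repo>`; the `site_baseurl` helper is hypothetical and not taken from the build script [3]. It prints the `jekyll build --baseurl` command rather than running it, so it works without Jekyll installed.

```shell
#!/bin/sh
# Hypothetical helper: derive a Jekyll baseurl from an "owner/repo" slug.
site_baseurl() {
  case "$1" in
    apache/*) echo "" ;;          # assumed to be served from the domain root
    */*)      echo "/${1#*/}" ;;  # assumed project page under /<repo>
    *)        echo "" ;;          # anything else: no prefix
  esac
}

# The fallback slug only makes the sketch runnable outside Travis-CI.
slug="${TRAVIS_REPO_SLUG:-someuser/arrow-site}"
echo "jekyll build --baseurl \"$(site_baseurl "$slug")\""
```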

## Usage

Local development and serving of the Jekyll site is not affected by
this build process--it works exactly the same as before, just located
in the `arrow-site` repository instead of the `site/` directory of
`apache/arrow`.

To enable the automatic building on your fork, there are a couple of
quick setup steps to enable GitHub Pages and Travis-CI, described here
[5].

In order to set up the automatic deploy on `apache/arrow-site`, a
committer will need to set a GITHUB_PAT there. I imagine there could
be some hesitation about doing this, but it is safe because:

1. Builds only happen on the master branch, and only committers can
modify the master branch, so by accepting a patch to `master`, they're
implicitly accepting a patch to `asf-site`
2. Malicious actors can't modify the build script in a pull request
and use the token because Travis does "not provide [repository-setting
environment variables] to untrusted builds, triggered by pull requests
from another repository" [6]
3. Non-committers cannot access the Travis-CI settings to alter the
GITHUB_PAT (and even committers cannot view the value of the token
once it is set)
4. IIUC there is still a manual action required to get the ASF to
update arrow.apache.org with the contents of the `asf-site` branch
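
Points 1 and 2 could also be enforced inside the script itself as a second line of defense. The following is a minimal sketch, not the actual build script [3]; the `should_deploy` function is hypothetical. On Travis-CI, `TRAVIS_BRANCH` names the branch being built and `TRAVIS_PULL_REQUEST` is `false` for push builds and the pull request number otherwise.

```shell
#!/bin/sh
# Sketch: deploy only for push builds on master.
# Arguments: branch name, pull-request indicator ("false" for push builds).
should_deploy() {
  [ "$1" = "master" ] && [ "$2" = "false" ]
}

# The fallbacks only make the sketch runnable outside Travis-CI.
if should_deploy "${TRAVIS_BRANCH:-master}" "${TRAVIS_PULL_REQUEST:-false}"; then
  echo "deploy"
else
  echo "skip deploy"
fi
```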

While it would be useful, it is not required that we enable automatic
deploy on `apache/arrow-site` in order to get benefit from this
proposal because this enables contributors to opt in to deploying test
sites from their forks, and those test sites will actually work.

Let me know if you have any questions or concerns. If there are no
objections, then to proceed I'll need a committer to create an orphan
`master` branch on `apache/arrow-site`, and then I can make a pull
request to that, which we'd want to merge without squashing in order
to preserve the git history of the site from `apache/arrow`.

Thanks,
Neal

[1] https://github.com/nealrichardson/arrow-site/
[2] https://github.com/nealrichardson/arrow-site/commits/master
[3] https://github.com/nealrichardson/arrow-site/blob/master/build-and-deploy.sh
[4] https://github.com/nealrichardson/arrow-site/blob/master/.travis.yml
[5] https://github.com/nealrichardson/arrow-site/tree/master#previewing-the-site
[6] https://docs.travis-ci.com/user/environment-variables/#defining-variables-in-repository-settings