Re: Continuous benchmarking setup
I know the tool we are using for Python benchmarks is Python-specific -- it would be interesting to see if there's a way to ingest benchmark output (as JSON or some other format) from other programming languages.
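For illustration, a hedged sketch of that idea: a tiny, language-neutral JSON layout that any benchmark harness could emit, plus a Python loader that normalizes it for a publishing step. The schema and names here are hypothetical, not an existing ASV format.

```python
import json
from dataclasses import dataclass
from typing import List

@dataclass
class BenchmarkResult:
    name: str             # e.g. "Table#serialize"
    unit: str             # e.g. "seconds"
    samples: List[float]  # raw timings for one benchmark
    commit: str           # git SHA the benchmarks ran against

def load_results(path: str) -> List[BenchmarkResult]:
    """Read a language-neutral results file (hypothetical layout):

    {"commit": "<sha>",
     "benchmarks": [{"name": "...", "unit": "...", "samples": [...]}]}
    """
    with open(path) as f:
        doc = json.load(f)
    return [
        BenchmarkResult(b["name"], b.get("unit", "seconds"),
                        b["samples"], doc["commit"])
        for b in doc["benchmarks"]
    ]
```

Any language's harness (JS, C++, Java) could write such a file; only the loader and the downstream publishing step would need to be shared.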
Re: Continuous benchmarking setup
Is anyone aware of a way we could set up similar continuous benchmarks for JS? We wrote some benchmarks earlier this year but currently have no automated way of running them.

Brian
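For illustration, one hedged way to get the JS benchmarks running unattended: a small Python wrapper that invokes the suite and archives timestamped JSON output, triggered nightly from cron. The npm script name and paths are placeholders, not the actual arrow/js setup.

```python
import datetime
import pathlib
import subprocess

RESULTS_DIR = pathlib.Path("js-bench-results")  # hypothetical archive location

def run_js_benchmarks():
    """Run the JS benchmark suite once and archive its JSON output."""
    RESULTS_DIR.mkdir(exist_ok=True)
    stamp = datetime.datetime.utcnow().strftime("%Y%m%dT%H%M%SZ")
    out_path = RESULTS_DIR / (stamp + ".json")
    # "npm run perf" is a placeholder for whatever script emits JSON results.
    with open(out_path, "w") as out:
        subprocess.run(["npm", "run", "perf", "--silent"],
                       cwd="js", stdout=out, check=True)

if __name__ == "__main__":
    run_js_benchmarks()  # e.g. "0 2 * * *" in crontab
```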
Re: Continuous benchmarking setup
Thanks Tom and Antoine!

Since these benchmarks are literally running on a machine in my closet at home, there may be some downtime in the future. At some point we should document a process of setting up a new machine from scratch to be the nightly bare metal benchmark slave.

- Wes
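One step such a from-scratch setup document would likely include is Antoine's "SMT/HyperThreading disabled" requirement. A hedged sketch (Linux, requires root) that leaves one logical CPU per physical core online via the long-standing sysfs topology interface:

```python
import glob

def parse_cpu_list(text):
    """Expand a kernel CPU list like '0,4' or '0-3' into a sorted list."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)

def disable_smt():
    """Take hyperthread siblings offline, keeping one logical CPU per core."""
    for path in glob.glob(
            "/sys/devices/system/cpu/cpu*/topology/thread_siblings_list"):
        try:
            with open(path) as f:
                siblings = parse_cpu_list(f.read())
        except OSError:
            continue  # CPU already taken offline in an earlier iteration
        for cpu in siblings[1:]:  # keep the first sibling of each core
            with open("/sys/devices/system/cpu/cpu%d/online" % cpu, "w") as f:
                f.write("0")

if __name__ == "__main__":
    disable_smt()
```

Newer kernels also expose a single /sys/devices/system/cpu/smt/control switch, which is simpler where available.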
Re: Continuous benchmarking setup
Hi again,

Tom has configured the benchmarking machine to run and publish Arrow's ASV-based benchmarks. The latest results can now be seen at:
https://pandas.pydata.org/speed/arrow/

I expect these are regenerated on a regular (daily?) basis.

Thanks Tom :-)

Regards

Antoine.
Re: Continuous benchmarking setup
Currently, there are 3 snowflakes :)

- Benchmark setup: https://github.com/TomAugspurger/asv-runner
  + Some setup to bootstrap a clean install with airflow, conda, asv, supervisor, etc. -- all the infrastructure around running the benchmarks.
  + Each project adds itself to the list of benchmarks, as in https://github.com/TomAugspurger/asv-runner/pull/3. Then things are re-deployed. Deployment requires ansible and an SSH key for the benchmark machine.
- Benchmark publishing: after running all the benchmarks, the results are collected and pushed to https://github.com/tomaugspurger/asv-collection (a sketch of this step follows below).
- Benchmark hosting: a cron job on the server hosting the pandas docs pulls https://github.com/tomaugspurger/asv-collection and serves it from the `/speed` directory.

There are many things that could be improved here, but I personally won't have time in the near term. Happy to assist though.
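To make the "collected and pushed" publishing stage above concrete, a hedged sketch of what it might look like; the directory layout and branch name are guesses, not the actual asv-runner code:

```python
import pathlib
import shutil
import subprocess

# Hypothetical paths; the real asv-runner layout may differ.
HTML_DIR = pathlib.Path("results/html")      # output of `asv publish`
COLLECTION = pathlib.Path("asv-collection")  # clone of the publishing repo
PROJECT = "arrow"

def publish():
    """Copy freshly generated ASV HTML into the collection repo and push."""
    dest = COLLECTION / PROJECT
    if dest.exists():
        shutil.rmtree(dest)
    shutil.copytree(HTML_DIR, dest)
    subprocess.run(["git", "add", "-A", PROJECT], cwd=COLLECTION, check=True)
    # `git commit` exits nonzero when there is nothing new, so don't check it.
    subprocess.run(["git", "commit", "-m", "Update %s benchmarks" % PROJECT],
                   cwd=COLLECTION, check=False)
    subprocess.run(["git", "push", "origin", "master"],
                   cwd=COLLECTION, check=True)

if __name__ == "__main__":
    publish()
```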
Re: Continuous benchmarking setup
hi Tom -- is the publishing workflow for this documented someplace, or available in a GitHub repo? We want to make sure we don't accumulate any "snowflakes" in the development process.

thanks!
Wes

On Fri, Apr 13, 2018 at 8:36 AM, Tom Augspurger <tom.augspurge...@gmail.com> wrote:
> They are run daily and published to http://pandas.pydata.org/speed/
Re: Continuous benchmarking setup
Nice! Are the benchmark results published somewhere?
Re: Continuous benchmarking setup
https://github.com/TomAugspurger/asv-runner/ is the setup for the projects currently running. Adding arrow to https://github.com/TomAugspurger/asv-runner/blob/master/tests/full.yml might work. I'll have to redeploy with the update.
Re: Continuous benchmarking setup
hi Antoine,

I have a bare metal machine at home (affectionately known as the "pandabox") that's available via SSH and that we've been using for continuous benchmarking for other projects. Arrow is welcome to use it. I can give you access to the machine if you would like. Hopefully, we can suitably document the process of setting up a continuous benchmarking machine so that if we need to migrate to a new machine, it is not too much of a hardship to do so.

Thanks
Wes
Continuous benchmarking setup
Hello,

With the following changes, it seems we might reach the point where we're able to run the Python-based benchmark suite across multiple commits (at least those commits that are not older than these changes):
https://github.com/apache/arrow/pull/1775

To make this truly useful, we would need a dedicated host. Ideally a (Linux) OS running on bare metal, with SMT/HyperThreading disabled. If running virtualized, the VM should have dedicated physical CPU cores.

That machine would run the benchmarks on a regular basis (perhaps once per night) and publish the results in static HTML form somewhere.

(note: a nice-to-have in the future might be access to NVidia hardware, but right now there are no CUDA benchmarks in the Python benchmarks)

What should be the procedure here?

Regards

Antoine.
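For concreteness, a minimal sketch of the nightly cycle described above, assuming ASV is the harness and cron the scheduler; the schedule and working directory are placeholders:

```python
import subprocess

def nightly():
    """Benchmark commits ASV has not measured yet, then rebuild the site.

    Run from a checkout containing asv.conf.json, e.g. via cron:
        0 3 * * *  cd /path/to/arrow/python && python nightly.py
    """
    # "NEW" asks asv to benchmark commits it has not measured yet.
    subprocess.run(["asv", "run", "NEW"], check=True)
    # Regenerates the static HTML report (written to ./html by default).
    subprocess.run(["asv", "publish"], check=True)

if __name__ == "__main__":
    nightly()
```

The resulting `html/` directory is static, so "publish the results in static HTML form somewhere" can be as simple as syncing it to any web server.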