Thanks everyone for your input. I've created an issue for tracking the
increased sanity build time, but this should be treated as a separate
project.
https://github.com/apache/incubator-mxnet/issues/17945
In the meantime, to keep momentum going on the staggered build pipeline
project, please let
You can use the docker cache images. They're available on Dockerhub;
you just have to tweak the docker run command.
The thing is that the scripts CI uses are designed with the intention that
layers don't change, and thus the cache is used.
If you want to be able to change the layers, then you have to accept the
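For local use, pulling the published cache image and reusing its layers could look roughly like this (the mxnetci/build.ubuntu_cpu image name, tag, and Dockerfile path are assumptions for illustration, not confirmed from this thread):

```shell
# Pull the published cache image from Docker Hub (image name is hypothetical).
docker pull mxnetci/build.ubuntu_cpu:latest

# Reuse its layers as the build cache when building locally.
docker build --cache-from mxnetci/build.ubuntu_cpu:latest \
    -f ci/docker/Dockerfile.build.ubuntu_cpu \
    -t local/build.ubuntu_cpu ci/docker

# Run a build inside the resulting container.
docker run --rm -v "$(pwd)":/work local/build.ubuntu_cpu /work/ci/build.sh
```

The key is `--cache-from`: it tells docker build to consider the pulled image's layers as cache sources, which is how the CI scripts avoid rebuilding unchanged layers.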
Sure. That's the fix for now.
But I've noticed that when that's done and there's no process to enforce
upgrades and patching, these images get really out of date and the problems
compound.
Plus, when I build locally using docker, I can never seem to get the
benefit of the cache. Or at least not in the
What about dependency pinning?
The cache should not be our method to do dependency pinning and
synchronization.
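Pinning would mean fixing exact versions at install time, instead of relying on whatever version a cached layer happened to capture. A minimal sketch (the package and version numbers are made up for illustration):

```shell
# Unpinned: whatever is newest at build time ends up in the image,
# so an uncached rebuild can silently pick up a breaking release.
pip3 install pylint

# Pinned: rebuilds are reproducible whether or not the layer cache hits.
# (Version numbers are illustrative, not taken from the thread.)
pip3 install pylint==2.4.4 cpplint==1.4.5
```

With pins in place, cache invalidation only affects build time, not which dependency versions the build sees.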
-Marco
Aaron Markham wrote on Fri, Mar 27, 2020, 03:45:
I'm dealing with a Ruby dep breaking the site build right now.
I wish this would happen on occasions that I choose, not whenever Ruby or
dependency x releases a new version. When the cache expires for Jekyll, the
site won't publish anymore... and CI will be blocked for the website test.
If we built the base
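One way to keep the Jekyll toolchain from moving underneath the site build, independent of the docker cache, is to pin it in the Gemfile and commit the lockfile. A sketch (the version number is illustrative):

```shell
# Pin the site generator instead of floating on the latest release
# (the 4.0.0 version is illustrative, not taken from the thread).
cat >> Gemfile <<'EOF'
gem "jekyll", "4.0.0"
EOF

# bundle resolves the full dependency tree and records exact versions
# in Gemfile.lock; committing that lockfile makes CI installs reproducible.
bundle install
git add Gemfile Gemfile.lock
```

Then a new Ruby gem release can't break the site build until someone deliberately updates the lockfile.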
Correct. But I'm surprised about 2:50min to pull down the images.
Maybe it makes sense to use ECR as mirror?
-Marco
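Mirroring the cache images into ECR, as suggested above, could be sketched like this (the account ID, region, and image name are placeholders):

```shell
# Authenticate docker to ECR (account ID and region are placeholders).
aws ecr get-login-password --region us-west-2 \
  | docker login --username AWS --password-stdin \
      123456789012.dkr.ecr.us-west-2.amazonaws.com

# Mirror the Docker Hub cache image into ECR.
docker pull mxnetci/build.ubuntu_cpu:latest
docker tag mxnetci/build.ubuntu_cpu:latest \
    123456789012.dkr.ecr.us-west-2.amazonaws.com/build.ubuntu_cpu:latest
docker push 123456789012.dkr.ecr.us-west-2.amazonaws.com/build.ubuntu_cpu:latest
```

Since the Jenkins workers run in AWS, pulling from a same-region ECR mirror should cut the 2m50s pull time considerably.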
Joe Evans wrote on Thu, Mar 26, 2020, 22:02:
+1 on rebuilding the containers regularly without caching layers.
We are both pulling down a bunch of docker layers (when docker pulls an
image) and then building a new container to run the sanity build in.
Pulling down all the layers is what is taking so long (2m50s). Within the
docker build, all
The job which rebuilds the cache has a property where you can set whether
to rebuild the cache from scratch or not. You could duplicate that job,
disable publishing and enable rebuild. Then add an alarm to the result and
you should be golden.
-Marco
Lausen, Leonard wrote on Thu, Mar 26, 2020,
WRT Docker Cache: We need to add a mechanism to invalidate the cache and rebuild
the containers on a set schedule. The builds break too often, and the breakage is
only detected when a contributor touches the Dockerfiles (manually causing cache
invalidation).
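A minimal version of that schedule could be a cron entry that rebuilds with the cache disabled, so breakage surfaces on a fixed cadence instead of on a contributor's unrelated PR. A sketch (the repository path, cadence, image name, and the notify-alarm command are all assumptions):

```shell
# Rebuild the CI containers from scratch every Sunday at 03:00,
# bypassing all cached layers so dependency breakage shows up on schedule.
# (Paths, cadence, and the notify-alarm command are illustrative.)
0 3 * * 0  cd /opt/incubator-mxnet && docker build --no-cache \
    -f ci/docker/Dockerfile.build.ubuntu_cpu \
    -t mxnetci/build.ubuntu_cpu:latest ci/docker \
    || notify-alarm "scheduled docker cache rebuild failed"
```

`--no-cache` forces every layer to be rebuilt, which is exactly the invalidation a contributor would otherwise trigger by touching the Dockerfile.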
On Thu, 2020-03-26 at 16:06 -0400, Aaron
I think it is a good idea to do the sanity check first. Even at 10 minutes.
And also try to fix the docker cache situation, but those can be separate
tasks.
On Thu, Mar 26, 2020, 12:52 Marco de Abreu wrote:
Jenkins doesn't load for me, so let me ask this way: are we actually
rebuilding every single time or do you mean the docker cache? Pulling the
cache should only take a few seconds from my experience - docker build
should be a no-op in most cases.
-Marco
Joe Evans wrote on Thu, Mar 26, 2020,
The sanity-lint check pulls a docker image cache, builds a new container
and runs inside. The docker setup is taking around 3 minutes, at least:
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fsanity/detail/master/1764/pipeline/39
We could improve this by not h
Do you know what's driving the duration for sanity? It used to be 50 sec
execution and 60 sec preparation.
-Marco
Joe Evans wrote on Thu, Mar 26, 2020, 20:31:
Thanks Marco and Aaron for your input.
> Can you show by how much the duration will increase?
The average sanity build time is around 10min, while the average build time
for unix-cpu is about 2 hours, so the entire build pipeline would increase
by 2 hours if we required both unix-cpu and sanity t
Back then I created a system which exports all Jenkins results to
CloudWatch. It does not include individual test results, but rather stages
and jobs. The data for the sanity check should be available there.
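Exporting a per-job result to CloudWatch amounts to one CLI call per finished build; a sketch (the namespace, metric name, and dimension are made up for illustration):

```shell
# Publish one datapoint per finished build: 1 for failure, 0 for success.
# Namespace, metric name, and dimension are illustrative.
aws cloudwatch put-metric-data \
    --namespace "MXNetCI" \
    --metric-name "BuildFailed" \
    --dimensions Name=Job,Value=sanity \
    --value 1
```

Averaging that metric over time in CloudWatch then gives the per-job failure rate directly.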
Something I'd also be curious about is the percentage of the failures in
one run. Spe
+1 for sanity check - that's fast.
-1 for unix-cpu - that's slow and can just hang.
So my suggestion would be to look at the data separately - what's the failure
rate on the sanity check and on unix-cpu? Actually, can we get a
table of all of the tests with this data?!
If the sanity check fails... let's s
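The per-job failure rate can also be pulled straight from the Jenkins JSON API; a sketch using curl and jq (the Jenkins host is the one linked earlier in the thread, but the job path and the use of the builds[result] tree are assumptions):

```shell
# Fetch recent build results for a job and compute its failure rate.
# Requires curl and jq; the job path is illustrative.
JENKINS=http://jenkins.mxnet-ci.amazon-ml.com
curl -sg "$JENKINS/job/mxnet-validation/job/sanity/job/master/api/json?tree=builds[result]" \
  | jq '[.builds[].result] | {
          total: length,
          failed: (map(select(. == "FAILURE")) | length),
          failure_rate: ((map(select(. == "FAILURE")) | length) / length)
        }'
```

Running the same query per job (sanity, unix-cpu, ...) would produce the table suggested above.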
We had this structure in the past and the community was bothered by CI
taking more time, thus we moved to the current model with everything
parallelized. We'd basically revert that then.
Can you show by how much the duration will increase?
Also, we have zero test parallelisation, meaning we are run
Hi,
First, I just wanted to introduce myself to the MXNet community. I’m Joe
and will be working with Chai and the AWS team to improve some issues
around MXNet CI. One of our goals is to reduce the costs associated with
running MXNet CI. The task I’m working on now is this issue:
https://github