Using the solo thread pool is good to get a one-off coverage report. The one you linked to [3] can be used to talk about the state of pulp smash at that point in time. I spot checked the report and the areas that I knew were incorrectly reported as uncovered before are shown as covered in [3] so that is great.

For a permanent fix to the coverage problem, I recommend not putting the workaround in place on Jenkins. The test environment needs to run exactly the same as the production infrastructure. As one example, there could be meaningful problems that occur in forking celery environments which would not be discovered in the test environment which would not fork.

In talking with @asksol he indicates that this is not a known issue in celery. Can an issue with a simple reproducer be filed against celery and we can pick up the technical conversation on that bug? Maybe reply on this thread with a link?

Thank you for all the work that you've done on this issue. This is not an easy problem. I really appreciate your effort here.

-Brian


On 08/29/2016 08:40 PM, Elyezer Rezende wrote:
All,

During this month of August I've been working on making the coverage
report we have for Pulp Automation better. I got stuck many times during
the process but I've finally found something that helped improving it.

The current coverage report is not reliable since any of the Celery
tasks have its coverage data being captured. Because that we had some
guesses on what could be causing that, we suspected that for some reason
the coverage hook was not being installed and so the data not being
captured, also we suspected that coverage was not playing good on how
Celery forks its processes.

With those two guesses in mind, I've read Celery and Coverage source
code and also got some help from the community. This allowed me to
discover that Celery uses a forked multiprocessing library called
Billiard [1], knowing that then I started checking if coverage.py was
able to capture coverage data multiprocessing processes [2] and created
a sample Django + Celery application to test some assumptions. The first
one I tested was if coverage.py was really able to capture coverage data
from decorated functions (since celery tasks are decorated functions)
ran by the celery workers, the first try failed since I had to make a
call from a celery task and celery tasks was not being captured at all
which made also the simple decorated function data not be captured. So
my first assumption have failed and I got stuck one more time on the
coverage data not being captured issue. Continuing the research I found
the POOL_CLS configuration for the celery workers, that allowed me to
change how celery deals with concurrency and it happened that using
"solo" or "threads" the coverage data was finally captured.

Next I checked how coverage.py deals with the multiprocessing built-in
library and discovered that it does a patch [3]. That made me realize
that maybe something similar for the Billiard library will be necessary.
Since just changing the POOL_CLS config works and is easier than
creating a patch for the Billiard plugin to capture coverage data I
dropped this and will proceed only if necessary.

I checked Pulp code and updated the services that spawn celery processes
to use threads as the POOL_CLS (this also required to install the
threadpool Python package), then I tried to run the Pulp Smash
test_sync_publish for RPM over API, and it was passing before the change
and started failing due to celery workers not being able to proper
handle the process' signals since they are only received by the parent
process. I did not investigate more about why it was failing and gave
"solo" a try, it happened that it worked and I was able to proper
capture coverage data for the pulp.server.async.tasks module which was
not being captured before, finally!

I went ahead and ran all the Pulp Smash tests which produced the
coverage report [4], that has more captured data from the one available
on Jenkins. Now I need your help on identifying if there is any missing
data to be captured still. Please let me know if you think any module
should have more coverage data than presented by the updated report.

Finally I am about to send a PR to the pulp repository in order to
update the coverage hook with the changes needed in order to set the
POOL_CLS to solo. The pulp worker template [5] and the
pulp_resource_manager.service [6] need to be patched, again, please let
me know if I am missing something and should patch any other file.

PS.: I would like to thank all the help I received from Pulp developers
with all my newbie questions about Pulp design. A special thanks to Sean
who created the first version of the coverage data and provided some
initial input about what was missing, Brian who spent about an hour with
me explaining how Pulp workers and resource manager work and also to
provide some insights about what could be the cause for the coverage
data not being captured and Jeremy who listened me all the times I was
frustrated by being stuck. I hope this improved coverage report will
bring us value and the information we need in order to ensure Pulp is
well tested by Pulp Smash.

[1] https://github.com/celery/billiard
[2] http://coverage.readthedocs.io/en/coverage-4.2/subprocess.html
[3]
https://bitbucket.org/ned/coveragepy/src/257e52793fb0f28853ddca679f67b158107262bf/coverage/multiproc.py
[4] 
https://dl.dropboxusercontent.com/u/109792/pulp-2-11-coverage-report-20160829/index.html
[5] 
https://github.com/pulp/pulp/blob/master/server/pulp/server/async/manage_workers.py#L23-L25
[6] 
https://github.com/pulp/pulp/blob/master/server/usr/lib/systemd/system/pulp_resource_manager.service#L10-L12

--
Elyézer Rezende
Senior Quality Engineer
irc: elyezer


_______________________________________________
Pulp-dev mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/pulp-dev


_______________________________________________
Pulp-dev mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/pulp-dev

Reply via email to