[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2022-10-28 Thread thcipriani
thcipriani edited projects, added Release-Engineering-Team (Blocking 隣); 
removed Release-Engineering-Team (CI & Testing services), 
Release-Engineering-Team-TODO (2020-10-01 to 2020-12-31 (Q2)).
thcipriani closed this task as "Resolved".
thcipriani added a comment.


  Calling this resolved. I took a look at a handful of selenium jobs for a few 
repos (core, wikibase), in each instance `INFO:backend.MySQL:Terminating MySQL` 
seems to take less than a second now. So either our iops quotas are tuned well, 
or one of the version changes of something is working well.

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar, thcipriani
Cc: aborrero, dcaro, Aklapper, dduvall, bd808, jeena, Daimona, thcipriani, 
hashar, awight, kostajh, Lucas_Werkmeister_WMDE, Jersione, Hellket777, 
LisafBia6531, Astuthiodit_1, 786, TheReadOnly, Biggs657, karapayneWMDE, 
Invadibot, maantietaja, Juan90264, Alter-paule, Beast1978, ItamarWMDE, Un1tY, 
Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, NebulousIris, 
Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, 
GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, Maathavan, _jensen, 
rosalieper, Neuronton, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2022-01-20 Thread hashar
hashar closed subtask T299211: Request increased quota for integration Cloud 
VPS project as Resolved.

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar
Cc: aborrero, dcaro, Aklapper, dduvall, bd808, jeena, Daimona, thcipriani, 
hashar, awight, kostajh, Lucas_Werkmeister_WMDE, 786, TheReadOnly, Suran38, 
Biggs657, Invadibot, Lalamarie69, maantietaja, Juan90264, Alter-paule, 
Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, 
NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, 
GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, Maathavan, _jensen, 
rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, Jdforrester-WMF, 
Mbch331, Jay8g
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2022-01-14 Thread Stashbot
Stashbot added a comment.


  Mentioned in SAL (#wikimedia-releng) [2022-01-14T13:49:34Z]  
Restarting all CI Docker agents via Horizon to apply new flavor settings 
T265615  T299211 


TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar, Stashbot
Cc: aborrero, dcaro, Aklapper, dduvall, bd808, jeena, Daimona, thcipriani, 
hashar, awight, kostajh, Lucas_Werkmeister_WMDE, 786, TheReadOnly, Suran38, 
Biggs657, Invadibot, Lalamarie69, maantietaja, Juan90264, Alter-paule, 
Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, 
NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, 
GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, Maathavan, _jensen, 
rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, Jdforrester-WMF, 
Mbch331, Jay8g
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2022-01-14 Thread hashar
hashar added a subtask: T299211: Request increased quota for integration Cloud 
VPS project.

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar
Cc: aborrero, dcaro, Aklapper, dduvall, bd808, jeena, Daimona, thcipriani, 
hashar, awight, kostajh, Lucas_Werkmeister_WMDE, 786, TheReadOnly, Suran38, 
Biggs657, Invadibot, Lalamarie69, maantietaja, Juan90264, Alter-paule, 
Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, 
NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, 
GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, Maathavan, _jensen, 
rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, Jdforrester-WMF, 
Mbch331, Jay8g
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2022-01-14 Thread dcaro
dcaro added a comment.


  I've added a burst configuration for 3x the iops for 10s for that flavor, 
might require stopping and starting (not restarting) the VMs:
  
root@cloudvirt1024:~# openstack --os-project-id integration flavor show 
g2.cores8.ram24.disk80.4xiops

++--+
| Field  | Value


|

++--+
| OS-FLV-DISABLED:disabled   | True 


|
| OS-FLV-EXT-DATA:ephemeral  | 0


|
| access_project_ids | integration  


|
| disk   | 80   


|
| id | c86647d3-d157-43fa-9849-2bd16ef27a5f 


|
| name   | g2.cores8.ram24.disk80.4xiops


|
| os-flavor-access:is_public | False


|
| properties | aggregate_instance_extra_specs:ceph='true', 
quota:disk_read_iops_sec='2', quota:disk_total_bytes_sec='8', 
quota:disk_write_iops_sec='2000', quota:disk_write_iops_sec_max='6', 
quota:disk_write_iops_sec_max_length='10' |
| ram| 24576


|
| rxtx_factor| 1.0  


|
| swap   |  


|
| vcpus  | 8


|

++--+
  
  Can you try that and report back if there was any change?
  We can try increasing the base limit instead if this does not work

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar, dcaro
Cc: aborrero, dcaro, Aklapper, dduvall, bd808, jeena, Daimona, 

[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2022-01-14 Thread hashar
hashar added subscribers: dcaro, aborrero.
hashar added a comment.


  I have talked to @aborrero and @dcaro about the limits.  They are inclined to 
further tune the limit like has been done last year by Andrew ( T266777#6589231 
 ) and tentatively raise 
them.

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar
Cc: aborrero, dcaro, Aklapper, dduvall, bd808, jeena, Daimona, thcipriani, 
hashar, awight, kostajh, Lucas_Werkmeister_WMDE, 786, TheReadOnly, Suran38, 
Biggs657, Invadibot, Lalamarie69, maantietaja, Juan90264, Alter-paule, 
Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, 
NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, 
GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, Maathavan, _jensen, 
rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, Jdforrester-WMF, 
Mbch331, Jay8g
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2022-01-13 Thread hashar
hashar added a comment.


  @awight potentially though `tmpfs` goes in memory and is expensive. A run 
with core + Wikibase and all the npm dependencies turn out to be around 9G iirc 
which is a lot to put in RAM.
  
  I was looking at a stall build today and it `iotop` definitely showed up the 
Docker daemon being IO busy. Ideally the container destroying should be done 
asynchronously rather than holding and preventing `docker run` from completing 
but that is how things are :/
  
  On the "hardware" side files are written to the instance "disk" which is 
backed up by Ceph. There is some IO throttling there. If we could find a way to 
make the underlying storage to be less synchronous or accept data loss, we 
would certainly save a bunch of time.  I have read some presentation about 
making CephFS (no idea whether we use that) which indicates all unlinks are 
synchronous, deleting the myriads of files installed by npm would definitely 
explain the slowness :-\

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar
Cc: Aklapper, dduvall, bd808, jeena, Daimona, thcipriani, hashar, awight, 
kostajh, Lucas_Werkmeister_WMDE, 786, TheReadOnly, Suran38, Biggs657, 
Invadibot, Lalamarie69, maantietaja, Juan90264, Alter-paule, Beast1978, Un1tY, 
Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, NebulousIris, 
Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, 
GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, Maathavan, _jensen, 
rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, Jdforrester-WMF, 
Mbch331, Jay8g
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2021-11-15 Thread awight
awight added a comment.


  This is just a walk-by comment so feel free to ignore, but didn't we have a 
nearly identical issue last year, and the solution was to put all these little 
files in a tmpfs which could simply be unmounted when the container is 
destroyed?  If this sounds like something to pursue, I could track down the 
details...

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar, awight
Cc: Aklapper, dduvall, bd808, jeena, Daimona, thcipriani, hashar, awight, 
kostajh, Lucas_Werkmeister_WMDE, 786, TheReadOnly, Suran38, Biggs657, 
Invadibot, Lalamarie69, maantietaja, Juan90264, Alter-paule, Beast1978, Un1tY, 
Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, NebulousIris, 
Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, 
GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, Maathavan, _jensen, 
rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, Jdforrester-WMF, 
Mbch331, Jay8g
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2021-11-12 Thread hashar
hashar added a comment.


  I have checked in Docker and it does a `os.removeAll` which is waited for 
completion. If a container has huge amount of files that have to be deleted, 
that would file the memory cache for writes which is is known in Linux as the 
Dirty memory. From `proc(5)`:
  
  > **Dirty %lu**
  > Memory which is waiting to get written back to the disk.
  
  That is tunable via system settings:
  
name="sudo sysctl -ar 'dirty'|egrep -v "= 0$""
vm.dirty_background_ratio = 10
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500
vm.dirtytime_expire_seconds = 43200
  
  And In Graphite looking at a random agent using the metrics:
  
  | 
secondYAxis(integration.integration-agent-docker-1002.iostat.vda.writes_per_second)
 |
  | integration.integration-agent-docker-1002.memory.Dirty  
|
  |
  
  The Graphite graph 

 (green memory to be written, blue writes per seconds):
  
  F34743426: disk_write_dirty.png 
  
  I would assume the `unlinkat` are piled up in memory fil it up and at some 
point the cache is full and the unlinks are done synchronously?  Maybe if we 
raise the dirty cache limit (`vm.dirty_ratio`) we would get a larger buffer.
I don't know much about the arcane of Linux disk cache though :-\

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar
Cc: Aklapper, dduvall, bd808, jeena, Daimona, thcipriani, hashar, awight, 
kostajh, Lucas_Werkmeister_WMDE, 786, TheReadOnly, Suran38, Biggs657, 
Invadibot, Lalamarie69, maantietaja, Juan90264, Alter-paule, Beast1978, Un1tY, 
Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, NebulousIris, 
Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, 
GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, Maathavan, _jensen, 
rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, Jdforrester-WMF, 
Mbch331, Jay8g
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2021-11-12 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar, Lucas_Werkmeister_WMDE
Cc: Aklapper, dduvall, bd808, jeena, Daimona, thcipriani, hashar, awight, 
kostajh, Lucas_Werkmeister_WMDE, 786, TheReadOnly, Suran38, Biggs657, 
Invadibot, Lalamarie69, maantietaja, Juan90264, Alter-paule, Beast1978, Un1tY, 
Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, NebulousIris, 
Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, 
GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, Maathavan, _jensen, 
rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, Jdforrester-WMF, 
Mbch331, Jay8g
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2021-11-12 Thread hashar
hashar added a comment.


  Pfff I am not focusing today.   The `INFO:backend.MySQL:Terminating MySQL` is 
the last message send by Quibble and it has indeed completed (at max roughly 2 
seconds after the message got written).  What is taking age and got thoroughly 
investigated previously is the container deletion. From the above message:
  
  > T265615#6586508 
  > essentially yeah the slowness is due to the file deletion taking ages. Some 
unlinkat calls even take 100 - 200 ms. We have too many files to delete and 
there is a lot of I/O contention.
  
  Which has lead to T266777  to 
raise the disk IO quota. I guess we should tune it up, unless Docker can be 
made to manage the files deletions asynchronously.

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar
Cc: Aklapper, dduvall, bd808, jeena, Daimona, thcipriani, hashar, awight, 
kostajh, Lucas_Werkmeister_WMDE, 786, TheReadOnly, Suran38, Biggs657, 
Invadibot, Lalamarie69, maantietaja, Juan90264, Alter-paule, Beast1978, Un1tY, 
Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, NebulousIris, 
Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, 
GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, Maathavan, _jensen, 
rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, Jdforrester-WMF, 
Mbch331, Jay8g
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2021-11-12 Thread hashar
hashar added a comment.


  Quibble terminates MySQL via `quibble.backend.Backend.stop()` which does:
  
def stop(self):
if self.server is not None:
self.log.info('Terminating %s', self.__class__.__name__)
self.server.terminate()
try:
self.server.wait(2)
except subprocess.TimeoutExpired:
self.server.kill()  # SIGKILL
finally:
self.server = None
  
  `terminate` sends the `SIGTERM` signal. It might be ignored and the server 
can definitely take time to complete. We allow two seconds grace period 
(`wait(2)`) then it is send `SIGKILL` which really should terminate it abruptly 
no question asked.
  
  So I don't know why it would be holding :(  Maybe something is still running 
or holding :-\

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar
Cc: Aklapper, dduvall, bd808, jeena, Daimona, thcipriani, hashar, awight, 
kostajh, Lucas_Werkmeister_WMDE, 786, TheReadOnly, Suran38, Biggs657, 
Invadibot, Lalamarie69, maantietaja, Juan90264, Alter-paule, Beast1978, Un1tY, 
Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, NebulousIris, 
Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, 
GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, Maathavan, _jensen, 
rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, Jdforrester-WMF, 
Mbch331, Jay8g
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2021-09-23 Thread hashar
hashar reopened this task as "Open".
hashar added a comment.


  Sorry @Lucas_Werkmeister_WMDE , I have missed your comment. @kostajh raised 
it again today so reopening.
  
  21:26:14  hashar: does it seem normal for 
`INFO:backend.MySQL:Terminating MySQL` to take ~2 minutes? looking at 
https://integration.wikimedia.org/ci/job/wmf-quibble-selenium-php72-docker/114519/console

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar
Cc: Aklapper, dduvall, bd808, jeena, Daimona, thcipriani, hashar, awight, 
kostajh, Lucas_Werkmeister_WMDE, Suran38, Biggs657, Invadibot, Lalamarie69, 
maantietaja, Juan90264, Alter-paule, Beast1978, Un1tY, Akuckartz, Hook696, 
Kent7301, joker88john, CucyNoiD, Nandana, NebulousIris, Gaboe420, Giuliamocci, 
Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, QZanden, 
LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Liudvikas, Scott_WUaS, 
Wikidata-bugs, aude, Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2021-04-14 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  This seems to have been increasing again, especially in Selenium builds; some 
recent examples:
  
  - 3m29s in 
https://integration.wikimedia.org/ci/job/wmf-quibble-selenium-php72-docker/88456/console
  - 12m25s(!) in 
https://integration.wikimedia.org/ci/job/wmf-quibble-selenium-php72-docker/88759/console
  - 2m40s in 
https://integration.wikimedia.org/ci/job/wmf-quibble-selenium-php72-docker/88987/console
  - 2m42s in 
https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php72-selenium-docker/43709/console
 (for WikibaseQualityConstraints)
  
  Though it isn’t always so bad. Not sure if this is worth a new task…

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar, Lucas_Werkmeister_WMDE
Cc: Aklapper, dduvall, bd808, jeena, Daimona, thcipriani, hashar, awight, 
kostajh, Lucas_Werkmeister_WMDE, Invadibot, maantietaja, Alter-paule, 
Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, 
NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, 
GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, Maathavan, _jensen, 
rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, Jdforrester-WMF, 
Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2021-04-13 Thread Andrew
Andrew closed subtask T266777: integration instances suffer from high IO 
latency due to Ceph as Resolved.

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar, Andrew
Cc: Aklapper, dduvall, bd808, jeena, Daimona, thcipriani, hashar, awight, 
kostajh, Lucas_Werkmeister_WMDE, Invadibot, maantietaja, Alter-paule, 
Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, 
NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, 
GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, Maathavan, _jensen, 
rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, Jdforrester-WMF, 
Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-11-25 Thread hashar
hashar closed this task as "Resolved".
hashar added a comment.


  That has been more or less fixed by raising the newish IO quota for the 
integration instances.  It is still not ideal and enhancements will be made 
later on via the infrastructure task T266777 
.

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar
Cc: Aklapper, dduvall, bd808, jeena, Daimona, thcipriani, hashar, awight, 
kostajh, Lucas_Werkmeister_WMDE, Alter-paule, Beast1978, Un1tY, Akuckartz, 
Hook696, Kent7301, joker88john, CucyNoiD, Nandana, NebulousIris, Gaboe420, 
Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, 
QZanden, LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Liudvikas, 
Scott_WUaS, Wikidata-bugs, aude, Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-11-03 Thread hashar
hashar merged a task: T267075: Pipeline jobs freezing during teardown.

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar
Cc: Aklapper, dduvall, bd808, jeena, Daimona, thcipriani, hashar, awight, 
kostajh, Lucas_Werkmeister_WMDE, Alter-paule, Beast1978, Un1tY, Akuckartz, 
Hook696, Kent7301, joker88john, CucyNoiD, Nandana, NebulousIris, Gaboe420, 
Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, 
QZanden, LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Liudvikas, 
Scott_WUaS, Wikidata-bugs, aude, Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-11-03 Thread hashar
hashar merged a task: T267075: Pipeline jobs freezing during teardown.
hashar added subscribers: jeena, bd808, dduvall, Aklapper.

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar
Cc: Aklapper, dduvall, bd808, jeena, Daimona, thcipriani, hashar, awight, 
kostajh, Lucas_Werkmeister_WMDE, Alter-paule, Beast1978, Un1tY, Akuckartz, 
Hook696, Kent7301, joker88john, CucyNoiD, Nandana, NebulousIris, Gaboe420, 
Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, 
QZanden, LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Liudvikas, 
Scott_WUaS, Wikidata-bugs, aude, Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-29 Thread hashar
hashar added a comment.


  @Lucas_Werkmeister_WMDE yes Wikibase kind of highlights the issue since it 
has so many files.  I have also noticed the disk slowness on various other jobs 
though, some started to timeout as a result due to the number of IO operations 
they perform.
  
  I thought about having the files to be written on the host via a bind mount, 
however we would still require to delete those files during the job,  and I 
guess that will still take a while to complete. It is just that we would do the 
`rm -fR` instead of Dockerd handling it for us.  So I don't think that would 
enhance anything.
  
  I am very confident that is related to the WMCS migration of storage to Ceph 
and filed the subtask T266777: integration instances suffer from high IO 
latency due to Ceph 

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar
Cc: Daimona, thcipriani, hashar, awight, kostajh, Lucas_Werkmeister_WMDE, 
Alter-paule, Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, 
CucyNoiD, Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, 
Af420, Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-29 Thread hashar
hashar added a subtask: T266777: integration instances suffer from high IO 
latency due to Ceph.

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar
Cc: Daimona, thcipriani, hashar, awight, kostajh, Lucas_Werkmeister_WMDE, 
Alter-paule, Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, 
CucyNoiD, Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, 
Af420, Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-29 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  I guess Wikibase might be hit harder than other extensions by the deletion 
slowness due to a larger volume of files in the `node_modules` of subprojects 
like Wikidata Bridge?

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar, Lucas_Werkmeister_WMDE
Cc: Daimona, thcipriani, hashar, awight, kostajh, Lucas_Werkmeister_WMDE, 
Alter-paule, Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, 
CucyNoiD, Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, 
Af420, Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-28 Thread gerritbot
gerritbot added a comment.


  Change 636908 **abandoned** by Hashar:
  [operations/puppet@production] ci: run docker with debug logging
  
  Reason:
  I have removed the patch from the integration puppet master and reloaded 
docker
  
  https://gerrit.wikimedia.org/r/636908

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar, gerritbot
Cc: Daimona, thcipriani, hashar, awight, kostajh, Lucas_Werkmeister_WMDE, 
Alter-paule, Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, 
CucyNoiD, Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, 
Af420, Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-28 Thread hashar
hashar added a comment.


  I have found out the root cause which is in the underlying infrastructure. 
Will file a task about it tomorrow.

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar
Cc: Daimona, thcipriani, hashar, awight, kostajh, Lucas_Werkmeister_WMDE, 
Alter-paule, Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, 
CucyNoiD, Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, 
Af420, Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-28 Thread hashar
hashar added a comment.


  I have looked at it live by running strace profiling against `dockerd` while 
a container was being reaped:
  
  name=strace -C -e trace=file -f -p `pidof dockerd`
% time seconds  usecs/call callserrors syscall
-- --- --- - - 
 94.500.221674   2145394 11771 unlinkat
  3.410.0080002667 3 1 umount2
  1.710.0040001000 4   fchmodat
  0.290.000674   0 11782   openat
  0.090.000220   0 11774 3 newfstatat
  0.000.00   0 4   renameat
-- --- --- - - 
100.000.234568168961 11775 total
  
  And is not even a full deletion.  But essentially yeah the slowness is due to 
the file deletion taking ages. Some `unlinkat` calls even take 100 - 200 ms. We 
have too many files to delete and there is a lot of I/O contention.

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar
Cc: Daimona, thcipriani, hashar, awight, kostajh, Lucas_Werkmeister_WMDE, 
Alter-paule, Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, 
CucyNoiD, Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, 
Af420, Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-28 Thread hashar
hashar added a comment.


  So my best assumption is that the source code / package install etc are done 
inside the Docker container. Upon removal, Docker spends a lot of time to 
delete all those files which might explain the issue.  If that is true, that 
can be reproduced by spawning a container, writing a ton of files and check 
whether it takes a long time to get it removed.
  
  The `wmf-quibble-selenium-php72-docker` job has the following volumes:
  
--volume /srv/git:/srv/git:ro 
--volume 
/srv/jenkins/workspace/workspace/wmf-quibble-selenium-php72-docker/cache:/cache 
--volume 
/srv/jenkins/workspace/workspace/wmf-quibble-selenium-php72-docker/log:/workspace/log
 
  
  Quibble runs in `/workspace` and clones / install under `/workspace/src`. 
That is not mounted and everything thus happens in the container.   So I guess 
it might be all about mounting `/workspace` from the host and have a publisher 
to delete it once the job has completed.
  
  There is something off in the JJB macros.

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar
Cc: Daimona, thcipriani, hashar, awight, kostajh, Lucas_Werkmeister_WMDE, 
Alter-paule, Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, 
CucyNoiD, Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, 
Af420, Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-28 Thread hashar
hashar added a comment.


  Looks like the container is stuck in `removing` status. For a build with:
  
13:09:39 INFO:backend.MySQL:Terminating MySQL
13:10:33 [wmf-quibble-selenium-php72-docker] $ /bin/bash -xe 
/tmp/jenkins4716707043969520657.sh
  
  Running `docker container inspect` in a loop shows:
  
{
  "Status": "removing",
  "Running": false,
  "Paused": false,
  "Restarting": false,
  "OOMKilled": false,
  "Dead": true,
  "Pid": 0,
  "ExitCode": 0,
  "Error": "",
  "StartedAt": "2020-10-28T12:42:43.482992712Z",
  "FinishedAt": "2020-10-28T13:09:41.834741168Z"
}
  
  The 2 seconds difference is Quibble waiting for MySQL to terminate before 
issuing a SIGKILL.The inspect loop stopped at `13:10:33` when the container 
was no more known to Docker.

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar
Cc: Daimona, thcipriani, hashar, awight, kostajh, Lucas_Werkmeister_WMDE, 
Alter-paule, Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, 
CucyNoiD, Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, 
Af420, Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-28 Thread gerritbot
gerritbot added a comment.


  Change 636908 had a related patch set uploaded (by Hashar; owner: Hashar):
  [operations/puppet@production] ci: run docker with debug logging
  
  https://gerrit.wikimedia.org/r/636908

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar, gerritbot
Cc: Daimona, thcipriani, hashar, awight, kostajh, Lucas_Werkmeister_WMDE, 
Alter-paule, Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, 
CucyNoiD, Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, 
Af420, Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-28 Thread hashar
hashar added a comment.


  No with child tracing (`-f`) and time spent in system call (`-T`). PID 23094 
invokes `epoll_pwait` which resumes after 3 seconds for whatever reason.
  
  name=strace -s 128 -tt -yy -T -f
[pid 23092] 10:21:51.923319 write(2, 
"time=\"2020-10-28T10:21:51Z\" level=debug msg=\"[hijack] End of stdout\"\n", 
69) = 69 <0.28>
[pid 23092] 10:21:51.923443 futex(0xc00032e848, FUTEX_WAKE_PRIVATE, 1 

[pid 23094] 10:21:51.923498 <... futex resumed> ) = 0 <0.146771>
[pid 23092] 10:21:51.923531 <... futex resumed> ) = 1 <0.64>
[pid 23094] 10:21:51.923631 epoll_pwait(4, [], 128, 
0, NULL, 0) = 0 <0.17>
[pid 23092] 10:21:51.923711 epoll_pwait(4,  

[pid 23094] 10:21:51.923772 epoll_pwait(4,  

[pid 23092] 10:21:51.923823 <... epoll_pwait resumed> [], 128, 0, NULL, 
824637311584) = 0 <0.71>
[pid 23092] 10:21:51.923866 futex(0xc00032e4c8, FUTEX_WAIT_PRIVATE, 0, NULL 

[pid 23086] 10:21:51.923916 <... futex resumed> ) = 0 <0.147012>
[pid 23086] 10:21:51.923964 nanosleep({tv_sec=0, tv_nsec=2}, NULL) = 0 
<0.000114>
[pid 23086] 10:21:51.924144 futex(0x55ab98437a90, FUTEX_WAIT_PRIVATE, 0, 
{tv_sec=60, tv_nsec=0}  
[pid 23094] 10:21:55.008992 <... epoll_pwait resumed> [{EPOLLIN|EPOLLOUT, 
{u32=2505785096, u64=140237482966792}}], 128, -1, NULL, 3) = 1 <3.085160>
[pid 23094] 10:21:55.009155 futex(0x55ab98437a90, FUTEX_WAKE_PRIVATE, 1) = 
1 <0.65>
[pid 23086] 10:21:55.009303 <... futex resumed> ) = 0 <3.085131>

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar
Cc: Daimona, thcipriani, hashar, awight, kostajh, Lucas_Werkmeister_WMDE, 
Alter-paule, Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, 
CucyNoiD, Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, 
Af420, Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-28 Thread hashar
hashar added a comment.


  With `TINY_VERBOSITY=3` and `docker -D -l debug run`:
  
08:58:27 [INFO  tini (1)] Main child exited normally (with status '1')
08:58:28 time="2020-10-28T08:58:28Z" level=debug msg="[hijack] End of 
stdout"
08:58:38 Build step 'Execute shell' marked build as failure
08:58:38 Archiving artifacts
  
  I had strace attached to the `docker run` process:
  
lsof -p 25752
COMMAND   PID   USER   FD  TYPE DEVICE  SIZE/OFF
 NODE NAME
docker  25752 jenkins-deploy  cwd   DIR  253,1  4096   
788000 /srv/jenkins/workspace/workspace/integration-quibble-fullrun
docker  25752 jenkins-deploy  rtd   DIR  254,3  4096
2 /
docker  25752 jenkins-deploy  txt   REG  254,3  88967576   
396341 /usr/bin/docker
docker  25752 jenkins-deploy  mem   REG  254,3 102803208   
262167 /var/cache/nscd/passwd
docker  25752 jenkins-deploy  mem   REG  254,3   1689360  
1048605 /lib/x86_64-linux-gnu/libc-2.24.so
docker  25752 jenkins-deploy  mem   REG  254,3 14640  
1048608 /lib/x86_64-linux-gnu/libdl-2.24.so
docker  25752 jenkins-deploy  mem   REG  254,3135440  
1048634 /lib/x86_64-linux-gnu/libpthread-2.24.so
docker  25752 jenkins-deploy  mem   REG  254,3153288  
1048601 /lib/x86_64-linux-gnu/ld-2.24.so
docker  25752 jenkins-deploy0r FIFO   0,10   0t0 
37856380 pipe
docker  25752 jenkins-deploy1w FIFO   0,10   0t0 
37856381 pipe
docker  25752 jenkins-deploy2w FIFO   0,10   0t0 
37856381 pipe
docker  25752 jenkins-deploy3u unix 0x9c0f4c584800   0t0 
37858369 type=STREAM
docker  25752 jenkins-deploy4u  a_inode   0,11 0
 7575 [eventpoll]
docker  25752 jenkins-deploy5u unix 0x9c0f40ca7400   0t0 
37848939 type=STREAM
docker  25752 jenkins-deploy6u unix 0x9c0f29f3e000   0t0 
37855997 type=STREAM
docker  25752 jenkins-deploy   63r FIFO   0,10   0t0 
37858367 pipe
  
  File descriptor 5 comes from container-shim and is the process output.  1 is 
stdout and 2 is stderr which are read by the Jenkins console.
  
  name=strace -tt -yy
# Python traceback
08:58:27.167448 read(537856385]>, 
"\2\0\0\0\0\0\4\262Traceback (most recent c"..., 32777) = 1210
08:58:27.167629 write(2, "Traceback (most recent call 
last"..., 1202) = 1202
08:58:27.168045 read(537856385]>, 0xc00060c000, 32777) = 
-1 EAGAIN (Resource temporarily unavailable)
08:58:27.168210 epoll_pwait(4, [], 128, 0, NULL, 0) 
= 0
08:58:27.168356 epoll_pwait(4, [{EPOLLIN|EPOLLOUT, 
{u32=2869452344, u64=140645868527160}}], 128, -1, NULL, 0) = 1
08:58:27.216842 futex(0x56457b0f20d0, FUTEX_WAKE_PRIVATE, 1) = 1
08:58:27.216992 futex(0x56457b0f1fd0, FUTEX_WAKE_PRIVATE, 1) = 1
# tiny output
08:58:27.217106 read(537856385]>, "\1\0\0\0\0\0\n\32[INFO  
tini (1)] Spawned"..., 32777) = 2594
08:58:27.217293 write(1, "[INFO  tini (1)] Spawned child 
p"..., 2586) = 2586
08:58:27.217471 read(537856385]>, 0xc00060c000, 32777) = 
-1 EAGAIN (Resource temporarily unavailable)

08:58:27.217603 epoll_pwait(4, [], 128, 0, NULL, 0) 
= 0
08:58:27.217800 epoll_pwait(4, 
[{EPOLLIN|EPOLLOUT|EPOLLHUP|EPOLLRDHUP, {u32=2869452344, 
u64=140645868527160}}], 128, -1, NULL, 0) = 1
08:58:28.230175 futex(0x56457b0f20d0, FUTEX_WAKE_PRIVATE, 1) = 1
08:58:28.230378 futex(0x56457b0f1fd0, FUTEX_WAKE_PRIVATE, 1) = 1
08:58:28.230586 read(537856385]>, "", 32777) = 0
08:58:28.231167 ioctl(2, TCGETS, 0xc0002286b4) = -1 ENOTTY 
(Inappropriate ioctl for device)

# Docker run debug message: time="2020-10-28T08:58:28Z" level=debug 
msg="[hijack] End of stdout"
08:58:28.231596 openat(AT_FDCWD, "/etc//localtime", O_RDONLY) = 
7
08:58:28.231833 read(7, 
"TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0\0"..., 4096) = 127
08:58:28.232004 read(7, "", 4096) = 0
08:58:28.232167 close(7) = 0
08:58:28.232403 write(2, "time=\"2020-10-28T08:58:28Z\" 
leve"..., 69) = 69

08:58:28.232624 futex(0xc0004f2148, FUTEX_WAKE_PRIVATE, 1) = 1
08:58:28.232860 epoll_pwait(4, [], 128, 0, NULL, 
824640192576) = 0
08:58:28.233009 futex(0x56457b0f2d88, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
08:58:31.637180 futex(0x56457b0f2d88, FUTEX_WAIT_PRIVATE, 0, NULL) = 0

# five seconds delay??

08:58:36.637075 futex(0x56457b0f20c0, FUTEX_WAKE_PRIVATE, 1) = 0
08:58:36.637223 futex(0x56457b0f1fd0, FUTEX_WAKE_PRIVATE, 1) = 1
08:58:36.641284 futex(0x56457b0f2d88, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 
EAGAIN (Resource temporarily unavailable)
08:58:36.641482 futex(0x56457b0f20e8, FUTEX_WAKE_PRIVATE, 1) = 0
08:58:36.641624 futex(0x56457b0f1fd0, FUTEX_WAKE_PRIVATE, 1) = 1

[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-28 Thread gerritbot
gerritbot added a comment.


  Change 634481 **abandoned** by Hashar:
  [integration/quibble@master] (DO NOT SUBMIT) test dummy full run
  
  Reason:
  
  https://gerrit.wikimedia.org/r/634481

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar, gerritbot
Cc: Daimona, thcipriani, hashar, awight, kostajh, Lucas_Werkmeister_WMDE, 
Alter-paule, Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, 
CucyNoiD, Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, 
Af420, Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-27 Thread hashar
hashar added a comment.


  I have done various hacking here and there nothing conclusive.
  
  I have rolled back:
  
  - https://integration.wikimedia.org/ci/job/integration-quibble-fullrun/
  - Dockerd
  
  ---
  
  Dockerd running with `docker: true` gives a lot more logs in the journal.  At 
the end of the run, the container is reaped as far as dockerd knows about.  The 
issue seems to be betwen the docker run client and the docker daemon.
  
  Some extra finding: we run the containers with `--init`, that instructs 
Docker to spawn the container with `docker-init` as PID #1 and it is in charge 
of passing signals and killing all process / zombies etc.  Passing 
`TINI_VERBOSITY=3` to the container gives a bit more information. Example of me 
killing php:
  
13:37:57 INFO:backend.MySQL:Terminating MySQL

# expected 2 seconds wait

13:37:59 subprocess.CalledProcessError: Command '['php', 
'tests/phpunit/phpunit.php', '--group', 'Database', '--exclude-group', 
'Broken,ParserFuzz,Stub,Standalone', '--log-junit', 
'/workspace/log/junit-db.xml']' returned non-zero exit status -15
13:37:59 [INFO  tini (1)] Spawned child process '/src/utils/ci-fullrun.sh' 
with pid '7'
13:37:59 [DEBUG tini (1)] Received SIGCHLD


13:37:59 [DEBUG tini (1)] Reaped child with pid: '267'

# long wait which is the subject of this bug

13:38:05 + forensics
...
  
  `forensics` is a result of a `trap forensics EXIT` in the shell script that 
invokes `docker run`.
  
  The above definitely confirms to me that it is an issue with docker run not 
exiting immediately after the container has finished.  I have no idea how to 
further debug that though.   Maybe the docker command line client has some 
logging/debugging.  I looked for the sources but couldn't find them under 
https://github.com/moby/moby

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar
Cc: Daimona, thcipriani, hashar, awight, kostajh, Lucas_Werkmeister_WMDE, 
Alter-paule, Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, 
CucyNoiD, Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, 
Af420, Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-27 Thread Stashbot
Stashbot added a comment.


  Mentioned in SAL (#wikimedia-releng) [2020-10-27T13:19:54Z]  Enabling 
puppet again on CI agent (that also causes Docker/containerd to be rolled back) 
# T265615 

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar, Stashbot
Cc: Daimona, thcipriani, hashar, awight, kostajh, Lucas_Werkmeister_WMDE, 
Alter-paule, Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, 
CucyNoiD, Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, 
Af420, Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-27 Thread Stashbot
Stashbot added a comment.


  Mentioned in SAL (#wikimedia-releng) [2020-10-27T13:20:30Z]  Enable 
Docker daemon debug mode on integration-agent-docker-1001 via Horizon 
https://horizon.wikimedia.org/project/instances/837fff63-ed52-4c2f-823d-936801056fd2/
  # T265615 

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar, Stashbot
Cc: Daimona, thcipriani, hashar, awight, kostajh, Lucas_Werkmeister_WMDE, 
Alter-paule, Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, 
CucyNoiD, Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, 
Af420, Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-27 Thread gerritbot
gerritbot added a comment.


  Change 636621 **merged** by jenkins-bot:
  [integration/config@master] jjb: disable workspace size collection
  
  https://gerrit.wikimedia.org/r/636621

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar, gerritbot
Cc: Daimona, thcipriani, hashar, awight, kostajh, Lucas_Werkmeister_WMDE, 
Alter-paule, Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, 
CucyNoiD, Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, 
Af420, Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-27 Thread gerritbot
gerritbot added a comment.


  Change 636621 had a related patch set uploaded (by Hashar; owner: Hashar):
  [integration/config@master] jjb: disable workspace size collection
  
  https://gerrit.wikimedia.org/r/636621

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar, gerritbot
Cc: Daimona, thcipriani, hashar, awight, kostajh, Lucas_Werkmeister_WMDE, 
Alter-paule, Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, 
CucyNoiD, Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, 
Af420, Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-26 Thread hashar
hashar added a comment.


  Same for Docker 19.03.13 / 1.3.7 . wmf-quibble-selenium-php72-docker builds 
still have a minute delay or so.

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar
Cc: Daimona, thcipriani, hashar, awight, kostajh, Lucas_Werkmeister_WMDE, 
Alter-paule, Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, 
CucyNoiD, Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, 
Af420, Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-26 Thread Stashbot
Stashbot added a comment.


  Mentioned in SAL (#wikimedia-releng) [2020-10-26T12:00:01Z]  
Upgrading docker-ce to 19.03.13 and containerd to 1.3.7 on 
integration-agent-docker-1001, integration-agent-docker-1002, 
integration-agent-docker-1003 and integration-agent-docker-1004 # T265615 


TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar, Stashbot
Cc: Daimona, thcipriani, hashar, awight, kostajh, Lucas_Werkmeister_WMDE, 
Alter-paule, Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, 
CucyNoiD, Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, 
Af420, Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-26 Thread hashar
hashar added a comment.


  I have upgraded containerd from 1.2.10-3 to 1.2.13-2 on eight hosts but that 
unfortunately does not seem to help :-(

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar
Cc: Daimona, thcipriani, hashar, awight, kostajh, Lucas_Werkmeister_WMDE, 
Alter-paule, Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, 
CucyNoiD, Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, 
Af420, Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-26 Thread hashar
hashar added a comment.


  For later reference:
  
11:23:24 INFO:backend.MySQL:Terminating MySQL
11:24:41 [wmf-quibble-selenium-php72-docker] $ /bin/bash -xe 
/tmp/jenkins937915457220888713.sh
  
  And in containerd the log shows the process got reaped properly?:
  
Oct 26 10:56:21 integration-agent-docker-1006 containerd[666]: 
time="2020-10-26T10:56:21.739785528Z" level=info msg="shim containerd-shim 
started" 
address="/containerd-shim/moby/4c219e28b278dad4397630db404d287e6a176c9203474347b0cc2b0abded/shim.sock"
 debug=false pid=22410
Oct 26 11:23:27 integration-agent-docker-1006 containerd[666]: 
time="2020-10-26T11:23:27.106361593Z" level=info msg="shim reaped" 
id=4c219e28b278dad4397630db404d287e6a176c9203474347b0cc2b0abded

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar
Cc: Daimona, thcipriani, hashar, awight, kostajh, Lucas_Werkmeister_WMDE, 
Alter-paule, Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, 
CucyNoiD, Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, 
Af420, Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-26 Thread Stashbot
Stashbot added a comment.


  Mentioned in SAL (#wikimedia-releng) [2020-10-26T11:12:10Z]  
Upgrading containerd.io from 1.2.10 to 1.2.13-2  on 
integration-agent-docker-1005, integration-agent-docker-1006, 
integration-agent-docker-1007, integration-agent-docker-1008 # T265615 


TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar, Stashbot
Cc: Daimona, thcipriani, hashar, awight, kostajh, Lucas_Werkmeister_WMDE, 
Alter-paule, Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, 
CucyNoiD, Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, 
Af420, Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-26 Thread Stashbot
Stashbot added a comment.


  Mentioned in SAL (#wikimedia-releng) [2020-10-26T11:03:32Z]  Bring 
back integration-agent-docker-1020 . It is not the only one affected by T265615 
  which probably rules out Ceph as 
the slowness root cause (T260916 )

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar, Stashbot
Cc: Daimona, thcipriani, hashar, awight, kostajh, Lucas_Werkmeister_WMDE, 
Alter-paule, Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, 
CucyNoiD, Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, 
Af420, Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-26 Thread Stashbot
Stashbot added a comment.


  Mentioned in SAL (#wikimedia-releng) [2020-10-26T10:56:59Z]  
Upgrading containerd.io from 1.2.10 to 1.2.13-2  on 
integration-agent-docker-1001, integration-agent-docker-1002, 
integration-agent-docker-1003, integration-agent-docker-1004 # T265615 


TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar, Stashbot
Cc: Daimona, thcipriani, hashar, awight, kostajh, Lucas_Werkmeister_WMDE, 
Alter-paule, Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, 
CucyNoiD, Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, 
Af420, Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-21 Thread hashar
hashar added a comment.


  I was watching integration-agent-docker-1004 which had 3 builds involving 
Quibble. I have opened pages using the Jenkins monitoring interface that lets 
one list processes and java threads.  As soon as one showed the delay I 
refreshed the two pages.
  
  On the list of processes at 
https://integration.wikimedia.org/ci/monitoring/nodes/integration-agent-docker-1004?part=processes
 I see:
  
  - three `docker run` corresponding to the three builds.
  - a bunch of old `docker run --rm --init -v 
/srv/jenkins/workspace/workspace:/workspaces 
docker-registry.wikimedia.org/releng/ci-stretch find /workspaces -mindepth 1 
-maxdepth 1 -type d -exec du -s {} ;`.  They come from a maintenance job 
written in Groovy used to collect the disk space used by workspaces.
  - containerd only has TWO containerd-shim
  
  So the containerized process has completed, it is no more present in the 
process table but somehow containerd hasn't noticed it, docker run did not 
receive the notification or ignored it.  Which well ... sounds like a Docker 
bug?
  
  We use Debian Stretch with:
  
  apt-cache madison docker-ce containerd.io
  
  | docker-ce | 5:19.03.5~3-0~debian-stretch | 
http://apt.wikimedia.org/wikimedia stretch-wikimedia/thirdparty/ci amd64 
Packages |
  | containerd.io | 1.2.10-3 | 
http://apt.wikimedia.org/wikimedia stretch-wikimedia/thirdparty/ci amd64 
Packages |
  |

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar
Cc: thcipriani, hashar, awight, kostajh, Lucas_Werkmeister_WMDE, Alter-paule, 
Beast1978, Un1tY, Akuckartz, Hook696, darthmon_wmde, Kent7301, joker88john, 
CucyNoiD, Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, 
Af420, Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-21 Thread hashar
hashar claimed this task.

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar
Cc: thcipriani, hashar, awight, kostajh, Lucas_Werkmeister_WMDE, Alter-paule, 
Beast1978, Un1tY, Akuckartz, Hook696, darthmon_wmde, Kent7301, joker88john, 
CucyNoiD, Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, 
Af420, Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Liudvikas, Scott_WUaS, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-20 Thread hashar
hashar added a comment.


  Next fun, digging into Jenkins!
  
  I looked at 
https://integration.wikimedia.org/ci/job/wmf-quibble-selenium-php72-docker/64820/
 which got affected by the same issue (all times are UTC):
  
  counterexample
18:08:34 [wmf-quibble-selenium-php72-docker] $ /bin/bash 
/tmp/jenkins5993728317798275711.sh
// Quibble output
18:34:14 INFO:backend.MySQL:Terminating MySQL
18:35:12 [wmf-quibble-selenium-php72-docker] $ /bin/bash -xe 
/tmp/jenkins5970126818379489927.sh
  
  It ran on integration-agent-docker-1009 for which I took a couple thread dump 
while the job was waiting using 
https://integration.wikimedia.org/ci/computer/integration-agent-docker-1009/systemInfo
  
  A couple thread dumps for the agent:
  
  F32409486: A.txt 
  
  F32409489: B.txt 
  
  See 
https://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMjAvMTAvMjAvLS1CLnR4dC0tMjAtMzQtMTM7Oy0tQS50eHQtLTIwLTM0LTEz
  
  Not much standing out? There is a stdout copier thread for the script that 
runs Quibble (/tmp/jenkins5993728317798275711.sh). So I definitely took the 
thread dump while it was blocked and there is some Groovy scripts running.
  
  I also took two thread dumps of the Jenkins server:
  
  F32409499: server1.txt 
  
  F32409500: server2.txt 
  
  See also: 
https://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMjAvMTAvMjAvLS1zZXJ2ZXIyLnR4dC0tMjAtMjctNTI7Oy0tc2VydmVyMS50eHQtLTIwLTI3LTUy;
  
  The later second dump has a thread:
  
line=5
"Running 
CpsFlowExecution[Owner[maintenance-sample-workspace-sizes/123736:maintenance-sample-workspace-sizes
 #123736]] / waiting for integration-agent-docker-1009 id=6114779" daemon 
prio=5 TIMED_WAITING
java.lang.Object.wait(Native Method)
hudson.remoting.Request.call(Request.java:176)
hudson.remoting.Channel.call(Channel.java:1000)

hudson.util.RemotingDiagnostics.executeGroovy(RemotingDiagnostics.java:111)
hudson.util.RemotingDiagnostics$executeGroovy.call(Unknown Source)

org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)

org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)

com.cloudbees.groovy.cps.sandbox.DefaultInvoker.methodCall(DefaultInvoker.java:20)

com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:86)

com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:113)

com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:83)
sun.reflect.GeneratedMethodAccessor268.invoke(Unknown Source)

sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)

com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)

com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:89)

com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:113)

com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixName(FunctionCallBlock.java:78)
sun.reflect.GeneratedMethodAccessor267.invoke(Unknown Source)

sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)

com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
com.cloudbees.groovy.cps.Next.step(Next.java:83)
com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:174)
com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:163)

org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:129)

org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:268)
com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:163)

org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:185)

org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:400)

org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$400(CpsThreadGroup.java:96)

org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:312)

org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:276)

org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:67)
java.util.concurrent.FutureTask.run(FutureTask.java:266)


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-20 Thread gerritbot
gerritbot added a comment.


  Change 634465 **abandoned** by Hashar:
  [integration/config@master] jjb: wmf job with quibble 44
  
  Reason:
  
  https://gerrit.wikimedia.org/r/634465

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: gerritbot
Cc: hashar, awight, kostajh, Lucas_Werkmeister_WMDE, Alter-paule, Beast1978, 
Un1tY, Akuckartz, Hook696, darthmon_wmde, Kent7301, joker88john, CucyNoiD, 
Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, 
Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, Maathavan, 
_jensen, rosalieper, Liudvikas, Scott_WUaS, thcipriani, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-20 Thread hashar
hashar added a comment.


  After many attempts, I finally got a lead 
https://integration.wikimedia.org/ci/job/integration-quibble-fullrun/153/console
  
  The job executes a shell script utils/ci-fullrun.sh in 
integration/quibble.git, I have edited it to `echo CI FULLRUN completed` when 
the shell exit (using `trap`). Quibble has a bunch of extra logging added here 
and there, notably to list processes under /proc/.
  
  I have manually killed the php process running PHPUnit to trigger a fault. 
Times below are UTC hh:mm:ss
  
12:59:25 INFO:root:Left processes?
12:59:25 /proc/1/cmdline:/sbin/docker-init--/src/utils/ci-fullrun.sh--color
12:59:25 
/proc/25/cmdline:python3-s/tmp/quibble/bin/quibble--skipselenium--color
12:59:25 /proc/7/cmdline:/bin/bash/src/utils/ci-fullrun.sh--color
  
  We are left with 3 processes:
  
  - docker-init
  - ci-fullrun.sh which is the shell entry point used
  - quibble
  
12:59:25 INFO:root:Finished Quibble run
  
  Quibble has finished.
  
`
12:59:25 subprocess.CalledProcessError: Command '['php', 
'tests/phpunit/phpunit.php', '--testsuite', 'extensions', '--group', 
'Database', '--exclude-group', 'Broken,ParserFuzz,Stub,Standalone', 
'--log-junit', '/workspace/log/junit-db.xml']' returned non-zero exit status -15
  
  I have `kill` php hence the exit code `-15`.
  
12:59:25 + echo 'CI FULLRUN completed'
12:59:25 CI FULLRUN completed
  
  At this point the shell script being run inside the Docker container has 
completed.
  
12:59:44 Build step 'Execute shell' marked build as failure
  
  And it takes more than 18 seconds to complete :-\   I guess I am confident to 
state that Quibble is not the root cause. That seems to be either in Docker or 
in Jenkins itself.

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar
Cc: hashar, awight, kostajh, Lucas_Werkmeister_WMDE, Alter-paule, Beast1978, 
Un1tY, Akuckartz, Hook696, darthmon_wmde, Kent7301, joker88john, CucyNoiD, 
Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, 
Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, Maathavan, 
_jensen, rosalieper, Liudvikas, Scott_WUaS, thcipriani, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-16 Thread hashar
hashar added a comment.


  I tried various combination and I can not reproduce it.  Maybe it is race 
condition between us sending SIGKILL (`self.server.kill()`) and python reaping 
the processed on garbage collection when we set `self.server = None` :-\

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar
Cc: hashar, awight, kostajh, Lucas_Werkmeister_WMDE, Alter-paule, Beast1978, 
Un1tY, Akuckartz, Hook696, darthmon_wmde, Kent7301, joker88john, CucyNoiD, 
Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, 
Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, Maathavan, 
_jensen, rosalieper, Liudvikas, Scott_WUaS, thcipriani, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-16 Thread hashar
hashar added a comment.


  Gave it a try with a bit more logging. One run with a 2 seconds delay which 
is too short and trigger a SIGKILL:
  
00:22:05.191 INFO:quibble.commands:Shutting down backends: 
00:22:05.191 INFO:backend.MySQL:Stopping MySQL
00:22:05.192 INFO:backend.MySQL:Terminating MySQL
00:22:05.192 INFO:backend.MySQL:Waiting up to two seconds
00:22:07.184 INFO:backend.MySQL:Timeout sending SIGKILL
00:22:07.184 INFO:backend.MySQL:self.server = None
00:22:07.184 INFO:backend.MySQL:Completed stop()
00:22:07.184 INFO:quibble.cmd:Out of context
00:22:07.185 INFO:root:Finished Quibble run
00:22:11.905 Archiving artifacts
  
  And after raising the timeout to issue a SIGTERM:
  
00:10:34.044 INFO:quibble.commands:Shutting down backends: 
00:10:34.044 INFO:backend.MySQL:Stopping MySQL
00:10:34.044 INFO:backend.MySQL:Terminating MySQL
00:10:34.044 INFO:backend.MySQL:Waiting up to five seconds
00:10:36.466 INFO:backend.MySQL:self.server = None
00:10:36.466 INFO:backend.MySQL:Completed stop()
00:10:36.466 INFO:quibble.cmd:Out of context
00:10:36.466 INFO:root:Finished Quibble run
00:10:39.529 Archiving artifacts
  
  Not any conclusive unfortunately :-\

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar
Cc: hashar, awight, kostajh, Lucas_Werkmeister_WMDE, Alter-paule, Beast1978, 
Un1tY, Akuckartz, Hook696, darthmon_wmde, Kent7301, joker88john, CucyNoiD, 
Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, 
Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, Maathavan, 
_jensen, rosalieper, Liudvikas, Scott_WUaS, thcipriani, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-16 Thread gerritbot
gerritbot added a comment.


  Change 634481 had a related patch set uploaded (by Hashar; owner: Hashar):
  [integration/quibble@master] (DO NOT SUBMIT) test dummy full run
  
  https://gerrit.wikimedia.org/r/634481

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: gerritbot
Cc: hashar, awight, kostajh, Lucas_Werkmeister_WMDE, Alter-paule, Beast1978, 
Un1tY, Akuckartz, Hook696, darthmon_wmde, Kent7301, joker88john, CucyNoiD, 
Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, 
Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, Maathavan, 
_jensen, rosalieper, Liudvikas, Scott_WUaS, thcipriani, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-16 Thread hashar
hashar added a comment.


  CI for `integration/quibble` does have a job that runs it entirely against 
mediawiki/core. The last few builds show more or less show the same issue:
  
00:11:36.278 INFO:backend.MySQL:Terminating MySQL
00:11:41.731 Archiving artifacts
  
00:11:09.274 INFO:backend.MySQL:Terminating MySQL
00:11:14.666 Archiving artifacts
  
00:13:52.197 INFO:backend.MySQL:Terminating MySQL
00:13:57.506 Archiving artifacts
  
  Aka more than two seconds?

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar
Cc: hashar, awight, kostajh, Lucas_Werkmeister_WMDE, Alter-paule, Beast1978, 
Un1tY, Akuckartz, Hook696, darthmon_wmde, Kent7301, joker88john, CucyNoiD, 
Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, 
Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, Maathavan, 
_jensen, rosalieper, Liudvikas, Scott_WUaS, thcipriani, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-16 Thread hashar
hashar added a comment.


  To rule out Quibble 0.0.45, I have create a dummy job using 
docker-registry.wikimedia.org/releng/quibble-stretch-php72:0.0.44-s5. And it is 
still affected:
  
00:21:32.612 INFO:backend.MySQL:Terminating MySQL
00:22:36.029 [wmf-quibble44-selenium-php72-docker] $ /bin/bash -xe 
/tmp/jenkins2652394246537911623.sh
  
  So the logic is faulty somewhere or Quibble is being held by something else 
that does not show up in logs.

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar
Cc: hashar, awight, kostajh, Lucas_Werkmeister_WMDE, Alter-paule, Beast1978, 
Un1tY, Akuckartz, Hook696, darthmon_wmde, Kent7301, joker88john, CucyNoiD, 
Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, 
Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, Maathavan, 
_jensen, rosalieper, Liudvikas, Scott_WUaS, thcipriani, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-16 Thread gerritbot
gerritbot added a project: Patch-For-Review.

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: gerritbot
Cc: hashar, awight, kostajh, Lucas_Werkmeister_WMDE, Alter-paule, Beast1978, 
Un1tY, Akuckartz, Hook696, darthmon_wmde, Kent7301, joker88john, CucyNoiD, 
Nandana, NebulousIris, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, 
Bsandipan, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, Maathavan, 
_jensen, rosalieper, Liudvikas, Scott_WUaS, thcipriani, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-16 Thread gerritbot
gerritbot added a comment.


  Change 634465 had a related patch set uploaded (by Hashar; owner: Hashar):
  [integration/config@master] jjb: wmf job with quibble 44
  
  https://gerrit.wikimedia.org/r/634465

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: gerritbot
Cc: hashar, awight, kostajh, Lucas_Werkmeister_WMDE, Akuckartz, darthmon_wmde, 
Nandana, NebulousIris, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, 
_jensen, rosalieper, Liudvikas, Scott_WUaS, thcipriani, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-16 Thread Stashbot
Stashbot added a comment.


  Mentioned in SAL (#wikimedia-releng) [2020-10-16T07:48:42Z]  
Disabling integration-agent-docker-1020 (the sole agent using Ceph): it is too 
slow # T260916  T265615 


TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Stashbot
Cc: hashar, awight, kostajh, Lucas_Werkmeister_WMDE, Akuckartz, darthmon_wmde, 
Nandana, NebulousIris, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, 
_jensen, rosalieper, Liudvikas, Scott_WUaS, thcipriani, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-16 Thread hashar
hashar added a comment.


  Looking at some random builds, it usually takes 15 / 20 seconds to start the 
next command which is a bug by itself:
  
00:17:39.940 INFO:backend.MySQL:Terminating MySQL
00:17:55.311 [wmf-quibble-selenium-php72-docker] $ /bin/bash -xe 
/tmp/jenkins4972097457850427826.sh
  
00:19:01.909 INFO:backend.MySQL:Terminating MySQL
00:19:20.063 [wmf-quibble-selenium-php72-docker] $ /bin/bash -xe 
/tmp/jenkins443040905686928605.sh
  
00:18:17.942 INFO:backend.MySQL:Terminating MySQL
00:18:34.254 [wmf-quibble-selenium-php72-docker] $ /bin/bash -xe 
/tmp/jenkins7269376008034149723.sh
  
  I have looked at that job duration trend on 
https://integration.wikimedia.org/ci/job/wmf-quibble-selenium-php72-docker/buildTimeTrend
 and the job is constantly way too slow whenever it happens to run on 
`integration-agent-docker-1020`:
  
00:30:08.778 INFO:backend.MySQL:Terminating MySQL
00:31:21.208 [wmf-quibble-selenium-php72-docker] $ /bin/bash -xe 
/tmp/jenkins8569331040755739778.sh
  
00:38:54.981 INFO:backend.MySQL:Terminating MySQL
00:42:18.062 [wmf-quibble-selenium-php72-docker@2] $ /bin/bash -xe 
/tmp/jenkins6862799909867122709.sh
  
00:39:41.829 INFO:backend.MySQL:Terminating MySQL
00:44:44.184 [wmf-quibble-selenium-php72-docker] $ /bin/bash -xe 
/tmp/jenkins6896978991300300753.sh
  
00:31:37.581 INFO:backend.MySQL:Terminating MySQL
00:32:57.181 [wmf-quibble-selenium-php72-docker@2] $ /bin/bash -xe 
/tmp/jenkins6303269567405582114.sh
  
00:30:08.106 INFO:backend.MySQL:Terminating MySQL
00:31:22.239 [wmf-quibble-selenium-php72-docker] $ /bin/bash -xe 
/tmp/jenkins6647671750866068876.sh
  
  That agent runs on a WMCS host that uses Ceph for file backend (T260916 
). Which itself does not explain the 
problem since other agents are affected though in seconds, not minutes! I will 
unpoll it.

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar
Cc: hashar, awight, kostajh, Lucas_Werkmeister_WMDE, Akuckartz, darthmon_wmde, 
Nandana, NebulousIris, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, 
_jensen, rosalieper, Liudvikas, Scott_WUaS, thcipriani, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-15 Thread hashar
hashar triaged this task as "High" priority.
hashar added a comment.


  Will look at it tomorrow and try to at least reproduce the issue.

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar
Cc: hashar, awight, kostajh, Lucas_Werkmeister_WMDE, Akuckartz, darthmon_wmde, 
Nandana, NebulousIris, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, 
_jensen, rosalieper, Liudvikas, Scott_WUaS, thcipriani, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-15 Thread hashar
hashar added subscribers: kostajh, awight, hashar.
hashar edited projects, added Continuous-Integration-Infrastructure, 
Release-Engineering-Team-TODO (2020-10-01 to 2020-12-31 (Q2)), 
Release-Engineering-Team (CI & Testing services); removed 
Continuous-Integration-Config.
hashar added a comment.


  The relevant code:
  
  name=quibble/backend.py
class BackendServer:
...
   def stop(self):
if self.server is not None:
self.log.info('Terminating %s', self.__class__.__name__)
self.server.terminate()
try:
self.server.wait(2)
except subprocess.TimeoutExpired:
self.server.kill()  # SIGKILL
finally:
self.server = None
  
  Where `self.server` is a Popen instance 
. So we 
basically send `SIGTERM` and have a busy loop waiting for two seconds which 
should then send `SIGKILL`.  Clearly that does not work as expected :-\
  
  The python documentation for `Popen.wait()` mentions it can deadlock when 
sending stdout/stderr to pipes:
  
  > Note
  >
  > This will deadlock when using stdout=PIPE or stderr=PIPE and the child 
process generates enough output to a pipe such that it blocks waiting for the 
OS pipe buffer to accept more data. Use Popen.communicate() when using pipes to 
avoid that.
  
  But we use /dev/null when we start the server:
  
self.server = subprocess.Popen([
'/usr/sbin/mysqld',  # fixme drop path
'--skip-networking',
'--datadir=%s' % self.rootdir,
'--log-error=%s' % self.errorlog,
'--pid-file=%s' % self.pidfile,
'--socket=%s' % self.socket,
],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
)
  
  Some stuff we can try:
  
  - reproduce it obviously
  - check whether MariaDB version has changed in the CI image
  - add some logging in the stop command (cause clearly 
`subprocess.TimeoutExpired` should be raised and cause a SIGKILL to be send)
  - add some option to capture MySQL stdout/stderr (using `stream_relay` which 
can consumes a PIPE and output it via `logging`).

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hashar
Cc: hashar, awight, kostajh, Lucas_Werkmeister_WMDE, Akuckartz, darthmon_wmde, 
Nandana, NebulousIris, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, 
_jensen, rosalieper, Liudvikas, Scott_WUaS, thcipriani, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g, Totolinototo3, Redabr4, Zanziii, Sadisticturd, 
A.S.Kochergin
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T265615: Terminating MySQL takes several minutes in (Wikibase?) CI jobs

2020-10-15 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE created this task.
Lucas_Werkmeister_WMDE added projects: Quibble, Continuous-Integration-Config, 
Wikidata.

TASK DESCRIPTION
  I’ve noticed that the “Terminating MySQL” step at the end of a CI build can 
take several minutes. For example, in this recent job 
:
  
14:42:01 INFO:quibble.commands:Shutting down backends: http://127.0.0.1:9412 /workspace/src>  
14:42:01 INFO:backend.ChromeWebDriver:Terminating ChromeWebDriver
14:42:01 INFO:backend.Xvfb:Terminating Xvfb
14:42:01 INFO:backend.DevWebServer:Terminating DevWebServer
14:42:01 INFO:quibble.commands:Shutting down backends: 
14:42:01 INFO:backend.MySQL:Terminating MySQL
14:46:16 [wmf-quibble-selenium-php72-docker@2] $ /bin/bash -xe 
/tmp/jenkins6976447998931934422.sh
  
  I have no idea what Quibble or MySQL were doing there, but I doubt that it 
was really worth four minutes(!) of CI time.
  
  So far, I’ve only noticed this in Wikibase jobs, but that might just be 
because I don’t often look at the CI output for other repositories.

TASK DETAIL
  https://phabricator.wikimedia.org/T265615

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE
Cc: Lucas_Werkmeister_WMDE, Akuckartz, Totolinototo3, darthmon_wmde, Redabr4, 
Zanziii, Sadisticturd, Nandana, kostajh, A.S.Kochergin, Lahi, Gq86, 
GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Wikidata-bugs, aude, Jdforrester-WMF, Mbch331, hashar
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs