Re: [Wikitech-l] Stuck/Missing Grid Job for tools.william-avery-bot
Thanks Bryan,

It has now resumed its not particularly critical task:
https://www.wikidata.org/wiki/Special:Contributions/William_Avery_Bot

Will

On Fri, 26 Mar 2021 at 21:45, Bryan Davis wrote:

> On Fri, Mar 26, 2021 at 3:27 PM William Avery wrote:
> >
> > Hi,
> >
> > I got the email below telling me that my cron job running as
> > william-avery-bot had thrown an error, and I noticed that the Grid job
> > that it kicks off hasn't run since.
> >
> > I tried deleting the job using the instructions at
> > https://wikitech.wikimedia.org/wiki/Help:Toolforge/Grid#Stopping_jobs_with_%E2%80%98qdel%E2%80%99_and_%E2%80%98jstop%E2%80%99
> > but it appeared "stuck".
>
> I have "force deleted" your job using my Toolforge admin rights.
>
> $ sudo qdel -f 749
> root forced the deletion of job 749
>
> The Toolforge grid engine had numerous problems yesterday which led to
> the scheduler losing track of the state of many jobs. Brooke did
> several rounds of looking for these and cleaning the queue state, but
> obviously yours was not cleaned up in that process. Thank you for your
> report, and I hope you can get your tool back into its proper working
> state.
>
> Bryan
> --
> Bryan Davis    Technical Engagement    Wikimedia Foundation
> Principal Software Engineer    Boise, ID USA
> [[m:User:BDavis_(WMF)]]    irc: bd808

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Stuck/Missing Grid Job for tools.william-avery-bot
On Fri, Mar 26, 2021 at 3:27 PM William Avery wrote:
>
> Hi,
>
> I got the email below telling me that my cron job running as
> william-avery-bot had thrown an error, and I noticed that the Grid job
> that it kicks off hasn't run since.
>
> I tried deleting the job using the instructions at
> https://wikitech.wikimedia.org/wiki/Help:Toolforge/Grid#Stopping_jobs_with_%E2%80%98qdel%E2%80%99_and_%E2%80%98jstop%E2%80%99
> but it appeared "stuck".

I have "force deleted" your job using my Toolforge admin rights.

$ sudo qdel -f 749
root forced the deletion of job 749

The Toolforge grid engine had numerous problems yesterday which led to
the scheduler losing track of the state of many jobs. Brooke did
several rounds of looking for these and cleaning the queue state, but
obviously yours was not cleaned up in that process. Thank you for your
report, and I hope you can get your tool back into its proper working
state.

Bryan
--
Bryan Davis    Technical Engagement    Wikimedia Foundation
Principal Software Engineer    Boise, ID USA
[[m:User:BDavis_(WMF)]]    irc: bd808
[Wikitech-l] Stuck/Missing Grid Job for tools.william-avery-bot
Hi,

I got the email below telling me that my cron job running as
william-avery-bot had thrown an error, and I noticed that the Grid job
that it kicks off hasn't run since.

I tried deleting the job using the instructions at
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Grid#Stopping_jobs_with_%E2%80%98qdel%E2%80%99_and_%E2%80%98jstop%E2%80%99
but it appeared "stuck".

"qstat -xml" outputs the following:

  http://arc.liv.ac.uk/repos/darcs/sge/source/dist/util/resources/schemas/qstat/qstat.xsd
  749 0.25319 cron-TaxonbarSyncerBot tools.william-avery-bot dr 2021-03-25T17:49:16 task@tools-sgeexec-0916.tools.eqiad.wmflabs 1

But when I ssh to tools-sgeexec-0916.tools.eqiad.wmflabs I see no sign
of any processes under tools.william-avery-bot, except the ones
associated with my interactive session.

Can anyone help resolve this, or suggest a venue where I can raise it?

Thanks in advance,

Will

---------- Forwarded message ---------
From: Cron Daemon
Date: Thu, 25 Mar 2021 at 16:49
Subject: Cron /usr/bin/jsub -N cron-TaxonbarSyncerBot -once -quiet ~/TaxonbarSyncerBot.sh
To:

error: commlib error: got select error (Connection refused)
error: unable to send message to qmaster using port 6444 on host "tools-sgegrid-shadow.tools.eqiad1.wikimedia.cloud": got send error
Traceback (most recent call last):
  File "/usr/bin/job", line 48, in <module>
    root = xml.etree.ElementTree.fromstring(proc.stdout.read())
  File "/usr/lib/python3.5/xml/etree/ElementTree.py", line 1345, in XML
    return parser.close()
xml.etree.ElementTree.ParseError: no element found: line 1, column 0
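
For anyone who hits the same symptom, here is a rough Python sketch of how the "qstat -xml" output could be checked for jobs left in a deletion state such as "dr". The XML markup did not survive in the message above, so the element names used here (job_list, JB_job_number, JB_name, state, queue_name) are an assumption based on typical Grid Engine output, and find_stuck_jobs and SAMPLE are hypothetical names made up for illustration. The sketch also returns an empty list when the output is empty, which appears to be the situation behind the ParseError in the cron mail.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the values quoted in the message.
# The element names are assumed, not taken from the original output.
SAMPLE = """<job_info>
  <queue_info>
    <job_list state="running">
      <JB_job_number>749</JB_job_number>
      <JAT_prio>0.25319</JAT_prio>
      <JB_name>cron-TaxonbarSyncerBot</JB_name>
      <JB_owner>tools.william-avery-bot</JB_owner>
      <state>dr</state>
      <JAT_start_time>2021-03-25T17:49:16</JAT_start_time>
      <queue_name>task@tools-sgeexec-0916.tools.eqiad.wmflabs</queue_name>
      <slots>1</slots>
    </job_list>
  </queue_info>
</job_info>"""


def find_stuck_jobs(qstat_xml):
    """Return (job number, name, state, queue) for jobs flagged for deletion."""
    if not qstat_xml.strip():
        # Empty output, e.g. when qstat could not reach the qmaster;
        # parsing it would raise ParseError ("no element found").
        return []
    root = ET.fromstring(qstat_xml)
    stuck = []
    for job in root.iter("job_list"):
        state = job.findtext("state", default="")
        # A "d" in the state string marks a job with a pending deletion.
        if "d" in state:
            stuck.append((
                job.findtext("JB_job_number"),
                job.findtext("JB_name"),
                state,
                job.findtext("queue_name"),
            ))
    return stuck


if __name__ == "__main__":
    for number, name, state, queue in find_stuck_jobs(SAMPLE):
        print(number, name, state, queue)
```

In practice the string would come from running "qstat -xml" via subprocess. A job that is reported here but has no processes on the named exec host probably needs an admin to force-delete it, as happened in this thread.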