Re: [Wikitech-l] Stuck/Missing Grid Job for tools.william-avery-bot

2021-03-26 Thread William Avery
Thanks Bryan,

It has now resumed its not particularly critical task:
https://www.wikidata.org/wiki/Special:Contributions/William_Avery_Bot

Will

On Fri, 26 Mar 2021 at 21:45, Bryan Davis wrote:

> On Fri, Mar 26, 2021 at 3:27 PM William Avery wrote:
> >
> > Hi,
> >
> > I got the email below telling me that my cron job running as
> > william-avery-bot had thrown an error, and I noticed that the Grid job
> > that it kicks off hasn't run since.
> >
> > I tried deleting the job using the instructions at
> > https://wikitech.wikimedia.org/wiki/Help:Toolforge/Grid#Stopping_jobs_with_%E2%80%98qdel%E2%80%99_and_%E2%80%98jstop%E2%80%99
> > but it appeared "stuck".
>
> I have "force deleted" your job using my Toolforge admin rights.
>
>   $ sudo qdel -f 749
>   root forced the deletion of job 749
>
> The Toolforge grid engine had numerous problems yesterday which led to
> the scheduler losing track of the state of many jobs. Brooke did
> several rounds of looking for these and cleaning the queue state, but
> obviously yours was not cleaned up in that process. Thank you for your
> report, and I hope you can get your tool back into its proper working
> state.
>
> Bryan
> --
> Bryan Davis  Technical Engagement  Wikimedia Foundation
> Principal Software Engineer   Boise, ID USA
> [[m:User:BDavis_(WMF)]]  irc: bd808
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>


Re: [Wikitech-l] Stuck/Missing Grid Job for tools.william-avery-bot

2021-03-26 Thread Bryan Davis
On Fri, Mar 26, 2021 at 3:27 PM William Avery wrote:
>
> Hi,
>
> I got the email below telling me that my cron job running as
> william-avery-bot had thrown an error, and I noticed that the Grid job that it
> kicks off hasn't run since.
>
> I tried deleting the job using the instructions at
> https://wikitech.wikimedia.org/wiki/Help:Toolforge/Grid#Stopping_jobs_with_%E2%80%98qdel%E2%80%99_and_%E2%80%98jstop%E2%80%99
> but it appeared "stuck".

I have "force deleted" your job using my Toolforge admin rights.

  $ sudo qdel -f 749
  root forced the deletion of job 749

The Toolforge grid engine had numerous problems yesterday which led to
the scheduler losing track of the state of many jobs. Brooke did
several rounds of looking for these and cleaning the queue state, but
obviously yours was not cleaned up in that process. Thank you for your
report, and I hope you can get your tool back into its proper working
state.

Bryan
-- 
Bryan Davis  Technical Engagement  Wikimedia Foundation
Principal Software Engineer   Boise, ID USA
[[m:User:BDavis_(WMF)]]  irc: bd808



[Wikitech-l] Stuck/Missing Grid Job for tools.william-avery-bot

2021-03-26 Thread William Avery
Hi,

I got the email below telling me that my cron job running as
william-avery-bot had thrown an error, and I noticed that the Grid job that
it kicks off hasn't run since.

I tried deleting the job using the instructions at
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Grid#Stopping_jobs_with_%E2%80%98qdel%E2%80%99_and_%E2%80%98jstop%E2%80%99
but it appeared "stuck".

"qstat -xml" outputs the following:

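<job_info xmlns:xsd="http://arc.liv.ac.uk/repos/darcs/sge/source/dist/util/resources/schemas/qstat/qstat.xsd">
  <queue_info>
    <job_list state="running">
      <JB_job_number>749</JB_job_number>
      <JAT_prio>0.25319</JAT_prio>
      <JB_name>cron-TaxonbarSyncerBot</JB_name>
      <JB_owner>tools.william-avery-bot</JB_owner>
      <state>dr</state>
      <JAT_start_time>2021-03-25T17:49:16</JAT_start_time>
      <queue_name>task@tools-sgeexec-0916.tools.eqiad.wmflabs</queue_name>
      <slots>1</slots>
    </job_list>
  </queue_info>
  <job_info>
  </job_info>
</job_info>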

But when I ssh to tools-sgeexec-0916.tools.eqiad.wmflabs I see no sign of
any processes under tools.william-avery-bot, except the ones associated
with my interactive session.
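
A minimal sketch of how jobs flagged for deletion could be picked out of "qstat -xml" output with Python. It assumes the usual grid engine element names (job_list, JB_job_number, JB_name, state, queue_name); the sample document is trimmed from the output above, and jobs_pending_deletion is a hypothetical helper, not part of any Toolforge tooling.

```python
import xml.etree.ElementTree as ET

# Sample trimmed from the report above. The element names are an
# assumption based on the usual grid engine qstat schema.
SAMPLE = """<job_info>
  <queue_info>
    <job_list state="running">
      <JB_job_number>749</JB_job_number>
      <JB_name>cron-TaxonbarSyncerBot</JB_name>
      <state>dr</state>
      <queue_name>task@tools-sgeexec-0916.tools.eqiad.wmflabs</queue_name>
    </job_list>
  </queue_info>
  <job_info/>
</job_info>"""


def jobs_pending_deletion(xml_text):
    """Return (job number, name, queue) for jobs whose state contains 'd'."""
    root = ET.fromstring(xml_text)
    flagged = []
    for job in root.iter("job_list"):
        state = job.findtext("state", default="")
        if "d" in state:
            flagged.append((
                job.findtext("JB_job_number"),
                job.findtext("JB_name"),
                job.findtext("queue_name"),
            ))
    return flagged


if __name__ == "__main__":
    for number, name, queue in jobs_pending_deletion(SAMPLE):
        print(f"job {number} ({name}) is marked for deletion on {queue}")
```

A job that stays in a state containing "d" while its exec host shows no matching processes is the situation described here.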

Can anyone help resolve this, or advise on a better venue to raise it?

Thanks in advance,

Will

-- Forwarded message -
From: Cron Daemon 
Date: Thu, 25 Mar 2021 at 16:49
Subject: Cron  /usr/bin/jsub -N
cron-TaxonbarSyncerBot -once -quiet ~/TaxonbarSyncerBot.sh
To: 


error: commlib error: got select error (Connection refused)
error: unable to send message to qmaster using port 6444 on host
"tools-sgegrid-shadow.tools.eqiad1.wikimedia.cloud": got send error
Traceback (most recent call last):
  File "/usr/bin/job", line 48, in <module>
    root = xml.etree.ElementTree.fromstring(proc.stdout.read())
  File "/usr/lib/python3.5/xml/etree/ElementTree.py", line 1345, in XML
    return parser.close()
xml.etree.ElementTree.ParseError: no element found: line 1, column 0
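
The ParseError at line 1, column 0 is what ElementTree raises for empty input, so the traceback is consistent with the wrapper handing empty qstat output to the XML parser after the connection to qmaster failed. Below is a minimal sketch of a guard against that case. It is not the actual /usr/bin/job code, and parse_qstat_xml is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def parse_qstat_xml(raw):
    """Parse qstat XML output, returning None when nothing usable was read."""
    if not raw or not raw.strip():
        # qstat wrote nothing to stdout, for example because it could
        # not reach qmaster.
        return None
    try:
        return ET.fromstring(raw)
    except ET.ParseError:
        # Truncated or otherwise malformed output.
        return None


if __name__ == "__main__":
    for raw in (b"", b"<job_info><queue_info/></job_info>"):
        root = parse_qstat_xml(raw)
        print("no usable output" if root is None else root.tag)
```

Returning None lets the caller decide whether to retry or report a scheduler outage instead of failing with a traceback.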