Dvorapa added a comment.
On many projects there is already a list of common fixes (see e.g. frwiki, cswiki or hewiki in https://www.wikidata.org/wiki/Q10957404 for WPCleaner or https://www.wikidata.org/wiki/Q6585066 for AWB). Pywikibot could read their syntax as well. I imagine something like (p
Dalba closed this task as "Resolved".
Herald added a subscriber: Zoranzoki21.
TASK DETAIL
https://phabricator.wikimedia.org/T188232
EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: Xqt, Dalba
Cc: Zoranzoki21, gerritbot, Aklapper, Xqt, pywikibot-bugs-list, Giuliamoc
gerritbot added a comment.
Change 414640 merged by jenkins-bot:
[pywikibot/core@master] [bugfix] Enable -whitelist option
https://gerrit.wikimedia.org/r/414640
Xqt raised the priority of this task from "Normal" to "High".
Xqt added a subscriber: Dalba.
Xqt added a comment.
appveyor fails due to this problem:
https://ci.appveyor.com/project/Ladsgroup/pywikibot-g4xqx/build/1.0.156/job/cjwxkvunc50b0c66
TASK DETAIL
https://phabricator.wikimedia.org/T78152
zhuyifei1999 added a comment.
In T185561#4016642, @Dvorapa wrote:
The only option I know is a hard shutdown; neither switching to a terminal interface nor magic key combinations work in this case.
Fix your OOM killer :). Hard shutdown is for when the kernel itself crashes (and in my case I
Dvorapa added a comment.
In T185561#4018322, @zhuyifei1999 wrote:
In T185561#4016642, @Dvorapa wrote:
I'll look into the patches and test them tomorrow. Thanks to both Dalba and zhuyifei1999 for the analysis and fixes. Is there something I can help with/test?
The script freeze was caused by a semaphore no
zhuyifei1999 added a comment.
In T185561#4018476, @Dvorapa wrote:
With https://gerrit.wikimedia.org/r/415771 there is still no difference for -repeat. It still behaves as I described in T185561#4015457
Please give me the traceback of:
The last printed traceback in the script's console/stderr.
Dvorapa added a comment.
I will, but I think this could be an error in the data file.
Currently I am trying to collect new data using https://gerrit.wikimedia.org/r/415771 and it works more smoothly. The memory consumption still grows slowly (from about 200 MB initially to 500 MB now), but it seems to
zhuyifei1999 added a comment.
In T185561#4018687, @Dvorapa wrote:
It will when RAM consumption gets close to 100 %, but now at 90 % it works really fast.
That definitely sounds like an OOM condition.
Linux typically overcommits memory (see /proc/sys/vm/overcommit* in proc(5)), allowing more
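To check which overcommit policy a machine is running with, the files mentioned above can be read directly. The following is only a minimal sketch: the function name read_overcommit_settings and its base parameter are made up for illustration, and it simply returns an empty result on systems that have no such files. Per proc(5), overcommit_memory is 0 for heuristic overcommit, 1 for always overcommit and 2 for strict accounting.

```python
from pathlib import Path


def read_overcommit_settings(base="/proc/sys/vm"):
    """Return the overcommit_* values found under `base` as a dict.

    `base` is a parameter only so the function can be tried on a
    scratch directory; the result is empty where the files are absent.
    """
    settings = {}
    for path in sorted(Path(base).glob("overcommit*")):
        try:
            settings[path.name] = path.read_text().strip()
        except OSError:
            # Unreadable entry: skip it rather than fail.
            continue
    return settings


if __name__ == "__main__":
    for name, value in read_overcommit_settings().items():
        print(f"{name} = {value}")
```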
gerritbot added a comment.
Change 415928 had a related patch set uploaded (by Zhuyifei1999; owner: Zhuyifei1999):
[pywikibot/core@master] [WIP] weblinkchecker: use shelve to persist data in disk real-time
https://gerrit.wikimedia.org/r/415928
zhuyifei1999 added a comment.
^ The patch uses shelve for persistence, but memory is still increasing non-stop.
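For readers unfamiliar with shelve, this is roughly what persisting link history to disk with it can look like. It is a sketch under assumptions and not the code from the patch: record_dead_link, load_history and the "url -> list of timestamps" layout are invented here for illustration.

```python
import shelve


def record_dead_link(db_path, url, timestamp):
    """Append `timestamp` to the history stored for `url` (hypothetical layout)."""
    with shelve.open(db_path) as db:
        history = db.get(url, [])
        history.append(timestamp)
        # Reassign the key: without writeback=True, mutating the list
        # returned by db.get() alone would not be written to disk.
        db[url] = history


def load_history(db_path, url):
    """Return the stored history for `url`, or an empty list."""
    with shelve.open(db_path) as db:
        return db.get(url, [])
```

Closing the shelf after every update keeps the data on disk instead of in one large in-memory dict, but it does nothing about memory held elsewhere in the process, which would be consistent with the observation above.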
Dvorapa added a comment.
Linux typically...
Thank you!
From my current measurement:
$ python pwb.py weblinkchecker -start:! -ns:0 (on a freshly booted system)
Python memory usage after start: 200 MB
Worked quite smoothly for 3 hours, with memory usage only increasing
Python memory usage stopped at 1.1 GB
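Numbers like these can also be logged from inside the process instead of being read off a system monitor. A minimal sketch, assuming a Unix system (the resource module is not available on Windows); peak_rss_mb is a name made up for this example, and it reports the peak resident set size, not the current one.

```python
import resource
import sys


def peak_rss_mb():
    """Return this process's peak resident set size in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


if __name__ == "__main__":
    print(f"peak RSS: {peak_rss_mb():.1f} MB")
```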
zhuyifei1999 added a comment.
Since the original mem_top measurement blamed Cookies, I looked into where they are referenced with guppy:
>>> h[11]
Partition of a set of 704 objects. Total size = 737792 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 704
Dvorapa added a comment.
That would confirm the original deadlocking problem fixed :)
For collecting phase probably yes! :)
But...
I tested -repeat with weblink_dead_days set to 0 on a newly created data file (containing freshly collected links from ! to Adam Lambert) and it got stuck on [[.lb]]
zhuyifei1999 added a comment.
I captured two guppy heaps (same process, the second is captured later than the first):
>>> hpy().heap()
Partition of a set of 1071160 objects. Total size = 112056904 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 338785 32 357
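Where guppy is not available, the same "capture two heaps and compare" idea can be approximated with tracemalloc from the standard library. This is a different tool from the one used above and only a sketch: top_growth is a hypothetical helper, and the list of byte strings merely stands in for whatever state keeps growing in the real script.

```python
import tracemalloc


def top_growth(before, after, limit=5):
    """Return the `limit` largest per-line allocation differences."""
    return after.compare_to(before, "lineno")[:limit]


if __name__ == "__main__":
    tracemalloc.start()
    first = tracemalloc.take_snapshot()
    leaked = [bytes(1000) for _ in range(1000)]  # stand-in for growing state
    second = tracemalloc.take_snapshot()
    for stat in top_growth(first, second):
        print(stat)
```

Unlike a guppy partition by kind, tracemalloc groups by the source line that allocated the memory, so it points at where growth comes from, not at which class is growing.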
Dvorapa added a comment.
In T185561#4018624, @zhuyifei1999 wrote:
In T185561#4018476, @Dvorapa wrote:
With https://gerrit.wikimedia.org/r/415771 there is still no difference for -repeat. It still behaves as I described in T185561#4015457
Please give me the traceback of:
The last printed trace
zhuyifei1999 added a comment.
Can you apply py-bt? If not, could you at least get the python symbols? See https://wiki.python.org/moin/DebuggingWithGdb. That is really hard to read :(
zhuyifei1999 added a comment.
I stripped the duplicated information. Threads 3 to 50 have the exact same stack traces, and 1 & 2:
Thread 2 (Thread 0x7fd4147e8700 (LWP 6449)):
#0 0x7fd46d12b988 in read () from /usr/lib/libpthread.so.0
#1 0x7fd46ac4c390 in ?? () from /usr/lib/libcrypto.so.1.1
#
Dvorapa added a comment.
On Arch Linux there is no support for py-bt, and things like symbols are missing completely. Arch's python package does not contain any debug information. I am using gdb for the first time in my life, so I don't even know how to use it correctly. I can also run it from beginning
zhuyifei1999 added a comment.
In my attempt to isolate the cause of the memory leak, I applied this onto the above patch:
diff --git a/pywikibot/comms/http.py b/pywikibot/comms/http.py
index 76878ae..168b4a3 100644
--- a/pywikibot/comms/http.py
+++ b/pywikibot/comms/http.py
@@ -381,7 +381,7 @@ def
zhuyifei1999 added a comment.
In T185561#4019573, @Dvorapa wrote:
On Arch Linux there is no support for py-bt, and things like symbols are missing completely. Arch's python package does not contain any debug information. I am using gdb for the first time in my life, so I don't even know how to use it c
Dvorapa added a comment.
Or I can use pdb if needed (but it needs to log from the beginning too)
Strace output:
$ sudo strace -p 1797
[sudo] password for pavel:
strace: Process 1797 attached
select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=610594}) = 0 (Timeout)
clock_gettime(CLOCK_MONOTONIC, {tv_sec
zhuyifei1999 added a comment.
That indeed looks like freezing, but without the gdb symbols it's hard to conclude what it's doing.
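When gdb symbols and py-bt are not available, Python-level stacks of all threads can still be obtained with the standard faulthandler module, provided the hook is installed when the script starts. A minimal sketch, assuming a Unix system; enable_stack_dumps and dump_now are names invented for this example and are not part of pywikibot.

```python
import faulthandler
import signal
import sys


def enable_stack_dumps(signum=signal.SIGUSR1, stream=None):
    """Dump every thread's Python traceback to `stream` when `signum` arrives."""
    faulthandler.register(signum, file=stream or sys.stderr, all_threads=True)


def dump_now(stream=None):
    """Write the current Python traceback of every thread to `stream`."""
    faulthandler.dump_traceback(file=stream or sys.stderr, all_threads=True)


if __name__ == "__main__":
    enable_stack_dumps()
    dump_now()
```

With the handler registered, running kill -USR1 <pid> from another terminal makes the frozen process print where each thread currently is, without attaching a debugger.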
zhuyifei1999 added a comment.
In T185561#4019574, @zhuyifei1999 wrote:
Will check the contribution of each.
AFAICT, both the cookie change (session => requests) and the http method (GET => HEAD) significantly affect the rate of resident memory bloating.
Dvorapa added a comment.
I installed Python 3.5 and repeat worked as expected, so this is definitely a Python 3.6 issue!
zhuyifei1999 removed zhuyifei1999 as the assignee of this task.
zhuyifei1999 added a comment.
Too much technical debt: half of the script is deprecated and unused, and there are issues with concurrency and persistence.
gerritbot added a comment.
Change 415928 abandoned by Zhuyifei1999:
[WIP] weblinkchecker: use shelve to persist data in disk real-time
https://gerrit.wikimedia.org/r/415928
zhuyifei1999 added a comment.
Oh, if someone has the time to provide a core dump, along with gdb symbols, when the script memory usage is really high, I might (though it is unlikely) be able to figure out the root cause. Right now I really lack the time to babysit the script.