great, thanks for the toolforge tips!

for now i think i can go a long way hacking locally on datasets, even if it's a pain to get them. but i can totally see how toolforge is going to be perfect for anything i need to repeat.

great ecosystem! lots to learn

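
for anyone landing on this thread later: below is a minimal sketch of the "run the query with minimal postprocessing and write to disk" idea mentioned in the quoted message further down. it is not the actual notebook code. it assumes a DB-API style connection and uses an in-memory sqlite3 database as a stand-in for the wiki replicas; the function name `stream_query_to_csv`, the `image` table and the chunk size are all made up for illustration.

```python
import csv
import sqlite3
import sys


def stream_query_to_csv(conn, sql, out, chunk_size=1000):
    """Run `sql` and write the rows to `out` as CSV, chunk_size rows at a time.

    Only one chunk of rows is held in Python at once, so the full result
    set never has to fit in the notebook's memory. Returns the row count.
    """
    cur = conn.cursor()
    cur.execute(sql)
    writer = csv.writer(out)
    # Header row taken from the cursor's column metadata.
    writer.writerow([col[0] for col in cur.description])
    total = 0
    while True:
        rows = cur.fetchmany(chunk_size)
        if not rows:
            break
        writer.writerows(rows)
        total += len(rows)
    return total


if __name__ == "__main__":
    # Hypothetical stand-in data; a real run would connect to a replica instead.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE image (name TEXT, size INTEGER)")
    conn.executemany(
        "INSERT INTO image VALUES (?, ?)",
        [(f"File_{i}.jpg", i * 100) for i in range(5)],
    )
    stream_query_to_csv(conn, "SELECT name, size FROM image", sys.stdout, chunk_size=2)
```

one caveat, as far as i understand it: some MySQL client libraries buffer the whole result set on the client by default, so chunked `fetchmany` calls only help with memory if the cursor is unbuffered (for example `pymysql.cursors.SSCursor` in PyMySQL).
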
On Wed, 23 Dec 2020 at 10:59, AntiCompositeNumber <[email protected]> wrote:

> Due to <https://phabricator.wikimedia.org/T188684>, PAWS isn't really
> a good environment for long-running unattended or under-attended
> tasks. You can often make it just about work, but as you noticed, the
> memory limits can also make such tasks more difficult.
>
> Once I start to hit PAWS limits, I usually switch to Toolforge. I've
> found that writing simple HTML to a file in the static directory
> <https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web#Static_file_server>
> is usually a good intermediary between a PAWS notebook and writing a
> full-blown webservice. You can easily open a Python container with
> `webservice --backend=kubernetes python3.7 shell` and run things from
> there. <https://wikitech.wikimedia.org/wiki/Help:Toolforge/Python> has
> more information.
>
> AntiCompositeNumber
>
> On Tue, Dec 22, 2020 at 4:47 PM Mat Kelcey <[email protected]> wrote:
> >
> > Thanks Nicholas for the response, apologies this isn't threaded, I was
> > subscribed only to a daily digest.
> >
> > Here's a version of the notebook that (sometimes) shows the lost
> > connection problem.
> >
> > https://public.paws.wmcloud.org/User:Mat_kelcey/timeout%20and%20OOM%20repro.ipynb
> >
> > It either fails directly with OOM or we lose connection to the server; I
> > think it's as simple as it being just a long running query with a large
> > result set. I'm thinking perhaps PAWS just isn't right for these types of
> > queries? Not sure what tuning I can do, re: PAWS config or the query
> > itself, I think I just need to learn more about other execution
> > environments.
> >
> > In any case I have a way of running the query with minimal
> > postprocessing that doesn't OOM, that I can write to disk and download to
> > my local machine to play with. That's fine for now as I poke around with
> > the dataset.
> >
> > Cheers,
> > Mat
> >
> > > hi all!
> > >
> > > as part of task "Look into matching images of the same painting"
> > > <https://phabricator.wikimedia.org/T131553>
> > > i've been trying to reproduce some sql queries as described in
> > > https://github.com/multichill/toollabs/blob/master/bot/wikidata/find_painting_images.py
> > >
> > > where as usually these scripts would be running under toolforge (or some
> > > other bot execution environment i'm not sure of) i've been finding these
> > > long running queries timeout under PAWS
> > >
> > > does anyone have suggestions / examples for running queries such as
> > > http://tools.wmflabs.org/multichill/queries2/commons/paintings_without_wikidata_all.sql
> > > under PAWS?
> > >
> > > cheers,
> > > mat

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
