great, thanks for the toolforge tips!
for now i think i can go a long way hacking locally on datasets, even if
it's a pain to get them.
but i can totally see how toolforge is going to be perfect for anything i
need to repeat
great ecosystem! lots to learn
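
for the "hack locally" route, here's a rough sketch of the kind of dump step i mean: run the query, pull rows in fixed-size chunks, and write them straight to a CSV so the full result set never sits in memory. `dump_query_to_csv` and `chunk_size` are names made up for illustration, and it assumes nothing beyond a generic Python DB-API cursor. With a MySQL client you would probably also want an unbuffered (server-side) cursor, otherwise the client may still buffer everything; that part isn't shown.

```python
import csv
import sqlite3  # only needed for the local stand-in shown in the usage note


def dump_query_to_csv(cursor, sql, out_path, chunk_size=10_000):
    """Run `sql` on a DB-API cursor and stream the rows to a CSV file.

    Rows are fetched `chunk_size` at a time, so only one chunk is held
    in Python memory at once. Returns the number of data rows written.
    (Hypothetical helper, not part of PAWS or Toolforge.)
    """
    cursor.execute(sql)
    # cursor.description holds one tuple per column; item 0 is the name.
    header = [col[0] for col in cursor.description]
    n_rows = 0
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        while True:
            rows = cursor.fetchmany(chunk_size)
            if not rows:
                break
            writer.writerows(rows)
            n_rows += len(rows)
    return n_rows
```

to try it without a replica connection, an in-memory database works as a stand-in: `dump_query_to_csv(sqlite3.connect(":memory:").cursor(), "SELECT 1 AS x", "out.csv")`.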

On Wed, 23 Dec 2020 at 10:59, AntiCompositeNumber <[email protected]> wrote:

> Due to <https://phabricator.wikimedia.org/T188684>, PAWS isn't really
> a good environment for long-running unattended or under-attended
> tasks. You can often make it just about work, but as you noticed, the
> memory limits can also make such tasks more difficult.
>
> Once I start to hit PAWS limits, I usually switch to Toolforge. I've
> found that writing simple HTML to a file in the static directory
> <https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web#Static_file_server>
> is usually a good intermediary between a PAWS notebook and writing a
> full-blown webservice. You can easily open a Python container with
> `webservice --backend=kubernetes python3.7 shell` and run things from
> there. <https://wikitech.wikimedia.org/wiki/Help:Toolforge/Python> has
> more information.
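
A minimal sketch of the "simple HTML in the static directory" idea, as I understand it: render query rows as an escaped HTML table and write the file in one step. `render_table` and `write_report` are hypothetical helper names, and the output path is left to the caller; the exact location of a tool's static directory is an assumption to check against the Help page linked above.

```python
import html
from pathlib import Path


def render_table(title, header, rows):
    """Return a small standalone HTML page with one table.

    Every cell is passed through html.escape so that page titles or
    file names containing <, > or & cannot break the markup.
    """
    head = "".join(f"<th>{html.escape(str(h))}</th>" for h in header)
    body = "\n".join(
        "<tr>" + "".join(f"<td>{html.escape(str(c))}</td>" for c in row) + "</tr>"
        for row in rows
    )
    safe_title = html.escape(title)
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><meta charset='utf-8'><title>{safe_title}</title></head>\n"
        f"<body><h1>{safe_title}</h1>\n"
        f"<table>\n<tr>{head}</tr>\n{body}\n</table>\n"
        "</body></html>\n"
    )


def write_report(path, title, header, rows):
    """Write the page to `path`, creating parent directories if needed.

    The page is written to a temporary sibling file first and then
    renamed, so a reader never sees a half-written report.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(render_table(title, header, rows), encoding="utf-8")
    tmp.replace(path)
    return path
```

Re-running `write_report` from a scheduled job overwrites the previous page, which covers the "anything I need to repeat" case without needing a webservice.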
>
> AntiCompositeNumber
>
> On Tue, Dec 22, 2020 at 4:47 PM Mat Kelcey <[email protected]> wrote:
> >
> > Thanks Nicholas for the response, apologies this isn't threaded, I was
> > subscribed only to a daily digest.
> >
> > Here's a version of the notebook that (sometimes) shows the lost
> > connection problem.
> >
> > https://public.paws.wmcloud.org/User:Mat_kelcey/timeout%20and%20OOM%20repro.ipynb
> >
> > It either fails directly with OOM or we lose connection to the server; I
> > think it's as simple as it being just a long running query with a large
> > result set. I'm thinking perhaps PAWS just isn't right for these types of
> > queries? Not sure what tuning I can do, re: PAWS config or the query
> > itself, I think I just need to learn more about other execution
> > environments.
> >
> > In any case I have a way of running the query with minimal
> > postprocessing that doesn't OOM, that I can write to disk and download to
> > my local machine to play with. That's fine for now as I poke around with
> > the dataset.
> >
> > Cheers,
> > Mat
> >
> > > hi all!
> > >
> > > as part of task "Look into matching images of the same painting"
> > > https://phabricator.wikimedia.org/T131553
> > > i've been trying to reproduce some sql queries as described in
> > >
> > > https://github.com/multichill/toollabs/blob/master/bot/wikidata/find_painting_images.py
> > >
> > > whereas these scripts would usually be running under toolforge (or
> > > some other bot execution environment i'm not sure of), i've been
> > > finding that these long-running queries time out under PAWS
> > >
> > > does anyone have suggestions / examples for running queries such as
> > >
> > > http://tools.wmflabs.org/multichill/queries2/commons/paintings_without_wikidata_all.sql
> > > under PAWS?
> > >
> > > cheers,
> > > mat
> > _______________________________________________
> > Wikitech-l mailing list
> > [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
