Hi Joseph,
wow, interesting project! Would you be willing to share some example
input that you are trying this on? I might be up for looking into why
the program is slower on PyPy than on CPython, but I'd need a way to
actually run it with realistic input.
Cheers,
Carl Friedrich
On 13/02/2019 16:58, Joseph Reagle wrote:
> On 2/13/19 10:42 AM, René Dudfield wrote:
>> You can run it as a daemon/server (for example a little flask app).
>> This optimization also works for cpython apps if you want to avoid
>> the startup/import time.
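
For what it's worth, a minimal sketch of that daemon idea could look
like the following; process_files() is just a placeholder for whatever
the script's main() does today, and the port is arbitrary:

    # Daemon sketch: the interpreter and the imports stay warm across
    # runs, so each request only pays for the actual work.
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    def process_files(paths):
        # placeholder: parse and process the given XML files
        return {"processed": len(paths)}

    @app.route("/run", methods=["POST"])
    def run():
        paths = request.get_json()["paths"]
        return jsonify(process_files(paths))

    if __name__ == "__main__":
        app.run(port=5000)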
> That would be a big change for an uncertain improvement, so I'm not
> willing to go there yet. Performance is okay, but I want to see if I
> can improve it further as a stand-alone program.
>> Can the work be split up per XML file easily? Then perhaps
>> multiprocessing will work nicely for you.
>>
>> Do you need to process all the files each time? Or can you avoid
>> work?
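
A minimal sketch of that per-file split, assuming the files really are
independent; parse_and_process() is a stand-in for the real work, and
the worker deliberately returns something small, since results get
pickled back to the parent process:

    import sys
    import multiprocessing
    import xml.etree.ElementTree as ET

    def parse_and_process(path):
        tree = ET.parse(path)
        # stand-in result; len() counts the root element's children
        return path, len(tree.getroot())

    if __name__ == "__main__":
        # one task per XML file given on the command line
        with multiprocessing.Pool() as pool:
            results = pool.map(parse_and_process, sys.argv[1:])
        print(results)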
> I've tried multiprocessing too, but it was slower. Parsing the XML
> can be done in parallel, but I suspect the overhead of
> multiprocessing was the drag. I've also thought about caching the XML
> parse trees, but serializing and reloading pickles of unchanged parse
> trees seems slower than just parsing the XML anew. Using lru_cache on
> text processing functions (e.g., removing accents) didn't help
> either.
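
For reference, an accent-stripping helper of the kind mentioned often
looks something like this sketch (remove_accents is illustrative); an
lru_cache on it only pays off if the same strings recur frequently:

    import unicodedata
    from functools import lru_cache

    @lru_cache(maxsize=None)
    def remove_accents(text):
        # decompose accented characters, then drop the combining marks
        decomposed = unicodedata.normalize("NFKD", text)
        return "".join(c for c in decomposed
                       if not unicodedata.combining(c))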
> I haven't been able to find good examples of people using
> multiprocessing or PyPy for XML processing; perhaps this is why.
> Thank you all for the suggestions!
_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
https://mail.python.org/mailman/listinfo/pypy-dev