yannvgn <h...@yannvgn.io> added the comment:
> Indeed, it was not expected that the character set contains hundreds of
> thousands items. What is its size in your real code?

> Could you please show benchmarking results for different implementations and
> different sizes?

I can't precisely answer that, but sacremoses (a tokenization package) for example is strongly impacted. See https://github.com/alvations/sacremoses/issues/61#issuecomment-516401853

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue37723>
_______________________________________