yannvgn <h...@yannvgn.io> added the comment:

> Indeed, it was not expected that the character set contains hundreds of 
> thousands items. What is its size in your real code?

> Could you please show benchmarking results for different implementations and 
> different sizes?

I can't give a precise size, but sacremoses (a tokenization package), for 
example, is strongly impacted. See 
https://github.com/alvations/sacremoses/issues/61#issuecomment-516401853
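
For the benchmarking request, a minimal sketch of how compile time could be 
measured for different character set sizes is below. It is only an 
illustration, not the benchmark used in this issue: the helper names 
(build_char_class, time_compile), the starting code point and the sizes are 
all assumptions chosen for the example.

```python
import re
import time

# Assumption: start at U+10000 so the generated characters avoid surrogates
# and regex metacharacters such as ']' or '\'.
BASE_CODE_POINT = 0x10000


def build_char_class(size):
    """Hypothetical helper: a character class with `size` distinct characters."""
    chars = "".join(chr(BASE_CODE_POINT + i) for i in range(size))
    return "[" + chars + "]"


def time_compile(size):
    """Return the compiled pattern and the seconds spent in re.compile()."""
    pattern = build_char_class(size)
    re.purge()  # clear the re module cache so the compile is actually timed
    start = time.perf_counter()
    compiled = re.compile(pattern)
    elapsed = time.perf_counter() - start
    return compiled, elapsed


if __name__ == "__main__":
    # Illustrative sizes only; real-world sets may be much larger.
    for size in (100, 1000, 10000):
        _, elapsed = time_compile(size)
        print(f"{size:>6} items: {elapsed:.4f} s")
```

Running the same script under different interpreter versions (or patched 
builds) would give comparable numbers per size.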

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue37723>
_______________________________________