On 22Oct2018 0413, Victor Stinner wrote:
> For code like "for name in os.listdir(): open(name): ...." (replace listdir with scandir if you want to get file metadata), the cache is useless, since the fresh string has to be converted to wchar_t* anyway, and the cache is destroyed at the end of the loop iteration, whereas the cache has never been used...
Agreed the cache is useless here, but since the listdir() result came in as wchar_t we could keep it that way (assuming we'd only be changing it to char), and then there wouldn't have to be a conversion when we immediately pass it back to open().
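For reference, the single-use pass-through pattern being discussed looks roughly like the sketch below. The function name read_all is purely illustrative and not from this thread. The point is only that each name produced by the directory listing is handed straight back to open() once and then dropped, so any cached converted form of the string would never be reused.

```python
import os


def read_all(directory):
    """Read every regular file in `directory`, using each name exactly once.

    Hypothetical helper for illustration only.
    """
    contents = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                # The path goes straight back to the OS via open();
                # the str object is not used again after this iteration.
                with open(entry.path, "rb") as f:
                    contents[entry.name] = f.read()
    return contents
```

Nothing here depends on how the interpreter stores the string internally; it only shows the access pattern in which a per-string conversion cache gets no second use.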
That said, I spent some time yesterday converting the importlib cache to use scandir and separate caches for dir/file (to avoid the stat calls) and it made very little overall difference. I have to assume the string manipulation dominates. (Making DirEntry lazily calculate its .path had a bigger impact. Also, I didn't try to make Windows flush its own stat cache, and accessing warm files is much faster than cold ones.)
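As a rough illustration of the kind of change described above (this is not the actual importlib experiment, and the class name DirectoryCache and its methods are hypothetical), a per-directory cache can be filled from a single os.scandir() pass and split into separate sets for directories and files, so that later lookups do not need their own stat call. The sketch ignores case-insensitive file systems and cache invalidation.

```python
import os


class DirectoryCache:
    """Hypothetical per-directory cache with separate dir and file sets."""

    def __init__(self, path):
        self.path = path
        self.dirs = set()
        self.files = set()
        self.refresh()

    def refresh(self):
        """Rebuild both sets from one scandir() pass over the directory."""
        dirs, files = set(), set()
        try:
            with os.scandir(self.path) as entries:
                for entry in entries:
                    # is_dir() can usually be answered from data scandir
                    # already fetched, so no extra stat call is needed
                    # (symlinks are the main exception).
                    if entry.is_dir():
                        dirs.add(entry.name)
                    else:
                        files.add(entry.name)
        except OSError:
            # Treat a missing or unreadable directory as empty.
            pass
        self.dirs, self.files = dirs, files

    def has_file(self, name):
        return name in self.files

    def has_dir(self, name):
        return name in self.dirs
```

A lookup such as DirectoryCache(path).has_file("spam.py") then becomes a set membership test rather than a file system query.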
> I'm not saying that the cache is useless. I just doubt that it's so common that it really provides any performance benefit.
I think that it is mostly useless, but if we can transparently keep many strings at their "native" size, that will handle many of the useful cases, such as the single-use pass-through scenario above.
Cheers,
Steve