Re: Pandas string type

2020-06-18 Thread Adam Lippai
Thanks for the detailed answer. It's indeed 5-10% faster with the correct arguments you provided, but the performance is still far from the categorical-type-based solution. I'll follow the linked pandas issue. I'm not a C++ developer, but I'll be happy to test, benchmark, or add docs.

Best regards,
Adam Lippai
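The gap against the categorical-based path is consistent with how a categorical is laid out: an array of small integer codes plus one copy of each distinct string. Converting a low-cardinality column that way only has to create a handful of Python string objects rather than one per row. Below is a minimal pure-Python sketch of that dictionary encoding. `dictionary_encode` and `decode` are hypothetical helper names used for illustration, not pandas or pyarrow APIs.

```python
# Hypothetical illustration: dictionary-encode a list of strings the way a
# categorical stores them (integer codes + one copy of each unique value).
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[int], List[str]]:
    """Return (codes, categories) so that categories[codes[i]] == values[i]."""
    categories: List[str] = []
    index_of: Dict[str, int] = {}  # unique value -> its code
    codes: List[int] = []
    for v in values:
        code = index_of.get(v)
        if code is None:
            # First time we see this value: assign the next code.
            code = len(categories)
            index_of[v] = code
            categories.append(v)
        codes.append(code)
    return codes, categories


def decode(codes: List[int], categories: List[str]) -> List[str]:
    """Materialize the full column again, one reference per row."""
    return [categories[c] for c in codes]


if __name__ == "__main__":
    column = ["red", "green", "red", "blue", "green", "red"] * 10_000
    codes, categories = dictionary_encode(column)
    print(len(column), "rows,", len(categories), "distinct strings")
    print(categories)
    assert decode(codes, categories) == column
```

A pandas categorical exposes the same two pieces as `Series.cat.codes` and `Series.cat.categories`.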

Re: Pandas string type

2020-06-18 Thread Joris Van den Bossche
Hi Adam,

On Wed, 17 Jun 2020 at 13:07, Adam Lippai wrote:
> Hi,
>
> I was reading https://wesmckinney.com/blog/high-perf-arrow-to-pandas/ where
> Wes writes
>
> > "string or binary data would come with additional overhead while pandas
> > continues to use Python objects in its memory represent
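Wes's point, quoted above, is that pandas has traditionally stored strings in an object-dtype NumPy array in which each element is a pointer to a separate Python `str`. The snippet below, assuming pandas is installed, makes that visible. The fruit values are arbitrary sample data.

```python
# Sketch, assuming pandas is installed: with the default object dtype every
# element of a string column is a separate Python str referenced by pointer.
import pandas as pd

s = pd.Series(["apple", "banana", "cherry"], dtype=object)

# The backing NumPy array holds object pointers, not packed character data.
values = s.to_numpy()
print(values.dtype)     # object
print(type(values[0]))  # <class 'str'>

# Shallow memory counts only the pointers; deep memory also walks each
# Python string object, which is where the per-element overhead shows up.
shallow = s.memory_usage(index=False, deep=False)
deep = s.memory_usage(index=False, deep=True)
print(shallow, deep)
```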

Pandas string type

2020-06-17 Thread Adam Lippai
Hi,

I was reading https://wesmckinney.com/blog/high-perf-arrow-to-pandas/ where Wes writes:

> "string or binary data would come with additional overhead while pandas
> continues to use Python objects in its memory representation"

Pandas 1.0 introduced StringDtype, which I thought could help with
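For reference, opting in to the dedicated string dtype looks like this, assuming pandas >= 1.0 is installed. Note that in pandas 1.0 the array behind `StringDtype` still holds Python `str` objects, so the dtype by itself does not remove the per-element overhead Wes describes.

```python
# Sketch, assuming pandas >= 1.0: compare object dtype with the string dtype.
import pandas as pd

obj = pd.Series(["a", None, "c"], dtype=object)
txt = pd.Series(["a", None, "c"], dtype="string")  # alias for pd.StringDtype()

print(obj.dtype)                              # object
print(isinstance(txt.dtype, pd.StringDtype))  # True
print(txt.isna().tolist())                    # [False, True, False]
print(txt.iloc[1] is pd.NA)                   # True: missing values become pd.NA
```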