Thanks for the detailed answer.
It's indeed 5-10% faster with the correct arguments you provided, but the
performance is still far from that of the categorical-based solution.
I'll track the linked pandas issue. I'm not a C++ dev, but I'll be happy to
test, benchmark or add docs.
Best regards,
Adam Lippai
Hi Adam,
On Wed, 17 Jun 2020 at 13:07, Adam Lippai wrote:
> Hi,
>
> I was reading https://wesmckinney.com/blog/high-perf-arrow-to-pandas/
> where Wes writes
>
> > "string or binary data would come with additional overhead while pandas
> > continues to use Python objects in its memory represent
Hi,
I was reading https://wesmckinney.com/blog/high-perf-arrow-to-pandas/ where
Wes writes
> "string or binary data would come with additional overhead while pandas
> continues to use Python objects in its memory representation"
Pandas 1.0 introduced StringDtype, which I thought could help with