Re: On adding applyInArrow to groupBy and cogroup

2023-11-06 Thread Hyukjin Kwon
Luca *From:* Enrico Minack *Sent:* Thursday, October 26, 2023 15:33 *To:* dev *Subject:* On adding applyInArrow to groupBy and cogroup Hi devs, PySpark allows to transform a DataFrame

Re: On adding applyInArrow to groupBy and cogroup

2023-11-03 Thread Abdeali Kothari
aCanali/Miscellaneous/blob/master/Spark_Notes/Spark_MapInArrow.md ) Cheers, Luca *From:* Enrico Minack *Sent:* Thursday, October 26, 2023 15:33 *To:* dev *Subject:* On adding applyInArrow to groupBy and cogroup

RE: On adding applyInArrow to groupBy and cogroup

2023-11-03 Thread Luca Canali
be found at https://github.com/LucaCanali/Miscellaneous/blob/master/Spark_Notes/Spark_MapInArrow.md ) Cheers, Luca From: Enrico Minack Sent: Thursday, October 26, 2023 15:33 To: dev Subject: On adding applyInArrow to groupBy and cogroup Hi devs, PySpark allows to transform a DataFrame via

Re: On adding applyInArrow to groupBy and cogroup

2023-10-28 Thread Adam Binford
I'm definitely +1 to include this.
- It seems like an odd feature parity gap to have a map function but no group apply function.
- There's currently no way to use large Arrow types with applyInPandas, which can lead to errors hitting the 2 GiB max string/binary array size. I have a PR to Arrow
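
The 2 GiB figure mentioned above follows from the Arrow columnar layout: `string` and `binary` arrays index their data buffer with signed 32-bit offsets, while `large_string` and `large_binary` use signed 64-bit offsets. The sketch below only models that offset arithmetic in plain Python; it does not call Arrow or Spark, and the helper name `fits_in_offsets` and the example sizes are hypothetical.

```python
# Illustrative only: models Arrow's offset arithmetic, does not call pyarrow.
INT32_MAX = 2**31 - 1   # largest offset in string / binary arrays
INT64_MAX = 2**63 - 1   # largest offset in large_string / large_binary arrays


def fits_in_offsets(value_lengths, max_offset):
    """Return True if the cumulative byte length stays within max_offset."""
    total = 0
    for length in value_lengths:
        total += length
        if total > max_offset:
            return False
    return True


# Hypothetical group: three 1 GiB binary values collected into one array.
group = [2**30, 2**30, 2**30]

print(fits_in_offsets(group, INT32_MAX))  # False
print(fits_in_offsets(group, INT64_MAX))  # True
```

A group whose values sum past the 32-bit bound cannot be held in a single `string`/`binary` array, which is the situation the large types are meant to cover.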

On adding applyInArrow to groupBy and cogroup

2023-10-26 Thread Enrico Minack
Hi devs,

PySpark allows transforming a DataFrame via both the Pandas *and* the Arrow API:

    df.mapInArrow(map_arrow, schema="...")
    df.mapInPandas(map_pandas, schema="...")

For df.groupBy(...) and df.groupBy(...).cogroup(...), there is *only* a Pandas interface, no Arrow interface: