be found at
https://github.com/LucaCanali/Miscellaneous/blob/master/Spark_Notes/Spark_MapInArrow.md
)
Cheers,
Luca
From: Enrico Minack
Sent: Thursday, October 26, 2023 15:33
To: dev
Subject: On adding applyInArrow to groupBy and cogroup
Hi devs,
PySpark allows transforming a DataFrame via
I'm definitely +1 to include this.
- It seems like an odd feature parity gap to have a map function but no
group apply function.
- There's currently no way to use large Arrow types with applyInPandas,
which can lead to errors when hitting the 2 GiB max string/binary array size. I
have a PR to Arrow
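
For context on the 2 GiB limit mentioned above: Arrow's regular string and binary types index their data buffer with 32-bit signed offsets, while the large variants use 64-bit offsets. The sketch below is only back-of-the-envelope arithmetic; the helper name `offset_width_needed` and the example row sizes are hypothetical, and it ignores nulls and any chunking a real writer might apply.

```python
# Largest byte offset a 32-bit signed integer can address (just under 2 GiB).
INT32_MAX = 2**31 - 1


def offset_width_needed(row_count: int, bytes_per_row: int) -> int:
    """Return 32 if the values fit in one regular string/binary array, else 64.

    Hypothetical helper for illustration: it assumes every row holds exactly
    `bytes_per_row` bytes and that all rows end up in a single array.
    """
    total_bytes = row_count * bytes_per_row
    return 32 if total_bytes <= INT32_MAX else 64


ONE_MIB = 1024 * 1024

# 2000 rows of 1 MiB each stay below the limit.
print(offset_width_needed(2000, ONE_MIB))
# 3000 rows of 1 MiB each exceed it and would need the large (64-bit) types.
print(offset_width_needed(3000, ONE_MIB))
```

A single group with a few thousand moderately large strings is enough to cross the threshold, which is why grouped operations are more exposed to it than row-wise maps.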
Hi devs,
PySpark allows transforming a |DataFrame| via the Pandas *and* Arrow APIs:
df.mapInArrow(map_arrow, schema="...")
df.mapInPandas(map_pandas, schema="...")
For |df.groupBy(...)| and |df.groupBy(...).cogroup(...)|, there is
*only* a Pandas interface, no Arrow interface: