Re: Use Arrow instead of Pickle without pandas_udf
Thanks Bryan for the pointer +1 Hichame From: cutl...@gmail.com Sent: July 30, 2018 6:40 PM To: hich...@elkhalfi.com Cc: hol...@pigscanfly.ca; user@spark.apache.org Subject: Re: Use Arrow instead of Pickle without pandas_udf Here is a link to the JIRA for adding StructType support for scalar pandas_udf https://issues.apache.org/jira/browse/SPARK-24579 On Wed, Jul 25, 2018 at 3:36 PM, Hichame El Khalfi mailto:hich...@elkhalfi.com>> wrote: Hey Holden, Thanks for your reply, We currently using a python function that produces a Row(TS=LongType(), bin=BinaryType()). We use this function like this dataframe.rdd.map(my_function).toDF().write.parquet() To reuse it in pandas_udf, we changes the return type to StructType(StructField(Long), StructField(BinaryType). 1)But we face an issue that StructType is not supported by pandas_udf. So I was wondering to still continue to reuse dataftame.rdd.map but get an improvement in serialization by using ArrowFormat instead of Pickle. From: hol...@pigscanfly.ca<mailto:hol...@pigscanfly.ca> Sent: July 25, 2018 4:41 PM To: hich...@elkhalfi.com<mailto:hich...@elkhalfi.com> Cc: user@spark.apache.org<mailto:user@spark.apache.org> Subject: Re: Use Arrow instead of Pickle without pandas_udf Not currently. What's the problem with pandas_udf for your use case? On Wed, Jul 25, 2018 at 1:27 PM, Hichame El Khalfi mailto:hich...@elkhalfi.com>> wrote: Hi There, Is there a way to use Arrow format instead of Pickle but without using pandas_udf ? Thank for your help, Hichame -- Twitter: https://twitter.com/holdenkarau
Re: Use Arrow instead of Pickle without pandas_udf
Here is a link to the JIRA for adding StructType support for scalar pandas_udf https://issues.apache.org/jira/browse/SPARK-24579 On Wed, Jul 25, 2018 at 3:36 PM, Hichame El Khalfi wrote: > Hey Holden, > Thanks for your reply, > > We currently using a python function that produces a Row(TS=LongType(), > bin=BinaryType()). > We use this function like this dataframe.rdd.map(my_function) > .toDF().write.parquet() > > To reuse it in pandas_udf, we changes the return type to > StructType(StructField(Long), StructField(BinaryType). > > 1)But we face an issue that StructType is not supported by pandas_udf. > > So I was wondering to still continue to reuse dataftame.rdd.map but get an > improvement in serialization by using ArrowFormat instead of Pickle. > > *From:* hol...@pigscanfly.ca > *Sent:* July 25, 2018 4:41 PM > *To:* hich...@elkhalfi.com > *Cc:* user@spark.apache.org > *Subject:* Re: Use Arrow instead of Pickle without pandas_udf > > Not currently. What's the problem with pandas_udf for your use case? > > On Wed, Jul 25, 2018 at 1:27 PM, Hichame El Khalfi > wrote: > >> Hi There, >> >> >> Is there a way to use Arrow format instead of Pickle but without using >> pandas_udf ? >> >> >> Thank for your help, >> >> >> Hichame >> > > > > -- > Twitter: https://twitter.com/holdenkarau >
Re: Use Arrow instead of Pickle without pandas_udf
Hey Holden, Thanks for your reply, We currently using a python function that produces a Row(TS=LongType(), bin=BinaryType()). We use this function like this dataframe.rdd.map(my_function).toDF().write.parquet() To reuse it in pandas_udf, we changes the return type to StructType(StructField(Long), StructField(BinaryType). 1)But we face an issue that StructType is not supported by pandas_udf. So I was wondering to still continue to reuse dataftame.rdd.map but get an improvement in serialization by using ArrowFormat instead of Pickle. From: hol...@pigscanfly.ca Sent: July 25, 2018 4:41 PM To: hich...@elkhalfi.com Cc: user@spark.apache.org Subject: Re: Use Arrow instead of Pickle without pandas_udf Not currently. What's the problem with pandas_udf for your use case? On Wed, Jul 25, 2018 at 1:27 PM, Hichame El Khalfi mailto:hich...@elkhalfi.com>> wrote: Hi There, Is there a way to use Arrow format instead of Pickle but without using pandas_udf ? Thank for your help, Hichame -- Twitter: https://twitter.com/holdenkarau
Re: Use Arrow instead of Pickle without pandas_udf
Not currently. What's the problem with pandas_udf for your use case? On Wed, Jul 25, 2018 at 1:27 PM, Hichame El Khalfi wrote: > Hi There, > > > Is there a way to use Arrow format instead of Pickle but without using > pandas_udf ? > > > Thank for your help, > > > Hichame > -- Twitter: https://twitter.com/holdenkarau
Use Arrow instead of Pickle without pandas_udf
Hi There, Is there a way to use Arrow format instead of Pickle but without using pandas_udf ? Thank for your help, Hichame