Re: Use Arrow instead of Pickle without pandas_udf

2018-07-31 Thread Hichame El Khalfi
Thanks Bryan for the pointer +1

Hichame

From: cutl...@gmail.com
Sent: July 30, 2018 6:40 PM
To: hich...@elkhalfi.com
Cc: hol...@pigscanfly.ca; user@spark.apache.org
Subject: Re: Use Arrow instead of Pickle without pandas_udf


Here is a link to the JIRA for adding StructType support for scalar pandas_udf 
https://issues.apache.org/jira/browse/SPARK-24579


On Wed, Jul 25, 2018 at 3:36 PM, Hichame El Khalfi 
mailto:hich...@elkhalfi.com>> wrote:
Hey Holden,
Thanks for your reply,

We currently using a python function that produces a Row(TS=LongType(), 
bin=BinaryType()).
We use this function like this 
dataframe.rdd.map(my_function).toDF().write.parquet()

To reuse it in pandas_udf, we changes the return type to 
StructType(StructField(Long), StructField(BinaryType).

1)But we face an issue that StructType is not supported by pandas_udf.

So I was wondering to still continue to reuse dataftame.rdd.map but get an 
improvement in serialization by using ArrowFormat instead of Pickle.

From: hol...@pigscanfly.ca<mailto:hol...@pigscanfly.ca>
Sent: July 25, 2018 4:41 PM
To: hich...@elkhalfi.com<mailto:hich...@elkhalfi.com>
Cc: user@spark.apache.org<mailto:user@spark.apache.org>
Subject: Re: Use Arrow instead of Pickle without pandas_udf


Not currently. What's the problem with pandas_udf for your use case?

On Wed, Jul 25, 2018 at 1:27 PM, Hichame El Khalfi 
mailto:hich...@elkhalfi.com>> wrote:

Hi There,


Is there a way to use Arrow format instead of Pickle but without using 
pandas_udf ?


Thank for your help,


Hichame



--
Twitter: https://twitter.com/holdenkarau



Re: Use Arrow instead of Pickle without pandas_udf

2018-07-30 Thread Bryan Cutler
Here is a link to the JIRA for adding StructType support for scalar
pandas_udf https://issues.apache.org/jira/browse/SPARK-24579


On Wed, Jul 25, 2018 at 3:36 PM, Hichame El Khalfi 
wrote:

> Hey Holden,
> Thanks for your reply,
>
> We currently using a python function that produces a Row(TS=LongType(),
> bin=BinaryType()).
> We use this function like this dataframe.rdd.map(my_function)
> .toDF().write.parquet()
>
> To reuse it in pandas_udf, we changes the return type to
> StructType(StructField(Long), StructField(BinaryType).
>
> 1)But we face an issue that StructType is not supported by pandas_udf.
>
> So I was wondering to still continue to reuse dataftame.rdd.map but get an
> improvement in serialization by using ArrowFormat instead of Pickle.
>
> *From:* hol...@pigscanfly.ca
> *Sent:* July 25, 2018 4:41 PM
> *To:* hich...@elkhalfi.com
> *Cc:* user@spark.apache.org
> *Subject:* Re: Use Arrow instead of Pickle without pandas_udf
>
> Not currently. What's the problem with pandas_udf for your use case?
>
> On Wed, Jul 25, 2018 at 1:27 PM, Hichame El Khalfi 
> wrote:
>
>> Hi There,
>>
>>
>> Is there a way to use Arrow format instead of Pickle but without using
>> pandas_udf ?
>>
>>
>> Thank for your help,
>>
>>
>> Hichame
>>
>
>
>
> --
> Twitter: https://twitter.com/holdenkarau
>


Re: Use Arrow instead of Pickle without pandas_udf

2018-07-25 Thread Hichame El Khalfi
Hey Holden,
Thanks for your reply,

We currently using a python function that produces a Row(TS=LongType(), 
bin=BinaryType()).
We use this function like this 
dataframe.rdd.map(my_function).toDF().write.parquet()

To reuse it in pandas_udf, we changes the return type to 
StructType(StructField(Long), StructField(BinaryType).

1)But we face an issue that StructType is not supported by pandas_udf.

So I was wondering to still continue to reuse dataftame.rdd.map but get an 
improvement in serialization by using ArrowFormat instead of Pickle.

From: hol...@pigscanfly.ca
Sent: July 25, 2018 4:41 PM
To: hich...@elkhalfi.com
Cc: user@spark.apache.org
Subject: Re: Use Arrow instead of Pickle without pandas_udf


Not currently. What's the problem with pandas_udf for your use case?

On Wed, Jul 25, 2018 at 1:27 PM, Hichame El Khalfi 
mailto:hich...@elkhalfi.com>> wrote:

Hi There,


Is there a way to use Arrow format instead of Pickle but without using 
pandas_udf ?


Thank for your help,


Hichame



--
Twitter: https://twitter.com/holdenkarau


Re: Use Arrow instead of Pickle without pandas_udf

2018-07-25 Thread Holden Karau
Not currently. What's the problem with pandas_udf for your use case?

On Wed, Jul 25, 2018 at 1:27 PM, Hichame El Khalfi 
wrote:

> Hi There,
>
>
> Is there a way to use Arrow format instead of Pickle but without using
> pandas_udf ?
>
>
> Thank for your help,
>
>
> Hichame
>



-- 
Twitter: https://twitter.com/holdenkarau


Use Arrow instead of Pickle without pandas_udf

2018-07-25 Thread Hichame El Khalfi
Hi There,


Is there a way to use Arrow format instead of Pickle but without using 
pandas_udf ?


Thank for your help,


Hichame