Re: Run Python User Defined Functions / code in Spark with Scala Codebase

2018-07-16 Thread Gourav Sengupta
Hi,

I am not sure whether Spark DataFrames apply to your use case. If they do,
please try creating a UDF in Python and check whether you can call it from
Scala using select and expr.
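Conceptually, this works because Spark keeps one SQL function catalog per session, shared by the Python and Scala frontends: a UDF registered by name from Python (spark.udf.register) can then be resolved by name from Scala through expr/selectExpr. Below is a toy, cluster-free sketch of that name-based registry idea (plain Python, not the Spark API; double_it and the row layout are made up for illustration):

```python
# Toy sketch of Spark's cross-language UDF mechanism: a session-level
# name -> function registry. In real Spark, spark.udf.register("double_it", f)
# on the Python side makes "double_it" resolvable from Scala via
# expr("double_it(amount)"), because both frontends share the SQL catalog.
udf_registry = {}

def register(name, func):
    """Mimics spark.udf.register: expose a Python function under a SQL name."""
    udf_registry[name] = func

def select_expr(rows, expression):
    """Tiny stand-in for df.selectExpr("fn(colname)") on a list of row dicts."""
    fn_name, col = expression.rstrip(")").split("(")
    func = udf_registry[fn_name]          # name-based lookup, like the catalog
    return [func(row[col]) for row in rows]

register("double_it", lambda x: x * 2)
rows = [{"amount": 3}, {"amount": 5}]
print(select_expr(rows, "double_it(amount)"))  # -> [6, 10]
```

The key point is that the lookup is by name at expression-resolution time, which is why the registering language and the calling language can differ.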

Regards,
Gourav Sengupta



Re: Run Python User Defined Functions / code in Spark with Scala Codebase

2018-07-15 Thread Chetan Khatri
Hello Jayant,

Thanks for the great OSS contribution :)



Re: Run Python User Defined Functions / code in Spark with Scala Codebase

2018-07-12 Thread Jayant Shekhar
Hello Chetan,

Sorry, I missed replying earlier. You can find some sample code here:

http://sparkflows.readthedocs.io/en/latest/user-guide/python/pipe-python.html

We will continue adding more there.

Feel free to ping me directly in case of questions.

Thanks,
Jayant




Re: Run Python User Defined Functions / code in Spark with Scala Codebase

2018-07-09 Thread Chetan Khatri
Hello Jayant,

Thank you so much for the suggestion. My idea was to use a Python function as a
transformation that takes a couple of column names and returns an object, as
you explained. Would it be possible to point me to a similar codebase example?

Thanks.



Re: Run Python User Defined Functions / code in Spark with Scala Codebase

2018-07-05 Thread Jayant Shekhar
Hello Chetan,

We have currently done it with .pipe(.py), as Prem suggested.

That passes the RDD rows as CSV strings to the Python script. The Python script
can either process the input line by line, build the result, and return it, or
load the rows into something like a pandas DataFrame for processing and finally
write the results back.

In the Spark/Scala/Java code, you get back an RDD of strings, which we convert
to a DataFrame.
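A minimal sketch of the script side of this pattern, assuming a hypothetical two-column CSV layout (id,value); Spark streams each partition's rows to the script's stdin and collects its stdout lines back as an RDD of strings:

```python
# Hypothetical transform.py for use with rdd.pipe(). Assumes each input line
# is "id,value" CSV; doubles the value column as a stand-in transformation.
import sys

def transform(line):
    """Process one CSV line: double the 'value' column (illustrative only)."""
    ident, value = line.strip().split(",")
    return f"{ident},{int(value) * 2}"

if __name__ == "__main__":
    for line in sys.stdin:          # Spark pipes each row as a line of text
        if line.strip():
            print(transform(line))  # stdout lines become the output RDD
```

On the Scala side this would be invoked roughly as rdd.pipe("python transform.py"), after which the returned RDD of strings can be split on commas and converted back to a DataFrame.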

Feel free to ping me directly in case of questions.

Thanks,
Jayant




Re: Run Python User Defined Functions / code in Spark with Scala Codebase

2018-07-05 Thread Chetan Khatri
Prem, sure. Thanks for the suggestion.



Re: Run Python User Defined Functions / code in Spark with Scala Codebase

2018-07-04 Thread Prem Sure
Try .pipe(.py) on the RDD.

Thanks,
Prem
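For intuition, rdd.pipe() behaves much like feeding rows to an external process's stdin and reading its stdout back as the new RDD's rows; the sketch below simulates that with subprocess (the inline uppercasing child script is a hypothetical stand-in for a real transform .py):

```python
# Simulates what rdd.pipe("python transform.py") does: stream the partition's
# rows to an external script's stdin, read its stdout back as the output rows.
import subprocess
import sys

# Inline child script: uppercase each input line (stand-in for a real .py file).
child = "import sys\nfor l in sys.stdin:\n    print(l.strip().upper())"
rows = ["spark", "scala", "python"]

proc = subprocess.run(
    [sys.executable, "-c", child],        # stands in for "python transform.py"
    input="\n".join(rows) + "\n",
    capture_output=True, text=True, check=True,
)
piped = proc.stdout.splitlines()          # one output row per stdout line
print(piped)                              # -> ['SPARK', 'SCALA', 'PYTHON']
```

In Spark the same handoff happens per partition, and the script must be available on every executor (e.g. shipped with --files).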



Re: Run Python User Defined Functions / code in Spark with Scala Codebase

2018-07-04 Thread Chetan Khatri
Can someone please suggest an approach? Thanks.



Run Python User Defined Functions / code in Spark with Scala Codebase

2018-07-03 Thread Chetan Khatri
Hello Dear Spark User / Dev,

I would like to pass a Python user-defined function to a Spark job developed in
Scala, and have the return value of that function available to the DataFrame /
Dataset API.

Can someone please guide me on the best approach to do this? The Python
function would mostly be a transformation function. I would also like to pass a
Java function as a string to the Spark / Scala job, have it applied to an
RDD / DataFrame, and get an RDD / DataFrame back.

Thank you.