Hello Jayant,

Thank you so much for the suggestion. My idea was to use a Python function as
a transformation which takes a couple of column names and returns an object,
as you explained. Would it be possible to point me to a similar codebase
example?

Thanks.
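For what it's worth, here is a minimal sketch of the worker-side script that
the `.pipe(...)` approach Jayant describes would invoke. The script name
`transform.py`, the two-input-column layout, and the concatenation logic are
illustrative assumptions on my part, not taken from any actual codebase:

```python
#!/usr/bin/env python
# transform.py -- invoked from the Scala job via something like
#   rdd.pipe("./transform.py")
# Spark feeds each RDD element (a CSV string) to stdin, one per line,
# and collects whatever the script prints on stdout as an RDD[String].
import sys


def transform(col_a, col_b):
    """Placeholder transformation: concatenate two column values.
    The real logic would be whatever the Python function should compute."""
    return "%s_%s" % (col_a.strip(), col_b.strip())


def main(lines, out):
    for line in lines:
        fields = line.rstrip("\n").split(",")
        # Assumption: the first two CSV fields are the columns the
        # transformation needs.
        result = transform(fields[0], fields[1])
        # Emit the original fields plus the derived column as CSV again,
        # so the Scala side can split the strings and rebuild a DataFrame.
        out.write(",".join(fields + [result]) + "\n")


if __name__ == "__main__" and not sys.stdin.isatty():
    # The isatty() guard keeps an interactive import from blocking on stdin.
    main(sys.stdin, sys.stdout)
```

On the Scala side this would be wired up roughly as
`df.rdd.map(_.mkString(",")).pipe("./transform.py")`, after which the returned
strings can be split and converted back to a DataFrame; the exact schema
handling depends on your job.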

On Fri, Jul 6, 2018 at 2:56 AM, Jayant Shekhar <jayantbaya...@gmail.com>
wrote:

> Hello Chetan,
>
> We have currently done it with .pipe(.py) as Prem suggested.
>
> That passes the RDD to the Python script as CSV strings. The Python script
> can either process the input line by line, build the result, and write it
> back, or load it into something like a Pandas DataFrame for processing and
> finally write the results back.
>
> In the Spark/Scala/Java code, you get back an RDD of strings, which we
> convert back to a DataFrame.
>
> Feel free to ping me directly in case of questions.
>
> Thanks,
> Jayant
>
>
> On Thu, Jul 5, 2018 at 3:39 AM, Chetan Khatri <chetan.opensou...@gmail.com
> > wrote:
>
>> Sure Prem, thanks for the suggestion.
>>
>> On Wed, Jul 4, 2018 at 8:38 PM, Prem Sure <sparksure...@gmail.com> wrote:
>>
>>> try .pipe(.py) on RDD
>>>
>>> Thanks,
>>> Prem
>>>
>>> On Wed, Jul 4, 2018 at 7:59 PM, Chetan Khatri <
>>> chetan.opensou...@gmail.com> wrote:
>>>
>>>> Can someone please suggest an approach? Thanks.
>>>>
>>>> On Tue 3 Jul, 2018, 5:28 PM Chetan Khatri, <chetan.opensou...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello Dear Spark User / Dev,
>>>>>
>>>>> I would like to pass a Python user-defined function to a Spark job
>>>>> developed in Scala, and have the return value of that function available
>>>>> through the DataFrame / Dataset API.
>>>>>
>>>>> Can someone please guide me on the best approach to do this? The Python
>>>>> function would mostly be a transformation function. I would also like to
>>>>> pass a Java function as a string to the Spark / Scala job, so that it is
>>>>> applied to an RDD / DataFrame and returns an RDD / DataFrame.
>>>>>
>>>>> Thank you.
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>
>
