Re: Run Python User Defined Functions / code in Spark with Scala Codebase
Hi,

I am not very sure whether Spark DataFrames apply to your use case. If they do, please try creating a UDF in Python and check whether you can call it from Scala using select and expr.

Regards,
Gourav Sengupta

On Mon, Jul 16, 2018 at 5:32 AM, Chetan Khatri wrote:
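A minimal sketch of Gourav's suggestion, assuming the PySpark and Scala sides actually share one SparkSession (e.g. through Livy or a shared Zeppelin session); the function name, column name, and return type below are illustrative, not from the thread:

```python
# Plain Python transformation to expose as a UDF. Registering it places it in
# the session's SQL function registry under the given name, which is where
# Scala's expr(...) resolves function names.
def normalize(s):
    """Trim whitespace and lower-case a string column value."""
    return s.strip().lower()

# Python side (needs a live SparkSession; shown for shape only):
#   from pyspark.sql.types import StringType
#   spark.udf.register("normalize", normalize, StringType())
#
# Scala side, in the same session:
#   import org.apache.spark.sql.functions.expr
#   df.select(expr("normalize(name)"))
```

Whether this works end to end depends on both languages genuinely sharing one session; in two separate driver processes the function registry is not shared.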
Re: Run Python User Defined Functions / code in Spark with Scala Codebase
Hello Jayant,

Thanks for the great OSS contribution :)

On Thu, Jul 12, 2018 at 1:36 PM, Jayant Shekhar wrote:
On Wed, Jul 4, 2018 at 8:38 PM, Prem Sure wrote: > try .pipe(.py) on RDD > > Thanks, > Prem > > On Wed, Jul 4, 2018 at 7:59 PM, Chetan Khatri < > chetan.opensou...@gmail.com> wrote: > >> Can someone please suggest me , thanks >> >> On Tue 3 Jul, 2018, 5:28 PM Chetan Khatri, < >> chetan.opensou...@gmail.com> wrote: >> >>> Hello Dear Spark User / Dev, >>> >>> I would like to pass Python user defined function to Spark Job >>> developed using Scala and return value of that function would be >>> returned >>> to DF / Dataset API. >>> >>> Can someone please guide me, which would be best approach to do >>> this. Python function would be mostly transformation function. Also >>> would >>> like to pass Java Function as a String to Spark / Scala job and it >>> applies >>> to RDD / Data Frame and should return RDD / Data Frame. >>> >>> Thank you. >>> >>> >>> >>> > >>> >> >
Re: Run Python User Defined Functions / code in Spark with Scala Codebase
Hello Chetan,

Sorry, I missed replying earlier. You can find some sample code here:

http://sparkflows.readthedocs.io/en/latest/user-guide/python/pipe-python.html

We will continue adding more there.

Feel free to ping me directly in case of questions.

Thanks,
Jayant

On Mon, Jul 9, 2018 at 9:56 PM, Chetan Khatri wrote:
Re: Run Python User Defined Functions / code in Spark with Scala Codebase
Hello Jayant,

Thank you so much for the suggestion. My view was to use a Python function as a transformation which can take a couple of column names and return an object, which is what you explained. Would it be possible to point me to a similar codebase example?

Thanks.

On Fri, Jul 6, 2018 at 2:56 AM, Jayant Shekhar wrote:
Re: Run Python User Defined Functions / code in Spark with Scala Codebase
Hello Chetan,

We have currently done it with .pipe(.py), as Prem suggested.

That passes the RDD as CSV strings to the Python script. The Python script can either process it line by line, create the result, and return it back, or create something like a Pandas DataFrame for processing and finally write the results back.

In the Spark/Scala/Java code, you get back an RDD of strings, which we convert back to a DataFrame.

Feel free to ping me directly in case of questions.

Thanks,
Jayant

On Thu, Jul 5, 2018 at 3:39 AM, Chetan Khatri wrote:
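The line-by-line flow described above can be sketched as a worker script; the file name, column layout, and the particular transformation are assumptions for illustration, not taken from the thread:

```python
#!/usr/bin/env python
# Hypothetical worker script (e.g. transform.py) for rdd.pipe("python transform.py").
# Each RDD element arrives as one CSV line on stdin; each line written to
# stdout becomes one element of the resulting RDD back on the Scala side.
import csv
import io
import sys


def transform_row(fields):
    """Example transformation: upper-case the first column and
    double the numeric second column."""
    return [fields[0].upper(), str(int(fields[1]) * 2)]


def process(lines):
    """Turn an iterable of CSV input lines into CSV result lines."""
    for line in lines:
        row = next(csv.reader([line]))
        out = io.StringIO()
        csv.writer(out, lineterminator="").writerow(transform_row(row))
        yield out.getvalue()


if __name__ == "__main__":
    for result in process(line.rstrip("\n") for line in sys.stdin):
        print(result)
```

On the Scala side this would be invoked roughly as rdd.pipe("python transform.py"), and the resulting RDD[String] parsed back into a DataFrame as Jayant describes.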
Re: Run Python User Defined Functions / code in Spark with Scala Codebase
Prem, sure. Thanks for the suggestion.

On Wed, Jul 4, 2018 at 8:38 PM, Prem Sure wrote:
Re: Run Python User Defined Functions / code in Spark with Scala Codebase
Try .pipe(.py) on the RDD.

Thanks,
Prem

On Wed, Jul 4, 2018 at 7:59 PM, Chetan Khatri wrote:
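RDD.pipe streams each element of a partition to an external process's stdin, one per line, and reads that process's stdout back as the elements of a new RDD. That contract can be exercised locally without a cluster; the doubling script below is a made-up stand-in for the .py you would pass to pipe:

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for the script passed to pipe: reads one element per stdin line,
# writes one result per stdout line.
CHILD_SCRIPT = """\
import sys
for line in sys.stdin:
    print(int(line) * 2)
"""


def pipe_through(elements, script_source):
    """Feed elements line by line to a Python child process and collect its
    stdout lines, mimicking the contract of rdd.pipe(...)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(script_source)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path],
            input="".join(f"{e}\n" for e in elements),
            capture_output=True,
            text=True,
            check=True,
        )
    finally:
        os.remove(path)
    return result.stdout.splitlines()
```

Against a real RDD the equivalent call would be rdd.pipe("python child.py"), from either the Scala or the Python API, yielding an RDD of strings.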
Re: Run Python User Defined Functions / code in Spark with Scala Codebase
Can someone please suggest? Thanks.

On Tue, 3 Jul 2018, 5:28 PM, Chetan Khatri wrote:
Run Python User Defined Functions / code in Spark with Scala Codebase
Hello Dear Spark User / Dev,

I would like to pass a Python user-defined function to a Spark job developed in Scala, with the return value of that function coming back to the DataFrame / Dataset API.

Can someone please guide me on the best approach to do this? The Python function would mostly be a transformation function. I would also like to pass a Java function as a string to the Spark/Scala job, so that it applies to an RDD / DataFrame and returns an RDD / DataFrame.

Thank you.