Re: [PySpark] [SparkR] Is it possible to invoke a PySpark function with a SparkR DataFrame?

2019-07-16 Thread Felix Cheung
Not currently in Spark.

However, there are systems out there that can share DataFrames between
languages on top of Spark - they don’t call the Python UDF directly, but you
can pass the DataFrame to Python and then .map(UDF) over it there.
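
For example, in a notebook environment that shares one SparkSession across
language cells (Zeppelin and Databricks both work this way), the R side can
register the SparkR DataFrame as a temporary view with
createOrReplaceTempView(df, "shared_df"), and the Python side can pick it up
by name. A minimal sketch of the Python side - the view name "shared_df" and
the integer column "value" are assumptions for illustration:

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.getOrCreate()

# Pick up the DataFrame that the SparkR cell registered as a temp view.
# Assumes the R and Python cells share one SparkSession.
df = spark.table("shared_df")

# A plain Python UDF, mapped over a column of the shared DataFrame.
# "value" is a placeholder column name for this sketch.
double_it = udf(lambda x: x * 2, IntegerType())
result = df.withColumn("doubled", double_it(df["value"]))
result.show()

Outside a shared session this doesn't work as-is, because temporary views are
scoped to a single SparkSession - you'd need to persist the data somewhere
both processes can see it, e.g. write it out as a table.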



From: Fiske, Danny 
Sent: Monday, July 15, 2019 6:58:32 AM
To: user@spark.apache.org
Subject: [PySpark] [SparkR] Is it possible to invoke a PySpark function with a 
SparkR DataFrame?

Hi all,

Forgive this naïveté; I’m looking for reassurance from some experts!

In the past we created a tailored Spark library for our organisation, 
implementing Spark functions in Scala with Python and R “wrappers” on top, but 
the focus on Scala has alienated our analysts/statisticians/data scientists, 
and collaboration is important to us (yeah… we’re aware that the APIs are very 
similar across languages… :/ ). We’d like to see if we could forgo the Scala 
layer in order to present the source code in a language more familiar to users 
and internal contributors.

We’d ideally write our functions with PySpark and potentially create a SparkR 
“wrapper” over the top, leading to the question:

Given a function written with PySpark that accepts a DataFrame parameter, is 
there a way to invoke this function using a SparkR DataFrame?

Is there any reason to pursue this? Is it even possible?

Many thanks,

Danny


