Yes, it's pretty straightforward: define, register and use.
def cleanupCurrency(word: String): Double = {
  word.substring(1).replace(",", "").toDouble
}
sqlContext.udf.register("cleanupCurrency", cleanupCurrency(_:String))
val a = df.filter(col("Total") > "").map(p => Invoices(p(0).toString,
  p(1).toString, cleanupCurrency(p(2).toString),
  cleanupCurrency(p(3).toString), cleanupCurrency(p(4).toString)))
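Once registered, the UDF can also be called from a SQL string rather than a map(). The snippet below is a sketch: the view name "invoices" and the use of sqlContext.sql are illustrative, not from the code above, and the Spark calls are shown as comments since they need a running context. The parsing logic itself can be sanity-checked on its own:

```scala
// Strip a leading currency symbol and thousands separators, then parse.
def cleanupCurrency(word: String): Double =
  word.substring(1).replace(",", "").toDouble

// Sanity checks that run without a Spark cluster:
assert(cleanupCurrency("$1,234.56") == 1234.56)
assert(cleanupCurrency("$10,000") == 10000.0)

// With the function registered as above, it is callable inside a SQL string (sketch):
// sqlContext.udf.register("cleanupCurrency", cleanupCurrency(_: String))
// df.registerTempTable("invoices")
// sqlContext.sql("SELECT cleanupCurrency(Total) AS total FROM invoices")
```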
HTH
Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com
On 4 August 2016 at 17:09, Nicholas Chammas <[email protected]>
wrote:
> No, SQLContext is not disappearing. The top-level class is replaced by
> SparkSession, but you can always get the underlying context from the
> session.
>
> You can also use SparkSession.udf.register()
> <http://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html#pyspark.sql.SparkSession.udf>,
> which is just a wrapper for sqlContext.registerFunction
> <https://github.com/apache/spark/blob/2182e4322da6ba732f99ae75dce00f76f1cdc4d9/python/pyspark/sql/context.py#L511-L520>
> .
>
>
> On Thu, Aug 4, 2016 at 12:04 PM Ben Teeuwen <[email protected]> wrote:
>
>> Yes, but I don’t want to use it in a select() call. I want to use either
>> selectExpr() or spark.sql(), with the udf being called inside a
>> string.
>>
>> Now I got it to work using
>> sqlContext.registerFunction('encodeOneHot_udf', encodeOneHot, VectorUDT()).
>> But this sqlContext approach will disappear, right? So I’m curious what
>> to use instead.
>>
>> On Aug 4, 2016, at 3:54 PM, Nicholas Chammas <[email protected]>
>> wrote:
>>
>> Have you looked at pyspark.sql.functions.udf and the associated examples?
>> On Thu, Aug 4, 2016 at 9:10 AM, Ben Teeuwen <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I’d like to use a UDF in pyspark 2.0. As in ..
>>> ________
>>>
>>> def squareIt(x):
>>>     return x * x
>>>
>>> # register the function and define return type
>>> ….
>>>
>>> spark.sql("""select myUdf(adgroupid, 'extra_string_parameter') as
>>> function_result from df""")
>>>
>>> _________
>>>
>>> How can I register the function? I only see registerFunction in the
>>> deprecated sqlContext at
>>> http://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html.
>>> As the ‘spark’ object unifies hiveContext and sqlContext, what is the
>>> new way to go?
>>>
>>> Ben
>>>
>>
>>