Hi,
Tested with Spark 1.4.
We need to import pow; otherwise Python's built-in pow is used, I guess.
>>> from pyspark.sql.functions import pow
>>> df.select(pow(df.age,df.age)).show()
15/06/29 22:36:05 INFO Ta
+--------------------+
|     POWER(age, age)|
+--------------------+
|                null|
| 2.05891132094649E44|
|1.978419655660313...|
+--------------------+
>>> df.select(pow(df.age,2)).show()
+---------------+
|POWER(age, 2.0)|
+---------------+
|           null|
|          900.0|
|          361.0|
+---------------+
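To illustrate the import point: a bare `pow` refers to whatever the name is currently bound to, so importing a function named `pow` replaces Python's built-in one in that namespace. Below is a minimal plain-Python sketch of that rebinding. It uses `math.pow` as a stand-in for `pyspark.sql.functions.pow`, an assumption made only so it runs without Spark, and it is meant to be run as a top-level script.

```python
import builtins

# Before any import, the bare name `pow` resolves to Python's builtin.
before = pow(2, 4)
uses_builtin_before = pow is builtins.pow

# Importing a function called `pow` rebinds the name in this module.
# math.pow stands in here for pyspark.sql.functions.pow (assumption for
# illustration only; the two functions do different things).
from math import pow

after = pow(2, 4)
uses_builtin_after = pow is builtins.pow

print(before, uses_builtin_before)   # 16 True
print(after, uses_builtin_after)     # 16.0 False
```

The builtin stays reachable as `builtins.pow` after the import, which may help when both versions are needed in one session.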
Kind Regards
Salih Oztop
From: Krishna Sankar <[email protected]>
To: Bob Corsaro <[email protected]>
Cc: Salih Oztop <[email protected]>; user <[email protected]>
Sent: Monday, June 29, 2015 9:52 PM
Subject: Re: SparkSQL built in functions
Interesting. Looking at the definitions, sql.functions.pow is defined only for
(col,col). Just as an experiment, create a column with value 2 and see if that
works.
Cheers
<k/>
On Mon, Jun 29, 2015 at 1:34 PM, Bob Corsaro <[email protected]> wrote:
I'm on 1.4, and I did set the second parameter. The DSL works fine, but the
same call in SQL doesn't.
On Mon, Jun 29, 2015, 4:32 PM Salih Oztop <[email protected]> wrote:
Hi Bob,
I tested your scenario with Spark 1.3, assuming you did not omit the
second parameter of pow(x,y).
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
df = sqlContext.jsonFile("/vagrant/people.json")
# Displays the content of the DataFrame to stdout
df.show()
# These are all fine
df.select("name", (df.age)*(df.age)).show()
name (age * age)
Michael null
Andy 900
Justin 361
df.select("name", (df.age)+1).show()
name (age + 1)
Michael null
Andy 31
Justin 20
However, the following tests give the same error.
df.select("name", pow(df.age,2)).show()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-27-ce7299d3ef76> in <module>()
----> 1 df.select("name", pow(df.age,2)).show()
TypeError: unsupported operand type(s) for ** or pow(): 'Column' and 'int'
df.select("name", (df.age)**2).show()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-24-29540c3536bf> in <module>()
----> 1 df.select("name", (df.age)**2).show()
TypeError: unsupported operand type(s) for ** or pow(): 'Column' and 'int'
Moreover, tested individually, the functions work fine.
pow(2,4)
16
2**4
16
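The TypeError for `(df.age)**2` is what Python raises when neither operand's type defines a power operator. Below is a minimal plain-Python sketch of that mechanism. `PlainExpr`, `PowExpr` and `_render` are hypothetical toy names, not pyspark classes, and the `POWER(...)` strings only imitate the column headers shown in this thread.

```python
class PlainExpr(object):
    """Hypothetical column-like class with no power operator defined."""

    def __init__(self, sql):
        self.sql = sql


def _render(value):
    """Render an operand: expressions by their SQL text, numbers as floats."""
    return value.sql if isinstance(value, PlainExpr) else repr(float(value))


class PowExpr(PlainExpr):
    """Same toy class, plus the hooks Python calls for ** and pow()."""

    def __pow__(self, other):
        return PowExpr("POWER(%s, %s)" % (self.sql, _render(other)))

    def __rpow__(self, other):
        # Called for `2 ** expr`, when the left operand is a plain number.
        return PowExpr("POWER(%s, %s)" % (_render(other), self.sql))


# Without __pow__, both ** and the builtin pow() fail with a TypeError.
try:
    PlainExpr("age") ** 2
    error = None
except TypeError as exc:
    error = str(exc)

print(error)
print((PowExpr("age") ** 2).sql)    # POWER(age, 2.0)
print(pow(PowExpr("age"), 2).sql)   # POWER(age, 2.0)
print((2 ** PowExpr("age")).sql)    # POWER(2.0, age)
```

With the hooks in place, the builtin `pow()` and the `**` operator both dispatch to the class, so no special import is needed for the toy version.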
Kind Regards
Salih Oztop
From: Bob Corsaro <[email protected]>
To: user <[email protected]>
Sent: Monday, June 29, 2015 7:27 PM
Subject: SparkSQL built in functions
I'm having trouble using "select pow(col) from table". It seems the function is
not registered for SparkSQL. Is this on purpose or an oversight? I'm using
pyspark.