Re: Problem in avg function Spark 1.6.3 using spark-shell

2017-06-25 Thread Riccardo Ferrari
Hi,

Looks like you have already performed an aggregation on the ImageWidth column.
The error itself is quite self-explanatory:

Cannot resolve column name "ImageWidth" among (MainDomainCode,
*avg(length(ImageWidth))*)

The columns available in that DataFrame are MainDomainCode and
avg(length(ImageWidth)), so you should use the `alias` method to rename the
column back:
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Column
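
For example, aliasing each aggregate at the point where it is created avoids the generated names entirely. This is only a sketch based on the code quoted below; the `avgImageWidth`-style names are hypothetical:

```scala
import org.apache.spark.sql.functions.avg

// Give each aggregate an explicit alias so downstream code can refer to it
// by a plain column name instead of the generated avg(...) name.
val average = secondDf.filter("ImageWidth > 1 and ImageHeight > 1").
  groupBy("MainDomainCode").
  agg(avg("ImageWidth").alias("avgImageWidth"),
      avg("ImageHeight").alias("avgImageHeight"),
      avg("ImageArea").alias("avgImageArea"))

// Alternatively, a generated name such as avg(length(ImageWidth)) can still
// be selected by escaping it with backticks:
// secondDf.select(secondDf("`avg(length(ImageWidth))`"))
```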

best,

On Sun, Jun 25, 2017 at 1:19 PM, Eko Susilo 
wrote:

> Hi,
>
> I have a data frame collection called “secondDf” when I tried to perform
> groupBy and then sum of each column it works perfectly. However when I
> tried to calculate average of that column it says the column name is not
> found. The details are as follow
>
> val total = secondDf.filter("ImageWidth > 1 and ImageHeight > 1").
>   groupBy("MainDomainCode").
>   agg(sum("ImageWidth"),
>       sum("ImageHeight"),
>       sum("ImageArea"))
>
>
> total.show displays the result as expected. However, when I try to
> calculate avg, the script fails with an error. Any help resolving this issue?
>
> Regards,
> Eko
>
>
>   val average = secondDf.filter("ImageWidth > 1 and ImageHeight > 1").
>     groupBy("MainDomainCode").
>     agg(avg("ImageWidth"),
>         avg("ImageHeight"),
>         avg("ImageArea"))
>
>
> org.apache.spark.sql.AnalysisException: Cannot resolve column name
> "ImageWidth" among (MainDomainCode, avg(length(ImageWidth)));

Problem in avg function Spark 1.6.3 using spark-shell

2017-06-25 Thread Eko Susilo
Hi,

I have a DataFrame called “secondDf”. When I perform a groupBy and then a sum
of each column, it works perfectly. However, when I try to calculate the
average of each column, it says the column name cannot be resolved. The
details are as follows:

val total = secondDf.filter("ImageWidth > 1 and ImageHeight > 1").
   groupBy("MainDomainCode").
   agg(sum("ImageWidth"),
       sum("ImageHeight"),
       sum("ImageArea"))


total.show displays the result as expected. However, when I try to calculate
avg, the script fails with the error below. Any help resolving this issue?

Regards,
Eko


  val average = secondDf.filter("ImageWidth > 1 and ImageHeight > 1").
   groupBy("MainDomainCode").
   agg(avg("ImageWidth"),
       avg("ImageHeight"),
       avg("ImageArea"))


org.apache.spark.sql.AnalysisException: Cannot resolve column name "ImageWidth" 
among (MainDomainCode, avg(length(ImageWidth)));
at org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:152)
at org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:152)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.DataFrame.resolve(DataFrame.scala:151)
at org.apache.spark.sql.DataFrame.col(DataFrame.scala:664)
at org.apache.spark.sql.DataFrame.apply(DataFrame.scala:652)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:49)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:51)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:53)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:55)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:57)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:59)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:61)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:63)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:65)
at $iwC$$iwC$$iwC.<init>(<console>:67)
at $iwC$$iwC.<init>(<console>:69)
at $iwC.<init>(<console>:71)
at <init>(<console>:73)
at .<init>(<console>:77)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at