Thanks for reporting.  The root cause is SPARK-5632
(https://issues.apache.org/jira/browse/SPARK-5632), which is actually
pretty hard to fix.  Fortunately, for this particular case there is an easy
workaround: https://github.com/apache/spark/pull/5337

We can try to include this in 1.3.1.
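
In the meantime, you can sidestep the rename entirely by aliasing the
columns at aggregation time, so toDF never has to resolve the generated
"SUM('p.q)" name.  A minimal sketch against the repro below (the
"key"/"total" names are just illustrative, and this is a workaround for
the renaming step, not what the linked PR changes):

    import org.apache.spark.sql.functions._

    // Alias each output column up front instead of renaming afterwards;
    // the aliases replace the generated "SUM('p.q)" column name.
    val target = source.groupBy('k)
      .agg('k.as("key"), sum("p.q").as("total"))

    target.show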

On Thu, Apr 2, 2015 at 3:29 AM, Haopu Wang <hw...@qilinsoft.com> wrote:

>  Hi, I want to rename an aggregation field using the DataFrame API. The
> aggregation is done on a nested field, but I get the exception below.
>
> Do you see the same issue, and is there any workaround? Thank you very much!
>
> ======
>
> Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot resolve column name "SUM('p.q)" among (k, SUM('p.q));
>         at org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:162)
>         at org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:162)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.sql.DataFrame.resolve(DataFrame.scala:161)
>         at org.apache.spark.sql.DataFrame.col(DataFrame.scala:436)
>         at org.apache.spark.sql.DataFrame.apply(DataFrame.scala:426)
>         at org.apache.spark.sql.DataFrame$$anonfun$3.apply(DataFrame.scala:244)
>         at org.apache.spark.sql.DataFrame$$anonfun$3.apply(DataFrame.scala:243)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>         at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>         at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
>         at org.apache.spark.sql.DataFrame.toDF(DataFrame.scala:243)
>
> ======
>
> And this code can be used to reproduce the issue:
>
>   import org.apache.spark.{SparkConf, SparkContext}
>   import org.apache.spark.sql.hive.HiveContext
>
>   case class ChildClass(q: Long)
>   case class ParentClass(k: String, p: ChildClass)
>
>   def main(args: Array[String]): Unit = {
>
>     val conf = new SparkConf().setAppName("DFTest").setMaster("local[*]")
>     val ctx = new SparkContext(conf)
>     val sqlCtx = new HiveContext(ctx)
>
>     import sqlCtx.implicits._
>
>     val source = ctx.makeRDD(Seq(ParentClass("c1", ChildClass(100)))).toDF
>
>     import org.apache.spark.sql.functions._
>
>     val target = source.groupBy('k).agg('k, sum("p.q"))
>
>     // This line prints the correct contents
>     // k  SUM('p.q)
>     // c1 100
>     target.show
>
>     // But this line triggers the exception
>     target.toDF("key", "total")
>   }
>
> ======
>
