First off, I would advise against having dots in column names; that's just playing with fire.
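A minimal sketch of one way to get rid of the dots up front, assuming the DataFrame is called `df` as in the question. `ColumnNames`, `sanitize` and `sanitizeAll` are hypothetical helper names made up for illustration, not part of Spark; `toDF(colNames: String*)` is the existing DataFrame method that renames all columns at once.

```scala
// Hypothetical helper, not part of Spark: swaps every dot in a column
// name for an underscore so the name can never be read as a nested field.
object ColumnNames {
  def sanitize(name: String): String = name.replace('.', '_')

  def sanitizeAll(names: Seq[String]): Seq[String] = names.map(sanitize)
}

// In spark-shell, assuming `df` is the DataFrame from the question:
//   val clean = df.toDF(ColumnNames.sanitizeAll(df.columns): _*)
//   val out   = clean.withColumn("raw_hourOfDay", clean.col("raw_hourOfDay"))
```

This only works if the sanitized names stay unique, so it would need adjusting if the schema already had both `raw.x` and `raw_x`.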
Second, the exception is really strange, since Spark is complaining about a completely unrelated column. I would like to see the df schema before the exception was thrown.

-- 
Jan Sterba
https://twitter.com/honzasterba | http://flickr.com/honzasterba | http://500px.com/honzasterba

On Tue, Mar 15, 2016 at 6:51 PM, Emmanuel <[email protected]> wrote:
>
> In Spark 1.6
>
> if I do (column name has dot in it, but is not a nested column):
>
> df = df.withColumn("raw.hourOfDay", df.col("`raw.hourOfDay`"))
>
> scala> df = df.withColumn("raw.hourOfDay", df.col("`raw.hourOfDay`"))
> org.apache.spark.sql.AnalysisException: cannot resolve 'raw.minOfDay' given
> input columns raw.hourOfDay_2, raw.dayOfWeek, raw.sensor2, raw.hourOfDay,
> raw.minOfDay;
>   at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:60)
>   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:57)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:319)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:319)
>   at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:53)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:318)
>   at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionUp$1(QueryPlan.scala:107)
>   at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:117)
>   at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2$1.apply(QueryPlan.scala:121)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:121)
>   at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:125)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>
> but if I do:
>
> df = df.withColumn("raw.hourOfDay_2", df.col("`raw.hourOfDay`"))
>
> scala> df.printSchema
> root
>  |-- raw.hourOfDay: long (nullable = true)
>  |-- raw.minOfDay: long (nullable = true)
>  |-- raw.dayOfWeek: long (nullable = true)
>  |-- raw.sensor2: long (nullable = true)
>  |-- raw.hourOfDay_2: long (nullable = true)
>
> it works fine (i.e. column is created).
>
> The only difference is that the name "raw.hourOfDay_2" does not exist yet,
> and is properly created as a colName with dot, not as a nested column.
>
> The documentation however says that if the column exists it will replace it,
> but it seems there is a misinterpretation of the column name as a nested
> column
>
> def withColumn(colName: String, col: Column): DataFrame
>
> Returns a new DataFrame by adding a column or replacing the existing column
> that has the same name.
>
> Any thoughts on why the different behavior when the column exists?
>
> Thanks
