Iaroslav Zeigerman created SPARK-17024:
------------------------------------------
Summary: Weird behaviour of the DataFrame when the column name contains dots.
Key: SPARK-17024
URL: https://issues.apache.org/jira/browse/SPARK-17024
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.6.0
Reporter: Iaroslav Zeigerman

When a column name contains dots and one of the segments of the name is the same as another column's name, Spark treats this column as a nested structure, although the actual type of the column is String/Int/etc. Example:

{code}
val df = sqlContext.createDataFrame(Seq(
  ("user1", "task1"),
  ("user2", "task2")
)).toDF("user", "user.task")
{code}

This gives two columns, "user" and "user.task". Both of them are strings, and the schema resolution seems to be correct:

{noformat}
root
 |-- user: string (nullable = true)
 |-- user.task: string (nullable = true)
{noformat}

But when I try to query this DataFrame, e.g.:

{code}
df.select(df("user"), df("user.task"))
{code}

Spark throws the exception "Can't extract value from user#2;". It happens during the resolution of the LogicalPlan, while processing the "user.task" column.
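My guess at what is going on, as a simplified sketch in plain Scala rather than Spark's actual code: the column reference seems to be split on dots into name parts, the first part is matched against the existing column names, and the remaining parts are treated as nested fields. The names `DottedNames`, `parseParts` and `resolve` below are hypothetical and exist only for illustration. The model also assumes that every column is atomic (never a struct) and that dots inside a backtick-quoted segment are not treated as separators.

```scala
object DottedNames {
  // Split a column reference into name parts. A dot separates parts
  // unless it appears inside a backtick-quoted segment.
  def parseParts(name: String): List[String] = {
    val parts = scala.collection.mutable.ArrayBuffer.empty[String]
    val current = new StringBuilder
    var inBackticks = false
    for (c <- name) {
      c match {
        case '`' => inBackticks = !inBackticks
        case '.' if !inBackticks =>
          parts += current.toString
          current.clear()
        case other => current += other
      }
    }
    parts += current.toString
    parts.toList
  }

  // Toy resolution: match the first name part against the column names and
  // treat any remaining parts as nested fields. Since every column in this
  // model is atomic, a non-empty remainder is reported as an error.
  def resolve(columns: Seq[String], ref: String): Either[String, String] = {
    val parts = parseParts(ref)
    columns.find(_ == parts.head) match {
      case Some(col) if parts.tail.isEmpty => Right(col)
      case Some(col) => Left(s"Can't extract value from $col")
      case None => Left(s"cannot resolve '$ref'")
    }
  }
}
```

In this model, `DottedNames.resolve(Seq("user", "user.task"), "user.task")` matches the "user" column first and then fails on the leftover "task" part, which mirrors the exception above. The backtick-quoted reference is kept as a single part and matches the "user.task" column directly. If Spark follows the same quoting rule, referring to the column as df("`user.task`") may avoid the problem, but I have not confirmed this on 1.6.0.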
Here is the full stacktrace:

{noformat}
Can't extract value from user#2;
org.apache.spark.sql.AnalysisException: Can't extract value from user#2;
	at org.apache.spark.sql.catalyst.expressions.ExtractValue$.apply(complexTypeExtractors.scala:73)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$4.apply(LogicalPlan.scala:276)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$4.apply(LogicalPlan.scala:275)
	at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
	at scala.collection.immutable.List.foldLeft(List.scala:84)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:275)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:191)
	at org.apache.spark.sql.DataFrame.resolve(DataFrame.scala:151)
	at org.apache.spark.sql.DataFrame.col(DataFrame.scala:708)
	at org.apache.spark.sql.DataFrame.apply(DataFrame.scala:696)
{noformat}

Is this actually the expected behaviour?