That is a good question. In particular, names containing `.` are broken
by SPARK-5632 <https://issues.apache.org/jira/browse/SPARK-5632>, which I'd
like to fix.

There is a more general question of whether strings passed to DataFrame
methods should be treated as quoted identifiers (i.e. `as though they
were in backticks`) or parsed as normal SQL identifiers, where an unquoted
`.` separates a column name from a nested field. I've opened this JIRA to
discuss it further: SPARK-6865
<https://issues.apache.org/jira/browse/SPARK-6865>
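To make the two options concrete, here is a minimal Scala sketch of how a string like "b.b" could be read either way. It is hypothetical: `ColumnNameSketch`, `asQuotedIdentifier` and `asSqlIdentifier` are made-up names, and this is not Spark's actual resolution code. The SQL-style reading splits on unquoted dots (column `b`, then struct field `b`), while backticks keep a dotted name together.

```scala
// Hypothetical sketch, not Spark code: two readings of a column-name string.
object ColumnNameSketch {

  // Reading 1: the whole string is one quoted identifier,
  // as though it were written in backticks. "b.b" names a single column.
  def asQuotedIdentifier(name: String): Seq[String] = Seq(name)

  // Reading 2: parse the string like a SQL identifier. Unquoted dots split
  // it into name parts (column, then nested struct fields); text inside
  // backticks is kept literally, and a doubled backtick inside a quoted
  // part stands for one literal backtick.
  def asSqlIdentifier(name: String): Seq[String] = {
    val parts = scala.collection.mutable.ArrayBuffer.empty[String]
    val current = new StringBuilder
    var inQuotes = false
    var i = 0
    while (i < name.length) {
      val c = name.charAt(i)
      if (inQuotes) {
        if (c == '`') {
          if (i + 1 < name.length && name.charAt(i + 1) == '`') {
            current += '`' // escaped backtick inside a quoted part
            i += 1
          } else {
            inQuotes = false // closing backtick
          }
        } else {
          current += c
        }
      } else {
        c match {
          case '`' => inQuotes = true // opening backtick
          case '.' =>                 // unquoted dot ends the current part
            parts += current.toString
            current.clear()
          case other => current += other
        }
      }
      i += 1
    }
    require(!inQuotes, s"Unclosed backtick in: $name")
    parts += current.toString
    parts.toList
  }

  def main(args: Array[String]): Unit = {
    println(asQuotedIdentifier("b.b")) // List(b.b): one column named "b.b"
    println(asSqlIdentifier("b.b"))    // List(b, b): field b of column b
    println(asSqlIdentifier("`b.b`"))  // List(b.b): backticks protect the dot
  }
}
```

Under the SQL-style reading, "b.b" in the example below would look for a struct column `b` with a field `b`, which is consistent with the "cannot resolve 'b.b'" error even though a top-level column named `b.b` exists.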

On Fri, Apr 10, 2015 at 7:18 PM, Justin Yip <yipjus...@prediction.io> wrote:

> Hello,
>
> Are there any restrictions on column names? I tried to use ".", but
> sqlContext.sql cannot find the column. I would guess that "." is tricky as
> this affects accessing StructType, but are there any other restrictions on
> column names?
>
> scala> case class A(a: Int)
> defined class A
>
> scala> sqlContext.createDataFrame(Seq(A(10), A(20))).withColumn("b.b", $"a" + 1)
> res19: org.apache.spark.sql.DataFrame = [a: int, b.b: int]
>
> scala> res19.registerTempTable("res19")
>
> scala> res19.select("a")
> res24: org.apache.spark.sql.DataFrame = [a: int]
>
> scala> res19.select("a", "b.b")
> org.apache.spark.sql.AnalysisException: cannot resolve 'b.b' given input columns a, b.b;
>   at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
> ....
>
>
> Thanks.
>
> Justin
>
