I have created https://issues.apache.org/jira/browse/SPARK-2775 to track it.
On Thu, Jul 31, 2014 at 11:47 AM, Budde, Adam <bu...@amazon.com> wrote:

> I still see the same "Unresolved attributes" error when using hql + backticks.
>
> Here's a code snippet that replicates this behavior:
>
> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
> val sampleRDD = sc.parallelize(Array("""{"key.one": "value1", "key.two": "value2"}"""))
> val sampleTable = hiveContext.jsonRDD(sampleRDD)
> sampleTable.registerAsTable("sample_table")
> hiveContext.hql("SELECT `key.one` FROM sample_table")
>
> From: Michael Armbrust <mich...@databricks.com>
> Reply-To: "user@spark.apache.org" <user@spark.apache.org>
> Date: Thursday, July 31, 2014 at 11:20 AM
> To: "user@spark.apache.org" <user@spark.apache.org>
> Subject: Re: Inconsistent Spark SQL behavior when column names contain dots
>
> Ideally you'd use backticks to reference columns that contain weird
> characters. I don't believe this works in the sql parser, but I'm curious
> whether using the hql parser in HiveContext would work for you.
>
> If you wanted to add support for this in the sql parser, I'd check out
> SqlParser.scala, though it is likely we will abandon that code in the next
> release for something more complete.
>
>
> On Thu, Jul 31, 2014 at 11:16 AM, Budde, Adam <bu...@amazon.com> wrote:
>
>> I'm working with a dataset where each row is stored as a single-line
>> flat JSON object, and I want to use Spark SQL to run relational queries
>> on this data. Many of the object keys in this dataset contain dots, e.g.:
>>
>> { "key.number1": "value1", "key.number2": "value2", ... }
>>
>> I can successfully load the data as an RDD in Spark and construct a
>> Spark SQL table using the jsonRDD function. If I print the schema of the
>> table, I see that Spark SQL infers the full object key, dot included, as
>> the column name:
>>
>> > sqlTable.printSchema()
>> root
>>  |-- key.number1: StringType
>>  |-- key.number2: StringType
>> ...
>>
>> However, when I try to use one of these column names in a query, the
>> Spark SQL parser always assumes I'm trying to reference a nested
>> attribute. The same thing happens when using HiveQL. If there's a way to
>> escape the dot in the column name, I haven't found it:
>>
>> > sqlContext.sql("SELECT key.number1 FROM sql_table LIMIT 1").first
>> == Query Plan ==
>> org.apache.spark.sql.catalyst.errors.package$TreeNodeException:
>> Unresolved attributes: 'key.number1, tree
>> Project ['key.number1]
>> ...
>>
>> This is not a critical issue by any means; it's simple enough to use
>> map() to transform the dots to underscores after loading the JSON data
>> as text. I just wanted to ask the community whether this issue warrants
>> a bug report. The behavior seems inconsistent to me: you can create a
>> table with column names containing dots, but as far as I can tell you
>> cannot reference such columns in a query.
>>
>> Also, I'd greatly appreciate pointers to where in the source I should
>> look if I wanted to patch this issue in my local branch. I've taken a
>> glance at some of the Spark SQL Catalyst code, but I'm afraid I'm too
>> much of a Scala novice to make much headway there.
>>
>> For reference, I'm using Spark 1.0.1. Thanks for your input.
>>
>> Adam
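
For anyone who lands on this thread, here is a minimal sketch of the map() workaround Adam describes above: rewrite the dots in the raw JSON text before handing it to jsonRDD, so the inferred column names contain no dots. It reuses sc, hiveContext, and the sample data from the snippet earlier in the thread; the table and column names below are illustrative, and the blanket replace('.', '_') also rewrites any dots that happen to appear inside values, so real data may need a more targeted rewrite of the keys only.

// Rewrite dots in the raw JSON lines before schema inference.
val rawLines = sc.parallelize(Array("""{"key.one": "value1", "key.two": "value2"}"""))
val cleanedLines = rawLines.map(_.replace('.', '_'))   // "key.one" becomes "key_one"
val cleanedTable = hiveContext.jsonRDD(cleanedLines)
cleanedTable.registerAsTable("sample_table_clean")
hiveContext.hql("SELECT key_one FROM sample_table_clean").collect().foreach(println)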