Can you open a JIRA for this please?

On Wed, Apr 1, 2015 at 6:14 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> +1 on escaping column names.
>
> On Apr 1, 2015, at 5:50 AM, fergjo00 <johngfergu...@gmail.com> wrote:
>
> > Question:
> > ---------
> > Is there a way to have JDBC DataFrames use quoted/escaped column names?
> > Right now, it looks like it "sees" the names correctly in the schema it
> > creates, but does not escape them in the SQL it generates when they are
> > not compliant:
> >
> > org.apache.spark.sql.jdbc.JDBCRDD
> > ....
> > private val columnList: String = {
> >   val sb = new StringBuilder()
> >   columns.foreach(x => sb.append(",").append(x))
> >   if (sb.length == 0) "1" else sb.substring(1)
> > }
> >
> > If you see value in this, I would take a shot at adding the quoting
> > (escaping) of column names here. If you don't, some drivers, like
> > postgresql's, will simply lowercase all names when parsing the query.
> > As you can see in the TL;DR below, that means they won't match the
> > schema I am given.
> >
> > Thanks.
> >
> > TL;DR:
> > ------
> > I am able to connect to a Postgres database in the shell (with the
> > driver referenced):
> >
> > val jdbcDf =
> >   sqlContext.jdbc("jdbc:postgresql://localhost/sparkdemo?user=dbuser",
> >     "sp500")
> >
> > In fact, when I run:
> >
> > jdbcDf.registerTempTable("sp500")
> > val avgEPSNamed = sqlContext.sql(
> >   "SELECT AVG(`Earnings/Share`) as AvgCPI FROM sp500")
> >
> > and
> >
> > val avgEPSProg = jdbcDf.agg(avg(jdbcDf.col("Earnings/Share")))
> >
> > the values come back as expected. However, if I try:
> >
> > jdbcDf.show
> >
> > or if I try:
> >
> > val all = sqlContext.sql("SELECT * FROM sp500")
> > all.show
> >
> > I get errors about column names not being found. In fact, the error
> > includes a mention of the column names all lowercased. For now I will
> > change my schema to be more restrictive. Right now it is, per a Stack
> > Overflow poster, not ANSI compliant, doing things that are allowed by
> > ""'s in pgsql, MySQL and SQL Server. BTW, our users are giving us
> > tables like this because various tools they already use support
> > non-compliant names. In fact, this is mild compared to what we've had
> > to support.
> >
> > Currently the schema in question uses mixed-case, quoted names with
> > special characters and spaces:
> >
> > CREATE TABLE sp500
> > (
> >   "Symbol" text,
> >   "Name" text,
> >   "Sector" text,
> >   "Price" double precision,
> >   "Dividend Yield" double precision,
> >   "Price/Earnings" double precision,
> >   "Earnings/Share" double precision,
> >   "Book Value" double precision,
> >   "52 week low" double precision,
> >   "52 week high" double precision,
> >   "Market Cap" double precision,
> >   "EBITDA" double precision,
> >   "Price/Sales" double precision,
> >   "Price/Book" double precision,
> >   "SEC Filings" text
> > )
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-3-0-DataFrame-and-Postgres-tp22338.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> > For additional commands, e-mail: user-h...@spark.apache.org
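For anyone following along: a minimal sketch of the quoting fix being proposed for `columnList`. This is not Spark's actual implementation, just an illustration; `quoteIdentifier` and `ColumnQuoting` are hypothetical names. It follows the standard SQL convention of wrapping each identifier in double quotes and doubling any embedded double quotes, which is what Postgres expects for mixed-case names like "Earnings/Share".

```scala
// Hypothetical sketch of quoting column names before building the
// SELECT list in JDBCRDD.columnList. Not Spark's actual API.
object ColumnQuoting {
  // Wrap an identifier in double quotes, doubling any embedded quotes,
  // per the standard SQL quoting rule (also used by Postgres).
  def quoteIdentifier(name: String): String =
    "\"" + name.replace("\"", "\"\"") + "\""

  // Mirror of columnList's behavior: "1" for an empty projection,
  // otherwise a comma-separated list of quoted names.
  def columnList(columns: Seq[String]): String =
    if (columns.isEmpty) "1"
    else columns.map(quoteIdentifier).mkString(",")
}
```

With this, the generated SQL would preserve case and spaces, e.g. `columnList(Seq("Earnings/Share", "52 week low"))` yields `"Earnings/Share","52 week low"`, so the driver no longer lowercases the names. Note that different databases use different quote characters (backticks in MySQL, brackets in SQL Server), so a real fix would likely need to ask the driver for its quote string rather than hard-code `"`.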