Can you open a JIRA for this please?

On Wed, Apr 1, 2015 at 6:14 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> +1 on escaping column names.
>
> On Apr 1, 2015, at 5:50 AM, fergjo00 <johngfergu...@gmail.com> wrote:
>
> > Question:
> > ---------
> > Is there a way to have JDBC DataFrames use quoted/escaped column names?
> > Right now, it looks like it "sees" the names correctly in the schema it
> > creates, but does not escape them in the SQL it generates when they are
> > not compliant:
> >
> > org.apache.spark.sql.jdbc.JDBCRDD
> > ....
> > private val columnList: String = {
> >   val sb = new StringBuilder()
> >   columns.foreach(x => sb.append(",").append(x))
> >   if (sb.length == 0) "1" else sb.substring(1)
> > }
> >
> > If you see value in this, I would take a shot at adding the quoting
> > (escaping) of column names here. If you don't, some drivers, like
> > postgresql's, will simply lowercase all names when parsing the query.
> > As you can see in the TL;DR below, that means they won't match the
> > schema I am given.
> >
> > Thanks.
> >
> > TL;DR:
> > ------
> > I am able to connect to a Postgres database in the shell (with the
> > driver referenced):
> >
> > val jdbcDf =
> >   sqlContext.jdbc("jdbc:postgresql://localhost/sparkdemo?user=dbuser",
> >     "sp500")
> >
> > In fact, when I run:
> >
> > jdbcDf.registerTempTable("sp500")
> > val avgEPSNamed = sqlContext.sql(
> >   "SELECT AVG(`Earnings/Share`) as AvgCPI FROM sp500")
> >
> > and
> >
> > val avgEPSProg = jdbcDf.agg(avg(jdbcDf.col("Earnings/Share")))
> >
> > the values come back as expected. However, if I try:
> >
> > jdbcDf.show
> >
> > or if I try:
> >
> > val all = sqlContext.sql("SELECT * FROM sp500")
> > all.show
> >
> > I get errors about column names not being found. In fact, the error
> > includes a mention of the column names all lowercased. For now I will
> > change my schema to be more restrictive. Right now it is, per a Stack
> > Overflow poster, not ANSI compliant, doing things that are allowed by
> > ""'s in pgsql, MySQL and SQL Server. BTW, our users are giving us
> > tables like this because various tools they already use support
> > non-compliant names. In fact, this is mild compared to what we've had
> > to support.
> >
> > Currently the schema in question uses mixed-case, quoted names with
> > special characters and spaces:
> >
> > CREATE TABLE sp500
> > (
> >   "Symbol" text,
> >   "Name" text,
> >   "Sector" text,
> >   "Price" double precision,
> >   "Dividend Yield" double precision,
> >   "Price/Earnings" double precision,
> >   "Earnings/Share" double precision,
> >   "Book Value" double precision,
> >   "52 week low" double precision,
> >   "52 week high" double precision,
> >   "Market Cap" double precision,
> >   "EBITDA" double precision,
> >   "Price/Sales" double precision,
> >   "Price/Book" double precision,
> >   "SEC Filings" text
> > )
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-3-0-DataFrame-and-Postgres-tp22338.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> > For additional commands, e-mail: user-h...@spark.apache.org
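For anyone following along: a minimal sketch of the quoting fix being proposed for `columnList`. This is not Spark's actual implementation, just an illustration; `quoteIdentifier` and `ColumnQuoting` are hypothetical names. It follows the standard SQL convention of wrapping each identifier in double quotes and doubling any embedded double quotes, which is what Postgres expects for mixed-case names like "Earnings/Share".

```scala
// Hypothetical sketch of quoting column names before building the
// SELECT list in JDBCRDD.columnList. Not Spark's actual API.
object ColumnQuoting {
  // Wrap an identifier in double quotes, doubling any embedded quotes,
  // per the standard SQL quoting rule (also used by Postgres).
  def quoteIdentifier(name: String): String =
    "\"" + name.replace("\"", "\"\"") + "\""

  // Mirror of columnList's behavior: "1" for an empty projection,
  // otherwise a comma-separated list of quoted names.
  def columnList(columns: Seq[String]): String =
    if (columns.isEmpty) "1"
    else columns.map(quoteIdentifier).mkString(",")
}
```

With this, the generated SQL would preserve case and spaces, e.g. `columnList(Seq("Earnings/Share", "52 week low"))` yields `"Earnings/Share","52 week low"`, so the driver no longer lowercases the names. Note that different databases use different quote characters (backticks in MySQL, brackets in SQL Server), so a real fix would likely need to ask the driver for its quote string rather than hard-code `"`.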