Hi,
I am trying to figure out why the two contexts behave differently, even on
a simple query.
In a nutshell, I have queries that involve a string literal containing a
single quote and a cast to Array/Map. I have tried every combination of SQL
context (HiveContext, SQLContext) and query API (sql, df.select,
df.selectExpr), but I can't find one that handles all the cases.
Here is code to reproduce the problem.
-----------------------------------------------------------------------------
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.{SparkConf, SparkContext}

object Test extends App {

  val sc = new SparkContext("local[2]", "test", new SparkConf)
  val hiveContext = new HiveContext(sc)
  val sqlContext = new SQLContext(sc)

  val context = hiveContext
  // val context = sqlContext

  import context.implicits._

  val df = Seq((Seq(1, 2), 2)).toDF("a", "b")
  df.registerTempTable("tbl")
  df.printSchema()

  // case 1
  context.sql("select cast(a as array<string>) from tbl").show()
  // HiveContext => org.apache.spark.sql.AnalysisException: cannot recognize
  //   input near 'array' '<' 'string' in primitive type specification;
  //   line 1 pos 17
  // SQLContext  => OK

  // case 2
  context.sql("select 'a\\'b'").show()
  // HiveContext => OK
  // SQLContext  => failure: ``union'' expected but ErrorToken(unclosed
  //   string literal) found

  // case 3
  df.selectExpr("cast(a as array<string>)").show()
  // HiveContext, SQLContext => OK

  // case 4
  df.selectExpr("'a\\'b'").show()
  // HiveContext, SQLContext => failure: end of input expected
}
-----------------------------------------------------------------------------
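For what it's worth, a possible workaround is to combine the two calls that
did succeed above: write the literal with double quotes (which, as far as I
can tell, both parsers accept, so the embedded single quote needs no
escaping) and route the array cast through selectExpr as in case 3. This is
only a sketch against the same Spark 1.x setup as the repro, not a general
fix:

```scala
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.{SparkConf, SparkContext}

// Workaround sketch (assumes the same Spark 1.x setup as the repro above).
object Workaround extends App {

  val sc = new SparkContext("local[2]", "test", new SparkConf)
  val context = new HiveContext(sc)
  import context.implicits._

  val df = Seq((Seq(1, 2), 2)).toDF("a", "b")

  // Single quote inside a double-quoted SQL literal: no escaping required,
  // so the SQLContext parser's "unclosed string literal" error is avoided.
  context.sql("""select "a'b" as s""").show()

  // The cast goes through the DataFrame expression parser (as in case 3)
  // rather than the HiveQL parser, which rejected array<string> in case 1.
  df.selectExpr("cast(a as array<string>)").show()
}
```

It does not cover case 4, though, so it would still be good to understand
why the parsers diverge in the first place.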
Any clarification / workaround is highly appreciated.
--
Hao Ren
Data Engineer @ leboncoin
Paris, France