Here's the solution I got after talking with Liancheng:

1) Use backticks `...` to quote all the illegal characters:
  val rdd = parquetFile(file)
  val schema = rdd.schema.fields.map(f =>
    s"`${f.name}` ${HiveMetastoreTypes.toMetastoreType(f.dataType)}").mkString(",\n")
  val ddl_13 = s"""
    |CREATE EXTERNAL TABLE $name (
    |  $schema
    |)
    |STORED AS PARQUET
    |LOCATION '$file'
    """.stripMargin
  sql(ddl_13)

2) Create a new schema and call applySchema to generate a new SchemaRDD; I had to drop the table and re-register it under the same name:

  val t = table(name)
  val newSchema = StructType(t.schema.fields.map(s =>
    s.copy(name = s.name.replaceAll(".*?::", ""))))
  sql(s"drop table $name")
  applySchema(t, newSchema).registerTempTable(name)

I'm testing it for now. Thanks for the help!

Jianshi

On Sat, Dec 6, 2014 at 8:41 AM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:

> Hi,
>
> I had to use Pig for some preprocessing and to generate Parquet files for
> Spark to consume.
>
> However, due to Pig's limitation, the generated schema contains Pig's
> identifiers,
>
> e.g.
> sorted::id, sorted::cre_ts, ...
>
> I tried to put the schema inside CREATE EXTERNAL TABLE, e.g.
>
> create external table pmt (
>   sorted::id bigint
> )
> stored as parquet
> location '...'
>
> Obviously it didn't work; I also tried removing the identifier sorted::,
> but the resulting rows contain only nulls.
>
> Any idea how to create a table in HiveContext from these Parquet files?
>
> Thanks,
> Jianshi
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
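For reference, the rename in step 2 hinges on the `replaceAll(".*?::", "")` call. Here's a minimal plain-Scala sketch (the field names are hypothetical examples, not from the actual schema) of what it does to Pig-qualified column names:

```scala
// Pig qualifies column names with its relation alias, e.g. "sorted::id".
// The non-greedy regex ".*?::" matches up to and including the first "::",
// so replaceAll strips the qualifier and leaves the bare column name.
// Names without "::" pass through unchanged.
val pigNames = Seq("sorted::id", "sorted::cre_ts", "plain_col")
val cleaned  = pigNames.map(_.replaceAll(".*?::", ""))
println(cleaned.mkString(", "))  // id, cre_ts, plain_col
```

Note that because `replaceAll` replaces every match, a doubly-qualified name like `a::b::c` would be stripped all the way down to `c`.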