Here's the solution I got after talking with Liancheng:

1) Use backquotes `..` to wrap column names that contain illegal characters:

    val rdd = parquetFile(file)
    val schema = rdd.schema.fields
      .map(f => s"`${f.name}` ${HiveMetastoreTypes.toMetastoreType(f.dataType)}")
      .mkString(",\n")

    val ddl_13 = s"""
      |CREATE EXTERNAL TABLE $name (
      |  $schema
      |)
      |STORED AS PARQUET
      |LOCATION '$file'
      """.stripMargin

    sql(ddl_13)
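
To make the string building in step 1 easier to try without a Spark shell, here is a minimal sketch in plain Scala. The names `BackquotedDdl`, `quote` and `createTableDdl`, the type of `cre_ts`, and the path `/path/to/parquet` are all made up for illustration; the real code gets the names and types from `rdd.schema.fields` as shown above. It does not escape backquotes that appear inside a column name.

```scala
// Hypothetical helper, not part of Spark or Hive.
object BackquotedDdl {
  // Wrap a column name in backquotes so characters such as "::" are accepted.
  def quote(column: String): String = s"`$column`"

  // Build a CREATE EXTERNAL TABLE statement from (column name, Hive type) pairs.
  def createTableDdl(table: String,
                     columns: Seq[(String, String)],
                     location: String): String = {
    val schema = columns
      .map { case (name, hiveType) => s"${quote(name)} $hiveType" }
      .mkString(",\n  ")
    s"""
       |CREATE EXTERNAL TABLE $table (
       |  $schema
       |)
       |STORED AS PARQUET
       |LOCATION '$location'
       |""".stripMargin.trim
  }
}
```

For example, `BackquotedDdl.createTableDdl("pmt", Seq("sorted::id" -> "bigint", "sorted::cre_ts" -> "bigint"), "/path/to/parquet")` yields a statement whose column list has every name backquoted.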

2) Create a new schema with the Pig prefixes stripped and call applySchema to
generate a new SchemaRDD. I had to drop the table and register the result as a
temp table under the same name:

    val t = table(name)
    val newSchema = StructType(
      t.schema.fields.map(s => s.copy(name = s.name.replaceAll(".*?::", ""))))
    sql(s"drop table $name")
    applySchema(t, newSchema).registerTempTable(name)
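
The renaming in step 2 comes down to the regex `.*?::`, so it can be checked on plain strings. The sketch below is only an illustration: `StripPigPrefix`, `strip` and `collisions` are hypothetical names, and the collision check is my own addition, on the assumption that two Pig relations could carry a column with the same base name.

```scala
// Hypothetical helper, not part of Spark.
object StripPigPrefix {
  // Same regex as in step 2: reluctantly match everything up to and
  // including "::"; replaceAll removes every such prefix in the name.
  def strip(column: String): String = column.replaceAll(".*?::", "")

  // Names that occur more than once after stripping,
  // e.g. "a::id" and "b::id" both become "id".
  def collisions(columns: Seq[String]): Set[String] =
    columns.map(strip)
      .groupBy(identity)
      .collect { case (name, hits) if hits.size > 1 => name }
      .toSet
}
```

With this, `StripPigPrefix.strip("sorted::id")` gives `id`, and a name without a prefix is returned unchanged. If `collisions` returns a non-empty set for the column names, the stripped schema would contain duplicate names, so those columns would probably need different handling.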

I'm testing it now.

Thanks for the help!


Jianshi

On Sat, Dec 6, 2014 at 8:41 AM, Jianshi Huang <jianshi.hu...@gmail.com>
wrote:

> Hi,
>
> I had to use Pig for some preprocessing and to generate Parquet files for
> Spark to consume.
>
> However, due to a limitation in Pig, the column names in the generated
> schema carry Pig's relation prefix,
>
> e.g.
> sorted::id, sorted::cre_ts, ...
>
> I tried to put the schema inside CREATE EXTERNAL TABLE, e.g.
>
>   create external table pmt (
>     sorted::id bigint
>   )
>   stored as parquet
>   location '...'
>
> Obviously it didn't work. I also tried removing the sorted:: prefix from
> the column names, but the resulting rows contained only nulls.
>
> Any idea how to create a table in HiveContext from these Parquet files?
>
> Thanks,
> Jianshi
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/
>



