If you are running a version newer than 1.1, you can create external parquet tables. I'd also recommend setting spark.sql.hive.convertMetastoreParquet=true, so that Spark SQL's built-in parquet support is used instead of the Hive SerDe when the table is read back.
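For example, assuming your HiveContext is bound to sqlContext in the shell (the name is just illustrative), you can turn it on with:

// Use Spark SQL's built-in parquet reader for Hive parquet tables.
sqlContext.setConf("spark.sql.hive.convertMetastoreParquet", "true")

// Or, equivalently, from SQL:
sqlContext.sql("SET spark.sql.hive.convertMetastoreParquet=true")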
Here's a helper function to do it automatically (a rough usage sketch is at the bottom of this mail):

/**
 * Sugar for creating a Hive external table from a parquet path.
 */
def createParquetTable(name: String, file: String): Unit = {
  import org.apache.spark.sql.hive.HiveMetastoreTypes

  val rdd = parquetFile(file)
  val schema = rdd.schema.fields.map(f =>
    s"${f.name} ${HiveMetastoreTypes.toMetastoreType(f.dataType)}").mkString(",\n")
  val ddl = s"""
    |CREATE EXTERNAL TABLE $name (
    |  $schema
    |)
    |ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
    |STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
    |OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
    |LOCATION '$file'""".stripMargin
  sql(ddl)
  setConf("spark.sql.hive.convertMetastoreParquet", "true")
}

On Mon, Oct 13, 2014 at 9:20 AM, Sadhan Sood <sadhan.s...@gmail.com> wrote:
> We want to persist table schema of parquet file so as to use spark-sql cli
> on that table later on? Is it possible or is spark-sql cli only good for
> tables in hive metastore? We are reading parquet data using this example:
>
> // Read in the parquet file created above. Parquet files are self-describing
> // so the schema is preserved.
> // The result of loading a Parquet file is also a SchemaRDD.
> val parquetFile = sqlContext.parquetFile("people.parquet")
>
> // Parquet files can also be registered as tables and then used in SQL statements.
> parquetFile.registerTempTable("parquetFile")
>
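For completeness, a rough usage sketch of the helper above (the table name and path are made up, and it assumes you are in a shell where the HiveContext's sql, setConf, and parquetFile are in scope, e.g. via import hiveContext._):

// Hypothetical example: expose an existing parquet directory as an external
// Hive table so it persists in the metastore and is visible to the spark-sql CLI later.
createParquetTable("people", "hdfs:///warehouse/people.parquet")

// The table can now be queried like any other metastore table.
sql("SELECT count(*) FROM people").collect().foreach(println)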