E.g. in Spark SQL I can create a temporary table from ORC, Parquet, or JSON files without specifying column names and types:
val myDf = sqlContext.read.format("orc").load("s3n://alex/test/mytable_orc")

myDf.printSchema
root
 |-- id: string (nullable = true)
 |-- name: string (nullable = true)
 |-- rc_state: string (nullable = true)
 |-- rc_county_name: string (nullable = true)

myDf.registerTempTable("mytable")

val res = sqlContext.sql("""
  select rc_state, count(*) cnt
  from mytable
  group by rc_state
  order by rc_state""")

res.show(10)
+--------+---+
|rc_state|cnt|
+--------+---+
|      AK| 37|
|      AL|224|
|      AR|109|
|      AZ| 81|
|      CA|417|
|      CO|145|
|      CT| 71|
|      DC| 15|
|      DE| 27|
|      FL|452|
+--------+---+
only showing top 10 rows

Many companies are switching to Spark for ETL, but Hive is still used by many people, reporting tools, and legacy solutions to select data from the files (tables) that Spark prepares. It would be nice if Hive could create a table based on ORC or Parquet file(s) without the table columns and types being specified; integration with Spark output would be easier. (A sketch of a possible workaround appears after the quoted thread below.)

On Wed, Dec 9, 2015 at 9:50 AM, Owen O'Malley <omal...@apache.org> wrote:

> So your use case is that you already have the ORC files and you want a
> table that can read those files without specifying the columns in the
> table? Obviously, without the columns being specified, Hive wouldn't be
> able to write to that table, so I assume you only care about reading it.
> Is that right?
>
> .. Owen
>
> On Wed, Dec 2, 2015 at 9:53 PM, Alexander Pivovarov <apivova...@gmail.com>
> wrote:
>
>> Hi Everyone
>>
>> Is it possible to create a Hive table from an ORC or Parquet file without
>> specifying field names and their types? ORC and Parquet files contain the
>> field name and type information inside.
>>
>> Alex
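Until Hive can infer the schema itself, one workaround is to let Spark read the schema embedded in the ORC footer and generate the Hive DDL from it. Below is a minimal sketch under a few assumptions: the path and table name are placeholders taken from the example above, and Spark's DataType.simpleString happens to match Hive's type names for the common primitive and nested types ("string", "bigint", "array<int>", ...), which should be verified for anything exotic.

// Sketch: derive a Hive CREATE EXTERNAL TABLE statement from an ORC
// file's embedded schema. Path and table name are placeholders.
val df = sqlContext.read.format("orc").load("s3n://alex/test/mytable_orc")

// Map each field to "name type"; simpleString yields Hive-compatible
// names for the common types.
val columns = df.schema.fields
  .map(f => s"  ${f.name} ${f.dataType.simpleString}")
  .mkString(",\n")

// EXTERNAL + LOCATION so Hive reads the files in place without owning them.
val ddl =
  s"""CREATE EXTERNAL TABLE mytable (
     |$columns
     |)
     |STORED AS ORC
     |LOCATION 's3n://alex/test/mytable_orc'""".stripMargin

println(ddl) // run the printed statement in Hive

For the schema above this prints a four-column table of strings, and the resulting table is effectively read-only from Hive's side, which matches the read-only use case Owen describes.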