Lars Volker created HIVE-14086:
----------------------------------

             Summary: org.apache.hadoop.hive.metastore.api.Table does not return columns from Avro schema file
                 Key: HIVE-14086
                 URL: https://issues.apache.org/jira/browse/HIVE-14086
             Project: Hive
          Issue Type: Bug
          Components: API
            Reporter: Lars Volker

Consider this table, which uses an external Avro schema file:

{noformat}
CREATE TABLE avro_table
PARTITIONED BY (str_part STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.url'='hdfs://localhost:20500/tmp/avro.json');
{noformat}

Creating the table populates the "COLUMNS_V2" metastore table with the correct column information (as per HIVE-6308). The columns can then be queried via the Hive API, for example by calling {{.getSd().getCols()}} on a {{org.apache.hadoop.hive.metastore.api.Table}} object.

Later changes to the schema referenced by {{avro.schema.url}}, whether by pointing the property at a different file or by changing the contents of the file, are reflected in the output of {{describe formatted avro_table}} *but not* in the result of the {{.getSd().getCols()}} API call. It looks like Hive reads the Avro schema file only internally and does not expose the information in it via its API.

Is there a way to obtain the effective Table information via Hive? Would it make sense to fix table retrieval so that calls to {{get_table}} return the correct set of columns?
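
To illustrate what "effective" columns means here, the following is a minimal client-side sketch, not anything Hive provides: it derives column names and Hive type names directly from the text of an Avro schema, which could then be compared with what {{.getSd().getCols()}} returns. The function names, the example schema, and the example field names are hypothetical. The sketch handles only primitive types and nullable unions such as {{["null", "string"]}}, and it assumes the schema text has already been fetched from the location in {{avro.schema.url}}.

```python
import json

# Avro primitive types and the Hive type names they correspond to.
AVRO_TO_HIVE = {
    "string": "string",
    "int": "int",
    "long": "bigint",
    "float": "float",
    "double": "double",
    "boolean": "boolean",
    "bytes": "binary",
}


def hive_type(avro_type):
    """Map an Avro field type to a Hive type name (primitives and nullable unions only)."""
    if isinstance(avro_type, list):
        # A union such as ["null", "string"] is treated as a nullable string.
        non_null = [t for t in avro_type if t != "null"]
        if len(non_null) == 1:
            return hive_type(non_null[0])
        raise ValueError("multi-branch unions are not handled in this sketch")
    if isinstance(avro_type, str) and avro_type in AVRO_TO_HIVE:
        return AVRO_TO_HIVE[avro_type]
    raise ValueError("type not handled in this sketch: %r" % (avro_type,))


def effective_columns(schema_text):
    """Return [(column_name, hive_type), ...] for a record schema given as JSON text."""
    schema = json.loads(schema_text)
    if schema.get("type") != "record":
        raise ValueError("expected a top-level Avro record schema")
    return [(f["name"], hive_type(f["type"])) for f in schema["fields"]]


# Hypothetical contents of the schema file; the real avro.json is not shown in this report.
EXAMPLE_SCHEMA = """
{
  "type": "record",
  "name": "avro_table",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": ["null", "string"]}
  ]
}
"""

if __name__ == "__main__":
    for name, type_name in effective_columns(EXAMPLE_SCHEMA):
        print(name, type_name)
```

Partition columns such as {{str_part}} are not part of the Avro schema and would have to be taken from the table's partition keys separately.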