Lars Volker created HIVE-14086:
----------------------------------

             Summary: org.apache.hadoop.hive.metastore.api.Table does not return columns from Avro schema file
                 Key: HIVE-14086
                 URL: https://issues.apache.org/jira/browse/HIVE-14086
             Project: Hive
          Issue Type: Bug
          Components: API
            Reporter: Lars Volker


Consider this table, using an external Avro schema file:

{noformat}
CREATE TABLE avro_table
  PARTITIONED BY (str_part STRING)
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  TBLPROPERTIES (
    'avro.schema.url'='hdfs://localhost:20500/tmp/avro.json'
  );
{noformat}

This will populate the "COLUMNS_V2" metastore table with the correct column 
information (as per HIVE-6308). The columns of this table can then be queried 
via the Hive API, for example by calling {{.getSd().getCols()}} on a 
{{org.apache.hadoop.hive.metastore.api.Table}} object.

Changes to the schema, made either by pointing {{avro.schema.url}} at a 
different file or by changing the contents of the file it references, are 
reflected in the output of {{describe formatted avro_table}} *but not* in the 
result of the {{.getSd().getCols()}} API call. It looks like Hive reads the 
Avro schema file internally but does not expose the information therein via 
its API.
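
To make the mismatch concrete, here is a minimal client-side sketch of the check an API consumer currently has to do on its own. It is an illustration, not Hive code: the function names ({{columns_from_avro_schema}}, {{is_stale}}) are hypothetical, the schema text is assumed to have already been fetched from the location in {{avro.schema.url}}, the stored columns are assumed to be passed in as plain (name, type) pairs taken from {{.getSd().getCols()}}, and only flat records with primitive or nullable-primitive fields are handled.

```python
import json

# Avro primitive type -> Hive type name (primitives only; complex types such
# as records, arrays, maps, enums and fixed are deliberately left out here).
AVRO_TO_HIVE = {
    "string": "string",
    "int": "int",
    "long": "bigint",
    "float": "float",
    "double": "double",
    "boolean": "boolean",
    "bytes": "binary",
}


def hive_type(avro_type):
    """Map a primitive or nullable-primitive Avro type to a Hive type name."""
    if isinstance(avro_type, list):
        # A union such as ["null", "string"] is treated as a nullable column.
        non_null = [t for t in avro_type if t != "null"]
        if len(non_null) != 1:
            raise ValueError(f"unsupported union: {avro_type!r}")
        avro_type = non_null[0]
    if not isinstance(avro_type, str) or avro_type not in AVRO_TO_HIVE:
        raise ValueError(f"unsupported Avro type: {avro_type!r}")
    return AVRO_TO_HIVE[avro_type]


def columns_from_avro_schema(schema_text):
    """Return [(column_name, hive_type), ...] for a flat Avro record schema."""
    schema = json.loads(schema_text)
    if schema.get("type") != "record":
        raise ValueError("top-level Avro schema must be a record")
    # Names are lower-cased because Hive stores column names in lower case.
    return [(f["name"].lower(), hive_type(f["type"])) for f in schema["fields"]]


def is_stale(stored_cols, schema_text):
    """True if the stored column list differs from the schema-derived one."""
    return list(stored_cols) != columns_from_avro_schema(schema_text)
```

With a schema file that has gained a field since the table was created, {{is_stale}} returns true for the column list that the API still reports, which is the situation described above.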

Is there a way to obtain the effective Table information via Hive? Would it 
make sense to fix table retrieval so calls to {{get_table}} return the correct 
set of columns?




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
