[ 
https://issues.apache.org/jira/browse/IMPALA-7309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565820#comment-16565820
 ] 

Todd Lipcon commented on IMPALA-7309:
-------------------------------------

Another weird thing to note: the current behavior seems to differ depending 
on whether the main table's file format is text or Parquet. This seems to be 
because of the following code:

{code}
      String serdeLib = msTbl.getSd().getSerdeInfo().getSerializationLib();
      if (serdeLib == null ||
          serdeLib.equals("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe")) {
        // If the SerDe library is null or set to LazySimpleSerDe, it indicates
        // there is an issue with the table metadata since Avro tables need a
        // non-native serde. Instead of failing to load the table, fall back to
        // using the fields from the storage descriptor (same as Hive).
        return;
      }
{code}

In the case of text, the SerDe is LazySimpleSerDe, so we hit this code path 
and ignore the Avro schema. In the case of Parquet, the SerDe is set to some 
Parquet-related SerDe and thus we fall through to the "reconcile avro schema" 
code path.
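
To make the difference concrete, here is a minimal standalone sketch of the check quoted above. The class name, the helper ignoresAvroSchema, and the specific Parquet SerDe class name used as input are assumptions for illustration only; the snippet just mirrors the condition and is not the actual catalog code.

```java
public class SerdeFallbackSketch {
    static final String LAZY_SIMPLE_SERDE =
        "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe";

    // Hypothetical helper: returns true when the quoted check would return
    // early, i.e. the Avro schema is ignored and the storage descriptor
    // columns are used instead.
    static boolean ignoresAvroSchema(String serdeLib) {
        return serdeLib == null || serdeLib.equals(LAZY_SIMPLE_SERDE);
    }

    public static void main(String[] args) {
        // Text table: SerDe is LazySimpleSerDe, so the Avro schema is ignored.
        System.out.println(ignoresAvroSchema(LAZY_SIMPLE_SERDE)); // prints true

        // Parquet table: assumed example of a Parquet-related SerDe name. The
        // check does not match, so we fall through to schema reconciliation.
        System.out.println(ignoresAvroSchema(
            "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe")); // prints false
    }
}
```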

> Prevent the addition of Avro schemas to non-Avro tables with incompatible 
> schema
> --------------------------------------------------------------------------------
>
>                 Key: IMPALA-7309
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7309
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog, Frontend
>            Reporter: Todd Lipcon
>            Priority: Major
>
> Per a recent [mailing list 
> thread|https://lists.apache.org/thread.html/fb68c54bd66a40982ee17f9f16f87a4112220a5df035a311bda310f1@<dev.impala.apache.org>]
>  the behavior of Avro partitions within non-Avro tables is inconsistent with 
> Hive, and somewhat surprising. For example, the addition of a partition can 
> cause the results of "describe" on the table to change, but only after a 
> refresh or invalidate. In the mailing list thread, we decided to change the 
> behavior to:
> 1. Schema handling:
> - if a table's properties indicate it's an Avro table, parse and adopt the
> external Avro schema as the table schema, or infer an Avro-compatible schema 
> from the existing columns
> - if a table's properties indicate it's _not_ an Avro table, but there is
> an external Avro schema defined in the table properties, then parse the
> Avro schema and include it in the TableDescriptor (for use by Avro
> partitions) but *do not* adopt it as the table schema.
> 2. Handling incompatible schemas:
> - If the table-level format is non-Avro,
> - AND the table contains column types incompatible with Avro (e.g. tinyint),
> - AND the table has an existing Avro partition,
> - THEN the query will yield an error about incompatible types
> 3. Try to prevent users from shooting themselves in the foot:
> - If the table-level format is non-Avro,
> - AND the table contains column types incompatible with Avro (e.g. tinyint),
> - THEN disallow changing the file format of an existing partition to Avro
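
For reference, the three rules quoted above can be sketched as plain boolean predicates. Everything here (the class name, method names, and parameters) is hypothetical and only restates the conditions from the issue description; it is not how the change is, or would necessarily be, implemented in Impala.

```java
public class AvroPartitionRulesSketch {
    // Rule 1: the external Avro schema is adopted as the table schema only
    // when the table itself is an Avro table. For non-Avro tables the schema
    // is still parsed and carried along for Avro partitions, but not adopted.
    static boolean adoptAvroSchemaAsTableSchema(boolean tableIsAvro) {
        return tableIsAvro;
    }

    // Rule 2: a query yields an incompatible-types error when the table is
    // non-Avro, has Avro-incompatible column types (e.g. tinyint), and
    // already has an Avro partition.
    static boolean queryFailsWithIncompatibleTypes(boolean tableIsAvro,
            boolean hasAvroIncompatibleColumns, boolean hasAvroPartition) {
        return !tableIsAvro && hasAvroIncompatibleColumns && hasAvroPartition;
    }

    // Rule 3: changing an existing partition's file format to Avro is
    // disallowed when the table is non-Avro and has Avro-incompatible columns.
    static boolean allowSetPartitionFormatToAvro(boolean tableIsAvro,
            boolean hasAvroIncompatibleColumns) {
        return tableIsAvro || !hasAvroIncompatibleColumns;
    }

    public static void main(String[] args) {
        // Non-Avro table with a tinyint column: the format change is rejected.
        System.out.println(allowSetPartitionFormatToAvro(false, true)); // prints false
    }
}
```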


