Fang-Yu Rao has posted comments on this change. ( http://gerrit.cloudera.org:8080/15290 )

Change subject: IMPALA-9363: Add support for skipping given table types
......................................................................


Patch Set 1:

(3 comments)

Hi Vihang, I have replied to your comment on whether it is possible to perform 
more fine-grained filtering of tables based on the information in the 
corresponding SerDeInfo. Please let me know how you would like to proceed, and 
whether I have missed something important. Thanks!

http://gerrit.cloudera.org:8080/#/c/15290/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15290/1//COMMIT_MSG@14
PS1, Line 14: 
--catalogd_args=--blacklisted_table_types=<list_of_blacklisted_types>'.
            : Five table types are supported, namely, 'hdfs', 'hbase', 'view',
            : 'data_source', and 'kudu
> I think this is a very broad way to ignore the table types. Rather than ign
Thanks for the suggestion Vihang!

I did a preliminary investigation and found that, except for HDFS tables, it 
is not obvious how to perform more fine-grained filtering of tables based on 
the 'serdeInfo' field of the corresponding StorageDescriptor instance, which 
is itself a field of the class org.apache.hadoop.hive.metastore.api.Table.

Specifically, I tried to collect all the possible mappings from TTableType 
values to serialization libraries. Recall that each instance of SerDeInfo has 
a 'serializationLib' field. The mappings I found are listed below.

The possible mappings for HDFS tables.
TTableType.HDFS_TABLE -> org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe
TTableType.HDFS_TABLE -> org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe
TTableType.HDFS_TABLE -> org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
TTableType.HDFS_TABLE -> org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe
TTableType.HDFS_TABLE -> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
TTableType.HDFS_TABLE -> org.apache.hadoop.hive.serde2.avro.AvroSerDe
TTableType.HDFS_TABLE -> org.apache.hadoop.hive.ql.io.orc.OrcSerde

The possible mapping for HBase tables.
TTableType.HBASE_TABLE -> org.apache.hadoop.hive.hbase.HBaseSerDe

The possible mapping for a view.
TTableType.VIEW -> null

The possible mapping for a data source table.
TTableType.DATA_SOURCE_TABLE -> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

The possible mapping for a Kudu table.
TTableType.KUDU_TABLE -> org.apache.hadoop.hive.kudu.KuduSerDe

I also looked at the values of the other fields of SerDeInfo, but so far I 
have not found a field that would help us perform more fine-grained filtering 
of tables.
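
To make the limitation concrete, here is a minimal sketch that inverts the 
mappings listed above, from 'serializationLib' to the set of table types 
observed with it. The class name SerdeTableTypeLookup, the local TableType 
enum (a stand-in for TTableType), and the helper methods are all hypothetical 
and only for illustration; they are not part of this patch or of Impala. The 
sketch suggests that the serialization library alone cannot tell an HDFS 
table from a data source table, since both were observed with LazySimpleSerDe.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class SerdeTableTypeLookup {
  // Hypothetical stand-in for Impala's TTableType so the sketch compiles alone.
  enum TableType { HDFS_TABLE, HBASE_TABLE, VIEW, DATA_SOURCE_TABLE, KUDU_TABLE }

  // serializationLib -> table types observed with it. A null key models a view,
  // whose serializationLib was observed to be null.
  private static final Map<String, Set<TableType>> TYPES_BY_SERDE = new HashMap<>();

  private static void observe(TableType type, String serializationLib) {
    TYPES_BY_SERDE
        .computeIfAbsent(serializationLib, k -> EnumSet.noneOf(TableType.class))
        .add(type);
  }

  static {
    // Pairs copied from the mappings listed in this comment.
    observe(TableType.HDFS_TABLE, "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe");
    observe(TableType.HDFS_TABLE,
        "org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe");
    observe(TableType.HDFS_TABLE,
        "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe");
    observe(TableType.HDFS_TABLE,
        "org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe");
    observe(TableType.HDFS_TABLE, "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe");
    observe(TableType.HDFS_TABLE, "org.apache.hadoop.hive.serde2.avro.AvroSerDe");
    observe(TableType.HDFS_TABLE, "org.apache.hadoop.hive.ql.io.orc.OrcSerde");
    observe(TableType.HBASE_TABLE, "org.apache.hadoop.hive.hbase.HBaseSerDe");
    observe(TableType.VIEW, null);
    observe(TableType.DATA_SOURCE_TABLE,
        "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe");
    observe(TableType.KUDU_TABLE, "org.apache.hadoop.hive.kudu.KuduSerDe");
  }

  // Returns every table type observed with the given serialization library,
  // or an empty set if the library does not appear in the list.
  static Set<TableType> candidateTypes(String serializationLib) {
    return TYPES_BY_SERDE.getOrDefault(
        serializationLib, Collections.<TableType>emptySet());
  }

  // True if the library was observed with more than one table type.
  static boolean isAmbiguous(String serializationLib) {
    return candidateTypes(serializationLib).size() > 1;
  }

  public static void main(String[] args) {
    for (Map.Entry<String, Set<TableType>> e : TYPES_BY_SERDE.entrySet()) {
      System.out.println(e.getKey() + " -> " + e.getValue());
    }
  }
}
```

With these observations, only the HDFS entries carry extra information (the 
file format); for every other table type the library is either unique to the 
type or shared with HDFS tables.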


http://gerrit.cloudera.org:8080/#/c/15290/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/15290/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3085
PS1, Line 3085:   private TTableType getTableType(org.apache.hadoop.hive.metastore.api.Table msTbl) {
> change to static method?
Done


http://gerrit.cloudera.org:8080/#/c/15290/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3101
PS1, Line 3101: HBASE_TABLE
> Why do we return Hbase table type here? Can you please reconfirm?
Thanks for catching this! It should be TTableType.HDFS_TABLE instead.



--
To view, visit http://gerrit.cloudera.org:8080/15290
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I49c4062b48f1bb87adfd851ee26cc144fb70b4b7
Gerrit-Change-Number: 15290
Gerrit-PatchSet: 1
Gerrit-Owner: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]>
Gerrit-Comment-Date: Tue, 25 Feb 2020 23:08:22 +0000
Gerrit-HasComments: Yes
