[ 
https://issues.apache.org/jira/browse/HIVE-15653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15830719#comment-15830719
 ] 

Alexander Behm edited comment on HIVE-15653 at 1/19/17 10:28 PM:
-----------------------------------------------------------------

[~ctang.ma] Impala calls the Metastore API alter_table(). I tried the following 
alterations in Impala and those did wipe the table stats:
ALTER TABLE ADD COLUMNS
ALTER TABLE CHANGE COLUMN
ALTER TABLE SET TBLPROPERTIES
ALTER TABLE SET SERDEPROPERTIES
ALTER TABLE SET LOCATION
ALTER TABLE SET FILEFORMAT
ALTER TABLE SET CACHED

So I would say most ALTER commands do wipe the stats (from Impala). Just trying 
to make sure the fix on the Hive side is complete, i.e. the alter_table() API 
call on the Metastore is fixed and not just the Hive DDL commands.

The ALTER TABLE RENAME command worked fine (preserved table stats).


was (Author: alex.behm):
[~ctang.ma] Impala calls the Metastore API alter_table(). I tried the following 
alterations and those did wipe the table stats:
ALTER TABLE ADD COLUMNS
ALTER TABLE CHANGE COLUMN
ALTER TABLE SET TBLPROPERTIES
ALTER TABLE SET SERDEPROPERTIES
ALTER TABLE SET LOCATION
ALTER TABLE SET FILEFORMAT
ALTER TABLE SET CACHED

So I would say most ALTER commands do wipe the stats. Just trying to make sure 
the fix on the Hive side is complete (i.e. the alter_table() API call on the 
Metastore).

The ALTER TABLE RENAME command worked fine (preserved table stats).

> Some ALTER TABLE commands drop table stats
> ------------------------------------------
>
>                 Key: HIVE-15653
>                 URL: https://issues.apache.org/jira/browse/HIVE-15653
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 1.1.0
>            Reporter: Alexander Behm
>            Assignee: Chaoyu Tang
>            Priority: Critical
>         Attachments: HIVE-15653.patch
>
>
> Some ALTER TABLE commands drop the table stats. That may make sense for some 
> ALTER TABLE operations, but certainly not for others. Personally, I I think 
> ALTER TABLE should only change what was requested by the user without any 
> side effects that may be unclear to users. In particular, collecting stats 
> can be an expensive operation so it's rather inconvenient for users if they 
> get wiped accidentally.
> Repro:
> {code}
> create table t (i int);
> insert into t values(1);
> analyze table t compute statistics;
> alter table t set tblproperties('test'='test');
> hive> describe formatted t;
> OK
> # col_name                    data_type               comment             
>                
> i                     int                                         
>                
> # Detailed Table Information           
> Database:             default                  
> Owner:                abehm                    
> CreateTime:           Tue Jan 17 18:13:34 PST 2017     
> LastAccessTime:       UNKNOWN                  
> Protect Mode:         None                     
> Retention:            0                        
> Location:             hdfs://localhost:20500/test-warehouse/t  
> Table Type:           MANAGED_TABLE            
> Table Parameters:              
>       COLUMN_STATS_ACCURATE   false               
>       last_modified_by        abehm               
>       last_modified_time      1484705748          
>       numFiles                1                   
>       numRows                 -1                  
>       rawDataSize             -1                  
>       test                    test                
>       totalSize               2                   
>       transient_lastDdlTime   1484705748          
>                
> # Storage Information          
> SerDe Library:        org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe      
>  
> InputFormat:          org.apache.hadoop.mapred.TextInputFormat         
> OutputFormat:         
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat       
> Compressed:           No                       
> Num Buckets:          -1                       
> Bucket Columns:       []                       
> Sort Columns:         []                       
> Storage Desc Params:           
>       serialization.format    1                   
> Time taken: 0.169 seconds, Fetched: 34 row(s)
> {code}
> The same behavior can be observed with several other ALTER TABLE commands.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to