Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14712
  
    Found another related JIRA: 
https://issues.apache.org/jira/browse/HIVE-12730. This is only available in 
Hive 2.1. They are trying to resolve the issues we hit:
    
    > We would like to provide a way for developers/users to modify the numRows 
and dataSize for a table/partition. Right now although they are part of the 
table properties, they will be set to -1 when the task is not coming from a 
statsTask.
    
    With that change, users can set the statistics through a DDL statement. For example,
    ```SQL
    alter table s update statistics set('numRows'='1212', 'rawDataSize'='500500');
    ```
    
    More importantly, after that fix, the approach in this PR does not work: 
`STATS_GENERATED_VIA_STATS_TASK` is no longer available. Instead, they changed it to 
`STATS_GENERATED`. Two options are available: `TASK` and `USER`.
    
    Thus, I believe our solution should not be based on how Hive behaves, if 
possible.
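
To illustrate why keying off the Hive flag is brittle, here is a minimal sketch of what a version-aware check would have to look like. It is not code from this PR: `stats_origin` and `usable_num_rows` are hypothetical helpers, `props` is assumed to be a plain dict of table properties, and the assumption that the older flag is stored as the string "true" should be verified against the Hive version in use.

```python
# Hypothetical helpers; `props` is a plain dict of Hive table properties.
LEGACY_FLAG = "STATS_GENERATED_VIA_STATS_TASK"  # name before HIVE-12730
NEW_FLAG = "STATS_GENERATED"                    # name after HIVE-12730: TASK or USER


def stats_origin(props):
    """Return 'TASK', 'USER', or None when the origin cannot be determined."""
    new_value = props.get(NEW_FLAG)
    if new_value is not None:
        value = new_value.upper()
        return value if value in ("TASK", "USER") else None
    # Assumption: the legacy flag is stored as the string "true".
    if props.get(LEGACY_FLAG, "").lower() == "true":
        return "TASK"
    return None


def usable_num_rows(props):
    """Return numRows as an int only if its origin is known and it is non-negative."""
    if stats_origin(props) is None:
        return None
    try:
        rows = int(props.get("numRows", "-1"))
    except ValueError:
        return None
    return rows if rows >= 0 else None
```

Every rename on the Hive side adds another branch to a check like this, which is the maintenance cost the comment argues against.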

