Zoltan Haindrich created HIVE-23776:
---------------------------------------

             Summary: Retire quickstats autocollection
                 Key: HIVE-23776
                 URL: https://issues.apache.org/jira/browse/HIVE-23776
             Project: Hive
          Issue Type: Improvement
            Reporter: Zoltan Haindrich
            Assignee: Zoltan Haindrich


This is about the following quick stats:
* number of files
* data size (sum of file sizes)
* number of erasure-coded files

Right now these are scanned during every BasicStatsTask execution, which means 
extra filesystem reads. For small inserts this overhead is visible when the 
filesystem is a bit slower (S3 and friends).

I don't think they are really in use. We rely more on column stats, which are 
more accurate; and the data size in this case is the "offline" (on-disk) size, 
while we should instead be calculating with "online" sizes.

proposal:

* remove collection and storage of this data
* collect it on the fly during "desc formatted" statements to provide them for 
informational purposes
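
A minimal sketch of what on-the-fly collection could look like, assuming the 
table or partition location is a plain directory tree. The class QuickStats and 
its collect method are hypothetical names for illustration, not existing Hive 
code. The sketch uses java.nio on a local path instead of the Hadoop FileSystem 
API, counts every regular file (no filtering of hidden or temporary files), and 
leaves out the erasure-coded file count because the local filesystem has no 
equivalent.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical holder for the on-demand quick stats; not a Hive class. */
final class QuickStats {
    final long numFiles;   // number of regular files under the location
    final long totalSize;  // sum of on-disk ("offline") file sizes, in bytes

    QuickStats(long numFiles, long totalSize) {
        this.numFiles = numFiles;
        this.totalSize = totalSize;
    }

    /** Walks the directory tree once and sums up all regular files. */
    static QuickStats collect(Path tableDir) {
        long files = 0;
        long bytes = 0;
        // Files.walk keeps directory handles open, so close the stream when done.
        try (Stream<Path> paths = Files.walk(tableDir)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                if (Files.isRegularFile(p)) {
                    files++;
                    bytes += Files.size(p);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new QuickStats(files, bytes);
    }
}
```

With something along these lines, the filesystem scan would be paid only when 
someone asks for the numbers (for example via "desc formatted"), rather than on 
every insert.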



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
