Sergey Shelukhin created HIVE-17423:
---------------------------------------

             Summary: LLAP Parquet caching - support file ID in splits
                 Key: HIVE-17423
                 URL: https://issues.apache.org/jira/browse/HIVE-17423
             Project: Hive
          Issue Type: Bug
            Reporter: Sergey Shelukhin


To read data from the LLAP cache, one needs a file ID, which is either an HDFS 
inode ID or a composite of path, modification time and size. For ORC, these IDs 
can be embedded into splits because, for inode IDs in particular, they are 
available as part of the normal file enumeration that split generation performs 
anyway. If the IDs are missing from the splits, they have to be obtained for 
every file on the fragment side.
We should explore adding file IDs to Parquet splits when the cache is enabled.
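
For illustration only, the two kinds of file ID described above could be 
modelled as below. This is a minimal sketch, not Hive's implementation: the 
class and method names (FileIdSketch, SyntheticKey, cacheKey) are hypothetical, 
and it assumes the split carries a nullable inode ID alongside the path, 
modification time and size.

```java
import java.util.Objects;

// Hypothetical sketch; none of these names come from Hive.
final class FileIdSketch {

    /** Fallback key: a composite of path, modification time and size. */
    static final class SyntheticKey {
        final String path;
        final long modificationTime;
        final long size;

        SyntheticKey(String path, long modificationTime, long size) {
            this.path = Objects.requireNonNull(path, "path");
            this.modificationTime = modificationTime;
            this.size = size;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof SyntheticKey)) {
                return false;
            }
            SyntheticKey other = (SyntheticKey) o;
            return path.equals(other.path)
                && modificationTime == other.modificationTime
                && size == other.size;
        }

        @Override
        public int hashCode() {
            return Objects.hash(path, modificationTime, size);
        }
    }

    /**
     * Picks the cache key for a file. If the split carried an inode ID
     * (assumed here to be a nullable Long), that ID is used directly;
     * otherwise the composite key is built from the file's metadata.
     */
    static Object cacheKey(Long inodeId, String path,
                           long modificationTime, long size) {
        if (inodeId != null) {
            return inodeId;
        }
        return new SyntheticKey(path, modificationTime, size);
    }
}
```

With a key like this in the split, the fragment side would not need a separate 
lookup per file; without it, the fragment has to fetch the file's metadata 
before it can consult the cache.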



