Hi Experts,

I have a large file with 300+ columns. In order to query only a few rows efficiently, I am using the RCFile format in Hive.
I have tried increasing the RCFile row group size from the default up to 32 MB, e.g.:

set hive.io.rcfile.record.buffer.size = 134217728;

However, I do not see any major change in the amount of HDFS data scanned. Moreover, the amount of data scanned with RCFile is not significantly different from that of a row-based file.

Are there any other parameters that need to be set so that only the relevant fields in the RCFile are scanned? Is there anything obvious I am missing? Any pointers would be appreciated.

--
~Rajesh.B
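
For reference, here is a minimal sketch of the setup described above. The table names (wide_text, wide_rc) and column names are hypothetical and only stand in for the real 300+ column table. The property takes a size in bytes, so 32 MB would be 33554432, while the value 134217728 quoted above works out to 128 MB. The sketch also assumes that the row group size matters when the RCFile is written, so the data is loaded after the setting is applied.

```sql
-- Row group size is given in bytes: MB * 1024 * 1024.
--    32 MB =  33554432
--   128 MB = 134217728   (the value quoted in the post)
set hive.io.rcfile.record.buffer.size = 33554432;

-- Hypothetical table and columns, for illustration only.
CREATE TABLE wide_rc (col1 STRING, col2 STRING, col3 STRING)
STORED AS RCFILE;

-- Assumption: the data is (re)written after the setting is in place,
-- copying from a hypothetical row-based source table.
INSERT OVERWRITE TABLE wide_rc
SELECT col1, col2, col3 FROM wide_text;

-- Name the needed columns explicitly rather than using SELECT *,
-- so that only those columns have to be read.
SELECT col1, col3
FROM wide_rc
WHERE col1 = 'x';
```

Comparing the HDFS bytes read for this query against the same query on the row-based table would show whether column skipping is taking effect.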