[ https://issues.apache.org/jira/browse/SPARK-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14067814#comment-14067814 ]
Yin Huai commented on SPARK-2597:
---------------------------------

Hive uses HiveInputFormat as a wrapper around different InputFormats. We may want to take a similar approach (HiveInputFormat cannot be used directly).

> Improve the code related to Table Scan
> --------------------------------------
>
>                 Key: SPARK-2597
>                 URL: https://issues.apache.org/jira/browse/SPARK-2597
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>
> There are several issues with the current code related to Table Scan.
> 1. HadoopTableReader and HiveTableScan are used together to deal with Hive
> tables. It is not clear why we do the Hive-specific work in two different
> places.
> 2. HadoopTableReader creates an RDD for every Hive partition and then unions
> these RDDs. Is this the right way to handle partitioned tables?
> 3. Right now, we ship initializeLocalJobConfFunc to every task to set some
> local properties. Can we avoid this?
> I think it will be good to improve the code related to Table Scan. It is also
> important to make sure we do not introduce performance issues with the
> proposed changes.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
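[Editorial note] The pattern questioned in point 2 (one dataset per Hive partition, then a union of them all) can be sketched without a Spark dependency. This is a minimal illustration, not HadoopTableReader's actual code: `Seq[Row]` stands in for `RDD[Row]`, and the names `Row`, `Partition`, `readPartition`, and `scanTable` are hypothetical simplifications.

```scala
// Spark-free sketch of the per-partition-union pattern from point 2.
// Seq[Row] stands in for RDD[Row]; all names here are illustrative.

case class Row(values: Seq[String])
case class Partition(path: String)

// Stand-in for reading a single Hive partition's data
// (the analogue of building one RDD per partition).
def readPartition(p: Partition): Seq[Row] =
  Seq(Row(Seq(p.path, "a")), Row(Seq(p.path, "b")))

// The questioned approach: build a dataset per partition,
// then union them all into a single dataset for the table scan.
def scanTable(partitions: Seq[Partition]): Seq[Row] =
  partitions
    .map(readPartition)               // one dataset per partition
    .foldLeft(Seq.empty[Row])(_ ++ _) // union them together

val result = scanTable(Seq(Partition("p=1"), Partition("p=2")))
println(result.size)
```

With many partitions this produces a deeply nested union, which is one reason the issue asks whether a single scan over all partitions would be preferable.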