Yin Huai created SPARK-2597:
-------------------------------

             Summary: Improve the code related to Table Scan
                 Key: SPARK-2597
                 URL: https://issues.apache.org/jira/browse/SPARK-2597
             Project: Spark
          Issue Type: Improvement
          Components: SQL
            Reporter: Yin Huai
There are several issues with the current code related to Table Scan.

1. HadoopTableReader and HiveTableScan are used together to deal with Hive tables. It is not clear why we do the Hive-specific work in two different places.
2. HadoopTableReader creates an RDD for every Hive partition and then unions these RDDs. Is this the right way to handle partitioned tables?
3. Right now, we ship initializeLocalJobConfFunc to every task to set some local properties. Can we avoid it?

It will be good to improve the code related to Table Scan. It is also important to make sure the proposed changes do not introduce performance regressions.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
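To make point 2 concrete, the pattern in question builds one dataset per Hive partition and then folds them together with a union. The sketch below illustrates that shape with plain Scala collections standing in for RDDs; the Partition type, field names, and scanPartitioned helper are illustrative assumptions, not the actual HadoopTableReader code.

```scala
object UnionSketch {
  // A stand-in for a Hive partition: a partition spec plus its rows.
  case class Partition(spec: String, rows: Seq[String])

  // One collection per partition (in Spark, one RDD each),
  // then a pairwise union of all of them.
  def scanPartitioned(parts: Seq[Partition]): Seq[String] =
    parts.map(_.rows).reduceOption(_ ++ _).getOrElse(Seq.empty)

  def main(args: Array[String]): Unit = {
    val parts = Seq(
      Partition("ds=2014-07-01", Seq("a", "b")),
      Partition("ds=2014-07-02", Seq("c"))
    )
    println(scanPartitioned(parts)) // List(a, b, c)
  }
}
```

With real RDDs, each element of the union carries its own lineage and partition metadata, which is why a table with many Hive partitions can make this pattern expensive compared to a single scan over all partition paths.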