Yin Huai created SPARK-2597:
-------------------------------

             Summary: Improve the code related to Table Scan
                 Key: SPARK-2597
                 URL: https://issues.apache.org/jira/browse/SPARK-2597
             Project: Spark
          Issue Type: Improvement
          Components: SQL
            Reporter: Yin Huai


There are several issues with the current code related to Table Scan:
1. HadoopTableReader and HiveTableScan are used together to deal with Hive 
tables. It is not clear why we do the Hive-specific work in two different 
places.
2. HadoopTableReader creates an RDD for every Hive partition and then unions 
these RDDs. Is this the right way to handle partitioned tables?
3. Right now, we ship initializeLocalJobConfFunc to every task to set some 
local properties. Can we avoid doing that?

I think it would be good to improve the code related to Table Scan. It is also 
important to make sure that the proposed changes do not introduce performance 
regressions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)