TableScan vs PrunedScan

2015-07-07 Thread Gil Vernik
Hi All,

I wanted to experiment a little with TableScan and PrunedScan.
My first test was to print the columns requested by various SQL queries.
To make this test easier, I took spark-csv and replaced TableScan
with PrunedScan.
I then changed the buildScan method of CsvRelation from

def buildScan = {

to 

def buildScan(requiredColumns: Array[String]) = {…

This was the only modification I made to CsvRelation.scala. I also added
a log statement that prints requiredColumns.

I then took the same CSV file and ran a very simple SELECT query on it.
With TableScan in CsvRelation, everything worked correctly.
With PrunedScan it didn't work: the query returned empty columns
or columns in the wrong order.

Why does this happen? Is it a bug? I thought that PrunedScan is
supposed to work exactly like TableScan, so that I can freely switch
from TableScan to PrunedScan. I thought the only difference is that
buildScan of PrunedScan takes requiredColumns as a parameter.

Can someone explain the behavior I saw?

I am using Spark 1.5 from trunk.
Thanks a lot
Gil.

Re: TableScan vs PrunedScan

2015-07-07 Thread Ram Sriharsha
Hi Gil

You also need to prune each resulting Row so that it contains only the
requested columns, in the order given by requiredColumns.
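
To illustrate the pruning step, here is a minimal sketch in plain Scala, without any Spark dependency. It assumes the relation knows its full list of column names and parses each line into a full sequence of values. The object PruneColumns and the method pruner are hypothetical names used only for this illustration; they are not part of Spark or spark-csv.

```scala
// Hypothetical helper for illustration; not part of Spark or spark-csv.
object PruneColumns {

  /** Builds a function that reduces a full row to the requested columns.
    *
    * @param schemaFields    all column names, in the order they appear in the file
    * @param requiredColumns the names passed to buildScan(requiredColumns)
    * @return a function mapping a full row to only the requested values,
    *         in the same order as requiredColumns
    */
  def pruner(schemaFields: Seq[String],
             requiredColumns: Array[String]): Seq[Any] => Seq[Any] = {
    // Resolve each requested name to its position once, outside the per-row loop.
    val indexOf: Map[String, Int] = schemaFields.zipWithIndex.toMap
    // Throws NoSuchElementException if a requested column is not in the schema.
    val indices: Array[Int] = requiredColumns.map(indexOf)
    // Pick the values by position, preserving the requested order.
    fullRow => indices.toSeq.map(fullRow)
  }
}
```

For example, with the schema Seq("year", "make", "model") and requiredColumns Array("model", "year"), applying the resulting function to Seq(2012, "Tesla", "S") keeps only "S" and 2012, in that order. Inside buildScan(requiredColumns), the pruned values could then be wrapped with Row.fromSeq before being returned in the RDD.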

Ram
