Kevin Jung created SPARK-5737:
---------------------------------

             Summary: Scanning duplicate columns from parquet table
                 Key: SPARK-5737
                 URL: https://issues.apache.org/jira/browse/SPARK-5737
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.2.1
            Reporter: Kevin Jung


{quote}
import org.apache.spark.sql._
val sqlContext = new SQLContext(sc)
import sqlContext._
val rdd = sqlContext.parquetFile("temp.parquet")
rdd.select('d1,'d1,'d2,'d2).take(3).foreach(println)
{quote}

With the code above, the first occurrence of each duplicated column comes 
back as null; only the second occurrence carries the actual value.
For example,

{quote}
[null,-5.7,null,121.05]
[null,-61.17,null,108.91]
[null,50.60,null,72.15]
{quote}

This happens only with ParquetTableScan. PhysicalRDD works fine, and the rows 
contain the duplicated values as expected:

{quote}
[-5.7,-5.7,121.05,121.05]
[-61.17,-61.17,108.91,108.91]
[50.60,50.60,72.15,72.15]
{quote}
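
The two code paths can be compared side by side in one spark-shell session. This is only a sketch: the report does not show how temp.parquet was written, so the Rec case class, its Double column types, and the choice to build the file from the three sample rows are assumptions made for illustration. saveAsParquetFile will fail if temp.parquet already exists.

{quote}
import org.apache.spark.sql._
val sqlContext = new SQLContext(sc)
import sqlContext._

// Hypothetical schema; the real temp.parquet may differ.
case class Rec(d1: Double, d2: Double)

// Build an in-memory SchemaRDD from the sample values in this report.
val inMemory: SchemaRDD = createSchemaRDD(sc.parallelize(Seq(
  Rec(-5.7, 121.05),
  Rec(-61.17, 108.91),
  Rec(50.60, 72.15))))

// In-memory path (PhysicalRDD): reported to return the duplicated values.
inMemory.select('d1,'d1,'d2,'d2).take(3).foreach(println)

// Parquet path (ParquetTableScan): reported to return null in the
// first occurrence of each duplicated column.
inMemory.saveAsParquetFile("temp.parquet")
val fromParquet = sqlContext.parquetFile("temp.parquet")
fromParquet.select('d1,'d1,'d2,'d2).take(3).foreach(println)
{quote}

If the two println outputs differ for the same select, the difference is isolated to the Parquet scan rather than to the data or the projection.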



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

