Virgil Palanciuc created SPARK-11657:
----------------------------------------

             Summary: Bad data read using dataframes
                 Key: SPARK-11657
                 URL: https://issues.apache.org/jira/browse/SPARK-11657
             Project: Spark
          Issue Type: Bug
          Components: Spark Core, SQL
    Affects Versions: 1.5.1, 1.5.2
         Environment: EMR (yarn)
            Reporter: Virgil Palanciuc
            Priority: Critical


I get strange behaviour when reading Parquet data: {{take}} returns corrupted string values, while {{collect}} returns the correct ones:

{code}
scala> val data = sqlContext.read.parquet("hdfs:///sample")
data: org.apache.spark.sql.DataFrame = [clusterSize: int, clusterName: string, 
clusterData: array<string>, dpid: int]
scala> data.take(1)    // this returns garbage
res0: Array[org.apache.spark.sql.Row] = 
Array([1,56169A947F000101????????,WrappedArray(164594606101815510825479776971????????),813])
 
scala> data.collect()    // this works
res1: Array[org.apache.spark.sql.Row] = 
Array([1,6A01CACD56169A947F000101,WrappedArray(77512098164594606101815510825479776971),813])
{code}

I've included the "hdfs:///sample" directory here:

https://www.dropbox.com/s/su0flfn49rrc7jz/sample.tgz?dl=0
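Until the root cause is found, the behaviour above suggests a workaround: since {{collect}} returns the correct bytes, materialise the rows first and slice the resulting local array instead of calling {{take}} on the DataFrame. A sketch (assuming the {{data}} DataFrame from the repro; this is a guess based on the observed behaviour, not a fix):

```scala
// Workaround sketch: collect() yields correct rows per the repro,
// so take the first row from the local array rather than via data.take(1).
val rows = data.collect()      // full materialisation on the driver
val first = rows.headOption    // local stand-in for data.take(1)
```

Note this pulls the entire dataset to the driver, so it is only practical for small data like the attached sample.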



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
