Hi all,
 I obtain a Dataset[Row] with the following code:

val df: Dataset[Row] = 
spark.read.format("csv").schema(schema).load("hdfs://master:9000/mydata")

 After that I want to collect it to the driver:

val df_rows: Array[Row] = df.collect()
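For reference, here is the setup above as one self-contained sketch. The schema and the commented-out `toLocalIterator` alternative are my assumptions (the original post does not show the schema, and streaming partitions instead of materializing everything is only a guess at reducing driver memory pressure, not a confirmed fix):

```scala
import org.apache.spark.sql.{Dataset, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object CollectExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("collect-example").getOrCreate()

    // Placeholder schema -- the original post does not show the actual one.
    val schema = StructType(Seq(StructField("value", StringType)))

    val df: Dataset[Row] = spark.read
      .format("csv")
      .schema(schema)
      .load("hdfs://master:9000/mydata")

    // collect() materializes every row in driver memory at once; with
    // ~800,000,000 rows / 12 GB on disk, the driver heap must hold
    // considerably more than 12 GB of deserialized Row objects.
    val df_rows: Array[Row] = df.collect()

    // Assumption: toLocalIterator() pulls one partition at a time to the
    // driver, which may avoid holding the full dataset in memory.
    // df.toLocalIterator().forEachRemaining(row => println(row))

    spark.stop()
  }
}
```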

The Spark web UI shows that all tasks completed successfully, but the 
application does not stop. After more than ten minutes, the following 
error appears in the shell:

java.io.EOFException: Premature EOF: no length prefix available
 
Environment:
    Spark 2.4.3
    Hadoop 2.7.7
    Data: about 800,000,000 rows, 12 GB total
    
    More detailed information is available here:
https://stackoverflow.com/questions/61202566/spark-sql-datasetrow-collect-to-driver-throw-java-io-eofexception-premature-e
    Does anyone know the reason?

Best regards,
maqy
