Hi all,

I get a Dataset[Row] through the following code:

    val df: Dataset[Row] = spark.read.format("csv").schema(schema).load("hdfs://master:9000/mydata")

After that I want to collect it to the driver:

    val df_rows: Array[Row] = df.collect()

The Spark web UI shows that all tasks have completed successfully, but the application does not stop. After more than ten minutes, the following error appears in the shell:

    java.io.EOFException: Premature EOF: no length prefix available

Environment:
Spark 2.4.3
Hadoop 2.7.7
Total data: about 800,000,000 rows, 12 GB

More detailed information can be seen here:
https://stackoverflow.com/questions/61202566/spark-sql-datasetrow-collect-to-driver-throw-java-io-eofexception-premature-e

Does anyone know the reason?

Best regards,
maqy