Yanjia Gary Li created HUDI-494:
-----------------------------------

             Summary: [DEBUGGING] Huge amount of tasks when writing files into HDFS
                 Key: HUDI-494
                 URL: https://issues.apache.org/jira/browse/HUDI-494
             Project: Apache Hudi (incubating)
          Issue Type: Test
            Reporter: Yanjia Gary Li
            Assignee: Vinoth Chandar
         Attachments: Screen Shot 2020-01-02 at 8.53.24 PM.png, Screen Shot 2020-01-02 at 8.53.44 PM.png
I am using a manually built master after commit [https://github.com/apache/incubator-hudi/commit/36b3b6f5dd913d3f1c9aa116aff8daf6540fed65]. I am seeing 3 million tasks when the Hudi Spark job writes the files into HDFS, and a huge number of 0-byte files being written into the .hoodie/.temp/ folder in my HDFS. In the Spark UI, each task writes fewer than 10 records in
{code:java}
count at HoodieSparkSqlWriter
{code}
All the stages before this seem normal. Any idea what happened here?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)