Nan Zhu created SPARK-22790:
-------------------------------

             Summary: add a configurable factor to describe HadoopFsRelation's size
                 Key: SPARK-22790
                 URL: https://issues.apache.org/jira/browse/SPARK-22790
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.2.0
            Reporter: Nan Zhu

As per the discussion in https://github.com/apache/spark/pull/19864#discussion_r156847927, the size of a HadoopFsRelation is currently estimated purely from the underlying file size. That estimate is not accurate and leaves execution vulnerable to errors such as OOM.

Users can enable CBO with the functionality in https://github.com/apache/spark/pull/19864 to avoid this issue.

This JIRA proposes adding a configurable factor to the sizeInBytes method of the HadoopFsRelation class, so that users can mitigate the problem without CBO.
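
To illustrate the proposal, here is a minimal sketch in plain Python rather than Spark code. The names `size_in_bytes`, `size_factor`, and `would_broadcast` are hypothetical and are not Spark APIs. The 10 MiB threshold matches the default value of spark.sql.autoBroadcastJoinThreshold, and the 4x expansion factor is an arbitrary assumption chosen for the example.

```python
# Hypothetical sketch: the names below are illustrative, not Spark APIs.

def size_in_bytes(file_sizes, size_factor=1.0):
    """Estimate a relation's size from its on-disk file sizes, scaled by a factor."""
    if size_factor <= 0:
        raise ValueError("size_factor must be positive")
    return int(sum(file_sizes) * size_factor)


def would_broadcast(estimated_size, threshold=10 * 1024 * 1024):
    """Mimic a planner that picks a broadcast join when the estimate fits the threshold."""
    return estimated_size <= threshold


# Three files of 3 MiB each on disk (e.g. compressed columnar data).
files = [3 * 1024 * 1024] * 3

raw = size_in_bytes(files)                      # file size only, as today
scaled = size_in_bytes(files, size_factor=4.0)  # assumed 4x expansion in memory

print(raw, would_broadcast(raw))        # 9437184 True
print(scaled, would_broadcast(scaled))  # 37748736 False
```

With the file-size-only estimate, the relation looks small enough to broadcast. With the factor applied, the estimate exceeds the threshold and the planner in this sketch would avoid the broadcast, which is the kind of mitigation the factor is meant to give users without enabling CBO.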