Yi Zhou created SPARK-10484:
-------------------------------
Summary: [Spark SQL] Come across lost task(timeout) or GC OOM error when cross join happen
Key: SPARK-10484
URL: https://issues.apache.org/jira/browse/SPARK-10484
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.5.0
Reporter: Yi Zhou
Priority: Critical
Tasks are lost (timeout) or fail with a GC OOM error when the cross join below runs. The left (large) table is ~1.2 GB in size and the right (small) table is ~2.2 KB.

Key SQL:
{code:sql}
SELECT CONCAT(s_store_sk, "_", s_store_name) AS store_ID,
       pr_review_date,
       pr_review_content
FROM product_reviews pr,
     temp_stores_with_regression stores_with_regression
WHERE locate(lower(stores_with_regression.s_store_name), lower(pr.pr_review_content), 1) >= 1;
{code}

Physical plan:
{code}
TungstenProject [concat(cast(s_store_sk#456L as string),_,s_store_name#457) AS store_ID#446,pr_review_date#449,pr_review_content#455]
 Filter (locate(lower(s_store_name#457),lower(pr_review_content#455),1) >= 1)
  CartesianProduct
   HiveTableScan [pr_review_date#449,pr_review_content#455], (MetastoreRelation bigbench, product_reviews, Some(pr))
   HiveTableScan [s_store_sk#456L,s_store_name#457], (MetastoreRelation bigbench, temp_stores_with_regression, Some(stores_with_regression))
Code Generation: true
{code}
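The WHERE clause has no equality predicate, so the plan pairs every product_reviews row with every temp_stores_with_regression row (CartesianProduct) and only then applies the Filter. Because the store side is only ~2.2 KB, the same result could in principle be computed by streaming the large side once while holding the small side in memory, similar to how a broadcast nested-loop join works. The plain-Python sketch that follows models only the query's semantics under that assumption. It is not Spark's implementation, and the row tuples and helper names (`locate`, `stores_mentioned`) are hypothetical. `locate` is modeled as a 1-based search that returns 0 when the substring is absent, so `locate(...) >= 1` acts as a case-insensitive substring test once both sides are lowercased.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical row shapes mirroring the columns used in the query.
Review = Tuple[str, str]       # (pr_review_date, pr_review_content)
Store = Tuple[int, str]        # (s_store_sk, s_store_name)
Output = Tuple[str, str, str]  # (store_ID, pr_review_date, pr_review_content)


def locate(substr: str, s: str, pos: int = 1) -> int:
    """Simplified model of SQL locate(substr, str, pos): 1-based index of the
    first occurrence of substr in s at or after pos, or 0 if not found."""
    if pos < 1:
        return 0
    # str.find returns -1 when absent, so adding 1 yields 0 for "not found".
    return s.find(substr, pos - 1) + 1


def stores_mentioned(reviews: Iterable[Review], stores: List[Store]) -> Iterator[Output]:
    """Nested-loop join: stream the large side once, keep the small side in memory."""
    # Lowercase the small side once instead of once per (review, store) pair.
    small = [(sk, name, name.lower()) for sk, name in stores]
    for review_date, content in reviews:
        content_lc = content.lower()
        for sk, name, name_lc in small:
            # Same predicate as the WHERE clause: locate(...) >= 1.
            if locate(name_lc, content_lc, 1) >= 1:
                # CONCAT(s_store_sk, "_", s_store_name) AS store_ID
                yield (f"{sk}_{name}", review_date, content)


if __name__ == "__main__":
    reviews = [("2015-09-01", "Great service at ABLE downtown"),
               ("2015-09-02", "Nothing to report")]
    stores = [(1, "Able"), (2, "Ought")]
    print(list(stores_mentioned(reviews, stores)))
    # [('1_Able', '2015-09-01', 'Great service at ABLE downtown')]
```

Because `stores_mentioned` is a generator, output rows are produced one at a time as the large side is read rather than materializing all pairs first; whether Spark 1.5 can be made to plan the query this way is not established by this report.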