[jira] [Commented] (HIVE-10302) Load small tables (for map join) in executor memory only once [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568197#comment-14568197 ]

Jimmy Xiang commented on HIVE-10302:

The file was lost during rebasing. I pushed it to master. The build is ok for me now. Thanks.

Load small tables (for map join) in executor memory only once [Spark Branch]

Key: HIVE-10302
URL: https://issues.apache.org/jira/browse/HIVE-10302
Project: Hive
Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Fix For: 1.3.0
Attachments: 10302.patch, HIVE-10302.2-spark.patch, HIVE-10302.3-spark.patch, HIVE-10302.4-spark.patch, HIVE-10302.spark-1.patch

Usually there are multiple cores in a Spark executor, and thus it's possible that multiple map-join tasks can be running in the same executor (concurrently or sequentially). Currently, each task will load its own copy of the small tables for map join into memory, ending up with inefficiency. Ideally, we only load the small tables once and share them among the tasks running in that executor.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
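The issue description above asks for small tables to be loaded once per executor and shared among the tasks running there. A minimal sketch of that idea follows, assuming one JVM per executor and small tables that are safe to share read-only. The class name `ExecutorTableCache`, the `getOrLoad` method, and the `Map<String, String>` table representation are all hypothetical and are not taken from the attached patches.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical per-JVM cache of loaded small tables, keyed by the path they
 * were loaded from. Illustration only; not the code from the HIVE-10302 patch.
 */
final class ExecutorTableCache {
    // Static, so every task thread in the same executor JVM sees the same map.
    private static final ConcurrentHashMap<String, Map<String, String>> CACHE =
            new ConcurrentHashMap<>();

    private ExecutorTableCache() {}

    /**
     * Returns the table cached for the given path, invoking the loader only if
     * no table is cached yet. ConcurrentHashMap.computeIfAbsent applies the
     * loader at most once per key, so concurrent tasks asking for the same
     * path share a single loaded copy.
     */
    static Map<String, String> getOrLoad(String path,
                                         Function<String, Map<String, String>> loader) {
        return CACHE.computeIfAbsent(path, loader);
    }

    /** Drops all cached tables, for example when a new query starts. */
    static void clear() {
        CACHE.clear();
    }
}
```

A task would call `ExecutorTableCache.getOrLoad(path, loader)` instead of loading the table directly. When and how to evict entries (here a plain `clear()`) is a design choice this sketch leaves open.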
[jira] [Commented] (HIVE-10302) Load small tables (for map join) in executor memory only once [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568202#comment-14568202 ]

Sergey Shelukhin commented on HIVE-10302:

Actually I wonder why you guys still work on routine jiras on the branch after the main merge. Usually branch is reserved for major feature and abandoned after merge, unless there's some other major feature with epic merge...

Load small tables (for map join) in executor memory only once [Spark Branch]

Key: HIVE-10302
URL: https://issues.apache.org/jira/browse/HIVE-10302
Project: Hive
Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Fix For: 1.3.0
Attachments: 10302.patch, HIVE-10302.2-spark.patch, HIVE-10302.3-spark.patch, HIVE-10302.4-spark.patch, HIVE-10302.spark-1.patch
[jira] [Commented] (HIVE-10302) Load small tables (for map join) in executor memory only once [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511839#comment-14511839 ]

Xuefu Zhang commented on HIVE-10302:

+1

Load small tables (for map join) in executor memory only once [Spark Branch]

Key: HIVE-10302
URL: https://issues.apache.org/jira/browse/HIVE-10302
Project: Hive
Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Fix For: spark-branch
Attachments: HIVE-10302.2-spark.patch, HIVE-10302.3-spark.patch, HIVE-10302.4-spark.patch, HIVE-10302.spark-1.patch
[jira] [Commented] (HIVE-10302) Load small tables (for map join) in executor memory only once [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509619#comment-14509619 ]

Hive QA commented on HIVE-10302:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12727654/HIVE-10302.2-spark.patch

{color:red}ERROR:{color} -1 due to 21 failed/errored test(s), 8721 tests executed

*Failed tests:*
{noformat}
TestMinimrCliDriver-bucket6.q-scriptfile1_win.q-quotedid_smb.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-bucketizedhiveinputformat.q-empty_dir_in_table.q - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-infer_bucket_sort_map_operators.q-load_hdfs_file_with_space_in_the_name.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-import_exported_table.q-truncate_column_buckets.q-bucket_num_reducers2.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-infer_bucket_sort_num_buckets.q-parallel_orderby.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-join1.q-infer_bucket_sort_bucketed_table.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-bucket5.q-infer_bucket_sort_merge.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-input16_cc.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-bucket_num_reducers.q-scriptfile1.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx_cbo_2.q-bucketmapjoin6.q-bucket4.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-reduce_deduplicate.q-infer_bucket_sort_dyn_part.q-udf_using.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-uber_reduce.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-stats_counter_partitioned.q-external_table_with_space_in_location_path.q-disable_merge_for_bucketing.q-and-1-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketsortoptimize_insert_2
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/833/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/833/console
Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-833/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 21 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12727654 - PreCommit-HIVE-SPARK-Build

Load small tables (for map join) in executor memory only once [Spark Branch]

Key: HIVE-10302
URL: https://issues.apache.org/jira/browse/HIVE-10302
Project: Hive
Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Fix For: spark-branch
Attachments: HIVE-10302.2-spark.patch, HIVE-10302.spark-1.patch
[jira] [Commented] (HIVE-10302) Load small tables (for map join) in executor memory only once [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510158#comment-14510158 ]

Hive QA commented on HIVE-10302:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12727750/HIVE-10302.3-spark.patch

{color:red}ERROR:{color} -1 due to 20 failed/errored test(s), 8721 tests executed

*Failed tests:*
{noformat}
TestMinimrCliDriver-bucket6.q-scriptfile1_win.q-quotedid_smb.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-bucketizedhiveinputformat.q-empty_dir_in_table.q - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-infer_bucket_sort_map_operators.q-load_hdfs_file_with_space_in_the_name.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-import_exported_table.q-truncate_column_buckets.q-bucket_num_reducers2.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-infer_bucket_sort_num_buckets.q-parallel_orderby.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-join1.q-infer_bucket_sort_bucketed_table.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-bucket5.q-infer_bucket_sort_merge.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-input16_cc.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-bucket_num_reducers.q-scriptfile1.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx_cbo_2.q-bucketmapjoin6.q-bucket4.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-reduce_deduplicate.q-infer_bucket_sort_dyn_part.q-udf_using.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-uber_reduce.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-stats_counter_partitioned.q-external_table_with_space_in_location_path.q-disable_merge_for_bucketing.q-and-1-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketsortoptimize_insert_2
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/837/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/837/console
Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-837/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 20 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12727750 - PreCommit-HIVE-SPARK-Build

Load small tables (for map join) in executor memory only once [Spark Branch]

Key: HIVE-10302
URL: https://issues.apache.org/jira/browse/HIVE-10302
Project: Hive
Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Fix For: spark-branch
Attachments: HIVE-10302.2-spark.patch, HIVE-10302.3-spark.patch, HIVE-10302.spark-1.patch
[jira] [Commented] (HIVE-10302) Load small tables (for map join) in executor memory only once [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510336#comment-14510336 ]

Xuefu Zhang commented on HIVE-10302:

+1 pending on test. The above spark related test failures might be related.

Load small tables (for map join) in executor memory only once [Spark Branch]

Key: HIVE-10302
URL: https://issues.apache.org/jira/browse/HIVE-10302
Project: Hive
Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Fix For: spark-branch
Attachments: HIVE-10302.2-spark.patch, HIVE-10302.3-spark.patch, HIVE-10302.spark-1.patch