[jira] [Commented] (HIVE-11954) Extend logic to choose side table in MapJoin Conversion algorithm
[ https://issues.apache.org/jira/browse/HIVE-11954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971145#comment-14971145 ] Hive QA commented on HIVE-11954:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12768134/HIVE-11954.12.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9701 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.io.orc.TestColumnStatistics.testHasNull
org.apache.hadoop.hive.ql.io.orc.TestJsonFileDump.testJsonDump
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5750/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5750/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5750/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12768134 - PreCommit-HIVE-TRUNK-Build

> Extend logic to choose side table in MapJoin Conversion algorithm
>
> Key: HIVE-11954
> URL: https://issues.apache.org/jira/browse/HIVE-11954
> Project: Hive
> Issue Type: Bug
> Components: Physical Optimizer
> Affects Versions: 2.0.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11954.01.patch, HIVE-11954.02.patch, HIVE-11954.03.patch, HIVE-11954.04.patch, HIVE-11954.05.patch, HIVE-11954.06.patch, HIVE-11954.07.patch, HIVE-11954.08.patch, HIVE-11954.09.patch, HIVE-11954.10.patch, HIVE-11954.11.patch, HIVE-11954.12.patch, HIVE-11954.patch, HIVE-11954.patch
>
> Selection of side table (in memory/hash table) in MapJoin Conversion algorithm needs to be more sophisticated.
> In an N-way Map Join, Hive should pick as side table (in memory table) the input stream that has the least cost in producing its relation (like TS(FIL|Proj)*).
> A cost-based choice needs an extended cost model; without the return path it's going to be hard to do this.
> For the time being we could employ a modified cost-based algorithm for side table selection.
> The new algorithm is described below:
> 1. Identify the candidate set of inputs for side table (in memory/hash table) from the inputs (based on conditional task size).
> 2. For each input, identify its cost and memory requirement. Cost is 1 for each heavyweight relational op (Join, GB, PTF/Windowing, TF, etc.). Cost for an input is the total number of heavyweight ops in its branch.
> 3. Order the set from #1 on cost & memory requirement (ascending order).
> 4. Pick the first element from #3 as the side table.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
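A literal reading of the four steps in the description can be sketched in Java as below. This is not the code from the patch: the `JoinInput` record and its fields are hypothetical, `maxSize` stands in for the conditional task size threshold, sizes are assumed to be in bytes, and the heavyweight-op count is assumed to be precomputed per input. Records require Java 16+.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SideTableSelector {

    /** Hypothetical summary of one join input (not a Hive class). */
    record JoinInput(String name, long dataSize, int costlyOps) {}

    /**
     * Returns the input to load into the in-memory hash table, or null when
     * no input fits under maxSize (i.e. no candidate for the side table).
     */
    static JoinInput pickSideTable(List<JoinInput> inputs, long maxSize) {
        // Step 1: candidates are the inputs small enough to hold in memory.
        List<JoinInput> candidates = new ArrayList<>();
        for (JoinInput in : inputs) {
            if (in.dataSize() <= maxSize) {
                candidates.add(in);
            }
        }
        // Steps 2-3: order by heavyweight-op count, then by memory requirement.
        candidates.sort(Comparator.comparingInt(JoinInput::costlyOps)
                .thenComparingLong(JoinInput::dataSize));
        // Step 4: the first element is the side table.
        return candidates.isEmpty() ? null : candidates.get(0);
    }

    public static void main(String[] args) {
        List<JoinInput> inputs = List.of(
                new JoinInput("fact", 5_000_000_000L, 0),
                new JoinInput("aggregated", 20_000_000L, 2),
                new JoinInput("dim", 50_000_000L, 0));
        // "fact" is too large; "dim" beats "aggregated" on cost despite being bigger.
        System.out.println(pickSideTable(inputs, 100_000_000L).name()); // prints "dim"
    }
}
```

Sorting on cost first and size second means an input that is cheap to produce wins even over a smaller but costlier one; size only breaks ties between inputs with the same number of heavyweight ops.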
[jira] [Commented] (HIVE-11954) Extend logic to choose side table in MapJoin Conversion algorithm
[ https://issues.apache.org/jira/browse/HIVE-11954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969935#comment-14969935 ] Laljo John Pullokkaran commented on HIVE-11954:

+1
[jira] [Commented] (HIVE-11954) Extend logic to choose side table in MapJoin Conversion algorithm
[ https://issues.apache.org/jira/browse/HIVE-11954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960609#comment-14960609 ] Hive QA commented on HIVE-11954:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12766757/HIVE-11954.10.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9694 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5675/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5675/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5675/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12766757 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-11954) Extend logic to choose side table in MapJoin Conversion algorithm
[ https://issues.apache.org/jira/browse/HIVE-11954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960622#comment-14960622 ] Jesus Camacho Rodriguez commented on HIVE-11954:

[~jpullokkaran], the test failures are unrelated. Could you take another look? Thanks
[jira] [Commented] (HIVE-11954) Extend logic to choose side table in MapJoin Conversion algorithm
[ https://issues.apache.org/jira/browse/HIVE-11954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957269#comment-14957269 ] Hive QA commented on HIVE-11954:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12766526/HIVE-11954.08.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 28 failed/errored test(s), 9663 tests executed
*Failed tests:*
{noformat}
TestCliDriver-auto_sortmerge_join_13.q-tez_self_join.q-alter_partition_clusterby_sortby.q-and-12-more - did not produce a TEST-*.xml file
TestSparkNegativeCliDriver - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_explainuser_1
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_bmj_schema_evolution
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_dynpart_hashjoin_2
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_join_hash
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_smb_main
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_union
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_vector_dynpart_hashjoin_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join29
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join30
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join_filters
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join_nulls
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_15
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_bmj_schema_evolution
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_dynpart_hashjoin_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_join_hash
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_smb_main
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_vector_dynpart_hashjoin_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_join30
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_join_nulls
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_leftsemi_mapjoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_nullsafe_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join5
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5648/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5648/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5648/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 28 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12766526 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-11954) Extend logic to choose side table in MapJoin Conversion algorithm
[ https://issues.apache.org/jira/browse/HIVE-11954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14956268#comment-14956268 ] Hive QA commented on HIVE-11954:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12766421/HIVE-11954.07.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 9683 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3_map_multi_distinct
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_bucket_map_join_tez1
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_bucket_map_join_tez2
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_explainuser_2
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_mrr
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_index_bitmap_auto
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_stats_counter_partitioned
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5639/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5639/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5639/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12766421 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-11954) Extend logic to choose side table in MapJoin Conversion algorithm
[ https://issues.apache.org/jira/browse/HIVE-11954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955069#comment-14955069 ] Jesus Camacho Rodriguez commented on HIVE-11954:

[~jpullokkaran], thanks for your comments. I have updated the patch. My replies to your remarks:
1. This is taken care of in lines 592-597.
{noformat}
if (inputSize/buckets > maxSize) {
{noformat}
The original logic of the method remains unchanged, so it will work properly.
2. I will do that then; I didn't use the abstract class because we were getting some additional warnings.
3. Good catch, I hadn't noticed.
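The per-bucket condition quoted in reply 1 can be illustrated with a minimal sketch. It assumes, as the variable names suggest, that `buckets` is the bucket count when the join is converted to a bucket map join (1 otherwise), so each task only holds roughly `inputSize/buckets` bytes of that input in memory. `exceedsMaxSize` is a hypothetical helper, not a method from the patch.

```java
public class BucketSizeCheck {

    /**
     * Returns true when an input is too large to be loaded as a hash table,
     * assuming each task loads only one bucket's share of it
     * (buckets == 1 for a non-bucketed join).
     */
    static boolean exceedsMaxSize(long inputSize, int buckets, long maxSize) {
        if (buckets < 1) {
            throw new IllegalArgumentException("buckets must be >= 1");
        }
        // Integer division, mirroring the quoted condition inputSize/buckets > maxSize.
        return inputSize / buckets > maxSize;
    }

    public static void main(String[] args) {
        long maxSize = 10_000_000L;
        System.out.println(exceedsMaxSize(30_000_000L, 1, maxSize)); // true
        System.out.println(exceedsMaxSize(30_000_000L, 4, maxSize)); // false: 7,500,000 per bucket
    }
}
```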
[jira] [Commented] (HIVE-11954) Extend logic to choose side table in MapJoin Conversion algorithm
[ https://issues.apache.org/jira/browse/HIVE-11954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955449#comment-14955449 ] Laljo John Pullokkaran commented on HIVE-11954:

#1 If a relation has input size > maxSize (i.e. it cannot be kept in memory), then we shouldn't use the costly-ops comparison; instead, that relation should be chosen as the streaming side. The check in line 581 could prevent an input that has fewer costly ops but whose size > maxSize from becoming the streaming side:
if (bigInputNumberCostlyOps == -1 || inputNumberCostlyOps > bigInputNumberCostlyOps || (inputNumberCostlyOps == bigInputNumberCostlyOps && inputSize > bigInputStat.getDataSize())) {
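The rule proposed in #1 can be sketched as a big-table (streaming side) chooser that keeps the costly-ops comparison quoted above but lets an input larger than `maxSize` override it. This is a hypothetical illustration, not the patch's `getMapJoinConversionPos`: the `Input` record is invented, only each input's own size is checked (whether the remaining inputs fit together is ignored), and returning -1 when two inputs are too large is an assumption about how a failed conversion is signalled.

```java
import java.util.List;

public class BigTableChooser {

    /** Hypothetical per-input summary: estimated data size and costly-op count. */
    record Input(long dataSize, int costlyOps) {}

    /**
     * Returns the position of the input to stream (the big table), or -1 when
     * two or more inputs exceed maxSize, so no map join is possible.
     */
    static int chooseBigTablePos(List<Input> inputs, long maxSize) {
        int bigPos = -1;       // best candidate by the costly-ops heuristic
        int oversizedPos = -1; // input that cannot be held in memory, if any
        for (int pos = 0; pos < inputs.size(); pos++) {
            Input in = inputs.get(pos);
            if (in.dataSize() > maxSize) {
                if (oversizedPos != -1) {
                    return -1; // a second input that does not fit in memory
                }
                oversizedPos = pos;
            }
            // Same shape as the quoted check: more costly ops wins; ties go to the larger input.
            Input big = bigPos == -1 ? null : inputs.get(bigPos);
            if (big == null
                    || in.costlyOps() > big.costlyOps()
                    || (in.costlyOps() == big.costlyOps() && in.dataSize() > big.dataSize())) {
                bigPos = pos;
            }
        }
        // The size constraint takes precedence over the costly-ops comparison.
        return oversizedPos != -1 ? oversizedPos : bigPos;
    }

    public static void main(String[] args) {
        long maxSize = 100L;
        // Input 0 has the most costly ops, but input 1 cannot fit in memory.
        System.out.println(chooseBigTablePos(List.of(
                new Input(50L, 2), new Input(500L, 0), new Input(30L, 1)), maxSize)); // prints 1
        // Everything fits: the costly-ops heuristic decides.
        System.out.println(chooseBigTablePos(List.of(
                new Input(50L, 2), new Input(80L, 0), new Input(30L, 1)), maxSize)); // prints 0
    }
}
```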
[jira] [Commented] (HIVE-11954) Extend logic to choose side table in MapJoin Conversion algorithm
[ https://issues.apache.org/jira/browse/HIVE-11954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14953649#comment-14953649 ] Laljo John Pullokkaran commented on HIVE-11954:

[~jcamachorodriguez] Looked at the latest patch; comments are below:
1. ConvertJoinMapJoin.getMapJoinConversionPos: If for any relation inputSize > maxSize (i.e. noconditionaltask.size), then this optimization shouldn't apply, even if the costly ops comparison is not favorable.
2. ConvertJoinMapJoin.COSTLY_OPERATORS: Instead of listing all of the join subclasses, could we use "CommonJoinOperator"?
3. OperatorUtils.iterateParentsExcludingCurrent: This is not used and could be removed.
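Review comment 2 can be illustrated by counting heavyweight operators in an input's branch while matching every join flavor through one base class instead of listing each subclass. The operator classes below are a hypothetical, simplified stand-in for Hive's operator tree (not the real `Operator`/`CommonJoinOperator` API), and the contents of `COSTLY_OPERATORS` here are an assumption. The traversal walks parent links from the branch root and keeps a visited set so a shared ancestor is counted once.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CostlyOpsCounter {

    // Hypothetical, simplified operator classes; not Hive's Operator types.
    static class Op {
        final List<Op> parents;
        Op(Op... parents) { this.parents = List.of(parents); }
    }
    static class TableScanOp extends Op { }
    static class FilterOp extends Op { FilterOp(Op parent) { super(parent); } }
    static class GroupByOp extends Op { GroupByOp(Op parent) { super(parent); } }
    // Common base for every join flavor, in the spirit of CommonJoinOperator.
    static abstract class CommonJoinOp extends Op { CommonJoinOp(Op... ps) { super(ps); } }
    static class MapJoinOp extends CommonJoinOp { MapJoinOp(Op... ps) { super(ps); } }
    static class MergeJoinOp extends CommonJoinOp { MergeJoinOp(Op... ps) { super(ps); } }

    // One entry per family: CommonJoinOp.class matches all join subclasses.
    static final Set<Class<? extends Op>> COSTLY_OPERATORS =
            Set.of(CommonJoinOp.class, GroupByOp.class);

    static boolean isCostly(Op op) {
        return COSTLY_OPERATORS.stream().anyMatch(c -> c.isInstance(op));
    }

    /** Counts costly operators in the branch rooted at branchRoot (inclusive). */
    static int countCostlyOps(Op branchRoot) {
        Set<Op> seen = new HashSet<>(); // Op has identity equals, so this is an identity set
        Deque<Op> work = new ArrayDeque<>();
        work.add(branchRoot);
        int count = 0;
        while (!work.isEmpty()) {
            Op op = work.poll();
            if (!seen.add(op)) {
                continue; // shared ancestor already counted
            }
            if (isCostly(op)) {
                count++;
            }
            work.addAll(op.parents);
        }
        return count;
    }

    public static void main(String[] args) {
        // Input 0: TS -> FIL -> GBY, merge-joined with another TS, then filtered.
        Op aggregated = new GroupByOp(new FilterOp(new TableScanOp()));
        Op branch = new FilterOp(new MergeJoinOp(aggregated, new TableScanOp()));
        // Input 1: TS -> FIL only.
        Op scanOnly = new FilterOp(new TableScanOp());
        MapJoinOp mapJoin = new MapJoinOp(branch, scanOnly);
        for (Op input : mapJoin.parents) {
            System.out.println(countCostlyOps(input)); // prints 2, then 0
        }
    }
}
```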
[jira] [Commented] (HIVE-11954) Extend logic to choose side table in MapJoin Conversion algorithm
[ https://issues.apache.org/jira/browse/HIVE-11954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14952104#comment-14952104 ] Hive QA commented on HIVE-11954:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12765805/HIVE-11954.05.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9662 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5601/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5601/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5601/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12765805 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-11954) Extend logic to choose side table in MapJoin Conversion algorithm
[ https://issues.apache.org/jira/browse/HIVE-11954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950362#comment-14950362 ] Hive QA commented on HIVE-11954:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12765579/HIVE-11954.04.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9650 tests executed
*Failed tests:*
{noformat}
TestSparkNegativeCliDriver - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5584/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5584/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5584/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12765579 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-11954) Extend logic to choose side table in MapJoin Conversion algorithm
[ https://issues.apache.org/jira/browse/HIVE-11954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948415#comment-14948415 ] Hive QA commented on HIVE-11954: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12765374/HIVE-11954.03.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9652 tests executed *Failed tests:* {noformat} TestCompareCliDriver - did not produce a TEST-*.xml file org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery org.apache.hive.jdbc.TestSSL.testSSLVersion {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5567/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5567/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5567/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12765374 - PreCommit-HIVE-TRUNK-Build > Extend logic to choose side table in MapJoin Conversion algorithm > - > > Key: HIVE-11954 > URL: https://issues.apache.org/jira/browse/HIVE-11954 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 2.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-11954.01.patch, HIVE-11954.02.patch, > HIVE-11954.03.patch, HIVE-11954.patch, HIVE-11954.patch
[jira] [Commented] (HIVE-11954) Extend logic to choose side table in MapJoin Conversion algorithm
[ https://issues.apache.org/jira/browse/HIVE-11954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949282#comment-14949282 ] Laljo John Pullokkaran commented on HIVE-11954: --- As we discussed, the patch could reuse OperatorUtils.classifyOperators for counting the number of operators of a certain type. > Extend logic to choose side table in MapJoin Conversion algorithm > - > > Key: HIVE-11954 > URL: https://issues.apache.org/jira/browse/HIVE-11954 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 2.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-11954.01.patch, HIVE-11954.02.patch, > HIVE-11954.03.patch, HIVE-11954.04.patch, HIVE-11954.patch, HIVE-11954.patch
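The suggestion to reuse a classify-by-type helper such as OperatorUtils.classifyOperators amounts to walking the operator graph once, grouping operators by class, and summing the groups that count as heavyweight. The sketch below illustrates only that idea: the `Op` hierarchy and the `classify` and `costOf` methods are hypothetical stand-ins, and Hive's real OperatorUtils.classifyOperators signature is not reproduced here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical operator model for illustration; these are not Hive's Operator classes.
public class ClassifyOperatorsSketch {

    static class Op {
        final List<Op> parents;
        Op(Op... parents) { this.parents = Arrays.asList(parents); }
    }
    static class TableScanOp extends Op { TableScanOp(Op... p) { super(p); } }
    static class FilterOp extends Op { FilterOp(Op... p) { super(p); } }
    static class GroupByOp extends Op { GroupByOp(Op... p) { super(p); } }
    static class JoinOp extends Op { JoinOp(Op... p) { super(p); } }

    /**
     * Walks upstream from start (inclusive), visiting each operator once, and
     * groups the operators whose exact class is in the requested set.
     */
    static Map<Class<? extends Op>, List<Op>> classify(Op start, Set<Class<? extends Op>> classes) {
        Map<Class<? extends Op>, List<Op>> result = new HashMap<>();
        Set<Op> visited = new HashSet<>();
        Deque<Op> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            Op op = stack.pop();
            if (!visited.add(op)) {
                continue; // reached again through another path (shared ancestor)
            }
            Class<? extends Op> cls = op.getClass();
            if (classes.contains(cls)) {
                result.computeIfAbsent(cls, k -> new ArrayList<>()).add(op);
            }
            for (Op parent : op.parents) {
                stack.push(parent);
            }
        }
        return result;
    }

    /** Cost of a branch = total number of operators of the heavyweight classes. */
    static int costOf(Op branchRoot, Set<Class<? extends Op>> heavyweight) {
        int cost = 0;
        for (List<Op> ops : classify(branchRoot, heavyweight).values()) {
            cost += ops.size();
        }
        return cost;
    }

    public static void main(String[] args) {
        // Two inputs feed the join: TS -> GBY and TS -> FIL
        Op join = new JoinOp(new GroupByOp(new TableScanOp()), new FilterOp(new TableScanOp()));
        Set<Class<? extends Op>> heavy = Set.of(GroupByOp.class, JoinOp.class);
        System.out.println(costOf(join, heavy)); // prints 2
    }
}
```

Keeping the classification generic and deciding which classes are "heavyweight" at the call site lets the same walk serve other optimizer checks that only need per-type counts.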
[jira] [Commented] (HIVE-11954) Extend logic to choose side table in MapJoin Conversion algorithm
[ https://issues.apache.org/jira/browse/HIVE-11954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948033#comment-14948033 ] Laljo John Pullokkaran commented on HIVE-11954: --- "getNumberOfCostlyOps" could be implemented recursively, with a graph walker, or by modifying nodeutils. > Extend logic to choose side table in MapJoin Conversion algorithm > - > > Key: HIVE-11954 > URL: https://issues.apache.org/jira/browse/HIVE-11954 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 2.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-11954.01.patch, HIVE-11954.02.patch, > HIVE-11954.03.patch, HIVE-11954.patch, HIVE-11954.patch
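A recursive variant of "getNumberOfCostlyOps", as suggested in the comment, might look like the sketch below. The method name comes from the comment, but its signature, the `Op`/`OpType` model and the set of heavyweight types (taken from the issue's list: Join, GB, PTF/Windowing, TF) are assumptions for illustration, not the patch's implementation.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical operator model; not Hive's Operator API.
public class CostlyOpsCounter {

    enum OpType { TS, FIL, SEL, RS, JOIN, GBY, PTF, TF }

    // Heavyweight types as listed in the issue description (Join, GB, PTF/Windowing, TF).
    static final Set<OpType> HEAVYWEIGHT = EnumSet.of(OpType.JOIN, OpType.GBY, OpType.PTF, OpType.TF);

    static final class Op {
        final OpType type;
        final List<Op> parents;

        Op(OpType type, Op... parents) {
            this.type = type;
            this.parents = Arrays.asList(parents);
        }
    }

    /** Counts heavyweight operators in the branch that produces {@code op}, recursively. */
    static int getNumberOfCostlyOps(Op op) {
        return count(op, new HashSet<>());
    }

    private static int count(Op op, Set<Op> visited) {
        if (!visited.add(op)) {
            return 0; // shared ancestor already counted via another path
        }
        int n = HEAVYWEIGHT.contains(op.type) ? 1 : 0;
        for (Op parent : op.parents) {
            n += count(parent, visited);
        }
        return n;
    }

    public static void main(String[] args) {
        Op scan = new Op(OpType.TS);
        Op gby = new Op(OpType.GBY, scan);
        Op ptf = new Op(OpType.PTF, gby);
        Op rs = new Op(OpType.RS, ptf);
        System.out.println(getNumberOfCostlyOps(rs)); // prints 2
    }
}
```

The visited set keeps an operator reachable through two paths (for example, a shared subquery feeding both sides of a join) from being counted twice; counting it once per path instead would be an equally defensible choice for a cost heuristic.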
[jira] [Commented] (HIVE-11954) Extend logic to choose side table in MapJoin Conversion algorithm
[ https://issues.apache.org/jira/browse/HIVE-11954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942066#comment-14942066 ] Hive QA commented on HIVE-11954: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12764563/HIVE-11954.02.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5505/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5505/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5505/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g 
-XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-5505/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! -d apache-github-source-source ]] + cd apache-github-source-source + git fetch origin + git reset --hard HEAD HEAD is now at bbb312f HIVE-11913 : Verify existence of tests for new changes in HiveQA (Szehon, reviewed by Sergio Pena) + git clean -f -d Removing ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java.orig Removing ql/src/test/queries/clientpositive/union36.q Removing ql/src/test/results/clientpositive/union36.q.out + git checkout master Already on 'master' + git reset --hard origin/master HEAD is now at bbb312f HIVE-11913 : Verify existence of tests for new changes in HiveQA (Szehon, reviewed by Sergio Pena) + git merge --ff-only origin/master Already up-to-date. + git gc + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. 
ATTACHMENT ID: 12764563 - PreCommit-HIVE-TRUNK-Build > Extend logic to choose side table in MapJoin Conversion algorithm > - > > Key: HIVE-11954 > URL: https://issues.apache.org/jira/browse/HIVE-11954 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 2.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-11954.01.patch, HIVE-11954.02.patch, > HIVE-11954.patch, HIVE-11954.patch
[jira] [Commented] (HIVE-11954) Extend logic to choose side table in MapJoin Conversion algorithm
[ https://issues.apache.org/jira/browse/HIVE-11954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935266#comment-14935266 ] Hive QA commented on HIVE-11954: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12762593/HIVE-11954.01.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9646 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_groupby_reduce org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_12 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_groupby_reduce org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5457/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5457/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5457/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12762593 - PreCommit-HIVE-TRUNK-Build > Extend logic to choose side table in MapJoin Conversion algorithm > - > > Key: HIVE-11954 > URL: https://issues.apache.org/jira/browse/HIVE-11954 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 2.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-11954.01.patch, HIVE-11954.patch, HIVE-11954.patch
[jira] [Commented] (HIVE-11954) Extend logic to choose side table in MapJoin Conversion algorithm
[ https://issues.apache.org/jira/browse/HIVE-11954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909490#comment-14909490 ] Hive QA commented on HIVE-11954: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12762444/HIVE-11954.patch {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 9605 tests executed *Failed tests:* {noformat} TestMiniTezCliDriver-vectorized_parquet.q-vector_char_mapjoin1.q-tez_insert_overwrite_local_directory_1.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_groupby_reduce org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_10 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_bucket_map_join_tez1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_bucket_map_join_tez2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cross_product_check_2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mrr org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_unionDistinct_1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_groupby_reduce org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_dynamic_partition_pruning org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5429/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5429/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5429/ Messages: {noformat} Executing 
org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12762444 - PreCommit-HIVE-TRUNK-Build > Extend logic to choose side table in MapJoin Conversion algorithm > - > > Key: HIVE-11954 > URL: https://issues.apache.org/jira/browse/HIVE-11954 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 2.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-11954.patch, HIVE-11954.patch