[jira] [Updated] (HIVE-8920) IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-8920:
------------------------------
    Attachment: HIVE-8920.4-spark.patch


> IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch]
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-8920
>                 URL: https://issues.apache.org/jira/browse/HIVE-8920
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>    Affects Versions: spark-branch
>            Reporter: Chao
>            Assignee: Xuefu Zhang
>         Attachments: HIVE-8920.1-spark.patch, HIVE-8920.2-spark.patch, HIVE-8920.3-spark.patch, HIVE-8920.4-spark.patch
>
> The following query will not work:
> {code}
> from (select * from table0 union all select * from table1) s
> insert overwrite table table3 select s.x, count(1) group by s.x
> insert overwrite table table4 select s.y, count(1) group by s.y;
> {code}
> Currently, the plan for this query, before SplitSparkWorkResolver, looks like below:
> {noformat}
>  M1    M2
>    \  /  \
>     U3    R5
>     |
>     R4
> {noformat}
> {{SplitSparkWorkResolver#splitBaseWork}} assumes that the {{childWork}} is a ReduceWork, but in this case you can see that for M2 the childWork could be the UnionWork U3. Thus, the code will fail. HIVE-9041 partially addressed the problem by removing the union task. However, it's still necessary to clone M1 and M2 to support multi-insert. Because M1 and M2 can run in a single JVM, the original solution of storing a single global IOContext will not work: M1 and M2 have different IOContexts, and both need to be stored.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
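The per-work bookkeeping the description calls for — one IOContext per cloned MapWork rather than one global — can be sketched as a map keyed by work (or input path). This is a minimal illustration, not Hive's actual IOContext API; the class and field names are assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: keep one IOContext per key instead of a single global
// instance, so cloned MapWorks M1 and M2 running in the same JVM each see
// their own positional state. Names are illustrative, not Hive's real API.
public class IOContextMap {
    public static class IOContext {
        public String inputPath;   // input currently being read by this work
    }

    private static final Map<String, IOContext> contexts = new ConcurrentHashMap<>();

    // Return the context for the given key, creating it on first access.
    public static IOContext get(String key) {
        return contexts.computeIfAbsent(key, k -> new IOContext());
    }

    public static void clear(String key) {
        contexts.remove(key);
    }

    public static void main(String[] args) {
        IOContext m1 = get("M1");
        IOContext m2 = get("M2");
        m1.inputPath = "table0";
        m2.inputPath = "table1";
        // Distinct MapWorks no longer clobber each other's state:
        System.out.println(m1.inputPath + " " + m2.inputPath); // table0 table1
    }
}
```

With a single global context, M2's initialization would overwrite M1's; keying by work keeps both alive for the lifetime of the JVM task.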
[jira] [Updated] (HIVE-8920) IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-8920:
------------------------------
       Resolution: Fixed
    Fix Version/s: spark-branch
           Status: Resolved  (was: Patch Available)

Patch #3 is committed to Spark branch.
[jira] [Updated] (HIVE-8920) IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-8920:
------------------------------
    Attachment: HIVE-8920.3-spark.patch

We have to disable caching for MapInput because of the IOContext initialization problem. This will have a performance impact, but only for multi-insert cases. Regardless, correctness goes above performance. Patch #3 includes such a fix.
[jira] [Updated] (HIVE-8920) IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-8920:
------------------------------
    Summary: IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch]  (was: SplitSparkWorkResolver doesn't work with UnionWork [Spark Branch])
[jira] [Updated] (HIVE-8920) IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-8920:
------------------------------
    Description: (updated)
[jira] [Updated] (HIVE-8920) IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-8920:
------------------------------
    Attachment: HIVE-8920.2-spark.patch

Patch #2 corrects some code styling issues.