[jira] [Commented] (HIVE-21915) Hive with TEZ UNION ALL and UDTF results in data loss
[ https://issues.apache.org/jira/browse/HIVE-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16873270#comment-16873270 ]

Wei Zhang commented on HIVE-21915:
----------------------------------

Added the test. Can you help review the code and the test? Thanks! [~vgarg]

> Hive with TEZ UNION ALL and UDTF results in data loss
> -----------------------------------------------------
>
>                 Key: HIVE-21915
>                 URL: https://issues.apache.org/jira/browse/HIVE-21915
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 1.2.1
>            Reporter: Wei Zhang
>            Assignee: Wei Zhang
>            Priority: Major
>         Attachments: HIVE-21915.01.patch, HIVE-21915.02.patch, HIVE-21915.03.patch, HIVE-21915.04.patch
>
>
> The HQL syntax is like this:
> CREATE TEMPORARY TABLE tez_union_all_loss_data AS
> SELECT xxx, yyy, zzz, 1 as tag
> FROM ods_1
> UNION ALL
> SELECT xxx, yyy, zzz, tag
> FROM
> (
>     SELECT xxx
>           ,get_json_object(get_json_object(tb,'$.a'),'$.b') AS yyy
>           ,zzz
>           ,2 as tag
>     FROM ods_2
>     LATERAL VIEW EXPLODE(some_udf(uuu)) team_number AS tb
> ) tbl
> ;
>
> With the above HQL, we expect rows with both tag = 1 and tag = 2 to
> appear. In our case, however, all the rows with tag = 1 are lost.
> Digging deeper, we find that the two generated maps have identical task tmp
> paths. This happens because, when a UDTF is present, the FileSinkOperator is
> processed twice while the tmp path is generated in
> GenTezUtils.removeUnionOperators();

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
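One way to check for the symptom described in the issue is to count the rows per tag in the resulting table. This is only a minimal sketch: it assumes the table was created by the CREATE TEMPORARY TABLE statement quoted above and that the query runs in the same session (temporary tables are session-scoped). The alias row_count is illustrative.

```sql
-- Count the rows contributed by each branch of the UNION ALL.
-- Per the issue description, when the data loss occurs the
-- tag = 1 rows are missing and only tag = 2 is reported.
SELECT tag, COUNT(*) AS row_count
FROM tez_union_all_loss_data
GROUP BY tag;
```

If both branches survived, the result should contain one row for tag = 1 and one for tag = 2.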
[jira] [Commented] (HIVE-21915) Hive with TEZ UNION ALL and UDTF results in data loss
[ https://issues.apache.org/jira/browse/HIVE-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16873200#comment-16873200 ]

Hive QA commented on HIVE-21915:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12972924/HIVE-21915.04.patch

SUCCESS: +1 due to 1 test(s) being added or modified.

SUCCESS: +1 due to 16341 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/17747/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/17747/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-17747/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12972924 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-21915) Hive with TEZ UNION ALL and UDTF results in data loss
[ https://issues.apache.org/jira/browse/HIVE-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16873160#comment-16873160 ]

Hive QA commented on HIVE-21915:
--------------------------------

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || master Compile Tests ||
| 0 | mvndep | 1m 49s | Maven dependency ordering for branch |
| +1 | mvninstall | 7m 9s | master passed |
| +1 | compile | 1m 2s | master passed |
| +1 | checkstyle | 0m 41s | master passed |
| 0 | findbugs | 4m 0s | ql in master has 2253 extant Findbugs warnings. |
| +1 | javadoc | 0m 56s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 28s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 26s | the patch passed |
| +1 | compile | 1m 6s | the patch passed |
| +1 | javac | 1m 6s | the patch passed |
| +1 | checkstyle | 0m 39s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 4m 11s | the patch passed |
| +1 | javadoc | 0m 59s | the patch passed |
|| || || || Other Tests ||
| +1 | asflicense | 0m 14s | The patch does not generate ASF License warnings. |
| | | 25m 7s | |

|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-17747/dev-support/hive-personality.sh |
| git revision | master / 967a1cc |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| modules | C: ql itests U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-17747/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HIVE-21915) Hive with TEZ UNION ALL and UDTF results in data loss
[ https://issues.apache.org/jira/browse/HIVE-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872828#comment-16872828 ]

Hive QA commented on HIVE-21915:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12972865/HIVE-21915.03.patch

ERROR: -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/17739/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/17739/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-17739/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Tests exited with: Exception: Patch URL https://issues.apache.org/jira/secure/attachment/12972865/HIVE-21915.03.patch was found in seen patch url's cache and a test was probably run already on it. Aborting...
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12972865 - PreCommit-HIVE-Build

> Hive with TEZ UNION ALL and UDTF results in data loss
>
>         Attachments: HIVE-21915.01.patch, HIVE-21915.02.patch, HIVE-21915.03.patch
[jira] [Commented] (HIVE-21915) Hive with TEZ UNION ALL and UDTF results in data loss
[ https://issues.apache.org/jira/browse/HIVE-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872490#comment-16872490 ]

Hive QA commented on HIVE-21915:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12972865/HIVE-21915.03.patch

ERROR: -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/17730/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/17730/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-17730/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2019-06-25 16:17:41.035
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-17730/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2019-06-25 16:17:41.039
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   84b5ba7..18a5dcb  master     -> origin/master
+ git reset --hard HEAD
HEAD is now at 84b5ba7 HIVE-21913: GenericUDTFGetSplits should handle usernames in the same way as LLAP (Prasanth Jayachandran reviewed by Jason Dere)
+ git clean -f -d
Removing standalone-metastore/metastore-server/src/gen/
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/master
HEAD is now at 18a5dcb HIVE-21857: Sort conditions in a filter predicate to accelerate query processing (Jesus Camacho Rodriguez, reviewed by Vineet Garg)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2019-06-25 16:17:43.315
+ rm -rf ../yetus_PreCommit-HIVE-Build-17730
+ mkdir ../yetus_PreCommit-HIVE-Build-17730
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-17730
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-17730/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch
error: a/itests/src/test/resources/testconfiguration.properties: does not exist in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java: does not exist in index
error: patch failed: itests/src/test/resources/testconfiguration.properties:339
Falling back to three-way merge...
Applied patch to 'itests/src/test/resources/testconfiguration.properties' with conflicts.
Going to apply patch with: git apply -p1
error: patch failed: itests/src/test/resources/testconfiguration.properties:339
Falling back to three-way merge...
Applied patch to 'itests/src/test/resources/testconfiguration.properties' with conflicts.
U itests/src/test/resources/testconfiguration.properties
+ result=1
+ '[' 1 -ne 0 ']'
+ rm -rf yetus_PreCommit-HIVE-Build-17730
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12972865 - PreCommit-HIVE-Build

> Hive with TEZ UNION ALL and UDTF results in data loss
>
>         Attachments: HIVE-21915.01.patch, HIVE-21915.02.patch, HIVE-21915.03.patch
[jira] [Commented] (HIVE-21915) Hive with TEZ UNION ALL and UDTF results in data loss
[ https://issues.apache.org/jira/browse/HIVE-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872385#comment-16872385 ]

Wei Zhang commented on HIVE-21915:
----------------------------------

UPDATE: hive.merge.tezfiles=true must be set to reproduce this issue, so I have updated the test case to turn on file merging. In our settings, hive.merge.tezfiles defaults to true, which is why I overlooked this factor before.

> Hive with TEZ UNION ALL and UDTF results in data loss
>
>         Attachments: HIVE-21915.01.patch, HIVE-21915.02.patch, HIVE-21915.03.patch
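The reproduction condition mentioned in this comment can be sketched as session settings. This is a hedged example rather than the contents of the attached test: hive.merge.tezfiles comes from the comment itself, while setting hive.execution.engine explicitly is an assumption based on the issue title.

```sql
-- Assumption: the query must run on Tez, as the issue title implies.
SET hive.execution.engine=tez;

-- From the comment: file merging has to be enabled to reproduce the issue.
SET hive.merge.tezfiles=true;

-- Then run the CREATE TEMPORARY TABLE ... AS SELECT ... UNION ALL ...
-- statement from the issue description in the same session.
```

With hive.merge.tezfiles left at false, the comment suggests the data loss would not be observed, which may explain why the first patch's test run did not catch it.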
[jira] [Commented] (HIVE-21915) Hive with TEZ UNION ALL and UDTF results in data loss
[ https://issues.apache.org/jira/browse/HIVE-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872122#comment-16872122 ]

Wei Zhang commented on HIVE-21915:
----------------------------------

I have reproduced this issue with the Hive test dataset and added a query test in the new patch.

> Hive with TEZ UNION ALL and UDTF results in data loss
>
>         Attachments: HIVE-21915.01.patch, HIVE-21915.02.patch
[jira] [Commented] (HIVE-21915) Hive with TEZ UNION ALL and UDTF results in data loss
[ https://issues.apache.org/jira/browse/HIVE-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871688#comment-16871688 ]

Hive QA commented on HIVE-21915:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12972724/HIVE-21915.01.patch

ERROR: -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/17711/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/17711/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-17711/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Tests exited with: Exception: Patch URL https://issues.apache.org/jira/secure/attachment/12972724/HIVE-21915.01.patch was found in seen patch url's cache and a test was probably run already on it. Aborting...
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12972724 - PreCommit-HIVE-Build

> Hive with TEZ UNION ALL and UDTF results in data loss
>
>         Attachments: HIVE-21915.01.patch
[jira] [Commented] (HIVE-21915) Hive with TEZ UNION ALL and UDTF results in data loss
[ https://issues.apache.org/jira/browse/HIVE-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871686#comment-16871686 ]

Hive QA commented on HIVE-21915:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12972724/HIVE-21915.01.patch

ERROR: -1 due to no test(s) being added or modified.

ERROR: -1 due to 1 failed/errored test(s), 16339 tests executed

*Failed tests:*
{noformat}
org.apache.hive.service.TestDFSErrorHandling.testAccessDenied (batchId=272)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/17710/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/17710/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-17710/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12972724 - PreCommit-HIVE-Build

> Hive with TEZ UNION ALL and UDTF results in data loss
>
>         Attachments: HIVE-21915.01.patch
[jira] [Commented] (HIVE-21915) Hive with TEZ UNION ALL and UDTF results in data loss
[ https://issues.apache.org/jira/browse/HIVE-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871562#comment-16871562 ]

Hive QA commented on HIVE-21915:
--------------------------------

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 9m 2s | master passed |
| +1 | compile | 1m 5s | master passed |
| +1 | checkstyle | 0m 41s | master passed |
| 0 | findbugs | 4m 8s | ql in master has 2254 extant Findbugs warnings. |
| +1 | javadoc | 1m 0s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 31s | the patch passed |
| +1 | compile | 1m 10s | the patch passed |
| +1 | javac | 1m 10s | the patch passed |
| +1 | checkstyle | 0m 40s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 4m 19s | the patch passed |
| +1 | javadoc | 1m 3s | the patch passed |
|| || || || Other Tests ||
| +1 | asflicense | 0m 15s | The patch does not generate ASF License warnings. |
| | | 25m 27s | |

|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-17710/dev-support/hive-personality.sh |
| git revision | master / 11f7856 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-17710/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.

> Hive with TEZ UNION ALL and UDTF results in data loss
>
>         Attachments: HIVE-21915.01.patch
[jira] [Commented] (HIVE-21915) Hive with TEZ UNION ALL and UDTF results in data loss
[ https://issues.apache.org/jira/browse/HIVE-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871507#comment-16871507 ]

Vineet Garg commented on HIVE-21915:
------------------------------------

[~zhangweilst] Thanks for providing the patch. Can you please add a test?

> Hive with TEZ UNION ALL and UDTF results in data loss
>
>         Attachments: HIVE-21915.01.patch
[jira] [Commented] (HIVE-21915) Hive with TEZ UNION ALL and UDTF results in data loss
[ https://issues.apache.org/jira/browse/HIVE-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871119#comment-16871119 ]

Wei Zhang commented on HIVE-21915:
----------------------------------

I just added a patch for this issue. Could anybody help review the code?

> Hive with TEZ UNION ALL and UDTF results in data loss
>
>              Labels: pull-request-available
>         Attachments: hive-21915-2019-06-24.patch