[jira] [Commented] (HIVE-10685) Alter table concatenate oparetor will cause duplicate data
[ https://issues.apache.org/jira/browse/HIVE-10685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896731#comment-16896731 ]

Yuanbo Liu commented on HIVE-10685:
-----------------------------------

[~wangbaoyun] Sorry to interrupt; have you found any way to recover those merged files?

> Alter table concatenate oparetor will cause duplicate data
> ----------------------------------------------------------
>
>                 Key: HIVE-10685
>                 URL: https://issues.apache.org/jira/browse/HIVE-10685
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 1.3.0, 1.2.1
>            Reporter: guoliming
>            Assignee: guoliming
>            Priority: Critical
>             Fix For: 1.2.1
>
>         Attachments: HIVE-10685.patch, HIVE-10685.patch
>
>
> The "orders" table has 15 rows and is stored as ORC.
> {noformat}
> hive> select count(*) from orders;
> OK
> 15
> Time taken: 37.692 seconds, Fetched: 1 row(s)
> {noformat}
> The table contains 14 files; each file is about 2.1 to 3.2 GB.
> After executing the command ALTER TABLE orders CONCATENATE;
> the table has 1530115000 rows.
> My Hive version is 1.1.0.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
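For anyone reproducing this, a minimal before-and-after check makes the duplication visible. This is only a sketch, reusing the "orders" table from the report above; any ORC table made of several files would do:

{code:sql}
-- Record the row count before merging the ORC files.
SELECT COUNT(*) FROM orders;

-- Merge the table's ORC files in place.
ALTER TABLE orders CONCATENATE;

-- Re-run the count; on the affected versions the result can grow,
-- which means rows were duplicated during the merge.
SELECT COUNT(*) FROM orders;
{code}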
[jira] [Commented] (HIVE-10685) Alter table concatenate oparetor will cause duplicate data
[ https://issues.apache.org/jira/browse/HIVE-10685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863297#comment-15863297 ]

wangbaoyun commented on HIVE-10685:
-----------------------------------

If an ORC file has already been merged, reads fail with an EOFException like "java.io.EOFException: Read past end of RLE integer from compressed stream Stream for column 1 kind DATA" and the stripe index goes out of range. How can the merged file be fixed?
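Not an answer, but for confirming the damage: Hive ships an ORC file dump utility that prints file and stripe metadata, which can show whether a merged file's stripe boundaries and row counts are inconsistent. A sketch only; the warehouse path below is a placeholder for one of the merged files:

{noformat}
hive --orcfiledump /apps/hive/warehouse/orders/000000_0
{noformat}

Comparing the reported number of rows and per-stripe offsets against the expected counts shows the extent of the duplication, even though the dump itself cannot repair the file.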
[jira] [Commented] (HIVE-10685) Alter table concatenate oparetor will cause duplicate data
[ https://issues.apache.org/jira/browse/HIVE-10685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15510822#comment-15510822 ]

Prasanth Jayachandran commented on HIVE-10685:
----------------------------------------------

It was committed: https://github.com/apache/hive/commit/aef08f44e29e9a54e73b8029892033fe16c52cc5
[jira] [Commented] (HIVE-10685) Alter table concatenate oparetor will cause duplicate data
[ https://issues.apache.org/jira/browse/HIVE-10685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15510141#comment-15510141 ]

Oleksiy Sayankin commented on HIVE-10685:
-----------------------------------------

[~prasanth_j] I can see a5afaa04538b6ca96b34febad49cc1daef9fe2f4 for "HIVE-10685: Alter table concatenate..." and f4a68c9677602e24b06e4f2fd01d8b6258b709e6 for Revert "HIVE-10685: Alter table concatenate...". So was the patch for HIVE-10685 applied or not? Our customer hits this exception with Hive 1.2:

{code}
Caused by: java.io.EOFException: Read past end of RLE integer from compressed stream Stream for column 1 kind DATA position: 2108 length: 2108 range: 0 offset: 2108 limit: 2108 range 0 = 0 to 2108 uncompressed: 44422 to 44422
	at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:56)
	at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:302)
	at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$LongTreeReader.next(TreeReaderFactory.java:564)
	at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StructTreeReader.next(TreeReaderFactory.java:2004)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1039)
	... 18 more
{code}
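One way to check locally which release branches carry the fix versus the revert is to ask git which branches contain each commit. A sketch, assuming a clone of the Hive repository with the release branches fetched:

{noformat}
# Lists the remote branches that contain each commit; if a branch
# appears for the revert commit but not for the fix commit, that
# branch ships without the fix.
git branch -r --contains a5afaa04538b6ca96b34febad49cc1daef9fe2f4
git branch -r --contains f4a68c9677602e24b06e4f2fd01d8b6258b709e6
{noformat}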
[jira] [Commented] (HIVE-10685) Alter table concatenate oparetor will cause duplicate data
[ https://issues.apache.org/jira/browse/HIVE-10685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578108#comment-14578108 ]

Hive QA commented on HIVE-10685:
--------------------------------

{color:red}Overall{color}: -1, at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12738446/HIVE-10685.1.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9004 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autogen_colalias
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_stats_counter
org.apache.hive.beeline.TestSchemaTool.testSchemaInit
org.apache.hive.beeline.TestSchemaTool.testSchemaUpgrade
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4219/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4219/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4219/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12738446 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-10685) Alter table concatenate oparetor will cause duplicate data
[ https://issues.apache.org/jira/browse/HIVE-10685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577942#comment-14577942 ]

Prasanth Jayachandran commented on HIVE-10685:
----------------------------------------------

[~FanTn] Thanks for the patch. I updated it so that the precommit test can apply it cleanly, and made one other minor change: moving the stripe index increment out of the condition. I will commit the patch if the precommit test runs cleanly.
[jira] [Commented] (HIVE-10685) Alter table concatenate oparetor will cause duplicate data
[ https://issues.apache.org/jira/browse/HIVE-10685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577936#comment-14577936 ]

Hive QA commented on HIVE-10685:
--------------------------------

{color:red}Overall{color}: -1, no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12732760/HIVE-10685.patch

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4218/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4218/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4218/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output
'+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-4218/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
>From https://github.com/apache/hive
   7db3fb3..d038bd8  branch-1   -> origin/branch-1
   77b2c20..f534590  branch-1.2 -> origin/branch-1.2
   a802104..7ae1d0b  master     -> origin/master
+ git reset --hard HEAD
HEAD is now at a802104 HIVE-8931: Test TestAccumuloCliDriver is not completing (Josh Elser via Daniel Dai
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
+ git reset --hard origin/master
HEAD is now at 7ae1d0b HIVE-10910 : Alter table drop partition queries in encrypted zone failing to remove data from HDFS (Eugene Koifman, reviewed by Gunther)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12732760 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-10685) Alter table concatenate oparetor will cause duplicate data
[ https://issues.apache.org/jira/browse/HIVE-10685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577916#comment-14577916 ]

Prasanth Jayachandran commented on HIVE-10685:
----------------------------------------------

Pending precommit tests.
[jira] [Commented] (HIVE-10685) Alter table concatenate oparetor will cause duplicate data
[ https://issues.apache.org/jira/browse/HIVE-10685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577915#comment-14577915 ]

Prasanth Jayachandran commented on HIVE-10685:
----------------------------------------------

LGTM, +1