[jira] [Commented] (HIVE-21292) Break up DDLTask 1 - extract Database related operations
[ https://issues.apache.org/jira/browse/HIVE-21292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776598#comment-16776598 ] Hive QA commented on HIVE-21292:

| (/) *{color:green}+1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 43s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 13s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 17s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 17s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 10s{color} | {color:blue} ql in master has 2261 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 39s{color} | {color:blue} hcatalog/core in master has 29 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 45s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 29s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 46s{color} | {color:green} ql: The patch generated 0 new + 507 unchanged - 25 fixed = 507 total (was 532) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s{color} | {color:green} hcatalog/core: The patch generated 0 new + 40 unchanged - 2 fixed = 40 total (was 42) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} The patch hive-unit passed checkstyle {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 13s{color} | {color:green} ql generated 0 new + 2260 unchanged - 1 fixed = 2260 total (was 2261) {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 43s{color} | {color:green} core in the patch passed. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 51s{color} | {color:green} hive-unit in the patch passed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 37s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 34m 53s{color} | {color:black} {color} |

|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16229/dev-support/hive-personality.sh |
| git revision | master / 2daaed7 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| modules | C: ql hcatalog/core itests/hive-unit U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-16229/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.

> Break up DDLTask 1 - extract Database related operations
>
> Key: HIVE-21292
> URL: https://issues.apache.org/jira/browse/HIVE-21292
> Project: Hive
> Issue Type: Improvement
[jira] [Updated] (HIVE-21292) Break up DDLTask 1 - extract Database related operations
[ https://issues.apache.org/jira/browse/HIVE-21292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Gergely updated HIVE-21292:
--
Status: Patch Available (was: Open)

> Break up DDLTask 1 - extract Database related operations
>
> Key: HIVE-21292
> URL: https://issues.apache.org/jira/browse/HIVE-21292
> Project: Hive
> Issue Type: Improvement
> Components: Hive
> Affects Versions: 3.1.1
> Reporter: Miklos Gergely
> Assignee: Miklos Gergely
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21292.01.patch, HIVE-21292.02.patch, HIVE-21292.03.patch, HIVE-21292.04.patch, HIVE-21292.05.patch, HIVE-21292.06.patch, HIVE-21292.07.patch, HIVE-21292.08.patch, HIVE-21292.09.patch
>
> Time Spent: 7h
> Remaining Estimate: 0h
>
> DDLTask is a huge class, more than 5000 lines long. The related DDLWork is also a huge class, which has a field for each DDL operation it supports. The goal is to refactor these so that everything is cut into more manageable classes under the package org.apache.hadoop.hive.ql.exec.ddl:
> * have a separate class for each operation
> * have a package for each operation group (database ddl, table ddl, etc.), so the number of classes under a package is more manageable
> * make all the requests (DDLDesc subclasses) immutable
> * DDLTask should be agnostic to the actual operations
> * for now, ignore the issue that some operations handled by DDLTask are not actual DDL operations (lock, unlock, desc...)
> In the interim, while there are two DDLTask and DDLWork classes in the code base, the new ones in the new package are called DDLTask2 and DDLWork2, avoiding the use of fully qualified class names where both the old and the new classes are in use.
> Step #1: extract all the database related operations from the old DDLTask and move them under the new package. Also create the new internal framework.
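The description above outlines the shape of the new framework: one immutable DDLDesc per request, one operation class per DDL statement, and an operation-agnostic DDLTask2. A minimal sketch of that structure follows; apart from the names DDLDesc and DDLTask2, which the issue itself uses, all class and method names here are illustrative assumptions, not the actual Hive code.

```java
// Hypothetical sketch of the framework described in HIVE-21292.
// Only DDLDesc/DDLTask2 come from the issue text; the rest is illustrative.
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Immutable request object: one DDLDesc subclass per operation.
abstract class DDLDesc { }

final class CreateDatabaseDesc extends DDLDesc {
  private final String name;          // set once in the constructor, no setters
  CreateDatabaseDesc(String name) { this.name = name; }
  String getName() { return name; }
}

// One operation class per DDL statement.
abstract class DDLOperation<T extends DDLDesc> {
  protected final T desc;
  DDLOperation(T desc) { this.desc = desc; }
  abstract int execute();
}

final class CreateDatabaseOperation extends DDLOperation<CreateDatabaseDesc> {
  CreateDatabaseOperation(CreateDatabaseDesc desc) { super(desc); }
  @Override int execute() {
    // Stand-in for the real metastore call.
    System.out.println("CREATE DATABASE " + desc.getName());
    return 0;
  }
}

// DDLTask2 stays agnostic to the concrete operations: it only maps
// the desc type to its operation and delegates.
final class DDLTask2 {
  private static final Map<Class<? extends DDLDesc>,
      Function<DDLDesc, DDLOperation<?>>> REGISTRY = new HashMap<>();

  static {
    REGISTRY.put(CreateDatabaseDesc.class,
        d -> new CreateDatabaseOperation((CreateDatabaseDesc) d));
  }

  int execute(DDLDesc desc) {
    return REGISTRY.get(desc.getClass()).apply(desc).execute();
  }
}
```

Adding a new operation then means adding one desc class, one operation class, and one registry entry, without touching DDLTask2 itself.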
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21292) Break up DDLTask 1 - extract Database related operations
[ https://issues.apache.org/jira/browse/HIVE-21292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Gergely updated HIVE-21292:
--
Attachment: HIVE-21292.09.patch
[jira] [Updated] (HIVE-21292) Break up DDLTask 1 - extract Database related operations
[ https://issues.apache.org/jira/browse/HIVE-21292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Gergely updated HIVE-21292:
--
Status: Open (was: Patch Available)
[jira] [Commented] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
[ https://issues.apache.org/jira/browse/HIVE-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776524#comment-16776524 ] Hive QA commented on HIVE-21197:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12959970/HIVE-21197.03.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.
{color:green}SUCCESS:{color} +1 due to 15816 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16228/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16228/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16228/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12959970 - PreCommit-HIVE-Build

> Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
>
> Key: HIVE-21197
> URL: https://issues.apache.org/jira/browse/HIVE-21197
> Project: Hive
> Issue Type: Task
> Components: repl
> Reporter: mahesh kumar behera
> Assignee: mahesh kumar behera
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch, HIVE-21197.03.patch
>
> Time Spent: 11h 10m
> Remaining Estimate: 0h
>
> During the bootstrap phase it may happen that the files copied to the target were created by events that are not part of the bootstrap. This is because bootstrap first gets the last event id and then the file list. If some events are added during this period, bootstrap will also include the files created by those events. The same files will then be copied again during the first incremental replication just after the bootstrap. In a normal scenario the duplicate copy does not cause any issue, as Hive allows the use of the target database only after the first incremental. But in the case of migration, the files at source and target are copied to different locations (based on the write id at the target), so this may lead to duplicate data at the target. This can be avoided by having a check at load time for duplicate files. The check needs to be done only for the first incremental, and the search can be done in the bootstrap directory (with write id 1). If the file is already present, just skip the copy.
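The load-time check described above can be sketched as follows. This is a hedged illustration of the idea only: the method name, the `delta_1_1` directory layout for write id 1, and the use of `java.nio.file` instead of Hadoop's FileSystem API are all simplifying assumptions, not Hive's actual repl load code.

```java
// Illustrative sketch of the duplicate-file check from HIVE-21197.
// Names and directory layout are assumptions, not the real Hive code.
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class DuplicateFileCheck {
  // During the first incremental load only: before copying a data file,
  // look for the same file name under the bootstrap write-id directory
  // (write id 1). If it is there, the bootstrap already copied it - skip.
  static boolean shouldCopy(Path tableDir, String fileName,
                            boolean firstIncremental) {
    if (!firstIncremental) {
      return true;   // later incrementals cannot overlap the bootstrap
    }
    Path bootstrapCopy = tableDir.resolve("delta_1_1").resolve(fileName);
    return !Files.exists(bootstrapCopy);
  }
}
```

Restricting the check to the first incremental keeps the extra directory lookup off every subsequent replication cycle.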
[jira] [Commented] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
[ https://issues.apache.org/jira/browse/HIVE-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776522#comment-16776522 ] Hive QA commented on HIVE-21197:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 16s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 46s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 13s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 20s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 1m 12s{color} | {color:blue} standalone-metastore/metastore-server in master has 181 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 3s{color} | {color:blue} ql in master has 2261 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 47s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 49s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 28s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 17s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 42s{color} | {color:red} ql: The patch generated 1 new + 325 unchanged - 0 fixed = 326 total (was 325) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 20s{color} | {color:red} itests/hive-unit: The patch generated 71 new + 272 unchanged - 0 fixed = 343 total (was 272) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 50s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 37m 36s{color} | {color:black} {color} |

|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16228/dev-support/hive-personality.sh |
| git revision | master / 2daaed7 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-16228/yetus/diff-checkstyle-ql.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-16228/yetus/diff-checkstyle-itests_hive-unit.txt |
| modules | C: standalone-metastore/metastore-server ql itests/hive-unit U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-16228/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.

> Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
>
> Key: HIVE-21197
> URL: https://issues.apache.org/jira/browse/HIVE-21197
> Project: Hive
> Issue Type: Task
> Components: repl
> Reporter: mahesh kumar behera
> Assignee: mahesh kumar behera
> Priority: Major
[jira] [Assigned] (HIVE-21314) Hive Replication not retaining the owner in the replicated table
[ https://issues.apache.org/jira/browse/HIVE-21314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mahesh kumar behera reassigned HIVE-21314:
--

> Hive Replication not retaining the owner in the replicated table
>
> Key: HIVE-21314
> URL: https://issues.apache.org/jira/browse/HIVE-21314
> Project: Hive
> Issue Type: Bug
> Reporter: mahesh kumar behera
> Assignee: mahesh kumar behera
> Priority: Major
>
> Hive replication does not retain the owner in the replicated table. The owner of the target table is set to the user executing the load command. Instead, the owner information should be read from the dump metadata and used while creating the table at the target cluster.
[jira] [Updated] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
[ https://issues.apache.org/jira/browse/HIVE-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mahesh kumar behera updated HIVE-21197:
--
Attachment: (was: HIVE-21197.03.patch)
[jira] [Updated] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
[ https://issues.apache.org/jira/browse/HIVE-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mahesh kumar behera updated HIVE-21197:
--
Status: Open (was: Patch Available)
[jira] [Updated] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
[ https://issues.apache.org/jira/browse/HIVE-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mahesh kumar behera updated HIVE-21197:
--
Attachment: HIVE-21197.03.patch
[jira] [Updated] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
[ https://issues.apache.org/jira/browse/HIVE-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mahesh kumar behera updated HIVE-21197:
--
Status: Patch Available (was: Open)
[jira] [Comment Edited] (HIVE-21312) FSStatsAggregator::connect is slow
[ https://issues.apache.org/jira/browse/HIVE-21312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776457#comment-16776457 ] Rajesh Balamohan edited comment on HIVE-21312 at 2/25/19 2:22 AM:
--

Thanks [~kgyrtkirk]. I have made it a thread-safe queue. In my local run, the runtime for this went down from 420 seconds to 20 seconds.

was (Author: rajesh.balamohan): Thanks [~kgyrtkirk]. I have made it as threadsafe queue.

> FSStatsAggregator::connect is slow
> --
>
> Key: HIVE-21312
> URL: https://issues.apache.org/jira/browse/HIVE-21312
> Project: Hive
> Issue Type: Improvement
> Components: Statistics
> Reporter: Rajesh Balamohan
> Priority: Trivial
> Attachments: HIVE-21312.1.patch, HIVE-21312.2.patch
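The kind of change the comment describes can be sketched as below: when worker threads collect per-partition stats files in parallel, the shared collection must be thread-safe, and a `ConcurrentLinkedQueue` allows lock-free concurrent adds. This is a hedged illustration only; the class and method names, and the stand-in "read" step, are assumptions, not the actual FSStatsAggregator patch.

```java
// Illustrative sketch of collecting results from parallel workers into a
// thread-safe queue (the change HIVE-21312's comment describes).
// Names are assumptions, not the real Hive code.
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ParallelStatsCollector {
  static Queue<String> collect(List<String> statsFiles, int threads) {
    // ConcurrentLinkedQueue: safe for concurrent add() calls without
    // external synchronization, unlike a plain ArrayList.
    Queue<String> results = new ConcurrentLinkedQueue<>();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (String file : statsFiles) {
      pool.submit(() -> results.add("stats:" + file)); // stand-in for reading the file
    }
    pool.shutdown();
    try {
      pool.awaitTermination(1, TimeUnit.MINUTES);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return results;
  }
}
```

Collecting into an unsynchronized list from multiple threads would either lose entries or corrupt the list; the thread-safe queue removes the need for a coarse lock around every add, which is where the parallel speedup comes from.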
[jira] [Commented] (HIVE-21312) FSStatsAggregator::connect is slow
[ https://issues.apache.org/jira/browse/HIVE-21312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776465#comment-16776465 ] Hive QA commented on HIVE-21312:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 36s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 5s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 41s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 17s{color} | {color:blue} ql in master has 2261 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 39s{color} | {color:red} ql: The patch generated 5 new + 9 unchanged - 5 fixed = 14 total (was 14) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 25m 15s{color} | {color:black} {color} |

|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16227/dev-support/hive-personality.sh |
| git revision | master / 2daaed7 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-16227/yetus/diff-checkstyle-ql.txt |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-16227/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.

> FSStatsAggregator::connect is slow
> --
>
> Key: HIVE-21312
> URL: https://issues.apache.org/jira/browse/HIVE-21312
> Project: Hive
> Issue Type: Improvement
> Components: Statistics
> Reporter: Rajesh Balamohan
> Priority: Trivial
> Attachments: HIVE-21312.1.patch, HIVE-21312.2.patch
[jira] [Commented] (HIVE-21312) FSStatsAggregator::connect is slow
[ https://issues.apache.org/jira/browse/HIVE-21312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776457#comment-16776457 ] Rajesh Balamohan commented on HIVE-21312: - Thanks [~kgyrtkirk]. I have changed it to a thread-safe queue. > FSStatsAggregator::connect is slow > -- > > Key: HIVE-21312 > URL: https://issues.apache.org/jira/browse/HIVE-21312 > Project: Hive > Issue Type: Improvement > Components: Statistics >Reporter: Rajesh Balamohan >Priority: Trivial > Attachments: HIVE-21312.1.patch, HIVE-21312.2.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
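The "thread-safe queue" change mentioned above can be sketched roughly as follows. This is a minimal illustration only, assuming the aggregator collects per-partition stats entries from parallel worker threads; the class and method names are hypothetical and not the ones in the actual patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: parallel connect() workers append their results to a
// thread-safe queue, so no external synchronization is needed around add().
public class StatsCollectorSketch {
    public static List<String> collect(int partitions) {
        Queue<String> results = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < partitions; i++) {
            final int part = i;
            // ConcurrentLinkedQueue.add is safe to call from multiple threads.
            pool.submit(() -> results.add("stats-" + part));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(results);
    }

    public static void main(String[] args) {
        // All 8 entries are present once the pool has drained.
        System.out.println(collect(8).size());
    }
}
```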
[jira] [Updated] (HIVE-21312) FSStatsAggregator::connect is slow
[ https://issues.apache.org/jira/browse/HIVE-21312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HIVE-21312: Attachment: HIVE-21312.2.patch > FSStatsAggregator::connect is slow > -- > > Key: HIVE-21312 > URL: https://issues.apache.org/jira/browse/HIVE-21312 > Project: Hive > Issue Type: Improvement > Components: Statistics >Reporter: Rajesh Balamohan >Priority: Trivial > Attachments: HIVE-21312.1.patch, HIVE-21312.2.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21292) Break up DDLTask 1 - extract Database related operations
[ https://issues.apache.org/jira/browse/HIVE-21292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776450#comment-16776450 ] Hive QA commented on HIVE-21292: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12959958/HIVE-21292.08.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:green}SUCCESS:{color} +1 due to 15811 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16226/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16226/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16226/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12959958 - PreCommit-HIVE-Build > Break up DDLTask 1 - extract Database related operations > > > Key: HIVE-21292 > URL: https://issues.apache.org/jira/browse/HIVE-21292 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 3.1.1 >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-21292.01.patch, HIVE-21292.02.patch, > HIVE-21292.03.patch, HIVE-21292.04.patch, HIVE-21292.05.patch, > HIVE-21292.06.patch, HIVE-21292.07.patch, HIVE-21292.08.patch > > Time Spent: 7h > Remaining Estimate: 0h > > DDLTask is a huge class, more than 5000 lines long. The related DDLWork is > also a huge class, which has a field for each DDL operation it supports. 
The > goal is to refactor these in order to have everything cut into more > handleable classes under the package org.apache.hadoop.hive.ql.exec.ddl: > * have a separate class for each operation > * have a package for each operation group (database ddl, table ddl, etc), so > the amount of classes under a package is more manageable > * make all the requests (DDLDesc subclasses) immutable > * DDLTask should be agnostic to the actual operations > * right now let's ignore the issue of having some operations handled by > DDLTask which are not actual DDL operations (lock, unlock, desc...) > In the interim time when there are two DDLTask and DDLWork classes in the > code base the new ones in the new package are called DDLTask2 and DDLWork2 > thus avoiding the usage of fully qualified class names where both the old and > the new classes are in use. > Step #1: extract all the database related operations from the old DDLTask, > and move them under the new package. Also create the new internal framework. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
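The "one class per operation, DDLTask agnostic of the operations" structure described above can be sketched as below. All names (DdlDesc, DdlOperation, CreateDatabaseDesc, the lookup map) are illustrative assumptions, not the classes actually introduced by the patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Immutable request object: one subclass per DDL operation.
interface DdlDesc {}

// One operation class per desc type; the task never sees the concrete logic.
interface DdlOperation<T extends DdlDesc> {
    int execute(T desc); // 0 on success, mirroring Task.execute conventions
}

final class CreateDatabaseDesc implements DdlDesc {
    private final String name; // set once in the constructor, never mutated
    CreateDatabaseDesc(String name) { this.name = name; }
    String getName() { return name; }
}

final class CreateDatabaseOperation implements DdlOperation<CreateDatabaseDesc> {
    @Override public int execute(CreateDatabaseDesc desc) {
        System.out.println("creating database " + desc.getName());
        return 0;
    }
}

// The dispatching task stays agnostic: it maps desc type -> operation.
public class DdlDispatchSketch {
    private static final Map<Class<?>, Function<DdlDesc, Integer>> OPS = new HashMap<>();
    static {
        OPS.put(CreateDatabaseDesc.class,
                d -> new CreateDatabaseOperation().execute((CreateDatabaseDesc) d));
    }

    public static int run(DdlDesc desc) {
        return OPS.get(desc.getClass()).apply(desc);
    }

    public static void main(String[] args) {
        run(new CreateDatabaseDesc("demo"));
    }
}
```

Adding a new operation then only means adding a desc/operation pair and one registry entry, without touching the dispatcher.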
[jira] [Commented] (HIVE-21292) Break up DDLTask 1 - extract Database related operations
[ https://issues.apache.org/jira/browse/HIVE-21292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776436#comment-16776436 ] Hive QA commented on HIVE-21292: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 1s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 43s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 14s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 10s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 20s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 6s{color} | {color:blue} ql in master has 2261 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 39s{color} | {color:blue} hcatalog/core in master has 29 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 44s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 40s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 29s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 45s{color} | {color:green} ql: The patch generated 0 new + 507 unchanged - 25 fixed = 507 total (was 532) {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s{color} | {color:green} hcatalog/core: The patch generated 0 new + 40 unchanged - 2 fixed = 40 total (was 42) {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} The patch hive-unit passed checkstyle {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 19s{color} | {color:green} ql generated 0 new + 2260 unchanged - 1 fixed = 2260 total (was 2261) {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 45s{color} | {color:green} core in the patch passed. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 51s{color} | {color:green} hive-unit in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 38s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 35m 4s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16226/dev-support/hive-personality.sh | | git revision | master / 2daaed7 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | modules | C: ql hcatalog/core itests/hive-unit U: . | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-16226/yetus.txt | | Powered by | Apache Yetushttp://yetus.apache.org | This message was automatically generated. > Break up DDLTask 1 - extract Database related operations > > > Key: HIVE-21292 > URL: https://issues.apache.org/jira/browse/HIVE-21292 > Project: Hive > Issue Type: Imp
[jira] [Commented] (HIVE-21240) JSON SerDe Re-Write
[ https://issues.apache.org/jira/browse/HIVE-21240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776430#comment-16776430 ] Hive QA commented on HIVE-21240: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12959957/HIVE-21240.10.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 15820 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniHiveKafkaCliDriver.testCliDriver[kafka_storage_handler] (batchId=275) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16225/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16225/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16225/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12959957 - PreCommit-HIVE-Build > JSON SerDe Re-Write > --- > > Key: HIVE-21240 > URL: https://issues.apache.org/jira/browse/HIVE-21240 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Affects Versions: 4.0.0, 3.1.1 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-21240.1.patch, HIVE-21240.1.patch, > HIVE-21240.10.patch, HIVE-21240.2.patch, HIVE-21240.3.patch, > HIVE-21240.4.patch, HIVE-21240.5.patch, HIVE-21240.6.patch, > HIVE-21240.7.patch, HIVE-21240.9.patch, HIVE-24240.8.patch > > Time Spent: 10m > Remaining Estimate: 0h > > The JSON SerDe has a few issues, I will link them to this JIRA. 
> * Use Jackson Tree parser instead of manually parsing > * Added support for base-64 encoded data (the expected format when using JSON) > * Added support to skip blank lines (returns all columns as null values) > * Current JSON parser accepts, but does not apply, custom timestamp formats > in most cases > * Added some unit tests > * Added cache for column-name to column-index searches, currently O\(n\) for > each row processed, for each column in the row -- This message was sent by Atlassian JIRA (v7.6.3#76005)
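The last bullet's column-name to column-index cache can be sketched as follows: build the lookup map once per SerDe instance so that resolving each JSON field name is O(1) rather than a linear scan of the column list for every field of every row. This is a standalone illustration with hypothetical names, not the patch's actual code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a column-name -> column-index cache: the map is
// populated once, so per-row field lookups avoid repeated linear scans.
public class ColumnIndexCacheSketch {
    private final Map<String, Integer> indexByName = new HashMap<>();

    public ColumnIndexCacheSketch(List<String> columnNames) {
        for (int i = 0; i < columnNames.size(); i++) {
            // Hive column names are case-insensitive, so normalize the key.
            indexByName.put(columnNames.get(i).toLowerCase(), i);
        }
    }

    /** Returns the column's position, or -1 for an unknown field name. */
    public int indexOf(String fieldName) {
        return indexByName.getOrDefault(fieldName.toLowerCase(), -1);
    }

    public static void main(String[] args) {
        ColumnIndexCacheSketch cache =
                new ColumnIndexCacheSketch(Arrays.asList("id", "name", "ts"));
        System.out.println(cache.indexOf("Name")); // 1, case-insensitively
    }
}
```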
[jira] [Commented] (HIVE-21240) JSON SerDe Re-Write
[ https://issues.apache.org/jira/browse/HIVE-21240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776425#comment-16776425 ] Hive QA commented on HIVE-21240: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 49s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 18s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 51s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 11s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 42s{color} | {color:blue} serde in master has 197 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 9s{color} | {color:blue} ql in master has 2261 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 37s{color} | {color:blue} hcatalog/core in master has 29 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 30s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 29s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} serde: The patch generated 0 new + 4 unchanged - 25 fixed = 4 total (was 29) {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 40s{color} | {color:green} ql: The patch generated 0 new + 6 unchanged - 5 fixed = 6 total (was 11) {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} The patch core passed checkstyle {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 46s{color} | {color:green} serde generated 0 new + 193 unchanged - 4 fixed = 193 total (was 197) {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 20s{color} | {color:green} ql in the patch passed. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 47s{color} | {color:green} core in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 31s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 33m 19s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16225/dev-support/hive-personality.sh | | git revision | master / 2daaed7 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | modules | C: serde ql hcatalog/core U: . | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-16225/yetus.txt | | Powered by | Apache Yetushttp://yetus.apache.org | This message was automatically generated. > JSON SerDe Re-Write > --- > > Key: HIVE-21240 > URL: https://issues.apache.org/jira/browse/HIVE-21240 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Affects Versions: 4.0.0, 3.1.1 >Reporter: BELU
[jira] [Commented] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
[ https://issues.apache.org/jira/browse/HIVE-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776405#comment-16776405 ] Hive QA commented on HIVE-21197: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12959955/HIVE-21197.03.patch {color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 15816 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[timestamptz_2] (batchId=86) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[hybridgrace_hashjoin_2] (batchId=109) org.apache.hadoop.hive.ql.exec.repl.TestReplDumpTask.removeDBPropertyToPreventRenameWhenBootstrapDumpOfTableFails (batchId=321) org.apache.hadoop.hive.ql.parse.TestReplWithJsonMessageFormat.testBootstrapWithConcurrentDropTable (batchId=244) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testBootstrapWithConcurrentDropTable (batchId=246) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16224/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16224/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16224/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12959955 - PreCommit-HIVE-Build > Hive replication can add duplicate data during migration to a target with > hive.strict.managed.tables enabled > > > Key: HIVE-21197 > URL: https://issues.apache.org/jira/browse/HIVE-21197 > Project: Hive > Issue Type: Task > Components: repl >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch, > HIVE-21197.03.patch > > Time Spent: 11h 10m > Remaining Estimate: 0h > > During bootstrap phase it may happen that the files copied to target are > created by events which are not part of the bootstrap. This is because of the > fact that, bootstrap first gets the last event id and then the file list. > During this period if some event are added, then bootstrap will include files > created by these events also.The same files will be copied again during the > first incremental replication just after the bootstrap. In normal scenario, > the duplicate copy does not cause any issue as hive allows the use of target > database only after the first incremental. But in case of migration, the file > at source and target are copied to different location (based on the write id > at target) and thus this may lead to duplicate data at target. This can be > avoided by having at check at load time for duplicate file. This check can be > done only for the first incremental and the search can be done in the > bootstrap directory (with write id 1). if the file is already present then > just ignore the copy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
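The duplicate-file check described above can be sketched as below: during the first incremental load after bootstrap, a file is copied only if the bootstrap directory (write id 1) does not already contain it. The class name, the file-name-only comparison, and the in-memory set standing in for a directory listing are all simplifying assumptions for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the load-time duplicate check: only the first
// incremental after bootstrap can contain files the bootstrap already copied,
// so only then do we consult the bootstrap directory's contents.
public class DuplicateCopyCheckSketch {
    private final Set<String> bootstrapFiles;   // stand-in for a dir listing
    private final boolean firstIncremental;

    public DuplicateCopyCheckSketch(Set<String> bootstrapFiles, boolean firstIncremental) {
        this.bootstrapFiles = bootstrapFiles;
        this.firstIncremental = firstIncremental;
    }

    /** True when the file must be copied to the target. */
    public boolean shouldCopy(String fileName) {
        if (!firstIncremental) {
            return true; // later incrementals cannot overlap the bootstrap
        }
        return !bootstrapFiles.contains(fileName); // skip files already copied
    }

    public static void main(String[] args) {
        Set<String> bootstrap = new HashSet<>();
        bootstrap.add("delta_0000001/bucket_00000");
        DuplicateCopyCheckSketch check = new DuplicateCopyCheckSketch(bootstrap, true);
        System.out.println(check.shouldCopy("delta_0000001/bucket_00000")); // false
        System.out.println(check.shouldCopy("delta_0000002/bucket_00000")); // true
    }
}
```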
[jira] [Updated] (HIVE-21292) Break up DDLTask 1 - extract Database related operations
[ https://issues.apache.org/jira/browse/HIVE-21292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Gergely updated HIVE-21292: -- Attachment: HIVE-21292.08.patch > Break up DDLTask 1 - extract Database related operations > > > Key: HIVE-21292 > URL: https://issues.apache.org/jira/browse/HIVE-21292 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 3.1.1 >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-21292.01.patch, HIVE-21292.02.patch, > HIVE-21292.03.patch, HIVE-21292.04.patch, HIVE-21292.05.patch, > HIVE-21292.06.patch, HIVE-21292.07.patch, HIVE-21292.08.patch > > Time Spent: 7h > Remaining Estimate: 0h > > DDLTask is a huge class, more than 5000 lines long. The related DDLWork is > also a huge class, which has a field for each DDL operation it supports. The > goal is to refactor these in order to have everything cut into more > handleable classes under the package org.apache.hadoop.hive.ql.exec.ddl: > * have a separate class for each operation > * have a package for each operation group (database ddl, table ddl, etc), so > the amount of classes under a package is more manageable > * make all the requests (DDLDesc subclasses) immutable > * DDLTask should be agnostic to the actual operations > * right now let's ignore the issue of having some operations handled by > DDLTask which are not actual DDL operations (lock, unlock, desc...) > In the interim time when there are two DDLTask and DDLWork classes in the > code base the new ones in the new package are called DDLTask2 and DDLWork2 > thus avoiding the usage of fully qualified class names where both the old and > the new classes are in use. > Step #1: extract all the database related operations from the old DDLTask, > and move them under the new package. Also create the new internal framework. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
[ https://issues.apache.org/jira/browse/HIVE-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776399#comment-16776399 ] Hive QA commented on HIVE-21197: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 48s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 15s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 21s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 25s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 1m 12s{color} | {color:blue} standalone-metastore/metastore-server in master has 181 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 12s{color} | {color:blue} ql in master has 2261 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 45s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 47s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 30s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 17s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 45s{color} | {color:red} ql: The patch generated 1 new + 325 unchanged - 0 fixed = 326 total (was 325) {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 21s{color} | {color:red} itests/hive-unit: The patch generated 71 new + 272 unchanged - 0 fixed = 343 total (was 272) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 53s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 37m 24s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16224/dev-support/hive-personality.sh | | git revision | master / 2daaed7 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-16224/yetus/diff-checkstyle-ql.txt | | checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-16224/yetus/diff-checkstyle-itests_hive-unit.txt | | modules | C: standalone-metastore/metastore-server ql itests/hive-unit U: . | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-16224/yetus.txt | | Powered by | Apache Yetushttp://yetus.apache.org | This message was automatically generated. > Hive replication can add duplicate data during migration to a target with > hive.strict.managed.tables enabled > > > Key: HIVE-21197 > URL: https://issues.apache.org/jira/browse/HIVE-21197 > Project: Hive > Issue Type: Task > Components: repl >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Prio
[jira] [Updated] (HIVE-21292) Break up DDLTask 1 - extract Database related operations
[ https://issues.apache.org/jira/browse/HIVE-21292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Gergely updated HIVE-21292: -- Status: Patch Available (was: Open) > Break up DDLTask 1 - extract Database related operations > > > Key: HIVE-21292 > URL: https://issues.apache.org/jira/browse/HIVE-21292 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 3.1.1 >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-21292.01.patch, HIVE-21292.02.patch, > HIVE-21292.03.patch, HIVE-21292.04.patch, HIVE-21292.05.patch, > HIVE-21292.06.patch, HIVE-21292.07.patch, HIVE-21292.08.patch > > Time Spent: 7h > Remaining Estimate: 0h > > DDLTask is a huge class, more than 5000 lines long. The related DDLWork is > also a huge class, which has a field for each DDL operation it supports. The > goal is to refactor these in order to have everything cut into more > handleable classes under the package org.apache.hadoop.hive.ql.exec.ddl: > * have a separate class for each operation > * have a package for each operation group (database ddl, table ddl, etc), so > the amount of classes under a package is more manageable > * make all the requests (DDLDesc subclasses) immutable > * DDLTask should be agnostic to the actual operations > * right now let's ignore the issue of having some operations handled by > DDLTask which are not actual DDL operations (lock, unlock, desc...) > In the interim time when there are two DDLTask and DDLWork classes in the > code base the new ones in the new package are called DDLTask2 and DDLWork2 > thus avoiding the usage of fully qualified class names where both the old and > the new classes are in use. > Step #1: extract all the database related operations from the old DDLTask, > and move them under the new package. Also create the new internal framework. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21292) Break up DDLTask 1 - extract Database related operations
[ https://issues.apache.org/jira/browse/HIVE-21292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Gergely updated HIVE-21292: -- Status: Open (was: Patch Available) > Break up DDLTask 1 - extract Database related operations > > > Key: HIVE-21292 > URL: https://issues.apache.org/jira/browse/HIVE-21292 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 3.1.1 >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-21292.01.patch, HIVE-21292.02.patch, > HIVE-21292.03.patch, HIVE-21292.04.patch, HIVE-21292.05.patch, > HIVE-21292.06.patch, HIVE-21292.07.patch, HIVE-21292.08.patch > > Time Spent: 7h > Remaining Estimate: 0h > > DDLTask is a huge class, more than 5000 lines long. The related DDLWork is > also a huge class, which has a field for each DDL operation it supports. The > goal is to refactor these in order to have everything cut into more > handleable classes under the package org.apache.hadoop.hive.ql.exec.ddl: > * have a separate class for each operation > * have a package for each operation group (database ddl, table ddl, etc), so > the amount of classes under a package is more manageable > * make all the requests (DDLDesc subclasses) immutable > * DDLTask should be agnostic to the actual operations > * right now let's ignore the issue of having some operations handled by > DDLTask which are not actual DDL operations (lock, unlock, desc...) > In the interim time when there are two DDLTask and DDLWork classes in the > code base the new ones in the new package are called DDLTask2 and DDLWork2 > thus avoiding the usage of fully qualified class names where both the old and > the new classes are in use. > Step #1: extract all the database related operations from the old DDLTask, > and move them under the new package. Also create the new internal framework. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21240) JSON SerDe Re-Write
[ https://issues.apache.org/jira/browse/HIVE-21240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated HIVE-21240: --- Status: Patch Available (was: Open) > JSON SerDe Re-Write > --- > > Key: HIVE-21240 > URL: https://issues.apache.org/jira/browse/HIVE-21240 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Affects Versions: 3.1.1, 4.0.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-21240.1.patch, HIVE-21240.1.patch, > HIVE-21240.10.patch, HIVE-21240.2.patch, HIVE-21240.3.patch, > HIVE-21240.4.patch, HIVE-21240.5.patch, HIVE-21240.6.patch, > HIVE-21240.7.patch, HIVE-21240.9.patch, HIVE-24240.8.patch > > Time Spent: 10m > Remaining Estimate: 0h > > The JSON SerDe has a few issues, I will link them to this JIRA. > * Use Jackson Tree parser instead of manually parsing > * Added support for base-64 encoded data (the expected format when using JSON) > * Added support to skip blank lines (returns all columns as null values) > * Current JSON parser accepts, but does not apply, custom timestamp formats > in most cases > * Added some unit tests > * Added cache for column-name to column-index searches, currently O\(n\) for > each row processed, for each column in the row -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21240) JSON SerDe Re-Write
[ https://issues.apache.org/jira/browse/HIVE-21240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated HIVE-21240: --- Attachment: HIVE-21240.10.patch > JSON SerDe Re-Write > --- > > Key: HIVE-21240 > URL: https://issues.apache.org/jira/browse/HIVE-21240 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Affects Versions: 4.0.0, 3.1.1 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-21240.1.patch, HIVE-21240.1.patch, > HIVE-21240.10.patch, HIVE-21240.2.patch, HIVE-21240.3.patch, > HIVE-21240.4.patch, HIVE-21240.5.patch, HIVE-21240.6.patch, > HIVE-21240.7.patch, HIVE-21240.9.patch, HIVE-24240.8.patch > > Time Spent: 10m > Remaining Estimate: 0h > > The JSON SerDe has a few issues, I will link them to this JIRA. > * Use Jackson Tree parser instead of manually parsing > * Added support for base-64 encoded data (the expected format when using JSON) > * Added support to skip blank lines (returns all columns as null values) > * Current JSON parser accepts, but does not apply, custom timestamp formats > in most cases > * Added some unit tests > * Added cache for column-name to column-index searches, currently O\(n\) for > each row processed, for each column in the row -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21240) JSON SerDe Re-Write
[ https://issues.apache.org/jira/browse/HIVE-21240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated HIVE-21240: --- Status: Open (was: Patch Available) > JSON SerDe Re-Write > --- > > Key: HIVE-21240 > URL: https://issues.apache.org/jira/browse/HIVE-21240 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Affects Versions: 3.1.1, 4.0.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-21240.1.patch, HIVE-21240.1.patch, > HIVE-21240.10.patch, HIVE-21240.2.patch, HIVE-21240.3.patch, > HIVE-21240.4.patch, HIVE-21240.5.patch, HIVE-21240.6.patch, > HIVE-21240.7.patch, HIVE-21240.9.patch, HIVE-24240.8.patch > > Time Spent: 10m > Remaining Estimate: 0h > > The JSON SerDe has a few issues, I will link them to this JIRA. > * Use Jackson Tree parser instead of manually parsing > * Added support for base-64 encoded data (the expected format when using JSON) > * Added support to skip blank lines (returns all columns as null values) > * Current JSON parser accepts, but does not apply, custom timestamp formats > in most cases > * Added some unit tests > * Added cache for column-name to column-index searches, currently O\(n\) for > each row processed, for each column in the row -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21292) Break up DDLTask 1 - extract Database related operations
[ https://issues.apache.org/jira/browse/HIVE-21292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776387#comment-16776387 ] Hive QA commented on HIVE-21292: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12959954/HIVE-21292.07.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 15800 tests executed *Failed tests:* {noformat} TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=184) [auto_sortmerge_join_7.q,mm_exim.q,input16_cc.q,materialized_view_rewrite_no_join_opt.q,vector_char_varchar_1.q,smb_mapjoin_5.q,vector_char_4.q,cross_product_check_2.q,cbo_limit.q,llap_smb.q,materialized_view_create_rewrite_2.q,vector_decimal_udf.q] {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16223/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16223/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16223/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12959954 - PreCommit-HIVE-Build > Break up DDLTask 1 - extract Database related operations > > > Key: HIVE-21292 > URL: https://issues.apache.org/jira/browse/HIVE-21292 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 3.1.1 >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-21292.01.patch, HIVE-21292.02.patch, > HIVE-21292.03.patch, HIVE-21292.04.patch, HIVE-21292.05.patch, > HIVE-21292.06.patch, HIVE-21292.07.patch > > Time Spent: 7h > Remaining Estimate: 0h > > DDLTask is a huge class, more than 5000 lines long. The related DDLWork is > also a huge class, which has a field for each DDL operation it supports. The > goal is to refactor these in order to have everything cut into more > handleable classes under the package org.apache.hadoop.hive.ql.exec.ddl: > * have a separate class for each operation > * have a package for each operation group (database ddl, table ddl, etc), so > the amount of classes under a package is more manageable > * make all the requests (DDLDesc subclasses) immutable > * DDLTask should be agnostic to the actual operations > * right now let's ignore the issue of having some operations handled by > DDLTask which are not actual DDL operations (lock, unlock, desc...) > In the interim time when there are two DDLTask and DDLWork classes in the > code base the new ones in the new package are called DDLTask2 and DDLWork2 > thus avoiding the usage of fully qualified class names where both the old and > the new classes are in use. > Step #1: extract all the database related operations from the old DDLTask, > and move them under the new package. Also create the new internal framework. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21292) Break up DDLTask 1 - extract Database related operations
[ https://issues.apache.org/jira/browse/HIVE-21292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776381#comment-16776381 ] Hive QA commented on HIVE-21292: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 53s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 55s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 16s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 20s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 14s{color} | {color:blue} ql in master has 2261 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 37s{color} | {color:blue} hcatalog/core in master has 29 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 49s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 50s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 29s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 51s{color} | {color:green} ql: The patch generated 0 new + 507 unchanged - 25 fixed = 507 total (was 532) {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} hcatalog/core: The patch generated 0 new + 40 unchanged - 2 fixed = 40 total (was 42) {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s{color} | {color:green} The patch hive-unit passed checkstyle {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 23s{color} | {color:green} ql generated 0 new + 2260 unchanged - 1 fixed = 2260 total (was 2261) {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 48s{color} | {color:green} core in the patch passed. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 54s{color} | {color:green} hive-unit in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 36m 42s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16223/dev-support/hive-personality.sh | | git revision | master / 2daaed7 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | modules | C: ql hcatalog/core itests/hive-unit U: . | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-16223/yetus.txt | | Powered by | Apache Yetushttp://yetus.apache.org | This message was automatically generated. > Break up DDLTask 1 - extract Database related operations > > > Key: HIVE-21292 > URL: https://issues.apache.org/jira/browse/HIVE-21292 > Project: Hive > Issue Type: Imp
[jira] [Updated] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
[ https://issues.apache.org/jira/browse/HIVE-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mahesh kumar behera updated HIVE-21197: --- Attachment: HIVE-21197.03.patch > Hive replication can add duplicate data during migration to a target with > hive.strict.managed.tables enabled > > > Key: HIVE-21197 > URL: https://issues.apache.org/jira/browse/HIVE-21197 > Project: Hive > Issue Type: Task > Components: repl >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch, > HIVE-21197.03.patch > > Time Spent: 11h 10m > Remaining Estimate: 0h > > During bootstrap phase it may happen that the files copied to target are > created by events which are not part of the bootstrap. This is because of the > fact that, bootstrap first gets the last event id and then the file list. > During this period if some event are added, then bootstrap will include files > created by these events also.The same files will be copied again during the > first incremental replication just after the bootstrap. In normal scenario, > the duplicate copy does not cause any issue as hive allows the use of target > database only after the first incremental. But in case of migration, the file > at source and target are copied to different location (based on the write id > at target) and thus this may lead to duplicate data at target. This can be > avoided by having at check at load time for duplicate file. This check can be > done only for the first incremental and the search can be done in the > bootstrap directory (with write id 1). if the file is already present then > just ignore the copy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
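The fix described above amounts to a load-time membership check that applies only to the first incremental replication after bootstrap. A simplified sketch under assumed names (the real patch works against replication dump directories, not an in-memory set):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch: skip a file during the first incremental replication
// if the bootstrap (write id 1) already copied it.
public class ReplDuplicateCheckSketch {
    private final Set<String> bootstrapFiles;

    public ReplDuplicateCheckSketch(Set<String> bootstrapFiles) {
        this.bootstrapFiles = bootstrapFiles;
    }

    /** True if the file must be copied, false if it is a bootstrap duplicate. */
    public boolean shouldCopy(String fileName, boolean firstIncremental) {
        // The race only affects events that landed between reading the last
        // event id and listing files, so only the first incremental needs this.
        if (!firstIncremental) {
            return true;
        }
        return !bootstrapFiles.contains(fileName);
    }

    public static void main(String[] args) {
        Set<String> bootstrapped = new HashSet<>(Arrays.asList("delta_1/000000_0"));
        ReplDuplicateCheckSketch check = new ReplDuplicateCheckSketch(bootstrapped);
        System.out.println(check.shouldCopy("delta_1/000000_0", true));  // false
        System.out.println(check.shouldCopy("delta_2/000000_0", true));  // true
    }
}
```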
[jira] [Updated] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
[ https://issues.apache.org/jira/browse/HIVE-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mahesh kumar behera updated HIVE-21197: --- Status: Patch Available (was: Open) > Hive replication can add duplicate data during migration to a target with > hive.strict.managed.tables enabled > > > Key: HIVE-21197 > URL: https://issues.apache.org/jira/browse/HIVE-21197 > Project: Hive > Issue Type: Task > Components: repl >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch, > HIVE-21197.03.patch > > Time Spent: 11h 10m > Remaining Estimate: 0h > > During bootstrap phase it may happen that the files copied to target are > created by events which are not part of the bootstrap. This is because of the > fact that, bootstrap first gets the last event id and then the file list. > During this period if some event are added, then bootstrap will include files > created by these events also.The same files will be copied again during the > first incremental replication just after the bootstrap. In normal scenario, > the duplicate copy does not cause any issue as hive allows the use of target > database only after the first incremental. But in case of migration, the file > at source and target are copied to different location (based on the write id > at target) and thus this may lead to duplicate data at target. This can be > avoided by having at check at load time for duplicate file. This check can be > done only for the first incremental and the search can be done in the > bootstrap directory (with write id 1). if the file is already present then > just ignore the copy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
[ https://issues.apache.org/jira/browse/HIVE-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mahesh kumar behera updated HIVE-21197: --- Status: Open (was: Patch Available) > Hive replication can add duplicate data during migration to a target with > hive.strict.managed.tables enabled > > > Key: HIVE-21197 > URL: https://issues.apache.org/jira/browse/HIVE-21197 > Project: Hive > Issue Type: Task > Components: repl >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch, > HIVE-21197.03.patch > > Time Spent: 11h 10m > Remaining Estimate: 0h > > During bootstrap phase it may happen that the files copied to target are > created by events which are not part of the bootstrap. This is because of the > fact that, bootstrap first gets the last event id and then the file list. > During this period if some event are added, then bootstrap will include files > created by these events also.The same files will be copied again during the > first incremental replication just after the bootstrap. In normal scenario, > the duplicate copy does not cause any issue as hive allows the use of target > database only after the first incremental. But in case of migration, the file > at source and target are copied to different location (based on the write id > at target) and thus this may lead to duplicate data at target. This can be > avoided by having at check at load time for duplicate file. This check can be > done only for the first incremental and the search can be done in the > bootstrap directory (with write id 1). if the file is already present then > just ignore the copy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21292) Break up DDLTask 1 - extract Database related operations
[ https://issues.apache.org/jira/browse/HIVE-21292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Gergely updated HIVE-21292: -- Status: Patch Available (was: Open) > Break up DDLTask 1 - extract Database related operations > > > Key: HIVE-21292 > URL: https://issues.apache.org/jira/browse/HIVE-21292 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 3.1.1 >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-21292.01.patch, HIVE-21292.02.patch, > HIVE-21292.03.patch, HIVE-21292.04.patch, HIVE-21292.05.patch, > HIVE-21292.06.patch, HIVE-21292.07.patch > > Time Spent: 7h > Remaining Estimate: 0h > > DDLTask is a huge class, more than 5000 lines long. The related DDLWork is > also a huge class, which has a field for each DDL operation it supports. The > goal is to refactor these in order to have everything cut into more > handleable classes under the package org.apache.hadoop.hive.ql.exec.ddl: > * have a separate class for each operation > * have a package for each operation group (database ddl, table ddl, etc), so > the amount of classes under a package is more manageable > * make all the requests (DDLDesc subclasses) immutable > * DDLTask should be agnostic to the actual operations > * right now let's ignore the issue of having some operations handled by > DDLTask which are not actual DDL operations (lock, unlock, desc...) > In the interim time when there are two DDLTask and DDLWork classes in the > code base the new ones in the new package are called DDLTask2 and DDLWork2 > thus avoiding the usage of fully qualified class names where both the old and > the new classes are in use. > Step #1: extract all the database related operations from the old DDLTask, > and move them under the new package. Also create the new internal framework. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21292) Break up DDLTask 1 - extract Database related operations
[ https://issues.apache.org/jira/browse/HIVE-21292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Gergely updated HIVE-21292: -- Attachment: HIVE-21292.07.patch > Break up DDLTask 1 - extract Database related operations > > > Key: HIVE-21292 > URL: https://issues.apache.org/jira/browse/HIVE-21292 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 3.1.1 >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-21292.01.patch, HIVE-21292.02.patch, > HIVE-21292.03.patch, HIVE-21292.04.patch, HIVE-21292.05.patch, > HIVE-21292.06.patch, HIVE-21292.07.patch > > Time Spent: 7h > Remaining Estimate: 0h > > DDLTask is a huge class, more than 5000 lines long. The related DDLWork is > also a huge class, which has a field for each DDL operation it supports. The > goal is to refactor these in order to have everything cut into more > handleable classes under the package org.apache.hadoop.hive.ql.exec.ddl: > * have a separate class for each operation > * have a package for each operation group (database ddl, table ddl, etc), so > the amount of classes under a package is more manageable > * make all the requests (DDLDesc subclasses) immutable > * DDLTask should be agnostic to the actual operations > * right now let's ignore the issue of having some operations handled by > DDLTask which are not actual DDL operations (lock, unlock, desc...) > In the interim time when there are two DDLTask and DDLWork classes in the > code base the new ones in the new package are called DDLTask2 and DDLWork2 > thus avoiding the usage of fully qualified class names where both the old and > the new classes are in use. > Step #1: extract all the database related operations from the old DDLTask, > and move them under the new package. Also create the new internal framework. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21292) Break up DDLTask 1 - extract Database related operations
[ https://issues.apache.org/jira/browse/HIVE-21292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Gergely updated HIVE-21292: -- Status: Open (was: Patch Available) > Break up DDLTask 1 - extract Database related operations > > > Key: HIVE-21292 > URL: https://issues.apache.org/jira/browse/HIVE-21292 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 3.1.1 >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-21292.01.patch, HIVE-21292.02.patch, > HIVE-21292.03.patch, HIVE-21292.04.patch, HIVE-21292.05.patch, > HIVE-21292.06.patch > > Time Spent: 7h > Remaining Estimate: 0h > > DDLTask is a huge class, more than 5000 lines long. The related DDLWork is > also a huge class, which has a field for each DDL operation it supports. The > goal is to refactor these in order to have everything cut into more > handleable classes under the package org.apache.hadoop.hive.ql.exec.ddl: > * have a separate class for each operation > * have a package for each operation group (database ddl, table ddl, etc), so > the amount of classes under a package is more manageable > * make all the requests (DDLDesc subclasses) immutable > * DDLTask should be agnostic to the actual operations > * right now let's ignore the issue of having some operations handled by > DDLTask which are not actual DDL operations (lock, unlock, desc...) > In the interim time when there are two DDLTask and DDLWork classes in the > code base the new ones in the new package are called DDLTask2 and DDLWork2 > thus avoiding the usage of fully qualified class names where both the old and > the new classes are in use. > Step #1: extract all the database related operations from the old DDLTask, > and move them under the new package. Also create the new internal framework. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21293) Fix ambiguity in grammar warnings at compilation time (II)
[ https://issues.apache.org/jira/browse/HIVE-21293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776328#comment-16776328 ] Ashutosh Chauhan commented on HIVE-21293: - {{unknown}} needs to be non-reserved for this feature to be included. Otherwise, the resulting ambiguity in the grammar is not worth including this feature. Although it is reserved in the standard, making it a reserved keyword here would be a backward incompatible change. > Fix ambiguity in grammar warnings at compilation time (II) > -- > > Key: HIVE-21293 > URL: https://issues.apache.org/jira/browse/HIVE-21293 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 4.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Laszlo Bodor >Priority: Major > Attachments: HIVE-21293.01.patch > > > These are the warnings at compilation time: > {code} > warning(200): IdentifiersParser.g:424:5: > Decision can match input such as "KW_UNKNOWN" using multiple alternatives: 1, > 10 > As a result, alternative(s) 10 were disabled for that input > {code} > This means that multiple parser rules can match certain query text, possibly > leading to unexpected errors at parsing time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21313) Use faster function to point to instead of copy immutable byte arrays
[ https://issues.apache.org/jira/browse/HIVE-21313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776314#comment-16776314 ] Hive QA commented on HIVE-21313: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12959947/HIVE-21313.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 99 failed/errored test(s), 15811 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries] (batchId=267) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[allcolref_in_udf] (batchId=57) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join11] (batchId=10) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join12] (batchId=26) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join13] (batchId=87) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join15] (batchId=17) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join20] (batchId=96) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join22] (batchId=61) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join26] (batchId=15) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join29] (batchId=59) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join31] (batchId=48) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join_stats2] (batchId=94) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ba_table_udfs] (batchId=26) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_udf_max] (batchId=2) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[correlationoptimizer7] (batchId=23) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join30] (batchId=84) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join40] (batchId=58) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[lateral_view_outer] (batchId=46) 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin47] (batchId=64) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_filter_on_outerjoin] (batchId=66) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_memcheck] (batchId=45) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_subquery] (batchId=55) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_test_outer] (batchId=1) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_10] (batchId=46) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_12] (batchId=1) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_13] (batchId=80) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_1] (batchId=91) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_1_newdb] (batchId=12) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_2] (batchId=94) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_5] (batchId=31) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_6] (batchId=28) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_7] (batchId=47) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_8] (batchId=8) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_disablecbo_1] (batchId=55) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_disablecbo_2] (batchId=48) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_mv] (batchId=88) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[skewjoin_mapjoin5] (batchId=68) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[skewjoin_mapjoin6] (batchId=65) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[skewjoin_mapjoin8] (batchId=41) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[skewjoinopt10] (batchId=22) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_max] (batchId=52) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_min] (batchId=44) 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sentences] (batchId=42) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_pos_alias] (batchId=57) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_remove_12] (batchId=47) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_remove_13] (batchId=94) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_remove_14] (batchId=14) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_binary_join_groupby] (batchId=88) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_case_when_2] (batchId=58) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_varchar_mapjoin1] (batchId=27) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_math_funcs] (batchId=22) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_parquet_types] (batchId=72) org.apache.hadoop.hive.cli.TestCompareCliDriver.
[jira] [Commented] (HIVE-21313) Use faster function to point to instead of copy immutable byte arrays
[ https://issues.apache.org/jira/browse/HIVE-21313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776301#comment-16776301 ] Hive QA commented on HIVE-21313: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 49s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 6s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 7s{color} | {color:blue} ql in master has 2261 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 10s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 44s{color} | {color:red} ql: The patch generated 10 new + 444 unchanged - 10 fixed = 454 total (was 454) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 26m 8s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16222/dev-support/hive-personality.sh | | git revision | master / 2daaed7 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-16222/yetus/diff-checkstyle-ql.txt | | modules | C: ql U: ql | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-16222/yetus.txt | | Powered by | Apache Yetushttp://yetus.apache.org | This message was automatically generated. 
> Use faster function to point to instead of copy immutable byte arrays > - > > Key: HIVE-21313 > URL: https://issues.apache.org/jira/browse/HIVE-21313 > Project: Hive > Issue Type: Improvement >Affects Versions: All Versions >Reporter: ZhangXin >Assignee: ZhangXin >Priority: Minor > Labels: pull-request-available > Fix For: All Versions > > Attachments: HIVE-21313.patch, HIVE-21313.patch > > Time Spent: 10m > Remaining Estimate: 0h > > In file ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorAssignRow.java > we may find code like this: > ``` > Text text = (Text) convertTargetWritable; > if (text == null) { > text = new Text(); > } > text.set(string); > ((BytesColumnVector) columnVector).setVal( > batchIndex, text.getBytes(), 0, text.getLength()); > ``` > > The `setVal` method copies the byte array returned by > `text.getBytes()`. This copy is unnecessary: since the byte array is > immutable, we can use the `setRef` method instead to point at the specific byte > array, which also lowers memory usage. > > Pull request on GitHub: https://github.com/apache/hive/pull/548 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
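To make the copy-versus-reference distinction concrete, here is a minimal, self-contained sketch. `MiniBytesColumnVector` and its fields are illustrative assumptions that mimic the semantics of Hive's `BytesColumnVector.setVal`/`setRef`; this is not Hive's actual implementation.

```java
import java.util.Arrays;

public class SetRefVsSetVal {
    // Hypothetical stand-in for Hive's BytesColumnVector.
    static class MiniBytesColumnVector {
        byte[][] vector = new byte[16][];
        int[] start = new int[16];
        int[] length = new int[16];

        // setVal copies the source bytes into a fresh array (extra allocation per row).
        void setVal(int row, byte[] src, int off, int len) {
            vector[row] = Arrays.copyOfRange(src, off, off + len);
            start[row] = 0;
            length[row] = len;
        }

        // setRef just points at the caller's array (no copy) -- safe only when
        // the caller never mutates the array afterwards, i.e. it is effectively immutable.
        void setRef(int row, byte[] src, int off, int len) {
            vector[row] = src;
            start[row] = off;
            length[row] = len;
        }
    }

    public static void main(String[] args) {
        byte[] payload = "immutable".getBytes();
        MiniBytesColumnVector cv = new MiniBytesColumnVector();

        cv.setVal(0, payload, 0, payload.length); // row 0 holds a private copy
        cv.setRef(1, payload, 0, payload.length); // row 1 aliases payload

        System.out.println(cv.vector[0] == payload); // false: copied
        System.out.println(cv.vector[1] == payload); // true: referenced
    }
}
```

If the source bytes can change later (as with a reused `Text` buffer), `setRef` would silently corrupt the row, which is why the optimization only applies when the array is known to be immutable.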
[jira] [Commented] (HIVE-15475) JsonSerDe cannot handle json file with empty lines
[ https://issues.apache.org/jira/browse/HIVE-15475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776298#comment-16776298 ] BELUGA BEHR commented on HIVE-15475: Nope. OK. Figured it out. This issue was inadvertently fixed as part of [HIVE-18545] (Jul 10, 2018). Prior to this change, JSON parsing was handled by {{org.apache.hive.hcatalog.data.JsonSerDe}}. The issue was that this class was not handling the provided {{Text}} object correctly. A {{Text}} object has two components: an internal array of bytes *and* a size that indicates how many of those bytes are valid. {{JsonSerDe}} was not taking the size into account, so when a zero-length {{Text}} object was submitted, it would still parse the entire internal byte array, ignoring the zero size, and produce duplicate rows where there should be no text. https://github.com/apache/hive/blob/ae008b79b5d52ed6a38875b73025a505725828eb/hcatalog/core/src/main/java/org/apache/hive/hcatalog/data/JsonSerDe.java#L168 > JsonSerDe cannot handle json file with empty lines > -- > > Key: HIVE-15475 > URL: https://issues.apache.org/jira/browse/HIVE-15475 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.1 >Reporter: pin_zhang >Priority: Major > > 1. start HiveServer2 in apache-hive-1.2.1 > 2 start a beeline connect to hive server2 > ADD JAR > /home/apache-hive-1.2.1-bin/hcatalog/share/hcatalog/hive-hcatalog-core-1.2.1.jar > ; >CREATE external TABLE my_table(a string, b bigint) > ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' > STORED AS TEXTFILE > location 'file:///home/hive/json'; > 3 put a file with more than one new lines at the end of the file > {"a":"a_1", "b" : 1} > 4 run sql > select * from my_table ; > +-+-+--+ > | my_table.a | my_table.b | > +-+-+--+ > | a_1 | 1 | > | a_1 | 1 | > | a_1 | 1 | > | a_1 | 1 | > | a_1 | 1 | > +-+-+--+ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
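The size-versus-backing-array trap described above can be reproduced with a small sketch. `FakeText` is a hypothetical stand-in that mimics how Hadoop's `Text` reuses its internal buffer; the helper names are assumptions for illustration, not Hive's actual code.

```java
public class TextLengthDemo {
    // Hypothetical mock of org.apache.hadoop.io.Text's buffer reuse:
    // getBytes() returns the backing array, which may contain stale bytes
    // past getLength() after the object is reused for a shorter record.
    static class FakeText {
        private byte[] bytes = new byte[0];
        private int length = 0;

        void set(byte[] src) {
            if (bytes.length < src.length) bytes = new byte[src.length];
            System.arraycopy(src, 0, bytes, 0, src.length);
            length = src.length; // shrinking leaves stale bytes in place
        }
        byte[] getBytes() { return bytes; }  // backing array, may exceed length
        int getLength()   { return length; }
    }

    // Buggy pattern: trusts the array size, so stale bytes reappear.
    static String buggyRead(FakeText t) {
        return new String(t.getBytes());
    }

    // Fixed pattern: honours getLength(), as the post-HIVE-18545 code does.
    static String fixedRead(FakeText t) {
        return new String(t.getBytes(), 0, t.getLength());
    }

    public static void main(String[] args) {
        FakeText t = new FakeText();
        t.set("{\"a\":\"a_1\"}".getBytes()); // a real JSON line is read first
        t.set(new byte[0]);                  // then an empty line reuses the buffer
        System.out.println(buggyRead(t));    // prints the stale JSON again
        System.out.println(fixedRead(t));    // prints an empty string: zero length
    }
}
```

This is exactly the duplicate-row symptom in the bug report: every empty line re-parses the previous record's leftover bytes.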
[jira] [Resolved] (HIVE-15475) JsonSerDe cannot handle json file with empty lines
[ https://issues.apache.org/jira/browse/HIVE-15475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR resolved HIVE-15475. Resolution: Fixed > JsonSerDe cannot handle json file with empty lines > -- > > Key: HIVE-15475 > URL: https://issues.apache.org/jira/browse/HIVE-15475 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.1 >Reporter: pin_zhang >Priority: Major > > 1. start HiveServer2 in apache-hive-1.2.1 > 2 start a beeline connect to hive server2 > ADD JAR > /home/apache-hive-1.2.1-bin/hcatalog/share/hcatalog/hive-hcatalog-core-1.2.1.jar > ; >CREATE external TABLE my_table(a string, b bigint) > ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' > STORED AS TEXTFILE > location 'file:///home/hive/json'; > 3 put a file with more than one new lines at the end of the file > {"a":"a_1", "b" : 1} > 4 run sql > select * from my_table ; > +-+-+--+ > | my_table.a | my_table.b | > +-+-+--+ > | a_1 | 1 | > | a_1 | 1 | > | a_1 | 1 | > | a_1 | 1 | > | a_1 | 1 | > +-+-+--+ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21313) Use faster function to point to instead of copy immutable byte arrays
[ https://issues.apache.org/jira/browse/HIVE-21313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangXin updated HIVE-21313: Fix Version/s: All Versions Affects Version/s: All Versions Attachment: HIVE-21313.patch Target Version/s: All Versions Status: Patch Available (was: Open) > Use faster function to point to instead of copy immutable byte arrays > - > > Key: HIVE-21313 > URL: https://issues.apache.org/jira/browse/HIVE-21313 > Project: Hive > Issue Type: Improvement >Affects Versions: All Versions >Reporter: ZhangXin >Assignee: ZhangXin >Priority: Minor > Labels: pull-request-available > Fix For: All Versions > > Attachments: HIVE-21313.patch, HIVE-21313.patch > > Time Spent: 10m > Remaining Estimate: 0h > > In file ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorAssignRow.java > we may find code like this: > ``` > Text text = (Text) convertTargetWritable; > if (text == null) { > text = new Text(); > } > text.set(string); > ((BytesColumnVector) columnVector).setVal( > batchIndex, text.getBytes(), 0, text.getLength()); > ``` > > The `setVal` method copies the byte array returned by > `text.getBytes()`. This copy is unnecessary: since the byte array is > immutable, we can use the `setRef` method instead to point at the specific byte > array, which also lowers memory usage. > > Pull request on GitHub: https://github.com/apache/hive/pull/548 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21312) FSStatsAggregator::connect is slow
[ https://issues.apache.org/jira/browse/HIVE-21312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776271#comment-16776271 ] Hive QA commented on HIVE-21312: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 37s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 7s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 42s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 14s{color} | {color:blue} ql in master has 2261 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 45s{color} | {color:red} ql: The patch generated 3 new + 13 unchanged - 1 fixed = 16 total (was 14) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 25m 38s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16221/dev-support/hive-personality.sh | | git revision | master / 2daaed7 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-16221/yetus/diff-checkstyle-ql.txt | | modules | C: ql U: ql | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-16221/yetus.txt | | Powered by | Apache Yetus http://yetus.apache.org | This message was automatically generated. > FSStatsAggregator::connect is slow > -- > > Key: HIVE-21312 > URL: https://issues.apache.org/jira/browse/HIVE-21312 > Project: Hive > Issue Type: Improvement > Components: Statistics >Reporter: Rajesh Balamohan >Priority: Trivial > Attachments: HIVE-21312.1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21313) Use faster function to point to instead of copy immutable byte arrays
[ https://issues.apache.org/jira/browse/HIVE-21313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776275#comment-16776275 ] Zoltan Haindrich commented on HIVE-21313: - [~ZhangxinJson]: please press the "Submit Patch" button to have Hive QA test your patch > Use faster function to point to instead of copy immutable byte arrays > - > > Key: HIVE-21313 > URL: https://issues.apache.org/jira/browse/HIVE-21313 > Project: Hive > Issue Type: Improvement >Reporter: ZhangXin >Assignee: ZhangXin >Priority: Minor > Labels: pull-request-available > Attachments: HIVE-21313.patch > > Time Spent: 10m > Remaining Estimate: 0h > > In file ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorAssignRow.java > we may find code like this: > ``` > Text text = (Text) convertTargetWritable; > if (text == null) { > text = new Text(); > } > text.set(string); > ((BytesColumnVector) columnVector).setVal( > batchIndex, text.getBytes(), 0, text.getLength()); > ``` > > The `setVal` method copies the byte array returned by > `text.getBytes()`. This copy is unnecessary: since the byte array is > immutable, we can use the `setRef` method instead to point at the specific byte > array, which also lowers memory usage. > > Pull request on GitHub: https://github.com/apache/hive/pull/548 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
[ https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203245&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203245 ] ASF GitHub Bot logged work on HIVE-21197: - Author: ASF GitHub Bot Created on: 24/Feb/19 14:20 Start Date: 24/Feb/19 14:20 Worklog Time Spent: 10m Work Description: maheshk114 commented on pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled URL: https://github.com/apache/hive/pull/541#discussion_r259622969 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java ## @@ -661,6 +663,10 @@ public int execute(DriverContext driverContext) { if (work.getAlterMaterializedViewDesc() != null) { return alterMaterializedView(db, work.getAlterMaterializedViewDesc()); } + + if (work.getReplSetFirstIncLoadFlagDesc() != null) { Review comment: it's done in a separate task to make it simpler. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 203245) Time Spent: 11h (was: 10h 50m) > Hive replication can add duplicate data during migration to a target with > hive.strict.managed.tables enabled > > > Key: HIVE-21197 > URL: https://issues.apache.org/jira/browse/HIVE-21197 > Project: Hive > Issue Type: Task > Components: repl >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch > > Time Spent: 11h > Remaining Estimate: 0h > > During the bootstrap phase it may happen that the files copied to target are > created by events which are not part of the bootstrap. This is because of the > fact that bootstrap first gets the last event id and then the file list. 
> During this period, if some events are added, then bootstrap will include files > created by these events also. The same files will be copied again during the > first incremental replication just after the bootstrap. In a normal scenario, > the duplicate copy does not cause any issue, as Hive allows the use of the target > database only after the first incremental. But in case of migration, the files > at source and target are copied to different locations (based on the write id > at target) and thus this may lead to duplicate data at target. This can be > avoided by having a check at load time for duplicate files. This check can be > done only for the first incremental, and the search can be done in the > bootstrap directory (with write id 1). If the file is already present, then > just ignore the copy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
[ https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203236&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203236 ] ASF GitHub Bot logged work on HIVE-21197: - Author: ASF GitHub Bot Created on: 24/Feb/19 14:02 Start Date: 24/Feb/19 14:02 Worklog Time Spent: 10m Work Description: maheshk114 commented on pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled URL: https://github.com/apache/hive/pull/541#discussion_r259622204 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java ## @@ -1536,6 +1536,62 @@ public void testCompactionInfoHashCode() { Assert.assertEquals("The hash codes must be equal", compactionInfo.hashCode(), compactionInfo1.hashCode()); } + @Test + public void testDisableCompactionDuringReplLoad() throws Exception { +String tblName = "discomp"; +String database = "discomp_db"; +executeStatementOnDriver("drop database if exists " + database + " cascade", driver); +executeStatementOnDriver("create database " + database, driver); +executeStatementOnDriver("CREATE TABLE " + database + "." + tblName + "(a INT, b STRING) " + +" PARTITIONED BY(ds string)" + +" CLUSTERED BY(a) INTO 2 BUCKETS" + //currently ACID requires table to be bucketed +" STORED AS ORC TBLPROPERTIES ('transactional'='true')", driver); +executeStatementOnDriver("insert into " + database + "." + tblName + " partition (ds) values (1, 'fred', " + +"'today'), (2, 'wilma', 'yesterday')", driver); + +executeStatementOnDriver("ALTER TABLE " + database + "." + tblName + +" SET TBLPROPERTIES ( 'hive.repl.first.inc.pending' = 'true')", driver); +List compacts = getCompactionList(); +Assert.assertEquals(0, compacts.size()); + +executeStatementOnDriver("alter database " + database + +" set dbproperties ('hive.repl.first.inc.pending' = 'true')", driver); +executeStatementOnDriver("ALTER TABLE " + database + "." 
+ tblName + Review comment: table level is taken care of now This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 203236) Time Spent: 9h 40m (was: 9.5h) > Hive replication can add duplicate data during migration to a target with > hive.strict.managed.tables enabled > > > Key: HIVE-21197 > URL: https://issues.apache.org/jira/browse/HIVE-21197 > Project: Hive > Issue Type: Task > Components: repl >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch > > Time Spent: 9h 40m > Remaining Estimate: 0h > > During bootstrap phase it may happen that the files copied to target are > created by events which are not part of the bootstrap. This is because of the > fact that, bootstrap first gets the last event id and then the file list. > During this period if some event are added, then bootstrap will include files > created by these events also.The same files will be copied again during the > first incremental replication just after the bootstrap. In normal scenario, > the duplicate copy does not cause any issue as hive allows the use of target > database only after the first incremental. But in case of migration, the file > at source and target are copied to different location (based on the write id > at target) and thus this may lead to duplicate data at target. This can be > avoided by having at check at load time for duplicate file. This check can be > done only for the first incremental and the search can be done in the > bootstrap directory (with write id 1). if the file is already present then > just ignore the copy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
[ https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203246&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203246 ] ASF GitHub Bot logged work on HIVE-21197: - Author: ASF GitHub Bot Created on: 24/Feb/19 14:21 Start Date: 24/Feb/19 14:21 Worklog Time Spent: 10m Work Description: maheshk114 commented on pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled URL: https://github.com/apache/hive/pull/541#discussion_r259622991 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java ## @@ -112,6 +118,12 @@ public void run() { continue; } + if (replIsCompactionDisabledForTable(t)) { Review comment: not done This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 203246) Time Spent: 11h 10m (was: 11h) > Hive replication can add duplicate data during migration to a target with > hive.strict.managed.tables enabled > > > Key: HIVE-21197 > URL: https://issues.apache.org/jira/browse/HIVE-21197 > Project: Hive > Issue Type: Task > Components: repl >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch > > Time Spent: 11h 10m > Remaining Estimate: 0h > > During bootstrap phase it may happen that the files copied to target are > created by events which are not part of the bootstrap. This is because of the > fact that, bootstrap first gets the last event id and then the file list. 
> During this period if some event are added, then bootstrap will include files > created by these events also.The same files will be copied again during the > first incremental replication just after the bootstrap. In normal scenario, > the duplicate copy does not cause any issue as hive allows the use of target > database only after the first incremental. But in case of migration, the file > at source and target are copied to different location (based on the write id > at target) and thus this may lead to duplicate data at target. This can be > avoided by having at check at load time for duplicate file. This check can be > done only for the first incremental and the search can be done in the > bootstrap directory (with write id 1). if the file is already present then > just ignore the copy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
[ https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203243&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203243 ] ASF GitHub Bot logged work on HIVE-21197: - Author: ASF GitHub Bot Created on: 24/Feb/19 14:07 Start Date: 24/Feb/19 14:07 Worklog Time Spent: 10m Work Description: maheshk114 commented on pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled URL: https://github.com/apache/hive/pull/541#discussion_r259622407 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/incremental/IncrementalLoadTasksBuilder.java ## @@ -164,6 +165,12 @@ public IncrementalLoadTasksBuilder(String dbName, String tableName, String loadP lastEventid); } } + + ReplSetFirstIncLoadFlagDesc desc = new ReplSetFirstIncLoadFlagDesc(dbName, tableName); Review comment: new ddl task is created to make it simpler with table level and warehouse level support This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 203243) Time Spent: 10h 40m (was: 10.5h) > Hive replication can add duplicate data during migration to a target with > hive.strict.managed.tables enabled > > > Key: HIVE-21197 > URL: https://issues.apache.org/jira/browse/HIVE-21197 > Project: Hive > Issue Type: Task > Components: repl >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch > > Time Spent: 10h 40m > Remaining Estimate: 0h > > During bootstrap phase it may happen that the files copied to target are > created by events which are not part of the bootstrap. 
This is because of the > fact that, bootstrap first gets the last event id and then the file list. > During this period if some event are added, then bootstrap will include files > created by these events also.The same files will be copied again during the > first incremental replication just after the bootstrap. In normal scenario, > the duplicate copy does not cause any issue as hive allows the use of target > database only after the first incremental. But in case of migration, the file > at source and target are copied to different location (based on the write id > at target) and thus this may lead to duplicate data at target. This can be > avoided by having at check at load time for duplicate file. This check can be > done only for the first incremental and the search can be done in the > bootstrap directory (with write id 1). if the file is already present then > just ignore the copy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21312) FSStatsAggregator::connect is slow
[ https://issues.apache.org/jira/browse/HIVE-21312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776255#comment-16776255 ] Zoltan Haindrich commented on HIVE-21312: - is {{statsList.add(stats)}} thread-safe? I think it's a simple ArrayList. Note: I think another approach would be, instead of working with the FS-based mechanism, to use Tez counters to haul this data. > FSStatsAggregator::connect is slow > -- > > Key: HIVE-21312 > URL: https://issues.apache.org/jira/browse/HIVE-21312 > Project: Hive > Issue Type: Improvement > Components: Statistics >Reporter: Rajesh Balamohan >Priority: Trivial > Attachments: HIVE-21312.1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
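As a sketch of the thread-safety concern raised above (assuming, per the comment, that multiple threads call `add` on a shared stats list — `SafeStatsCollect` and its names are illustrative, not the patch's actual code), wrapping the list with `Collections.synchronizedList` avoids lost updates; a plain `ArrayList` gives no such guarantee under concurrent `add()`:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SafeStatsCollect {
    // Spawns `threads` workers that each append `perThread` entries to `sink`,
    // waits for them to finish, and returns the final size.
    public static int collect(List<String> sink, int threads, int perThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) sink.add("stat");
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sink.size();
    }

    public static void main(String[] args) {
        // A bare `new ArrayList<>()` here could drop entries or throw under
        // concurrent add(); the synchronized wrapper makes every add atomic.
        List<String> safe = Collections.synchronizedList(new ArrayList<>());
        System.out.println(collect(safe, 8, 1000)); // 8000, no lost updates
    }
}
```

A `ConcurrentLinkedQueue` or per-thread lists merged at the end would also work and avoid lock contention on the hot path.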
[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
[ https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203235&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203235 ] ASF GitHub Bot logged work on HIVE-21197: - Author: ASF GitHub Bot Created on: 24/Feb/19 14:02 Start Date: 24/Feb/19 14:02 Worklog Time Spent: 10m Work Description: maheshk114 commented on pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled URL: https://github.com/apache/hive/pull/541#discussion_r259622190 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java ## @@ -271,12 +299,13 @@ public String getName() { LOG.debug("ReplCopyTask:getLoadCopyTask: {}=>{}", srcPath, dstPath); if ((replicationSpec != null) && replicationSpec.isInReplicationScope()){ ReplCopyWork rcwork = new ReplCopyWork(srcPath, dstPath, false); - if (replicationSpec.isReplace() && conf.getBoolVar(REPL_ENABLE_MOVE_OPTIMIZATION)) { + if (replicationSpec.isReplace() && (conf.getBoolVar(REPL_ENABLE_MOVE_OPTIMIZATION) || copyToMigratedTxnTable)) { rcwork.setDeleteDestIfExist(true); Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 203235) Time Spent: 9.5h (was: 9h 20m) > Hive replication can add duplicate data during migration to a target with > hive.strict.managed.tables enabled > > > Key: HIVE-21197 > URL: https://issues.apache.org/jira/browse/HIVE-21197 > Project: Hive > Issue Type: Task > Components: repl >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch > > Time Spent: 9.5h > Remaining Estimate: 0h > > During bootstrap phase it may happen that the files copied to target are > created by events which are not part of the bootstrap. This is because of the > fact that, bootstrap first gets the last event id and then the file list. > During this period if some event are added, then bootstrap will include files > created by these events also.The same files will be copied again during the > first incremental replication just after the bootstrap. In normal scenario, > the duplicate copy does not cause any issue as hive allows the use of target > database only after the first incremental. But in case of migration, the file > at source and target are copied to different location (based on the write id > at target) and thus this may lead to duplicate data at target. This can be > avoided by having at check at load time for duplicate file. This check can be > done only for the first incremental and the search can be done in the > bootstrap directory (with write id 1). if the file is already present then > just ignore the copy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
[ https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203244&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203244 ] ASF GitHub Bot logged work on HIVE-21197: - Author: ASF GitHub Bot Created on: 24/Feb/19 14:07 Start Date: 24/Feb/19 14:07 Worklog Time Spent: 10m Work Description: maheshk114 commented on pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled URL: https://github.com/apache/hive/pull/541#discussion_r259622413 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/LoadDatabase.java ## @@ -158,6 +159,15 @@ private boolean isDbEmpty(String dbName) throws HiveException { // Add the checkpoint key to the Database binding it to current dump directory. // So, if retry using same dump, we shall skip Database object update. parameters.put(ReplUtils.REPL_CHECKPOINT_KEY, dumpDirectory); + +if (needSetIncFlag) { Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 203244) Time Spent: 10h 50m (was: 10h 40m) > Hive replication can add duplicate data during migration to a target with > hive.strict.managed.tables enabled > > > Key: HIVE-21197 > URL: https://issues.apache.org/jira/browse/HIVE-21197 > Project: Hive > Issue Type: Task > Components: repl >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch > > Time Spent: 10h 50m > Remaining Estimate: 0h > > During bootstrap phase it may happen that the files copied to target are > created by events which are not part of the bootstrap. 
This is because of the > fact that, bootstrap first gets the last event id and then the file list. > During this period if some event are added, then bootstrap will include files > created by these events also.The same files will be copied again during the > first incremental replication just after the bootstrap. In normal scenario, > the duplicate copy does not cause any issue as hive allows the use of target > database only after the first incremental. But in case of migration, the file > at source and target are copied to different location (based on the write id > at target) and thus this may lead to duplicate data at target. This can be > avoided by having at check at load time for duplicate file. This check can be > done only for the first incremental and the search can be done in the > bootstrap directory (with write id 1). if the file is already present then > just ignore the copy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
[ https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203241&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203241 ] ASF GitHub Bot logged work on HIVE-21197: - Author: ASF GitHub Bot Created on: 24/Feb/19 14:06 Start Date: 24/Feb/19 14:06 Worklog Time Spent: 10m Work Description: maheshk114 commented on pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled URL: https://github.com/apache/hive/pull/541#discussion_r259622357 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/LoadDatabase.java ## @@ -135,7 +135,8 @@ private boolean isDbEmpty(String dbName) throws HiveException { } private Task alterDbTask(Database dbObj) { -return alterDbTask(dbObj.getName(), updateDbProps(dbObj, context.dumpDirectory), context.hiveConf); +return alterDbTask(dbObj.getName(), updateDbProps(dbObj, context.dumpDirectory, false), Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 203241) Time Spent: 10.5h (was: 10h 20m) > Hive replication can add duplicate data during migration to a target with > hive.strict.managed.tables enabled > > > Key: HIVE-21197 > URL: https://issues.apache.org/jira/browse/HIVE-21197 > Project: Hive > Issue Type: Task > Components: repl >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch > > Time Spent: 10.5h > Remaining Estimate: 0h > > During bootstrap phase it may happen that the files copied to target are > created by events which are not part of the bootstrap. 
This is because bootstrap first gets the last event id and then the file list. If some events are added during this window, bootstrap will also include the files created by those events. The same files are then copied again during the first incremental replication just after the bootstrap. In a normal scenario the duplicate copy does not cause any issue, since Hive allows use of the target database only after the first incremental. But in the migration case, the files at source and target are copied to different locations (based on the write id at the target), which may lead to duplicate data at the target. This can be avoided by a load-time check for duplicate files. The check is needed only for the first incremental, and the search can be limited to the bootstrap directory (write id 1); if the file is already present, the copy is simply skipped. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
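The load-time duplicate check described above boils down to filtering the source file list against what the bootstrap already wrote under the base directory (write id 1) on the target. Below is a minimal, self-contained sketch of that filtering idea only; the real ReplCopyTask probes a Hadoop FileSystem, and here a plain Set stands in for the exists() call, with made-up file names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch of the duplicate-copy guard: during the first incremental load,
// skip any source file whose name already exists under the bootstrap
// base directory (write id 1) on the target.
class DupCopyCheck {
    // 'existingBaseFiles' stands in for FileSystem.exists() probes against
    // the base directory created by bootstrap with write id 1.
    static List<String> filesToCopy(List<String> srcFiles, Set<String> existingBaseFiles) {
        List<String> toCopy = new ArrayList<>();
        for (String file : srcFiles) {
            if (!existingBaseFiles.contains(file)) {
                toCopy.add(file); // not copied by bootstrap, safe to copy now
            }
        }
        return toCopy;
    }
}
```

The actual patch walks the list with a ListIterator and removes duplicates in place, but the filtering decision is the same.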
[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
[ https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203229&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203229 ] ASF GitHub Bot logged work on HIVE-21197: - Author: ASF GitHub Bot Created on: 24/Feb/19 13:55 Start Date: 24/Feb/19 13:55 Worklog Time Spent: 10m Work Description: maheshk114 commented on pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled URL: https://github.com/apache/hive/pull/541#discussion_r259621882 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/incremental/IncrementalLoadTasksBuilder.java ## @@ -289,12 +296,21 @@ private boolean shouldReplayEvent(FileStatus dir, DumpType dumpType, String dbNa return updateReplIdTask; } - private Task dbUpdateReplStateTask(String dbName, String replState, + private Task dbUpdateReplStateTask(String dbName, String replState, String incLoadPendFlag, Review comment: the code is removed This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 203229) Time Spent: 9h 10m (was: 9h) > Hive replication can add duplicate data during migration to a target with > hive.strict.managed.tables enabled > > > Key: HIVE-21197 > URL: https://issues.apache.org/jira/browse/HIVE-21197 > Project: Hive > Issue Type: Task > Components: repl >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch > > Time Spent: 9h 10m > Remaining Estimate: 0h > > During bootstrap phase it may happen that the files copied to target are > created by events which are not part of the bootstrap. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
[ https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203237&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203237 ] ASF GitHub Bot logged work on HIVE-21197: - Author: ASF GitHub Bot Created on: 24/Feb/19 14:02 Start Date: 24/Feb/19 14:02 Worklog Time Spent: 10m Work Description: maheshk114 commented on pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled URL: https://github.com/apache/hive/pull/541#discussion_r259622210 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java ## @@ -1536,6 +1536,62 @@ public void testCompactionInfoHashCode() { Assert.assertEquals("The hash codes must be equal", compactionInfo.hashCode(), compactionInfo1.hashCode()); } + @Test + public void testDisableCompactionDuringReplLoad() throws Exception { +String tblName = "discomp"; +String database = "discomp_db"; +executeStatementOnDriver("drop database if exists " + database + " cascade", driver); +executeStatementOnDriver("create database " + database, driver); +executeStatementOnDriver("CREATE TABLE " + database + "." + tblName + "(a INT, b STRING) " + +" PARTITIONED BY(ds string)" + +" CLUSTERED BY(a) INTO 2 BUCKETS" + //currently ACID requires table to be bucketed +" STORED AS ORC TBLPROPERTIES ('transactional'='true')", driver); +executeStatementOnDriver("insert into " + database + "." + tblName + " partition (ds) values (1, 'fred', " + +"'today'), (2, 'wilma', 'yesterday')", driver); + +executeStatementOnDriver("ALTER TABLE " + database + "." + tblName + +" SET TBLPROPERTIES ( 'hive.repl.first.inc.pending' = 'true')", driver); +List compacts = getCompactionList(); +Assert.assertEquals(0, compacts.size()); + +executeStatementOnDriver("alter database " + database + +" set dbproperties ('hive.repl.first.inc.pending' = 'true')", driver); +executeStatementOnDriver("ALTER TABLE " + database + "." 
+ tblName + +" SET TBLPROPERTIES ( 'hive.repl.first.inc.pending' = 'false')", driver); +compacts = getCompactionList(); +Assert.assertEquals(0, compacts.size()); + +executeStatementOnDriver("alter database " + database + +" set dbproperties ('hive.repl.first.inc.pending' = 'false')", driver); +executeStatementOnDriver("ALTER TABLE " + database + "." + tblName + Review comment: table level is taken care of now This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 203237) Time Spent: 9h 50m (was: 9h 40m) > Hive replication can add duplicate data during migration to a target with > hive.strict.managed.tables enabled > > > Key: HIVE-21197 > URL: https://issues.apache.org/jira/browse/HIVE-21197 > Project: Hive > Issue Type: Task > Components: repl >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch > > Time Spent: 9h 50m > Remaining Estimate: 0h > > During bootstrap phase it may happen that the files copied to target are > created by events which are not part of the bootstrap. This is because of the > fact that, bootstrap first gets the last event id and then the file list. > During this period if some event are added, then bootstrap will include files > created by these events also.The same files will be copied again during the > first incremental replication just after the bootstrap. In normal scenario, > the duplicate copy does not cause any issue as hive allows the use of target > database only after the first incremental. But in case of migration, the file > at source and target are copied to different location (based on the write id > at target) and thus this may lead to duplicate data at target. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
[ https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203240&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203240 ] ASF GitHub Bot logged work on HIVE-21197: - Author: ASF GitHub Bot Created on: 24/Feb/19 14:03 Start Date: 24/Feb/19 14:03 Worklog Time Spent: 10m Work Description: maheshk114 commented on pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled URL: https://github.com/apache/hive/pull/541#discussion_r259622274 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSpec.java ## @@ -426,4 +427,14 @@ public static void copyLastReplId(Map srcParameter, Map Hive replication can add duplicate data during migration to a target with > hive.strict.managed.tables enabled > > > Key: HIVE-21197 > URL: https://issues.apache.org/jira/browse/HIVE-21197 > Project: Hive > Issue Type: Task > Components: repl >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch > > Time Spent: 10h 20m > Remaining Estimate: 0h > > During bootstrap phase it may happen that the files copied to target are > created by events which are not part of the bootstrap. This is because of the > fact that, bootstrap first gets the last event id and then the file list. > During this period if some event are added, then bootstrap will include files > created by these events also.The same files will be copied again during the > first incremental replication just after the bootstrap. In normal scenario, > the duplicate copy does not cause any issue as hive allows the use of target > database only after the first incremental. But in case of migration, the file > at source and target are copied to different location (based on the write id > at target) and thus this may lead to duplicate data at target. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
[ https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203238&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203238 ] ASF GitHub Bot logged work on HIVE-21197: - Author: ASF GitHub Bot Created on: 24/Feb/19 14:02 Start Date: 24/Feb/19 14:02 Worklog Time Spent: 10m Work Description: maheshk114 commented on pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled URL: https://github.com/apache/hive/pull/541#discussion_r25960 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MetaStoreCompactorThread.java ## @@ -71,6 +74,16 @@ public void init(AtomicBoolean stop, AtomicBoolean looped) throws Exception { } } + @Override boolean replIsCompactionDisabledForDatabase(String dbName) throws TException { +try { + Database database = rs.getDatabase(getDefaultCatalog(conf), dbName); Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 203238) Time Spent: 10h (was: 9h 50m) > Hive replication can add duplicate data during migration to a target with > hive.strict.managed.tables enabled > > > Key: HIVE-21197 > URL: https://issues.apache.org/jira/browse/HIVE-21197 > Project: Hive > Issue Type: Task > Components: repl >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch > > Time Spent: 10h > Remaining Estimate: 0h > > During bootstrap phase it may happen that the files copied to target are > created by events which are not part of the bootstrap. This is because of the > fact that, bootstrap first gets the last event id and then the file list. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
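The replIsCompactionDisabledForDatabase override discussed above fetches the Database object from the metastore and inspects its parameters. A hedged sketch of the decision it makes follows; the property name is taken from the flag seen elsewhere in this review, and the metastore lookup itself is omitted.

```java
import java.util.Map;

// Sketch: the compactor skips a database while its first incremental
// replication load is still pending, signaled by a database property.
class ReplCompactionGuard {
    static final String REPL_FIRST_INC_PENDING_FLAG = "hive.repl.first.inc.pending";

    // dbParameters would come from Database.getParameters() in the metastore.
    static boolean isCompactionDisabled(Map<String, String> dbParameters) {
        if (dbParameters == null) {
            return false; // no replication metadata: compact as usual
        }
        return "true".equalsIgnoreCase(dbParameters.get(REPL_FIRST_INC_PENDING_FLAG));
    }
}
```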
[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
[ https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203239&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203239 ] ASF GitHub Bot logged work on HIVE-21197: - Author: ASF GitHub Bot Created on: 24/Feb/19 14:03 Start Date: 24/Feb/19 14:03 Worklog Time Spent: 10m Work Description: maheshk114 commented on pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled URL: https://github.com/apache/hive/pull/541#discussion_r259622236 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSpec.java ## @@ -426,4 +427,14 @@ public static void copyLastReplId(Map srcParameter, Map Hive replication can add duplicate data during migration to a target with > hive.strict.managed.tables enabled > > > Key: HIVE-21197 > URL: https://issues.apache.org/jira/browse/HIVE-21197 > Project: Hive > Issue Type: Task > Components: repl >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch > > Time Spent: 10h 10m > Remaining Estimate: 0h > > During bootstrap phase it may happen that the files copied to target are > created by events which are not part of the bootstrap. This is because of the > fact that, bootstrap first gets the last event id and then the file list. > During this period if some event are added, then bootstrap will include files > created by these events also.The same files will be copied again during the > first incremental replication just after the bootstrap. In normal scenario, > the duplicate copy does not cause any issue as hive allows the use of target > database only after the first incremental. But in case of migration, the file > at source and target are copied to different location (based on the write id > at target) and thus this may lead to duplicate data at target. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
[ https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203234&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203234 ] ASF GitHub Bot logged work on HIVE-21197: - Author: ASF GitHub Bot Created on: 24/Feb/19 14:02 Start Date: 24/Feb/19 14:02 Worklog Time Spent: 10m Work Description: maheshk114 commented on pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled URL: https://github.com/apache/hive/pull/541#discussion_r259622183 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java ## @@ -370,6 +370,9 @@ private int executeIncrementalLoad(DriverContext driverContext) { // If incremental events are already applied, then check and perform if need to bootstrap any tables. if (!builder.hasMoreWork() && !work.getPathsToCopyIterator().hasNext()) { +// No need to set incremental load pending flag for external tables as the files will be copied to the same path Review comment: todo not required as table level load is taken care now This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 203234) Time Spent: 9h 20m (was: 9h 10m) > Hive replication can add duplicate data during migration to a target with > hive.strict.managed.tables enabled > > > Key: HIVE-21197 > URL: https://issues.apache.org/jira/browse/HIVE-21197 > Project: Hive > Issue Type: Task > Components: repl >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch > > Time Spent: 9h 20m > Remaining Estimate: 0h > > During bootstrap phase it may happen that the files copied to target are > created by events which are not part of the bootstrap. This is because of the > fact that, bootstrap first gets the last event id and then the file list. > During this period if some event are added, then bootstrap will include files > created by these events also.The same files will be copied again during the > first incremental replication just after the bootstrap. In normal scenario, > the duplicate copy does not cause any issue as hive allows the use of target > database only after the first incremental. But in case of migration, the file > at source and target are copied to different location (based on the write id > at target) and thus this may lead to duplicate data at target. This can be > avoided by having at check at load time for duplicate file. This check can be > done only for the first incremental and the search can be done in the > bootstrap directory (with write id 1). if the file is already present then > just ignore the copy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
[ https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203228&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203228 ] ASF GitHub Bot logged work on HIVE-21197: - Author: ASF GitHub Bot Created on: 24/Feb/19 13:54 Start Date: 24/Feb/19 13:54 Worklog Time Spent: 10m Work Description: maheshk114 commented on pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled URL: https://github.com/apache/hive/pull/541#discussion_r259621844 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/incremental/IncrementalLoadTasksBuilder.java ## @@ -289,12 +296,21 @@ private boolean shouldReplayEvent(FileStatus dir, DumpType dumpType, String dbNa return updateReplIdTask; } - private Task dbUpdateReplStateTask(String dbName, String replState, + private Task dbUpdateReplStateTask(String dbName, String replState, String incLoadPendFlag, Task preCursor) { HashMap mapProp = new HashMap<>(); -mapProp.put(ReplicationSpec.KEY.CURR_STATE_ID.toString(), replState); -AlterDatabaseDesc alterDbDesc = new AlterDatabaseDesc(dbName, mapProp, new ReplicationSpec(replState, replState)); +// if the update is for incLoadPendFlag, then send replicationSpec as null to avoid replacement check. +ReplicationSpec replicationSpec = null; +if (incLoadPendFlag == null) { + mapProp.put(ReplicationSpec.KEY.CURR_STATE_ID.toString(), replState); + replicationSpec = new ReplicationSpec(replState, replState); +} else { + assert replState == null; + mapProp.put(ReplUtils.REPL_FIRST_INC_PENDING_FLAG, incLoadPendFlag); Review comment: done. Dump will fail if the inc pending flag is set to true This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 203228) Time Spent: 9h (was: 8h 50m) > Hive replication can add duplicate data during migration to a target with > hive.strict.managed.tables enabled > > > Key: HIVE-21197 > URL: https://issues.apache.org/jira/browse/HIVE-21197 > Project: Hive > Issue Type: Task > Components: repl >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch > > Time Spent: 9h > Remaining Estimate: 0h > > During bootstrap phase it may happen that the files copied to target are > created by events which are not part of the bootstrap. This is because of the > fact that, bootstrap first gets the last event id and then the file list. > During this period if some event are added, then bootstrap will include files > created by these events also.The same files will be copied again during the > first incremental replication just after the bootstrap. In normal scenario, > the duplicate copy does not cause any issue as hive allows the use of target > database only after the first incremental. But in case of migration, the file > at source and target are copied to different location (based on the write id > at target) and thus this may lead to duplicate data at target. This can be > avoided by having at check at load time for duplicate file. This check can be > done only for the first incremental and the search can be done in the > bootstrap directory (with write id 1). if the file is already present then > just ignore the copy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
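The dbUpdateReplStateTask change quoted above makes the ALTER DATABASE property map carry either the new replication state id or the first-incremental-pending flag, never both (replState is expected to be null on the flag-update path). A simplified standalone version of that branching is sketched below; the key names are assumptions based on the diff fragment, not the exact Hive constants.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified branching from dbUpdateReplStateTask: build the ALTER DATABASE
// property map for either a repl-state update or a pending-flag update.
class AlterDbProps {
    static final String CURR_STATE_ID = "repl.last.id";                     // assumed key name
    static final String REPL_FIRST_INC_PENDING_FLAG = "hive.repl.first.inc.pending";

    static Map<String, String> build(String replState, String incLoadPendFlag) {
        Map<String, String> props = new HashMap<>();
        if (incLoadPendFlag == null) {
            // normal path: record the new replication state id
            props.put(CURR_STATE_ID, replState);
        } else {
            // flag-update path: replState is expected to be null here
            props.put(REPL_FIRST_INC_PENDING_FLAG, incLoadPendFlag);
        }
        return props;
    }
}
```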
[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
[ https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203223&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203223 ] ASF GitHub Bot logged work on HIVE-21197: - Author: ASF GitHub Bot Created on: 24/Feb/19 12:49 Start Date: 24/Feb/19 12:49 Worklog Time Spent: 10m Work Description: maheshk114 commented on pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled URL: https://github.com/apache/hive/pull/541#discussion_r259618842 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java ## @@ -187,4 +192,12 @@ public static PathFilter getEventsDirectoryFilter(final FileSystem fs) { } }; } + + public static boolean isFirstIncDone(Map parameter) { +if (parameter == null) { + return true; +} +String compFlag = parameter.get(ReplUtils.REPL_FIRST_INC_PENDING_FLAG); Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 203223) Time Spent: 8h 50m (was: 8h 40m) > Hive replication can add duplicate data during migration to a target with > hive.strict.managed.tables enabled > > > Key: HIVE-21197 > URL: https://issues.apache.org/jira/browse/HIVE-21197 > Project: Hive > Issue Type: Task > Components: repl >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch > > Time Spent: 8h 50m > Remaining Estimate: 0h > > During bootstrap phase it may happen that the files copied to target are > created by events which are not part of the bootstrap. This is because of the > fact that, bootstrap first gets the last event id and then the file list. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
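The isFirstIncDone helper reviewed in this and the next comment reduces to a null-safe flag lookup: the first incremental is treated as done unless the pending flag is explicitly "true". A minimal sketch, assuming the flag name from the diff:

```java
import java.util.Map;

// Sketch of ReplUtils.isFirstIncDone: null or absent flag means the first
// incremental has completed (or was never pending).
class ReplFlags {
    static final String REPL_FIRST_INC_PENDING_FLAG = "hive.repl.first.inc.pending";

    static boolean isFirstIncDone(Map<String, String> parameters) {
        if (parameters == null) {
            return true; // no parameters recorded: nothing pending
        }
        String flag = parameters.get(REPL_FIRST_INC_PENDING_FLAG);
        return flag == null || !flag.equalsIgnoreCase("true");
    }
}
```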
[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
[ https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203222&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203222 ] ASF GitHub Bot logged work on HIVE-21197: - Author: ASF GitHub Bot Created on: 24/Feb/19 12:49 Start Date: 24/Feb/19 12:49 Worklog Time Spent: 10m Work Description: maheshk114 commented on pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled URL: https://github.com/apache/hive/pull/541#discussion_r259618838 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java ## @@ -187,4 +192,12 @@ public static PathFilter getEventsDirectoryFilter(final FileSystem fs) { } }; } + + public static boolean isFirstIncDone(Map parameter) { Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 203222) Time Spent: 8h 40m (was: 8.5h) > Hive replication can add duplicate data during migration to a target with > hive.strict.managed.tables enabled > > > Key: HIVE-21197 > URL: https://issues.apache.org/jira/browse/HIVE-21197 > Project: Hive > Issue Type: Task > Components: repl >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch > > Time Spent: 8h 40m > Remaining Estimate: 0h > > During bootstrap phase it may happen that the files copied to target are > created by events which are not part of the bootstrap. This is because of the > fact that, bootstrap first gets the last event id and then the file list. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
[ https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203219&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203219 ] ASF GitHub Bot logged work on HIVE-21197: - Author: ASF GitHub Bot Created on: 24/Feb/19 12:23 Start Date: 24/Feb/19 12:23 Worklog Time Spent: 10m Work Description: maheshk114 commented on pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled URL: https://github.com/apache/hive/pull/541#discussion_r259617900 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java ## @@ -61,6 +62,21 @@ public ReplCopyTask(){ super(); } + // If file is already present in base directory, then remove it from the list. + // Check HIVE-21197 for more detail + private void updateSrcFileListForDupCopy(FileSystem dstFs, Path toPath, List srcFiles, + long writeId, int stmtId) throws IOException { +ListIterator iter = srcFiles.listIterator(); +Path basePath = new Path(toPath, AcidUtils.baseOrDeltaSubdir(true, writeId, writeId, stmtId)); +while (iter.hasNext()) { + Path filePath = new Path(basePath, iter.next().getSourcePath().getName()); + if (dstFs.exists(filePath)) { Review comment: the i/o exception retry case is handled specifically at 2 places only. there are many other i/o failure scenarios which are not handled. I think its not required here. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 203219) Time Spent: 8.5h (was: 8h 20m) > Hive replication can add duplicate data during migration to a target with > hive.strict.managed.tables enabled > > > Key: HIVE-21197 > URL: https://issues.apache.org/jira/browse/HIVE-21197 > Project: Hive > Issue Type: Task > Components: repl >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch > > Time Spent: 8.5h > Remaining Estimate: 0h > > During bootstrap phase it may happen that the files copied to target are > created by events which are not part of the bootstrap. This is because of the > fact that, bootstrap first gets the last event id and then the file list. > During this period if some event are added, then bootstrap will include files > created by these events also.The same files will be copied again during the > first incremental replication just after the bootstrap. In normal scenario, > the duplicate copy does not cause any issue as hive allows the use of target > database only after the first incremental. But in case of migration, the file > at source and target are copied to different location (based on the write id > at target) and thus this may lead to duplicate data at target. This can be > avoided by having at check at load time for duplicate file. This check can be > done only for the first incremental and the search can be done in the > bootstrap directory (with write id 1). if the file is already present then > just ignore the copy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21312) FSStatsAggregator::connect is slow
[ https://issues.apache.org/jira/browse/HIVE-21312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HIVE-21312: Attachment: HIVE-21312.1.patch > FSStatsAggregator::connect is slow > -- > > Key: HIVE-21312 > URL: https://issues.apache.org/jira/browse/HIVE-21312 > Project: Hive > Issue Type: Improvement > Components: Statistics >Reporter: Rajesh Balamohan >Priority: Trivial > Attachments: HIVE-21312.1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21312) FSStatsAggregator::connect is slow
[ https://issues.apache.org/jira/browse/HIVE-21312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HIVE-21312: Status: Patch Available (was: Open) > FSStatsAggregator::connect is slow > -- > > Key: HIVE-21312 > URL: https://issues.apache.org/jira/browse/HIVE-21312 > Project: Hive > Issue Type: Improvement > Components: Statistics >Reporter: Rajesh Balamohan >Priority: Trivial > Attachments: HIVE-21312.1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-16683) ORC WriterVersion gets ArrayIndexOutOfBoundsException on newer ORC files
[ https://issues.apache.org/jira/browse/HIVE-16683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776214#comment-16776214 ] Bo Hai commented on HIVE-16683: --- [~owen.omalley] Does this patch impact forward compatibility of orc reader in hive 2.1.1 ? > ORC WriterVersion gets ArrayIndexOutOfBoundsException on newer ORC files > > > Key: HIVE-16683 > URL: https://issues.apache.org/jira/browse/HIVE-16683 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.1, 2.2.0 >Reporter: Owen O'Malley >Assignee: Owen O'Malley >Priority: Major > Fix For: 2.2.0 > > Attachments: HIVE-16683.patch > > > This only impacts branch-2.1 and branch-2.2, because it has been fixed in the > ORC project's code base via ORC-125. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.
[ https://issues.apache.org/jira/browse/HIVE-21286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-21286: Summary: Hive should support clean-up of previously bootstrapped tables when retry from different dump. (was: Hive should support clean-up of incrementally bootstrapped tables when retry from different dump.) > Hive should support clean-up of previously bootstrapped tables when retry > from different dump. > -- > > Key: HIVE-21286 > URL: https://issues.apache.org/jira/browse/HIVE-21286 > Project: Hive > Issue Type: Bug > Components: repl > Affects Versions: 4.0.0 > Reporter: Sankar Hariappan > Assignee: Sankar Hariappan > Priority: Major > Labels: DR, Replication > > If external tables are enabled for replication on an existing repl policy, > then bootstrapping of external tables is combined with the incremental dump. > If the incremental bootstrap load fails with a non-retryable error, the user > will have to manually drop all the external tables before trying with another > bootstrap dump. For a full bootstrap, to retry with a different dump, we > suggested the user drop the DB, but in this case they need to manually drop all > the external tables, which is not very user-friendly. So, it needs to be handled > on the Hive side as follows. > REPL LOAD takes an additional config (passed by the user in the WITH clause) that says: > drop all the tables which were bootstrapped from the previous dump. > hive.repl.cleanup.bootstrap= > Hive will use this config only if the current dump is a bootstrap dump or a > combined bootstrap in an incremental dump. > Caution should be taken by the user that this config must not be passed if the previous > REPL LOAD (with bootstrap) was successful, or if any successful incremental > dump+load happened after "previous_bootstrap_dump_dir". -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21313) Use faster function to point to instead of copy immutable byte arrays
[ https://issues.apache.org/jira/browse/HIVE-21313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangXin updated HIVE-21313: Summary: Use faster function to point to instead of copy immutable byte arrays (was: Use faster function to point to instead of copy the immutable byte array) > Use faster function to point to instead of copy immutable byte arrays > - > > Key: HIVE-21313 > URL: https://issues.apache.org/jira/browse/HIVE-21313 > Project: Hive > Issue Type: Improvement > Reporter: ZhangXin > Assignee: ZhangXin > Priority: Minor > Labels: pull-request-available > Attachments: HIVE-21313.patch > > Time Spent: 10m > Remaining Estimate: 0h > > In the file ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorAssignRow.java > we may find code like this: > ``` > Text text = (Text) convertTargetWritable; > if (text == null) { text = new Text(); } > text.set(string); > ((BytesColumnVector) columnVector).setVal( > batchIndex, text.getBytes(), 0, text.getLength()); > ``` > > Using the `setVal` method copies the byte array generated by > `text.getBytes()`. This is unnecessary: since the byte array is immutable, we > can just use the `setRef` method to point to the specific byte array, which > will also lower the memory usage. > > Pull request on Github: https://github.com/apache/hive/pull/548 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
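The point-vs-copy distinction behind this change can be illustrated with a minimal sketch. BytesColumnVector itself lives in Hive's storage-api module; the tiny class below is a hypothetical stand-in that mimics only the copy/reference behaviour of `setVal` and `setRef`.

```java
import java.util.Arrays;

// Hypothetical stand-in for BytesColumnVector, showing only the
// setVal (copy) vs setRef (point-to) distinction discussed in the issue.
public class RefVsCopy {
    byte[][] vector = new byte[1][];

    // setVal-style: allocate a fresh array and copy the bytes into it.
    void setVal(int row, byte[] src, int start, int len) {
        vector[row] = Arrays.copyOfRange(src, start, start + len);
    }

    // setRef-style: keep a reference to the caller's array; no copy.
    // Safe only if the caller never mutates the array afterwards, which is
    // the claim made in this issue for the bytes behind text.getBytes().
    void setRef(int row, byte[] src, int start, int len) {
        vector[row] = src; // the real BytesColumnVector also records start/len
    }
}
```

After `setRef` the column shares the caller's storage, so a later mutation of the source array would show through; `setVal` is immune to that at the cost of an allocation and a copy per value.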
[jira] [Updated] (HIVE-21313) Use faster function to point to instead of copy the immutable byte array
[ https://issues.apache.org/jira/browse/HIVE-21313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangXin updated HIVE-21313: Summary: Use faster function to point to instead of copy the immutable byte array (was: Use faster function to prevent copying immutable byte array twice) > Use faster function to point to instead of copy the immutable byte array > > > Key: HIVE-21313 > URL: https://issues.apache.org/jira/browse/HIVE-21313 > Project: Hive > Issue Type: Improvement > Reporter: ZhangXin > Assignee: ZhangXin > Priority: Minor > Labels: pull-request-available > Attachments: HIVE-21313.patch > > Time Spent: 10m > Remaining Estimate: 0h > > In the file ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorAssignRow.java > we may find code like this: > ``` > Text text = (Text) convertTargetWritable; > if (text == null) { text = new Text(); } > text.set(string); > ((BytesColumnVector) columnVector).setVal( > batchIndex, text.getBytes(), 0, text.getLength()); > ``` > > Using the `setVal` method copies the byte array generated by > `text.getBytes()`. This is unnecessary: since the byte array is immutable, we > can just use the `setRef` method to point to the specific byte array, which > will also lower the memory usage. > > Pull request on Github: https://github.com/apache/hive/pull/548 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21307) Need to set GzipJSONMessageEncoder as default config for EVENT_MESSAGE_FACTORY.
[ https://issues.apache.org/jira/browse/HIVE-21307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-21307: Resolution: Fixed Fix Version/s: 4.0.0 Status: Resolved (was: Patch Available) 02.patch committed to master. Thanks [~maheshk114] for the review! > Need to set GzipJSONMessageEncoder as default config for > EVENT_MESSAGE_FACTORY. > --- > > Key: HIVE-21307 > URL: https://issues.apache.org/jira/browse/HIVE-21307 > Project: Hive > Issue Type: Bug > Components: Configuration, repl > Affects Versions: 4.0.0 > Reporter: Sankar Hariappan > Assignee: Sankar Hariappan > Priority: Major > Labels: DR, Replication, pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-21307.01.patch, HIVE-21307.02.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Currently, we use JsonMessageEncoder as the default message factory for > notification events. The size of some of the events is really huge and causes > OOM issues in the RDBMS, so GzipJSONMessageEncoder needs to be enabled as the > default message factory to optimise memory usage. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
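For reference, pinning the encoder explicitly might look like the hive-site.xml fragment below. The property key and the encoder's fully qualified class name are assumptions based on MetastoreConf's EVENT_MESSAGE_FACTORY; verify both against your Hive version before relying on them.

```xml
<!-- Sketch only: key and class name assumed from MetastoreConf's
     EVENT_MESSAGE_FACTORY; check your Hive version. -->
<property>
  <name>metastore.event.message.factory</name>
  <value>org.apache.hadoop.hive.metastore.messaging.json.gzip.GzipJSONMessageEncoder</value>
</property>
```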
[jira] [Updated] (HIVE-21286) Hive should support clean-up of incrementally bootstrapped tables when retry from different dump.
[ https://issues.apache.org/jira/browse/HIVE-21286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-21286: Description: If external tables are enabled for replication on an existing repl policy, then bootstrapping of external tables are combined with incremental dump. If incremental bootstrap load fails with non-retryable error for which user will have to manually drop all the external tables before trying with another bootstrap dump. For full bootstrap, to retry with different dump, we suggested user to drop the DB but in this case they need to manually drop all the external tables which is not so user friendly. So, need to handle it in Hive side as follows. REPL LOAD takes additional config (passed by user in WITH clause) that says, drop all the tables which are bootstrapped from previous dump. hive.repl.cleanup.bootstrap= Hive will use this config only if the current dump is bootstrap dump or combined bootstrap in incremental dump. Caution to be taken by user that this config should not be passed if previous REPL LOAD (with bootstrap) was successful or any successful incremental dump+load happened after "previous_bootstrap_dump_dir". was: If external tables are enabled for replication on an existing repl policy, then bootstrapping of external tables are combined with incremental dump. If incremental bootstrap load fails with non-retryable error for which user will have to manually drop all the external tables before trying with another bootstrap dump. For full bootstrap, to retry with different dump, we suggested user to drop the DB but in this case they need to manually drop all the external tables which is not so user friendly. So, need to handle it in Hive side as follows. REPL LOAD takes additional config (passed by user in WITH clause) that says, drop all the tables which are bootstrapped from previous dump. 
hive.repl.cleanup.bootstrap= Hive will use this config only if the current dump is bootstrap dump or combined bootstrap in incremental dump. Caution to be taken by user that this config should not be passed if previous REPL LOAD (with bootstrap) was successful or any successful incremental dump+load happened after "previous_bootstrap_dump_dir". > Hive should support clean-up of incrementally bootstrapped tables when retry > from different dump. > - > > Key: HIVE-21286 > URL: https://issues.apache.org/jira/browse/HIVE-21286 > Project: Hive > Issue Type: Bug > Components: repl >Affects Versions: 4.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan >Priority: Major > Labels: DR, Replication > > If external tables are enabled for replication on an existing repl policy, > then bootstrapping of external tables are combined with incremental dump. > If incremental bootstrap load fails with non-retryable error for which user > will have to manually drop all the external tables before trying with another > bootstrap dump. For full bootstrap, to retry with different dump, we > suggested user to drop the DB but in this case they need to manually drop all > the external tables which is not so user friendly. So, need to handle it in > Hive side as follows. > REPL LOAD takes additional config (passed by user in WITH clause) that says, > drop all the tables which are bootstrapped from previous dump. > hive.repl.cleanup.bootstrap= > Hive will use this config only if the current dump is bootstrap dump or > combined bootstrap in incremental dump. > Caution to be taken by user that this config should not be passed if previous > REPL LOAD (with bootstrap) was successful or any successful incremental > dump+load happened after "previous_bootstrap_dump_dir". -- This message was sent by Atlassian JIRA (v7.6.3#76005)
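The proposed WITH-clause usage could look like the sketch below. The database name and dump paths are placeholders, and the config itself is the proposal in this issue, not a committed Hive option.

```sql
-- Hypothetical retry from a different bootstrap dump: ask REPL LOAD to first
-- drop the tables bootstrapped from the previous (failed) dump.
REPL LOAD repl_db FROM '/new/bootstrap/dump/dir'
WITH ('hive.repl.cleanup.bootstrap'='/previous/bootstrap/dump/dir');
```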
[jira] [Commented] (HIVE-21307) Need to set GzipJSONMessageEncoder as default config for EVENT_MESSAGE_FACTORY.
[ https://issues.apache.org/jira/browse/HIVE-21307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776152#comment-16776152 ] Sankar Hariappan commented on HIVE-21307: - Here is the link to +1 from [~maheshk114] as it is hidden in the bunch of flaky ptest failure comments. https://issues.apache.org/jira/browse/HIVE-21307?focusedCommentId=16775876&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16775876 > Need to set GzipJSONMessageEncoder as default config for > EVENT_MESSAGE_FACTORY. > --- > > Key: HIVE-21307 > URL: https://issues.apache.org/jira/browse/HIVE-21307 > Project: Hive > Issue Type: Bug > Components: Configuration, repl >Affects Versions: 4.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan >Priority: Major > Labels: DR, Replication, pull-request-available > Attachments: HIVE-21307.01.patch, HIVE-21307.02.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Currently, we use JsonMessageEncoder as the default message factory for > Notification events. As the size of some of the events are really huge and > cause OOM issues in RDBMS. So, it is needed to enable GzipJSONMessageEncoder > as default message factory to optimise the memory usage. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21313) Use faster function to prevent copying immutable byte array twice
[ https://issues.apache.org/jira/browse/HIVE-21313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangXin updated HIVE-21313: Priority: Minor (was: Major) > Use faster function to prevent copying immutable byte array twice > - > > Key: HIVE-21313 > URL: https://issues.apache.org/jira/browse/HIVE-21313 > Project: Hive > Issue Type: Improvement > Reporter: ZhangXin > Assignee: ZhangXin > Priority: Minor > Labels: pull-request-available > Attachments: HIVE-21313.patch > > Time Spent: 10m > Remaining Estimate: 0h > > In the file ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorAssignRow.java > we may find code like this: > ``` > Text text = (Text) convertTargetWritable; > if (text == null) { text = new Text(); } > text.set(string); > ((BytesColumnVector) columnVector).setVal( > batchIndex, text.getBytes(), 0, text.getLength()); > ``` > > Using the `setVal` method copies the byte array generated by > `text.getBytes()`. This is unnecessary: since the byte array is immutable, we > can just use the `setRef` method to point to the specific byte array, which > will also lower the memory usage. > > Pull request on Github: https://github.com/apache/hive/pull/548 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21286) Hive should support clean-up of incrementally bootstrapped tables when retry from different dump.
[ https://issues.apache.org/jira/browse/HIVE-21286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-21286: Description: If external tables are enabled for replication on an existing repl policy, then bootstrapping of external tables are combined with incremental dump. If incremental bootstrap load fails with non-retryable error for which user will have to manually drop all the external tables before trying with another bootstrap dump. For full bootstrap, to retry with different dump, we suggested user to drop the DB but in this case they need to manually drop all the external tables which is not so user friendly. So, need to handle it in Hive side as follows. REPL LOAD takes additional config (passed by user in WITH clause) that says, drop all the tables which are bootstrapped from previous dump. hive.repl.cleanup.bootstrap= Caution to be taken by user that this config should not be passed if previous REPL LOAD (with bootstrap) was successful or any successful incremental dump+load happened after "previous_bootstrap_dump_dir". was: If external tables are enabled for replication on an existing repl policy, then bootstrapping of external tables are combined with incremental dump. If incremental bootstrap load fails with non-retryable error for which user will have to manually drop all the external tables before trying with another bootstrap dump. For full bootstrap, to retry with different dump, we suggested user to drop the DB but in this case they need to manually drop all the external tables which is not so user friendly. So, need to handle it in Hive side as follows. REPL LOAD takes additional config (passed by user in WITH clause) that says, drop all the tables which are part of this bootstrap dump. There are 4 cases possible. 1. Only external tables - Drop all external tables before triggering bootstrap load. 2. Only ACID/MM tables - Drop all ACID/MM tables before triggering bootstrap load. 3. 
Both external and ACID/MM tables - Drop both external and ACID/MM tables before triggering bootstrap load. 3. Table level replication with bootstrap - Drop all the tables that match the diff in previous and current repl policy (pattern+include/exclude list) before triggering bootstrap load. Configuration: hive.repl.bootstrap.cleanup.type= {1=external_tables, 2=transactional_tables, 3=external_and_transactional_tables, 4=table_level} > Hive should support clean-up of incrementally bootstrapped tables when retry > from different dump. > - > > Key: HIVE-21286 > URL: https://issues.apache.org/jira/browse/HIVE-21286 > Project: Hive > Issue Type: Bug > Components: repl >Affects Versions: 4.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan >Priority: Major > Labels: DR, Replication > > If external tables are enabled for replication on an existing repl policy, > then bootstrapping of external tables are combined with incremental dump. > If incremental bootstrap load fails with non-retryable error for which user > will have to manually drop all the external tables before trying with another > bootstrap dump. For full bootstrap, to retry with different dump, we > suggested user to drop the DB but in this case they need to manually drop all > the external tables which is not so user friendly. So, need to handle it in > Hive side as follows. > REPL LOAD takes additional config (passed by user in WITH clause) that says, > drop all the tables which are bootstrapped from previous dump. > hive.repl.cleanup.bootstrap= > Caution to be taken by user that this config should not be passed if previous > REPL LOAD (with bootstrap) was successful or any successful incremental > dump+load happened after "previous_bootstrap_dump_dir". -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21286) Hive should support clean-up of incrementally bootstrapped tables when retry from different dump.
[ https://issues.apache.org/jira/browse/HIVE-21286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-21286: Description: If external tables are enabled for replication on an existing repl policy, then bootstrapping of external tables are combined with incremental dump. If incremental bootstrap load fails with non-retryable error for which user will have to manually drop all the external tables before trying with another bootstrap dump. For full bootstrap, to retry with different dump, we suggested user to drop the DB but in this case they need to manually drop all the external tables which is not so user friendly. So, need to handle it in Hive side as follows. REPL LOAD takes additional config (passed by user in WITH clause) that says, drop all the tables which are bootstrapped from previous dump. hive.repl.cleanup.bootstrap= Hive will use this config only if the current dump is bootstrap dump or combined bootstrap in incremental dump. Caution to be taken by user that this config should not be passed if previous REPL LOAD (with bootstrap) was successful or any successful incremental dump+load happened after "previous_bootstrap_dump_dir". was: If external tables are enabled for replication on an existing repl policy, then bootstrapping of external tables are combined with incremental dump. If incremental bootstrap load fails with non-retryable error for which user will have to manually drop all the external tables before trying with another bootstrap dump. For full bootstrap, to retry with different dump, we suggested user to drop the DB but in this case they need to manually drop all the external tables which is not so user friendly. So, need to handle it in Hive side as follows. REPL LOAD takes additional config (passed by user in WITH clause) that says, drop all the tables which are bootstrapped from previous dump. 
hive.repl.cleanup.bootstrap= Hive will use this config only if the current dump is bootstrap dump or bootstrap in incremental dump. Caution to be taken by user that this config should not be passed if previous REPL LOAD (with bootstrap) was successful or any successful incremental dump+load happened after "previous_bootstrap_dump_dir". > Hive should support clean-up of incrementally bootstrapped tables when retry > from different dump. > - > > Key: HIVE-21286 > URL: https://issues.apache.org/jira/browse/HIVE-21286 > Project: Hive > Issue Type: Bug > Components: repl >Affects Versions: 4.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan >Priority: Major > Labels: DR, Replication > > If external tables are enabled for replication on an existing repl policy, > then bootstrapping of external tables are combined with incremental dump. > If incremental bootstrap load fails with non-retryable error for which user > will have to manually drop all the external tables before trying with another > bootstrap dump. For full bootstrap, to retry with different dump, we > suggested user to drop the DB but in this case they need to manually drop all > the external tables which is not so user friendly. So, need to handle it in > Hive side as follows. > REPL LOAD takes additional config (passed by user in WITH clause) that says, > drop all the tables which are bootstrapped from previous dump. > hive.repl.cleanup.bootstrap= > Hive will use this config only if the current dump is bootstrap dump or > combined bootstrap in incremental dump. > Caution to be taken by user that this config should not be passed if previous > REPL LOAD (with bootstrap) was successful or any successful incremental > dump+load happened after "previous_bootstrap_dump_dir". -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21307) Need to set GzipJSONMessageEncoder as default config for EVENT_MESSAGE_FACTORY.
[ https://issues.apache.org/jira/browse/HIVE-21307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776149#comment-16776149 ] Hive QA commented on HIVE-21307: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12959928/HIVE-21307.02.patch {color:green}SUCCESS:{color} +1 due to 8 test(s) being added or modified. {color:green}SUCCESS:{color} +1 due to 15811 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16220/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16220/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16220/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12959928 - PreCommit-HIVE-Build > Need to set GzipJSONMessageEncoder as default config for > EVENT_MESSAGE_FACTORY. > --- > > Key: HIVE-21307 > URL: https://issues.apache.org/jira/browse/HIVE-21307 > Project: Hive > Issue Type: Bug > Components: Configuration, repl >Affects Versions: 4.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan >Priority: Major > Labels: DR, Replication, pull-request-available > Attachments: HIVE-21307.01.patch, HIVE-21307.02.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Currently, we use JsonMessageEncoder as the default message factory for > Notification events. As the size of some of the events are really huge and > cause OOM issues in RDBMS. So, it is needed to enable GzipJSONMessageEncoder > as default message factory to optimise the memory usage. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21286) Hive should support clean-up of incrementally bootstrapped tables when retry from different dump.
[ https://issues.apache.org/jira/browse/HIVE-21286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-21286: Description: If external tables are enabled for replication on an existing repl policy, then bootstrapping of external tables are combined with incremental dump. If incremental bootstrap load fails with non-retryable error for which user will have to manually drop all the external tables before trying with another bootstrap dump. For full bootstrap, to retry with different dump, we suggested user to drop the DB but in this case they need to manually drop all the external tables which is not so user friendly. So, need to handle it in Hive side as follows. REPL LOAD takes additional config (passed by user in WITH clause) that says, drop all the tables which are bootstrapped from previous dump. hive.repl.cleanup.bootstrap= Hive will use this config only if the current dump is bootstrap dump or bootstrap in incremental dump. Caution to be taken by user that this config should not be passed if previous REPL LOAD (with bootstrap) was successful or any successful incremental dump+load happened after "previous_bootstrap_dump_dir". was: If external tables are enabled for replication on an existing repl policy, then bootstrapping of external tables are combined with incremental dump. If incremental bootstrap load fails with non-retryable error for which user will have to manually drop all the external tables before trying with another bootstrap dump. For full bootstrap, to retry with different dump, we suggested user to drop the DB but in this case they need to manually drop all the external tables which is not so user friendly. So, need to handle it in Hive side as follows. REPL LOAD takes additional config (passed by user in WITH clause) that says, drop all the tables which are bootstrapped from previous dump. 
hive.repl.cleanup.bootstrap= Caution to be taken by user that this config should not be passed if previous REPL LOAD (with bootstrap) was successful or any successful incremental dump+load happened after "previous_bootstrap_dump_dir". > Hive should support clean-up of incrementally bootstrapped tables when retry > from different dump. > - > > Key: HIVE-21286 > URL: https://issues.apache.org/jira/browse/HIVE-21286 > Project: Hive > Issue Type: Bug > Components: repl >Affects Versions: 4.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan >Priority: Major > Labels: DR, Replication > > If external tables are enabled for replication on an existing repl policy, > then bootstrapping of external tables are combined with incremental dump. > If incremental bootstrap load fails with non-retryable error for which user > will have to manually drop all the external tables before trying with another > bootstrap dump. For full bootstrap, to retry with different dump, we > suggested user to drop the DB but in this case they need to manually drop all > the external tables which is not so user friendly. So, need to handle it in > Hive side as follows. > REPL LOAD takes additional config (passed by user in WITH clause) that says, > drop all the tables which are bootstrapped from previous dump. > hive.repl.cleanup.bootstrap= > Hive will use this config only if the current dump is bootstrap dump or > bootstrap in incremental dump. > Caution to be taken by user that this config should not be passed if previous > REPL LOAD (with bootstrap) was successful or any successful incremental > dump+load happened after "previous_bootstrap_dump_dir". -- This message was sent by Atlassian JIRA (v7.6.3#76005)