[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16736871#comment-16736871 ] anishek commented on HIVE-20911:

Patch committed to master. Thanks for the review, [~sankarh]/[~ashutosh.bapat].

> External Table Replication for Hive
> ---
>
> Key: HIVE-20911
> URL: https://issues.apache.org/jira/browse/HIVE-20911
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Affects Versions: 4.0.0
> Reporter: anishek
> Assignee: anishek
> Priority: Critical
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-20911.01.patch, HIVE-20911.02.patch,
> HIVE-20911.03.patch, HIVE-20911.04.patch, HIVE-20911.05.patch,
> HIVE-20911.06.patch, HIVE-20911.07.patch, HIVE-20911.07.patch,
> HIVE-20911.08.patch, HIVE-20911.08.patch, HIVE-20911.09.patch,
> HIVE-20911.10.patch, HIVE-20911.11.patch, HIVE-20911.12.patch,
> HIVE-20911.12.patch
>
>
> External tables are not currently replicated as part of Hive replication. As
> part of this jira we want to enable that.
> Approach:
> * The target cluster will have a top-level base directory config that will be
> used to copy all data relevant to external tables. This will be provided via
> the *with* clause in the *repl load* command. This base path will be prefixed
> to the path of the same external table on the source cluster. It can be
> provided using the following configuration:
> {code}
> hive.repl.replica.external.table.base.dir=/
> {code}
> * Changes to an external table's directories can happen without Hive knowing
> about them, so we cannot capture the relevant events whenever data is added
> or removed. We will therefore have to copy the data from the source path to
> the target path for external tables every time we run incremental replication.
> ** This will require incremental *repl dump* to create an additional
> file, *\_external\_tables\_info*, with data in the following form:
> {code}
> tableName,base64Encoded(tableDataLocation)
> {code}
> If different partitions of the table point to different locations, there
> will be multiple entries in the file for the same table name, each with a
> different partition location. Partitions created without the _set location_
> command lie within the table's data location and hence do not get separate
> entries in the file.
> ** *repl load* will read *\_external\_tables\_info* to identify which
> locations are to be copied from source to target and create corresponding
> tasks for them.
> * New external tables will be created with metadata only, with no data copied
> as part of the regular tasks during incremental load/bootstrap load.
> * Bootstrap dump will also create *\_external\_tables\_info*, which will be
> used to copy data from source to target as part of bootstrap load.
> * Bootstrap load creates a DAG that can use parallelism in the execution
> phase; the HDFS copy tasks are created once the bootstrap phase is complete.
> * Incremental load results in a DAG with only sequential execution (events
> applied in sequence). To use the parallelism capability in execution mode
> effectively, we create the HDFS copy tasks along with the incremental DAG.
> This requires a few basic calculations to approximately meet the configured
> value of "hive.repl.approx.max.load.tasks".

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
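
For readers of this thread, here is a minimal Python sketch of the *\_external\_tables\_info* line format and the base-directory prefixing described in the issue above. It is an illustration only, not the code in the patch: the function names (`encode_entry`, `parse_entries`, `target_path`) and the sample paths are hypothetical, and it assumes table names contain no comma, locations are plain absolute paths, and the standard base64 alphabet is used.

```python
import base64
import posixpath
from typing import List, NamedTuple


class ExternalTableEntry(NamedTuple):
    """One line of the info file: a table name and a decoded data location."""
    table_name: str
    source_location: str


def encode_entry(table_name: str, location: str) -> str:
    """Build one 'tableName,base64Encoded(tableDataLocation)' line (hypothetical helper)."""
    encoded = base64.b64encode(location.encode("utf-8")).decode("ascii")
    return f"{table_name},{encoded}"


def parse_entries(text: str) -> List[ExternalTableEntry]:
    """Parse the file contents; the same table may appear on several lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first comma only. The standard base64 alphabet has no
        # comma, and this sketch assumes the table name has none either.
        table_name, encoded = line.split(",", 1)
        location = base64.b64decode(encoded).decode("utf-8")
        entries.append(ExternalTableEntry(table_name, location))
    return entries


def target_path(base_dir: str, source_location: str) -> str:
    """Prefix the target base directory to the source path (assumes absolute paths)."""
    return posixpath.join(base_dir, source_location.lstrip("/"))


if __name__ == "__main__":
    # Two partitions of the same table pointing at different locations.
    contents = "\n".join([
        encode_entry("sales", "/data/ext/sales"),
        encode_entry("sales", "/archive/sales/2018"),
    ])
    for entry in parse_entries(contents):
        print(entry.table_name, "->", target_path("/replica/base", entry.source_location))
```

Encoding the location presumably keeps each entry on a single line whatever characters the path contains, which is why the sketch can split on the first comma.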
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16736851#comment-16736851 ] Hive QA commented on HIVE-20911:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12954100/HIVE-20911.12.patch

{color:green}SUCCESS:{color} +1 due to 8 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 15695 tests executed
*Failed tests:*
{noformat}
TestAlterTableMetadata - did not produce a TEST-*.xml file (likely timed out) (batchId=251)
TestReplAcidTablesWithJsonMessage - did not produce a TEST-*.xml file (likely timed out) (batchId=251)
TestSemanticAnalyzerHookLoading - did not produce a TEST-*.xml file (likely timed out) (batchId=251)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/15539/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15539/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15539/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12954100 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16736844#comment-16736844 ] Hive QA commented on HIVE-20911:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 33s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 36s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 44s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 37s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 29s{color} | {color:blue} common in master has 65 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 3m 52s{color} | {color:blue} ql in master has 2309 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 40s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 23s{color} | {color:blue} testutils/ptest2 in master has 24 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 8m 6s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 24s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 45s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 40s{color} | {color:red} ql: The patch generated 11 new + 431 unchanged - 12 fixed = 442 total (was 443) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 21s{color} | {color:red} itests/hive-unit: The patch generated 56 new + 728 unchanged - 48 fixed = 784 total (was 776) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 3m 55s{color} | {color:red} ql generated 2 new + 2308 unchanged - 1 fixed = 2310 total (was 2309) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 7m 45s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 12s{color} | {color:red} The patch generated 3 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 64m 32s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
| | The field org.apache.hadoop.hive.ql.exec.repl.ReplLoadWork.pathsToCopyIterator is transient but isn't set by deserialization. In ReplLoadWork.java |
| | Write to static field org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.numIteration from instance method org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.build(DriverContext, Hive, Logger, ReplLoadWork, TaskTracker). At IncrementalLoadTasksBuilder.java:[line 100] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15539/dev-support/hive-personality.sh |
| git revision | master / 0dbb896 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle |
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16735566#comment-16735566 ] anishek commented on HIVE-20911: [~vihangk1] Thanks for the exclusion, looks like there is another test which is causing report publish timeout "TestReplTableMigrationWithJsonFormat" can you please exclude this as well from batching
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16735552#comment-16735552 ] Hive QA commented on HIVE-20911:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12953942/HIVE-20911.12.patch

{color:green}SUCCESS:{color} +1 due to 8 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 15694 tests executed
*Failed tests:*
{noformat}
TestAlterTableMetadata - did not produce a TEST-*.xml file (likely timed out) (batchId=251)
TestReplAcidTablesWithJsonMessage - did not produce a TEST-*.xml file (likely timed out) (batchId=251)
TestSemanticAnalyzerHookLoading - did not produce a TEST-*.xml file (likely timed out) (batchId=251)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/15521/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15521/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15521/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12953942 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16735545#comment-16735545 ] Hive QA commented on HIVE-20911:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 10s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 31s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 4s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 39s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 30s{color} | {color:blue} common in master has 65 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 3m 45s{color} | {color:blue} ql in master has 2311 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 37s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 23s{color} | {color:blue} testutils/ptest2 in master has 24 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 7m 57s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 25s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 49s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 40s{color} | {color:red} ql: The patch generated 11 new + 431 unchanged - 12 fixed = 442 total (was 443) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 21s{color} | {color:red} itests/hive-unit: The patch generated 56 new + 728 unchanged - 48 fixed = 784 total (was 776) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 3m 51s{color} | {color:red} ql generated 2 new + 2310 unchanged - 1 fixed = 2312 total (was 2311) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 7m 43s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 11s{color} | {color:red} The patch generated 3 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 64m 51s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
| | The field org.apache.hadoop.hive.ql.exec.repl.ReplLoadWork.pathsToCopyIterator is transient but isn't set by deserialization. In ReplLoadWork.java |
| | Write to static field org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.numIteration from instance method org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.build(DriverContext, Hive, Logger, ReplLoadWork, TaskTracker). At IncrementalLoadTasksBuilder.java:[line 100] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15521/dev-support/hive-personality.sh |
| git revision | master / 16d39c6 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle |
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16734464#comment-16734464 ] Vihang Karajgaonkar commented on HIVE-20911: done
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733959#comment-16733959 ] anishek commented on HIVE-20911: [~vihangk1]/[~janulatha] can you please exclude TestReplWithJsonMessageFormat on the apache build servers from being batched ? I am not able to get a green build for my patch .
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733925#comment-16733925 ]

Hive QA commented on HIVE-20911:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12953718/HIVE-20911.12.patch

{color:green}SUCCESS:{color} +1 due to 8 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 15688 tests executed
*Failed tests:*
{noformat}
TestAlterTableMetadata - did not produce a TEST-*.xml file (likely timed out) (batchId=250)
TestReplAcidTablesWithJsonMessage - did not produce a TEST-*.xml file (likely timed out) (batchId=250)
TestReplTableMigrationWithJsonFormat - did not produce a TEST-*.xml file (likely timed out) (batchId=250)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerCustomCreatedDynamicPartitions (batchId=261)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerCustomCreatedDynamicPartitionsUnionAll (batchId=261)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerHighShuffleBytes (batchId=261)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/15498/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15498/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15498/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12953718 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733920#comment-16733920 ]

Hive QA commented on HIVE-20911:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 35s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 27s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 37s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 34s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 31s{color} | {color:blue} common in master has 65 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 3m 39s{color} | {color:blue} ql in master has 2311 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 38s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 20s{color} | {color:blue} testutils/ptest2 in master has 24 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 7m 46s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 25s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 33s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 40s{color} | {color:red} ql: The patch generated 11 new + 431 unchanged - 12 fixed = 442 total (was 443) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 20s{color} | {color:red} itests/hive-unit: The patch generated 56 new + 728 unchanged - 48 fixed = 784 total (was 776) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 3m 44s{color} | {color:red} ql generated 2 new + 2310 unchanged - 1 fixed = 2312 total (was 2311) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 7m 45s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 11s{color} | {color:red} The patch generated 3 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 62m 37s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
| | The field org.apache.hadoop.hive.ql.exec.repl.ReplLoadWork.pathsToCopyIterator is transient but isn't set by deserialization In ReplLoadWork.java:but isn't set by deserialization In ReplLoadWork.java |
| | Write to static field org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.numIteration from instance method org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.build(DriverContext, Hive, Logger, ReplLoadWork, TaskTracker) At IncrementalLoadTasksBuilder.java:from instance method org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.build(DriverContext, Hive, Logger, ReplLoadWork, TaskTracker) At IncrementalLoadTasksBuilder.java:[line 100] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15498/dev-support/hive-personality.sh |
| git revision | master / 138b00c |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle |
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733831#comment-16733831 ]

Hive QA commented on HIVE-20911:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12953699/HIVE-20911.11.patch

{color:green}SUCCESS:{color} +1 due to 8 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 15688 tests executed
*Failed tests:*
{noformat}
TestAlterTableMetadata - did not produce a TEST-*.xml file (likely timed out) (batchId=250)
TestReplAcidTablesWithJsonMessage - did not produce a TEST-*.xml file (likely timed out) (batchId=250)
TestReplTableMigrationWithJsonFormat - did not produce a TEST-*.xml file (likely timed out) (batchId=250)
org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigration.testBootstrapLoadMigrationManagedToAcid (batchId=244)
org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigration.testBootstrapLoadMigrationToAcidWithMoveOptimization (batchId=244)
org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigration.testIncrementalLoadMigrationManagedToAcid (batchId=244)
org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigration.testIncrementalLoadMigrationManagedToAcidFailure (batchId=244)
org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigration.testIncrementalLoadMigrationManagedToAcidFailurePart (batchId=244)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/15496/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15496/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15496/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with:
TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12953699 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733828#comment-16733828 ]

Hive QA commented on HIVE-20911:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 41s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 51s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 41s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 36s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 32s{color} | {color:blue} common in master has 65 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 3m 47s{color} | {color:blue} ql in master has 2311 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 41s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 23s{color} | {color:blue} testutils/ptest2 in master has 24 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 8m 0s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 25s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 46s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 39s{color} | {color:red} ql: The patch generated 11 new + 431 unchanged - 12 fixed = 442 total (was 443) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 21s{color} | {color:red} itests/hive-unit: The patch generated 56 new + 728 unchanged - 48 fixed = 784 total (was 776) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 3m 57s{color} | {color:red} ql generated 2 new + 2310 unchanged - 1 fixed = 2312 total (was 2311) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 8m 15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 12s{color} | {color:red} The patch generated 3 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 65m 4s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
| | The field org.apache.hadoop.hive.ql.exec.repl.ReplLoadWork.pathsToCopyIterator is transient but isn't set by deserialization In ReplLoadWork.java:but isn't set by deserialization In ReplLoadWork.java |
| | Write to static field org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.numIteration from instance method org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.build(DriverContext, Hive, Logger, ReplLoadWork, TaskTracker) At IncrementalLoadTasksBuilder.java:from instance method org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.build(DriverContext, Hive, Logger, ReplLoadWork, TaskTracker) At IncrementalLoadTasksBuilder.java:[line 100] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15496/dev-support/hive-personality.sh |
| git revision | master / 138b00c |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle |
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733186#comment-16733186 ]

Hive QA commented on HIVE-20911:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12953622/HIVE-20911.10.patch

{color:green}SUCCESS:{color} +1 due to 8 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 15688 tests executed
*Failed tests:*
{noformat}
TestAlterTableMetadata - did not produce a TEST-*.xml file (likely timed out) (batchId=250)
TestReplAcidTablesWithJsonMessage - did not produce a TEST-*.xml file (likely timed out) (batchId=250)
TestReplTableMigrationWithJsonFormat - did not produce a TEST-*.xml file (likely timed out) (batchId=250)
org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigration.testBootstrapLoadMigrationManagedToAcid (batchId=244)
org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigration.testBootstrapLoadMigrationToAcidWithMoveOptimization (batchId=244)
org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigration.testIncrementalLoadMigrationManagedToAcid (batchId=244)
org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigration.testIncrementalLoadMigrationManagedToAcidFailure (batchId=244)
org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigration.testIncrementalLoadMigrationManagedToAcidFailurePart (batchId=244)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/15481/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15481/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15481/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with:
TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12953622 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733182#comment-16733182 ]

Hive QA commented on HIVE-20911:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 31s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 38s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 40s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 34s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 28s{color} | {color:blue} common in master has 65 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 3m 39s{color} | {color:blue} ql in master has 2312 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 38s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 24s{color} | {color:blue} testutils/ptest2 in master has 24 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 8m 4s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 24s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 39s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 40s{color} | {color:red} ql: The patch generated 11 new + 431 unchanged - 12 fixed = 442 total (was 443) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 21s{color} | {color:red} itests/hive-unit: The patch generated 56 new + 728 unchanged - 48 fixed = 784 total (was 776) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 3m 50s{color} | {color:red} ql generated 2 new + 2311 unchanged - 1 fixed = 2313 total (was 2312) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 8m 1s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 11s{color} | {color:red} The patch generated 3 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 64m 0s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
| | The field org.apache.hadoop.hive.ql.exec.repl.ReplLoadWork.pathsToCopyIterator is transient but isn't set by deserialization In ReplLoadWork.java:but isn't set by deserialization In ReplLoadWork.java |
| | Write to static field org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.numIteration from instance method org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.build(DriverContext, Hive, Logger, ReplLoadWork, TaskTracker) At IncrementalLoadTasksBuilder.java:from instance method org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.build(DriverContext, Hive, Logger, ReplLoadWork, TaskTracker) At IncrementalLoadTasksBuilder.java:[line 100] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15481/dev-support/hive-personality.sh |
| git revision | master / 691c4cb |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle |
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732681#comment-16732681 ]

Hive QA commented on HIVE-20911:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12953565/HIVE-20911.08.patch

{color:green}SUCCESS:{color} +1 due to 8 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 15694 tests executed
*Failed tests:*
{noformat}
TestAlterTableMetadata - did not produce a TEST-*.xml file (likely timed out) (batchId=250)
TestReplAcidTablesWithJsonMessage - did not produce a TEST-*.xml file (likely timed out) (batchId=250)
org.apache.hadoop.hive.ql.parse.TestReplTableMigrationWithJsonFormat.testBootstrapLoadMigrationManagedToAcid (batchId=250)
org.apache.hadoop.hive.ql.parse.TestReplTableMigrationWithJsonFormat.testBootstrapLoadMigrationToAcidWithMoveOptimization (batchId=250)
org.apache.hadoop.hive.ql.parse.TestReplTableMigrationWithJsonFormat.testIncrementalLoadMigrationManagedToAcid (batchId=250)
org.apache.hadoop.hive.ql.parse.TestReplTableMigrationWithJsonFormat.testIncrementalLoadMigrationManagedToAcidAllOp (batchId=250)
org.apache.hadoop.hive.ql.parse.TestReplTableMigrationWithJsonFormat.testIncrementalLoadMigrationManagedToAcidFailure (batchId=250)
org.apache.hadoop.hive.ql.parse.TestReplTableMigrationWithJsonFormat.testIncrementalLoadMigrationManagedToAcidFailurePart (batchId=250)
org.apache.hadoop.hive.ql.parse.TestReplTableMigrationWithJsonFormat.testIncrementalLoadMigrationToAcidWithMoveOptimization (batchId=250)
org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigration.testBootstrapLoadMigrationManagedToAcid (batchId=244)
org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigration.testBootstrapLoadMigrationToAcidWithMoveOptimization (batchId=244)
org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigration.testIncrementalLoadMigrationManagedToAcid (batchId=244)
org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigration.testIncrementalLoadMigrationManagedToAcidFailure (batchId=244)
org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigration.testIncrementalLoadMigrationManagedToAcidFailurePart (batchId=244)
org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigration.testIncrementalLoadMigrationToAcidWithMoveOptimization (batchId=244)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/15467/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15467/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15467/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12953565 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732678#comment-16732678 ]

Hive QA commented on HIVE-20911:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || master Compile Tests ||
| 0 | mvndep | 0m 46s | Maven dependency ordering for branch |
| +1 | mvninstall | 8m 3s | master passed |
| +1 | compile | 7m 42s | master passed |
| +1 | checkstyle | 1m 36s | master passed |
| 0 | findbugs | 0m 31s | common in master has 65 extant Findbugs warnings. |
| 0 | findbugs | 3m 48s | ql in master has 2312 extant Findbugs warnings. |
| 0 | findbugs | 0m 40s | itests/hive-unit in master has 2 extant Findbugs warnings. |
| 0 | findbugs | 0m 24s | testutils/ptest2 in master has 24 extant Findbugs warnings. |
| +1 | javadoc | 8m 8s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 24s | Maven dependency ordering for patch |
| +1 | mvninstall | 8m 45s | the patch passed |
| +1 | compile | 7m 49s | the patch passed |
| +1 | javac | 7m 49s | the patch passed |
| -1 | checkstyle | 0m 38s | ql: The patch generated 13 new + 431 unchanged - 12 fixed = 444 total (was 443) |
| -1 | checkstyle | 0m 20s | itests/hive-unit: The patch generated 55 new + 729 unchanged - 47 fixed = 784 total (was 776) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| -1 | findbugs | 3m 44s | ql generated 2 new + 2311 unchanged - 1 fixed = 2313 total (was 2312) |
| +1 | javadoc | 7m 53s | the patch passed |
|| || || || Other Tests ||
| -1 | asflicense | 0m 12s | The patch generated 3 ASF License warnings. |
| | | 64m 43s | |

|| Reason || Tests ||
| FindBugs | module:ql |
| | The field org.apache.hadoop.hive.ql.exec.repl.ReplLoadWork.pathsToCopyIterator is transient but isn't set by deserialization In ReplLoadWork.java |
| | Write to static field org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.numIteration from instance method org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.build(DriverContext, Hive, Logger, ReplLoadWork, TaskTracker) At IncrementalLoadTasksBuilder.java:[line 100] |

|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15467/dev-support/hive-personality.sh |
| git revision | master / dc215b1 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle |
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731845#comment-16731845 ]

Sankar Hariappan commented on HIVE-20911:

+1, pending tests for 08.patch

> External Table Replication for Hive
> ---
>
> Key: HIVE-20911
> URL: https://issues.apache.org/jira/browse/HIVE-20911
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Affects Versions: 4.0.0
> Reporter: anishek
> Assignee: anishek
> Priority: Critical
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-20911.01.patch, HIVE-20911.02.patch,
> HIVE-20911.03.patch, HIVE-20911.04.patch, HIVE-20911.05.patch,
> HIVE-20911.06.patch, HIVE-20911.07.patch, HIVE-20911.07.patch,
> HIVE-20911.08.patch
>
>
> External tables are not currently replicated as part of Hive replication.
> As part of this jira we want to enable that.
> Approach:
> * The target cluster will have a top-level base directory config that will
> be used to copy all data relevant to external tables. This will be provided
> via the *with* clause in the *repl load* command. This base path will be
> prefixed to the path of the same external table on the source cluster. This
> can be provided using the following configuration:
> {code}
> hive.repl.replica.external.table.base.dir=/
> {code}
> * Since changes to an external table's directories can happen without Hive
> knowing about them, we can't capture the relevant events whenever data is
> added or removed, so we will have to copy the data from the source path to
> the target path for external tables every time we run incremental
> replication.
> ** This will require incremental *repl dump* to now create an additional
> file *\_external\_tables\_info* with data in the following form:
> {code}
> tableName,base64Encoded(tableDataLocation)
> {code}
> If different partitions in the table point to different locations, there
> will be multiple entries in the file for the same table name, each with a
> location pointing to a different partition location. Partitions created in
> a table without the _set location_ command will be within the table's data
> location, and hence will not have separate entries in the file above.
> ** *repl load* will read *\_external\_tables\_info* to identify which
> locations are to be copied from source to target and create corresponding
> tasks for them.
> * New external tables will be created with metadata only, with no data
> copied as part of the regular tasks during incremental load/bootstrap load.
> * Bootstrap dump will also create *\_external\_tables\_info*, which will be
> used to copy data from source to target as part of bootstrap load.
> * Bootstrap load will create a DAG that can use parallelism in the
> execution phase; the HDFS copy tasks are created once the bootstrap phase
> is complete.
> * Since incremental load results in a DAG with only sequential execution
> (events applied in sequence), to effectively use the parallelism capability
> in execution mode we create tasks for HDFS copy along with the incremental
> DAG. This requires a few basic calculations to approximately meet the
> configured value in "hive.repl.approx.max.load.tasks".

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
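As a rough illustration of the *\_external\_tables\_info* format and the base-directory prefixing described in the issue text above, here is a minimal Python sketch. It is not the code from the patch: the function names, the sample table name, and the sample paths are all hypothetical, and it assumes standard base64 encoding and plain absolute source paths with no scheme or authority.

```python
import base64
import posixpath
from typing import Tuple


def encode_entry(table_name: str, location: str) -> str:
    """Build one line of the form tableName,base64Encoded(tableDataLocation)."""
    encoded = base64.b64encode(location.encode("utf-8")).decode("ascii")
    return f"{table_name},{encoded}"


def decode_entry(line: str) -> Tuple[str, str]:
    """Split a line on the first comma and decode the location."""
    # The standard base64 alphabet contains no comma, so the first comma
    # separates the table name from the encoded location.
    table_name, encoded = line.strip().split(",", 1)
    return table_name, base64.b64decode(encoded).decode("utf-8")


def target_path(base_dir: str, source_location: str) -> str:
    """Prefix the replica base directory to the source table path.

    Assumption: source_location is a plain absolute path.
    """
    return posixpath.join(base_dir, source_location.lstrip("/"))


# One table with a partition stored outside the table directory yields
# two entries for the same table name (hypothetical sample data).
lines = [
    encode_entry("sales", "/warehouse/ext/sales"),
    encode_entry("sales", "/data/archive/sales/year=2018"),
]

# Each entry becomes a (table, source, target) triple to copy.
copy_plan = [
    (table, loc, target_path("/replica/base", loc))
    for table, loc in map(decode_entry, lines)
]

for table, src, dst in copy_plan:
    print(f"{table}: {src} -> {dst}")
```

Encoding the location presumably keeps commas or other special characters in a path from breaking the one-entry-per-line format, which is why the sketch splits only on the first comma.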
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726672#comment-16726672 ]

Sankar Hariappan commented on HIVE-20911:

[~anishek] I posted few comments in the PR link. Please take a look. Thanks!
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726455#comment-16726455 ]

anishek commented on HIVE-20911:

[~thejas]/[~ashutoshc] can you please help get the replication test marked to skip batching on the build servers ?
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725711#comment-16725711 ]

Hive QA commented on HIVE-20911:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12952464/HIVE-20911.07.patch

SUCCESS: +1 due to 7 test(s) being added or modified.

ERROR: -1 due to 3 failed/errored test(s), 15721 tests executed

*Failed tests:*
{noformat}
TestAlterTableMetadata - did not produce a TEST-*.xml file (likely timed out) (batchId=251)
TestReplAcidTablesWithJsonMessage - did not produce a TEST-*.xml file (likely timed out) (batchId=251)
TestSemanticAnalyzerHookLoading - did not produce a TEST-*.xml file (likely timed out) (batchId=251)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/15398/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15398/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15398/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12952464 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725704#comment-16725704 ]

Hive QA commented on HIVE-20911:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || master Compile Tests ||
| 0 | mvndep | 1m 30s | Maven dependency ordering for branch |
| +1 | mvninstall | 6m 24s | master passed |
| +1 | compile | 7m 25s | master passed |
| +1 | checkstyle | 1m 28s | master passed |
| 0 | findbugs | 0m 29s | common in master has 65 extant Findbugs warnings. |
| 0 | findbugs | 3m 41s | ql in master has 2310 extant Findbugs warnings. |
| 0 | findbugs | 0m 38s | itests/hive-unit in master has 2 extant Findbugs warnings. |
| +1 | javadoc | 7m 27s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 24s | Maven dependency ordering for patch |
| +1 | mvninstall | 8m 22s | the patch passed |
| +1 | compile | 7m 15s | the patch passed |
| +1 | javac | 7m 15s | the patch passed |
| -1 | checkstyle | 0m 41s | ql: The patch generated 13 new + 390 unchanged - 12 fixed = 403 total (was 402) |
| -1 | checkstyle | 0m 21s | itests/hive-unit: The patch generated 24 new + 737 unchanged - 37 fixed = 761 total (was 774) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| -1 | findbugs | 3m 51s | ql generated 2 new + 2309 unchanged - 1 fixed = 2311 total (was 2310) |
| +1 | javadoc | 7m 33s | the patch passed |
|| || || || Other Tests ||
| +1 | asflicense | 0m 12s | The patch does not generate ASF License warnings. |
| | | 60m 16s | |

|| Reason || Tests ||
| FindBugs | module:ql |
| | The field org.apache.hadoop.hive.ql.exec.repl.ReplLoadWork.pathsToCopyIterator is transient but isn't set by deserialization In ReplLoadWork.java |
| | Write to static field org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.numIteration from instance method org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.build(DriverContext, Hive, Logger, ReplLoadWork, TaskTracker) At IncrementalLoadTasksBuilder.java:[line 100] |

|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15398/dev-support/hive-personality.sh |
| git revision | master / 1020be0 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-15398/yetus/diff-checkstyle-ql.txt |
| checkstyle |
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725648#comment-16725648 ]

anishek commented on HIVE-20911:

[~vihangk1] can you please help me get another test in skip-batching from Replication side: test name is "TestReplWithJsonMessageFormat".
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725625#comment-16725625 ]

Hive QA commented on HIVE-20911:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12952449/HIVE-20911.07.patch

SUCCESS: +1 due to 7 test(s) being added or modified.

ERROR: -1 due to 3 failed/errored test(s), 15721 tests executed

*Failed tests:*
{noformat}
TestAlterTableMetadata - did not produce a TEST-*.xml file (likely timed out) (batchId=251)
TestReplAcidTablesWithJsonMessage - did not produce a TEST-*.xml file (likely timed out) (batchId=251)
TestSemanticAnalyzerHookLoading - did not produce a TEST-*.xml file (likely timed out) (batchId=251)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/15396/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15396/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15396/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12952449 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725622#comment-16725622 ]

Hive QA commented on HIVE-20911:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || master Compile Tests ||
| 0 | mvndep | 1m 30s | Maven dependency ordering for branch |
| +1 | mvninstall | 6m 23s | master passed |
| +1 | compile | 7m 30s | master passed |
| +1 | checkstyle | 1m 28s | master passed |
| 0 | findbugs | 0m 29s | common in master has 65 extant Findbugs warnings. |
| 0 | findbugs | 3m 47s | ql in master has 2310 extant Findbugs warnings. |
| 0 | findbugs | 0m 38s | itests/hive-unit in master has 2 extant Findbugs warnings. |
| +1 | javadoc | 7m 39s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 26s | Maven dependency ordering for patch |
| +1 | mvninstall | 8m 30s | the patch passed |
| +1 | compile | 7m 28s | the patch passed |
| +1 | javac | 7m 28s | the patch passed |
| -1 | checkstyle | 0m 39s | ql: The patch generated 13 new + 390 unchanged - 12 fixed = 403 total (was 402) |
| -1 | checkstyle | 0m 21s | itests/hive-unit: The patch generated 24 new + 737 unchanged - 37 fixed = 761 total (was 774) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| -1 | findbugs | 3m 48s | ql generated 2 new + 2309 unchanged - 1 fixed = 2311 total (was 2310) |
| +1 | javadoc | 7m 28s | the patch passed |
|| || || || Other Tests ||
| +1 | asflicense | 0m 11s | The patch does not generate ASF License warnings. |
| | | 60m 59s | |

|| Reason || Tests ||
| FindBugs | module:ql |
| | The field org.apache.hadoop.hive.ql.exec.repl.ReplLoadWork.pathsToCopyIterator is transient but isn't set by deserialization In ReplLoadWork.java:but isn't set by deserialization In ReplLoadWork.java |
| | Write to static field org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.numIteration from instance method org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.build(DriverContext, Hive, Logger, ReplLoadWork, TaskTracker) At IncrementalLoadTasksBuilder.java:from instance method org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.build(DriverContext, Hive, Logger, ReplLoadWork, TaskTracker) At IncrementalLoadTasksBuilder.java:[line 100] |

|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15396/dev-support/hive-personality.sh |
| git revision | master / 1020be0 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-15396/yetus/diff-checkstyle-ql.txt |
| checkstyle |
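The two FindBugs findings in the report above concern a transient field that is not restored after deserialization and a static field written from an instance method. The sketch below illustrates both patterns and one common way to avoid each. It does not reproduce the Hive classes named in the report: `LoadWorkSketch`, its fields, and `roundTrip` are hypothetical names, and it assumes plain `java.io` serialization, which may not match how Hive serializes its plan objects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical class for illustration; not taken from the Hive patch. */
final class LoadWorkSketch implements Serializable {
  private static final long serialVersionUID = 1L;

  private final ArrayList<String> paths;

  // Transient fields are skipped by java.io serialization, so this would be
  // null after deserialization unless readObject restores it.
  private transient Iterator<String> pathsIterator;

  // An instance counter: avoids writing shared static state from an instance method.
  private int numIteration;

  LoadWorkSketch(List<String> paths) {
    this.paths = new ArrayList<>(paths);
    this.pathsIterator = this.paths.iterator();
  }

  /** Returns the next path, or null when none are left, and counts the call. */
  String nextPath() {
    numIteration++;
    return pathsIterator.hasNext() ? pathsIterator.next() : null;
  }

  int iterations() {
    return numIteration;
  }

  /** Re-creates the transient iterator; the iteration position is not preserved. */
  private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
    in.defaultReadObject();
    pathsIterator = paths.iterator();
  }

  /** Serializes and deserializes the object in memory. */
  static LoadWorkSketch roundTrip(LoadWorkSketch work) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(work);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (LoadWorkSketch) in.readObject();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Note that in this sketch the restored iterator starts again from the first path. Whether that is acceptable depends on how the real work object is consumed, which the report does not say.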
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723911#comment-16723911 ]

Hive QA commented on HIVE-20911:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12952147/HIVE-20911.06.patch

SUCCESS: +1 due to 7 test(s) being added or modified.

ERROR: -1 due to 7 failed/errored test(s), 15717 tests executed

*Failed tests:*
{noformat}
TestAlterTableMetadata - did not produce a TEST-*.xml file (likely timed out) (batchId=250)
TestReplAcidTablesWithJsonMessage - did not produce a TEST-*.xml file (likely timed out) (batchId=250)
TestSemanticAnalyzerHookLoading - did not produce a TEST-*.xml file (likely timed out) (batchId=250)
org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigration.testBootstrapLoadMigrationManagedToAcid (batchId=243)
org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigration.testIncrementalLoadMigrationManagedToAcidFailure (batchId=243)
org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigration.testIncrementalLoadMigrationManagedToAcidFailurePart (batchId=243)
org.apache.hive.jdbc.TestSSL.testMetastoreWithSSL (batchId=257)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/15364/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15364/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15364/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12952147 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723900#comment-16723900 ]

Hive QA commented on HIVE-20911:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || master Compile Tests ||
| 0 | mvndep | 1m 27s | Maven dependency ordering for branch |
| +1 | mvninstall | 6m 29s | master passed |
| +1 | compile | 7m 22s | master passed |
| +1 | checkstyle | 1m 28s | master passed |
| 0 | findbugs | 0m 31s | common in master has 65 extant Findbugs warnings. |
| 0 | findbugs | 3m 38s | ql in master has 2310 extant Findbugs warnings. |
| 0 | findbugs | 0m 37s | itests/hive-unit in master has 2 extant Findbugs warnings. |
| +1 | javadoc | 7m 29s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 24s | Maven dependency ordering for patch |
| +1 | mvninstall | 8m 21s | the patch passed |
| +1 | compile | 7m 39s | the patch passed |
| +1 | javac | 7m 39s | the patch passed |
| -1 | checkstyle | 0m 40s | ql: The patch generated 12 new + 390 unchanged - 12 fixed = 402 total (was 402) |
| -1 | checkstyle | 0m 21s | itests/hive-unit: The patch generated 21 new + 760 unchanged - 14 fixed = 781 total (was 774) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| -1 | findbugs | 4m 1s | ql generated 2 new + 2309 unchanged - 1 fixed = 2311 total (was 2310) |
| +1 | javadoc | 7m 30s | the patch passed |
|| || || || Other Tests ||
| +1 | asflicense | 0m 11s | The patch does not generate ASF License warnings. |
| | | 60m 56s | |

|| Reason || Tests ||
| FindBugs | module:ql |
| | The field org.apache.hadoop.hive.ql.exec.repl.ReplLoadWork.pathsToCopyIterator is transient but isn't set by deserialization In ReplLoadWork.java:but isn't set by deserialization In ReplLoadWork.java |
| | Write to static field org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.numIteration from instance method org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.build(DriverContext, Hive, Logger, ReplLoadWork, TaskTracker) At IncrementalLoadTasksBuilder.java:from instance method org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.build(DriverContext, Hive, Logger, ReplLoadWork, TaskTracker) At IncrementalLoadTasksBuilder.java:[line 100] |

|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15364/dev-support/hive-personality.sh |
| git revision | master / ef7c396 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-15364/yetus/diff-checkstyle-ql.txt |
| checkstyle |
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723831#comment-16723831 ]

anishek commented on HIVE-20911:

Attaching the patch once again, since these tests pass on a local machine but fail in the Apache builds. Trying to provide the fully qualified path for the Avro schema file.
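As an illustration of the *\_external\_tables\_info* entry format given in the issue description (`tableName,base64Encoded(tableDataLocation)`) and of prefixing the target base directory to the source path, here is a minimal sketch. The class `ExternalTablesInfoSketch` and its methods are hypothetical and are not taken from the patch. Using UTF-8 with the standard Base64 alphabet, and dropping the source scheme and authority before prefixing, are assumptions of this sketch.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Illustrative only: a hypothetical helper, not a class from the Hive patch. */
final class ExternalTablesInfoSketch {

  private ExternalTablesInfoSketch() {
  }

  /** Builds one entry of the form tableName,base64Encoded(tableDataLocation). */
  static String encodeEntry(String tableName, String dataLocation) {
    String encoded = Base64.getEncoder()
        .encodeToString(dataLocation.getBytes(StandardCharsets.UTF_8));
    return tableName + "," + encoded;
  }

  /** Returns the table name, i.e. everything before the first comma. */
  static String tableName(String entry) {
    return entry.substring(0, separatorIndex(entry));
  }

  /** Decodes the data location stored after the first comma. */
  static String dataLocation(String entry) {
    byte[] raw = Base64.getDecoder().decode(entry.substring(separatorIndex(entry) + 1));
    return new String(raw, StandardCharsets.UTF_8);
  }

  /**
   * Prefixes the target base directory to the path component of the source location.
   * Dropping the source scheme and authority is an assumption of this sketch.
   */
  static String targetPath(String baseDir, String sourceLocation) {
    String sourcePath = URI.create(sourceLocation).getPath();
    String base = baseDir.endsWith("/")
        ? baseDir.substring(0, baseDir.length() - 1)
        : baseDir;
    return base + sourcePath;
  }

  private static int separatorIndex(String entry) {
    int comma = entry.indexOf(',');
    if (comma < 0) {
      throw new IllegalArgumentException("Expected tableName,base64(location): " + entry);
    }
    return comma;
  }

  public static void main(String[] args) {
    String entry = encodeEntry("t1", "hdfs://source-nn:8020/warehouse/ext/t1");
    System.out.println(entry);
    System.out.println(tableName(entry) + " -> "
        + targetPath("/replica_base", dataLocation(entry)));
  }
}
```

Because the standard Base64 alphabet contains no comma, splitting on the first comma is unambiguous as long as the table name itself contains none. A table whose partitions live in different locations would simply produce several entries with the same table name.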
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723780#comment-16723780 ]

Hive QA commented on HIVE-20911:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || master Compile Tests ||
| 0 | mvndep | 1m 30s | Maven dependency ordering for branch |
| +1 | mvninstall | 6m 44s | master passed |
| +1 | compile | 7m 27s | master passed |
| +1 | checkstyle | 1m 25s | master passed |
| 0 | findbugs | 0m 31s | common in master has 65 extant Findbugs warnings. |
| 0 | findbugs | 3m 43s | ql in master has 2310 extant Findbugs warnings. |
| 0 | findbugs | 0m 36s | itests/hive-unit in master has 2 extant Findbugs warnings. |
| +1 | javadoc | 7m 33s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 24s | Maven dependency ordering for patch |
| +1 | mvninstall | 8m 22s | the patch passed |
| +1 | compile | 7m 28s | the patch passed |
| +1 | javac | 7m 28s | the patch passed |
| -1 | checkstyle | 0m 39s | ql: The patch generated 12 new + 390 unchanged - 12 fixed = 402 total (was 402) |
| -1 | checkstyle | 0m 21s | itests/hive-unit: The patch generated 21 new + 714 unchanged - 10 fixed = 735 total (was 724) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| -1 | findbugs | 3m 48s | ql generated 2 new + 2309 unchanged - 1 fixed = 2311 total (was 2310) |
| +1 | javadoc | 7m 28s | the patch passed |
|| || || || Other Tests ||
| +1 | asflicense | 0m 11s | The patch does not generate ASF License warnings. |
| | | 60m 56s | |

|| Reason || Tests ||
| FindBugs | module:ql |
| | The field org.apache.hadoop.hive.ql.exec.repl.ReplLoadWork.pathsToCopyIterator is transient but isn't set by deserialization In ReplLoadWork.java:but isn't set by deserialization In ReplLoadWork.java |
| | Write to static field org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.numIteration from instance method org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.build(DriverContext, Hive, Logger, ReplLoadWork, TaskTracker) At IncrementalLoadTasksBuilder.java:from instance method org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.build(DriverContext, Hive, Logger, ReplLoadWork, TaskTracker) At IncrementalLoadTasksBuilder.java:[line 100] |

|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15361/dev-support/hive-personality.sh |
| git revision | master / 87f8ecc |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-15361/yetus/diff-checkstyle-ql.txt |
| checkstyle |
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723785#comment-16723785 ]

Hive QA commented on HIVE-20911:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12952129/HIVE-20911.05.patch

SUCCESS: +1 due to 6 test(s) being added or modified.

ERROR: -1 due to 4 failed/errored test(s), 15713 tests executed

*Failed tests:*
{noformat}
TestMiniLlapCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=154) [intersect_all.q,unionDistinct_1.q,table_nonprintable.q,orc_llap_counters1.q,mm_cttas.q,whroot_external1.q,global_limit.q,cte_2.q,rcfile_createas1.q,dynamic_partition_pruning_2.q,intersect_merge.q,results_cache_diff_fs.q,cttl.q,parallel_colstats.q,load_hdfs_file_with_space_in_the_name.q]
org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigration.testBootstrapLoadMigrationManagedToAcid (batchId=243)
org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigration.testIncrementalLoadMigrationManagedToAcidFailure (batchId=243)
org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigration.testIncrementalLoadMigrationManagedToAcidFailurePart (batchId=243)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/15361/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15361/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15361/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12952129 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722892#comment-16722892 ]

Hive QA commented on HIVE-20911:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12952004/HIVE-20911.04.patch

SUCCESS: +1 due to 6 test(s) being added or modified.

ERROR: -1 due to 6 failed/errored test(s), 15723 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[repl_2_exim_basic] (batchId=85)
org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigration.testBootstrapLoadMigrationManagedToAcid (batchId=243)
org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigration.testIncrementalLoadMigrationManagedToAcidFailure (batchId=243)
org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigration.testIncrementalLoadMigrationManagedToAcidFailurePart (batchId=243)
org.apache.hive.jdbc.TestJdbcWithMiniLlapArrow.testComplexQuery (batchId=258)
org.apache.hive.jdbc.TestJdbcWithMiniLlapArrow.testKillQuery (batchId=258)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/15350/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15350/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15350/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12952004 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722885#comment-16722885 ]

Hive QA commented on HIVE-20911:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || master Compile Tests ||
| 0 | mvndep | 1m 20s | Maven dependency ordering for branch |
| +1 | mvninstall | 6m 40s | master passed |
| +1 | compile | 7m 30s | master passed |
| +1 | checkstyle | 1m 28s | master passed |
| 0 | findbugs | 0m 32s | common in master has 65 extant Findbugs warnings. |
| 0 | findbugs | 3m 40s | ql in master has 2310 extant Findbugs warnings. |
| 0 | findbugs | 0m 35s | itests/hive-unit in master has 2 extant Findbugs warnings. |
| +1 | javadoc | 7m 34s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 24s | Maven dependency ordering for patch |
| +1 | mvninstall | 8m 24s | the patch passed |
| +1 | compile | 7m 23s | the patch passed |
| +1 | javac | 7m 23s | the patch passed |
| -1 | checkstyle | 0m 38s | ql: The patch generated 12 new + 390 unchanged - 12 fixed = 402 total (was 402) |
| -1 | checkstyle | 0m 20s | itests/hive-unit: The patch generated 21 new + 714 unchanged - 10 fixed = 735 total (was 724) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| -1 | findbugs | 3m 56s | ql generated 2 new + 2309 unchanged - 1 fixed = 2311 total (was 2310) |
| +1 | javadoc | 7m 30s | the patch passed |
|| || || || Other Tests ||
| +1 | asflicense | 0m 11s | The patch does not generate ASF License warnings. |
| | | 60m 58s | |

|| Reason || Tests ||
| FindBugs | module:ql |
| | The field org.apache.hadoop.hive.ql.exec.repl.ReplLoadWork.pathsToCopyIterator is transient but isn't set by deserialization In ReplLoadWork.java:but isn't set by deserialization In ReplLoadWork.java |
| | Write to static field org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.numIteration from instance method org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.build(DriverContext, Hive, Logger, ReplLoadWork, TaskTracker) At IncrementalLoadTasksBuilder.java:from instance method org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.build(DriverContext, Hive, Logger, ReplLoadWork, TaskTracker) At IncrementalLoadTasksBuilder.java:[line 100] |

|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15350/dev-support/hive-personality.sh |
| git revision | master / 4e41560 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-15350/yetus/diff-checkstyle-ql.txt |
| checkstyle |
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721601#comment-16721601 ]

Hive QA commented on HIVE-20911:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12951813/HIVE-20911.03.patch

ERROR: -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/15326/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15326/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15326/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2018-12-14 16:56:29.365
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-15326/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2018-12-14 16:56:29.428
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
 687aeef..64930f8 master -> origin/master
+ git reset --hard HEAD
HEAD is now at 687aeef HIVE-21035: Race condition in SparkUtilities#getSparkSession (Antal Sinkovits, reviewed by Adam Szita, Denys Kuzmenko)
+ git clean -f -d
Removing standalone-metastore/metastore-server/src/gen/
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
 (use "git pull" to update your local branch)
+ git reset --hard origin/master
HEAD is now at 64930f8 HIVE-21028: Adding a JDO fetch plan for getTableMeta get_table_meta to avoid race condition(Karthik Manamcheri, reviewed by Adam Holley, Vihang K and Naveen G)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2018-12-14 16:56:39.652
+ rm -rf ../yetus_PreCommit-HIVE-Build-15326
+ mkdir ../yetus_PreCommit-HIVE-Build-15326
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-15326
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-15326/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch
error: a/common/src/java/org/apache/hadoop/hive/common/FileUtils.java: does not exist in index
error: a/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java: does not exist in index
error: a/itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java: does not exist in index
error: a/itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcrossInstances.java: does not exist in index
error: a/itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosIncrementalLoadAcidTables.java: does not exist in index
error: a/itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/WarehouseInstance.java: does not exist in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/Context.java: does not exist in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java: does not exist in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java: does not exist in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java: does not exist in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadWork.java: does not exist in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/events/filesystem/BootstrapEventsIterator.java: does not exist in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/events/filesystem/DatabaseEventsIterator.java: does not exist in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/events/filesystem/FSTableEvent.java: does not exist in index
error:
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720301#comment-16720301 ]

Hive QA commented on HIVE-20911:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12951649/HIVE-20911.02.patch

SUCCESS: +1 due to 5 test(s) being added or modified.

ERROR: -1 due to 2 failed/errored test(s), 15659 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[repl_2_exim_basic] (batchId=85)
org.apache.hadoop.hive.ql.exec.repl.TestReplDumpTask.removeDBPropertyToPreventRenameWhenBootstrapDumpOfTableFails (batchId=315)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/15298/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15298/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15298/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12951649 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720255#comment-16720255 ]

Hive QA commented on HIVE-20911:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || master Compile Tests ||
| 0 | mvndep | 1m 26s | Maven dependency ordering for branch |
| +1 | mvninstall | 6m 31s | master passed |
| +1 | compile | 1m 53s | master passed |
| +1 | checkstyle | 1m 16s | master passed |
| 0 | findbugs | 0m 30s | common in master has 65 extant Findbugs warnings. |
| 0 | findbugs | 3m 42s | ql in master has 2310 extant Findbugs warnings. |
| 0 | findbugs | 0m 38s | itests/hive-unit in master has 2 extant Findbugs warnings. |
| +1 | javadoc | 1m 29s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 27s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 15s | the patch passed |
| +1 | compile | 1m 53s | the patch passed |
| +1 | javac | 1m 53s | the patch passed |
| -1 | checkstyle | 0m 38s | ql: The patch generated 11 new + 389 unchanged - 13 fixed = 400 total (was 402) |
| -1 | checkstyle | 0m 22s | itests/hive-unit: The patch generated 23 new + 766 unchanged - 10 fixed = 789 total (was 776) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| -1 | findbugs | 3m 56s | ql generated 2 new + 2309 unchanged - 1 fixed = 2311 total (was 2310) |
| +1 | javadoc | 1m 28s | the patch passed |
|| || || || Other Tests ||
| +1 | asflicense | 0m 12s | The patch does not generate ASF License warnings. |
| | | 31m 9s | |

|| Reason || Tests ||
| FindBugs | module:ql |
| | The field org.apache.hadoop.hive.ql.exec.repl.ReplLoadWork.pathsToCopyIterator is transient but isn't set by deserialization In ReplLoadWork.java:but isn't set by deserialization In ReplLoadWork.java |
| | Write to static field org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.numIteration from instance method org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.build(DriverContext, Hive, Logger, ReplLoadWork, TaskTracker) At IncrementalLoadTasksBuilder.java:from instance method org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.build(DriverContext, Hive, Logger, ReplLoadWork, TaskTracker) At IncrementalLoadTasksBuilder.java:[line 100] |

|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15298/dev-support/hive-personality.sh |
| git revision | master / b5b6371 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-15298/yetus/diff-checkstyle-ql.txt |
| checkstyle |
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718531#comment-16718531 ]

Hive QA commented on HIVE-20911:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12951461/HIVE-20911.01.patch

SUCCESS: +1 due to 4 test(s) being added or modified.

ERROR: -1 due to 20 failed/errored test(s), 15659 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[repl_2_exim_basic] (batchId=85)
org.apache.hadoop.hive.ql.exec.repl.TestReplDumpTask.removeDBPropertyToPreventRenameWhenBootstrapDumpOfTableFails (batchId=315)
org.apache.hadoop.hive.ql.parse.TestReplAcidTablesWithJsonMessage.testAcidBootstrapReplLoadRetryAfterFailure (batchId=248)
org.apache.hadoop.hive.ql.parse.TestReplAcidTablesWithJsonMessage.testAcidTablesBootstrap (batchId=248)
org.apache.hadoop.hive.ql.parse.TestReplAcidTablesWithJsonMessage.testAcidTablesBootstrapWithConcurrentWrites (batchId=248)
org.apache.hadoop.hive.ql.parse.TestReplAcidTablesWithJsonMessage.testAcidTablesBootstrapWithOpenTxnsTimeout (batchId=248)
org.apache.hadoop.hive.ql.parse.TestReplAcidTablesWithJsonMessage.testAcidTablesMoveOptimizationBootStrap (batchId=248)
org.apache.hadoop.hive.ql.parse.TestReplAcidTablesWithJsonMessage.testDumpAcidTableWithPartitionDirMissing (batchId=248)
org.apache.hadoop.hive.ql.parse.TestReplAcidTablesWithJsonMessage.testDumpAcidTableWithTableDirMissing (batchId=248)
org.apache.hadoop.hive.ql.parse.TestReplAcidTablesWithJsonMessage.testMultiDBTxn (batchId=248)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcidTables.testAcidBootstrapReplLoadRetryAfterFailure (batchId=245)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcidTables.testAcidTablesBootstrap (batchId=245)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcidTables.testAcidTablesBootstrapWithConcurrentWrites (batchId=245)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcidTables.testAcidTablesBootstrapWithOpenTxnsTimeout (batchId=245)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcidTables.testAcidTablesMoveOptimizationBootStrap (batchId=245)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcidTables.testDumpAcidTableWithPartitionDirMissing (batchId=245)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcidTables.testDumpAcidTableWithTableDirMissing (batchId=245)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcidTables.testMultiDBTxn (batchId=245)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosIncrementalLoadAcidTables.testMigrationManagedToAcid (batchId=246)
org.apache.hive.service.TestDFSErrorHandling.testAccessDenied (batchId=254)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/15273/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15273/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15273/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 20 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12951461 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718507#comment-16718507 ]

Hive QA commented on HIVE-20911:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || master Compile Tests ||
| 0 | mvndep | 1m 36s | Maven dependency ordering for branch |
| +1 | mvninstall | 6m 31s | master passed |
| +1 | compile | 1m 54s | master passed |
| +1 | checkstyle | 1m 15s | master passed |
| 0 | findbugs | 0m 32s | common in master has 65 extant Findbugs warnings. |
| 0 | findbugs | 3m 37s | ql in master has 2311 extant Findbugs warnings. |
| 0 | findbugs | 0m 38s | itests/hive-unit in master has 2 extant Findbugs warnings. |
| +1 | javadoc | 1m 26s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 23s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 12s | the patch passed |
| +1 | compile | 1m 55s | the patch passed |
| +1 | javac | 1m 55s | the patch passed |
| -1 | checkstyle | 0m 16s | common: The patch generated 1 new + 454 unchanged - 0 fixed = 455 total (was 454) |
| -1 | checkstyle | 0m 39s | ql: The patch generated 17 new + 329 unchanged - 12 fixed = 346 total (was 341) |
| -1 | checkstyle | 0m 21s | itests/hive-unit: The patch generated 23 new + 708 unchanged - 8 fixed = 731 total (was 716) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| -1 | findbugs | 4m 0s | ql generated 4 new + 2310 unchanged - 1 fixed = 2314 total (was 2311) |
| +1 | javadoc | 1m 31s | the patch passed |
|| || || || Other Tests ||
| +1 | asflicense | 0m 13s | The patch does not generate ASF License warnings. |
| | | 31m 17s | |

|| Reason || Tests ||
| FindBugs | module:ql |
| | Found reliance on default encoding in org.apache.hadoop.hive.ql.exec.repl.ReplExternalTables$Reader.reader(FileSystem, Path):in org.apache.hadoop.hive.ql.exec.repl.ReplExternalTables$Reader.reader(FileSystem, Path): new java.io.InputStreamReader(InputStream) At ReplExternalTables.java:[line 214] |
| | The field org.apache.hadoop.hive.ql.exec.repl.ReplLoadWork.pathsToCopyIterator is transient but isn't set by deserialization In ReplLoadWork.java:but isn't set by deserialization In ReplLoadWork.java |
| | Write to static field org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.numIteration from instance method org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.build(DriverContext, Hive, Logger, ReplLoadWork, TaskTracker) At IncrementalLoadTasksBuilder.java:from instance method org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder.build(DriverContext, Hive, Logger, ReplLoadWork, TaskTracker) At IncrementalLoadTasksBuilder.java:[line 100] |
| | Exception is caught when Exception is not thrown in org.apache.hadoop.hive.ql.parse.repl.load.message.TableHandler.handle(MessageHandler$Context) At
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718482#comment-16718482 ]

ASF GitHub Bot commented on HIVE-20911:
---

GitHub user anishek opened a pull request:

 https://github.com/apache/hive/pull/506

 HIVE-20911: External Table Replication for Hive

You can merge this pull request into a Git repository by running:

 $ git pull https://github.com/anishek/hive HIVE-20911

Alternatively you can review and apply these changes as the patch at:

 https://github.com/apache/hive/pull/506.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

 This closes #506

commit 57fa9f0c2c3e00e92b8ba472b05668ff3767c5f4
Author: Anishek Agarwal
Date: 2018-10-30T08:24:41Z

 HIVE-20911: External Table Replication for Hive
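The issue description says the base directory configured on the target cluster is prefixed to the external table's path on the source cluster. A minimal sketch of that mapping follows. The helper name `target_location` is hypothetical, and dropping the source URI's scheme and authority before prefixing is an assumption made here for illustration, not something the description specifies.

```python
import posixpath
from urllib.parse import urlparse


def target_location(base_dir: str, source_location: str) -> str:
    """Prefix the target base directory to the source table path."""
    # Assumption: only the path component of the source location is
    # kept; the scheme and name node address (if any) are dropped.
    source_path = urlparse(source_location).path
    # Normalise so that a base directory of "/" maps to the same path.
    base = base_dir.rstrip("/") or "/"
    return posixpath.join(base, source_path.lstrip("/"))
```

With a base directory of `/replica/ext`, a source location `hdfs://src-nn:8020/warehouse/ext/t1` would map to `/replica/ext/warehouse/ext/t1` under these assumptions; a base directory of `/` leaves the path unchanged.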
[jira] [Commented] (HIVE-20911) External Table Replication for Hive
[ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718483#comment-16718483 ]

anishek commented on HIVE-20911:
---

Submitting initial patch for tests. [~maheshk114]/[~sankarh]/[~ashutosh.bapat], please review!
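The issue description states that the target cluster's base directory is prefixed to the path of the same external table on the source cluster. The Java sketch below shows one way that mapping could look. It is an illustration under assumptions, not the patch's implementation: the class and method names are hypothetical, and dropping the source scheme and authority (for example `hdfs://source-nn:8020`) before prefixing is assumed rather than confirmed by the description.

```java
import java.net.URI;

// Hypothetical helper, not part of Hive: maps a source data location
// to a location under the target cluster's base directory.
class ExternalTableTargetPathSketch {

    // Assumption: only the path component of the source location is kept;
    // scheme and authority of the source cluster are dropped.
    // Note: URI.create rejects unescaped characters such as spaces.
    static String targetLocation(String baseDir, String sourceLocation) {
        String sourcePath = URI.create(sourceLocation).getPath();
        // Avoid a double slash when the base directory ends with "/".
        String base = baseDir.endsWith("/")
                ? baseDir.substring(0, baseDir.length() - 1)
                : baseDir;
        return base + sourcePath;
    }

    public static void main(String[] args) {
        // prints "/replica/external/data/sales/orders"
        System.out.println(
            targetLocation("/replica/external", "hdfs://source-nn:8020/data/sales/orders"));
    }
}
```

Because the source path is kept intact under the base directory, two external tables with different source locations cannot collide on the target.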