[jira] [Commented] (HIVE-21292) Break up DDLTask 1 - extract Database related operations

2019-02-24 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776598#comment-16776598
 ] 

Hive QA commented on HIVE-21292:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
43s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
13s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
17s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
17s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
10s{color} | {color:blue} ql in master has 2261 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
39s{color} | {color:blue} hcatalog/core in master has 29 extant Findbugs 
warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
45s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
45s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
29s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
46s{color} | {color:green} ql: The patch generated 0 new + 507 unchanged - 25 
fixed = 507 total (was 532) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} hcatalog/core: The patch generated 0 new + 40 
unchanged - 2 fixed = 40 total (was 42) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} The patch hive-unit passed checkstyle {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
13s{color} | {color:green} ql generated 0 new + 2260 unchanged - 1 fixed = 2260 
total (was 2261) {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
43s{color} | {color:green} core in the patch passed. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
51s{color} | {color:green} hive-unit in the patch passed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
37s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 34m 53s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-16229/dev-support/hive-personality.sh
 |
| git revision | master / 2daaed7 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| modules | C: ql hcatalog/core itests/hive-unit U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16229/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Break up DDLTask 1 - extract Database related operations
> 
>
> Key: HIVE-21292
> URL: https://issues.apache.org/jira/browse/HIVE-21292
> Project: Hive
>  Issue Type: Improvement

[jira] [Updated] (HIVE-21292) Break up DDLTask 1 - extract Database related operations

2019-02-24 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-21292:
--
Status: Patch Available  (was: Open)

> Break up DDLTask 1 - extract Database related operations
> 
>
> Key: HIVE-21292
> URL: https://issues.apache.org/jira/browse/HIVE-21292
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21292.01.patch, HIVE-21292.02.patch, 
> HIVE-21292.03.patch, HIVE-21292.04.patch, HIVE-21292.05.patch, 
> HIVE-21292.06.patch, HIVE-21292.07.patch, HIVE-21292.08.patch, 
> HIVE-21292.09.patch
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> DDLTask is a huge class, more than 5000 lines long. The related DDLWork is 
> also a huge class, which has a field for each DDL operation it supports. The 
> goal is to refactor these in order to cut everything into more manageable 
> classes under the package org.apache.hadoop.hive.ql.exec.ddl:
>  * have a separate class for each operation
>  * have a package for each operation group (database DDL, table DDL, etc.), 
> so the number of classes under a package stays manageable
>  * make all the requests (DDLDesc subclasses) immutable
>  * DDLTask should be agnostic to the actual operations
>  * for now, ignore the issue that some operations handled by DDLTask are not 
> actual DDL operations (lock, unlock, desc...)
> In the interim, while there are two DDLTask and DDLWork classes in the code 
> base, the new ones in the new package are called DDLTask2 and DDLWork2, thus 
> avoiding the use of fully qualified class names where both the old and the 
> new classes are in use.
> Step #1: extract all the database-related operations from the old DDLTask 
> and move them under the new package. Also create the new internal framework.
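
For illustration, a minimal sketch of the structure described above. The names DDLOperation, CreateDatabaseDesc / CreateDatabaseOperation and the dispatch registry are assumptions mirroring the description, not taken from the actual patch:

{noformat}
// Illustrative sketch only, not the HIVE-21292 patch.
package org.apache.hadoop.hive.ql.exec.ddl;

import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Immutable request object: one DDLDesc subclass per operation. */
interface DDLDesc { }

final class CreateDatabaseDesc implements DDLDesc {
  private final String name;      // set once in the constructor,
  private final String comment;   // no setters: the desc is immutable

  CreateDatabaseDesc(String name, String comment) {
    this.name = name;
    this.comment = comment;
  }

  String getName() { return name; }
  String getComment() { return comment; }
}

/** One class per operation, grouped into per-topic packages (database, table, ...). */
abstract class DDLOperation<T extends DDLDesc> {
  protected final T desc;

  DDLOperation(T desc) { this.desc = desc; }

  abstract int execute() throws Exception;
}

final class CreateDatabaseOperation extends DDLOperation<CreateDatabaseDesc> {
  CreateDatabaseOperation(CreateDatabaseDesc desc) { super(desc); }

  @Override
  int execute() {
    // A real implementation would call the metastore client here.
    System.out.println("CREATE DATABASE " + desc.getName());
    return 0;
  }
}

/** DDLTask2 stays agnostic to the operations: it only dispatches by desc type. */
final class DDLTask2 {
  private static final Map<Class<? extends DDLDesc>, Function<DDLDesc, DDLOperation<?>>>
      OPERATIONS = new HashMap<>();

  static {
    OPERATIONS.put(CreateDatabaseDesc.class,
        desc -> new CreateDatabaseOperation((CreateDatabaseDesc) desc));
  }

  int execute(DDLDesc desc) throws Exception {
    return OPERATIONS.get(desc.getClass()).apply(desc).execute();
  }
}
{noformat}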



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21292) Break up DDLTask 1 - extract Database related operations

2019-02-24 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-21292:
--
Attachment: HIVE-21292.09.patch

> Break up DDLTask 1 - extract Database related operations
> 
>
> Key: HIVE-21292
> URL: https://issues.apache.org/jira/browse/HIVE-21292
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21292.01.patch, HIVE-21292.02.patch, 
> HIVE-21292.03.patch, HIVE-21292.04.patch, HIVE-21292.05.patch, 
> HIVE-21292.06.patch, HIVE-21292.07.patch, HIVE-21292.08.patch, 
> HIVE-21292.09.patch
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> DDLTask is a huge class, more than 5000 lines long. The related DDLWork is 
> also a huge class, which has a field for each DDL operation it supports. The 
> goal is to refactor these in order to cut everything into more manageable 
> classes under the package org.apache.hadoop.hive.ql.exec.ddl:
>  * have a separate class for each operation
>  * have a package for each operation group (database DDL, table DDL, etc.), 
> so the number of classes under a package stays manageable
>  * make all the requests (DDLDesc subclasses) immutable
>  * DDLTask should be agnostic to the actual operations
>  * for now, ignore the issue that some operations handled by DDLTask are not 
> actual DDL operations (lock, unlock, desc...)
> In the interim, while there are two DDLTask and DDLWork classes in the code 
> base, the new ones in the new package are called DDLTask2 and DDLWork2, thus 
> avoiding the use of fully qualified class names where both the old and the 
> new classes are in use.
> Step #1: extract all the database-related operations from the old DDLTask 
> and move them under the new package. Also create the new internal framework.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21292) Break up DDLTask 1 - extract Database related operations

2019-02-24 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-21292:
--
Status: Open  (was: Patch Available)

> Break up DDLTask 1 - extract Database related operations
> 
>
> Key: HIVE-21292
> URL: https://issues.apache.org/jira/browse/HIVE-21292
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21292.01.patch, HIVE-21292.02.patch, 
> HIVE-21292.03.patch, HIVE-21292.04.patch, HIVE-21292.05.patch, 
> HIVE-21292.06.patch, HIVE-21292.07.patch, HIVE-21292.08.patch, 
> HIVE-21292.09.patch
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> DDLTask is a huge class, more than 5000 lines long. The related DDLWork is 
> also a huge class, which has a field for each DDL operation it supports. The 
> goal is to refactor these in order to cut everything into more manageable 
> classes under the package org.apache.hadoop.hive.ql.exec.ddl:
>  * have a separate class for each operation
>  * have a package for each operation group (database DDL, table DDL, etc.), 
> so the number of classes under a package stays manageable
>  * make all the requests (DDLDesc subclasses) immutable
>  * DDLTask should be agnostic to the actual operations
>  * for now, ignore the issue that some operations handled by DDLTask are not 
> actual DDL operations (lock, unlock, desc...)
> In the interim, while there are two DDLTask and DDLWork classes in the code 
> base, the new ones in the new package are called DDLTask2 and DDLWork2, thus 
> avoiding the use of fully qualified class names where both the old and the 
> new classes are in use.
> Step #1: extract all the database-related operations from the old DDLTask 
> and move them under the new package. Also create the new internal framework.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776524#comment-16776524
 ] 

Hive QA commented on HIVE-21197:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12959970/HIVE-21197.03.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15816 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16228/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16228/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16228/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12959970 - PreCommit-HIVE-Build

> Hive replication can add duplicate data during migration to a target with 
> hive.strict.managed.tables enabled
> 
>
> Key: HIVE-21197
> URL: https://issues.apache.org/jira/browse/HIVE-21197
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch, 
> HIVE-21197.03.patch
>
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> During the bootstrap phase it may happen that files copied to the target 
> were created by events which are not part of the bootstrap. This is because 
> bootstrap first gets the last event id and then the file list. If some 
> events are added during this period, bootstrap will also include the files 
> created by those events. The same files will then be copied again during the 
> first incremental replication just after the bootstrap. In a normal scenario 
> the duplicate copy does not cause any issue, as Hive allows the use of the 
> target database only after the first incremental. But in the migration case, 
> the files at source and target are copied to different locations (based on 
> the write id at the target), so this may lead to duplicate data at the 
> target. This can be avoided by a load-time check for duplicate files. The 
> check needs to be done only for the first incremental, and the search can be 
> limited to the bootstrap directory (with write id 1). If the file is already 
> present, just skip the copy.
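
A minimal sketch of such a load-time check, assuming a hypothetical DuplicateCopyCheck helper and a hypothetical base_0000001 directory layout for the write-id-1 bootstrap files; only the Hadoop FileSystem calls are real API:

{noformat}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

final class DuplicateCopyCheck {

  /**
   * During the first incremental load, skip a file that already landed under
   * the table's bootstrap directory (write id 1).
   */
  static boolean shouldSkipCopy(boolean firstIncremental, Path tableRoot,
      String fileName, Configuration conf) throws IOException {
    if (!firstIncremental) {
      return false; // later incrementals cannot overlap with the bootstrap dump
    }
    // Assumed layout: <tableRoot>/base_0000001/<fileName> holds the files the
    // bootstrap load wrote with write id 1.
    Path bootstrapFile = new Path(tableRoot, "base_0000001/" + fileName);
    FileSystem fs = bootstrapFile.getFileSystem(conf);
    return fs.exists(bootstrapFile);
  }
}
{noformat}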



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776522#comment-16776522
 ] 

Hive QA commented on HIVE-21197:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
16s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
46s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
13s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
20s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  1m 
12s{color} | {color:blue} standalone-metastore/metastore-server in master has 
181 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  
3s{color} | {color:blue} ql in master has 2261 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
47s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
49s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
28s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
17s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
42s{color} | {color:red} ql: The patch generated 1 new + 325 unchanged - 0 
fixed = 326 total (was 325) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
20s{color} | {color:red} itests/hive-unit: The patch generated 71 new + 272 
unchanged - 0 fixed = 343 total (was 272) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 37m 36s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-16228/dev-support/hive-personality.sh
 |
| git revision | master / 2daaed7 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16228/yetus/diff-checkstyle-ql.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16228/yetus/diff-checkstyle-itests_hive-unit.txt
 |
| modules | C: standalone-metastore/metastore-server ql itests/hive-unit U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16228/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Hive replication can add duplicate data during migration to a target with 
> hive.strict.managed.tables enabled
> 
>
> Key: HIVE-21197
> URL: https://issues.apache.org/jira/browse/HIVE-21197
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major

[jira] [Assigned] (HIVE-21314) Hive Replication not retaining the owner in the replicated table

2019-02-24 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-21314:
--


> Hive Replication not retaining the owner in the replicated table
> 
>
> Key: HIVE-21314
> URL: https://issues.apache.org/jira/browse/HIVE-21314
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> Hive replication is not retaining the owner of the replicated table. The 
> owner of the target table is set to the user executing the load command. 
> The owner information should instead be read from the dump metadata and used 
> while creating the table at the target cluster.
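
A minimal sketch of that fix, assuming a hypothetical DumpMetadata accessor over the dump's metadata files; Table#setOwner is the real metastore API:

{noformat}
import org.apache.hadoop.hive.metastore.api.Table;

final class OwnerRetention {

  /** Hypothetical accessor over the replication dump's metadata files. */
  interface DumpMetadata {
    String getTableOwner(String tableName);
  }

  static void applyOwner(Table targetTable, DumpMetadata dumpMetadata) {
    String ownerFromDump = dumpMetadata.getTableOwner(targetTable.getTableName());
    if (ownerFromDump != null && !ownerFromDump.isEmpty()) {
      // Use the owner recorded at dump time, not the user running REPL LOAD.
      targetTable.setOwner(ownerFromDump);
    }
  }
}
{noformat}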



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-21197:
---
Attachment: (was: HIVE-21197.03.patch)

> Hive replication can add duplicate data during migration to a target with 
> hive.strict.managed.tables enabled
> 
>
> Key: HIVE-21197
> URL: https://issues.apache.org/jira/browse/HIVE-21197
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch, 
> HIVE-21197.03.patch
>
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> During the bootstrap phase it may happen that files copied to the target 
> were created by events which are not part of the bootstrap. This is because 
> bootstrap first gets the last event id and then the file list. If some 
> events are added during this period, bootstrap will also include the files 
> created by those events. The same files will then be copied again during the 
> first incremental replication just after the bootstrap. In a normal scenario 
> the duplicate copy does not cause any issue, as Hive allows the use of the 
> target database only after the first incremental. But in the migration case, 
> the files at source and target are copied to different locations (based on 
> the write id at the target), so this may lead to duplicate data at the 
> target. This can be avoided by a load-time check for duplicate files. The 
> check needs to be done only for the first incremental, and the search can be 
> limited to the bootstrap directory (with write id 1). If the file is already 
> present, just skip the copy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-21197:
---
Status: Open  (was: Patch Available)

> Hive replication can add duplicate data during migration to a target with 
> hive.strict.managed.tables enabled
> 
>
> Key: HIVE-21197
> URL: https://issues.apache.org/jira/browse/HIVE-21197
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch, 
> HIVE-21197.03.patch
>
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> During the bootstrap phase it may happen that files copied to the target 
> were created by events which are not part of the bootstrap. This is because 
> bootstrap first gets the last event id and then the file list. If some 
> events are added during this period, bootstrap will also include the files 
> created by those events. The same files will then be copied again during the 
> first incremental replication just after the bootstrap. In a normal scenario 
> the duplicate copy does not cause any issue, as Hive allows the use of the 
> target database only after the first incremental. But in the migration case, 
> the files at source and target are copied to different locations (based on 
> the write id at the target), so this may lead to duplicate data at the 
> target. This can be avoided by a load-time check for duplicate files. The 
> check needs to be done only for the first incremental, and the search can be 
> limited to the bootstrap directory (with write id 1). If the file is already 
> present, just skip the copy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-21197:
---
Attachment: HIVE-21197.03.patch

> Hive replication can add duplicate data during migration to a target with 
> hive.strict.managed.tables enabled
> 
>
> Key: HIVE-21197
> URL: https://issues.apache.org/jira/browse/HIVE-21197
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch, 
> HIVE-21197.03.patch
>
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> During the bootstrap phase it may happen that files copied to the target 
> were created by events which are not part of the bootstrap. This is because 
> bootstrap first gets the last event id and then the file list. If some 
> events are added during this period, bootstrap will also include the files 
> created by those events. The same files will then be copied again during the 
> first incremental replication just after the bootstrap. In a normal scenario 
> the duplicate copy does not cause any issue, as Hive allows the use of the 
> target database only after the first incremental. But in the migration case, 
> the files at source and target are copied to different locations (based on 
> the write id at the target), so this may lead to duplicate data at the 
> target. This can be avoided by a load-time check for duplicate files. The 
> check needs to be done only for the first incremental, and the search can be 
> limited to the bootstrap directory (with write id 1). If the file is already 
> present, just skip the copy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-21197:
---
Status: Patch Available  (was: Open)

> Hive replication can add duplicate data during migration to a target with 
> hive.strict.managed.tables enabled
> 
>
> Key: HIVE-21197
> URL: https://issues.apache.org/jira/browse/HIVE-21197
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch, 
> HIVE-21197.03.patch
>
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> During the bootstrap phase it may happen that files copied to the target 
> were created by events which are not part of the bootstrap. This is because 
> bootstrap first gets the last event id and then the file list. If some 
> events are added during this period, bootstrap will also include the files 
> created by those events. The same files will then be copied again during the 
> first incremental replication just after the bootstrap. In a normal scenario 
> the duplicate copy does not cause any issue, as Hive allows the use of the 
> target database only after the first incremental. But in the migration case, 
> the files at source and target are copied to different locations (based on 
> the write id at the target), so this may lead to duplicate data at the 
> target. This can be avoided by a load-time check for duplicate files. The 
> check needs to be done only for the first incremental, and the search can be 
> limited to the bootstrap directory (with write id 1). If the file is already 
> present, just skip the copy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HIVE-21312) FSStatsAggregator::connect is slow

2019-02-24 Thread Rajesh Balamohan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776457#comment-16776457
 ] 

Rajesh Balamohan edited comment on HIVE-21312 at 2/25/19 2:22 AM:
--

Thanks [~kgyrtkirk]. I have made it a thread-safe queue.

In my local run, the runtime for this went down from 420 seconds to 20 seconds.
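
A minimal sketch of that kind of change, under assumed names (ParallelConnect, readStats): it fans the per-directory reads out to a thread pool and collects results into a thread-safe ConcurrentLinkedQueue, so the workers need no external locking:

{noformat}
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ParallelConnect {

  /** Reads every stats directory in parallel; the queue handles concurrent adds. */
  static Queue<String> loadStatsFiles(List<String> statsDirs)
      throws InterruptedException {
    Queue<String> results = new ConcurrentLinkedQueue<>();
    if (statsDirs.isEmpty()) {
      return results;
    }
    ExecutorService pool = Executors.newFixedThreadPool(
        Math.min(statsDirs.size(), Runtime.getRuntime().availableProcessors()));
    for (String dir : statsDirs) {
      pool.execute(() -> results.add(readStats(dir))); // concurrent producers
    }
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.MINUTES);
    return results;
  }

  private static String readStats(String dir) {
    // Stand-in for the expensive per-directory filesystem read.
    return "stats:" + dir;
  }
}
{noformat}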


was (Author: rajesh.balamohan):
Thanks [~kgyrtkirk]. I have made it a thread-safe queue.

> FSStatsAggregator::connect is slow
> --
>
> Key: HIVE-21312
> URL: https://issues.apache.org/jira/browse/HIVE-21312
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Reporter: Rajesh Balamohan
>Priority: Trivial
> Attachments: HIVE-21312.1.patch, HIVE-21312.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21312) FSStatsAggregator::connect is slow

2019-02-24 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776465#comment-16776465
 ] 

Hive QA commented on HIVE-21312:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
36s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
5s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
17s{color} | {color:blue} ql in master has 2261 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
4s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
39s{color} | {color:red} ql: The patch generated 5 new + 9 unchanged - 5 fixed 
= 14 total (was 14) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 25m 15s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-16227/dev-support/hive-personality.sh
 |
| git revision | master / 2daaed7 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16227/yetus/diff-checkstyle-ql.txt
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16227/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> FSStatsAggregator::connect is slow
> --
>
> Key: HIVE-21312
> URL: https://issues.apache.org/jira/browse/HIVE-21312
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Reporter: Rajesh Balamohan
>Priority: Trivial
> Attachments: HIVE-21312.1.patch, HIVE-21312.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21312) FSStatsAggregator::connect is slow

2019-02-24 Thread Rajesh Balamohan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776457#comment-16776457
 ] 

Rajesh Balamohan commented on HIVE-21312:
-

Thanks [~kgyrtkirk]. I have made it a thread-safe queue.

> FSStatsAggregator::connect is slow
> --
>
> Key: HIVE-21312
> URL: https://issues.apache.org/jira/browse/HIVE-21312
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Reporter: Rajesh Balamohan
>Priority: Trivial
> Attachments: HIVE-21312.1.patch, HIVE-21312.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21312) FSStatsAggregator::connect is slow

2019-02-24 Thread Rajesh Balamohan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-21312:

Attachment: HIVE-21312.2.patch

> FSStatsAggregator::connect is slow
> --
>
> Key: HIVE-21312
> URL: https://issues.apache.org/jira/browse/HIVE-21312
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Reporter: Rajesh Balamohan
>Priority: Trivial
> Attachments: HIVE-21312.1.patch, HIVE-21312.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21292) Break up DDLTask 1 - extract Database related operations

2019-02-24 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776450#comment-16776450
 ] 

Hive QA commented on HIVE-21292:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12959958/HIVE-21292.08.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15811 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16226/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16226/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16226/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12959958 - PreCommit-HIVE-Build

> Break up DDLTask 1 - extract Database related operations
> 
>
> Key: HIVE-21292
> URL: https://issues.apache.org/jira/browse/HIVE-21292
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21292.01.patch, HIVE-21292.02.patch, 
> HIVE-21292.03.patch, HIVE-21292.04.patch, HIVE-21292.05.patch, 
> HIVE-21292.06.patch, HIVE-21292.07.patch, HIVE-21292.08.patch
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> DDLTask is a huge class, more than 5000 lines long. The related DDLWork is 
> also a huge class, which has a field for each DDL operation it supports. The 
> goal is to refactor these in order to cut everything into more manageable 
> classes under the package org.apache.hadoop.hive.ql.exec.ddl:
>  * have a separate class for each operation
>  * have a package for each operation group (database DDL, table DDL, etc.), 
> so the number of classes under a package stays manageable
>  * make all the requests (DDLDesc subclasses) immutable
>  * DDLTask should be agnostic to the actual operations
>  * for now, ignore the issue that some operations handled by DDLTask are not 
> actual DDL operations (lock, unlock, desc...)
> In the interim, while there are two DDLTask and DDLWork classes in the code 
> base, the new ones in the new package are called DDLTask2 and DDLWork2, thus 
> avoiding the use of fully qualified class names where both the old and the 
> new classes are in use.
> Step #1: extract all the database-related operations from the old DDLTask 
> and move them under the new package. Also create the new internal framework.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21292) Break up DDLTask 1 - extract Database related operations

2019-02-24 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776436#comment-16776436
 ] 

Hive QA commented on HIVE-21292:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
43s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
10s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
20s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  
6s{color} | {color:blue} ql in master has 2261 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
39s{color} | {color:blue} hcatalog/core in master has 29 extant Findbugs 
warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
44s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
40s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
29s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
45s{color} | {color:green} ql: The patch generated 0 new + 507 unchanged - 25 
fixed = 507 total (was 532) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} hcatalog/core: The patch generated 0 new + 40 
unchanged - 2 fixed = 40 total (was 42) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} The patch hive-unit passed checkstyle {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
19s{color} | {color:green} ql generated 0 new + 2260 unchanged - 1 fixed = 2260 
total (was 2261) {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
45s{color} | {color:green} core in the patch passed. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
51s{color} | {color:green} hive-unit in the patch passed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
38s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 35m  4s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-16226/dev-support/hive-personality.sh
 |
| git revision | master / 2daaed7 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| modules | C: ql hcatalog/core itests/hive-unit U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16226/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Break up DDLTask 1 - extract Database related operations
> 
>
> Key: HIVE-21292
> URL: https://issues.apache.org/jira/browse/HIVE-21292
> Project: Hive
>  Issue Type: Improvement

[jira] [Commented] (HIVE-21240) JSON SerDe Re-Write

2019-02-24 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776430#comment-16776430
 ] 

Hive QA commented on HIVE-21240:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12959957/HIVE-21240.10.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 15820 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniHiveKafkaCliDriver.testCliDriver[kafka_storage_handler]
 (batchId=275)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16225/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16225/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16225/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12959957 - PreCommit-HIVE-Build

> JSON SerDe Re-Write
> ---
>
> Key: HIVE-21240
> URL: https://issues.apache.org/jira/browse/HIVE-21240
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 4.0.0, 3.1.1
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21240.1.patch, HIVE-21240.1.patch, 
> HIVE-21240.10.patch, HIVE-21240.2.patch, HIVE-21240.3.patch, 
> HIVE-21240.4.patch, HIVE-21240.5.patch, HIVE-21240.6.patch, 
> HIVE-21240.7.patch, HIVE-21240.9.patch, HIVE-24240.8.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The JSON SerDe has a few issues; I will link them to this JIRA.
> * Use the Jackson tree parser instead of parsing manually (see the sketch 
> after this list)
> * Added support for base-64 encoded binary data (the expected format when 
> using JSON)
> * Added support to skip blank lines (returns all columns as null values)
> * The current JSON parser accepts, but in most cases does not apply, custom 
> timestamp formats
> * Added some unit tests
> * Added a cache for column-name to column-index lookups, currently O\(n\) for 
> each row processed, for each column in the row
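
A minimal sketch of the tree-parsing approach from the list above, assuming a hypothetical JsonRowParser helper; the Jackson and Base64 calls are real APIs, while the real HIVE-21240 SerDe handles far more type conversions:

{noformat}
import java.io.IOException;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

final class JsonRowParser {
  private static final ObjectMapper MAPPER = new ObjectMapper();

  /** Returns null for blank lines so the caller can emit all-null columns. */
  static JsonNode parseRow(String line) throws IOException {
    if (line == null || line.trim().isEmpty()) {
      return null; // the "skip blank lines" behavior from the list above
    }
    return MAPPER.readTree(line); // tree parse instead of manual tokenizing
  }

  /** Binary columns arrive base-64 encoded inside the JSON text. */
  static byte[] readBinary(JsonNode row, String fieldName) {
    JsonNode field = row.get(fieldName);
    return field == null ? null
        : java.util.Base64.getDecoder().decode(field.asText());
  }
}
{noformat}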



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21240) JSON SerDe Re-Write

2019-02-24 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776425#comment-16776425
 ] 

Hive QA commented on HIVE-21240:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
49s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
51s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
11s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
42s{color} | {color:blue} serde in master has 197 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  
9s{color} | {color:blue} ql in master has 2261 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
37s{color} | {color:blue} hcatalog/core in master has 29 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
30s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
29s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} serde: The patch generated 0 new + 4 unchanged - 25 
fixed = 4 total (was 29) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
40s{color} | {color:green} ql: The patch generated 0 new + 6 unchanged - 5 
fixed = 6 total (was 11) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} The patch core passed checkstyle {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
46s{color} | {color:green} serde generated 0 new + 193 unchanged - 4 fixed = 
193 total (was 197) {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
20s{color} | {color:green} ql in the patch passed. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
47s{color} | {color:green} core in the patch passed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 33m 19s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-16225/dev-support/hive-personality.sh
 |
| git revision | master / 2daaed7 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| modules | C: serde ql hcatalog/core U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16225/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> JSON SerDe Re-Write
> ---
>
> Key: HIVE-21240
> URL: https://issues.apache.org/jira/browse/HIVE-21240
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 4.0.0, 3.1.1
>Reporter: BELUGA BEHR

[jira] [Commented] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776405#comment-16776405
 ] 

Hive QA commented on HIVE-21197:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12959955/HIVE-21197.03.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 15816 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[timestamptz_2] 
(batchId=86)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[hybridgrace_hashjoin_2]
 (batchId=109)
org.apache.hadoop.hive.ql.exec.repl.TestReplDumpTask.removeDBPropertyToPreventRenameWhenBootstrapDumpOfTableFails
 (batchId=321)
org.apache.hadoop.hive.ql.parse.TestReplWithJsonMessageFormat.testBootstrapWithConcurrentDropTable
 (batchId=244)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testBootstrapWithConcurrentDropTable
 (batchId=246)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16224/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16224/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16224/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12959955 - PreCommit-HIVE-Build

> Hive replication can add duplicate data during migration to a target with 
> hive.strict.managed.tables enabled
> 
>
> Key: HIVE-21197
> URL: https://issues.apache.org/jira/browse/HIVE-21197
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch, 
> HIVE-21197.03.patch
>
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> During the bootstrap phase it may happen that files copied to the target 
> were created by events which are not part of the bootstrap. This is because 
> bootstrap first gets the last event id and then the file list. If some 
> events are added during this period, bootstrap will also include the files 
> created by those events. The same files will then be copied again during the 
> first incremental replication just after the bootstrap. In a normal scenario 
> the duplicate copy does not cause any issue, as Hive allows the use of the 
> target database only after the first incremental. But in the migration case, 
> the files at source and target are copied to different locations (based on 
> the write id at the target), so this may lead to duplicate data at the 
> target. This can be avoided by a load-time check for duplicate files. The 
> check needs to be done only for the first incremental, and the search can be 
> limited to the bootstrap directory (with write id 1). If the file is already 
> present, just skip the copy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21292) Break up DDLTask 1 - extract Database related operations

2019-02-24 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-21292:
--
Attachment: HIVE-21292.08.patch

> Break up DDLTask 1 - extract Database related operations
> 
>
> Key: HIVE-21292
> URL: https://issues.apache.org/jira/browse/HIVE-21292
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21292.01.patch, HIVE-21292.02.patch, 
> HIVE-21292.03.patch, HIVE-21292.04.patch, HIVE-21292.05.patch, 
> HIVE-21292.06.patch, HIVE-21292.07.patch, HIVE-21292.08.patch
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> DDLTask is a huge class, more than 5000 lines long. The related DDLWork is 
> also a huge class, which has a field for each DDL operation it supports. The 
> goal is to refactor these in order to cut everything into more manageable 
> classes under the package org.apache.hadoop.hive.ql.exec.ddl:
>  * have a separate class for each operation
>  * have a package for each operation group (database DDL, table DDL, etc.), 
> so the number of classes under a package stays manageable
>  * make all the requests (DDLDesc subclasses) immutable
>  * DDLTask should be agnostic to the actual operations
>  * for now, ignore the issue that some operations handled by DDLTask are not 
> actual DDL operations (lock, unlock, desc...)
> In the interim, while there are two DDLTask and DDLWork classes in the code 
> base, the new ones in the new package are called DDLTask2 and DDLWork2, thus 
> avoiding the use of fully qualified class names where both the old and the 
> new classes are in use.
> Step #1: extract all the database-related operations from the old DDLTask 
> and move them under the new package. Also create the new internal framework.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776399#comment-16776399
 ] 

Hive QA commented on HIVE-21197:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
48s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
21s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
25s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  1m 
12s{color} | {color:blue} standalone-metastore/metastore-server in master has 
181 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
12s{color} | {color:blue} ql in master has 2261 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
45s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
47s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
30s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
17s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
45s{color} | {color:red} ql: The patch generated 1 new + 325 unchanged - 0 
fixed = 326 total (was 325) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
21s{color} | {color:red} itests/hive-unit: The patch generated 71 new + 272 
unchanged - 0 fixed = 343 total (was 272) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
53s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 37m 24s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-16224/dev-support/hive-personality.sh
 |
| git revision | master / 2daaed7 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16224/yetus/diff-checkstyle-ql.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16224/yetus/diff-checkstyle-itests_hive-unit.txt
 |
| modules | C: standalone-metastore/metastore-server ql itests/hive-unit U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16224/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Hive replication can add duplicate data during migration to a target with 
> hive.strict.managed.tables enabled
> 
>
> Key: HIVE-21197
> URL: https://issues.apache.org/jira/browse/HIVE-21197
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major

[jira] [Updated] (HIVE-21292) Break up DDLTask 1 - extract Database related operations

2019-02-24 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-21292:
--
Status: Patch Available  (was: Open)

> Break up DDLTask 1 - extract Database related operations
> 
>
> Key: HIVE-21292
> URL: https://issues.apache.org/jira/browse/HIVE-21292
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21292.01.patch, HIVE-21292.02.patch, 
> HIVE-21292.03.patch, HIVE-21292.04.patch, HIVE-21292.05.patch, 
> HIVE-21292.06.patch, HIVE-21292.07.patch, HIVE-21292.08.patch
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> DDLTask is a huge class, more than 5000 lines long. The related DDLWork is 
> also a huge class, which has a field for each DDL operation it supports. The 
> goal is to refactor these in order to cut everything into smaller, more 
> manageable classes under the package org.apache.hadoop.hive.ql.exec.ddl:
>  * have a separate class for each operation
>  * have a package for each operation group (database ddl, table ddl, etc.), 
> so the number of classes under a package is more manageable
>  * make all the requests (DDLDesc subclasses) immutable
>  * DDLTask should be agnostic to the actual operations
>  * for now, ignore the issue that some operations handled by DDLTask are not 
> actual DDL operations (lock, unlock, desc...)
> In the interim, while there are two DDLTask and DDLWork classes in the code 
> base, the new ones in the new package are called DDLTask2 and DDLWork2, thus 
> avoiding the use of fully qualified class names where both the old and the 
> new classes are in use.
> Step #1: extract all the database-related operations from the old DDLTask 
> and move them under the new package. Also create the new internal framework.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21292) Break up DDLTask 1 - extract Database related operations

2019-02-24 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-21292:
--
Status: Open  (was: Patch Available)

> Break up DDLTask 1 - extract Database related operations
> 
>
> Key: HIVE-21292
> URL: https://issues.apache.org/jira/browse/HIVE-21292
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21292.01.patch, HIVE-21292.02.patch, 
> HIVE-21292.03.patch, HIVE-21292.04.patch, HIVE-21292.05.patch, 
> HIVE-21292.06.patch, HIVE-21292.07.patch, HIVE-21292.08.patch
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> DDLTask is a huge class, more than 5000 lines long. The related DDLWork is 
> also a huge class, which has a field for each DDL operation it supports. The 
> goal is to refactor these in order to cut everything into smaller, more 
> manageable classes under the package org.apache.hadoop.hive.ql.exec.ddl:
>  * have a separate class for each operation
>  * have a package for each operation group (database ddl, table ddl, etc.), 
> so the number of classes under a package is more manageable
>  * make all the requests (DDLDesc subclasses) immutable
>  * DDLTask should be agnostic to the actual operations
>  * for now, ignore the issue that some operations handled by DDLTask are not 
> actual DDL operations (lock, unlock, desc...)
> In the interim, while there are two DDLTask and DDLWork classes in the code 
> base, the new ones in the new package are called DDLTask2 and DDLWork2, thus 
> avoiding the use of fully qualified class names where both the old and the 
> new classes are in use.
> Step #1: extract all the database-related operations from the old DDLTask 
> and move them under the new package. Also create the new internal framework.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21240) JSON SerDe Re-Write

2019-02-24 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-21240:
---
Status: Patch Available  (was: Open)

> JSON SerDe Re-Write
> ---
>
> Key: HIVE-21240
> URL: https://issues.apache.org/jira/browse/HIVE-21240
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 3.1.1, 4.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21240.1.patch, HIVE-21240.1.patch, 
> HIVE-21240.10.patch, HIVE-21240.2.patch, HIVE-21240.3.patch, 
> HIVE-21240.4.patch, HIVE-21240.5.patch, HIVE-21240.6.patch, 
> HIVE-21240.7.patch, HIVE-21240.9.patch, HIVE-24240.8.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The JSON SerDe has a few issues; I will link them to this JIRA.
> * Use Jackson Tree parser instead of manually parsing
> * Added support for base-64 encoded data (the expected format when using JSON)
> * Added support to skip blank lines (returns all columns as null values)
> * Current JSON parser accepts, but does not apply, custom timestamp formats 
> in most cases
> * Added some unit tests
> * Added a cache for column-name to column-index searches (sketched below), 
> currently O(n) for each row processed, for each column in the row
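
A hedged sketch of the last bullet (names are illustrative assumptions, not 
the patch): build the name-to-index map once from the table schema so each 
per-row field lookup is O(1) instead of a linear scan.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative column-name -> column-index cache; not the actual patch.
public class ColumnIndexCache {

  private final Map<String, Integer> indexByName = new HashMap<>();

  public ColumnIndexCache(List<String> columnNames) {
    // Built once per SerDe initialization, not once per row.
    for (int i = 0; i < columnNames.size(); i++) {
      indexByName.put(columnNames.get(i).toLowerCase(), i);
    }
  }

  /** Returns the column position, or -1 if the JSON field is not a table column. */
  public int indexOf(String jsonFieldName) {
    return indexByName.getOrDefault(jsonFieldName.toLowerCase(), -1);
  }
}
```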



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21240) JSON SerDe Re-Write

2019-02-24 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-21240:
---
Attachment: HIVE-21240.10.patch

> JSON SerDe Re-Write
> ---
>
> Key: HIVE-21240
> URL: https://issues.apache.org/jira/browse/HIVE-21240
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 4.0.0, 3.1.1
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21240.1.patch, HIVE-21240.1.patch, 
> HIVE-21240.10.patch, HIVE-21240.2.patch, HIVE-21240.3.patch, 
> HIVE-21240.4.patch, HIVE-21240.5.patch, HIVE-21240.6.patch, 
> HIVE-21240.7.patch, HIVE-21240.9.patch, HIVE-24240.8.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The JSON SerDe has a few issues; I will link them to this JIRA.
> * Use Jackson Tree parser instead of manually parsing
> * Added support for base-64 encoded data (the expected format when using JSON)
> * Added support to skip blank lines (returns all columns as null values)
> * Current JSON parser accepts, but does not apply, custom timestamp formats 
> in most cases
> * Added some unit tests
> * Added a cache for column-name to column-index searches, currently O(n) for 
> each row processed, for each column in the row



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21240) JSON SerDe Re-Write

2019-02-24 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-21240:
---
Status: Open  (was: Patch Available)

> JSON SerDe Re-Write
> ---
>
> Key: HIVE-21240
> URL: https://issues.apache.org/jira/browse/HIVE-21240
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 3.1.1, 4.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21240.1.patch, HIVE-21240.1.patch, 
> HIVE-21240.10.patch, HIVE-21240.2.patch, HIVE-21240.3.patch, 
> HIVE-21240.4.patch, HIVE-21240.5.patch, HIVE-21240.6.patch, 
> HIVE-21240.7.patch, HIVE-21240.9.patch, HIVE-24240.8.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The JSON SerDe has a few issues; I will link them to this JIRA.
> * Use Jackson Tree parser instead of manually parsing
> * Added support for base-64 encoded data (the expected format when using JSON)
> * Added support to skip blank lines (returns all columns as null values)
> * Current JSON parser accepts, but does not apply, custom timestamp formats 
> in most cases
> * Added some unit tests
> * Added a cache for column-name to column-index searches, currently O(n) for 
> each row processed, for each column in the row



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21292) Break up DDLTask 1 - extract Database related operations

2019-02-24 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776387#comment-16776387
 ] 

Hive QA commented on HIVE-21292:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12959954/HIVE-21292.07.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 15800 tests 
executed
*Failed tests:*
{noformat}
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=184)

[auto_sortmerge_join_7.q,mm_exim.q,input16_cc.q,materialized_view_rewrite_no_join_opt.q,vector_char_varchar_1.q,smb_mapjoin_5.q,vector_char_4.q,cross_product_check_2.q,cbo_limit.q,llap_smb.q,materialized_view_create_rewrite_2.q,vector_decimal_udf.q]
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16223/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16223/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16223/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12959954 - PreCommit-HIVE-Build

> Break up DDLTask 1 - extract Database related operations
> 
>
> Key: HIVE-21292
> URL: https://issues.apache.org/jira/browse/HIVE-21292
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21292.01.patch, HIVE-21292.02.patch, 
> HIVE-21292.03.patch, HIVE-21292.04.patch, HIVE-21292.05.patch, 
> HIVE-21292.06.patch, HIVE-21292.07.patch
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> DDLTask is a huge class, more than 5000 lines long. The related DDLWork is 
> also a huge class, which has a field for each DDL operation it supports. The 
> goal is to refactor these in order to cut everything into smaller, more 
> manageable classes under the package org.apache.hadoop.hive.ql.exec.ddl:
>  * have a separate class for each operation
>  * have a package for each operation group (database ddl, table ddl, etc.), 
> so the number of classes under a package is more manageable
>  * make all the requests (DDLDesc subclasses) immutable
>  * DDLTask should be agnostic to the actual operations
>  * for now, ignore the issue that some operations handled by DDLTask are not 
> actual DDL operations (lock, unlock, desc...)
> In the interim, while there are two DDLTask and DDLWork classes in the code 
> base, the new ones in the new package are called DDLTask2 and DDLWork2, thus 
> avoiding the use of fully qualified class names where both the old and the 
> new classes are in use.
> Step #1: extract all the database-related operations from the old DDLTask 
> and move them under the new package. Also create the new internal framework.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21292) Break up DDLTask 1 - extract Database related operations

2019-02-24 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776381#comment-16776381
 ] 

Hive QA commented on HIVE-21292:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
53s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
55s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
16s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
20s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
14s{color} | {color:blue} ql in master has 2261 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
37s{color} | {color:blue} hcatalog/core in master has 29 extant Findbugs 
warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
49s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
50s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
29s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
51s{color} | {color:green} ql: The patch generated 0 new + 507 unchanged - 25 
fixed = 507 total (was 532) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} hcatalog/core: The patch generated 0 new + 40 
unchanged - 2 fixed = 40 total (was 42) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} The patch hive-unit passed checkstyle {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
23s{color} | {color:green} ql generated 0 new + 2260 unchanged - 1 fixed = 2260 
total (was 2261) {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
48s{color} | {color:green} core in the patch passed. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
54s{color} | {color:green} hive-unit in the patch passed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 36m 42s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-16223/dev-support/hive-personality.sh
 |
| git revision | master / 2daaed7 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| modules | C: ql hcatalog/core itests/hive-unit U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16223/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Break up DDLTask 1 - extract Database related operations
> 
>
> Key: HIVE-21292
> URL: https://issues.apache.org/jira/browse/HIVE-21292
> Project: Hive
>  Issue Type: Improvement

[jira] [Updated] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-21197:
---
Attachment: HIVE-21197.03.patch

> Hive replication can add duplicate data during migration to a target with 
> hive.strict.managed.tables enabled
> 
>
> Key: HIVE-21197
> URL: https://issues.apache.org/jira/browse/HIVE-21197
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch, 
> HIVE-21197.03.patch
>
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> During the bootstrap phase it may happen that the files copied to the 
> target were created by events which are not part of the bootstrap. This is 
> because bootstrap first gets the last event id and then the file list. If 
> some events are added during this period, then bootstrap will include the 
> files created by these events as well. The same files will be copied again 
> during the first incremental replication just after the bootstrap. In the 
> normal scenario the duplicate copy does not cause any issue, as Hive allows 
> the use of the target database only after the first incremental. But in 
> case of migration, the files at source and target are copied to different 
> locations (based on the write id at target), and thus this may lead to 
> duplicate data at the target. This can be avoided by having a check at load 
> time for duplicate files. This check can be done only for the first 
> incremental, and the search can be done in the bootstrap directory (with 
> write id 1). If the file is already present, then just ignore the copy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-21197:
---
Status: Patch Available  (was: Open)

> Hive replication can add duplicate data during migration to a target with 
> hive.strict.managed.tables enabled
> 
>
> Key: HIVE-21197
> URL: https://issues.apache.org/jira/browse/HIVE-21197
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch, 
> HIVE-21197.03.patch
>
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> During the bootstrap phase it may happen that the files copied to the 
> target were created by events which are not part of the bootstrap. This is 
> because bootstrap first gets the last event id and then the file list. If 
> some events are added during this period, then bootstrap will include the 
> files created by these events as well. The same files will be copied again 
> during the first incremental replication just after the bootstrap. In the 
> normal scenario the duplicate copy does not cause any issue, as Hive allows 
> the use of the target database only after the first incremental. But in 
> case of migration, the files at source and target are copied to different 
> locations (based on the write id at target), and thus this may lead to 
> duplicate data at the target. This can be avoided by having a check at load 
> time for duplicate files. This check can be done only for the first 
> incremental, and the search can be done in the bootstrap directory (with 
> write id 1). If the file is already present, then just ignore the copy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-21197:
---
Status: Open  (was: Patch Available)

> Hive replication can add duplicate data during migration to a target with 
> hive.strict.managed.tables enabled
> 
>
> Key: HIVE-21197
> URL: https://issues.apache.org/jira/browse/HIVE-21197
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch, 
> HIVE-21197.03.patch
>
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> During the bootstrap phase it may happen that the files copied to the 
> target were created by events which are not part of the bootstrap. This is 
> because bootstrap first gets the last event id and then the file list. If 
> some events are added during this period, then bootstrap will include the 
> files created by these events as well. The same files will be copied again 
> during the first incremental replication just after the bootstrap. In the 
> normal scenario the duplicate copy does not cause any issue, as Hive allows 
> the use of the target database only after the first incremental. But in 
> case of migration, the files at source and target are copied to different 
> locations (based on the write id at target), and thus this may lead to 
> duplicate data at the target. This can be avoided by having a check at load 
> time for duplicate files. This check can be done only for the first 
> incremental, and the search can be done in the bootstrap directory (with 
> write id 1). If the file is already present, then just ignore the copy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21292) Break up DDLTask 1 - extract Database related operations

2019-02-24 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-21292:
--
Status: Patch Available  (was: Open)

> Break up DDLTask 1 - extract Database related operations
> 
>
> Key: HIVE-21292
> URL: https://issues.apache.org/jira/browse/HIVE-21292
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21292.01.patch, HIVE-21292.02.patch, 
> HIVE-21292.03.patch, HIVE-21292.04.patch, HIVE-21292.05.patch, 
> HIVE-21292.06.patch, HIVE-21292.07.patch
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> DDLTask is a huge class, more than 5000 lines long. The related DDLWork is 
> also a huge class, which has a field for each DDL operation it supports. The 
> goal is to refactor these in order to cut everything into smaller, more 
> manageable classes under the package org.apache.hadoop.hive.ql.exec.ddl:
>  * have a separate class for each operation
>  * have a package for each operation group (database ddl, table ddl, etc.), 
> so the number of classes under a package is more manageable
>  * make all the requests (DDLDesc subclasses) immutable
>  * DDLTask should be agnostic to the actual operations
>  * for now, ignore the issue that some operations handled by DDLTask are not 
> actual DDL operations (lock, unlock, desc...)
> In the interim, while there are two DDLTask and DDLWork classes in the code 
> base, the new ones in the new package are called DDLTask2 and DDLWork2, thus 
> avoiding the use of fully qualified class names where both the old and the 
> new classes are in use.
> Step #1: extract all the database-related operations from the old DDLTask 
> and move them under the new package. Also create the new internal framework.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21292) Break up DDLTask 1 - extract Database related operations

2019-02-24 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-21292:
--
Attachment: HIVE-21292.07.patch

> Break up DDLTask 1 - extract Database related operations
> 
>
> Key: HIVE-21292
> URL: https://issues.apache.org/jira/browse/HIVE-21292
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21292.01.patch, HIVE-21292.02.patch, 
> HIVE-21292.03.patch, HIVE-21292.04.patch, HIVE-21292.05.patch, 
> HIVE-21292.06.patch, HIVE-21292.07.patch
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> DDLTask is a huge class, more than 5000 lines long. The related DDLWork is 
> also a huge class, which has a field for each DDL operation it supports. The 
> goal is to refactor these in order to cut everything into smaller, more 
> manageable classes under the package org.apache.hadoop.hive.ql.exec.ddl:
>  * have a separate class for each operation
>  * have a package for each operation group (database ddl, table ddl, etc.), 
> so the number of classes under a package is more manageable
>  * make all the requests (DDLDesc subclasses) immutable
>  * DDLTask should be agnostic to the actual operations
>  * for now, ignore the issue that some operations handled by DDLTask are not 
> actual DDL operations (lock, unlock, desc...)
> In the interim, while there are two DDLTask and DDLWork classes in the code 
> base, the new ones in the new package are called DDLTask2 and DDLWork2, thus 
> avoiding the use of fully qualified class names where both the old and the 
> new classes are in use.
> Step #1: extract all the database-related operations from the old DDLTask 
> and move them under the new package. Also create the new internal framework.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21292) Break up DDLTask 1 - extract Database related operations

2019-02-24 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-21292:
--
Status: Open  (was: Patch Available)

> Break up DDLTask 1 - extract Database related operations
> 
>
> Key: HIVE-21292
> URL: https://issues.apache.org/jira/browse/HIVE-21292
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21292.01.patch, HIVE-21292.02.patch, 
> HIVE-21292.03.patch, HIVE-21292.04.patch, HIVE-21292.05.patch, 
> HIVE-21292.06.patch
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> DDLTask is a huge class, more than 5000 lines long. The related DDLWork is 
> also a huge class, which has a field for each DDL operation it supports. The 
> goal is to refactor these in order to cut everything into smaller, more 
> manageable classes under the package org.apache.hadoop.hive.ql.exec.ddl:
>  * have a separate class for each operation
>  * have a package for each operation group (database ddl, table ddl, etc.), 
> so the number of classes under a package is more manageable
>  * make all the requests (DDLDesc subclasses) immutable
>  * DDLTask should be agnostic to the actual operations
>  * for now, ignore the issue that some operations handled by DDLTask are not 
> actual DDL operations (lock, unlock, desc...)
> In the interim, while there are two DDLTask and DDLWork classes in the code 
> base, the new ones in the new package are called DDLTask2 and DDLWork2, thus 
> avoiding the use of fully qualified class names where both the old and the 
> new classes are in use.
> Step #1: extract all the database-related operations from the old DDLTask 
> and move them under the new package. Also create the new internal framework.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21293) Fix ambiguity in grammar warnings at compilation time (II)

2019-02-24 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776328#comment-16776328
 ] 

Ashutosh Chauhan commented on HIVE-21293:
-

{{unknown}} needs to be non-reserved for this feature to be included; 
otherwise, the resulting ambiguity in the grammar is not worth including the 
feature. Although {{unknown}} is reserved in the standard, making it a 
reserved keyword here would be a backward-incompatible change.

> Fix ambiguity in grammar warnings at compilation time (II)
> --
>
> Key: HIVE-21293
> URL: https://issues.apache.org/jira/browse/HIVE-21293
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 4.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Laszlo Bodor
>Priority: Major
> Attachments: HIVE-21293.01.patch
>
>
> These are the warnings at compilation time:
> {code}
> warning(200): IdentifiersParser.g:424:5:
> Decision can match input such as "KW_UNKNOWN" using multiple alternatives: 1, 
> 10
> As a result, alternative(s) 10 were disabled for that input
> {code}
> This means that multiple parser rules can match certain query text, possibly 
> leading to unexpected errors at parsing time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21313) Use faster function to point to instead of copy immutable byte arrays

2019-02-24 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776314#comment-16776314
 ] 

Hive QA commented on HIVE-21313:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12959947/HIVE-21313.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 99 failed/errored test(s), 15811 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries]
 (batchId=267)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[allcolref_in_udf] 
(batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join11] (batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join12] (batchId=26)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join13] (batchId=87)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join15] (batchId=17)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join20] (batchId=96)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join22] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join26] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join29] (batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join31] (batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join_stats2] 
(batchId=94)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ba_table_udfs] 
(batchId=26)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_udf_max] (batchId=2)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[correlationoptimizer7] 
(batchId=23)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join30] (batchId=84)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join40] (batchId=58)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[lateral_view_outer] 
(batchId=46)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin47] (batchId=64)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_filter_on_outerjoin]
 (batchId=66)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_memcheck] 
(batchId=45)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_subquery] 
(batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_test_outer] 
(batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_10] (batchId=46)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_12] (batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_13] (batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_1] (batchId=91)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_1_newdb] 
(batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_2] (batchId=94)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_5] (batchId=31)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_6] (batchId=28)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_7] (batchId=47)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_8] (batchId=8)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_disablecbo_1] 
(batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_disablecbo_2] 
(batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_mv] (batchId=88)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[skewjoin_mapjoin5] 
(batchId=68)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[skewjoin_mapjoin6] 
(batchId=65)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[skewjoin_mapjoin8] 
(batchId=41)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[skewjoinopt10] 
(batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_max] (batchId=52)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_min] (batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sentences] 
(batchId=42)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_pos_alias] 
(batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_remove_12] 
(batchId=47)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_remove_13] 
(batchId=94)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_remove_14] 
(batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_binary_join_groupby]
 (batchId=88)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_case_when_2] 
(batchId=58)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_varchar_mapjoin1] 
(batchId=27)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_math_funcs] 
(batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_parquet_types]
 (batchId=72)
org.apache.hadoop.hive.cli.TestCompareCliDriver.

[jira] [Commented] (HIVE-21313) Use faster function to point to instead of copy immutable byte arrays

2019-02-24 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776301#comment-16776301
 ] 

Hive QA commented on HIVE-21313:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
49s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
6s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  
7s{color} | {color:blue} ql in master has 2261 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
59s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
44s{color} | {color:red} ql: The patch generated 10 new + 444 unchanged - 10 
fixed = 454 total (was 454) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 26m  8s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-16222/dev-support/hive-personality.sh
 |
| git revision | master / 2daaed7 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16222/yetus/diff-checkstyle-ql.txt
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16222/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Use faster function to point to instead of copy immutable byte arrays
> -
>
> Key: HIVE-21313
> URL: https://issues.apache.org/jira/browse/HIVE-21313
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: All Versions
>Reporter: ZhangXin
>Assignee: ZhangXin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: All Versions
>
> Attachments: HIVE-21313.patch, HIVE-21313.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In the file 
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorAssignRow.java
> we may find code like this:
> ```
> Text text = (Text) convertTargetWritable;
> if (text == null) {
>     text = new Text();
> }
> text.set(string);
> ((BytesColumnVector) columnVector).setVal(
>     batchIndex, text.getBytes(), 0, text.getLength());
> ```
> The `setVal` method copies the byte array generated by `text.getBytes()`. 
> This copy is unnecessary: since the byte array is immutable, we can just 
> use the `setRef` method to point to the specific byte array, which will 
> also lower memory usage.
> Pull request on GitHub: https://github.com/apache/hive/pull/548
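
A hedged sketch of the proposed replacement, under the assumption stated 
above that the byte array is not mutated while the batch is alive (the 
wrapper method is illustrative, not the exact patch):

```java
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.io.Text;

// Sketch: setRef points the column vector at the Text's backing array
// instead of copying it with setVal. This is only safe while the backing
// array is not reused or mutated for the lifetime of the batch.
public class SetRefSketch {

  static void assignString(BytesColumnVector columnVector, int batchIndex,
      String string, Text convertTargetWritable) {
    Text text = convertTargetWritable != null ? convertTargetWritable : new Text();
    text.set(string);
    columnVector.setRef(batchIndex, text.getBytes(), 0, text.getLength());
  }
}
```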



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-15475) JsonSerDe cannot handle json file with empty lines

2019-02-24 Thread BELUGA BEHR (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776298#comment-16776298
 ] 

BELUGA BEHR commented on HIVE-15475:


Nope. OK. Figured it out.

This issue was inadvertently fixed as part of [HIVE-18545] (Jul 10, 2018). 
Prior to this change, the JSON handling was done by 
{{org.apache.hive.hcatalog.data.JsonSerDe}}.

The issue was that this class was not handling the provided {{Text}} object 
correctly. The {{Text}} object has two components: an internal array of 
bytes *and* a size that indicates which bytes are to be processed. 
{{JsonSerDe}} was not taking the size into account, so when a zero-length 
{{Text}} object was submitted, it would still look at the entire internal 
byte array, ignoring the zero size, and produce duplicate rows where there 
should be no text.

https://github.com/apache/hive/blob/ae008b79b5d52ed6a38875b73025a505725828eb/hcatalog/core/src/main/java/org/apache/hive/hcatalog/data/JsonSerDe.java#L168
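
A small self-contained demonstration of that behavior (hypothetical demo 
code, not from the Hive code base):

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.io.Text;

// Text reuses its backing byte array between set() calls, so only the first
// getLength() bytes are valid; reading the whole array resurrects stale data.
public class TextLengthDemo {

  public static void main(String[] args) {
    Text t = new Text();
    t.set("{\"a\":\"a_1\", \"b\" : 1}"); // a real record fills the buffer
    t.set("");                           // an empty line reuses the same buffer

    // Wrong: ignores getLength() and "sees" the previous record's bytes.
    String wrong = new String(t.getBytes(), StandardCharsets.UTF_8);

    // Right: honors getLength(), which is 0 for the empty line.
    String right = new String(t.getBytes(), 0, t.getLength(), StandardCharsets.UTF_8);

    System.out.println("wrong=[" + wrong + "] right=[" + right + "]");
  }
}
```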

> JsonSerDe cannot handle json file with empty lines
> --
>
> Key: HIVE-15475
> URL: https://issues.apache.org/jira/browse/HIVE-15475
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1
>Reporter: pin_zhang
>Priority: Major
>
> 1. Start HiveServer2 in apache-hive-1.2.1.
> 2. Start a beeline connection to HiveServer2 and run:
> ADD JAR 
> /home/apache-hive-1.2.1-bin/hcatalog/share/hcatalog/hive-hcatalog-core-1.2.1.jar;
> CREATE external TABLE my_table(a string, b bigint)
> ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
> STORED AS TEXTFILE
> location 'file:///home/hive/json';
> 3. Put a file ending with more than one newline into that location:
> {"a":"a_1", "b" : 1}
> 4. Run the query:
> select * from my_table;
> +-------------+-------------+
> | my_table.a  | my_table.b  |
> +-------------+-------------+
> | a_1         | 1           |
> | a_1         | 1           |
> | a_1         | 1           |
> | a_1         | 1           |
> | a_1         | 1           |
> +-------------+-------------+



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HIVE-15475) JsonSerDe cannot handle json file with empty lines

2019-02-24 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR resolved HIVE-15475.

Resolution: Fixed

> JsonSerDe cannot handle json file with empty lines
> --
>
> Key: HIVE-15475
> URL: https://issues.apache.org/jira/browse/HIVE-15475
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1
>Reporter: pin_zhang
>Priority: Major
>
> 1. Start HiveServer2 in apache-hive-1.2.1.
> 2. Start a beeline connection to HiveServer2 and run:
> ADD JAR 
> /home/apache-hive-1.2.1-bin/hcatalog/share/hcatalog/hive-hcatalog-core-1.2.1.jar;
> CREATE external TABLE my_table(a string, b bigint)
> ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
> STORED AS TEXTFILE
> location 'file:///home/hive/json';
> 3. Put a file ending with more than one newline into that location:
> {"a":"a_1", "b" : 1}
> 4. Run the query:
> select * from my_table;
> +-------------+-------------+
> | my_table.a  | my_table.b  |
> +-------------+-------------+
> | a_1         | 1           |
> | a_1         | 1           |
> | a_1         | 1           |
> | a_1         | 1           |
> | a_1         | 1           |
> +-------------+-------------+



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21313) Use faster function to point to instead of copy immutable byte arrays

2019-02-24 Thread ZhangXin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangXin updated HIVE-21313:

Fix Version/s: All Versions
Affects Version/s: All Versions
   Attachment: HIVE-21313.patch
 Target Version/s: All Versions
   Status: Patch Available  (was: Open)

> Use faster function to point to instead of copy immutable byte arrays
> -
>
> Key: HIVE-21313
> URL: https://issues.apache.org/jira/browse/HIVE-21313
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: All Versions
>Reporter: ZhangXin
>Assignee: ZhangXin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: All Versions
>
> Attachments: HIVE-21313.patch, HIVE-21313.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In the file 
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorAssignRow.java
> we may find code like this:
> ```
> Text text = (Text) convertTargetWritable;
> if (text == null) {
>     text = new Text();
> }
> text.set(string);
> ((BytesColumnVector) columnVector).setVal(
>     batchIndex, text.getBytes(), 0, text.getLength());
> ```
> The `setVal` method copies the byte array generated by `text.getBytes()`. 
> This copy is unnecessary: since the byte array is immutable, we can just 
> use the `setRef` method to point to the specific byte array, which will 
> also lower memory usage.
> Pull request on GitHub: https://github.com/apache/hive/pull/548



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21312) FSStatsAggregator::connect is slow

2019-02-24 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776271#comment-16776271
 ] 

Hive QA commented on HIVE-21312:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
37s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
14s{color} | {color:blue} ql in master has 2261 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
5s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
45s{color} | {color:red} ql: The patch generated 3 new + 13 unchanged - 1 fixed 
= 16 total (was 14) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 25m 38s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-16221/dev-support/hive-personality.sh
 |
| git revision | master / 2daaed7 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16221/yetus/diff-checkstyle-ql.txt
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16221/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> FSStatsAggregator::connect is slow
> --
>
> Key: HIVE-21312
> URL: https://issues.apache.org/jira/browse/HIVE-21312
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Reporter: Rajesh Balamohan
>Priority: Trivial
> Attachments: HIVE-21312.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21313) Use faster function to point to instead of copy immutable byte arrays

2019-02-24 Thread Zoltan Haindrich (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776275#comment-16776275
 ] 

Zoltan Haindrich commented on HIVE-21313:
-

[~ZhangxinJson]: please press the "Submit Patch" button to have Hive QA test 
your patch.

> Use faster function to point to instead of copy immutable byte arrays
> -
>
> Key: HIVE-21313
> URL: https://issues.apache.org/jira/browse/HIVE-21313
> Project: Hive
>  Issue Type: Improvement
>Reporter: ZhangXin
>Assignee: ZhangXin
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-21313.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In the file 
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorAssignRow.java
> we may find code like this:
> ```
> Text text = (Text) convertTargetWritable;
> if (text == null) {
>     text = new Text();
> }
> text.set(string);
> ((BytesColumnVector) columnVector).setVal(
>     batchIndex, text.getBytes(), 0, text.getLength());
> ```
> The `setVal` method copies the byte array generated by `text.getBytes()`. 
> This copy is unnecessary: since the byte array is immutable, we can just 
> use the `setRef` method to point to the specific byte array, which will 
> also lower memory usage.
> Pull request on GitHub: https://github.com/apache/hive/pull/548



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203245&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203245
 ]

ASF GitHub Bot logged work on HIVE-21197:
-

Author: ASF GitHub Bot
Created on: 24/Feb/19 14:20
Start Date: 24/Feb/19 14:20
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #541: HIVE-21197 : 
Hive Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622969
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 ##
 @@ -661,6 +663,10 @@ public int execute(DriverContext driverContext) {
       if (work.getAlterMaterializedViewDesc() != null) {
         return alterMaterializedView(db, work.getAlterMaterializedViewDesc());
       }
+
+      if (work.getReplSetFirstIncLoadFlagDesc() != null) {
 
 Review comment:
   it's done in a separate task to make it simpler
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 203245)
Time Spent: 11h  (was: 10h 50m)

> Hive replication can add duplicate data during migration to a target with 
> hive.strict.managed.tables enabled
> 
>
> Key: HIVE-21197
> URL: https://issues.apache.org/jira/browse/HIVE-21197
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch
>
>  Time Spent: 11h
>  Remaining Estimate: 0h
>
> During the bootstrap phase it may happen that the files copied to the target 
> were created by events which are not part of the bootstrap. This is because 
> bootstrap first gets the last event id and then the file list. If some 
> events are added during this period, then bootstrap will also include the 
> files created by these events. The same files will be copied again during 
> the first incremental replication just after the bootstrap. In the normal 
> scenario, the duplicate copy does not cause any issue, as Hive allows the 
> use of the target database only after the first incremental. But in the case 
> of migration, the files at source and target are copied to different 
> locations (based on the write id at the target), and thus this may lead to 
> duplicate data at the target. This can be avoided by having a check at load 
> time for duplicate files. This check needs to be done only for the first 
> incremental, and the search can be done in the bootstrap directory (with 
> write id 1). If a file is already present, then just ignore the copy.
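
For illustration, a simplified, hypothetical sketch of the load-time duplicate 
check described above (the actual change lives in ReplCopyTask, quoted later 
in this thread, and works on file-info objects rather than plain paths):

```
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.AcidUtils;

class DupCopyCheck {
  // During the first incremental load after a bootstrap, drop from the copy
  // list every file that already exists under the bootstrap base directory
  // (write id 1): such a file was already copied by the bootstrap itself.
  static void skipFilesCopiedByBootstrap(FileSystem dstFs, Path toPath,
      List<Path> srcFiles, long writeId, int stmtId) {
    Path basePath = new Path(toPath,
        AcidUtils.baseOrDeltaSubdir(true, writeId, writeId, stmtId));
    srcFiles.removeIf(src -> {
      try {
        return dstFs.exists(new Path(basePath, src.getName()));
      } catch (IOException e) {
        throw new RuntimeException(e);
      }
    });
  }
}
```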



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203236&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203236
 ]

ASF GitHub Bot logged work on HIVE-21197:
-

Author: ASF GitHub Bot
Created on: 24/Feb/19 14:02
Start Date: 24/Feb/19 14:02
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #541: HIVE-21197 : 
Hive Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622204
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
 ##
 @@ -1536,6 +1536,62 @@ public void testCompactionInfoHashCode() {
 Assert.assertEquals("The hash codes must be equal", 
compactionInfo.hashCode(), compactionInfo1.hashCode());
   }
 
+  @Test
+  public void testDisableCompactionDuringReplLoad() throws Exception {
+    String tblName = "discomp";
+    String database = "discomp_db";
+    executeStatementOnDriver("drop database if exists " + database + " cascade", driver);
+    executeStatementOnDriver("create database " + database, driver);
+    executeStatementOnDriver("CREATE TABLE " + database + "." + tblName + "(a INT, b STRING) " +
+        " PARTITIONED BY(ds string)" +
+        " CLUSTERED BY(a) INTO 2 BUCKETS" + // currently ACID requires table to be bucketed
+        " STORED AS ORC TBLPROPERTIES ('transactional'='true')", driver);
+    executeStatementOnDriver("insert into " + database + "." + tblName + " partition (ds) values (1, 'fred', " +
+        "'today'), (2, 'wilma', 'yesterday')", driver);
+
+    executeStatementOnDriver("ALTER TABLE " + database + "." + tblName +
+        " SET TBLPROPERTIES ( 'hive.repl.first.inc.pending' = 'true')", driver);
+    List compacts = getCompactionList();
+    Assert.assertEquals(0, compacts.size());
+
+    executeStatementOnDriver("alter database " + database +
+        " set dbproperties ('hive.repl.first.inc.pending' = 'true')", driver);
+    executeStatementOnDriver("ALTER TABLE " + database + "." + tblName +
 
 Review comment:
   table level is taken care of now
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 203236)
Time Spent: 9h 40m  (was: 9.5h)

> Hive replication can add duplicate data during migration to a target with 
> hive.strict.managed.tables enabled
> 
>
> Key: HIVE-21197
> URL: https://issues.apache.org/jira/browse/HIVE-21197
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch
>
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> During the bootstrap phase it may happen that the files copied to the target 
> were created by events which are not part of the bootstrap. This is because 
> bootstrap first gets the last event id and then the file list. If some 
> events are added during this period, then bootstrap will also include the 
> files created by these events. The same files will be copied again during 
> the first incremental replication just after the bootstrap. In the normal 
> scenario, the duplicate copy does not cause any issue, as Hive allows the 
> use of the target database only after the first incremental. But in the case 
> of migration, the files at source and target are copied to different 
> locations (based on the write id at the target), and thus this may lead to 
> duplicate data at the target. This can be avoided by having a check at load 
> time for duplicate files. This check needs to be done only for the first 
> incremental, and the search can be done in the bootstrap directory (with 
> write id 1). If a file is already present, then just ignore the copy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203246&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203246
 ]

ASF GitHub Bot logged work on HIVE-21197:
-

Author: ASF GitHub Bot
Created on: 24/Feb/19 14:21
Start Date: 24/Feb/19 14:21
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #541: HIVE-21197 : 
Hive Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622991
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java
 ##
 @@ -112,6 +118,12 @@ public void run() {
 continue;
   }
 
+  if (replIsCompactionDisabledForTable(t)) {
 
 Review comment:
   not done
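
For context, a hypothetical sketch of what such a table-level check could look 
like (the body is an assumption, not the patch; `ReplUtils.isFirstIncDone` and 
the first-inc-pending table property appear elsewhere in this thread):

```
import java.util.Map;

import org.apache.hadoop.hive.metastore.api.Table;
import org.apache.hadoop.hive.ql.exec.repl.util.ReplUtils;

// Hypothetical body: compaction stays disabled for a table until the first
// incremental repl load clears the first-inc-pending table property.
private boolean replIsCompactionDisabledForTable(Table t) {
  Map<String, String> params = t.getParameters();
  return !ReplUtils.isFirstIncDone(params);
}
```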
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 203246)
Time Spent: 11h 10m  (was: 11h)

> Hive replication can add duplicate data during migration to a target with 
> hive.strict.managed.tables enabled
> 
>
> Key: HIVE-21197
> URL: https://issues.apache.org/jira/browse/HIVE-21197
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch
>
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> During the bootstrap phase it may happen that the files copied to the target 
> were created by events which are not part of the bootstrap. This is because 
> bootstrap first gets the last event id and then the file list. If some 
> events are added during this period, then bootstrap will also include the 
> files created by these events. The same files will be copied again during 
> the first incremental replication just after the bootstrap. In the normal 
> scenario, the duplicate copy does not cause any issue, as Hive allows the 
> use of the target database only after the first incremental. But in the case 
> of migration, the files at source and target are copied to different 
> locations (based on the write id at the target), and thus this may lead to 
> duplicate data at the target. This can be avoided by having a check at load 
> time for duplicate files. This check needs to be done only for the first 
> incremental, and the search can be done in the bootstrap directory (with 
> write id 1). If a file is already present, then just ignore the copy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203243&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203243
 ]

ASF GitHub Bot logged work on HIVE-21197:
-

Author: ASF GitHub Bot
Created on: 24/Feb/19 14:07
Start Date: 24/Feb/19 14:07
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #541: HIVE-21197 : 
Hive Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622407
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/incremental/IncrementalLoadTasksBuilder.java
 ##
 @@ -164,6 +165,12 @@ public IncrementalLoadTasksBuilder(String dbName, String 
tableName, String loadP
   lastEventid);
 }
   }
+
+  ReplSetFirstIncLoadFlagDesc desc = new 
ReplSetFirstIncLoadFlagDesc(dbName, tableName);
 
 Review comment:
A new DDL task is created to make it simpler, with table-level and 
warehouse-level support.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 203243)
Time Spent: 10h 40m  (was: 10.5h)

> Hive replication can add duplicate data during migration to a target with 
> hive.strict.managed.tables enabled
> 
>
> Key: HIVE-21197
> URL: https://issues.apache.org/jira/browse/HIVE-21197
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch
>
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h
>
> During the bootstrap phase it may happen that the files copied to the target 
> were created by events which are not part of the bootstrap. This is because 
> bootstrap first gets the last event id and then the file list. If some 
> events are added during this period, then bootstrap will also include the 
> files created by these events. The same files will be copied again during 
> the first incremental replication just after the bootstrap. In the normal 
> scenario, the duplicate copy does not cause any issue, as Hive allows the 
> use of the target database only after the first incremental. But in the case 
> of migration, the files at source and target are copied to different 
> locations (based on the write id at the target), and thus this may lead to 
> duplicate data at the target. This can be avoided by having a check at load 
> time for duplicate files. This check needs to be done only for the first 
> incremental, and the search can be done in the bootstrap directory (with 
> write id 1). If a file is already present, then just ignore the copy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21312) FSStatsAggregator::connect is slow

2019-02-24 Thread Zoltan Haindrich (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776255#comment-16776255
 ] 

Zoltan Haindrich commented on HIVE-21312:
-

Is {{statsList.add(stats)}} thread-safe? I think it's a simple ArrayList.

Note: I think another approach would be, instead of working with the FS-based 
stats files, to use Tez counters to haul this data.
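
For illustration, a minimal sketch of the concern (the element type is an 
assumption, not the actual FSStatsAggregator field): a plain ArrayList is 
unsafe under concurrent add(), so a list filled from multiple threads would 
need to be wrapped:

```
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// A plain ArrayList can lose or corrupt elements under concurrent add();
// a synchronized wrapper makes each add() atomic.
List<Object> statsList = Collections.synchronizedList(new ArrayList<>());
```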

> FSStatsAggregator::connect is slow
> --
>
> Key: HIVE-21312
> URL: https://issues.apache.org/jira/browse/HIVE-21312
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Reporter: Rajesh Balamohan
>Priority: Trivial
> Attachments: HIVE-21312.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203235&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203235
 ]

ASF GitHub Bot logged work on HIVE-21197:
-

Author: ASF GitHub Bot
Created on: 24/Feb/19 14:02
Start Date: 24/Feb/19 14:02
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #541: HIVE-21197 : 
Hive Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622190
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java
 ##
 @@ -271,12 +299,13 @@ public String getName() {
 LOG.debug("ReplCopyTask:getLoadCopyTask: {}=>{}", srcPath, dstPath);
 if ((replicationSpec != null) && replicationSpec.isInReplicationScope()){
   ReplCopyWork rcwork = new ReplCopyWork(srcPath, dstPath, false);
-      if (replicationSpec.isReplace() && conf.getBoolVar(REPL_ENABLE_MOVE_OPTIMIZATION)) {
+      if (replicationSpec.isReplace() && (conf.getBoolVar(REPL_ENABLE_MOVE_OPTIMIZATION) || copyToMigratedTxnTable)) {
         rcwork.setDeleteDestIfExist(true);
 
 Review comment:
   done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 203235)
Time Spent: 9.5h  (was: 9h 20m)

> Hive replication can add duplicate data during migration to a target with 
> hive.strict.managed.tables enabled
> 
>
> Key: HIVE-21197
> URL: https://issues.apache.org/jira/browse/HIVE-21197
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch
>
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> During the bootstrap phase it may happen that the files copied to the target 
> were created by events which are not part of the bootstrap. This is because 
> bootstrap first gets the last event id and then the file list. If some 
> events are added during this period, then bootstrap will also include the 
> files created by these events. The same files will be copied again during 
> the first incremental replication just after the bootstrap. In the normal 
> scenario, the duplicate copy does not cause any issue, as Hive allows the 
> use of the target database only after the first incremental. But in the case 
> of migration, the files at source and target are copied to different 
> locations (based on the write id at the target), and thus this may lead to 
> duplicate data at the target. This can be avoided by having a check at load 
> time for duplicate files. This check needs to be done only for the first 
> incremental, and the search can be done in the bootstrap directory (with 
> write id 1). If a file is already present, then just ignore the copy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203244&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203244
 ]

ASF GitHub Bot logged work on HIVE-21197:
-

Author: ASF GitHub Bot
Created on: 24/Feb/19 14:07
Start Date: 24/Feb/19 14:07
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #541: HIVE-21197 : 
Hive Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622413
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/LoadDatabase.java
 ##
 @@ -158,6 +159,15 @@ private boolean isDbEmpty(String dbName) throws 
HiveException {
 // Add the checkpoint key to the Database binding it to current dump 
directory.
 // So, if retry using same dump, we shall skip Database object update.
 parameters.put(ReplUtils.REPL_CHECKPOINT_KEY, dumpDirectory);
+
+if (needSetIncFlag) {
 
 Review comment:
   done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 203244)
Time Spent: 10h 50m  (was: 10h 40m)

> Hive replication can add duplicate data during migration to a target with 
> hive.strict.managed.tables enabled
> 
>
> Key: HIVE-21197
> URL: https://issues.apache.org/jira/browse/HIVE-21197
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch
>
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> During the bootstrap phase it may happen that the files copied to the target 
> were created by events which are not part of the bootstrap. This is because 
> bootstrap first gets the last event id and then the file list. If some 
> events are added during this period, then bootstrap will also include the 
> files created by these events. The same files will be copied again during 
> the first incremental replication just after the bootstrap. In the normal 
> scenario, the duplicate copy does not cause any issue, as Hive allows the 
> use of the target database only after the first incremental. But in the case 
> of migration, the files at source and target are copied to different 
> locations (based on the write id at the target), and thus this may lead to 
> duplicate data at the target. This can be avoided by having a check at load 
> time for duplicate files. This check needs to be done only for the first 
> incremental, and the search can be done in the bootstrap directory (with 
> write id 1). If a file is already present, then just ignore the copy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203241&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203241
 ]

ASF GitHub Bot logged work on HIVE-21197:
-

Author: ASF GitHub Bot
Created on: 24/Feb/19 14:06
Start Date: 24/Feb/19 14:06
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #541: HIVE-21197 : 
Hive Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622357
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/LoadDatabase.java
 ##
 @@ -135,7 +135,8 @@ private boolean isDbEmpty(String dbName) throws 
HiveException {
   }
 
   private Task alterDbTask(Database dbObj) {
-return alterDbTask(dbObj.getName(), updateDbProps(dbObj, 
context.dumpDirectory), context.hiveConf);
+return alterDbTask(dbObj.getName(), updateDbProps(dbObj, 
context.dumpDirectory, false),
 
 Review comment:
   done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 203241)
Time Spent: 10.5h  (was: 10h 20m)

> Hive replication can add duplicate data during migration to a target with 
> hive.strict.managed.tables enabled
> 
>
> Key: HIVE-21197
> URL: https://issues.apache.org/jira/browse/HIVE-21197
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch
>
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> During the bootstrap phase it may happen that the files copied to the target 
> were created by events which are not part of the bootstrap. This is because 
> bootstrap first gets the last event id and then the file list. If some 
> events are added during this period, then bootstrap will also include the 
> files created by these events. The same files will be copied again during 
> the first incremental replication just after the bootstrap. In the normal 
> scenario, the duplicate copy does not cause any issue, as Hive allows the 
> use of the target database only after the first incremental. But in the case 
> of migration, the files at source and target are copied to different 
> locations (based on the write id at the target), and thus this may lead to 
> duplicate data at the target. This can be avoided by having a check at load 
> time for duplicate files. This check needs to be done only for the first 
> incremental, and the search can be done in the bootstrap directory (with 
> write id 1). If a file is already present, then just ignore the copy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203229&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203229
 ]

ASF GitHub Bot logged work on HIVE-21197:
-

Author: ASF GitHub Bot
Created on: 24/Feb/19 13:55
Start Date: 24/Feb/19 13:55
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #541: HIVE-21197 : 
Hive Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259621882
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/incremental/IncrementalLoadTasksBuilder.java
 ##
 @@ -289,12 +296,21 @@ private boolean shouldReplayEvent(FileStatus dir, 
DumpType dumpType, String dbNa
 return updateReplIdTask;
   }
 
-  private Task dbUpdateReplStateTask(String dbName, 
String replState,
+  private Task dbUpdateReplStateTask(String dbName, 
String replState, String incLoadPendFlag,
 
 Review comment:
   the code is removed
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 203229)
Time Spent: 9h 10m  (was: 9h)

> Hive replication can add duplicate data during migration to a target with 
> hive.strict.managed.tables enabled
> 
>
> Key: HIVE-21197
> URL: https://issues.apache.org/jira/browse/HIVE-21197
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch
>
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> During the bootstrap phase it may happen that the files copied to the target 
> were created by events which are not part of the bootstrap. This is because 
> bootstrap first gets the last event id and then the file list. If some 
> events are added during this period, then bootstrap will also include the 
> files created by these events. The same files will be copied again during 
> the first incremental replication just after the bootstrap. In the normal 
> scenario, the duplicate copy does not cause any issue, as Hive allows the 
> use of the target database only after the first incremental. But in the case 
> of migration, the files at source and target are copied to different 
> locations (based on the write id at the target), and thus this may lead to 
> duplicate data at the target. This can be avoided by having a check at load 
> time for duplicate files. This check needs to be done only for the first 
> incremental, and the search can be done in the bootstrap directory (with 
> write id 1). If a file is already present, then just ignore the copy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203237&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203237
 ]

ASF GitHub Bot logged work on HIVE-21197:
-

Author: ASF GitHub Bot
Created on: 24/Feb/19 14:02
Start Date: 24/Feb/19 14:02
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #541: HIVE-21197 : 
Hive Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622210
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
 ##
 @@ -1536,6 +1536,62 @@ public void testCompactionInfoHashCode() {
 Assert.assertEquals("The hash codes must be equal", 
compactionInfo.hashCode(), compactionInfo1.hashCode());
   }
 
+  @Test
+  public void testDisableCompactionDuringReplLoad() throws Exception {
+    String tblName = "discomp";
+    String database = "discomp_db";
+    executeStatementOnDriver("drop database if exists " + database + " cascade", driver);
+    executeStatementOnDriver("create database " + database, driver);
+    executeStatementOnDriver("CREATE TABLE " + database + "." + tblName + "(a INT, b STRING) " +
+        " PARTITIONED BY(ds string)" +
+        " CLUSTERED BY(a) INTO 2 BUCKETS" + // currently ACID requires table to be bucketed
+        " STORED AS ORC TBLPROPERTIES ('transactional'='true')", driver);
+    executeStatementOnDriver("insert into " + database + "." + tblName + " partition (ds) values (1, 'fred', " +
+        "'today'), (2, 'wilma', 'yesterday')", driver);
+
+    executeStatementOnDriver("ALTER TABLE " + database + "." + tblName +
+        " SET TBLPROPERTIES ( 'hive.repl.first.inc.pending' = 'true')", driver);
+    List compacts = getCompactionList();
+    Assert.assertEquals(0, compacts.size());
+
+    executeStatementOnDriver("alter database " + database +
+        " set dbproperties ('hive.repl.first.inc.pending' = 'true')", driver);
+    executeStatementOnDriver("ALTER TABLE " + database + "." + tblName +
+        " SET TBLPROPERTIES ( 'hive.repl.first.inc.pending' = 'false')", driver);
+    compacts = getCompactionList();
+    Assert.assertEquals(0, compacts.size());
+
+    executeStatementOnDriver("alter database " + database +
+        " set dbproperties ('hive.repl.first.inc.pending' = 'false')", driver);
+    executeStatementOnDriver("ALTER TABLE " + database + "." + tblName +
 Review comment:
   table level is taken care of now
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 203237)
Time Spent: 9h 50m  (was: 9h 40m)

> Hive replication can add duplicate data during migration to a target with 
> hive.strict.managed.tables enabled
> 
>
> Key: HIVE-21197
> URL: https://issues.apache.org/jira/browse/HIVE-21197
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch
>
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> During the bootstrap phase it may happen that the files copied to the target 
> were created by events which are not part of the bootstrap. This is because 
> bootstrap first gets the last event id and then the file list. If some 
> events are added during this period, then bootstrap will also include the 
> files created by these events. The same files will be copied again during 
> the first incremental replication just after the bootstrap. In the normal 
> scenario, the duplicate copy does not cause any issue, as Hive allows the 
> use of the target database only after the first incremental. But in the case 
> of migration, the files at source and target are copied to different 
> locations (based on the write id at the target), and thus this may lead to 
> duplicate data at the target. This can be avoided by having a check at load 
> time for duplicate files. This check needs to be done only for the first 
> incremental, and the search can be done in the bootstrap directory (with 
> write id 1). If a file is already present, then just ignore the copy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203240&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203240
 ]

ASF GitHub Bot logged work on HIVE-21197:
-

Author: ASF GitHub Bot
Created on: 24/Feb/19 14:03
Start Date: 24/Feb/19 14:03
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #541: HIVE-21197 : 
Hive Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622274
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSpec.java
 ##
 @@ -426,4 +427,14 @@ public static void copyLastReplId(Map srcParameter, Map 

> Hive replication can add duplicate data during migration to a target with 
> hive.strict.managed.tables enabled
> 
>
> Key: HIVE-21197
> URL: https://issues.apache.org/jira/browse/HIVE-21197
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch
>
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> During the bootstrap phase it may happen that the files copied to the target 
> were created by events which are not part of the bootstrap. This is because 
> bootstrap first gets the last event id and then the file list. If some 
> events are added during this period, then bootstrap will also include the 
> files created by these events. The same files will be copied again during 
> the first incremental replication just after the bootstrap. In the normal 
> scenario, the duplicate copy does not cause any issue, as Hive allows the 
> use of the target database only after the first incremental. But in the case 
> of migration, the files at source and target are copied to different 
> locations (based on the write id at the target), and thus this may lead to 
> duplicate data at the target. This can be avoided by having a check at load 
> time for duplicate files. This check needs to be done only for the first 
> incremental, and the search can be done in the bootstrap directory (with 
> write id 1). If a file is already present, then just ignore the copy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203238&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203238
 ]

ASF GitHub Bot logged work on HIVE-21197:
-

Author: ASF GitHub Bot
Created on: 24/Feb/19 14:02
Start Date: 24/Feb/19 14:02
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #541: HIVE-21197 : 
Hive Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r25960
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MetaStoreCompactorThread.java
 ##
 @@ -71,6 +74,16 @@ public void init(AtomicBoolean stop, AtomicBoolean looped) 
throws Exception {
 }
   }
 
+  @Override boolean replIsCompactionDisabledForDatabase(String dbName) throws 
TException {
+try {
+  Database database = rs.getDatabase(getDefaultCatalog(conf), dbName);
 
 Review comment:
   done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 203238)
Time Spent: 10h  (was: 9h 50m)

> Hive replication can add duplicate data during migration to a target with 
> hive.strict.managed.tables enabled
> 
>
> Key: HIVE-21197
> URL: https://issues.apache.org/jira/browse/HIVE-21197
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch
>
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> During the bootstrap phase it may happen that the files copied to the target 
> were created by events which are not part of the bootstrap. This is because 
> bootstrap first gets the last event id and then the file list. If some 
> events are added during this period, then bootstrap will also include the 
> files created by these events. The same files will be copied again during 
> the first incremental replication just after the bootstrap. In the normal 
> scenario, the duplicate copy does not cause any issue, as Hive allows the 
> use of the target database only after the first incremental. But in the case 
> of migration, the files at source and target are copied to different 
> locations (based on the write id at the target), and thus this may lead to 
> duplicate data at the target. This can be avoided by having a check at load 
> time for duplicate files. This check needs to be done only for the first 
> incremental, and the search can be done in the bootstrap directory (with 
> write id 1). If a file is already present, then just ignore the copy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203239&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203239
 ]

ASF GitHub Bot logged work on HIVE-21197:
-

Author: ASF GitHub Bot
Created on: 24/Feb/19 14:03
Start Date: 24/Feb/19 14:03
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #541: HIVE-21197 : 
Hive Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622236
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSpec.java
 ##
 @@ -426,4 +427,14 @@ public static void copyLastReplId(Map srcParameter, Map 

> Hive replication can add duplicate data during migration to a target with 
> hive.strict.managed.tables enabled
> 
>
> Key: HIVE-21197
> URL: https://issues.apache.org/jira/browse/HIVE-21197
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch
>
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> During the bootstrap phase it may happen that the files copied to the target 
> were created by events which are not part of the bootstrap. This is because 
> bootstrap first gets the last event id and then the file list. If some 
> events are added during this period, then bootstrap will also include the 
> files created by these events. The same files will be copied again during 
> the first incremental replication just after the bootstrap. In the normal 
> scenario, the duplicate copy does not cause any issue, as Hive allows the 
> use of the target database only after the first incremental. But in the case 
> of migration, the files at source and target are copied to different 
> locations (based on the write id at the target), and thus this may lead to 
> duplicate data at the target. This can be avoided by having a check at load 
> time for duplicate files. This check needs to be done only for the first 
> incremental, and the search can be done in the bootstrap directory (with 
> write id 1). If a file is already present, then just ignore the copy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203234&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203234
 ]

ASF GitHub Bot logged work on HIVE-21197:
-

Author: ASF GitHub Bot
Created on: 24/Feb/19 14:02
Start Date: 24/Feb/19 14:02
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #541: HIVE-21197 : 
Hive Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622183
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##
 @@ -370,6 +370,9 @@ private int executeIncrementalLoad(DriverContext 
driverContext) {
 
   // If incremental events are already applied, then check and perform if 
need to bootstrap any tables.
   if (!builder.hasMoreWork() && !work.getPathsToCopyIterator().hasNext()) {
+// No need to set incremental load pending flag for external tables as 
the files will be copied to the same path
 
 Review comment:
   todo not required as table level load is taken care now
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 203234)
Time Spent: 9h 20m  (was: 9h 10m)

> Hive replication can add duplicate data during migration to a target with 
> hive.strict.managed.tables enabled
> 
>
> Key: HIVE-21197
> URL: https://issues.apache.org/jira/browse/HIVE-21197
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch
>
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> During the bootstrap phase it may happen that the files copied to the target 
> were created by events which are not part of the bootstrap. This is because 
> bootstrap first gets the last event id and then the file list. If some 
> events are added during this period, then bootstrap will also include the 
> files created by these events. The same files will be copied again during 
> the first incremental replication just after the bootstrap. In the normal 
> scenario, the duplicate copy does not cause any issue, as Hive allows the 
> use of the target database only after the first incremental. But in the case 
> of migration, the files at source and target are copied to different 
> locations (based on the write id at the target), and thus this may lead to 
> duplicate data at the target. This can be avoided by having a check at load 
> time for duplicate files. This check needs to be done only for the first 
> incremental, and the search can be done in the bootstrap directory (with 
> write id 1). If a file is already present, then just ignore the copy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203228&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203228
 ]

ASF GitHub Bot logged work on HIVE-21197:
-

Author: ASF GitHub Bot
Created on: 24/Feb/19 13:54
Start Date: 24/Feb/19 13:54
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #541: HIVE-21197 : 
Hive Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259621844
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/incremental/IncrementalLoadTasksBuilder.java
 ##
 @@ -289,12 +296,21 @@ private boolean shouldReplayEvent(FileStatus dir, 
DumpType dumpType, String dbNa
 return updateReplIdTask;
   }
 
-  private Task dbUpdateReplStateTask(String dbName, String replState,
+  private Task dbUpdateReplStateTask(String dbName, String replState, String incLoadPendFlag,
                                      Task preCursor) {
     HashMap mapProp = new HashMap<>();
-    mapProp.put(ReplicationSpec.KEY.CURR_STATE_ID.toString(), replState);
 
-    AlterDatabaseDesc alterDbDesc = new AlterDatabaseDesc(dbName, mapProp, new ReplicationSpec(replState, replState));
+    // if the update is for incLoadPendFlag, then send replicationSpec as null to avoid replacement check.
+    ReplicationSpec replicationSpec = null;
+    if (incLoadPendFlag == null) {
+      mapProp.put(ReplicationSpec.KEY.CURR_STATE_ID.toString(), replState);
+      replicationSpec = new ReplicationSpec(replState, replState);
+    } else {
+      assert replState == null;
+      mapProp.put(ReplUtils.REPL_FIRST_INC_PENDING_FLAG, incLoadPendFlag);
 
 Review comment:
Done. The dump will fail if the inc pending flag is set to true.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 203228)
Time Spent: 9h  (was: 8h 50m)

> Hive replication can add duplicate data during migration to a target with 
> hive.strict.managed.tables enabled
> 
>
> Key: HIVE-21197
> URL: https://issues.apache.org/jira/browse/HIVE-21197
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch
>
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> During the bootstrap phase it may happen that the files copied to the target 
> were created by events which are not part of the bootstrap. This is because 
> bootstrap first gets the last event id and then the file list. If some 
> events are added during this period, then bootstrap will also include the 
> files created by these events. The same files will be copied again during 
> the first incremental replication just after the bootstrap. In the normal 
> scenario, the duplicate copy does not cause any issue, as Hive allows the 
> use of the target database only after the first incremental. But in the case 
> of migration, the files at source and target are copied to different 
> locations (based on the write id at the target), and thus this may lead to 
> duplicate data at the target. This can be avoided by having a check at load 
> time for duplicate files. This check needs to be done only for the first 
> incremental, and the search can be done in the bootstrap directory (with 
> write id 1). If a file is already present, then just ignore the copy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203223&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203223
 ]

ASF GitHub Bot logged work on HIVE-21197:
-

Author: ASF GitHub Bot
Created on: 24/Feb/19 12:49
Start Date: 24/Feb/19 12:49
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #541: HIVE-21197 : 
Hive Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259618842
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
 ##
 @@ -187,4 +192,12 @@ public static PathFilter getEventsDirectoryFilter(final 
FileSystem fs) {
   }
 };
   }
+
+  public static boolean isFirstIncDone(Map parameter) {
+    if (parameter == null) {
+      return true;
+    }
+    String compFlag = parameter.get(ReplUtils.REPL_FIRST_INC_PENDING_FLAG);
 
 Review comment:
   done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 203223)
Time Spent: 8h 50m  (was: 8h 40m)

> Hive replication can add duplicate data during migration to a target with 
> hive.strict.managed.tables enabled
> 
>
> Key: HIVE-21197
> URL: https://issues.apache.org/jira/browse/HIVE-21197
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch
>
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> During the bootstrap phase it may happen that the files copied to the target 
> were created by events which are not part of the bootstrap. This is because 
> bootstrap first gets the last event id and then the file list. If some 
> events are added during this period, then bootstrap will also include the 
> files created by these events. The same files will be copied again during 
> the first incremental replication just after the bootstrap. In the normal 
> scenario, the duplicate copy does not cause any issue, as Hive allows the 
> use of the target database only after the first incremental. But in the case 
> of migration, the files at source and target are copied to different 
> locations (based on the write id at the target), and thus this may lead to 
> duplicate data at the target. This can be avoided by having a check at load 
> time for duplicate files. This check needs to be done only for the first 
> incremental, and the search can be done in the bootstrap directory (with 
> write id 1). If a file is already present, then just ignore the copy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203222&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203222
 ]

ASF GitHub Bot logged work on HIVE-21197:
-

Author: ASF GitHub Bot
Created on: 24/Feb/19 12:49
Start Date: 24/Feb/19 12:49
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #541: HIVE-21197 : 
Hive Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259618838
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
 ##
 @@ -187,4 +192,12 @@ public static PathFilter getEventsDirectoryFilter(final 
FileSystem fs) {
   }
 };
   }
+
+  public static boolean isFirstIncDone(Map parameter) {
 
 Review comment:
   done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 203222)
Time Spent: 8h 40m  (was: 8.5h)

> Hive replication can add duplicate data during migration to a target with 
> hive.strict.managed.tables enabled
> 
>
> Key: HIVE-21197
> URL: https://issues.apache.org/jira/browse/HIVE-21197
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch
>
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> During the bootstrap phase it may happen that the files copied to the target 
> were created by events which are not part of the bootstrap. This is because 
> bootstrap first gets the last event id and then the file list. If some 
> events are added during this period, then bootstrap will also include the 
> files created by these events. The same files will be copied again during 
> the first incremental replication just after the bootstrap. In the normal 
> scenario, the duplicate copy does not cause any issue, as Hive allows the 
> use of the target database only after the first incremental. But in the case 
> of migration, the files at source and target are copied to different 
> locations (based on the write id at the target), and thus this may lead to 
> duplicate data at the target. This can be avoided by having a check at load 
> time for duplicate files. This check needs to be done only for the first 
> incremental, and the search can be done in the bootstrap directory (with 
> write id 1). If a file is already present, then just ignore the copy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21197) Hive replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21197?focusedWorklogId=203219&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-203219
 ]

ASF GitHub Bot logged work on HIVE-21197:
-

Author: ASF GitHub Bot
Created on: 24/Feb/19 12:23
Start Date: 24/Feb/19 12:23
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #541: HIVE-21197 : 
Hive Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259617900
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java
 ##
 @@ -61,6 +62,21 @@ public ReplCopyTask(){
 super();
   }
 
+  // If the file is already present in the base directory, then remove it from the list.
+  // Check HIVE-21197 for more detail.
+  private void updateSrcFileListForDupCopy(FileSystem dstFs, Path toPath, List srcFiles,
+                                           long writeId, int stmtId) throws IOException {
+    ListIterator iter = srcFiles.listIterator();
+    Path basePath = new Path(toPath, AcidUtils.baseOrDeltaSubdir(true, writeId, writeId, stmtId));
+    while (iter.hasNext()) {
+      Path filePath = new Path(basePath, iter.next().getSourcePath().getName());
+      if (dstFs.exists(filePath)) {
 
 Review comment:
The I/O exception retry case is handled specifically at only 2 places. There 
are many other I/O failure scenarios which are not handled, so I think it's 
not required here.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 203219)
Time Spent: 8.5h  (was: 8h 20m)

> Hive replication can add duplicate data during migration to a target with 
> hive.strict.managed.tables enabled
> 
>
> Key: HIVE-21197
> URL: https://issues.apache.org/jira/browse/HIVE-21197
> Project: Hive
>  Issue Type: Task
>  Components: repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch
>
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> During the bootstrap phase it may happen that the files copied to the target 
> were created by events which are not part of the bootstrap. This is because 
> bootstrap first gets the last event id and then the file list. If some 
> events are added during this period, then bootstrap will also include the 
> files created by these events. The same files will be copied again during 
> the first incremental replication just after the bootstrap. In the normal 
> scenario, the duplicate copy does not cause any issue, as Hive allows the 
> use of the target database only after the first incremental. But in the case 
> of migration, the files at source and target are copied to different 
> locations (based on the write id at the target), and thus this may lead to 
> duplicate data at the target. This can be avoided by having a check at load 
> time for duplicate files. This check needs to be done only for the first 
> incremental, and the search can be done in the bootstrap directory (with 
> write id 1). If a file is already present, then just ignore the copy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21312) FSStatsAggregator::connect is slow

2019-02-24 Thread Rajesh Balamohan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-21312:

Attachment: HIVE-21312.1.patch

> FSStatsAggregator::connect is slow
> --
>
> Key: HIVE-21312
> URL: https://issues.apache.org/jira/browse/HIVE-21312
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Reporter: Rajesh Balamohan
>Priority: Trivial
> Attachments: HIVE-21312.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21312) FSStatsAggregator::connect is slow

2019-02-24 Thread Rajesh Balamohan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-21312:

Status: Patch Available  (was: Open)

> FSStatsAggregator::connect is slow
> --
>
> Key: HIVE-21312
> URL: https://issues.apache.org/jira/browse/HIVE-21312
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Reporter: Rajesh Balamohan
>Priority: Trivial
> Attachments: HIVE-21312.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16683) ORC WriterVersion gets ArrayIndexOutOfBoundsException on newer ORC files

2019-02-24 Thread Bo Hai (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776214#comment-16776214
 ] 

Bo Hai commented on HIVE-16683:
---

[~owen.omalley] Does this patch impact the forward compatibility of the ORC 
reader in Hive 2.1.1?

> ORC WriterVersion gets ArrayIndexOutOfBoundsException on newer ORC files
> 
>
> Key: HIVE-16683
> URL: https://issues.apache.org/jira/browse/HIVE-16683
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.1, 2.2.0
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
> Fix For: 2.2.0
>
> Attachments: HIVE-16683.patch
>
>
> This only impacts branch-2.1 and branch-2.2, because it has been fixed in the 
> ORC project's code base via ORC-125.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.

2019-02-24 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21286:

Summary: Hive should support clean-up of previously bootstrapped tables 
when retry from different dump.  (was: Hive should support clean-up of 
incrementally bootstrapped tables when retry from different dump.)

> Hive should support clean-up of previously bootstrapped tables when retry 
> from different dump.
> --
>
> Key: HIVE-21286
> URL: https://issues.apache.org/jira/browse/HIVE-21286
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication
>
> If external tables are enabled for replication on an existing repl policy, 
> then the bootstrapping of external tables is combined with the incremental 
> dump.
> If the incremental bootstrap load fails with a non-retryable error, the 
> user will have to manually drop all the external tables before retrying 
> with another bootstrap dump. For a full bootstrap, to retry with a 
> different dump, we suggest the user drop the DB; but in this case they 
> would need to manually drop all the external tables, which is not user 
> friendly. So this needs to be handled on the Hive side as follows.
> REPL LOAD takes an additional config (passed by the user in the WITH 
> clause) that says: drop all the tables which were bootstrapped from the 
> previous dump. hive.repl.cleanup.bootstrap=
> Hive will use this config only if the current dump is a bootstrap dump or 
> a combined bootstrap in an incremental dump.
> The user should take care not to pass this config if the previous REPL 
> LOAD (with bootstrap) was successful, or if any successful incremental 
> dump+load happened after "previous_bootstrap_dump_dir".
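As a hedged illustration of passing the proposed config in the WITH clause:
the value of hive.repl.cleanup.bootstrap is elided in the issue text, so the
placeholder below is not a real setting, and the endpoint, database name, and
dump path are likewise invented:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ReplCleanupBootstrapSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical HiveServer2 endpoint, replicated db, and dump directory.
    try (Connection conn =
            DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
        Statement stmt = conn.createStatement()) {
      // '<value>' is a placeholder: the issue leaves the setting unspecified.
      stmt.execute("REPL LOAD repl_db FROM '/user/hive/repl/dump1' "
          + "WITH ('hive.repl.cleanup.bootstrap'='<value>')");
    }
  }
}
```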



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21313) Use faster function to point to instead of copy immutable byte arrays

2019-02-24 Thread ZhangXin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangXin updated HIVE-21313:

Summary: Use faster function to point to instead of copy immutable byte 
arrays  (was: Use faster function to point to instead of copy the immutable 
byte array)

> Use faster function to point to instead of copy immutable byte arrays
> -
>
> Key: HIVE-21313
> URL: https://issues.apache.org/jira/browse/HIVE-21313
> Project: Hive
>  Issue Type: Improvement
>Reporter: ZhangXin
>Assignee: ZhangXin
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-21313.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorAssignRow.java 
> we may find code like this:
> ```
> Text text = (Text) convertTargetWritable;
> if (text == null) {
>   text = new Text();
> }
> text.set(string);
> ((BytesColumnVector) columnVector).setVal(
>     batchIndex, text.getBytes(), 0, text.getLength());
> ```
> Using the `setVal` method copies the byte array returned by 
> `text.getBytes()`. This copy is entirely unnecessary: since the byte array 
> is immutable, we can simply use the `setRef` method to point at the 
> specific byte array, which also lowers memory usage.
> Pull request on Github: https://github.com/apache/hive/pull/548
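A hedged before/after sketch of the change this description proposes; the
surrounding names (convertTargetWritable, string, columnVector, batchIndex)
come from VectorAssignRow, and the snippet assumes, as the description
argues, that the Text's backing array is not mutated afterwards:

```java
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.io.Text;

public class SetRefSketch {
  // Assigns 'string' into the vector at 'batchIndex' without copying bytes.
  static Text assign(BytesColumnVector columnVector, int batchIndex,
      Text text, String string) {
    if (text == null) {
      text = new Text();
    }
    text.set(string);
    // setVal(...) would copy text's backing array into the vector's buffer;
    // setRef(...) stores a reference instead, avoiding the copy. This is
    // safe only while the backing array is not reused or mutated before the
    // batch is consumed.
    columnVector.setRef(batchIndex, text.getBytes(), 0, text.getLength());
    return text;
  }
}
```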



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21313) Use faster function to point to instead of copy the immutable byte array

2019-02-24 Thread ZhangXin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangXin updated HIVE-21313:

Summary: Use faster function to point to instead of copy the immutable byte 
array  (was: Use faster function to prevent copying immutable byte array twice)

> Use faster function to point to instead of copy the immutable byte array
> 
>
> Key: HIVE-21313
> URL: https://issues.apache.org/jira/browse/HIVE-21313
> Project: Hive
>  Issue Type: Improvement
>Reporter: ZhangXin
>Assignee: ZhangXin
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-21313.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorAssignRow.java 
> we may find code like this:
> ```
> Text text = (Text) convertTargetWritable;
> if (text == null) {
>   text = new Text();
> }
> text.set(string);
> ((BytesColumnVector) columnVector).setVal(
>     batchIndex, text.getBytes(), 0, text.getLength());
> ```
> Using the `setVal` method copies the byte array returned by 
> `text.getBytes()`. This copy is entirely unnecessary: since the byte array 
> is immutable, we can simply use the `setRef` method to point at the 
> specific byte array, which also lowers memory usage.
> Pull request on Github: https://github.com/apache/hive/pull/548



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21307) Need to set GzipJSONMessageEncoder as default config for EVENT_MESSAGE_FACTORY.

2019-02-24 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21307:

   Resolution: Fixed
Fix Version/s: 4.0.0
   Status: Resolved  (was: Patch Available)

02.patch committed to master.
Thanks [~maheshk114] for the review!

> Need to set GzipJSONMessageEncoder as default config for 
> EVENT_MESSAGE_FACTORY.
> ---
>
> Key: HIVE-21307
> URL: https://issues.apache.org/jira/browse/HIVE-21307
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration, repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21307.01.patch, HIVE-21307.02.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, we use JsonMessageEncoder as the default message factory for 
> Notification events. The size of some of these events is really huge and 
> causes OOM issues in the RDBMS, so GzipJSONMessageEncoder needs to be 
> enabled as the default message factory to optimise memory usage.
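A hedged sketch of pinning the gzip encoder explicitly in a client or
metastore Configuration; the config key and encoder class path below are
assumptions inferred from the EVENT_MESSAGE_FACTORY naming, not quoted from
the patch:

```java
import org.apache.hadoop.conf.Configuration;

public class GzipEncoderConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Assumed key behind MetastoreConf.ConfVars.EVENT_MESSAGE_FACTORY and
    // assumed class path for the gzip encoder; verify both against the patch.
    conf.set("metastore.event.message.factory",
        "org.apache.hadoop.hive.metastore.messaging.json.gzip.GzipJSONMessageEncoder");
    System.out.println(conf.get("metastore.event.message.factory"));
  }
}
```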



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21286) Hive should support clean-up of incrementally bootstrapped tables when retry from different dump.

2019-02-24 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21286:

Description: 
If external tables are enabled for replication on an existing repl policy, then 
the bootstrapping of external tables is combined with the incremental dump.
If the incremental bootstrap load fails with a non-retryable error, the user 
will have to manually drop all the external tables before retrying with 
another bootstrap dump. For a full bootstrap, to retry with a different dump, 
we suggest the user drop the DB; but in this case they would need to manually 
drop all the external tables, which is not user friendly. So this needs to be 
handled on the Hive side as follows.

REPL LOAD takes an additional config (passed by the user in the WITH clause) 
that says: drop all the tables which were bootstrapped from the previous dump. 
hive.repl.cleanup.bootstrap=

Hive will use this config only if the current dump is a bootstrap dump or a 
combined bootstrap in an incremental dump.
The user should take care not to pass this config if the previous REPL LOAD 
(with bootstrap) was successful, or if any successful incremental dump+load 
happened after "previous_bootstrap_dump_dir".

  was:
If external tables are enabled for replication on an existing repl policy, then 
the bootstrapping of external tables is combined with the incremental dump.
If the incremental bootstrap load fails with a non-retryable error, the user 
will have to manually drop all the external tables before retrying with 
another bootstrap dump. For a full bootstrap, to retry with a different dump, 
we suggest the user drop the DB; but in this case they would need to manually 
drop all the external tables, which is not user friendly. So this needs to be 
handled on the Hive side as follows.

REPL LOAD takes an additional config (passed by the user in the WITH clause) 
that says: drop all the tables which were bootstrapped from the previous dump. 
hive.repl.cleanup.bootstrap=
Hive will use this config only if the current dump is a bootstrap dump or a 
combined bootstrap in an incremental dump.
The user should take care not to pass this config if the previous REPL LOAD 
(with bootstrap) was successful, or if any successful incremental dump+load 
happened after "previous_bootstrap_dump_dir".


> Hive should support clean-up of incrementally bootstrapped tables when retry 
> from different dump.
> -
>
> Key: HIVE-21286
> URL: https://issues.apache.org/jira/browse/HIVE-21286
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication
>
> If external tables are enabled for replication on an existing repl policy, 
> then the bootstrapping of external tables is combined with the incremental 
> dump.
> If the incremental bootstrap load fails with a non-retryable error, the 
> user will have to manually drop all the external tables before retrying 
> with another bootstrap dump. For a full bootstrap, to retry with a 
> different dump, we suggest the user drop the DB; but in this case they 
> would need to manually drop all the external tables, which is not user 
> friendly. So this needs to be handled on the Hive side as follows.
> REPL LOAD takes an additional config (passed by the user in the WITH 
> clause) that says: drop all the tables which were bootstrapped from the 
> previous dump. hive.repl.cleanup.bootstrap=
> Hive will use this config only if the current dump is a bootstrap dump or 
> a combined bootstrap in an incremental dump.
> The user should take care not to pass this config if the previous REPL 
> LOAD (with bootstrap) was successful, or if any successful incremental 
> dump+load happened after "previous_bootstrap_dump_dir".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21307) Need to set GzipJSONMessageEncoder as default config for EVENT_MESSAGE_FACTORY.

2019-02-24 Thread Sankar Hariappan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776152#comment-16776152
 ] 

Sankar Hariappan commented on HIVE-21307:
-

Here is the link to the +1 from [~maheshk114], as it is hidden among the 
flaky ptest failure comments.
https://issues.apache.org/jira/browse/HIVE-21307?focusedCommentId=16775876&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16775876

> Need to set GzipJSONMessageEncoder as default config for 
> EVENT_MESSAGE_FACTORY.
> ---
>
> Key: HIVE-21307
> URL: https://issues.apache.org/jira/browse/HIVE-21307
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration, repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21307.01.patch, HIVE-21307.02.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, we use JsonMessageEncoder as the default message factory for 
> Notification events. The size of some of these events is really huge and 
> causes OOM issues in the RDBMS, so GzipJSONMessageEncoder needs to be 
> enabled as the default message factory to optimise memory usage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21313) Use faster function to prevent copying immutable byte array twice

2019-02-24 Thread ZhangXin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangXin updated HIVE-21313:

Priority: Minor  (was: Major)

> Use faster function to prevent copying immutable byte array twice
> -
>
> Key: HIVE-21313
> URL: https://issues.apache.org/jira/browse/HIVE-21313
> Project: Hive
>  Issue Type: Improvement
>Reporter: ZhangXin
>Assignee: ZhangXin
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-21313.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorAssignRow.java 
> we may find code like this:
> ```
> Text text = (Text) convertTargetWritable;
> if (text == null) {
>   text = new Text();
> }
> text.set(string);
> ((BytesColumnVector) columnVector).setVal(
>     batchIndex, text.getBytes(), 0, text.getLength());
> ```
> Using the `setVal` method copies the byte array returned by 
> `text.getBytes()`. This copy is entirely unnecessary: since the byte array 
> is immutable, we can simply use the `setRef` method to point at the 
> specific byte array, which also lowers memory usage.
> Pull request on Github: https://github.com/apache/hive/pull/548



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21286) Hive should support clean-up of incrementally bootstrapped tables when retry from different dump.

2019-02-24 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21286:

Description: 
If external tables are enabled for replication on an existing repl policy, then 
the bootstrapping of external tables is combined with the incremental dump.
If the incremental bootstrap load fails with a non-retryable error, the user 
will have to manually drop all the external tables before retrying with 
another bootstrap dump. For a full bootstrap, to retry with a different dump, 
we suggest the user drop the DB; but in this case they would need to manually 
drop all the external tables, which is not user friendly. So this needs to be 
handled on the Hive side as follows.

REPL LOAD takes an additional config (passed by the user in the WITH clause) 
that says: drop all the tables which were bootstrapped from the previous dump. 
hive.repl.cleanup.bootstrap=
The user should take care not to pass this config if the previous REPL LOAD 
(with bootstrap) was successful, or if any successful incremental dump+load 
happened after "previous_bootstrap_dump_dir".

  was:
If external tables are enabled for replication on an existing repl policy, then 
the bootstrapping of external tables is combined with the incremental dump.
If the incremental bootstrap load fails with a non-retryable error, the user 
will have to manually drop all the external tables before retrying with 
another bootstrap dump. For a full bootstrap, to retry with a different dump, 
we suggest the user drop the DB; but in this case they would need to manually 
drop all the external tables, which is not user friendly. So this needs to be 
handled on the Hive side as follows.

REPL LOAD takes an additional config (passed by the user in the WITH clause) 
that says: drop all the tables which are part of this bootstrap dump. There 
are 4 possible cases:
1. Only external tables - drop all external tables before triggering the 
bootstrap load.
2. Only ACID/MM tables - drop all ACID/MM tables before triggering the 
bootstrap load.
3. Both external and ACID/MM tables - drop both external and ACID/MM tables 
before triggering the bootstrap load.
4. Table-level replication with bootstrap - drop all the tables that match 
the diff between the previous and current repl policy 
(pattern+include/exclude list) before triggering the bootstrap load.
Configuration: hive.repl.bootstrap.cleanup.type=
{1=external_tables, 2=transactional_tables, 
3=external_and_transactional_tables, 4=table_level}


> Hive should support clean-up of incrementally bootstrapped tables when retry 
> from different dump.
> -
>
> Key: HIVE-21286
> URL: https://issues.apache.org/jira/browse/HIVE-21286
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication
>
> If external tables are enabled for replication on an existing repl policy, 
> then the bootstrapping of external tables is combined with the incremental 
> dump.
> If the incremental bootstrap load fails with a non-retryable error, the 
> user will have to manually drop all the external tables before retrying 
> with another bootstrap dump. For a full bootstrap, to retry with a 
> different dump, we suggest the user drop the DB; but in this case they 
> would need to manually drop all the external tables, which is not user 
> friendly. So this needs to be handled on the Hive side as follows.
> REPL LOAD takes an additional config (passed by the user in the WITH 
> clause) that says: drop all the tables which were bootstrapped from the 
> previous dump. hive.repl.cleanup.bootstrap=
> The user should take care not to pass this config if the previous REPL 
> LOAD (with bootstrap) was successful, or if any successful incremental 
> dump+load happened after "previous_bootstrap_dump_dir".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21286) Hive should support clean-up of incrementally bootstrapped tables when retry from different dump.

2019-02-24 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21286:

Description: 
If external tables are enabled for replication on an existing repl policy, then 
the bootstrapping of external tables is combined with the incremental dump.
If the incremental bootstrap load fails with a non-retryable error, the user 
will have to manually drop all the external tables before retrying with 
another bootstrap dump. For a full bootstrap, to retry with a different dump, 
we suggest the user drop the DB; but in this case they would need to manually 
drop all the external tables, which is not user friendly. So this needs to be 
handled on the Hive side as follows.

REPL LOAD takes an additional config (passed by the user in the WITH clause) 
that says: drop all the tables which were bootstrapped from the previous dump. 
hive.repl.cleanup.bootstrap=
Hive will use this config only if the current dump is a bootstrap dump or a 
combined bootstrap in an incremental dump.
The user should take care not to pass this config if the previous REPL LOAD 
(with bootstrap) was successful, or if any successful incremental dump+load 
happened after "previous_bootstrap_dump_dir".

  was:
If external tables are enabled for replication on an existing repl policy, then 
the bootstrapping of external tables is combined with the incremental dump.
If the incremental bootstrap load fails with a non-retryable error, the user 
will have to manually drop all the external tables before retrying with 
another bootstrap dump. For a full bootstrap, to retry with a different dump, 
we suggest the user drop the DB; but in this case they would need to manually 
drop all the external tables, which is not user friendly. So this needs to be 
handled on the Hive side as follows.

REPL LOAD takes an additional config (passed by the user in the WITH clause) 
that says: drop all the tables which were bootstrapped from the previous dump. 
hive.repl.cleanup.bootstrap=
Hive will use this config only if the current dump is a bootstrap dump or a 
bootstrap in an incremental dump.
The user should take care not to pass this config if the previous REPL LOAD 
(with bootstrap) was successful, or if any successful incremental dump+load 
happened after "previous_bootstrap_dump_dir".


> Hive should support clean-up of incrementally bootstrapped tables when retry 
> from different dump.
> -
>
> Key: HIVE-21286
> URL: https://issues.apache.org/jira/browse/HIVE-21286
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication
>
> If external tables are enabled for replication on an existing repl policy, 
> then the bootstrapping of external tables is combined with the incremental 
> dump.
> If the incremental bootstrap load fails with a non-retryable error, the 
> user will have to manually drop all the external tables before retrying 
> with another bootstrap dump. For a full bootstrap, to retry with a 
> different dump, we suggest the user drop the DB; but in this case they 
> would need to manually drop all the external tables, which is not user 
> friendly. So this needs to be handled on the Hive side as follows.
> REPL LOAD takes an additional config (passed by the user in the WITH 
> clause) that says: drop all the tables which were bootstrapped from the 
> previous dump. hive.repl.cleanup.bootstrap=
> Hive will use this config only if the current dump is a bootstrap dump or 
> a combined bootstrap in an incremental dump.
> The user should take care not to pass this config if the previous REPL 
> LOAD (with bootstrap) was successful, or if any successful incremental 
> dump+load happened after "previous_bootstrap_dump_dir".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21307) Need to set GzipJSONMessageEncoder as default config for EVENT_MESSAGE_FACTORY.

2019-02-24 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776149#comment-16776149
 ] 

Hive QA commented on HIVE-21307:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12959928/HIVE-21307.02.patch

{color:green}SUCCESS:{color} +1 due to 8 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15811 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16220/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16220/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16220/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12959928 - PreCommit-HIVE-Build

> Need to set GzipJSONMessageEncoder as default config for 
> EVENT_MESSAGE_FACTORY.
> ---
>
> Key: HIVE-21307
> URL: https://issues.apache.org/jira/browse/HIVE-21307
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration, repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21307.01.patch, HIVE-21307.02.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, we use JsonMessageEncoder as the default message factory for 
> Notification events. The size of some of these events is really huge and 
> causes OOM issues in the RDBMS, so GzipJSONMessageEncoder needs to be 
> enabled as the default message factory to optimise memory usage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21286) Hive should support clean-up of incrementally bootstrapped tables when retry from different dump.

2019-02-24 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21286:

Description: 
If external tables are enabled for replication on an existing repl policy, then 
the bootstrapping of external tables is combined with the incremental dump.
If the incremental bootstrap load fails with a non-retryable error, the user 
will have to manually drop all the external tables before retrying with 
another bootstrap dump. For a full bootstrap, to retry with a different dump, 
we suggest the user drop the DB; but in this case they would need to manually 
drop all the external tables, which is not user friendly. So this needs to be 
handled on the Hive side as follows.

REPL LOAD takes an additional config (passed by the user in the WITH clause) 
that says: drop all the tables which were bootstrapped from the previous dump. 
hive.repl.cleanup.bootstrap=
Hive will use this config only if the current dump is a bootstrap dump or a 
bootstrap in an incremental dump.
The user should take care not to pass this config if the previous REPL LOAD 
(with bootstrap) was successful, or if any successful incremental dump+load 
happened after "previous_bootstrap_dump_dir".

  was:
If external tables are enabled for replication on an existing repl policy, then 
the bootstrapping of external tables is combined with the incremental dump.
If the incremental bootstrap load fails with a non-retryable error, the user 
will have to manually drop all the external tables before retrying with 
another bootstrap dump. For a full bootstrap, to retry with a different dump, 
we suggest the user drop the DB; but in this case they would need to manually 
drop all the external tables, which is not user friendly. So this needs to be 
handled on the Hive side as follows.

REPL LOAD takes an additional config (passed by the user in the WITH clause) 
that says: drop all the tables which were bootstrapped from the previous dump. 
hive.repl.cleanup.bootstrap=
The user should take care not to pass this config if the previous REPL LOAD 
(with bootstrap) was successful, or if any successful incremental dump+load 
happened after "previous_bootstrap_dump_dir".


> Hive should support clean-up of incrementally bootstrapped tables when retry 
> from different dump.
> -
>
> Key: HIVE-21286
> URL: https://issues.apache.org/jira/browse/HIVE-21286
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication
>
> If external tables are enabled for replication on an existing repl policy, 
> then the bootstrapping of external tables is combined with the incremental 
> dump.
> If the incremental bootstrap load fails with a non-retryable error, the 
> user will have to manually drop all the external tables before retrying 
> with another bootstrap dump. For a full bootstrap, to retry with a 
> different dump, we suggest the user drop the DB; but in this case they 
> would need to manually drop all the external tables, which is not user 
> friendly. So this needs to be handled on the Hive side as follows.
> REPL LOAD takes an additional config (passed by the user in the WITH 
> clause) that says: drop all the tables which were bootstrapped from the 
> previous dump. hive.repl.cleanup.bootstrap=
> Hive will use this config only if the current dump is a bootstrap dump or 
> a bootstrap in an incremental dump.
> The user should take care not to pass this config if the previous REPL 
> LOAD (with bootstrap) was successful, or if any successful incremental 
> dump+load happened after "previous_bootstrap_dump_dir".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)