[jira] [Commented] (HIVE-21001) Upgrade to calcite-1.18

2019-03-04 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783739#comment-16783739
 ] 

Hive QA commented on HIVE-21001:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 46s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 19s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 33s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 30s{color} | {color:blue} ql in master has 2251 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 36s{color} | {color:blue} accumulo-handler in master has 21 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 38s{color} | {color:blue} hbase-handler in master has 15 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  9m 48s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 47s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  9m 6s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 50s{color} | {color:red} ql: The patch generated 5 new + 290 unchanged - 29 fixed = 295 total (was 319) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  2m 19s{color} | {color:red} root: The patch generated 5 new + 290 unchanged - 29 fixed = 295 total (was 319) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m 37s{color} | {color:red} patch/ql cannot run setBugDatabaseInfo from findbugs {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 18s{color} | {color:red} patch/accumulo-handler cannot run setBugDatabaseInfo from findbugs {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 20s{color} | {color:red} patch/hbase-handler cannot run setBugDatabaseInfo from findbugs {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 18m 43s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 90m 31s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  xml  compile  findbugs  checkstyle  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16326/dev-support/hive-personality.sh |
| git revision | master / f51f108 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-16326/yetus/diff-checkstyle-ql.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-16326/yetus/diff-checkstyle-root.txt |
| whitespace | http://104.198.109.242/logs//PreCommit-HIVE-Build-16326/yetus/whitespace-eol.txt |
| findbugs | http://104.198.109.242/logs//PreCommit-HIVE-Build-16326/yetus/patch-findbugs-ql.txt |
| findbugs | 

[jira] [Updated] (HIVE-21376) Incompatible change in Hive bucket computation

2019-03-04 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-21376:
---
Target Version/s: 3.0.1, 4.0.0, 3.2.0, 3.1.2

> Incompatible change in Hive bucket computation
> --
>
> Key: HIVE-21376
> URL: https://issues.apache.org/jira/browse/HIVE-21376
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: David Phillips
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-21376.patch
>
>
> HIVE-20007 seems to have inadvertently changed the bucket hash code 
> computation via {{ObjectInspectorUtils.getBucketHashCodeOld()}} for the 
> {{DATE}} and {{TIMESTAMP}} data types.
> {{DATE}} was previously computed using {{DateWritable}}, which uses 
> {{daysSinceEpoch}} as the hash code. It is now computed using 
> {{DateWritableV2}}, which uses the hash code of {{java.time.LocalDate}} 
> (which is not days since epoch).
> {{TIMESTAMP}} was previously computed using {{TimestampWritable}} and now uses 
> {{TimestampWritableV2}}. They ostensibly use the same hash code computation, 
> but there are two important differences:
>  # {{TimestampWritable}} rounds the number of milliseconds into the seconds 
> portion of the computation, but {{TimestampWritableV2}} does not.
>  # {{TimestampWritable}} gets the epoch time from {{java.sql.Timestamp}}, 
> which returns it relative to the JVM time zone, not UTC. 
> {{TimestampWritableV2}} uses a {{LocalDateTime}} relative to UTC.
> I was unable to get Hive 3.1 running in order to verify if this actually 
> causes data to be read or written incorrectly (there may be code above this 
> library method which makes things work correctly). However, if my 
> understanding is correct, this means Hive 3.1 is both forwards and backwards 
> incompatible with bucketed tables using either of these data types. It also 
> indicates that Hive needs tests to verify that the hash code does not change 
> between releases.
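The following minimal Java sketch makes the {{DATE}} difference concrete using only java.time. It assumes, as the report states, that the old hash is the number of days since the epoch and that the new hash is {{LocalDate.hashCode()}}. The class and method names are hypothetical. The bucket formula {{(hash & Integer.MAX_VALUE) % numBuckets}} is also an assumption about how a hash maps to a bucket; it is not quoted from Hive's code.

```java
import java.time.LocalDate;

final class DateBucketHashSketch {

    // Per the report: the old DateWritable hash is the day count since 1970-01-01.
    static int legacyDateHash(LocalDate date) {
        return (int) date.toEpochDay();
    }

    // Per the report: DateWritableV2 ends up using java.time.LocalDate.hashCode().
    static int v2DateHash(LocalDate date) {
        return date.hashCode();
    }

    // Assumed (hypothetical) bucket formula: clear the sign bit, then take the remainder.
    static int bucket(int hash, int numBuckets) {
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        int numBuckets = 4;
        for (LocalDate d : new LocalDate[] {LocalDate.of(1970, 1, 1), LocalDate.of(2019, 3, 4)}) {
            int oldHash = legacyDateHash(d);
            int newHash = v2DateHash(d);
            System.out.printf("%s old=%d (bucket %d) new=%d (bucket %d)%n",
                d, oldHash, bucket(oldHash, numBuckets), newHash, bucket(newHash, numBuckets));
        }
    }
}
```

For 1970-01-01 the old hash is 0, while {{LocalDate.hashCode()}} is derived from the year, month and day fields. The same row can therefore be assigned to a different bucket file. A golden-value test pinned to specific dates, along these lines, is one way to catch such changes between releases.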



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21279) Avoid moving/rename operation in FileSink op for SELECT queries

2019-03-04 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-21279:
---
Status: Open  (was: Patch Available)

> Avoid moving/rename operation in FileSink op for SELECT queries
> ---
>
> Key: HIVE-21279
> URL: https://issues.apache.org/jira/browse/HIVE-21279
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21279.1.patch, HIVE-21279.10.patch, 
> HIVE-21279.11.patch, HIVE-21279.12.patch, HIVE-21279.13.patch, 
> HIVE-21279.2.patch, HIVE-21279.3.patch, HIVE-21279.4.patch, 
> HIVE-21279.5.patch, HIVE-21279.6.patch, HIVE-21279.7.patch, 
> HIVE-21279.8.patch, HIVE-21279.9.patch
>
>
> Currently, at the end of a job, the FileSink operator moves/renames the temp directory
> to another directory from which FetchTask fetches the results. This is done to
> avoid fetching potentially partial/invalid files written by failed/runaway tasks.
> This operation is expensive on cloud storage. It could be avoided if FetchTask were
> passed the set of files to read instead of the whole directory.
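The idea can be shown with a small, self-contained Java sketch that uses java.nio.file on a local temp directory. This is not Hive's FileSinkOperator or FetchTask code. The file names, the "committed" list and the helper methods are all hypothetical. The sketch contrasts reading whatever a directory contains with reading only an explicit list of committed files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ExplicitFileListFetchSketch {

    // Directory-based fetch: reads every file present, including leftovers
    // from failed or runaway task attempts.
    static List<String> fetchFromDirectory(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(dir)) {
            files = listing.sorted().collect(Collectors.toList());
        }
        return readAll(files);
    }

    // File-list-based fetch: reads only the files reported as committed,
    // so no directory rename is needed to hide partial output.
    static List<String> fetchFromFileList(List<Path> committedFiles) throws IOException {
        return readAll(committedFiles);
    }

    private static List<String> readAll(List<Path> files) throws IOException {
        List<String> rows = new ArrayList<>();
        for (Path file : files) {
            rows.addAll(Files.readAllLines(file, StandardCharsets.UTF_8));
        }
        return rows;
    }

    // Hypothetical demo: two successful task outputs plus one partial file from a
    // failed attempt. Returns {directory-based result, file-list-based result}.
    static List<List<String>> runDemo() {
        try {
            Path dir = Files.createTempDirectory("filesink-sketch");
            Path ok0 = Files.write(dir.resolve("000000_0"), Arrays.asList("a", "b"), StandardCharsets.UTF_8);
            Path ok1 = Files.write(dir.resolve("000001_0"), Arrays.asList("c"), StandardCharsets.UTF_8);
            Files.write(dir.resolve("000002_0_failed"), Arrays.asList("partial"), StandardCharsets.UTF_8);
            return Arrays.asList(fetchFromDirectory(dir), fetchFromFileList(Arrays.asList(ok0, ok1)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<List<String>> results = runDemo();
        System.out.println("directory listing:  " + results.get(0));
        System.out.println("explicit file list: " + results.get(1));
    }
}
```

With the directory listing, the partial file from the failed attempt leaks into the result, which is exactly what the rename currently guards against. With the explicit list, that file is never opened. On object stores, a rename is usually implemented as copy plus delete, which is why skipping it matters.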





[jira] [Updated] (HIVE-21279) Avoid moving/rename operation in FileSink op for SELECT queries

2019-03-04 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-21279:
---
Status: Patch Available  (was: Open)






[jira] [Updated] (HIVE-21279) Avoid moving/rename operation in FileSink op for SELECT queries

2019-03-04 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-21279:
---
Attachment: HIVE-21279.13.patch






[jira] [Commented] (HIVE-21001) Upgrade to calcite-1.18

2019-03-04 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783718#comment-16783718
 ] 

Hive QA commented on HIVE-21001:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12961018/HIVE-21001.43.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 15789 tests executed
*Failed tests:*
{noformat}
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=171)

[authorization_view_8.q,load_dyn_part5.q,vector_groupby_grouping_sets5.q,vector_complex_join.q,orc_llap.q,vectorization_7.q,cbo_gby.q,bucket_num_reducers_acid2.q,auto_sortmerge_join_1.q,results_cache_empty_result.q,lineage3.q,materialized_view_rewrite_empty.q,q93_with_constraints.q,vector_struct_in.q,bucketmapjoin3.q,vectorization_16.q,current_date_timestamp.q,orc_ppd_schema_evol_2a.q,partition_ctas.q,vector_windowing_multipartitioning.q,vectorized_join46.q,orc_ppd_date.q,create_merge_compressed.q,vector_outer_join1.q,dynpart_sort_optimization_acid.q,vectorization_not.q,having.q,vectorization_input_format_excludes.q,leftsemijoin.q,special_character_in_tabnames_1.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ambiguitycheck] (batchId=79)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constant_prop_3] (batchId=47)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[pcs] (batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[rand_partitionpruner3] (batchId=86)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_date_1] (batchId=23)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[semijoin6] (batchId=182)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_date_1] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_interval_2] (batchId=177)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16326/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16326/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16326/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12961018 - PreCommit-HIVE-Build

> Upgrade to calcite-1.18
> ---
>
> Key: HIVE-21001
> URL: https://issues.apache.org/jira/browse/HIVE-21001
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-21001.01.patch, HIVE-21001.01.patch, 
> HIVE-21001.02.patch, HIVE-21001.03.patch, HIVE-21001.04.patch, 
> HIVE-21001.05.patch, HIVE-21001.06.patch, HIVE-21001.06.patch, 
> HIVE-21001.07.patch, HIVE-21001.08.patch, HIVE-21001.08.patch, 
> HIVE-21001.08.patch, HIVE-21001.09.patch, HIVE-21001.09.patch, 
> HIVE-21001.09.patch, HIVE-21001.10.patch, HIVE-21001.11.patch, 
> HIVE-21001.12.patch, HIVE-21001.13.patch, HIVE-21001.15.patch, 
> HIVE-21001.16.patch, HIVE-21001.17.patch, HIVE-21001.18.patch, 
> HIVE-21001.18.patch, HIVE-21001.19.patch, HIVE-21001.20.patch, 
> HIVE-21001.21.patch, HIVE-21001.22.patch, HIVE-21001.22.patch, 
> HIVE-21001.22.patch, HIVE-21001.23.patch, HIVE-21001.24.patch, 
> HIVE-21001.26.patch, HIVE-21001.26.patch, HIVE-21001.26.patch, 
> HIVE-21001.26.patch, HIVE-21001.26.patch, HIVE-21001.27.patch, 
> HIVE-21001.28.patch, HIVE-21001.29.patch, HIVE-21001.29.patch, 
> HIVE-21001.30.patch, HIVE-21001.31.patch, HIVE-21001.32.patch, 
> HIVE-21001.34.patch, HIVE-21001.35.patch, HIVE-21001.36.patch, 
> HIVE-21001.37.patch, HIVE-21001.38.patch, HIVE-21001.39.patch, 
> HIVE-21001.40.patch, HIVE-21001.41.patch, HIVE-21001.42.patch, 
> HIVE-21001.43.patch
>
>
> XLEAR LIBRARY CACHE 





[jira] [Assigned] (HIVE-21344) CBO: Materialized view registry is not used for Calcite planner

2019-03-04 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-21344:
--

Assignee: Jesus Camacho Rodriguez

> CBO: Materialized view registry is not used for Calcite planner
> ---
>
> Key: HIVE-21344
> URL: https://issues.apache.org/jira/browse/HIVE-21344
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: calcite-planner-after-fix.svg.zip, mv-get-from-remote.png
>
>
> {code}
>   // This is not a rebuild, we retrieve all the materializations. In turn, we do not need
>   // to force the materialization contents to be up-to-date, as this is not a rebuild, and
>   // we apply the user parameters (HIVE_MATERIALIZED_VIEW_REWRITING_TIME_WINDOW) instead.
>   materializations = db.getAllValidMaterializedViews(getTablesUsed(basePlan), false, getTxnMgr());
> }
> {code}
> !mv-get-from-remote.png!





[jira] [Commented] (HIVE-21376) Incompatible change in Hive bucket computation

2019-03-04 Thread David Phillips (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783706#comment-16783706
 ] 

David Phillips commented on HIVE-21376:
---

I believe that v2 will have a similar incompatible change between 3.0 and 3.1 
for {{TIMESTAMP}} due to the time value coming from {{java.sql.Timestamp}} 
changing from local to UTC.
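The time-zone dependence described in this comment can be reproduced with plain JDK classes. The following Java sketch is an illustration, not Hive code. It computes epoch seconds for the same wall-clock value in two ways: via {{java.sql.Timestamp}}, which interprets the value in the JVM default time zone, and via {{LocalDateTime.toEpochSecond(ZoneOffset.UTC)}}. The class and method names are hypothetical.

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.TimeZone;

final class TimestampEpochSketch {

    // Legacy path: java.sql.Timestamp interprets the wall-clock value in the
    // JVM default time zone before producing epoch milliseconds.
    static long legacyEpochSeconds(LocalDateTime wallClock) {
        return Timestamp.valueOf(wallClock).getTime() / 1000;
    }

    // UTC path: the same wall-clock value read as if it were UTC.
    static long utcEpochSeconds(LocalDateTime wallClock) {
        return wallClock.toEpochSecond(ZoneOffset.UTC);
    }

    // Difference between the two paths under a given JVM default zone;
    // restores the previous default zone afterwards.
    static long offsetSeconds(LocalDateTime wallClock, String zoneId) {
        TimeZone previous = TimeZone.getDefault();
        try {
            TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
            return legacyEpochSeconds(wallClock) - utcEpochSeconds(wallClock);
        } finally {
            TimeZone.setDefault(previous);
        }
    }

    public static void main(String[] args) {
        LocalDateTime ts = LocalDateTime.of(2019, 3, 4, 12, 0);
        System.out.println("UTC JVM:         " + offsetSeconds(ts, "UTC"));
        System.out.println("Los Angeles JVM: " + offsetSeconds(ts, "America/Los_Angeles"));
    }
}
```

Under a UTC JVM the two paths agree. Under America/Los_Angeles on 2019-03-04 (PST, UTC-8) they differ by 28,800 seconds, so any hash derived from those seconds would change as well.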






[jira] [Updated] (HIVE-21377) Using Oracle as HMS DB with DirectSQL

2019-03-04 Thread Rajkumar Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajkumar Singh updated HIVE-21377:
--
Attachment: HIVE-21377.patch
Status: Patch Available  (was: In Progress)

> Using Oracle as HMS DB with DirectSQL
> -
>
> Key: HIVE-21377
> URL: https://issues.apache.org/jira/browse/HIVE-21377
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 3.1.0, 3.0.0
>Reporter: Bo 
>Assignee: Rajkumar Singh
>Priority: Major
> Attachments: HIVE-21377.patch
>
>
> When using Oracle as the HMS DB, we see entries like the following in the HMS log:
> {code:java}
> 2019-02-02 T08:23:57,102 WARN [Thread-12]: metastore.ObjectStore (ObjectStore.java:handleDirectSqlError(3741)) - Falling back to ORM path due to direct SQL failure (this is not an error): Cannot extract boolean from column value 0
>  at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.extractSqlBoolean(MetaStoreDirectSql.java:1031)
>  at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsFromPartitionIds(MetaStoreDirectSql.java:728)
>  at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.access$300(MetaStoreDirectSql.java:109)
>  at org.apache.hadoop.hive.metastore.MetaStoreDirectSql$1.run(MetaStoreDirectSql.java:471)
>  at org.apache.hadoop.hive.metastore.Batchable.runBatched(Batchable.java:73)
>  at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:462)
>  at org.apache.hadoop.hive.metastore.ObjectStore$8.getSqlResult(ObjectStore.java:3392)
> {code}
> Hive's extractSqlBoolean handles the boolean representations returned by Postgres, MySQL and Derby,
> but Oracle returns 0 or 1 for booleans, so MetastoreDirectSqlUtils.java [1] needs to be modified.
> Could the following snippet be added to that code?
> {code:java}
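>   // Requires org.apache.commons.lang3.BooleanUtils and
>   // org.apache.hadoop.hive.metastore.api.MetaException.
>   static Boolean extractSqlBoolean(Object value) throws MetaException {
>     if (value == null) {
>       return null;
>     }
>     if (value instanceof Boolean) {
>       return (Boolean) value;
>     }
>     // Added: Oracle returns the flag as a NUMBER (typically a BigDecimal)
>     // holding 0 or 1, so convert via Number rather than casting to a Hive type.
>     if (value instanceof Number) {
>       try {
>         return BooleanUtils.toBooleanObject(((Number) value).intValue(), 1, 0, null);
>       } catch (IllegalArgumentException iae) {
>         // NOOP: neither 0 nor 1, fall through to the error below
>       }
>     }
>     // Existing handling for CHAR-based flags such as "Y"/"N".
>     if (value instanceof String) {
>       try {
>         return BooleanUtils.toBooleanObject((String) value, "Y", "N", null);
>       } catch (IllegalArgumentException iae) {
>         // NOOP
>       }
>     }
>     throw new MetaException("Cannot extract boolean from column value " + value);
>   }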
> {code}
>  [1] -
> https://github.com/apache/hive/blob/f51f108b761f0c88647f48f30447dae12b308f31/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreDirectSqlUtils.java#L501-L527
>  
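To unit-test the mapping without a metastore on the classpath, the logic can be reduced to a dependency-free Java sketch. This is an illustration under stated assumptions, not the committed fix. The class name is hypothetical, IllegalArgumentException stands in for MetaException, and plain comparisons stand in for BooleanUtils.

```java
import java.math.BigDecimal;

final class ExtractSqlBooleanSketch {

    // Hypothetical, dependency-free stand-in for extractSqlBoolean:
    // IllegalArgumentException replaces MetaException, explicit comparisons replace BooleanUtils.
    static Boolean extractSqlBoolean(Object value) {
        if (value == null) {
            return null;
        }
        if (value instanceof Boolean) {
            return (Boolean) value;
        }
        if (value instanceof Number) {
            // Oracle NUMBER columns usually arrive as BigDecimal; compare the numeric
            // value instead of truncating it, so 0.5 is rejected rather than mapped to false.
            double d = ((Number) value).doubleValue();
            if (d == 1.0) {
                return Boolean.TRUE;
            }
            if (d == 0.0) {
                return Boolean.FALSE;
            }
        }
        if (value instanceof String) {
            if ("Y".equals(value)) {
                return Boolean.TRUE;
            }
            if ("N".equals(value)) {
                return Boolean.FALSE;
            }
        }
        throw new IllegalArgumentException("Cannot extract boolean from column value " + value);
    }

    public static void main(String[] args) {
        Object[] samples = {new BigDecimal("0"), new BigDecimal("1"), "Y", "N", Boolean.TRUE, null};
        for (Object sample : samples) {
            System.out.println(sample + " -> " + extractSqlBoolean(sample));
        }
    }
}
```

Comparing the numeric value, instead of truncating it with {{intValue()}}, makes an unexpected value such as 0.5 fail loudly rather than silently map to false.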





[jira] [Resolved] (HIVE-21343) CBO: CalcitePlanner debug logging is expensive and costly

2019-03-04 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-21343.

   Resolution: Fixed
Fix Version/s: 4.0.0

Fixed as part of HIVE-18920 .

> CBO: CalcitePlanner debug logging is expensive and costly
> -
>
> Key: HIVE-21343
> URL: https://issues.apache.org/jira/browse/HIVE-21343
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: Reloptutil-toString.png, 
> calcite-planner-after-fix.svg.zip
>
>
> {code}
>   //Remove subquery
>   LOG.debug("Plan before removing subquery:\n" + RelOptUtil.toString(calciteGenPlan));
>   calciteGenPlan = hepPlan(calciteGenPlan, false, mdProvider.getMetadataProvider(), null,
>       new HiveSubQueryRemoveRule(conf));
>   LOG.debug("Plan just after removing subquery:\n" + RelOptUtil.toString(calciteGenPlan));
>   calciteGenPlan = HiveRelDecorrelator.decorrelateQuery(calciteGenPlan);
>   LOG.debug("Plan after decorrelation:\n" + RelOptUtil.toString(calciteGenPlan));
> {code}
> The LOG.debug() consumes more CPU than the actual planner steps.
>  !Reloptutil-toString.png! 
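Hive logs through SLF4J, where the usual remedy is to guard the call with {{if (LOG.isDebugEnabled())}}. Switching to a {{{}}} placeholder alone would not help here, because {{RelOptUtil.toString(...)}} is still evaluated before {{LOG.debug}} is called. To stay dependency-free, the Java sketch below uses java.util.logging, whose {{Logger.fine}} also accepts a {{Supplier<String>}}. {{expensivePlanToString}} is a hypothetical stand-in for {{RelOptUtil.toString}}. This shows one common remedy, not necessarily what the HIVE-18920 patch did.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

final class LazyDebugLoggingSketch {

    private static final Logger LOG = Logger.getLogger(LazyDebugLoggingSketch.class.getName());
    private static final AtomicInteger RENDERS = new AtomicInteger();

    // Hypothetical stand-in for RelOptUtil.toString(plan): costly to build.
    static String expensivePlanToString(String plan) {
        RENDERS.incrementAndGet();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            sb.append(plan).append('\n');
        }
        return sb.toString();
    }

    // Eager: the plan string is built even when FINE (debug) is disabled.
    static void eagerLog(String plan) {
        LOG.fine("Plan before removing subquery:\n" + expensivePlanToString(plan));
    }

    // Guarded: the level check happens first, like SLF4J's isDebugEnabled().
    static void guardedLog(String plan) {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("Plan before removing subquery:\n" + expensivePlanToString(plan));
        }
    }

    // Lazy: the Supplier is only invoked if FINE is enabled.
    static void lazyLog(String plan) {
        LOG.fine(() -> "Plan before removing subquery:\n" + expensivePlanToString(plan));
    }

    // Runs one logging call with FINE disabled and reports how many plans were rendered.
    static int rendersFor(Runnable logCall) {
        LOG.setLevel(Level.INFO);
        RENDERS.set(0);
        logCall.run();
        return RENDERS.get();
    }

    public static void main(String[] args) {
        System.out.println("eager:   " + rendersFor(() -> eagerLog("HiveProject")));
        System.out.println("guarded: " + rendersFor(() -> guardedLog("HiveProject")));
        System.out.println("lazy:    " + rendersFor(() -> lazyLog("HiveProject")));
    }
}
```

With debug logging disabled, only the eager variant pays for rendering the plan. The guarded and lazy variants skip the string construction entirely.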





[jira] [Updated] (HIVE-18920) CBO: Initialize the Janino providers ahead of 1st query

2019-03-04 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-18920:
---
   Resolution: Fixed
Fix Version/s: 4.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master, thanks [~ashutoshc]

> CBO: Initialize the Janino providers ahead of 1st query
> ---
>
> Key: HIVE-18920
> URL: https://issues.apache.org/jira/browse/HIVE-18920
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-18920.01.patch, HIVE-18920.02.patch, 
> HIVE-18920.patch
>
>
> Hive Calcite metadata providers are compiled when the first query comes in.
> If a second query arrives before the first one has built a metadata provider,
> it will also try to build one, because the cache is not populated yet.
> With 1024 concurrent users, it takes 6 minutes for the first query to finish,
> because it contends with all the other queries that are trying to load the same cache.
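The remedy named in the title, initializing the providers before the first query, can be combined with a cache in which concurrent first lookups share a single build instead of racing. Below is a minimal Java sketch of both, assuming a simple in-process cache. {{compileProvider}} and the "default" key are hypothetical, and this is not Calcite's JaninoRelMetadataProvider API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ProviderWarmupSketch {

    private static final ConcurrentMap<String, Object> PROVIDERS = new ConcurrentHashMap<>();
    private static final AtomicInteger COMPILATIONS = new AtomicInteger();

    // Hypothetical stand-in for generating and compiling a metadata handler.
    private static Object compileProvider(String key) {
        COMPILATIONS.incrementAndGet();
        try {
            Thread.sleep(50); // simulate expensive code generation
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new Object();
    }

    // computeIfAbsent applies the function at most once per key; concurrent
    // callers for the same key wait for that result instead of compiling again.
    static Object provider(String key) {
        return PROVIDERS.computeIfAbsent(key, ProviderWarmupSketch::compileProvider);
    }

    // Called once at startup so the first user query finds the cache populated.
    static void warmUp() {
        provider("default");
    }

    // Warms up, then simulates many concurrent queries; returns the number of compilations.
    static int simulate(int threads, int queries) {
        warmUp();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < queries; i++) {
            pool.execute(() -> provider("default"));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return COMPILATIONS.get();
    }

    public static void main(String[] args) {
        System.out.println("compilations: " + simulate(64, 1024));
    }
}
```

The ConcurrentHashMap documentation asks that the mapping function be short, because other updates to the map may block while it runs. For a genuinely long compilation, warming up at startup keeps that cost off user queries.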



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work started] (HIVE-21377) Using Oracle as HMS DB with DirectSQL

2019-03-04 Thread Rajkumar Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-21377 started by Rajkumar Singh.
-
> Using Oracle as HMS DB with DirectSQL
> -
>
> Key: HIVE-21377
> URL: https://issues.apache.org/jira/browse/HIVE-21377
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Bo 
>Assignee: Rajkumar Singh
>Priority: Major
>





[jira] [Commented] (HIVE-18920) CBO: Initialize the Janino providers ahead of 1st query

2019-03-04 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783693#comment-16783693
 ] 

Ashutosh Chauhan commented on HIVE-18920:
-

+1

> CBO: Initialize the Janino providers ahead of 1st query
> ---
>
> Key: HIVE-18920
> URL: https://issues.apache.org/jira/browse/HIVE-18920
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-18920.01.patch, HIVE-18920.02.patch, 
> HIVE-18920.patch
>
>





[jira] [Assigned] (HIVE-21377) Using Oracle as HMS DB with DirectSQL

2019-03-04 Thread Rajkumar Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajkumar Singh reassigned HIVE-21377:
-

Assignee: Rajkumar Singh

> Using Oracle as HMS DB with DirectSQL
> -
>
> Key: HIVE-21377
> URL: https://issues.apache.org/jira/browse/HIVE-21377
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Bo 
>Assignee: Rajkumar Singh
>Priority: Major
>





[jira] [Commented] (HIVE-18920) CBO: Initialize the Janino providers ahead of 1st query

2019-03-04 Thread Jesus Camacho Rodriguez (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783687#comment-16783687
 ] 

Jesus Camacho Rodriguez commented on HIVE-18920:


[~ashutoshc], [~gopalv] has confirmed this patch fixes the issue with the 
recompilation. Could you review it? Thanks

> CBO: Initialize the Janino providers ahead of 1st query
> ---
>
> Key: HIVE-18920
> URL: https://issues.apache.org/jira/browse/HIVE-18920
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-18920.01.patch, HIVE-18920.02.patch, 
> HIVE-18920.patch
>
>
> Hive Calcite metadata providers are compiled when the 1st query comes in.
> If a second query arrives before the 1st one has built a metadata provider, 
> it will also try to do the same thing, because the cache is not populated yet.
> With 1024 concurrent users, it takes 6 minutes for the 1st query to finish 
> fighting all the other queries which are trying to load that cache.
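One way to avoid this thundering herd is to make the cache build each entry at most once and to populate it before the first query arrives. The sketch below uses a generic, hypothetical `ProviderCache` (not Hive's or Calcite's actual classes). It relies on `ConcurrentHashMap.computeIfAbsent`, which applies the mapping function at most once per key, so other threads asking for the same key wait for that single build instead of compiling their own copy; a `warmUp` hook is meant to be called at server startup.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical sketch, not Hive's code: a cache of expensive-to-build providers
// (e.g. generated metadata handlers) keyed by some descriptor.
final class ProviderCache<K, V> {
  private final ConcurrentMap<K, V> cache = new ConcurrentHashMap<>();
  private final Function<K, V> builder;

  ProviderCache(Function<K, V> builder) {
    this.builder = builder;
  }

  V get(K key) {
    // computeIfAbsent runs the builder at most once per key; concurrent callers for
    // the same key wait for that build rather than starting their own.
    return cache.computeIfAbsent(key, builder);
  }

  // Intended to be called during service initialization, before any user query.
  void warmUp(Iterable<K> keys) {
    for (K key : keys) {
      get(key);
    }
  }
}
```

Calling `warmUp` from service initialization means the first user query finds the provider already built. Without it, `computeIfAbsent` alone still limits the cost to one build, but the first queries have to wait for it.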





[jira] [Work logged] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.

2019-03-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21286?focusedWorklogId=207339=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-207339
 ]

ASF GitHub Bot logged work on HIVE-21286:
-

Author: ASF GitHub Bot
Created on: 04/Mar/19 18:20
Start Date: 04/Mar/19 18:20
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #551: HIVE-21286: 
Hive should support clean-up of previously bootstrapped tables when retry from 
different dump.
URL: https://github.com/apache/hive/pull/551#discussion_r262182774
 
 

 ##
 File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
 ##
 @@ -534,6 +536,90 @@ public void bootstrapExternalTablesDuringIncrementalPhase() throws Throwable {
 .verifyResults(Arrays.asList("10", "20"));
   }
 
+  @Test
+  public void retryBootstrapExternalTablesFromDifferentDump() throws Throwable {
+    List<String> loadWithClause = new ArrayList<>();
+    loadWithClause.addAll(externalTableBasePathWithClause());
+
+    List<String> dumpWithClause = Collections.singletonList(
+        "'" + HiveConf.ConfVars.REPL_INCLUDE_EXTERNAL_TABLES.varname + "'='false'"
+    );
+
+    WarehouseInstance.Tuple tupleBootstrapWithoutExternal = primary
+        .run("use " + primaryDbName)
+        .run("create external table t1 (id int)")
+        .run("insert into table t1 values (1)")
+        .run("create external table t2 (place string) partitioned by (country string)")
+        .run("insert into table t2 partition(country='india') values ('bangalore')")
+        .run("insert into table t2 partition(country='us') values ('austin')")
+        .run("create table t3 as select * from t1")
+        .dump(primaryDbName, null, dumpWithClause);
+
+    replica.load(replicatedDbName, tupleBootstrapWithoutExternal.dumpLocation, loadWithClause)
+        .status(replicatedDbName)
+        .verifyResult(tupleBootstrapWithoutExternal.lastReplicationId)
+        .run("use " + replicatedDbName)
+        .run("show tables")
+        .verifyResult("t3")
+        .run("select id from t3")
+        .verifyResult("1");
+
+    dumpWithClause = Arrays.asList("'" + HiveConf.ConfVars.REPL_INCLUDE_EXTERNAL_TABLES.varname + "'='true'",
+        "'" + HiveConf.ConfVars.REPL_BOOTSTRAP_EXTERNAL_TABLES.varname + "'='true'");
+    WarehouseInstance.Tuple tupleIncWithExternalBootstrap = primary.run("use " + primaryDbName)
+        .run("drop table t1")
+        .run("create external table t4 (id int)")
+        .run("insert into table t4 values (10)")
+        .run("create table t5 as select * from t4")
+        .dump(primaryDbName, tupleBootstrapWithoutExternal.lastReplicationId, dumpWithClause);
+
+    // Verify if bootstrapping with same dump is idempotent and return same result
+    for (int i = 0; i < 2; i++) {
+      replica.load(replicatedDbName, tupleIncWithExternalBootstrap.dumpLocation, loadWithClause)
+          .status(replicatedDbName)
+          .verifyResult(tupleIncWithExternalBootstrap.lastReplicationId)
+          .run("use " + replicatedDbName)
+          .run("show tables like 't1'")
+          .verifyFailure(new String[]{"t1"})
+          .run("select place from t2 where country = 'us'")
+          .verifyResult("austin")
+          .run("select id from t4")
+          .verifyResult("10")
+          .run("select id from t5")
+          .verifyResult("10");
+    }
+
+    // Drop an external table, add another managed table with same name, insert into existing external table
+    // and dump another bootstrap dump for external tables.
+    WarehouseInstance.Tuple tupleNewIncWithExternalBootstrap = primary.run("use " + primaryDbName)
+        .run("insert into table t2 partition(country='india') values ('chennai')")
+        .run("drop table t2")
+        .run("create table t2 as select * from t4")
+        .run("insert into table t4 values (20)")
+        .dump(primaryDbName, tupleIncWithExternalBootstrap.lastReplicationId, dumpWithClause);
+
+    // Set previous dump as bootstrap to be rolled-back. Now, new bootstrap should overwrite the old one.
+    loadWithClause.add("'" + REPL_ROLLBACK_BOOTSTRAP_LOAD_CONFIG + "'='"
+        + tupleIncWithExternalBootstrap.dumpLocation + "'");
+    replica.load(replicatedDbName, tupleNewIncWithExternalBootstrap.dumpLocation, loadWithClause)
+        .run("use " + replicatedDbName)
+        .run("show tables like 't1'")
+        .verifyFailure(new String[]{"t1"})
+        .run("select id from t2")
+        .verifyResult("10")
+        .run("select id from t4")
+        .verifyResults(Arrays.asList("10", "20"))
+        .run("select id from t5")
+        .verifyResult("10");

[jira] [Commented] (HIVE-1555) JDBC Storage Handler

2019-03-04 Thread Ruslan Dautkhanov (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783654#comment-16783654
 ] 

Ruslan Dautkhanov commented on HIVE-1555:
-

Is it possible to store the password in a Hadoop credential store?
Metadata is visible to all users.
Alternatively, Hive should always return the password field redacted for commands
like `show create table`.
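Storing the secret outside the metastore (for example in a Hadoop credential provider) keeps it out of the metadata entirely; redaction only hides it from output such as `show create table`. A minimal sketch of the redaction side follows, assuming a hypothetical rule that any table property whose key contains `password` (case-insensitively) is masked. `TablePropertyRedactor` and the mask string are illustrative, not Hive APIs.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: produce a display copy of table properties with secrets masked,
// e.g. before rendering SHOW CREATE TABLE. The "key contains 'password'" rule is an assumption.
final class TablePropertyRedactor {
  static final String MASK = "******";

  private TablePropertyRedactor() {}

  static Map<String, String> redact(Map<String, String> props) {
    // Copy into a new map so the stored properties are never modified.
    Map<String, String> out = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : props.entrySet()) {
      boolean secret = e.getKey().toLowerCase(Locale.ROOT).contains("password");
      out.put(e.getKey(), secret ? MASK : e.getValue());
    }
    return out;
  }
}
```

For example, properties in the style of the JDBC storage handler's `hive.sql.dbcp.*` keys, with `hive.sql.dbcp.username=hive` and `hive.sql.dbcp.password=s3cret`, would be rendered with the username unchanged and the password replaced by `******`, while the input map itself is left untouched.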


> JDBC Storage Handler
> 
>
> Key: HIVE-1555
> URL: https://issues.apache.org/jira/browse/HIVE-1555
> Project: Hive
>  Issue Type: New Feature
>  Components: JDBC
>Reporter: Bob Robertson
>Assignee: Gunther Hagleitner
>Priority: Major
>  Labels: TODOC2.2
> Fix For: 2.3.0
>
> Attachments: HIVE-1555.7.patch, HIVE-1555.8.patch, HIVE-1555.9.patch, 
> JDBCStorageHandler Design Doc.pdf
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> With the Cassandra and HBase Storage Handlers, I thought it would make sense 
> to include a generic JDBC RDBMS Storage Handler so that you could import a 
> standard DB table into Hive. Many people likely want to perform HiveQL joins, 
> etc., against tables in other systems.





[jira] [Work logged] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.

2019-03-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21286?focusedWorklogId=207353=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-207353
 ]

ASF GitHub Bot logged work on HIVE-21286:
-

Author: ASF GitHub Bot
Created on: 04/Mar/19 18:30
Start Date: 04/Mar/19 18:30
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #551: HIVE-21286: 
Hive should support clean-up of previously bootstrapped tables when retry from 
different dump.
URL: https://github.com/apache/hive/pull/551#discussion_r262186354
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##
 @@ -279,6 +292,72 @@ a database ( directory )
 return 0;
   }
 
+  /**
+   * Cleanup/drop tables from the given database which are bootstrapped by input dump dir.
+   * @throws HiveException Failed to drop the tables.
+   * @throws IOException File operations failure.
+   * @throws InvalidInputException Invalid input dump directory.
+   */
+  private void bootstrapRollbackTask() throws HiveException, IOException, InvalidInputException {
+    Path bootstrapDirectory = new PathBuilder(work.bootstrapDumpToRollback)
+        .addDescendant(ReplUtils.INC_BOOTSTRAP_ROOT_DIR_NAME).build();
+    FileSystem fs = bootstrapDirectory.getFileSystem(conf);
+
+    if (!fs.exists(bootstrapDirectory)) {
+      throw new InvalidInputException("Input bootstrap dump directory to rollback doesn't exist: "
+          + bootstrapDirectory);
+    }
+
+    FileStatus[] fileStatuses = fs.listStatus(bootstrapDirectory, EximUtil.getDirectoryFilter(fs));
+    if ((fileStatuses == null) || (fileStatuses.length == 0)) {
+      throw new InvalidInputException("Input bootstrap dump directory to rollback is empty: "
+          + bootstrapDirectory);
+    }
+
+    if (StringUtils.isNotBlank(work.dbNameToLoadIn) && (fileStatuses.length > 1)) {
+      throw new InvalidInputException("Multiple DB dirs in the dump: " + bootstrapDirectory
+          + " is not allowed to load to single target DB: " + work.dbNameToLoadIn);
+    }
+
+    for (FileStatus dbDir : fileStatuses) {
+      Path dbLevelPath = dbDir.getPath();
+      String dbNameInDump = dbLevelPath.getName();
+
+      List<String> tableNames = new ArrayList<>();
+      RemoteIterator<LocatedFileStatus> filesIterator = fs.listFiles(dbLevelPath, true);
+      while (filesIterator.hasNext()) {
+        Path nextFile = filesIterator.next().getPath();
+        String filePath = nextFile.toString();
+        if (filePath.endsWith(EximUtil.METADATA_NAME)) {
+          // Remove dbLevelPath from the current path to check if this _metadata file is under DB or
+          // table level directory.
+          String replacedString = filePath.replace(dbLevelPath.toString(), "");
+          if (!replacedString.equalsIgnoreCase(EximUtil.METADATA_NAME)) {
+            tableNames.add(nextFile.getParent().getName());
+          }
+        }
+      }
+
+      // No tables listed in the DB level directory to be dropped.
+      if (tableNames.isEmpty()) {
+        LOG.info("No tables are listed to be dropped for Database: {} in bootstrap dump: {}",
+            dbNameInDump, bootstrapDirectory);
+        continue;
+      }
+
+      // Drop all tables bootstrapped from previous dump.
+      // Get the target DB in which previously bootstrapped tables to be dropped. If user specified
+      // DB name as input in REPL LOAD command, then use it.
+      String dbName = (StringUtils.isNotBlank(work.dbNameToLoadIn) ? work.dbNameToLoadIn : dbNameInDump);
+
+      Hive db = getHive();
+      for (String table : tableNames) {
+        db.dropTable(dbName + "." + table, true);
 
 Review comment:
   Well, I think my current test case covers this case. So, it's done.
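The table-discovery step in `bootstrapRollbackTask` can be illustrated without HDFS. This is a minimal sketch, assuming `java.nio.file.Path` as a stand-in for Hadoop's `Path`, a pre-listed set of files in place of `fs.listFiles(dbLevelPath, true)`, and `_metadata` as the value of `EximUtil.METADATA_NAME` (as the code comment suggests). `BootstrapDumpScanner` is a hypothetical name.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, filesystem-free sketch of the table-discovery step: a "_metadata" file directly
// in the DB directory describes the database itself; one deeper down describes a table whose
// name is the file's parent directory.
final class BootstrapDumpScanner {
  static final String METADATA_NAME = "_metadata"; // assumed value of EximUtil.METADATA_NAME

  private BootstrapDumpScanner() {}

  static List<String> tablesToDrop(Path dbLevelPath, List<Path> filesUnderDb) {
    List<String> tableNames = new ArrayList<>();
    for (Path file : filesUnderDb) {
      Path name = file.getFileName();
      if (name == null || !name.toString().equals(METADATA_NAME)) {
        continue; // data files and other artifacts are ignored
      }
      Path parent = file.getParent();
      // Compare directories instead of stripping a string prefix from the file path.
      if (parent != null && !parent.equals(dbLevelPath)) {
        tableNames.add(parent.getFileName().toString());
      }
    }
    return tableNames;
  }
}
```

For a DB directory containing `_metadata`, `t1/_metadata`, `t2/_metadata` and `t2/data/000000_0`, it returns `[t1, t2]`: the DB-level `_metadata` and the data file are skipped. Comparing each file's parent directory with the DB directory, rather than removing the DB path from the file path as a string, does not depend on which separator is left over after the prefix is removed.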
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 207353)
Time Spent: 1h 40m  (was: 1.5h)

> Hive should support clean-up of previously bootstrapped tables when retry 
> from different dump.
> --
>
> Key: HIVE-21286
> URL: https://issues.apache.org/jira/browse/HIVE-21286
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21286.01.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 

[jira] [Work logged] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.

2019-03-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21286?focusedWorklogId=207350=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-207350
 ]

ASF GitHub Bot logged work on HIVE-21286:
-

Author: ASF GitHub Bot
Created on: 04/Mar/19 18:28
Start Date: 04/Mar/19 18:28
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #551: HIVE-21286: 
Hive should support clean-up of previously bootstrapped tables when retry from 
different dump.
URL: https://github.com/apache/hive/pull/551#discussion_r262185564
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
 ##
 @@ -66,6 +66,10 @@
   // tasks.
   public static final String REPL_CURRENT_TBL_WRITE_ID = "hive.repl.current.table.write.id";
 
+  // Configuration to be received via WITH clause of REPL LOAD to rollback any previously failed
+  // bootstrap load.
+  public static final String REPL_ROLLBACK_BOOTSTRAP_LOAD_CONFIG = "hive.repl.rollback.bootstrap.load";
 
 Review comment:
   Nope. It works only with incremental dump.
 



Issue Time Tracking
---

Worklog Id: (was: 207350)
Time Spent: 1.5h  (was: 1h 20m)

> Hive should support clean-up of previously bootstrapped tables when retry 
> from different dump.
> --
>
> Key: HIVE-21286
> URL: https://issues.apache.org/jira/browse/HIVE-21286
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21286.01.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> If external tables are enabled for replication on an existing repl policy, 
> then the bootstrap of external tables is combined with the incremental dump.
> If this incremental bootstrap load fails with a non-retryable error, the user 
> has to manually drop all the external tables before retrying with another 
> bootstrap dump. For a full bootstrap, we suggested that users drop the DB 
> before retrying with a different dump, but in this case they would need to 
> manually drop every external table, which is not user friendly. So, Hive 
> needs to handle it as follows.
> REPL LOAD takes an additional config (passed by the user in the WITH clause) 
> that says: drop all the tables which were bootstrapped from the previous dump. 
> hive.repl.rollback.bootstrap.load=
> Hive will use this config only if the current dump is a bootstrap dump or a 
> bootstrap combined with an incremental dump.
> The user must take care not to pass this config if the previous REPL LOAD 
> (with bootstrap) was successful, or if any successful incremental dump+load 
> happened after "previous_bootstrap_dump_dir".
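To make the proposed WITH-clause usage concrete, here is a minimal sketch that renders such a REPL LOAD statement, assuming the `REPL LOAD <db> FROM '<dump dir>' WITH ('key'='value')` form. `ReplLoadCommand` and its method are hypothetical; only the config key comes from the patch.

```java
// Hypothetical helper (not part of Hive) that renders a REPL LOAD statement asking Hive to
// roll back the tables bootstrapped by a previous, failed dump before loading the new one.
final class ReplLoadCommand {
  // Same string as ReplUtils.REPL_ROLLBACK_BOOTSTRAP_LOAD_CONFIG in the patch.
  static final String ROLLBACK_KEY = "hive.repl.rollback.bootstrap.load";

  private ReplLoadCommand() {}

  static String build(String targetDb, String newDumpDir, String failedDumpDir) {
    return "REPL LOAD " + targetDb + " FROM " + quoted(newDumpDir)
        + " WITH (" + quoted(ROLLBACK_KEY) + "=" + quoted(failedDumpDir) + ")";
  }

  // Wrap in single quotes; reject embedded quotes rather than attempting to escape them.
  private static String quoted(String s) {
    if (s.indexOf('\'') >= 0) {
      throw new IllegalArgumentException("unexpected quote in: " + s);
    }
    return "'" + s + "'";
  }
}
```

`ReplLoadCommand.build("repl_db", "/dumps/next", "/dumps/failed")` yields `REPL LOAD repl_db FROM '/dumps/next' WITH ('hive.repl.rollback.bootstrap.load'='/dumps/failed')`. As the description cautions, the key should only be passed when the previous bootstrap load actually failed.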





[jira] [Work logged] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.

2019-03-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21286?focusedWorklogId=207349=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-207349
 ]

ASF GitHub Bot logged work on HIVE-21286:
-

Author: ASF GitHub Bot
Created on: 04/Mar/19 18:27
Start Date: 04/Mar/19 18:27
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #551: HIVE-21286: 
Hive should support clean-up of previously bootstrapped tables when retry from 
different dump.
URL: https://github.com/apache/hive/pull/551#discussion_r262185344
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##
 @@ -279,6 +292,72 @@ a database ( directory )
 return 0;
   }
 
+  /**
+   * Cleanup/drop tables from the given database which are bootstrapped by input dump dir.
+   * @throws HiveException Failed to drop the tables.
+   * @throws IOException File operations failure.
+   * @throws InvalidInputException Invalid input dump directory.
+   */
+  private void bootstrapRollbackTask() throws HiveException, IOException, InvalidInputException {
+    Path bootstrapDirectory = new PathBuilder(work.bootstrapDumpToRollback)
+        .addDescendant(ReplUtils.INC_BOOTSTRAP_ROOT_DIR_NAME).build();
+    FileSystem fs = bootstrapDirectory.getFileSystem(conf);
+
+    if (!fs.exists(bootstrapDirectory)) {
+      throw new InvalidInputException("Input bootstrap dump directory to rollback doesn't exist: "
+          + bootstrapDirectory);
+    }
+
+    FileStatus[] fileStatuses = fs.listStatus(bootstrapDirectory, EximUtil.getDirectoryFilter(fs));
+    if ((fileStatuses == null) || (fileStatuses.length == 0)) {
+      throw new InvalidInputException("Input bootstrap dump directory to rollback is empty: "
+          + bootstrapDirectory);
+    }
+
+    if (StringUtils.isNotBlank(work.dbNameToLoadIn) && (fileStatuses.length > 1)) {
+      throw new InvalidInputException("Multiple DB dirs in the dump: " + bootstrapDirectory
+          + " is not allowed to load to single target DB: " + work.dbNameToLoadIn);
+    }
+
+    for (FileStatus dbDir : fileStatuses) {
+      Path dbLevelPath = dbDir.getPath();
+      String dbNameInDump = dbLevelPath.getName();
+
+      List<String> tableNames = new ArrayList<>();
+      RemoteIterator<LocatedFileStatus> filesIterator = fs.listFiles(dbLevelPath, true);
+      while (filesIterator.hasNext()) {
+        Path nextFile = filesIterator.next().getPath();
+        String filePath = nextFile.toString();
+        if (filePath.endsWith(EximUtil.METADATA_NAME)) {
+          // Remove dbLevelPath from the current path to check if this _metadata file is under DB or
+          // table level directory.
+          String replacedString = filePath.replace(dbLevelPath.toString(), "");
+          if (!replacedString.equalsIgnoreCase(EximUtil.METADATA_NAME)) {
+            tableNames.add(nextFile.getParent().getName());
+          }
+        }
+      }
+
+      // No tables listed in the DB level directory to be dropped.
+      if (tableNames.isEmpty()) {
+        LOG.info("No tables are listed to be dropped for Database: {} in bootstrap dump: {}",
+            dbNameInDump, bootstrapDirectory);
+        continue;
+      }
+
+      // Drop all tables bootstrapped from previous dump.
+      // Get the target DB in which previously bootstrapped tables to be dropped. If user specified
+      // DB name as input in REPL LOAD command, then use it.
+      String dbName = (StringUtils.isNotBlank(work.dbNameToLoadIn) ? work.dbNameToLoadIn : dbNameInDump);
+
+      Hive db = getHive();
+      for (String table : tableNames) {
+        db.dropTable(dbName + "." + table, true);
 
 Review comment:
   Dropping an external table doesn't delete its directory. Also, a managed table 
doesn't use the same location as external tables, so there won't be any 
conflict.
   So, on re-bootstrap, the old external tables are dropped, the incremental load 
creates the new managed table, and the operation succeeds. I will probably 
add a test to cover this.
 



Issue Time Tracking
---

Worklog Id: (was: 207349)
Time Spent: 1h 20m  (was: 1h 10m)

> Hive should support clean-up of previously bootstrapped tables when retry 
> from different dump.
> --
>
> Key: HIVE-21286
> URL: https://issues.apache.org/jira/browse/HIVE-21286
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects 

[jira] [Work logged] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.

2019-03-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21286?focusedWorklogId=207344=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-207344
 ]

ASF GitHub Bot logged work on HIVE-21286:
-

Author: ASF GitHub Bot
Created on: 04/Mar/19 18:24
Start Date: 04/Mar/19 18:24
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #551: HIVE-21286: 
Hive should support clean-up of previously bootstrapped tables when retry from 
different dump.
URL: https://github.com/apache/hive/pull/551#discussion_r262184013
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##
 @@ -279,6 +292,72 @@ a database ( directory )
 return 0;
   }
 
+  /**
+   * Cleanup/drop tables from the given database which are bootstrapped by input dump dir.
+   * @throws HiveException Failed to drop the tables.
+   * @throws IOException File operations failure.
+   * @throws InvalidInputException Invalid input dump directory.
+   */
+  private void bootstrapRollbackTask() throws HiveException, IOException, InvalidInputException {
+    Path bootstrapDirectory = new PathBuilder(work.bootstrapDumpToRollback)
+        .addDescendant(ReplUtils.INC_BOOTSTRAP_ROOT_DIR_NAME).build();
+    FileSystem fs = bootstrapDirectory.getFileSystem(conf);
+
+    if (!fs.exists(bootstrapDirectory)) {
+      throw new InvalidInputException("Input bootstrap dump directory to rollback doesn't exist: "
+          + bootstrapDirectory);
 
 Review comment:
   This feature is not specific to external tables. The idea is to roll back 
the tables bootstrapped from the given dump, irrespective of whether they are 
external, ACID, or even table-level replication.
   We expect the input dump to be a bootstrap combined with an incremental dump. 
If a full bootstrap dump is specified, it throws an exception. I will add a test 
to verify that REPL LOAD fails if any other dump is specified.
   
 



Issue Time Tracking
---

Worklog Id: (was: 207344)
Time Spent: 1h  (was: 50m)

> Hive should support clean-up of previously bootstrapped tables when retry 
> from different dump.
> --
>
> Key: HIVE-21286
> URL: https://issues.apache.org/jira/browse/HIVE-21286
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21286.01.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>





[jira] [Work logged] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.

2019-03-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21286?focusedWorklogId=207346=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-207346
 ]

ASF GitHub Bot logged work on HIVE-21286:
-

Author: ASF GitHub Bot
Created on: 04/Mar/19 18:24
Start Date: 04/Mar/19 18:24
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #551: HIVE-21286: 
Hive should support clean-up of previously bootstrapped tables when retry from 
different dump.
URL: https://github.com/apache/hive/pull/551#discussion_r262184237
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##
 @@ -279,6 +292,72 @@ a database ( directory )
 return 0;
   }
 
+  /**
+   * Cleanup/drop tables from the given database which are bootstrapped by input dump dir.
+   * @throws HiveException Failed to drop the tables.
+   * @throws IOException File operations failure.
+   * @throws InvalidInputException Invalid input dump directory.
+   */
+  private void bootstrapRollbackTask() throws HiveException, IOException, InvalidInputException {
+    Path bootstrapDirectory = new PathBuilder(work.bootstrapDumpToRollback)
+        .addDescendant(ReplUtils.INC_BOOTSTRAP_ROOT_DIR_NAME).build();
+    FileSystem fs = bootstrapDirectory.getFileSystem(conf);
+
+    if (!fs.exists(bootstrapDirectory)) {
+      throw new InvalidInputException("Input bootstrap dump directory to rollback doesn't exist: "
+          + bootstrapDirectory);
+    }
+
+    FileStatus[] fileStatuses = fs.listStatus(bootstrapDirectory, EximUtil.getDirectoryFilter(fs));
+    if ((fileStatuses == null) || (fileStatuses.length == 0)) {
+      throw new InvalidInputException("Input bootstrap dump directory to rollback is empty: "
+          + bootstrapDirectory);
+    }
+
+    if (StringUtils.isNotBlank(work.dbNameToLoadIn) && (fileStatuses.length > 1)) {
+      throw new InvalidInputException("Multiple DB dirs in the dump: " + bootstrapDirectory
+          + " is not allowed to load to single target DB: " + work.dbNameToLoadIn);
+    }
+
+    for (FileStatus dbDir : fileStatuses) {
 
 Review comment:
   If work.dbNameToLoadIn is empty or null, there can be multiple DB 
directories, so fileStatuses can hold more than one entry in that case. We cannot avoid the loop.
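The point about the loop can be sketched as a small target-resolution function, assuming a hypothetical `RollbackTargets.resolve` that receives the DB directory names found in the dump. It mirrors the checks in the diff: an explicit target DB is accepted only when the dump holds a single DB directory; otherwise each directory maps to the database of the same name, so the caller has to iterate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the target-DB rules in bootstrapRollbackTask (not Hive code).
final class RollbackTargets {
  private RollbackTargets() {}

  static List<String> resolve(String dbNameToLoadIn, List<String> dbDirsInDump) {
    // Approximates StringUtils.isNotBlank from the diff.
    boolean targetGiven = dbNameToLoadIn != null && !dbNameToLoadIn.trim().isEmpty();
    if (dbDirsInDump.isEmpty()) {
      throw new IllegalArgumentException("Input bootstrap dump directory to rollback is empty");
    }
    if (targetGiven && dbDirsInDump.size() > 1) {
      throw new IllegalArgumentException(
          "Multiple DB dirs in the dump cannot be loaded to single target DB: " + dbNameToLoadIn);
    }
    // One target per DB directory: the explicit name if given, else the directory's own name.
    List<String> targets = new ArrayList<>();
    for (String dbDir : dbDirsInDump) {
      targets.add(targetGiven ? dbNameToLoadIn : dbDir);
    }
    return targets;
  }
}
```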
 



Issue Time Tracking
---

Worklog Id: (was: 207346)
Time Spent: 1h 10m  (was: 1h)

> Hive should support clean-up of previously bootstrapped tables when retry 
> from different dump.
> --
>
> Key: HIVE-21286
> URL: https://issues.apache.org/jira/browse/HIVE-21286
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21286.01.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>





[jira] [Commented] (HIVE-21152) Rewrite if expression to case and recognize simple case as an if

2019-03-04 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783621#comment-16783621
 ] 

Hive QA commented on HIVE-21152:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12961016/HIVE-21152.04.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 15818 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explode_null] (batchId=29)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_write_correct_definition_levels] (batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf6] (batchId=57)
org.apache.hive.service.server.TestInformationSchemaWithPrivilege.test (batchId=259)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16325/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16325/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16325/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12961016 - PreCommit-HIVE-Build

> Rewrite if expression to case and recognize simple case as an if
> 
>
> Key: HIVE-21152
> URL: https://issues.apache.org/jira/browse/HIVE-21152
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-21152.01.patch, HIVE-21152.02.patch, 
> HIVE-21152.03.patch, HIVE-21152.04.patch
>
>
> * {{IF}} is not part of the SQL standard; however, given its special form, it is 
> simpler, and Hive currently also has vectorized support for it
> * people writing standard SQL may write {{CASE WHEN member=1 THEN attr+1 else attr+2 END}}, 
> which is essentially an IF.
> The idea is to rewrite IFs to CASEs for the CBO, and to recognize simple 
> CASEs as IFs to get vectorization on them where possible.
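A minimal sketch of both directions of the rewrite, assuming a hypothetical mini expression tree (`Expr`, `ColRef`, `IfExpr`, `CaseExpr`; not Hive's or Calcite's classes). `IF(c, a, b)` and `CASE WHEN c THEN a ELSE b END` agree for every value of `c`, including NULL, which falls to the else branch in both. A CASE without ELSE is left alone here, because recognizing it would require synthesizing a NULL else branch.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical mini expression tree for illustration only.
abstract class Expr {}

final class ColRef extends Expr {
  final String sql;
  ColRef(String sql) { this.sql = sql; }
  @Override public String toString() { return sql; }
}

final class IfExpr extends Expr {
  final Expr cond, thenExpr, elseExpr;
  IfExpr(Expr cond, Expr thenExpr, Expr elseExpr) {
    this.cond = cond; this.thenExpr = thenExpr; this.elseExpr = elseExpr;
  }
  @Override public String toString() {
    return "IF(" + cond + ", " + thenExpr + ", " + elseExpr + ")";
  }
}

final class CaseExpr extends Expr {
  final List<Expr> conds;   // WHEN conditions, in order
  final List<Expr> results; // THEN results, parallel to conds
  final Expr elseExpr;      // null when the CASE has no ELSE branch
  CaseExpr(List<Expr> conds, List<Expr> results, Expr elseExpr) {
    this.conds = conds; this.results = results; this.elseExpr = elseExpr;
  }
  @Override public String toString() {
    StringBuilder sb = new StringBuilder("CASE");
    for (int i = 0; i < conds.size(); i++) {
      sb.append(" WHEN ").append(conds.get(i)).append(" THEN ").append(results.get(i));
    }
    if (elseExpr != null) {
      sb.append(" ELSE ").append(elseExpr);
    }
    return sb.append(" END").toString();
  }
}

final class IfCaseRewriter {
  private IfCaseRewriter() {}

  // IF(c, a, b) -> CASE WHEN c THEN a ELSE b END: the standard form handed to the optimizer.
  static CaseExpr toCase(IfExpr e) {
    return new CaseExpr(Collections.singletonList(e.cond),
        Collections.singletonList(e.thenExpr), e.elseExpr);
  }

  // A CASE with exactly one WHEN and an explicit ELSE is an IF, which can then use IF's
  // vectorized implementation. Anything else is returned unchanged.
  static Expr recognizeIf(CaseExpr c) {
    if (c.conds.size() == 1 && c.elseExpr != null) {
      return new IfExpr(c.conds.get(0), c.results.get(0), c.elseExpr);
    }
    return c;
  }
}
```

Round-tripping `IF(member=1, attr+1, attr+2)` gives `CASE WHEN member=1 THEN attr+1 ELSE attr+2 END` and back again, while a multi-branch CASE or one without ELSE stays a CASE.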





[jira] [Commented] (HIVE-21152) Rewrite if expression to case and recognize simple case as an if

2019-03-04 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783595#comment-16783595
 ] 

Hive QA commented on HIVE-21152:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 4s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 22s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 46s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 45s{color} | {color:blue} ql in master has 2251 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 8s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 6s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 17s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 27m 50s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16325/dev-support/hive-personality.sh |
| git revision | master / f51f108 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-16325/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Rewrite if expression to case and recognize simple case as an if
> 
>
> Key: HIVE-21152
> URL: https://issues.apache.org/jira/browse/HIVE-21152
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-21152.01.patch, HIVE-21152.02.patch, 
> HIVE-21152.03.patch, HIVE-21152.04.patch
>
>
> * {{IF}} is not part of the SQL standard; however, given its special form, it is 
> simpler, and Hive currently has vectorized support for it.
> * People writing standard SQL may write {{CASE WHEN member=1 THEN attr+1 
> ELSE attr+2 END}}, which is essentially an {{IF}}.
> The idea is to rewrite {{IF}}s to {{CASE}}s for the CBO, and to recognize simple 
> {{CASE}}s as {{IF}}s so that they can be vectorized where possible.
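The IF/CASE equivalence described above can be sketched in Java. This is a minimal, hypothetical model, not Hive's CBO code (which works on Calcite expressions); `Expr`, `ifToCase` and `simpleCaseToIf` are illustrative names. It assumes CASE operands are stored flat as `[when1, then1, ..., whenN, thenN, else]`, as Calcite does, so a CASE with exactly three operands has one WHEN/THEN pair plus an ELSE and can be treated as an IF.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CaseToIfSketch {

    /** Hypothetical minimal expression node: an operator (or leaf name) plus operands. */
    static final class Expr {
        final String op;
        final List<Expr> operands;

        Expr(String op, Expr... operands) {
            this.op = op;
            this.operands = Collections.unmodifiableList(Arrays.asList(operands));
        }

        @Override
        public String toString() {
            return operands.isEmpty() ? op : op + operands;
        }
    }

    /** IF(cond, a, b) becomes CASE(cond, a, b), i.e. CASE WHEN cond THEN a ELSE b END. */
    static Expr ifToCase(Expr e) {
        if (e.op.equals("IF") && e.operands.size() == 3) {
            return new Expr("CASE", e.operands.toArray(new Expr[0]));
        }
        return e;
    }

    /** A CASE with exactly one WHEN/THEN pair plus an ELSE (three operands) is an IF. */
    static Expr simpleCaseToIf(Expr e) {
        if (e.op.equals("CASE") && e.operands.size() == 3) {
            return new Expr("IF", e.operands.toArray(new Expr[0]));
        }
        return e;
    }

    public static void main(String[] args) {
        Expr member = new Expr("member");
        Expr attr = new Expr("attr");
        Expr one = new Expr("1");
        Expr two = new Expr("2");
        // CASE WHEN member=1 THEN attr+1 ELSE attr+2 END
        Expr simpleCase = new Expr("CASE",
            new Expr("=", member, one), new Expr("+", attr, one), new Expr("+", attr, two));
        System.out.println(simpleCaseToIf(simpleCase));
    }
}
```

A CASE with several WHEN branches is left untouched, since only the single-branch shape maps onto IF's fixed three arguments.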



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-21379) Mask password in DDL commands for table properties

2019-03-04 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned HIVE-21379:
-


> Mask password in DDL commands for table properties
> --
>
> Key: HIVE-21379
> URL: https://issues.apache.org/jira/browse/HIVE-21379
> Project: Hive
>  Issue Type: Improvement
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Major
> Attachments: HIVE-21379.1.patch
>
>
> We need to mask password-related table properties (such as 
> {{hive.sql.dbcp.password}}) in DDL output, such as DESCRIBE EXTENDED, DESCRIBE 
> FORMATTED, SHOW CREATE TABLE, and SHOW TBLPROPERTIES.
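A minimal sketch of the masking idea, assuming a property counts as sensitive when its key contains "password" (case-insensitive). `MaskTableProperties`, `maskPasswords` and the mask string are hypothetical; the issue does not say how the attached patch decides which keys to mask or what it prints in their place.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class MaskTableProperties {

    // Hypothetical mask string; the value Hive actually prints may differ.
    static final String MASK = "###_MASKED_###";

    /**
     * Returns a copy of the table properties in which every value whose key
     * contains "password" (case-insensitive) is replaced by MASK. The input map
     * is not modified, so the real value stays available to the storage handler.
     */
    static Map<String, String> maskPasswords(Map<String, String> props) {
        Map<String, String> masked = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            boolean sensitive = e.getKey().toLowerCase(Locale.ROOT).contains("password");
            masked.put(e.getKey(), sensitive ? MASK : e.getValue());
        }
        return masked;
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("hive.sql.dbcp.username", "hive");
        props.put("hive.sql.dbcp.password", "secret");
        // DDL output (DESCRIBE FORMATTED, SHOW TBLPROPERTIES, ...) would print the masked copy.
        System.out.println(maskPasswords(props));
    }
}
```

Masking a copy at output time, rather than the stored metadata, keeps the credential usable by whatever reads the table properties at runtime.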



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21379) Mask password in DDL commands for table properties

2019-03-04 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-21379:
--
Status: Patch Available  (was: Open)

> Mask password in DDL commands for table properties
> --
>
> Key: HIVE-21379
> URL: https://issues.apache.org/jira/browse/HIVE-21379
> Project: Hive
>  Issue Type: Improvement
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Major
> Attachments: HIVE-21379.1.patch
>
>
> We need to mask password-related table properties (such as 
> {{hive.sql.dbcp.password}}) in DDL output, such as DESCRIBE EXTENDED, DESCRIBE 
> FORMATTED, SHOW CREATE TABLE, and SHOW TBLPROPERTIES.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21376) Incompatible change in Hive bucket computation

2019-03-04 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-21376:
---
Attachment: HIVE-21376.patch

> Incompatible change in Hive bucket computation
> --
>
> Key: HIVE-21376
> URL: https://issues.apache.org/jira/browse/HIVE-21376
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: David Phillips
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-21376.patch
>
>
> HIVE-20007 seems to have inadvertently changed the bucket hash code 
> computation via {{ObjectInspectorUtils.getBucketHashCodeOld()}} for the 
> {{DATE}} and {{TIMESTAMP}} data types.
> {{DATE}} was previously computed using {{DateWritable}}, which uses 
> {{daysSinceEpoch}} as the hash code. It is now computed using 
> {{DateWritableV2}}, which uses the hash code of {{java.time.LocalDate}} 
> (which is not days since epoch).
> {{TIMESTAMP}} was previously computed using {{TimestampWritable}} and now uses 
> {{TimestampWritableV2}}. They ostensibly use the same hash code computation, 
> but there are two important differences:
>  # {{TimestampWritable}} rounds the number of milliseconds into the seconds 
> portion of the computation, but {{TimestampWritableV2}} does not.
>  # {{TimestampWritable}} gets the epoch time from {{java.sql.Timestamp}}, 
> which returns it relative to the JVM time zone, not UTC. 
> {{TimestampWritableV2}} uses a {{LocalDateTime}} relative to UTC.
> I was unable to get Hive 3.1 running in order to verify if this actually 
> causes data to be read or written incorrectly (there may be code above this 
> library method which makes things work correctly). However, if my 
> understanding is correct, this means Hive 3.1 is both forwards and backwards 
> incompatible with bucketed tables using either of these data types. It also 
> indicates that Hive needs tests to verify that the hash code does not change 
> between releases.
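The two DATE hash schemes and the time zone difference described above can be compared directly with JDK classes. This sketch does not call Hive: `daysSinceEpochHash` stands in for the old {{DateWritable}} behaviour and `localDateHash` for {{LocalDate#hashCode()}}, both as characterized in the report. `bucket` assumes the common `(hash & Integer.MAX_VALUE) % numBuckets` assignment; treat it as an illustration of why a changed hash code moves rows between buckets, not as Hive's exact code path. The millisecond-rounding difference is not modelled.

```java
import java.sql.Timestamp;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class BucketHashComparison {

    // DATE, old scheme per the report: hash = days since 1970-01-01.
    static int daysSinceEpochHash(LocalDate date) {
        return (int) date.toEpochDay();
    }

    // DATE, new scheme per the report: java.time.LocalDate#hashCode().
    static int localDateHash(LocalDate date) {
        return date.hashCode();
    }

    // TIMESTAMP epoch millis when the wall-clock value goes through java.sql.Timestamp,
    // which interprets it in the JVM default time zone.
    static long jvmZoneEpochMillis(LocalDateTime ldt) {
        return Timestamp.valueOf(ldt).getTime();
    }

    // TIMESTAMP epoch millis when the same wall-clock value is interpreted as UTC.
    static long utcEpochMillis(LocalDateTime ldt) {
        return ldt.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    // Assumed bucket assignment: non-negative hash modulo bucket count.
    static int bucket(int hash, int numBuckets) {
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2019, 3, 4);
        int oldHash = daysSinceEpochHash(date);
        int newHash = localDateHash(date);
        System.out.printf("DATE %s: old hash %d (bucket %d), new hash %d (bucket %d)%n",
            date, oldHash, bucket(oldHash, 8), newHash, bucket(newHash, 8));

        LocalDateTime ts = LocalDateTime.of(2019, 3, 4, 12, 0);
        // The two values differ by the JVM zone offset unless the JVM runs in UTC.
        System.out.printf("TIMESTAMP %s: JVM-zone millis %d, UTC millis %d%n",
            ts, jvmZoneEpochMillis(ts), utcEpochMillis(ts));
    }
}
```

A regression test of the kind the report asks for could pin such hash values for a few fixed dates and timestamps so that any change between releases fails loudly.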



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21376) Incompatible change in Hive bucket computation

2019-03-04 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-21376:
---
Status: Patch Available  (was: In Progress)

> Incompatible change in Hive bucket computation
> --
>
> Key: HIVE-21376
> URL: https://issues.apache.org/jira/browse/HIVE-21376
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: David Phillips
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> HIVE-20007 seems to have inadvertently changed the bucket hash code 
> computation via {{ObjectInspectorUtils.getBucketHashCodeOld()}} for the 
> {{DATE}} and {{TIMESTAMP}} data types.
> {{DATE}} was previously computed using {{DateWritable}}, which uses 
> {{daysSinceEpoch}} as the hash code. It is now computed using 
> {{DateWritableV2}}, which uses the hash code of {{java.time.LocalDate}} 
> (which is not days since epoch).
> {{TIMESTAMP}} was previously computed using {{TimestampWritable}} and now uses 
> {{TimestampWritableV2}}. They ostensibly use the same hash code computation, 
> but there are two important differences:
>  # {{TimestampWritable}} rounds the number of milliseconds into the seconds 
> portion of the computation, but {{TimestampWritableV2}} does not.
>  # {{TimestampWritable}} gets the epoch time from {{java.sql.Timestamp}}, 
> which returns it relative to the JVM time zone, not UTC. 
> {{TimestampWritableV2}} uses a {{LocalDateTime}} relative to UTC.
> I was unable to get Hive 3.1 running in order to verify if this actually 
> causes data to be read or written incorrectly (there may be code above this 
> library method which makes things work correctly). However, if my 
> understanding is correct, this means Hive 3.1 is both forwards and backwards 
> incompatible with bucketed tables using either of these data types. It also 
> indicates that Hive needs tests to verify that the hash code does not change 
> between releases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work started] (HIVE-21376) Incompatible change in Hive bucket computation

2019-03-04 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-21376 started by Jesus Camacho Rodriguez.
--
> Incompatible change in Hive bucket computation
> --
>
> Key: HIVE-21376
> URL: https://issues.apache.org/jira/browse/HIVE-21376
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: David Phillips
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> HIVE-20007 seems to have inadvertently changed the bucket hash code 
> computation via {{ObjectInspectorUtils.getBucketHashCodeOld()}} for the 
> {{DATE}} and {{TIMESTAMP}} data types.
> {{DATE}} was previously computed using {{DateWritable}}, which uses 
> {{daysSinceEpoch}} as the hash code. It is now computed using 
> {{DateWritableV2}}, which uses the hash code of {{java.time.LocalDate}} 
> (which is not days since epoch).
> {{TIMESTAMP}} was previously computed using {{TimestampWritable}} and now uses 
> {{TimestampWritableV2}}. They ostensibly use the same hash code computation, 
> but there are two important differences:
>  # {{TimestampWritable}} rounds the number of milliseconds into the seconds 
> portion of the computation, but {{TimestampWritableV2}} does not.
>  # {{TimestampWritable}} gets the epoch time from {{java.sql.Timestamp}}, 
> which returns it relative to the JVM time zone, not UTC. 
> {{TimestampWritableV2}} uses a {{LocalDateTime}} relative to UTC.
> I was unable to get Hive 3.1 running in order to verify if this actually 
> causes data to be read or written incorrectly (there may be code above this 
> library method which makes things work correctly). However, if my 
> understanding is correct, this means Hive 3.1 is both forwards and backwards 
> incompatible with bucketed tables using either of these data types. It also 
> indicates that Hive needs tests to verify that the hash code does not change 
> between releases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18728) Secure webHCat with SSL

2019-03-04 Thread Oleksiy Sayankin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleksiy Sayankin updated HIVE-18728:

Status: In Progress  (was: Patch Available)

> Secure webHCat with SSL
> ---
>
> Key: HIVE-18728
> URL: https://issues.apache.org/jira/browse/HIVE-18728
> Project: Hive
>  Issue Type: New Feature
>  Components: Security
>Reporter: Oleksiy Sayankin
>Assignee: Oleksiy Sayankin
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HIVE-18728.1.patch, HIVE-18728.2.patch, 
> HIVE-18728.3.patch
>
>
> Doc for the issue:
> *Configure WebHCat server to use SSL encryption*
> You can configure the WebHCat REST API to use SSL (Secure Sockets Layer) 
> encryption. The following WebHCat properties have been added to enable SSL.
> {{templeton.use.ssl}}
> Default value: {{false}}
> Description: Set this to true to use SSL encryption for the WebHCat server.
> {{templeton.keystore.path}}
> Default value: {{}}
> Description: SSL certificate keystore location for the WebHCat server.
> {{templeton.keystore.password}}
> Default value: {{}}
> Description: SSL certificate keystore password for the WebHCat server.
> {{templeton.ssl.protocol.blacklist}}
> Default value: {{SSLv2,SSLv3}}
> Description: SSL versions to disable for the WebHCat server.
> {{templeton.host}}
> Default value: {{0.0.0.0}}
> Description: The host address the WebHCat server will listen on.
> *Modifying the {{webhcat-site.xml}} file*
> Configure the following properties in the {{webhcat-site.xml}} file to enable 
> SSL encryption on each node where WebHCat is installed: 
> {code}
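> <property>
>   <name>templeton.use.ssl</name>
>   <value>true</value>
> </property>
> <property>
>   <name>templeton.keystore.path</name>
>   <value>/path/to/ssl_keystore</value>
> </property>
> <property>
>   <name>templeton.keystore.password</name>
>   <value>password</value>
> </property>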
> {code}
> *Example:* To check the status of a WebHCat server configured for SSL encryption, 
> use the following command:
> {code}
> curl -k 'https://<user>:<password>@<host>:50111/templeton/v1/status'
> {code}
> Replace {{<user>}} and {{<password>}} with a valid user name and password, and 
> replace {{<host>}} with your host name.
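The {{templeton.ssl.protocol.blacklist}} setting can be sketched as a filter over the protocols the JVM supports. This assumes a comma-separated list matched case-insensitively against whole protocol names; `ProtocolBlacklist` and `filterProtocols` are hypothetical, and how WebHCat actually applies the list to its HTTP server is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLParameters;

public class ProtocolBlacklist {

    /** Returns the entries of 'available' that do not appear in the comma-separated blacklist. */
    static String[] filterProtocols(String[] available, String blacklistCsv) {
        Set<String> blacklist = new HashSet<>();
        for (String p : blacklistCsv.split(",")) {
            String trimmed = p.trim();
            if (!trimmed.isEmpty()) {
                blacklist.add(trimmed.toLowerCase(Locale.ROOT));
            }
        }
        List<String> kept = new ArrayList<>();
        for (String protocol : available) {
            if (!blacklist.contains(protocol.toLowerCase(Locale.ROOT))) {
                kept.add(protocol);
            }
        }
        return kept.toArray(new String[0]);
    }

    public static void main(String[] args) throws Exception {
        // Protocols supported by the default JSSE provider of this JVM.
        SSLParameters supported = SSLContext.getDefault().getSupportedSSLParameters();
        // Default value of templeton.ssl.protocol.blacklist, per the description above.
        String[] allowed = filterProtocols(supported.getProtocols(), "SSLv2,SSLv3");
        System.out.println(Arrays.toString(allowed));
    }
}
```

With whole-name matching, a pseudo-protocol such as {{SSLv2Hello}} is not removed by an {{SSLv2}} entry; a prefix match would be the alternative design if that is undesired.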



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18728) Secure webHCat with SSL

2019-03-04 Thread Oleksiy Sayankin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleksiy Sayankin updated HIVE-18728:

Status: Patch Available  (was: In Progress)

> Secure webHCat with SSL
> ---
>
> Key: HIVE-18728
> URL: https://issues.apache.org/jira/browse/HIVE-18728
> Project: Hive
>  Issue Type: New Feature
>  Components: Security
>Reporter: Oleksiy Sayankin
>Assignee: Oleksiy Sayankin
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HIVE-18728.1.patch, HIVE-18728.2.patch, 
> HIVE-18728.3.patch
>
>
> Doc for the issue:
> *Configure WebHCat server to use SSL encryption*
> You can configure the WebHCat REST API to use SSL (Secure Sockets Layer) 
> encryption. The following WebHCat properties have been added to enable SSL.
> {{templeton.use.ssl}}
> Default value: {{false}}
> Description: Set this to true to use SSL encryption for the WebHCat server.
> {{templeton.keystore.path}}
> Default value: {{}}
> Description: SSL certificate keystore location for the WebHCat server.
> {{templeton.keystore.password}}
> Default value: {{}}
> Description: SSL certificate keystore password for the WebHCat server.
> {{templeton.ssl.protocol.blacklist}}
> Default value: {{SSLv2,SSLv3}}
> Description: SSL versions to disable for the WebHCat server.
> {{templeton.host}}
> Default value: {{0.0.0.0}}
> Description: The host address the WebHCat server will listen on.
> *Modifying the {{webhcat-site.xml}} file*
> Configure the following properties in the {{webhcat-site.xml}} file to enable 
> SSL encryption on each node where WebHCat is installed: 
> {code}
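> <property>
>   <name>templeton.use.ssl</name>
>   <value>true</value>
> </property>
> <property>
>   <name>templeton.keystore.path</name>
>   <value>/path/to/ssl_keystore</value>
> </property>
> <property>
>   <name>templeton.keystore.password</name>
>   <value>password</value>
> </property>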
> {code}
> *Example:* To check the status of a WebHCat server configured for SSL encryption, 
> use the following command:
> {code}
> curl -k 'https://<user>:<password>@<host>:50111/templeton/v1/status'
> {code}
> Replace {{<user>}} and {{<password>}} with a valid user name and password, and 
> replace {{<host>}} with your host name.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21379) Mask password in DDL commands for table properties

2019-03-04 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-21379:
--
Attachment: HIVE-21379.1.patch

> Mask password in DDL commands for table properties
> --
>
> Key: HIVE-21379
> URL: https://issues.apache.org/jira/browse/HIVE-21379
> Project: Hive
>  Issue Type: Improvement
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Major
> Attachments: HIVE-21379.1.patch
>
>
> We need to mask password-related table properties (such as 
> {{hive.sql.dbcp.password}}) in DDL output, such as DESCRIBE EXTENDED, DESCRIBE 
> FORMATTED, SHOW CREATE TABLE, and SHOW TBLPROPERTIES.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21001) Upgrade to calcite-1.18

2019-03-04 Thread Zoltan Haindrich (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-21001:

Attachment: HIVE-21001.43.patch

> Upgrade to calcite-1.18
> ---
>
> Key: HIVE-21001
> URL: https://issues.apache.org/jira/browse/HIVE-21001
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-21001.01.patch, HIVE-21001.01.patch, 
> HIVE-21001.02.patch, HIVE-21001.03.patch, HIVE-21001.04.patch, 
> HIVE-21001.05.patch, HIVE-21001.06.patch, HIVE-21001.06.patch, 
> HIVE-21001.07.patch, HIVE-21001.08.patch, HIVE-21001.08.patch, 
> HIVE-21001.08.patch, HIVE-21001.09.patch, HIVE-21001.09.patch, 
> HIVE-21001.09.patch, HIVE-21001.10.patch, HIVE-21001.11.patch, 
> HIVE-21001.12.patch, HIVE-21001.13.patch, HIVE-21001.15.patch, 
> HIVE-21001.16.patch, HIVE-21001.17.patch, HIVE-21001.18.patch, 
> HIVE-21001.18.patch, HIVE-21001.19.patch, HIVE-21001.20.patch, 
> HIVE-21001.21.patch, HIVE-21001.22.patch, HIVE-21001.22.patch, 
> HIVE-21001.22.patch, HIVE-21001.23.patch, HIVE-21001.24.patch, 
> HIVE-21001.26.patch, HIVE-21001.26.patch, HIVE-21001.26.patch, 
> HIVE-21001.26.patch, HIVE-21001.26.patch, HIVE-21001.27.patch, 
> HIVE-21001.28.patch, HIVE-21001.29.patch, HIVE-21001.29.patch, 
> HIVE-21001.30.patch, HIVE-21001.31.patch, HIVE-21001.32.patch, 
> HIVE-21001.34.patch, HIVE-21001.35.patch, HIVE-21001.36.patch, 
> HIVE-21001.37.patch, HIVE-21001.38.patch, HIVE-21001.39.patch, 
> HIVE-21001.40.patch, HIVE-21001.41.patch, HIVE-21001.42.patch, 
> HIVE-21001.43.patch
>
>
> XLEAR LIBRARY CACHE 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21152) Rewrite if expression to case and recognize simple case as an if

2019-03-04 Thread Zoltan Haindrich (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-21152:

Attachment: HIVE-21152.04.patch

> Rewrite if expression to case and recognize simple case as an if
> 
>
> Key: HIVE-21152
> URL: https://issues.apache.org/jira/browse/HIVE-21152
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-21152.01.patch, HIVE-21152.02.patch, 
> HIVE-21152.03.patch, HIVE-21152.04.patch
>
>
> * {{IF}} is not part of the SQL standard; however, given its special form, it is 
> simpler, and Hive currently has vectorized support for it.
> * People writing standard SQL may write {{CASE WHEN member=1 THEN attr+1 
> ELSE attr+2 END}}, which is essentially an {{IF}}.
> The idea is to rewrite {{IF}}s to {{CASE}}s for the CBO, and to recognize simple 
> {{CASE}}s as {{IF}}s so that they can be vectorized where possible.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.

2019-03-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21286?focusedWorklogId=207281=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-207281
 ]

ASF GitHub Bot logged work on HIVE-21286:
-

Author: ASF GitHub Bot
Created on: 04/Mar/19 16:52
Start Date: 04/Mar/19 16:52
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #551: 
HIVE-21286: Hive should support clean-up of previously bootstrapped tables when 
retry from different dump.
URL: https://github.com/apache/hive/pull/551#discussion_r262125175
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##
 @@ -279,6 +292,72 @@ a database ( directory )
 return 0;
   }
 
+  /**
+   * Cleanup/drop tables from the given database which are bootstrapped by input dump dir.
+   * @throws HiveException Failed to drop the tables.
+   * @throws IOException File operations failure.
+   * @throws InvalidInputException Invalid input dump directory.
+   */
+  private void bootstrapRollbackTask() throws HiveException, IOException, InvalidInputException {
+    Path bootstrapDirectory = new PathBuilder(work.bootstrapDumpToRollback)
+        .addDescendant(ReplUtils.INC_BOOTSTRAP_ROOT_DIR_NAME).build();
+    FileSystem fs = bootstrapDirectory.getFileSystem(conf);
+
+    if (!fs.exists(bootstrapDirectory)) {
+      throw new InvalidInputException("Input bootstrap dump directory to rollback doesn't exist: "
+          + bootstrapDirectory);
 
 Review comment:
   Please add a test case covering this error, i.e., when an invalid bootstrap 
dump location is specified. If the specified bootstrap dump (to roll back) 
location exists, how do we know that it is indeed the bootstrap dump location 
for external tables and not some other dump location, such as a genuine 
incremental dump or a genuine bootstrap dump? We should add test cases for that as well.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 207281)
Time Spent: 20m  (was: 10m)

> Hive should support clean-up of previously bootstrapped tables when retry 
> from different dump.
> --
>
> Key: HIVE-21286
> URL: https://issues.apache.org/jira/browse/HIVE-21286
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21286.01.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> If external tables are enabled for replication on an existing repl policy, 
> then the bootstrapping of external tables is combined with the incremental dump.
> If the incremental bootstrap load fails with a non-retryable error, the user 
> has to manually drop all the external tables before retrying with another 
> bootstrap dump. For a full bootstrap, we have suggested that the user drop the 
> DB before retrying with a different dump, but in this case they would need to 
> manually drop all the external tables, which is not user friendly. So this 
> needs to be handled on the Hive side as follows.
> REPL LOAD takes an additional config (passed by the user in the WITH clause) 
> that says to drop all the tables that were bootstrapped from the previous dump:
> hive.repl.rollback.bootstrap.load=<previous_bootstrap_dump_dir>
> Hive will use this config only if the current dump is a bootstrap dump or a 
> bootstrap combined with an incremental dump.
> Caution: the user should not pass this config if the previous REPL LOAD (with 
> bootstrap) was successful, or if any successful incremental dump+load happened 
> after "previous_bootstrap_dump_dir".
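A minimal sketch of how a client might assemble the REPL LOAD statement with the rollback config described above. It assumes the `REPL LOAD <db> FROM '<dump_dir>' WITH ('key'='value')` form and that paths contain no single quotes; `ReplLoadRollbackSketch`, `replLoadWithRollback`, the database name and the paths are hypothetical. Only the config key comes from the issue text.

```java
public class ReplLoadRollbackSketch {

    // Config key as given in the issue description.
    static final String ROLLBACK_KEY = "hive.repl.rollback.bootstrap.load";

    /**
     * Builds a REPL LOAD statement asking Hive to drop the tables bootstrapped from
     * previousBootstrapDumpDir before loading dumpDir. Rejects paths containing a
     * single quote because no escaping is attempted here.
     */
    static String replLoadWithRollback(String dbName, String dumpDir, String previousBootstrapDumpDir) {
        if (dumpDir.contains("'") || previousBootstrapDumpDir.contains("'")) {
            throw new IllegalArgumentException("single quotes in paths are not supported");
        }
        return "REPL LOAD " + dbName + " FROM '" + dumpDir + "'"
            + " WITH ('" + ROLLBACK_KEY + "'='" + previousBootstrapDumpDir + "')";
    }

    public static void main(String[] args) {
        // Hypothetical database name and dump locations.
        System.out.println(replLoadWithRollback("repl_db", "/dumps/new", "/dumps/old"));
    }
}
```

Per the caution above, a caller should only add this clause when the previous bootstrap load failed and no incremental dump+load has succeeded since.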



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21371) Make NonSyncByteArrayOutputStream Overflow Conscious

2019-03-04 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783548#comment-16783548
 ] 

Hive QA commented on HIVE-21371:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12961014/HIVE-21371.2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16324/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16324/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16324/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2019-03-04 16:49:51.912
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-16324/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2019-03-04 16:49:51.916
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at f51f108 HIVE-21255: Remove QueryConditionBuilder in 
JdbcStorageHandler (Daniel Dai, reviewed by Jesus Camacho Rodriguez)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at f51f108 HIVE-21255: Remove QueryConditionBuilder in 
JdbcStorageHandler (Daniel Dai, reviewed by Jesus Camacho Rodriguez)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2019-03-04 16:49:53.248
+ rm -rf ../yetus_PreCommit-HIVE-Build-16324
+ mkdir ../yetus_PreCommit-HIVE-Build-16324
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-16324
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-16324/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: 
a/common/src/java/org/apache/hadoop/hive/common/io/NonSyncByteArrayOutputStream.java:
 does not exist in index
Going to apply patch with: git apply -p1
+ [[ maven == \m\a\v\e\n ]]
+ rm -rf /data/hiveptest/working/maven/org/apache/hive
+ mvn -B clean install -DskipTests -T 4 -q 
-Dmaven.repo.local=/data/hiveptest/working/maven
protoc-jar: executing: [/tmp/protoc5836577371602659102.exe, --version]
libprotoc 2.5.0
protoc-jar: executing: [/tmp/protoc5836577371602659102.exe, 
-I/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/src/main/protobuf/org/apache/hadoop/hive/metastore,
 
--java_out=/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/target/generated-sources,
 
/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/src/main/protobuf/org/apache/hadoop/hive/metastore/metastore.proto]
ANTLR Parser Generator  Version 3.5.2
protoc-jar: executing: [/tmp/protoc4575417553313940759.exe, --version]
libprotoc 2.5.0
ANTLR Parser Generator  Version 3.5.2
Output file 
/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-server/target/generated-sources/org/apache/hadoop/hive/metastore/parser/FilterParser.java
 does not exist: must build 
/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/parser/Filter.g
org/apache/hadoop/hive/metastore/parser/Filter.g
log4j:WARN No appenders could be found for logger (DataNucleus.Persistence).
log4j:WARN Please initialize the log4j system properly.
DataNucleus Enhancer (version 4.1.17) for API "JDO"
DataNucleus Enhancer completed with success for 41 classes.
ANTLR Parser Generator  Version 3.5.2
Output file 
/data/hiveptest/working/apache-github-source-source/ql/target/generated-sources/antlr3/org/apache/hadoop/hive/ql/parse/HiveLexer.java
 does not exist: must build 

[jira] [Work logged] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.

2019-03-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21286?focusedWorklogId=207282=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-207282
 ]

ASF GitHub Bot logged work on HIVE-21286:
-

Author: ASF GitHub Bot
Created on: 04/Mar/19 16:52
Start Date: 04/Mar/19 16:52
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #551: 
HIVE-21286: Hive should support clean-up of previously bootstrapped tables when 
retry from different dump.
URL: https://github.com/apache/hive/pull/551#discussion_r262120754
 
 

 ##
 File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
 ##
 @@ -534,6 +536,90 @@ public void bootstrapExternalTablesDuringIncrementalPhase() throws Throwable {
 .verifyResults(Arrays.asList("10", "20"));
   }
 
+  @Test
+  public void retryBootstrapExternalTablesFromDifferentDump() throws Throwable {
+    List<String> loadWithClause = new ArrayList<>();
+    loadWithClause.addAll(externalTableBasePathWithClause());
+
+    List<String> dumpWithClause = Collections.singletonList(
+        "'" + HiveConf.ConfVars.REPL_INCLUDE_EXTERNAL_TABLES.varname + "'='false'"
+    );
+
+    WarehouseInstance.Tuple tupleBootstrapWithoutExternal = primary
+        .run("use " + primaryDbName)
+        .run("create external table t1 (id int)")
+        .run("insert into table t1 values (1)")
+        .run("create external table t2 (place string) partitioned by (country string)")
+        .run("insert into table t2 partition(country='india') values ('bangalore')")
+        .run("insert into table t2 partition(country='us') values ('austin')")
+        .run("create table t3 as select * from t1")
+        .dump(primaryDbName, null, dumpWithClause);
+
+    replica.load(replicatedDbName, tupleBootstrapWithoutExternal.dumpLocation, loadWithClause)
+        .status(replicatedDbName)
+        .verifyResult(tupleBootstrapWithoutExternal.lastReplicationId)
+        .run("use " + replicatedDbName)
+        .run("show tables")
+        .verifyResult("t3")
+        .run("select id from t3")
+        .verifyResult("1");
+
+    dumpWithClause = Arrays.asList("'" + HiveConf.ConfVars.REPL_INCLUDE_EXTERNAL_TABLES.varname + "'='true'",
+        "'" + HiveConf.ConfVars.REPL_BOOTSTRAP_EXTERNAL_TABLES.varname + "'='true'");
+    WarehouseInstance.Tuple tupleIncWithExternalBootstrap = primary.run("use " + primaryDbName)
+        .run("drop table t1")
+        .run("create external table t4 (id int)")
+        .run("insert into table t4 values (10)")
+        .run("create table t5 as select * from t4")
+        .dump(primaryDbName, tupleBootstrapWithoutExternal.lastReplicationId, dumpWithClause);
+
+    // Verify if bootstrapping with same dump is idempotent and return same result
+    for (int i = 0; i < 2; i++) {
+      replica.load(replicatedDbName, tupleIncWithExternalBootstrap.dumpLocation, loadWithClause)
+          .status(replicatedDbName)
+          .verifyResult(tupleIncWithExternalBootstrap.lastReplicationId)
+          .run("use " + replicatedDbName)
+          .run("show tables like 't1'")
+          .verifyFailure(new String[]{"t1"})
+          .run("select place from t2 where country = 'us'")
+          .verifyResult("austin")
+          .run("select id from t4")
+          .verifyResult("10")
+          .run("select id from t5")
+          .verifyResult("10");
+    }
+
+    // Drop an external table, add another managed table with same name, insert into existing external table
+    // and dump another bootstrap dump for external tables.
+    WarehouseInstance.Tuple tupleNewIncWithExternalBootstrap = primary.run("use " + primaryDbName)
+        .run("insert into table t2 partition(country='india') values ('chennai')")
+        .run("drop table t2")
+        .run("create table t2 as select * from t4")
+        .run("insert into table t4 values (20)")
+        .dump(primaryDbName, tupleIncWithExternalBootstrap.lastReplicationId, dumpWithClause);
+
+    // Set previous dump as bootstrap to be rolled-back. Now, new bootstrap should overwrite the old one.
+    loadWithClause.add("'" + REPL_ROLLBACK_BOOTSTRAP_LOAD_CONFIG + "'='"
+        + tupleIncWithExternalBootstrap.dumpLocation + "'");
 
 Review comment:
   Please add a test case that covers bootstrapping when the previous bootstrap 
has failed halfway, i.e., it has loaded some external tables but not all. This 
way we will know what happens when the re-bootstrap tries to remove an external 
table that wasn't loaded in the previous bootstrap load.
 

This is an automated message from the Apache Git Service.
To respond to the 

[jira] [Work logged] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.

2019-03-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21286?focusedWorklogId=207285=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-207285
 ]

ASF GitHub Bot logged work on HIVE-21286:
-

Author: ASF GitHub Bot
Created on: 04/Mar/19 16:52
Start Date: 04/Mar/19 16:52
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #551: 
HIVE-21286: Hive should support clean-up of previously bootstrapped tables when 
retry from different dump.
URL: https://github.com/apache/hive/pull/551#discussion_r262141481
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##
 @@ -279,6 +292,72 @@ a database ( directory )
 return 0;
   }
 
+  /**
+   * Cleanup/drop tables from the given database which are bootstrapped by input dump dir.
+   * @throws HiveException Failed to drop the tables.
+   * @throws IOException File operations failure.
+   * @throws InvalidInputException Invalid input dump directory.
+   */
+  private void bootstrapRollbackTask() throws HiveException, IOException, InvalidInputException {
+    Path bootstrapDirectory = new PathBuilder(work.bootstrapDumpToRollback)
+        .addDescendant(ReplUtils.INC_BOOTSTRAP_ROOT_DIR_NAME).build();
+    FileSystem fs = bootstrapDirectory.getFileSystem(conf);
+
+    if (!fs.exists(bootstrapDirectory)) {
+      throw new InvalidInputException("Input bootstrap dump directory to rollback doesn't exist: "
+          + bootstrapDirectory);
+    }
+
+    FileStatus[] fileStatuses = fs.listStatus(bootstrapDirectory, EximUtil.getDirectoryFilter(fs));
+    if ((fileStatuses == null) || (fileStatuses.length == 0)) {
+      throw new InvalidInputException("Input bootstrap dump directory to rollback is empty: "
+          + bootstrapDirectory);
+    }
+
+    if (StringUtils.isNotBlank(work.dbNameToLoadIn) && (fileStatuses.length > 1)) {
+      throw new InvalidInputException("Multiple DB dirs in the dump: " + bootstrapDirectory
+          + " is not allowed to load to single target DB: " + work.dbNameToLoadIn);
+    }
+
+    for (FileStatus dbDir : fileStatuses) {
+      Path dbLevelPath = dbDir.getPath();
+      String dbNameInDump = dbLevelPath.getName();
+
+      List<String> tableNames = new ArrayList<>();
+      RemoteIterator<LocatedFileStatus> filesIterator = fs.listFiles(dbLevelPath, true);
+      while (filesIterator.hasNext()) {
+        Path nextFile = filesIterator.next().getPath();
+        String filePath = nextFile.toString();
+        if (filePath.endsWith(EximUtil.METADATA_NAME)) {
+          // Remove dbLevelPath from the current path to check if this _metadata file is under DB or
+          // table level directory.
+          String replacedString = filePath.replace(dbLevelPath.toString(), "");
+          if (!replacedString.equalsIgnoreCase(EximUtil.METADATA_NAME)) {
+            tableNames.add(nextFile.getParent().getName());
+          }
+        }
+      }
+
+      // No tables listed in the DB level directory to be dropped.
+      if (tableNames.isEmpty()) {
+        LOG.info("No tables are listed to be dropped for Database: {} in bootstrap dump: {}",
+            dbNameInDump, bootstrapDirectory);
+        continue;
+      }
+
+      // Drop all tables bootstrapped from previous dump.
+      // Get the target DB in which previously bootstrapped tables to be dropped. If user specified
+      // DB name as input in REPL LOAD command, then use it.
+      String dbName = (StringUtils.isNotBlank(work.dbNameToLoadIn) ? work.dbNameToLoadIn : dbNameInDump);
+
+      Hive db = getHive();
+      for (String table : tableNames) {
+        db.dropTable(dbName + "." + table, true);
 
 Review comment:
   What happens to the underlying external table directory when an external table is dropped? Consider the case in the test where an external table is dropped and a managed table with the same name is created. The following sequence of events would leave a dangling external table directory in the file system:
   1. An external table is created.
   2. An incremental bootstrap dump is taken.
   3. The external table is dropped and a managed table with the same name is created.
   4. The previous incremental bootstrap dump fails to load after it has created the external table directory and copied files.
   5. A new incremental bootstrap dump is taken and loaded, with the location of the previous incremental bootstrap dump specified.
   6. The new incremental bootstrap dump is loaded.
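The table-discovery step in `bootstrapRollbackTask` above treats every `_metadata` file below a DB directory, except the DB-level one, as marking a table directory. Below is a minimal local-filesystem sketch of that idea. It uses `java.nio.file` instead of Hadoop's `FileSystem` and assumes, as the patch appears to, that only DB-level and table-level directories contain a `_metadata` file. The class and method names are hypothetical. It compares each file's parent directory with the DB directory directly instead of stripping a path prefix as a string.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; not part of Hive. Works on the local file system only.
class BootstrapTableLister {
  // Assumed to match EximUtil.METADATA_NAME in Hive.
  static final String METADATA_NAME = "_metadata";

  /**
   * Returns the sorted names of directories below dbDir that hold a _metadata file,
   * skipping the DB-level _metadata that sits directly in dbDir.
   */
  static List<String> listBootstrappedTables(Path dbDir) throws IOException {
    try (Stream<Path> files = Files.walk(dbDir)) {
      return files
          .filter(Files::isRegularFile)
          .filter(p -> p.getFileName().toString().equals(METADATA_NAME))
          .map(Path::getParent)
          .filter(parent -> !parent.equals(dbDir)) // DB-level metadata, not a table
          .map(parent -> parent.getFileName().toString())
          .distinct()
          .sorted()
          .collect(Collectors.toList());
    }
  }
}
```

Because the list comes from the dump rather than from what was actually loaded, it can include tables that a failed load never created on the replica.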
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 207285)

> Hive should 

[jira] [Work logged] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.

2019-03-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21286?focusedWorklogId=207284=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-207284
 ]

ASF GitHub Bot logged work on HIVE-21286:
-

Author: ASF GitHub Bot
Created on: 04/Mar/19 16:52
Start Date: 04/Mar/19 16:52
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #551: 
HIVE-21286: Hive should support clean-up of previously bootstrapped tables when 
retry from different dump.
URL: https://github.com/apache/hive/pull/551#discussion_r262147678
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
 ##
 @@ -534,6 +536,90 @@ public void bootstrapExternalTablesDuringIncrementalPhase() throws Throwable {
 .verifyResults(Arrays.asList("10", "20"));
   }
 
+  @Test
+  public void retryBootstrapExternalTablesFromDifferentDump() throws Throwable {
+    List<String> loadWithClause = new ArrayList<>();
+    loadWithClause.addAll(externalTableBasePathWithClause());
+
+    List<String> dumpWithClause = Collections.singletonList(
+        "'" + HiveConf.ConfVars.REPL_INCLUDE_EXTERNAL_TABLES.varname + "'='false'"
+    );
+
+    WarehouseInstance.Tuple tupleBootstrapWithoutExternal = primary
+        .run("use " + primaryDbName)
+        .run("create external table t1 (id int)")
+        .run("insert into table t1 values (1)")
+        .run("create external table t2 (place string) partitioned by (country string)")
+        .run("insert into table t2 partition(country='india') values ('bangalore')")
+        .run("insert into table t2 partition(country='us') values ('austin')")
+        .run("create table t3 as select * from t1")
+        .dump(primaryDbName, null, dumpWithClause);
+
+    replica.load(replicatedDbName, tupleBootstrapWithoutExternal.dumpLocation, loadWithClause)
+        .status(replicatedDbName)
+        .verifyResult(tupleBootstrapWithoutExternal.lastReplicationId)
+        .run("use " + replicatedDbName)
+        .run("show tables")
+        .verifyResult("t3")
+        .run("select id from t3")
+        .verifyResult("1");
+
+    dumpWithClause = Arrays.asList("'" + HiveConf.ConfVars.REPL_INCLUDE_EXTERNAL_TABLES.varname + "'='true'",
+        "'" + HiveConf.ConfVars.REPL_BOOTSTRAP_EXTERNAL_TABLES.varname + "'='true'");
+    WarehouseInstance.Tuple tupleIncWithExternalBootstrap = primary.run("use " + primaryDbName)
+        .run("drop table t1")
+        .run("create external table t4 (id int)")
+        .run("insert into table t4 values (10)")
+        .run("create table t5 as select * from t4")
+        .dump(primaryDbName, tupleBootstrapWithoutExternal.lastReplicationId, dumpWithClause);
+
+    // Verify that bootstrapping with the same dump is idempotent and returns the same result
+    for (int i = 0; i < 2; i++) {
+      replica.load(replicatedDbName, tupleIncWithExternalBootstrap.dumpLocation, loadWithClause)
+          .status(replicatedDbName)
+          .verifyResult(tupleIncWithExternalBootstrap.lastReplicationId)
+          .run("use " + replicatedDbName)
+          .run("show tables like 't1'")
+          .verifyFailure(new String[]{"t1"})
+          .run("select place from t2 where country = 'us'")
+          .verifyResult("austin")
+          .run("select id from t4")
+          .verifyResult("10")
+          .run("select id from t5")
+          .verifyResult("10");
+    }
+
+    // Drop an external table, add another managed table with same name, insert into existing external table
+    // and dump another bootstrap dump for external tables.
+    WarehouseInstance.Tuple tupleNewIncWithExternalBootstrap = primary.run("use " + primaryDbName)
+        .run("insert into table t2 partition(country='india') values ('chennai')")
+        .run("drop table t2")
+        .run("create table t2 as select * from t4")
+        .run("insert into table t4 values (20)")
+        .dump(primaryDbName, tupleIncWithExternalBootstrap.lastReplicationId, dumpWithClause);
+
+    // Set previous dump as bootstrap to be rolled-back. Now, new bootstrap should overwrite the old one.
+    loadWithClause.add("'" + REPL_ROLLBACK_BOOTSTRAP_LOAD_CONFIG + "'='"
+        + tupleIncWithExternalBootstrap.dumpLocation + "'");
+    replica.load(replicatedDbName, tupleNewIncWithExternalBootstrap.dumpLocation, loadWithClause)
+        .run("use " + replicatedDbName)
+        .run("show tables like 't1'")
+        .verifyFailure(new String[]{"t1"})
+        .run("select id from t2")
+        .verifyResult("10")
+        .run("select id from t4")
+        .verifyResults(Arrays.asList("10", "20"))
+        .run("select id from t5")

[jira] [Work logged] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.

2019-03-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21286?focusedWorklogId=207286=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-207286
 ]

ASF GitHub Bot logged work on HIVE-21286:
-

Author: ASF GitHub Bot
Created on: 04/Mar/19 16:52
Start Date: 04/Mar/19 16:52
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #551: 
HIVE-21286: Hive should support clean-up of previously bootstrapped tables when 
retry from different dump.
URL: https://github.com/apache/hive/pull/551#discussion_r262144393
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
 ##
 @@ -66,6 +66,10 @@
   // tasks.
   public static final String REPL_CURRENT_TBL_WRITE_ID = "hive.repl.current.table.write.id";
 
+  // Configuration to be received via WITH clause of REPL LOAD to rollback any previously failed
+  // bootstrap load.
+  public static final String REPL_ROLLBACK_BOOTSTRAP_LOAD_CONFIG = "hive.repl.rollback.bootstrap.load";
 
 Review comment:
   Can this option be specified with a regular bootstrap directory?
 



Issue Time Tracking
---

Worklog Id: (was: 207286)
Time Spent: 40m  (was: 0.5h)

> Hive should support clean-up of previously bootstrapped tables when retry 
> from different dump.
> --
>
> Key: HIVE-21286
> URL: https://issues.apache.org/jira/browse/HIVE-21286
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21286.01.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> If external tables are enabled for replication on an existing repl policy, 
> then bootstrapping of external tables is combined with the incremental dump.
> If the incremental bootstrap load fails with a non-retryable error, the user 
> has to manually drop all the external tables before retrying with another 
> bootstrap dump. For a full bootstrap, we suggested that the user drop the DB 
> before retrying with a different dump, but in this case they would need to 
> manually drop all the external tables, which is not user friendly. So this 
> needs to be handled on the Hive side as follows.
> REPL LOAD takes an additional config (passed by the user in the WITH clause) 
> that says to drop all the tables bootstrapped from the previous dump:
> hive.repl.rollback.bootstrap.load=
> Hive will use this config only if the current dump is a bootstrap dump or a 
> combined bootstrap in an incremental dump.
> The user must take care not to pass this config if the previous REPL LOAD 
> (with bootstrap) was successful or if any successful incremental dump+load 
> happened after "previous_bootstrap_dump_dir".
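For illustration, the user-facing side of this proposal can be sketched as a small Java helper. It assembles a REPL LOAD statement whose WITH clause carries the rollback config, much like the test in the pull request appends it to `loadWithClause`. The `REPL LOAD <db> FROM '<dir>' WITH (...)` shape follows Hive's documented REPL LOAD syntax. The helper, its name, and the example paths are hypothetical, and values are not escaped.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper that only builds the SQL text; it does not talk to Hive.
class ReplLoadCommandBuilder {
  // Config key proposed in this issue (ReplUtils.REPL_ROLLBACK_BOOTSTRAP_LOAD_CONFIG in the patch).
  static final String REPL_ROLLBACK_BOOTSTRAP_LOAD_CONFIG = "hive.repl.rollback.bootstrap.load";

  /**
   * @param failedBootstrapDumpDir dump whose bootstrapped tables should be dropped first,
   *        or null when there is nothing to roll back
   */
  static String build(String targetDb, String newDumpDir, String failedBootstrapDumpDir,
      Map<String, String> otherConfigs) {
    Map<String, String> with = new LinkedHashMap<>(otherConfigs);
    if (failedBootstrapDumpDir != null) {
      with.put(REPL_ROLLBACK_BOOTSTRAP_LOAD_CONFIG, failedBootstrapDumpDir);
    }
    StringBuilder sql = new StringBuilder("REPL LOAD ").append(targetDb)
        .append(" FROM '").append(newDumpDir).append('\'');
    if (!with.isEmpty()) {
      // Values are not escaped; a path containing a single quote would break the statement.
      sql.append(" WITH (")
          .append(with.entrySet().stream()
              .map(e -> "'" + e.getKey() + "'='" + e.getValue() + "'")
              .collect(Collectors.joining(", ")))
          .append(')');
    }
    return sql.toString();
  }
}
```

Per the caution above, the rollback argument should only be passed when the previous bootstrap load failed and no incremental dump+load has succeeded since.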



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.

2019-03-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21286?focusedWorklogId=207283=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-207283
 ]

ASF GitHub Bot logged work on HIVE-21286:
-

Author: ASF GitHub Bot
Created on: 04/Mar/19 16:52
Start Date: 04/Mar/19 16:52
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #551: 
HIVE-21286: Hive should support clean-up of previously bootstrapped tables when 
retry from different dump.
URL: https://github.com/apache/hive/pull/551#discussion_r262138331
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##
 @@ -279,6 +292,72 @@ a database ( directory )
 return 0;
   }
 
+  /**
+   * Cleanup/drop tables from the given database which are bootstrapped by input dump dir.
+   * @throws HiveException Failed to drop the tables.
+   * @throws IOException File operations failure.
+   * @throws InvalidInputException Invalid input dump directory.
+   */
+  private void bootstrapRollbackTask() throws HiveException, IOException, InvalidInputException {
+    Path bootstrapDirectory = new PathBuilder(work.bootstrapDumpToRollback)
+        .addDescendant(ReplUtils.INC_BOOTSTRAP_ROOT_DIR_NAME).build();
+    FileSystem fs = bootstrapDirectory.getFileSystem(conf);
+
+    if (!fs.exists(bootstrapDirectory)) {
+      throw new InvalidInputException("Input bootstrap dump directory to rollback doesn't exist: "
+          + bootstrapDirectory);
+    }
+
+    FileStatus[] fileStatuses = fs.listStatus(bootstrapDirectory, EximUtil.getDirectoryFilter(fs));
+    if ((fileStatuses == null) || (fileStatuses.length == 0)) {
+      throw new InvalidInputException("Input bootstrap dump directory to rollback is empty: "
+          + bootstrapDirectory);
+    }
+
+    if (StringUtils.isNotBlank(work.dbNameToLoadIn) && (fileStatuses.length > 1)) {
+      throw new InvalidInputException("Multiple DB dirs in the dump: " + bootstrapDirectory
+          + " is not allowed to load to single target DB: " + work.dbNameToLoadIn);
+    }
+
+    for (FileStatus dbDir : fileStatuses) {
 
 Review comment:
   Given the above two conditions, there's going to be exactly one entry in the fileStatuses array. Why do we need a for loop here? Could we just get that one entry into dbDir and write the rest of the code without a loop?
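As far as the two quoted checks go, the multiple-directory check only fires when a target DB name is given. Without one, the dump may still hold several DB directories, and the loop is what handles them. Below is a sketch of that resolution logic as a pure function. The names are hypothetical, `IllegalArgumentException` stands in for `InvalidInputException`, and a `trim().isEmpty()` test approximates `StringUtils.isNotBlank`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the checks in bootstrapRollbackTask, without Hadoop/Hive types.
class RollbackTargetResolver {
  /** Maps each DB directory found in the dump to the database whose tables would be dropped. */
  static Map<String, String> resolve(List<String> dbDirNames, String dbNameToLoadIn) {
    if (dbDirNames == null || dbDirNames.isEmpty()) {
      throw new IllegalArgumentException("Input bootstrap dump directory to rollback is empty");
    }
    // Approximates StringUtils.isNotBlank(dbNameToLoadIn).
    boolean hasTargetDb = dbNameToLoadIn != null && !dbNameToLoadIn.trim().isEmpty();
    if (hasTargetDb && dbDirNames.size() > 1) {
      throw new IllegalArgumentException(
          "Multiple DB dirs in the dump are not allowed to load to single target DB: " + dbNameToLoadIn);
    }
    Map<String, String> targets = new LinkedHashMap<>();
    for (String dbNameInDump : dbDirNames) {
      // With a target DB there is exactly one entry; without one, every DB keeps its own name.
      targets.put(dbNameInDump, hasTargetDb ? dbNameToLoadIn : dbNameInDump);
    }
    return targets;
  }
}
```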
 



Issue Time Tracking
---

Worklog Id: (was: 207283)

> Hive should support clean-up of previously bootstrapped tables when retry 
> from different dump.
> --
>
> Key: HIVE-21286
> URL: https://issues.apache.org/jira/browse/HIVE-21286
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21286.01.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>


[jira] [Commented] (HIVE-21001) Upgrade to calcite-1.18

2019-03-04 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783529#comment-16783529
 ] 

Hive QA commented on HIVE-21001:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12961010/HIVE-21001.42.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16323/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16323/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16323/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2019-03-04 16:35:33.530
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-16323/source-prep.txt
+ [[ true == \t\r\u\e ]]
+ rm -rf ivy maven
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2019-03-04 16:35:34.219
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at f51f108 HIVE-21255: Remove QueryConditionBuilder in 
JdbcStorageHandler (Daniel Dai, reviewed by Jesus Camacho Rodriguez)
+ git clean -f -d
Removing standalone-metastore/metastore-server/src/gen/
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at f51f108 HIVE-21255: Remove QueryConditionBuilder in 
JdbcStorageHandler (Daniel Dai, reviewed by Jesus Camacho Rodriguez)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2019-03-04 16:35:34.957
+ rm -rf ../yetus_PreCommit-HIVE-Build-16323
+ mkdir ../yetus_PreCommit-HIVE-Build-16323
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-16323
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-16323/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
Going to apply patch with: git apply -p0
/data/hiveptest/working/scratch/build.patch:598: trailing whitespace.
explain cbo select * from part_null where 
/data/hiveptest/working/scratch/build.patch:1101: trailing whitespace.
Map 1 
/data/hiveptest/working/scratch/build.patch:1122: trailing whitespace.
Reducer 2 
/data/hiveptest/working/scratch/build.patch:1181: trailing whitespace.
Map 1 
/data/hiveptest/working/scratch/build.patch:1202: trailing whitespace.
Reducer 2 
warning: squelched 60 whitespace errors
warning: 65 lines add whitespace errors.
+ [[ maven == \m\a\v\e\n ]]
+ rm -rf /data/hiveptest/working/maven/org/apache/hive
+ mvn -B clean install -DskipTests -T 4 -q 
-Dmaven.repo.local=/data/hiveptest/working/maven
[ERROR] Failed to execute goal on project hive-shims-0.23: Could not resolve 
dependencies for project 
org.apache.hive.shims:hive-shims-0.23:jar:4.0.0-SNAPSHOT: Could not find 
artifact dnsjava:dnsjava:jar:2.1.7 -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hive-shims-0.23
+ result=1
+ '[' 1 -ne 0 ']'
+ rm -rf yetus_PreCommit-HIVE-Build-16323
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12961010 - PreCommit-HIVE-Build

> Upgrade to calcite-1.18
> ---
>
> Key: HIVE-21001
> URL: https://issues.apache.org/jira/browse/HIVE-21001
> Project: Hive
>  Issue Type: Improvement
>  

[jira] [Updated] (HIVE-21371) Make NonSyncByteArrayOutputStream Overflow Conscious

2019-03-04 Thread David Mollitor (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-21371:
--
Status: Patch Available  (was: Open)

> Make NonSyncByteArrayOutputStream Overflow Conscious 
> -
>
> Key: HIVE-21371
> URL: https://issues.apache.org/jira/browse/HIVE-21371
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0, 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HIVE-21371.1.patch, HIVE-21371.2.patch
>
>
> {code:java|title=NonSyncByteArrayOutputStream}
>   private int enLargeBuffer(int increment) {
> int temp = count + increment;
> int newLen = temp;
> if (temp > buf.length) {
>   if ((buf.length << 1) > temp) {
> newLen = buf.length << 1;
>   }
>   byte newbuf[] = new byte[newLen];
>   System.arraycopy(buf, 0, newbuf, 0, count);
>   buf = newbuf;
> }
> return newLen;
>   }
> {code}
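> This can fail for large buffers because it doubles the size every time 
> without considering the maximum Java array length (Integer.MAX_VALUE elements, 
> roughly 2GB for a byte array): {{buf.length << 1}} and {{count + increment}} 
> are computed in int arithmetic and can overflow once the buffer reaches about 1GB.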
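An overflow-conscious variant of the growth logic above could do the size arithmetic in `long` and clamp the result, similar to how the JDK's own `ByteArrayOutputStream` and `ArrayList` cap their growth. This is a sketch of a hypothetical standalone class, not the attached patch. The `Integer.MAX_VALUE - 8` cap mirrors the conservative limit used inside the JDK, because some VMs reserve header words in arrays.

```java
import java.util.Arrays;

// Hypothetical sketch; not Hive's NonSyncByteArrayOutputStream or the attached patch.
class GrowableByteBuffer {
  // Conservative maximum array length, as used inside the JDK's growable collections.
  static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

  private byte[] buf = new byte[32];
  private int count;

  /** Doubles oldCapacity, but returns at least minCapacity and never more than MAX_ARRAY_SIZE. */
  static int newCapacity(int oldCapacity, int minCapacity) {
    long doubled = 2L * oldCapacity; // long arithmetic: cannot overflow like buf.length << 1
    long candidate = Math.max(doubled, minCapacity);
    return (int) Math.min(candidate, MAX_ARRAY_SIZE);
  }

  /** Makes room for increment more bytes, failing with a clear error instead of overflowing. */
  private void ensureCapacity(int increment) {
    long required = (long) count + increment; // count + increment could overflow as an int
    if (required > MAX_ARRAY_SIZE) {
      throw new OutOfMemoryError("Required array size too large: " + required);
    }
    if (required > buf.length) {
      buf = Arrays.copyOf(buf, newCapacity(buf.length, (int) required));
    }
  }

  void write(byte[] b, int off, int len) {
    ensureCapacity(len);
    System.arraycopy(b, off, buf, count, len);
    count += len;
  }

  int size() { return count; }

  int capacity() { return buf.length; }
}
```

Unlike the quoted method, this never computes a negative length. Once the buffer is near the cap, it grows straight to `MAX_ARRAY_SIZE`, and any further growth fails with an explicit `OutOfMemoryError`.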





[jira] [Updated] (HIVE-21371) Make NonSyncByteArrayOutputStream Overflow Conscious

2019-03-04 Thread David Mollitor (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-21371:
--
Attachment: HIVE-21371.2.patch



[jira] [Updated] (HIVE-21371) Make NonSyncByteArrayOutputStream Overflow Conscious

2019-03-04 Thread David Mollitor (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-21371:
--
Status: Open  (was: Patch Available)



[jira] [Commented] (HIVE-21367) Hive returns an incorrect result when using a simple select query

2019-03-04 Thread Sofia (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783485#comment-16783485
 ] 

Sofia commented on HIVE-21367:
--

The target table is loaded from two different sources:
 * *From SQOOP*: when loading tables, we use the following code.

{code:java}
sqoop import --connect ${CONNECTION} \
--username ${USER} \
--password ${PASSWORD} \
--table $1 \
--hive-database $2 \
--hive-table ${TBNAME} \
--hive-import \
--as-orcfile \
--hive-overwrite \
-m 1 \
--delete-target-dir 

{code}
 * *From SPARK*: when processing the data, we store the output as a table in Hive using the following code.
{code:java}
df.write
  .mode(mode)
  .format(HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR)
  .option("table",tableName)
  .save(){code}
How do we load the data into the root path of the target table in each case?

> Hive returns an incorrect result when using a simple select query
> -
>
> Key: HIVE-21367
> URL: https://issues.apache.org/jira/browse/HIVE-21367
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, JDBC, SQL
>Affects Versions: 3.1.0
> Environment:  - HDP 3.1
>   - Hive 3.1.0
>   - Spark 2.3.2
>   - Sqoop 1.4.7
>Reporter: LEMBARKI Mohamed Amine
>Priority: Blocker
> Attachments: mapred_input_dir_recursive.png
>
>
> Hive returns an incorrect result when using a simple select query with a 
> where clause, while with an aggregation it returns the correct result.
> The problem arises for tables created by Spark or Sqoop.
> Also, when we use spark-shell with HiveWarehouseConnector, it returns the 
> correct result.
>  
> Workflow: 
>      - Loading data into Hive with Sqoop
>      - Processing data with Spark using HiveWarehouseConnector and storing it 
> in Hive
>   
> Below is the error log:
>  
>  */-* 
>  *1 - Executing Query : select code from db1.tbl1 where code = '123'*
>  */-*
> {code:java}
> [data@data1 ~]$ hive -e "select code from db1.tbl1 where code = '123'"
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Connecting to 
> jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2
> 19/03/01 10:31:36 [main]: INFO jdbc.HiveConnection: Connected to data2:1
> Connected to: Apache Hive (version 3.1.0.3.1.0.0-78)
> Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> INFO : Compiling 
> command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): 
> select code from db1.tbl1 where code = '123'
> INFO : Semantic Analysis Completed (retrial = false)
> INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:code, 
> type:string, comment:null)], properties:null)
> INFO : Completed compiling 
> command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); 
> Time taken: 0.142 seconds
> INFO : Executing 
> command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): 
> select code from db1.tbl1 where code = '123'
> INFO : Completed executing 
> command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); 
> Time taken: 0.003 seconds
> INFO : OK
> +--+
> | code |
> +--+
> +--+
> No rows selected (4,307 seconds)
> Beeline version 3.1.0.3.1.0.0-78 by Apache Hive
> Closing: 0: 
> jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2
> {code}
> */-*
> *2 - Executing Query using count :* 
>       *select count(code) from db1.tbl1 where code = '123'*
>  */-*
> {code:java}
> [data@data1 ~]$ hive -e "select count(code) from db1.tbl1 where code = '123'"
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type 

[jira] [Updated] (HIVE-21001) Upgrade to calcite-1.18

2019-03-04 Thread Zoltan Haindrich (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-21001:

Attachment: HIVE-21001.42.patch

> Upgrade to calcite-1.18
> ---
>
> Key: HIVE-21001
> URL: https://issues.apache.org/jira/browse/HIVE-21001
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-21001.01.patch, HIVE-21001.01.patch, 
> HIVE-21001.02.patch, HIVE-21001.03.patch, HIVE-21001.04.patch, 
> HIVE-21001.05.patch, HIVE-21001.06.patch, HIVE-21001.06.patch, 
> HIVE-21001.07.patch, HIVE-21001.08.patch, HIVE-21001.08.patch, 
> HIVE-21001.08.patch, HIVE-21001.09.patch, HIVE-21001.09.patch, 
> HIVE-21001.09.patch, HIVE-21001.10.patch, HIVE-21001.11.patch, 
> HIVE-21001.12.patch, HIVE-21001.13.patch, HIVE-21001.15.patch, 
> HIVE-21001.16.patch, HIVE-21001.17.patch, HIVE-21001.18.patch, 
> HIVE-21001.18.patch, HIVE-21001.19.patch, HIVE-21001.20.patch, 
> HIVE-21001.21.patch, HIVE-21001.22.patch, HIVE-21001.22.patch, 
> HIVE-21001.22.patch, HIVE-21001.23.patch, HIVE-21001.24.patch, 
> HIVE-21001.26.patch, HIVE-21001.26.patch, HIVE-21001.26.patch, 
> HIVE-21001.26.patch, HIVE-21001.26.patch, HIVE-21001.27.patch, 
> HIVE-21001.28.patch, HIVE-21001.29.patch, HIVE-21001.29.patch, 
> HIVE-21001.30.patch, HIVE-21001.31.patch, HIVE-21001.32.patch, 
> HIVE-21001.34.patch, HIVE-21001.35.patch, HIVE-21001.36.patch, 
> HIVE-21001.37.patch, HIVE-21001.38.patch, HIVE-21001.39.patch, 
> HIVE-21001.40.patch, HIVE-21001.41.patch, HIVE-21001.42.patch
>
>
> XLEAR LIBRARY CACHE 





[jira] [Commented] (HIVE-21367) Hive returns an incorrect result when using a simple select query

2019-03-04 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783445#comment-16783445
 ] 

star commented on HIVE-21367:
-

It seems to take effect only in MapReduce, not in the fetch task. I need to 
figure out why Hive doesn't support this configuration there; there may be other 
considerations I haven't noticed yet. By the way, why do you create subdirectories 
when using Sqoop? You can load data into the root path of the target table.
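To make the point about subdirectories concrete: a reader that lists only the files directly under a table directory sees a different set of files than one that walks the directory recursively. The sketch below shows the two behaviors. It uses plain `java.nio.file` with hypothetical names and is not Hive's input-format or fetch-task code. If the fetch task behaves like the first method while MapReduce jobs honor the recursive-input setting, that would be consistent with the simple select missing rows that the `count` query finds.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; Hive's readers are not implemented this way.
class TableDirListing {
  /** Names of data files directly under tableDir; files in subdirectories are not seen. */
  static List<String> topLevelFiles(Path tableDir) throws IOException {
    try (Stream<Path> entries = Files.list(tableDir)) {
      return entries.filter(Files::isRegularFile)
          .map(p -> p.getFileName().toString())
          .sorted()
          .collect(Collectors.toList());
    }
  }

  /** Names of data files anywhere below tableDir, including nested subdirectories. */
  static List<String> recursiveFiles(Path tableDir) throws IOException {
    try (Stream<Path> entries = Files.walk(tableDir)) {
      return entries.filter(Files::isRegularFile)
          .map(p -> p.getFileName().toString())
          .sorted()
          .collect(Collectors.toList());
    }
  }
}
```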

> Hive returns an incorrect result when using a simple select query
> -
>
> Key: HIVE-21367
> URL: https://issues.apache.org/jira/browse/HIVE-21367
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, JDBC, SQL
>Affects Versions: 3.1.0
> Environment:  - HDP 3.1
>   - Hive 3.1.0
>   - Spark 2.3.2
>   - Sqoop 1.4.7
>Reporter: LEMBARKI Mohamed Amine
>Priority: Blocker
> Attachments: mapred_input_dir_recursive.png
>
>
> Hive returns an incorrect result for a simple select query with a where
> clause, while with an aggregation it returns the correct result.
> The problem arises for tables created by Spark or Sqoop.
> When we use spark-shell with HiveWarehouseConnector, it also returns the
> correct result.
>
> Workflow:
>  - Loading data into Hive with Sqoop
>  - Processing the data with Spark using HiveWarehouseConnector and storing
>    it in Hive
>
> Below is the error log:
>  
>  */-* 
>  *1 - Executing Query : select code from db1.tbl1 where code = '123'*
>  */-*
> {code:java}
> [data@data1 ~]$ hive -e "select code from db1.tbl1 where code = '123'"
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Connecting to 
> jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2
> 19/03/01 10:31:36 [main]: INFO jdbc.HiveConnection: Connected to data2:1
> Connected to: Apache Hive (version 3.1.0.3.1.0.0-78)
> Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> INFO : Compiling 
> command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): 
> select code from db1.tbl1 where code = '123'
> INFO : Semantic Analysis Completed (retrial = false)
> INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:code, 
> type:string, comment:null)], properties:null)
> INFO : Completed compiling 
> command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); 
> Time taken: 0.142 seconds
> INFO : Executing 
> command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): 
> select code from db1.tbl1 where code = '123'
> INFO : Completed executing 
> command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); 
> Time taken: 0.003 seconds
> INFO : OK
> +--+
> | code |
> +--+
> +--+
> No rows selected (4,307 seconds)
> Beeline version 3.1.0.3.1.0.0-78 by Apache Hive
> Closing: 0: 
> jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2
> {code}
> */-*
> *2 - Executing Query using count :* 
>       *select count(code) from db1.tbl1 where code = '123'*
>  */-*
> {code:java}
> [data@data1 ~]$ hive -e "select count(code) from db1.tbl1 where code = '123'"
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Connecting to 
> jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2
> 19/03/01 10:31:56 [main]: INFO jdbc.HiveConnection: Connected to data2:1
> Connected to: Apache Hive (version 3.1.0.3.1.0.0-78)
> Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
> Transaction isolation: 

[jira] [Commented] (HIVE-21367) Hive returns an incorrect result when using a simple select query

2019-03-04 Thread LEMBARKI Mohamed Amine (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783299#comment-16783299
 ] 

LEMBARKI Mohamed Amine commented on HIVE-21367:
---

Hi,

We've set the property mapred.input.dir.recursive to true using Ambari, but
unfortunately the problem persists.

Does this property also apply to FetchTask?

!mapred_input_dir_recursive.png!

> Hive returns an incorrect result when using a simple select query
> -
>
> Key: HIVE-21367
> URL: https://issues.apache.org/jira/browse/HIVE-21367
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, JDBC, SQL
>Affects Versions: 3.1.0
> Environment:  - HDP 3.1
>   - Hive 3.1.0
>   - Spark 2.3.2
>   - Sqoop 1.4.7
>Reporter: LEMBARKI Mohamed Amine
>Priority: Blocker
> Attachments: mapred_input_dir_recursive.png
>
>
> Hive returns an incorrect result for a simple select query with a where
> clause, while with an aggregation it returns the correct result.
> The problem arises for tables created by Spark or Sqoop.
> When we use spark-shell with HiveWarehouseConnector, it also returns the
> correct result.
>
> Workflow:
>  - Loading data into Hive with Sqoop
>  - Processing the data with Spark using HiveWarehouseConnector and storing
>    it in Hive
>
> Below is the error log:
>  
>  */-* 
>  *1 - Executing Query : select code from db1.tbl1 where code = '123'*
>  */-*
> {code:java}
> [data@data1 ~]$ hive -e "select code from db1.tbl1 where code = '123'"
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Connecting to 
> jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2
> 19/03/01 10:31:36 [main]: INFO jdbc.HiveConnection: Connected to data2:1
> Connected to: Apache Hive (version 3.1.0.3.1.0.0-78)
> Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> INFO : Compiling 
> command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): 
> select code from db1.tbl1 where code = '123'
> INFO : Semantic Analysis Completed (retrial = false)
> INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:code, 
> type:string, comment:null)], properties:null)
> INFO : Completed compiling 
> command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); 
> Time taken: 0.142 seconds
> INFO : Executing 
> command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): 
> select code from db1.tbl1 where code = '123'
> INFO : Completed executing 
> command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); 
> Time taken: 0.003 seconds
> INFO : OK
> +--+
> | code |
> +--+
> +--+
> No rows selected (4,307 seconds)
> Beeline version 3.1.0.3.1.0.0-78 by Apache Hive
> Closing: 0: 
> jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2
> {code}
> */-*
> *2 - Executing Query using count :* 
>       *select count(code) from db1.tbl1 where code = '123'*
>  */-*
> {code:java}
> [data@data1 ~]$ hive -e "select count(code) from db1.tbl1 where code = '123'"
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Connecting to 
> jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2
> 19/03/01 10:31:56 [main]: INFO jdbc.HiveConnection: Connected to data2:1
> Connected to: Apache Hive (version 3.1.0.3.1.0.0-78)
> Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> INFO : Compiling 
> 

[jira] [Updated] (HIVE-21367) Hive returns an incorrect result when using a simple select query

2019-03-04 Thread LEMBARKI Mohamed Amine (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LEMBARKI Mohamed Amine updated HIVE-21367:
--
Attachment: mapred_input_dir_recursive.png

> Hive returns an incorrect result when using a simple select query
> -
>
> Key: HIVE-21367
> URL: https://issues.apache.org/jira/browse/HIVE-21367
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, JDBC, SQL
>Affects Versions: 3.1.0
> Environment:  - HDP 3.1
>   - Hive 3.1.0
>   - Spark 2.3.2
>   - Sqoop 1.4.7
>Reporter: LEMBARKI Mohamed Amine
>Priority: Blocker
> Attachments: mapred_input_dir_recursive.png
>
>
> Hive returns an incorrect result for a simple select query with a where
> clause, while with an aggregation it returns the correct result.
> The problem arises for tables created by Spark or Sqoop.
> When we use spark-shell with HiveWarehouseConnector, it also returns the
> correct result.
>
> Workflow:
>  - Loading data into Hive with Sqoop
>  - Processing the data with Spark using HiveWarehouseConnector and storing
>    it in Hive
>
> Below is the error log:
>  
>  */-* 
>  *1 - Executing Query : select code from db1.tbl1 where code = '123'*
>  */-*
> {code:java}
> [data@data1 ~]$ hive -e "select code from db1.tbl1 where code = '123'"
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Connecting to 
> jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2
> 19/03/01 10:31:36 [main]: INFO jdbc.HiveConnection: Connected to data2:1
> Connected to: Apache Hive (version 3.1.0.3.1.0.0-78)
> Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> INFO : Compiling 
> command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): 
> select code from db1.tbl1 where code = '123'
> INFO : Semantic Analysis Completed (retrial = false)
> INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:code, 
> type:string, comment:null)], properties:null)
> INFO : Completed compiling 
> command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); 
> Time taken: 0.142 seconds
> INFO : Executing 
> command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): 
> select code from db1.tbl1 where code = '123'
> INFO : Completed executing 
> command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); 
> Time taken: 0.003 seconds
> INFO : OK
> +--+
> | code |
> +--+
> +--+
> No rows selected (4,307 seconds)
> Beeline version 3.1.0.3.1.0.0-78 by Apache Hive
> Closing: 0: 
> jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2
> {code}
> */-*
> *2 - Executing Query using count :* 
>       *select count(code) from db1.tbl1 where code = '123'*
>  */-*
> {code:java}
> [data@data1 ~]$ hive -e "select count(code) from db1.tbl1 where code = '123'"
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Connecting to 
> jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2
> 19/03/01 10:31:56 [main]: INFO jdbc.HiveConnection: Connected to data2:1
> Connected to: Apache Hive (version 3.1.0.3.1.0.0-78)
> Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> INFO : Compiling 
> command(queryId=hive_20190301103149_90aa338b-b99b-4f1c-b7e5-6b285f64cb3e): 
> select count(code) from db1.tbl1 where code = '123'
> INFO : Semantic Analysis Completed (retrial = false)
> INFO : Returning Hive schema: 

[jira] [Updated] (HIVE-11091) Unable to load data into hive table using "Load data local inapth" command from unix named pipe

2019-03-04 Thread Alexandros (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-11091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandros updated HIVE-11091:
--
Priority: Critical  (was: Blocker)

> Unable to load data into hive table using "Load data local inapth" command 
> from unix named pipe
> ---
>
> Key: HIVE-11091
> URL: https://issues.apache.org/jira/browse/HIVE-11091
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 0.14.0
> Environment: Unix,MacOS
>Reporter: Manoranjan Sahoo
>Assignee: Alexandros
>Priority: Critical
>
> Unable to load data into a Hive table from a Unix named pipe in Hive 0.14.0.
> Please find the execution details below (environment: Hadoop 2.6.0 + Hive 0.14.0):
> 
> $ mkfifo /tmp/test.txt
> $ hive
> hive> create table test(id bigint,name string);
> OK
> Time taken: 1.018 seconds
> hive> LOAD DATA LOCAL INPATH '/tmp/test.txt' OVERWRITE INTO TABLE test;
> Loading data to table default.test
> Failed with exception addFiles: filesystem error in check phase
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.MoveTask
> But in Hadoop 1.3 and Hive 0.11.0 it works fine:
> hive> LOAD DATA LOCAL INPATH '/tmp/test.txt' OVERWRITE INTO TABLE test;
> Copying data from file:/tmp/test.txt
> Copying file: file:/tmp/test.txt



[jira] [Commented] (HIVE-21377) Using Oracle as HMS DB with DirectSQL

2019-03-04 Thread Peter Vary (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783274#comment-16783274
 ] 

Peter Vary commented on HIVE-21377:
---

[~hibosoon]: Which version of Oracle and which version of the JDBC driver do you use?
Some time ago I tested this code path on our Oracle instance (I don't remember the
exact version), and it seemed to work.

CC: [~karthik.manamcheri]

> Using Oracle as HMS DB with DirectSQL
> -
>
> Key: HIVE-21377
> URL: https://issues.apache.org/jira/browse/HIVE-21377
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Bo 
>Priority: Major
>
> When we use Oracle as the HMS DB, we see entries like the following in the HMS
> log:
> {code:java}
> 2019-02-02 T08:23:57,102 WARN [Thread-12]: metastore.ObjectStore (ObjectStore.java:handleDirectSqlError(3741)) - Falling back to ORM path due to direct SQL failure (this is not an error): Cannot extract boolean from column value 0
>   at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.extractSqlBoolean(MetaStoreDirectSql.java:1031)
>   at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsFromPartitionIds(MetaStoreDirectSql.java:728)
>   at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.access$300(MetaStoreDirectSql.java:109)
>   at org.apache.hadoop.hive.metastore.MetaStoreDirectSql$1.run(MetaStoreDirectSql.java:471)
>   at org.apache.hadoop.hive.metastore.Batchable.runBatched(Batchable.java:73)
>   at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:462)
>   at org.apache.hadoop.hive.metastore.ObjectStore$8.getSqlResult(ObjectStore.java:3392)
> {code}
> In Hive, extractSqlBoolean handles the values returned by Postgres, MySQL and
> Derby, but Oracle returns 0 or 1 for booleans. So we need to modify
> MetastoreDirectSqlUtils.java [1].
> Could this snippet be added to that code?
> {code:java}
> import org.apache.commons.lang3.BooleanUtils;
> import org.apache.hadoop.hive.metastore.api.MetaException;
>
>   static Boolean extractSqlBoolean(Object value) throws MetaException {
>     if (value == null) {
>       return null;
>     }
>     if (value instanceof Boolean) {
>       return (Boolean) value;
>     }
>     // Added: Oracle returns 0 or 1 (as a numeric type) for boolean columns
>     if (value instanceof Number) {
>       try {
>         return BooleanUtils.toBooleanObject(((Number) value).intValue(), 1, 0, null);
>       } catch (IllegalArgumentException iae) {
>         // NOOP: not 0 or 1, fall through to the MetaException below
>       }
>     }
>     if (value instanceof String) {
>       try {
>         return BooleanUtils.toBooleanObject((String) value, "Y", "N", null);
>       } catch (IllegalArgumentException iae) {
>         // NOOP
>       }
>     }
>     throw new MetaException("Cannot extract boolean from column value " + value);
>   }
> {code}
>  [1] -
> https://github.com/apache/hive/blob/f51f108b761f0c88647f48f30447dae12b308f31/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreDirectSqlUtils.java#L501-L527
>  
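For readers who want to try the proposed mapping without Hive or Commons Lang on the classpath, here is a minimal JDK-only sketch. It is not Hive's code: the class name `SqlBooleanSketch` and the local `MetaException` stand-in are hypothetical, and the 1/0 and "Y"/"N" conventions are taken from the proposal above.

```java
import java.math.BigDecimal;

public class SqlBooleanSketch {

    // Hypothetical stand-in for org.apache.hadoop.hive.metastore.api.MetaException,
    // so the sketch compiles without Hive on the classpath.
    static class MetaException extends Exception {
        MetaException(String message) {
            super(message);
        }
    }

    // Maps a raw JDBC column value to a Boolean: Boolean as-is, numeric 1/0
    // (for example an Oracle NUMBER read as BigDecimal), or "Y"/"N" strings.
    static Boolean extractSqlBoolean(Object value) throws MetaException {
        if (value == null) {
            return null;
        }
        if (value instanceof Boolean) {
            return (Boolean) value;
        }
        if (value instanceof Number) {
            // doubleValue() lets a fractional value such as 0.5 be rejected
            // instead of being truncated to 0
            double d = ((Number) value).doubleValue();
            if (d == 1.0) {
                return Boolean.TRUE;
            }
            if (d == 0.0) {
                return Boolean.FALSE;
            }
        }
        if (value instanceof String) {
            if ("Y".equals(value)) {
                return Boolean.TRUE;
            }
            if ("N".equals(value)) {
                return Boolean.FALSE;
            }
        }
        throw new MetaException("Cannot extract boolean from column value " + value);
    }

    public static void main(String[] args) throws MetaException {
        System.out.println(extractSqlBoolean(new BigDecimal("0"))); // prints false
        System.out.println(extractSqlBoolean(new BigDecimal("1"))); // prints true
        System.out.println(extractSqlBoolean("Y"));                 // prints true
    }
}
```

Comparing through doubleValue() is a design choice of this sketch: it accepts any numeric JDBC type (Integer, Long, BigDecimal) and rejects values other than exactly 0 or 1, such as 2, with the same "Cannot extract boolean" error.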



[jira] [Updated] (HIVE-11091) Unable to load data into hive table using "Load data local inapth" command from unix named pipe

2019-03-04 Thread Alexandros (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-11091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandros updated HIVE-11091:
--
Priority: Blocker  (was: Critical)

> Unable to load data into hive table using "Load data local inapth" command 
> from unix named pipe
> ---
>
> Key: HIVE-11091
> URL: https://issues.apache.org/jira/browse/HIVE-11091
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 0.14.0
> Environment: Unix,MacOS
>Reporter: Manoranjan Sahoo
>Assignee: Alexandros
>Priority: Blocker
>
> Unable to load data into a Hive table from a Unix named pipe in Hive 0.14.0.
> Please find the execution details below (environment: Hadoop 2.6.0 + Hive 0.14.0):
> 
> $ mkfifo /tmp/test.txt
> $ hive
> hive> create table test(id bigint,name string);
> OK
> Time taken: 1.018 seconds
> hive> LOAD DATA LOCAL INPATH '/tmp/test.txt' OVERWRITE INTO TABLE test;
> Loading data to table default.test
> Failed with exception addFiles: filesystem error in check phase
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.MoveTask
> But in Hadoop 1.3 and Hive 0.11.0 it works fine:
> hive> LOAD DATA LOCAL INPATH '/tmp/test.txt' OVERWRITE INTO TABLE test;
> Copying data from file:/tmp/test.txt
> Copying file: file:/tmp/test.txt



[jira] [Commented] (HIVE-11091) Unable to load data into hive table using "Load data local inapth" command from unix named pipe

2019-03-04 Thread Alexandros (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-11091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783273#comment-16783273
 ] 

Alexandros commented on HIVE-11091:
---

Why Blocker?


> Unable to load data into hive table using "Load data local inapth" command 
> from unix named pipe
> ---
>
> Key: HIVE-11091
> URL: https://issues.apache.org/jira/browse/HIVE-11091
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 0.14.0
> Environment: Unix,MacOS
>Reporter: Manoranjan Sahoo
>Assignee: Alexandros
>Priority: Blocker
>
> Unable to load data into a Hive table from a Unix named pipe in Hive 0.14.0.
> Please find the execution details below (environment: Hadoop 2.6.0 + Hive 0.14.0):
> 
> $ mkfifo /tmp/test.txt
> $ hive
> hive> create table test(id bigint,name string);
> OK
> Time taken: 1.018 seconds
> hive> LOAD DATA LOCAL INPATH '/tmp/test.txt' OVERWRITE INTO TABLE test;
> Loading data to table default.test
> Failed with exception addFiles: filesystem error in check phase
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.MoveTask
> But in Hadoop 1.3 and Hive 0.11.0 it works fine:
> hive> LOAD DATA LOCAL INPATH '/tmp/test.txt' OVERWRITE INTO TABLE test;
> Copying data from file:/tmp/test.txt
> Copying file: file:/tmp/test.txt



[jira] [Assigned] (HIVE-11091) Unable to load data into hive table using "Load data local inapth" command from unix named pipe

2019-03-04 Thread Alexandros (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-11091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandros reassigned HIVE-11091:
-

Assignee: Alexandros

> Unable to load data into hive table using "Load data local inapth" command 
> from unix named pipe
> ---
>
> Key: HIVE-11091
> URL: https://issues.apache.org/jira/browse/HIVE-11091
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 0.14.0
> Environment: Unix,MacOS
>Reporter: Manoranjan Sahoo
>Assignee: Alexandros
>Priority: Blocker
>
> Unable to load data into a Hive table from a Unix named pipe in Hive 0.14.0.
> Please find the execution details below (environment: Hadoop 2.6.0 + Hive 0.14.0):
> 
> $ mkfifo /tmp/test.txt
> $ hive
> hive> create table test(id bigint,name string);
> OK
> Time taken: 1.018 seconds
> hive> LOAD DATA LOCAL INPATH '/tmp/test.txt' OVERWRITE INTO TABLE test;
> Loading data to table default.test
> Failed with exception addFiles: filesystem error in check phase
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.MoveTask
> But in Hadoop 1.3 and Hive 0.11.0 it works fine:
> hive> LOAD DATA LOCAL INPATH '/tmp/test.txt' OVERWRITE INTO TABLE test;
> Copying data from file:/tmp/test.txt
> Copying file: file:/tmp/test.txt



[jira] [Commented] (HIVE-21367) Hive returns an incorrect result when using a simple select query

2019-03-04 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783264#comment-16783264
 ] 

star commented on HIVE-21367:
-

Basically, Hive converts a simple select into a FetchTask, which is executed
locally (no MapReduce job), while a more complex select is executed as a
MapReduce (or Tez) task, which supports subdirectories. FetchTask differs from
MapReduce in this respect.

Setting mapred.input.dir.recursive to true in hive-site.xml is expected to
solve the problem.
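The difference between a reader that descends into subdirectories and one that does not can be illustrated outside Hive. The following is a minimal JDK-only sketch, not Hive's FetchTask or input-format code: it assumes a hypothetical table directory whose only data file sits in a subdirectory (for example a `delta_...` directory), and the class, method, and file names are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RecursiveListingDemo {

    // Regular files directly under dir; subdirectories are not descended into.
    static List<Path> listTopLevel(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
    }

    // Regular files at any depth under dir (a recursive listing).
    static List<Path> listRecursive(Path dir) throws IOException {
        try (Stream<Path> entries = Files.walk(dir)) {
            return entries.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical table layout: the only data file lives in a subdirectory
        // of the table root, e.g. tbl1/delta_dir/part-00000.
        Path table = Files.createTempDirectory("tbl1");
        Path subdir = Files.createDirectory(table.resolve("delta_dir"));
        Files.write(subdir.resolve("part-00000"), List.of("123"));

        System.out.println("top-level files: " + listTopLevel(table).size()); // prints 0
        System.out.println("recursive files: " + listRecursive(table).size()); // prints 1
    }
}
```

Under this assumption, a reader that only looks at the top level finds no data files and would return no rows, which is consistent with the empty result of the plain select in the report, while the recursive listing finds the file.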

> Hive returns an incorrect result when using a simple select query
> -
>
> Key: HIVE-21367
> URL: https://issues.apache.org/jira/browse/HIVE-21367
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, JDBC, SQL
>Affects Versions: 3.1.0
> Environment:  - HDP 3.1
>   - Hive 3.1.0
>   - Spark 2.3.2
>   - Sqoop 1.4.7
>Reporter: LEMBARKI Mohamed Amine
>Priority: Blocker
>
> Hive returns an incorrect result for a simple select query with a where
> clause, while with an aggregation it returns the correct result.
> The problem arises for tables created by Spark or Sqoop.
> When we use spark-shell with HiveWarehouseConnector, it also returns the
> correct result.
>
> Workflow:
>  - Loading data into Hive with Sqoop
>  - Processing the data with Spark using HiveWarehouseConnector and storing
>    it in Hive
>
> Below is the error log:
>  
>  */-* 
>  *1 - Executing Query : select code from db1.tbl1 where code = '123'*
>  */-*
> {code:java}
> [data@data1 ~]$ hive -e "select code from db1.tbl1 where code = '123'"
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Connecting to 
> jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2
> 19/03/01 10:31:36 [main]: INFO jdbc.HiveConnection: Connected to data2:1
> Connected to: Apache Hive (version 3.1.0.3.1.0.0-78)
> Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> INFO : Compiling 
> command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): 
> select code from db1.tbl1 where code = '123'
> INFO : Semantic Analysis Completed (retrial = false)
> INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:code, 
> type:string, comment:null)], properties:null)
> INFO : Completed compiling 
> command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); 
> Time taken: 0.142 seconds
> INFO : Executing 
> command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): 
> select code from db1.tbl1 where code = '123'
> INFO : Completed executing 
> command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); 
> Time taken: 0.003 seconds
> INFO : OK
> +--+
> | code |
> +--+
> +--+
> No rows selected (4,307 seconds)
> Beeline version 3.1.0.3.1.0.0-78 by Apache Hive
> Closing: 0: 
> jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2
> {code}
> */-*
> *2 - Executing Query using count :* 
>       *select count(code) from db1.tbl1 where code = '123'*
>  */-*
> {code:java}
> [data@data1 ~]$ hive -e "select count(code) from db1.tbl1 where code = '123'"
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Connecting to 
> jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2
> 19/03/01 10:31:56 [main]: INFO jdbc.HiveConnection: Connected to data2:1
> Connected to: Apache Hive (version 3.1.0.3.1.0.0-78)
> Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> INFO 

[jira] [Commented] (HIVE-21367) Hive returns an incorrect result when using a simple select query

2019-03-04 Thread Sofia (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783236#comment-16783236
 ] 

Sofia commented on HIVE-21367:
--

Hi [~starphin], why does Hive behave that way and create subdirectories when
executing a simple select? Is there any workaround for that?

> Hive returns an incorrect result when using a simple select query
> -
>
> Key: HIVE-21367
> URL: https://issues.apache.org/jira/browse/HIVE-21367
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, JDBC, SQL
>Affects Versions: 3.1.0
> Environment:  - HDP 3.1
>   - Hive 3.1.0
>   - Spark 2.3.2
>   - Sqoop 1.4.7
>Reporter: LEMBARKI Mohamed Amine
>Priority: Blocker
>
> Hive returns an incorrect result for a simple select query with a where
> clause, while with an aggregation it returns the correct result.
> The problem arises for tables created by Spark or Sqoop.
> When we use spark-shell with HiveWarehouseConnector, it also returns the
> correct result.
>
> Workflow:
>  - Loading data into Hive with Sqoop
>  - Processing the data with Spark using HiveWarehouseConnector and storing
>    it in Hive
>
> Below is the error log:
>  
>  */-* 
>  *1 - Executing Query : select code from db1.tbl1 where code = '123'*
>  */-*
> {code:java}
> [data@data1 ~]$ hive -e "select code from db1.tbl1 where code = '123'"
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Connecting to 
> jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2
> 19/03/01 10:31:36 [main]: INFO jdbc.HiveConnection: Connected to data2:1
> Connected to: Apache Hive (version 3.1.0.3.1.0.0-78)
> Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> INFO : Compiling 
> command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): 
> select code from db1.tbl1 where code = '123'
> INFO : Semantic Analysis Completed (retrial = false)
> INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:code, 
> type:string, comment:null)], properties:null)
> INFO : Completed compiling 
> command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); 
> Time taken: 0.142 seconds
> INFO : Executing 
> command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): 
> select code from db1.tbl1 where code = '123'
> INFO : Completed executing 
> command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); 
> Time taken: 0.003 seconds
> INFO : OK
> +--+
> | code |
> +--+
> +--+
> No rows selected (4,307 seconds)
> Beeline version 3.1.0.3.1.0.0-78 by Apache Hive
> Closing: 0: 
> jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2
> {code}
> */-*
> *2 - Executing Query using count :* 
>       *select count(code) from db1.tbl1 where code = '123'*
>  */-*
> {code:java}
> [data@data1 ~]$ hive -e "select count(code) from db1.tbl1 where code = '123'"
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Connecting to 
> jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2
> 19/03/01 10:31:56 [main]: INFO jdbc.HiveConnection: Connected to data2:1
> Connected to: Apache Hive (version 3.1.0.3.1.0.0-78)
> Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> INFO : Compiling 
> command(queryId=hive_20190301103149_90aa338b-b99b-4f1c-b7e5-6b285f64cb3e): 
> select count(code) from db1.tbl1 where code = '123'
> INFO : Semantic Analysis Completed (retrial = false)
> 

[jira] [Commented] (HIVE-21367) Hive returns an incorrect result when using a simple select query

2019-03-04 Thread LEMBARKI Mohamed Amine (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783205#comment-16783205
 ] 

LEMBARKI Mohamed Amine commented on HIVE-21367:
---

Hi,

I moved the files from the delta subdirectory up into the tbl1 directory, and the query now returns the correct result!
{code:java}
[hdfs@data1 ~]$ hadoop fs -cp \
    /warehouse/tablespace/managed/hive/db1.db/tbl1/delta_001_001_/* \
    /warehouse/tablespace/managed/hive/db1.db/tbl1/
[hdfs@data1 ~]$ hadoop fs -rm -r \
    /warehouse/tablespace/managed/hive/db1.db/tbl1/delta_001_001_
{code}
So the question now is: how can Hive support subdirectories?

> Hive returns an incorrect result when using a simple select query
> -
>
> Key: HIVE-21367
> URL: https://issues.apache.org/jira/browse/HIVE-21367
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, JDBC, SQL
>Affects Versions: 3.1.0
> Environment:  - HDP 3.1
>   - Hive 3.1.0
>   - Spark 2.3.2
>   - Sqoop 1.4.7
>Reporter: LEMBARKI Mohamed Amine
>Priority: Blocker
>
> Hive returns an incorrect result for a simple select query with a where
> clause, while the same query with an aggregation returns the correct result.
> The problem arises for tables created by Spark or Sqoop.
> Also, when we use spark-shell with HiveWarehouseConnector, it returns the
> correct result.
>
> Workflow:
>      - Loading data into Hive with Sqoop
>      - Processing data with Spark using HiveWarehouseConnector and storing it in Hive
>
> Below is the error log:
>  
>  */-* 
>  *1 - Executing Query : select code from db1.tbl1 where code = '123'*
>  */-*
> {code:java}
> [data@data1 ~]$ hive -e "select code from db1.tbl1 where code = '123'"
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Connecting to 
> jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2
> 19/03/01 10:31:36 [main]: INFO jdbc.HiveConnection: Connected to data2:1
> Connected to: Apache Hive (version 3.1.0.3.1.0.0-78)
> Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> INFO : Compiling 
> command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): 
> select code from db1.tbl1 where code = '123'
> INFO : Semantic Analysis Completed (retrial = false)
> INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:code, 
> type:string, comment:null)], properties:null)
> INFO : Completed compiling 
> command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); 
> Time taken: 0.142 seconds
> INFO : Executing 
> command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): 
> select code from db1.tbl1 where code = '123'
> INFO : Completed executing 
> command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); 
> Time taken: 0.003 seconds
> INFO : OK
> +--+
> | code |
> +--+
> +--+
> No rows selected (4,307 seconds)
> Beeline version 3.1.0.3.1.0.0-78 by Apache Hive
> Closing: 0: 
> jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2
> {code}
> */-*
> *2 - Executing Query using count :* 
>       *select count(code) from db1.tbl1 where code = '123'*
>  */-*
> {code:java}
> [data@data1 ~]$ hive -e "select count(code) from db1.tbl1 where code = '123'"
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Connecting to 
> jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2
> 19/03/01 10:31:56 [main]: INFO jdbc.HiveConnection: Connected to data2:1
> Connected to: Apache Hive (version 

[jira] [Commented] (HIVE-21362) Add an input format and serde to read from protobuf files.

2019-03-04 Thread Harish Jaiprakash (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783184#comment-16783184
 ] 

Harish Jaiprakash commented on HIVE-21362:
--

The test failures are unrelated; the run passed once earlier, and the only changes since then were fixes for
code-style errors. The 'whitespace' errors are in a generated file, and I'm not
sure how to exclude it.

> Add an input format and serde to read from protobuf files.
> --
>
> Key: HIVE-21362
> URL: https://issues.apache.org/jira/browse/HIVE-21362
> Project: Hive
>  Issue Type: Task
>  Components: HiveServer2
>Reporter: Harish Jaiprakash
>Assignee: Harish Jaiprakash
>Priority: Critical
> Attachments: HIVE-21362.01.patch, HIVE-21362.02.patch, 
> HIVE-21362.03.patch, HIVE-21362.04.patch, HIVE-21362.05.patch
>
>
> Logs are being generated using the HiveProtoLoggingHook and tez 
> ProtoHistoryLoggingService. These are sequence files written using 
> ProtobufMessageWritable.
> Implement a SerDe and input format to be able to create tables using these 
> files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21312) FSStatsAggregator::connect is slow

2019-03-04 Thread Zoltan Haindrich (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783163#comment-16783163
 ] 

Zoltan Haindrich commented on HIVE-21312:
-

+1

> FSStatsAggregator::connect is slow
> --
>
> Key: HIVE-21312
> URL: https://issues.apache.org/jira/browse/HIVE-21312
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Trivial
> Attachments: HIVE-21312.1.patch, HIVE-21312.2.patch, 
> HIVE-21312.3.patch, HIVE-21312.4.patch, HIVE-21312.5.patch, HIVE-21312.6.patch
>
>






[jira] [Commented] (HIVE-21362) Add an input format and serde to read from protobuf files.

2019-03-04 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783132#comment-16783132
 ] 

Hive QA commented on HIVE-21362:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12960957/HIVE-21362.05.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15823 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16322/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16322/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16322/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12960957 - PreCommit-HIVE-Build






[jira] [Commented] (HIVE-21367) Hive returns an incorrect result when using a simple select query

2019-03-04 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783114#comment-16783114
 ] 

star commented on HIVE-21367:
-

Alternatively, you can move the files from the subdirectories to the root directory of the table. I suspect
the subdirectories are the cause; Hive does not support subdirectories by default.
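A minimal local-filesystem sketch of the behaviour suspected above: a reader that lists only the immediate children of a table directory finds no data files when the data sits in a subdirectory such as delta_001_001_, while a recursive walk finds them. This uses plain java.nio.file, not HDFS, and is not Hive's own split computation; whether Hive descends into subdirectories depends on the table type and configuration, which this sketch does not model. The class name SubdirListingDemo and the file name part-00000 are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Illustrative only (hypothetical class, not Hive code): contrasts a flat
 * listing of a table directory with a recursive one on the local filesystem.
 */
public class SubdirListingDemo {

    /** Regular files that are immediate children of dir; subdirectories are skipped. */
    static List<Path> listFlat(Path dir) throws IOException {
        try (Stream<Path> s = Files.list(dir)) {
            return s.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
    }

    /** Regular files anywhere below dir, including inside subdirectories. */
    static List<Path> listRecursive(Path dir) throws IOException {
        try (Stream<Path> s = Files.walk(dir)) {
            return s.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path table = Files.createTempDirectory("tbl1");
        // Hypothetical layout: the only data file lives in a delta_ subdirectory.
        Path delta = Files.createDirectory(table.resolve("delta_001_001_"));
        Files.write(delta.resolve("part-00000"), Collections.singletonList("123"));

        System.out.println("flat: " + listFlat(table).size());
        System.out.println("recursive: " + listRecursive(table).size());
    }
}
```

Running main prints `flat: 0` and `recursive: 1`, which mirrors why moving the files up into the table root made the data visible to the simple select.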


[jira] [Commented] (HIVE-21362) Add an input format and serde to read from protobuf files.

2019-03-04 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783096#comment-16783096
 ] 

Hive QA commented on HIVE-21362:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 24s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 21s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 36s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 47s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  1m 21s{color} | {color:blue} standalone-metastore/metastore-server in master has 179 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 27s{color} | {color:blue} contrib in master has 10 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 49s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 1s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 30s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 32s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 19s{color} | {color:red} itests/hive-unit: The patch generated 1 new + 15 unchanged - 0 fixed = 16 total (was 15) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m 1s{color} | {color:red} The patch has 32 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 3s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 15s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 25m 33s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  xml  compile  findbugs  checkstyle  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16322/dev-support/hive-personality.sh |
| git revision | master / f51f108 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-16322/yetus/diff-checkstyle-itests_hive-unit.txt |
| whitespace | http://104.198.109.242/logs//PreCommit-HIVE-Build-16322/yetus/whitespace-eol.txt |
| modules | C: standalone-metastore/metastore-server contrib itests/hive-unit U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-16322/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



