[jira] [Commented] (HIVE-21001) Upgrade to calcite-1.18
[ https://issues.apache.org/jira/browse/HIVE-21001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783739#comment-16783739 ] Hive QA commented on HIVE-21001:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 46s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 19s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 33s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 30s{color} | {color:blue} ql in master has 2251 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 36s{color} | {color:blue} accumulo-handler in master has 21 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 38s{color} | {color:blue} hbase-handler in master has 15 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 9m 48s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 47s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 6s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 50s{color} | {color:red} ql: The patch generated 5 new + 290 unchanged - 29 fixed = 295 total (was 319) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 2m 19s{color} | {color:red} root: The patch generated 5 new + 290 unchanged - 29 fixed = 295 total (was 319) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 4m 37s{color} | {color:red} patch/ql cannot run setBugDatabaseInfo from findbugs {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 18s{color} | {color:red} patch/accumulo-handler cannot run setBugDatabaseInfo from findbugs {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 20s{color} | {color:red} patch/hbase-handler cannot run setBugDatabaseInfo from findbugs {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 18m 43s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 90m 31s{color} | {color:black} {color} |

|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc xml compile findbugs checkstyle |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16326/dev-support/hive-personality.sh |
| git revision | master / f51f108 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-16326/yetus/diff-checkstyle-ql.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-16326/yetus/diff-checkstyle-root.txt |
| whitespace | http://104.198.109.242/logs//PreCommit-HIVE-Build-16326/yetus/whitespace-eol.txt |
| findbugs | http://104.198.109.242/logs//PreCommit-HIVE-Build-16326/yetus/patch-findbugs-ql.txt |
| findbugs |
[jira] [Updated] (HIVE-21376) Incompatible change in Hive bucket computation
[ https://issues.apache.org/jira/browse/HIVE-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-21376: --- Target Version/s: 3.0.1, 4.0.0, 3.2.0, 3.1.2 > Incompatible change in Hive bucket computation > -- > > Key: HIVE-21376 > URL: https://issues.apache.org/jira/browse/HIVE-21376 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: David Phillips >Assignee: Jesus Camacho Rodriguez >Priority: Major > Attachments: HIVE-21376.patch > > > HIVE-20007 seems to have inadvertently changed the bucket hash code > computation via {{ObjectInspectorUtils.getBucketHashCodeOld()}} for the > {{DATE}} and {{TIMESTAMP}} data types. > {{DATE}} was previously computed using {{DateWritable}}, which uses > {{daysSinceEpoch}} as the hash code. It is now computed using > {{DateWritableV2}}, which uses the hash code of {{java.time.LocalDate}} > (which is not days since epoch). > {{TIMESTAMP}} was previously computed using {{TimestampWritable}} and now uses > {{TimestampWritableV2}}. They ostensibly use the same hash code computation, > but there are two important differences: > # {{TimestampWritable}} rounds the number of milliseconds into the seconds > portion of the computation, but {{TimestampWritableV2}} does not. > # {{TimestampWritable}} gets the epoch time from {{java.sql.Timestamp}}, > which returns it relative to the JVM time zone, not UTC. > {{TimestampWritableV2}} uses a {{LocalDateTime}} relative to UTC. > I was unable to get Hive 3.1 running in order to verify whether this actually > causes data to be read or written incorrectly (there may be code above this > library method which makes things work correctly). However, if my > understanding is correct, this means Hive 3.1 is both forwards and backwards > incompatible with bucketed tables using either of these data types. It also > indicates that Hive needs tests to verify that the hash code does not change > between releases. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
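The DATE part of the change above can be illustrated with plain JDK types (a sketch, not Hive's actual writables): the old DateWritable hashed the raw days-since-epoch count, while DateWritableV2 uses java.time.LocalDate's hashCode, which mixes year, month, and day bits into a different integer.

```java
import java.time.LocalDate;

public class DateBucketHashDemo {
    public static void main(String[] args) {
        LocalDate d = LocalDate.of(2019, 3, 4);

        // Old DateWritable basis: days since the epoch.
        long daysSinceEpoch = d.toEpochDay();   // 17959

        // New DateWritableV2 basis: LocalDate#hashCode, which combines
        // year/month/day bits rather than counting days.
        int localDateHash = d.hashCode();

        System.out.println(daysSinceEpoch + " vs " + localDateHash);
        // The two values differ, so a row bucketed under the old scheme can
        // land in a different bucket under the new one.
    }
}
```

Any pair of writers and readers that disagree on which basis to hash will disagree on bucket assignment, which is exactly the forward/backward incompatibility described above.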
[jira] [Updated] (HIVE-21279) Avoid moving/rename operation in FileSink op for SELECT queries
[ https://issues.apache.org/jira/browse/HIVE-21279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-21279: --- Status: Open (was: Patch Available) > Avoid moving/rename operation in FileSink op for SELECT queries > --- > > Key: HIVE-21279 > URL: https://issues.apache.org/jira/browse/HIVE-21279 > Project: Hive > Issue Type: Improvement > Components: Query Planning >Reporter: Vineet Garg >Assignee: Vineet Garg >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-21279.1.patch, HIVE-21279.10.patch, > HIVE-21279.11.patch, HIVE-21279.12.patch, HIVE-21279.13.patch, > HIVE-21279.2.patch, HIVE-21279.3.patch, HIVE-21279.4.patch, > HIVE-21279.5.patch, HIVE-21279.6.patch, HIVE-21279.7.patch, > HIVE-21279.8.patch, HIVE-21279.9.patch > > > Currently, at the end of a job, the FileSink operator moves/renames the temp directory > to another directory from which FetchTask fetches results. This is done to > avoid fetching potentially partial/invalid files written by failed/runaway tasks. This > operation is expensive on cloud storage. It could be avoided if FetchTask were > passed a set of files to read instead of the whole directory. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
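The idea in the description can be sketched with hypothetical names (this is not Hive's actual API): instead of renaming the temp directory and listing it, the committing task writes a manifest of the files it produced, and the fetch side reads exactly those paths, so partial files from failed or runaway tasks are never picked up.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ManifestFetch {
    // Returns the committed result files listed in a manifest, instead of
    // renaming and then scanning a whole output directory.
    public static List<Path> filesToFetch(Path manifest) throws IOException {
        List<Path> files = new ArrayList<>();
        for (String line : Files.readAllLines(manifest)) {
            if (!line.trim().isEmpty()) {
                files.add(Paths.get(line.trim())); // only files the job committed
            }
        }
        return files;
    }
}
```

The rename-based scheme is cheap on HDFS (a metadata operation) but on object stores a "rename" is a copy plus delete per file, which is why passing an explicit file list is attractive there.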
[jira] [Updated] (HIVE-21279) Avoid moving/rename operation in FileSink op for SELECT queries
[ https://issues.apache.org/jira/browse/HIVE-21279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-21279: --- Status: Patch Available (was: Open) > Avoid moving/rename operation in FileSink op for SELECT queries > --- > > Key: HIVE-21279 > URL: https://issues.apache.org/jira/browse/HIVE-21279 > Project: Hive > Issue Type: Improvement > Components: Query Planning >Reporter: Vineet Garg >Assignee: Vineet Garg >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-21279.1.patch, HIVE-21279.10.patch, > HIVE-21279.11.patch, HIVE-21279.12.patch, HIVE-21279.13.patch, > HIVE-21279.2.patch, HIVE-21279.3.patch, HIVE-21279.4.patch, > HIVE-21279.5.patch, HIVE-21279.6.patch, HIVE-21279.7.patch, > HIVE-21279.8.patch, HIVE-21279.9.patch > > > Currently, at the end of a job, the FileSink operator moves/renames the temp directory > to another directory from which FetchTask fetches results. This is done to > avoid fetching potentially partial/invalid files written by failed/runaway tasks. This > operation is expensive on cloud storage. It could be avoided if FetchTask were > passed a set of files to read instead of the whole directory. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21279) Avoid moving/rename operation in FileSink op for SELECT queries
[ https://issues.apache.org/jira/browse/HIVE-21279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-21279: --- Attachment: HIVE-21279.13.patch > Avoid moving/rename operation in FileSink op for SELECT queries > --- > > Key: HIVE-21279 > URL: https://issues.apache.org/jira/browse/HIVE-21279 > Project: Hive > Issue Type: Improvement > Components: Query Planning >Reporter: Vineet Garg >Assignee: Vineet Garg >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-21279.1.patch, HIVE-21279.10.patch, > HIVE-21279.11.patch, HIVE-21279.12.patch, HIVE-21279.13.patch, > HIVE-21279.2.patch, HIVE-21279.3.patch, HIVE-21279.4.patch, > HIVE-21279.5.patch, HIVE-21279.6.patch, HIVE-21279.7.patch, > HIVE-21279.8.patch, HIVE-21279.9.patch > > > Currently, at the end of a job, the FileSink operator moves/renames the temp directory > to another directory from which FetchTask fetches results. This is done to > avoid fetching potentially partial/invalid files written by failed/runaway tasks. This > operation is expensive on cloud storage. It could be avoided if FetchTask were > passed a set of files to read instead of the whole directory. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21001) Upgrade to calcite-1.18
[ https://issues.apache.org/jira/browse/HIVE-21001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783718#comment-16783718 ] Hive QA commented on HIVE-21001:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12961018/HIVE-21001.43.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 15789 tests executed

*Failed tests:*
{noformat}
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=171)
  [authorization_view_8.q,load_dyn_part5.q,vector_groupby_grouping_sets5.q,vector_complex_join.q,orc_llap.q,vectorization_7.q,cbo_gby.q,bucket_num_reducers_acid2.q,auto_sortmerge_join_1.q,results_cache_empty_result.q,lineage3.q,materialized_view_rewrite_empty.q,q93_with_constraints.q,vector_struct_in.q,bucketmapjoin3.q,vectorization_16.q,current_date_timestamp.q,orc_ppd_schema_evol_2a.q,partition_ctas.q,vector_windowing_multipartitioning.q,vectorized_join46.q,orc_ppd_date.q,create_merge_compressed.q,vector_outer_join1.q,dynpart_sort_optimization_acid.q,vectorization_not.q,having.q,vectorization_input_format_excludes.q,leftsemijoin.q,special_character_in_tabnames_1.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ambiguitycheck] (batchId=79)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constant_prop_3] (batchId=47)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[pcs] (batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[rand_partitionpruner3] (batchId=86)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_date_1] (batchId=23)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[semijoin6] (batchId=182)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_date_1] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_interval_2] (batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16326/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16326/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16326/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12961018 - PreCommit-HIVE-Build

> Upgrade to calcite-1.18 > --- > > Key: HIVE-21001 > URL: https://issues.apache.org/jira/browse/HIVE-21001 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Attachments: HIVE-21001.01.patch, HIVE-21001.01.patch, > HIVE-21001.02.patch, HIVE-21001.03.patch, HIVE-21001.04.patch, > HIVE-21001.05.patch, HIVE-21001.06.patch, HIVE-21001.06.patch, > HIVE-21001.07.patch, HIVE-21001.08.patch, HIVE-21001.08.patch, > HIVE-21001.08.patch, HIVE-21001.09.patch, HIVE-21001.09.patch, > HIVE-21001.09.patch, HIVE-21001.10.patch, HIVE-21001.11.patch, > HIVE-21001.12.patch, HIVE-21001.13.patch, HIVE-21001.15.patch, > HIVE-21001.16.patch, HIVE-21001.17.patch, HIVE-21001.18.patch, > HIVE-21001.18.patch, HIVE-21001.19.patch, HIVE-21001.20.patch, > HIVE-21001.21.patch, HIVE-21001.22.patch, HIVE-21001.22.patch, > HIVE-21001.22.patch, HIVE-21001.23.patch, HIVE-21001.24.patch, > HIVE-21001.26.patch, HIVE-21001.26.patch, HIVE-21001.26.patch, > HIVE-21001.26.patch, HIVE-21001.26.patch, HIVE-21001.27.patch, > HIVE-21001.28.patch, HIVE-21001.29.patch, HIVE-21001.29.patch, > HIVE-21001.30.patch, HIVE-21001.31.patch, HIVE-21001.32.patch, > HIVE-21001.34.patch, HIVE-21001.35.patch, HIVE-21001.36.patch, > HIVE-21001.37.patch, HIVE-21001.38.patch, HIVE-21001.39.patch, > HIVE-21001.40.patch, HIVE-21001.41.patch, HIVE-21001.42.patch, > HIVE-21001.43.patch > > > XLEAR LIBRARY CACHE -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HIVE-21344) CBO: Materialized view registry is not used for Calcite planner
[ https://issues.apache.org/jira/browse/HIVE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez reassigned HIVE-21344: -- Assignee: Jesus Camacho Rodriguez > CBO: Materialized view registry is not used for Calcite planner > --- > > Key: HIVE-21344 > URL: https://issues.apache.org/jira/browse/HIVE-21344 > Project: Hive > Issue Type: Bug > Components: Materialized views >Affects Versions: 4.0.0 >Reporter: Gopal V >Assignee: Jesus Camacho Rodriguez >Priority: Major > Attachments: calcite-planner-after-fix.svg.zip, mv-get-from-remote.png > > > {code} > // This is not a rebuild, we retrieve all the materializations. In turn, we > do not need > // to force the materialization contents to be up-to-date, as this > is not a rebuild, and > // we apply the user parameters > (HIVE_MATERIALIZED_VIEW_REWRITING_TIME_WINDOW) instead. > materializations = > db.getAllValidMaterializedViews(getTablesUsed(basePlan), false, getTxnMgr()); > } > {code} > !mv-get-from-remote.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21376) Incompatible change in Hive bucket computation
[ https://issues.apache.org/jira/browse/HIVE-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783706#comment-16783706 ] David Phillips commented on HIVE-21376: --- I believe that v2 will have a similar incompatible change between 3.0 and 3.1 for {{TIMESTAMP}}, due to the time value coming from {{java.sql.Timestamp}} changing from local to UTC. > Incompatible change in Hive bucket computation > -- > > Key: HIVE-21376 > URL: https://issues.apache.org/jira/browse/HIVE-21376 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: David Phillips >Assignee: Jesus Camacho Rodriguez >Priority: Major > Attachments: HIVE-21376.patch > > > HIVE-20007 seems to have inadvertently changed the bucket hash code > computation via {{ObjectInspectorUtils.getBucketHashCodeOld()}} for the > {{DATE}} and {{TIMESTAMP}} data types. > {{DATE}} was previously computed using {{DateWritable}}, which uses > {{daysSinceEpoch}} as the hash code. It is now computed using > {{DateWritableV2}}, which uses the hash code of {{java.time.LocalDate}} > (which is not days since epoch). > {{TIMESTAMP}} was previously computed using {{TimestampWritable}} and now uses > {{TimestampWritableV2}}. They ostensibly use the same hash code computation, > but there are two important differences: > # {{TimestampWritable}} rounds the number of milliseconds into the seconds > portion of the computation, but {{TimestampWritableV2}} does not. > # {{TimestampWritable}} gets the epoch time from {{java.sql.Timestamp}}, > which returns it relative to the JVM time zone, not UTC. > {{TimestampWritableV2}} uses a {{LocalDateTime}} relative to UTC. > I was unable to get Hive 3.1 running in order to verify whether this actually > causes data to be read or written incorrectly (there may be code above this > library method which makes things work correctly). However, if my > understanding is correct, this means Hive 3.1 is both forwards and backwards > incompatible with bucketed tables using either of these data types. It also > indicates that Hive needs tests to verify that the hash code does not change > between releases. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
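The TIMESTAMP time-zone difference mentioned above can be reproduced with plain JDK classes (a sketch; Hive's writables add more logic on top): java.sql.Timestamp interprets a wall-clock value in the JVM's default zone, while taking the same LocalDateTime at UTC yields a different epoch value.

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class TimestampEpochDemo {
    public static void main(String[] args) {
        LocalDateTime wallClock = LocalDateTime.of(2019, 3, 4, 12, 0, 0);

        // Old TimestampWritable path: java.sql.Timestamp interprets the
        // wall-clock value in the JVM's default time zone.
        long oldSeconds = Timestamp.valueOf(wallClock).getTime() / 1000;

        // New TimestampWritableV2 path: the same wall-clock value taken as UTC.
        long newSeconds = wallClock.toEpochSecond(ZoneOffset.UTC);

        // Unless the JVM runs in UTC, the two differ by the zone offset,
        // so any bucket hash derived from the seconds value differs as well.
        System.out.println(newSeconds - oldSeconds);
    }
}
```

On a UTC JVM the difference is 0, which is one reason such a change can slip through testing while breaking clusters in other time zones.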
[jira] [Updated] (HIVE-21377) Using Oracle as HMS DB with DirectSQL
[ https://issues.apache.org/jira/browse/HIVE-21377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajkumar Singh updated HIVE-21377: -- Attachment: HIVE-21377.patch Status: Patch Available (was: In Progress) > Using Oracle as HMS DB with DirectSQL > - > > Key: HIVE-21377 > URL: https://issues.apache.org/jira/browse/HIVE-21377 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 3.1.0, 3.0.0 >Reporter: Bo >Assignee: Rajkumar Singh >Priority: Major > Attachments: HIVE-21377.patch > > > When we use Oracle as the HMS DB, we see this kind of content in the HMS log: > {code:java} > 2019-02-02 T08:23:57,102 WARN [Thread-12]: metastore.ObjectStore > (ObjectStore.java:handleDirectSqlError(3741)) - Falling back to ORM path due > to direct SQL failure (this is not an error): Cannot extract boolean from > column value 0 at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.extractSqlBoolean(MetaStoreDirectSql.java:1031) > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsFromPartitionIds(MetaStoreDirectSql.java:728) > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.access$300(MetaStoreDirectSql.java:109) > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql$1.run(MetaStoreDirectSql.java:471) > at org.apache.hadoop.hive.metastore.Batchable.runBatched(Batchable.java:73) > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:462) > at > org.apache.hadoop.hive.metastore.ObjectStore$8.getSqlResult(ObjectStore.java:3392) > {code} > In Hive, extractSqlBoolean handles Postgres, MySQL, and Derby, but Oracle returns 0 or 1 > for booleans, so we need to modify MetastoreDirectSqlUtils.java - [1]. > Could we add this snippet to the code? 
> {code:java}
> static Boolean extractSqlBoolean(Object value) throws MetaException {
>   if (value == null) {
>     return null;
>   }
>   if (value instanceof Boolean) {
>     return (Boolean) value;
>   }
>   if (value instanceof Number) { // added: Oracle returns NUMBER 0/1 for booleans
>     try {
>       return BooleanUtils.toBooleanObject(((Number) value).intValue(), 1, 0, null);
>     } catch (IllegalArgumentException iae) {
>       // NOOP
>     }
>   }
>   if (value instanceof String) {
>     try {
>       return BooleanUtils.toBooleanObject((String) value, "Y", "N", null);
>     } catch (IllegalArgumentException iae) {
>       // NOOP
>     }
>   }
>   throw new MetaException("Cannot extract boolean from column value " + value);
> }
> {code}
> [1] - https://github.com/apache/hive/blob/f51f108b761f0c88647f48f30447dae12b308f31/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreDirectSqlUtils.java#L501-L527
>
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
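A JDK-only sketch of the proposed handling (the real patch would use Commons Lang's BooleanUtils and throw MetaException; the names below are simplified stand-ins): Oracle's JDBC driver returns NUMBER columns as java.math.BigDecimal, so a Number branch mapping 1/0 covers the case that currently forces the ORM fallback.

```java
import java.math.BigDecimal;

public class OracleBooleanDemo {
    // Simplified stand-in for MetastoreDirectSqlUtils.extractSqlBoolean:
    // accepts Boolean directly, Oracle-style NUMBER 1/0, and "Y"/"N" strings.
    public static Boolean extractSqlBoolean(Object value) {
        if (value == null) {
            return null;
        }
        if (value instanceof Boolean) {
            return (Boolean) value;
        }
        if (value instanceof Number) {          // Oracle NUMBER(1): 1 = true, 0 = false
            int i = ((Number) value).intValue();
            if (i == 1) return Boolean.TRUE;
            if (i == 0) return Boolean.FALSE;
        }
        if (value instanceof String) {
            if ("Y".equalsIgnoreCase((String) value)) return Boolean.TRUE;
            if ("N".equalsIgnoreCase((String) value)) return Boolean.FALSE;
        }
        throw new IllegalArgumentException("Cannot extract boolean from column value " + value);
    }

    public static void main(String[] args) {
        System.out.println(extractSqlBoolean(new BigDecimal("1"))); // true
        System.out.println(extractSqlBoolean("N"));                 // false
    }
}
```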
[jira] [Resolved] (HIVE-21343) CBO: CalcitePlanner debug logging is expensive and costly
[ https://issues.apache.org/jira/browse/HIVE-21343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez resolved HIVE-21343. Resolution: Fixed Fix Version/s: 4.0.0 Fixed as part of HIVE-18920 . > CBO: CalcitePlanner debug logging is expensive and costly > - > > Key: HIVE-21343 > URL: https://issues.apache.org/jira/browse/HIVE-21343 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 4.0.0 >Reporter: Gopal V >Assignee: Jesus Camacho Rodriguez >Priority: Major > Fix For: 4.0.0 > > Attachments: Reloptutil-toString.png, > calcite-planner-after-fix.svg.zip > > > {code} > //Remove subquery > LOG.debug("Plan before removing subquery:\n" + > RelOptUtil.toString(calciteGenPlan)); > calciteGenPlan = hepPlan(calciteGenPlan, false, > mdProvider.getMetadataProvider(), null, > new HiveSubQueryRemoveRule(conf)); > LOG.debug("Plan just after removing subquery:\n" + > RelOptUtil.toString(calciteGenPlan)); > calciteGenPlan = HiveRelDecorrelator.decorrelateQuery(calciteGenPlan); > LOG.debug("Plan after decorrelation:\n" + > RelOptUtil.toString(calciteGenPlan)); > {code} > The LOG.debug() consumes more CPU than the actual planner steps. > !Reloptutil-toString.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
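The cost pattern above, building a large plan string even when debug output is discarded, and the standard guard against it can be sketched with java.util.logging (Hive itself logs through SLF4J, so this is illustrative only):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LazyDebugDemo {
    private static final Logger LOG = Logger.getLogger(LazyDebugDemo.class.getName());

    // Stand-in for RelOptUtil.toString(plan): expensive to compute.
    static String dumpPlan() {
        return "very large plan dump...";
    }

    public static void main(String[] args) {
        // Eager form: dumpPlan() and the string concatenation run even when
        // FINE (debug) logging is disabled.
        // LOG.fine("Plan:\n" + dumpPlan());

        // Lazy form: the supplier is evaluated only if the level is enabled,
        // so a disabled debug level costs almost nothing.
        LOG.log(Level.FINE, () -> "Plan:\n" + dumpPlan());
    }
}
```

With SLF4J the equivalent guard is `if (LOG.isDebugEnabled())` around the call, which is presumably what removing the hot-path dumps amounts to.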
[jira] [Updated] (HIVE-18920) CBO: Initialize the Janino providers ahead of 1st query
[ https://issues.apache.org/jira/browse/HIVE-18920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-18920: --- Resolution: Fixed Fix Version/s: 4.0.0 Status: Resolved (was: Patch Available) Pushed to master, thanks [~ashutoshc] > CBO: Initialize the Janino providers ahead of 1st query > --- > > Key: HIVE-18920 > URL: https://issues.apache.org/jira/browse/HIVE-18920 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Jesus Camacho Rodriguez >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-18920.01.patch, HIVE-18920.02.patch, > HIVE-18920.patch > > > Hive Calcite metadata providers are compiled when the 1st query comes in. > If a second query arrives before the 1st one has built a metadata provider, > it will also try to do the same thing, because the cache is not populated yet. > With 1024 concurrent users, it takes 6 minutes for the 1st query to finish > fighting all the other queries which are trying to load that cache. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
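The race described above — every concurrent first query compiling its own copy of the provider because the cache is still empty — and the ahead-of-time fix can be sketched generically (hypothetical names, not Hive's actual classes):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ProviderCache {
    private final ConcurrentMap<String, Object> cache = new ConcurrentHashMap<>();
    final AtomicInteger builds = new AtomicInteger();  // counts expensive compilations

    // computeIfAbsent guarantees a single build per key: concurrent callers
    // block on that entry instead of each compiling a duplicate provider.
    public Object get(String key) {
        return cache.computeIfAbsent(key, k -> {
            builds.incrementAndGet();
            return new Object();  // stand-in for a Janino-compiled metadata provider
        });
    }

    // Eager warm-up at startup, so the first real query never pays the cost.
    public void warmUp() {
        get("default-metadata-provider");
    }
}
```

Warm-up plus a compute-once cache attacks both halves of the problem: no query pays the compilation cost, and no two threads ever build the same provider twice.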
[jira] [Work started] (HIVE-21377) Using Oracle as HMS DB with DirectSQL
[ https://issues.apache.org/jira/browse/HIVE-21377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-21377 started by Rajkumar Singh. - > Using Oracle as HMS DB with DirectSQL > - > > Key: HIVE-21377 > URL: https://issues.apache.org/jira/browse/HIVE-21377 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 3.0.0, 3.1.0 >Reporter: Bo >Assignee: Rajkumar Singh >Priority: Major > > When we use Oracle as the HMS DB, we see this kind of content in the HMS log: > {code:java} > 2019-02-02 T08:23:57,102 WARN [Thread-12]: metastore.ObjectStore > (ObjectStore.java:handleDirectSqlError(3741)) - Falling back to ORM path due > to direct SQL failure (this is not an error): Cannot extract boolean from > column value 0 at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.extractSqlBoolean(MetaStoreDirectSql.java:1031) > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsFromPartitionIds(MetaStoreDirectSql.java:728) > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.access$300(MetaStoreDirectSql.java:109) > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql$1.run(MetaStoreDirectSql.java:471) > at org.apache.hadoop.hive.metastore.Batchable.runBatched(Batchable.java:73) > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:462) > at > org.apache.hadoop.hive.metastore.ObjectStore$8.getSqlResult(ObjectStore.java:3392) > {code} > In Hive, extractSqlBoolean handles Postgres, MySQL, and Derby, but Oracle returns 0 or 1 > for booleans, so we need to modify MetastoreDirectSqlUtils.java - [1]. > Could we add this snippet to the code? 
> {code:java}
> static Boolean extractSqlBoolean(Object value) throws MetaException {
>   if (value == null) {
>     return null;
>   }
>   if (value instanceof Boolean) {
>     return (Boolean) value;
>   }
>   if (value instanceof Number) { // added: Oracle returns NUMBER 0/1 for booleans
>     try {
>       return BooleanUtils.toBooleanObject(((Number) value).intValue(), 1, 0, null);
>     } catch (IllegalArgumentException iae) {
>       // NOOP
>     }
>   }
>   if (value instanceof String) {
>     try {
>       return BooleanUtils.toBooleanObject((String) value, "Y", "N", null);
>     } catch (IllegalArgumentException iae) {
>       // NOOP
>     }
>   }
>   throw new MetaException("Cannot extract boolean from column value " + value);
> }
> {code}
> [1] - https://github.com/apache/hive/blob/f51f108b761f0c88647f48f30447dae12b308f31/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreDirectSqlUtils.java#L501-L527
>
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18920) CBO: Initialize the Janino providers ahead of 1st query
[ https://issues.apache.org/jira/browse/HIVE-18920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783693#comment-16783693 ] Ashutosh Chauhan commented on HIVE-18920: - +1 > CBO: Initialize the Janino providers ahead of 1st query > --- > > Key: HIVE-18920 > URL: https://issues.apache.org/jira/browse/HIVE-18920 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Jesus Camacho Rodriguez >Priority: Major > Attachments: HIVE-18920.01.patch, HIVE-18920.02.patch, > HIVE-18920.patch > > > Hive Calcite metadata providers are compiled when the 1st query comes in. > If a second query arrives before the 1st one has built a metadata provider, > it will also try to do the same thing, because the cache is not populated yet. > With 1024 concurrent users, it takes 6 minutes for the 1st query to finish > fighting all the other queries which are trying to load that cache. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HIVE-21377) Using Oracle as HMS DB with DirectSQL
[ https://issues.apache.org/jira/browse/HIVE-21377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajkumar Singh reassigned HIVE-21377: - Assignee: Rajkumar Singh > Using Oracle as HMS DB with DirectSQL > - > > Key: HIVE-21377 > URL: https://issues.apache.org/jira/browse/HIVE-21377 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 3.0.0, 3.1.0 >Reporter: Bo >Assignee: Rajkumar Singh >Priority: Major > > When we use Oracle as the HMS DB, we see this kind of content in the HMS log: > {code:java} > 2019-02-02 T08:23:57,102 WARN [Thread-12]: metastore.ObjectStore > (ObjectStore.java:handleDirectSqlError(3741)) - Falling back to ORM path due > to direct SQL failure (this is not an error): Cannot extract boolean from > column value 0 at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.extractSqlBoolean(MetaStoreDirectSql.java:1031) > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsFromPartitionIds(MetaStoreDirectSql.java:728) > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.access$300(MetaStoreDirectSql.java:109) > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql$1.run(MetaStoreDirectSql.java:471) > at org.apache.hadoop.hive.metastore.Batchable.runBatched(Batchable.java:73) > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:462) > at > org.apache.hadoop.hive.metastore.ObjectStore$8.getSqlResult(ObjectStore.java:3392) > {code} > In Hive, extractSqlBoolean handles Postgres, MySQL, and Derby, but Oracle returns 0 or 1 > for booleans, so we need to modify MetastoreDirectSqlUtils.java - [1]. > Could we add this snippet to the code? 
> {code:java}
> static Boolean extractSqlBoolean(Object value) throws MetaException {
>   if (value == null) {
>     return null;
>   }
>   if (value instanceof Boolean) {
>     return (Boolean) value;
>   }
>   if (value instanceof Number) { // added: Oracle returns NUMBER 0/1 for booleans
>     try {
>       return BooleanUtils.toBooleanObject(((Number) value).intValue(), 1, 0, null);
>     } catch (IllegalArgumentException iae) {
>       // NOOP
>     }
>   }
>   if (value instanceof String) {
>     try {
>       return BooleanUtils.toBooleanObject((String) value, "Y", "N", null);
>     } catch (IllegalArgumentException iae) {
>       // NOOP
>     }
>   }
>   throw new MetaException("Cannot extract boolean from column value " + value);
> }
> {code}
> [1] - https://github.com/apache/hive/blob/f51f108b761f0c88647f48f30447dae12b308f31/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreDirectSqlUtils.java#L501-L527
>
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18920) CBO: Initialize the Janino providers ahead of 1st query
[ https://issues.apache.org/jira/browse/HIVE-18920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783687#comment-16783687 ] Jesus Camacho Rodriguez commented on HIVE-18920: [~ashutoshc], [~gopalv] has confirmed this patch fixes the issue with the recompilation. Could you review it? Thanks > CBO: Initialize the Janino providers ahead of 1st query > --- > > Key: HIVE-18920 > URL: https://issues.apache.org/jira/browse/HIVE-18920 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Jesus Camacho Rodriguez >Priority: Major > Attachments: HIVE-18920.01.patch, HIVE-18920.02.patch, > HIVE-18920.patch > > > Hive Calcite metadata providers are compiled when the 1st query comes in. > If a second query arrives before the 1st one has built a metadata provider, > it will also try to do the same thing, because the cache is not populated yet. > With 1024 concurrent users, it takes 6 minutes for the 1st query to finish > fighting all the other queries which are trying to load that cache. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.
[ https://issues.apache.org/jira/browse/HIVE-21286?focusedWorklogId=207339=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-207339 ] ASF GitHub Bot logged work on HIVE-21286:

Author: ASF GitHub Bot
Created on: 04/Mar/19 18:20
Start Date: 04/Mar/19 18:20
Worklog Time Spent: 10m
Work Description: sankarh commented on pull request #551: HIVE-21286: Hive should support clean-up of previously bootstrapped tables when retry from different dump. URL: https://github.com/apache/hive/pull/551#discussion_r262182774

## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java ##

@@ -534,6 +536,90 @@ public void bootstrapExternalTablesDuringIncrementalPhase() throws Throwable {
         .verifyResults(Arrays.asList("10", "20"));
   }
+  @Test
+  public void retryBootstrapExternalTablesFromDifferentDump() throws Throwable {
+    List<String> loadWithClause = new ArrayList<>();
+    loadWithClause.addAll(externalTableBasePathWithClause());
+
+    List<String> dumpWithClause = Collections.singletonList(
+        "'" + HiveConf.ConfVars.REPL_INCLUDE_EXTERNAL_TABLES.varname + "'='false'"
+    );
+
+    WarehouseInstance.Tuple tupleBootstrapWithoutExternal = primary
+        .run("use " + primaryDbName)
+        .run("create external table t1 (id int)")
+        .run("insert into table t1 values (1)")
+        .run("create external table t2 (place string) partitioned by (country string)")
+        .run("insert into table t2 partition(country='india') values ('bangalore')")
+        .run("insert into table t2 partition(country='us') values ('austin')")
+        .run("create table t3 as select * from t1")
+        .dump(primaryDbName, null, dumpWithClause);
+
+    replica.load(replicatedDbName, tupleBootstrapWithoutExternal.dumpLocation, loadWithClause)
+        .status(replicatedDbName)
+        .verifyResult(tupleBootstrapWithoutExternal.lastReplicationId)
+        .run("use " + replicatedDbName)
+        .run("show tables")
+        .verifyResult("t3")
+        .run("select id from t3")
+        .verifyResult("1");
+
+    dumpWithClause = Arrays.asList("'" + HiveConf.ConfVars.REPL_INCLUDE_EXTERNAL_TABLES.varname + "'='true'",
+        "'" + HiveConf.ConfVars.REPL_BOOTSTRAP_EXTERNAL_TABLES.varname + "'='true'");
+    WarehouseInstance.Tuple tupleIncWithExternalBootstrap = primary.run("use " + primaryDbName)
+        .run("drop table t1")
+        .run("create external table t4 (id int)")
+        .run("insert into table t4 values (10)")
+        .run("create table t5 as select * from t4")
+        .dump(primaryDbName, tupleBootstrapWithoutExternal.lastReplicationId, dumpWithClause);
+
+    // Verify if bootstrapping with same dump is idempotent and return same result
+    for (int i = 0; i < 2; i++) {
+      replica.load(replicatedDbName, tupleIncWithExternalBootstrap.dumpLocation, loadWithClause)
+          .status(replicatedDbName)
+          .verifyResult(tupleIncWithExternalBootstrap.lastReplicationId)
+          .run("use " + replicatedDbName)
+          .run("show tables like 't1'")
+          .verifyFailure(new String[]{"t1"})
+          .run("select place from t2 where country = 'us'")
+          .verifyResult("austin")
+          .run("select id from t4")
+          .verifyResult("10")
+          .run("select id from t5")
+          .verifyResult("10");
+    }
+
+    // Drop an external table, add another managed table with same name, insert into existing external table
+    // and dump another bootstrap dump for external tables.
+    WarehouseInstance.Tuple tupleNewIncWithExternalBootstrap = primary.run("use " + primaryDbName)
+        .run("insert into table t2 partition(country='india') values ('chennai')")
+        .run("drop table t2")
+        .run("create table t2 as select * from t4")
+        .run("insert into table t4 values (20)")
+        .dump(primaryDbName, tupleIncWithExternalBootstrap.lastReplicationId, dumpWithClause);
+
+    // Set previous dump as bootstrap to be rolled-back. Now, new bootstrap should overwrite the old one.
+    loadWithClause.add("'" + REPL_ROLLBACK_BOOTSTRAP_LOAD_CONFIG + "'='"
+        + tupleIncWithExternalBootstrap.dumpLocation + "'");
+    replica.load(replicatedDbName, tupleNewIncWithExternalBootstrap.dumpLocation, loadWithClause)
+        .run("use " + replicatedDbName)
+        .run("show tables like 't1'")
+        .verifyFailure(new String[]{"t1"})
+        .run("select id from t2")
+        .verifyResult("10")
+        .run("select id from t4")
+        .verifyResults(Arrays.asList("10", "20"))
+        .run("select id from t5")
+        .verifyResult("10");
[jira] [Commented] (HIVE-1555) JDBC Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16783654#comment-16783654 ] Ruslan Dautkhanov commented on HIVE-1555: - Is it possible to store the password in a Hadoop credential store? The metadata is visible to all users. Alternatively, Hive should always return the password field redacted for commands like `show create table`. > JDBC Storage Handler > > > Key: HIVE-1555 > URL: https://issues.apache.org/jira/browse/HIVE-1555 > Project: Hive > Issue Type: New Feature > Components: JDBC >Reporter: Bob Robertson >Assignee: Gunther Hagleitner >Priority: Major > Labels: TODOC2.2 > Fix For: 2.3.0 > > Attachments: HIVE-1555.7.patch, HIVE-1555.8.patch, HIVE-1555.9.patch, > JDBCStorageHandler Design Doc.pdf > > Original Estimate: 24h > Remaining Estimate: 24h > > With the Cassandra and HBase Storage Handlers I thought it would make sense > to include a generic JDBC RDBMS Storage Handler so that you could import a > standard DB table into Hive. Many people must want to perform HiveQL joins, > etc against tables in other systems etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.
[ https://issues.apache.org/jira/browse/HIVE-21286?focusedWorklogId=207353=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-207353 ] ASF GitHub Bot logged work on HIVE-21286: - Author: ASF GitHub Bot Created on: 04/Mar/19 18:30 Start Date: 04/Mar/19 18:30 Worklog Time Spent: 10m Work Description: sankarh commented on pull request #551: HIVE-21286: Hive should support clean-up of previously bootstrapped tables when retry from different dump. URL: https://github.com/apache/hive/pull/551#discussion_r262186354 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java ## @@ -279,6 +292,72 @@ a database ( directory ) return 0; } + /** + * Cleanup/drop tables from the given database which are bootstrapped by input dump dir. + * @throws HiveException Failed to drop the tables. + * @throws IOException File operations failure. + * @throws InvalidInputException Invalid input dump directory. + */ + private void bootstrapRollbackTask() throws HiveException, IOException, InvalidInputException { +Path bootstrapDirectory = new PathBuilder(work.bootstrapDumpToRollback) +.addDescendant(ReplUtils.INC_BOOTSTRAP_ROOT_DIR_NAME).build(); +FileSystem fs = bootstrapDirectory.getFileSystem(conf); + +if (!fs.exists(bootstrapDirectory)) { + throw new InvalidInputException("Input bootstrap dump directory to rollback doesn't exist: " + + bootstrapDirectory); +} + +FileStatus[] fileStatuses = fs.listStatus(bootstrapDirectory, EximUtil.getDirectoryFilter(fs)); +if ((fileStatuses == null) || (fileStatuses.length == 0)) { + throw new InvalidInputException("Input bootstrap dump directory to rollback is empty: " + + bootstrapDirectory); +} + +if (StringUtils.isNotBlank(work.dbNameToLoadIn) && (fileStatuses.length > 1)) { + throw new InvalidInputException("Multiple DB dirs in the dump: " + bootstrapDirectory + + " is not allowed to load to single target DB: " + work.dbNameToLoadIn); +} + +for (FileStatus dbDir : fileStatuses) { + Path dbLevelPath = 
dbDir.getPath(); + String dbNameInDump = dbLevelPath.getName(); + + List tableNames = new ArrayList<>(); + RemoteIterator filesIterator = fs.listFiles(dbLevelPath, true); + while (filesIterator.hasNext()) { +Path nextFile = filesIterator.next().getPath(); +String filePath = nextFile.toString(); +if (filePath.endsWith(EximUtil.METADATA_NAME)) { + // Remove dbLevelPath from the current path to check if this _metadata file is under DB or + // table level directory. + String replacedString = filePath.replace(dbLevelPath.toString(), ""); + if (!replacedString.equalsIgnoreCase(EximUtil.METADATA_NAME)) { +tableNames.add(nextFile.getParent().getName()); + } +} + } + + // No tables listed in the DB level directory to be dropped. + if (tableNames.isEmpty()) { +LOG.info("No tables are listed to be dropped for Database: {} in bootstrap dump: {}", +dbNameInDump, bootstrapDirectory); +continue; + } + + // Drop all tables bootstrapped from previous dump. + // Get the target DB in which previously bootstrapped tables to be dropped. If user specified + // DB name as input in REPL LOAD command, then use it. + String dbName = (StringUtils.isNotBlank(work.dbNameToLoadIn) ? work.dbNameToLoadIn : dbNameInDump); + + Hive db = getHive(); + for (String table : tableNames) { +db.dropTable(dbName + "." + table, true); Review comment: Well, I think my current test case covers this case. So, it's done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 207353) Time Spent: 1h 40m (was: 1.5h) > Hive should support clean-up of previously bootstrapped tables when retry > from different dump. 
> -- > > Key: HIVE-21286 > URL: https://issues.apache.org/jira/browse/HIVE-21286 > Project: Hive > Issue Type: Bug > Components: repl >Affects Versions: 4.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan >Priority: Major > Labels: DR, Replication, pull-request-available > Attachments: HIVE-21286.01.patch > > Time Spent: 1h 40m > Remaining Estimate:
[jira] [Work logged] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.
[ https://issues.apache.org/jira/browse/HIVE-21286?focusedWorklogId=207350=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-207350 ] ASF GitHub Bot logged work on HIVE-21286: - Author: ASF GitHub Bot Created on: 04/Mar/19 18:28 Start Date: 04/Mar/19 18:28 Worklog Time Spent: 10m Work Description: sankarh commented on pull request #551: HIVE-21286: Hive should support clean-up of previously bootstrapped tables when retry from different dump. URL: https://github.com/apache/hive/pull/551#discussion_r262185564 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java ## @@ -66,6 +66,10 @@ // tasks. public static final String REPL_CURRENT_TBL_WRITE_ID = "hive.repl.current.table.write.id"; + // Configuration to be received via WITH clause of REPL LOAD to rollback any previously failed + // bootstrap load. + public static final String REPL_ROLLBACK_BOOTSTRAP_LOAD_CONFIG = "hive.repl.rollback.bootstrap.load"; Review comment: Nope. It works only with incremental dump. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 207350) Time Spent: 1.5h (was: 1h 20m) > Hive should support clean-up of previously bootstrapped tables when retry > from different dump. > -- > > Key: HIVE-21286 > URL: https://issues.apache.org/jira/browse/HIVE-21286 > Project: Hive > Issue Type: Bug > Components: repl >Affects Versions: 4.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan >Priority: Major > Labels: DR, Replication, pull-request-available > Attachments: HIVE-21286.01.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > If external tables are enabled for replication on an existing repl policy, > then bootstrapping of external tables are combined with incremental dump. 
> If an incremental bootstrap load fails with a non-retryable error, the user > has to manually drop all the external tables before retrying with another > bootstrap dump. For a full bootstrap, the suggested way to retry with a > different dump is to drop the DB, but here the user would need to manually > drop all the external tables, which is not user friendly. So this needs to be > handled on the Hive side as follows. > REPL LOAD takes an additional config (passed by the user in the WITH clause) > that says: drop all the tables which were bootstrapped from the previous dump. > hive.repl.rollback.bootstrap.load= > Hive will use this config only if the current dump is a bootstrap dump or a > combined bootstrap in an incremental dump. > The user must take care not to pass this config if the previous REPL LOAD > (with bootstrap) was successful, or if any successful incremental dump+load > happened after "previous_bootstrap_dump_dir". -- This message was sent by Atlassian JIRA (v7.6.3#76005)
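Based on the description above, the proposed WITH-clause usage could look like the following sketch (the database name and dump paths are placeholders, not values from the patch):

```sql
-- Retry a failed combined bootstrap-in-incremental load from a new dump,
-- rolling back the tables bootstrapped by the previous (failed) dump first.
REPL LOAD replicated_db FROM '<current_incremental_dump_dir>'
WITH ('hive.repl.rollback.bootstrap.load'='<previous_bootstrap_dump_dir>');
```

Per the caution in the description, this config must not be passed if the previous bootstrap load succeeded or a later incremental dump+load has already been applied.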
[jira] [Work logged] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.
[ https://issues.apache.org/jira/browse/HIVE-21286?focusedWorklogId=207349=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-207349 ] ASF GitHub Bot logged work on HIVE-21286: - Author: ASF GitHub Bot Created on: 04/Mar/19 18:27 Start Date: 04/Mar/19 18:27 Worklog Time Spent: 10m Work Description: sankarh commented on pull request #551: HIVE-21286: Hive should support clean-up of previously bootstrapped tables when retry from different dump. URL: https://github.com/apache/hive/pull/551#discussion_r262185344 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java ## @@ -279,6 +292,72 @@ a database ( directory ) return 0; } + /** + * Cleanup/drop tables from the given database which are bootstrapped by input dump dir. + * @throws HiveException Failed to drop the tables. + * @throws IOException File operations failure. + * @throws InvalidInputException Invalid input dump directory. + */ + private void bootstrapRollbackTask() throws HiveException, IOException, InvalidInputException { +Path bootstrapDirectory = new PathBuilder(work.bootstrapDumpToRollback) +.addDescendant(ReplUtils.INC_BOOTSTRAP_ROOT_DIR_NAME).build(); +FileSystem fs = bootstrapDirectory.getFileSystem(conf); + +if (!fs.exists(bootstrapDirectory)) { + throw new InvalidInputException("Input bootstrap dump directory to rollback doesn't exist: " + + bootstrapDirectory); +} + +FileStatus[] fileStatuses = fs.listStatus(bootstrapDirectory, EximUtil.getDirectoryFilter(fs)); +if ((fileStatuses == null) || (fileStatuses.length == 0)) { + throw new InvalidInputException("Input bootstrap dump directory to rollback is empty: " + + bootstrapDirectory); +} + +if (StringUtils.isNotBlank(work.dbNameToLoadIn) && (fileStatuses.length > 1)) { + throw new InvalidInputException("Multiple DB dirs in the dump: " + bootstrapDirectory + + " is not allowed to load to single target DB: " + work.dbNameToLoadIn); +} + +for (FileStatus dbDir : fileStatuses) { + Path dbLevelPath = 
dbDir.getPath(); + String dbNameInDump = dbLevelPath.getName(); + + List tableNames = new ArrayList<>(); + RemoteIterator filesIterator = fs.listFiles(dbLevelPath, true); + while (filesIterator.hasNext()) { +Path nextFile = filesIterator.next().getPath(); +String filePath = nextFile.toString(); +if (filePath.endsWith(EximUtil.METADATA_NAME)) { + // Remove dbLevelPath from the current path to check if this _metadata file is under DB or + // table level directory. + String replacedString = filePath.replace(dbLevelPath.toString(), ""); + if (!replacedString.equalsIgnoreCase(EximUtil.METADATA_NAME)) { +tableNames.add(nextFile.getParent().getName()); + } +} + } + + // No tables listed in the DB level directory to be dropped. + if (tableNames.isEmpty()) { +LOG.info("No tables are listed to be dropped for Database: {} in bootstrap dump: {}", +dbNameInDump, bootstrapDirectory); +continue; + } + + // Drop all tables bootstrapped from previous dump. + // Get the target DB in which previously bootstrapped tables to be dropped. If user specified + // DB name as input in REPL LOAD command, then use it. + String dbName = (StringUtils.isNotBlank(work.dbNameToLoadIn) ? work.dbNameToLoadIn : dbNameInDump); + + Hive db = getHive(); + for (String table : tableNames) { +db.dropTable(dbName + "." + table, true); Review comment: Dropping an external table doesn't delete the directory. Also, managed table doesn't use the same location as external tables and so there won't be any conflict. So, re-bootstrap, drop the old external tables and incremental will create the new managed table and the operation will be successful. Probably, I will add a test to cover this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 207349) Time Spent: 1h 20m (was: 1h 10m) > Hive should support clean-up of previously bootstrapped tables when retry > from different dump. > -- > > Key: HIVE-21286 > URL: https://issues.apache.org/jira/browse/HIVE-21286 > Project: Hive > Issue Type: Bug > Components: repl >Affects
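The review thread above centers on how bootstrapRollbackTask() decides whether a _metadata file sits directly under the DB directory (DB-level metadata, skipped) or inside a table directory (that table gets dropped). A self-contained sketch of that check, using plain strings instead of Hadoop Path/FileSystem; the helper name and the leading-slash comparison are illustrative, not Hive's exact code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MetadataLevelSketch {
    // Assumed value of EximUtil.METADATA_NAME in the patch.
    static final String METADATA_NAME = "_metadata";

    // Given a DB-level dump path and the _metadata file paths found under it
    // (already filtered to files ending in _metadata), collect the table
    // directory names whose metadata sits one level below the DB path.
    static List<String> tablesToDrop(String dbLevelPath, List<String> metadataFiles) {
        List<String> tableNames = new ArrayList<>();
        for (String filePath : metadataFiles) {
            // Strip the DB path prefix; "/_metadata" means DB-level metadata.
            String relative = filePath.replace(dbLevelPath, "");
            if (!relative.equalsIgnoreCase("/" + METADATA_NAME)) {
                // Table-level: the table name is the parent directory's name.
                String parent = filePath.substring(0, filePath.lastIndexOf('/'));
                tableNames.add(parent.substring(parent.lastIndexOf('/') + 1));
            }
        }
        return tableNames;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
            "/dump/db1/_metadata",     // DB-level: skipped
            "/dump/db1/t4/_metadata",  // table-level: t4 dropped
            "/dump/db1/t5/_metadata"); // table-level: t5 dropped
        System.out.println(tablesToDrop("/dump/db1", files)); // prints [t4, t5]
    }
}
```

The real code then drops each listed table in the target DB (work.dbNameToLoadIn if given, otherwise the DB name from the dump directory).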
[jira] [Work logged] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.
[ https://issues.apache.org/jira/browse/HIVE-21286?focusedWorklogId=207344=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-207344 ] ASF GitHub Bot logged work on HIVE-21286: - Author: ASF GitHub Bot Created on: 04/Mar/19 18:24 Start Date: 04/Mar/19 18:24 Worklog Time Spent: 10m Work Description: sankarh commented on pull request #551: HIVE-21286: Hive should support clean-up of previously bootstrapped tables when retry from different dump. URL: https://github.com/apache/hive/pull/551#discussion_r262184013 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java ## @@ -279,6 +292,72 @@ a database ( directory ) return 0; } + /** + * Cleanup/drop tables from the given database which are bootstrapped by input dump dir. + * @throws HiveException Failed to drop the tables. + * @throws IOException File operations failure. + * @throws InvalidInputException Invalid input dump directory. + */ + private void bootstrapRollbackTask() throws HiveException, IOException, InvalidInputException { +Path bootstrapDirectory = new PathBuilder(work.bootstrapDumpToRollback) +.addDescendant(ReplUtils.INC_BOOTSTRAP_ROOT_DIR_NAME).build(); +FileSystem fs = bootstrapDirectory.getFileSystem(conf); + +if (!fs.exists(bootstrapDirectory)) { + throw new InvalidInputException("Input bootstrap dump directory to rollback doesn't exist: " + + bootstrapDirectory); Review comment: This feature is not specific for external tables. The idea is to rollback the tables bootstrapped from given dump irrespective of external or acid or even table level replication. We expect the input dump to be bootstrap combined in incremental dump. If full bootstrap dump is specified, it throw exception. I will add a test to see if any other dump is specified, then repl load should fail. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 207344) Time Spent: 1h (was: 50m) > Hive should support clean-up of previously bootstrapped tables when retry > from different dump. > -- > > Key: HIVE-21286 > URL: https://issues.apache.org/jira/browse/HIVE-21286 > Project: Hive > Issue Type: Bug > Components: repl >Affects Versions: 4.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan >Priority: Major > Labels: DR, Replication, pull-request-available > Attachments: HIVE-21286.01.patch > > Time Spent: 1h > Remaining Estimate: 0h > > If external tables are enabled for replication on an existing repl policy, > then bootstrapping of external tables are combined with incremental dump. > If incremental bootstrap load fails with non-retryable error for which user > will have to manually drop all the external tables before trying with another > bootstrap dump. For full bootstrap, to retry with different dump, we > suggested user to drop the DB but in this case they need to manually drop all > the external tables which is not so user friendly. So, need to handle it in > Hive side as follows. > REPL LOAD takes additional config (passed by user in WITH clause) that says, > drop all the tables which are bootstrapped from previous dump. > hive.repl.rollback.bootstrap.load= > Hive will use this config only if the current dump is bootstrap dump or > combined bootstrap in incremental dump. > Caution to be taken by user that this config should not be passed if previous > REPL LOAD (with bootstrap) was successful or any successful incremental > dump+load happened after "previous_bootstrap_dump_dir". -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.
[ https://issues.apache.org/jira/browse/HIVE-21286?focusedWorklogId=207346=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-207346 ] ASF GitHub Bot logged work on HIVE-21286: - Author: ASF GitHub Bot Created on: 04/Mar/19 18:24 Start Date: 04/Mar/19 18:24 Worklog Time Spent: 10m Work Description: sankarh commented on pull request #551: HIVE-21286: Hive should support clean-up of previously bootstrapped tables when retry from different dump. URL: https://github.com/apache/hive/pull/551#discussion_r262184237 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java ## @@ -279,6 +292,72 @@ a database ( directory ) return 0; } + /** + * Cleanup/drop tables from the given database which are bootstrapped by input dump dir. + * @throws HiveException Failed to drop the tables. + * @throws IOException File operations failure. + * @throws InvalidInputException Invalid input dump directory. + */ + private void bootstrapRollbackTask() throws HiveException, IOException, InvalidInputException { +Path bootstrapDirectory = new PathBuilder(work.bootstrapDumpToRollback) +.addDescendant(ReplUtils.INC_BOOTSTRAP_ROOT_DIR_NAME).build(); +FileSystem fs = bootstrapDirectory.getFileSystem(conf); + +if (!fs.exists(bootstrapDirectory)) { + throw new InvalidInputException("Input bootstrap dump directory to rollback doesn't exist: " + + bootstrapDirectory); +} + +FileStatus[] fileStatuses = fs.listStatus(bootstrapDirectory, EximUtil.getDirectoryFilter(fs)); +if ((fileStatuses == null) || (fileStatuses.length == 0)) { + throw new InvalidInputException("Input bootstrap dump directory to rollback is empty: " + + bootstrapDirectory); +} + +if (StringUtils.isNotBlank(work.dbNameToLoadIn) && (fileStatuses.length > 1)) { + throw new InvalidInputException("Multiple DB dirs in the dump: " + bootstrapDirectory + + " is not allowed to load to single target DB: " + work.dbNameToLoadIn); +} + +for (FileStatus dbDir : fileStatuses) { Review comment: If 
work.dbNameToLoadIn is empty or null, then there can be multiple DB directories. So, it will be array in that case. We cannot avoid the loop. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 207346) Time Spent: 1h 10m (was: 1h) > Hive should support clean-up of previously bootstrapped tables when retry > from different dump. > -- > > Key: HIVE-21286 > URL: https://issues.apache.org/jira/browse/HIVE-21286 > Project: Hive > Issue Type: Bug > Components: repl >Affects Versions: 4.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan >Priority: Major > Labels: DR, Replication, pull-request-available > Attachments: HIVE-21286.01.patch > > Time Spent: 1h 10m > Remaining Estimate: 0h > > If external tables are enabled for replication on an existing repl policy, > then bootstrapping of external tables are combined with incremental dump. > If incremental bootstrap load fails with non-retryable error for which user > will have to manually drop all the external tables before trying with another > bootstrap dump. For full bootstrap, to retry with different dump, we > suggested user to drop the DB but in this case they need to manually drop all > the external tables which is not so user friendly. So, need to handle it in > Hive side as follows. > REPL LOAD takes additional config (passed by user in WITH clause) that says, > drop all the tables which are bootstrapped from previous dump. > hive.repl.rollback.bootstrap.load= > Hive will use this config only if the current dump is bootstrap dump or > combined bootstrap in incremental dump. 
> Caution to be taken by user that this config should not be passed if previous > REPL LOAD (with bootstrap) was successful or any successful incremental > dump+load happened after "previous_bootstrap_dump_dir". -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21152) Rewrite if expression to case and recognize simple case as an if
[ https://issues.apache.org/jira/browse/HIVE-21152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783621#comment-16783621 ] Hive QA commented on HIVE-21152: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12961016/HIVE-21152.04.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 15818 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explode_null] (batchId=29) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_write_correct_definition_levels] (batchId=44) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf6] (batchId=57) org.apache.hive.service.server.TestInformationSchemaWithPrivilege.test (batchId=259) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16325/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16325/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16325/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12961016 - PreCommit-HIVE-Build > Rewrite if expression to case and recognize simple case as an if > > > Key: HIVE-21152 > URL: https://issues.apache.org/jira/browse/HIVE-21152 > Project: Hive > Issue Type: Improvement > Components: CBO >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Attachments: HIVE-21152.01.patch, HIVE-21152.02.patch, > HIVE-21152.03.patch, HIVE-21152.04.patch > > > * {{IF}} is not part of the sql standard; however given its special form its > simpler - and currently in Hive it also has vectorized support > * people writing standard sql may write: {{CASE WHEN member=1 THEN attr+1 > else attr+2 END}} which is essentially an if. > The idea is to rewrite IFs to CASEs for the cbo; and recognize simple > "CASE"-s as IFs to get vectorization on them if possible -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21152) Rewrite if expression to case and recognize simple case as an if
[ https://issues.apache.org/jira/browse/HIVE-21152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783595#comment-16783595 ] Hive QA commented on HIVE-21152: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 4s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 22s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 46s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 45s{color} | {color:blue} ql in master has 2251 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 8s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 27m 50s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16325/dev-support/hive-personality.sh | | git revision | master / f51f108 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | modules | C: ql U: ql | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-16325/yetus.txt | | Powered by | Apache Yetushttp://yetus.apache.org | This message was automatically generated. > Rewrite if expression to case and recognize simple case as an if > > > Key: HIVE-21152 > URL: https://issues.apache.org/jira/browse/HIVE-21152 > Project: Hive > Issue Type: Improvement > Components: CBO >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Attachments: HIVE-21152.01.patch, HIVE-21152.02.patch, > HIVE-21152.03.patch, HIVE-21152.04.patch > > > * {{IF}} is not part of the sql standard; however given its special form its > simpler - and currently in Hive it also has vectorized support > * people writing standard sql may write: {{CASE WHEN member=1 THEN attr+1 > else attr+2 END}} which is essentially an if. 
> The idea is to rewrite IFs to CASEs for the cbo; and recognize simple > "CASE"-s as IFs to get vectorization on them if possible -- This message was sent by Atlassian JIRA (v7.6.3#76005)
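The rewrite the ticket describes maps between the two forms; for example (illustrative HiveQL based on the expression quoted above, not taken from the patch):

```sql
-- IF form (Hive extension; currently has vectorized support):
SELECT IF(member = 1, attr + 1, attr + 2) FROM t;

-- Equivalent standard CASE form, which the CBO rewrite targets:
SELECT CASE WHEN member = 1 THEN attr + 1 ELSE attr + 2 END FROM t;
```

The proposal goes both ways: IF becomes CASE for the CBO, and a simple single-branch CASE is recognized as an IF so it can be vectorized.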
[jira] [Assigned] (HIVE-21379) Mask password in DDL commands for table properties
[ https://issues.apache.org/jira/browse/HIVE-21379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai reassigned HIVE-21379: - > Mask password in DDL commands for table properties > -- > > Key: HIVE-21379 > URL: https://issues.apache.org/jira/browse/HIVE-21379 > Project: Hive > Issue Type: Improvement >Reporter: Daniel Dai >Assignee: Daniel Dai >Priority: Major > Attachments: HIVE-21379.1.patch > > > We need to mask password related table properties (such as > hive.sql.dbcp.password) in DDL output, such as describe extended/describe > formatted/show create table/show tblproperties. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21379) Mask password in DDL commands for table properties
[ https://issues.apache.org/jira/browse/HIVE-21379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-21379: -- Status: Patch Available (was: Open) > Mask password in DDL commands for table properties > -- > > Key: HIVE-21379 > URL: https://issues.apache.org/jira/browse/HIVE-21379 > Project: Hive > Issue Type: Improvement >Reporter: Daniel Dai >Assignee: Daniel Dai >Priority: Major > Attachments: HIVE-21379.1.patch > > > We need to mask password related table properties (such as > hive.sql.dbcp.password) in DDL output, such as describe extended/describe > formatted/show create table/show tblproperties. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21376) Incompatible change in Hive bucket computation
[ https://issues.apache.org/jira/browse/HIVE-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-21376: --- Attachment: HIVE-21376.patch > Incompatible change in Hive bucket computation > -- > > Key: HIVE-21376 > URL: https://issues.apache.org/jira/browse/HIVE-21376 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: David Phillips >Assignee: Jesus Camacho Rodriguez >Priority: Major > Attachments: HIVE-21376.patch > > > HIVE-20007 seems to have inadvertently changed the bucket hash code > computation via {{ObjectInspectorUtils.getBucketHashCodeOld()}} for the > {{DATE}} and {{TIMESTAMP}} data types. > {{DATE}} was previously computed using {{DateWritable}}, which uses > {{daysSinceEpoch}} as the hash code. It is now computed using > {{DateWritableV2}}, which uses the hash code of {{java.time.LocalDate}} > (which is not days since epoch). > {{TIMESTAMP}} was previously computed using {{TimestampWritable}} and now uses > {{TimestampWritableV2}}. They ostensibly use the same hash code computation, > but there are two important differences: > # {{TimestampWritable}} rounds the number of milliseconds into the seconds > portion of the computation, but {{TimestampWritableV2}} does not. > # {{TimestampWritable}} gets the epoch time from {{java.sql.Timestamp}}, > which returns it relative to the JVM time zone, not UTC. > {{TimestampWritableV2}} uses a {{LocalDateTime}} relative to UTC. > I was unable to get Hive 3.1 running in order to verify if this actually > causes data to be read or written incorrectly (there may be code above this > library method which makes things work correctly). However, if my > understanding is correct, this means Hive 3.1 is both forwards and backwards > incompatible with bucketed tables using either of these data types. It also > indicates that Hive needs tests to verify that the hash code does not change > between releases.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21376) Incompatible change in Hive bucket computation
[ https://issues.apache.org/jira/browse/HIVE-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-21376: --- Status: Patch Available (was: In Progress) > Incompatible change in Hive bucket computation > -- > > Key: HIVE-21376 > URL: https://issues.apache.org/jira/browse/HIVE-21376 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: David Phillips >Assignee: Jesus Camacho Rodriguez >Priority: Major > > HIVE-20007 seems to have inadvertently changed the bucket hash code > computation via {{ObjectInspectorUtils.getBucketHashCodeOld()}} for the > {{DATE}} and {{TIMESTAMP}} data types. > {{DATE}} was previously computed using {{DateWritable}}, which uses > {{daysSinceEpoch}} as the hash code. It is now computed using > {{DateWritableV2}}, which uses the hash code of {{java.time.LocalDate}} > (which is not days since epoch). > {{TIMESTAMP}} was previously computed using {{TimestampWritable}} and now uses > {{TimestampWritableV2}}. They ostensibly use the same hash code computation, > but there are two important differences: > # {{TimestampWritable}} rounds the number of milliseconds into the seconds > portion of the computation, but {{TimestampWritableV2}} does not. > # {{TimestampWritable}} gets the epoch time from {{java.sql.Timestamp}}, > which returns it relative to the JVM time zone, not UTC. > {{TimestampWritableV2}} uses a {{LocalDateTime}} relative to UTC. > I was unable to get Hive 3.1 running in order to verify whether this actually > causes data to be read or written incorrectly (there may be code above this > library method which makes things work correctly). However, if my > understanding is correct, this means Hive 3.1 is both forwards and backwards > incompatible with bucketed tables using either of these data types. It also > indicates that Hive needs tests to verify that the hash code does not change > between releases. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work started] (HIVE-21376) Incompatible change in Hive bucket computation
[ https://issues.apache.org/jira/browse/HIVE-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-21376 started by Jesus Camacho Rodriguez. -- > Incompatible change in Hive bucket computation > -- > > Key: HIVE-21376 > URL: https://issues.apache.org/jira/browse/HIVE-21376 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: David Phillips >Assignee: Jesus Camacho Rodriguez >Priority: Major > > HIVE-20007 seems to have inadvertently changed the bucket hash code > computation via {{ObjectInspectorUtils.getBucketHashCodeOld()}} for the > {{DATE}} and {{TIMESTAMP}} data types. > {{DATE}} was previously computed using {{DateWritable}}, which uses > {{daysSinceEpoch}} as the hash code. It is now computed using > {{DateWritableV2}}, which uses the hash code of {{java.time.LocalDate}} > (which is not days since epoch). > {{TIMESTAMP}} was previously computed using {{TimestampWritable}} and now uses > {{TimestampWritableV2}}. They ostensibly use the same hash code computation, > but there are two important differences: > # {{TimestampWritable}} rounds the number of milliseconds into the seconds > portion of the computation, but {{TimestampWritableV2}} does not. > # {{TimestampWritable}} gets the epoch time from {{java.sql.Timestamp}}, > which returns it relative to the JVM time zone, not UTC. > {{TimestampWritableV2}} uses a {{LocalDateTime}} relative to UTC. > I was unable to get Hive 3.1 running in order to verify whether this actually > causes data to be read or written incorrectly (there may be code above this > library method which makes things work correctly). However, if my > understanding is correct, this means Hive 3.1 is both forwards and backwards > incompatible with bucketed tables using either of these data types. It also > indicates that Hive needs tests to verify that the hash code does not change > between releases. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18728) Secure webHCat with SSL
[ https://issues.apache.org/jira/browse/HIVE-18728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oleksiy Sayankin updated HIVE-18728: Status: In Progress (was: Patch Available) > Secure webHCat with SSL > --- > > Key: HIVE-18728 > URL: https://issues.apache.org/jira/browse/HIVE-18728 > Project: Hive > Issue Type: New Feature > Components: Security >Reporter: Oleksiy Sayankin >Assignee: Oleksiy Sayankin >Priority: Major > Fix For: 3.2.0 > > Attachments: HIVE-18728.1.patch, HIVE-18728.2.patch, > HIVE-18728.3.patch > > > Doc for the issue: > *Configure WebHCat server to use SSL encryption* > You can configure the WebHCat REST API to use SSL (Secure Sockets Layer) > encryption. The following WebHCat properties are added to enable SSL. > {{templeton.use.ssl}} > Default value: {{false}} > Description: Set this to true to use SSL encryption for the WebHCat server > {{templeton.keystore.path}} > Default value: {{}} > Description: SSL certificate keystore location for the WebHCat server > {{templeton.keystore.password}} > Default value: {{}} > Description: SSL certificate keystore password for the WebHCat server > {{templeton.ssl.protocol.blacklist}} > Default value: {{SSLv2,SSLv3}} > Description: SSL versions to disable for the WebHCat server > {{templeton.host}} > Default value: {{0.0.0.0}} > Description: The host address the WebHCat server will listen on. > *Modifying the {{webhcat-site.xml}} file* > Configure the following properties in the {{webhcat-site.xml}} file to enable > SSL encryption on each node where WebHCat is installed: > {code}
> <property>
>   <name>templeton.use.ssl</name>
>   <value>true</value>
> </property>
> <property>
>   <name>templeton.keystore.path</name>
>   <value>/path/to/ssl_keystore</value>
> </property>
> <property>
>   <name>templeton.keystore.password</name>
>   <value>password</value>
> </property>
> {code} > *Example:* To check the status of a WebHCat server configured for SSL encryption, > use the following command: > {code} > curl -k 'https://<user>:<password>@<host>:50111/templeton/v1/status' > {code} > Replace {{<user>}} and {{<password>}} with a valid user/password. Replace > {{<host>}} with your host name. 
[jira] [Updated] (HIVE-18728) Secure webHCat with SSL
[ https://issues.apache.org/jira/browse/HIVE-18728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oleksiy Sayankin updated HIVE-18728: Status: Patch Available (was: In Progress) > Secure webHCat with SSL > --- > > Key: HIVE-18728 > URL: https://issues.apache.org/jira/browse/HIVE-18728 > Project: Hive > Issue Type: New Feature > Components: Security >Reporter: Oleksiy Sayankin >Assignee: Oleksiy Sayankin >Priority: Major > Fix For: 3.2.0 > > Attachments: HIVE-18728.1.patch, HIVE-18728.2.patch, > HIVE-18728.3.patch > > > Doc for the issue: > *Configure WebHCat server to use SSL encryption* > You can configure the WebHCat REST API to use SSL (Secure Sockets Layer) > encryption. The following WebHCat properties are added to enable SSL. > {{templeton.use.ssl}} > Default value: {{false}} > Description: Set this to true to use SSL encryption for the WebHCat server > {{templeton.keystore.path}} > Default value: {{}} > Description: SSL certificate keystore location for the WebHCat server > {{templeton.keystore.password}} > Default value: {{}} > Description: SSL certificate keystore password for the WebHCat server > {{templeton.ssl.protocol.blacklist}} > Default value: {{SSLv2,SSLv3}} > Description: SSL versions to disable for the WebHCat server > {{templeton.host}} > Default value: {{0.0.0.0}} > Description: The host address the WebHCat server will listen on. > *Modifying the {{webhcat-site.xml}} file* > Configure the following properties in the {{webhcat-site.xml}} file to enable > SSL encryption on each node where WebHCat is installed: > {code}
> <property>
>   <name>templeton.use.ssl</name>
>   <value>true</value>
> </property>
> <property>
>   <name>templeton.keystore.path</name>
>   <value>/path/to/ssl_keystore</value>
> </property>
> <property>
>   <name>templeton.keystore.password</name>
>   <value>password</value>
> </property>
> {code} > *Example:* To check the status of a WebHCat server configured for SSL encryption, > use the following command: > {code} > curl -k 'https://<user>:<password>@<host>:50111/templeton/v1/status' > {code} > Replace {{<user>}} and {{<password>}} with a valid user/password. Replace > {{<host>}} with your host name. 
[jira] [Updated] (HIVE-21379) Mask password in DDL commands for table properties
[ https://issues.apache.org/jira/browse/HIVE-21379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-21379: -- Attachment: HIVE-21379.1.patch > Mask password in DDL commands for table properties > -- > > Key: HIVE-21379 > URL: https://issues.apache.org/jira/browse/HIVE-21379 > Project: Hive > Issue Type: Improvement >Reporter: Daniel Dai >Assignee: Daniel Dai >Priority: Major > Attachments: HIVE-21379.1.patch > > > We need to mask password-related table properties (such as > {{hive.sql.dbcp.password}}) in the output of DDL commands such as describe > extended, describe formatted, show create table, and show tblproperties. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
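As a sketch of the kind of masking the issue describes (the class, method, pattern, and replacement string here are hypothetical illustrations, not the contents of HIVE-21379.1.patch): scan the table properties for keys that look password-related and replace their values before the DDL output is rendered.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class TblPropertyMasker {
    // Hypothetical key pattern; the actual patch may match a different set of keys.
    private static final Pattern SENSITIVE =
        Pattern.compile(".*password.*", Pattern.CASE_INSENSITIVE);

    // Return a copy of the properties with sensitive values replaced by a fixed mask.
    static Map<String, String> mask(Map<String, String> props) {
        Map<String, String> masked = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            masked.put(e.getKey(),
                SENSITIVE.matcher(e.getKey()).matches() ? "###" : e.getValue());
        }
        return masked;
    }
}
```

With this sketch, {{hive.sql.dbcp.password=secret}} would render as {{hive.sql.dbcp.password=###}} in describe/show output while non-sensitive properties pass through unchanged.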
[jira] [Updated] (HIVE-21001) Upgrade to calcite-1.18
[ https://issues.apache.org/jira/browse/HIVE-21001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-21001: Attachment: HIVE-21001.43.patch > Upgrade to calcite-1.18 > --- > > Key: HIVE-21001 > URL: https://issues.apache.org/jira/browse/HIVE-21001 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Attachments: HIVE-21001.01.patch, HIVE-21001.01.patch, > HIVE-21001.02.patch, HIVE-21001.03.patch, HIVE-21001.04.patch, > HIVE-21001.05.patch, HIVE-21001.06.patch, HIVE-21001.06.patch, > HIVE-21001.07.patch, HIVE-21001.08.patch, HIVE-21001.08.patch, > HIVE-21001.08.patch, HIVE-21001.09.patch, HIVE-21001.09.patch, > HIVE-21001.09.patch, HIVE-21001.10.patch, HIVE-21001.11.patch, > HIVE-21001.12.patch, HIVE-21001.13.patch, HIVE-21001.15.patch, > HIVE-21001.16.patch, HIVE-21001.17.patch, HIVE-21001.18.patch, > HIVE-21001.18.patch, HIVE-21001.19.patch, HIVE-21001.20.patch, > HIVE-21001.21.patch, HIVE-21001.22.patch, HIVE-21001.22.patch, > HIVE-21001.22.patch, HIVE-21001.23.patch, HIVE-21001.24.patch, > HIVE-21001.26.patch, HIVE-21001.26.patch, HIVE-21001.26.patch, > HIVE-21001.26.patch, HIVE-21001.26.patch, HIVE-21001.27.patch, > HIVE-21001.28.patch, HIVE-21001.29.patch, HIVE-21001.29.patch, > HIVE-21001.30.patch, HIVE-21001.31.patch, HIVE-21001.32.patch, > HIVE-21001.34.patch, HIVE-21001.35.patch, HIVE-21001.36.patch, > HIVE-21001.37.patch, HIVE-21001.38.patch, HIVE-21001.39.patch, > HIVE-21001.40.patch, HIVE-21001.41.patch, HIVE-21001.42.patch, > HIVE-21001.43.patch > > > XLEAR LIBRARY CACHE -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21152) Rewrite if expression to case and recognize simple case as an if
[ https://issues.apache.org/jira/browse/HIVE-21152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-21152: Attachment: HIVE-21152.04.patch > Rewrite if expression to case and recognize simple case as an if > > > Key: HIVE-21152 > URL: https://issues.apache.org/jira/browse/HIVE-21152 > Project: Hive > Issue Type: Improvement > Components: CBO >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Attachments: HIVE-21152.01.patch, HIVE-21152.02.patch, > HIVE-21152.03.patch, HIVE-21152.04.patch > > > * {{IF}} is not part of the SQL standard; however, given its special form it is > simpler, and currently in Hive it also has vectorized support. > * People writing standard SQL may write {{CASE WHEN member=1 THEN attr+1 > ELSE attr+2 END}}, which is essentially an IF. > The idea is to rewrite IFs to CASEs for the CBO, and to recognize simple > {{CASE}}s as IFs to get vectorization on them where possible -- This message was sent by Atlassian JIRA (v7.6.3#76005)
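The rewrite itself is mechanical: {{IF(cond, v1, v2)}} and {{CASE WHEN cond THEN v1 ELSE v2 END}} are equivalent expressions. The snippet below illustrates the textual shape of the transformation only; Hive's actual implementation operates on Calcite/expression trees, and the class and method names here are made up for illustration.

```java
public class IfCaseRewrite {
    // IF(cond, v1, v2)  ->  CASE WHEN cond THEN v1 ELSE v2 END
    static String ifToCase(String cond, String thenExpr, String elseExpr) {
        return "CASE WHEN " + cond + " THEN " + thenExpr + " ELSE " + elseExpr + " END";
    }

    public static void main(String[] args) {
        // The standard-SQL form from the description is an IF(member=1, attr+1, attr+2):
        System.out.println(ifToCase("member=1", "attr+1", "attr+2"));
    }
}
```

The reverse direction (recognizing a two-branch {{CASE}} with a single {{WHEN}} as an {{IF}}) is what lets such expressions pick up the existing vectorized {{IF}} implementation.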
[jira] [Work logged] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.
[ https://issues.apache.org/jira/browse/HIVE-21286?focusedWorklogId=207281=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-207281 ] ASF GitHub Bot logged work on HIVE-21286: - Author: ASF GitHub Bot Created on: 04/Mar/19 16:52 Start Date: 04/Mar/19 16:52 Worklog Time Spent: 10m Work Description: ashutosh-bapat commented on pull request #551: HIVE-21286: Hive should support clean-up of previously bootstrapped tables when retry from different dump. URL: https://github.com/apache/hive/pull/551#discussion_r262125175 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java ## @@ -279,6 +292,72 @@ a database ( directory ) return 0; } + /** + * Cleanup/drop tables from the given database which are bootstrapped by input dump dir. + * @throws HiveException Failed to drop the tables. + * @throws IOException File operations failure. + * @throws InvalidInputException Invalid input dump directory. + */ + private void bootstrapRollbackTask() throws HiveException, IOException, InvalidInputException { +Path bootstrapDirectory = new PathBuilder(work.bootstrapDumpToRollback) +.addDescendant(ReplUtils.INC_BOOTSTRAP_ROOT_DIR_NAME).build(); +FileSystem fs = bootstrapDirectory.getFileSystem(conf); + +if (!fs.exists(bootstrapDirectory)) { + throw new InvalidInputException("Input bootstrap dump directory to rollback doesn't exist: " + + bootstrapDirectory); Review comment: Please add a test case covering this error i.e. when an invalid bootstrap dump location is specified. If the specified bootstrap dump (to rollback) location exists, how do we know that it is indeed the bootstrap dump location for external tables and not some other dump location like a genuine incremental dump or a genuine bootstrap dump? We should add testcases for the same as well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 207281) Time Spent: 20m (was: 10m) > Hive should support clean-up of previously bootstrapped tables when retry > from different dump. > -- > > Key: HIVE-21286 > URL: https://issues.apache.org/jira/browse/HIVE-21286 > Project: Hive > Issue Type: Bug > Components: repl >Affects Versions: 4.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan >Priority: Major > Labels: DR, Replication, pull-request-available > Attachments: HIVE-21286.01.patch > > Time Spent: 20m > Remaining Estimate: 0h > > If external tables are enabled for replication on an existing repl policy, > then bootstrapping of external tables are combined with incremental dump. > If incremental bootstrap load fails with non-retryable error for which user > will have to manually drop all the external tables before trying with another > bootstrap dump. For full bootstrap, to retry with different dump, we > suggested user to drop the DB but in this case they need to manually drop all > the external tables which is not so user friendly. So, need to handle it in > Hive side as follows. > REPL LOAD takes additional config (passed by user in WITH clause) that says, > drop all the tables which are bootstrapped from previous dump. > hive.repl.rollback.bootstrap.load= > Hive will use this config only if the current dump is bootstrap dump or > combined bootstrap in incremental dump. > Caution to be taken by user that this config should not be passed if previous > REPL LOAD (with bootstrap) was successful or any successful incremental > dump+load happened after "previous_bootstrap_dump_dir". -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21371) Make NonSyncByteArrayOutputStream Overflow Conscious
[ https://issues.apache.org/jira/browse/HIVE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783548#comment-16783548 ] Hive QA commented on HIVE-21371: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12961014/HIVE-21371.2.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16324/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16324/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16324/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2019-03-04 16:49:51.912 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g ' + MAVEN_OPTS='-Xmx1g ' + cd /data/hiveptest/working/ + tee /data/hiveptest/logs/PreCommit-HIVE-Build-16324/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! 
-d apache-github-source-source ]] + date '+%Y-%m-%d %T.%3N' 2019-03-04 16:49:51.916 + cd apache-github-source-source + git fetch origin + git reset --hard HEAD HEAD is now at f51f108 HIVE-21255: Remove QueryConditionBuilder in JdbcStorageHandler (Daniel Dai, reviewed by Jesus Camacho Rodriguez) + git clean -f -d + git checkout master Already on 'master' Your branch is up-to-date with 'origin/master'. + git reset --hard origin/master HEAD is now at f51f108 HIVE-21255: Remove QueryConditionBuilder in JdbcStorageHandler (Daniel Dai, reviewed by Jesus Camacho Rodriguez) + git merge --ff-only origin/master Already up-to-date. + date '+%Y-%m-%d %T.%3N' 2019-03-04 16:49:53.248 + rm -rf ../yetus_PreCommit-HIVE-Build-16324 + mkdir ../yetus_PreCommit-HIVE-Build-16324 + git gc + cp -R . ../yetus_PreCommit-HIVE-Build-16324 + mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-16324/yetus + patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/working/scratch/build.patch + [[ -f /data/hiveptest/working/scratch/build.patch ]] + chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh + /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch error: a/common/src/java/org/apache/hadoop/hive/common/io/NonSyncByteArrayOutputStream.java: does not exist in index Going to apply patch with: git apply -p1 + [[ maven == \m\a\v\e\n ]] + rm -rf /data/hiveptest/working/maven/org/apache/hive + mvn -B clean install -DskipTests -T 4 -q -Dmaven.repo.local=/data/hiveptest/working/maven protoc-jar: executing: [/tmp/protoc5836577371602659102.exe, --version] libprotoc 2.5.0 protoc-jar: executing: [/tmp/protoc5836577371602659102.exe, -I/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/src/main/protobuf/org/apache/hadoop/hive/metastore, --java_out=/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/target/generated-sources, 
/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/src/main/protobuf/org/apache/hadoop/hive/metastore/metastore.proto] ANTLR Parser Generator Version 3.5.2 protoc-jar: executing: [/tmp/protoc4575417553313940759.exe, --version] libprotoc 2.5.0 ANTLR Parser Generator Version 3.5.2 Output file /data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-server/target/generated-sources/org/apache/hadoop/hive/metastore/parser/FilterParser.java does not exist: must build /data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/parser/Filter.g org/apache/hadoop/hive/metastore/parser/Filter.g log4j:WARN No appenders could be found for logger (DataNucleus.Persistence). log4j:WARN Please initialize the log4j system properly. DataNucleus Enhancer (version 4.1.17) for API "JDO" DataNucleus Enhancer completed with success for 41 classes. ANTLR Parser Generator Version 3.5.2 Output file /data/hiveptest/working/apache-github-source-source/ql/target/generated-sources/antlr3/org/apache/hadoop/hive/ql/parse/HiveLexer.java does not exist: must build
[jira] [Work logged] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.
[ https://issues.apache.org/jira/browse/HIVE-21286?focusedWorklogId=207282=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-207282 ] ASF GitHub Bot logged work on HIVE-21286: - Author: ASF GitHub Bot Created on: 04/Mar/19 16:52 Start Date: 04/Mar/19 16:52 Worklog Time Spent: 10m Work Description: ashutosh-bapat commented on pull request #551: HIVE-21286: Hive should support clean-up of previously bootstrapped tables when retry from different dump. URL: https://github.com/apache/hive/pull/551#discussion_r262120754 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java ## @@ -534,6 +536,90 @@ public void bootstrapExternalTablesDuringIncrementalPhase() throws Throwable { .verifyResults(Arrays.asList("10", "20")); } + @Test + public void retryBootstrapExternalTablesFromDifferentDump() throws Throwable { +List loadWithClause = new ArrayList<>(); +loadWithClause.addAll(externalTableBasePathWithClause()); + +List dumpWithClause = Collections.singletonList( +"'" + HiveConf.ConfVars.REPL_INCLUDE_EXTERNAL_TABLES.varname + "'='false'" +); + +WarehouseInstance.Tuple tupleBootstrapWithoutExternal = primary +.run("use " + primaryDbName) +.run("create external table t1 (id int)") +.run("insert into table t1 values (1)") +.run("create external table t2 (place string) partitioned by (country string)") +.run("insert into table t2 partition(country='india') values ('bangalore')") +.run("insert into table t2 partition(country='us') values ('austin')") +.run("create table t3 as select * from t1") +.dump(primaryDbName, null, dumpWithClause); + +replica.load(replicatedDbName, tupleBootstrapWithoutExternal.dumpLocation, loadWithClause) +.status(replicatedDbName) +.verifyResult(tupleBootstrapWithoutExternal.lastReplicationId) +.run("use " + replicatedDbName) +.run("show tables") +.verifyResult("t3") +.run("select id from t3") +.verifyResult("1"); + +dumpWithClause = Arrays.asList("'" + 
HiveConf.ConfVars.REPL_INCLUDE_EXTERNAL_TABLES.varname + "'='true'", +"'" + HiveConf.ConfVars.REPL_BOOTSTRAP_EXTERNAL_TABLES.varname + "'='true'"); +WarehouseInstance.Tuple tupleIncWithExternalBootstrap = primary.run("use " + primaryDbName) +.run("drop table t1") +.run("create external table t4 (id int)") +.run("insert into table t4 values (10)") +.run("create table t5 as select * from t4") +.dump(primaryDbName, tupleBootstrapWithoutExternal.lastReplicationId, dumpWithClause); + +// Verify if bootstrapping with same dump is idempotent and return same result +for (int i = 0; i < 2; i++) { + replica.load(replicatedDbName, tupleIncWithExternalBootstrap.dumpLocation, loadWithClause) + .status(replicatedDbName) + .verifyResult(tupleIncWithExternalBootstrap.lastReplicationId) + .run("use " + replicatedDbName) + .run("show tables like 't1'") + .verifyFailure(new String[]{"t1"}) + .run("select place from t2 where country = 'us'") + .verifyResult("austin") + .run("select id from t4") + .verifyResult("10") + .run("select id from t5") + .verifyResult("10"); +} + +// Drop an external table, add another managed table with same name, insert into existing external table +// and dump another bootstrap dump for external tables. +WarehouseInstance.Tuple tupleNewIncWithExternalBootstrap = primary.run("use " + primaryDbName) +.run("insert into table t2 partition(country='india') values ('chennai')") +.run("drop table t2") +.run("create table t2 as select * from t4") +.run("insert into table t4 values (20)") +.dump(primaryDbName, tupleIncWithExternalBootstrap.lastReplicationId, dumpWithClause); + +// Set previous dump as bootstrap to be rolled-back. Now, new bootstrap should overwrite the old one. +loadWithClause.add("'" + REPL_ROLLBACK_BOOTSTRAP_LOAD_CONFIG + "'='" ++ tupleIncWithExternalBootstrap.dumpLocation + "'"); Review comment: Please add a testcase which tests the bootstrapping when the previous bootstrap has failed halfway i.e. it has loaded some external tables but not all. 
This way we will know what happens when the re-bootstrap tries to remove an external table which wasn't loaded in the previous bootstrap load. This is an automated message from the Apache Git Service. To respond to the
[jira] [Work logged] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.
[ https://issues.apache.org/jira/browse/HIVE-21286?focusedWorklogId=207285&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-207285 ] ASF GitHub Bot logged work on HIVE-21286: - Author: ASF GitHub Bot Created on: 04/Mar/19 16:52 Start Date: 04/Mar/19 16:52 Worklog Time Spent: 10m Work Description: ashutosh-bapat commented on pull request #551: HIVE-21286: Hive should support clean-up of previously bootstrapped tables when retry from different dump. URL: https://github.com/apache/hive/pull/551#discussion_r262141481 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java ## @@ -279,6 +292,72 @@ a database ( directory ) return 0; } + /** + * Cleanup/drop tables from the given database which are bootstrapped by input dump dir. + * @throws HiveException Failed to drop the tables. + * @throws IOException File operations failure. + * @throws InvalidInputException Invalid input dump directory. + */ + private void bootstrapRollbackTask() throws HiveException, IOException, InvalidInputException { +Path bootstrapDirectory = new PathBuilder(work.bootstrapDumpToRollback) +.addDescendant(ReplUtils.INC_BOOTSTRAP_ROOT_DIR_NAME).build(); +FileSystem fs = bootstrapDirectory.getFileSystem(conf); + +if (!fs.exists(bootstrapDirectory)) { + throw new InvalidInputException("Input bootstrap dump directory to rollback doesn't exist: " + + bootstrapDirectory); +} + +FileStatus[] fileStatuses = fs.listStatus(bootstrapDirectory, EximUtil.getDirectoryFilter(fs)); +if ((fileStatuses == null) || (fileStatuses.length == 0)) { + throw new InvalidInputException("Input bootstrap dump directory to rollback is empty: " + + bootstrapDirectory); +} + +if (StringUtils.isNotBlank(work.dbNameToLoadIn) && (fileStatuses.length > 1)) { + throw new InvalidInputException("Multiple DB dirs in the dump: " + bootstrapDirectory + + " is not allowed to load to single target DB: " + work.dbNameToLoadIn); +} + +for (FileStatus dbDir : fileStatuses) { + Path dbLevelPath = dbDir.getPath(); + String dbNameInDump = dbLevelPath.getName(); + + List<String> tableNames = new ArrayList<>(); + RemoteIterator<LocatedFileStatus> filesIterator = fs.listFiles(dbLevelPath, true); + while (filesIterator.hasNext()) { +Path nextFile = filesIterator.next().getPath(); +String filePath = nextFile.toString(); +if (filePath.endsWith(EximUtil.METADATA_NAME)) { + // Remove dbLevelPath from the current path to check if this _metadata file is under DB or + // table level directory. + String replacedString = filePath.replace(dbLevelPath.toString(), ""); + if (!replacedString.equalsIgnoreCase(EximUtil.METADATA_NAME)) { +tableNames.add(nextFile.getParent().getName()); + } +} + } + + // No tables listed in the DB level directory to be dropped. + if (tableNames.isEmpty()) { +LOG.info("No tables are listed to be dropped for Database: {} in bootstrap dump: {}", +dbNameInDump, bootstrapDirectory); +continue; + } + + // Drop all tables bootstrapped from previous dump. + // Get the target DB in which previously bootstrapped tables to be dropped. If user specified + // DB name as input in REPL LOAD command, then use it. + String dbName = (StringUtils.isNotBlank(work.dbNameToLoadIn) ? work.dbNameToLoadIn : dbNameInDump); + + Hive db = getHive(); + for (String table : tableNames) { +db.dropTable(dbName + "." + table, true); Review comment: What happens to the underlying external table directory when an external table is dropped? Consider the case in the test where an external table is dropped and a managed table with the same name is created. Following sequence of events would leave a dangling external table directory in the file system 1. an external table is created 2. an incremental bootstrap dump is taken 3. external table is dropped and a managed table with the same name is created 4. previous incremental bootstrap dump fails to load after it has created the external table directory and copied files 5.
a new incremental bootstrap dump is taken and loaded with location of the previous incremental bootstrap dump specified. 6. new incremental bootstrap dump is loaded This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 207285) > Hive should
[jira] [Work logged] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.
[ https://issues.apache.org/jira/browse/HIVE-21286?focusedWorklogId=207284=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-207284 ] ASF GitHub Bot logged work on HIVE-21286: - Author: ASF GitHub Bot Created on: 04/Mar/19 16:52 Start Date: 04/Mar/19 16:52 Worklog Time Spent: 10m Work Description: ashutosh-bapat commented on pull request #551: HIVE-21286: Hive should support clean-up of previously bootstrapped tables when retry from different dump. URL: https://github.com/apache/hive/pull/551#discussion_r262147678
## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
## @@ -534,6 +536,90 @@ public void bootstrapExternalTablesDuringIncrementalPhase() throws Throwable { .verifyResults(Arrays.asList("10", "20")); }
+  @Test
+  public void retryBootstrapExternalTablesFromDifferentDump() throws Throwable {
+    List loadWithClause = new ArrayList<>();
+    loadWithClause.addAll(externalTableBasePathWithClause());
+
+    List dumpWithClause = Collections.singletonList(
+        "'" + HiveConf.ConfVars.REPL_INCLUDE_EXTERNAL_TABLES.varname + "'='false'"
+    );
+
+    WarehouseInstance.Tuple tupleBootstrapWithoutExternal = primary
+        .run("use " + primaryDbName)
+        .run("create external table t1 (id int)")
+        .run("insert into table t1 values (1)")
+        .run("create external table t2 (place string) partitioned by (country string)")
+        .run("insert into table t2 partition(country='india') values ('bangalore')")
+        .run("insert into table t2 partition(country='us') values ('austin')")
+        .run("create table t3 as select * from t1")
+        .dump(primaryDbName, null, dumpWithClause);
+
+    replica.load(replicatedDbName, tupleBootstrapWithoutExternal.dumpLocation, loadWithClause)
+        .status(replicatedDbName)
+        .verifyResult(tupleBootstrapWithoutExternal.lastReplicationId)
+        .run("use " + replicatedDbName)
+        .run("show tables")
+        .verifyResult("t3")
+        .run("select id from t3")
+        .verifyResult("1");
+
+    dumpWithClause = Arrays.asList("'" + HiveConf.ConfVars.REPL_INCLUDE_EXTERNAL_TABLES.varname + "'='true'",
+        "'" + HiveConf.ConfVars.REPL_BOOTSTRAP_EXTERNAL_TABLES.varname + "'='true'");
+    WarehouseInstance.Tuple tupleIncWithExternalBootstrap = primary.run("use " + primaryDbName)
+        .run("drop table t1")
+        .run("create external table t4 (id int)")
+        .run("insert into table t4 values (10)")
+        .run("create table t5 as select * from t4")
+        .dump(primaryDbName, tupleBootstrapWithoutExternal.lastReplicationId, dumpWithClause);
+
+    // Verify if bootstrapping with same dump is idempotent and return same result
+    for (int i = 0; i < 2; i++) {
+      replica.load(replicatedDbName, tupleIncWithExternalBootstrap.dumpLocation, loadWithClause)
+          .status(replicatedDbName)
+          .verifyResult(tupleIncWithExternalBootstrap.lastReplicationId)
+          .run("use " + replicatedDbName)
+          .run("show tables like 't1'")
+          .verifyFailure(new String[]{"t1"})
+          .run("select place from t2 where country = 'us'")
+          .verifyResult("austin")
+          .run("select id from t4")
+          .verifyResult("10")
+          .run("select id from t5")
+          .verifyResult("10");
+    }
+
+    // Drop an external table, add another managed table with same name, insert into existing external table
+    // and dump another bootstrap dump for external tables.
+    WarehouseInstance.Tuple tupleNewIncWithExternalBootstrap = primary.run("use " + primaryDbName)
+        .run("insert into table t2 partition(country='india') values ('chennai')")
+        .run("drop table t2")
+        .run("create table t2 as select * from t4")
+        .run("insert into table t4 values (20)")
+        .dump(primaryDbName, tupleIncWithExternalBootstrap.lastReplicationId, dumpWithClause);
+
+    // Set previous dump as bootstrap to be rolled-back. Now, new bootstrap should overwrite the old one.
+    loadWithClause.add("'" + REPL_ROLLBACK_BOOTSTRAP_LOAD_CONFIG + "'='"
+        + tupleIncWithExternalBootstrap.dumpLocation + "'");
+    replica.load(replicatedDbName, tupleNewIncWithExternalBootstrap.dumpLocation, loadWithClause)
+        .run("use " + replicatedDbName)
+        .run("show tables like 't1'")
+        .verifyFailure(new String[]{"t1"})
+        .run("select id from t2")
+        .verifyResult("10")
+        .run("select id from t4")
+        .verifyResults(Arrays.asList("10", "20"))
+        .run("select id from t5")
+
[jira] [Work logged] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.
[ https://issues.apache.org/jira/browse/HIVE-21286?focusedWorklogId=207286=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-207286 ] ASF GitHub Bot logged work on HIVE-21286: - Author: ASF GitHub Bot Created on: 04/Mar/19 16:52 Start Date: 04/Mar/19 16:52 Worklog Time Spent: 10m Work Description: ashutosh-bapat commented on pull request #551: HIVE-21286: Hive should support clean-up of previously bootstrapped tables when retry from different dump. URL: https://github.com/apache/hive/pull/551#discussion_r262144393 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java ## @@ -66,6 +66,10 @@ // tasks. public static final String REPL_CURRENT_TBL_WRITE_ID = "hive.repl.current.table.write.id"; + // Configuration to be received via WITH clause of REPL LOAD to rollback any previously failed + // bootstrap load. + public static final String REPL_ROLLBACK_BOOTSTRAP_LOAD_CONFIG = "hive.repl.rollback.bootstrap.load"; Review comment: Can this option be specified with a regular bootstrap directory? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 207286) Time Spent: 40m (was: 0.5h) > Hive should support clean-up of previously bootstrapped tables when retry > from different dump. > -- > > Key: HIVE-21286 > URL: https://issues.apache.org/jira/browse/HIVE-21286 > Project: Hive > Issue Type: Bug > Components: repl >Affects Versions: 4.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan >Priority: Major > Labels: DR, Replication, pull-request-available > Attachments: HIVE-21286.01.patch > > Time Spent: 40m > Remaining Estimate: 0h > > If external tables are enabled for replication on an existing repl policy, > then bootstrapping of external tables are combined with incremental dump. 
> If an incremental bootstrap load fails with a non-retryable error, the user has to manually drop all the external tables before retrying with another bootstrap dump. For a full bootstrap, retrying with a different dump only requires the user to drop the DB, but in this case they would need to manually drop all the external tables, which is not user friendly. So, this needs to be handled on the Hive side as follows.
> REPL LOAD takes an additional config (passed by the user in the WITH clause) that says: drop all the tables which were bootstrapped from the previous dump.
> hive.repl.rollback.bootstrap.load=
> Hive will use this config only if the current dump is a bootstrap dump or a combined bootstrap in an incremental dump.
> The user must take care not to pass this config if the previous REPL LOAD (with bootstrap) was successful, or if any successful incremental dump+load happened after "previous_bootstrap_dump_dir". -- This message was sent by Atlassian JIRA (v7.6.3#76005)
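Based on the description above, a retry with the rollback config would be issued roughly like the following. This is a hedged sketch: the database name and the dump-directory placeholders are illustrative, not values from this issue.

```sql
-- Retry from a NEW bootstrap dump, asking Hive to first drop the tables
-- that were bootstrapped by the PREVIOUS (failed) dump:
REPL LOAD replicated_db FROM '<new_dump_dir>'
WITH ('hive.repl.rollback.bootstrap.load'='<previous_bootstrap_dump_dir>');
```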
[jira] [Work logged] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.
[ https://issues.apache.org/jira/browse/HIVE-21286?focusedWorklogId=207283=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-207283 ] ASF GitHub Bot logged work on HIVE-21286: - Author: ASF GitHub Bot Created on: 04/Mar/19 16:52 Start Date: 04/Mar/19 16:52 Worklog Time Spent: 10m Work Description: ashutosh-bapat commented on pull request #551: HIVE-21286: Hive should support clean-up of previously bootstrapped tables when retry from different dump. URL: https://github.com/apache/hive/pull/551#discussion_r262138331 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java ## @@ -279,6 +292,72 @@ a database ( directory ) return 0; } + /** + * Cleanup/drop tables from the given database which are bootstrapped by input dump dir. + * @throws HiveException Failed to drop the tables. + * @throws IOException File operations failure. + * @throws InvalidInputException Invalid input dump directory. + */ + private void bootstrapRollbackTask() throws HiveException, IOException, InvalidInputException { +Path bootstrapDirectory = new PathBuilder(work.bootstrapDumpToRollback) +.addDescendant(ReplUtils.INC_BOOTSTRAP_ROOT_DIR_NAME).build(); +FileSystem fs = bootstrapDirectory.getFileSystem(conf); + +if (!fs.exists(bootstrapDirectory)) { + throw new InvalidInputException("Input bootstrap dump directory to rollback doesn't exist: " + + bootstrapDirectory); +} + +FileStatus[] fileStatuses = fs.listStatus(bootstrapDirectory, EximUtil.getDirectoryFilter(fs)); +if ((fileStatuses == null) || (fileStatuses.length == 0)) { + throw new InvalidInputException("Input bootstrap dump directory to rollback is empty: " + + bootstrapDirectory); +} + +if (StringUtils.isNotBlank(work.dbNameToLoadIn) && (fileStatuses.length > 1)) { + throw new InvalidInputException("Multiple DB dirs in the dump: " + bootstrapDirectory + + " is not allowed to load to single target DB: " + work.dbNameToLoadIn); +} + +for (FileStatus dbDir : fileStatuses) { Review comment: Given 
the above two conditions, there's going to be exactly one entry in the fileStatuses array. Why do we need a for loop here? We could just get that one entry into dbDir and write the rest of the code without a loop? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 207283) > Hive should support clean-up of previously bootstrapped tables when retry > from different dump. > Key: HIVE-21286 > URL: https://issues.apache.org/jira/browse/HIVE-21286 > Project: Hive > Issue Type: Bug > Components: repl > Affects Versions: 4.0.0 > Reporter: Sankar Hariappan > Assignee: Sankar Hariappan > Priority: Major > Labels: DR, Replication, pull-request-available > Attachments: HIVE-21286.01.patch > Time Spent: 20m > Remaining Estimate: 0h -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21001) Upgrade to calcite-1.18
[ https://issues.apache.org/jira/browse/HIVE-21001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783529#comment-16783529 ] Hive QA commented on HIVE-21001: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12961010/HIVE-21001.42.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16323/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16323/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16323/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2019-03-04 16:35:33.530 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g ' + MAVEN_OPTS='-Xmx1g ' + cd /data/hiveptest/working/ + tee /data/hiveptest/logs/PreCommit-HIVE-Build-16323/source-prep.txt + [[ true == \t\r\u\e ]] + rm -rf ivy maven + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! 
-d apache-github-source-source ]] + date '+%Y-%m-%d %T.%3N' 2019-03-04 16:35:34.219 + cd apache-github-source-source + git fetch origin + git reset --hard HEAD HEAD is now at f51f108 HIVE-21255: Remove QueryConditionBuilder in JdbcStorageHandler (Daniel Dai, reviewed by Jesus Camacho Rodriguez) + git clean -f -d Removing standalone-metastore/metastore-server/src/gen/ + git checkout master Already on 'master' Your branch is up-to-date with 'origin/master'. + git reset --hard origin/master HEAD is now at f51f108 HIVE-21255: Remove QueryConditionBuilder in JdbcStorageHandler (Daniel Dai, reviewed by Jesus Camacho Rodriguez) + git merge --ff-only origin/master Already up-to-date. + date '+%Y-%m-%d %T.%3N' 2019-03-04 16:35:34.957 + rm -rf ../yetus_PreCommit-HIVE-Build-16323 + mkdir ../yetus_PreCommit-HIVE-Build-16323 + git gc + cp -R . ../yetus_PreCommit-HIVE-Build-16323 + mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-16323/yetus + patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/working/scratch/build.patch + [[ -f /data/hiveptest/working/scratch/build.patch ]] + chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh + /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch Going to apply patch with: git apply -p0 /data/hiveptest/working/scratch/build.patch:598: trailing whitespace. explain cbo select * from part_null where /data/hiveptest/working/scratch/build.patch:1101: trailing whitespace. Map 1 /data/hiveptest/working/scratch/build.patch:1122: trailing whitespace. Reducer 2 /data/hiveptest/working/scratch/build.patch:1181: trailing whitespace. Map 1 /data/hiveptest/working/scratch/build.patch:1202: trailing whitespace. Reducer 2 warning: squelched 60 whitespace errors warning: 65 lines add whitespace errors. 
+ [[ maven == \m\a\v\e\n ]] + rm -rf /data/hiveptest/working/maven/org/apache/hive + mvn -B clean install -DskipTests -T 4 -q -Dmaven.repo.local=/data/hiveptest/working/maven [ERROR] Failed to execute goal on project hive-shims-0.23: Could not resolve dependencies for project org.apache.hive.shims:hive-shims-0.23:jar:4.0.0-SNAPSHOT: Could not find artifact dnsjava:dnsjava:jar:2.1.7 -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :hive-shims-0.23 + result=1 + '[' 1 -ne 0 ']' + rm -rf yetus_PreCommit-HIVE-Build-16323 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12961010 - PreCommit-HIVE-Build > Upgrade to calcite-1.18 > --- > > Key: HIVE-21001 > URL: https://issues.apache.org/jira/browse/HIVE-21001 > Project: Hive > Issue Type: Improvement >
[jira] [Updated] (HIVE-21371) Make NonSyncByteArrayOutputStream Overflow Conscious
[ https://issues.apache.org/jira/browse/HIVE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HIVE-21371: -- Status: Patch Available (was: Open)
> Make NonSyncByteArrayOutputStream Overflow Conscious
> Key: HIVE-21371
> URL: https://issues.apache.org/jira/browse/HIVE-21371
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 4.0.0, 3.2.0
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Major
> Attachments: HIVE-21371.1.patch, HIVE-21371.2.patch
>
> {code:java|title=NonSyncByteArrayOutputStream}
> private int enLargeBuffer(int increment) {
>   int temp = count + increment;
>   int newLen = temp;
>   if (temp > buf.length) {
>     if ((buf.length << 1) > temp) {
>       newLen = buf.length << 1;
>     }
>     byte newbuf[] = new byte[newLen];
>     System.arraycopy(buf, 0, newbuf, 0, count);
>     buf = newbuf;
>   }
>   return newLen;
> }
> {code}
> This will fail as the buffer approaches 2 GB, because it doubles the size
> every time without consideration for the Integer.MAX_VALUE (~2 GB) limit on Java arrays. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
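The enLargeBuffer() snippet above doubles the buffer unconditionally. A minimal overflow-conscious sketch of the idea (not the actual HIVE-21371 patch; the names newCapacity, curLen, and MAX_ARRAY_SIZE are illustrative, and the cap value follows the JDK's common Integer.MAX_VALUE - 8 convention) widens the arithmetic to long and clamps at the array-size limit:

```java
public class OverflowConsciousBuffer {
    // Largest array size most JVMs will actually allocate (JDK convention).
    private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    /**
     * Computes a new capacity for a buffer of curLen bytes holding count
     * bytes, when increment more bytes are needed. Doubles when possible,
     * but clamps at MAX_ARRAY_SIZE instead of overflowing int the way the
     * enLargeBuffer() shown above can.
     */
    static int newCapacity(int count, int increment, int curLen) {
        long needed = (long) count + increment;   // widened: cannot overflow
        if (needed > MAX_ARRAY_SIZE) {
            throw new OutOfMemoryError("Required array size too large: " + needed);
        }
        long doubled = (long) curLen << 1;        // widened: stays positive past 1 GB
        return (int) Math.min(Math.max(doubled, needed), MAX_ARRAY_SIZE);
    }

    public static void main(String[] args) {
        // Small buffers double as before.
        System.out.println(newCapacity(10, 5, 16));
        // A 1.5 GB buffer no longer doubles into a negative int; it clamps at the cap.
        System.out.println(newCapacity(1_500_000_000, 600_000_000, 1_500_000_000));
    }
}
```

Widening to long before comparing is the key change: the original int shift (buf.length << 1) silently wraps negative once the buffer passes 1 GB.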
[jira] [Updated] (HIVE-21371) Make NonSyncByteArrayOutputStream Overflow Conscious
[ https://issues.apache.org/jira/browse/HIVE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HIVE-21371: -- Attachment: HIVE-21371.2.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21371) Make NonSyncByteArrayOutputStream Overflow Conscious
[ https://issues.apache.org/jira/browse/HIVE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HIVE-21371: -- Status: Open (was: Patch Available) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21367) Hive returns an incorrect result when using a simple select query
[ https://issues.apache.org/jira/browse/HIVE-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783485#comment-16783485 ] Sofia commented on HIVE-21367: -- The target table is populated from two different sources:
* {color:#33}*From SQOOP*{color}: when loading tables we use the following command.
{code:java}
sqoop import --connect ${CONNECTION} \
  --username ${USER} \
  --password ${PASSWORD} \
  --table $1 \
  --hive-database $2 \
  --hive-table ${TBNAME} \
  --hive-import \
  --as-orcfile \
  --hive-overwrite \
  -m 1 \
  --delete-target-dir
{code}
* *From SPARK*: when processing the data, we store the output as a table in Hive using the following code.
{code:java}
df.write
  .mode(mode)
  .format(HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR)
  .option("table", tableName)
  .save()
{code}
How do we load the data into the root path of the target table in each case?
> Hive returns an incorrect result when using a simple select query
> Key: HIVE-21367
> URL: https://issues.apache.org/jira/browse/HIVE-21367
> Project: Hive
> Issue Type: Bug
> Components: Hive, HiveServer2, JDBC, SQL
> Affects Versions: 3.1.0
> Environment: - HDP 3.1
> - Hive 3.1.0
> - Spark 2.3.2
> - Sqoop 1.4.7
> Reporter: LEMBARKI Mohamed Amine
> Priority: Blocker
> Attachments: mapred_input_dir_recursive.png
>
> Hive returns an incorrect result when using a simple select query with a where clause, while with an aggregation it returns a correct result.
> The problem arises for tables created by Spark or Sqoop.
> Also, when we use spark-shell with HiveWarehouseConnector it returns a correct result.
>
> Workflow:
> - Loading data with Sqoop to Hive
> - Data processing with Spark using HiveWarehouseConnector and storage to Hive
>
> Below is the error log:
>
> */-*
> *1 - Executing Query : select code from db1.tbl1 where code = '123'*
> */-*
> {code:java}
> [data@data1 ~]$ hive -e "select code from db1.tbl1 where code = '123'"
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. > SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] > Connecting to > jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2 > 19/03/01 10:31:36 [main]: INFO jdbc.HiveConnection: Connected to data2:1 > Connected to: Apache Hive (version 3.1.0.3.1.0.0-78) > Driver: Hive JDBC (version 3.1.0.3.1.0.0-78) > Transaction isolation: TRANSACTION_REPEATABLE_READ > INFO : Compiling > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): > select code from db1.tbl1 where code = '123' > INFO : Semantic Analysis Completed (retrial = false) > INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:code, > type:string, comment:null)], properties:null) > INFO : Completed compiling > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); > Time taken: 0.142 seconds > INFO : Executing > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): > select code from db1.tbl1 where code = '123' > INFO : Completed executing > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); > Time taken: 0.003 seconds > INFO : OK > +--+ > | code | > +--+ > +--+ > No rows selected (4,307 seconds) > Beeline version 3.1.0.3.1.0.0-78 by Apache Hive > Closing: 0: > jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2 > {code} > */-* > *2 - Executing Query using count :* > *select count(code) from db1.tbl1 where code = '123'* > */-* > {code:java} > [data@data1 ~]$ hive -e "select count(code) 
from db1.tbl1 where code = '123'" > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. > SLF4J: Actual binding is of type
[jira] [Updated] (HIVE-21001) Upgrade to calcite-1.18
[ https://issues.apache.org/jira/browse/HIVE-21001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-21001: Attachment: HIVE-21001.42.patch > Upgrade to calcite-1.18 > --- > > Key: HIVE-21001 > URL: https://issues.apache.org/jira/browse/HIVE-21001 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Attachments: HIVE-21001.01.patch, HIVE-21001.01.patch, > HIVE-21001.02.patch, HIVE-21001.03.patch, HIVE-21001.04.patch, > HIVE-21001.05.patch, HIVE-21001.06.patch, HIVE-21001.06.patch, > HIVE-21001.07.patch, HIVE-21001.08.patch, HIVE-21001.08.patch, > HIVE-21001.08.patch, HIVE-21001.09.patch, HIVE-21001.09.patch, > HIVE-21001.09.patch, HIVE-21001.10.patch, HIVE-21001.11.patch, > HIVE-21001.12.patch, HIVE-21001.13.patch, HIVE-21001.15.patch, > HIVE-21001.16.patch, HIVE-21001.17.patch, HIVE-21001.18.patch, > HIVE-21001.18.patch, HIVE-21001.19.patch, HIVE-21001.20.patch, > HIVE-21001.21.patch, HIVE-21001.22.patch, HIVE-21001.22.patch, > HIVE-21001.22.patch, HIVE-21001.23.patch, HIVE-21001.24.patch, > HIVE-21001.26.patch, HIVE-21001.26.patch, HIVE-21001.26.patch, > HIVE-21001.26.patch, HIVE-21001.26.patch, HIVE-21001.27.patch, > HIVE-21001.28.patch, HIVE-21001.29.patch, HIVE-21001.29.patch, > HIVE-21001.30.patch, HIVE-21001.31.patch, HIVE-21001.32.patch, > HIVE-21001.34.patch, HIVE-21001.35.patch, HIVE-21001.36.patch, > HIVE-21001.37.patch, HIVE-21001.38.patch, HIVE-21001.39.patch, > HIVE-21001.40.patch, HIVE-21001.41.patch, HIVE-21001.42.patch > > > XLEAR LIBRARY CACHE -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21367) Hive returns an incorrect result when using a simple select query
[ https://issues.apache.org/jira/browse/HIVE-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783445#comment-16783445 ] star commented on HIVE-21367: - It seems the property only takes effect in MapReduce, not in FetchTask. I have to figure out why Hive doesn't support such a configuration; maybe there are other considerations I haven't noticed yet. By the way, why do you create subdirectories when using Sqoop? You could load the data to the root path of the target table.
> Hive returns an incorrect result when using a simple select query
> Key: HIVE-21367
> URL: https://issues.apache.org/jira/browse/HIVE-21367
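For context, the session settings being discussed in this thread are typically applied like this. This is a sketch, not a confirmed fix for this issue; whether FetchTask honors the recursive setting is precisely the open question here, and disabling fetch-task conversion is a commonly suggested workaround, not something verified in this report.

```sql
SET mapred.input.dir.recursive=true;
SET hive.mapred.supports.subdirectories=true;
-- Workaround if FetchTask ignores subdirectories: disable the fetch-task
-- conversion so the query runs as a real MapReduce/Tez job.
SET hive.fetch.task.conversion=none;
```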
[jira] [Commented] (HIVE-21367) Hive returns an incorrect result when using a simple select query
[ https://issues.apache.org/jira/browse/HIVE-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783299#comment-16783299 ] LEMBARKI Mohamed Amine commented on HIVE-21367: --- Hi, we've set the property mapred.input.dir.recursive to true using Ambari, but unfortunately the problem is still the same. Does this property also apply to FetchTask? !mapred_input_dir_recursive.png!
> Hive returns an incorrect result when using a simple select query
> Key: HIVE-21367
> URL: https://issues.apache.org/jira/browse/HIVE-21367
[jira] [Updated] (HIVE-21367) Hive returns an incorrect result when using a simple select query
[ https://issues.apache.org/jira/browse/HIVE-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] LEMBARKI Mohamed Amine updated HIVE-21367: -- Attachment: mapred_input_dir_recursive.png
[jira] [Updated] (HIVE-11091) Unable to load data into hive table using "Load data local inapth" command from unix named pipe
[ https://issues.apache.org/jira/browse/HIVE-11091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexandros updated HIVE-11091: -- Priority: Critical (was: Blocker) > Unable to load data into hive table using "Load data local inapth" command > from unix named pipe > --- > > Key: HIVE-11091 > URL: https://issues.apache.org/jira/browse/HIVE-11091 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 0.14.0 > Environment: Unix,MacOS >Reporter: Manoranjan Sahoo >Assignee: Alexandros >Priority: Critical > > Unable to load data into hive table from unix named pipe in Hive 0.14.0 > Please find below the execution details in env ( Hadoop2.6.0 + Hive 0.14.0): > > $ mkfifo /tmp/test.txt > $ hive > hive> create table test(id bigint,name string); > OK > Time taken: 1.018 seconds > hive> LOAD DATA LOCAL INPATH '/tmp/test.txt' OVERWRITE INTO TABLE test; > Loading data to table default.test > Failed with exception addFiles: filesystem error in check phase > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.MoveTask > But in Hadoop 1.3 and hive 0.11.0 it works fine: > hive> LOAD DATA LOCAL INPATH '/tmp/test.txt' OVERWRITE INTO TABLE test; > Copying data from file:/tmp/test.txt > Copying file: file:/tmp/test.txt -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21377) Using Oracle as HMS DB with DirectSQL
[ https://issues.apache.org/jira/browse/HIVE-21377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783274#comment-16783274 ] Peter Vary commented on HIVE-21377: --- [~hibosoon]: Which version of Oracle, and which version of the JDBC driver do you use? Some time ago I tested this codepath against our Oracle setup (I do not remember the actual version), and it seemed to work. CC: [~karthik.manamcheri] > Using Oracle as HMS DB with DirectSQL > - > > Key: HIVE-21377 > URL: https://issues.apache.org/jira/browse/HIVE-21377 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 3.0.0, 3.1.0 >Reporter: Bo >Priority: Major > > When we use Oracle as the HMS DB, we see this kind of content in the HMS log: > {code:java} > 2019-02-02 T08:23:57,102 WARN [Thread-12]: metastore.ObjectStore > (ObjectStore.java:handleDirectSqlError(3741)) - Falling back to ORM path due > to direct SQL failure (this is not an error): Cannot extract boolean from > column value 0 at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.extractSqlBoolean(MetaStoreDirectSql.java:1031) > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsFromPartitionIds(MetaStoreDirectSql.java:728) > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.access$300(MetaStoreDirectSql.java:109) > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql$1.run(MetaStoreDirectSql.java:471) > at org.apache.hadoop.hive.metastore.Batchable.runBatched(Batchable.java:73) > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:462) > at > org.apache.hadoop.hive.metastore.ObjectStore$8.getSqlResult(ObjectStore.java:3392) > {code} > In Hive, extractSqlBoolean handles Postgres, MySQL and Derby, but Oracle returns 0 or 1 for booleans, so we need to modify MetastoreDirectSqlUtils.java - [1] > So, could we add this snippet to the code?
> {code:java}
> static Boolean extractSqlBoolean(Object value) throws MetaException {
>   if (value == null) {
>     return null;
>   }
>   if (value instanceof Boolean) {
>     return (Boolean) value;
>   }
>   if (value instanceof Number) { // add: handle numeric 0/1 from Oracle
>     try {
>       return BooleanUtils.toBooleanObject(Integer.valueOf(((Number) value).intValue()), Integer.valueOf(1), Integer.valueOf(0), null);
>     } catch (IllegalArgumentException iae) {
>       // NOOP
>     }
>   }
>   if (value instanceof String) {
>     try {
>       return BooleanUtils.toBooleanObject((String) value, "Y", "N", null);
>     } catch (IllegalArgumentException iae) {
>       // NOOP
>     }
>   }
>   throw new MetaException("Cannot extract boolean from column value " + value);
> }
> {code}
> [1] - > https://github.com/apache/hive/blob/f51f108b761f0c88647f48f30447dae12b308f31/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreDirectSqlUtils.java#L501-L527 > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
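The intent of the snippet above can be sketched as a standalone class (the class name `SqlBooleanSketch` is hypothetical, and a plain `IllegalArgumentException` stands in for Hive's `MetaException` so it compiles without Hive or commons-lang on the classpath):

```java
import java.math.BigDecimal;

// Standalone sketch of the proposed mapping (hypothetical names, not Hive's
// actual code): accept Boolean values, numeric 0/1 as Oracle returns for
// NUMBER(1) columns, and "Y"/"N" strings; null stays null, anything else
// is rejected.
public class SqlBooleanSketch {
    static Boolean extractSqlBoolean(Object value) {
        if (value == null) {
            return null;
        }
        if (value instanceof Boolean) {
            return (Boolean) value;
        }
        if (value instanceof Number) {
            int i = ((Number) value).intValue();
            if (i == 0) {
                return Boolean.FALSE;
            }
            if (i == 1) {
                return Boolean.TRUE;
            }
        }
        if (value instanceof String) {
            if ("Y".equalsIgnoreCase((String) value)) {
                return Boolean.TRUE;
            }
            if ("N".equalsIgnoreCase((String) value)) {
                return Boolean.FALSE;
            }
        }
        throw new IllegalArgumentException("Cannot extract boolean from column value " + value);
    }

    public static void main(String[] args) {
        // Oracle's JDBC driver typically returns NUMBER columns as BigDecimal.
        System.out.println(extractSqlBoolean(BigDecimal.ONE));  // true
        System.out.println(extractSqlBoolean("N"));             // false
    }
}
```

The `Number` branch deliberately goes through `intValue()` so that `BigDecimal`, `Integer`, and `Long` column values are all handled the same way.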
[jira] [Updated] (HIVE-11091) Unable to load data into hive table using "Load data local inapth" command from unix named pipe
[ https://issues.apache.org/jira/browse/HIVE-11091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexandros updated HIVE-11091: -- Priority: Blocker (was: Critical) > Unable to load data into hive table using "Load data local inapth" command > from unix named pipe > --- > > Key: HIVE-11091 > URL: https://issues.apache.org/jira/browse/HIVE-11091 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 0.14.0 > Environment: Unix,MacOS >Reporter: Manoranjan Sahoo >Assignee: Alexandros >Priority: Blocker > > Unable to load data into hive table from unix named pipe in Hive 0.14.0 > Please find below the execution details in env ( Hadoop2.6.0 + Hive 0.14.0): > > $ mkfifo /tmp/test.txt > $ hive > hive> create table test(id bigint,name string); > OK > Time taken: 1.018 seconds > hive> LOAD DATA LOCAL INPATH '/tmp/test.txt' OVERWRITE INTO TABLE test; > Loading data to table default.test > Failed with exception addFiles: filesystem error in check phase > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.MoveTask > But in Hadoop 1.3 and hive 0.11.0 it works fine: > hive> LOAD DATA LOCAL INPATH '/tmp/test.txt' OVERWRITE INTO TABLE test; > Copying data from file:/tmp/test.txt > Copying file: file:/tmp/test.txt -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-11091) Unable to load data into hive table using "Load data local inapth" command from unix named pipe
[ https://issues.apache.org/jira/browse/HIVE-11091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783273#comment-16783273 ] Alexandros commented on HIVE-11091: --- Why Blocker? > Unable to load data into hive table using "Load data local inapth" command > from unix named pipe > --- > > Key: HIVE-11091 > URL: https://issues.apache.org/jira/browse/HIVE-11091 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 0.14.0 > Environment: Unix,MacOS >Reporter: Manoranjan Sahoo >Assignee: Alexandros >Priority: Blocker > > Unable to load data into hive table from unix named pipe in Hive 0.14.0 > Please find below the execution details in env ( Hadoop2.6.0 + Hive 0.14.0): > > $ mkfifo /tmp/test.txt > $ hive > hive> create table test(id bigint,name string); > OK > Time taken: 1.018 seconds > hive> LOAD DATA LOCAL INPATH '/tmp/test.txt' OVERWRITE INTO TABLE test; > Loading data to table default.test > Failed with exception addFiles: filesystem error in check phase > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.MoveTask > But in Hadoop 1.3 and hive 0.11.0 it works fine: > hive> LOAD DATA LOCAL INPATH '/tmp/test.txt' OVERWRITE INTO TABLE test; > Copying data from file:/tmp/test.txt > Copying file: file:/tmp/test.txt -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HIVE-11091) Unable to load data into hive table using "Load data local inapth" command from unix named pipe
[ https://issues.apache.org/jira/browse/HIVE-11091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexandros reassigned HIVE-11091: - Assignee: Alexandros > Unable to load data into hive table using "Load data local inapth" command > from unix named pipe > --- > > Key: HIVE-11091 > URL: https://issues.apache.org/jira/browse/HIVE-11091 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 0.14.0 > Environment: Unix,MacOS >Reporter: Manoranjan Sahoo >Assignee: Alexandros >Priority: Blocker > > Unable to load data into hive table from unix named pipe in Hive 0.14.0 > Please find below the execution details in env ( Hadoop2.6.0 + Hive 0.14.0): > > $ mkfifo /tmp/test.txt > $ hive > hive> create table test(id bigint,name string); > OK > Time taken: 1.018 seconds > hive> LOAD DATA LOCAL INPATH '/tmp/test.txt' OVERWRITE INTO TABLE test; > Loading data to table default.test > Failed with exception addFiles: filesystem error in check phase > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.MoveTask > But in Hadoop 1.3 and hive 0.11.0 it works fine: > hive> LOAD DATA LOCAL INPATH '/tmp/test.txt' OVERWRITE INTO TABLE test; > Copying data from file:/tmp/test.txt > Copying file: file:/tmp/test.txt -- This message was sent by Atlassian JIRA (v7.6.3#76005)
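Since LOAD DATA LOCAL INPATH copies the source file, a FIFO cannot be loaded directly; one workaround is to drain the pipe into a regular file first and load that file instead. A minimal sketch of the draining step (paths are hypothetical; the Hive invocation is shown only as a comment):

```shell
# Drain a named pipe into a regular file before handing it to Hive.
# Paths are hypothetical; adjust to your environment.
PIPE=/tmp/test_pipe
OUT=/tmp/test_data.txt
rm -f "$PIPE" "$OUT"
mkfifo "$PIPE"
printf '1\talice\n2\tbob\n' > "$PIPE" &   # producer writes into the pipe
cat "$PIPE" > "$OUT"                      # reader drains it to a plain file
wait
rm -f "$PIPE"
# Now a regular file exists that LOAD DATA can copy:
#   hive -e "LOAD DATA LOCAL INPATH '/tmp/test_data.txt' OVERWRITE INTO TABLE test"
```

The writer blocks on opening the FIFO until `cat` opens the read end, so the producer is backgrounded and `wait` joins it after the drain completes.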
[jira] [Commented] (HIVE-21367) Hive returns an incorrect result when using a simple select query
[ https://issues.apache.org/jira/browse/HIVE-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783264#comment-16783264 ] star commented on HIVE-21367: - Basically Hive turns a simple select into a 'FetchTask', which is executed locally (no MapReduce task), while a complicated select is executed as a MapReduce (or Tez) task, which supports subdirs. FetchTask differs from MapReduce here. Setting mapred.input.dir.recursive to true in hive-site.xml is expected to solve the problem. > Hive returns an incorrect result when using a simple select query > - > > Key: HIVE-21367 > URL: https://issues.apache.org/jira/browse/HIVE-21367 > Project: Hive > Issue Type: Bug > Components: Hive, HiveServer2, JDBC, SQL >Affects Versions: 3.1.0 > Environment: - HDP 3.1 > - Hive 3.1.0 > - Spark 2.3.2 > - Sqoop 1.4.7 >Reporter: LEMBARKI Mohamed Amine >Priority: Blocker > > Hive returns an incorrect result when using a simple select query with a > where clause > While with an aggregation it returns a correct result > The problem arises for tables created by Spark or Sqoop > Also when we use spark-shell with HiveWarehouseConnector it returns a correct > result > > Workflow: > - Loading data with sqoop to hive > - Data processing with spark using HiveWarehouseConnector and Storage to > Hive > > below the error log : > > */-* > *1 - Executing Query : select code from db1.tbl1 where code = '123'* > */-* > {code:java} > [data@data1 ~]$ hive -e "select code from db1.tbl1 where code = '123'" > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. 
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] > Connecting to > jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2 > 19/03/01 10:31:36 [main]: INFO jdbc.HiveConnection: Connected to data2:1 > Connected to: Apache Hive (version 3.1.0.3.1.0.0-78) > Driver: Hive JDBC (version 3.1.0.3.1.0.0-78) > Transaction isolation: TRANSACTION_REPEATABLE_READ > INFO : Compiling > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): > select code from db1.tbl1 where code = '123' > INFO : Semantic Analysis Completed (retrial = false) > INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:code, > type:string, comment:null)], properties:null) > INFO : Completed compiling > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); > Time taken: 0.142 seconds > INFO : Executing > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): > select code from db1.tbl1 where code = '123' > INFO : Completed executing > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); > Time taken: 0.003 seconds > INFO : OK > +--+ > | code | > +--+ > +--+ > No rows selected (4,307 seconds) > Beeline version 3.1.0.3.1.0.0-78 by Apache Hive > Closing: 0: > jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2 > {code} > */-* > *2 - Executing Query using count :* > *select count(code) from db1.tbl1 where code = '123'* > */-* > {code:java} > [data@data1 ~]$ hive -e "select count(code) from db1.tbl1 where code = '123'" > SLF4J: Class path contains multiple SLF4J bindings. 
> SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. > SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] > Connecting to > jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2 > 19/03/01 10:31:56 [main]: INFO jdbc.HiveConnection: Connected to data2:1 > Connected to: Apache Hive (version 3.1.0.3.1.0.0-78) > Driver: Hive JDBC (version 3.1.0.3.1.0.0-78) > Transaction isolation: TRANSACTION_REPEATABLE_READ > INFO
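If the subdirectory explanation above holds, the suggested setting would look like this in hive-site.xml; this is a sketch of the configuration being discussed, not a verified fix, and hive.mapred.supports.subdirectories is an additional related property that may also be needed:

```xml
<!-- hive-site.xml: let file-based input (including FetchTask reads)
     recurse into subdirectories such as delta_* dirs -->
<property>
  <name>mapred.input.dir.recursive</name>
  <value>true</value>
</property>
<property>
  <name>hive.mapred.supports.subdirectories</name>
  <value>true</value>
</property>
```

The same settings can be tried per session from beeline with SET mapred.input.dir.recursive=true; and SET hive.mapred.supports.subdirectories=true; before running the select.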
[jira] [Commented] (HIVE-21367) Hive returns an incorrect result when using a simple select query
[ https://issues.apache.org/jira/browse/HIVE-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783236#comment-16783236 ] Sofia commented on HIVE-21367: -- Hi [~starphin], why does Hive behave that way and create subdirs when executing a simple select? Is there any workaround for that? > Hive returns an incorrect result when using a simple select query > - > > Key: HIVE-21367 > URL: https://issues.apache.org/jira/browse/HIVE-21367 > Project: Hive > Issue Type: Bug > Components: Hive, HiveServer2, JDBC, SQL >Affects Versions: 3.1.0 > Environment: - HDP 3.1 > - Hive 3.1.0 > - Spark 2.3.2 > - Sqoop 1.4.7 >Reporter: LEMBARKI Mohamed Amine >Priority: Blocker > > Hive returns an incorrect result when using a simple select query with a > where clause > While with an aggregation it returns a correct result > The problem arises for tables created by Spark or Sqoop > Also when we use spark-shell with HiveWarehouseConnector it returns a correct > result > > Workflow: > - Loading data with sqoop to hive > - Data processing with spark using HiveWarehouseConnector and Storage to > Hive > > below the error log : > > */-* > *1 - Executing Query : select code from db1.tbl1 where code = '123'* > */-* > {code:java} > [data@data1 ~]$ hive -e "select code from db1.tbl1 where code = '123'" > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. 
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] > Connecting to > jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2 > 19/03/01 10:31:36 [main]: INFO jdbc.HiveConnection: Connected to data2:1 > Connected to: Apache Hive (version 3.1.0.3.1.0.0-78) > Driver: Hive JDBC (version 3.1.0.3.1.0.0-78) > Transaction isolation: TRANSACTION_REPEATABLE_READ > INFO : Compiling > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): > select code from db1.tbl1 where code = '123' > INFO : Semantic Analysis Completed (retrial = false) > INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:code, > type:string, comment:null)], properties:null) > INFO : Completed compiling > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); > Time taken: 0.142 seconds > INFO : Executing > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): > select code from db1.tbl1 where code = '123' > INFO : Completed executing > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); > Time taken: 0.003 seconds > INFO : OK > +--+ > | code | > +--+ > +--+ > No rows selected (4,307 seconds) > Beeline version 3.1.0.3.1.0.0-78 by Apache Hive > Closing: 0: > jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2 > {code} > */-* > *2 - Executing Query using count :* > *select count(code) from db1.tbl1 where code = '123'* > */-* > {code:java} > [data@data1 ~]$ hive -e "select count(code) from db1.tbl1 where code = '123'" > SLF4J: Class path contains multiple SLF4J bindings. 
> SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. > SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] > Connecting to > jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2 > 19/03/01 10:31:56 [main]: INFO jdbc.HiveConnection: Connected to data2:1 > Connected to: Apache Hive (version 3.1.0.3.1.0.0-78) > Driver: Hive JDBC (version 3.1.0.3.1.0.0-78) > Transaction isolation: TRANSACTION_REPEATABLE_READ > INFO : Compiling > command(queryId=hive_20190301103149_90aa338b-b99b-4f1c-b7e5-6b285f64cb3e): > select count(code) from db1.tbl1 where code = '123' > INFO : Semantic Analysis Completed (retrial = false) >
[jira] [Commented] (HIVE-21367) Hive returns an incorrect result when using a simple select query
[ https://issues.apache.org/jira/browse/HIVE-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783205#comment-16783205 ] LEMBARKI Mohamed Amine commented on HIVE-21367: --- Hi, I just moved the files up under tbl1, and it gives a correct result! {code:java} [hdfs@data1 ~]$ hadoop fs -cp /warehouse/tablespace/managed/hive/db1.db/tbl1/delta_001_001_/* /warehouse/tablespace/managed/hive/db1.db/tbl1/ [hdfs@data1 ~]$ hadoop fs -rm -r /warehouse/tablespace/managed/hive/db1.db/tbl1/delta_001_001_ {code} so the question now is: how can Hive support subdirectories? > Hive returns an incorrect result when using a simple select query > - > > Key: HIVE-21367 > URL: https://issues.apache.org/jira/browse/HIVE-21367 > Project: Hive > Issue Type: Bug > Components: Hive, HiveServer2, JDBC, SQL >Affects Versions: 3.1.0 > Environment: - HDP 3.1 > - Hive 3.1.0 > - Spark 2.3.2 > - Sqoop 1.4.7 >Reporter: LEMBARKI Mohamed Amine >Priority: Blocker > > Hive returns an incorrect result when using a simple select query with a > where clause > While with an aggregation it returns a correct result > The problem arises for tables created by Spark or Sqoop > Also when we use spark-shell with HiveWarehouseConnector it returns a correct > result > > Workflow: > - Loading data with sqoop to hive > - Data processing with spark using HiveWarehouseConnector and Storage to > Hive > > below the error log : > > */-* > *1 - Executing Query : select code from db1.tbl1 where code = '123'* > */-* > {code:java} > [data@data1 ~]$ hive -e "select code from db1.tbl1 where code = '123'" > SLF4J: Class path contains multiple SLF4J bindings. 
> SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. > SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] > Connecting to > jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2 > 19/03/01 10:31:36 [main]: INFO jdbc.HiveConnection: Connected to data2:1 > Connected to: Apache Hive (version 3.1.0.3.1.0.0-78) > Driver: Hive JDBC (version 3.1.0.3.1.0.0-78) > Transaction isolation: TRANSACTION_REPEATABLE_READ > INFO : Compiling > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): > select code from db1.tbl1 where code = '123' > INFO : Semantic Analysis Completed (retrial = false) > INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:code, > type:string, comment:null)], properties:null) > INFO : Completed compiling > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); > Time taken: 0.142 seconds > INFO : Executing > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): > select code from db1.tbl1 where code = '123' > INFO : Completed executing > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); > Time taken: 0.003 seconds > INFO : OK > +--+ > | code | > +--+ > +--+ > No rows selected (4,307 seconds) > Beeline version 3.1.0.3.1.0.0-78 by Apache Hive > Closing: 0: > jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2 > {code} > */-* > *2 - Executing Query using count :* > *select count(code) from db1.tbl1 where code = '123'* > */-* > {code:java} > [data@data1 ~]$ hive -e "select count(code) 
from db1.tbl1 where code = '123'" > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. > SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] > Connecting to > jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2 > 19/03/01 10:31:56 [main]: INFO jdbc.HiveConnection: Connected to data2:1 > Connected to: Apache Hive (version
[jira] [Commented] (HIVE-21362) Add an input format and serde to read from protobuf files.
[ https://issues.apache.org/jira/browse/HIVE-21362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783184#comment-16783184 ] Harish Jaiprakash commented on HIVE-21362: -- Test failures are not related; the build cleared once earlier, and the only fixes after that were for codestyle errors. The 'whitespace' errors are in a generated file; not sure how to exclude that. > Add an input format and serde to read from protobuf files. > -- > > Key: HIVE-21362 > URL: https://issues.apache.org/jira/browse/HIVE-21362 > Project: Hive > Issue Type: Task > Components: HiveServer2 >Reporter: Harish Jaiprakash >Assignee: Harish Jaiprakash >Priority: Critical > Attachments: HIVE-21362.01.patch, HIVE-21362.02.patch, > HIVE-21362.03.patch, HIVE-21362.04.patch, HIVE-21362.05.patch > > > Logs are being generated using the HiveProtoLoggingHook and tez > ProtoHistoryLoggingService. These are sequence files written using > ProtobufMessageWritable. > Implement a SerDe and input format to be able to create tables using these > files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21312) FSStatsAggregator::connect is slow
[ https://issues.apache.org/jira/browse/HIVE-21312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783163#comment-16783163 ] Zoltan Haindrich commented on HIVE-21312: - +1 > FSStatsAggregator::connect is slow > -- > > Key: HIVE-21312 > URL: https://issues.apache.org/jira/browse/HIVE-21312 > Project: Hive > Issue Type: Improvement > Components: Statistics >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Trivial > Attachments: HIVE-21312.1.patch, HIVE-21312.2.patch, > HIVE-21312.3.patch, HIVE-21312.4.patch, HIVE-21312.5.patch, HIVE-21312.6.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21362) Add an input format and serde to read from protobuf files.
[ https://issues.apache.org/jira/browse/HIVE-21362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783132#comment-16783132 ] Hive QA commented on HIVE-21362: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12960957/HIVE-21362.05.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:green}SUCCESS:{color} +1 due to 15823 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16322/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16322/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16322/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12960957 - PreCommit-HIVE-Build > Add an input format and serde to read from protobuf files. > -- > > Key: HIVE-21362 > URL: https://issues.apache.org/jira/browse/HIVE-21362 > Project: Hive > Issue Type: Task > Components: HiveServer2 >Reporter: Harish Jaiprakash >Assignee: Harish Jaiprakash >Priority: Critical > Attachments: HIVE-21362.01.patch, HIVE-21362.02.patch, > HIVE-21362.03.patch, HIVE-21362.04.patch, HIVE-21362.05.patch > > > Logs are being generated using the HiveProtoLoggingHook and tez > ProtoHistoryLoggingService. These are sequence files written using > ProtobufMessageWritable. > Implement a SerDe and input format to be able to create tables using these > files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21367) Hive returns an incorrect result when using a simple select query
[ https://issues.apache.org/jira/browse/HIVE-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783114#comment-16783114 ] star commented on HIVE-21367: - Or you can mv the files from the subdirs to the root dir of the table. I suspect that the issue is due to subdirs; Hive does not support subdirs by default. > Hive returns an incorrect result when using a simple select query > - > > Key: HIVE-21367 > URL: https://issues.apache.org/jira/browse/HIVE-21367 > Project: Hive > Issue Type: Bug > Components: Hive, HiveServer2, JDBC, SQL >Affects Versions: 3.1.0 > Environment: - HDP 3.1 > - Hive 3.1.0 > - Spark 2.3.2 > - Sqoop 1.4.7 >Reporter: LEMBARKI Mohamed Amine >Priority: Blocker > > Hive returns an incorrect result when using a simple select query with a > where clause > While with an aggregation it returns a correct result > The problem arises for tables created by Spark or Sqoop > Also when we use spark-shell with HiveWarehouseConnector it returns a correct > result > > Workflow: > - Loading data with sqoop to hive > - Data processing with spark using HiveWarehouseConnector and Storage to > Hive > > below the error log : > > */-* > *1 - Executing Query : select code from db1.tbl1 where code = '123'* > */-* > {code:java} > [data@data1 ~]$ hive -e "select code from db1.tbl1 where code = '123'" > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. 
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] > Connecting to > jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2 > 19/03/01 10:31:36 [main]: INFO jdbc.HiveConnection: Connected to data2:1 > Connected to: Apache Hive (version 3.1.0.3.1.0.0-78) > Driver: Hive JDBC (version 3.1.0.3.1.0.0-78) > Transaction isolation: TRANSACTION_REPEATABLE_READ > INFO : Compiling > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): > select code from db1.tbl1 where code = '123' > INFO : Semantic Analysis Completed (retrial = false) > INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:code, > type:string, comment:null)], properties:null) > INFO : Completed compiling > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); > Time taken: 0.142 seconds > INFO : Executing > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): > select code from db1.tbl1 where code = '123' > INFO : Completed executing > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); > Time taken: 0.003 seconds > INFO : OK > +--+ > | code | > +--+ > +--+ > No rows selected (4,307 seconds) > Beeline version 3.1.0.3.1.0.0-78 by Apache Hive > Closing: 0: > jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2 > {code} > */-* > *2 - Executing Query using count :* > *select count(code) from db1.tbl1 where code = '123'* > */-* > {code:java} > [data@data1 ~]$ hive -e "select count(code) from db1.tbl1 where code = '123'" > SLF4J: Class path contains multiple SLF4J bindings. 
> SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. > SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] > Connecting to > jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2 > 19/03/01 10:31:56 [main]: INFO jdbc.HiveConnection: Connected to data2:1 > Connected to: Apache Hive (version 3.1.0.3.1.0.0-78) > Driver: Hive JDBC (version 3.1.0.3.1.0.0-78) > Transaction isolation: TRANSACTION_REPEATABLE_READ > INFO : Compiling > command(queryId=hive_20190301103149_90aa338b-b99b-4f1c-b7e5-6b285f64cb3e): > select count(code) from db1.tbl1 where code = '123' > INFO : Semantic Analysis Completed (retrial = false)
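star's subdirectory workaround above can be sketched concretely. A minimal sketch, assuming Hive's default behavior of reading only files directly under the table directory; only the table name `db1.tbl1` comes from the report, and the warehouse path is an illustrative placeholder:

{code:sql}
-- Option 1: make Hive descend into subdirectories for this session.
SET mapred.input.dir.recursive=true;
SET hive.mapred.supports.subdirectories=true;
SELECT code FROM db1.tbl1 WHERE code = '123';

-- Option 2 (the mv workaround): flatten the layout instead.
-- The warehouse path below is an illustrative placeholder:
--   hdfs dfs -mv /warehouse/db1.db/tbl1/subdir/* /warehouse/db1.db/tbl1/
{code}

Option 1 leaves the files written by Spark/Sqoop where they are; Option 2 changes the layout once so the default (non-recursive) listing finds the data.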
[jira] [Commented] (HIVE-21362) Add an input format and serde to read from protobuf files.
[ https://issues.apache.org/jira/browse/HIVE-21362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783096#comment-16783096 ] Hive QA commented on HIVE-21362: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 24s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 21s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 36s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 47s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 1m 21s{color} | {color:blue} standalone-metastore/metastore-server in master has 179 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 27s{color} | {color:blue} contrib in master has 10 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 49s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 30s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 32s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 19s{color} | {color:red} itests/hive-unit: The patch generated 1 new + 15 unchanged - 0 fixed = 16 total (was 15) {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 1s{color} | {color:red} The patch has 32 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 25m 33s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc xml compile findbugs checkstyle | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16322/dev-support/hive-personality.sh | | git revision | master / f51f108 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-16322/yetus/diff-checkstyle-itests_hive-unit.txt | | whitespace | http://104.198.109.242/logs//PreCommit-HIVE-Build-16322/yetus/whitespace-eol.txt | | modules | C: standalone-metastore/metastore-server contrib itests/hive-unit U: . | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-16322/yetus.txt | | Powered by | Apache Yetus http://yetus.apache.org | This message was automatically generated. > Add an input format and serde to read from protobuf files. > -- > > Key: HIVE-21362 > URL: https://issues.apache.org/jira/browse/HIVE-21362 > Project: Hive > Issue Type: Task > Components: HiveServer2 >Reporter: Harish Jaiprakash >Assignee: Harish Jaiprakash >Priority: Critical > Attachments: HIVE-21362.01.patch,