[GitHub] [hive] dlavati opened a new pull request #1207: HIVE-23483: Remove DynamicSerDe
dlavati opened a new pull request #1207: URL: https://github.com/apache/hive/pull/1207 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] dgzdot opened a new pull request #1206: HIVE-23802: "merge files" job was submitted to default queue when set h…
dgzdot opened a new pull request #1206: URL: https://github.com/apache/hive/pull/1206 …ive.merge.tezfiles to true ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute
[GitHub] [hive] kgyrtkirk commented on pull request #1196: HIVE-23791: Optimize ACID stats generation
kgyrtkirk commented on pull request #1196: URL: https://github.com/apache/hive/pull/1196#issuecomment-654113119 http://ci.hive.apache.org/job/hive-precommit/job/PR-1196/2/ is valid for this PR; branch indexing has started ~80 ptest runs, which I had to cancel manually...
[GitHub] [hive] nareshpr opened a new pull request #1214: HIVE-23811: deleteReader SARG rowId is not getting validated properly
nareshpr opened a new pull request #1214: URL: https://github.com/apache/hive/pull/1214
[GitHub] [hive] jcamachor commented on a change in pull request #1212: HIVE-23807 Wrong results with vectorization enabled
jcamachor commented on a change in pull request #1212: URL: https://github.com/apache/hive/pull/1212#discussion_r450617751 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastStringToDate.java ## @@ -116,14 +118,16 @@ public void evaluate(VectorizedRowBatch batch) { private void evaluate(LongColumnVector outV, BytesColumnVector inV, int i) { String dateString = new String(inV.vector[i], inV.start[i], inV.length[i], StandardCharsets.UTF_8); -if (dateParser.parseDate(dateString, sqlDate)) { +try { + Date utilDate = Date.valueOf(dateString); Review comment: Is this vectorized expression generated for `GenericUDFDate`? It seems that UDF still relies on DateParser in this branch: https://github.com/apache/hive/blob/b7f3e8ef399f510c3a6780209ebc688ef0acee8f/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDate.java#L112 Since the change may have an effect on parsing of other dates, it may be good to have a test run. It seems we have a jenkins active for branch-2 patches: http://130.211.9.232/job/hive-precommit/job/branch-2/ . I am wondering whether a PR against branch-2 will trigger it or whether we can trigger it manually. Could you verify?
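The behavioral concern in the review above can be illustrated in isolation: `java.sql.Date.valueOf` accepts only strict `yyyy-[m]m-[d]d` strings and throws `IllegalArgumentException` on anything else, which is why a test run against other date formats was requested. This is a standalone sketch, not Hive code:

```java
import java.sql.Date;

// Illustrates the parsing change under discussion: Date.valueOf is
// stricter than a lenient date parser and rejects non-ISO formats.
final class DateValueOf {
    public static void main(String[] args) {
        // Strict yyyy-[m]m-[d]d input parses fine.
        System.out.println(Date.valueOf("2020-07-06")); // 2020-07-06
        try {
            Date.valueOf("07/06/2020"); // not in yyyy-[m]m-[d]d form
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: 07/06/2020");
        }
    }
}
```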
[GitHub] [hive] StefanXiepj closed pull request #1208: Hive 22412: StatsUtils throw NPE when explain
StefanXiepj closed pull request #1208: URL: https://github.com/apache/hive/pull/1208
[GitHub] [hive] StefanXiepj opened a new pull request #1209: HIVE-22412: StatsUtils throw NPE when explain
StefanXiepj opened a new pull request #1209: URL: https://github.com/apache/hive/pull/1209 ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute
[GitHub] [hive] kgyrtkirk opened a new pull request #1210: HIVE-22301: Hive lineage is not generated for insert overwrite queries on partitioned tables
kgyrtkirk opened a new pull request #1210: URL: https://github.com/apache/hive/pull/1210
[GitHub] [hive] belugabehr closed pull request #1153: HIVE-23731: Review of AvroInstance Cache
belugabehr closed pull request #1153: URL: https://github.com/apache/hive/pull/1153
[GitHub] [hive] belugabehr commented on a change in pull request #1211: HIVE-23790: The error message length of 2000 is exceeded for scheduled query
belugabehr commented on a change in pull request #1211: URL: https://github.com/apache/hive/pull/1211#discussion_r450245316 ## File path: standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestMetastoreScheduledQueries.java ## @@ -357,6 +360,17 @@ public void testPoll() throws Exception { assertFalse(pollResult.isSetQuery()); } + private String generateLongErrorMessage() { +StringBuffer sb=new StringBuffer(); Review comment: This is a test, so not super important, but this should be a `StringBuilder`, and please fix the checkstyle issue (there is no space around the `=`). Thanks.
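The suggested fix can be sketched as follows. This is a hypothetical rewrite of the test helper, not the actual patch; the 2000-character threshold comes from the JIRA title, and the loop body text is made up:

```java
// Hypothetical rewrite of generateLongErrorMessage: StringBuilder
// (unsynchronized, preferred for single-threaded use) instead of
// StringBuffer, with checkstyle-friendly spacing around '='.
final class LongErrorMessage {
    static String generateLongErrorMessage() {
        StringBuilder sb = new StringBuilder();
        while (sb.length() < 2500) { // exceed the 2000-char column limit
            sb.append("very long error message ");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(generateLongErrorMessage().length() > 2000);
    }
}
```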
[GitHub] [hive] belugabehr commented on pull request #1209: HIVE-22412: StatsUtils throw NPE when explain
belugabehr commented on pull request #1209: URL: https://github.com/apache/hive/pull/1209#issuecomment-654250436 Thanks for the contribution. I need to look at it more, but if 'null' is not a valid option to pass into the constructors, then it should be prohibited: `Objects#requireNonNull`. Please check where the null value is being passed in, and instead have the calling class pass in a `Collections#emptyList` if possible, or a new `ArrayList` otherwise.
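The advice above (fail fast on null constructor arguments, have callers pass an empty list) can be sketched with a hypothetical stand-in class; `StatsHolder` and its field are illustrative, not Hive code:

```java
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical illustration of the review advice: reject null at the
// constructor boundary rather than tolerating it downstream.
final class StatsHolder {
    private final List<String> columns;

    StatsHolder(List<String> columns) {
        // Objects.requireNonNull throws NullPointerException with the
        // supplied message when 'columns' is null.
        this.columns = Objects.requireNonNull(columns, "columns must not be null");
    }

    public static void main(String[] args) {
        new StatsHolder(Collections.emptyList()); // ok: empty, not null
        try {
            new StatsHolder(null);
        } catch (NullPointerException expected) {
            System.out.println("rejected null: " + expected.getMessage());
        }
    }
}
```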
[GitHub] [hive] kgyrtkirk commented on pull request #1211: HIVE-23790: The error message length of 2000 is exceeded for scheduled query
kgyrtkirk commented on pull request #1211: URL: https://github.com/apache/hive/pull/1211#issuecomment-654263274 I didn't want to get into trouble because of UTF stuff, so I clip the message at 1K. The purpose of this error message field is to give an idea of what the problem was, not to provide a fully detailed backtrace/etc. I think 1K might already be too much, in case some exception puts too much info into the "message" part of the exception.
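Clipping a message at 1K, as described above, amounts to a simple bounded substring. This is a minimal sketch; the helper name and the exact limit handling are hypothetical, not Hive's actual code:

```java
// Minimal sketch of clipping an over-long error message before storing
// it in a length-limited column. MAX_LEN is illustrative.
final class MessageClip {
    private static final int MAX_LEN = 1024; // "1K" from the comment above

    static String clip(String message) {
        if (message == null || message.length() <= MAX_LEN) {
            return message; // already fits; nothing to do
        }
        return message.substring(0, MAX_LEN);
    }

    public static void main(String[] args) {
        String longMsg = "x".repeat(5000);
        System.out.println(clip(longMsg).length()); // 1024
        System.out.println(clip("short"));
    }
}
```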
[GitHub] [hive] kgyrtkirk commented on pull request #1210: HIVE-22301: Hive lineage is not generated for insert overwrite queries on partitioned tables
kgyrtkirk commented on pull request #1210: URL: https://github.com/apache/hive/pull/1210#issuecomment-654235092 I feel that there is a strong correlation between something being `!complete` and `dummy_partition`
[GitHub] [hive] belugabehr closed pull request #1203: HIVE-22674: Replace Base64 in serde Package
belugabehr closed pull request #1203: URL: https://github.com/apache/hive/pull/1203
[GitHub] [hive] jfsii commented on pull request #1119: HIVE-23699: Cleanup HIVEQUERYRESULTFILEFORMAT handling
jfsii commented on pull request #1119: URL: https://github.com/apache/hive/pull/1119#issuecomment-654280518 Rebased, so it should kick off tests again
[GitHub] [hive] belugabehr closed pull request #1088: HIVE-23673: Maven Standard Directories for accumulo-handler
belugabehr closed pull request #1088: URL: https://github.com/apache/hive/pull/1088
[GitHub] [hive] kgyrtkirk opened a new pull request #1211: HIVE-23790: The error message length of 2000 is exceeded for scheduled query
kgyrtkirk opened a new pull request #1211: URL: https://github.com/apache/hive/pull/1211
[GitHub] [hive] belugabehr commented on a change in pull request #1201: HIVE-23797: Throw exception when no metastore found in zookeeper
belugabehr commented on a change in pull request #1201: URL: https://github.com/apache/hive/pull/1201#discussion_r450251647 ## File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java ## @@ -327,6 +327,13 @@ private void resolveUris() throws MetaException { MetaStoreUtils.logAndThrowMetaException(e); } +if (metastoreUrisString.isEmpty() && "zookeeper".equalsIgnoreCase(serviceDiscoveryMode)) { + throw new MetaException("No metastore server available. " + + "Please ensure that at least one metastore server is online"); +} + +LOG.info("Resolved metastore uris: " + Arrays.toString(metastoreUrisString.toArray())); Review comment: Nit: Do not need to convert it to an array first. This will work: `LOG.info("Resolved metastore uris: {}", metastoreUrisString);` ## File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java ## @@ -327,6 +327,13 @@ private void resolveUris() throws MetaException { MetaStoreUtils.logAndThrowMetaException(e); } +if (metastoreUrisString.isEmpty() && "zookeeper".equalsIgnoreCase(serviceDiscoveryMode)) { + throw new MetaException("No metastore server available. " Review comment: Please add some additional context here regarding ZooKeeper. Something like: `No metastore service discovered in ZooKeeper` or `No metastore service currently registered in ZooKeeper`
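The nit above rests on the fact that a `List`'s own `toString` produces the same `[a, b]` form as `Arrays.toString` over its elements, so the collection can be handed straight to the logger's parameterized form. A standalone sketch (plain `println` stands in for the SLF4J `LOG.info` call; the URIs are made up):

```java
import java.util.Arrays;
import java.util.List;

// Shows why the Arrays.toString(...toArray()) round-trip is unnecessary:
// List.toString already yields the identical representation.
final class UriLogging {
    public static void main(String[] args) {
        List<String> uris = List.of("thrift://host1:9083", "thrift://host2:9083");
        String viaArray = Arrays.toString(uris.toArray());
        String direct = uris.toString();
        System.out.println(viaArray.equals(direct)); // same text either way
        System.out.println("Resolved metastore uris: " + direct);
    }
}
```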
[GitHub] [hive] pvary merged pull request #1204: HIVE-23799: Fix AcidUtils.parseBaseOrDeltaBucketFilename handling of data loaded by LOAD DATA
pvary merged pull request #1204: URL: https://github.com/apache/hive/pull/1204
[GitHub] [hive] dengzhhu653 commented on a change in pull request #1201: HIVE-23797: Throw exception when no metastore found in zookeeper
dengzhhu653 commented on a change in pull request #1201: URL: https://github.com/apache/hive/pull/1201#discussion_r450287586 ## File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java ## @@ -327,6 +327,13 @@ private void resolveUris() throws MetaException { MetaStoreUtils.logAndThrowMetaException(e); } +if (metastoreUrisString.isEmpty() && "zookeeper".equalsIgnoreCase(serviceDiscoveryMode)) { + throw new MetaException("No metastore server available. " + + "Please ensure that at least one metastore server is online"); +} + +LOG.info("Resolved metastore uris: " + Arrays.toString(metastoreUrisString.toArray())); Review comment: done
[GitHub] [hive] kgyrtkirk commented on a change in pull request #1161: HIVE-23638: Spotbugs issues in hive-common
kgyrtkirk commented on a change in pull request #1161: URL: https://github.com/apache/hive/pull/1161#discussion_r450299299 ## File path: common/src/java/org/apache/hadoop/hive/conf/Validator.java ## @@ -357,14 +357,15 @@ public String validate(String value) { final Path path = FileSystems.getDefault().getPath(value); if (path == null && value != null) { return String.format("Path '%s' provided could not be located.", value); - } - final boolean isDir = Files.isDirectory(path); - final boolean isWritable = Files.isWritable(path); - if (!isDir) { -return String.format("Path '%s' provided is not a directory.", value); - } - if (!isWritable) { -return String.format("Path '%s' provided is not writable.", value); + } else if (path != null) { +final boolean isDir = Files.isDirectory(path); +final boolean isWritable = Files.isWritable(path); +if (!isDir) { + return String.format("Path '%s' provided is not a directory.", value); +} +if (!isWritable) { + return String.format("Path '%s' provided is not writable.", value); +} } return null; Review comment: I think this should be something else than `null` - did the old behaviour accept `null` as a valid path? I believe `Files.isDirectory(null)` returned false looking from the other end: I don't think `null` should be accepted as a "writeabledirectory" ## File path: common/src/java/org/apache/hadoop/hive/common/FileUtils.java ## @@ -926,8 +925,7 @@ public static File createLocalDirsTempFile(Configuration conf, String prefix, St * delete a temporary file and remove it from delete-on-exit hook. 
*/ public static boolean deleteTmpFile(File tempFile) { -if (tempFile != null) { - tempFile.delete(); +if (tempFile != null && tempFile.delete()) { Review comment: this change will leave the file registered in the shutdownmanager in case the file was already deleted ## File path: common/src/java/org/apache/hadoop/hive/common/type/HiveVarchar.java ## @@ -62,6 +62,9 @@ public boolean equals(Object rhs) { if (rhs == this) { return true; } -return this.getValue().equals(((HiveVarchar)rhs).getValue()); +if (rhs instanceof HiveVarchar) { + return this.getValue().equals(((HiveVarchar) rhs).getValue()); Review comment: nice catch! :D ## File path: common/src/java/org/apache/hive/common/util/SuppressFBWarnings.java ## @@ -0,0 +1,37 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hive.common.util; + +import java.lang.annotation.Retention; +import java.lang.annotation.RetentionPolicy; + +@Retention(RetentionPolicy.CLASS) +public @interface SuppressFBWarnings { +/** + * The set of FindBugs warnings that are to be suppressed in + * annotated element. The value can be a bug category, kind or pattern. 
+ * + */ +String[] value() default {}; + +/** + * Optional documentation of the reason why the warning is suppressed + */ +String justification() default ""; +} Review comment: I think at some point we should probably introduce some rule to ensure that every file ends with a newline...but for now: could you add one here ? :D ## File path: common/src/java/org/apache/hadoop/hive/common/StringInternUtils.java ## @@ -135,10 +135,10 @@ public static Path internUriStringsInPath(Path path) { public static Map internValuesInMap(Map map) { if (map != null) { - for (K key : map.keySet()) { -String value = map.get(key); + for (Map.Entry entry : map.entrySet()) { +String value = entry.getValue(); if (value != null) { - map.put(key, value.intern()); + map.put(entry.getKey(), value.intern()); Review comment: I know it's probably out of scope in this patch - but in case the value is already "interned" - then it will not change; at a cost of a conditional we can skip the hash lookup(put) as well
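The `HiveVarchar.equals` change praised in the review above (guard with `instanceof` before casting) can be sketched with a simplified stand-in class; `VarcharLike` is illustrative, not the real `HiveVarchar`:

```java
// Sketch of the equals() fix: checking instanceof before the cast makes
// comparison against an unrelated type (or null) return false instead of
// throwing ClassCastException.
final class VarcharLike {
    private final String value;

    VarcharLike(String value) { this.value = value; }

    String getValue() { return value; }

    @Override
    public boolean equals(Object rhs) {
        if (rhs == this) {
            return true;
        }
        if (rhs instanceof VarcharLike) {
            return this.getValue().equals(((VarcharLike) rhs).getValue());
        }
        return false; // unrelated type, or null (instanceof null is false)
    }

    @Override
    public int hashCode() { return value.hashCode(); }

    public static void main(String[] args) {
        VarcharLike a = new VarcharLike("abc");
        System.out.println(a.equals(new VarcharLike("abc"))); // true
        System.out.println(a.equals("abc")); // false: a String, not VarcharLike
        System.out.println(a.equals(null)); // false, no exception
    }
}
```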
[GitHub] [hive] pvargacl commented on a change in pull request #1087: HIVE-23671: MSCK repair should handle transactional tables
pvargacl commented on a change in pull request #1087: URL: https://github.com/apache/hive/pull/1087#discussion_r450317451 ## File path: ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveMetaStoreChecker.java ## @@ -74,21 +76,21 @@ @Before public void setUp() throws Exception { hive = Hive.get(); - hive.getConf().setIntVar(HiveConf.ConfVars.METASTORE_FS_HANDLER_THREADS_COUNT, 15); -hive.getConf().set(HiveConf.ConfVars.HIVE_MSCK_PATH_VALIDATION.varname, "throw"); + hive.getConf().set(MetastoreConf.ConfVars.FS_HANDLER_THREADS_COUNT.getVarname(), "15"); Review comment: It does not work with MetastoreConf.ConfVars
[GitHub] [hive] pvargacl commented on a change in pull request #1087: HIVE-23671: MSCK repair should handle transactional tables
pvargacl commented on a change in pull request #1087: URL: https://github.com/apache/hive/pull/1087#discussion_r450302033 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java ## @@ -2392,33 +2392,29 @@ public static TableSnapshot getTableSnapshot(Configuration conf, long writeId = -1; ValidWriteIdList validWriteIdList = null; -HiveTxnManager sessionTxnMgr = SessionState.get().getTxnMgr(); -String fullTableName = getFullTableName(dbName, tblName); -if (sessionTxnMgr != null && sessionTxnMgr.getCurrentTxnId() > 0) { - validWriteIdList = getTableValidWriteIdList(conf, fullTableName); - if (isStatsUpdater) { -writeId = SessionState.get().getTxnMgr() != null ? -SessionState.get().getTxnMgr().getAllocatedTableWriteId( - dbName, tblName) : -1; -if (writeId < 1) { - // TODO: this is not ideal... stats updater that doesn't have write ID is currently - // "create table"; writeId would be 0/-1 here. No need to call this w/true. - LOG.debug("Stats updater for {}.{} doesn't have a write ID ({})", - dbName, tblName, writeId); +if (SessionState.get() != null) { + HiveTxnManager sessionTxnMgr = SessionState.get().getTxnMgr(); + String fullTableName = getFullTableName(dbName, tblName); + if (sessionTxnMgr != null && sessionTxnMgr.getCurrentTxnId() > 0) { +validWriteIdList = getTableValidWriteIdList(conf, fullTableName); +if (isStatsUpdater) { + writeId = sessionTxnMgr != null ? sessionTxnMgr.getAllocatedTableWriteId(dbName, tblName) : -1; + if (writeId < 1) { Review comment: The comment said so: "stats updater that doesn't have write ID is currently "create table"; writeId would be 0/-1 here."
[GitHub] [hive] pvargacl commented on a change in pull request #1087: HIVE-23671: MSCK repair should handle transactional tables
pvargacl commented on a change in pull request #1087: URL: https://github.com/apache/hive/pull/1087#discussion_r450326885 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ## @@ -8322,6 +8322,22 @@ public AllocateTableWriteIdsResponse allocate_table_write_ids( return response; } +@Override +public MaxAllocatedTableWriteIdResponse get_max_allocated_table_write_id(MaxAllocatedTableWriteIdRequest rqst) Review comment: All the functions in HMS look like this; I don't want to break the pattern. On second glance, I had to change the seedWriteId and seedTxnId to look like this...
[GitHub] [hive] ramesh0201 commented on a change in pull request #1177: HIVE-23665 Rewrite last_value to first_value to enable streaming results
ramesh0201 commented on a change in pull request #1177: URL: https://github.com/apache/hive/pull/1177#discussion_r450347761 ## File path: ql/src/test/results/clientpositive/llap/vector_ptf_part_simple.q.out ## @@ -314,46 +386,46 @@ POSTHOOK: type: QUERY POSTHOOK: Input: default@vector_ptf_part_simple_orc A masked pattern was here p_mfgr p_name p_retailprice rn r dr fv lv c cs -Manufacturer#2 almond aquamarine rose maroon antique 900.66 1 1 1 900.66 2031.98 8 8 -Manufacturer#2 almond aquamarine rose maroon antique 1698.66 2 1 1 900.66 2031.98 8 8 -Manufacturer#2 almond antique violet turquoise frosted 1800.7 3 1 1 900.66 2031.98 8 8 -Manufacturer#2 almond antique violet chocolate turquoise 1690.68 4 1 1 900.66 2031.98 8 8 -Manufacturer#2 almond antique violet turquoise frosted 1800.7 5 1 1 900.66 2031.98 8 8 -Manufacturer#2 almond antique violet turquoise frosted 1800.7 6 1 1 900.66 2031.98 8 8 -Manufacturer#2 almond aquamarine sandy cyan gainsboro 1000.6 7 1 1 900.66 2031.98 8 8 -Manufacturer#2 almond aquamarine midnight light salmon 2031.98 8 1 1 900.66 2031.98 8 8 Manufacturer#3 almond antique forest lavender goldenrod1190.27 1 1 1 1190.27 1190.27 7 8 -Manufacturer#3 almond antique chartreuse khaki white 99.68 2 1 1 1190.27 1190.27 7 8 -Manufacturer#3 almond antique forest lavender goldenrodNULL3 1 1 1190.27 1190.27 7 8 -Manufacturer#3 almond antique metallic orange dim 55.39 4 1 1 1190.27 1190.27 7 8 -Manufacturer#3 almond antique misty red olive 1922.98 5 1 1 1190.27 1190.27 7 8 -Manufacturer#3 almond antique forest lavender goldenrod590.27 6 1 1 1190.27 1190.27 7 8 -Manufacturer#3 almond antique olive coral navajo 1337.29 7 1 1 1190.27 1190.27 7 8 Manufacturer#3 almond antique forest lavender goldenrod1190.27 8 1 1 1190.27 1190.27 7 8 -Manufacturer#4 almond antique gainsboro frosted violet NULL1 1 1 NULL1290.35 4 6 -Manufacturer#4 almond aquamarine floral ivory bisque NULL2 1 1 NULL1290.35 4 6 -Manufacturer#4 almond antique violet mint lemon1375.42 3 1 1 
NULL1290.35 4 6 -Manufacturer#4 almond aquamarine yellow dodger mint1844.92 4 1 1 NULL1290.35 4 6 -Manufacturer#4 almond aquamarine floral ivory bisque 1206.26 5 1 1 NULL1290.35 4 6 -Manufacturer#4 almond azure aquamarine papaya violet 1290.35 6 1 1 NULL1290.35 4 6 +Manufacturer#3 almond antique olive coral navajo 1337.29 7 1 1 1190.27 1190.27 7 8 +Manufacturer#3 almond antique forest lavender goldenrod590.27 6 1 1 1190.27 1190.27 7 8 +Manufacturer#3 almond antique misty red olive 1922.98 5 1 1 1190.27 1190.27 7 8 +Manufacturer#3 almond antique metallic orange dim 55.39 4 1 1 1190.27 1190.27 7 8 +Manufacturer#3 almond antique forest lavender goldenrodNULL3 1 1 1190.27 1190.27 7 8 +Manufacturer#3 almond antique chartreuse khaki white 99.68 2 1 1 1190.27 1190.27 7 8 +Manufacturer#1 almond aquamarine pink moccasin thistle 1632.66 1 1 1 1632.66 1632.66 11 12 +Manufacturer#1 almond antique chartreuse lavender yellow 1753.76 2 1 1 1632.66 1632.66 11 12 +Manufacturer#1 almond aquamarine pink moccasin thistle 1632.66 3 1 1 1632.66 1632.66 11 12 +Manufacturer#1 almond antique chartreuse lavender yellow 1753.76 4 1 1 1632.66 1632.66 11 12 +Manufacturer#1 almond aquamarine pink moccasin thistle 1632.66 5 1 1 1632.66 1632.66 11 12 +Manufacturer#1 almond antique chartreuse lavender yellow 1753.76 6 1 1 1632.66 1632.66 11 12 +Manufacturer#1 almond antique salmon chartreuse burlywood 1602.59 7 1 1 1632.66 1632.66 11 12 +Manufacturer#1 almond antique burnished rose metallic 1173.15 8 1 1 1632.66 1632.66 11 12 +Manufacturer#1 almond aquamarine burnished black steel 1414.42 9 1 1 1632.66 1632.66 11 12 +Manufacturer#1 almond aquamarine pink moccasin thistle NULL10 1 1 1632.66 1632.66 11 12 +Manufacturer#1 almond antique chartreuse lavender yellow 1753.76 11 1 1
[GitHub] [hive] pvargacl commented on a change in pull request #1087: HIVE-23671: MSCK repair should handle transactional tables
pvargacl commented on a change in pull request #1087: URL: https://github.com/apache/hive/pull/1087#discussion_r450361883 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java ## @@ -229,102 +239,168 @@ public int repair(MsckInfo msckInfo) { throw new MetastoreException(e); } } +if (transactionalTable && !MetaStoreServerUtils.isPartitioned(table)) { + if (result.getMaxWriteId() > 0) { +if (txnId < 0) { + // We need the txnId to check against even if we didn't do the locking + txnId = getMsc().openTxn(getUserName()); +} + +validateAndAddMaxTxnIdAndWriteId(result.getMaxWriteId(), result.getMaxTxnId(), +table.getDbName(), table.getTableName(), txnId); + } +} } success = true; } catch (Exception e) { LOG.warn("Failed to run metacheck: ", e); success = false; - ret = 1; } finally { - if (msckInfo.getResFile() != null) { -BufferedWriter resultOut = null; -try { - Path resFile = new Path(msckInfo.getResFile()); - FileSystem fs = resFile.getFileSystem(getConf()); - resultOut = new BufferedWriter(new OutputStreamWriter(fs.create(resFile))); - - boolean firstWritten = false; - firstWritten |= writeMsckResult(result.getTablesNotInMs(), -"Tables not in metastore:", resultOut, firstWritten); - firstWritten |= writeMsckResult(result.getTablesNotOnFs(), -"Tables missing on filesystem:", resultOut, firstWritten); - firstWritten |= writeMsckResult(result.getPartitionsNotInMs(), -"Partitions not in metastore:", resultOut, firstWritten); - firstWritten |= writeMsckResult(result.getPartitionsNotOnFs(), -"Partitions missing from filesystem:", resultOut, firstWritten); - firstWritten |= writeMsckResult(result.getExpiredPartitions(), -"Expired partitions (retention period: " + partitionExpirySeconds + "s) :", resultOut, firstWritten); - // sorting to stabilize qfile output (msck_repair_drop.q) - Collections.sort(repairOutput); - for (String rout : repairOutput) { -if (firstWritten) { - resultOut.write(terminator); -} else { - 
firstWritten = true; -} -resultOut.write(rout); - } -} catch (IOException e) { - LOG.warn("Failed to save metacheck output: ", e); - ret = 1; -} finally { - if (resultOut != null) { -try { - resultOut.close(); -} catch (IOException e) { - LOG.warn("Failed to close output file: ", e); - ret = 1; -} - } + if (result!=null) { +logResult(result); +if (msckInfo.getResFile() != null) { + success = writeResultToFile(msckInfo, result, repairOutput, partitionExpirySeconds) && success; } } - LOG.info("Tables not in metastore: {}", result.getTablesNotInMs()); - LOG.info("Tables missing on filesystem: {}", result.getTablesNotOnFs()); - LOG.info("Partitions not in metastore: {}", result.getPartitionsNotInMs()); - LOG.info("Partitions missing from filesystem: {}", result.getPartitionsNotOnFs()); - LOG.info("Expired partitions: {}", result.getExpiredPartitions()); - if (acquireLock && txnId > 0) { - if (success) { -try { - LOG.info("txnId: {} succeeded. Committing..", txnId); - getMsc().commitTxn(txnId); -} catch (Exception e) { - LOG.warn("Error while committing txnId: {} for table: {}", txnId, qualifiedTableName, e); - ret = 1; -} - } else { -try { - LOG.info("txnId: {} failed. Aborting..", txnId); - getMsc().abortTxns(Lists.newArrayList(txnId)); -} catch (Exception e) { - LOG.warn("Error while aborting txnId: {} for table: {}", txnId, qualifiedTableName, e); - ret = 1; -} - } + if (txnId > 0) { +success = closeTxn(qualifiedTableName, success, txnId) && success; Review comment: fixed
[GitHub] [hive] pvargacl commented on a change in pull request #1087: HIVE-23671: MSCK repair should handle transactional tables
pvargacl commented on a change in pull request #1087: URL: https://github.com/apache/hive/pull/1087#discussion_r450315397 ## File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java ## @@ -2209,20 +2209,7 @@ public void testAcidOrcWritePreservesFieldNames() throws Exception { * sorts rows in dictionary order */ static List stringifyValues(int[][] rowsIn) { -assert rowsIn.length > 0; -int[][] rows = rowsIn.clone(); -Arrays.sort(rows, new RowComp()); -List rs = new ArrayList(); -for(int[] row : rows) { - assert row.length > 0; - StringBuilder sb = new StringBuilder(); - for(int value : row) { -sb.append(value).append("\t"); - } - sb.setLength(sb.length() - 1); - rs.add(sb.toString()); -} -return rs; +return TxnCommandsBaseForTests.stringifyValues(rowsIn); Review comment: I will do this in a separate Jira; I started it, but it requires more changes.
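The helper being consolidated into TxnCommandsBaseForTests sorts rows lexicographically and renders each as tab-separated values, mimicking Driver.getResults() output. A self-contained sketch of that behavior (the class name and the inline comparator standing in for TestTxnCommands2.RowComp are illustrative, not the actual Hive code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StringifyValuesSketch {

    /** Sorts rows in dictionary order and renders each row as tab-separated values. */
    static List<String> stringifyValues(int[][] rowsIn) {
        assert rowsIn.length > 0;
        int[][] rows = rowsIn.clone();
        // Inline lexicographic comparator, standing in for TestTxnCommands2.RowComp.
        Arrays.sort(rows, (a, b) -> {
            for (int i = 0; i < Math.min(a.length, b.length); i++) {
                int c = Integer.compare(a[i], b[i]);
                if (c != 0) {
                    return c;
                }
            }
            return Integer.compare(a.length, b.length);
        });
        List<String> rs = new ArrayList<>();
        for (int[] row : rows) {
            StringBuilder sb = new StringBuilder();
            for (int value : row) {
                sb.append(value).append('\t');
            }
            sb.setLength(sb.length() - 1); // drop the trailing tab
            rs.add(sb.toString());
        }
        return rs;
    }

    public static void main(String[] args) {
        // {2,1} sorts after {1,2} in dictionary order
        System.out.println(stringifyValues(new int[][]{{2, 1}, {1, 2}}));
    }
}
```

Moving it to the shared base class lets both TestTxnCommands2 and other txn test suites call the same helper instead of duplicating it.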
[GitHub] [hive] pvargacl commented on a change in pull request #1087: HIVE-23671: MSCK repair should handle transactional tables
pvargacl commented on a change in pull request #1087: URL: https://github.com/apache/hive/pull/1087#discussion_r450334396 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java ## @@ -111,24 +120,24 @@ public IMetaStoreClient getMsc() { * @param partitions * List of partition name value pairs, if null or empty check all * partitions - * @param table - * @param result - * Fill this with the results of the check + * @param table Table we want to run the check for. + * @return Results of the check * @throws MetastoreException * Failed to get required information from the metastore. * @throws IOException * Most likely filesystem related */ - public void checkMetastore(String catName, String dbName, String tableName, - List> partitions, Table table, CheckResult result) + public CheckResult checkMetastore(String catName, String dbName, String tableName, + List> partitions, Table table) throws MetastoreException, IOException { - +CheckResult result = new CheckResult(); if (dbName == null || "".equalsIgnoreCase(dbName)) { dbName = Warehouse.DEFAULT_DATABASE_NAME; } try { if (tableName == null || "".equals(tableName)) { +// TODO: I do not think this is used by anything other than tests Review comment: I do not know. If I understand correctly, there is currently no way to call MSCK repair without a table specified, but it seems like someone made some effort to create that feature and tests for it. I don't know if we would ever want that in production (calling MSCK repair for every table seems like a quick way to overwhelm the system). I left this comment here for anybody who tries to make sense of this code.
[GitHub] [hive] pvargacl commented on a change in pull request #1087: HIVE-23671: MSCK repair should handle transactional tables
pvargacl commented on a change in pull request #1087: URL: https://github.com/apache/hive/pull/1087#discussion_r450360016 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java ## @@ -429,6 +451,75 @@ void findUnknownPartitions(Table table, Set partPaths, LOG.debug("Number of partitions not in metastore : " + result.getPartitionsNotInMs().size()); } + /** + * Calculate the maximum seen writeId from the acid directory structure + * @param partPath Path of the partition directory + * @param res Partition result to write the max ids + * @throws IOException ex + */ + private void setMaxTxnAndWriteIdFromPartition(Path partPath, CheckResult.PartitionResult res) throws IOException { +FileSystem fs = partPath.getFileSystem(conf); +FileStatus[] deltaOrBaseFiles = fs.listStatus(partPath, HIDDEN_FILES_PATH_FILTER); + +// Read the writeIds from every base and delta directory and find the max +long maxWriteId = 0L; +long maxVisibilityId = 0L; +for(FileStatus fileStatus : deltaOrBaseFiles) { + if (!fileStatus.isDirectory()) { +continue; + } + long writeId = 0L; + long visibilityId = 0L; + String folder = fileStatus.getPath().getName(); + if (folder.startsWith(BASE_PREFIX)) { +visibilityId = getVisibilityTxnId(folder); +if (visibilityId > 0) { + folder = removeVisibilityTxnId(folder); +} +writeId = Long.parseLong(folder.substring(BASE_PREFIX.length())); + } else if (folder.startsWith(DELTA_PREFIX) || folder.startsWith(DELETE_DELTA_PREFIX)) { +// See AcidUtils.parseDelta +visibilityId = getVisibilityTxnId(folder); +if (visibilityId > 0) { + folder = removeVisibilityTxnId(folder); +} +boolean isDeleteDelta = folder.startsWith(DELETE_DELTA_PREFIX); +String rest = folder.substring((isDeleteDelta ? 
DELETE_DELTA_PREFIX : DELTA_PREFIX).length()); +int split = rest.indexOf('_'); +//split2 may be -1 if no statementId +int split2 = rest.indexOf('_', split + 1); +// We always want the second part (it is either the same or greater if it is a compacted delta) +writeId = split2 == -1 ? Long.parseLong(rest.substring(split + 1)) : Long +.parseLong(rest.substring(split + 1, split2)); + } + if (writeId > maxWriteId) { +maxWriteId = writeId; + } + if (visibilityId > maxVisibilityId) { +maxVisibilityId = visibilityId; + } +} +LOG.debug("Max writeId {}, max txnId {} found in partition {}", maxWriteId, maxVisibilityId, +partPath.toUri().toString()); +res.setMaxWriteId(maxWriteId); +res.setMaxTxnId(maxVisibilityId); + } + private long getVisibilityTxnId(String folder) { +int idxOfVis = folder.indexOf(VISIBILITY_PREFIX); Review comment: fixed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
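The directory-name parsing discussed in this hunk can be sketched as follows. This is a simplified, hypothetical reconstruction of the writeId/visibility-txnId extraction in setMaxTxnAndWriteIdFromPartition; the real code in HiveMetaStoreChecker and AcidUtils.parseDelta does stricter validation, and the class name here is invented for illustration:

```java
public class AcidDirNameSketch {
    static final String BASE_PREFIX = "base_";
    static final String DELTA_PREFIX = "delta_";
    static final String DELETE_DELTA_PREFIX = "delete_delta_";
    static final String VISIBILITY_PREFIX = "_v";

    /** Returns the visibility txnId suffix (_vNNN) of a compacted dir name, or 0 if absent. */
    static long getVisibilityTxnId(String folder) {
        int idx = folder.lastIndexOf(VISIBILITY_PREFIX);
        return idx < 0 ? 0L : Long.parseLong(folder.substring(idx + VISIBILITY_PREFIX.length()));
    }

    /** Returns the max writeId encoded in a base_/delta_/delete_delta_ directory name. */
    static long getMaxWriteId(String folder) {
        int vis = folder.lastIndexOf(VISIBILITY_PREFIX);
        if (vis > 0) {
            folder = folder.substring(0, vis); // strip the _vNNN suffix first
        }
        if (folder.startsWith(BASE_PREFIX)) {
            return Long.parseLong(folder.substring(BASE_PREFIX.length()));
        }
        boolean isDeleteDelta = folder.startsWith(DELETE_DELTA_PREFIX);
        String rest = folder.substring((isDeleteDelta ? DELETE_DELTA_PREFIX : DELTA_PREFIX).length());
        int split = rest.indexOf('_');
        int split2 = rest.indexOf('_', split + 1); // -1 when there is no statementId
        // Always take the second writeId: it is the same, or greater for a compacted delta.
        return split2 == -1 ? Long.parseLong(rest.substring(split + 1))
                            : Long.parseLong(rest.substring(split + 1, split2));
    }

    public static void main(String[] args) {
        System.out.println(getMaxWriteId("delta_101_120_v087"));      // compacted delta
        System.out.println(getVisibilityTxnId("delta_101_120_v087"));
    }
}
```

Scanning every base/delta directory this way lets MSCK seed the metastore's NEXT_WRITE_ID/NEXT_TXN_ID high enough that future writes cannot collide with data already on disk.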
[GitHub] [hive] pvargacl commented on a change in pull request #1087: HIVE-23671: MSCK repair should handle transactional tables
pvargacl commented on a change in pull request #1087: URL: https://github.com/apache/hive/pull/1087#discussion_r450362382 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java ## @@ -229,102 +239,168 @@ public int repair(MsckInfo msckInfo) { throw new MetastoreException(e); } } +if (transactionalTable && !MetaStoreServerUtils.isPartitioned(table)) { + if (result.getMaxWriteId() > 0) { +if (txnId < 0) { + // We need the txnId to check against even if we didn't do the locking + txnId = getMsc().openTxn(getUserName()); +} + +validateAndAddMaxTxnIdAndWriteId(result.getMaxWriteId(), result.getMaxTxnId(), +table.getDbName(), table.getTableName(), txnId); + } +} } success = true; } catch (Exception e) { LOG.warn("Failed to run metacheck: ", e); success = false; - ret = 1; } finally { - if (msckInfo.getResFile() != null) { -BufferedWriter resultOut = null; -try { - Path resFile = new Path(msckInfo.getResFile()); - FileSystem fs = resFile.getFileSystem(getConf()); - resultOut = new BufferedWriter(new OutputStreamWriter(fs.create(resFile))); - - boolean firstWritten = false; - firstWritten |= writeMsckResult(result.getTablesNotInMs(), -"Tables not in metastore:", resultOut, firstWritten); - firstWritten |= writeMsckResult(result.getTablesNotOnFs(), -"Tables missing on filesystem:", resultOut, firstWritten); - firstWritten |= writeMsckResult(result.getPartitionsNotInMs(), -"Partitions not in metastore:", resultOut, firstWritten); - firstWritten |= writeMsckResult(result.getPartitionsNotOnFs(), -"Partitions missing from filesystem:", resultOut, firstWritten); - firstWritten |= writeMsckResult(result.getExpiredPartitions(), -"Expired partitions (retention period: " + partitionExpirySeconds + "s) :", resultOut, firstWritten); - // sorting to stabilize qfile output (msck_repair_drop.q) - Collections.sort(repairOutput); - for (String rout : repairOutput) { -if (firstWritten) { - resultOut.write(terminator); -} else { - 
firstWritten = true; -} -resultOut.write(rout); - } -} catch (IOException e) { - LOG.warn("Failed to save metacheck output: ", e); - ret = 1; -} finally { - if (resultOut != null) { -try { - resultOut.close(); -} catch (IOException e) { - LOG.warn("Failed to close output file: ", e); - ret = 1; -} - } + if (result!=null) { +logResult(result); +if (msckInfo.getResFile() != null) { + success = writeResultToFile(msckInfo, result, repairOutput, partitionExpirySeconds) && success; } } - LOG.info("Tables not in metastore: {}", result.getTablesNotInMs()); - LOG.info("Tables missing on filesystem: {}", result.getTablesNotOnFs()); - LOG.info("Partitions not in metastore: {}", result.getPartitionsNotInMs()); - LOG.info("Partitions missing from filesystem: {}", result.getPartitionsNotOnFs()); - LOG.info("Expired partitions: {}", result.getExpiredPartitions()); - if (acquireLock && txnId > 0) { - if (success) { -try { - LOG.info("txnId: {} succeeded. Committing..", txnId); - getMsc().commitTxn(txnId); -} catch (Exception e) { - LOG.warn("Error while committing txnId: {} for table: {}", txnId, qualifiedTableName, e); - ret = 1; -} - } else { -try { - LOG.info("txnId: {} failed. Aborting..", txnId); - getMsc().abortTxns(Lists.newArrayList(txnId)); -} catch (Exception e) { - LOG.warn("Error while aborting txnId: {} for table: {}", txnId, qualifiedTableName, e); - ret = 1; -} - } + if (txnId > 0) { +success = closeTxn(qualifiedTableName, success, txnId) && success; } if (getMsc() != null) { getMsc().close(); msc = null; } } +return success ? 0 : 1; + } + private boolean closeTxn(String qualifiedTableName, boolean success, long txnId) { +boolean ret = true; +if (success) { + try { +LOG.info("txnId: {} succeeded. Committing..", txnId); +getMsc().commitTxn(txnId); + } catch (Exception e) { +LOG.warn("Error while committing txnId: {} for table: {}", txnId, qualifiedTableName, e); Review comment: fixed ## File path:
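The closeTxn helper introduced in this hunk factors the commit-or-abort decision out of the finally block. A minimal sketch of that pattern, assuming a stubbed client interface (TxnClient and RecordingClient are illustrative stand-ins for IMetaStoreClient, not Hive's actual API):

```java
import java.util.Collections;
import java.util.List;

public class CloseTxnSketch {

    /** Minimal stand-in for the metastore client surface used by closeTxn. */
    interface TxnClient {
        void commitTxn(long txnId) throws Exception;
        void abortTxns(List<Long> txnIds) throws Exception;
    }

    /** In-memory client that records what happened, for demonstration only. */
    static class RecordingClient implements TxnClient {
        String lastAction = "none";
        public void commitTxn(long txnId) { lastAction = "commit:" + txnId; }
        public void abortTxns(List<Long> txnIds) { lastAction = "abort:" + txnIds; }
    }

    /**
     * Commits the txn when the repair succeeded, aborts it otherwise. Returns false
     * only if the commit/abort itself failed (this replaces the inline ret = 1 handling).
     */
    static boolean closeTxn(TxnClient msc, String qualifiedTableName, boolean success, long txnId) {
        try {
            if (success) {
                msc.commitTxn(txnId);
            } else {
                msc.abortTxns(Collections.singletonList(txnId));
            }
            return true;
        } catch (Exception e) {
            System.err.println("Error while closing txnId " + txnId
                + " for table " + qualifiedTableName + ": " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        RecordingClient msc = new RecordingClient();
        closeTxn(msc, "default.t", true, 42L);
        System.out.println(msc.lastAction); // prints commit:42
    }
}
```

Combining the helper's return value with the running success flag (`success = closeTxn(...) && success`) preserves the old behavior where a failed commit or abort also turned the overall repair result into an error.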
[GitHub] [hive] dengzhhu653 commented on a change in pull request #1201: HIVE-23797: Throw exception when no metastore found in zookeeper
dengzhhu653 commented on a change in pull request #1201: URL: https://github.com/apache/hive/pull/1201#discussion_r450288096 ## File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java ## @@ -327,6 +327,13 @@ private void resolveUris() throws MetaException { MetaStoreUtils.logAndThrowMetaException(e); } +if (metastoreUrisString.isEmpty() && "zookeeper".equalsIgnoreCase(serviceDiscoveryMode)) { + throw new MetaException("No metastore server available. " Review comment: done, @belugabehr thank you very much for the review.
[GitHub] [hive] pvargacl commented on a change in pull request #1087: HIVE-23671: MSCK repair should handle transactional tables
pvargacl commented on a change in pull request #1087: URL: https://github.com/apache/hive/pull/1087#discussion_r450301348 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java ## @@ -2392,33 +2392,29 @@ public static TableSnapshot getTableSnapshot(Configuration conf, long writeId = -1; ValidWriteIdList validWriteIdList = null; -HiveTxnManager sessionTxnMgr = SessionState.get().getTxnMgr(); -String fullTableName = getFullTableName(dbName, tblName); -if (sessionTxnMgr != null && sessionTxnMgr.getCurrentTxnId() > 0) { - validWriteIdList = getTableValidWriteIdList(conf, fullTableName); - if (isStatsUpdater) { -writeId = SessionState.get().getTxnMgr() != null ? -SessionState.get().getTxnMgr().getAllocatedTableWriteId( - dbName, tblName) : -1; -if (writeId < 1) { - // TODO: this is not ideal... stats updater that doesn't have write ID is currently - // "create table"; writeId would be 0/-1 here. No need to call this w/true. - LOG.debug("Stats updater for {}.{} doesn't have a write ID ({})", - dbName, tblName, writeId); +if (SessionState.get() != null) { Review comment: It can be null if the SessionState was not set properly; it was failing one of the tests, I can't remember which.
[GitHub] [hive] pvargacl commented on a change in pull request #1087: HIVE-23671: MSCK repair should handle transactional tables
pvargacl commented on a change in pull request #1087: URL: https://github.com/apache/hive/pull/1087#discussion_r450360969 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java ## @@ -229,102 +239,168 @@ public int repair(MsckInfo msckInfo) { throw new MetastoreException(e); } } +if (transactionalTable && !MetaStoreServerUtils.isPartitioned(table)) { + if (result.getMaxWriteId() > 0) { Review comment: fixed
[GitHub] [hive] pvargacl commented on a change in pull request #1087: HIVE-23671: MSCK repair should handle transactional tables
pvargacl commented on a change in pull request #1087: URL: https://github.com/apache/hive/pull/1087#discussion_r450318861 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java ## @@ -2392,33 +2392,29 @@ public static TableSnapshot getTableSnapshot(Configuration conf, long writeId = -1; ValidWriteIdList validWriteIdList = null; -HiveTxnManager sessionTxnMgr = SessionState.get().getTxnMgr(); -String fullTableName = getFullTableName(dbName, tblName); -if (sessionTxnMgr != null && sessionTxnMgr.getCurrentTxnId() > 0) { - validWriteIdList = getTableValidWriteIdList(conf, fullTableName); - if (isStatsUpdater) { -writeId = SessionState.get().getTxnMgr() != null ? -SessionState.get().getTxnMgr().getAllocatedTableWriteId( - dbName, tblName) : -1; -if (writeId < 1) { - // TODO: this is not ideal... stats updater that doesn't have write ID is currently - // "create table"; writeId would be 0/-1 here. No need to call this w/true. - LOG.debug("Stats updater for {}.{} doesn't have a write ID ({})", - dbName, tblName, writeId); +if (SessionState.get() != null) { + HiveTxnManager sessionTxnMgr = SessionState.get().getTxnMgr(); + String fullTableName = getFullTableName(dbName, tblName); + if (sessionTxnMgr != null && sessionTxnMgr.getCurrentTxnId() > 0) { +validWriteIdList = getTableValidWriteIdList(conf, fullTableName); +if (isStatsUpdater) { + writeId = sessionTxnMgr != null ? sessionTxnMgr.getAllocatedTableWriteId(dbName, tblName) : -1; Review comment: fixed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] pvargacl commented on a change in pull request #1087: HIVE-23671: MSCK repair should handle transactional tables
pvargacl commented on a change in pull request #1087: URL: https://github.com/apache/hive/pull/1087#discussion_r450318970 ## File path: ql/src/test/org/apache/hadoop/hive/ql/TxnCommandsBaseForTests.java ## @@ -162,9 +163,23 @@ protected String getWarehouseDir() { * takes raw data and turns it into a string as if from Driver.getResults() * sorts rows in dictionary order */ - List stringifyValues(int[][] rowsIn) { -return TestTxnCommands2.stringifyValues(rowsIn); + public static List stringifyValues(int[][] rowsIn) { +assert rowsIn.length > 0; +int[][] rows = rowsIn.clone(); +Arrays.sort(rows, new TestTxnCommands2.RowComp()); Review comment: fixed ## File path: ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveMetaStoreChecker.java ## @@ -252,37 +241,165 @@ public void testInvalidPartitionKeyName() @Test public void testSkipInvalidPartitionKeyName() throws HiveException, AlreadyExistsException, IOException, MetastoreException { -hive.getConf().set(HiveConf.ConfVars.HIVE_MSCK_PATH_VALIDATION.varname, "skip"); + hive.getConf().set(MetastoreConf.ConfVars.MSCK_PATH_VALIDATION.getVarname(), "skip"); checker = new HiveMetaStoreChecker(msc, hive.getConf()); -Table table = createTestTable(); +Table table = createTestTable(false); List partitions = hive.getPartitions(table); assertEquals(2, partitions.size()); // add a fake partition dir on fs fs = partitions.get(0).getDataLocation().getFileSystem(hive.getConf()); -Path fakePart = -new Path(table.getDataLocation().toString(), "fakedate=2009-01-01/fakecity=sanjose"); -fs.mkdirs(fakePart); -fs.deleteOnExit(fakePart); +addFolderToPath(fs, table.getDataLocation().toString(),"fakedate=2009-01-01/fakecity=sanjose"); createPartitionsDirectoriesOnFS(table, 2); -CheckResult result = new CheckResult(); -checker.checkMetastore(catName, dbName, tableName, null, null, result); +CheckResult result = checker.checkMetastore(catName, dbName, tableName, null, null); assertEquals(Collections. 
emptySet(), result.getTablesNotInMs()); assertEquals(Collections. emptySet(), result.getTablesNotOnFs()); assertEquals(Collections. emptySet(), result.getPartitionsNotOnFs()); // only 2 valid partitions should be added assertEquals(2, result.getPartitionsNotInMs().size()); } - private Table createTestTable() throws HiveException, AlreadyExistsException { + /* + * Tests the case when we have normal delta_dirs in the partition folder + * does not throw HiveException + */ + @Test + public void testAddPartitionNormalDeltas() throws Exception { +Table table = createTestTable(true); +List partitions = hive.getPartitions(table); +assertEquals(2, partitions.size()); +// add a partition dir on fs +fs = partitions.get(0).getDataLocation().getFileSystem(hive.getConf()); +Path newPart = addFolderToPath(fs, table.getDataLocation().toString(), +partDateName + "=2017-01-01/" + partCityName + "=paloalto"); + +// Add a few deltas +addFolderToPath(fs, newPart.toString(), "delta_001_001_"); +addFolderToPath(fs, newPart.toString(), "delta_010_010_"); +addFolderToPath(fs, newPart.toString(), "delta_101_101_"); +CheckResult result = checker.checkMetastore(catName, dbName, tableName, null, null); +assertEquals(Collections. 
emptySet(), result.getPartitionsNotOnFs()); +assertEquals(1, result.getPartitionsNotInMs().size()); +// Found the highest writeId +assertEquals(101, result.getPartitionsNotInMs().iterator().next().getMaxWriteId()); +assertEquals(0, result.getPartitionsNotInMs().iterator().next().getMaxTxnId()); + } + /* + * Tests the case when we have normal delta_dirs in the partition folder + * does not throw HiveException + */ + @Test + public void testAddPartitionCompactedDeltas() throws Exception { +Table table = createTestTable(true); +List partitions = hive.getPartitions(table); +assertEquals(2, partitions.size()); +// add a partition dir on fs +fs = partitions.get(0).getDataLocation().getFileSystem(hive.getConf()); +Path newPart = addFolderToPath(fs, table.getDataLocation().toString(), +partDateName + "=2017-01-01/" + partCityName + "=paloalto"); + +// Add a few deltas +addFolderToPath(fs, newPart.toString(), "delta_001_001_"); +addFolderToPath(fs, newPart.toString(), "delta_010_015_v067"); +addFolderToPath(fs, newPart.toString(), "delta_101_120_v087"); +CheckResult result = checker.checkMetastore(catName, dbName, tableName, null, null); +assertEquals(Collections. emptySet(), result.getPartitionsNotOnFs()); +assertEquals(1, result.getPartitionsNotInMs().size()); +// Found the highest writeId +assertEquals(120,
[GitHub] [hive] pvargacl commented on a change in pull request #1087: HIVE-23671: MSCK repair should handle transactional tables
pvargacl commented on a change in pull request #1087: URL: https://github.com/apache/hive/pull/1087#discussion_r450338348 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java ## @@ -429,6 +451,75 @@ void findUnknownPartitions(Table table, Set partPaths, LOG.debug("Number of partitions not in metastore : " + result.getPartitionsNotInMs().size()); } + /** + * Calculate the maximum seen writeId from the acid directory structure + * @param partPath Path of the partition directory + * @param res Partition result to write the max ids + * @throws IOException ex + */ + private void setMaxTxnAndWriteIdFromPartition(Path partPath, CheckResult.PartitionResult res) throws IOException { +FileSystem fs = partPath.getFileSystem(conf); +FileStatus[] deltaOrBaseFiles = fs.listStatus(partPath, HIDDEN_FILES_PATH_FILTER); + +// Read the writeIds from every base and delta directory and find the max +long maxWriteId = 0L; +long maxVisibilityId = 0L; +for(FileStatus fileStatus : deltaOrBaseFiles) { + if (!fileStatus.isDirectory()) { +continue; + } + long writeId = 0L; + long visibilityId = 0L; + String folder = fileStatus.getPath().getName(); + if (folder.startsWith(BASE_PREFIX)) { +visibilityId = getVisibilityTxnId(folder); +if (visibilityId > 0) { + folder = removeVisibilityTxnId(folder); +} +writeId = Long.parseLong(folder.substring(BASE_PREFIX.length())); + } else if (folder.startsWith(DELTA_PREFIX) || folder.startsWith(DELETE_DELTA_PREFIX)) { +// See AcidUtils.parseDelta +visibilityId = getVisibilityTxnId(folder); +if (visibilityId > 0) { + folder = removeVisibilityTxnId(folder); +} +boolean isDeleteDelta = folder.startsWith(DELETE_DELTA_PREFIX); +String rest = folder.substring((isDeleteDelta ? DELETE_DELTA_PREFIX : DELTA_PREFIX).length()); +int split = rest.indexOf('_'); Review comment: fixed This is an automated message from the Apache Git Service. 
[GitHub] [hive] pvargacl commented on a change in pull request #1087: HIVE-23671: MSCK repair should handle transactional tables
pvargacl commented on a change in pull request #1087: URL: https://github.com/apache/hive/pull/1087#discussion_r450361266 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java ## @@ -229,102 +239,168 @@ public int repair(MsckInfo msckInfo) { throw new MetastoreException(e); } } +if (transactionalTable && !MetaStoreServerUtils.isPartitioned(table)) { + if (result.getMaxWriteId() > 0) { +if (txnId < 0) { + // We need the txnId to check against even if we didn't do the locking + txnId = getMsc().openTxn(getUserName()); +} + +validateAndAddMaxTxnIdAndWriteId(result.getMaxWriteId(), result.getMaxTxnId(), +table.getDbName(), table.getTableName(), txnId); + } +} } success = true; } catch (Exception e) { LOG.warn("Failed to run metacheck: ", e); success = false; - ret = 1; } finally { - if (msckInfo.getResFile() != null) { -BufferedWriter resultOut = null; -try { - Path resFile = new Path(msckInfo.getResFile()); - FileSystem fs = resFile.getFileSystem(getConf()); - resultOut = new BufferedWriter(new OutputStreamWriter(fs.create(resFile))); - - boolean firstWritten = false; - firstWritten |= writeMsckResult(result.getTablesNotInMs(), -"Tables not in metastore:", resultOut, firstWritten); - firstWritten |= writeMsckResult(result.getTablesNotOnFs(), -"Tables missing on filesystem:", resultOut, firstWritten); - firstWritten |= writeMsckResult(result.getPartitionsNotInMs(), -"Partitions not in metastore:", resultOut, firstWritten); - firstWritten |= writeMsckResult(result.getPartitionsNotOnFs(), -"Partitions missing from filesystem:", resultOut, firstWritten); - firstWritten |= writeMsckResult(result.getExpiredPartitions(), -"Expired partitions (retention period: " + partitionExpirySeconds + "s) :", resultOut, firstWritten); - // sorting to stabilize qfile output (msck_repair_drop.q) - Collections.sort(repairOutput); - for (String rout : repairOutput) { -if (firstWritten) { - resultOut.write(terminator); -} else { - 
firstWritten = true; -} -resultOut.write(rout); - } -} catch (IOException e) { - LOG.warn("Failed to save metacheck output: ", e); - ret = 1; -} finally { - if (resultOut != null) { -try { - resultOut.close(); -} catch (IOException e) { - LOG.warn("Failed to close output file: ", e); - ret = 1; -} - } + if (result!=null) { Review comment: fixed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] pvargacl commented on a change in pull request #1087: HIVE-23671: MSCK repair should handle transactional tables
pvargacl commented on a change in pull request #1087: URL: https://github.com/apache/hive/pull/1087#discussion_r450361792 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java ## @@ -229,102 +239,168 @@ public int repair(MsckInfo msckInfo) { throw new MetastoreException(e); } } +if (transactionalTable && !MetaStoreServerUtils.isPartitioned(table)) { + if (result.getMaxWriteId() > 0) { +if (txnId < 0) { + // We need the txnId to check against even if we didn't do the locking + txnId = getMsc().openTxn(getUserName()); +} + +validateAndAddMaxTxnIdAndWriteId(result.getMaxWriteId(), result.getMaxTxnId(), +table.getDbName(), table.getTableName(), txnId); + } +} } success = true; } catch (Exception e) { LOG.warn("Failed to run metacheck: ", e); success = false; - ret = 1; } finally { - if (msckInfo.getResFile() != null) { -BufferedWriter resultOut = null; -try { - Path resFile = new Path(msckInfo.getResFile()); - FileSystem fs = resFile.getFileSystem(getConf()); - resultOut = new BufferedWriter(new OutputStreamWriter(fs.create(resFile))); - - boolean firstWritten = false; - firstWritten |= writeMsckResult(result.getTablesNotInMs(), -"Tables not in metastore:", resultOut, firstWritten); - firstWritten |= writeMsckResult(result.getTablesNotOnFs(), -"Tables missing on filesystem:", resultOut, firstWritten); - firstWritten |= writeMsckResult(result.getPartitionsNotInMs(), -"Partitions not in metastore:", resultOut, firstWritten); - firstWritten |= writeMsckResult(result.getPartitionsNotOnFs(), -"Partitions missing from filesystem:", resultOut, firstWritten); - firstWritten |= writeMsckResult(result.getExpiredPartitions(), -"Expired partitions (retention period: " + partitionExpirySeconds + "s) :", resultOut, firstWritten); - // sorting to stabilize qfile output (msck_repair_drop.q) - Collections.sort(repairOutput); - for (String rout : repairOutput) { -if (firstWritten) { - resultOut.write(terminator); -} else { - 
firstWritten = true; -} -resultOut.write(rout); - } -} catch (IOException e) { - LOG.warn("Failed to save metacheck output: ", e); - ret = 1; -} finally { - if (resultOut != null) { -try { - resultOut.close(); -} catch (IOException e) { - LOG.warn("Failed to close output file: ", e); - ret = 1; -} - } + if (result!=null) { +logResult(result); +if (msckInfo.getResFile() != null) { + success = writeResultToFile(msckInfo, result, repairOutput, partitionExpirySeconds) && success; Review comment: fixed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] StefanXiepj opened a new pull request #1208: Hive 22412: StatsUtils throw NPE when explain
StefanXiepj opened a new pull request #1208: URL: https://github.com/apache/hive/pull/1208 ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute
[GitHub] [hive] pvary merged pull request #1196: HIVE-23791: Optimize ACID stats generation
pvary merged pull request #1196: URL: https://github.com/apache/hive/pull/1196
[GitHub] [hive] xiaomengzhang commented on pull request #1213: HIVE-23765: Use ORC file format by default when creating transactiona…
xiaomengzhang commented on pull request #1213: URL: https://github.com/apache/hive/pull/1213#issuecomment-654462812 @nrg4878
[GitHub] [hive] xiaomengzhang opened a new pull request #1213: HIVE-23765: Use ORC file format by default when creating transactiona…
xiaomengzhang opened a new pull request #1213: URL: https://github.com/apache/hive/pull/1213 …l table When table has property that transactional is 'true' and transactional_properties is not 'insert_only', make fileformat ORC.
[GitHub] [hive] vineetgarg02 opened a new pull request #1212: HIVE-23807 Wrong results with vectorization enabled
vineetgarg02 opened a new pull request #1212: URL: https://github.com/apache/hive/pull/1212
[GitHub] [hive] StefanXiepj commented on pull request #1209: HIVE-22412: StatsUtils throw NPE when explain
StefanXiepj commented on pull request #1209: URL: https://github.com/apache/hive/pull/1209#issuecomment-654557578

> Thanks for the contribution.
>
> I need to look at it more, but if 'null' is not a valid option to pass into the constructors, then it should be prohibited: `Objects#requireNonNull`. Please check where the null value is being passed in and instead have the calling class pass in a `Collections#emptyList` if possible, or a new `ArrayList` otherwise.

Thanks for your review. The details of this exception stack are as follows. For the Map case, see `org.apache.hadoop.hive.ql.stats.StatsUtils#getSizeOfMap`:

```java
public static long getSizeOfMap(StandardConstantMapObjectInspector scmoi) {
  // This map is null
  Map map = scmoi.getWritableConstantValue();
  ObjectInspector koi = scmoi.getMapKeyObjectInspector();
  ObjectInspector voi = scmoi.getMapValueObjectInspector();
  long result = 0;
  // NPE is thrown here
  for (Map.Entry entry : map.entrySet()) {
    result += getWritableSize(koi, entry.getKey());
    result += getWritableSize(voi, entry.getValue());
  }
  ...
}
```

Maybe we can fix it like this:

```java
public static long getSizeOfMap(StandardConstantMapObjectInspector scmoi) {
  Map map = scmoi.getWritableConstantValue();
  // Return zero when the map is null
  if (null == map) {
    return 0;
  }
  ObjectInspector koi = scmoi.getMapKeyObjectInspector();
  ObjectInspector voi = scmoi.getMapValueObjectInspector();
  long result = 0;
  for (Map.Entry entry : map.entrySet()) {
    result += getWritableSize(koi, entry.getKey());
    result += getWritableSize(voi, entry.getValue());
  }
  ...
}
```

We could patch this in StatsUtils, but I don't think that is the best approach; it would be better to fix it by initializing the value in StandardConstantMapObjectInspector, StandardConstantListObjectInspector and StandardConstantStructObjectInspector.
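The null-guard variant of the fix can be demonstrated standalone. This is a hypothetical sketch, not Hive code: `SizeOfMapSketch` and `sizeOf` are invented names, and `sizeOf` merely stands in for `StatsUtils#getWritableSize` with a fixed 8-byte cost per non-null object for illustration:

```java
import java.util.Map;

public class SizeOfMapSketch {
    // Hypothetical stand-in for StatsUtils#getWritableSize; assumes a
    // flat 8-byte cost per non-null object, for illustration only.
    static long sizeOf(Object o) {
        return o == null ? 0 : 8;
    }

    // Mirrors the proposed fix: treat a null constant value as empty
    // instead of letting map.entrySet() throw a NullPointerException.
    static long getSizeOfMap(Map<?, ?> map) {
        if (map == null) {
            return 0;
        }
        long result = 0;
        for (Map.Entry<?, ?> entry : map.entrySet()) {
            result += sizeOf(entry.getKey());
            result += sizeOf(entry.getValue());
        }
        return result;
    }
}
```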
[GitHub] [hive] adesh-rao commented on a change in pull request #1109: HIVE-22015: Add table constraints in CachedStore
adesh-rao commented on a change in pull request #1109: URL: https://github.com/apache/hive/pull/1109#discussion_r450598395

File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java

```diff
@@ -543,10 +557,30 @@ static void prewarm(RawStore rawStore) {
         tableColStats = rawStore.getTableColumnStatistics(catName, dbName, tblName, colNames, CacheUtils.HIVE_ENGINE);
         Deadline.stopTimer();
       }
+      Deadline.startTimer("getPrimaryKeys");
+      primaryKeys = rawStore.getPrimaryKeys(catName, dbName, tblName);
+      Deadline.stopTimer();
+      cacheObjects.setPrimaryKeys(primaryKeys);
+
+      Deadline.startTimer("getForeignKeys");
+      foreignKeys = rawStore.getForeignKeys(catName, null, null, dbName, tblName);
```

Review comment: Current usages of `getForeignKeys` in the code pass null only for parentDb/parentTable; foreignDb/foreignTable are always populated. So let's skip it as you mentioned.
[GitHub] [hive] adesh-rao commented on a change in pull request #1109: HIVE-22015: Add table constraints in CachedStore
adesh-rao commented on a change in pull request #1109: URL: https://github.com/apache/hive/pull/1109#discussion_r450599566

File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java

```diff
@@ -2497,26 +2610,87 @@ long getPartsFound() {
   @Override
   public List getPrimaryKeys(String catName, String dbName, String tblName) throws MetaException {
-    // TODO constraintCache
-    return rawStore.getPrimaryKeys(catName, dbName, tblName);
+    catName = normalizeIdentifier(catName);
+    dbName = StringUtils.normalizeIdentifier(dbName);
+    tblName = StringUtils.normalizeIdentifier(tblName);
+    if (!shouldCacheTable(catName, dbName, tblName) || (canUseEvents && rawStore.isActiveTransaction())) {
+      return rawStore.getPrimaryKeys(catName, dbName, tblName);
+    }
+
+    Table tbl = sharedCache.getTableFromCache(catName, dbName, tblName);
+    if (tbl == null) {
+      // The table containing the primary keys is not yet loaded in cache
+      return rawStore.getPrimaryKeys(catName, dbName, tblName);
+    }
+    List keys = sharedCache.listCachedPrimaryKeys(catName, dbName, tblName);
+
+    return keys;
   }

   @Override
   public List getForeignKeys(String catName, String parentDbName, String parentTblName, String foreignDbName,
       String foreignTblName) throws MetaException {
-    // TODO constraintCache
-    return rawStore.getForeignKeys(catName, parentDbName, parentTblName, foreignDbName, foreignTblName);
+    // Get correct ForeignDBName and TableName
+    if (foreignDbName == null || foreignTblName == null) {
+      return rawStore.getForeignKeys(catName, parentDbName, parentTblName, foreignDbName, foreignTblName);
```

Review comment: Created https://issues.apache.org/jira/browse/HIVE-23810 for followup.
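The cache-with-fallback pattern in the `getPrimaryKeys` diff above can be modeled in miniature. This is a hypothetical sketch under simplified assumptions: `ConstraintCacheSketch` is an invented name, a plain `Map` stands in for both `SharedCache` and `RawStore`, and the normalization, event, and transaction checks are omitted:

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConstraintCacheSketch {
    // Simplified model of the pattern in the diff: serve constraint
    // lookups from a cache, falling back to the raw store when the
    // table has not been prewarmed into the cache yet.
    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();
    private final Map<String, List<String>> rawStore;

    ConstraintCacheSketch(Map<String, List<String>> rawStore) {
        this.rawStore = rawStore;
    }

    // Analogous to CachedStore's prewarm: load constraints eagerly.
    void prewarm(String table) {
        cache.put(table, rawStore.getOrDefault(table, Collections.emptyList()));
    }

    List<String> getPrimaryKeys(String table) {
        List<String> cached = cache.get(table);
        // Table not yet loaded in the cache: fall back to the raw store.
        return (cached != null) ? cached : rawStore.getOrDefault(table, Collections.emptyList());
    }
}
```

The key design point mirrored here is that a cache miss degrades to a raw-store read rather than an error, so correctness never depends on prewarm having run.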