[GitHub] [hive] kgyrtkirk commented on pull request #1187: HIVE-23772. Reallocate calcite-core to prevent NoSuchFiledError.

2020-06-29 Thread GitBox


kgyrtkirk commented on pull request #1187:
URL: https://github.com/apache/hive/pull/1187#issuecomment-651580913


   IIRC, the point where the switch to the "non-shaded calcite-core" happens is when the JDBC driver is loaded.
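
   A minimal probe for this (my own sketch, not part of this PR): check which copy of calcite-core the JDBC path actually resolves at runtime. `org.apache.calcite.jdbc.Driver` and the `jdbc:calcite:` URL prefix are the stock Calcite entry points; everything else here is illustrative.

   ```
   import java.sql.Connection;
   import java.sql.DriverManager;

   public class CalciteDriverProbe {
     public static void main(String[] args) throws Exception {
       // Loading the driver class is roughly the point where the (possibly
       // non-relocated) calcite-core classes get pulled in.
       Class<?> driver = Class.forName("org.apache.calcite.jdbc.Driver");
       System.out.println("Driver loaded from: "
           + driver.getProtectionDomain().getCodeSource().getLocation());
       try (Connection c = DriverManager.getConnection("jdbc:calcite:")) {
         System.out.println("Connection class: " + c.getClass().getName());
       }
     }
   }
   ```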



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] kasakrisz merged pull request #1188: HIVE-19549: Enable TestAcidOnTez#testCtasTezUnion

2020-06-29 Thread GitBox


kasakrisz merged pull request #1188:
URL: https://github.com/apache/hive/pull/1188


   






[GitHub] [hive] jcamachor commented on a change in pull request #1190: [HIVE-23770] Fix Druid filter translation for inverted between

2020-06-29 Thread GitBox


jcamachor commented on a change in pull request #1190:
URL: https://github.com/apache/hive/pull/1190#discussion_r447411719



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveDruidPushInvertIntoBetweenRule.java
##
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Filter;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.rex.RexShuttle;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveBetween;
+
+/**
+ * This rule is opposite of HiveDruidPullInvertFromBetweenRule
+ * It pushed invert back into Between.
+ */
+public class HiveDruidPushInvertIntoBetweenRule extends RelOptRule {
+
+protected static final Log LOG = 
LogFactory.getLog(HiveDruidPushInvertIntoBetweenRule.class);
+
+public static final HiveDruidPushInvertIntoBetweenRule INSTANCE =
+new HiveDruidPushInvertIntoBetweenRule();
+
+private HiveDruidPushInvertIntoBetweenRule() {
+super(operand(Filter.class, any()));
+}
+
+@Override
+public void onMatch(RelOptRuleCall call) {
+final Filter filter = call.rel(0);
+final RexBuilder rexBuilder = filter.getCluster().getRexBuilder();
+final RexNode condition = RexUtil.pullFactors(rexBuilder, 
filter.getCondition());
+
+RexPullInvertFromBetween t = new RexPullInvertFromBetween(rexBuilder);
+RexNode newCondition = t.apply(condition);
+
+// If we could not transform anything, we bail out
+if (newCondition.toString().equals(condition.toString())) {

Review comment:
   Same comment as in the other rule.

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveDruidPushInvertIntoBetweenRule.java
##
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Filter;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.rex.RexShuttle;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveBetween;
+
+/**
+ * This rule is opposite of HiveDruidPullInvertFromBetweenRule
+ * It pushed invert back into Between.
+ */
+public class HiveDruidPushInvertIntoBetweenRule extends RelOptRule {
+
+protected static final Log LOG = 
LogFactory.getLog(HiveDruidPushInvertIntoBetweenRule.class);
+
+public static final HiveDruidPushInvertIntoBetweenRule INSTANCE =
+new HiveDruidPushInvertIntoBetweenRule();
+
+private HiveDruidPushInvertIntoBetweenRule() {
+super(operand(Fil

[GitHub] [hive] nareshpr removed a comment on pull request #1191: HIVE-23779: BasicStatsTask Info is not getting printed in beeline console

2020-06-29 Thread GitBox


nareshpr removed a comment on pull request #1191:
URL: https://github.com/apache/hive/pull/1191#issuecomment-651522808


   /retest






[GitHub] [hive] nareshpr commented on pull request #1191: HIVE-23779: BasicStatsTask Info is not getting printed in beeline console

2020-06-29 Thread GitBox


nareshpr commented on pull request #1191:
URL: https://github.com/apache/hive/pull/1191#issuecomment-651522808


   /retest






[GitHub] [hive] github-actions[bot] closed pull request #20: HIVE 2304 : for hive2

2020-06-29 Thread GitBox


github-actions[bot] closed pull request #20:
URL: https://github.com/apache/hive/pull/20


   






[GitHub] [hive] github-actions[bot] closed pull request #18: Branch 0.12

2020-06-29 Thread GitBox


github-actions[bot] closed pull request #18:
URL: https://github.com/apache/hive/pull/18


   






[GitHub] [hive] github-actions[bot] closed pull request #19: Add case sensitivity support to Parquet Serde.

2020-06-29 Thread GitBox


github-actions[bot] closed pull request #19:
URL: https://github.com/apache/hive/pull/19


   






[GitHub] [hive] github-actions[bot] closed pull request #22: Hive 8384

2020-06-29 Thread GitBox


github-actions[bot] closed pull request #22:
URL: https://github.com/apache/hive/pull/22


   






[GitHub] [hive] rbalamohan commented on a change in pull request #1081: HIVE-23597: VectorizedOrcAcidRowBatchReader::ColumnizedDeleteEventReg…

2020-06-29 Thread GitBox


rbalamohan commented on a change in pull request #1081:
URL: https://github.com/apache/hive/pull/1081#discussion_r447302965



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
##
@@ -1605,6 +1618,46 @@ public int compareTo(CompressedOwid other) {
 throw e; // rethrow the exception so that the caller can handle.
   }
 }
+
+/**
+ * Create delete delta reader. Caching orc tail to avoid FS lookup/reads 
for repeated scans.
+ *
+ * @param deleteDeltaFile
+ * @param conf
+ * @param fs FileSystem
+ * @return delete file reader
+ * @throws IOException
+ */
+private Reader getDeleteDeltaReader(Path deleteDeltaFile, JobConf conf, 
FileSystem fs) throws IOException {
+  OrcTail deleteDeltaTail = 
deleteDeltaOrcTailCache.getIfPresent(deleteDeltaFile);

Review comment:
   Yes, it will not change for the file.
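
   For reference, a minimal sketch of the caching pattern discussed above (a Guava cache keyed by the delete-delta path; the class and field names below are made up for illustration, not taken from this patch):

   ```
   import com.google.common.cache.Cache;
   import com.google.common.cache.CacheBuilder;
   import org.apache.hadoop.fs.Path;

   // Hypothetical stand-in for the real OrcTail type used in the patch.
   class OrcTailStub { }

   class DeleteDeltaTailCache {
     // The ORC tail of a delete delta file never changes once the file is written,
     // so it can be reused across repeated scans instead of re-reading the footer.
     private final Cache<Path, OrcTailStub> cache =
         CacheBuilder.newBuilder().maximumSize(1000).build();

     OrcTailStub get(Path deleteDeltaFile) {
       return cache.getIfPresent(deleteDeltaFile);
     }

     void put(Path deleteDeltaFile, OrcTailStub tail) {
       cache.put(deleteDeltaFile, tail);
     }
   }
   ```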








[GitHub] [hive] mustafaiman opened a new pull request #1192: HIVE-23780: Fail dropTable if acid cleanup fails

2020-06-29 Thread GitBox


mustafaiman opened a new pull request #1192:
URL: https://github.com/apache/hive/pull/1192


   Change-Id: Ica7666afe40cb0f0128266c9c3f6ebc560b24c0e
   
   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   






[GitHub] [hive] ayushtkn edited a comment on pull request #1187: HIVE-23772. Reallocate calcite-core to prevent NoSuchFiledError.

2020-06-29 Thread GitBox


ayushtkn edited a comment on pull request #1187:
URL: https://github.com/apache/hive/pull/1187#issuecomment-651363049


   The values passed are from:
   https://github.com/apache/calcite/blob/branch-1.21/core/src/main/java/org/apache/calcite/jdbc/CalciteConnectionImpl.java#L127
   The type factory must be null, since that is the else branch.
   
   The trace points at L127:
   `at org.apache.calcite.jdbc.CalciteConnectionImpl.<init>(CalciteConnectionImpl.java:127)`
   
   Both arguments should ideally come from `calcite-core` and hence should have been relocated; they are only tweaked in `calcite-avatica`. Maybe some dependency that uses calcite-core is being picked up instead. To find out, I need to clone the repo and walk the dependency tree, or debug where the signature changed from shaded to non-shaded.
   Will find time to dig in more, maybe this or next weekend. I also need to understand how Calcite works, which I currently don't. Do let me know if you have any further pointers. :-)
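
   One quick way to check this (a debugging sketch of mine, not something from this patch) is to list every classpath entry that still ships the class under its non-relocated name; any hit is a jar that was not covered by the relocation:

   ```
   import java.net.URL;
   import java.util.Enumeration;

   public class FindCalciteCopies {
     public static void main(String[] args) throws Exception {
       Enumeration<URL> copies = Thread.currentThread().getContextClassLoader()
           .getResources("org/apache/calcite/jdbc/CalciteConnectionImpl.class");
       while (copies.hasMoreElements()) {
         // Each URL points at a jar (or directory) that still contains the
         // non-relocated CalciteConnectionImpl.
         System.out.println(copies.nextElement());
       }
     }
   }
   ```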






[GitHub] [hive] ayushtkn commented on pull request #1187: HIVE-23772. Reallocate calcite-core to prevent NoSuchFiledError.

2020-06-29 Thread GitBox


ayushtkn commented on pull request #1187:
URL: https://github.com/apache/hive/pull/1187#issuecomment-651363049


   The values passed are from:
   https://github.com/apache/calcite/blob/branch-1.21/core/src/main/java/org/apache/calcite/jdbc/CalciteConnectionImpl.java#L127
   The type factory must be null, since that is the else branch.
   
   The trace points at L127:
   `at org.apache.calcite.jdbc.CalciteConnectionImpl.<init>(CalciteConnectionImpl.java:127)`
   
   Both arguments should ideally come from `calcite-core` and hence should have been relocated; they are only tweaked in `calcite-avatica`. Maybe some dependency that uses calcite-core is being picked up instead. To find out, I need to clone the repo and walk the dependency tree, or debug where the signature changed from shaded to non-shaded.
   Will find time to dig in more, maybe this or next weekend. Do let me know if you have any further pointers.






[GitHub] [hive] kgyrtkirk commented on pull request #1188: HIVE-19549: Enable TestAcidOnTez#testCtasTezUnion

2020-06-29 Thread GitBox


kgyrtkirk commented on pull request #1188:
URL: https://github.com/apache/hive/pull/1188#issuecomment-651326251


   Before re-enabling flaky tests, please run the flaky checker first; without that there is no evidence that the test is actually stable.
   I've started it for this one:
   http://ci.hive.apache.org/job/hive-flaky-check/54/
   http://ci.hive.apache.org/job/hive-flaky-check/55/






[GitHub] [hive] kgyrtkirk commented on pull request #1187: HIVE-23772. Reallocate calcite-core to prevent NoSuchFiledError.

2020-06-29 Thread GitBox


kgyrtkirk commented on pull request #1187:
URL: https://github.com/apache/hive/pull/1187#issuecomment-651325059


   yeah; now I remember - I've also run into these avatica exceptions; the story behind them is:
   * the JDBC driver is used to open the connection; some things are passed to it as arguments
   * the type factory that arrives is "org.calcite..."
   * the expected type factory is "shaded.org.calcite"
   so it fails with an exception...
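
   As a toy illustration of why that blows up (nothing to do with Hive's actual classes): the same source compiled under two packages yields two unrelated types, so a hand-off from the relocated copy to the non-relocated one fails at runtime:

   ```
   public class ShadedMismatchDemo {
     interface TypeFactory { }                          // stands in for org.calcite...
     static class Shaded { interface TypeFactory { } }  // stands in for shaded.org.calcite...

     public static void main(String[] args) {
       Object fromCaller = new Shaded.TypeFactory() { };
       System.out.println(fromCaller instanceof TypeFactory); // false
       TypeFactory tf = (TypeFactory) fromCaller;             // ClassCastException
     }
   }
   ```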






[GitHub] [hive] nareshpr opened a new pull request #1191: HIVE-23779: BasicStatsTask Info is not getting printed in beeline console

2020-06-29 Thread GitBox


nareshpr opened a new pull request #1191:
URL: https://github.com/apache/hive/pull/1191


   






[GitHub] [hive] pkumarsinha commented on a change in pull request #1120: HIVE-23611 : Mandate fully qualified absolute path for external table…

2020-06-29 Thread GitBox


pkumarsinha commented on a change in pull request #1120:
URL: https://github.com/apache/hive/pull/1120#discussion_r447178750



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
##
@@ -517,7 +508,7 @@ public void externalTableIncrementalReplication() throws 
Throwable {
 + "'")
 .run("alter table t1 add partition(country='india')")
 .run("alter table t1 add partition(country='us')")
-.dump(primaryDbName, withClause);

Review comment:
   Not needed anymore








[GitHub] [hive] pkumarsinha commented on a change in pull request #1120: HIVE-23611 : Mandate fully qualified absolute path for external table…

2020-06-29 Thread GitBox


pkumarsinha commented on a change in pull request #1120:
URL: https://github.com/apache/hive/pull/1120#discussion_r447178408



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
##
@@ -159,14 +154,14 @@ public void externalTableReplicationWithDefaultPaths() 
throws Throwable {
 .run("insert into table t2 partition(country='india') values 
('bangalore')")
 .run("insert into table t2 partition(country='us') values ('austin')")
 .run("insert into table t2 partition(country='france') values 
('paris')")
-.dump(primaryDbName, withClauseOptions);
+.dump(primaryDbName);

Review comment:
   Not needed anymore

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
##
@@ -503,8 +495,7 @@ public void externalTableIncrementalCheckpointing() throws 
Throwable {
 
   @Test
   public void externalTableIncrementalReplication() throws Throwable {
-List withClause = externalTableBasePathWithClause();
-WarehouseInstance.Tuple tuple = primary.dump(primaryDbName, withClause);
+WarehouseInstance.Tuple tuple = primary.dump(primaryDbName);

Review comment:
   Not needed anymore

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
##
@@ -623,7 +612,7 @@ public void bootstrapExternalTablesDuringIncrementalPhase() 
throws Throwable {
 assertFalse(primary.miniDFSCluster.getFileSystem()
 .exists(new Path(metadataPath + relativeExtInfoPath(null;
 
-replica.load(replicatedDbName, primaryDbName, loadWithClause)
+replica.load(replicatedDbName, primaryDbName)

Review comment:
   Not needed anymore








[GitHub] [hive] pkumarsinha commented on a change in pull request #1120: HIVE-23611 : Mandate fully qualified absolute path for external table…

2020-06-29 Thread GitBox


pkumarsinha commented on a change in pull request #1120:
URL: https://github.com/apache/hive/pull/1120#discussion_r447174957



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/BaseReplicationAcrossInstances.java
##
@@ -103,6 +114,12 @@ public static void classLevelTearDown() throws IOException 
{
 replica.close();
   }
 
+  private static void setReplicaExternalBase(FileSystem fs, Map<String, String> confMap) throws IOException {
+fs.mkdirs(REPLICA_EXTERNAL_BASE);
+fullyQualifiedReplicaExternalBase =  
fs.getFileStatus(REPLICA_EXTERNAL_BASE).getPath().toString();
+confMap.put(HiveConf.ConfVars.REPL_EXTERNAL_TABLE_BASE_DIR.varname, 
fullyQualifiedReplicaExternalBase);

Review comment:
   Yes, needed both the places. Both are different case.








[GitHub] [hive] vihangk1 commented on pull request #1095: Make all DDLs transactional and advance commit write ID, so that subsequent read will invalidate the cache entry.

2020-06-29 Thread GitBox


vihangk1 commented on pull request #1095:
URL: https://github.com/apache/hive/pull/1095#issuecomment-651247834


   The patch has been merged into the master branch.






[GitHub] [hive] szlta merged pull request #1159: HIVE-23741: Store CacheTags in the file cache level

2020-06-29 Thread GitBox


szlta merged pull request #1159:
URL: https://github.com/apache/hive/pull/1159


   






[GitHub] [hive] kgyrtkirk commented on a change in pull request #1126: HIVE-23598: Add option to rewrite NTILE and RANK to sketch functions

2020-06-29 Thread GitBox


kgyrtkirk commented on a change in pull request #1126:
URL: https://github.com/apache/hive/pull/1126#discussion_r447097434



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRewriteToDataSketchesRules.java
##
@@ -483,46 +489,44 @@ protected final SqlOperator getSqlOperator(String fnName) 
{
   }
 
   /**
-   * Rewrites {@code cume_dist() over (order by id)}.
+   * Provides a generic way to rewrite function into using an estimation based 
on CDF.
+   *
+   *  There are a few methods which could be supported this way: NTILE, 
CUME_DIST, RANK
*
+   *  For example:
*  
*   SELECT id, CUME_DIST() OVER (ORDER BY id) FROM sketch_input;
-   * ⇒ SELECT id, 1.0-ds_kll_cdf(ds, CAST(-id AS FLOAT) )[0]
+   * ⇒ SELECT id, ds_kll_cdf(ds, CAST(id AS FLOAT) )[0]
*   FROM sketch_input JOIN (
-   * SELECT ds_kll_sketch(CAST(-id AS FLOAT)) AS ds FROM sketch_input
+   * SELECT ds_kll_sketch(CAST(id AS FLOAT)) AS ds FROM sketch_input
*   ) q;
*  
*/
-  public static class CumeDistRewrite extends 
WindowingToProjectAggregateJoinProject {
+  public static abstract class AbstractRankBasedRewriteRule extends 
WindowingToProjectAggregateJoinProject {
 
-public CumeDistRewrite(String sketchType) {
+public AbstractRankBasedRewriteRule(String sketchType) {
   super(sketchType);
 }
 
-@Override
-protected VbuilderPAP buildProcessor(RelOptRuleCall call) {
-  return new VB(sketchType, call.builder());
-}
+protected static abstract class AbstractRankBasedRewriteBuilder extends 
VbuilderPAP {
 
-private static class VB extends VbuilderPAP {
-
-  protected VB(String sketchClass, RelBuilder relBuilder) {
+  protected AbstractRankBasedRewriteBuilder(String sketchClass, RelBuilder 
relBuilder) {
 super(sketchClass, relBuilder);
   }
 
   @Override
-  boolean isApplicable(RexOver over) {
-SqlAggFunction aggOp = over.getAggOperator();
+  final boolean isApplicable(RexOver over) {
 RexWindow window = over.getWindow();
-if (aggOp.getName().equalsIgnoreCase("cume_dist") && 
window.orderKeys.size() == 1
-&& window.getLowerBound().isUnbounded() && 
window.getUpperBound().isUnbounded()) {
+if (window.orderKeys.size() == 1
+&& window.getLowerBound().isUnbounded() && 
window.getUpperBound().isUnbounded()

Review comment:
   the logic to handle 
   ```
   Order by spec -> range, unbounded preceding, current row
   This also aligns with most RDBMSs implementation
   ```
   I think at the time this rule fires it will see unbounded/unbounded... but it's very weird; I'll open a separate ticket to investigate that.
   
   Opened: HIVE-23775








[GitHub] [hive] kgyrtkirk commented on a change in pull request #1126: HIVE-23598: Add option to rewrite NTILE and RANK to sketch functions

2020-06-29 Thread GitBox


kgyrtkirk commented on a change in pull request #1126:
URL: https://github.com/apache/hive/pull/1126#discussion_r447081115



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRewriteToDataSketchesRules.java
##
@@ -483,46 +489,44 @@ protected final SqlOperator getSqlOperator(String fnName) 
{
   }
 
   /**
-   * Rewrites {@code cume_dist() over (order by id)}.
+   * Provides a generic way to rewrite function into using an estimation based 
on CDF.
+   *
+   *  There are a few methods which could be supported this way: NTILE, 
CUME_DIST, RANK
*
+   *  For example:
*  
*   SELECT id, CUME_DIST() OVER (ORDER BY id) FROM sketch_input;
-   * ⇒ SELECT id, 1.0-ds_kll_cdf(ds, CAST(-id AS FLOAT) )[0]
+   * ⇒ SELECT id, ds_kll_cdf(ds, CAST(id AS FLOAT) )[0]
*   FROM sketch_input JOIN (
-   * SELECT ds_kll_sketch(CAST(-id AS FLOAT)) AS ds FROM sketch_input
+   * SELECT ds_kll_sketch(CAST(id AS FLOAT)) AS ds FROM sketch_input
*   ) q;
*  
*/
-  public static class CumeDistRewrite extends 
WindowingToProjectAggregateJoinProject {
+  public static abstract class AbstractRankBasedRewriteRule extends 
WindowingToProjectAggregateJoinProject {
 
-public CumeDistRewrite(String sketchType) {
+public AbstractRankBasedRewriteRule(String sketchType) {
   super(sketchType);
 }
 
-@Override
-protected VbuilderPAP buildProcessor(RelOptRuleCall call) {
-  return new VB(sketchType, call.builder());
-}
+protected static abstract class AbstractRankBasedRewriteBuilder extends 
VbuilderPAP {
 
-private static class VB extends VbuilderPAP {
-
-  protected VB(String sketchClass, RelBuilder relBuilder) {
+  protected AbstractRankBasedRewriteBuilder(String sketchClass, RelBuilder 
relBuilder) {
 super(sketchClass, relBuilder);
   }
 
   @Override
-  boolean isApplicable(RexOver over) {
-SqlAggFunction aggOp = over.getAggOperator();
+  final boolean isApplicable(RexOver over) {
 RexWindow window = over.getWindow();
-if (aggOp.getName().equalsIgnoreCase("cume_dist") && 
window.orderKeys.size() == 1
-&& window.getLowerBound().isUnbounded() && 
window.getUpperBound().isUnbounded()) {
+if (window.orderKeys.size() == 1
+&& window.getLowerBound().isUnbounded() && 
window.getUpperBound().isUnbounded()

Review comment:
   interesting; mostly the second :D
   for the current functions (ntile, cume_dist) it doesn't really make sense to set the window to anything other than unbounded (or at least I don't see a use case for it)
   
   I've tried this out for the below query:
   ```
   select id,ntile(id) over (order by id rows between 1 preceding and 1 
following) from sketch_input order by id nulls last;
   ```
   * mysql: rejects it with an error that `ntile` doesn't support it
   * psql: accepts and executes it, but without interpreting the preceding/following clause correctly
   * hive: stops with a SemanticException
   








[GitHub] [hive] deniskuzZ commented on a change in pull request #1087: HIVE-23671: MSCK repair should handle transactional tables

2020-06-29 Thread GitBox


deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447053621



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java
##
@@ -229,102 +239,168 @@ public int repair(MsckInfo msckInfo) {
 throw new MetastoreException(e);
   }
 }
+if (transactionalTable && !MetaStoreServerUtils.isPartitioned(table)) {
+  if (result.getMaxWriteId() > 0) {
+if (txnId < 0) {
+  // We need the txnId to check against even if we didn't do the 
locking
+  txnId = getMsc().openTxn(getUserName());
+}
+
+validateAndAddMaxTxnIdAndWriteId(result.getMaxWriteId(), 
result.getMaxTxnId(),
+table.getDbName(), table.getTableName(), txnId);
+  }
+}
   }
   success = true;
 } catch (Exception e) {
   LOG.warn("Failed to run metacheck: ", e);
   success = false;
-  ret = 1;
 } finally {
-  if (msckInfo.getResFile() != null) {
-BufferedWriter resultOut = null;
-try {
-  Path resFile = new Path(msckInfo.getResFile());
-  FileSystem fs = resFile.getFileSystem(getConf());
-  resultOut = new BufferedWriter(new 
OutputStreamWriter(fs.create(resFile)));
-
-  boolean firstWritten = false;
-  firstWritten |= writeMsckResult(result.getTablesNotInMs(),
-"Tables not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getTablesNotOnFs(),
-"Tables missing on filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotInMs(),
-"Partitions not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotOnFs(),
-"Partitions missing from filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getExpiredPartitions(),
-"Expired partitions (retention period: " + partitionExpirySeconds 
+ "s) :", resultOut, firstWritten);
-  // sorting to stabilize qfile output (msck_repair_drop.q)
-  Collections.sort(repairOutput);
-  for (String rout : repairOutput) {
-if (firstWritten) {
-  resultOut.write(terminator);
-} else {
-  firstWritten = true;
-}
-resultOut.write(rout);
-  }
-} catch (IOException e) {
-  LOG.warn("Failed to save metacheck output: ", e);
-  ret = 1;
-} finally {
-  if (resultOut != null) {
-try {
-  resultOut.close();
-} catch (IOException e) {
-  LOG.warn("Failed to close output file: ", e);
-  ret = 1;
-}
-  }
+  if (result!=null) {
+logResult(result);
+if (msckInfo.getResFile() != null) {
+  success = writeResultToFile(msckInfo, result, repairOutput, 
partitionExpirySeconds) && success;
 }
   }
 
-  LOG.info("Tables not in metastore: {}", result.getTablesNotInMs());
-  LOG.info("Tables missing on filesystem: {}", result.getTablesNotOnFs());
-  LOG.info("Partitions not in metastore: {}", 
result.getPartitionsNotInMs());
-  LOG.info("Partitions missing from filesystem: {}", 
result.getPartitionsNotOnFs());
-  LOG.info("Expired partitions: {}", result.getExpiredPartitions());
-  if (acquireLock && txnId > 0) {
-  if (success) {
-try {
-  LOG.info("txnId: {} succeeded. Committing..", txnId);
-  getMsc().commitTxn(txnId);
-} catch (Exception e) {
-  LOG.warn("Error while committing txnId: {} for table: {}", 
txnId, qualifiedTableName, e);
-  ret = 1;
-}
-  } else {
-try {
-  LOG.info("txnId: {} failed. Aborting..", txnId);
-  getMsc().abortTxns(Lists.newArrayList(txnId));
-} catch (Exception e) {
-  LOG.warn("Error while aborting txnId: {} for table: {}", txnId, 
qualifiedTableName, e);
-  ret = 1;
-}
-  }
+  if (txnId > 0) {
+success = closeTxn(qualifiedTableName, success, txnId) && success;
   }
   if (getMsc() != null) {
 getMsc().close();
 msc = null;
   }
 }
+return success ? 0 : 1;
+  }
 
+  private boolean closeTxn(String qualifiedTableName, boolean success, long 
txnId) {
+boolean ret = true;
+if (success) {
+  try {
+LOG.info("txnId: {} succeeded. Committing..", txnId);
+getMsc().commitTxn(txnId);
+  } catch (Exception e) {
+LOG.warn("Error while committing txnId: {} for table: {}", txnId, 
qualifiedTableName, e);
+ret = false;
+  }
+} else {
+  try {
+LOG.info("txnId: {} failed. Aborting..

[GitHub] [hive] deniskuzZ commented on a change in pull request #1087: HIVE-23671: MSCK repair should handle transactional tables

2020-06-29 Thread GitBox


deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447052663



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java
##
@@ -229,102 +239,168 @@ public int repair(MsckInfo msckInfo) {
 throw new MetastoreException(e);
   }
 }
+if (transactionalTable && !MetaStoreServerUtils.isPartitioned(table)) {
+  if (result.getMaxWriteId() > 0) {
+if (txnId < 0) {
+  // We need the txnId to check against even if we didn't do the 
locking
+  txnId = getMsc().openTxn(getUserName());
+}
+
+validateAndAddMaxTxnIdAndWriteId(result.getMaxWriteId(), 
result.getMaxTxnId(),
+table.getDbName(), table.getTableName(), txnId);
+  }
+}
   }
   success = true;
 } catch (Exception e) {
   LOG.warn("Failed to run metacheck: ", e);
   success = false;
-  ret = 1;
 } finally {
-  if (msckInfo.getResFile() != null) {
-BufferedWriter resultOut = null;
-try {
-  Path resFile = new Path(msckInfo.getResFile());
-  FileSystem fs = resFile.getFileSystem(getConf());
-  resultOut = new BufferedWriter(new 
OutputStreamWriter(fs.create(resFile)));
-
-  boolean firstWritten = false;
-  firstWritten |= writeMsckResult(result.getTablesNotInMs(),
-"Tables not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getTablesNotOnFs(),
-"Tables missing on filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotInMs(),
-"Partitions not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotOnFs(),
-"Partitions missing from filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getExpiredPartitions(),
-"Expired partitions (retention period: " + partitionExpirySeconds 
+ "s) :", resultOut, firstWritten);
-  // sorting to stabilize qfile output (msck_repair_drop.q)
-  Collections.sort(repairOutput);
-  for (String rout : repairOutput) {
-if (firstWritten) {
-  resultOut.write(terminator);
-} else {
-  firstWritten = true;
-}
-resultOut.write(rout);
-  }
-} catch (IOException e) {
-  LOG.warn("Failed to save metacheck output: ", e);
-  ret = 1;
-} finally {
-  if (resultOut != null) {
-try {
-  resultOut.close();
-} catch (IOException e) {
-  LOG.warn("Failed to close output file: ", e);
-  ret = 1;
-}
-  }
+  if (result!=null) {
+logResult(result);
+if (msckInfo.getResFile() != null) {
+  success = writeResultToFile(msckInfo, result, repairOutput, 
partitionExpirySeconds) && success;
 }
   }
 
-  LOG.info("Tables not in metastore: {}", result.getTablesNotInMs());
-  LOG.info("Tables missing on filesystem: {}", result.getTablesNotOnFs());
-  LOG.info("Partitions not in metastore: {}", 
result.getPartitionsNotInMs());
-  LOG.info("Partitions missing from filesystem: {}", 
result.getPartitionsNotOnFs());
-  LOG.info("Expired partitions: {}", result.getExpiredPartitions());
-  if (acquireLock && txnId > 0) {
-  if (success) {
-try {
-  LOG.info("txnId: {} succeeded. Committing..", txnId);
-  getMsc().commitTxn(txnId);
-} catch (Exception e) {
-  LOG.warn("Error while committing txnId: {} for table: {}", 
txnId, qualifiedTableName, e);
-  ret = 1;
-}
-  } else {
-try {
-  LOG.info("txnId: {} failed. Aborting..", txnId);
-  getMsc().abortTxns(Lists.newArrayList(txnId));
-} catch (Exception e) {
-  LOG.warn("Error while aborting txnId: {} for table: {}", txnId, 
qualifiedTableName, e);
-  ret = 1;
-}
-  }
+  if (txnId > 0) {
+success = closeTxn(qualifiedTableName, success, txnId) && success;
   }
   if (getMsc() != null) {
 getMsc().close();
 msc = null;
   }
 }
+return success ? 0 : 1;
+  }
 
+  private boolean closeTxn(String qualifiedTableName, boolean success, long 
txnId) {
+boolean ret = true;
+if (success) {
+  try {
+LOG.info("txnId: {} succeeded. Committing..", txnId);
+getMsc().commitTxn(txnId);
+  } catch (Exception e) {
+LOG.warn("Error while committing txnId: {} for table: {}", txnId, 
qualifiedTableName, e);
+ret = false;
+  }
+} else {
+  try {
+LOG.info("txnId: {} failed. Aborting..

[GitHub] [hive] deniskuzZ commented on a change in pull request #1087: HIVE-23671: MSCK repair should handle transactional tables

2020-06-29 Thread GitBox


deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447050970



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java
##
@@ -229,102 +239,168 @@ public int repair(MsckInfo msckInfo) {
 throw new MetastoreException(e);
   }
 }
+if (transactionalTable && !MetaStoreServerUtils.isPartitioned(table)) {
+  if (result.getMaxWriteId() > 0) {
+if (txnId < 0) {
+  // We need the txnId to check against even if we didn't do the 
locking
+  txnId = getMsc().openTxn(getUserName());
+}
+
+validateAndAddMaxTxnIdAndWriteId(result.getMaxWriteId(), 
result.getMaxTxnId(),
+table.getDbName(), table.getTableName(), txnId);
+  }
+}
   }
   success = true;
 } catch (Exception e) {
   LOG.warn("Failed to run metacheck: ", e);
   success = false;
-  ret = 1;
 } finally {
-  if (msckInfo.getResFile() != null) {
-BufferedWriter resultOut = null;
-try {
-  Path resFile = new Path(msckInfo.getResFile());
-  FileSystem fs = resFile.getFileSystem(getConf());
-  resultOut = new BufferedWriter(new 
OutputStreamWriter(fs.create(resFile)));
-
-  boolean firstWritten = false;
-  firstWritten |= writeMsckResult(result.getTablesNotInMs(),
-"Tables not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getTablesNotOnFs(),
-"Tables missing on filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotInMs(),
-"Partitions not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotOnFs(),
-"Partitions missing from filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getExpiredPartitions(),
-"Expired partitions (retention period: " + partitionExpirySeconds 
+ "s) :", resultOut, firstWritten);
-  // sorting to stabilize qfile output (msck_repair_drop.q)
-  Collections.sort(repairOutput);
-  for (String rout : repairOutput) {
-if (firstWritten) {
-  resultOut.write(terminator);
-} else {
-  firstWritten = true;
-}
-resultOut.write(rout);
-  }
-} catch (IOException e) {
-  LOG.warn("Failed to save metacheck output: ", e);
-  ret = 1;
-} finally {
-  if (resultOut != null) {
-try {
-  resultOut.close();
-} catch (IOException e) {
-  LOG.warn("Failed to close output file: ", e);
-  ret = 1;
-}
-  }
+  if (result!=null) {
+logResult(result);
+if (msckInfo.getResFile() != null) {
+  success = writeResultToFile(msckInfo, result, repairOutput, 
partitionExpirySeconds) && success;
 }
   }
 
-  LOG.info("Tables not in metastore: {}", result.getTablesNotInMs());
-  LOG.info("Tables missing on filesystem: {}", result.getTablesNotOnFs());
-  LOG.info("Partitions not in metastore: {}", 
result.getPartitionsNotInMs());
-  LOG.info("Partitions missing from filesystem: {}", 
result.getPartitionsNotOnFs());
-  LOG.info("Expired partitions: {}", result.getExpiredPartitions());
-  if (acquireLock && txnId > 0) {
-  if (success) {
-try {
-  LOG.info("txnId: {} succeeded. Committing..", txnId);
-  getMsc().commitTxn(txnId);
-} catch (Exception e) {
-  LOG.warn("Error while committing txnId: {} for table: {}", 
txnId, qualifiedTableName, e);
-  ret = 1;
-}
-  } else {
-try {
-  LOG.info("txnId: {} failed. Aborting..", txnId);
-  getMsc().abortTxns(Lists.newArrayList(txnId));
-} catch (Exception e) {
-  LOG.warn("Error while aborting txnId: {} for table: {}", txnId, 
qualifiedTableName, e);
-  ret = 1;
-}
-  }
+  if (txnId > 0) {
+success = closeTxn(qualifiedTableName, success, txnId) && success;
   }
   if (getMsc() != null) {
 getMsc().close();
 msc = null;
   }
 }
+return success ? 0 : 1;
+  }
 
+  private boolean closeTxn(String qualifiedTableName, boolean success, long 
txnId) {
+boolean ret = true;
+if (success) {
+  try {
+LOG.info("txnId: {} succeeded. Committing..", txnId);
+getMsc().commitTxn(txnId);
+  } catch (Exception e) {
+LOG.warn("Error while committing txnId: {} for table: {}", txnId, 
qualifiedTableName, e);
+ret = false;
+  }
+} else {
+  try {
+LOG.info("txnId: {} failed. Aborting..

[GitHub] [hive] nishantmonu51 opened a new pull request #1190: [HIVE-23770] Fix Druid filter translation for inverted between

2020-06-29 Thread GitBox


nishantmonu51 opened a new pull request #1190:
URL: https://github.com/apache/hive/pull/1190


   
   






[GitHub] [hive] deniskuzZ commented on a change in pull request #1087: HIVE-23671: MSCK repair should handle transactional tables

2020-06-29 Thread GitBox


deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447046016



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java
##
@@ -229,102 +239,168 @@ public int repair(MsckInfo msckInfo) {
 throw new MetastoreException(e);
   }
 }
+if (transactionalTable && !MetaStoreServerUtils.isPartitioned(table)) {
+  if (result.getMaxWriteId() > 0) {
+if (txnId < 0) {
+  // We need the txnId to check against even if we didn't do the 
locking
+  txnId = getMsc().openTxn(getUserName());
+}
+
+validateAndAddMaxTxnIdAndWriteId(result.getMaxWriteId(), 
result.getMaxTxnId(),
+table.getDbName(), table.getTableName(), txnId);
+  }
+}
   }
   success = true;
 } catch (Exception e) {
   LOG.warn("Failed to run metacheck: ", e);
   success = false;
-  ret = 1;
 } finally {
-  if (msckInfo.getResFile() != null) {
-BufferedWriter resultOut = null;
-try {
-  Path resFile = new Path(msckInfo.getResFile());
-  FileSystem fs = resFile.getFileSystem(getConf());
-  resultOut = new BufferedWriter(new 
OutputStreamWriter(fs.create(resFile)));
-
-  boolean firstWritten = false;
-  firstWritten |= writeMsckResult(result.getTablesNotInMs(),
-"Tables not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getTablesNotOnFs(),
-"Tables missing on filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotInMs(),
-"Partitions not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotOnFs(),
-"Partitions missing from filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getExpiredPartitions(),
-"Expired partitions (retention period: " + partitionExpirySeconds 
+ "s) :", resultOut, firstWritten);
-  // sorting to stabilize qfile output (msck_repair_drop.q)
-  Collections.sort(repairOutput);
-  for (String rout : repairOutput) {
-if (firstWritten) {
-  resultOut.write(terminator);
-} else {
-  firstWritten = true;
-}
-resultOut.write(rout);
-  }
-} catch (IOException e) {
-  LOG.warn("Failed to save metacheck output: ", e);
-  ret = 1;
-} finally {
-  if (resultOut != null) {
-try {
-  resultOut.close();
-} catch (IOException e) {
-  LOG.warn("Failed to close output file: ", e);
-  ret = 1;
-}
-  }
+  if (result!=null) {
+logResult(result);
+if (msckInfo.getResFile() != null) {
+  success = writeResultToFile(msckInfo, result, repairOutput, 
partitionExpirySeconds) && success;
 }
   }
 
-  LOG.info("Tables not in metastore: {}", result.getTablesNotInMs());
-  LOG.info("Tables missing on filesystem: {}", result.getTablesNotOnFs());
-  LOG.info("Partitions not in metastore: {}", 
result.getPartitionsNotInMs());
-  LOG.info("Partitions missing from filesystem: {}", 
result.getPartitionsNotOnFs());
-  LOG.info("Expired partitions: {}", result.getExpiredPartitions());
-  if (acquireLock && txnId > 0) {
-  if (success) {
-try {
-  LOG.info("txnId: {} succeeded. Committing..", txnId);
-  getMsc().commitTxn(txnId);
-} catch (Exception e) {
-  LOG.warn("Error while committing txnId: {} for table: {}", 
txnId, qualifiedTableName, e);
-  ret = 1;
-}
-  } else {
-try {
-  LOG.info("txnId: {} failed. Aborting..", txnId);
-  getMsc().abortTxns(Lists.newArrayList(txnId));
-} catch (Exception e) {
-  LOG.warn("Error while aborting txnId: {} for table: {}", txnId, 
qualifiedTableName, e);
-  ret = 1;
-}
-  }
+  if (txnId > 0) {
+success = closeTxn(qualifiedTableName, success, txnId) && success;
   }
   if (getMsc() != null) {
 getMsc().close();
 msc = null;
   }
 }
+return success ? 0 : 1;
+  }
 
+  private boolean closeTxn(String qualifiedTableName, boolean success, long 
txnId) {
+boolean ret = true;
+if (success) {
+  try {
+LOG.info("txnId: {} succeeded. Committing..", txnId);
+getMsc().commitTxn(txnId);
+  } catch (Exception e) {
+LOG.warn("Error while committing txnId: {} for table: {}", txnId, 
qualifiedTableName, e);
+ret = false;
+  }
+} else {
+  try {
+LOG.info("txnId: {} failed. Aborting..

[GitHub] [hive] deniskuzZ commented on a change in pull request #1087: HIVE-23671: MSCK repair should handle transactional tables

2020-06-29 Thread GitBox


deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447044667



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java
##
@@ -229,102 +239,168 @@ public int repair(MsckInfo msckInfo) {
 throw new MetastoreException(e);
   }
 }
+if (transactionalTable && !MetaStoreServerUtils.isPartitioned(table)) {
+  if (result.getMaxWriteId() > 0) {
+if (txnId < 0) {
+  // We need the txnId to check against even if we didn't do the 
locking
+  txnId = getMsc().openTxn(getUserName());
+}
+
+validateAndAddMaxTxnIdAndWriteId(result.getMaxWriteId(), 
result.getMaxTxnId(),
+table.getDbName(), table.getTableName(), txnId);
+  }
+}
   }
   success = true;
 } catch (Exception e) {
   LOG.warn("Failed to run metacheck: ", e);
   success = false;
-  ret = 1;
 } finally {
-  if (msckInfo.getResFile() != null) {
-BufferedWriter resultOut = null;
-try {
-  Path resFile = new Path(msckInfo.getResFile());
-  FileSystem fs = resFile.getFileSystem(getConf());
-  resultOut = new BufferedWriter(new 
OutputStreamWriter(fs.create(resFile)));
-
-  boolean firstWritten = false;
-  firstWritten |= writeMsckResult(result.getTablesNotInMs(),
-"Tables not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getTablesNotOnFs(),
-"Tables missing on filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotInMs(),
-"Partitions not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotOnFs(),
-"Partitions missing from filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getExpiredPartitions(),
-"Expired partitions (retention period: " + partitionExpirySeconds 
+ "s) :", resultOut, firstWritten);
-  // sorting to stabilize qfile output (msck_repair_drop.q)
-  Collections.sort(repairOutput);
-  for (String rout : repairOutput) {
-if (firstWritten) {
-  resultOut.write(terminator);
-} else {
-  firstWritten = true;
-}
-resultOut.write(rout);
-  }
-} catch (IOException e) {
-  LOG.warn("Failed to save metacheck output: ", e);
-  ret = 1;
-} finally {
-  if (resultOut != null) {
-try {
-  resultOut.close();
-} catch (IOException e) {
-  LOG.warn("Failed to close output file: ", e);
-  ret = 1;
-}
-  }
+  if (result!=null) {
+logResult(result);
+if (msckInfo.getResFile() != null) {
+  success = writeResultToFile(msckInfo, result, repairOutput, 
partitionExpirySeconds) && success;
 }
   }
 
-  LOG.info("Tables not in metastore: {}", result.getTablesNotInMs());
-  LOG.info("Tables missing on filesystem: {}", result.getTablesNotOnFs());
-  LOG.info("Partitions not in metastore: {}", 
result.getPartitionsNotInMs());
-  LOG.info("Partitions missing from filesystem: {}", 
result.getPartitionsNotOnFs());
-  LOG.info("Expired partitions: {}", result.getExpiredPartitions());
-  if (acquireLock && txnId > 0) {
-  if (success) {
-try {
-  LOG.info("txnId: {} succeeded. Committing..", txnId);
-  getMsc().commitTxn(txnId);
-} catch (Exception e) {
-  LOG.warn("Error while committing txnId: {} for table: {}", 
txnId, qualifiedTableName, e);
-  ret = 1;
-}
-  } else {
-try {
-  LOG.info("txnId: {} failed. Aborting..", txnId);
-  getMsc().abortTxns(Lists.newArrayList(txnId));
-} catch (Exception e) {
-  LOG.warn("Error while aborting txnId: {} for table: {}", txnId, 
qualifiedTableName, e);
-  ret = 1;
-}
-  }
+  if (txnId > 0) {
+success = closeTxn(qualifiedTableName, success, txnId) && success;
   }
   if (getMsc() != null) {
 getMsc().close();
 msc = null;
   }
 }
+return success ? 0 : 1;
+  }
 
+  private boolean closeTxn(String qualifiedTableName, boolean success, long 
txnId) {
+boolean ret = true;
+if (success) {
+  try {
+LOG.info("txnId: {} succeeded. Committing..", txnId);
+getMsc().commitTxn(txnId);
+  } catch (Exception e) {
+LOG.warn("Error while committing txnId: {} for table: {}", txnId, 
qualifiedTableName, e);

Review comment:
   should we use error level here?





[GitHub] [hive] deniskuzZ commented on a change in pull request #1087: HIVE-23671: MSCK repair should handle transactional tables

2020-06-29 Thread GitBox


deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447044866



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java
##
@@ -229,102 +239,168 @@ public int repair(MsckInfo msckInfo) {
 throw new MetastoreException(e);
   }
 }
+if (transactionalTable && !MetaStoreServerUtils.isPartitioned(table)) {
+  if (result.getMaxWriteId() > 0) {
+if (txnId < 0) {
+  // We need the txnId to check against even if we didn't do the 
locking
+  txnId = getMsc().openTxn(getUserName());
+}
+
+validateAndAddMaxTxnIdAndWriteId(result.getMaxWriteId(), 
result.getMaxTxnId(),
+table.getDbName(), table.getTableName(), txnId);
+  }
+}
   }
   success = true;
 } catch (Exception e) {
   LOG.warn("Failed to run metacheck: ", e);
   success = false;
-  ret = 1;
 } finally {
-  if (msckInfo.getResFile() != null) {
-BufferedWriter resultOut = null;
-try {
-  Path resFile = new Path(msckInfo.getResFile());
-  FileSystem fs = resFile.getFileSystem(getConf());
-  resultOut = new BufferedWriter(new 
OutputStreamWriter(fs.create(resFile)));
-
-  boolean firstWritten = false;
-  firstWritten |= writeMsckResult(result.getTablesNotInMs(),
-"Tables not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getTablesNotOnFs(),
-"Tables missing on filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotInMs(),
-"Partitions not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotOnFs(),
-"Partitions missing from filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getExpiredPartitions(),
-"Expired partitions (retention period: " + partitionExpirySeconds 
+ "s) :", resultOut, firstWritten);
-  // sorting to stabilize qfile output (msck_repair_drop.q)
-  Collections.sort(repairOutput);
-  for (String rout : repairOutput) {
-if (firstWritten) {
-  resultOut.write(terminator);
-} else {
-  firstWritten = true;
-}
-resultOut.write(rout);
-  }
-} catch (IOException e) {
-  LOG.warn("Failed to save metacheck output: ", e);
-  ret = 1;
-} finally {
-  if (resultOut != null) {
-try {
-  resultOut.close();
-} catch (IOException e) {
-  LOG.warn("Failed to close output file: ", e);
-  ret = 1;
-}
-  }
+  if (result!=null) {
+logResult(result);
+if (msckInfo.getResFile() != null) {
+  success = writeResultToFile(msckInfo, result, repairOutput, 
partitionExpirySeconds) && success;
 }
   }
 
-  LOG.info("Tables not in metastore: {}", result.getTablesNotInMs());
-  LOG.info("Tables missing on filesystem: {}", result.getTablesNotOnFs());
-  LOG.info("Partitions not in metastore: {}", 
result.getPartitionsNotInMs());
-  LOG.info("Partitions missing from filesystem: {}", 
result.getPartitionsNotOnFs());
-  LOG.info("Expired partitions: {}", result.getExpiredPartitions());
-  if (acquireLock && txnId > 0) {
-  if (success) {
-try {
-  LOG.info("txnId: {} succeeded. Committing..", txnId);
-  getMsc().commitTxn(txnId);
-} catch (Exception e) {
-  LOG.warn("Error while committing txnId: {} for table: {}", 
txnId, qualifiedTableName, e);
-  ret = 1;
-}
-  } else {
-try {
-  LOG.info("txnId: {} failed. Aborting..", txnId);
-  getMsc().abortTxns(Lists.newArrayList(txnId));
-} catch (Exception e) {
-  LOG.warn("Error while aborting txnId: {} for table: {}", txnId, 
qualifiedTableName, e);
-  ret = 1;
-}
-  }
+  if (txnId > 0) {
+success = closeTxn(qualifiedTableName, success, txnId) && success;
   }
   if (getMsc() != null) {
 getMsc().close();
 msc = null;
   }
 }
+return success ? 0 : 1;
+  }
 
+  private boolean closeTxn(String qualifiedTableName, boolean success, long 
txnId) {
+boolean ret = true;
+if (success) {
+  try {
+LOG.info("txnId: {} succeeded. Committing..", txnId);
+getMsc().commitTxn(txnId);
+  } catch (Exception e) {
+LOG.warn("Error while committing txnId: {} for table: {}", txnId, 
qualifiedTableName, e);
+ret = false;
+  }
+} else {
+  try {
+LOG.info("txnId: {} failed. Aborting..

[GitHub] [hive] deniskuzZ commented on a change in pull request #1087: HIVE-23671: MSCK repair should handle transactional tables

2020-06-29 Thread GitBox


deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447042125



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java
##
@@ -229,102 +239,168 @@ public int repair(MsckInfo msckInfo) {
 throw new MetastoreException(e);
   }
 }
+if (transactionalTable && !MetaStoreServerUtils.isPartitioned(table)) {
+  if (result.getMaxWriteId() > 0) {
+if (txnId < 0) {
+  // We need the txnId to check against even if we didn't do the 
locking
+  txnId = getMsc().openTxn(getUserName());
+}
+
+validateAndAddMaxTxnIdAndWriteId(result.getMaxWriteId(), 
result.getMaxTxnId(),
+table.getDbName(), table.getTableName(), txnId);
+  }
+}
   }
   success = true;
 } catch (Exception e) {
   LOG.warn("Failed to run metacheck: ", e);
   success = false;
-  ret = 1;
 } finally {
-  if (msckInfo.getResFile() != null) {
-BufferedWriter resultOut = null;
-try {
-  Path resFile = new Path(msckInfo.getResFile());
-  FileSystem fs = resFile.getFileSystem(getConf());
-  resultOut = new BufferedWriter(new 
OutputStreamWriter(fs.create(resFile)));
-
-  boolean firstWritten = false;
-  firstWritten |= writeMsckResult(result.getTablesNotInMs(),
-"Tables not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getTablesNotOnFs(),
-"Tables missing on filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotInMs(),
-"Partitions not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotOnFs(),
-"Partitions missing from filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getExpiredPartitions(),
-"Expired partitions (retention period: " + partitionExpirySeconds 
+ "s) :", resultOut, firstWritten);
-  // sorting to stabilize qfile output (msck_repair_drop.q)
-  Collections.sort(repairOutput);
-  for (String rout : repairOutput) {
-if (firstWritten) {
-  resultOut.write(terminator);
-} else {
-  firstWritten = true;
-}
-resultOut.write(rout);
-  }
-} catch (IOException e) {
-  LOG.warn("Failed to save metacheck output: ", e);
-  ret = 1;
-} finally {
-  if (resultOut != null) {
-try {
-  resultOut.close();
-} catch (IOException e) {
-  LOG.warn("Failed to close output file: ", e);
-  ret = 1;
-}
-  }
+  if (result!=null) {
+logResult(result);
+if (msckInfo.getResFile() != null) {
+  success = writeResultToFile(msckInfo, result, repairOutput, 
partitionExpirySeconds) && success;
 }
   }
 
-  LOG.info("Tables not in metastore: {}", result.getTablesNotInMs());
-  LOG.info("Tables missing on filesystem: {}", result.getTablesNotOnFs());
-  LOG.info("Partitions not in metastore: {}", 
result.getPartitionsNotInMs());
-  LOG.info("Partitions missing from filesystem: {}", 
result.getPartitionsNotOnFs());
-  LOG.info("Expired partitions: {}", result.getExpiredPartitions());
-  if (acquireLock && txnId > 0) {
-  if (success) {
-try {
-  LOG.info("txnId: {} succeeded. Committing..", txnId);
-  getMsc().commitTxn(txnId);
-} catch (Exception e) {
-  LOG.warn("Error while committing txnId: {} for table: {}", 
txnId, qualifiedTableName, e);
-  ret = 1;
-}
-  } else {
-try {
-  LOG.info("txnId: {} failed. Aborting..", txnId);
-  getMsc().abortTxns(Lists.newArrayList(txnId));
-} catch (Exception e) {
-  LOG.warn("Error while aborting txnId: {} for table: {}", txnId, 
qualifiedTableName, e);
-  ret = 1;
-}
-  }
+  if (txnId > 0) {
+success = closeTxn(qualifiedTableName, success, txnId) && success;

Review comment:
   Same here: success &= closeTxn(qualifiedTableName, success, txnId)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] deniskuzZ commented on a change in pull request #1087: HIVE-23671: MSCK repair should handle transactional tables

2020-06-29 Thread GitBox


deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447041559



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java
##
@@ -229,102 +239,168 @@ public int repair(MsckInfo msckInfo) {
 throw new MetastoreException(e);
   }
 }
+if (transactionalTable && !MetaStoreServerUtils.isPartitioned(table)) {
+  if (result.getMaxWriteId() > 0) {
+if (txnId < 0) {
+  // We need the txnId to check against even if we didn't do the 
locking
+  txnId = getMsc().openTxn(getUserName());
+}
+
+validateAndAddMaxTxnIdAndWriteId(result.getMaxWriteId(), 
result.getMaxTxnId(),
+table.getDbName(), table.getTableName(), txnId);
+  }
+}
   }
   success = true;
 } catch (Exception e) {
   LOG.warn("Failed to run metacheck: ", e);
   success = false;
-  ret = 1;
 } finally {
-  if (msckInfo.getResFile() != null) {
-BufferedWriter resultOut = null;
-try {
-  Path resFile = new Path(msckInfo.getResFile());
-  FileSystem fs = resFile.getFileSystem(getConf());
-  resultOut = new BufferedWriter(new 
OutputStreamWriter(fs.create(resFile)));
-
-  boolean firstWritten = false;
-  firstWritten |= writeMsckResult(result.getTablesNotInMs(),
-"Tables not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getTablesNotOnFs(),
-"Tables missing on filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotInMs(),
-"Partitions not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotOnFs(),
-"Partitions missing from filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getExpiredPartitions(),
-"Expired partitions (retention period: " + partitionExpirySeconds 
+ "s) :", resultOut, firstWritten);
-  // sorting to stabilize qfile output (msck_repair_drop.q)
-  Collections.sort(repairOutput);
-  for (String rout : repairOutput) {
-if (firstWritten) {
-  resultOut.write(terminator);
-} else {
-  firstWritten = true;
-}
-resultOut.write(rout);
-  }
-} catch (IOException e) {
-  LOG.warn("Failed to save metacheck output: ", e);
-  ret = 1;
-} finally {
-  if (resultOut != null) {
-try {
-  resultOut.close();
-} catch (IOException e) {
-  LOG.warn("Failed to close output file: ", e);
-  ret = 1;
-}
-  }
+  if (result!=null) {
+logResult(result);
+if (msckInfo.getResFile() != null) {
+  success = writeResultToFile(msckInfo, result, repairOutput, 
partitionExpirySeconds) && success;

Review comment:
   you can do success &= writeResultToFile(msckInfo, result, repairOutput, 
partitionExpirySeconds)
   for readability





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] deniskuzZ commented on a change in pull request #1087: HIVE-23671: MSCK repair should handle transactional tables

2020-06-29 Thread GitBox


deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447037968



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java
##
@@ -229,102 +239,168 @@ public int repair(MsckInfo msckInfo) {
 throw new MetastoreException(e);
   }
 }
+if (transactionalTable && !MetaStoreServerUtils.isPartitioned(table)) {
+  if (result.getMaxWriteId() > 0) {
+if (txnId < 0) {
+  // We need the txnId to check against even if we didn't do the 
locking
+  txnId = getMsc().openTxn(getUserName());
+}
+
+validateAndAddMaxTxnIdAndWriteId(result.getMaxWriteId(), 
result.getMaxTxnId(),
+table.getDbName(), table.getTableName(), txnId);
+  }
+}
   }
   success = true;
 } catch (Exception e) {
   LOG.warn("Failed to run metacheck: ", e);
   success = false;
-  ret = 1;
 } finally {
-  if (msckInfo.getResFile() != null) {
-BufferedWriter resultOut = null;
-try {
-  Path resFile = new Path(msckInfo.getResFile());
-  FileSystem fs = resFile.getFileSystem(getConf());
-  resultOut = new BufferedWriter(new 
OutputStreamWriter(fs.create(resFile)));
-
-  boolean firstWritten = false;
-  firstWritten |= writeMsckResult(result.getTablesNotInMs(),
-"Tables not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getTablesNotOnFs(),
-"Tables missing on filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotInMs(),
-"Partitions not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotOnFs(),
-"Partitions missing from filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getExpiredPartitions(),
-"Expired partitions (retention period: " + partitionExpirySeconds 
+ "s) :", resultOut, firstWritten);
-  // sorting to stabilize qfile output (msck_repair_drop.q)
-  Collections.sort(repairOutput);
-  for (String rout : repairOutput) {
-if (firstWritten) {
-  resultOut.write(terminator);
-} else {
-  firstWritten = true;
-}
-resultOut.write(rout);
-  }
-} catch (IOException e) {
-  LOG.warn("Failed to save metacheck output: ", e);
-  ret = 1;
-} finally {
-  if (resultOut != null) {
-try {
-  resultOut.close();
-} catch (IOException e) {
-  LOG.warn("Failed to close output file: ", e);
-  ret = 1;
-}
-  }
+  if (result!=null) {

Review comment:
   not formatted





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] deniskuzZ commented on a change in pull request #1087: HIVE-23671: MSCK repair should handle transactional tables

2020-06-29 Thread GitBox


deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447037520



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java
##
@@ -229,102 +239,168 @@ public int repair(MsckInfo msckInfo) {
 throw new MetastoreException(e);
   }
 }
+if (transactionalTable && !MetaStoreServerUtils.isPartitioned(table)) {
+  if (result.getMaxWriteId() > 0) {

Review comment:
   you can remove 1 nesting level
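   
   For illustration, a minimal sketch of that flattening (reusing the names from
   the diff above; this is only a sketch, not the actual change in the patch):
   
   ```
   // merge the outer checks into one condition; behavior is unchanged
   if (transactionalTable && !MetaStoreServerUtils.isPartitioned(table)
       && result.getMaxWriteId() > 0) {
     if (txnId < 0) {
       // We need the txnId to check against even if we didn't do the locking
       txnId = getMsc().openTxn(getUserName());
     }
     validateAndAddMaxTxnIdAndWriteId(result.getMaxWriteId(), result.getMaxTxnId(),
         table.getDbName(), table.getTableName(), txnId);
   }
   ```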





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] deniskuzZ commented on a change in pull request #1087: HIVE-23671: MSCK repair should handle transactional tables

2020-06-29 Thread GitBox


deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447033768



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -429,6 +451,75 @@ void findUnknownPartitions(Table table, Set 
partPaths,
 LOG.debug("Number of partitions not in metastore : " + 
result.getPartitionsNotInMs().size());
   }
 
+  /**
+   * Calculate the maximum seen writeId from the acid directory structure
+   * @param partPath Path of the partition directory
+   * @param res Partition result to write the max ids
+   * @throws IOException ex
+   */
+  private void setMaxTxnAndWriteIdFromPartition(Path partPath, 
CheckResult.PartitionResult res) throws IOException {
+FileSystem fs = partPath.getFileSystem(conf);
+FileStatus[] deltaOrBaseFiles = fs.listStatus(partPath, 
HIDDEN_FILES_PATH_FILTER);
+
+// Read the writeIds from every base and delta directory and find the max
+long maxWriteId = 0L;
+long maxVisibilityId = 0L;
+for(FileStatus fileStatus : deltaOrBaseFiles) {
+  if (!fileStatus.isDirectory()) {
+continue;
+  }
+  long writeId = 0L;
+  long visibilityId = 0L;
+  String folder = fileStatus.getPath().getName();
+  if (folder.startsWith(BASE_PREFIX)) {
+visibilityId = getVisibilityTxnId(folder);
+if (visibilityId > 0) {
+  folder = removeVisibilityTxnId(folder);
+}
+writeId = Long.parseLong(folder.substring(BASE_PREFIX.length()));
+  } else if (folder.startsWith(DELTA_PREFIX) || 
folder.startsWith(DELETE_DELTA_PREFIX)) {
+// See AcidUtils.parseDelta
+visibilityId = getVisibilityTxnId(folder);
+if (visibilityId > 0) {
+  folder = removeVisibilityTxnId(folder);
+}
+boolean isDeleteDelta = folder.startsWith(DELETE_DELTA_PREFIX);
+String rest = folder.substring((isDeleteDelta ? DELETE_DELTA_PREFIX : 
DELTA_PREFIX).length());
+int split = rest.indexOf('_');
+//split2 may be -1 if no statementId
+int split2 = rest.indexOf('_', split + 1);
+// We always want the second part (it is either the same or greater if 
it is a compacted delta)
+writeId = split2 == -1 ? Long.parseLong(rest.substring(split + 1)) : 
Long
+.parseLong(rest.substring(split + 1, split2));
+  }
+  if (writeId > maxWriteId) {
+maxWriteId = writeId;
+  }
+  if (visibilityId > maxVisibilityId) {
+maxVisibilityId = visibilityId;
+  }
+}
+LOG.debug("Max writeId {}, max txnId {} found in partition {}", 
maxWriteId, maxVisibilityId,
+partPath.toUri().toString());
+res.setMaxWriteId(maxWriteId);
+res.setMaxTxnId(maxVisibilityId);
+  }
+  private long getVisibilityTxnId(String folder) {
+int idxOfVis = folder.indexOf(VISIBILITY_PREFIX);

Review comment:
   why not use regex? removeVisibilityTxnId probably wouldn't even be needed
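   
   For illustration, a minimal regex-based sketch (it assumes the visibility
   suffix has the form "_v<txnId>", e.g. "base_0000005_v0000009"; the pattern
   and field name are hypothetical, not part of the patch):
   
   ```
   // uses java.util.regex.Pattern / Matcher
   private static final Pattern VISIBILITY_SUFFIX = Pattern.compile("^(.+)_v(\\d+)$");
   
   private long getVisibilityTxnId(String folder) {
     Matcher m = VISIBILITY_SUFFIX.matcher(folder);
     // group(1) is the folder name without the suffix, so a separate
     // removeVisibilityTxnId() would indeed not be needed
     return m.matches() ? Long.parseLong(m.group(2)) : 0L;
   }
   ```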





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] deniskuzZ commented on a change in pull request #1087: HIVE-23671: MSCK repair should handle transactional tables

2020-06-29 Thread GitBox


deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447033768



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -429,6 +451,75 @@ void findUnknownPartitions(Table table, Set 
partPaths,
 LOG.debug("Number of partitions not in metastore : " + 
result.getPartitionsNotInMs().size());
   }
 
+  /**
+   * Calculate the maximum seen writeId from the acid directory structure
+   * @param partPath Path of the partition directory
+   * @param res Partition result to write the max ids
+   * @throws IOException ex
+   */
+  private void setMaxTxnAndWriteIdFromPartition(Path partPath, 
CheckResult.PartitionResult res) throws IOException {
+FileSystem fs = partPath.getFileSystem(conf);
+FileStatus[] deltaOrBaseFiles = fs.listStatus(partPath, 
HIDDEN_FILES_PATH_FILTER);
+
+// Read the writeIds from every base and delta directory and find the max
+long maxWriteId = 0L;
+long maxVisibilityId = 0L;
+for(FileStatus fileStatus : deltaOrBaseFiles) {
+  if (!fileStatus.isDirectory()) {
+continue;
+  }
+  long writeId = 0L;
+  long visibilityId = 0L;
+  String folder = fileStatus.getPath().getName();
+  if (folder.startsWith(BASE_PREFIX)) {
+visibilityId = getVisibilityTxnId(folder);
+if (visibilityId > 0) {
+  folder = removeVisibilityTxnId(folder);
+}
+writeId = Long.parseLong(folder.substring(BASE_PREFIX.length()));
+  } else if (folder.startsWith(DELTA_PREFIX) || 
folder.startsWith(DELETE_DELTA_PREFIX)) {
+// See AcidUtils.parseDelta
+visibilityId = getVisibilityTxnId(folder);
+if (visibilityId > 0) {
+  folder = removeVisibilityTxnId(folder);
+}
+boolean isDeleteDelta = folder.startsWith(DELETE_DELTA_PREFIX);
+String rest = folder.substring((isDeleteDelta ? DELETE_DELTA_PREFIX : 
DELTA_PREFIX).length());
+int split = rest.indexOf('_');
+//split2 may be -1 if no statementId
+int split2 = rest.indexOf('_', split + 1);
+// We always want the second part (it is either the same or greater if 
it is a compacted delta)
+writeId = split2 == -1 ? Long.parseLong(rest.substring(split + 1)) : 
Long
+.parseLong(rest.substring(split + 1, split2));
+  }
+  if (writeId > maxWriteId) {
+maxWriteId = writeId;
+  }
+  if (visibilityId > maxVisibilityId) {
+maxVisibilityId = visibilityId;
+  }
+}
+LOG.debug("Max writeId {}, max txnId {} found in partition {}", 
maxWriteId, maxVisibilityId,
+partPath.toUri().toString());
+res.setMaxWriteId(maxWriteId);
+res.setMaxTxnId(maxVisibilityId);
+  }
+  private long getVisibilityTxnId(String folder) {
+int idxOfVis = folder.indexOf(VISIBILITY_PREFIX);

Review comment:
   why not use regex?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] deniskuzZ commented on a change in pull request #1087: HIVE-23671: MSCK repair should handle transactional tables

2020-06-29 Thread GitBox


deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447028756



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -429,6 +451,75 @@ void findUnknownPartitions(Table table, Set 
partPaths,
 LOG.debug("Number of partitions not in metastore : " + 
result.getPartitionsNotInMs().size());
   }
 
+  /**
+   * Calculate the maximum seen writeId from the acid directory structure
+   * @param partPath Path of the partition directory
+   * @param res Partition result to write the max ids
+   * @throws IOException ex
+   */
+  private void setMaxTxnAndWriteIdFromPartition(Path partPath, 
CheckResult.PartitionResult res) throws IOException {
+FileSystem fs = partPath.getFileSystem(conf);
+FileStatus[] deltaOrBaseFiles = fs.listStatus(partPath, 
HIDDEN_FILES_PATH_FILTER);
+
+// Read the writeIds from every base and delta directory and find the max
+long maxWriteId = 0L;
+long maxVisibilityId = 0L;
+for(FileStatus fileStatus : deltaOrBaseFiles) {
+  if (!fileStatus.isDirectory()) {
+continue;
+  }
+  long writeId = 0L;
+  long visibilityId = 0L;
+  String folder = fileStatus.getPath().getName();
+  if (folder.startsWith(BASE_PREFIX)) {
+visibilityId = getVisibilityTxnId(folder);
+if (visibilityId > 0) {
+  folder = removeVisibilityTxnId(folder);
+}
+writeId = Long.parseLong(folder.substring(BASE_PREFIX.length()));
+  } else if (folder.startsWith(DELTA_PREFIX) || 
folder.startsWith(DELETE_DELTA_PREFIX)) {
+// See AcidUtils.parseDelta
+visibilityId = getVisibilityTxnId(folder);
+if (visibilityId > 0) {
+  folder = removeVisibilityTxnId(folder);
+}
+boolean isDeleteDelta = folder.startsWith(DELETE_DELTA_PREFIX);
+String rest = folder.substring((isDeleteDelta ? DELETE_DELTA_PREFIX : 
DELTA_PREFIX).length());
+int split = rest.indexOf('_');

Review comment:
   why not use rest.split('_')

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -429,6 +451,75 @@ void findUnknownPartitions(Table table, Set 
partPaths,
 LOG.debug("Number of partitions not in metastore : " + 
result.getPartitionsNotInMs().size());
   }
 
+  /**
+   * Calculate the maximum seen writeId from the acid directory structure
+   * @param partPath Path of the partition directory
+   * @param res Partition result to write the max ids
+   * @throws IOException ex
+   */
+  private void setMaxTxnAndWriteIdFromPartition(Path partPath, 
CheckResult.PartitionResult res) throws IOException {
+FileSystem fs = partPath.getFileSystem(conf);
+FileStatus[] deltaOrBaseFiles = fs.listStatus(partPath, 
HIDDEN_FILES_PATH_FILTER);
+
+// Read the writeIds from every base and delta directory and find the max
+long maxWriteId = 0L;
+long maxVisibilityId = 0L;
+for(FileStatus fileStatus : deltaOrBaseFiles) {
+  if (!fileStatus.isDirectory()) {
+continue;
+  }
+  long writeId = 0L;
+  long visibilityId = 0L;
+  String folder = fileStatus.getPath().getName();
+  if (folder.startsWith(BASE_PREFIX)) {
+visibilityId = getVisibilityTxnId(folder);
+if (visibilityId > 0) {
+  folder = removeVisibilityTxnId(folder);
+}
+writeId = Long.parseLong(folder.substring(BASE_PREFIX.length()));
+  } else if (folder.startsWith(DELTA_PREFIX) || 
folder.startsWith(DELETE_DELTA_PREFIX)) {
+// See AcidUtils.parseDelta
+visibilityId = getVisibilityTxnId(folder);
+if (visibilityId > 0) {
+  folder = removeVisibilityTxnId(folder);
+}
+boolean isDeleteDelta = folder.startsWith(DELETE_DELTA_PREFIX);
+String rest = folder.substring((isDeleteDelta ? DELETE_DELTA_PREFIX : 
DELTA_PREFIX).length());
+int split = rest.indexOf('_');

Review comment:
   why not use rest.split('_')?
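   
   A short sketch of the split-based variant (assuming rest is the directory name
   with the delta/delete_delta prefix already stripped, e.g. "0000001_0000002" or
   "0000001_0000002_0005"; note String#split takes a String, so "_" rather than a
   char):
   
   ```
   String[] parts = rest.split("_");
   // the second part is always the max writeId, with or without a statementId suffix
   long writeId = Long.parseLong(parts[1]);
   ```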





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] deniskuzZ commented on a change in pull request #1087: HIVE-23671: MSCK repair should handle transactional tables

2020-06-29 Thread GitBox


deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447003871



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -111,24 +120,24 @@ public IMetaStoreClient getMsc() {
* @param partitions
*  List of partition name value pairs, if null or empty check all
*  partitions
-   * @param table
-   * @param result
-   *  Fill this with the results of the check
+   * @param table Table we want to run the check for.
+   * @return Results of the check
* @throws MetastoreException
*   Failed to get required information from the metastore.
* @throws IOException
*   Most likely filesystem related
*/
-  public void checkMetastore(String catName, String dbName, String tableName,
-  List> partitions, Table table, CheckResult 
result)
+  public CheckResult checkMetastore(String catName, String dbName, String 
tableName,
+  List> partitions, Table table)
   throws MetastoreException, IOException {
-
+CheckResult result = new CheckResult();
 if (dbName == null || "".equalsIgnoreCase(dbName)) {
   dbName = Warehouse.DEFAULT_DATABASE_NAME;
 }
 
 try {
   if (tableName == null || "".equals(tableName)) {
+// TODO: I do not think this is used by anything other than tests

Review comment:
   should we answer this question in the current patch?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] deniskuzZ commented on a change in pull request #1087: HIVE-23671: MSCK repair should handle transactional tables

2020-06-29 Thread GitBox


deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447001420



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
##
@@ -8322,6 +8322,22 @@ public AllocateTableWriteIdsResponse 
allocate_table_write_ids(
   return response;
 }
 
+@Override
+public MaxAllocatedTableWriteIdResponse 
get_max_allocated_table_write_id(MaxAllocatedTableWriteIdRequest rqst)

Review comment:
   weird mix of Camel case and underscore?

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
##
@@ -8322,6 +8322,22 @@ public AllocateTableWriteIdsResponse 
allocate_table_write_ids(
   return response;
 }
 
+@Override
+public MaxAllocatedTableWriteIdResponse 
get_max_allocated_table_write_id(MaxAllocatedTableWriteIdRequest rqst)

Review comment:
   weird mix of Camel case and underscore





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] deniskuzZ commented on a change in pull request #1087: HIVE-23671: MSCK repair should handle transactional tables

2020-06-29 Thread GitBox


deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447001420



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
##
@@ -8322,6 +8322,22 @@ public AllocateTableWriteIdsResponse 
allocate_table_write_ids(
   return response;
 }
 
+@Override
+public MaxAllocatedTableWriteIdResponse 
get_max_allocated_table_write_id(MaxAllocatedTableWriteIdRequest rqst)

Review comment:
   why not Camel case?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] belugabehr closed pull request #1090: HIVE-22676: Replace Base64 in hive-service Package

2020-06-29 Thread GitBox


belugabehr closed pull request #1090:
URL: https://github.com/apache/hive/pull/1090


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] belugabehr commented on a change in pull request #1161: HIVE-23638: Spotbugs issues in hive-common

2020-06-29 Thread GitBox


belugabehr commented on a change in pull request #1161:
URL: https://github.com/apache/hive/pull/1161#discussion_r446983489



##
File path: common/src/java/org/apache/hadoop/hive/common/FileUtils.java
##
@@ -483,12 +483,6 @@ public static boolean 
isActionPermittedForFileHierarchy(FileSystem fs, FileStatu
   String userName, FsAction action, boolean recurse) throws Exception {
 boolean isDir = fileStatus.isDir();
 
-FsAction dirActionNeeded = action;
-if (isDir) {
-  // for dirs user needs execute privileges as well
-  dirActionNeeded.and(FsAction.EXECUTE);
-}
-

Review comment:
   So, I understand why this would be removed from a find-bugs perspective 
(this is a no-op), but this is actually an all-around bug: FsAction#and returns a 
new value instead of modifying the receiver, so the result was silently dropped.  
This should be:
   
   ```
   // for dirs user needs execute privileges as well
   FsAction dirActionNeeded = (isDir) ? action.and(FsAction.EXECUTE) : action;
   ```

##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -6475,17 +6477,17 @@ private static boolean isAllowed(Configuration conf, 
ConfVars setting) {
   }
 
   public static String getNonMrEngines() {
-String result = StringUtils.EMPTY;
+StringBuffer result = new StringBuffer();
 for (String s : ConfVars.HIVE_EXECUTION_ENGINE.getValidStringValues()) {
   if ("mr".equals(s)) {
 continue;
   }
-  if (!result.isEmpty()) {
-result += ", ";
+  if (result.length() != 0) {
+result.append(", ");
   }
-  result += s;
+  result.append(s);
 }
-return result;
+return result.toString();

Review comment:
   Please change this to just use String#join and make it more human friendly.
   
   ```
   Set<String> engines = new 
HashSet<>(ConfVars.HIVE_EXECUTION_ENGINE.getValidStringValues());
   boolean removedMR = engines.remove("mr");
   LOG.debug("Found and removed MapReduce engine from list of valid execution 
engines: {}", removedMR);
   return String.join(", ", engines);
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] aasha commented on a change in pull request #1120: HIVE-23611 : Mandate fully qualified absolute path for external table…

2020-06-29 Thread GitBox


aasha commented on a change in pull request #1120:
URL: https://github.com/apache/hive/pull/1120#discussion_r444723985



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
##
@@ -517,7 +508,7 @@ public void externalTableIncrementalReplication() throws 
Throwable {
 + "'")
 .run("alter table t1 add partition(country='india')")
 .run("alter table t1 add partition(country='us')")
-.dump(primaryDbName, withClause);

Review comment:
   why is the withClause removed from here?

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/BaseReplicationAcrossInstances.java
##
@@ -103,6 +114,12 @@ public static void classLevelTearDown() throws IOException 
{
 replica.close();
   }
 
+  private static void setReplicaExternalBase(FileSystem fs, Map confMap) throws IOException {
+fs.mkdirs(REPLICA_EXTERNAL_BASE);
+fullyQualifiedReplicaExternalBase =  
fs.getFileStatus(REPLICA_EXTERNAL_BASE).getPath().toString();
+confMap.put(HiveConf.ConfVars.REPL_EXTERNAL_TABLE_BASE_DIR.varname, 
fullyQualifiedReplicaExternalBase);

Review comment:
   this is also set at line 104

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/ReplicationTestUtils.java
##
@@ -502,19 +502,11 @@ public static void insertForMerge(WarehouseInstance 
primary, String primaryDbNam
 "creation", "creation", "merge_update", "merge_insert", 
"merge_insert"});
   }
 
-  public static List externalTableBasePathWithClause(String 
replExternalBase, WarehouseInstance replica)
-  throws IOException, SemanticException {
-Path externalTableLocation = new Path(replExternalBase);
-DistributedFileSystem fileSystem = replica.miniDFSCluster.getFileSystem();
-externalTableLocation = 
PathBuilder.fullyQualifiedHDFSUri(externalTableLocation, fileSystem);
-fileSystem.mkdirs(externalTableLocation);
-
-// this is required since the same filesystem is used in both source and 
target
-return Arrays.asList(
-"'" + HiveConf.ConfVars.REPL_EXTERNAL_TABLE_BASE_DIR.varname + 
"'='"
-+ externalTableLocation.toString() + "'",
-"'distcp.options.pugpb'=''"
-);
+  public static List externalTableClause(boolean enable) {

Review comment:
   should this be named includeExternalTableClause?

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplExternalTables.java
##
@@ -66,6 +67,9 @@ private ReplExternalTables(){}
 
   public static String externalTableLocation(HiveConf hiveConf, String 
location) throws SemanticException {
 String baseDir = 
hiveConf.get(HiveConf.ConfVars.REPL_EXTERNAL_TABLE_BASE_DIR.varname);
+if (StringUtils.isEmpty(baseDir)) {

Review comment:
   Is the fully qualified REPL_EXTERNAL_TABLE_BASE_DIR path not needed at load 
time?

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
##
@@ -503,8 +495,7 @@ public void externalTableIncrementalCheckpointing() throws 
Throwable {
 
   @Test
   public void externalTableIncrementalReplication() throws Throwable {
-List withClause = externalTableBasePathWithClause();
-WarehouseInstance.Tuple tuple = primary.dump(primaryDbName, withClause);
+WarehouseInstance.Tuple tuple = primary.dump(primaryDbName);

Review comment:
   with clause removed?

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
##
@@ -159,14 +154,14 @@ public void externalTableReplicationWithDefaultPaths() 
throws Throwable {
 .run("insert into table t2 partition(country='india') values 
('bangalore')")
 .run("insert into table t2 partition(country='us') values ('austin')")
 .run("insert into table t2 partition(country='france') values 
('paris')")
-.dump(primaryDbName, withClauseOptions);
+.dump(primaryDbName);

Review comment:
   why is the with clause removed?

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
##
@@ -623,7 +612,7 @@ public void bootstrapExternalTablesDuringIncrementalPhase() 
throws Throwable {
 assertFalse(primary.miniDFSCluster.getFileSystem()
 .exists(new Path(metadataPath + relativeExtInfoPath(null;
 
-replica.load(replicatedDbName, primaryDbName, loadWithClause)
+replica.load(replicatedDbName, primaryDbName)

Review comment:
   with clause removed?

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
##
@@ -750,10 +752,19 @@ private void dumpTableListToDumpLocation(List 
tableList, Path dbRoot, St
 
   private List dirLocationsToCopy(List sourceLocations)
   throws HiveException 

[GitHub] [hive] bmaidics opened a new pull request #1189: HIVE-23774: Reduce log level at aggrColStatsForPartitions in MetaStoreDirectSql.java

2020-06-29 Thread GitBox


bmaidics opened a new pull request #1189:
URL: https://github.com/apache/hive/pull/1189


   This log is not needed at INFO log level.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] kasakrisz opened a new pull request #1188: HIVE-19549: Enable TestAcidOnTez#testCtasTezUnion

2020-06-29 Thread GitBox


kasakrisz opened a new pull request #1188:
URL: https://github.com/apache/hive/pull/1188


   Testing done:
   ```
   mvn test -Dtest=TestAcidOnTez#testCtasTezUnion -pl itests/hive-unit -Pitests
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] adesh-rao commented on a change in pull request #1109: HIVE-22015: Add table constraints in CachedStore

2020-06-29 Thread GitBox


adesh-rao commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r446900743



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -2497,26 +2599,82 @@ long getPartsFound() {
 
   @Override public List getPrimaryKeys(String catName, String 
dbName, String tblName)
   throws MetaException {
-// TODO constraintCache
-return rawStore.getPrimaryKeys(catName, dbName, tblName);
+catName = normalizeIdentifier(catName);
+dbName = StringUtils.normalizeIdentifier(dbName);
+tblName = StringUtils.normalizeIdentifier(tblName);
+if (!shouldCacheTable(catName, dbName, tblName) || (canUseEvents && 
rawStore.isActiveTransaction())) {
+  return rawStore.getPrimaryKeys(catName, dbName, tblName);
+}
+
+Table tbl = sharedCache.getTableFromCache(catName, dbName, tblName);
+if (tbl == null) {
+  // The table containing the primary keys is not yet loaded in cache
+  return rawStore.getPrimaryKeys(catName, dbName, tblName);
+}
+List keys = sharedCache.listCachedPrimaryKeys(catName, 
dbName, tblName);

Review comment:
   Yes. While updating the cache, there is a possibility that the table got 
updated but the constraints didn't (they are yet to be updated). But this is 
similar to partition/columnStats caching.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] ayushtkn commented on pull request #1187: HIVE-23772. Reallocate calcite-core to prevent NoSuchFiledError.

2020-06-29 Thread GitBox


ayushtkn commented on pull request #1187:
URL: https://github.com/apache/hive/pull/1187#issuecomment-651055461


   I have updated the itests as well, which had caused the compilation to fail. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] adesh-rao commented on a change in pull request #1109: HIVE-22015: Add table constraints in CachedStore

2020-06-29 Thread GitBox


adesh-rao commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r446898404



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -543,10 +556,24 @@ static void prewarm(RawStore rawStore) {
 tableColStats = rawStore.getTableColumnStatistics(catName, 
dbName, tblName, colNames, CacheUtils.HIVE_ENGINE);
 Deadline.stopTimer();
   }
+  Deadline.startTimer("getPrimaryKeys");
+  rawStore.getPrimaryKeys(catName, dbName, tblName);
+  Deadline.stopTimer();
+  Deadline.startTimer("getForeignKeys");
+  rawStore.getForeignKeys(catName, null, null, dbName, tblName);
+  Deadline.stopTimer();
+  Deadline.startTimer("getUniqueConstraints");
+  rawStore.getUniqueConstraints(catName, dbName, tblName);
+  Deadline.stopTimer();
+  Deadline.startTimer("getNotNullConstraints");
+  rawStore.getNotNullConstraints(catName, dbName, tblName);
+  Deadline.stopTimer();
+
   // If the table could not cached due to memory limit, stop 
prewarm
   boolean isSuccess = sharedCache
   .populateTableInCache(table, tableColStats, partitions, 
partitionColStats, aggrStatsAllPartitions,

Review comment:
   Done. Though the new class just contains constraint objects for now, we 
can have a separate refactoring JIRA for partition/column stats that can also 
refactor the array created to store the size/dirtyCache variables.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] ayushtkn opened a new pull request #1187: HIVE-23772. Reallocate calcite-core to prevent NoSuchFiledError.

2020-06-29 Thread GitBox


ayushtkn opened a new pull request #1187:
URL: https://github.com/apache/hive/pull/1187


   https://issues.apache.org/jira/browse/HIVE-23772
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] pvary merged pull request #1168: HIVE-23738: DBLockManager::lock() : Move lock request to debug level

2020-06-29 Thread GitBox


pvary merged pull request #1168:
URL: https://github.com/apache/hive/pull/1168


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] kishendas commented on a change in pull request #1095: Make all DDLs transactional and advance commit write ID, so that subsequent read will invalidate the cache entry.

2020-06-29 Thread GitBox


kishendas commented on a change in pull request #1095:
URL: https://github.com/apache/hive/pull/1095#discussion_r446851340



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -2351,7 +2351,6 @@ public static TableSnapshot 
getTableSnapshot(Configuration conf,
 }
   }
 
-

Review comment:
   Reverting the entire file. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] kishendas commented on a change in pull request #1095: Make all DDLs transactional and advance commit write ID, so that subsequent read will invalidate the cache entry.

2020-06-29 Thread GitBox


kishendas commented on a change in pull request #1095:
URL: https://github.com/apache/hive/pull/1095#discussion_r446849934



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -366,13 +368,16 @@ private void acquireLocks() throws 
CommandProcessorException {
 driverContext.getTxnManager().getTableWriteId(t.getDbName(), 
t.getTableName());
   }
 
-

Review comment:
   Ok, I will revert the entire file. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] kishendas commented on a change in pull request #1095: Make all DDLs transactional and advance commit write ID, so that subsequent read will invalidate the cache entry.

2020-06-29 Thread GitBox


kishendas commented on a change in pull request #1095:
URL: https://github.com/apache/hive/pull/1095#discussion_r446847170



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/partition/exchange/AlterTableExchangePartitionsDesc.java
##
@@ -53,4 +54,20 @@ public Table getDestinationTable() {
   public Map getPartitionSpecs() {
 return partitionSpecs;
   }
+
+  @Override
+  public void setWriteId(long writeId) {
+this.writeId = writeId;
+  }
+
+  @Override
+  public String getFullTableName() {
+return null;

Review comment:
   I was not sure what to return for the table name, so I disabled advancing the 
write ID, as this DDL involves two different tables. I will just revert the 
entire file until I figure this out. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] pvary commented on a change in pull request #1095: Make all DDLs transactional and advance commit write ID, so that subsequent read will invalidate the cache entry.

2020-06-29 Thread GitBox


pvary commented on a change in pull request #1095:
URL: https://github.com/apache/hive/pull/1095#discussion_r446845384



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -2351,7 +2351,6 @@ public static TableSnapshot 
getTableSnapshot(Configuration conf,
 }
   }
 
-

Review comment:
   There is one more extra line here.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] pvary commented on a change in pull request #1095: Make all DDLs transactional and advance commit write ID, so that subsequent read will invalidate the cache entry.

2020-06-29 Thread GitBox


pvary commented on a change in pull request #1095:
URL: https://github.com/apache/hive/pull/1095#discussion_r446842444



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -435,8 +439,14 @@ public void releaseLocksAndCommitOrRollback(boolean 
commit, HiveTxnManager txnMa
 }
 // If we've opened a transaction we need to commit or rollback rather than 
explicitly
 // releasing the locks.
-driverContext.getConf().unset(ValidTxnList.VALID_TXNS_KEY);
-
driverContext.getConf().unset(ValidTxnWriteIdList.VALID_TABLES_WRITEIDS_KEY);
+// Unset all the keys
+for (String key : new String[] { ValidTxnList.VALID_TXNS_KEY, 
ValidTxnWriteIdList.VALID_TABLES_WRITEIDS_KEY,
+ValidTxnList.COMPACTOR_VALID_TXNS_ID_KEY, 
ValidTxnWriteIdList.COMPACTOR_VALID_TABLES_WRITEIDS_KEY }) {

Review comment:
   Why do we need the COMPACTOR_VALID_TABLES_WRITEIDS_KEY in the conf? I 
would remove it if it is not strictly necessary.

##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -366,13 +368,16 @@ private void acquireLocks() throws 
CommandProcessorException {
 driverContext.getTxnManager().getTableWriteId(t.getDbName(), 
t.getTableName());
   }
 
-

Review comment:
   There is still one extra line here...





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] pvary commented on a change in pull request #1095: Make all DDLs transactional and advance commit write ID, so that subsequent read will invalidate the cache entry.

2020-06-29 Thread GitBox


pvary commented on a change in pull request #1095:
URL: https://github.com/apache/hive/pull/1095#discussion_r446842752



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/RewriteSemanticAnalyzer.java
##
@@ -405,11 +405,9 @@ protected void updateOutputs(Table targetTable) {
*/
   private boolean isTargetTable(Entity entity, Table targetTable) {
 //todo: https://issues.apache.org/jira/browse/HIVE-15048
-/**
- * is this the right way to compare?  Should it just compare paths?
- * equals() impl looks heavy weight
- */
-return targetTable.equals(entity.getTable());
+// Since any DDL now advances the write id, we should ignore the write Id,
+// while comparing two tables
+return targetTable.equalsWithIgnoreWriteId(entity.getTable());

Review comment:
   Ok. Let's shelve this





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] pvary commented on a change in pull request #1095: Make all DDLs transactional and advance commit write ID, so that subsequent read will invalidate the cache entry.

2020-06-29 Thread GitBox


pvary commented on a change in pull request #1095:
URL: https://github.com/apache/hive/pull/1095#discussion_r446842042



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/partition/exchange/AlterTableExchangePartitionsDesc.java
##
@@ -53,4 +54,20 @@ public Table getDestinationTable() {
   public Map getPartitionSpecs() {
 return partitionSpecs;
   }
+
+  @Override
+  public void setWriteId(long writeId) {
+this.writeId = writeId;
+  }
+
+  @Override
+  public String getFullTableName() {
+return null;

Review comment:
   Could we add a comment for this then, explaining why it is needed? This 
seems unintuitive.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org