Re: Review Request 71835: HIVE-22508

2019-12-06 Thread Jesús Camacho Rodríguez

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71835/#review218966
---




ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapperFactory.java
Lines 70 (patched)


Can you add a comment explaining the meaning of this variable?



ql/src/test/results/clientpositive/hypothetical_set_aggregates.q.out
Line 133 (original), 133 (patched)


Were the results different here because of NULL ordering? Can you elaborate?
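
For background, a minimal self-contained Java illustration of how NULLS FIRST vs. NULLS LAST change sort results (plain JDK comparators; this is not Hive's NullOrdering API):

import java.util.Arrays;
import java.util.Comparator;

public class NullOrderingDemo {
  public static void main(String[] args) {
    Integer[] a = {3, null, 1};
    Integer[] b = a.clone();

    // NULLS FIRST: null compares as the minimum value.
    Arrays.sort(a, Comparator.nullsFirst(Comparator.naturalOrder()));
    // NULLS LAST: null compares as the maximum value.
    Arrays.sort(b, Comparator.nullsLast(Comparator.naturalOrder()));

    System.out.println(Arrays.toString(a)); // [null, 1, 3]
    System.out.println(Arrays.toString(b)); // [1, 3, null]
  }
}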


- Jesús Camacho Rodríguez


On Nov. 27, 2019, 2:24 p.m., Krisztian Kasa wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71835/
> ---
> 
> (Updated Nov. 27, 2019, 2:24 p.m.)
> 
> 
> Review request for hive, Jesús Camacho Rodríguez and Zoltan Haindrich.
> 
> 
> Bugs: HIVE-22508
> https://issues.apache.org/jira/browse/HIVE-22508
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> KeyWrapperComparator throws exception
> =
> 1. Check whether the `KeyWrapper` instances are clones and use the 
> corresponding object inspectors.
> 2. `NullValueOption.MINVALUE` and `MAXVALUE` were switched in the `NullOrdering` 
> enum, so hypothetical set functions produced incorrect return values for 
> NULL inputs.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapper.java fee8081652 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapperComparator.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapperFactory.java 
> f1bf9023e1 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java 4e35922e5f 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperBase.java
>  1bb224917e 
>   ql/src/java/org/apache/hadoop/hive/ql/util/NullOrdering.java 46ff329981 
>   ql/src/test/queries/clientpositive/hypothetical_set_aggregates.q 6b5f3765e9 
>   ql/src/test/results/clientpositive/hypothetical_set_aggregates.q.out 
> 3ea6f1f4e5 
>   
> serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java
>  1f3bbaf87d 
>   
> serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestObjectInspectorUtilsCompare.java
>  PRE-CREATION 
>   
> serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestObjectInspectorUtilsCompareNull.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/71835/diff/1/
> 
> 
> Testing
> ---
> 
> Run q tests using TestMiniLlapLocalCliDriver
> topnkey_order_null.q
> 
> TestCliDriver
> hypothetical_set_aggregates.q
> 
> 
> Thanks,
> 
> Krisztian Kasa
> 
>



Re: Review Request 71820: HIVE-20150

2019-12-06 Thread Jesús Camacho Rodríguez

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71820/#review218965
---




ql/src/test/results/clientpositive/tez/topnkey.q.out
Lines 105 (patched)


It seems these two should be merged: the second one supersedes the first one.


- Jesús Camacho Rodríguez


On Dec. 3, 2019, 9:27 a.m., Krisztian Kasa wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71820/
> ---
> 
> (Updated Dec. 3, 2019, 9:27 a.m.)
> 
> 
> Review request for hive, Jesús Camacho Rodríguez and Zoltan Haindrich.
> 
> 
> Bugs: HIVE-20150
> https://issues.apache.org/jira/browse/HIVE-20150
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> TopNKey pushdown
> 
> 1. Apply patch: 
> https://issues.apache.org/jira/secure/attachment/12941630/HIVE-20150.11.patch
> 2. TopNKey introduction depends only on a Reduce Sink with topn property >= 0
> 3. Implement TopNKey operator pushdown through: projection, group by, reduce 
> sink, left outer join, other topnkey
> 4. Add sort order and null sort order direction checks when determining whether 
> the topnkey op can be pushed
> 5. Implement handling of cases when the topnkey op and the parent op share only 
> a common key prefix.
> 6. Turn off topnkey optimization by default
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 4393a2825e 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/CommonKeyPrefix.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyProcessor.java 
> 0d6cf3c755 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyPushdownProcessor.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java bf58bd8bb8 
>   ql/src/test/org/apache/hadoop/hive/ql/optimizer/TestCommonKeyPrefix.java 
> PRE-CREATION 
>   ql/src/test/queries/clientpositive/topnkey.q 057b6a45ba 
>   ql/src/test/queries/clientpositive/vector_topnkey.q 85c5880cd6 
>   ql/src/test/results/clientpositive/llap/bucket_groupby.q.out 0c051c926b 
>   ql/src/test/results/clientpositive/llap/check_constraint.q.out 9f2c9a1cd0 
>   ql/src/test/results/clientpositive/llap/constraints_optimization.q.out 
> b6d210becf 
>   ql/src/test/results/clientpositive/llap/enforce_constraint_notnull.q.out 
> 9343e078b7 
>   ql/src/test/results/clientpositive/llap/explainuser_1.q.out 283a665a20 
>   ql/src/test/results/clientpositive/llap/explainuser_2.q.out 0219af8833 
>   ql/src/test/results/clientpositive/llap/external_jdbc_table_perf.q.out 
> 545cce75a9 
>   ql/src/test/results/clientpositive/llap/filter_union.q.out 0df77762a0 
>   ql/src/test/results/clientpositive/llap/limit_pushdown.q.out 3fdd77d802 
>   ql/src/test/results/clientpositive/llap/limit_pushdown3.q.out efa8c38d7c 
>   ql/src/test/results/clientpositive/llap/llap_decimal64_reader.q.out 
> ffe5f6fb22 
>   ql/src/test/results/clientpositive/llap/offset_limit.q.out 23f2de46e5 
>   ql/src/test/results/clientpositive/llap/offset_limit_ppd_optimizer.q.out 
> 4ecb7bc46d 
>   ql/src/test/results/clientpositive/llap/orc_struct_type_vectorization.q.out 
> 0eac389eb7 
>   
> ql/src/test/results/clientpositive/llap/parquet_complex_types_vectorization.q.out
>  4362fb6f2e 
>   
> ql/src/test/results/clientpositive/llap/parquet_map_type_vectorization.q.out 
> 24468c9a1b 
>   
> ql/src/test/results/clientpositive/llap/parquet_struct_type_vectorization.q.out
>  45890a1890 
>   ql/src/test/results/clientpositive/llap/semijoin_reddedup.q.out 0e9723b8f3 
>   ql/src/test/results/clientpositive/llap/subquery_ALL.q.out d910c1a79d 
>   ql/src/test/results/clientpositive/llap/subquery_ANY.q.out 91472d631e 
>   ql/src/test/results/clientpositive/llap/topnkey.q.out 1e77587f82 
>   ql/src/test/results/clientpositive/llap/vector_cast_constant.q.out 
> cc2dc47280 
>   ql/src/test/results/clientpositive/llap/vector_char_2.q.out f7e76e5a8b 
>   
> ql/src/test/results/clientpositive/llap/vector_groupby_grouping_sets_limit.q.out
>  6fd15e7101 
>   ql/src/test/results/clientpositive/llap/vector_groupby_reduce.q.out 
> d6325982e3 
>   ql/src/test/results/clientpositive/llap/vector_mr_diff_schema_alias.q.out 
> 4d417b9c3d 
>   ql/src/test/results/clientpositive/llap/vector_reduce_groupby_decimal.q.out 
> 97a211cfc6 
>   ql/src/test/results/clientpositive/llap/vector_string_concat.q.out 
> a8019be7aa 
>   ql/src/test/results/clientpositive/llap/vector_topnkey.q.out c140bdfd37 
>   ql/src/test/results/clientpositive/llap/vectorization_limit.q.out 
> 7326adf522 
>   ql/src/test/results/clientpositive/perf/tez/cbo_query14.q.out e9308cd709 
>   ql/src/test/results/clientpositive/perf/tez/cbo_query77.q.out 02caf99f7d 
>   ql/src/test/re

Re: Review Request 71820: HIVE-20150

2019-12-06 Thread Jesús Camacho Rodríguez

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71820/#review218963
---




common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
Line 2386 (original), 2386 (patched)


Just for reference: Are you planning to re-enable the optimization once 
all other fixes are applied? Or does it have to do with performance concerns 
for some cases?

Even if we disable it here, can we enable it for the Perf cli drivers (you can 
do that by adding the property value in _data/conf/perf-reg/tez/hive-site.xml_)? 
I believe it would be useful to already see how far we are pushing topn in 
those tpcds queries.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/CommonKeyPrefix.java
Lines 35 (patched)


It seems this class can only be created using the static builder methods? 
If that is the case, can you add a private constructor to prevent 
instantiation?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/CommonKeyPrefix.java
Lines 37 (patched)


Can you add comments to each of these static methods briefly explaining how 
the mapping will be performed?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/CommonKeyPrefix.java
Lines 93 (patched)


I believe we should use _ExprNodeDesc.isSame_ here, since equals refers to 
identity for ExprNodeDesc objects.
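
For illustration, a minimal sketch of the structural comparison (uses the real ExprNodeColumnDesc constructor; compiles against the Hive ql and serde modules):

import org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;

public class IsSameDemo {
  public static void main(String[] args) {
    ExprNodeColumnDesc a = new ExprNodeColumnDesc(TypeInfoFactory.intTypeInfo, "key", "t", false);
    ExprNodeColumnDesc b = new ExprNodeColumnDesc(TypeInfoFactory.intTypeInfo, "key", "t", false);
    // isSame compares the expressions structurally, which is what the
    // common-prefix matching needs here.
    System.out.println(a.isSame(b));
  }
}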



ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyProcessor.java
Line 107 (original), 89 (patched)


Do we need this since they are already passed when we create the new 
operator?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyProcessor.java
Line 108 (original), 90 (patched)


We can just add the child to the child operators instead of creating the 
list multiple times.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyProcessor.java
Lines 93 (patched)


_replaceChild_?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyProcessor.java
Lines 95 (patched)


_replaceParent_?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyPushdownProcessor.java
Lines 18 (patched)


Let's move all these classes (TopNKeyPushdownProcessor, TopNKeyProcessor, 
CommonKeyPrefix, etc.) into a new package 
_org.apache.hadoop.hive.ql.optimizer.topnkey_. Thoughts?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyPushdownProcessor.java
Lines 72 (patched)


Although it is very simple, can we move this to a new method, as with the rest 
of the operators? It helps encapsulation, so the same logic may be used for 
other simple operators in the future.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyPushdownProcessor.java
Lines 91 (patched)


Typo



ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyPushdownProcessor.java
Lines 105 (patched)


It seems the logic in this method cannot throw a SemanticException?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyPushdownProcessor.java
Lines 202 (patched)


Short comment on this method? _Only LOJ supported_.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyPushdownProcessor.java
Lines 257 (patched)


Comment?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyPushdownProcessor.java
Lines 258 (patched)


What happens if the key is the same but the value for top is different? It 
seems we do not cover that case here, but it is important in case you end up 
with multiple top n operators derived from different RS operators.
I believe in that situation we could still remove one of them, but we 
should keep the smallest value for top n.
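
For illustration, a tiny sketch of the merge rule suggested above (not the actual patch code):

final class TopNKeyMergeSketch {
  /**
   * Sketch: when two TopNKey operators have identical key specs but different
   * top n limits, keeping the smaller limit still satisfies both consumers.
   */
  static int mergedTopN(int first, int second) {
    return Math.min(first, second);
  }
}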



ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyPushdownProcessor.java
Lines 278 (patched)


Should this method be moved to _CommonKeyPrefix_ too?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyPushdownProcessor.java
Lines 314 (patched)


Same comment as above: it seems we do not need to create lists to 
encapsulate other lists here; can we just add the operator to the existing list 
in the operator?



ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompile

[jira] [Created] (HIVE-22595) Dynamic partition inserts fail on Avro table with external schema

2019-12-06 Thread Jason Dere (Jira)
Jason Dere created HIVE-22595:
-

 Summary: Dynamic partition inserts fail on Avro table with 
external schema
 Key: HIVE-22595
 URL: https://issues.apache.org/jira/browse/HIVE-22595
 Project: Hive
  Issue Type: Bug
  Components: Avro, Serializers/Deserializers
Reporter: Jason Dere
Assignee: Jason Dere


Example qfile test:
{noformat}
create external table avro_extschema_insert1 (name string) partitioned by (p1 
string)
  stored as avro tblproperties 
('avro.schema.url'='${system:test.tmp.dir}/table1.avsc');

create external table avro_extschema_insert2 like avro_extschema_insert1;

insert overwrite table avro_extschema_insert1 partition (p1='part1') values 
('col1_value', 1, 'col3_value');

insert overwrite table avro_extschema_insert2 partition (p1) select * from 
avro_extschema_insert1;
{noformat}

The last statement fails with the following error:
{noformat}
], TaskAttempt 3 failed, info=[Error: Error while running task ( failure ) : 
attempt_1575484789169_0003_4_00_00_3:java.lang.RuntimeException: 
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at 
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
at 
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
at 
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:426)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:576)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
... 19 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Number of input columns 
was different than output columns (in = 2 vs out = 1
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1047)
at 
org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:927)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at 
org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:153)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:555)
... 20 more
Caused by: org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Number of 
input columns was different than output columns (in = 2 vs out = 1
at 
org.apache.hadoo

[jira] [Created] (HIVE-22594) Reorder metastore thrift structures

2019-12-06 Thread John Sherman (Jira)
John Sherman created HIVE-22594:
---

 Summary: Reorder metastore thrift structures
 Key: HIVE-22594
 URL: https://issues.apache.org/jira/browse/HIVE-22594
 Project: Hive
  Issue Type: Improvement
  Components: Standalone Metastore, Thrift API
Affects Versions: 4.0.0
Reporter: John Sherman
Assignee: John Sherman


Currently hive_metastore.thrift C++ output does not compile due to structures 
like ScheduledQueryKey and ColumnStatistics being referenced before being 
defined later in the file. We should reorder the structures in the file so the 
C++ output is usable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Error Handling and Logging

2019-12-06 Thread XU Qinghui
Thanks for sharing. I think logging should indeed be taken more seriously
in this project.

On Fri, Dec 6, 2019 at 6:53 PM David Mollitor  wrote:

> -- Forwarded message -
> From: David Mollitor 
> Date: Fri, Dec 6, 2019 at 12:52 PM
> Subject: Re: Error Handling and Logging
> To: 
>
>
> Hello Team,
>
> I'm not sure how to automate this, but as I scan through the log
> statements, another thing I am observing is the following:
>
> 1// Lots of punctuation.  There is no need to yell (!!!) or to try to be
> proper (.).  Please be concise; most log messages do not need to end with a
> period
> 2// Contractions.  Please do not use contractions (can't, won't, we're);
> these are harder words for our international friends
> 3// Use strong words.  Avoid phrases like "couldn't complete the action" and
> instead consider using something like "failed to complete action".  The
> word "fail" is trained into me much more than the word "couldn't"
>
> Thanks so much for all the positive feedback I've received from the HBase
> team here and in private conversations.
>
>
>
> On Thu, Dec 5, 2019 at 1:10 PM Nick Dimiduk  wrote:
>
> > I think these are good points all around. Can any of these anti-patterns
> be
> > flagged by a checkstyle rule? Static analysis would make the infractions
> > easier to track down.
> >
> > One more point of my own: I’m of the opinion that we log too much in
> > general. Info level should not describe the details of operations as
> > normal. I’m also not a fan of logging data structure “status”messages, as
> > we do, for example, from the block cache. It’s enough to expose these as
> > metrics.
> >
> > Thanks for speaking up! If you’re feeling ambitious, please ping me on
> any
> > PRs and we’ll get things cleaned up.
> >
> > Thanks,
> > Nick
> >
> > On Tue, Dec 3, 2019 at 21:02 Stack  wrote:
> >
> > > Thanks for the helpful note David. Appreciated.
> > > S
> > >
> > > On Tue, Nov 26, 2019 at 1:44 PM David Mollitor 
> > wrote:
> > >
> > > > Hello Team,
> > > >
> > > > I am one of many people responsible for supporting the Hadoop
> products
> > > out
> > > > in the field.  Error handling and logging are crucial to my success.
> > > I've
> > > > been reading over the code and I see many of the same mistakes again
> > and
> > > > again.  I just wanted to bring some of these things to your attention
> > so
> > > > that moving forward, we can make these products better.
> > > >
> > > > The general best-practice is:
> > > >
> > > > import java.io.IOException;
> > > >
> > > > import org.slf4j.Logger;
> > > > import org.slf4j.LoggerFactory;
> > > >
> > > > public class TestExceptionLogging
> > > > {
> > > >   private static final Logger LOG =
> > > > LoggerFactory.getLogger(TestExceptionLogging.class);
> > > >
> > > >   public static void main(String[] args) {
> > > > try {
> > > >   processData();
> > > > } catch (Exception e) {
> > > >   LOG.error("Application failed", e);
> > > > }
> > > >   }
> > > >
> > > >   public static void processData() throws Exception {
> > > > try {
> > > >   readData();
> > > > } catch (Exception e) {
> > > >   throw new Exception("Failed to process data", e);
> > > > }
> > > >   }
> > > >
> > > >   public static byte[] readData() throws Exception {
> > > > throw new IOException("Failed to read device");
> > > >   }
> > > > }
> > > >
> > > > Produces:
> > > >
> > > > [main] ERROR TestExceptionLogging - Application failed
> > > > java.lang.Exception: Failed to process data
> > > > at TestExceptionLogging.processData(TestExceptionLogging.java:22)
> > > > at TestExceptionLogging.main(TestExceptionLogging.java:12)
> > > > Caused by: java.io.IOException: Failed to read device
> > > > at TestExceptionLogging.readData(TestExceptionLogging.java:27)
> > > > at TestExceptionLogging.processData(TestExceptionLogging.java:20)
> > > > ... 1 more
> > > >
> > > >
> > > >
> > > > Please notice that when an exception is thrown, and caught, it is
> > wrapped
> > > > at each level and each level adds some more context to describe what
> > was
> > > > happening when the error occurred.  It also produces a complete stack
> > > trace
> > > > that pinpoints the issue.  For Hive folks, it is rarely the case
> that a
> > > > method consuming a HMS API call should itself throw a MetaException.
> > The
> > > > MetaException has no way of wrapping an underlying Exception and
> > helpful
> > > > data is often lost.  A method may choose to wrap a MetaException,
> but
> > it
> > > > should not be throwing them around as the default behavior.
> > > >
> > > > Also important to note is that there is exactly one place that is
> doing
> > > the
> > > > logging.  There does not need to be any logging at the lower
> levels.  A
> > > > catch block should throw or log, not both.  This is an anti-pattern
> and
> > > > annoying for the end user: having to deal with multiple stack traces
> at
> > > > different log levels for the same error condition.  The log message
> > > should
> > > > be at the highest level only.
> > > >
> > > > https://community.oracle.com/docs/DOC-983543#logAndThrow
>

Fwd: Error Handling and Logging

2019-12-06 Thread David Mollitor
-- Forwarded message -
From: David Mollitor 
Date: Fri, Dec 6, 2019 at 12:52 PM
Subject: Re: Error Handling and Logging
To: 


Hello Team,

I'm not sure how to automate this, but as I scan through the log
statements, another thing I am observing is the following:

1// Lots of punctuation.  There is no need to yell (!!!) or to try to be
proper (.).  Please be concise; most log messages do not need to end with a
period
2// Contractions.  Please do not use contractions (can't, won't, we're);
these are harder words for our international friends
3// Use strong words.  Avoid phrases like "couldn't complete the action" and
instead consider using something like "failed to complete action".  The
word "fail" is trained into me much more than the word "couldn't"

Thanks so much for all the positive feedback I've received from the HBase
team here and in private conversations.



On Thu, Dec 5, 2019 at 1:10 PM Nick Dimiduk  wrote:

> I think these are good points all around. Can any of these anti-patterns be
> flagged by a checkstyle rule? Static analysis would make the infractions
> easier to track down.
>
> One more point of my own: I’m of the opinion that we log too much in
> general. Info level should not describe the details of operations as
> normal. I’m also not a fan of logging data structure “status”messages, as
> we do, for example, from the block cache. It’s enough to expose these as
> metrics.
>
> Thanks for speaking up! If you’re feeling ambitious, please ping me on any
> PRs and we’ll get things cleaned up.
>
> Thanks,
> Nick
>
> On Tue, Dec 3, 2019 at 21:02 Stack  wrote:
>
> > Thanks for the helpful note David. Appreciated.
> > S
> >
> > On Tue, Nov 26, 2019 at 1:44 PM David Mollitor 
> wrote:
> >
> > > Hello Team,
> > >
> > > I am one of many people responsible for supporting the Hadoop products
> > out
> > > in the field.  Error handling and logging are crucial to my success.
> > I've
> > > been reading over the code and I see many of the same mistakes again
> and
> > > again.  I just wanted to bring some of these things to your attention
> so
> > > that moving forward, we can make these products better.
> > >
> > > The general best-practice is:
> > >
> > > import java.io.IOException;
> > >
> > > import org.slf4j.Logger;
> > > import org.slf4j.LoggerFactory;
> > >
> > > public class TestExceptionLogging
> > > {
> > >   private static final Logger LOG =
> > > LoggerFactory.getLogger(TestExceptionLogging.class);
> > >
> > >   public static void main(String[] args) {
> > > try {
> > >   processData();
> > > } catch (Exception e) {
> > >   LOG.error("Application failed", e);
> > > }
> > >   }
> > >
> > >   public static void processData() throws Exception {
> > > try {
> > >   readData();
> > > } catch (Exception e) {
> > >   throw new Exception("Failed to process data", e);
> > > }
> > >   }
> > >
> > >   public static byte[] readData() throws Exception {
> > > throw new IOException("Failed to read device");
> > >   }
> > > }
> > >
> > > Produces:
> > >
> > > [main] ERROR TestExceptionLogging - Application failed
> > > java.lang.Exception: Failed to process data
> > > at TestExceptionLogging.processData(TestExceptionLogging.java:22)
> > > at TestExceptionLogging.main(TestExceptionLogging.java:12)
> > > Caused by: java.io.IOException: Failed to read device
> > > at TestExceptionLogging.readData(TestExceptionLogging.java:27)
> > > at TestExceptionLogging.processData(TestExceptionLogging.java:20)
> > > ... 1 more
> > >
> > >
> > >
> > > Please notice that when an exception is thrown, and caught, it is
> wrapped
> > > at each level and each level adds some more context to describe what
> was
> > > happening when the error occurred.  It also produces a complete stack
> > trace
> > > that pinpoints the issue.  For Hive folks, it is rarely the case that a
> > > method consuming a HMS API call should itself throw a MetaException.
> The
> > > MetaException has no way of wrapping an underlying Exception and
> helpful
> > > data is often lost.  A method may choose to wrap a MetaException, but
> it
> > > should not be throwing them around as the default behavior.
> > >
> > > Also important to note is that there is exactly one place that is doing
> > the
> > > logging.  There does not need to be any logging at the lower levels.  A
> > > catch block should throw or log, not both.  This is an anti-pattern and
> > > annoying for the end user: having to deal with multiple stack traces at
> > > different log levels for the same error condition.  The log message
> > should
> > > be at the highest level only.
> > >
> > > https://community.oracle.com/docs/DOC-983543#logAndThrow
> > >
> > > Both projects use SLF4J as the logging framework (facade anyway).
> Please
> > > familiarize yourself with how to correctly log an Exception.  There is
> no
> > > need to log a thread name, a time stamp, a class name, or a stack
> trace.
> > > The logging framework will do that all for you.
> > >
> > > http://www.slf4j.org/faq.html#paramException
> > >
> > > Again, there is no need to 'stringify' an exce

[jira] [Created] (HIVE-22593) Dynamically partitioned MM (insert-only ACID) table isn't compacting automatically

2019-12-06 Thread Karen Coppage (Jira)
Karen Coppage created HIVE-22593:


 Summary: Dynamically partitioned MM (insert-only ACID) table isn't 
compacting automatically
 Key: HIVE-22593
 URL: https://issues.apache.org/jira/browse/HIVE-22593
 Project: Hive
  Issue Type: Bug
Reporter: Karen Coppage
Assignee: Karen Coppage


Dynamic partitions of MM tables aren't entered into the HMS table 
TXN_COMPONENTS. On inserting into such tables we see this line in the HMS log:
{code:java}
Expected to move at least one record from txn_components to 
completed_txn_components when committing txn!{code}
(This is not the case for non-partitioned MM tables.)

Since the partitions aren't entered into COMPLETED_TXN_COMPONENTS, they aren't 
considered for automatic compaction.

Probably the culprit is 
org.apache.hadoop.hive.ql.metadata.Hive#loadDynamicPartitions, which has an 
isAcid parameter that is always false for MM tables; also, MM 
tables' "write type" is AcidUtils.Operation.NOT_ACID and not INSERT.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Integration Tests running with non standard spark version

2019-12-06 Thread Ismaël Mejía
Hello, just pinging again to see if anyone can help me with this.
Just to give some more context, I am trying to upgrade Avro to version
1.9.1 to tackle some security issues in the dependencies, as well as to get
some of the improvements made to Avro since the latest release, which happened
two years ago.

An extra goal is to get this cherry-picked into the 2.x branch (if
possible) so we can get the upgrade done in Spark 3 too. The Spark upgrade
is partially blocked by Hive leaking an older version of Avro, so this would
unblock that too.


On Tue, Nov 26, 2019 at 3:26 PM Ismaël Mejía  wrote:

> Hello,
>
> I am trying to upgrade the Avro dependency on Hive to version 1.9.1. More
> details in
> https://issues.apache.org/jira/browse/HIVE-21737
>
> The Hive CI tests for my patch failed because of some class issue, so I
> tried to tackle this by upgrading the Spark version installed during the IT
> tests from 2.3.0 to version 2.4.4. However, I discovered that the Jenkins
> job was failing to download the Spark version from some AWS storage bucket
>
> https://github.com/apache/hive/blob/209096a71b25a3d75dca1930f825c8c10b99d2b9/itests/bin/download-spark.sh#L37
>
> It seems this version is somehow different from upstream Spark. Does
> anyone know more about why we use that version? And more importantly, any
> chance this version can be upgraded to version 2.4.x (with the hope that this
> may unblock my upgrade issue)?
>
> Regards,
> Ismaël
>
>


[jira] [Created] (HIVE-22592) Remove redundant calls to AcidUtils#getAcidState in Worker and CompactorMR

2019-12-06 Thread Karen Coppage (Jira)
Karen Coppage created HIVE-22592:


 Summary: Remove redundant calls to AcidUtils#getAcidState in 
Worker and CompactorMR
 Key: HIVE-22592
 URL: https://issues.apache.org/jira/browse/HIVE-22592
 Project: Hive
  Issue Type: Improvement
Affects Versions: 4.0.0
Reporter: Karen Coppage
Assignee: Karen Coppage


AcidUtils#getAcidState is called in the Worker before CompactorMR#run and again 
inside CompactorMR#run. Since it's costly to call, we can pass the value as an 
argument to CompactorMR#run.
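
For illustration, a hedged sketch of the refactoring with stand-in types and signatures (the real AcidUtils#getAcidState and CompactorMR#run signatures differ):

{code}
/** Sketch only, not the actual Worker/CompactorMR code. */
class CompactionSketch {
  /** Stand-in for the costly AcidUtils#getAcidState call. */
  static String getAcidState(String location) {
    return "acid-state-for:" + location;
  }

  /** run() now receives the precomputed state instead of recomputing it. */
  static void run(String acidState) {
    System.out.println("compacting with " + acidState);
  }

  public static void main(String[] args) {
    String dir = getAcidState("/warehouse/t1");  // computed once in the Worker
    run(dir);                                    // reused inside run()
  }
}
{code}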



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Review Request 71889: HIVE-22578: CBO genOPTree is not failsafe for CTAS and VIEW statements

2019-12-06 Thread Aron Hamvas via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71889/
---

(Updated Dec. 6, 2019, 3:12 p.m.)


Review request for hive, Zoltan Haindrich and Vineet Garg.


Bugs: HIVE-22578
https://issues.apache.org/jira/browse/HIVE-22578


Repository: hive-git


Description
---

CalcitePlanner.genOPTree() invokes the CalcitePlanner.fixUpAfterCbo() 
method, which, in case of CTAS, VIEW, and MULTI_INSERT queries, replaces the 
original AST. If CBO fails, the planner is expected to retry operator tree 
generation without CBO, using the original AST, which is already gone. The 
change saves the original AST for re-use in case CBO fails. Some refactoring 
is also included:
- Removed unused genOPTree parameters from SemanticAnalyzer
- Centralized the handling of the ASTNode object within the context of 
SemanticAnalyzer and CalcitePlanner (i.e. use an instance variable where 
appropriate, instead of passing it as a method argument)
- Made sure that the exception is always printed to the HS2 logs if it happens 
during cost-based optimization.

Thanks for the feedback in advance!
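
For illustration, a minimal sketch of the save-and-restore idea described above (all names here are hypothetical placeholders, not the actual CalcitePlanner code):

class CboFallbackSketch {
  Object genOPTree(Object ast) {
    Object originalAst = copyAst(ast);      // keep a pristine copy before CBO
    try {
      return genOPTreeWithCbo(ast);         // may mutate/replace the AST (fixUpAfterCbo)
    } catch (Exception e) {
      // always log to the HS2 logs, then retry without CBO from the saved copy
      return genOPTreeWithoutCbo(originalAst);
    }
  }

  Object copyAst(Object ast) { return ast; }              // placeholder
  Object genOPTreeWithCbo(Object ast) { return ast; }     // placeholder
  Object genOPTreeWithoutCbo(Object ast) { return ast; }  // placeholder
}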


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java ea5fa3f4c3 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 60bfba826d 


Diff: https://reviews.apache.org/r/71889/diff/1/


Testing
---


File Attachments (updated)


HIVE-22578.2.patch
  
https://reviews.apache.org/media/uploaded/files/2019/12/06/8254f1f7-9dde-4837-a8bb-36d699489099__HIVE-22578.2.patch


Thanks,

Aron Hamvas



Review Request 71889: HIVE-22578: CBO genOPTree is not failsafe for CTAS and VIEW statements

2019-12-06 Thread Aron Hamvas via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71889/
---

Review request for hive, Zoltan Haindrich and Vineet Garg.


Bugs: HIVE-22578
https://issues.apache.org/jira/browse/HIVE-22578


Repository: hive-git


Description
---

CalcitePlanner.genOPTree() invokes the CalcitePlanner.fixUpAfterCbo() 
method, which, in case of CTAS, VIEW, and MULTI_INSERT queries, replaces the 
original AST. If CBO fails, the planner is expected to retry operator tree 
generation without CBO, using the original AST, which is already gone. The 
change saves the original AST for re-use in case CBO fails. Some refactoring 
is also included:
- Removed unused genOPTree parameters from SemanticAnalyzer
- Centralized the handling of the ASTNode object within the context of 
SemanticAnalyzer and CalcitePlanner (i.e. use an instance variable where 
appropriate, instead of passing it as a method argument)
- Made sure that the exception is always printed to the HS2 logs if it happens 
during cost-based optimization.

Thanks for the feedback in advance!


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java ea5fa3f4c3 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 60bfba826d 


Diff: https://reviews.apache.org/r/71889/diff/1/


Testing
---


Thanks,

Aron Hamvas



Review Request 71888: HIVE-22568: Process compaction candidates in parallel by the Initiator

2019-12-06 Thread Denys Kuzmenko via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71888/
---

Review request for hive, Laszlo Pinter and Peter Vary.


Bugs: HIVE-22568
https://issues.apache.org/jira/browse/HIVE-22568


Repository: hive-git


Description
---

`checkForCompaction` includes many file metadata checks and may be expensive. 
Therefore, it makes sense to use a thread pool here and run 
`checkForCompaction` in parallel.
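
For illustration, a minimal self-contained sketch of the idea using plain java.util.concurrent (the actual Initiator change and its configuration knob may differ):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class InitiatorPoolSketch {
  /** Stand-in for the expensive per-candidate checkForCompaction call. */
  static boolean checkForCompaction(String candidate) {
    return candidate.hashCode() % 2 == 0;  // placeholder decision
  }

  public static void main(String[] args) {
    List<String> candidates = Arrays.asList("db.t1", "db.t2", "db.t3");
    ExecutorService pool = Executors.newFixedThreadPool(4);
    List<CompletableFuture<Boolean>> checks = new ArrayList<>();
    for (String c : candidates) {
      checks.add(CompletableFuture.supplyAsync(() -> checkForCompaction(c), pool));
    }
    checks.forEach(CompletableFuture::join);  // wait for all checks to finish
    pool.shutdown();
  }
}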


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 4393a2825e 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 7a0e32463d 
  ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestInitiator.java 
564839324f 


Diff: https://reviews.apache.org/r/71888/diff/1/


Testing
---

unit test


Thanks,

Denys Kuzmenko



[jira] [Created] (HIVE-22591) Make single metastore call to fetch all column stats instead of separate call for each column

2019-12-06 Thread Vineet Garg (Jira)
Vineet Garg created HIVE-22591:
--

 Summary: Make single metastore call to fetch all column stats 
instead of separate call for each column
 Key: HIVE-22591
 URL: https://issues.apache.org/jira/browse/HIVE-22591
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg


Currently HiveRelFieldTrimmer has the ability to fetch (and cache) 
column stats in a single call. 
HiveReduceExpressionsWithStatsRule, on the other hand, has to use column stats 
on demand; as a result it makes a separate call for each column.
This rule should be moved after RelFieldTrimmer so that RelFieldTrimmer has 
already cached all the column stats.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22590) Revert HIVE-17218 Canonical-ize hostnames for Hive metastore, and HS2 servers as it causes issues with SSL and LB

2019-12-06 Thread Peter Vary (Jira)
Peter Vary created HIVE-22590:
-

 Summary: Revert HIVE-17218 Canonical-ize hostnames for Hive 
metastore, and HS2 servers as it causes issues with SSL and LB
 Key: HIVE-22590
 URL: https://issues.apache.org/jira/browse/HIVE-22590
 Project: Hive
  Issue Type: Bug
Reporter: Peter Vary
Assignee: Peter Vary


HIVE-17218 causes issues with Load balanced + SSL environments, like this:
{code}
java.net.SocketException: Socket is closed 
at sun.security.ssl.SSLSocketImpl.checkEOF(SSLSocketImpl.java:1532) 
at sun.security.ssl.SSLSocketImpl.checkWrite(SSLSocketImpl.java:1553) 
at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:71) 
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) 
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) 
at java.io.FilterOutputStream.close(FilterOutputStream.java:158) 
at 
org.apache.thrift.transport.TIOStreamTransport.close(TIOStreamTransport.java:110)
 
at org.apache.thrift.transport.TSocket.close(TSocket.java:235) 
at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:318) 
at 
org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
 
at 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
 
at 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
 
at java.security.AccessController.doPrivileged(Native Method) 
at javax.security.auth.Subject.doAs(Subject.java:422) 
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
 
at 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
 
at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:204) 
at org.apache.hive.jdbc.HiveConnection.(HiveConnection.java:169) 
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105) 
at java.sql.DriverManager.getConnection(DriverManager.java:664) 
at java.sql.DriverManager.getConnection(DriverManager.java:208) 
at 
org.apache.hive.beeline.DatabaseConnection.connect(DatabaseConnection.java:146) 
at 
org.apache.hive.beeline.DatabaseConnection.getConnection(DatabaseConnection.java:211)
 
at org.apache.hive.beeline.Commands.connect(Commands.java:1496) 
at org.apache.hive.beeline.Commands.connect(Commands.java:1391) 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 
at java.lang.reflect.Method.invoke(Method.java:498) 
at 
org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
 
at org.apache.hive.beeline.BeeLine.execCommandWithPrefix(BeeLine.java:1135) 
at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1174) 
at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:1010) 
at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:922) 
at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:518) 
at org.apache.hive.beeline.BeeLine.main(BeeLine.java:501) 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 
at java.lang.reflect.Method.invoke(Method.java:498) 
at org.apache.hadoop.util.RunJar.run(RunJar.java:221) 
at org.apache.hadoop.util.RunJar.main(RunJar.java:136) 
Unknown HS2 problem when communicating with Thrift server. 
Error: Could not open client transport with JDBC Uri: jdbc:hive2:// GSS initiate failed 
Also, could not send response: org.apache.thrift.transport.TTransportException: 
javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: 
No subject alternative names matching IP address 52 
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22589) Add storage support for ProlepticCalendar

2019-12-06 Thread Jesus Camacho Rodriguez (Jira)
Jesus Camacho Rodriguez created HIVE-22589:
--

 Summary: Add storage support for ProlepticCalendar
 Key: HIVE-22589
 URL: https://issues.apache.org/jira/browse/HIVE-22589
 Project: Hive
  Issue Type: Bug
  Components: storage-api
Reporter: Owen O'Malley
Assignee: László Bodor
 Fix For: 4.0.0, 3.1.3, storage-2.7.1


Hive recently moved its processing to the proleptic calendar, which has created 
some issues for users who have dates before 1580 AD.

I'd propose extending the column vectors for times & dates to encode which 
calendar they are using.

* create DateColumnVector that extends LongColumnVector
* add a method to change calendars to both DateColumnVector and 
TimestampColumnVector.

{code}
  /**
   * Change the calendar to or from proleptic. If the new and old values of the
   * flag are the same, nothing is done.
   * @param useProleptic set the flag for the proleptic calendar
   * @param updateData change the data to match the new value of the flag
   */
  void changeCalendar(boolean useProleptic, boolean updateData);

  /**
   * Detect whether this data is using the proleptic calendar.
   */
  boolean usingProlepticCalendar();
{code}
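
For example, a hedged usage sketch, assuming the proposed API above lands on the new DateColumnVector:

{code}
  DateColumnVector dates = new DateColumnVector(1024);
  // ... populate dates.vector with days-since-epoch values ...
  dates.changeCalendar(true, true);    // switch to proleptic and rewrite the data
  boolean proleptic = dates.usingProlepticCalendar();
{code}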



--
This message was sent by Atlassian Jira
(v8.3.4#803005)