Review Request 71766: HIVE-22402: Deprecate and Replace Hive PerfLogger

2019-11-13 Thread David Mollitor

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71766/
---

Review request for hive, Peter Vary and Slim Bouguerra.


Repository: hive-git


Description
---

Recently I wanted to add more performance logging, and some additional
capability, to support my troubleshooting efforts. I started looking at
PerfLogger and examining its usage. I discovered a few things:

* Since 'loggers' must be opened and closed manually, I found a couple of
  places where loggers were opened but never closed, rendering them useless.
* Since 'loggers' must be closed manually, I found a few places where an
  early return or a thrown Exception would cause a logger to not be closed,
  thereby rendering it useless.
* Session information is not logged, so it can be difficult to pinpoint
  which session is taking lots of time.
* PerfLogger is overloaded. Most of the time, it is used as a simple timer
  mechanism with automatic logging at SLF4J debug level. However, it is also
  a facade over the Hive Metrics subsystem: timing results are automatically
  published to Metrics, which creates a dependency on a 'logger' to be able
  to access metric data as well.

The last bullet is the most challenging part and is why I propose to deprecate
the Hive PerfLogger and not simply remove it. I am proposing a new system: a
PerfTimer that allows for Java's try-with-resources feature (available since
Java 7), so that the developer does not have to manually close measurements or
carefully consider all early exits. The base implementation logs to SLF4J. An
extended version automatically publishes to the Hive Metrics subsystem as well.
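
To illustrate the try-with-resources idea described above, here is a minimal sketch. The names PerfTimerSketch, ScopedTimer and timedWork are hypothetical and are not taken from the patch, and a plain Consumer<String> stands in for the SLF4J logger so that the example is self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class PerfTimerSketch {

  /** Hypothetical timer; NOT the PerfTimer class from the patch. */
  static final class ScopedTimer implements AutoCloseable {
    private final String name;
    private final Consumer<String> sink; // stand-in for an SLF4J logger
    private final long startNanos = System.nanoTime();

    ScopedTimer(String name, Consumer<String> sink) {
      this.name = name;
      this.sink = sink;
    }

    /** Runs when the try block exits, including on early return or exception. */
    @Override
    public void close() {
      long elapsedMs = (System.nanoTime() - startNanos) / 1_000_000L;
      sink.accept("PERF " + name + " took " + elapsedMs + " ms");
    }
  }

  /** Simulated unit of work with an early return and a failure path. */
  static int timedWork(boolean fail, Consumer<String> sink) {
    try (ScopedTimer timer = new ScopedTimer("timedWork", sink)) {
      if (fail) {
        throw new IllegalStateException("simulated failure");
      }
      return 42; // early return: close() still runs
    }
  }

  public static void main(String[] args) {
    List<String> messages = new ArrayList<>();
    timedWork(false, messages::add);
    try {
      timedWork(true, messages::add);
    } catch (IllegalStateException expected) {
      // The timer was closed before the exception reached this handler.
    }
    messages.forEach(System.out::println); // one PERF line per call
  }
}
```

Because close() is invoked by the language rather than by the caller, neither the early return nor the exception path can leave a measurement open, which is the failure mode described in the first two bullets.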

The current Hive PerfLogger has a somewhat clunky system for allowing pluggable
implementations. However, the current default implementation has the side effect
of also publishing timing information to the Hive Metrics subsystem. There are
code sections that look up various timers in the Metrics subsystem and publish
the results back to the client. Since, in theory, the implementation is
pluggable, any other implementation that does not also publish to the Metrics
subsystem will break these non-optional code paths. Also, these code paths
create and interact with PerfLoggers in a static way, and then the publishing
code pulls the data from the PerfLogger (as a facade to the Metrics subsystem)
in a static way. Therefore, when I tried to replace the entire PerfLogger code,
I ran into a problem: there is not (and should not be) a way to statically pull
this information from any point in the code. Information that is required for
publishing should be passed around within some sort of context object, separate
from the Metrics subsystem. There was no obvious way to thread a new PerfTimer
through to all the required locations. I propose marking the PerfLogger as
deprecated and leaving these complex sections alone, and instead replacing only
the simple "I want a timer" use cases.


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 282f4cdb0b 
  common/src/java/org/apache/hadoop/hive/ql/log/CachedPerfTimerLogger.java 
PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/ql/log/LoggingPerfTimerLogger.java 
PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/ql/log/MetricsPerfTimerLogger.java 
PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java 2707987f0b 
  common/src/java/org/apache/hadoop/hive/ql/log/PerfTimedAction.java 
PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/ql/log/PerfTimer.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/ql/log/PerfTimerFactory.java 
PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/ql/log/PerfTimerLogger.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java 91910d1c0c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 0643a54753 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 695d08bbe2 
  ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java 
e205c08d84 
  ql/src/java/org/apache/hadoop/hive/ql/exec/SparkHashTableSinkOperator.java 
10144a1352 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java a7770b4e53 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java
 b9285accbd 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 
530131f207 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java 8244dcb1a9 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
806deb5f31 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
f29a9f807c 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 
07cb5cb936 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 92775107bc 
  ql/

Re: Review Request 71763: HIVE-22484: Remove Calls to printStackTrace

2019-11-13 Thread Slim Bouguerra

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71763/#review218624
---




jdbc/src/java/org/apache/hive/jdbc/HiveDriver.java
Line 46 (original), 46 (patched)


Are you sure about this? It seems like this can crash the driver, which is not
how the old code behaved.


- Slim Bouguerra


On Nov. 13, 2019, 5:22 p.m., David Mollitor wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71763/
> ---
> 
> (Updated Nov. 13, 2019, 5:22 p.m.)
> 
> 
> Review request for hive and Peter Vary.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-22484: Remove Calls to printStackTrace
> 
> 
> Diffs
> -
> 
>   jdbc/src/java/org/apache/hive/jdbc/HiveBaseResultSet.java dfaa40fe23 
>   jdbc/src/java/org/apache/hive/jdbc/HiveDriver.java 102683ee18 
>   ql/src/java/org/apache/hadoop/hive/ql/QueryPlan.java 0d7b92d649 
>   ql/src/java/org/apache/hadoop/hive/ql/ddl/DDLSemanticAnalyzerFactory.java 
> c8aaec15d4 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 9ad4e71482 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableDummyOperator.java 
> e8f7dd067e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java 1aae142ba7 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java 3210ca5cf8 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java a7770b4e53 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java cd4f2a02a3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/CustomPartitionVertex.java 
> dfabfb81e5 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java 077c94f82b 
>   ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryViewer.java 
> 616f2d6c10 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 67996c6db9 
>   ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java 3e45e45b27 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LocalMapJoinProcFactory.java
>  3e81ab5959 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SortMergeJoinTaskDispatcher.java
>  fbf6852013 
>   serde/src/java/org/apache/hadoop/hive/serde2/dynamic_type/DynamicSerDe.java 
> 948cddcb28 
>   
> serde/src/java/org/apache/hadoop/hive/serde2/dynamic_type/DynamicSerDeTypeMap.java
>  3f086cdde4 
>   
> serde/src/java/org/apache/hadoop/hive/serde2/dynamic_type/DynamicSerDeTypeSet.java
>  f41959b7d2 
>   service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 
> b87b670652 
> 
> 
> Diff: https://reviews.apache.org/r/71763/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> David Mollitor
> 
>



[jira] [Created] (HIVE-22494) Use System NanoTime to Measure Code Execution

2019-11-13 Thread David Mollitor (Jira)
David Mollitor created HIVE-22494:
-

 Summary: Use System NanoTime to Measure Code Execution
 Key: HIVE-22494
 URL: https://issues.apache.org/jira/browse/HIVE-22494
 Project: Hive
  Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor


https://docs.oracle.com/javase/7/docs/api/java/lang/System.html#nanoTime()

System.nanoTime() is designed for measuring elapsed time, which is exactly this
use case.
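
As a small sketch of the pattern the issue title refers to (the helper name timeMillis is hypothetical): System.nanoTime() is only meaningful as a difference between two readings, and unlike System.currentTimeMillis() it is not tied to the wall clock, so clock adjustments do not distort the measurement.

```java
import java.util.concurrent.TimeUnit;

final class NanoTimeSketch {

  /** Runs the task and returns the elapsed time in milliseconds. */
  static long timeMillis(Runnable task) {
    long start = System.nanoTime();
    task.run();
    // Always subtract two readings; a single nanoTime() value has no meaning on its own.
    long elapsedNanos = System.nanoTime() - start;
    return TimeUnit.NANOSECONDS.toMillis(elapsedNanos);
  }

  public static void main(String[] args) {
    long ms = timeMillis(() -> {
      try {
        Thread.sleep(50);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    System.out.println("elapsed ms: " + ms);
  }
}
```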



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22493) Scheduled Query Execution Failure in Tests

2019-11-13 Thread David Mollitor (Jira)
David Mollitor created HIVE-22493:
-

 Summary: Scheduled Query Execution Failure in Tests
 Key: HIVE-22493
 URL: https://issues.apache.org/jira/browse/HIVE-22493
 Project: Hive
  Issue Type: Improvement
Reporter: David Mollitor








Re: Review Request 71763: HIVE-22484: Remove Calls to printStackTrace

2019-11-13 Thread Peter Vary via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71763/#review218620
---




ql/src/java/org/apache/hadoop/hive/ql/QueryPlan.java
Line 646 (original)


Don't we want to log the exception, at least at debug level?



ql/src/java/org/apache/hadoop/hive/ql/QueryPlan.java
Line 659 (original)


Don't we want to log the exception, at least at debug level?



ql/src/java/org/apache/hadoop/hive/ql/QueryPlan.java
Line 673 (original)


Don't we want to log the exception, at least at debug level?



ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java
Line 307 (original)


Don't we want to log the exception, at least at debug level?



ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
Line 1888 (original), 1888 (patched)


Don't we want to log the exception, at least at debug level?
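
For reference, a minimal sketch of what replacing printStackTrace() with a debug-level log call can look like. It uses java.util.logging (where FINE roughly corresponds to debug) as a self-contained stand-in for the SLF4J logger used in Hive; the class and method names are hypothetical and do not come from the patch.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class LogInsteadOfPrintSketch {
  // java.util.logging is a stand-in here; Hive itself logs through SLF4J.
  static final Logger LOG = Logger.getLogger(LogInsteadOfPrintSketch.class.getName());

  /** Returns a description of the plan, or null if it cannot be produced. */
  static String describe(Object plan) {
    try {
      return plan.toString();
    } catch (RuntimeException e) {
      // Before: e.printStackTrace();
      // After: keep the stack trace, but route it through the logger at debug level.
      LOG.log(Level.FINE, "Unable to describe plan", e);
      return null;
    }
  }

  public static void main(String[] args) {
    // With the default logging configuration, FINE records are not printed.
    System.out.println(describe(null));
  }
}
```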


- Peter Vary





[jira] [Created] (HIVE-22492) Amortize lock contention due to LRFU accounting

2019-11-13 Thread Slim Bouguerra (Jira)
Slim Bouguerra created HIVE-22492:
-

 Summary: Amortize lock contention due to LRFU accounting
 Key: HIVE-22492
 URL: https://issues.apache.org/jira/browse/HIVE-22492
 Project: Hive
  Issue Type: Improvement
Reporter: Slim Bouguerra
Assignee: Slim Bouguerra


The LRFU eviction policy can be a major source of contention under high load.
This can be seen in the following profiles.
To fix this, the idea is to use a batching wrapper to amortize the lock
contention.
The trick is a common way to amortize locking, as explained here:
http://www.ece.eng.wayne.edu/~sjiang/pubs/papers/ding-09-BP-Wrapper.pdf
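
A simplified sketch of the batching idea, under stated assumptions: accesses are recorded in a lock-free queue, and the policy lock is taken only once per batch, and only if it is free. The class BatchingPolicyWrapper and its methods are hypothetical, and the single shared queue is a simplification chosen for brevity; it is not claimed to match the linked paper's design or the eventual Hive change.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

/** Hypothetical wrapper; not the class from the eventual Hive patch. */
final class BatchingPolicyWrapper<T> {
  private final ConcurrentLinkedQueue<T> pending = new ConcurrentLinkedQueue<>();
  private final AtomicInteger pendingCount = new AtomicInteger();
  private final ReentrantLock policyLock = new ReentrantLock();
  private final Consumer<T> policyUpdate; // the expensive update, e.g. LRFU bookkeeping
  private final int batchSize;

  BatchingPolicyWrapper(Consumer<T> policyUpdate, int batchSize) {
    this.policyUpdate = policyUpdate;
    this.batchSize = batchSize;
  }

  /** Hot path: records the access without blocking on the policy lock. */
  void notifyAccess(T item) {
    pending.add(item);
    if (pendingCount.incrementAndGet() >= batchSize && policyLock.tryLock()) {
      try {
        drain();
      } finally {
        policyLock.unlock();
      }
    }
  }

  /** Applies everything still queued, e.g. before an eviction decision. */
  void flush() {
    policyLock.lock();
    try {
      drain();
    } finally {
      policyLock.unlock();
    }
  }

  /** Must be called with policyLock held. */
  private void drain() {
    T item;
    while ((item = pending.poll()) != null) {
      pendingCount.decrementAndGet();
      policyUpdate.accept(item);
    }
  }

  public static void main(String[] args) {
    AtomicInteger applied = new AtomicInteger();
    BatchingPolicyWrapper<String> wrapper =
        new BatchingPolicyWrapper<>(buffer -> applied.incrementAndGet(), 3);
    wrapper.notifyAccess("a");
    wrapper.notifyAccess("b");
    System.out.println("applied after two accesses: " + applied.get());
    wrapper.notifyAccess("c"); // third access fills the batch and triggers one drain
    System.out.println("applied after three accesses: " + applied.get());
  }
}
```

The trade-off is that the policy sees accesses slightly late, so anything that needs an up-to-date view (such as choosing an eviction victim) would call flush() first.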






Review Request 71763: HIVE-22484: Remove Calls to printStackTrace

2019-11-13 Thread David Mollitor

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71763/
---

Review request for hive.


Repository: hive-git


Description
---

HIVE-22484: Remove Calls to printStackTrace


Diffs
-

  jdbc/src/java/org/apache/hive/jdbc/HiveBaseResultSet.java dfaa40fe23 
  jdbc/src/java/org/apache/hive/jdbc/HiveDriver.java 102683ee18 
  ql/src/java/org/apache/hadoop/hive/ql/QueryPlan.java 0d7b92d649 
  ql/src/java/org/apache/hadoop/hive/ql/ddl/DDLSemanticAnalyzerFactory.java 
c8aaec15d4 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 9ad4e71482 
  ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableDummyOperator.java 
e8f7dd067e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java 1aae142ba7 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java 3210ca5cf8 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java a7770b4e53 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java cd4f2a02a3 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/CustomPartitionVertex.java 
dfabfb81e5 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java 077c94f82b 
  ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryViewer.java 
616f2d6c10 
  ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 67996c6db9 
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java 3e45e45b27 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LocalMapJoinProcFactory.java
 3e81ab5959 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SortMergeJoinTaskDispatcher.java
 fbf6852013 
  serde/src/java/org/apache/hadoop/hive/serde2/dynamic_type/DynamicSerDe.java 
948cddcb28 
  
serde/src/java/org/apache/hadoop/hive/serde2/dynamic_type/DynamicSerDeTypeMap.java
 3f086cdde4 
  
serde/src/java/org/apache/hadoop/hive/serde2/dynamic_type/DynamicSerDeTypeSet.java
 f41959b7d2 
  service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 
b87b670652 


Diff: https://reviews.apache.org/r/71763/diff/1/


Testing
---


Thanks,

David Mollitor



Review Request 71762: HIVE-22308 Add missing support of Azure Blobstore schemes

2019-11-13 Thread Dávid Lavati

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71762/
---

Review request for hive.


Bugs: HIVE-22308
https://issues.apache.org/jira/browse/HIVE-22308


Repository: hive-git


Description
---

Azure has been used as a filesystem for Hive, but its various schemes aren't 
registered under

HiveConf.HIVE_BLOBSTORE_SUPPORTED_SCHEMES.

Found the list of elements in: 
https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/FileSystemUriSchemes.java


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java afee315378 
  common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java 
b135be82a2 


Diff: https://reviews.apache.org/r/71762/diff/1/


Testing
---

Modified related unit test


Thanks,

Dávid Lavati



[jira] [Created] (HIVE-22491) Use Collections emptyList

2019-11-13 Thread David Mollitor (Jira)
David Mollitor created HIVE-22491:
-

 Summary: Use Collections emptyList
 Key: HIVE-22491
 URL: https://issues.apache.org/jira/browse/HIVE-22491
 Project: Hive
  Issue Type: Improvement
Affects Versions: 3.2.0
 Environment: 
https://docs.oracle.com/javase/8/docs/api/?java/util/Collections.html

Use Collections#emptyList instead of instantiating empty ArrayLists
Reporter: David Mollitor
Assignee: David Mollitor
 Attachments: HIVE-22491.1.patch
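
A minimal sketch of the substitution the issue describes. The method partitionNames and its values are hypothetical and are not taken from the attached patch. One caveat worth keeping in mind: the list returned by Collections.emptyList() is immutable, so the swap is only safe where callers do not modify the result.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class EmptyListSketch {

  /** Hypothetical lookup that often has nothing to return. */
  static List<String> partitionNames(boolean hasPartitions) {
    if (!hasPartitions) {
      // Before: return new ArrayList<>();
      return Collections.emptyList(); // immutable, avoids allocating a new list
    }
    List<String> names = new ArrayList<>();
    names.add("ds=2019-11-13");
    return names;
  }

  public static void main(String[] args) {
    System.out.println(partitionNames(false).isEmpty());
    try {
      partitionNames(false).add("x");
    } catch (UnsupportedOperationException e) {
      System.out.println("empty list is immutable");
    }
  }
}
```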







Review Request 71761: HIVE-22489

2019-11-13 Thread Krisztian Kasa

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71761/
---

Review request for hive, Jesús Camacho Rodríguez and Zoltan Haindrich.


Bugs: HIVE-22489
https://issues.apache.org/jira/browse/HIVE-22489


Repository: hive-git


Description
---

Reduce Sink operator orders nulls first


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CountDistinctRewriteProc.java 
a50ad78e8f 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java
 0f95d7788c 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveGBOpConvUtil.java
 268aca6b58 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveOpConverter.java
 c11ed59012 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java e9b035d3b4 
  ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java e20f6956b2 
  ql/src/java/org/apache/hadoop/hive/ql/util/NullOrdering.java 6bf1db272a 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java a78fdfc394 


Diff: https://reviews.apache.org/r/71761/diff/1/


Testing
---

Run tests:
- TestExecDriver.java
- order_null.q
- sample10.q
- vector_char_2.q
- vector_order_null.q
- vector_windowing_gby2.q


Thanks,

Krisztian Kasa



Review Request 71760: HIVE-21146 Enforce TransactionBatch size=1 for blob stores

2019-11-13 Thread Dávid Lavati

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71760/
---

Review request for hive.


Bugs: HIVE-21146
https://issues.apache.org/jira/browse/HIVE-21146


Repository: hive-git


Description
---

Streaming Ingest API supports a concept of TransactionBatch where N 
transactions can be opened at once and the data in all of them will be written 
to the same delta_x_y directory where each transaction in the batch can be 
committed/aborted independently.  The implementation relies on 
FSDataOutputStream.hflush() (called from OrcRecordUpdater), which is available
on HDFS but is often implemented as no-op in Blob store backed FileSystem 
objects.

Need to add a check to HiveStreamingConnection() constructor to raise an error 
if builder.transactionBatchSize > 1 and the target table/partitions are backed 
by something that doesn't support hflush().
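
One way to picture the proposed check, as a sketch only. It assumes, hypothetically, that stores without a working hflush() are recognized by the URI scheme of the table location; the class, method, and scheme list below are illustrative and are not taken from the patch.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class TransactionBatchCheckSketch {
  // Assumption: illustrative scheme list only; a real check would read it from configuration.
  static final Set<String> BLOBSTORE_SCHEMES =
      new HashSet<>(Arrays.asList("s3", "s3a", "s3n", "wasb", "abfs"));

  /** Rejects batch sizes above 1 when the location looks like a blob store. */
  static void validate(int transactionBatchSize, URI tableLocation) {
    String scheme = tableLocation.getScheme();
    boolean blobStore =
        scheme != null && BLOBSTORE_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    if (transactionBatchSize > 1 && blobStore) {
      throw new IllegalArgumentException(
          "transactionBatchSize must be 1 for " + scheme + " locations, got "
              + transactionBatchSize);
    }
  }

  public static void main(String[] args) {
    validate(10, URI.create("hdfs://nn:8020/warehouse/t")); // accepted
    validate(1, URI.create("s3a://bucket/warehouse/t"));    // accepted
    try {
      validate(10, URI.create("s3a://bucket/warehouse/t"));
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```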


Diffs
-

  streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java 
f4e71f915b 
  streaming/src/test/org/apache/hive/streaming/TestStreaming.java 055672f910 


Diff: https://reviews.apache.org/r/71760/diff/1/


Testing
---

Based on a previous precommit test, only affected tests are: 
TestCompactor,TestStreaming


Thanks,

Dávid Lavati



[jira] [Created] (HIVE-22490) Adding jars with special characters in their path throws error

2019-11-13 Thread Ádám Szita (Jira)
Ádám Szita created HIVE-22490:
-

 Summary: Adding jars with special characters in their path throws 
error
 Key: HIVE-22490
 URL: https://issues.apache.org/jira/browse/HIVE-22490
 Project: Hive
  Issue Type: Bug
Reporter: Ádám Szita
Assignee: Ádám Szita


HIVE-9664 introduced a change that uses URIs in SessionState to handle adding 
jars or other dependencies in a Hive session, but forgot to add URL encoding.

This resulted in a regression: a path such as /tmp/blabla-[special].jar worked
before HIVE-9664, and now it throws an error.
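
The failure mode can be reproduced with java.net.URI alone. The sketch below is not the HIVE-22490 fix; the class and method names are hypothetical. It only shows that the single-argument URI constructor rejects the example path, while the multi-argument constructor percent-encodes the illegal characters.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class JarPathUriSketch {

  /** Returns true if the raw string is accepted by the single-argument URI constructor. */
  static boolean parsesAsRawUri(String path) {
    try {
      new URI(path);
      return true;
    } catch (URISyntaxException e) {
      return false; // '[' and ']' are illegal in a URI path
    }
  }

  /** Builds a URI from a plain path; this constructor quotes illegal characters. */
  static URI encodePath(String path) {
    try {
      return new URI(null, null, path, null);
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException("Cannot encode path: " + path, e);
    }
  }

  public static void main(String[] args) {
    String path = "/tmp/blabla-[special].jar";
    System.out.println(parsesAsRawUri(path));   // rejected as a raw URI string
    URI encoded = encodePath(path);
    System.out.println(encoded.getRawPath());   // percent-encoded form
    System.out.println(encoded.getPath());      // decoded back to the original path
  }
}
```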





[jira] [Created] (HIVE-22489) Reduce Sink orders nulls first

2019-11-13 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-22489:
-

 Summary: Reduce Sink orders nulls first
 Key: HIVE-22489
 URL: https://issues.apache.org/jira/browse/HIVE-22489
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


When the property hive.default.nulls.last is set to true and no null order is
explicitly specified in the ORDER BY clause of the query, null ordering should
be NULLS LAST.
But some of the Reduce Sink operators still order nulls first.
{code}
SET hive.default.nulls.last=true;

EXPLAIN EXTENDED
SELECT src1.key, src2.value FROM src src1 JOIN src src2 ON (src1.key = 
src2.key) ORDER BY src1.key LIMIT 5;
{code}

{code}
PREHOOK: query: EXPLAIN EXTENDED
SELECT src1.key, src2.value FROM src src1 JOIN src src2 ON (src1.key = 
src2.key) ORDER BY src1.key
PREHOOK: type: QUERY
PREHOOK: Input: default@src
 A masked pattern was here 
POSTHOOK: query: EXPLAIN EXTENDED
SELECT src1.key, src2.value FROM src src1 JOIN src src2 ON (src1.key = 
src2.key) ORDER BY src1.key
POSTHOOK: type: QUERY
POSTHOOK: Input: default@src
 A masked pattern was here 
OPTIMIZED SQL: SELECT `t0`.`key`, `t2`.`value`
FROM (SELECT `key`
FROM `default`.`src`
WHERE `key` IS NOT NULL) AS `t0`
INNER JOIN (SELECT `key`, `value`
FROM `default`.`src`
WHERE `key` IS NOT NULL) AS `t2` ON `t0`.`key` = `t2`.`key`
ORDER BY `t0`.`key`
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
Tez
 A masked pattern was here 
  Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
 A masked pattern was here 
  Vertices:
Map 1 
Map Operator Tree:
TableScan
  alias: src1
  filterExpr: key is not null (type: boolean)
  Statistics: Num rows: 500 Data size: 43500 Basic stats: 
COMPLETE Column stats: COMPLETE
  GatherStats: false
  Filter Operator
isSamplingPred: false
predicate: key is not null (type: boolean)
Statistics: Num rows: 500 Data size: 43500 Basic stats: 
COMPLETE Column stats: COMPLETE
Select Operator
  expressions: key (type: string)
  outputColumnNames: _col0
  Statistics: Num rows: 500 Data size: 43500 Basic stats: 
COMPLETE Column stats: COMPLETE
  Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: a
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 500 Data size: 43500 Basic stats: 
COMPLETE Column stats: COMPLETE
tag: 0
auto parallelism: true
Execution mode: vectorized, llap
LLAP IO: no inputs
Path -> Alias:
 A masked pattern was here 
Path -> Partition:
 A masked pattern was here 
Partition
  base file name: src
  input format: org.apache.hadoop.mapred.TextInputFormat
  output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  properties:
COLUMN_STATS_ACCURATE 
{"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
bucket_count -1
bucketing_version 2
column.name.delimiter ,
columns key,value
columns.comments 'default','default'
columns.types string:string
 A masked pattern was here 
name default.src
numFiles 1
numRows 500
rawDataSize 5312
serialization.ddl struct src { string key, string value}
serialization.format 1
serialization.lib 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
totalSize 5812
 A masked pattern was here 
  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

input format: org.apache.hadoop.mapred.TextInputFormat
output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
properties:
  COLUMN_STATS_ACCURATE 
{"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
  bucket_count -1
  bucketing_version 2
  column.name.delimiter ,
  columns key,value
  columns.com

[jira] [Created] (HIVE-22488) Break up DDLSemanticAnalyzer - extract Table creation analyzers

2019-11-13 Thread Miklos Gergely (Jira)
Miklos Gergely created HIVE-22488:
-

 Summary: Break up DDLSemanticAnalyzer - extract Table creation 
analyzers
 Key: HIVE-22488
 URL: https://issues.apache.org/jira/browse/HIVE-22488
 Project: Hive
  Issue Type: Sub-task
Reporter: Miklos Gergely
Assignee: Miklos Gergely


DDLSemanticAnalyzer is a huge class, more than 4000 lines long. The goal is to
refactor it in order to have everything cut into more manageable classes under
the package org.apache.hadoop.hive.ql.exec.ddl:
 * have a separate class for each analyzer
 * have a package for each operation, containing an analyzer, a description,
and an operation, so the number of classes under a package is more manageable

Step #8: extract all the rest of the analyzers from DDLSemanticAnalyzer, which
cannot be classified otherwise, and move them under the new package.





[jira] [Created] (HIVE-22487) Windowing functions (first_value and last_value) doesnot ignore null values until non-null value is found

2019-11-13 Thread Pulkit Sharma (Jira)
Pulkit Sharma created HIVE-22487:


 Summary: Windowing functions (first_value and last_value) doesnot 
ignore null values until non-null value is found
 Key: HIVE-22487
 URL: https://issues.apache.org/jira/browse/HIVE-22487
 Project: Hive
  Issue Type: Bug
Reporter: Pulkit Sharma


Windowing functions (first_value and last_value) do not ignore null values
until they encounter the first non-null value: first_value/last_value is shown
as null for every row until a non-null value for the field is found.


How to reproduce :
{code:java}
create table test_first_value(state string, seats int, name string);
insert into test_first_value values('CA', 16, null);
insert into test_first_value values('CA', 17, 'CA17');
insert into test_first_value values('CA', 18, 'CA18');
insert into test_first_value values('CA', 19, null);
insert into test_first_value values('CA', 20, null);
insert into test_first_value values('CA', 21, null);
select state, seats, name, first_value(name, true) over (PARTITION by state 
order by seats desc ) from test_first_value;
Results : 
CA 21 NULL NULL
CA 20 NULL NULL
CA 19 NULL NULL
CA 18 CA18 CA18
CA 17 CA17 CA18
CA 16 NULL CA18
{code}
In this case, col4 is first_value(name) with _ignore nulls_ set to true, but we
still got NULL in the first three rows.

 


