[jira] [Created] (HIVE-20164) Murmur Hash : Make sure CTAS and IAS use correct bucketing version

2018-07-12 Thread Deepak Jaiswal (JIRA)
Deepak Jaiswal created HIVE-20164:
-

 Summary: Murmur Hash : Make sure CTAS and IAS use correct 
bucketing version
 Key: HIVE-20164
 URL: https://issues.apache.org/jira/browse/HIVE-20164
 Project: Hive
  Issue Type: Bug
Reporter: Deepak Jaiswal
Assignee: Deepak Jaiswal


With the migration to Murmur hash, CTAS and IAS from an old table version to a new 
table version do not work as intended: data is hashed using the old hash logic.
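To illustrate the failure mode, here is a standalone sketch (not Hive's actual code; class and method names are made up) showing how the same key lands in different buckets under the two hash versions, which is why a CTAS/IAS into a new-version table must not keep hashing with the old logic:

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: "version 1" is approximated by String.hashCode and
// "version 2" by a reference Murmur3 x86 32-bit implementation (seed 0).
public class BucketHashSketch {

    // Murmur3 x86 32-bit over a byte array, seed 0 (standard reference algorithm).
    static int murmur3_32(byte[] data) {
        final int c1 = 0xcc9e2d51, c2 = 0x1b873593;
        int h = 0, len = data.length, i = 0;
        // Body: process 4-byte little-endian blocks.
        for (; i + 4 <= len; i += 4) {
            int k = (data[i] & 0xff) | ((data[i + 1] & 0xff) << 8)
                  | ((data[i + 2] & 0xff) << 16) | ((data[i + 3] & 0xff) << 24);
            k *= c1; k = Integer.rotateLeft(k, 15); k *= c2;
            h ^= k; h = Integer.rotateLeft(h, 13); h = h * 5 + 0xe6546b64;
        }
        // Tail: remaining 1-3 bytes (intentional switch fall-through).
        int k = 0;
        switch (len & 3) {
            case 3: k ^= (data[i + 2] & 0xff) << 16;
            case 2: k ^= (data[i + 1] & 0xff) << 8;
            case 1: k ^= (data[i] & 0xff);
                    k *= c1; k = Integer.rotateLeft(k, 15); k *= c2; h ^= k;
        }
        // Finalization mix.
        h ^= len;
        h ^= h >>> 16; h *= 0x85ebca6b;
        h ^= h >>> 13; h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Bucket id from a hash code: (hash & MAX_VALUE) % numBuckets.
    static int bucket(int hash, int numBuckets) {
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        byte[] key = "key-42".getBytes(StandardCharsets.UTF_8);
        int v1 = bucket(new String(key, StandardCharsets.UTF_8).hashCode(), 16);
        int v2 = bucket(murmur3_32(key), 16);
        // The two versions generally place the same key in different buckets,
        // so rows written with the wrong version end up mis-bucketed.
        System.out.println(v1 + " vs " + v2);
    }
}
```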



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] Should we release storage-api 2.7.0 rc1?

2018-07-12 Thread Deepak Jaiswal
Hi,

I have prepared the rc1 off of branch-3.1.
Artifacts:
Tag : https://github.com/apache/hive/releases/tag/storage-release-2.7.0-rc1
Tar Ball : http://home.apache.org/~djaiswal/hive-storage-2.7.0/

Regards,
Deepak

On 7/10/18, 10:16 AM, "Deepak Jaiswal"  wrote:

Thanks Owen for finding this out. I will work on the next RC once this 
blocker is resolved.

Regards,
Deepak

On 7/10/18, 9:40 AM, "Owen O'Malley"  wrote:

Ok, Jesus and I tracked it down and I've filed
https://issues.apache.org/jira/browse/HIVE-20135 that is a blocker on
storage-api 2.7.0.

The impact was that orc 1.5 and master failed with the RC. orc 1.4 and
older were fine.

.. Owen

On Tue, Jul 10, 2018 at 8:17 AM, Owen O'Malley 
wrote:

> I wanted to give an update on this. For now, I'm -1 because the ORC
> (branch-1.5) tests fail with this RC. I'll dig into what is wrong, 
but it
> looks like something in the timezone changes broke backwards 
compatibility.
>
> .. Owen
>
> On Mon, Jul 9, 2018 at 11:12 AM, Deepak Jaiswal 

> wrote:
>
>> Thanks Alan.
>>
>> On 7/9/18, 10:17 AM, "Alan Gates"  wrote:
>>
>> +1.  Did a build with a clean maven repo, checked the signature 
and
>> sha
>> hash, ran RAT.
>>
>> Alan.
>>
>> On Fri, Jul 6, 2018 at 2:21 PM Deepak Jaiswal <
>> djais...@hortonworks.com>
>> wrote:
>>
>> > Hi,
>> >
>> > I would like to make a new release of the storage-api. It 
contains
>> changes
>> > required for Hive 3.1 release.
>> >
>> > Artifacts:
>> > Tag :
>> > https://github.com/apache/hive/releases/tag/storage-release-
>> 2.7.0-rc0
>> > Tar Ball : http://home.apache.org/~djaiswal/hive-storage-2.7.0/
>> >
>> > Regards,
>> > Deepak
>> >
>>
>>
>>
>






[jira] [Created] (HIVE-20163) Simplify StringSubstrColStart Initialization

2018-07-12 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created HIVE-20163:
--

 Summary: Simplify StringSubstrColStart Initialization
 Key: HIVE-20163
 URL: https://issues.apache.org/jira/browse/HIVE-20163
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 3.0.0, 4.0.0
Reporter: BELUGA BEHR
 Attachments: HIVE-20163.1.patch

* Remove code
* Remove exception handling
* Remove {{printStackTrace}} call



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 67887: HIVE-20090

2018-07-12 Thread Jesús Camacho Rodríguez

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67887/
---

(Updated July 12, 2018, 11:14 p.m.)


Review request for hive, Ashutosh Chauhan, Deepak Jaiswal, and Gopal V.


Bugs: HIVE-20090
https://issues.apache.org/jira/browse/HIVE-20090


Repository: hive-git


Description
---

HIVE-20090


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
6ea68c35000a5dadb7a01db47bbd8183bff966da 
  itests/src/test/resources/testconfiguration.properties 
4001b9f452f9dbeaff31c2e766334259605a51af 
  ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 
119aa925c1a71502e649b4f2d193a7ff974263c1 
  ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 
dec2d1ef38b748a5c9b40d06af491dd168d70b72 
  ql/src/test/queries/clientpositive/dynamic_semijoin_reduction_sw2.q 
PRE-CREATION 
  ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction_sw2.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/llap/explainuser_1.q.out 
f87fe36e11a7c7e535678dbfaaced04f33bbb501 
  ql/src/test/results/clientpositive/llap/tez_fixed_bucket_pruning.q.out 
6987a96809e3c3300e1b76ea5df3069b3c1d162f 
  ql/src/test/results/clientpositive/perf/tez/query1.q.out 
579940c66e25ebf5e7d0635aaedd0c0cc994f4e0 
  ql/src/test/results/clientpositive/perf/tez/query16.q.out 
0b64c55b0f4ba036aeba4c49f478e9ee1409087c 
  ql/src/test/results/clientpositive/perf/tez/query17.q.out 
2e5e254b2ddc3507f962cbc7691db51f1abafbca 
  ql/src/test/results/clientpositive/perf/tez/query18.q.out 
e8585275b4e51a55ce778dd154033fcdf859e617 
  ql/src/test/results/clientpositive/perf/tez/query2.q.out 
d24899ccf371ad42ef88cebc26cc671c097686da 
  ql/src/test/results/clientpositive/perf/tez/query23.q.out 
6725bec30106bc3321c2869dfc304d0a4da82cf8 
  ql/src/test/results/clientpositive/perf/tez/query24.q.out 
9fcec42c3ab29b898c9c947544a2e29dd08e95e8 
  ql/src/test/results/clientpositive/perf/tez/query25.q.out 
a885cf344b7e29dcf1b2d93d1914e7f9a8d4b921 
  ql/src/test/results/clientpositive/perf/tez/query29.q.out 
46ff49d41a01591f075b2c48ae5a692640fd6eec 
  ql/src/test/results/clientpositive/perf/tez/query31.q.out 
c4d717d8680f6ac6f8f8b6ed01742384a84ddcf9 
  ql/src/test/results/clientpositive/perf/tez/query32.q.out 
6be6f7aa6e6fc50bcedebe3f4d1b5fc00b52ee86 
  ql/src/test/results/clientpositive/perf/tez/query39.q.out 
5966e243ea79b4b884950f34a5b7336e40f92889 
  ql/src/test/results/clientpositive/perf/tez/query40.q.out 
2f116f12ebcba44b876508d0d0f0d827e3a8b28d 
  ql/src/test/results/clientpositive/perf/tez/query54.q.out 
8ab239ce260fb37d988d956fcb9e4eb98a3aeb88 
  ql/src/test/results/clientpositive/perf/tez/query59.q.out 
6b2dcc38737cfc9b955cca1d5b1ac99a7901370b 
  ql/src/test/results/clientpositive/perf/tez/query64.q.out 
a673b9f753a641e111e30a7a4427206d5f2c3da3 
  ql/src/test/results/clientpositive/perf/tez/query69.q.out 
a9c7ac3b21b3e0588e7df7e8c2129fc641d090f1 
  ql/src/test/results/clientpositive/perf/tez/query72.q.out 
48682e340db2916800e9bc5ad61c08c0fb4a8a8b 
  ql/src/test/results/clientpositive/perf/tez/query77.q.out 
163805b2a3dba3e4169d487bd44e7906f66e5868 
  ql/src/test/results/clientpositive/perf/tez/query78.q.out 
90b6f17e1d10ca1e3af17bc53b6df50ffa310af4 
  ql/src/test/results/clientpositive/perf/tez/query80.q.out 
816b525c301fe74460e5657d0b230287d0a6729f 
  ql/src/test/results/clientpositive/perf/tez/query91.q.out 
5e0f00a3e7321c4233f927703701051cab641fb0 
  ql/src/test/results/clientpositive/perf/tez/query92.q.out 
061fcf729d6fa7fde52de3ccd46a800379a92211 
  ql/src/test/results/clientpositive/perf/tez/query94.q.out 
5d19a1634b4657e9ef9595891401e8831d9b0bd4 
  ql/src/test/results/clientpositive/perf/tez/query95.q.out 
400cc1958116b2347a06b52a1460320fd0e0be43 
  
ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_3.q.out
 eafc1c4a005fa2b3bc169aa4453376f5da6841bc 


Diff: https://reviews.apache.org/r/67887/diff/3/

Changes: https://reviews.apache.org/r/67887/diff/2-3/


Testing
---


Thanks,

Jesús Camacho Rodríguez



Re: Review Request 67887: HIVE-20090

2018-07-12 Thread Jesús Camacho Rodríguez


> On July 12, 2018, 5:18 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java
> > Lines 296 (patched)
> > 
> >
> > Can we always look past Gby? Shall we restrict it to map side GBy? What 
> > about rollup and grouping sets which are also represented by Gby op?

I think so. The idea is that the GBy will not generate any new values, except 
NULL for the grouping sets variant. Those NULL values will not join upstream 
in the plan, hence whatever was coming from below the GBy are the only 
possible values that will be joined upstream. Does it make sense?


> On July 12, 2018, 5:18 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java
> > Lines 312 (patched)
> > 
> >
> > Leave a TODO since other simple expressions like CAST can also be 
> > supported

I have added the TODO, though I think that if we pass through CBO we will never 
have expressions here that are more complex than a column (except maybe a 
constant), because the Calcite-generated plan computes them below in a SelectOp.


> On July 12, 2018, 5:18 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/perf/tez/query32.q.out
> > Lines 62-64 (original), 62 (patched)
> > 
> >
> > Why are these vertices gone? Extra packing of operators in vertices by 
> > SharedWork opt?

We create a new semijoin filter : _(cs_item_sk BETWEEN 
DynamicValue(RS_24_item_i_item_sk_min) AND 
DynamicValue(RS_24_item_i_item_sk_max) and in_bloom_filter(cs_item_sk, 
DynamicValue(RS_24_item_i_item_sk_bloom_filter)))_ in FIL-121 in the new plan.
This was not present before on one of the branches, and it leads to 
reutilization of the complete subtree up to MERGEJOIN-101.


> On July 12, 2018, 5:18 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/perf/tez/query64.q.out
> > Lines 243 (patched)
> > 
> >
> > Does this mean the SharedWork opt didn't pack this Map with the other map?

This query is so difficult... There is certainly a new semijoin edge coming 
from item to catalog_sales (in particular, this is the reason why we introduced 
the optimization). In turn, this is preventing some reutilization 
opportunities, probably because we are creating additional cycles due to this 
edge.


> On July 12, 2018, 5:18 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/perf/tez/query92.q.out
> > Line 74 (original), 71 (patched)
> > 
> >
> > I couldn't understand these plan changes. Can you explain what's going
> > on? I see that there is an extra semijoin edge going into Map 1 from R7,
> > which is expected because of the patch. However, why are 3 vertices now
> > disappearing? Is it because of the shared work opt?

I think it is a combination of multiple things. SharedWorkOptimizer finds new 
reutilization opportunities: web_sales joined with date_dim is reused.


- Jesús


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67887/#review206015
---


On July 12, 2018, 3:55 p.m., Jesús Camacho Rodríguez wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67887/
> ---
> 
> (Updated July 12, 2018, 3:55 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Deepak Jaiswal, and Gopal V.
> 
> 
> Bugs: HIVE-20090
> https://issues.apache.org/jira/browse/HIVE-20090
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-20090
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
> 6ea68c35000a5dadb7a01db47bbd8183bff966da 
>   itests/src/test/resources/testconfiguration.properties 
> 4001b9f452f9dbeaff31c2e766334259605a51af 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 
> 119aa925c1a71502e649b4f2d193a7ff974263c1 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 
> dec2d1ef38b748a5c9b40d06af491dd168d70b72 
>   ql/src/test/queries/clientpositive/dynamic_semijoin_reduction_sw2.q 
> PRE-CREATION 
>   
> ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction_sw2.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/llap/explainuser_1.q.out 
> f87fe36e11a7c7e535678dbfaaced04f33bbb501 
>   ql/src/test/results/clientpositive/llap/tez_fixed_bucket_pruning.q.out 
> 6987a96809e3c3300e1b76ea5df3069b3c1d162f 
>  

[jira] [Created] (HIVE-20162) Do Not Print StackTraces to STDERR in AbstractJoinTaskDispatcher

2018-07-12 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created HIVE-20162:
--

 Summary: Do Not Print StackTraces to STDERR in 
AbstractJoinTaskDispatcher
 Key: HIVE-20162
 URL: https://issues.apache.org/jira/browse/HIVE-20162
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
Affects Versions: 3.0.0, 4.0.0
Reporter: BELUGA BEHR


https://github.com/apache/hive/blob/6d890faf22fd1ede3658a5eed097476eab3c67e9/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/AbstractJoinTaskDispatcher.java

{code}
} catch (Exception e) {
  e.printStackTrace();
  throw new SemanticException("Generate Map Join Task Error: " + 
e.getMessage());
}
{code}

Remove the call to {{printStackTrace}} and just throw the error.  If the stack 
trace really is needed (doubtful), then pass it to the {{SemanticException}} 
constructor.
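A minimal sketch of the suggested pattern (the SemanticException here is a local stand-in, not Hive's class): drop printStackTrace() and hand the cause to the exception, so the caller can log the stack trace exactly once.

```java
public class WrapCauseSketch {

    // Stand-in for org.apache.hadoop.hive.ql.parse.SemanticException.
    static class SemanticException extends Exception {
        SemanticException(String message, Throwable cause) { super(message, cause); }
    }

    static void generateMapJoinTask() throws SemanticException {
        try {
            throw new IllegalStateException("boom"); // stand-in for the real failure
        } catch (Exception e) {
            // No e.printStackTrace(): the stack trace travels with the cause.
            throw new SemanticException("Generate Map Join Task Error: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        try {
            generateMapJoinTask();
        } catch (SemanticException e) {
            // The caller decides how to report it; the original cause is preserved.
            System.out.println(e.getMessage() + " (cause: "
                    + e.getCause().getClass().getSimpleName() + ")");
        }
    }
}
```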



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20161) Do Not Print StackTraces to STDERR in ParseDriver

2018-07-12 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created HIVE-20161:
--

 Summary: Do Not Print StackTraces to STDERR in ParseDriver
 Key: HIVE-20161
 URL: https://issues.apache.org/jira/browse/HIVE-20161
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
Affects Versions: 3.0.0, 4.0.0
Reporter: BELUGA BEHR


https://github.com/apache/hive/blob/6d890faf22fd1ede3658a5eed097476eab3c67e9/ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java

{code}
// Do not print stack trace to STDERR - remove this, just throw the 
HiveException
} catch (Exception e) {
  e.printStackTrace();
  throw new HiveException(e);
}
...
// Do not log and throw.  log *or* throw.  In this case, just throw. Remove 
logging.
// Remove explicit 'return' call. No need for it.
  try {
skewJoinKeyContext.endGroup();
  } catch (IOException e) {
LOG.error(e.getMessage(), e);
throw new HiveException(e);
  }
  return;
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20160) Do Not Print StackTraces to STDERR in OperatorFactory

2018-07-12 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created HIVE-20160:
--

 Summary: Do Not Print StackTraces to STDERR in OperatorFactory
 Key: HIVE-20160
 URL: https://issues.apache.org/jira/browse/HIVE-20160
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 3.0.0, 4.0.0
Reporter: BELUGA BEHR


https://github.com/apache/hive/blob/ac6b2a3fb195916e22b2e5f465add2ffbcdc7430/ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java#L158

{code}
} catch (Exception e) {
  e.printStackTrace();
  throw new HiveException(...
{code}

Do not print the stack trace.  The error is being wrapped in a HiveException.  
Allow the code catching this exception to print the error to a logger instead 
of dumping it here to STDERR.  There are several instances of this in the class.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20159) Do Not Print StackTraces to STDERR in ConditionalResolverSkewJoin

2018-07-12 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created HIVE-20159:
--

 Summary: Do Not Print StackTraces to STDERR in 
ConditionalResolverSkewJoin
 Key: HIVE-20159
 URL: https://issues.apache.org/jira/browse/HIVE-20159
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
Affects Versions: 3.0.0, 4.0.0
Reporter: BELUGA BEHR


https://github.com/apache/hive/blob/6d890faf22fd1ede3658a5eed097476eab3c67e9/ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverSkewJoin.java#L121

{code}
} catch (IOException e) {
  e.printStackTrace();
}
{code}

Introduce an SLF4J logger to this class and print a WARN level log message if 
the {{IOException}} from {{Utilities.listStatusIfExists}} is generated.  I 
suggest WARN because the entire operation doesn't fail if this error happens.  
It continues on its way with the data that it was able to collect.  I'm not 
sure if this is the intended behavior, but for now, an error message in the 
logging would be better.
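A sketch of the proposed change (using java.util.logging as a stand-in for SLF4J, and a boolean flag in place of the real Utilities.listStatusIfExists call): the swallow-and-continue behavior stays, but the failure is recorded at WARN instead of going to STDERR.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class WarnOnIoErrorSketch {
    private static final Logger LOG = Logger.getLogger(WarnOnIoErrorSketch.class.getName());

    static int listWithFallback(boolean failListing) {
        int filesSeen = 0;
        try {
            if (failListing) {
                throw new IOException("listStatusIfExists failed"); // stand-in for the real call
            }
            filesSeen = 3; // pretend we listed three files
        } catch (IOException e) {
            // WARN, not ERROR: the operation continues with whatever data it collected.
            LOG.log(Level.WARNING, "Could not list directory, continuing with partial data", e);
        }
        return filesSeen;
    }

    public static void main(String[] args) {
        System.out.println(listWithFallback(true));   // logs a warning, returns 0
        System.out.println(listWithFallback(false));  // returns 3
    }
}
```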



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20158) Do Not Print StackTraces to STDERR in Base64TextOutputFormat

2018-07-12 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created HIVE-20158:
--

 Summary: Do Not Print StackTraces to STDERR in 
Base64TextOutputFormat
 Key: HIVE-20158
 URL: https://issues.apache.org/jira/browse/HIVE-20158
 Project: Hive
  Issue Type: Improvement
  Components: Contrib
Affects Versions: 3.0.0, 4.0.0
Reporter: BELUGA BEHR


https://github.com/apache/hive/blob/6d890faf22fd1ede3658a5eed097476eab3c67e9/contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/base64/Base64TextOutputFormat.java

{code}
  try {
String signatureString = job.get("base64.text.output.format.signature");
if (signatureString != null) {
  signature = signatureString.getBytes("UTF-8");
} else {
  signature = new byte[0];
}
  } catch (UnsupportedEncodingException e) {
e.printStackTrace();
  }
{code}

The {{UnsupportedEncodingException}} is coming from the {{getBytes}} method 
call.  Instead, use the {{Charset}} overload of the method, which does not 
declare this checked exception, so the 'try' block can simply be removed.  
Every JVM is required to support UTF-8.

https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#getBytes(java.nio.charset.Charset)
https://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html#UTF_8
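The rewrite might look like this (a sketch only; the JobConf lookup is replaced by a plain parameter):

```java
import java.nio.charset.StandardCharsets;

// String.getBytes(Charset) cannot throw UnsupportedEncodingException,
// so the whole try/catch from the original snippet disappears.
public class SignatureSketch {

    static byte[] signature(String signatureString) {
        return signatureString != null
                ? signatureString.getBytes(StandardCharsets.UTF_8)
                : new byte[0];
    }

    public static void main(String[] args) {
        System.out.println(signature("abc").length);  // 3
        System.out.println(signature(null).length);   // 0
    }
}
```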



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 65174: [HIVE-17896] TopNKey: Create a standalone vectorizable TopNKey operator

2018-07-12 Thread Jesús Camacho Rodríguez

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65174/#review206027
---




ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyProcessor.java
Lines 37 (patched)


I was talking to Gopal. We were thinking that for the first implementation we 
could limit the processor to introducing the new TopNKeyOp only below RS-GBy (HASH 
mode). So basically: 1) match an RS with a GBy below it, 2) check whether the RS 
contains TopN, 3) check whether the GBy is in hash mode, and 4) check whether the 
RS keys are the same as the GBy keys. Then, if the condition is met, introduce the 
TopNKey below the GBy.

When we work on pushdown in follow-up, we can generalize this process, 
extend it, etc.
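The TopNKey idea under discussion can be sketched as a pass-through operator that drops rows early, before they reach the group-by (illustrative only, with integer keys and simplified tie handling; not Hive's implementation):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Tracks the N smallest keys seen so far and forwards a row only while its
// key can still be in the top N; everything else is dropped before the GBy.
public class TopNKeySketch {
    private final int n;
    // Max-heap over the N smallest keys admitted so far.
    private final PriorityQueue<Integer> topKeys = new PriorityQueue<>(Collections.reverseOrder());
    final List<Integer> forwarded = new ArrayList<>();

    TopNKeySketch(int n) { this.n = n; }

    void process(int key) {
        if (topKeys.size() < n) {
            topKeys.add(key);
        } else if (key < topKeys.peek()) {
            topKeys.poll();
            topKeys.add(key);
        } else {
            return; // key cannot be in the top N: row is dropped early
        }
        forwarded.add(key); // row flows on to GBy -> RS
    }

    public static void main(String[] args) {
        TopNKeySketch op = new TopNKeySketch(1);
        for (int key : new int[]{5, 3, 9, 2, 7}) op.process(key);
        System.out.println(op.forwarded); // [5, 3, 2]: only keys that were the minimum so far
    }
}
```

The filter is deliberately approximate: it may forward a superset of the final top-N rows, which is safe because the downstream GBy/RS still applies the exact TopN.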


- Jesús Camacho Rodríguez


On July 11, 2018, 12:30 p.m., Teddy Choi wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65174/
> ---
> 
> (Updated July 11, 2018, 12:30 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-17896
> https://issues.apache.org/jira/browse/HIVE-17896
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN Key operator in the physical operator tree allows the following 
> to happen.
> GBY->RS(Top=1)
> can become
> TNK(1)->GBY->RS(Top=1)
> So that, the TopNKey can remove rows before they are buffered into the GBY 
> and consume memory.
> Here's the equivalent implementation in Presto
> https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35
> Adding this as a sub-feature of GroupBy prevents further optimizations if the 
> GBY is on keys "a,b,c" and the TopNKey is on just "a".
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6ea68c3500 
>   itests/src/test/resources/testconfiguration.properties 9e012ce2f8 
>   
> ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
>  a002348013 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapperFactory.java 
> 71ee25d9e0 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 7bb6590d5e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyProcessor.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
> 7afbf04797 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java dfd790853b 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TopNKeyDesc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/VectorTopNKeyDesc.java 
> PRE-CREATION 
>   ql/src/test/queries/clientpositive/topnkey.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/vector_topnkey.q PRE-CREATION 
>   ql/src/test/results/clientpositive/llap/topnkey.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/llap/vector_topnkey.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/tez/topnkey.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/tez/vector_topnkey.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/topnkey.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/vector_topnkey.q.out PRE-CREATION 
>   
> serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java
>  9393fb853f 
> 
> 
> Diff: https://reviews.apache.org/r/65174/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Teddy Choi
> 
>



[jira] [Created] (HIVE-20157) Do Not Print StackTraces to STDERR

2018-07-12 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created HIVE-20157:
--

 Summary: Do Not Print StackTraces to STDERR
 Key: HIVE-20157
 URL: https://issues.apache.org/jira/browse/HIVE-20157
 Project: Hive
  Issue Type: Improvement
  Components: Parser
Affects Versions: 3.0.0, 4.0.0
Reporter: BELUGA BEHR


{{org/apache/hadoop/hive/ql/parse/ParseDriver.java}}

{code}
catch (RecognitionException e) {
  e.printStackTrace();
  throw new ParseException(parser.errors);
}
{code}

Do not use {{e.printStackTrace()}} and print to STDERR.  Either remove or 
replace with a debug-level log statement.  I would vote to simply remove.  
There are several occurrences of this pattern in this class.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20156) Printing Stacktrace to STDERR

2018-07-12 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created HIVE-20156:
--

 Summary: Printing Stacktrace to STDERR
 Key: HIVE-20156
 URL: https://issues.apache.org/jira/browse/HIVE-20156
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 3.0.0, 4.0.0
Reporter: BELUGA BEHR


Class {{org.apache.hadoop.hive.ql.exec.JoinOperator}} has the following code:

{code}
} catch (Exception e) {
  e.printStackTrace();
  throw new HiveException(e);
}
{code}

Do not print the stack trace to STDERR with a call to {{printStackTrace()}}.  
Please remove that line and let the code catching the {{HiveException}} worry 
about printing any messages through a logger.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20155) Semijoin Reduction : Put all the min-max filters before all the bloom filters

2018-07-12 Thread Deepak Jaiswal (JIRA)
Deepak Jaiswal created HIVE-20155:
-

 Summary: Semijoin Reduction : Put all the min-max filters before 
all the bloom filters
 Key: HIVE-20155
 URL: https://issues.apache.org/jira/browse/HIVE-20155
 Project: Hive
  Issue Type: Task
Reporter: Deepak Jaiswal
Assignee: Deepak Jaiswal


If there is more than one semijoin reduction filter, apply all of the min-max 
filters before any of the bloom filters are applied, as bloom filter lookups are 
expensive.
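A toy illustration of the ordering argument (not Hive's planner code; the bloom filter here is a deliberately naive single-bit-per-key stand-in): the range check is two comparisons while the bloom probe hashes and tests bits, so short-circuiting on min/max first spares most rows the expensive lookup.

```java
import java.util.BitSet;

public class SemijoinFilterOrderSketch {

    // Naive one-hash bloom filter, just to give the probe a body.
    static final class ToyBloom {
        private final BitSet bits = new BitSet(1024);
        void add(long v)             { bits.set(index(v)); }
        boolean mightContain(long v) { return bits.get(index(v)); }
        private int index(long v)    { return (int) (Long.hashCode(v * 0x9e3779b97f4a7c15L) & 1023); }
    }

    static boolean passes(long key, long min, long max, ToyBloom bloom) {
        // Min/max first: rows outside the range never pay for the bloom lookup,
        // thanks to && short-circuit evaluation.
        return key >= min && key <= max && bloom.mightContain(key);
    }

    public static void main(String[] args) {
        ToyBloom bloom = new ToyBloom();
        bloom.add(42);
        System.out.println(passes(42, 10, 100, bloom));  // true
        System.out.println(passes(500, 10, 100, bloom)); // false; bloom never probed
    }
}
```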

 

cc [~gopalv] [~jdere]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20154) Improve unix_timestamp(args) to handle automatic-DST switching timezones

2018-07-12 Thread Vincent Tran (JIRA)
Vincent Tran created HIVE-20154:
---

 Summary: Improve unix_timestamp(args) to handle automatic-DST 
switching timezones
 Key: HIVE-20154
 URL: https://issues.apache.org/jira/browse/HIVE-20154
 Project: Hive
  Issue Type: Improvement
Reporter: Vincent Tran


Currently unix_timestamp(args) UDF will only handle static timezone specifiers. 
It does not recognize SystemV specifiers such as EST5EDT or PST8PDT.

Based on this experiment, when z is used to parse a TZ string like UTC4PDT 
(obviously not a valid SystemV specifier), it will parse the time as UTC.
When multiple z patterns are used to parse a TZ string like UTC4PDT, it will 
parse the timestamp as the TZ at the final z position. This is demonstrated by 
the final query, where the format string z4z1z is used to parse UTC4PDT1EDT.

{noformat}
0: jdbc:hive2://localhost:1/default> select
from_unixtime(unix_timestamp("2018-02-01 00:00:00 UTC4PDT", "yyyy-MM-dd
HH:mm:ss z"), "yyyy-MM-dd HH:mm:ss zzzz");
+--------------------------------------------+
|                    _c0                     |
+--------------------------------------------+
| 2018-01-31 16:00:00 Pacific Standard Time  |
+--------------------------------------------+
1 row selected (0.041 seconds)
0: jdbc:hive2://localhost:1/default> select
from_unixtime(unix_timestamp("2018-02-01 00:00:00 UTC", "yyyy-MM-dd HH:mm:ss
z"), "yyyy-MM-dd HH:mm:ss zzzz");
+--------------------------------------------+
|                    _c0                     |
+--------------------------------------------+
| 2018-01-31 16:00:00 Pacific Standard Time  |
+--------------------------------------------+
1 row selected (0.041 seconds)
0: jdbc:hive2://localhost:1/default> select
from_unixtime(unix_timestamp("2018-02-01 00:00:00 UTC4PDT", "yyyy-MM-dd
HH:mm:ss z4z"), "yyyy-MM-dd HH:mm:ss zzzz");
+--------------------------------------------+
|                    _c0                     |
+--------------------------------------------+
| 2018-01-31 23:00:00 Pacific Standard Time  |
+--------------------------------------------+
1 row selected (0.047 seconds)
0: jdbc:hive2://localhost:1/default> select
from_unixtime(unix_timestamp("2018-02-01 00:00:00 UTC4PDT1EDT", "yyyy-MM-dd
HH:mm:ss z4z1z"), "yyyy-MM-dd HH:mm:ss zzzz");
+--------------------------------------------+
|                    _c0                     |
+--------------------------------------------+
| 2018-01-31 20:00:00 Pacific Standard Time  |
+--------------------------------------------+
1 row selected (0.055 seconds)
0: jdbc:hive2://localhost:1/default>
{noformat}



So all in all, I don't think the SystemV specifiers EST5EDT or PST8PDT are valid 
inputs to unix_timestamp(args) at all. When parsed with a multi-z format string, 
they will be read as whatever valid timezone appears at the final position 
(effectively EDT and PDT respectively when the valid SystemV TZ specifiers 
above are used).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 67887: HIVE-20090

2018-07-12 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67887/#review206015
---




ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java
Lines 296 (patched)


Can we always look past Gby? Shall we restrict it to map side GBy? What 
about rollup and grouping sets which are also represented by Gby op?



ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java
Lines 312 (patched)


Leave a TODO since other simple expressions like CAST can also be supported



ql/src/test/results/clientpositive/perf/tez/query32.q.out
Lines 62-64 (original), 62 (patched)


Why are these vertices gone? Extra packing of operators in vertices by 
SharedWork opt?



ql/src/test/results/clientpositive/perf/tez/query64.q.out
Lines 243 (patched)


Does this mean the SharedWork opt didn't pack this Map with the other map?



ql/src/test/results/clientpositive/perf/tez/query64.q.out
Lines 294-295 (original), 295-300 (patched)


Do we know reason for these extra vertices?



ql/src/test/results/clientpositive/perf/tez/query92.q.out
Line 74 (original), 71 (patched)


I couldn't understand these plan changes. Can you explain what's going on?
I see that there is an extra semijoin edge going into Map 1 from R7, which 
is expected because of the patch. However, why are 3 vertices now 
disappearing? Is it because of the shared work opt?


- Ashutosh Chauhan


On July 12, 2018, 3:55 p.m., Jesús Camacho Rodríguez wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67887/
> ---
> 
> (Updated July 12, 2018, 3:55 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Deepak Jaiswal, and Gopal V.
> 
> 
> Bugs: HIVE-20090
> https://issues.apache.org/jira/browse/HIVE-20090
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-20090
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
> 6ea68c35000a5dadb7a01db47bbd8183bff966da 
>   itests/src/test/resources/testconfiguration.properties 
> 4001b9f452f9dbeaff31c2e766334259605a51af 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 
> 119aa925c1a71502e649b4f2d193a7ff974263c1 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 
> dec2d1ef38b748a5c9b40d06af491dd168d70b72 
>   ql/src/test/queries/clientpositive/dynamic_semijoin_reduction_sw2.q 
> PRE-CREATION 
>   
> ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction_sw2.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/llap/explainuser_1.q.out 
> f87fe36e11a7c7e535678dbfaaced04f33bbb501 
>   ql/src/test/results/clientpositive/llap/tez_fixed_bucket_pruning.q.out 
> 6987a96809e3c3300e1b76ea5df3069b3c1d162f 
>   ql/src/test/results/clientpositive/perf/tez/query1.q.out 
> 579940c66e25ebf5e7d0635aaedd0c0cc994f4e0 
>   ql/src/test/results/clientpositive/perf/tez/query16.q.out 
> 0b64c55b0f4ba036aeba4c49f478e9ee1409087c 
>   ql/src/test/results/clientpositive/perf/tez/query17.q.out 
> 2e5e254b2ddc3507f962cbc7691db51f1abafbca 
>   ql/src/test/results/clientpositive/perf/tez/query18.q.out 
> e8585275b4e51a55ce778dd154033fcdf859e617 
>   ql/src/test/results/clientpositive/perf/tez/query2.q.out 
> d24899ccf371ad42ef88cebc26cc671c097686da 
>   ql/src/test/results/clientpositive/perf/tez/query23.q.out 
> 6725bec30106bc3321c2869dfc304d0a4da82cf8 
>   ql/src/test/results/clientpositive/perf/tez/query24.q.out 
> 9fcec42c3ab29b898c9c947544a2e29dd08e95e8 
>   ql/src/test/results/clientpositive/perf/tez/query25.q.out 
> a885cf344b7e29dcf1b2d93d1914e7f9a8d4b921 
>   ql/src/test/results/clientpositive/perf/tez/query29.q.out 
> 46ff49d41a01591f075b2c48ae5a692640fd6eec 
>   ql/src/test/results/clientpositive/perf/tez/query31.q.out 
> c4d717d8680f6ac6f8f8b6ed01742384a84ddcf9 
>   ql/src/test/results/clientpositive/perf/tez/query32.q.out 
> 6be6f7aa6e6fc50bcedebe3f4d1b5fc00b52ee86 
>   ql/src/test/results/clientpositive/perf/tez/query39.q.out 
> 5966e243ea79b4b884950f34a5b7336e40f92889 
>   ql/src/test/results/clientpositive/perf/tez/query40.q.out 
> 2f116f12ebcba44b876508d0d0f0d827e3a8b28d 
>   ql/src/test/results/clientpositive/perf/tez/query54.q.out 
> 8ab239ce260fb37d988d956fcb9e4eb98a3aeb88 
>   ql/src/test/results/clientpositive/perf/tez/query59.q.out 
> 6b2dcc38737cfc9b955cca1d5b1ac99a7901370b 
>   ql/src/test/results/clientpositive/perf/tez/query64.q.out 
> a673b9f753a641e111e30a7a4427206d5f2c3da3 

[jira] [Created] (HIVE-20153) Count and Sum UDF consume more memory in Hive 2+

2018-07-12 Thread Szehon Ho (JIRA)
Szehon Ho created HIVE-20153:


 Summary: Count and Sum UDF consume more memory in Hive 2+
 Key: HIVE-20153
 URL: https://issues.apache.org/jira/browse/HIVE-20153
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 2.3.2
Reporter: Szehon Ho


While playing with Hive 2, we noticed that queries with a lot of count() and 
sum() aggregations run out of memory on the Hadoop side much faster than in Hive 1.  
Taking a heap dump, we see that one of the main culprits is the field 'uniqueObjects' 
in GenericUDAFSum and GenericUDAFCount, which was added to support window 
functions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 67887: HIVE-20090

2018-07-12 Thread Jesús Camacho Rodríguez

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67887/
---

(Updated July 12, 2018, 3:55 p.m.)


Review request for hive, Ashutosh Chauhan, Deepak Jaiswal, and Gopal V.


Bugs: HIVE-20090
https://issues.apache.org/jira/browse/HIVE-20090


Repository: hive-git


Description
---

HIVE-20090


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
6ea68c35000a5dadb7a01db47bbd8183bff966da 
  itests/src/test/resources/testconfiguration.properties 
4001b9f452f9dbeaff31c2e766334259605a51af 
  ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 
119aa925c1a71502e649b4f2d193a7ff974263c1 
  ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 
dec2d1ef38b748a5c9b40d06af491dd168d70b72 
  ql/src/test/queries/clientpositive/dynamic_semijoin_reduction_sw2.q 
PRE-CREATION 
  ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction_sw2.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/llap/explainuser_1.q.out 
f87fe36e11a7c7e535678dbfaaced04f33bbb501 
  ql/src/test/results/clientpositive/llap/tez_fixed_bucket_pruning.q.out 
6987a96809e3c3300e1b76ea5df3069b3c1d162f 
  ql/src/test/results/clientpositive/perf/tez/query1.q.out 
579940c66e25ebf5e7d0635aaedd0c0cc994f4e0 
  ql/src/test/results/clientpositive/perf/tez/query16.q.out 
0b64c55b0f4ba036aeba4c49f478e9ee1409087c 
  ql/src/test/results/clientpositive/perf/tez/query17.q.out 
2e5e254b2ddc3507f962cbc7691db51f1abafbca 
  ql/src/test/results/clientpositive/perf/tez/query18.q.out 
e8585275b4e51a55ce778dd154033fcdf859e617 
  ql/src/test/results/clientpositive/perf/tez/query2.q.out 
d24899ccf371ad42ef88cebc26cc671c097686da 
  ql/src/test/results/clientpositive/perf/tez/query23.q.out 
6725bec30106bc3321c2869dfc304d0a4da82cf8 
  ql/src/test/results/clientpositive/perf/tez/query24.q.out 
9fcec42c3ab29b898c9c947544a2e29dd08e95e8 
  ql/src/test/results/clientpositive/perf/tez/query25.q.out 
a885cf344b7e29dcf1b2d93d1914e7f9a8d4b921 
  ql/src/test/results/clientpositive/perf/tez/query29.q.out 
46ff49d41a01591f075b2c48ae5a692640fd6eec 
  ql/src/test/results/clientpositive/perf/tez/query31.q.out 
c4d717d8680f6ac6f8f8b6ed01742384a84ddcf9 
  ql/src/test/results/clientpositive/perf/tez/query32.q.out 
6be6f7aa6e6fc50bcedebe3f4d1b5fc00b52ee86 
  ql/src/test/results/clientpositive/perf/tez/query39.q.out 
5966e243ea79b4b884950f34a5b7336e40f92889 
  ql/src/test/results/clientpositive/perf/tez/query40.q.out 
2f116f12ebcba44b876508d0d0f0d827e3a8b28d 
  ql/src/test/results/clientpositive/perf/tez/query54.q.out 
8ab239ce260fb37d988d956fcb9e4eb98a3aeb88 
  ql/src/test/results/clientpositive/perf/tez/query59.q.out 
6b2dcc38737cfc9b955cca1d5b1ac99a7901370b 
  ql/src/test/results/clientpositive/perf/tez/query64.q.out 
a673b9f753a641e111e30a7a4427206d5f2c3da3 
  ql/src/test/results/clientpositive/perf/tez/query69.q.out 
a9c7ac3b21b3e0588e7df7e8c2129fc641d090f1 
  ql/src/test/results/clientpositive/perf/tez/query72.q.out 
48682e340db2916800e9bc5ad61c08c0fb4a8a8b 
  ql/src/test/results/clientpositive/perf/tez/query77.q.out 
163805b2a3dba3e4169d487bd44e7906f66e5868 
  ql/src/test/results/clientpositive/perf/tez/query78.q.out 
90b6f17e1d10ca1e3af17bc53b6df50ffa310af4 
  ql/src/test/results/clientpositive/perf/tez/query80.q.out 
816b525c301fe74460e5657d0b230287d0a6729f 
  ql/src/test/results/clientpositive/perf/tez/query91.q.out 
5e0f00a3e7321c4233f927703701051cab641fb0 
  ql/src/test/results/clientpositive/perf/tez/query92.q.out 
061fcf729d6fa7fde52de3ccd46a800379a92211 
  ql/src/test/results/clientpositive/perf/tez/query94.q.out 
5d19a1634b4657e9ef9595891401e8831d9b0bd4 
  ql/src/test/results/clientpositive/perf/tez/query95.q.out 
400cc1958116b2347a06b52a1460320fd0e0be43 
  
ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_3.q.out
 eafc1c4a005fa2b3bc169aa4453376f5da6841bc 


Diff: https://reviews.apache.org/r/67887/diff/2/

Changes: https://reviews.apache.org/r/67887/diff/1-2/


Testing
---


Thanks,

Jesús Camacho Rodríguez



Re: Review Request 67887: HIVE-20090

2018-07-12 Thread Jesús Camacho Rodríguez


> On July 12, 2018, 2:01 a.m., Deepak Jaiswal wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
> > Lines 419 (patched)
> > 
> >
> > Each of these functions checks whether semijoin reduction is enabled. 
> > I think it would be a bit more efficient if the check happened at the beginning 
> > of this function and were removed from all the underlying functions.
> > 
> > if 
> > (!procCtx.conf.getBoolVar(ConfVars.TEZ_DYNAMIC_SEMIJOIN_REDUCTION) ||
> > procCtx.parseContext.getRsToSemiJoinBranchInfo().size() == 
> > 0) {
> >   return;
> > }

I rewrote it somewhat; now it is easier to spot what is executed when. However, 
note that some functions are interleaved with other functions that have 
different requirements.


> On July 12, 2018, 2:01 a.m., Deepak Jaiswal wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
> > Lines 1064 (patched)
> > 
> >
> > Is the first condition to handle cycles?

The first condition is so a semijoin does not remove itself.


- Jesús


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67887/#review205976
---


On July 11, 2018, 5:26 p.m., Jesús Camacho Rodríguez wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67887/
> ---
> 
> (Updated July 11, 2018, 5:26 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Deepak Jaiswal, and Gopal V.
> 
> 
> Bugs: HIVE-20090
> https://issues.apache.org/jira/browse/HIVE-20090
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-20090
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
> 6ea68c35000a5dadb7a01db47bbd8183bff966da 
>   itests/src/test/resources/testconfiguration.properties 
> 9e012ce2f8f789bde3f95acc43052bf4446fccbc 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 
> dfd790853b2f73a465989374e78c01d282d16891 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 
> dec2d1ef38b748a5c9b40d06af491dd168d70b72 
>   ql/src/test/queries/clientpositive/dynamic_semijoin_reduction_sw2.q 
> PRE-CREATION 
>   
> ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction_sw2.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/llap/explainuser_1.q.out 
> f87fe36e11a7c7e535678dbfaaced04f33bbb501 
>   ql/src/test/results/clientpositive/llap/tez_fixed_bucket_pruning.q.out 
> 6987a96809e3c3300e1b76ea5df3069b3c1d162f 
>   ql/src/test/results/clientpositive/perf/tez/query1.q.out 
> 579940c66e25ebf5e7d0635aaedd0c0cc994f4e0 
>   ql/src/test/results/clientpositive/perf/tez/query16.q.out 
> 0b64c55b0f4ba036aeba4c49f478e9ee1409087c 
>   ql/src/test/results/clientpositive/perf/tez/query17.q.out 
> 2e5e254b2ddc3507f962cbc7691db51f1abafbca 
>   ql/src/test/results/clientpositive/perf/tez/query18.q.out 
> e8585275b4e51a55ce778dd154033fcdf859e617 
>   ql/src/test/results/clientpositive/perf/tez/query2.q.out 
> d24899ccf371ad42ef88cebc26cc671c097686da 
>   ql/src/test/results/clientpositive/perf/tez/query23.q.out 
> 6725bec30106bc3321c2869dfc304d0a4da82cf8 
>   ql/src/test/results/clientpositive/perf/tez/query24.q.out 
> 9fcec42c3ab29b898c9c947544a2e29dd08e95e8 
>   ql/src/test/results/clientpositive/perf/tez/query25.q.out 
> a885cf344b7e29dcf1b2d93d1914e7f9a8d4b921 
>   ql/src/test/results/clientpositive/perf/tez/query29.q.out 
> 46ff49d41a01591f075b2c48ae5a692640fd6eec 
>   ql/src/test/results/clientpositive/perf/tez/query31.q.out 
> c4d717d8680f6ac6f8f8b6ed01742384a84ddcf9 
>   ql/src/test/results/clientpositive/perf/tez/query32.q.out 
> 6be6f7aa6e6fc50bcedebe3f4d1b5fc00b52ee86 
>   ql/src/test/results/clientpositive/perf/tez/query39.q.out 
> 5966e243ea79b4b884950f34a5b7336e40f92889 
>   ql/src/test/results/clientpositive/perf/tez/query40.q.out 
> 2f116f12ebcba44b876508d0d0f0d827e3a8b28d 
>   ql/src/test/results/clientpositive/perf/tez/query54.q.out 
> 8ab239ce260fb37d988d956fcb9e4eb98a3aeb88 
>   ql/src/test/results/clientpositive/perf/tez/query59.q.out 
> 6b2dcc38737cfc9b955cca1d5b1ac99a7901370b 
>   ql/src/test/results/clientpositive/perf/tez/query64.q.out 
> a673b9f753a641e111e30a7a4427206d5f2c3da3 
>   ql/src/test/results/clientpositive/perf/tez/query69.q.out 
> a9c7ac3b21b3e0588e7df7e8c2129fc641d090f1 
>   ql/src/test/results/clientpositive/perf/tez/query72.q.out 
> 48682e340db2916800e9bc5ad61c08c0fb4a8a8b 
>   ql/src/test/results/clientpositive/perf/tez/query77.q.out 
> 163805b2a3dba3e4169d487bd44e7906f66e5868 
>   ql/src/test/results/clientpositive/perf/tez/query78.q.out 
> 

[jira] [Created] (HIVE-20152) reset db state so rename table can be done if repl dump fails

2018-07-12 Thread anishek (JIRA)
anishek created HIVE-20152:
--

 Summary: reset db state so rename table can be done if repl dump 
fails
 Key: HIVE-20152
 URL: https://issues.apache.org/jira/browse/HIVE-20152
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 3.0.0
Reporter: anishek
Assignee: mahesh kumar behera


If a repl dump command is run and fails for some reason while doing table-level 
dumps, the state set in the db parameters is not reset, and hence no table / 
partition renames can be done. 

The property to be reset is prefixed with the key {code}bootstrap.dump.state{code} 
and should be unset. Meanwhile, the workaround is: 

{code}
describe database extended [db_name]; 
{code}
Assuming the property is 'bootstrap.dump.state.something':
{code}
alter database [db_name] set dbproperties 
('bootstrap.dump.state.something'='idle');
{code}







Review Request 67895: Improve HiveMetaStoreClient.dropDatabase

2018-07-12 Thread Adam Szita via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67895/
---

Review request for hive.


Bugs: HIVE-18705
https://issues.apache.org/jira/browse/HIVE-18705


Repository: hive-git


Description
---

HiveMetaStoreClient.dropDatabase has a strange implementation to ensure it deals 
with client-side hooks (for non-native tables, e.g. HBase). Currently it starts 
by retrieving all the tables from HMS, and then sends dropTable calls to HMS 
table-by-table. At the end, a dropDatabase call is issued just to be sure. 

I believe this could be refactored to speed up dropDB in situations where the 
average table count per DB is very high.
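The cost of the table-by-table strategy can be sketched with an in-memory stand-in for the metastore (method names mimic the client API; the bodies and the `FakeMetaStore` class are illustrative only, not Hive code). Counting calls shows why this gets expensive for databases with many tables: N+2 round trips instead of one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in that counts "RPCs" to the metastore.
class FakeMetaStore {
    final Map<String, List<String>> dbs = new HashMap<>();
    int rpcCount = 0;

    List<String> getAllTables(String db) { rpcCount++; return new ArrayList<>(dbs.get(db)); }
    void dropTable(String db, String table) { rpcCount++; dbs.get(db).remove(table); }
    void dropDatabase(String db) { rpcCount++; dbs.remove(db); }

    // Mirrors the current client behavior described above: list the tables,
    // drop each one individually (so client-side hooks could run), then drop
    // the database itself "just to be sure".
    void dropDatabaseTableByTable(String db) {
        for (String table : getAllTables(db)) {
            dropTable(db, table);
        }
        dropDatabase(db);
    }
}
```

For a database with 100 tables this issues 102 calls; a server-side drop for databases without hook-backed tables would need only one.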


Diffs
-

  hbase-handler/src/test/queries/positive/drop_database_table_hooks.q 
PRE-CREATION 
  hbase-handler/src/test/results/positive/drop_database_table_hooks.q.out 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/TableIterable.java 
d8e771d0ffa7d680b2a22436727f896674cd40ff 
  ql/src/test/org/apache/hadoop/hive/ql/metadata/TestTableIterable.java 
6637d150b84c9fa86e6a3a90449606437e7c9d72 
  
service/src/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java 
838dd89ca82792ca8af8eb0f30aa63e690e41f43 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 8d88749effa89e50d8be8ed216419cd77836fd34 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
 bfd7141a8b987e5288277a46d56de32574d9aa69 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/TableIterable.java
 PRE-CREATION 
  
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/TestTableIterable.java
 PRE-CREATION 


Diff: https://reviews.apache.org/r/67895/diff/1/


Testing
---

Drop database is an existing feature, so existing tests should be fine. But since 
I'm poking around client-side hooks, I've added an HBase drop-database qtest so 
that code path is covered.


Thanks,

Adam Szita



[jira] [Created] (HIVE-20151) External table: exception while storing stats

2018-07-12 Thread Zoltan Haindrich (JIRA)
Zoltan Haindrich created HIVE-20151:
---

 Summary: External table: exception while storing stats
 Key: HIVE-20151
 URL: https://issues.apache.org/jira/browse/HIVE-20151
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Statistics
Reporter: Zoltan Haindrich


 
{code}
java.lang.ClassCastException: 
org.apache.hadoop.hive.metastore.api.LongColumnStatsData cannot be cast to 
org.apache.hadoop.hive.metastore.columnstats.cache.LongColumnStatsDataInspector
 at 
org.apache.hadoop.hive.metastore.columnstats.merge.LongColumnStatsMerger.merge(LongColumnStatsMerger.java:30)
 ~[hive-exec-3.1.0.3.0.0.0-1632.jar:3.1.0.3.0.0.0-1632]
 at 
org.apache.hadoop.hive.metastore.utils.MetaStoreUtils.mergeColStats(MetaStoreUtils.java:1084)
 ~[hive-exec-3.1.0.3.0.0.0-1632.jar:3.1.0.3.0.0.0-1632]
 at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.set_aggr_stats_for(HiveMetaStore.java:7514)
 ~[hive-exec-3.1.0.3.0.0.0-1632.jar:3.1.0.3.0.0.0-1632]
 at sun.reflect.GeneratedMethodAccessor80.invoke(Unknown Source) ~[?:?]
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_161]
 at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_161]
 at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
 ~[hive-exec-3.1.0.3.0.0.0-1632.jar:3.1.0.3.0.0.0-1632]
 at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
 ~[hive-exec-3.1.0.3.0.0.0-1632.jar:3.1.0.3.0.0.0-1632]
 at com.sun.proxy.$Proxy34.set_aggr_stats_for(Unknown Source) ~[?:?]
 at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$set_aggr_stats_for.getResult(ThriftHiveMetastore.java:17017)
 ~[hive-exec-3.1.0.3.0.0.0-1632.jar:3.1.0.3.0.0.0-1632]
 at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$set_aggr_stats_for.getResult(ThriftHiveMetastore.java:17001)
 ~[hive-exec-3.1.0.3.0.0.0-1632.jar:3.1.0.3.0.0.0-1632]
{code}
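The trace shows a plain LongColumnStatsData arriving where the merger expects its LongColumnStatsDataInspector subclass and casting unconditionally. A hedged sketch of the defensive pattern (the two data classes below are minimal stand-ins for the real metastore types, and the conversion logic is illustrative only): check the runtime type and convert by copying fields instead of casting.

```java
// Minimal stand-in for the thrift-generated stats object.
class LongColumnStatsData {
    long numDVs;
}

// Minimal stand-in for the cache-layer subclass the merger expects.
class LongColumnStatsDataInspector extends LongColumnStatsData {
}

class StatsConversion {
    // Returns the argument unchanged when it is already the inspector type;
    // otherwise builds an inspector by copying fields rather than casting.
    static LongColumnStatsDataInspector asInspector(LongColumnStatsData data) {
        if (data instanceof LongColumnStatsDataInspector) {
            return (LongColumnStatsDataInspector) data;
        }
        LongColumnStatsDataInspector inspector = new LongColumnStatsDataInspector();
        inspector.numDVs = data.numDVs;
        return inspector;
    }
}
```

With this shape, a merge path that receives a plain thrift object (as in the external-table stats case above) converts it instead of throwing ClassCastException.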





[jira] [Created] (HIVE-20150) TopNKey pushdown

2018-07-12 Thread Teddy Choi (JIRA)
Teddy Choi created HIVE-20150:
-

 Summary: TopNKey pushdown
 Key: HIVE-20150
 URL: https://issues.apache.org/jira/browse/HIVE-20150
 Project: Hive
  Issue Type: New Feature
Reporter: Teddy Choi
Assignee: Teddy Choi


TopNKey was implemented in HIVE-17896, but the pushdown implementation needs more 
work. This issue covers the TopNKey pushdown implementation with proper tests.


