[jira] [Created] (HIVE-16746) Reduce number of index lookups for same table in IndexWhereTaskDispatcher

2017-05-23 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created HIVE-16746:
---

 Summary: Reduce number of index lookups for same table in 
IndexWhereTaskDispatcher
 Key: HIVE-16746
 URL: https://issues.apache.org/jira/browse/HIVE-16746
 Project: Hive
  Issue Type: Bug
Reporter: Rajesh Balamohan
Priority: Minor


{{IndexWhereTaskDispatcher}} is used when {{hive.optimize.index.filter=true}}. 
It lists all indices for the table and, depending on the query complexity, this
ends up being in the hot path. For example, the explain plan for Q14 takes
180-200 seconds, and repeatedly querying the indexes of the same tables takes
up 30-40 seconds. This function was invoked around 24,000 times for the same
set of tables.
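One way to avoid the repeated lookups is to memoize each table's index list for the duration of a single dispatch pass. This is a minimal sketch under that assumption: `IndexLookupCache`, `indexesFor`, and the loader function are hypothetical names, not Hive APIs, and the loader stands in for the metastore call that lists a table's indexes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-query cache: each table's index list is fetched at most once.
class IndexLookupCache {
    private final Map<String, List<String>> indexesByTable = new HashMap<>();
    private final Function<String, List<String>> loader; // stands in for the metastore lookup
    private int loads = 0;

    IndexLookupCache(Function<String, List<String>> loader) {
        this.loader = loader;
    }

    // Returns the cached list, invoking the loader only on the first request for a table.
    List<String> indexesFor(String tableName) {
        return indexesByTable.computeIfAbsent(tableName, t -> {
            loads++;
            return Collections.unmodifiableList(new ArrayList<>(loader.apply(t)));
        });
    }

    // Number of times the underlying loader was actually called.
    int loadCount() {
        return loads;
    }
}
```

Because such a cache would live only as long as one query compilation, it needs no invalidation; a longer-lived cache would also have to handle indexes being created or dropped.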




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16745) Syntax error in 041-HIVE-16556.mysql.sql script

2017-05-23 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-16745:
--

 Summary: Syntax error in 041-HIVE-16556.mysql.sql script
 Key: HIVE-16745
 URL: https://issues.apache.org/jira/browse/HIVE-16745
 Project: Hive
  Issue Type: Bug
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar
Priority: Minor


041-HIVE-16556.mysql.sql has a syntax error that was introduced by HIVE-16711.





[jira] [Created] (HIVE-16744) LLAP index update is broken after ORC switch

2017-05-23 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-16744:
---

 Summary: LLAP index update is broken after ORC switch
 Key: HIVE-16744
 URL: https://issues.apache.org/jira/browse/HIVE-16744
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin








[GitHub] hive pull request #188: HIVE-15144. Remove dependence on json.org artifacts.

2017-05-23 Thread omalley
GitHub user omalley opened a pull request:

https://github.com/apache/hive/pull/188

HIVE-15144. Remove dependence on json.org artifacts.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/omalley/hive hive-15144

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/188.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #188


commit a3ce058d1516825a8680a6aadf47f57f5dbcad2c
Author: Owen O'Malley 
Date:   2017-05-23T22:07:04Z

HIVE-15144. Remove dependence on json.org artifacts.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (HIVE-16743) BitSet set() is incorrectly used in TxnUtils.createValidCompactTxnList()

2017-05-23 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-16743:


 Summary: BitSet set() is incorrectly used in 
TxnUtils.createValidCompactTxnList()
 Key: HIVE-16743
 URL: https://issues.apache.org/jira/browse/HIVE-16743
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Wei Zheng
Assignee: Wei Zheng


The second line is problematic:
{code}
BitSet bitSet = new BitSet(exceptions.length);
bitSet.set(0, bitSet.length()); // for ValidCompactorTxnList, everything in exceptions is aborted
{code}
For example, suppose exceptions.length is 2. The first line creates a BitSet
with an initial capacity of 2 bits, but capacity is not the logical size of the
BitSet: no bits have been set yet, so bitSet.length() still returns 0.

The intention of the second line is to set all the bits to true. This is not
achieved, because bitSet.set(0, bitSet.length()) is equivalent to
bitSet.set(0, 0), which sets nothing.
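A minimal, self-contained sketch of the difference. In java.util.BitSet, length() is the logical size (index of the highest set bit plus one), while the constructor argument only sizes the backing storage. The class and method names are illustrative, and using exceptions.length as the upper bound is one plausible fix, not necessarily the committed patch.

```java
import java.util.BitSet;

// Contrasts the pattern from the report with a corrected variant.
class AbortedBitSetDemo {
    // Pattern from the report: length() of an empty BitSet is 0, so this is set(0, 0), a no-op.
    static BitSet buggy(long[] exceptions) {
        BitSet bitSet = new BitSet(exceptions.length); // capacity hint only; no bits are set
        bitSet.set(0, bitSet.length());
        return bitSet;
    }

    // Corrected: use the number of exceptions as the exclusive upper bound.
    static BitSet fixed(long[] exceptions) {
        BitSet bitSet = new BitSet(exceptions.length);
        bitSet.set(0, exceptions.length); // marks every exception as aborted
        return bitSet;
    }
}
```

With two exceptions, buggy(...) returns a BitSet with cardinality 0, while fixed(...) sets bits 0 and 1.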





Re: Review Request 56140: Can't order by an unselected column

2017-05-23 Thread pengcheng xiong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56140/
---

(Updated May 23, 2017, 9:38 p.m.)


Review request for hive and Ashutosh Chauhan.


Repository: hive-git


Description
---

HIVE-15160


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveProjectSortTransposeRule.java 1487ed4f8e
  ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java fa96e94f64 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 35fc68a555 
  ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java f678d0b0a0
  ql/src/test/queries/clientpositive/order_by_expr_1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/order_by_expr_2.q PRE-CREATION 
  ql/src/test/results/clientpositive/annotate_stats_select.q.out 873f1abb25 
  ql/src/test/results/clientpositive/cp_sel.q.out 1778ccd6a6 
  ql/src/test/results/clientpositive/druid_basic2.q.out 6177d56987 
  ql/src/test/results/clientpositive/dynamic_rdd_cache.q.out fc0030965a 
  ql/src/test/results/clientpositive/groupby_grouping_sets_grouping.q.out 473d17a1bd
  ql/src/test/results/clientpositive/llap/bucket_groupby.q.out d724131fca 
  ql/src/test/results/clientpositive/llap/explainuser_1.q.out f701cabffe 
  ql/src/test/results/clientpositive/llap/limit_pushdown.q.out 0a8df615fd 
  ql/src/test/results/clientpositive/llap/limit_pushdown3.q.out 24645b6426 
  ql/src/test/results/clientpositive/llap/offset_limit_ppd_optimizer.q.out 77062c737e
  ql/src/test/results/clientpositive/llap/subquery_in.q.out d7fd29e194 
  ql/src/test/results/clientpositive/llap/vector_coalesce.q.out 840210476b 
  ql/src/test/results/clientpositive/llap/vector_date_1.q.out a4f1050c89 
  ql/src/test/results/clientpositive/llap/vector_decimal_2.q.out 144356c108 
  ql/src/test/results/clientpositive/llap/vector_decimal_round.q.out 00bb50a5a5 
  ql/src/test/results/clientpositive/llap/vector_groupby_grouping_sets_grouping.q.out 5af9e61b0a
  ql/src/test/results/clientpositive/llap/vector_groupby_grouping_sets_limit.q.out f731ceecdc
  ql/src/test/results/clientpositive/llap/vector_interval_1.q.out 8d4f12e203 
  ql/src/test/results/clientpositive/llap/vector_interval_arithmetic.q.out 1d14092408
  ql/src/test/results/clientpositive/order3.q.out 898f7a8853 
  ql/src/test/results/clientpositive/order_by_expr_1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/order_by_expr_2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/pcr.q.out a1301fdf79 
  ql/src/test/results/clientpositive/perf/query31.q.out 9e3dad472a 
  ql/src/test/results/clientpositive/perf/query36.q.out 57ab26acc6 
  ql/src/test/results/clientpositive/perf/query39.q.out dcf3cb264e 
  ql/src/test/results/clientpositive/perf/query42.q.out 3bebac3321 
  ql/src/test/results/clientpositive/perf/query52.q.out 74ecaf28ba 
  ql/src/test/results/clientpositive/perf/query64.q.out 7f97e392e1 
  ql/src/test/results/clientpositive/perf/query66.q.out ec7b6af471 
  ql/src/test/results/clientpositive/perf/query70.q.out 55c1461da8 
  ql/src/test/results/clientpositive/perf/query75.q.out 0ecc9852ed 
  ql/src/test/results/clientpositive/perf/query81.q.out dfd46396b5 
  ql/src/test/results/clientpositive/perf/query85.q.out ba8659e8f2 
  ql/src/test/results/clientpositive/perf/query86.q.out 734e6a480b 
  ql/src/test/results/clientpositive/perf/query89.q.out 66481f710b 
  ql/src/test/results/clientpositive/perf/query91.q.out e592bba8d9 
  ql/src/test/results/clientpositive/pointlookup2.q.out 3438c74608 
  ql/src/test/results/clientpositive/pointlookup3.q.out 2c3e39fd15 
  ql/src/test/results/clientpositive/ppd_udf_case.q.out 7678d03415 
  ql/src/test/results/clientpositive/spark/dynamic_rdd_cache.q.out bcb50cfadc 
  ql/src/test/results/clientpositive/spark/limit_pushdown.q.out ede0096c73 
  ql/src/test/results/clientpositive/spark/pcr.q.out 77ac020d07 
  ql/src/test/results/clientpositive/spark/subquery_in.q.out 5e38938ad6 
  ql/src/test/results/clientpositive/vector_coalesce.q.out 87ab937abb 
  ql/src/test/results/clientpositive/vector_date_1.q.out c2389e6b1e 
  ql/src/test/results/clientpositive/vector_decimal_round.q.out d92b6c241e 
  ql/src/test/results/clientpositive/vector_interval_1.q.out 2a398ae5d3 
  ql/src/test/results/clientpositive/vector_interval_arithmetic.q.out b67231c8c4
  ql/src/test/results/clientpositive/view_alias.q.out 90bf28dd9b 


Diff: https://reviews.apache.org/r/56140/diff/9/

Changes: https://reviews.apache.org/r/56140/diff/8-9/


Testing
---


Thanks,

pengcheng xiong



Re: Review Request 56140: Can't order by an unselected column

2017-05-23 Thread pengcheng xiong


> On May 18, 2017, 12:29 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveProjectSortTransposeRule.java
> > Lines 70-71 (original), 81-83 (patched)
> > 
> >
> > This change looks correct. But don't understand why it was needed. Can 
> > you describe the need for it?

This is a bug exposed by this patch. The query is 
{code}
create table s as select * from src limit 10;
set hive.optimize.limittranspose=true;

explain
select key from s a
union all
select key from s b
order by key
limit 5;
{code}

HiveProjectSortTransposeRule is triggered for

{code}
HiveProject(key=[$0])
  HiveSortLimit(sort0=[$1], dir0=[ASC-nulls-first], offset=[0], fetch=[5])
{code}

When
{code}
if (map.getTarget(fc.getFieldIndex()) < 0) {
  return;
}
{code}
is called, fc.getFieldIndex() is 1, but the mapping only contains 0->0, so
getTarget() throws instead of returning a negative value:

org.apache.calcite.util.mapping.Mappings$NoElementException: source #1 has no target in mapping [size=1, sourceCount=2, targetCount=1, elements=[0:0]]
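The failure comes from guarding with a "< 0" check on a lookup that throws for an unmapped source instead of returning a sentinel. The plain-Java sketch below models both behaviours with a hypothetical SimpleFieldMapping class (not Calcite's). Calcite's mapping API also offers a non-throwing getTargetOpt that returns -1 for an unmapped source; whether the patch uses it is an assumption here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a partial source->target field mapping (not Calcite's class).
class SimpleFieldMapping {
    private final Map<Integer, Integer> targets = new HashMap<>();

    void set(int source, int target) {
        targets.put(source, target);
    }

    // Strict lookup: throws for an unmapped source, like the failing call above.
    int getTarget(int source) {
        Integer target = targets.get(source);
        if (target == null) {
            throw new IllegalArgumentException("source #" + source + " has no target");
        }
        return target;
    }

    // Lenient lookup: returns -1 for an unmapped source, so a "< 0" check can bail out.
    int getTargetOpt(int source) {
        Integer target = targets.get(source);
        return target == null ? -1 : target;
    }

    // True only if the sort field survives the projection.
    boolean canPushSortThrough(int sortFieldIndex) {
        return getTargetOpt(sortFieldIndex) >= 0;
    }
}
```

For HiveProject(key=[$0]) over a sort on $1, the mapping holds only 0->0, so canPushSortThrough(1) returns false and the rule can simply skip the transpose.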


- pengcheng


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56140/#review175299
---


On May 1, 2017, 5:30 p.m., pengcheng xiong wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56140/
> ---
> 
> (Updated May 1, 2017, 5:30 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-15160
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveProjectSortTransposeRule.java 1487ed4f8e
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 1b054a7e24 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/RowResolver.java 262dafb487 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 654f3b1772
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 8f8eab0d9c
>   ql/src/test/queries/clientpositive/order_by_expr_1.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/order_by_expr_2.q PRE-CREATION 
>   ql/src/test/results/clientpositive/annotate_stats_select.q.out 873f1abb25 
>   ql/src/test/results/clientpositive/cp_sel.q.out 1778ccd6a6 
>   ql/src/test/results/clientpositive/druid_basic2.q.out 6177d56987 
>   ql/src/test/results/clientpositive/dynamic_rdd_cache.q.out 2abb819558 
>   ql/src/test/results/clientpositive/groupby_grouping_sets_grouping.q.out 473d17a1bd
>   ql/src/test/results/clientpositive/llap/bucket_groupby.q.out d724131fca 
>   ql/src/test/results/clientpositive/llap/explainuser_1.q.out 584c3b5520 
>   ql/src/test/results/clientpositive/llap/limit_pushdown.q.out dd54dd22a6 
>   ql/src/test/results/clientpositive/llap/limit_pushdown3.q.out 24645b6426 
>   ql/src/test/results/clientpositive/llap/offset_limit_ppd_optimizer.q.out 83de1fbea1
>   ql/src/test/results/clientpositive/llap/vector_coalesce.q.out 578f849bdb 
>   ql/src/test/results/clientpositive/llap/vector_date_1.q.out a4f1050c89 
>   ql/src/test/results/clientpositive/llap/vector_decimal_2.q.out 144356c108 
>   ql/src/test/results/clientpositive/llap/vector_decimal_round.q.out 8bd80cf860
>   ql/src/test/results/clientpositive/llap/vector_groupby_grouping_sets_grouping.q.out 5af9e61b0a
>   ql/src/test/results/clientpositive/llap/vector_groupby_grouping_sets_limit.q.out f731ceecdc
>   ql/src/test/results/clientpositive/llap/vector_interval_1.q.out debf5ab39e 
>   ql/src/test/results/clientpositive/llap/vector_interval_arithmetic.q.out aadb6e72cd
>   ql/src/test/results/clientpositive/order3.q.out 898f7a8853 
>   ql/src/test/results/clientpositive/order_by_expr_1.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/order_by_expr_2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/pcr.q.out a1301fdf79 
>   ql/src/test/results/clientpositive/perf/query31.q.out 3ed312d3e3 
>   ql/src/test/results/clientpositive/perf/query36.q.out 57ab26acc6 
>   ql/src/test/results/clientpositive/perf/query39.q.out 19472c4d5e 
>   ql/src/test/results/clientpositive/perf/query42.q.out 3bebac3321 
>   ql/src/test/results/clientpositive/perf/query52.q.out 74ecaf28ba 
>   ql/src/test/results/clientpositive/perf/query64.q.out 6b42393aad 
>   ql/src/test/results/clientpositive/perf/query66.q.out 072bfee92b 
>   ql/src/test/results/clientpositive/perf/query70.q.out 8e42fac9c5 
>   ql/src/test/results/clientpositive/perf/query75.q.out b1e236d325 
>   ql/src/test/results/clientpositive/perf/query81.q.out a09d5c99b5 
>   ql/src/test/results/clientpositive/perf/query85.q.out 168bcd2a4a 
>   ql/src/test/results/clientpositive/perf/query86.q.out 

[jira] [Created] (HIVE-16742) cap the number of reducers for LLAP at the configured value

2017-05-23 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-16742:
---

 Summary: cap the number of reducers for LLAP at the configured 
value
 Key: HIVE-16742
 URL: https://issues.apache.org/jira/browse/HIVE-16742
 Project: Hive
  Issue Type: Bug
Reporter: Gopal V
Assignee: Sergey Shelukhin








Fwd: JSON reminder

2017-05-23 Thread Alan Gates
This applies to Hive and blocks the 2.2 and 2.3 releases.  The issue has
not been resolved on master either.

Alan.

-- Forwarded message --
From: Jim Jagielski 
Date: Tue, May 23, 2017 at 6:57 AM
Subject: JSON reminder
To: legal discuss 


The grandfather exception closed last month:

"""
If you have been using it, and have done so in a *release*, AND there has
been NO pushback from your community/eco-system, you have a temporary
exclusion from the Cat-X classification thru April 30, 2017. At that point
in time, ANY and ALL usage of these JSON licensed artifacts are DISALLOWED.
"""

I think the VP Legal, now that we have one (Hi Chris!) should
remind the PMCs...


[GitHub] hive pull request #187: HIVE-16706: Bootstrap REPL DUMP shouldn't fail when ...

2017-05-23 Thread sankarh
GitHub user sankarh opened a pull request:

https://github.com/apache/hive/pull/187

HIVE-16706: Bootstrap REPL DUMP shouldn't fail when a partition is 
dropped/renamed when dump in progress.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sankarh/hive HIVE-16706

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/187.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #187


commit 7e9ed26ef2bd5a89cfe6df0ad22552dad003c82b
Author: Sankar Hariappan 
Date:   2017-05-23T16:13:04Z

HIVE-16706: Bootstrap REPL DUMP shouldn't fail when a partition is 
dropped/renamed when dump in progress.






Re: Review Request 59468: Optimize a combination of avg(), sum(), count(distinct) etc

2017-05-23 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59468/#review175801
---




ql/src/java/org/apache/hadoop/hive/ql/optimizer/CountDistinctRewriteProc.java
Lines 61 (patched)


Comment: Queries of the form: select max(c), count(distinct c) from T;
generate a plan of the form TS->mGBy->RS->rGBy->FS.
This plan suffers from the problem that the vertex containing rGBy->FS
must have exactly 1 task. This limitation results in slow execution
because that single task receives all the data.
If successful, this optimization rewrites the above plan to
TS->mGBy->RS->mGBy2->RS->rGBy->FS, which introduces an extra vertex, mGBy2->RS.
Note that this vertex can have multiple tasks, and since it performs aggregation,
its output cannot be larger than its input, so much less data goes into the
rGBy->FS vertex, which continues to have a single task.
Also note that on the Calcite tree, the HiveExpandDistinctAggregatesRule rule
performs a similar plan transformation, but it has different conditions that
need to be satisfied.
Additionally, we don't do any costing here, so it is possible that this
transformation slows a query down a bit: if the data is small enough to fit
in a single task of the last reducer, injecting an additional vertex into the
pipeline may make the query slower.
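The rewrite is correct because the extra ReduceSink partitions rows on c, so every distinct value of c reaches exactly one mGBy2 task. The plain-Java simulation below models only that data flow, not Hive's GroupByOperator; the class and method names are hypothetical, and hash partitioning stands in for the extra ReduceSink. It assumes a non-empty input with no NULLs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java simulation of the rewritten plan for: select max(c), count(distinct c) from T
class CountDistinctTwoPhase {
    // Stage 1 (mGBy2, many tasks): rows are hash-partitioned on c, and each task
    // emits every distinct c it sees exactly once.
    static List<Set<Long>> partialDistinct(List<Long> rows, int numTasks) {
        List<Set<Long>> perTask = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            perTask.add(new HashSet<>());
        }
        for (long c : rows) {
            perTask.get(Math.floorMod(Long.hashCode(c), numTasks)).add(c);
        }
        return perTask;
    }

    // Stage 2 (rGBy, single task): a given c reaches only one stage-1 task,
    // so summing the per-task counts yields count(distinct c).
    static long[] finalAggregate(List<Set<Long>> perTask) {
        long max = Long.MIN_VALUE;
        long distinctCount = 0;
        for (Set<Long> keys : perTask) {
            distinctCount += keys.size();
            for (long c : keys) {
                max = Math.max(max, c);
            }
        }
        return new long[] {max, distinctCount};
    }
}
```

The single final task now receives at most one row per distinct value instead of every input row, which is the saving the comment describes.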



ql/src/java/org/apache/hadoop/hive/ql/optimizer/CountDistinctRewriteProc.java
Lines 121 (patched)


Unused field.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/CountDistinctRewriteProc.java
Lines 122 (patched)


Comment : Position of distinct column in aggregator list of map Gby before 
rewrite.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/CountDistinctRewriteProc.java
Lines 135 (patched)


Should this be cntDist > 1 ?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/CountDistinctRewriteProc.java
Lines 156-158 (patched)


This seems redundant since we already checked for cntDist != in loop.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/CountDistinctRewriteProc.java
Lines 159 (patched)


Some extra safety checks:
1) mGby is in hash mode.
2) rGby is in mergepartial mode.
3) RS.getKeys().size() == 1
4) RS partition column size = 1
5) RS sort col size = 1.
6) mGby has no grouping sets.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/CountDistinctRewriteProc.java
Lines 197 (patched)


Comment: distinct is at the last position.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/CountDistinctRewriteProc.java
Lines 313 (patched)


This should be PARTIAL2 mode as well, since the GBy operator is running in
PARTIAL2 mode.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/CountDistinctRewriteProc.java
Lines 405 (patched)


throw new SemanticException(e);



ql/src/java/org/apache/hadoop/hive/ql/optimizer/GroupByOptimizer.java
Line 538 (original), 538-546 (patched)


This change may not be needed if we run the count distinct optimization after
this one has already run.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
Lines 75 (patched)


Also, let's invoke this optimization only for Tez.


- Ashutosh Chauhan


On May 22, 2017, 10:31 p.m., pengcheng xiong wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59468/
> ---
> 
> (Updated May 22, 2017, 10:31 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Gopal V.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-16654
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7dedd23591 
>   itests/src/test/resources/testconfiguration.properties e23ef6317f 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 8b04cd44fa 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/CountDistinctRewriteProc.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GroupByOptimizer.java 3233157d8d
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 7dace9076f 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/GroupByDesc.java 38a9ef2af1 
>   ql/src/test/queries/clientpositive/count_dist_rewrite.q PRE-CREATION 
>   

[jira] [Created] (HIVE-16741) Counting number of records in hive and hbase are different for NULL fields in hive

2017-05-23 Thread Aleksey Vovchenko (JIRA)
Aleksey Vovchenko created HIVE-16741:


 Summary:  Counting number of records in hive and hbase are 
different for NULL fields in hive
 Key: HIVE-16741
 URL: https://issues.apache.org/jira/browse/HIVE-16741
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.1.0, 1.2.0
Reporter: Aleksey Vovchenko
Assignee: Aleksey Vovchenko


Steps to reproduce:

STEP 1.  

hbase> create 'testTable',{NAME=>'cf'}

STEP 2.
put 'testTable','10','cf:Address','My Address 411002'
put 'testTable','10','cf:contactId','653638'
put 'testTable','10','cf:currentStatus','Awaiting'
put 'testTable','10','cf:createdAt','1452815193'
put 'testTable','10','cf:Id','10'


put 'testTable','15','cf:contactId','653638'
put 'testTable','15','cf:currentStatus','Awaiting'
put 'testTable','15','cf:createdAt','1452815193'
put 'testTable','15','cf:Id','15'
(Note: the Address column is not provided here, which means it is NULL.)

put 'testTable','20','cf:Address','My Address 411003'
put 'testTable','20','cf:contactId','653638'
put 'testTable','20','cf:currentStatus','Awaiting'
put 'testTable','20','cf:createdAt','1452815193'
put 'testTable','20','cf:Id','20'


put 'testTable','17','cf:Address','My Address 411003'
put 'testTable','17','cf:currentStatus','Awaiting'
put 'testTable','17','cf:createdAt','1452815193'
put 'testTable','17','cf:Id','17'

STEP 3.

hive> CREATE external TABLE hh_testTable(Id string, Address string, contactId string,
        currentStatus string, createdAt string)
      STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      WITH SERDEPROPERTIES ("hbase.columns.mapping"=":key,cf:Address,cf:contactId,cf:currentStatus,cf:createdAt")
      TBLPROPERTIES ("hbase.table.name"="testTable");

STEP 4.

hive> select count(*),contactid from hh_testTable group by contactid;

Actual result:
OK
3   653638

Expected result:
OK
1   NULL
3   653638
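The expected output follows standard SQL GROUP BY semantics, in which all rows whose grouping column is NULL form a single group. Below is a plain-Java model of that semantics for the four rows from STEP 2, where row 17 has no cf:contactId cell. It does not exercise the HBase storage handler, and the class name is illustrative.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Models the expected GROUP BY contactid result for the rows inserted in STEP 2.
class GroupByContactIdDemo {
    // A missing cf:contactId cell is represented as null; all nulls share one group.
    static Map<String, Long> countByContactId(List<String> contactIds) {
        Map<String, Long> counts = new HashMap<>(); // HashMap permits a null key
        for (String contactId : contactIds) {
            counts.merge(contactId, 1L, Long::sum);
        }
        return counts;
    }

    // contactId for row keys 10, 15, 20 and 17 (row 17 has no cf:contactId cell).
    static List<String> stepTwoContactIds() {
        return Arrays.asList("653638", "653638", "653638", null);
    }
}
```

The model yields two groups, {653638: 3, NULL: 1}, whereas the reported Hive output contains only the first.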






[GitHub] hive pull request #185: HIVE-16684: Bootstrap REPL DUMP shouldn't fail when ...

2017-05-23 Thread sankarh
Github user sankarh closed the pull request at:

https://github.com/apache/hive/pull/185




[GitHub] hive pull request #186: HIVE-16727: REPL DUMP for insert event should't fail...

2017-05-23 Thread sankarh
GitHub user sankarh opened a pull request:

https://github.com/apache/hive/pull/186

HIVE-16727: REPL DUMP for insert event shouldn't fail if the table is 
already dropped.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sankarh/hive HIVE-16727

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/186.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #186


commit c06a9df4c2c1a735dc8c0139ebdf6451cc6c9017
Author: Sankar Hariappan 
Date:   2017-05-22T08:15:54Z

HIVE-16727: REPL DUMP for insert event should't fail if the table is 
already dropped.






[jira] [Created] (HIVE-16740) LLAP: Add metrics to measure IndexCache hit / miss details

2017-05-23 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created HIVE-16740:
---

 Summary: LLAP: Add metrics to measure IndexCache hit / miss 
details 
 Key: HIVE-16740
 URL: https://issues.apache.org/jira/browse/HIVE-16740
 Project: Hive
  Issue Type: Bug
  Components: llap
Reporter: Rajesh Balamohan
Priority: Minor








[GitHub] hive pull request #3: Minor typo "connecton" -> "connection".

2017-05-23 Thread ClementNotin
Github user ClementNotin closed the pull request at:

https://github.com/apache/hive/pull/3




[jira] [Created] (HIVE-16739) HoS DPP generates malformed plan when hive.tez.dynamic.semijoin.reduction is on

2017-05-23 Thread Rui Li (JIRA)
Rui Li created HIVE-16739:
-

 Summary: HoS DPP generates malformed plan when 
hive.tez.dynamic.semijoin.reduction is on
 Key: HIVE-16739
 URL: https://issues.apache.org/jira/browse/HIVE-16739
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li


HoS DPP currently can't handle dynamic semi-join reduction, which results in
{{ClassCastException org.apache.hadoop.hive.ql.plan.ReduceWork cannot be cast
to org.apache.hadoop.hive.ql.plan.MapWork}}.
We should either disable it or implement it for HoS.





Re: Review Request 57009: HIVE-16029 - COLLECT_SET and COLLECT_LIST does not return NULL in the result

2017-05-23 Thread Eric Lin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57009/
---

(Updated May 23, 2017, 7:14 a.m.)


Review request for hive and Aihua Xu.


Changes
---

Updated the test cases that failed in the last patch.


Bugs: HIVE-16029
https://issues.apache.org/jira/browse/HIVE-16029


Repository: hive-git


Description
---

See the test case below:

{code}
0: jdbc:hive2://localhost:1/default> select * from collect_set_test;
+---------------------+
| collect_set_test.a  |
+---------------------+
| 1                   |
| 2                   |
| NULL                |
| 4                   |
| NULL                |
+---------------------+

0: jdbc:hive2://localhost:1/default> select collect_set(a) from collect_set_test;
+----------+
|   _c0    |
+----------+
| [1,2,4]  |
+----------+

{code}

The correct result should be:

{code}
0: jdbc:hive2://localhost:1/default> select collect_set(a) from collect_set_test;
+---------------+
|      _c0      |
+---------------+
| [1,2,null,4]  |
+---------------+
{code}
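A minimal model of the requested behaviour, assuming collect_set keeps first-seen order and treats NULL as one more distinct value, while collect_list keeps every NULL. CollectWithNulls is a hypothetical illustration, not Hive's GenericUDAFMkCollectionEvaluator; it relies on LinkedHashSet, which permits a single null element and preserves insertion order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical model of the requested NULL handling, not Hive's evaluator.
class CollectWithNulls {
    // collect_set: de-duplicates, keeps first-seen order, and retains at most one null.
    static List<Integer> collectSet(List<Integer> values) {
        return new ArrayList<>(new LinkedHashSet<>(values));
    }

    // collect_list: keeps every value, including each null.
    static List<Integer> collectList(List<Integer> values) {
        return new ArrayList<>(values);
    }
}
```

For the column 1, 2, NULL, 4, NULL, collectSet returns [1, 2, null, 4]. Note that the row order of a table scan is not guaranteed in general; the sketch simply uses the order shown above.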


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectList.java 156d19b
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java 0c2cf90
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMkCollectionEvaluator.java 2b5e6dd
  ql/src/test/results/clientpositive/llap/udaf_collect_set_2.q.out aa55979 
  ql/src/test/results/clientpositive/spark/udaf_collect_set.q.out ee152ca 
  ql/src/test/results/clientpositive/udaf_collect_set.q.out ee152ca 
  ql/src/test/results/clientpositive/udaf_collect_set_2.q.out f2e76a7 


Diff: https://reviews.apache.org/r/57009/diff/3/

Changes: https://reviews.apache.org/r/57009/diff/2-3/


Testing
---

Manually tested and confirmed that the result is correct:

{code}
0: jdbc:hive2://localhost:1/default> select collect_set(a) from collect_set_test;
+---------------+
|      _c0      |
+---------------+
| [1,2,null,4]  |
+---------------+
{code}


Thanks,

Eric Lin



[jira] [Created] (HIVE-16738) Notification ID generation in DBNotification might not be unique

2017-05-23 Thread anishek (JIRA)
anishek created HIVE-16738:
--

 Summary: Notification ID generation in DBNotification might not be 
unique
 Key: HIVE-16738
 URL: https://issues.apache.org/jira/browse/HIVE-16738
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 3.0.0
Reporter: anishek
Assignee: anishek
 Fix For: 3.0.0


I'll explain the problem in the context of the "replication" feature being
built for Hive 2, since that makes it easier to explain:

To allow replication to work, we need to set
"hive.metastore.transactional.event.listeners" to DbNotificationListener.
Consider the following code in DbNotificationListener, for use cases where
multiple HiveServer2 instances are running:
{code}
  private void process(NotificationEvent event, ListenerEvent listenerEvent)
      throws MetaException {
    event.setMessageFormat(msgFactory.getMessageFormat());
    synchronized (NOTIFICATION_TBL_LOCK) {
      LOG.debug("DbNotificationListener: Processing : {}:{}", event.getEventId(),
          event.getMessage());
      HMSHandler.getMSForConf(hiveConf).addNotificationEvent(event);
    }

    // Set the DB_NOTIFICATION_EVENT_ID for future reference by other listeners.
    if (event.isSetEventId()) {
      listenerEvent.putParameter(
          MetaStoreEventListenerConstants.DB_NOTIFICATION_EVENT_ID_KEY_NAME,
          Long.toString(event.getEventId()));
    }
  }
{code}
The object lock in the code above is not enough to guarantee that all events
get a unique ID, because it only serializes threads within a single process.
Setting the database transaction isolation level to "read-committed" or
"repeatable-read" would not guarantee this either, unless a lock is taken at
the database level, preferably on the {{NOTIFICATION_SEQUENCE}} table, which
has only one row.
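The race can be modelled without a database. In the sketch below, SequenceRow is a hypothetical stand-in for the single NOTIFICATION_SEQUENCE row, and the interleaving of two HiveServer2 instances is scripted so that the duplicate is deterministic. The allocate method shows the shape of a fix: read-increment-write as one critical section under a lock that every writer shares. Across separate processes, that role would have to be played by a database row lock (for example, a SELECT ... FOR UPDATE on that row inside the same transaction); this is an assumption about one possible fix, not necessarily what the ticket adopted.

```java
import java.util.ArrayList;
import java.util.List;

// Simulation only: SequenceRow stands in for the single row of NOTIFICATION_SEQUENCE.
class NotificationIdDemo {
    static final class SequenceRow {
        long nextEventId = 1;
    }

    // Scripted interleaving of two HiveServer2 instances, each holding only its own JVM lock.
    // Neither lock excludes the other instance, so both read the same value.
    static List<Long> interleavedWithoutSharedLock(SequenceRow row) {
        long idForA = row.nextEventId; // instance A reads 1
        long idForB = row.nextEventId; // instance B reads 1 before A writes back
        row.nextEventId = idForA + 1;  // A stores 2
        row.nextEventId = idForB + 1;  // B also stores 2
        List<Long> ids = new ArrayList<>();
        ids.add(idForA);
        ids.add(idForB);
        return ids; // both events received the same ID
    }

    // Read-increment-write as one critical section under a lock shared by every writer.
    static long allocate(SequenceRow row) {
        synchronized (row) {
            long id = row.nextEventId;
            row.nextEventId = id + 1;
            return id;
        }
    }
}
```

The key design point is that the lock must cover the same scope as the shared counter: a JVM-level lock only protects a counter that lives in one JVM.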


