[jira] [Created] (HIVE-17796) PTF in a view disables PPD

2017-10-12 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-17796:
---

 Summary: PTF in a view disables PPD
 Key: HIVE-17796
 URL: https://issues.apache.org/jira/browse/HIVE-17796
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin


I disabled constant propagation to make the logging cleaner; the behavior is the same when it is enabled. See the "truncated path to alias" section in the explain output:
Simple view with partition columns and the filter outside of the view: PPD works.
View with PTF and the filter included in the view: PPD works.
View with PTF and the filter outside of the view: PPD breaks.

I looked at the logs for some time; it looks like the predicate is already null in this case by the time it is passed to the partition pruner. I'm not sure yet why that happens.
The view can also be partitioned.


{noformat}
set hive.mapred.mode=nonstrict;
set hive.explain.user=false;
set hive.auto.convert.join=true;
set hive.auto.convert.join.noconditionaltask=true;
set hive.auto.convert.join.noconditionaltask.size=1;
set hive.metastore.aggregate.stats.cache.enabled=false;
set hive.stats.fetch.column.stats=false;
set hive.cbo.enable=false;


create table dim (c2 string) partitioned by (pc1 string, pc2 string);
create table fact (c1 string, c3 string) partitioned by (pc1 string, pc2 
string);

insert overwrite table dim partition (pc1='aaa', pc2='aaa') select key from src;
insert overwrite table dim partition (pc1='ccc', pc2='ccc') select key from src;
insert overwrite table dim partition (pc1='ddd', pc2='ddd') select key from src;
insert overwrite table fact partition (pc1='aaa', pc2='aaa') select key, key 
from src;
insert overwrite table fact partition (pc1='bbb', pc2='bbb') select key, key 
from src;
insert overwrite table fact partition (pc1='ccc', pc2='ccc') select key, key 
from src;

create view vw_ptf as select a1.*,
(cast((row_number() over (partition by a1.pc1, a1.pc2)) as bigint) + b1.c2) as 
unique_key
from fact a1 join dim b1 on a1.pc1 = b1.pc1 and a1.pc2 = b1.pc2;

create view vw_simple as select a1.*, b1.c2
from fact a1 join dim b1 on a1.pc1 = b1.pc1 and a1.pc2 = b1.pc2;

create view vw_ppd as select a1.*,
(cast((row_number() over (partition by a1.pc1, a1.pc2)) as bigint) + b1.c2) as 
Unique_Key
from fact a1 join dim b1 on a1.pc1 = b1.pc1 and a1.pc2 = b1.pc2
where a1.pc1 = 'ccc' and a1.pc2='ccc';

set hive.optimize.constant.propagation=false;


explain extended
select a.* from vw_simple a WHERE 1 = 1 AND (a.pc1 = 'ccc' and a.pc2='ccc'); 
explain extended
select a.* from vw_ppd a WHERE 1 = 1 AND (a.pc1 = 'ccc' and a.pc2='ccc');
explain extended
select a.* from vw_ptf a WHERE 1 = 1 AND (a.pc1 = 'ccc' and a.pc2='ccc');
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] hive pull request #198: Hive 14535

2017-10-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/hive/pull/198


---


[jira] [Created] (HIVE-17795) Add distribution management tag in pom

2017-10-12 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-17795:
---

 Summary: Add distribution management tag in pom
 Key: HIVE-17795
 URL: https://issues.apache.org/jira/browse/HIVE-17795
 Project: Hive
  Issue Type: Bug
Reporter: Raja Aluri
Assignee: Raja Aluri






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Review Request 62954: HIVE-17782 Inconsistent cast behavior from string to numeric types with regards to leading/trailing spaces

2017-10-12 Thread Jason Dere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62954/
---

Review request for hive and Sergey Shelukhin.


Bugs: HIVE-17782
https://issues.apache.org/jira/browse/HIVE-17782


Repository: hive-git


Description
---

Add option to trim spaces from lazy byte/short/int/long
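
For illustration, here is a minimal sketch of the kind of trimming this option would apply before an integral parse; the class and method names below are made up for the example and are not the patch's actual code.

{code:java}
// Illustrative sketch only: trim ASCII spaces from both ends of the byte range
// before parsing, which is the behavior the option above would enable.
import java.nio.charset.StandardCharsets;

public class TrimmedIntegralParseSketch {

  static int parseTrimmedInt(byte[] bytes, int start, int length) {
    int begin = start;
    int end = start + length;
    while (begin < end && bytes[begin] == ' ') {
      begin++;                       // skip leading spaces
    }
    while (end > begin && bytes[end - 1] == ' ') {
      end--;                         // skip trailing spaces
    }
    return Integer.parseInt(new String(bytes, begin, end - begin, StandardCharsets.US_ASCII));
  }

  public static void main(String[] args) {
    byte[] padded = "  42  ".getBytes(StandardCharsets.US_ASCII);
    System.out.println(parseTrimmedInt(padded, 0, padded.length)); // prints 42
  }
}
{code}

With trimming, a padded value such as "  42  " parses successfully instead of failing.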


Diffs
-

  common/src/java/org/apache/hive/common/util/HiveStringUtils.java dc3ee98 
  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToByte.java 12f530b 
  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java 1de7604 
  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToLong.java 11f6a6c 
  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToShort.java 5372549 
  ql/src/test/queries/clientpositive/cast2.q PRE-CREATION 
  ql/src/test/results/clientpositive/cast2.q.out PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyByte.java 1f9cead 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyInteger.java 22742aa 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyLong.java c0d52b9 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyShort.java b8b9488 
  serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazyPrimitive.java 
3d7f11e 


Diff: https://reviews.apache.org/r/62954/diff/1/


Testing
---

Junit, qfile tests


Thanks,

Jason Dere



Re: How to measure the execution time of query on Hive on Tez

2017-10-12 Thread Prasanth Jayachandran
Hi

There are a couple of tables that you are seeing: the top one is the Query Execution Summary and the next one is the Task Execution Summary.

At the end of your query execution, you should see something like "Time taken: xx.xx seconds" in the Hive CLI or Beeline. This represents the overall time the query took from start (start of semantic analysis) to finish (end of fetching results).

This overall time is broken down in the Query Execution Summary table. When you submit the query, it goes through multiple phases before returning the results. The time for each of these phases is listed in the Query Execution Summary table.

If you want the overall query runtime, look at the last line in the Hive CLI or Beeline, which reports the end-to-end time.
If you are only interested in how long the query actually executed in the cluster, look at the Run DAG time.

Run DAG is NOT the sum of the durations of all vertices, since vertices can be scheduled in parallel.
Run DAG is the time Hive measures from the point at which it knows the DAG has been accepted and started by the Application Master until the completion of the DAG (which may have failed, errored, or succeeded).

Hope this helps.

Thanks
Prasanth

> On Oct 12, 2017, at 2:46 PM, Zhang, Liyun  wrote:
> 
> Hi all:
>   Maybe the attached picture in my last mail was not shown, so I have
> re-described my question here. I saw the following statistics about the
> runtime when running a query.
> 
> Run DAG is 318s, but it is not the sum of the DURATION of all VERTICES
> ((59549+4069+3055+3055+1004+1006+132736+34248+11077+1003+439+140896+35260+8070)/1000 = 435s),
> nor the sum of CPU_TIME. There are several indicators ("RUN DAG", "DURATION",
> "CPU_TIME"); which one should I use when measuring performance? Sometimes I
> find a significant improvement in the sum of CPU_TIME while there is no
> significant improvement in "RUN DAG". Is this normal? I would appreciate some
> feedback from you!
> 
> 
> 
> 2017-10-12T16:29:39,262  INFO [main] SessionState: --
> 2017-10-12T16:29:39,262  INFO [main] SessionState: OPERATION                DURATION
> 2017-10-12T16:29:39,262  INFO [main] SessionState: --
> 2017-10-12T16:29:39,263  INFO [main] SessionState: Compile Query               3.72s
> 2017-10-12T16:29:39,263  INFO [main] SessionState: Prepare Plan                0.60s
> 2017-10-12T16:29:39,263  INFO [main] SessionState: Submit Plan                 0.61s
> 2017-10-12T16:29:39,263  INFO [main] SessionState: Start DAG                   0.52s
> 2017-10-12T16:29:39,263  INFO [main] SessionState: Run DAG                   318.54s
> 2017-10-12T16:29:39,263  INFO [main] SessionState: --
> 2017-10-12T16:29:39,263  INFO [main] SessionState:
> 2017-10-12T16:29:39,289  INFO [cea2258c-aa47-46a1-af5b-39860a6edbb3 main] counters.Limits: Counter limits initialized with parameters: GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=2000
> 2017-10-12T16:29:39,294  INFO [main] SessionState: Task Execution Summary
> 2017-10-12T16:29:39,294  INFO [main] SessionState: --
> 2017-10-12T16:29:39,294  INFO [main] SessionState:   VERTICES   DURATION(ms)   CPU_TIME(ms)   GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
> 2017-10-12T16:29:39,294  INFO [main] SessionState: --
> 2017-10-12T16:29:39,298  INFO [main] SessionState:      Map 1       59549.00      1,355,520        28,565     550,076,554    1,602,119,842
> 2017-10-12T16:29:39,300  INFO [main] SessionState:     Map 12        4069.00         15,670           522          73,049              732
> 2017-10-12T16:29:39,300  INFO [main] SessionState:     Map 13        3055.00         14,030           567             212              212
> 2017-10-12T16:29:39,301  INFO [main] SessionState:     Map 14        3055.00         13,820           606             212              212
> 2017-10-12T16:29:39,303  INFO [main] SessionState: Reducer 10        1004.00         13,450           265               4                4
> 2017-10-12T16:29:39,305  INFO [main] SessionState: Reducer 11        1006.00          4,290            71             216              212
> 2017-10-12T16:29:39,307  INFO [main] SessionState:  Reducer 2      132736.00      2,362,160        83,029     537,120,745      107,740,258
> 2017-10-12T16:29:39,308  INFO [main] SessionState:  Reducer 3       34248.00        643,350        20,661

Review Request 62952: HIVE-17792

2017-10-12 Thread Deepak Jaiswal

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62952/
---

Review request for hive, Gopal V and Jason Dere.


Bugs: HIVE-17792
https://issues.apache.org/jira/browse/HIVE-17792


Repository: hive-git


Description
---

Enable Bucket Map Join when there are extra keys other than bucketed columns.
Added a couple of test cases.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 
e24760b90c 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/OpTraitsRulesProcFactory.java
 875ee9d842 
  ql/src/test/queries/clientpositive/bucket_map_join_tez1.q cac1d6a3d5 
  ql/src/test/results/clientpositive/llap/bucket_map_join_tez1.q.out 36cb4ac6c7 


Diff: https://reviews.apache.org/r/62952/diff/1/


Testing
---


Thanks,

Deepak Jaiswal



RE: How to measure the execution time of query on Hive on Tez

2017-10-12 Thread Zhang, Liyun
Hi all:
  Maybe the attached picture in my last mail was not shown, so I have
re-described my question here. I saw the following statistics about the runtime
when running a query.

Run DAG is 318s, but it is not the sum of the DURATION of all VERTICES
((59549+4069+3055+3055+1004+1006+132736+34248+11077+1003+439+140896+35260+8070)/1000 = 435s),
nor the sum of CPU_TIME. There are several indicators ("RUN DAG", "DURATION",
"CPU_TIME"); which one should I use when measuring performance? Sometimes I
find a significant improvement in the sum of CPU_TIME while there is no
significant improvement in "RUN DAG". Is this normal? I would appreciate some
feedback from you!



2017-10-12T16:29:39,262  INFO [main] SessionState: --
2017-10-12T16:29:39,262  INFO [main] SessionState: OPERATION                DURATION
2017-10-12T16:29:39,262  INFO [main] SessionState: --
2017-10-12T16:29:39,263  INFO [main] SessionState: Compile Query               3.72s
2017-10-12T16:29:39,263  INFO [main] SessionState: Prepare Plan                0.60s
2017-10-12T16:29:39,263  INFO [main] SessionState: Submit Plan                 0.61s
2017-10-12T16:29:39,263  INFO [main] SessionState: Start DAG                   0.52s
2017-10-12T16:29:39,263  INFO [main] SessionState: Run DAG                   318.54s
2017-10-12T16:29:39,263  INFO [main] SessionState: --
2017-10-12T16:29:39,263  INFO [main] SessionState:
2017-10-12T16:29:39,289  INFO [cea2258c-aa47-46a1-af5b-39860a6edbb3 main] counters.Limits: Counter limits initialized with parameters: GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=2000
2017-10-12T16:29:39,294  INFO [main] SessionState: Task Execution Summary
2017-10-12T16:29:39,294  INFO [main] SessionState: --
2017-10-12T16:29:39,294  INFO [main] SessionState:   VERTICES   DURATION(ms)   CPU_TIME(ms)   GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
2017-10-12T16:29:39,294  INFO [main] SessionState: --
2017-10-12T16:29:39,298  INFO [main] SessionState:      Map 1       59549.00      1,355,520        28,565     550,076,554    1,602,119,842
2017-10-12T16:29:39,300  INFO [main] SessionState:     Map 12        4069.00         15,670           522          73,049              732
2017-10-12T16:29:39,300  INFO [main] SessionState:     Map 13        3055.00         14,030           567             212              212
2017-10-12T16:29:39,301  INFO [main] SessionState:     Map 14        3055.00         13,820           606             212              212
2017-10-12T16:29:39,303  INFO [main] SessionState: Reducer 10        1004.00         13,450           265               4                4
2017-10-12T16:29:39,305  INFO [main] SessionState: Reducer 11        1006.00          4,290            71             216              212
2017-10-12T16:29:39,307  INFO [main] SessionState:  Reducer 2      132736.00      2,362,160        83,029     537,120,745      107,740,258
2017-10-12T16:29:39,308  INFO [main] SessionState:  Reducer 3       34248.00        643,350        20,661     107,740,470              203
2017-10-12T16:29:39,310  INFO [main] SessionState:  Reducer 4       11077.00         77,020         1,496             203               31
2017-10-12T16:29:39,311  INFO [main] SessionState:  Reducer 5        1003.00         40,030           824              10               10
2017-10-12T16:29:39,312  INFO [main] SessionState:  Reducer 6         439.00            590             0              10                0
2017-10-12T16:29:39,314  INFO [main] SessionState:  Reducer 7      140896.00      1,925,760        52,784     537,120,745      107,740,258
2017-10-12T16:29:39,316  INFO [main] SessionState:  Reducer 8       35260.00        590,200        22,331     107,740,470               76
2017-10-12T16:29:39,318  INFO [main] SessionState:  Reducer 9        8070.00         24,630           249              76                4
2017-10-12T16:29:39,318  INFO [main] SessionState: ---

From: Zhang, Liyun [mailto:liyun.zh...@intel.com]
Sent: Thursday, October 12, 2017 4:40 PM
To: dev@hive.apache.org
Subject: How to measure the execution time of query on Hive on Tez

Hi all:
  Does anyone know how to view the detailed execution time of every map/reduce task
in Hive on Tez?
I took a screenshot of the result:
Run DAG is 324s, but this is not the sum of the DURATION of every

[jira] [Created] (HIVE-17794) HCatLoader breaks when a member is added to a struct-column of a table

2017-10-12 Thread Mithun Radhakrishnan (JIRA)
Mithun Radhakrishnan created HIVE-17794:
---

 Summary: HCatLoader breaks when a member is added to a 
struct-column of a table
 Key: HIVE-17794
 URL: https://issues.apache.org/jira/browse/HIVE-17794
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 2.2.0, 3.0.0
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan


When a table's schema evolves to add a new member to a struct column, Hive 
queries work fine, but {{HCatLoader}} breaks with the following trace:

{noformat}
TaskAttempt 1 failed, info=
 Error: Failure while running 
task:org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
while executing (Name: kite_composites_with_segments: Local Rearrange
 tuple
{chararray}(false) - scope-555-> scope-974 Operator Key: scope-555): 
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while 
executing (Name: gup: New For Each(false,false)
 bag
- scope-548 Operator Key: scope-548): 
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while 
executing (Name: gup_filtered: Filter
 bag
- scope-522 Operator Key: scope-522): 
org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
converting read value to tuple
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:287)
at 
org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POLocalRearrangeTez.getNextTuple(POLocalRearrangeTez.java:127)
at 
org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:376)
at 
org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:241)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:362)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
Exception while executing (Name: gup: New For Each(false,false)
 bag
- scope-548 Operator Key: scope-548): 
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while 
executing (Name: gup_filtered: Filter
 bag
- scope-522 Operator Key: scope-522): 
org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
converting read value to tuple
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:252)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:305)
... 17 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
Exception while executing (Name: gup_filtered: Filter
 bag
- scope-522 Operator Key: scope-522): 
org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
converting read value to tuple
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNextTuple(POFilter.java:90)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:305)
... 19 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
converting read value to tuple
at 
org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POSimpleTezLoad.getNextTuple(POSimpleTezLoad.java:160)

[jira] [Created] (HIVE-17793) Parameterize Logging Messages

2017-10-12 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created HIVE-17793:
--

 Summary: Parameterize Logging Messages
 Key: HIVE-17793
 URL: https://issues.apache.org/jira/browse/HIVE-17793
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Affects Versions: 3.0.0
Reporter: BELUGA BEHR
Assignee: BELUGA BEHR
Priority: Trivial


* Use SLF4J parameterized logging
* Remove the use of the archaic Util's "stringifyException" and simply allow the logging 
framework to handle formatting of the output. This also avoids building the error 
message only to throw it away when the logging level is set higher than the message's 
level.
* Add some {{LOG.isDebugEnabled}} guards around complex debug messages
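
For illustration, a minimal before/after sketch of the points above; the logger, message text, and exception handling here are made-up examples, not code taken from the patch.

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Made-up example to illustrate the change; not from the actual patch.
public class ParameterizedLoggingSketch {
  private static final Logger LOG = LoggerFactory.getLogger(ParameterizedLoggingSketch.class);

  void before(String partName, Exception e) {
    // Concatenation builds the message even when DEBUG is disabled, and a
    // stringifyException-style helper renders the stack trace eagerly.
    LOG.debug("Failed to load partition " + partName + ": " + e);
  }

  void after(String partName, Exception e) {
    // Parameterized form: the message is only formatted if DEBUG is enabled,
    // and the throwable is passed to the logging framework to format.
    LOG.debug("Failed to load partition {}", partName, e);

    // Guard genuinely expensive message construction explicitly.
    if (LOG.isDebugEnabled()) {
      LOG.debug("Partition details: {}", expensiveDump(partName));
    }
  }

  private String expensiveDump(String partName) {
    return partName; // stand-in for an expensive computation
  }
}
{code}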



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17792) Enable Bucket Map Join when there are extra keys other than bucketed columns

2017-10-12 Thread Deepak Jaiswal (JIRA)
Deepak Jaiswal created HIVE-17792:
-

 Summary: Enable Bucket Map Join when there are extra keys other 
than bucketed columns
 Key: HIVE-17792
 URL: https://issues.apache.org/jira/browse/HIVE-17792
 Project: Hive
  Issue Type: Bug
Reporter: Deepak Jaiswal
Assignee: Deepak Jaiswal


Currently this won't go through Bucket Map Join (BMJ):

CREATE TABLE tab_part (key int, value string) PARTITIONED BY(ds STRING) 
CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE;
CREATE TABLE tab(key int, value string) PARTITIONED BY(ds STRING) STORED AS 
TEXTFILE;

select a.key, a.value, b.value
from tab a join tab_part b on a.key = b.key and a.value = b.value;



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17791) Temp dirs under the staging directory should honour `inheritPerms`

2017-10-12 Thread Mithun Radhakrishnan (JIRA)
Mithun Radhakrishnan created HIVE-17791:
---

 Summary: Temp dirs under the staging directory should honour 
`inheritPerms`
 Key: HIVE-17791
 URL: https://issues.apache.org/jira/browse/HIVE-17791
 Project: Hive
  Issue Type: Bug
  Components: Authorization
Reporter: Mithun Radhakrishnan
Assignee: Chris Drome


For [~cdrome]:

CLI creates two levels of staging directories but calls setPermissions on the 
top-level directory only if {{hive.warehouse.subdir.inherit.perms=true}}.

The top-level directory, 
{{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1}}
 is created the first time {{Context.getExternalTmpPath}} is called.

The child directory, 
{{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1/_tmp.-ext-1}}
 is created when {{TezTask.execute}} is called at line 164:

{code:java}
DAG dag = build(jobConf, work, scratchDir, appJarLr, additionalLr, ctx);
{code}

This calls {{DagUtils.createVertex}}, which calls {{Utilities.createTmpDirs}}:

{code:java}
private static void createTmpDirs(Configuration conf,
    List<Operator<? extends OperatorDesc>> ops) throws IOException {

  while (!ops.isEmpty()) {
    Operator<? extends OperatorDesc> op = ops.remove(0);

    if (op instanceof FileSinkOperator) {
      FileSinkDesc fdesc = ((FileSinkOperator) op).getConf();
      Path tempDir = fdesc.getDirName();

      if (tempDir != null) {
        Path tempPath = Utilities.toTempPath(tempDir);
        FileSystem fs = tempPath.getFileSystem(conf);
        fs.mkdirs(tempPath); // <-- HERE!
      }
    }

    if (op.getChildOperators() != null) {
      ops.addAll(op.getChildOperators());
    }
  }
}
{code}

It turns out that {{inheritPerms}} is no longer part of {{master}}. I'll rebase 
this for {{branch-2}} and {{branch-2.2}}; {{master}} will have to wait till the 
issues around {{StorageBasedAuthProvider}}, directory permissions, etc. are 
sorted out.
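
For reference, a rough sketch of what honouring {{inheritPerms}} at that mkdirs call could look like: create the temp dir and copy the parent directory's permissions onto it. This is an assumption about the shape of the fix, not the actual branch-2 patch.

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

// Sketch only: not the actual Hive patch.
public class InheritPermsSketch {
  static void mkdirsInheritingPerms(Configuration conf, Path tempPath, boolean inheritPerms)
      throws IOException {
    FileSystem fs = tempPath.getFileSystem(conf);
    fs.mkdirs(tempPath);
    if (inheritPerms) {
      // Copy the parent directory's permissions onto the freshly created temp dir.
      FsPermission parentPerm = fs.getFileStatus(tempPath.getParent()).getPermission();
      fs.setPermission(tempPath, parentPerm);
    }
  }
}
{code}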



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17790) Export/Import: Bug while getting auth entities due to which we write partition info during compilation phase

2017-10-12 Thread Vaibhav Gumashta (JIRA)
Vaibhav Gumashta created HIVE-17790:
---

 Summary: Export/Import: Bug while getting auth entities due to 
which we write partition info during compilation phase 
 Key: HIVE-17790
 URL: https://issues.apache.org/jira/browse/HIVE-17790
 Project: Hive
  Issue Type: Bug
  Components: repl
Affects Versions: 3.0.0
Reporter: Vaibhav Gumashta






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17789) Flaky test: TestSessionManagerMetrics.testAbandonedSessionMetrics has timing related problems

2017-10-12 Thread Andrew Sherman (JIRA)
Andrew Sherman created HIVE-17789:
-

 Summary: Flaky test: 
TestSessionManagerMetrics.testAbandonedSessionMetrics has timing related 
problems
 Key: HIVE-17789
 URL: https://issues.apache.org/jira/browse/HIVE-17789
 Project: Hive
  Issue Type: Bug
Reporter: Andrew Sherman
Assignee: Andrew Sherman


The test waits for a worker thread to be timed out. The timeout should happen 
after 3000 ms. The test waits for 3200 ms, and sometimes this is not enough.
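
One common way to make such a test robust is to poll for the expected state with a generous deadline instead of sleeping for a fixed 3200 ms. A minimal sketch, with the actual metric check left as a hypothetical placeholder:

{code:java}
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Sketch only: poll until the condition holds or the deadline passes, instead of
// assuming the worker thread is timed out after exactly 3200 ms.
public class PollingWaitSketch {
  static boolean waitFor(BooleanSupplier condition, long timeoutMs, long intervalMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (condition.getAsBoolean()) {
        return true;
      }
      TimeUnit.MILLISECONDS.sleep(intervalMs);
    }
    return condition.getAsBoolean();
  }

  // Hypothetical usage: waitFor(() -> metrics.getAbandonedSessions() > 0, 30_000, 100);
}
{code}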



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Review Request 62693: HIVE-17635: Add unit tests to CompactionTxnHandler and use PreparedStatements for queries

2017-10-12 Thread Sahil Takiar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62693/#review187818
---




metastore/src/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
Line 387 (original), 411 (patched)


What is this for?



metastore/src/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
Line 413 (original), 444 (patched)


Is this necessary?



metastore/src/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
Line 432 (original), 475 (patched)


If we are changing this, should we just use try-with-resources?



metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java
Lines 158 (patched)


why is a new return value necessary?



metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java
Lines 170 (patched)


nit: extra newline



metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java
Lines 192 (patched)


nit: delete newline


- Sahil Takiar


On Sept. 29, 2017, 4:51 p.m., Andrew Sherman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62693/
> ---
> 
> (Updated Sept. 29, 2017, 4:51 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Add a unit test which exercises CompactionTxnHandler.markFailed() and change 
> it to use PreparedStatement.
> Add a test for checkFailedCompactions() and change it to use PreparedStatement.
> Add a unit test which exercises purgeCompactionHistory().
> Add buildQueryWithINClauseStrings(), which is suitable for building IN clauses 
> for PreparedStatement.
> Add test code to TestTxnUtils to tickle code in 
> TxnUtils.buildQueryWithINClauseStrings() so that it produces multiple queries.
> Change markCleaned() to use PreparedStatement.
> 
> 
> Diffs
> -
> 
>   beeline/src/java/org/apache/hive/beeline/HiveSchemaTool.java 
> 84963af10ec13979a7b3976be434efbc21cf2382 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
>  60839faa352cbf959041a455e9e780dfca0afdc3 
>   metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java 
> 30b155f3b3311fed6cd79e46a5b2abcee9927d91 
>   metastore/src/test/org/apache/hadoop/hive/metastore/txn/TestTxnUtils.java 
> 1497c00e5dc77c02e53767b014a23e5fd8cb5b29 
>   
> ql/src/test/org/apache/hadoop/hive/metastore/txn/TestCompactionTxnHandler.java
>  f8ae86bea3fe78374c0e0487d66c661f4f0d78ff 
> 
> 
> Diff: https://reviews.apache.org/r/62693/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Andrew Sherman
> 
>
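
As a side note for readers, here is a minimal sketch of the IN-clause-with-placeholders technique the description above refers to; the helper name and shape are illustrative and not the patch's actual buildQueryWithINClauseStrings() implementation, which, per the description, also splits long lists into multiple queries.

{code:java}
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: build "col IN (?, ?, ?)" with one placeholder per value,
// so the values are bound on a PreparedStatement rather than inlined into the SQL.
public class InClauseSketch {
  static String buildInClause(String column, List<Long> values) {
    StringBuilder sb = new StringBuilder(column).append(" IN (");
    for (int i = 0; i < values.size(); i++) {
      if (i > 0) {
        sb.append(", ");
      }
      sb.append('?');
    }
    return sb.append(')').toString();
  }

  public static void main(String[] args) {
    // Prints: ID IN (?, ?, ?)  (the column name here is just an example)
    System.out.println(buildInClause("ID", Arrays.asList(1L, 2L, 3L)));
  }
}
{code}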



[jira] [Created] (HIVE-17788) MetastoreConf not properly handling variable interpolation in config values

2017-10-12 Thread Alan Gates (JIRA)
Alan Gates created HIVE-17788:
-

 Summary: MetastoreConf not properly handling variable 
interpolation in config values
 Key: HIVE-17788
 URL: https://issues.apache.org/jira/browse/HIVE-17788
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 3.0.0
Reporter: Alan Gates
Assignee: Alan Gates


HiveConf allows the user to put System properties into values in the conf file, 
which will later be interpolated at read time.  For example:
{code}
<property>
  <name>hadoop.tmp.dir</name>
  <value>${test.tmp.dir}/hadoop-tmp</value>
  <description>A base for other temporary directories.</description>
</property>
{code}
The value for test.tmp.dir is read from the System properties at runtime.

MetastoreConf is instead returning the uninterpolated value.
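
For comparison, Hadoop's Configuration illustrates the expected contract: get() expands ${...} references (checking system properties), while getRaw() returns the stored value. The sketch below demonstrates that contract with plain Hadoop Configuration; it is not MetastoreConf code.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Demonstrates the interpolation contract using Hadoop's Configuration directly.
public class InterpolationSketch {
  public static void main(String[] args) {
    System.setProperty("test.tmp.dir", "/tmp/hive-test");

    Configuration conf = new Configuration(false);
    conf.set("hadoop.tmp.dir", "${test.tmp.dir}/hadoop-tmp");

    // get() expands the variable: prints /tmp/hive-test/hadoop-tmp
    System.out.println(conf.get("hadoop.tmp.dir"));
    // getRaw() skips expansion: prints ${test.tmp.dir}/hadoop-tmp
    System.out.println(conf.getRaw("hadoop.tmp.dir"));
  }
}
{code}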



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Review Request 62935: HIVE-17787: Apply more filters on the BeeLine test output files

2017-10-12 Thread Peter Vary via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62935/#review187800
---


Ship it!




Ship It!

- Peter Vary


On Oct. 12, 2017, 2:45 p.m., Marta Kuczora wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62935/
> ---
> 
> (Updated Oct. 12, 2017, 2:45 p.m.)
> 
> 
> Review request for hive and Peter Vary.
> 
> 
> Bugs: HIVE-17787
> https://issues.apache.org/jira/browse/HIVE-17787
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This is a follow-up patch for HIVE-17569.
> When running the q tests with BeeLine, some known differences came up which 
> should be filtered out if the "test.beeline.compare.portable" parameter is 
> set to true.
> 
> 
> Diffs
> -
> 
>   itests/util/src/main/java/org/apache/hive/beeline/QFile.java 21be8b0 
> 
> 
> Diff: https://reviews.apache.org/r/62935/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>



Review Request 62935: HIVE-17787: Apply more filters on the BeeLine test output files

2017-10-12 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62935/
---

Review request for hive and Peter Vary.


Bugs: HIVE-17787
https://issues.apache.org/jira/browse/HIVE-17787


Repository: hive-git


Description
---

This is a follow-up patch for HIVE-17569.
When running the q tests with BeeLine, some known differences came up which 
should be filtered out if the "test.beeline.compare.portable" parameter is set 
to true.


Diffs
-

  itests/util/src/main/java/org/apache/hive/beeline/QFile.java 21be8b0 


Diff: https://reviews.apache.org/r/62935/diff/1/


Testing
---


Thanks,

Marta Kuczora



[jira] [Created] (HIVE-17787) Apply more filters on the BeeLine tests output files (follow-up on HIVE-17569)

2017-10-12 Thread Marta Kuczora (JIRA)
Marta Kuczora created HIVE-17787:


 Summary: Apply more filters on the BeeLine tests output files 
(follow-up on HIVE-17569)
 Key: HIVE-17787
 URL: https://issues.apache.org/jira/browse/HIVE-17787
 Project: Hive
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Marta Kuczora
Assignee: Marta Kuczora
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17786) JdbcConnectionParams set exact host and port in Utils.java

2017-10-12 Thread Saijin Huang (JIRA)
Saijin Huang created HIVE-17786:
---

 Summary: JdbcConnectionParams set exact host and port in Utils.java
 Key: HIVE-17786
 URL: https://issues.apache.org/jira/browse/HIVE-17786
 Project: Hive
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Saijin Huang
Assignee: Saijin Huang
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17785) Encryption tests are not running

2017-10-12 Thread Peter Vary (JIRA)
Peter Vary created HIVE-17785:
-

 Summary: Encryption tests are not running
 Key: HIVE-17785
 URL: https://issues.apache.org/jira/browse/HIVE-17785
 Project: Hive
  Issue Type: Bug
Reporter: Peter Vary
Assignee: Peter Vary


The testconfiguration.properties contains multiple tests in 
{{encrypted.query.files}}.
There is no comma at the end of the list, so the tests are not running.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


RE: How to measure the execution time of query on Hive on Tez

2017-10-12 Thread Zhang, Liyun
Hi all:
I know that some map tasks can be executed in parallel; for example, Map 1 and 
Map 10 are running together, as in the progress line below, so maybe we cannot 
sum the DURATION of every task to get the execution time.

15:07:18,917  INFO [e1890b6a-a9fe-4e0d-8bd7-167603079db1 main] monitoring.RenderStrategy$LogToFileFunction: Map 1: 23(+23)/46  Map 10: 23(+23)/46  Map 15: 1/1  Map 16: 1/1  Map 7: 1/1  Map 8: 1/1  Reducer 11: 127/127  Reducer 12: 0/35  Reducer 13:



Best Regards
Kelly Zhang/Zhang,Liyun




From: Zhang, Liyun
Sent: Thursday, October 12, 2017 4:40 PM
To: 'dev@hive.apache.org' 
Subject: How to measure the execution time of query on Hive on Tez

Hi all:
  Does anyone know how to view the detailed execution time of every map/reduce task 
in Hive on Tez?
I took a screenshot of the result:
Run DAG is 324s, but this is not the sum of the DURATION of every task 
(665 = 163+22+1+1+2+3+143+31+4+0+254+29+8+2+1+1). So which parameter, DURATION(ms) 
or CPU_TIME(ms), should be used?
[cid:image001.png@01D3437B.97FA0660]

Appreciate to get some feedback from you!


Best Regards
Kelly Zhang/Zhang,Liyun



How to measure the execution time of query on Hive on Tez

2017-10-12 Thread Zhang, Liyun
Hi all:
  Does anyone know how to view the detailed execution time of every map/reduce task 
in Hive on Tez?
I took a screenshot of the result:
Run DAG is 324s, but this is not the sum of the DURATION of every task 
(665 = 163+22+1+1+2+3+143+31+4+0+254+29+8+2+1+1). So which parameter, DURATION(ms) 
or CPU_TIME(ms), should be used?
[cid:image001.png@01D34378.B0FDB5B0]


Appreciate to get some feedback from you!


Best Regards
Kelly Zhang/Zhang,Liyun



Question about some ObjectStore code

2017-10-12 Thread Alexander Kolbasov
Hello,

I noticed that there are some functions in ObjectStore (e.g. dropDatabase)
that do something like this:

...

success = commitTransaction();

...

return success;


The only case where commitTransaction returns false is when the transaction
was already rolled back, which is a pretty rare case.

In most cases, commitTransaction would simply throw some kind of
RuntimeException.

It looks like the code that returns a success status from commitTransaction()
is rather useless. Does anyone know what the thinking is here?


- Alex


[jira] [Created] (HIVE-17784) Make Tez AM's Queue headroom calculation and nParallel tasks configurable.

2017-10-12 Thread Mithun Radhakrishnan (JIRA)
Mithun Radhakrishnan created HIVE-17784:
---

 Summary: Make Tez AM's Queue headroom calculation and nParallel 
tasks configurable.
 Key: HIVE-17784
 URL: https://issues.apache.org/jira/browse/HIVE-17784
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Tez
Affects Versions: 2.2.0, 3.0.0
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan


Here are a couple of customizations we made at Yahoo with Hive Tez AMs:
# When calculating splits, {{HiveSplitGenerator}} takes the entire queue's 
capacity as available, and generates splits accordingly. While this greedy 
algorithm might be acceptable for exclusive queues, on a shared queue, greedy 
queries will hold other queries up. The algorithm that calculates the queue's 
headroom should be pluggable. The greedy version can be the default.
# {{TEZ_AM_VERTEX_MAX_TASK_CONCURRENCY}} and the AM's heap-size can be tuned 
separately from the AM's container size. We found that users who attempt to 
increase vertex concurrency tend to forget to bump AM memory/container sizes. 
It would be handier if those values were derived from the container size.

I'm combining these into a single patch, for easier review.
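
To make the first point concrete, here is a hypothetical sketch of what a pluggable headroom calculation could look like; the interface, the class names, and the 0.8 heap ratio are illustrative assumptions, not the proposed patch.

{code:java}
// Hypothetical sketch only; names and numbers are illustrative assumptions.
public class HeadroomSketch {

  /** Pluggable policy for how much of the queue a single query's splits may assume. */
  interface HeadroomCalculator {
    int availableMb(int totalQueueMb, int usedQueueMb);
  }

  /** Greedy default: assume the whole queue is available (current behaviour). */
  static class GreedyHeadroom implements HeadroomCalculator {
    public int availableMb(int totalQueueMb, int usedQueueMb) {
      return totalQueueMb;
    }
  }

  /** Shared-queue policy: only assume the capacity that is actually free. */
  static class FreeCapacityHeadroom implements HeadroomCalculator {
    public int availableMb(int totalQueueMb, int usedQueueMb) {
      return Math.max(0, totalQueueMb - usedQueueMb);
    }
  }

  /** Second point: derive the AM heap from the AM container size instead of tuning both. */
  static int amHeapMbFromContainer(int amContainerMb) {
    return (int) (amContainerMb * 0.8); // assumed ratio, for illustration only
  }
}
{code}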



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)