[jira] [Updated] (HIVE-16813) Incremental REPL LOAD should load the events in the same sequence as it is dumped.

2017-06-05 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16813:

Description: 
Currently, incremental REPL DUMP uses $dumpdir/ to dump the metadata 
and data files corresponding to each event. The events are dumped in the same 
sequence in which they were generated.

Now, REPL LOAD lists the directories inside $dumpdir using listStatus and sorts 
them using the compareTo method of the FileStatus class, which compares names 
alphabetically without checking their length first.
Due to this, event-100 is processed before event-99, leaving the replica 
database out of sync with the source.

Need to use a customized comparator to sort the FileStatus entries.
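A length-first comparator restores numeric order for the event directories. The sketch below is illustrative only (class and constant names are not from the actual patch); in Hive it would compare {{FileStatus.getPath().getName()}} values rather than plain strings:

```java
import java.util.Arrays;
import java.util.Comparator;

public class EventDirSort {
    // Compare numeric event-id directory names: shorter names (fewer digits)
    // sort first; equal lengths fall back to lexicographic order. This yields
    // numeric order for non-zero-padded ids, so "99" sorts before "100".
    static final Comparator<String> EVENT_ID_ORDER =
        Comparator.comparingInt(String::length)
                  .thenComparing(Comparator.naturalOrder());

    public static void main(String[] args) {
        String[] dirs = {"100", "99", "101", "9"};
        Arrays.sort(dirs, EVENT_ID_ORDER);
        System.out.println(Arrays.toString(dirs)); // [9, 99, 100, 101]
    }
}
```

Plain alphabetical sort would give [100, 101, 9, 99], which is exactly the event-100-before-event-99 problem described above.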

  was:
Currently, incremental REPL DUMP use $dumpdir/ to dump the metadata 
and data files corresponding to the event. The event is dumped in the same 
sequence in which it was generated.

Now, REPL LOAD, lists the directories inside $dumpdir using listStatus and sort 
it using compareTo algorithm of FileStatus class which doesn't check the length 
before sorting it alphabetically.
Due to this, the event-100 is processed before event-99 and hence making the 
replica database unreliable.

Need to use a customized compareTo algorithm to sort the FileStatus.


> Incremental REPL LOAD should load the events in the same sequence as it is 
> dumped.
> --
>
> Key: HIVE-16813
> URL: https://issues.apache.org/jira/browse/HIVE-16813
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
>
> Currently, incremental REPL DUMP use $dumpdir/ to dump the metadata 
> and data files corresponding to the event. The event is dumped in the same 
> sequence in which it was generated.
> Now, REPL LOAD, lists the directories inside $dumpdir using listStatus and 
> sort it using compareTo algorithm of FileStatus class which doesn't check the 
> length before sorting it alphabetically.
> Due to this, the event-100 is processed before event-99 and hence making the 
> replica database non-sync with source.
> Need to use a customized compareTo algorithm to sort the FileStatus.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Work started] (HIVE-16813) Incremental REPL LOAD should load the events in the same sequence as it is dumped.

2017-06-05 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16813 started by Sankar Hariappan.
---
> Incremental REPL LOAD should load the events in the same sequence as it is 
> dumped.
> --
>
> Key: HIVE-16813
> URL: https://issues.apache.org/jira/browse/HIVE-16813
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
>
> Currently, incremental REPL DUMP use $dumpdir/ to dump the metadata 
> and data files corresponding to the event. The event is dumped in the same 
> sequence in which it was generated.
> Now, REPL LOAD, lists the directories inside $dumpdir using listStatus and 
> sort it using compareTo algorithm of FileStatus class which doesn't check the 
> length before sorting it alphabetically.
> Due to this, the event-100 is processed before event-99 and hence making the 
> replica database non-sync with source.
> Need to use a customized compareTo algorithm to sort the FileStatus.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16573) In-place update for HoS can't be disabled

2017-06-05 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036805#comment-16036805
 ] 

Rui Li commented on HIVE-16573:
---

[~anishek], yeah, Spark only supports in-place update in CLI. In the current 
method we don't check {{isHiveServerQuery}} for Tez either (actually, as Bing 
mentioned, we can't access SessionState here), so I suppose the change is OK, 
right?

> In-place update for HoS can't be disabled
> -
>
> Key: HIVE-16573
> URL: https://issues.apache.org/jira/browse/HIVE-16573
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>Priority: Minor
> Attachments: HIVE-16573.1.patch
>
>
> {{hive.spark.exec.inplace.progress}} has no effect



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16824) PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers

2017-06-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036787#comment-16036787
 ] 

Hive QA commented on HIVE-16824:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12871197/HIVE-16824.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10820 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=237)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=157)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query78] 
(batchId=232)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5530/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5530/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5530/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12871197 - PreCommit-HIVE-Build

> PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers
> --
>
> Key: HIVE-16824
> URL: https://issues.apache.org/jira/browse/HIVE-16824
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
>Priority: Minor
> Attachments: HIVE-16824.1.patch
>
>
> PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16824) PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers

2017-06-05 Thread ZhangBing Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangBing Lin updated HIVE-16824:
-
Affects Version/s: 3.0.0

> PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers
> --
>
> Key: HIVE-16824
> URL: https://issues.apache.org/jira/browse/HIVE-16824
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
>Priority: Minor
> Attachments: HIVE-16824.1.patch
>
>
> PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]

2017-06-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036830#comment-16036830
 ] 

Hive QA commented on HIVE-11297:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12871201/HIVE-11297.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10820 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query78] 
(batchId=232)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5531/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5531/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5531/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12871201 - PreCommit-HIVE-Build

> Combine op trees for partition info generating tasks [Spark branch]
> ---
>
> Key: HIVE-11297
> URL: https://issues.apache.org/jira/browse/HIVE-11297
> Project: Hive
>  Issue Type: Bug
>Affects Versions: spark-branch
>Reporter: Chao Sun
>Assignee: liyunzhang_intel
> Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates 
> partition info for more than one partition columns, multiple operator trees 
> are created, which all start from the same table scan op, but have different 
> spark partition pruning sinks.
> As an optimization, we can combine these op trees and so don't have to do 
> table scan multiple times.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-12412) Multi insert queries fail to run properly in hive 1.1.x or later.

2017-06-05 Thread Niklaus Xiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niklaus Xiao updated HIVE-12412:

Affects Version/s: 2.3.0

> Multi insert queries fail to run properly in hive 1.1.x or later.
> -
>
> Key: HIVE-12412
> URL: https://issues.apache.org/jira/browse/HIVE-12412
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.0, 2.3.0
>Reporter: John P. Petrakis
>  Labels: Correctness, CorrectnessBug
>
> We use multi insert queries to take data in one table and manipulate it by 
> inserting it into a results table.  Queries are of this form:
> from (select * from data_table lateral view explode(data_table.f2) f2 as 
> explode_f2) as explode_data_table  
>insert overwrite table results_table partition (q_id='C.P1',rl='1') 
>select 
>array(cast(if(explode_data_table.f1 is null or 
> explode_data_table.f1='', 'UNKNOWN',explode_data_table.f1) as 
> String),cast(explode_f2.s1 as String)) as dimensions, 
>ARRAY(CAST(sum(explode_f2.d1) as Double)) as metrics, 
>null as rownm 
>where (explode_data_table.date_id between 20151016 and 20151016)
>group by 
>if(explode_data_table.f1 is null or explode_data_table.f1='', 
> 'UNKNOWN',explode_data_table.f1),
>explode_f2.s1 
>INSERT OVERWRITE TABLE results_table PARTITION (q_id='C.P2',rl='0') 
>SELECT ARRAY(CAST('Total' as String),CAST('Total' as String)) AS 
> dimensions, 
>ARRAY(CAST(sum(explode_f2.d1) as Double)) AS metrics, 
>null AS rownm 
>WHERE (explode_data_table.date_id BETWEEN 20151016 AND 20151016) 
>INSERT OVERWRITE TABLE results_table PARTITION (q_id='C.P5',rl='0') 
>SELECT 
>ARRAY(CAST('Total' as String)) AS dimensions, 
>ARRAY(CAST(sum(explode_f2.d1) as Double)) AS metrics, 
>null AS rownm 
>WHERE (explode_data_table.date_id BETWEEN 20151016 AND 20151016)
> This query is meant to total a given field of a struct that is potentially a 
> list of structs.  For our test data set, which consists of a single row, the 
> summation yields "Null",  with messages in the hive log of the nature:
> Missing fields! Expected 2 fields but only got 1! Ignoring similar problems.
> or "Extra fields detected..."
> For significantly more data, this query will eventually cause a run time 
> error while processing a column (caused by array index out of bounds 
> exception in one of the lazy binary classes such as LazyBinaryString or 
> LazyBinaryStruct).
> Using the query above from the hive command line, the following data was used:
> (note there are tabs in the data below)
> string one	one:1.0:1.00:10.0,eon:1.0:1.00:100.0
> string two	two:2.0:2.00:20.0,otw:2.0:2.00:20.0,wott:2.0:2.00:20.0
> string thr	three:3.0:3.00:30.0
> string fou	four:4.0:4.00:40.0
> There are two fields, a string, (eg. 'string one') and a list of structs.  
> The following is used to create the table:
> create table if not exists t1 (
>  f1 string, 
>   f2 
> array>
>  )
>   partitioned by (clid string, date_id string) 
>   row format delimited fields 
>  terminated by '09' 
>  collection items terminated by ',' 
>  map keys terminated by ':'
>  lines terminated by '10' 
>  location '/user/hive/warehouse/t1';
> And the following is used to load the data:
> load data local inpath '/path/to/data/file/cplx_test.data2' OVERWRITE  into 
> table t1  partition(client_id='987654321',date_id='20151016');
> The resulting table should yield the following:
> ["string fou","four"]	[4.0]	null	C.P1	1
> ["string one","eon"]	[1.0]	null	C.P1	1
> ["string one","one"]	[1.0]	null	C.P1	1
> ["string thr","three"]	[3.0]	null	C.P1	1
> ["string two","otw"]	[2.0]	null	C.P1	1
> ["string two","two"]	[2.0]	null	C.P1	1
> ["string two","wott"]	[2.0]	null	C.P1	1
> ["Total","Total"]	[15.0]	null	C.P2	0
> ["Total"]	[15.0]	null	C.P5	0
> However what we get is:
> Hive Runtime Error while processing row 
> {"_col2":2.5306499719322744E-258,"_col3":""} (ultimately due to an array 
> index out of bounds exception)
> If we reduce the above data to a SINGLE row, then we don't get an exception 
> but the total fields come out as NULL.
> The ONLY way this query would work is 
> 1) if I added a group by (date_id) or even group by ('') as the last line in 
> the query... or removed the last where 

[jira] [Commented] (HIVE-16573) In-place update for HoS can't be disabled

2017-06-05 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036697#comment-16036697
 ] 

Rui Li commented on HIVE-16573:
---

+1. [~anishek], would you mind also having a look? Thanks

> In-place update for HoS can't be disabled
> -
>
> Key: HIVE-16573
> URL: https://issues.apache.org/jira/browse/HIVE-16573
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>Priority: Minor
> Attachments: HIVE-16573.1.patch
>
>
> {{hive.spark.exec.inplace.progress}} has no effect



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16780) Case "multiple sources, single key" in spark_dynamic_pruning.q fails

2017-06-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036722#comment-16036722
 ] 

Hive QA commented on HIVE-16780:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12871199/HIVE-16780.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10820 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_reverse] (batchId=83)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query78] 
(batchId=232)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5529/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5529/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5529/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12871199 - PreCommit-HIVE-Build

> Case "multiple sources, single key" in spark_dynamic_pruning.q fails 
> -
>
> Key: HIVE-16780
> URL: https://issues.apache.org/jira/browse/HIVE-16780
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-16780.patch
>
>
> script.q
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.strict.checks.cartesian.product=false;
> set hive.spark.dynamic.partition.pruning=true;
> -- multiple sources, single key
> select count(*) from srcpart join srcpart_date on (srcpart.ds = 
> srcpart_date.ds) join srcpart_hour on (srcpart.hr = srcpart_hour.hr)
> {code}
> If "hive.optimize.index.filter" is disabled, the case passes; otherwise it 
> always hangs in the first job. Exception:
> {code}
> 17/05/27 23:39:45 DEBUG Executor task launch worker-0 PerfLogger: </PERFLOG 
> method=SparkInitializeOperators start=1495899585574 end=1495899585933 
> duration=359 from=org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler>
> 17/05/27 23:39:45 INFO Executor task launch worker-0 Utilities: PLAN PATH = 
> hdfs://bdpe41:8020/tmp/hive/root/029a2d8a-c6e5-4ea9-adea-ef8fbea3cde2/hive_2017-05-27_23-39-06_464_5915518562441677640-1/-mr-10007/617d9dd6-9f9a-4786-8131-a7b98e8abc3e/map.xml
> 17/05/27 23:39:45 DEBUG Executor task launch worker-0 Utilities: Found plan 
> in cache for name: map.xml
> 17/05/27 23:39:45 DEBUG Executor task launch worker-0 DFSClient: Connecting 
> to datanode 10.239.47.162:50010
> 17/05/27 23:39:45 DEBUG Executor task launch worker-0 MapOperator: Processing 
> alias(es) srcpart_hour for file 
> hdfs://bdpe41:8020/user/hive/warehouse/srcpart_hour/08_0
> 17/05/27 23:39:45 DEBUG Executor task launch worker-0 ObjectCache: Creating 
> root_20170527233906_ac2934e1-2e58-4116-9f0d-35dee302d689_DynamicValueRegistry
> 17/05/27 23:39:45 ERROR Executor task launch worker-0 SparkMapRecordHandler: 
> Error processing row: org.apache.hadoop.hive.ql.metadata.HiveException: Hive 
> Runtime Error while processing row {"hr":"11","hour":"11"}
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row {"hr":"11","hour":"11"}
>  at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:562)
>  at 
> org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:136)
>  at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:48)
>  at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
>  at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85)
>  at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>  at 
> 

[jira] [Commented] (HIVE-16573) In-place update for HoS can't be disabled

2017-06-05 Thread anishek (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036736#comment-16036736
 ] 

anishek commented on HIVE-16573:


+1, looks good. I think this bug manages the in-place update progress on the 
hive-cli side; this still does not take care of showing the progress on the 
beeline side, right?

> In-place update for HoS can't be disabled
> -
>
> Key: HIVE-16573
> URL: https://issues.apache.org/jira/browse/HIVE-16573
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>Priority: Minor
> Attachments: HIVE-16573.1.patch
>
>
> {{hive.spark.exec.inplace.progress}} has no effect



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16573) In-place update for HoS can't be disabled

2017-06-05 Thread anishek (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036829#comment-16036829
 ] 

anishek commented on HIVE-16573:


[~lirui] yes, the change is fine. I was just confirming that we were on the 
same page: that this is only for CLI.

> In-place update for HoS can't be disabled
> -
>
> Key: HIVE-16573
> URL: https://issues.apache.org/jira/browse/HIVE-16573
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>Priority: Minor
> Attachments: HIVE-16573.1.patch
>
>
> {{hive.spark.exec.inplace.progress}} has no effect



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]

2017-06-05 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated HIVE-11297:

Attachment: HIVE-11297.2.patch

> Combine op trees for partition info generating tasks [Spark branch]
> ---
>
> Key: HIVE-11297
> URL: https://issues.apache.org/jira/browse/HIVE-11297
> Project: Hive
>  Issue Type: Bug
>Affects Versions: spark-branch
>Reporter: Chao Sun
>Assignee: liyunzhang_intel
> Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates 
> partition info for more than one partition columns, multiple operator trees 
> are created, which all start from the same table scan op, but have different 
> spark partition pruning sinks.
> As an optimization, we can combine these op trees and so don't have to do 
> table scan multiple times.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16824) PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers

2017-06-05 Thread ZhangBing Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036845#comment-16036845
 ] 

ZhangBing Lin commented on HIVE-16824:
--

The failed unit tests are not related to the patch.

> PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers
> --
>
> Key: HIVE-16824
> URL: https://issues.apache.org/jira/browse/HIVE-16824
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
>Priority: Minor
> Attachments: HIVE-16824.1.patch
>
>
> PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16768) NOT operator returns NULL from result of <=>

2017-06-05 Thread Fei Hui (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037073#comment-16037073
 ] 

Fei Hui commented on HIVE-16768:


[~sterligovak] Resolving it as a duplicate. Please reopen it if it is not fixed.

> NOT operator returns NULL from result of <=>
> 
>
> Key: HIVE-16768
> URL: https://issues.apache.org/jira/browse/HIVE-16768
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Alexander Sterligov
>Assignee: Fei Hui
>
> {{SELECT "foo" <=> null;}}
> returns {{false}} as expected.
> {{SELECT NOT("foo" <=> null);}}
> returns NULL, but should return {{true}}.
> Workaround is
> {{SELECT NOT(COALESCE("foo" <=> null));}}
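The expected behavior can be sketched outside Hive (class and method names below are illustrative only, not Hive code): null-safe equality ({{<=>}}) never produces NULL, so applying NOT to its result should never produce NULL either.

```java
public class NullSafeEq {
    // Three-valued-logic sketch: a null-safe equals always returns a
    // definite true/false, never NULL, so its negation is also definite.
    static boolean nullSafeEquals(Object a, Object b) {
        if (a == null && b == null) return true;   // NULL <=> NULL is true
        if (a == null || b == null) return false;  // value <=> NULL is false
        return a.equals(b);
    }

    public static void main(String[] args) {
        boolean eq = nullSafeEquals("foo", null);
        System.out.println(eq);   // false, matching SELECT "foo" <=> null
        System.out.println(!eq);  // true, what NOT("foo" <=> null) should yield
    }
}
```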



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16736) General Improvements to BufferedRows

2017-06-05 Thread BELUGA BEHR (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-16736:
---
Description: General improvements for {{BufferedRows.java}}.  Use 
{{ArrayList}} instead of {{LinkedList}} to conserve memory for large data sets, 
prevent having to loop through the entire data set twice in {{normalizeWidths}} 
method, some simplifications.  (was: General improvements for 
{{BufferedRows.java}}.  Use {{ArrayList}} instead of {{LinkedList}}, prevent 
having to loop through the entire data set twice in {{normalizeWidths}} method, 
some simplifications.)
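The single-pass idea can be sketched as follows (hypothetical names, not the actual {{BufferedRows}} API): track per-column maximum widths while rows are buffered, so a later {{normalizeWidths}} step never has to re-walk the buffer, and back the buffer with an {{ArrayList}} rather than a node-per-element {{LinkedList}}.

```java
import java.util.ArrayList;
import java.util.List;

public class WidthNormalizer {
    // ArrayList keeps rows in one compact backing array instead of
    // allocating a linked node per row.
    private final List<String[]> buffer = new ArrayList<>();
    private int[] maxWidths;

    // Update column widths as each row arrives: one pass total,
    // instead of buffering first and scanning the whole buffer again.
    void add(String[] row) {
        if (maxWidths == null) {
            maxWidths = new int[row.length];
        }
        for (int i = 0; i < row.length && i < maxWidths.length; i++) {
            maxWidths[i] = Math.max(maxWidths[i], row[i].length());
        }
        buffer.add(row);
    }

    int[] widths() { return maxWidths; }

    public static void main(String[] args) {
        WidthNormalizer n = new WidthNormalizer();
        n.add(new String[]{"id", "name"});
        n.add(new String[]{"1", "beluga"});
        System.out.println(java.util.Arrays.toString(n.widths())); // [2, 6]
    }
}
```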

> General Improvements to BufferedRows
> 
>
> Key: HIVE-16736
> URL: https://issues.apache.org/jira/browse/HIVE-16736
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Priority: Minor
> Attachments: HIVE-16736.1.patch
>
>
> General improvements for {{BufferedRows.java}}.  Use {{ArrayList}} instead of 
> {{LinkedList}} to conserve memory for large data sets, prevent having to loop 
> through the entire data set twice in {{normalizeWidths}} method, some 
> simplifications.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (HIVE-16768) NOT operator returns NULL from result of <=>

2017-06-05 Thread Fei Hui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hui resolved HIVE-16768.

Resolution: Duplicate

> NOT operator returns NULL from result of <=>
> 
>
> Key: HIVE-16768
> URL: https://issues.apache.org/jira/browse/HIVE-16768
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Alexander Sterligov
>Assignee: Fei Hui
>
> {{SELECT "foo" <=> null;}}
> returns {{false}} as expected.
> {{SELECT NOT("foo" <=> null);}}
> returns NULL, but should return {{true}}.
> Workaround is
> {{SELECT NOT(COALESCE("foo" <=> null));}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16736) General Improvements to BufferedRows

2017-06-05 Thread BELUGA BEHR (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037140#comment-16037140
 ] 

BELUGA BEHR commented on HIVE-16736:


Unrelated test failures

> General Improvements to BufferedRows
> 
>
> Key: HIVE-16736
> URL: https://issues.apache.org/jira/browse/HIVE-16736
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Priority: Minor
> Attachments: HIVE-16736.1.patch
>
>
> General improvements for {{BufferedRows.java}}.  Use {{ArrayList}} instead of 
> {{LinkedList}}, prevent having to loop through the entire data set twice in 
> {{normalizeWidths}} method, some simplifications.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16758) Better Select Number of Replications

2017-06-05 Thread BELUGA BEHR (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-16758:
---
Attachment: HIVE-16758.1.patch

> Better Select Number of Replications
> 
>
> Key: HIVE-16758
> URL: https://issues.apache.org/jira/browse/HIVE-16758
> Project: Hive
>  Issue Type: Improvement
>Reporter: BELUGA BEHR
>Priority: Minor
> Attachments: HIVE-16758.1.patch
>
>
> {{org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.java}}
> We should be smarter about how we pick a replication number.  We should add a 
> new configuration equivalent to {{mapreduce.client.submit.file.replication}}. 
>  This value should be around the square root of the number of nodes and not 
> hard-coded in the code.
> {code}
> public static final String DFS_REPLICATION_MAX = "dfs.replication.max";
> private int minReplication = 10;
>   @Override
>   protected void initializeOp(Configuration hconf) throws HiveException {
> ...
> int dfsMaxReplication = hconf.getInt(DFS_REPLICATION_MAX, minReplication);
> // minReplication value should not cross the value of dfs.replication.max
> minReplication = Math.min(minReplication, dfsMaxReplication);
>   }
> {code}
> https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
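The square-root heuristic the description asks for might look like the sketch below. All names are assumptions for illustration; a real patch would read the cluster node count and a new Hive config key analogous to {{mapreduce.client.submit.file.replication}}:

```java
public class ReplicationPicker {
    // Replicate small-table files roughly sqrt(numNodes) times, clamped
    // between a configured floor and dfs.replication.max, instead of
    // using a hard-coded value of 10.
    static int pickReplication(int numNodes, int floor, int dfsReplicationMax) {
        int target = (int) Math.ceil(Math.sqrt(numNodes));
        return Math.min(Math.max(target, floor), dfsReplicationMax);
    }

    public static void main(String[] args) {
        System.out.println(pickReplication(100, 3, 512)); // sqrt(100) -> 10
        System.out.println(pickReplication(9, 3, 512));   // sqrt(9) -> 3
    }
}
```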



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16758) Better Select Number of Replications

2017-06-05 Thread BELUGA BEHR (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-16758:
---
Status: Patch Available  (was: Open)

> Better Select Number of Replications
> 
>
> Key: HIVE-16758
> URL: https://issues.apache.org/jira/browse/HIVE-16758
> Project: Hive
>  Issue Type: Improvement
>Reporter: BELUGA BEHR
>Priority: Minor
> Attachments: HIVE-16758.1.patch
>
>
> {{org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.java}}
> We should be smarter about how we pick a replication number.  We should add a 
> new configuration equivalent to {{mapreduce.client.submit.file.replication}}. 
>  This value should be around the square root of the number of nodes and not 
> hard-coded in the code.
> {code}
> public static final String DFS_REPLICATION_MAX = "dfs.replication.max";
> private int minReplication = 10;
>   @Override
>   protected void initializeOp(Configuration hconf) throws HiveException {
> ...
> int dfsMaxReplication = hconf.getInt(DFS_REPLICATION_MAX, minReplication);
> // minReplication value should not cross the value of dfs.replication.max
> minReplication = Math.min(minReplication, dfsMaxReplication);
>   }
> {code}
> https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-6348) Order by/Sort by in subquery

2017-06-05 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037185#comment-16037185
 ] 

Xuefu Zhang commented on HIVE-6348:
---

I'm wondering if it makes more sense to optimize the query rather than banning 
it. While it might be dumb and inefficient, I don't quite see anything wrong in 
semantics.

> Order by/Sort by in subquery
> 
>
> Key: HIVE-6348
> URL: https://issues.apache.org/jira/browse/HIVE-6348
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Rui Li
>Priority: Minor
>  Labels: sub-query
> Attachments: HIVE-6348.1.patch, HIVE-6348.2.patch
>
>
> select * from (select * from foo order by c asc) bar order by c desc;
> in hive sorts the data set twice. The optimizer should probably remove any 
> order by/sort by in the sub query unless you use 'limit '. Could even go so 
> far as barring it at the semantic level.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15144) JSON.org license is now CatX

2017-06-05 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-15144:
-
Attachment: HIVE-15144.patch

Updated the patch to include fixes from dvoros.

> JSON.org license is now CatX
> 
>
> Key: HIVE-15144
> URL: https://issues.apache.org/jira/browse/HIVE-15144
> Project: Hive
>  Issue Type: Bug
>Reporter: Robert Kanter
>Priority: Blocker
> Fix For: 2.2.0
>
> Attachments: HIVE-15144.patch, HIVE-15144.patch
>
>
> per [update resolved legal|http://www.apache.org/legal/resolved.html#json]:
> {quote}
> CAN APACHE PRODUCTS INCLUDE WORKS LICENSED UNDER THE JSON LICENSE?
> No. As of 2016-11-03 this has been moved to the 'Category X' license list. 
> Prior to this, use of the JSON Java library was allowed. See Debian's page 
> for a list of alternatives.
> {quote}
> I'm not sure when this dependency was first introduced, but it looks like 
> it's currently used in a few places:
> https://github.com/apache/hive/search?p=1=%22org.json%22=%E2%9C%93





[jira] [Commented] (HIVE-15144) JSON.org license is now CatX

2017-06-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037203#comment-16037203
 ] 

ASF GitHub Bot commented on HIVE-15144:
---

Github user omalley closed the pull request at:

https://github.com/apache/hive/pull/188


> JSON.org license is now CatX
> 
>
> Key: HIVE-15144
> URL: https://issues.apache.org/jira/browse/HIVE-15144
> Project: Hive
>  Issue Type: Bug
>Reporter: Robert Kanter
>Priority: Blocker
> Fix For: 2.2.0
>
> Attachments: HIVE-15144.patch, HIVE-15144.patch
>
>
> per [update resolved legal|http://www.apache.org/legal/resolved.html#json]:
> {quote}
> CAN APACHE PRODUCTS INCLUDE WORKS LICENSED UNDER THE JSON LICENSE?
> No. As of 2016-11-03 this has been moved to the 'Category X' license list. 
> Prior to this, use of the JSON Java library was allowed. See Debian's page 
> for a list of alternatives.
> {quote}
> I'm not sure when this dependency was first introduced, but it looks like 
> it's currently used in a few places:
> https://github.com/apache/hive/search?p=1=%22org.json%22=%E2%9C%93





[jira] [Commented] (HIVE-16758) Better Select Number of Replications

2017-06-05 Thread BELUGA BEHR (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037179#comment-16037179
 ] 

BELUGA BEHR commented on HIVE-16758:


Patch:

# Set the default number of replications to 1 to support single-node test 
clusters
# Determine the number of replications based on 
{{mapreduce.client.submit.file.replication}} instead of the DFS replication max
# Remove the logic that increased the Hash Table Sink file replication to 
match the target directory's default replication instead of the configured 
amount.  This is confusing because it overrides a user setting without 
explaining to the user why their configuration has been changed.  Additionally, 
this replication is about providing reasonable data locality for executor 
tasks, not about protecting data.  The default replication value has a very 
different goal than this replication value, so the two should not be linked.
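A hedged sketch of the selection logic described above (the method name and
wiring are illustrative, not the actual patch): take the configured
submit-file replication, cap it at {{dfs.replication.max}}, and floor it at 1
so single-node test clusters still work.

```java
public class ReplicationPickSketch {
    // Hypothetical helper: pick a file replication from the
    // mapreduce.client.submit.file.replication setting, capped by
    // dfs.replication.max and floored at 1 for single-node clusters.
    public static short pickReplication(int submitFileReplication, int dfsReplicationMax) {
        int capped = Math.min(submitFileReplication, dfsReplicationMax);
        return (short) Math.max(1, capped);
    }

    public static void main(String[] args) {
        System.out.println(pickReplication(10, 3)); // 3: capped by dfs.replication.max
        System.out.println(pickReplication(0, 3));  // 1: floored for single-node test clusters
    }
}
```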

> Better Select Number of Replications
> 
>
> Key: HIVE-16758
> URL: https://issues.apache.org/jira/browse/HIVE-16758
> Project: Hive
>  Issue Type: Improvement
>Reporter: BELUGA BEHR
>Priority: Minor
> Attachments: HIVE-16758.1.patch
>
>
> {{org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.java}}
> We should be smarter about how we pick a replication number.  We should add a 
> new configuration equivalent to {{mapreduce.client.submit.file.replication}}. 
>  This value should be around the square root of the number of nodes and not 
> hard-coded in the code.
> {code}
> public static final String DFS_REPLICATION_MAX = "dfs.replication.max";
> private int minReplication = 10;
>   @Override
>   protected void initializeOp(Configuration hconf) throws HiveException {
> ...
> int dfsMaxReplication = hconf.getInt(DFS_REPLICATION_MAX, minReplication);
> // minReplication value should not cross the value of dfs.replication.max
> minReplication = Math.min(minReplication, dfsMaxReplication);
>   }
> {code}
> https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml





[jira] [Updated] (HIVE-15144) JSON.org license is now CatX

2017-06-05 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-15144:
-
Attachment: HIVE-15144.patch

Remove the json license file.

> JSON.org license is now CatX
> 
>
> Key: HIVE-15144
> URL: https://issues.apache.org/jira/browse/HIVE-15144
> Project: Hive
>  Issue Type: Bug
>Reporter: Robert Kanter
>Priority: Blocker
> Fix For: 2.2.0
>
> Attachments: HIVE-15144.patch, HIVE-15144.patch, HIVE-15144.patch
>
>
> per [update resolved legal|http://www.apache.org/legal/resolved.html#json]:
> {quote}
> CAN APACHE PRODUCTS INCLUDE WORKS LICENSED UNDER THE JSON LICENSE?
> No. As of 2016-11-03 this has been moved to the 'Category X' license list. 
> Prior to this, use of the JSON Java library was allowed. See Debian's page 
> for a list of alternatives.
> {quote}
> I'm not sure when this dependency was first introduced, but it looks like 
> it's currently used in a few places:
> https://github.com/apache/hive/search?p=1=%22org.json%22=%E2%9C%93





[jira] [Commented] (HIVE-6348) Order by/Sort by in subquery

2017-06-05 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037241#comment-16037241
 ] 

Ashutosh Chauhan commented on HIVE-6348:


Indeed, optimizing away the inner query's sort (when there is no limit) is much 
more user friendly than throwing an exception.

> Order by/Sort by in subquery
> 
>
> Key: HIVE-6348
> URL: https://issues.apache.org/jira/browse/HIVE-6348
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Rui Li
>Priority: Minor
>  Labels: sub-query
> Attachments: HIVE-6348.1.patch, HIVE-6348.2.patch
>
>
> select * from (select * from foo order by c asc) bar order by c desc;
> in hive sorts the data set twice. The optimizer should probably remove any 
> order by/sort by in the sub query unless you use 'limit '. Could even go so 
> far as barring it at the semantic level.





[jira] [Commented] (HIVE-16808) WebHCat statusdir parameter doesn't properly handle Unicode characters when using relative path

2017-06-05 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037253#comment-16037253
 ] 

Daniel Dai commented on HIVE-16808:
---

+1

> WebHCat statusdir parameter doesn't properly handle Unicode characters when 
> using relative path
> ---
>
> Key: HIVE-16808
> URL: https://issues.apache.org/jira/browse/HIVE-16808
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-16808.01.patch, HIVE-16808.02.patch, 
> HIVE-16808.03.patch
>
>
> {noformat}
> curl http://.:20111/templeton/v1/hive?user.name=hive -d execute="select 
> count(*) from default.all100k" -d statusdir="/user/hive/düsseldorf7"
> curl http://:20111/templeton/v1/hive?user.name=hive -d execute="select 
> count(*) from default.all100k" -d statusdir="/user/hive/䶴狝A﨩O"
> {noformat}
> will create statusdirs like so
> {noformat}
> /user/hive/düsseldorf-1
> drwxr-xr-x   - hive hive  0 2017-06-01 19:01 /user/hive/düsseldorf7
> drwxr-xr-x   - hive hive  0 2017-06-01 19:08 /user/hive/䶴狝A﨩O
> {noformat}
> but
> {noformat}
> curl http://.:20111/templeton/v1/hive?user.name=hive -d execute="select 
> count(*) from default.all100k" -d statusdir="düsseldorf7"
> curl http://:20111/templeton/v1/hive?user.name=hive -d execute="select 
> count(*) from default.all100k" -d statusdir="䶴狝A﨩O"
> {noformat}
> Will create 
> {noformat}
> drwxr-xr-x   - hive hive  0 2017-06-01 00:27 
> /user/hive/d%C3%BCsseldorf7
> drwxr-xr-x   - hive hive  0 2017-06-01 22:33 
> /user/hive/%E4%B6%B4%E7%8B%9DA%EF%A8%A9O
> {noformat}
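The mangled directory names above are consistent with the relative statusdir
being percent-encoded once and never decoded. A small illustration (this is
not WebHCat's actual code path, just the encoding that produces the observed
names):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class StatusDirEncodingDemo {
    // Percent-encode a directory name as a URL component would be encoded;
    // applied to a relative statusdir, this yields the escaped names seen above.
    public static String encodeStatusDir(String name) {
        try {
            return URLEncoder.encode(name, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeStatusDir("düsseldorf7")); // d%C3%BCsseldorf7
    }
}
```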





[jira] [Updated] (HIVE-16813) Incremental REPL LOAD should load the events in the same sequence as it is dumped.

2017-06-05 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16813:

Status: Patch Available  (was: In Progress)

> Incremental REPL LOAD should load the events in the same sequence as it is 
> dumped.
> --
>
> Key: HIVE-16813
> URL: https://issues.apache.org/jira/browse/HIVE-16813
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Attachments: HIVE-16813.01.patch
>
>
> Currently, incremental REPL DUMP uses $dumpdir/ to dump the metadata 
> and data files corresponding to each event. The events are dumped in the 
> same sequence in which they were generated.
> REPL LOAD then lists the directories inside $dumpdir using listStatus and 
> sorts them with the compareTo method of the FileStatus class, which doesn't 
> compare name lengths before sorting alphabetically.
> Due to this, event-100 is processed before event-99, leaving the 
> replica database out of sync with the source.
> We need a customized comparator to sort the FileStatus entries.





[jira] [Updated] (HIVE-16797) Enhance HiveFilterSetOpTransposeRule to remove union branches

2017-06-05 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16797:
---
Status: Patch Available  (was: Open)

> Enhance HiveFilterSetOpTransposeRule to remove union branches
> -
>
> Key: HIVE-16797
> URL: https://issues.apache.org/jira/browse/HIVE-16797
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16797.01.patch, HIVE-16797.02.patch
>
>
> In query4.q, we can see that it creates a CTE with a union all of 3 branches. 
> It then does a 3-way self-join of the CTE with predicates. The predicates 
> actually allow only one of the branches in the CTE to participate in the 
> join. Thus, in some cases, e.g.,
> {code}
>/- filter(false) -TS0 
> union all  - filter(false) -TS1
>\-TS2
> {code}
> we can cut the TS0 and TS1 branches, so the union becomes only TS2.





[jira] [Updated] (HIVE-16797) Enhance HiveFilterSetOpTransposeRule to remove union branches

2017-06-05 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16797:
---
Attachment: HIVE-16797.02.patch

> Enhance HiveFilterSetOpTransposeRule to remove union branches
> -
>
> Key: HIVE-16797
> URL: https://issues.apache.org/jira/browse/HIVE-16797
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16797.01.patch, HIVE-16797.02.patch
>
>
> In query4.q, we can see that it creates a CTE with a union all of 3 branches. 
> It then does a 3-way self-join of the CTE with predicates. The predicates 
> actually allow only one of the branches in the CTE to participate in the 
> join. Thus, in some cases, e.g.,
> {code}
>/- filter(false) -TS0 
> union all  - filter(false) -TS1
>\-TS2
> {code}
> we can cut the TS0 and TS1 branches, so the union becomes only TS2.





[jira] [Updated] (HIVE-16797) Enhance HiveFilterSetOpTransposeRule to remove union branches

2017-06-05 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16797:
---
Status: Open  (was: Patch Available)

> Enhance HiveFilterSetOpTransposeRule to remove union branches
> -
>
> Key: HIVE-16797
> URL: https://issues.apache.org/jira/browse/HIVE-16797
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16797.01.patch, HIVE-16797.02.patch
>
>
> In query4.q, we can see that it creates a CTE with a union all of 3 branches. 
> It then does a 3-way self-join of the CTE with predicates. The predicates 
> actually allow only one of the branches in the CTE to participate in the 
> join. Thus, in some cases, e.g.,
> {code}
>/- filter(false) -TS0 
> union all  - filter(false) -TS1
>\-TS2
> {code}
> we can cut the TS0 and TS1 branches, so the union becomes only TS2.





[jira] [Commented] (HIVE-16758) Better Select Number of Replications

2017-06-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037254#comment-16037254
 ] 

Hive QA commented on HIVE-16758:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12871255/HIVE-16758.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10820 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=46)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query78] 
(batchId=232)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5532/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5532/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5532/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12871255 - PreCommit-HIVE-Build

> Better Select Number of Replications
> 
>
> Key: HIVE-16758
> URL: https://issues.apache.org/jira/browse/HIVE-16758
> Project: Hive
>  Issue Type: Improvement
>Reporter: BELUGA BEHR
>Priority: Minor
> Attachments: HIVE-16758.1.patch
>
>
> {{org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.java}}
> We should be smarter about how we pick a replication number.  We should add a 
> new configuration equivalent to {{mapreduce.client.submit.file.replication}}. 
>  This value should be around the square root of the number of nodes and not 
> hard-coded in the code.
> {code}
> public static final String DFS_REPLICATION_MAX = "dfs.replication.max";
> private int minReplication = 10;
>   @Override
>   protected void initializeOp(Configuration hconf) throws HiveException {
> ...
> int dfsMaxReplication = hconf.getInt(DFS_REPLICATION_MAX, minReplication);
> // minReplication value should not cross the value of dfs.replication.max
> minReplication = Math.min(minReplication, dfsMaxReplication);
>   }
> {code}
> https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml





[jira] [Commented] (HIVE-6348) Order by/Sort by in subquery

2017-06-05 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037261#comment-16037261
 ] 

Vineet Garg commented on HIVE-6348:
---

I agree with [~ashutoshc] and [~xuefuz]. If we do insist on letting users know 
about the order by/sort by, IMHO showing a warning and then proceeding with the 
query, or optimizing away the order by, would be better.

> Order by/Sort by in subquery
> 
>
> Key: HIVE-6348
> URL: https://issues.apache.org/jira/browse/HIVE-6348
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Rui Li
>Priority: Minor
>  Labels: sub-query
> Attachments: HIVE-6348.1.patch, HIVE-6348.2.patch
>
>
> select * from (select * from foo order by c asc) bar order by c desc;
> in hive sorts the data set twice. The optimizer should probably remove any 
> order by/sort by in the sub query unless you use 'limit '. Could even go so 
> far as barring it at the semantic level.





[jira] [Commented] (HIVE-16797) Enhance HiveFilterSetOpTransposeRule to remove union branches

2017-06-05 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037265#comment-16037265
 ] 

Pengcheng Xiong commented on HIVE-16797:


I use pull-up-constant to pull the constant out of the union, then use 
RexSimplify to check whether the predicate can be reduced to always false. 
However, there are still several comments regarding patch 02: (1) it sounds 
like I was not able to simplify (($2=1 OR $2=2) AND $2=3) to false. Here 
($2=1 OR $2=2) comes from the two branches of the union. Thus, I introduce a 
Hive union merge rule (btw, the Calcite union merge rule does not fire well in 
current Hive master) so that we can check ($2=1 AND $2=3) and ($2=2 AND $2=3), 
respectively, which works with RexSimplify. (2) it sounds like RexSimplify 
also cannot reduce ($2>2 AND $2=3) to false. There is a test case for that in 
filter_union.q. (3) if we can assume that there is always a project under the 
union, we may have better options.
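An illustration of why checking per-branch conjuncts helps (purely schematic; 
the real rule works on Calcite RexNodes, not integers): once the constant for 
column $2 is pulled up per branch, the pushed-down filter reduces to a 
constant comparison for each branch, and branches where it folds to false can 
be cut.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class UnionBranchPruneSketch {
    // Each union branch is represented by the constant it contributes for
    // column $2. Pushing the filter ($2 = filterValue) into a branch folds
    // its predicate to a constant true/false, so mismatching branches drop out.
    public static List<Integer> surviveFilter(List<Integer> branchConstants, int filterValue) {
        return branchConstants.stream()
                .filter(c -> c == filterValue)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Branches tagged 1, 2, 3 with the filter $2 = 3: only the third
        // survives, mirroring a union that reduces to a single branch.
        System.out.println(surviveFilter(Arrays.asList(1, 2, 3), 3)); // [3]
    }
}
```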

> Enhance HiveFilterSetOpTransposeRule to remove union branches
> -
>
> Key: HIVE-16797
> URL: https://issues.apache.org/jira/browse/HIVE-16797
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16797.01.patch, HIVE-16797.02.patch
>
>
> In query4.q, we can see that it creates a CTE with a union all of 3 branches. 
> It then does a 3-way self-join of the CTE with predicates. The predicates 
> actually allow only one of the branches in the CTE to participate in the 
> join. Thus, in some cases, e.g.,
> {code}
>/- filter(false) -TS0 
> union all  - filter(false) -TS1
>\-TS2
> {code}
> we can cut the TS0 and TS1 branches, so the union becomes only TS2.





[jira] [Comment Edited] (HIVE-16797) Enhance HiveFilterSetOpTransposeRule to remove union branches

2017-06-05 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037265#comment-16037265
 ] 

Pengcheng Xiong edited comment on HIVE-16797 at 6/5/17 5:42 PM:


I use pull-up-constant to pull the constant out of the union, then use 
RexSimplify to check whether the predicate can be reduced to always false. 
However, there are still several comments regarding patch 02: (1) it sounds 
like I was not able to simplify (($2=1 OR $2=2) AND $2=3) to false. Here 
($2=1 OR $2=2) comes from the two branches of the union. Thus, I introduce a 
Hive union merge rule (btw, the Calcite union merge rule does not fire well in 
current Hive master) so that we can check ($2=1 AND $2=3) and ($2=2 AND $2=3), 
respectively, which works with RexSimplify. (2) it sounds like RexSimplify 
also cannot reduce ($2>2 AND $2=3) to false. There is a test case for that in 
filter_union.q. (3) if we can assume that there is always a project under the 
union, we may have better options. (4) for the TPC-DS queries, the current 
patch is good enough.


was (Author: pxiong):
I use pull-up-constant to pull the constant out of the union, then use 
RexSimplify to check whether the predicate can be reduced to always false. 
However, there are still several comments regarding patch 02: (1) it sounds 
like I was not able to simplify (($2=1 OR $2=2) AND $2=3) to false. Here 
($2=1 OR $2=2) comes from the two branches of the union. Thus, I introduce a 
Hive union merge rule (btw, the Calcite union merge rule does not fire well in 
current Hive master) so that we can check ($2=1 AND $2=3) and ($2=2 AND $2=3), 
respectively, which works with RexSimplify. (2) it sounds like RexSimplify 
also cannot reduce ($2>2 AND $2=3) to false. There is a test case for that in 
filter_union.q. (3) if we can assume that there is always a project under the 
union, we may have better options.

> Enhance HiveFilterSetOpTransposeRule to remove union branches
> -
>
> Key: HIVE-16797
> URL: https://issues.apache.org/jira/browse/HIVE-16797
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16797.01.patch, HIVE-16797.02.patch
>
>
> in query4.q, we can see that it creates a CTE with union all of 3 branches. 
> Then it is going to do a 3 way self-join of the CTE with predicates. The 
> predicates actually specifies only one of the branch in CTE to participate in 
> the join. Thus, in some cases, e.g.,
> {code}
>/- filter(false) -TS0 
> union all  - filter(false) -TS1
>\-TS2
> {code}
> we can cut the branches of TS0 and TS1. The union becomes only TS2.





[jira] [Commented] (HIVE-16813) Incremental REPL LOAD should load the events in the same sequence as it is dumped.

2017-06-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037283#comment-16037283
 ] 

ASF GitHub Bot commented on HIVE-16813:
---

GitHub user sankarh opened a pull request:

https://github.com/apache/hive/pull/192

HIVE-16813: Incremental REPL LOAD should load the events in the same 
sequence as it is dumped.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sankarh/hive HIVE-16813

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/192.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #192


commit ba709ab7101cad69d5f6fc82bf031fea217e669e
Author: Sankar Hariappan 
Date:   2017-06-05T17:53:34Z

HIVE-16813: Incremental REPL LOAD should load the events in the same 
sequence as it is dumped.




> Incremental REPL LOAD should load the events in the same sequence as it is 
> dumped.
> --
>
> Key: HIVE-16813
> URL: https://issues.apache.org/jira/browse/HIVE-16813
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
>
> Currently, incremental REPL DUMP uses $dumpdir/ to dump the metadata 
> and data files corresponding to each event. The events are dumped in the 
> same sequence in which they were generated.
> REPL LOAD then lists the directories inside $dumpdir using listStatus and 
> sorts them with the compareTo method of the FileStatus class, which doesn't 
> compare name lengths before sorting alphabetically.
> Due to this, event-100 is processed before event-99, leaving the 
> replica database out of sync with the source.
> We need a customized comparator to sort the FileStatus entries.





[jira] [Updated] (HIVE-16813) Incremental REPL LOAD should load the events in the same sequence as it is dumped.

2017-06-05 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16813:

Attachment: HIVE-16813.01.patch

Added 01.patch with the below changes.
- Added a custom comparator (EventDumpDirComparator) to sort the directories 
listed during REPL LOAD. It compares the directory name lengths before 
comparing the names as Strings.
- Added unit tests to verify the new comparator and to verify the bug with the 
default FileStatus comparator.
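A minimal sketch of the length-first ordering the comparator applies 
(illustrative only; the actual EventDumpDirComparator operates on FileStatus 
objects, while this standalone version compares plain directory-name Strings):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class EventDirOrderSketch {
    // Length-first comparison: a shorter event-id directory name (fewer
    // digits) sorts before a longer one, so "99" precedes "100". Names of
    // equal length fall back to plain lexicographic order.
    public static final Comparator<String> EVENT_DIR_ORDER = (a, b) -> {
        if (a.length() != b.length()) {
            return Integer.compare(a.length(), b.length());
        }
        return a.compareTo(b);
    };

    public static void main(String[] args) {
        List<String> dirs = new ArrayList<>(Arrays.asList("100", "99", "101", "9"));
        dirs.sort(EVENT_DIR_ORDER);
        System.out.println(dirs); // [9, 99, 100, 101] -- replay order matches dump order

        List<String> lexicographic = new ArrayList<>(Arrays.asList("100", "99", "101", "9"));
        lexicographic.sort(Comparator.naturalOrder());
        System.out.println(lexicographic); // [100, 101, 9, 99] -- the buggy order
    }
}
```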

> Incremental REPL LOAD should load the events in the same sequence as it is 
> dumped.
> --
>
> Key: HIVE-16813
> URL: https://issues.apache.org/jira/browse/HIVE-16813
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Attachments: HIVE-16813.01.patch
>
>
> Currently, incremental REPL DUMP uses $dumpdir/ to dump the metadata 
> and data files corresponding to each event. The events are dumped in the 
> same sequence in which they were generated.
> REPL LOAD then lists the directories inside $dumpdir using listStatus and 
> sorts them with the compareTo method of the FileStatus class, which doesn't 
> compare name lengths before sorting alphabetically.
> Due to this, event-100 is processed before event-99, leaving the 
> replica database out of sync with the source.
> We need a customized comparator to sort the FileStatus entries.





[jira] [Comment Edited] (HIVE-16813) Incremental REPL LOAD should load the events in the same sequence as it is dumped.

2017-06-05 Thread Sankar Hariappan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037291#comment-16037291
 ] 

Sankar Hariappan edited comment on HIVE-16813 at 6/5/17 6:00 PM:
-

Added 01.patch with the below changes.
- Added a custom comparator (EventDumpDirComparator) to sort the directories 
listed during REPL LOAD. It compares the directory name lengths before 
comparing the names as Strings.
- Added unit tests to verify the new comparator and to verify the bug with the 
default FileStatus comparator.

Request [~anishek] / [~sushanth] to kindly review the patch!
cc [~thejas]


was (Author: sankarh):
Added 01.patch with the below changes.
- Added a custom comparator (EventDumpDirComparator) to sort the directories 
listed during REPL LOAD. It compares the directory name lengths before 
comparing the names as Strings.
- Added unit tests to verify the new comparator and to verify the bug with the 
default FileStatus comparator.

> Incremental REPL LOAD should load the events in the same sequence as it is 
> dumped.
> --
>
> Key: HIVE-16813
> URL: https://issues.apache.org/jira/browse/HIVE-16813
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Attachments: HIVE-16813.01.patch
>
>
> Currently, incremental REPL DUMP uses $dumpdir/ to dump the metadata 
> and data files corresponding to each event. The events are dumped in the 
> same sequence in which they were generated.
> REPL LOAD then lists the directories inside $dumpdir using listStatus and 
> sorts them with the compareTo method of the FileStatus class, which doesn't 
> compare name lengths before sorting alphabetically.
> Due to this, event-100 is processed before event-99, leaving the 
> replica database out of sync with the source.
> We need a customized comparator to sort the FileStatus entries.





[jira] [Commented] (HIVE-16821) Vectorization: support Explain Analyze in vectorized mode

2017-06-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16038188#comment-16038188
 ] 

Hive QA commented on HIVE-16821:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12871471/HIVE-16821.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10820 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=140)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query78] 
(batchId=232)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5541/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5541/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5541/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12871471 - PreCommit-HIVE-Build

> Vectorization: support Explain Analyze in vectorized mode
> -
>
> Key: HIVE-16821
> URL: https://issues.apache.org/jira/browse/HIVE-16821
> Project: Hive
>  Issue Type: Bug
>  Components: Diagnosability, Vectorization
>Affects Versions: 2.1.1, 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Attachments: HIVE-16821.1.patch, HIVE-16821.2.patch, 
> HIVE-16821.2.patch
>
>
> Currently, to avoid a branch in the operator inner loop, the runtime stats 
> are only available in non-vectorized mode.





[jira] [Commented] (HIVE-6348) Order by/Sort by in subquery

2017-06-05 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16038237#comment-16038237
 ] 

Vineet Garg commented on HIVE-6348:
---

I think it's better to remove it in the AST or during logical plan generation, 
because once HiveSubqueryRemoveRule is executed (which is the very first rule), 
the subquery will be rewritten into a join and there is no way to figure out 
whether the original query had a subquery.

> Order by/Sort by in subquery
> 
>
> Key: HIVE-6348
> URL: https://issues.apache.org/jira/browse/HIVE-6348
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Rui Li
>Priority: Minor
>  Labels: sub-query
> Attachments: HIVE-6348.1.patch, HIVE-6348.2.patch
>
>
> select * from (select * from foo order by c asc) bar order by c desc;
> in hive sorts the data set twice. The optimizer should probably remove any 
> order by/sort by in the sub query unless you use 'limit '. Could even go so 
> far as barring it at the semantic level.





[jira] [Commented] (HIVE-16813) Incremental REPL LOAD should load the events in the same sequence as it is dumped.

2017-06-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16038240#comment-16038240
 ] 

Hive QA commented on HIVE-16813:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12871479/HIVE-16813.01.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10816 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query78] 
(batchId=232)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver
 (batchId=239)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5542/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5542/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5542/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12871479 - PreCommit-HIVE-Build

> Incremental REPL LOAD should load the events in the same sequence as it is 
> dumped.
> --
>
> Key: HIVE-16813
> URL: https://issues.apache.org/jira/browse/HIVE-16813
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Attachments: HIVE-16813.01.patch
>
>
> Currently, incremental REPL DUMP uses $dumpdir/ to dump the metadata and 
> data files corresponding to each event. The events are dumped in the same 
> sequence in which they were generated.
> REPL LOAD then lists the directories inside $dumpdir using listStatus and 
> sorts them with the compareTo method of the FileStatus class, which does not 
> compare lengths before sorting alphabetically.
> As a result, event-100 is processed before event-99, leaving the replica 
> database out of sync with the source.
> A customized comparator is needed to sort the FileStatus entries.
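The lexicographic-versus-numeric ordering pitfall described above can be reproduced with a small comparator sketch. This is illustrative only; the class and method names (EventDirSort, eventId) are not from the Hive patch, and it assumes each dump directory name contains the numeric event id.

```java
import java.util.Arrays;
import java.util.Comparator;

public class EventDirSort {

    // Illustrative helper: extract the numeric event id from a dump
    // directory name such as "99" or "event-99".
    static long eventId(String name) {
        return Long.parseLong(name.replaceAll("\\D", ""));
    }

    // Sort event directory names by numeric id instead of lexicographically.
    static String[] sortNumeric(String[] dirs) {
        String[] out = dirs.clone();
        Arrays.sort(out, Comparator.comparingLong(EventDirSort::eventId));
        return out;
    }

    public static void main(String[] args) {
        String[] dirs = {"100", "99", "101", "9"};

        String[] lexical = dirs.clone();
        Arrays.sort(lexical); // lexicographic: "100", "101", "9", "99"
        System.out.println(Arrays.toString(lexical));

        // numeric order restores the dump sequence: 9, 99, 100, 101
        System.out.println(Arrays.toString(sortNumeric(dirs)));
    }
}
```

With the plain sort, event "100" lands before event "99"; sorting by the parsed numeric id restores the generation order.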



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16813) Incremental REPL LOAD should load the events in the same sequence as it is dumped.

2017-06-05 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16813:

Status: Open  (was: Patch Available)

> Incremental REPL LOAD should load the events in the same sequence as it is 
> dumped.
> --
>
> Key: HIVE-16813
> URL: https://issues.apache.org/jira/browse/HIVE-16813
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Attachments: HIVE-16813.01.patch
>
>
> Currently, incremental REPL DUMP uses $dumpdir/ to dump the metadata and 
> data files corresponding to each event. The events are dumped in the same 
> sequence in which they were generated.
> REPL LOAD then lists the directories inside $dumpdir using listStatus and 
> sorts them with the compareTo method of the FileStatus class, which does not 
> compare lengths before sorting alphabetically.
> As a result, event-100 is processed before event-99, leaving the replica 
> database out of sync with the source.
> A customized comparator is needed to sort the FileStatus entries.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16813) Incremental REPL LOAD should load the events in the same sequence as it is dumped.

2017-06-05 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16813:

Attachment: HIVE-16813.01.patch

> Incremental REPL LOAD should load the events in the same sequence as it is 
> dumped.
> --
>
> Key: HIVE-16813
> URL: https://issues.apache.org/jira/browse/HIVE-16813
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Attachments: HIVE-16813.01.patch
>
>
> Currently, incremental REPL DUMP uses $dumpdir/ to dump the metadata and 
> data files corresponding to each event. The events are dumped in the same 
> sequence in which they were generated.
> REPL LOAD then lists the directories inside $dumpdir using listStatus and 
> sorts them with the compareTo method of the FileStatus class, which does not 
> compare lengths before sorting alphabetically.
> As a result, event-100 is processed before event-99, leaving the replica 
> database out of sync with the source.
> A customized comparator is needed to sort the FileStatus entries.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16813) Incremental REPL LOAD should load the events in the same sequence as it is dumped.

2017-06-05 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16813:

Attachment: (was: HIVE-16813.01.patch)

> Incremental REPL LOAD should load the events in the same sequence as it is 
> dumped.
> --
>
> Key: HIVE-16813
> URL: https://issues.apache.org/jira/browse/HIVE-16813
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Attachments: HIVE-16813.01.patch
>
>
> Currently, incremental REPL DUMP uses $dumpdir/ to dump the metadata and 
> data files corresponding to each event. The events are dumped in the same 
> sequence in which they were generated.
> REPL LOAD then lists the directories inside $dumpdir using listStatus and 
> sorts them with the compareTo method of the FileStatus class, which does not 
> compare lengths before sorting alphabetically.
> As a result, event-100 is processed before event-99, leaving the replica 
> database out of sync with the source.
> A customized comparator is needed to sort the FileStatus entries.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16821) Vectorization: support Explain Analyze in vectorized mode

2017-06-05 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16038186#comment-16038186
 ] 

Gopal V commented on HIVE-16821:


Map 1 is getting vectorized due to 
{{hive.vectorized.use.vector.serde.deserialize=true}}, and the operator ids 
change when the vectorizer runs.

I'll do a few more scale tests to make sure that the VRB calls are not 
accidentally going to the parent method.

> Vectorization: support Explain Analyze in vectorized mode
> -
>
> Key: HIVE-16821
> URL: https://issues.apache.org/jira/browse/HIVE-16821
> Project: Hive
>  Issue Type: Bug
>  Components: Diagnosability, Vectorization
>Affects Versions: 2.1.1, 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Attachments: HIVE-16821.1.patch, HIVE-16821.2.patch, 
> HIVE-16821.2.patch
>
>
> Currently, to avoid a branch in the operator inner loop, the runtime stats 
> are only available in non-vector mode.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Work started] (HIVE-16785) Ensure replication actions are idempotent if any series of events are applied again.

2017-06-05 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16785 started by Sankar Hariappan.
---
> Ensure replication actions are idempotent if any series of events are applied 
> again.
> 
>
> Key: HIVE-16785
> URL: https://issues.apache.org/jira/browse/HIVE-16785
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
>
> Some of the events (ALTER, RENAME, TRUNCATE) are not idempotent and hence 
> lead to failure of REPL LOAD if applied twice or applied to an object that is 
> newer than the current event. For example, applying TRUNCATE to a table that 
> is already dropped fails instead of being a no-op.
> We also need to consider the scenario where the object is missing while 
> applying an event. For example, when a RENAME_TABLE event is applied on a 
> target where the old table is missing, we should decide whether the table 
> should be recreated or the event treated as a no-op. This can be done by 
> verifying the DB-level last repl ID against the current event ID.
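The last-repl-ID check described above could look roughly like the following. This is a minimal sketch under the assumption that event ids and the DB-level last replicated id are monotonically increasing longs; ReplEventGuard is an illustrative name, not Hive code.

```java
public class ReplEventGuard {

    // Decide whether an incoming replication event should be applied or
    // treated as a no-op, based on the DB-level last replicated event id.
    static boolean shouldApply(long eventId, long dbLastReplId) {
        // Events at or below the last replicated id were already applied;
        // re-applying them must be a no-op to keep REPL LOAD idempotent.
        return eventId > dbLastReplId;
    }

    public static void main(String[] args) {
        System.out.println(shouldApply(100, 99)); // new event: apply it
        System.out.println(shouldApply(99, 99));  // already applied: no-op
    }
}
```

The same guard covers the RENAME_TABLE case: if the event id is at or below the DB's last repl ID, a missing old table means the event was already applied and can be skipped.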



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16813) Incremental REPL LOAD should load the events in the same sequence as it is dumped.

2017-06-05 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16813:

Status: Patch Available  (was: Open)

> Incremental REPL LOAD should load the events in the same sequence as it is 
> dumped.
> --
>
> Key: HIVE-16813
> URL: https://issues.apache.org/jira/browse/HIVE-16813
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Attachments: HIVE-16813.01.patch
>
>
> Currently, incremental REPL DUMP uses $dumpdir/ to dump the metadata and 
> data files corresponding to each event. The events are dumped in the same 
> sequence in which they were generated.
> REPL LOAD then lists the directories inside $dumpdir using listStatus and 
> sorts them with the compareTo method of the FileStatus class, which does not 
> compare lengths before sorting alphabetically.
> As a result, event-100 is processed before event-99, leaving the replica 
> database out of sync with the source.
> A customized comparator is needed to sort the FileStatus entries.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16821) Vectorization: support Explain Analyze in vectorized mode

2017-06-05 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16038154#comment-16038154
 ] 

Prasanth Jayachandran commented on HIVE-16821:
--

Why would this patch make Map 1 vectorized (in the explain diff)? I also don't 
understand why this would change operator ids.

Other than that, looks good to me. +1

> Vectorization: support Explain Analyze in vectorized mode
> -
>
> Key: HIVE-16821
> URL: https://issues.apache.org/jira/browse/HIVE-16821
> Project: Hive
>  Issue Type: Bug
>  Components: Diagnosability, Vectorization
>Affects Versions: 2.1.1, 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Attachments: HIVE-16821.1.patch, HIVE-16821.2.patch, 
> HIVE-16821.2.patch
>
>
> Currently, to avoid a branch in the operator inner loop, the runtime stats 
> are only available in non-vector mode.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]

2017-06-05 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037305#comment-16037305
 ] 

Chao Sun commented on HIVE-11297:
-

[~kellyzly]: it seems the same TableScan [could be added multiple 
times|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SplitOpTreeForDPP.java#L116]
 in {{SplitOpTreeForDPP}}, and so multiple MapWorks are generated for the same 
TableScan. Can you check if we can avoid doing that? 

> Combine op trees for partition info generating tasks [Spark branch]
> ---
>
> Key: HIVE-11297
> URL: https://issues.apache.org/jira/browse/HIVE-11297
> Project: Hive
>  Issue Type: Bug
>Affects Versions: spark-branch
>Reporter: Chao Sun
>Assignee: liyunzhang_intel
> Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates 
> partition info for more than one partition column, multiple operator trees 
> are created, which all start from the same table scan op but have different 
> Spark partition pruning sinks.
> As an optimization, we can combine these op trees so we don't have to do the 
> table scan multiple times.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15144) JSON.org license is now CatX

2017-06-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037334#comment-16037334
 ] 

Hive QA commented on HIVE-15144:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12871260/HIVE-15144.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 272 failed/errored test(s), 10820 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_aggregate_9] 
(batchId=37)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_aggregate_without_gby]
 (batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_between_columns] 
(batchId=65)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_binary_join_groupby]
 (batchId=77)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_bround] 
(batchId=31)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_bucket] 
(batchId=25)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_cast_constant] 
(batchId=8)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_char_2] 
(batchId=66)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_char_4] 
(batchId=84)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_char_mapjoin1] 
(batchId=31)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_char_simple] 
(batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_coalesce] 
(batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_coalesce_2] 
(batchId=68)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_complex_join] 
(batchId=42)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_count] 
(batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_data_types] 
(batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_date_1] 
(batchId=20)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_aggregate]
 (batchId=17)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_cast] 
(batchId=32)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_expressions]
 (batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_mapjoin] 
(batchId=53)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_math_funcs]
 (batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_precision]
 (batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_round] 
(batchId=34)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_round_2] 
(batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_udf2] 
(batchId=69)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_distinct_2] 
(batchId=49)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_elt] (batchId=34)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_empty_where] 
(batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby4] 
(batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby6] 
(batchId=83)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby_3] 
(batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby_mapjoin] 
(batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby_reduce] 
(batchId=53)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_grouping_sets] 
(batchId=79)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_if_expr] 
(batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_include_no_sel] 
(batchId=4)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_interval_1] 
(batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_interval_arithmetic]
 (batchId=4)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_interval_mapjoin] 
(batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
 (batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_left_outer_join2] 
(batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_left_outer_join] 
(batchId=21)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_mapjoin_reduce] 
(batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_mr_diff_schema_alias]
 (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_multi_insert] 
(batchId=81)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_non_constant_in_expr]
 (batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_non_string_partition]
 (batchId=32)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_null_projection] 
(batchId=9)

[jira] [Assigned] (HIVE-16825) NPE on parallel DP creation

2017-06-05 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-16825:
---


> NPE on parallel DP creation
> ---
>
> Key: HIVE-16825
> URL: https://issues.apache.org/jira/browse/HIVE-16825
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Prasanth Jayachandran
>
> {noformat}
> java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.metadata.Hive$2.call(Hive.java:1885) 
> at org.apache.hadoop.hive.ql.metadata.Hive$2.call(Hive.java:1862) 
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16825) NPE on parallel DP creation

2017-06-05 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-16825:

Reporter: Dharmesh Kakadia  (was: Sergey Shelukhin)

> NPE on parallel DP creation
> ---
>
> Key: HIVE-16825
> URL: https://issues.apache.org/jira/browse/HIVE-16825
> Project: Hive
>  Issue Type: Bug
>Reporter: Dharmesh Kakadia
>Assignee: Prasanth Jayachandran
>
> {noformat}
> java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.metadata.Hive$2.call(Hive.java:1885) 
> at org.apache.hadoop.hive.ql.metadata.Hive$2.call(Hive.java:1862) 
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16323) HS2 JDOPersistenceManagerFactory.pmCache leaks after HIVE-14204

2017-06-05 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037383#comment-16037383
 ] 

Prasanth Jayachandran commented on HIVE-16323:
--

metaStoreClient is causing an NPE in a test. We should use getMSC() instead. 

> HS2 JDOPersistenceManagerFactory.pmCache leaks after HIVE-14204
> ---
>
> Key: HIVE-16323
> URL: https://issues.apache.org/jira/browse/HIVE-16323
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-16323.1.patch, HIVE-16323.2.patch, PM_leak.png
>
>
> Hive.loadDynamicPartitions creates threads with a new embedded rawstore but 
> never closes them, so we leak one PersistenceManager per such thread.
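A common fix pattern for the per-thread leak described above is to close the thread-local resource in a finally block. This is only a sketch; RawStoreLike and task are hypothetical stand-ins, not Hive's actual RawStore or loadDynamicPartitions API.

```java
import java.util.concurrent.Callable;

public class PerThreadClose {

    // Hypothetical stand-in for an embedded store handle held per worker thread.
    interface RawStoreLike extends AutoCloseable {
        void load();
        @Override
        void close();
    }

    // Each worker closes its own store when done instead of leaking it.
    static Callable<Void> task(RawStoreLike store) {
        return () -> {
            try {
                store.load();
            } finally {
                store.close(); // avoids leaking one PersistenceManager per thread
            }
            return null;
        };
    }

    public static void main(String[] args) throws Exception {
        final boolean[] closed = {false};
        task(new RawStoreLike() {
            public void load() {}
            public void close() { closed[0] = true; }
        }).call();
        System.out.println(closed[0]); // the store was closed
    }
}
```

The key point is that the close happens on the same thread that opened the store, and happens even if the load throws.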



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16323) HS2 JDOPersistenceManagerFactory.pmCache leaks after HIVE-14204

2017-06-05 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037385#comment-16037385
 ] 

Sergey Shelukhin commented on HIVE-16323:
-

This exposes HIVE-16825, should be fixed before commit.


> HS2 JDOPersistenceManagerFactory.pmCache leaks after HIVE-14204
> ---
>
> Key: HIVE-16323
> URL: https://issues.apache.org/jira/browse/HIVE-16323
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-16323.1.patch, HIVE-16323.2.patch, PM_leak.png
>
>
> Hive.loadDynamicPartitions creates threads with a new embedded rawstore but 
> never closes them, so we leak one PersistenceManager per such thread.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (HIVE-16825) NPE on parallel DP creation

2017-06-05 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-16825.
-
Resolution: Invalid

Part of HIVE-16323 that is not committed yet

> NPE on parallel DP creation
> ---
>
> Key: HIVE-16825
> URL: https://issues.apache.org/jira/browse/HIVE-16825
> Project: Hive
>  Issue Type: Bug
>Reporter: Dharmesh Kakadia
>Assignee: Prasanth Jayachandran
>
> {noformat}
> java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.metadata.Hive$2.call(Hive.java:1885) 
> at org.apache.hadoop.hive.ql.metadata.Hive$2.call(Hive.java:1862) 
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16323) HS2 JDOPersistenceManagerFactory.pmCache leaks after HIVE-14204

2017-06-05 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037396#comment-16037396
 ] 

Sergey Shelukhin commented on HIVE-16323:
-

Also: the Hive client is used via a threadlocal. Is sharing the metastore 
client between threads safe? That exception seems to imply someone closes the 
metastore client while the pool threads are still running. I am guessing other 
code doesn't hit that because it calls getMSC, but the whole arrangement where 
some threads close and null it out while other threads reopen it seems fragile.


> HS2 JDOPersistenceManagerFactory.pmCache leaks after HIVE-14204
> ---
>
> Key: HIVE-16323
> URL: https://issues.apache.org/jira/browse/HIVE-16323
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-16323.1.patch, HIVE-16323.2.patch, PM_leak.png
>
>
> Hive.loadDynamicPartitions creates threads with a new embedded rawstore but 
> never closes them, so we leak one PersistenceManager per such thread.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16452) Database UUID for metastore DB

2017-06-05 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036608#comment-16036608
 ] 

Lefty Leverenz commented on HIVE-16452:
---

[~vihangk1], do you need anything more from me for this?  Putting the 
information in the wiki is more important than finding the best location -- we 
can always move it later.

> Database UUID for metastore DB
> --
>
> Key: HIVE-16452
> URL: https://issues.apache.org/jira/browse/HIVE-16452
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Fix For: 3.0.0
>
>
> In cloud environments it is possible that the same database instance is used 
> as the long-running metadata persistence layer and multiple HMS instances 
> access this database. These HMS instances could be running at the same time 
> or, in the case of transient workloads, come up on an on-demand basis. HMS is 
> used by multiple projects in the Hadoop eco-system as the de-facto metadata 
> keeper for various SQL engines on the cluster. Currently, there is no way to 
> uniquely identify the database instance backing the HMS. For example, if 
> there are two instances of HMS running on top of the same metastore DB, there 
> is no way to tell that data received from both metastore clients is coming 
> from the same database. Similarly, in the case of transient workloads where 
> multiple HMS services come and go, an external application fetching data from 
> an HMS has no way to identify that these multiple instances of HMS are in 
> fact returning the same data.
> We could potentially use the combination of the javax.jdo.option.ConnectionURL 
> and javax.jdo.option.ConnectionDriverName configuration of each HMS instance, 
> but this approach may not be very robust. If the database is migrated to 
> another server for some reason, the ConnectionURL can change. Having a UUID 
> in the metastore DB that can be queried using a Thrift API can help solve 
> this problem. This way any application talking to multiple HMS instances can 
> recognize whether the data is coming from the same backing database.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-6348) Order by/Sort by in subquery

2017-06-05 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036612#comment-16036612
 ] 

Rui Li commented on HIVE-6348:
--

The latest failures are due to the sub-query order/sort by in our tests. I'd 
like to get some feedback before updating them.
cc [~hagleitn], [~ashutoshc], [~xuefuz]. Do you think the proposal makes sense?

> Order by/Sort by in subquery
> 
>
> Key: HIVE-6348
> URL: https://issues.apache.org/jira/browse/HIVE-6348
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Rui Li
>Priority: Minor
>  Labels: sub-query
> Attachments: HIVE-6348.1.patch, HIVE-6348.2.patch
>
>
> select * from (select * from foo order by c asc) bar order by c desc;
> in hive sorts the data set twice. The optimizer should probably remove any 
> order by/sort by in the sub query unless you use 'limit '. Could even go so 
> far as barring it at the semantic level.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16503) LLAP: Oversubscribe memory for noconditional task size

2017-06-05 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-16503:
--
Labels: TODOC3.0  (was: )

> LLAP: Oversubscribe memory for noconditional task size
> --
>
> Key: HIVE-16503
> URL: https://issues.apache.org/jira/browse/HIVE-16503
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-16503.1.patch, HIVE-16503.2.patch, 
> HIVE-16503.3.patch, HIVE-16503.4.patch
>
>
> When running map joins in LLAP, it can potentially use more memory for hash 
> table loading (assuming other executors in the daemons have some memory to 
> spare). This map join conversion decision has to be made during compilation, 
> which can provide some more room for LLAP. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16573) In-place update for HoS can't be disabled

2017-06-05 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16573:
---
Attachment: HIVE-16573.1.patch

Generate the patch file based on master branch

> In-place update for HoS can't be disabled
> -
>
> Key: HIVE-16573
> URL: https://issues.apache.org/jira/browse/HIVE-16573
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>Priority: Minor
> Attachments: HIVE-16573.1.patch
>
>
> {{hive.spark.exec.inplace.progress}} has no effect



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16503) LLAP: Oversubscribe memory for noconditional task size

2017-06-05 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036625#comment-16036625
 ] 

Lefty Leverenz commented on HIVE-16503:
---

Doc note:  This adds two configs 
(*hive.llap.mapjoin.memory.oversubscribe.factor* and 
*hive.llap.memory.oversubscription.max.executors.per.query*) to HiveConf.java, 
so they need to be documented in the wiki.

* [Configuration Properties -- LLAP | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-LLAP]

Added a TODOC3.0 label.

> LLAP: Oversubscribe memory for noconditional task size
> --
>
> Key: HIVE-16503
> URL: https://issues.apache.org/jira/browse/HIVE-16503
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-16503.1.patch, HIVE-16503.2.patch, 
> HIVE-16503.3.patch, HIVE-16503.4.patch
>
>
> When running map joins in LLAP, it can potentially use more memory for hash 
> table loading (assuming other executors in the daemons have some memory to 
> spare). This map join conversion decision has to be made during compilation, 
> which can provide some more room for LLAP. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16573) In-place update for HoS can't be disabled

2017-06-05 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16573:
---
Attachment: (was: HIVE-16573-branch2.3.patch)

> In-place update for HoS can't be disabled
> -
>
> Key: HIVE-16573
> URL: https://issues.apache.org/jira/browse/HIVE-16573
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>Priority: Minor
>
> {{hive.spark.exec.inplace.progress}} has no effect



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16768) NOT operator returns NULL from result of <=>

2017-06-05 Thread Fei Hui (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036634#comment-16036634
 ] 

Fei Hui commented on HIVE-16768:


[~pxiong] HIVE-15517 is not fixed in 2.1.1; should we pick it up on branch-2.1?

> NOT operator returns NULL from result of <=>
> 
>
> Key: HIVE-16768
> URL: https://issues.apache.org/jira/browse/HIVE-16768
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Alexander Sterligov
>Assignee: Fei Hui
>
> {{SELECT "foo" <=> null;}}
> returns {{false}} as expected.
> {{SELECT NOT("foo" <=> null);}}
> returns NULL, but should return {{true}}.
> Workaround is
> {{SELECT NOT(COALESCE("foo" <=> null));}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16768) NOT operator returns NULL from result of <=>

2017-06-05 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036636#comment-16036636
 ] 

Pengcheng Xiong commented on HIVE-16768:


[~ferhui], sorry, my bad: it is marked as fixed in 2.2, but 2.2 has not 
been published yet...

> NOT operator returns NULL from result of <=>
> 
>
> Key: HIVE-16768
> URL: https://issues.apache.org/jira/browse/HIVE-16768
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Alexander Sterligov
>Assignee: Fei Hui
>
> {{SELECT "foo" <=> null;}}
> returns {{false}} as expected.
> {{SELECT NOT("foo" <=> null);}}
> returns NULL, but should return {{true}}.
> Workaround is
> {{SELECT NOT(COALESCE("foo" <=> null));}}





[jira] [Commented] (HIVE-16343) LLAP: Publish YARN's ProcFs based memory usage to metrics for monitoring

2017-06-05 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036644#comment-16036644
 ] 

Lefty Leverenz commented on HIVE-16343:
---

[~prasanth_j], so far the particular LLAP metrics haven't been documented.  But 
should they be?  And if so, where -- the LLAP design doc or the Metrics doc?

* [LLAP -- Monitoring | 
https://cwiki.apache.org/confluence/display/Hive/LLAP#LLAP-Monitoring]
* [Hive Metrics | https://cwiki.apache.org/confluence/display/Hive/Hive+Metrics]

> LLAP: Publish YARN's ProcFs based memory usage to metrics for monitoring
> 
>
> Key: HIVE-16343
> URL: https://issues.apache.org/jira/browse/HIVE-16343
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 3.0.0
>
> Attachments: HIVE-16343.1.patch, HIVE-16343.2.patch
>
>
> Publish MemInfo from ProcfsBasedProcessTree to llap metrics. This will be useful 
> for monitoring and also for setting up triggers via JMC. 
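As a rough illustration of the kind of ProcFs-based accounting involved (a sketch of the data source, not Hadoop's actual ProcfsBasedProcessTree implementation), resident memory can be read from /proc/<pid>/status-style text:

```python
# Sketch: extract resident-set size from /proc/<pid>/status-style text.
# Field names follow the Linux procfs format; this is illustrative only.

def parse_vm_rss_kb(status_text):
    """Return the VmRSS value in kB, or None if the field is absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line format: "VmRSS:      123456 kB"
            return int(line.split()[1])
    return None

sample = "Name:\tllap\nVmPeak:\t  200000 kB\nVmRSS:\t  123456 kB\n"
assert parse_vm_rss_kb(sample) == 123456
```

A metrics publisher would sample a value like this periodically and expose it through the daemon's metrics system.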





[jira] [Assigned] (HIVE-16824) PrimaryToReplicaResourceFunctionTest.java lacks the ASF headers

2017-06-05 Thread ZhangBing Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangBing Lin reassigned HIVE-16824:



> PrimaryToReplicaResourceFunctionTest.java lacks the ASF headers
> --
>
> Key: HIVE-16824
> URL: https://issues.apache.org/jira/browse/HIVE-16824
> Project: Hive
>  Issue Type: Bug
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
>Priority: Minor
>






[jira] [Updated] (HIVE-16824) PrimaryToReplicaResourceFunctionTest.java lacks the ASF headers

2017-06-05 Thread ZhangBing Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangBing Lin updated HIVE-16824:
-
Attachment: HIVE-16824.1.patch

> PrimaryToReplicaResourceFunctionTest.java lacks the ASF headers
> --
>
> Key: HIVE-16824
> URL: https://issues.apache.org/jira/browse/HIVE-16824
> Project: Hive
>  Issue Type: Bug
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
>Priority: Minor
> Attachments: HIVE-16824.1.patch
>
>






[jira] [Updated] (HIVE-16824) PrimaryToReplicaResourceFunctionTest.java lacks the ASF headers

2017-06-05 Thread ZhangBing Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangBing Lin updated HIVE-16824:
-
Description: PrimaryToReplicaResourceFunctionTest.java lacks the ASF headers

> PrimaryToReplicaResourceFunctionTest.java lacks the ASF headers
> --
>
> Key: HIVE-16824
> URL: https://issues.apache.org/jira/browse/HIVE-16824
> Project: Hive
>  Issue Type: Bug
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
>Priority: Minor
> Attachments: HIVE-16824.1.patch
>
>
> PrimaryToReplicaResourceFunctionTest.java lacks the ASF headers





[jira] [Updated] (HIVE-16824) PrimaryToReplicaResourceFunctionTest.java lacks the ASF headers

2017-06-05 Thread ZhangBing Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangBing Lin updated HIVE-16824:
-
Status: Patch Available  (was: Open)

> PrimaryToReplicaResourceFunctionTest.java lacks the ASF headers
> --
>
> Key: HIVE-16824
> URL: https://issues.apache.org/jira/browse/HIVE-16824
> Project: Hive
>  Issue Type: Bug
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
>Priority: Minor
> Attachments: HIVE-16824.1.patch
>
>
> PrimaryToReplicaResourceFunctionTest.java lacks the ASF headers





[jira] [Commented] (HIVE-16824) PrimaryToReplicaResourceFunctionTest.java lacks the ASF headers

2017-06-05 Thread ZhangBing Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036668#comment-16036668
 ] 

ZhangBing Lin commented on HIVE-16824:
--

Submitted a patch!

> PrimaryToReplicaResourceFunctionTest.java lacks the ASF headers
> --
>
> Key: HIVE-16824
> URL: https://issues.apache.org/jira/browse/HIVE-16824
> Project: Hive
>  Issue Type: Bug
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
>Priority: Minor
> Attachments: HIVE-16824.1.patch
>
>
> PrimaryToReplicaResourceFunctionTest.java lacks the ASF headers





[jira] [Updated] (HIVE-16780) Case "multiple sources, single key" in spark_dynamic_pruning.q fails

2017-06-05 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated HIVE-16780:

Attachment: HIVE-16780.patch

> Case "multiple sources, single key" in spark_dynamic_pruning.q fails 
> -
>
> Key: HIVE-16780
> URL: https://issues.apache.org/jira/browse/HIVE-16780
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-16780.patch
>
>
> script.q
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.strict.checks.cartesian.product=false;
> set hive.spark.dynamic.partition.pruning=true;
> -- multiple sources, single key
> select count(*) from srcpart join srcpart_date on (srcpart.ds = 
> srcpart_date.ds) join srcpart_hour on (srcpart.hr = srcpart_hour.hr)
> {code}
> if disabling "hive.optimize.index.filter", case passes otherwise it always 
> hang out in the first job. Exception
> {code}
> 17/05/27 23:39:45 DEBUG Executor task launch worker-0 PerfLogger:  method=SparkInitializeOperators start=1495899585574 end=1495899585933 
> duration=359 from=org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler>
> 17/05/27 23:39:45 INFO Executor task launch worker-0 Utilities: PLAN PATH = 
> hdfs://bdpe41:8020/tmp/hive/root/029a2d8a-c6e5-4ea9-adea-ef8fbea3cde2/hive_2017-05-27_23-39-06_464_5915518562441677640-1/-mr-10007/617d9dd6-9f9a-4786-8131-a7b98e8abc3e/map.xml
> 17/05/27 23:39:45 DEBUG Executor task launch worker-0 Utilities: Found plan 
> in cache for name: map.xml
> 17/05/27 23:39:45 DEBUG Executor task launch worker-0 DFSClient: Connecting 
> to datanode 10.239.47.162:50010
> 17/05/27 23:39:45 DEBUG Executor task launch worker-0 MapOperator: Processing 
> alias(es) srcpart_hour for file 
> hdfs://bdpe41:8020/user/hive/warehouse/srcpart_hour/08_0
> 17/05/27 23:39:45 DEBUG Executor task launch worker-0 ObjectCache: Creating 
> root_20170527233906_ac2934e1-2e58-4116-9f0d-35dee302d689_DynamicValueRegistry
> 17/05/27 23:39:45 ERROR Executor task launch worker-0 SparkMapRecordHandler: 
> Error processing row: org.apache.hadoop.hive.ql.metadata.HiveException: Hive 
> Runtime Error while processing row {"hr":"11","hour":"11"}
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row {"hr":"11","hour":"11"}
>  at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:562)
>  at 
> org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:136)
>  at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:48)
>  at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
>  at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85)
>  at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>  at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>  at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>  at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974)
>  at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>  at org.apache.spark.scheduler.Task.run(Task.scala:85)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalStateException: Failed to retrieve dynamic value 
> for RS_7_srcpart__col3_min
>  at 
> org.apache.hadoop.hive.ql.plan.DynamicValue.getValue(DynamicValue.java:126)
>  at 
> org.apache.hadoop.hive.ql.plan.DynamicValue.getWritableValue(DynamicValue.java:101)
>  at 
> org.apache.hadoop.hive.ql.exec.ExprNodeDynamicValueEvaluator._evaluate(ExprNodeDynamicValueEvaluator.java:51)
>  at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:80)
>  at 
> 

[jira] [Commented] (HIVE-16780) Case "multiple sources, single key" in spark_dynamic_pruning.q fails

2017-06-05 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036670#comment-16036670
 ] 

liyunzhang_intel commented on HIVE-16780:
-

[~csun]: the case "multiple sources, single key" passes if 
hive.tez.dynamic.semijoin.reduction is false.
bq.Maybe we should first disable this optimization for Spark in 
DynamicPartitionPruningOptimization
Agreed; updated in HIVE-16780.1.patch.

The explain output when enabling hive.tez.dynamic.semijoin.reduction:
{noformat}
STAGE DEPENDENCIES:
  Stage-2 is a root stage
  Stage-3 depends on stages: Stage-2
  Stage-1 depends on stages: Stage-3
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-2
Spark
  DagName: root_20170605152828_4c4f4f82-d08f-41e9-9a07-4147b8529dd0:2
  Vertices:
Map 4 
Map Operator Tree:
TableScan
  alias: srcpart_date
  filterExpr: ds is not null (type: boolean)
  Statistics: Num rows: 2 Data size: 42 Basic stats: COMPLETE 
Column stats: NONE
  Filter Operator
predicate: ds is not null (type: boolean)
Statistics: Num rows: 2 Data size: 42 Basic stats: COMPLETE 
Column stats: NONE
Spark HashTable Sink Operator
  keys:
0 ds (type: string)
1 ds (type: string)
Select Operator
  expressions: ds (type: string)
  outputColumnNames: _col0
  Statistics: Num rows: 2 Data size: 42 Basic stats: 
COMPLETE Column stats: NONE
  Group By Operator
keys: _col0 (type: string)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 2 Data size: 42 Basic stats: 
COMPLETE Column stats: NONE
Spark Partition Pruning Sink Operator
  partition key expr: ds
  Statistics: Num rows: 2 Data size: 42 Basic stats: 
COMPLETE Column stats: NONE
  target column name: ds
  target work: Map 1
Local Work:
  Map Reduce Local Work
Map 5 
Map Operator Tree:
TableScan
  alias: srcpart_hour
  filterExpr: (hr is not null and (hr BETWEEN 
DynamicValue(RS_7_srcpart__col3_min) AND DynamicValue(RS_7_srcpart__col3_max) 
and in_bloom_filter(hr, DynamicValue(RS_7_srcpart__col3_bloom_filter)))) (type: 
boolean)
  Statistics: Num rows: 2 Data size: 10 Basic stats: COMPLETE 
Column stats: NONE
  Filter Operator
predicate: (hr is not null and (hr BETWEEN 
DynamicValue(RS_7_srcpart__col3_min) AND DynamicValue(RS_7_srcpart__col3_max) 
and in_bloom_filter(hr, DynamicValue(RS_7_srcpart__col3_bloom_filter)))) (type: 
boolean)
Statistics: Num rows: 2 Data size: 10 Basic stats: COMPLETE 
Column stats: NONE
Spark HashTable Sink Operator
  keys:
0 _col3 (type: string)
1 hr (type: string)
Select Operator
  expressions: hr (type: string)
  outputColumnNames: _col0
  Statistics: Num rows: 2 Data size: 10 Basic stats: 
COMPLETE Column stats: NONE
  Group By Operator
keys: _col0 (type: string)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 2 Data size: 10 Basic stats: 
COMPLETE Column stats: NONE
Spark Partition Pruning Sink Operator
  partition key expr: hr
  Statistics: Num rows: 2 Data size: 10 Basic stats: 
COMPLETE Column stats: NONE
  target column name: hr
  target work: Map 1
Local Work:
  Map Reduce Local Work

  Stage: Stage-3
Spark
  DagName: root_20170605152828_4c4f4f82-d08f-41e9-9a07-4147b8529dd0:3
  Vertices:
Map 4 
Map Operator Tree:
TableScan
  alias: srcpart_date
  filterExpr: ds is not null (type: boolean)
  Statistics: Num rows: 2 Data size: 42 Basic stats: COMPLETE 
Column stats: NONE
  Filter Operator
predicate: ds is not null (type: boolean)
Statistics: Num rows: 2 Data size: 42 Basic stats: COMPLETE 
Column stats: NONE
Spark HashTable Sink Operator
  keys:
0 ds (type: string)
1 ds (type: string)
Select 

[jira] [Updated] (HIVE-16780) Case "multiple sources, single key" in spark_dynamic_pruning.q fails

2017-06-05 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated HIVE-16780:

Status: Patch Available  (was: Open)

> Case "multiple sources, single key" in spark_dynamic_pruning.q fails 
> -
>
> Key: HIVE-16780
> URL: https://issues.apache.org/jira/browse/HIVE-16780
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-16780.patch
>
>
> script.q
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.strict.checks.cartesian.product=false;
> set hive.spark.dynamic.partition.pruning=true;
> -- multiple sources, single key
> select count(*) from srcpart join srcpart_date on (srcpart.ds = 
> srcpart_date.ds) join srcpart_hour on (srcpart.hr = srcpart_hour.hr)
> {code}
> if disabling "hive.optimize.index.filter", case passes otherwise it always 
> hang out in the first job. Exception
> {code}
> 17/05/27 23:39:45 DEBUG Executor task launch worker-0 PerfLogger:  method=SparkInitializeOperators start=1495899585574 end=1495899585933 
> duration=359 from=org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler>
> 17/05/27 23:39:45 INFO Executor task launch worker-0 Utilities: PLAN PATH = 
> hdfs://bdpe41:8020/tmp/hive/root/029a2d8a-c6e5-4ea9-adea-ef8fbea3cde2/hive_2017-05-27_23-39-06_464_5915518562441677640-1/-mr-10007/617d9dd6-9f9a-4786-8131-a7b98e8abc3e/map.xml
> 17/05/27 23:39:45 DEBUG Executor task launch worker-0 Utilities: Found plan 
> in cache for name: map.xml
> 17/05/27 23:39:45 DEBUG Executor task launch worker-0 DFSClient: Connecting 
> to datanode 10.239.47.162:50010
> 17/05/27 23:39:45 DEBUG Executor task launch worker-0 MapOperator: Processing 
> alias(es) srcpart_hour for file 
> hdfs://bdpe41:8020/user/hive/warehouse/srcpart_hour/08_0
> 17/05/27 23:39:45 DEBUG Executor task launch worker-0 ObjectCache: Creating 
> root_20170527233906_ac2934e1-2e58-4116-9f0d-35dee302d689_DynamicValueRegistry
> 17/05/27 23:39:45 ERROR Executor task launch worker-0 SparkMapRecordHandler: 
> Error processing row: org.apache.hadoop.hive.ql.metadata.HiveException: Hive 
> Runtime Error while processing row {"hr":"11","hour":"11"}
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row {"hr":"11","hour":"11"}
>  at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:562)
>  at 
> org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:136)
>  at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:48)
>  at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
>  at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85)
>  at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>  at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>  at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>  at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974)
>  at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>  at org.apache.spark.scheduler.Task.run(Task.scala:85)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalStateException: Failed to retrieve dynamic value 
> for RS_7_srcpart__col3_min
>  at 
> org.apache.hadoop.hive.ql.plan.DynamicValue.getValue(DynamicValue.java:126)
>  at 
> org.apache.hadoop.hive.ql.plan.DynamicValue.getWritableValue(DynamicValue.java:101)
>  at 
> org.apache.hadoop.hive.ql.exec.ExprNodeDynamicValueEvaluator._evaluate(ExprNodeDynamicValueEvaluator.java:51)
>  at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:80)
>  at 
> 

[jira] [Updated] (HIVE-16323) HS2 JDOPersistenceManagerFactory.pmCache leaks after HIVE-14204

2017-06-05 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-16323:
--
Attachment: HIVE-16323.3.patch

Switched metaStoreClient to getMSC; also closed syncMetaStoreClient as Thejas 
commented.

I think this Hive client is only shared with load-dynamic-partitions threads, 
and within the threads, write operations are synchronized via 
SynchronizedMetaStoreClient. cc [~rajesh.balamohan].

> HS2 JDOPersistenceManagerFactory.pmCache leaks after HIVE-14204
> ---
>
> Key: HIVE-16323
> URL: https://issues.apache.org/jira/browse/HIVE-16323
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-16323.1.patch, HIVE-16323.2.patch, 
> HIVE-16323.3.patch, PM_leak.png
>
>
> Hive.loadDynamicPartitions creates threads with a new embedded rawstore, but 
> never closes them, so we leak one PersistenceManager per such thread.
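The leak pattern described above (per-thread resources opened but never released) can be sketched generically; the class and function names below are illustrative stand-ins, not Hive's actual API:

```python
import threading

class FakeStore:
    """Stand-in for a per-thread embedded raw store (illustrative only)."""
    open_count = 0

    def __init__(self):
        FakeStore.open_count += 1
        self.closed = False

    def close(self):
        if not self.closed:
            self.closed = True
            FakeStore.open_count -= 1

def worker(results):
    store = FakeStore()           # one store opened per worker thread
    try:
        results.append("loaded")  # ... do partition-load work ...
    finally:
        store.close()             # without this, each thread leaks one store

results = []
threads = [threading.Thread(target=worker, args=(results,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert FakeStore.open_count == 0  # every per-thread store was released
assert len(results) == 4
```

The fix in the patch follows the same shape: ensure the per-thread store is closed when the thread's work completes, rather than letting it outlive the thread.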





[jira] [Updated] (HIVE-16571) HiveServer2: Prefer LIFO over round-robin for Tez session reuse

2017-06-05 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-16571:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master

> HiveServer2: Prefer LIFO over round-robin for Tez session reuse
> ---
>
> Key: HIVE-16571
> URL: https://issues.apache.org/jira/browse/HIVE-16571
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Tez
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
> Fix For: 3.0.0
>
> Attachments: HIVE-16571.2.patch, HIVE-16571.patch
>
>
> Currently Tez session reuse is entirely round-robin, which means a single 
> user might have to run up to 32 queries before reusing a warm session on a 
> HiveServer2.
> This is not the case when session reuse is disabled, with a user warming up 
> their session on the 1st query.
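The difference between the two policies can be sketched with a toy pool (illustrative names, not the HiveServer2 implementation): LIFO hands back the most recently returned, and therefore warmest, session, while round-robin cycles through every idle session first.

```python
from collections import deque

class SessionPool:
    """Toy pool contrasting LIFO vs round-robin reuse (illustrative only)."""
    def __init__(self, sessions, lifo=True):
        self.idle = deque(sessions)
        self.lifo = lifo

    def take(self):
        # LIFO pops the most recently returned session (still warm);
        # round-robin takes from the front, walking the whole pool.
        return self.idle.pop() if self.lifo else self.idle.popleft()

    def give_back(self, session):
        self.idle.append(session)

pool = SessionPool(["s1", "s2", "s3"], lifo=True)
s = pool.take()
pool.give_back(s)
assert pool.take() == "s3"   # same warm session reused immediately

rr = SessionPool(["s1", "s2", "s3"], lifo=False)
s = rr.take()
rr.give_back(s)
assert rr.take() == "s2"     # round-robin visits the other sessions first
```

With LIFO, a lone user keeps hitting the same warmed-up session instead of paying the warm-up cost across the whole pool.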





[jira] [Commented] (HIVE-6348) Order by/Sort by in subquery

2017-06-05 Thread Carter Shanklin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037458#comment-16037458
 ] 

Carter Shanklin commented on HIVE-6348:
---

I don't think banning is a good idea; there's just no way to know what will 
break in users' environments.

> Order by/Sort by in subquery
> 
>
> Key: HIVE-6348
> URL: https://issues.apache.org/jira/browse/HIVE-6348
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Rui Li
>Priority: Minor
>  Labels: sub-query
> Attachments: HIVE-6348.1.patch, HIVE-6348.2.patch
>
>
> select * from (select * from foo order by c asc) bar order by c desc;
> in Hive sorts the data set twice. The optimizer should probably remove any 
> order by/sort by in the subquery unless you use 'limit '. It could even go so 
> far as to bar it at the semantic level.





[jira] [Commented] (HIVE-16780) Case "multiple sources, single key" in spark_dynamic_pruning.q fails

2017-06-05 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037480#comment-16037480
 ] 

Chao Sun commented on HIVE-16780:
-

{quote}
One interesting thing is when enabling hive.tez.dynamic.semijoin.reduction, 
there is an extra reduce Reducer 2 <- Map 6 (GROUP, 1). But what's purpose of 
Reducer 2?
{quote}
I think that's for the aggregation of min/max and bloom filter. See 
[here|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java#L489].
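The aggregation that reducer performs can be sketched as merging per-task (min, max, bloom filter) triples into one global triple used to prune the other side of the join. This is a toy model, with the bloom filter modeled as a plain set for brevity, not Hive's actual operator:

```python
# Toy sketch of semijoin-reduction aggregation: each map task emits a
# (min, max, bloom_filter) triple over its join keys, and a single reducer
# merges them into one global triple. Illustrative only.

def merge(partials):
    gmin = min(p[0] for p in partials)
    gmax = max(p[1] for p in partials)
    gbloom = set().union(*(p[2] for p in partials))  # stand-in for bloom OR
    return gmin, gmax, gbloom

task1 = ("2008-04-08", "2008-04-08", {"2008-04-08"})
task2 = ("2008-04-09", "2008-04-09", {"2008-04-09"})
lo, hi, bloom = merge([task1, task2])
assert (lo, hi) == ("2008-04-08", "2008-04-09")
assert "2008-04-09" in bloom
```

The merged min/max feeds the BETWEEN predicate and the merged filter feeds in_bloom_filter, which is why a dedicated (GROUP, 1) reducer appears in the plan.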

> Case "multiple sources, single key" in spark_dynamic_pruning.q fails 
> -
>
> Key: HIVE-16780
> URL: https://issues.apache.org/jira/browse/HIVE-16780
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-16780.patch
>
>
> script.q
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.strict.checks.cartesian.product=false;
> set hive.spark.dynamic.partition.pruning=true;
> -- multiple sources, single key
> select count(*) from srcpart join srcpart_date on (srcpart.ds = 
> srcpart_date.ds) join srcpart_hour on (srcpart.hr = srcpart_hour.hr)
> {code}
> If "hive.optimize.index.filter" is disabled, the case passes; otherwise it 
> always hangs in the first job. Exception:
> {code}
> 17/05/27 23:39:45 DEBUG Executor task launch worker-0 PerfLogger:  method=SparkInitializeOperators start=1495899585574 end=1495899585933 
> duration=359 from=org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler>
> 17/05/27 23:39:45 INFO Executor task launch worker-0 Utilities: PLAN PATH = 
> hdfs://bdpe41:8020/tmp/hive/root/029a2d8a-c6e5-4ea9-adea-ef8fbea3cde2/hive_2017-05-27_23-39-06_464_5915518562441677640-1/-mr-10007/617d9dd6-9f9a-4786-8131-a7b98e8abc3e/map.xml
> 17/05/27 23:39:45 DEBUG Executor task launch worker-0 Utilities: Found plan 
> in cache for name: map.xml
> 17/05/27 23:39:45 DEBUG Executor task launch worker-0 DFSClient: Connecting 
> to datanode 10.239.47.162:50010
> 17/05/27 23:39:45 DEBUG Executor task launch worker-0 MapOperator: Processing 
> alias(es) srcpart_hour for file 
> hdfs://bdpe41:8020/user/hive/warehouse/srcpart_hour/08_0
> 17/05/27 23:39:45 DEBUG Executor task launch worker-0 ObjectCache: Creating 
> root_20170527233906_ac2934e1-2e58-4116-9f0d-35dee302d689_DynamicValueRegistry
> 17/05/27 23:39:45 ERROR Executor task launch worker-0 SparkMapRecordHandler: 
> Error processing row: org.apache.hadoop.hive.ql.metadata.HiveException: Hive 
> Runtime Error while processing row {"hr":"11","hour":"11"}
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row {"hr":"11","hour":"11"}
>  at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:562)
>  at 
> org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:136)
>  at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:48)
>  at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
>  at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85)
>  at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>  at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>  at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>  at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974)
>  at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>  at org.apache.spark.scheduler.Task.run(Task.scala:85)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalStateException: Failed to retrieve dynamic value 
> for RS_7_srcpart__col3_min
>  at 
> org.apache.hadoop.hive.ql.plan.DynamicValue.getValue(DynamicValue.java:126)
>  at 
> 

[jira] [Assigned] (HIVE-16827) Merge stats task and column stats task into a single task

2017-06-05 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reassigned HIVE-16827:
--


> Merge stats task and column stats task into a single task
> -
>
> Key: HIVE-16827
> URL: https://issues.apache.org/jira/browse/HIVE-16827
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>
> Within the task, we can specify whether to compute basic stats only, column 
> stats only, or both.





[jira] [Updated] (HIVE-16826) Improvements for SeparatedValuesOutputFormat

2017-06-05 Thread BELUGA BEHR (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-16826:
---
Description: 
Proposing changes to class 
{{org.apache.hive.beeline.SeparatedValuesOutputFormat}}.

# Simplify the code
# Code currently creates and destroys {{CsvListWriter}}, which contains a 
buffer, for every line printed
# Use Apache Commons libraries for certain actions
# Prefer non-synchronized {{StringBuilderWriter}} to Java's synchronized 
{{StringWriter}}

  was:
Proposing changes to class 
{{org.apache.hive.beeline.SeparatedValuesOutputFormat}}.

# Simplify the code
# Code currently creates and destroys {{CsvListWriter}}, which contains a 
buffer, for every line printed
# Use Apache Commons libraries for certain actions


> Improvements for SeparatedValuesOutputFormat
> 
>
> Key: HIVE-16826
> URL: https://issues.apache.org/jira/browse/HIVE-16826
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Affects Versions: 2.1.1, 3.0.0
>Reporter: BELUGA BEHR
>Priority: Minor
>
> Proposing changes to class 
> {{org.apache.hive.beeline.SeparatedValuesOutputFormat}}.
> # Simplify the code
> # Code currently creates and destroys {{CsvListWriter}}, which contains a 
> buffer, for every line printed
> # Use Apache Commons libraries for certain actions
> # Prefer non-synchronized {{StringBuilderWriter}} to Java's synchronized 
> {{StringWriter}}
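The buffer-reuse point in item 2 can be sketched as follows (a sketch using `io.StringIO` as a stand-in for the CsvListWriter/StringWriter pair, not the Beeline code itself): allocate the buffered writer once and reset it per row, rather than constructing and discarding a new one for every line printed.

```python
import io

def format_rows_reusing_buffer(rows, sep=","):
    """Format rows with one reusable buffer instead of one buffer per row."""
    buf = io.StringIO()      # allocated once, reused for every row
    out = []
    for row in rows:
        buf.seek(0)
        buf.truncate(0)      # reset the buffer instead of recreating it
        buf.write(sep.join(str(v) for v in row))
        out.append(buf.getvalue())
    return out

assert format_rows_reusing_buffer([(1, "a"), (2, "b")]) == ["1,a", "2,b"]
```

The same reasoning motivates preferring a non-synchronized buffer type when no cross-thread sharing occurs: the per-call locking of a synchronized writer is pure overhead there.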





[jira] [Updated] (HIVE-14514) OrcRecordUpdater should clone writerOptions when creating delete event writers

2017-06-05 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14514:
--
Priority: Critical  (was: Minor)

> OrcRecordUpdater should clone writerOptions when creating delete event writers
> --
>
> Key: HIVE-14514
> URL: https://issues.apache.org/jira/browse/HIVE-14514
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Eugene Koifman
>Priority: Critical
>
> When split-update is enabled for ACID, OrcRecordUpdater creates two sets of 
> writers: one for the insert deltas and one for the delete deltas. The 
> deleteEventWriter is initialized with similar writerOptions as the normal 
> writer, except that it has a different callback handler. Due to the lack of a 
> copy constructor/clone() method in writerOptions, the same writerOptions 
> object is mutated to specify a different callback for the delete case. 
> Although this is harmless for now, it may become a source of confusion 
> and possible error in the future. The ideal way to fix this would be to add a 
> clone() method to writerOptions; however, this requires that 
> OrcFile.WriterOptions implement Cloneable or 
> provide a copy constructor.
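The mutation hazard described above can be sketched generically (illustrative class and field names, not the ORC WriterOptions API): sharing one options object between two writers means customizing the delete writer silently changes the insert writer's configuration too, while cloning first keeps them independent.

```python
import copy

class WriterOptions:
    """Stand-in for a mutable writer-options object (illustrative only)."""
    def __init__(self, callback=None):
        self.callback = callback

# Hazardous pattern: both writers share one mutated object.
base = WriterOptions(callback="insert-cb")
delete_opts_shared = base
delete_opts_shared.callback = "delete-cb"
assert base.callback == "delete-cb"   # insert writer's options were corrupted

# Safer pattern: clone before customizing.
base = WriterOptions(callback="insert-cb")
delete_opts = copy.copy(base)
delete_opts.callback = "delete-cb"
assert base.callback == "insert-cb"   # original options remain untouched
```

A clone()/copy constructor on the options type makes the second pattern the natural one, which is exactly what the issue proposes.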





[jira] [Commented] (HIVE-16323) HS2 JDOPersistenceManagerFactory.pmCache leaks after HIVE-14204

2017-06-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037638#comment-16037638
 ] 

Hive QA commented on HIVE-16323:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12871284/HIVE-16323.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10820 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=237)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query78] 
(batchId=232)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5536/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5536/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5536/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12871284 - PreCommit-HIVE-Build

> HS2 JDOPersistenceManagerFactory.pmCache leaks after HIVE-14204
> ---
>
> Key: HIVE-16323
> URL: https://issues.apache.org/jira/browse/HIVE-16323
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-16323.1.patch, HIVE-16323.2.patch, 
> HIVE-16323.3.patch, PM_leak.png
>
>
> Hive.loadDynamicPartitions creates threads with a new embedded rawstore but 
> never closes them, so we leak one PersistenceManager per such thread.
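The leak pattern described above can be sketched with a standalone example. The class and method names here are illustrative, not Hive's actual rawstore API; the point is that each per-thread resource must be closed (e.g. via try-with-resources), otherwise one handle leaks per worker thread.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LeakSketch {
    static final AtomicInteger open = new AtomicInteger();

    // Stand-in for a per-thread embedded store (hypothetical).
    static class RawStore implements AutoCloseable {
        RawStore() { open.incrementAndGet(); }
        @Override public void close() { open.decrementAndGet(); }
        void loadPartition() { /* metastore work would happen here */ }
    }

    public static void main(String[] args) {
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                // try-with-resources closes the store even if the work throws.
                try (RawStore store = new RawStore()) {
                    store.loadPartition();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try { t.join(); } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        System.out.println("open handles after join: " + open.get()); // 0
    }
}
```

Without the close in each worker, the counter would end at 4 — one leaked handle per thread, which is the shape of the reported PersistenceManager leak.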





[jira] [Updated] (HIVE-16826) Improvements for SeparatedValuesOutputFormat

2017-06-05 Thread BELUGA BEHR (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-16826:
---
Status: Patch Available  (was: Open)

> Improvements for SeparatedValuesOutputFormat
> 
>
> Key: HIVE-16826
> URL: https://issues.apache.org/jira/browse/HIVE-16826
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Affects Versions: 2.1.1, 3.0.0
>Reporter: BELUGA BEHR
>Priority: Minor
> Attachments: HIVE-16826.1.patch
>
>
> Proposing changes to class 
> {{org.apache.hive.beeline.SeparatedValuesOutputFormat}}.
> # Simplify the code
> # Code currently creates and destroys {{CsvListWriter}}, which contains a 
> buffer, for every line printed
> # Use Apache Commons libraries for certain actions
> # Prefer non-synchronized {{StringBuilderWriter}} to Java's synchronized 
> {{StringWriter}}
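Item 2 above can be illustrated with a minimal, hypothetical sketch (not Beeline's actual classes): reuse one unsynchronized buffer across all rows instead of creating and destroying a buffered writer per line.

```java
public class CsvSketch {
    // One buffer for the whole output, reset per row instead of
    // reallocated per row (cf. the per-line CsvListWriter churn).
    private final StringBuilder buffer = new StringBuilder();

    String writeRow(String[] fields, char sep) {
        buffer.setLength(0); // reset, keeping the allocated capacity
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) buffer.append(sep);
            buffer.append(fields[i]);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        CsvSketch out = new CsvSketch();
        System.out.println(out.writeRow(new String[]{"a", "b", "c"}, ',')); // a,b,c
        System.out.println(out.writeRow(new String[]{"1", "2"}, ','));      // 1,2
    }
}
```

StringBuilder is unsynchronized, which is the same motivation as preferring StringBuilderWriter over the synchronized StringWriter in point 4.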





[jira] [Updated] (HIVE-16826) Improvements for SeparatedValuesOutputFormat

2017-06-05 Thread BELUGA BEHR (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-16826:
---
Attachment: HIVE-16826.1.patch

> Improvements for SeparatedValuesOutputFormat
> 
>
> Key: HIVE-16826
> URL: https://issues.apache.org/jira/browse/HIVE-16826
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Affects Versions: 2.1.1, 3.0.0
>Reporter: BELUGA BEHR
>Priority: Minor
> Attachments: HIVE-16826.1.patch
>
>
> Proposing changes to class 
> {{org.apache.hive.beeline.SeparatedValuesOutputFormat}}.
> # Simplify the code
> # Code currently creates and destroys {{CsvListWriter}}, which contains a 
> buffer, for every line printed
> # Use Apache Commons libraries for certain actions
> # Prefer non-synchronized {{StringBuilderWriter}} to Java's synchronized 
> {{StringWriter}}





[jira] [Commented] (HIVE-16797) Enhance HiveFilterSetOpTransposeRule to remove union branches

2017-06-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037454#comment-16037454
 ] 

Hive QA commented on HIVE-16797:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12871266/HIVE-16797.02.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 25 failed/errored test(s), 10822 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=237)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=46)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[filter_aggr] (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union24] (batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union30] (batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union34] (batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[unionall_unbalancedppd] 
(batchId=2)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[explainuser_2] 
(batchId=142)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_union_multiinsert]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query33] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query56] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query5] (batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query60] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query71] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query76] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query77] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query78] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query80] 
(batchId=232)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union30] 
(batchId=134)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5534/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5534/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5534/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 25 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12871266 - PreCommit-HIVE-Build

> Enhance HiveFilterSetOpTransposeRule to remove union branches
> -
>
> Key: HIVE-16797
> URL: https://issues.apache.org/jira/browse/HIVE-16797
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16797.01.patch, HIVE-16797.02.patch
>
>
> In query4.q, we can see that it creates a CTE with a union all of 3 branches, 
> then does a 3-way self-join of the CTE with predicates. The predicates 
> actually allow only one branch of the CTE to participate in the join. Thus, 
> in some cases, e.g.,
> {code}
>/- filter(false) -TS0 
> union all  - filter(false) -TS1
>\-TS2
> {code}
> we can cut the branches TS0 and TS1, so the union becomes only TS2.
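The effect of the rule can be sketched with a toy model — this is not the actual Calcite/Hive transpose rule, only the shape of the transformation: union branches guarded by an always-false filter are dropped, and whatever remains is the new union.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UnionPrune {
    // Keep only the branches whose pushed-down filter is not constant false.
    static List<String> prune(List<String> branches, List<Boolean> filterIsFalse) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < branches.size(); i++) {
            if (!filterIsFalse.get(i)) {
                kept.add(branches.get(i));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // filter(false) over TS0 and TS1, no filter over TS2.
        List<String> kept = prune(Arrays.asList("TS0", "TS1", "TS2"),
                                  Arrays.asList(true, true, false));
        System.out.println(kept); // [TS2]
    }
}
```

If exactly one branch survives, the union operator itself can then be collapsed to that branch, which is what "the union becomes only TS2" describes.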





[jira] [Updated] (HIVE-16804) Semijoin hint : Needs support for target table.

2017-06-05 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-16804:
--
Attachment: HIVE-16804.2.patch

Added exceptions. If a hint fails to create an edge, it should throw.

> Semijoin hint : Needs support for target table.
> ---
>
> Key: HIVE-16804
> URL: https://issues.apache.org/jira/browse/HIVE-16804
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16804.1.patch, HIVE-16804.2.patch
>
>
> Currently the semijoin hint takes the source table as input. However, to 
> provide better control, the hint should also accept the target table name.





[jira] [Commented] (HIVE-6348) Order by/Sort by in subquery

2017-06-05 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037986#comment-16037986
 ] 

Xuefu Zhang commented on HIVE-6348:
---

[~lirui], I think it's better to remove it from the operator tree; the 
optimization can be put in as one of the optimization rules.

> Order by/Sort by in subquery
> 
>
> Key: HIVE-6348
> URL: https://issues.apache.org/jira/browse/HIVE-6348
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Rui Li
>Priority: Minor
>  Labels: sub-query
> Attachments: HIVE-6348.1.patch, HIVE-6348.2.patch
>
>
> select * from (select * from foo order by c asc) bar order by c desc;
> in Hive sorts the data set twice. The optimizer should probably remove any 
> order by/sort by in the subquery unless you use 'limit '. We could even go 
> so far as to bar it at the semantic level.
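The proposed optimization can be sketched on a toy plan representation (hypothetical, not Hive's actual plan classes): an inner ORDER BY that has no LIMIT does not change the final result, so the redundant sort step can be dropped, keeping only the outermost one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortPrune {
    // Drop sort steps that are neither outermost (last) nor paired with
    // a LIMIT; those are the ones whose order the outer sort overwrites.
    static List<String> dropRedundantSorts(List<String> plan) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < plan.size(); i++) {
            boolean isSort = plan.get(i).startsWith("SORT");
            boolean isLast = (i == plan.size() - 1);
            boolean hasLimit = plan.get(i).contains("LIMIT");
            if (!isSort || isLast || hasLimit) {
                out.add(plan.get(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Mirrors: select * from (select * from foo order by c asc) bar
        //          order by c desc;
        List<String> plan = Arrays.asList("SCAN foo", "SORT c ASC", "SORT c DESC");
        System.out.println(dropRedundantSorts(plan)); // [SCAN foo, SORT c DESC]
    }
}
```

A sort guarded by a limit must be kept, since the limit makes the inner ordering semantically significant — which is exactly the exception the comment above carves out.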





[jira] [Updated] (HIVE-16821) Vectorization: support Explain Analyze in vectorized mode

2017-06-05 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-16821:
---
Status: Patch Available  (was: Open)

> Vectorization: support Explain Analyze in vectorized mode
> -
>
> Key: HIVE-16821
> URL: https://issues.apache.org/jira/browse/HIVE-16821
> Project: Hive
>  Issue Type: Bug
>  Components: Diagnosability, Vectorization
>Affects Versions: 2.1.1, 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Attachments: HIVE-16821.1.patch, HIVE-16821.2.patch
>
>
> Currently, to avoid a branch in the operator inner loop, the runtime stats 
> are only available in non-vectorized mode.





[jira] [Updated] (HIVE-16821) Vectorization: support Explain Analyze in vectorized mode

2017-06-05 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-16821:
---
Attachment: HIVE-16821.2.patch

> Vectorization: support Explain Analyze in vectorized mode
> -
>
> Key: HIVE-16821
> URL: https://issues.apache.org/jira/browse/HIVE-16821
> Project: Hive
>  Issue Type: Bug
>  Components: Diagnosability, Vectorization
>Affects Versions: 2.1.1, 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Attachments: HIVE-16821.1.patch, HIVE-16821.2.patch
>
>
> Currently, to avoid a branch in the operator inner loop, the runtime stats 
> are only available in non-vectorized mode.





[jira] [Updated] (HIVE-16821) Vectorization: support Explain Analyze in vectorized mode

2017-06-05 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-16821:
---
Attachment: HIVE-16821.2.patch

> Vectorization: support Explain Analyze in vectorized mode
> -
>
> Key: HIVE-16821
> URL: https://issues.apache.org/jira/browse/HIVE-16821
> Project: Hive
>  Issue Type: Bug
>  Components: Diagnosability, Vectorization
>Affects Versions: 2.1.1, 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Attachments: HIVE-16821.1.patch, HIVE-16821.2.patch, 
> HIVE-16821.2.patch
>
>
> Currently, to avoid a branch in the operator inner loop, the runtime stats 
> are only available in non-vectorized mode.





[jira] [Commented] (HIVE-16323) HS2 JDOPersistenceManagerFactory.pmCache leaks after HIVE-14204

2017-06-05 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037800#comment-16037800
 ] 

Rajesh Balamohan commented on HIVE-16323:
-

SynchronizedMetaStoreClient is used only in load-dynamic-partition threads. 

Should {{ObjectStore.shutdown()}} set {{pm}} to null, as it can be invoked 
many times? Also, getPartition() called via loadPartition() should use 
"getSynchronizedMSC()::getPartitionWithAuthInfo()" to be on the safer side.


> HS2 JDOPersistenceManagerFactory.pmCache leaks after HIVE-14204
> ---
>
> Key: HIVE-16323
> URL: https://issues.apache.org/jira/browse/HIVE-16323
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-16323.1.patch, HIVE-16323.2.patch, 
> HIVE-16323.3.patch, PM_leak.png
>
>
> Hive.loadDynamicPartitions creates threads with a new embedded rawstore but 
> never closes them, so we leak one PersistenceManager per such thread.





[jira] [Commented] (HIVE-16804) Semijoin hint : Needs support for target table.

2017-06-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037988#comment-16037988
 ] 

Hive QA commented on HIVE-16804:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12871375/HIVE-16804.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10820 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=237)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=140)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query78] 
(batchId=232)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5538/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5538/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5538/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12871375 - PreCommit-HIVE-Build

> Semijoin hint : Needs support for target table.
> ---
>
> Key: HIVE-16804
> URL: https://issues.apache.org/jira/browse/HIVE-16804
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16804.1.patch, HIVE-16804.2.patch
>
>
> Currently the semijoin hint takes the source table as input. However, to 
> provide better control, the hint should also accept the target table name.





[jira] [Commented] (HIVE-16573) In-place update for HoS can't be disabled

2017-06-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16038031#comment-16038031
 ] 

Hive QA commented on HIVE-16573:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12871188/HIVE-16573.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10820 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=237)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_into_dynamic_partitions]
 (batchId=240)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query78] 
(batchId=232)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5539/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5539/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5539/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12871188 - PreCommit-HIVE-Build

> In-place update for HoS can't be disabled
> -
>
> Key: HIVE-16573
> URL: https://issues.apache.org/jira/browse/HIVE-16573
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>Priority: Minor
> Attachments: HIVE-16573.1.patch
>
>
> {{hive.spark.exec.inplace.progress}} has no effect





[jira] [Updated] (HIVE-16573) In-place update for HoS can't be disabled

2017-06-05 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16573:
---
Status: Patch Available  (was: In Progress)

I verified this patch; it works with the Spark engine on HiveCLI.

> In-place update for HoS can't be disabled
> -
>
> Key: HIVE-16573
> URL: https://issues.apache.org/jira/browse/HIVE-16573
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>Priority: Minor
> Attachments: HIVE-16573.1.patch
>
>
> {{hive.spark.exec.inplace.progress}} has no effect





[jira] [Commented] (HIVE-16573) In-place update for HoS can't be disabled

2017-06-05 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037924#comment-16037924
 ] 

Bing Li commented on HIVE-16573:


[~ruili] and [~anishek], thank you for your review.
I just submitted the patch.


> In-place update for HoS can't be disabled
> -
>
> Key: HIVE-16573
> URL: https://issues.apache.org/jira/browse/HIVE-16573
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Bing Li
>Priority: Minor
> Attachments: HIVE-16573.1.patch
>
>
> {{hive.spark.exec.inplace.progress}} has no effect





[jira] [Commented] (HIVE-6348) Order by/Sort by in subquery

2017-06-05 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037930#comment-16037930
 ] 

Rui Li commented on HIVE-6348:
--

Thanks guys for the suggestions. Yeah I agree ignoring such order/sort by is 
better. Do you think I can just remove it from the AST?

> Order by/Sort by in subquery
> 
>
> Key: HIVE-6348
> URL: https://issues.apache.org/jira/browse/HIVE-6348
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Rui Li
>Priority: Minor
>  Labels: sub-query
> Attachments: HIVE-6348.1.patch, HIVE-6348.2.patch
>
>
> select * from (select * from foo order by c asc) bar order by c desc;
> in Hive sorts the data set twice. The optimizer should probably remove any 
> order by/sort by in the subquery unless you use 'limit '. We could even go 
> so far as to bar it at the semantic level.





[jira] [Commented] (HIVE-16821) Vectorization: support Explain Analyze in vectorized mode

2017-06-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16038070#comment-16038070
 ] 

Hive QA commented on HIVE-16821:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12871470/HIVE-16821.2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5540/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5540/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5540/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-06-06 03:51:56.839
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-5540/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-06-06 03:51:56.842
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at bdacb10 HIVE-16571 : HiveServer2: Prefer LIFO over round-robin 
for Tez session reuse (Gopal Vijayaraghavan, reviewed by Sergey Shelukhin)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at bdacb10 HIVE-16571 : HiveServer2: Prefer LIFO over round-robin 
for Tez session reuse (Gopal Vijayaraghavan, reviewed by Sergey Shelukhin)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-06-06 03:52:00.381
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
Going to apply patch with: patch -p0
patching file pom.xml
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java
patching file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/PhysicalOptimizer.java
patching file ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
patching file ql/src/test/results/clientpositive/tez/explainanalyze_3.q.out
+ [[ maven == \m\a\v\e\n ]]
+ rm -rf /data/hiveptest/working/maven/org/apache/hive
+ mvn -B clean install -DskipTests -T 4 -q 
-Dmaven.repo.local=/data/hiveptest/working/maven
[ERROR] Failed to execute goal on project hive-hcatalog: Could not resolve 
dependencies for project 
org.apache.hive.hcatalog:hive-hcatalog:pom:3.0.0-SNAPSHOT: Failed to collect 
dependencies for [org.mockito:mockito-all:jar:1.10.19 (test), 
org.apache.hadoop:hadoop-common:jar:2.7.3 (test), 
org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.7.3 (test), 
org.apache.pig:pig:jar:h2:0.16.0 (test), org.slf4j:slf4j-api:jar:1.7.10 
(compile)]: Failed to read artifact descriptor for 
org.apache.hadoop:hadoop-common:jar:2.7.3: Could not find artifact 
org.apache.hadoop:hadoop-project-dist:pom:2.7.3 in datanucleus 
(http://www.datanucleus.org/downloads/maven2) -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hive-hcatalog
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12871470 - PreCommit-HIVE-Build

> Vectorization: support Explain Analyze in vectorized mode
> -
>
> Key: HIVE-16821
> URL: https://issues.apache.org/jira/browse/HIVE-16821
>